Releases: choxos/rtransparency
rtransparency 1.0.0
First stable release.
Detects eight transparency indicators in full-text biomedical articles (PubMed Central XML or plain text): conflict-of-interest disclosure, funding disclosure, protocol registration, novelty, replication / external validation, data sharing, code sharing, and disclosure of generative-AI use. Each prediction is paired with the exact text that triggered it, and the accuracy benchmarks are reproducible. Includes multilingual COI and funding detection (English, Spanish, Portuguese, French, German, Italian), corpus-scale batch processing, and corpus-level summary, scoring and plotting.
Builds on the original 'rtransparent' tool of Serghiou et al. (2021), PLOS Biology, doi:10.1371/journal.pbio.3001107. See NEWS.md for details.
rtransparent 0.9.11
Citation, documentation, and packaging polish.
- Added inst/CITATION: citation() now returns the package and the foundational Serghiou et al. (2021) paper.
- New vignette 'Scope and limitations' documenting indicator semantics, known limitations, the output schema, and the link to FAIR-assessment tooling.
rtransparent 0.9.10
Replication is now accuracy-corrected; fresh validation of replication and AI.
- Replication added to rt_accuracy. A new replication-enriched validation (250 open-access articles, 111 hand-labeled positives) gives a stable sensitivity (92.8); the representative specificity (98.5) is carried from the 2023 1000-article sample. rt_summary() now reports an accuracy-corrected replication prevalence.
- AI disclosure validated on 2024-2025 articles: disclosure prevalence in unselected open-access literature is about two to three percent (far below curated corpora) and the detector's positives were precise, so AI stays uncorrected pending a disclosure-rich corpus.
- No detector logic changed; all held-out benchmarks unchanged.
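To illustrate what an accuracy-corrected prevalence means, here is a minimal sketch using the Rogan-Gladen estimator, a standard correction for imperfect classifiers. Whether rt_summary() uses exactly this formula is an assumption; the function name and the apparent prevalence below are hypothetical. Only the sensitivity (92.8) and specificity (98.5) come from the notes above.

```python
def corrected_prevalence(apparent: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from a detector's apparent prevalence.

    All arguments are proportions in [0, 1]. The result is clipped to [0, 1],
    because sampling noise can push the raw estimate outside that range.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Sensitivity and specificity as reported for replication in 0.9.10.
SENS = 0.928
SPEC = 0.985

# Hypothetical share of articles flagged by the detector (not a reported figure).
apparent = 0.05
print(corrected_prevalence(apparent, SENS, SPEC))
```

The correction matters most for rare indicators: when the apparent prevalence is close to the false-positive rate (1 - specificity), most flagged articles may be false positives and the corrected estimate shrinks toward zero.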
rtransparent 0.9.9
Conflict-of-interest and funding detection in five more languages.
COI and funding statements are now detected in Spanish, Portuguese, French, German and Italian. On 70 open-access articles per language, COI detection rose most for monolingual articles (German 33%→97%, French 70%→80%). The new tokens are language-distinctive and absent from English, so the English detectors are unchanged (COI 100/91.8 on the 2023 sample; held-out Serghiou benchmarks untouched); the funding patterns corrected two mislabeled Spanish/Portuguese funding articles, lifting 2023-sample funding to 95.0/95.2. Because the text detectors share the PMC cores, the new languages also work in plain-text input.
Data-availability multilingual detection is deferred to a future release.
rtransparent 0.9.8
TXT/PMC detector parity.
rt_coi(), rt_fund() and rt_register() now route their text through the same detection cores as the PMC detectors, replacing separate, weaker logic. Measured on text from the 1000-article 2023 set (sensitivity / specificity): registration 46.2/98.7 to 90.4/98.4, conflicts of interest 88.8/86.3 to 88.6/90.4, funding 79.1/89.5 to 79.3/90.5. The remaining gap to PMC comes from the XML-only structural routes that a plain-text file lacks.
Adds a TXT-parity benchmark (inst/benchmark/results_txt_parity.{csv,md}). The PMC detectors and held-out benchmarks are unchanged.
rtransparent 0.9.7
Corpus-scale batch processing.
- New rt_all_pmc_dir() processes every PMC XML in a directory (or a vector of paths) through rt_all_pmc() in one call: resumable via an output CSV (skip-done + chunked append), per-file failure isolation (is_success = FALSE rows), a progress bar, and optional parallelism via furrr + an active future::plan(). furrr and future added to Suggests (used only for parallel = TRUE).
Held-out Serghiou benchmarks and the novelty/replication gold set are unchanged.
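The resumable pattern described for rt_all_pmc_dir() (skip-done, chunked append, per-file failure isolation) can be sketched generically as follows. This is a language-neutral illustration in Python, not the package's R implementation; the function name process_paths, the column names other than is_success, and the analyze callback are all hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable

FIELDS = ["path", "is_success", "result"]  # hypothetical output schema


def process_paths(paths: Iterable[str], analyze: Callable[[str], str],
                  out_csv: str, chunk_size: int = 100) -> int:
    """Run analyze() on each path, appending results to out_csv in chunks.

    Paths already recorded in out_csv are skipped, so an interrupted run can
    be restarted with the same arguments. Returns the number of paths processed.
    """
    out = Path(out_csv)

    # Skip-done: collect the paths recorded by earlier runs.
    done = set()
    if out.exists():
        with out.open(newline="") as fh:
            done = {row["path"] for row in csv.DictReader(fh)}
    todo = [str(p) for p in paths if str(p) not in done]

    for start in range(0, len(todo), chunk_size):
        rows = []
        for p in todo[start:start + chunk_size]:
            try:
                rows.append({"path": p, "is_success": True, "result": analyze(p)})
            except Exception:
                # Failure isolation: one bad file becomes a row, not a crash.
                rows.append({"path": p, "is_success": False, "result": ""})

        # Chunked append: write the header only when the file is new or empty.
        write_header = not out.exists() or out.stat().st_size == 0
        with out.open("a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)
    return len(todo)
```

Writing after every chunk bounds the work lost on interruption to at most one chunk. Recording failures as rows also means a restarted run does not retry a file that already failed; whether rt_all_pmc_dir() behaves this way on retry is not stated in these notes.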
rtransparent 0.9.6
The hand-labeled 2023 validation sample reaches a round 1000 articles (the full cached open-access PMC set).
Funding. The Portuguese no-funding declaration "os autores nao reportam qualquer financiamento" is now read as absence of funding.
1000-sample metrics (sens/spec): coi 100/91.8, fund 94.8/95.3, reg 84.6/99.2, nov 90.2/93.3, rep 82.4/98.5, data 91.1/97.8, code 93.9/99.0, ai 100/100. Held-out Serghiou et al. (2021) benchmarks and the novelty/replication gold set unchanged.
rtransparent 0.9.5
The hand-labeled 2023 validation sample is expanded to 980 articles (265 new), with a focused improvement to replication precision.
Replication precision. The detector now suppresses limitations/strengths discussion paragraphs, editorial statements about reproducibility as a value, reviews assessing the "validity of" a method, machine-learning evaluation-metric lists, results reproduced only within trial arms, and negative results ("not always replicated"). Replication PPV: 33.3 → 40.0 (gold set), 44 → 48.1 (2023 sample, specificity 98.5); sensitivity unchanged.
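A PPV below 50% alongside a specificity of 98.5 can look contradictory. The sketch below shows how PPV depends on prevalence through Bayes' rule. The sensitivity (81.2) and specificity (98.5) are the 980-sample replication figures from this release; the prevalence values are hypothetical and chosen only to show the trend.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(truly positive | detector says positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENS = 0.812  # replication sensitivity, 980-sample metrics
SPEC = 0.985  # replication specificity, 980-sample metrics

# Hypothetical prevalences: the rarer the indicator, the lower the PPV.
for prevalence in (0.02, 0.10, 0.30):
    print(prevalence, round(ppv(SENS, SPEC, prevalence), 3))
```

For a rare indicator, even a 1.5% false-positive rate applied to the large negative class produces a number of false positives comparable to the true positives. This is why suppressing specific false-positive contexts raises PPV without changing sensitivity.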
Funding. "did not receive any external financial support" is now read as absence of funding.
980-sample metrics (sens/spec): coi 100/91.7, fund 94.8/95.2, reg 84.6/99.2, nov 90.1/93.4, rep 81.2/98.5, data 90.8/97.8, code 93.8/98.9, ai 100/100. Held-out Serghiou et al. (2021) benchmarks and the novelty gold set unchanged.
rtransparent 0.9.4
The hand-labeled 2023 validation sample is expanded to 715 articles (210 new), with three small detector fixes surfaced by the new batches.
Funding. "there are no source of support", "not supported by any organizations", "no external sources of funding", and "conducted without the receipt of any dedicated grant or financial support" are now read as absence of funding.
Novelty. "previously unobserved" added to the gap-claim cues; "undertake" added to the priority verbs ("the first to undertake ...").
715-sample metrics (independent): registration 88.9/99.6, novelty 89.1/94.5, code 92.0/99.7, replication 84.6/98.0; detector-adjudicated funding 93.2/95.5, data 90.9/97.9, coi 100/89.8, ai 100/100. Held-out Serghiou et al. (2021) benchmarks unchanged.
rtransparent 0.9.3
The hand-labeled 2023 validation sample is expanded to 505 articles (120 new), with three small detector fixes surfaced by the new batches.
Funding. "not financially supported by any funding or institutions" (the adverb "financially" previously broke the match) and non-English no-funding declarations (Portuguese "nao teve fontes de financiamento", Spanish "no recibio financiacion" / "sin financiacion") are now read as absence of funding.
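The kind of fix described above, an intervening adverb breaking a phrase match, can be illustrated with a small regular-expression sketch. The pattern and helper names are hypothetical and are not the package's actual rules. Folding accents before matching is also an assumption, made here so that accented source text matches the unaccented phrases quoted in these notes.

```python
import re
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so that 'não' becomes 'nao' and 'recibió' becomes 'recibio'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical no-funding pattern. The optional group (?:\w+ly\s+)? tolerates
# one adverb such as "financially" between "not" and "supported".
NO_FUNDING = re.compile(
    r"\bnot\s+(?:\w+ly\s+)?supported\s+by\s+any\b"
    r"|\bnao\s+teve\s+fontes\s+de\s+financiamento\b"
    r"|\bno\s+recibio\s+financiacion\b"
    r"|\bsin\s+financiacion\b",
    re.IGNORECASE,
)


def declares_no_funding(text: str) -> bool:
    """Return True if the text contains one of the no-funding phrases above."""
    return NO_FUNDING.search(fold_accents(text)) is not None


print(declares_no_funding("This study was not financially supported by any funding or institutions."))
print(declares_no_funding("Este estudo não teve fontes de financiamento."))
print(declares_no_funding("Supported by a national research grant."))
```

Without the optional adverb group, the first sentence would not match, because the literal sequence "not supported" never occurs in it.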
AI. A section titled "Statement on the use of artificial intelligence" (and similar "... on the use of AI / generative AI / LLMs" headings) is now recognized as an AI-use disclosure.
505-sample metrics (independent): registration 88.2/99.4, novelty 87.7/95.8, code 94.1/99.6, replication 81.8/98.0; detector-adjudicated funding 91.8/95.3, data 88.6/97.7, coi 100/80.6, ai 100/100. Held-out Serghiou et al. (2021) benchmarks unchanged.