Research Integrity in an Era of AI and Massive Amounts of Data
The challenge of ensuring the integrity of biomedical research is enormous. The NIH has shone a spotlight on the issue with its publication Leading In Gold Standard Science – An NIH Implementation Plan, but awareness and proposed solutions have been appearing in the literature more frequently in recent years. Drs. Bauchner and Rivara have recently published two articles (in the traditional medical literature) identifying the most pressing challenges and suggesting where solutions are likely to exist.
Here, they expand on those articles and offer more details on solutions. I find the discussion of the replicability and reproducibility of research particularly instructive. This article is a bit longer than our usual, but I felt like it warranted the space.
Adam Cifu
Much has changed in the past decade regarding research integrity. Massive amounts of data are now available through electronic health records, large cohort studies, and data-sharing agreements. AI is now pervasive in medicine, including its use in scientific discovery, analysis, and even authoring manuscripts. Combined with the unintended consequences of open access and the growth of predatory journals, maintaining research integrity is becoming more challenging. Indeed, the increase in the number of retractions, evidence of data dredging, and the politicization of evidence have contributed to public skepticism about science.
Research integrity is a broad concept and “refers to all of the factors that underpin good research practice and promote trust and confidence in the research process.” It includes the concepts of honesty, transparency, accountability, respect, and rigour. Research and scientific misconduct are also important to understand. Research misconduct is usually defined as fabrication, falsification, and plagiarism, and is the responsibility of authors. Scientific misconduct is a broader term that includes research misconduct but also covers issues such as inadequate peer review (which has led to thousands of retractions), undeclared conflicts of interest, lack of registration of randomized clinical trials, authorship disputes, and failure to publish the results of an RCT. Unlike research misconduct, which reflects specifically on authors, inadequate peer review is an editor’s responsibility. For some of the other issues, such as undeclared conflicts of interest or authorship disputes, differences of opinion may arise, and third parties are often needed to resolve them.
Given the importance of maintaining research integrity at a time when the public is confronted with misinformation and disinformation, and appears less sure of science, the following recommendations may be helpful in support of research integrity.
1. Clinical observational studies whose intent is to show a causal association between an exposure and an outcome, such as trial emulation studies, should be registered prior to data collection. As with RCTs, registration will help prevent data dredging and assist with the reproducibility of studies.
2. AI needs to assist humans with initial peer review. Evidence is emerging that AI is effective at peer review and potentially better than human reviewers. More specifically, a recent publication suggests AI can assess adherence to the CONSORT reporting guidelines. AI should be an adjunct to human peer review, assessing registration of RCTs and meta-analyses, potentially adherence to the scores of reporting guidelines such as CONSORT, PRISMA, and STROBE, as well as reviewing methods and statistics. Time will tell if AI will be able to check references for accuracy, screen for image manipulation, and offer an opinion about originality. Concerns about the confidentiality of data need to be addressed.
3. Replication and reproducibility are complicated issues, and there is no broad agreement on what they mean. For example, Nosek and Errington reject the traditional definition of replication (repeating an experiment and getting the same results) and instead believe that “replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research.” Recently, the NIH defined both:
replicability is “the ability to perform the same experiment or study using the same methods and conditions to achieve the same result.”
reproducibility is “the ability of independent researchers to test a hypothesis through multiple methods and consistently achieve results that confirm or refute it, ensuring findings are generalizable and robust across different approaches.”
A distinction should be made between lab-based science and clinical research. In lab-based science, the goal is to ensure that the same experiment, if repeated, will produce the same results. This is different, and more complex, in clinical research. Because of the vast amount of data now available and the myriad ways in which analyses can be conducted and variables coded, getting identical or similar results may not be possible unless an investigator has the identical data set and uses the same analytic approach.
This was recently highlighted in two different manuscripts. Wang and colleagues conducted a systematic review of articles reporting the relationship between red meat and mortality and identified 70 analytic approaches among the 15 publications that included 24 different cohorts. They then used specification curve analysis (an analytic approach that identifies and calculates all reasonable specifications for a research question) to identify 1208 analytic approaches. When applied to the NHANES data set, 435 analyses produced a hazard ratio greater than 1 (implying increased mortality) and 773 a hazard ratio less than 1. In a similar study, Silberzahn asked 29 teams to address the question of whether soccer referees were more likely to give red cards to dark-skin-toned players compared with light-skin-toned players. Twenty teams found such an association, and nine teams did not. Distributional assumptions included linear, logistic, and Poisson models; the number of covariates ranged from 0 to 7; and the analytic approaches included, but were not limited to, logistic regression, Bayesian logistic regression, and Tobit regression. The manuscripts by Wang and Silberzahn highlight the enormous variability in the results of observational studies depending upon the many assumptions made and analytic approaches taken.
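For readers unfamiliar with specification curve analysis, the following minimal Python sketch illustrates the core idea on simulated data: enumerate the defensible analytic choices (here, only the choice of adjustment set), fit a model for each, and examine how the effect estimate varies across specifications. The data, variable names, and modeling choices are illustrative assumptions, not the analyses conducted by Wang and colleagues or Silberzahn.

```python
# Minimal, illustrative sketch of specification curve analysis on simulated data
# (hypothetical variables; not the code used by Wang and colleagues).
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

# Simulated cohort: an exposure, several candidate covariates, and a binary outcome.
df = pd.DataFrame({
    "exposure": rng.binomial(1, 0.4, n),
    "age": rng.normal(60, 10, n),
    "smoking": rng.binomial(1, 0.3, n),
    "bmi": rng.normal(27, 4, n),
})
logit = -2 + 0.05 * (df["age"] - 60) + 0.4 * df["smoking"]
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# One "specification" = one choice of adjustment set (other analytic choices, such as
# exposure coding or model family, could be enumerated the same way).
candidate_covariates = ["age", "smoking", "bmi"]
results = []
for k in range(len(candidate_covariates) + 1):
    for covs in itertools.combinations(candidate_covariates, k):
        X = sm.add_constant(df[["exposure", *covs]])
        fit = sm.Logit(df["died"], X).fit(disp=False)
        results.append({
            "covariates": covs or ("none",),
            "odds_ratio": float(np.exp(fit.params["exposure"])),
        })

# The "specification curve": effect estimates sorted across all specifications.
curve = pd.DataFrame(results).sort_values("odds_ratio")
print(curve.to_string(index=False))
```

In a full specification curve analysis, many more choices (outcome definition, exposure coding, eligibility criteria, model family) would be enumerated, and the resulting distribution of estimates would be examined rather than any single "preferred" analysis.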
So where does that leave us with regard to replication and reproducibility? There is broad agreement that these are important concepts, but how to pursue them and ensure credibility is not clear. For example, of the millions of experiments published each year, which ones should be replicated? Should it be random, or should “experts” determine those that are most important? If someone repeats a laboratory-based experiment or has access to clinical data and the analytic approach, but cannot produce the same finding, what happens next? Given that the overwhelming weight of evidence indicates that there is no relationship between immunizations and autism, if a new study finds such a relationship, what is to be believed? Making data public will help, but it is incumbent upon investigators to make clear their intent to repeat an experiment or re-analyze data. In clinical research, even with similar data, Wang and Silberzahn tell us that the results could very well be different. Finally, the rewards for replication and reproducibility, including funding and academic credit, are just now emerging. The field will not gain traction without such rewards.
4. Journals should not publish RCTs and meta-analyses that were not registered prior to data collection. Checking whether these studies have been registered and are being reported consistently with their registration is a challenge for all journals, particularly those with limited resources. In the coming years, AI will make this easier. If journals do publish such reports, they should insist that authors clarify why the study was not registered.
5. As funding bodies develop policies regarding research ethics, they should enforce those policies. For example, the NIH has been lax in ensuring that the results of the RCTs it funds are reported on ClinicalTrials.gov. Its new policy on sharing the data underlying publications needs to be enforced.
6. Many studies are conducted to generate hypotheses; indeed, this has been part of the scientific process for centuries. It is time to be more specific about this process, particularly in research that involves human subjects. Hypothesis-generating studies should be labelled as such, for example in the conclusion of the abstract. This may help health reporters and the public better understand the scientific process.
Individuals and groups have focused on research integrity for decades. For example,
Sleuths review manuscripts, with a particular focus on image manipulation.
Retraction Watch has tracked the number of retractions and studied and offered opinions about various aspects of scientific publishing (https://retractionwatch.com/).
The Center for Open Science has been committed to increasing “openness, integrity, and reproducibility of research.”
PubPeer allows post-publication comments on research reports and has often been the place where questions about image manipulation or data fabrication arise.
More recently, the NIH, the largest single funder of biomedical research, summarized nine tenets of gold standard science, acknowledging past efforts and highlighting those that will be instituted.
Maintaining research integrity is more challenging than ever. It is very unlikely that humans alone will be up to the task, particularly with more than 3 million manuscripts published each year (and many more rejected). The amount of science being produced has exceeded the capacity of human peer review. It is time to embrace a more comprehensive approach: greater clarity about the intent of research (that is, registration of observational cohort studies), adherence to reporting guidelines, and human-AI peer review.
Howard Bauchner, MD, is a Professor of Pediatrics and Public Health at the Boston University Chobanian & Avedisian School of Medicine and a Visiting Scholar at the National University of Singapore. He is the former Editor in Chief of JAMA and the JAMA Network and the former Editor in Chief of Archives of Disease in Childhood.
Dr. Rivara is Professor of Pediatrics and Adjunct Professor of Epidemiology at the University of Washington. He is the former editor-in-chief of JAMA Pediatrics (2000-2017) and JAMA Network Open (2018-2024). He continues as an active clinician, mentor, and investigator.



