The Ethics of AI in Scientific Publishing and Peer Review
The integration of artificial intelligence into scientific workflows has moved faster than the policies meant to regulate it. While Large Language Models (LLMs) like ChatGPT offer researchers tools to polish prose and summarize complex data, they also threaten the fundamental trust mechanism of science: peer review. Journals and publishers are currently fighting a two-front war: against AI-generated paper-mill manuscripts on one side, and against reviewers using AI to assess confidential manuscripts on the other.
The Rise of AI-Generated Manuscripts
The pressure to “publish or perish” has always plagued academia, but generative AI has supercharged it. “Paper mills”—organizations that produce low-quality or fake scientific papers for a fee—now use AI to generate manuscripts at an industrial scale, flooding the submission systems of major publishers and clogging the pipeline for legitimate research.
The scale of this issue was highlighted in a massive purge by Wiley, one of the world’s largest academic publishers. In the past two years, Wiley has retracted more than 11,300 papers, many of which appeared in special issues of journals acquired from Hindawi. The financial impact was severe enough that Wiley temporarily paused its special-issues program, reportedly costing the company $35 million to $40 million in revenue.
The Problem of “Tortured Phrases”
One of the distinct markers of AI-generated or AI-obfuscated text is the use of “tortured phrases.” To avoid plagiarism detection software, AI tools (or translation software used by paper mills) often replace standard terminology with bizarre synonyms.
Researchers Guillaume Cabanac and Cyril Labbé have documented thousands of these oddities. For example:
- “Counterfeit consciousness” instead of “artificial intelligence.”
- “Colossal information” instead of “big data.”
- “Bosom peril” instead of “breast cancer.”
These phrases act as red flags for editors. If an article discusses “colossal information” in a medical context, it is highly likely that the paper is synthetic or was manipulated to bypass standard checks.
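This kind of screening lends itself to simple automation. The sketch below is a minimal, hypothetical illustration of the approach (it is not the actual screener built by Cabanac and Labbé): it scans text for a small dictionary of documented tortured phrases and reports the standard term each one displaces.

```python
import re

# A tiny sample of documented tortured phrases mapped to the standard
# terminology they displace (illustrative subset, not a full catalogue).
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "colossal information": "big data",
    "bosom peril": "breast cancer",
}

def flag_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, expected standard term) pairs found in text."""
    lowered = text.lower()
    hits = []
    for phrase, standard in TORTURED_PHRASES.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((phrase, standard))
    return hits

manuscript = (
    "Our framework applies counterfeit consciousness to colossal "
    "information collected from clinical records."
)
for phrase, standard in flag_tortured_phrases(manuscript):
    print(f"Red flag: '{phrase}' (standard term: '{standard}')")
```

A production screener would match thousands of such fingerprints across full-text archives, but the core idea is the same: tortured phrases are cheap to detect precisely because they are so unusual.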
AI in the Peer Reviewer's Chair
Perhaps more ethically complex than AI authorship is the use of AI in peer review. Reviewers are often unpaid and overworked, making the speed of an AI summary tempting. However, this raises immediate confidentiality and quality concerns.
Confidentiality Breaches
When a reviewer uploads a manuscript to a public LLM like ChatGPT or Claude, they are technically sharing unpublished, proprietary data with a third-party tech company. This violates the confidentiality agreements inherent in the review process. The data fed into these models could potentially train future versions, effectively leaking the novel research before it is published.
The “Regenerate Response” Scandal
The scientific community saw concrete evidence of this carelessness when acceptance letters and peer review reports began appearing with the phrases “Regenerate response” or “As an AI language model, I cannot…” left in the text.
In one notable case discussed on social media and Retraction Watch, a reviewer in a fiercely competitive field copy-pasted the chatbot’s output without removing its standard disclaimer. This does not just look unprofessional; it proves that a human expert did not critically evaluate the methodology or data.
A study by researchers at Stanford University estimated that between 6.5% and 16.9% of the text in peer reviews submitted to major AI conferences (such as ICLR and NeurIPS) could have been substantially modified by LLMs. This suggests a significant portion of the scientific gatekeeping process is being outsourced to algorithms that cannot understand the nuance of physical experiments or theoretical novelty.
Journal Policies: Authorship and Disclosure
In response to these challenges, major scientific bodies have rushed to update their ethical guidelines. The consensus is shifting toward transparency rather than a total ban, though strict lines remain regarding authorship.
Can AI Be an Author?
The short answer is no. Both Nature and Science, two of the most prestigious journals in the world, have established that AI tools cannot be listed as authors.
- Accountability: An author must be able to take legal and scientific responsibility for the work. An AI cannot be sued, cannot sign a conflict-of-interest form, and cannot be held accountable for fraud.
- Committee on Publication Ethics (COPE): This global body advises that AI tools should be treated as methods. If an author uses ChatGPT to write code or refine the abstract, it must be disclosed in the “Methods” or “Acknowledgements” section, but the AI gets no byline.
The Disclosure Requirement
Publishers like Elsevier and Springer Nature now require authors to declare the use of AI generative tools. This includes specifying which tool was used and for what part of the process (e.g., “ChatGPT-4 was used to edit the grammar in the introduction”). Failure to disclose this information is increasingly viewed as a form of academic misconduct similar to hiding a financial conflict of interest.
The Arms Race of Detection Tools
To enforce these rules, publishers are turning to software solutions, but the technology is imperfect.
- Turnitin and iThenticate: These widely used plagiarism checkers have added AI-writing detection features. They analyze sentence structure and statistical predictability (perplexity) to assign a probability score of AI involvement; a rough sketch of this scoring idea follows this list.
- The Risk of False Positives: These tools are not 100% accurate. There is a documented bias against non-native English speakers. A study published in Patterns showed that AI detectors often flag writing by non-native speakers as AI-generated because they tend to use simpler, more predictable sentence structures.
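To make the perplexity mechanism concrete, here is a minimal sketch of the scoring idea. It assumes the Hugging Face transformers library with the small open GPT-2 model as a stand-in; commercial detectors use proprietary models and calibration, so this illustrates the signal, not any vendor’s implementation. Lower perplexity (more predictable text) is what gets flagged, which is exactly why simpler, more formulaic sentences can score as “more AI-like.”

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT-2 is a stand-in here; commercial detectors use proprietary models,
# but the underlying signal (statistical predictability) is similar.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower means more predictable."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

conversational = "The assay failed twice before we noticed the buffer had degraded."
formulaic = "In this paper, we propose a novel method to improve performance."
print(f"conversational: {perplexity(conversational):.1f}")
print(f"formulaic:      {perplexity(formulaic):.1f}")
```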
Because of these reliability issues, most reputable journals do not reject papers based solely on a software score. Editors are trained to look for the “tortured phrases” mentioned earlier or inconsistent citations (AI is notorious for “hallucinating” references that do not exist).
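Hallucinated references, unlike writing style, can be checked mechanically: a cited DOI is either registered or it is not. Below is a minimal sketch of such a check against Crossref’s public REST API; the endpoint is real, but the sample DOIs and the simplified error handling are illustrative.

```python
import requests

def doi_is_registered(doi: str) -> bool:
    """Check whether a DOI resolves in the Crossref registry."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks polite clients to identify themselves.
        headers={"User-Agent": "reference-checker/0.1 (mailto:editor@example.org)"},
        timeout=10,
    )
    return resp.status_code == 200

# Placeholder DOIs; in practice these would be parsed from the
# manuscript's reference list.
cited_dois = [
    "10.1038/s41586-019-1666-5",  # the shape of a genuine registered DOI
    "10.9999/fake.2023.00001",    # the shape of a DOI an LLM might invent
]
for doi in cited_dois:
    verdict = "registered" if doi_is_registered(doi) else "NOT FOUND (possible hallucination)"
    print(f"{doi}: {verdict}")
```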
The Path Forward for Integrity
The integration of AI in science is inevitable, but maintaining integrity requires strict human oversight. The current ethical landscape rests on three pillars:
- Human Verification: No matter how good the summary, a human must check the primary data. AI cannot verify if a chemical reaction actually took place or if a patient population really existed.
- Total Transparency: Authors and reviewers must disclose their tool usage. If a reviewer uses AI to polish their critique, they must admit it and ensure the manuscript data remained private.
- Policy Enforcement: Journals must be willing to retract papers and ban reviewers who violate these trust mechanisms.
As the technology evolves, the scientific record depends on the understanding that AI is a tool to assist human inquiry, not a replacement for human judgment.
Frequently Asked Questions
Can I use ChatGPT to check the grammar of my scientific paper? Yes, most journals allow this, provided you disclose it. Nature and Elsevier generally permit AI for language editing, as it helps non-native speakers compete on a level playing field. Always check the specific “Guide for Authors” of the journal you are submitting to.
Why can’t AI be listed as a co-author? Authorship implies accountability. A co-author must be able to approve the final version of the manuscript and accept responsibility for its accuracy and integrity. An AI software program cannot legally or ethically perform these duties.
Do journals scan every paper for AI content? Most major publishers now integrate AI scanning into their initial submission checks alongside standard plagiarism detection. However, high scores usually trigger a manual review by an editor rather than an automatic rejection.
Is it safe to put a manuscript into ChatGPT for a summary? No. Unless you are using a localized, offline, or enterprise version of an LLM that guarantees data privacy, you are sharing unpublished intellectual property with a tech company. This counts as a breach of confidentiality for peer reviewers.
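For reviewers or authors who want LLM assistance without sending text to a third party, one workable pattern is a model that runs entirely on local hardware. The sketch below assumes a local Ollama server running on its default port with a model such as llama3 already pulled; it is one example setup, not a journal-endorsed workflow.

```python
import requests

# Ollama serves locally pulled models over HTTP on localhost:11434,
# so the manuscript text never leaves the reviewer's machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def summarize_locally(manuscript_text: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "llama3",  # any model already pulled locally
            "prompt": (
                "Summarize the methods section of this manuscript:\n\n"
                + manuscript_text
            ),
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(summarize_locally("(confidential manuscript text here)"))
```

Even with a local model, disclosure rules and the reviewer’s personal responsibility for the critique still apply; keeping the data on-device only addresses the confidentiality half of the problem.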