Discrimination efficiency of AI-generated paper abstracts: an empirical study on reviewers and AI detection tools

黎世莹; 张慧; 吕叶辉; 姚雪珺; 余党会

Article abstract

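黎世莹, 张慧, 吕叶辉, 姚雪珺, 余党会. Discrimination efficiency of AI-generated paper abstracts: an empirical study on reviewers and AI detection tools. 编辑学报 (Acta Editologica), 2026, 38(1): 82-89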

Discrimination efficiency of AI-generated paper abstracts: an empirical study on reviewers and AI detection tools

DOI: 10.16811/j.cnki.1001-4314.2026.01.014

Chinese keywords: 生成式人工智能 (generative artificial intelligence); ChatGPT; DeepSeek; 编辑 (editor); 作者 (author); 审稿人 (reviewer)

English keywords: generative artificial intelligence; ChatGPT; DeepSeek; editor; author; reviewer

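Funding: 2025 Society of China University Journals · TranSpread Journal Communication Fund project (CUJS-TS-2025-019); 2025 Jiangsu Science and Technology Journals Society · TrendMD Journal Academic Communication Fund project (JSKJQK-MPS/AJE-2025-007)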

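Author	Affiliation
黎世莹	Academy of Forensic Science; Shanghai Key Laboratory of Forensic Medicine; Key Laboratory of Forensic Science, Ministry of Justice; Shanghai Forensic Service Platform, 200063
张慧	Academy of Forensic Science; Shanghai Key Laboratory of Forensic Medicine; Key Laboratory of Forensic Science, Ministry of Justice; Shanghai Forensic Service Platform, 200063
吕叶辉	School of Basic Medical Sciences, Shanghai University of Medicine and Health Sciences, 201318
姚雪珺	Academy of Forensic Science; Shanghai Key Laboratory of Forensic Medicine; Key Laboratory of Forensic Science, Ministry of Justice; Shanghai Forensic Service Platform, 200063
余党会	Editorial Office of Academic Journal of Naval Medical University, Press of the Teaching and Research Support Center, Naval Medical University, Shanghai 200433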

Abstract views: 878

Full-text downloads: 482

Abstract (Chinese version):

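This study analyzes how well reviewers, artificial intelligence (AI), and dedicated detection tools can identify AI-generated content, and profiles the writing proficiency of mainstream AI tools and the typical features of the content they generate. A double-blind experiment was designed to assess the accuracy with which reviewers and generative artificial intelligence (GenAI) identify AI-generated content; the latest policies and guidance on GenAI issued by publishers, universities, and journal platforms in China and abroad were also analyzed. Reviewers identified AI-generated abstracts of original research articles with significantly higher accuracy than those of review articles (91.9% vs. 83.8%, P<0.05), and they misjudged abstracts generated from the full text more often than abstracts generated from the title alone (28 vs. 11 instances, P<0.05). AI detection tools (e.g., GPTZero) identified generated content with an accuracy of 90.0%, significantly better than large language models (ChatGPT, DeepSeek). Chinese and international policies differ by region on matters such as AI authorship and content disclosure. For AI-generated content, this paper proposes a three-stage review model of "algorithmic pre-screening, human re-check, author appeal" so that authors use AI in writing reasonably and in a standardized way, offering a preliminary exploration of an operational paradigm for scientific journals responding to the changes brought by GenAI.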

Abstract (English version):

This study aims to systematically profile the writing patterns of mainstream artificial intelligence (AI) tools and characterize typical features of AI-generated content for editors and reviewers, thereby addressing the knowledge gap in human-AI discriminative capability. It further proposes actionable frameworks to help scientific journals navigate the transformative impact of generative artificial intelligence (GenAI). A double-blind experiment was conducted to evaluate the accuracy of reviewers and GenAI (ChatGPT, DeepSeek) in detecting AI-generated content. Additionally, the latest policies and guidelines on GenAI issued by global publishing institutions, universities, and journal platforms were analyzed. Reviewers were significantly more accurate at identifying AI-generated abstracts of research articles than of review articles (91.9% vs. 83.8%, P<0.05). Abstracts generated from the full text were misjudged more often than those generated from the title alone (28 vs. 11 instances, P<0.05). AI detection tools (e.g., GPTZero) achieved 90.0% accuracy in identifying generated content, significantly outperforming large language models (ChatGPT, DeepSeek). Regional disparities were observed in policies regarding AI authorship and content disclosure. This study proposes a three-stage manuscript evaluation model: algorithmic pre-screening, human review, and author appeal. This operational paradigm, coupled with globally coordinated ethical frameworks, offers a scalable solution for academic publishing amid GenAI disruption.
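The three-stage review model named in the abstract (algorithmic pre-screening, human review, author appeal) can be pictured as a simple triage routine. The sketch below is only an illustration under stated assumptions: the `Submission` fields, the `triage` function, the status strings, and the 0.5 cut-off are all hypothetical and are not taken from the study, which does not describe an implementation here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off for the detector score; the abstract gives no threshold.
SCREEN_THRESHOLD = 0.5


@dataclass
class Submission:
    """Hypothetical record of one manuscript moving through the workflow."""
    title: str
    detector_score: float                     # assumed: detector's AI-likelihood in [0, 1]
    reviewer_flags_ai: Optional[bool] = None  # human verdict, None until reviewed
    appeal_upheld: Optional[bool] = None      # appeal outcome, None until decided


def triage(sub: Submission, threshold: float = SCREEN_THRESHOLD) -> str:
    """Return the current status of a submission in the three-stage workflow."""
    # Stage 1: algorithmic pre-screening. Low scores pass without human effort.
    if sub.detector_score < threshold:
        return "cleared at pre-screening"
    # Stage 2: a human reviewer re-checks everything the detector flagged.
    if sub.reviewer_flags_ai is None:
        return "awaiting human review"
    if not sub.reviewer_flags_ai:
        return "cleared at human review"
    # Stage 3: the author may contest a flag confirmed by the reviewer.
    if sub.appeal_upheld is None:
        return "flagged; author may appeal"
    return "cleared on appeal" if sub.appeal_upheld else "flagged after appeal"


if __name__ == "__main__":
    print(triage(Submission("Example A", detector_score=0.12)))
    print(triage(Submission("Example B", detector_score=0.93, reviewer_flags_ai=True)))
```

The ordering reflects the idea in the abstract that the detector only narrows the pool, while the human reviewer and the author's appeal decide the outcome for flagged manuscripts.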
