Supervised by: Ministry of Justice of the People's Republic of China
Sponsored by: Academy of Forensic Science
ISSN 1671-2072  CN 31-1863/N

Chinese Journal of Forensic Sciences, 2026, Issue (2): 38-45. DOI: 10.3969/j.issn.1671-2072.2026.02.005

• Special Topic: New Quality Productive Forces Empowering Multi-Scenario Applications of Forensic Appraisal •

Comparative Study of AI-Synthesized Speech and Natural Speech

LIAO Fangling¹, CHEN Manqing¹, CHEN Shengxiang¹, GUO Yuhang¹, YANG Yingcang¹, MU Fan²

  1. Department of Forensic Science and Technology, Guizhou Police College
  2. Criminal Investigation Division, Guiyang Municipal Public Security Bureau
  • Received: 2025-04-24 Published: 2026-03-15 Online: 2026-03-25

Abstract: Objective With the rapid development of artificial intelligence (AI) speech synthesis, the detectability of synthesized speech has become a key issue in forensic appraisal. This study compares AI-synthesized speech with natural speech along two dimensions, auditory perception and acoustic quantification, and systematically analyzes the features that differ between them, in order to provide a reference for the identification, prevention, examination, and appraisal of synthesized speech in judicial practice. Methods In the auditory examination, a Likert scale was used to rate the consistency between natural speech and synthesized speech. In the acoustic examination, feature parameters including fundamental frequency, formants, sound intensity, and duration were extracted with the Praat speech analysis software, and paired-sample t-tests were run in SPSS 27 to quantify the differences between natural speech and synthesized speech. Results Compared with natural speech, AI-synthesized speech performed worse on the auditory features of monosyllable integrity, erhua (rhotacized final) features, stress, speech rate, and fluency. In the acoustic examination, statistical analysis showed significant differences in fundamental frequency and formants, whereas the differences in sound intensity and duration were not significant. Conclusion Combining "human-ear preliminary screening" with acoustic quantification in forensic appraisal can effectively distinguish AI-synthesized speech from natural speech and provides technical support for the examination and appraisal of synthesized speech.

Key words: artificial intelligence (AI), synthesized speech, natural speech, comparative study, forensic appraisal
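The paired-sample t-test named in the Methods can be illustrated with a minimal sketch. It is not the authors' procedure (the study reports using SPSS 27): the function name `paired_t` and the fundamental-frequency values below are hypothetical, chosen only to show how paired measurements of the same utterance in natural and synthesized form reduce to a single t statistic.

```python
import math
import statistics


def paired_t(natural, synthesized):
    """Return (t statistic, degrees of freedom) for a paired-sample t-test."""
    if len(natural) != len(synthesized):
        raise ValueError("paired samples must have equal length")
    if len(natural) < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-pair differences, which is what makes the test "paired".
    diffs = [n - s for n, s in zip(natural, synthesized)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Hypothetical mean F0 values in Hz (illustrative only, not data from the study).
natural_f0 = [210.0, 198.0, 225.0, 204.0, 217.0]
synth_f0 = [202.0, 195.0, 214.0, 199.0, 206.0]

t_stat, df = paired_t(natural_f0, synth_f0)
print(f"t = {t_stat:.3f}, df = {df}")
```

The sketch stops at the t statistic and degrees of freedom; the p-value would then be read from the t distribution by a statistics package, for example SPSS as in the study or `scipy.stats.ttest_rel` in Python.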

CLC number: