Supervised by: Ministry of Justice of the People's Republic of China
Sponsored by: Academy of Forensic Science
ISSN 1671-2072  CN 31-1863/N

Chinese Journal of Forensic Sciences (中国司法鉴定), 2022, Issue (2): 69-72. DOI: 10.3969/j.issn.1671-2072.2022.02.011

• Forensic Science •

Voiceprint Identification Analysis of Speech Synthesis: Based on the Voice of Two AI Virtual Announcers

ZHANG Xuehai, YANG Luming

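  1. Forensic Science Center of Guangdong Provincial Public Security Bureau, Guangzhou 510050, China
  • Received: 2020-11-14 Published: 2022-03-15 Online: 2022-04-26
  • About the author: ZHANG Xuehai (born 1993), male, engineer, mainly engaged in research on voiceprint identification. E-mail: gdgazxh@163.com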


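Abstract:

Objective To explore how current AI-synthesized speech differs from real human speech in voiceprint examination. Methods Speech from two AI virtual announcers and from their respective human prototypes was collected and examined from the perspective of voiceprint identification, in two respects: auditory perception and spectrogram analysis. Results Auditory perception still revealed problems in the synthesized speech, such as a lack of emotion and naturalness and errors in phrasing (pauses placed at the wrong points). Given the relative stability of the high-frequency formants in the speech used in the experiment, the differences between each synthesized voice and its prototype appeared mainly in the high-frequency formants above 4 kHz, and some syllables already differed above 3 kHz. In addition, the consonant-vowel transition segment was missing in some syllables of the synthesized speech. Conclusion At the current level of technology, synthesized speech still needs improvement in handling prosody, and auditory analysis can serve as a reference in voiceprint examination when judging whether speech is synthesized. Spectrogram analysis can reveal differences between synthesized and real speech in the high-frequency region of the spectrogram and in the consonant-vowel transitions of some syllables.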
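The abstract does not say whether or how the high-frequency differences were quantified, so the following is only a hypothetical sketch, not the authors' procedure. It computes a short-time power spectrum and reports the average share of power at or above a cutoff (4 kHz by default, mirroring the Results; 3 kHz is the other threshold mentioned). The helper names (`hann_window`, `power_spectrum`, `high_band_ratio`), the 16 kHz sampling rate, the 256-sample frame and the 128-sample hop are all assumptions chosen for illustration, and the pure tones stand in for real recordings. A single broadband ratio is far cruder than examining individual formants or consonant-vowel transitions on a spectrogram.

```python
import cmath
import math


def hann_window(n):
    # Symmetric Hann window of length n (n >= 2); tapers frame edges to reduce leakage
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    # Windowed power spectrum for DFT bins 0..n/2, computed with a naive O(n^2) DFT
    # so that the sketch needs only the standard library
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann_window(n))]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def high_band_ratio(signal, sample_rate, cutoff_hz=4000.0, frame_len=256, hop=128):
    """Mean fraction of spectral power at or above cutoff_hz across frames.

    Hypothetical helper for illustration; not the procedure used in the paper.
    """
    ratios = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        total = sum(power)
        if total == 0:
            continue  # skip silent frames, which carry no spectral content
        # DFT bin k corresponds to frequency k * sample_rate / frame_len
        high = sum(p for k, p in enumerate(power) if k * sample_rate / frame_len >= cutoff_hz)
        ratios.append(high / total)
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    # Synthetic stand-ins for real recordings: a 1 kHz tone and a 5 kHz tone
    low_tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
    high_tone = [math.sin(2 * math.pi * 5000 * t / sr) for t in range(1024)]
    print(f"1 kHz tone, share above 4 kHz: {high_band_ratio(low_tone, sr):.3f}")
    print(f"5 kHz tone, share above 4 kHz: {high_band_ratio(high_tone, sr):.3f}")
    # The 3 kHz observation in the Results would correspond to cutoff_hz=3000.0
```

The naive DFT keeps the sketch dependency-free; with real recordings (read, for example, through Python's `wave` module) one would more likely use `scipy.signal.stft` and compare matched syllables from the synthesized voice and its prototype rather than whole files.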

Key words:

AI virtual announcer, synthetic speech, voiceprint identification

