Supervised by: Ministry of Justice of the People's Republic of China
Sponsored by: Academy of Forensic Science
ISSN 1671-2072  CN 31-1863/N

2014, Issue (1): 75-79.

• Forensic Practice •

Corpus-based Regional Mandarin Recognition

Wang Hong (王虹)

  1. China National Police University
  • Received: 2013-05-28 Revised: 2013-09-29 Published: 2014-01-15 Online: 2014-02-13
  • Corresponding author: Wang Hong

Abstract: Objective To investigate regional differences in spoken Mandarin ("Putonghua"), identify the relevant speech characteristics, and find ways to extract more regional features from limited speech material pronounced in Mandarin, so as to improve regional speech recognition and give better direction to case analysis. Method A relatively large-scale corpus, the Chinese "Putonghua" Corpus for Case-Oriented Speech Recognition, was established together with a query and retrieval system. The speech data in the corpus were analyzed statistically and summarized. Results When speaking Mandarin, speakers show inherent features of their native dialects, to varying degrees, in phonetic aspects such as initials, finals, tones, stress, retroflex suffixation (erhua), and neutral tone, as well as in vocabulary and grammar. Regional speech can therefore be recognized by methods such as "centering on tone-value features while combining initial and final features" and "making a comprehensive judgment from the sum of all feature types". Conclusion Identifying a speaker's regional origin from his or her Mandarin speech is a practicable technique.

Key words: dialect Mandarin, corpus, regional speech recognition