Exploration of Automatic Accent Recognition Based on Linguistic Phonological Words

YANG Wei1, YANG Junjie2

1. Shanxi University, Taiyuan 030006, China; 2. Shanxi Police College, Taiyuan 030021, China

Chinese Journal of Forensic Sciences, 2021, Issue (2): 38-42. DOI: 10.3969/j.issn.1671-2072.2021.02.006

Received: 2020-04-23 Published: 2021-03-25 Online: 2021-03-25
Corresponding author: YANG Junjie (1973-), professor, Ph.D., working mainly on the forensic examination of audio-visual materials and on Chinese linguistics.
About the author: YANG Wei (1982-), lecturer, Ph.D., working mainly on computational linguistics and dialect identification.
Funding: Special Commissioned Project for Humanities and Social Sciences Research of the Ministry of Education (YB1924C002G); Key Research Base Project for Humanities and Social Sciences in Shanxi Provincial Universities (2015309).

Abstract:

Objective: From the perspective of dialect phonetics, to explore how computer-based analysis and selection of model training data can improve the accuracy of automatic accent recognition and optimize the training data. Methods: Phonological analysis, speech signal processing, mathematical modeling experiments and statistics were applied to about 81 400 electronic speech segments from 37 dialect points (about 2 200 segments per dialect point). For each dialect point, the phonological system was extracted, phonological example words were selected, the recordings were preprocessed, Mel-frequency cepstral coefficients (MFCC) were extracted, and a Gaussian mixture model (GMM) was built for accent recognition analysis. Results: A recognition model trained on the phonological example words extracted from the dialect speech (about 260 words) completed the accent recognition task well. Compared with a recognition model trained on 300 arbitrarily chosen example words, it showed significant advantages both in recognition accuracy and in the amount of test speech required. Conclusion: With the automatic accent recognition method based on linguistic phonological example words, Gaussian mixture models have been built for all 37 dialect points, and they can be used to assist discrimination analysis in accent recognition.

Key words:

Gaussian mixture model, automatic accent recognition, phonology, linguistics
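
The pipeline named in the abstract (MFCC features, then one GMM per dialect point) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that MFCC frames have already been extracted, uses synthetic 4-dimensional vectors in their place, and assumes that a test utterance is assigned to the dialect whose model gives the highest average log-likelihood, a decision rule the abstract does not state. The names `fit_gmm`, `classify`, `dialect_A` and `dialect_B`, the diagonal covariances and the component count are all hypothetical choices made for illustration.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def frame_log_terms(x, gmm):
    """Per-component log(weight * density) for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def fit_gmm(frames, n_components=2, n_iter=25, seed=0, var_floor=1e-3):
    """Fit a diagonal GMM with EM; returns a list of (weight, mean, var)."""
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initialise means from random frames, unit variances, equal weights.
    gmm = [(1.0 / n_components, list(x), [1.0] * dim)
           for x in rng.sample(frames, n_components)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            terms = frame_log_terms(x, gmm)
            norm = logsumexp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: re-estimate weights, means and floored variances.
        new_gmm = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            new_gmm.append((max(nk / len(frames), 1e-10), mean, var))
        gmm = new_gmm
    return gmm

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    return sum(logsumexp(frame_log_terms(x, gmm)) for x in frames) / len(frames)

def classify(frames, models):
    """Return the dialect label whose GMM scores the utterance highest."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))

def synthetic_frames(rng, centre, n, dim=4):
    """Stand-in for MFCC frames: Gaussian vectors around a common centre."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(42)
# One model per (hypothetical) dialect point, trained on its own frames.
models = {
    "dialect_A": fit_gmm(synthetic_frames(rng, 0.0, 200)),
    "dialect_B": fit_gmm(synthetic_frames(rng, 5.0, 200)),
}
test_utterance = synthetic_frames(rng, 5.0, 50)
predicted = classify(test_utterance, models)
print(predicted)
```

In a real setting the synthetic frames would be replaced by MFCC vectors computed from the preprocessed recordings of the selected example words, and the dictionary would hold one model for each of the 37 dialect points.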

YANG Wei, YANG Junjie. Exploration of Automatic Accent Recognition Based on Linguistic Phonological Words [J]. Chinese Journal of Forensic Sciences, 2021(2): 38-42.

