Abstract:
Motivated by an analysis of air traffic control (ATC) communication, this paper studies an end-to-end automatic speech recognition model for ATC speech in China. A deep learning model combining convolutional and recurrent neural networks is designed and implemented to transcribe speech into text. The model is trained iteratively with the connectionist temporal classification (CTC) loss function on real samples collected from civil airports by voice recorder. Its input is ATC speech, and its output is a sequence of Chinese characters that includes dedicated ATC terms. Experiments are conducted to determine an optimal architecture and to confirm its performance advantage over existing approaches. The model achieves a character error rate of 9.49% with a 10-hour training dataset. These results indicate that the proposed model reaches the desired speech recognition performance.
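For readers unfamiliar with the metric, the character error rate reported above is conventionally defined as the edit (Levenshtein) distance between the reference and the recognized character sequences, divided by the reference length. The following is a minimal sketch of that conventional definition, not the evaluation code used in this work; the function names `edit_distance` and `character_error_rate` and the sample strings are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings, chosen only to illustrate the computation.
    print(character_error_rate("abcd", "abxd"))  # one substitution in four characters
```

Because Python strings are sequences of Unicode code points, the same function applies directly to Chinese character transcripts, where each character counts as one unit.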