Construction of Spoken Language Recognition Models Based on Prompt Methods and Knowledge Distillation Methods
Abstract
Prompting is an effective way to use pre-trained language models: it requires only a small number of examples to perform a new natural language task. This article presents SpokenPrompt-KD, a novel speech recognition model based on prompting and knowledge distillation. The model uses Wav2Vec to convert speech into a text-embedding format that pre-trained language models can process, thereby extending the language model's few-shot learning capabilities to speech recognition. It also applies knowledge distillation to transfer knowledge from a teacher language model to a student speech model, with the aim of improving accuracy on speech understanding tasks. Experimental results indicate that, after pre-training on a 100-hour dataset, the model achieves an accuracy of 88.4% on classification tasks. These results demonstrate the feasibility and effectiveness of few-shot learning with this model in speech recognition.
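To make the distillation idea concrete, the following is a minimal sketch of a generic temperature-scaled distillation loss (the Kullback-Leibler divergence between softened teacher and student output distributions). It is an illustration under stated assumptions, not the loss used by SpokenPrompt-KD: the function names `softmax` and `distillation_loss`, the temperature value, and the use of output logits as the distillation target are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: both models emit logits over the same label set. The
    temperature**2 factor is the conventional rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits for a three-class task.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. In practice a term of this kind is usually combined with a supervised loss on labeled data.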