Conformer: Convolution-augmented Transformer for Speech Recognition

Abstract

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and Transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented Transformer for speech recognition, named Conformer. Conf