------------------------------------------------------------------------------------------------ |
I am currently a Ph.D. candidate at the School of Computer Science, Carnegie Mellon University. My research focuses on deep learning and its application to speech recognition.
I am the author and maintainer of the end-to-end speech recognition toolkit Eesen and the open-source deep learning toolkit PDNN.
Refer to my CV for more information. |
Research & Teaching Experience |
------------------------------------------------------------------------------------------------ |
Toolkits & Datasets |
------------------------------------------------------------------------------------------------ |
Eesen. End-to-End Speech Recognition using Deep RNNs (Models), CTC (Training) and WFSTs (Decoding)
PDNN. A python deep learning toolkit developed under the Theano environment |
Kaldi+PDNN. A set of fully-fledged Kaldi DNN recipes. Implementations include hybrid systems with DNNs and CNNs, tandem systems with bottleneck features, etc.
Viral Video Dataset. The largest public dataset for viral video research to date.
Honors & Awards
------------------------------------------------------------------------------------------------ |
Best Paper Nomination, ASRU 2015.
Best Poster Award, SLT 2014.
Excellent Master Graduate Student, Tsinghua University, 2011 |
Excellent Master Thesis Award, Tsinghua University, 2011 |
1st Class Graduate Scholarship (Chiang Chen Scholarship), Tsinghua University, 2010
1st Class Graduate Scholarship (Toshiba Scholarship), Tsinghua University, 2009
Publications |
------------------------------------------------------------------------------------------------ |
[ Google Scholar ]
Yajie Miao. Kaldi+PDNN: building DNN-based ASR systems with Kaldi and PDNN. Manuscript, arXiv:1401.6984, communicated 27 Jan 2014
Journal Articles: |
Yajie Miao, Hao Zhang, Florian Metze. Speaker Adaptive Training of Deep Neural Network Acoustic Models using I-Vectors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol 23, issue 11, 2015. |
Conference Papers: |
Yajie Miao, Jinyu Li, Yongqiang Wang, Shixiong Zhang, Yifan Gong. Simplifying Long Short-Term Memory Acoustic Models for Fast Training and Decoding. To appear in the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)
Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, Alexander Waibel. An Empirical Exploration of CTC Acoustic Models. To appear in the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016)
Yajie Miao, Mohammad Gowayyed, Florian Metze. EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding. 2015 Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Best Paper Nomination |
Yajie Miao, Florian Metze. On Speaker Adaptation of Long Short-Term Memory Recurrent Neural Networks. The 16th Annual Conference of the International Speech Communication Association (Interspeech 2015) |
Yajie Miao, Florian Metze. Distance-Aware DNNs for Robust Speech Recognition. The 16th Annual Conference of the International Speech Communication Association (Interspeech 2015) |
Justin Chiu, Yajie Miao, Alan W Black, Alex Rudnicky. Distributed Representation-based Spoken Word Sense Induction. The 16th Annual Conference of the International Speech Communication Association (Interspeech 2015)
Yashesh Gaur, Florian Metze, Yajie Miao and Jeffrey Bigham. Using Keyword Spotting to Help Humans Correct Captioning Faster. The 16th Annual Conference of the International Speech Communication Association (Interspeech 2015) |
Hao Zhang, Yajie Miao, Florian Metze. Regularizing DNN Acoustic Models with Gaussian Stochastic Neurons. The 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015)
Florian Metze, Ankur Gandhe, Yajie Miao, Zaid Sheikh, Yun Wang, et al. Semi-supervised Training in Low-Resource ASR and KWS. The 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015) |
Yajie Miao, Lu Jiang, Hao Zhang, Florian Metze. Improvements to Speaker Adaptive Training of Deep Neural Networks. 2014 IEEE Spoken Language Technology Workshop (SLT 2014) Best Poster Award |
Yajie Miao, Florian Metze. Improving Language-Universal Feature Extraction with Deep Maxout and Convolutional Neural Networks. The 15th Annual Conference of the International Speech Communication Association (Interspeech 2014)
Yajie Miao, Hao Zhang, Florian Metze. Towards Speaker Adaptive Training of Deep Neural Network Acoustic Models. The 15th Annual Conference of the International Speech Communication Association (Interspeech 2014) |
Yajie Miao, Hao Zhang, Florian Metze. Distributed Learning of Multilingual DNN Feature Extractors using GPUs. The 15th Annual Conference of the International Speech Communication Association (Interspeech 2014) |
Lu Jiang, Yajie Miao, Yi Yang, ZhenZhong Lan, Alexander Hauptmann. Viral Video Style: A Closer Look at Viral Videos on YouTube. In ACM International Conference on Multimedia Retrieval (ICMR 2014) [ dataset ] |
Yajie Miao, Florian Metze, and Shourabh Rawat. Deep Maxout Networks for Low-Resource Speech Recognition. 2013 Automatic Speech Recognition and Understanding Workshop (ASRU 2013) |
Yajie Miao, Florian Metze. Improving Low-Resource CD-DNN-HMM using Dropout and Multilingual DNN Training. The 14th Annual Conference of the International Speech Communication Association (Interspeech 2013) |
Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel. Modular Combination of Deep Neural Networks for Acoustic Modeling. The 14th Annual Conference of the International Speech Communication Association (Interspeech 2013) |
Yajie Miao, Florian Metze, Alex Waibel. Learning Discriminative Basis Coefficients for Eigenspace MLLR Unsupervised Adaptation. The 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
Yajie Miao, Florian Metze, Alex Waibel. Subspace Mixture Model for Low-resource Speech Recognition in Cross-lingual Settings. The 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013) |
Jonas Gehring, Yajie Miao, Florian Metze, and Alex Waibel. Extracting Deep Bottleneck Features using Stacked Auto-encoders. The 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013)
Yajie Miao, Chunping Li, Jie Tang, and Lili Zhao. Identifying New Categories in Community Question Answering Archives: A Topic Modeling Approach. The 19th ACM Conference on Information and Knowledge Management (CIKM 2010)
Yajie Miao, Lili Zhao, Chunping Li, and Jie Tang. Automatically Grouping Questions in Yahoo! Answers. 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2010) |
Yajie Miao and Chunping Li. Enhancing Query-oriented Summarization based on Sentence Wikification. SIGIR 2010 Workshop on Feature Generation and Selection for Information Retrieval.
Yajie Miao and Chunping Li. Mining Wikipedia and Yahoo! Answers for Question Expansion in Opinion QA. The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2010)
Song Gao, Yajie Miao, Liu Yang, and Chunping Li. Topic-Based Computing Model for Web Page Popularity and Website Influence. The 22nd Australasian Joint Conference on Artificial Intelligence (AI 2009) |