Daniel Povey
British speech recognition researcher
Daniel Povey's AcademicInfluence.com Rankings

Computer Science
Why Is Daniel Povey Influential?
According to Wikipedia, Daniel Povey is a British researcher in speech recognition and artificial intelligence. After graduating from Cambridge University, he held research positions at Microsoft and IBM from 2003 to 2012. He worked at Johns Hopkins University as a non-tenured associate research professor in the Whiting School of Engineering until he was fired in August 2019. Later that month, Povey was slated to begin working for Facebook, but he rejected Facebook's conditions of employment just days before his start date. He was appointed chief speech scientist at Xiaomi in November 2019 and still held this position as of October 2020. He is also the primary architect and maintainer of the Kaldi speech recognition toolkit.
Daniel Povey's Published Works
- The Kaldi Speech Recognition Toolkit (2011) (5722)
- Librispeech: An ASR corpus based on public domain audio books (2015) (3691)
- X-Vectors: Robust DNN Embeddings for Speaker Recognition (2018) (1826)
- The HTK book version 3.4 (2006) (1059)
- A time delay neural network architecture for efficient modeling of long temporal contexts (2015) (913)
- Audio augmentation for speech recognition (2015) (885)
- Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI (2016) (830)
- MUSAN: A Music, Speech, and Noise Corpus (2015) (828)
- Minimum Phone Error and I-smoothing for improved discriminative training (2002) (816)
- Sequence-discriminative training of deep neural networks (2013) (680)
- A study on data augmentation of reverberant speech for robust speech recognition (2017) (622)
- Deep Neural Network Embeddings for Text-Independent Speaker Verification (2017) (614)
- Strategies for training large scale neural network language models (2011) (520)
- Boosted MMI for model and feature-space discriminative training (2008) (438)
- Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks (2018) (368)
- Large scale discriminative training of hidden Markov models for speech recognition (2002) (368)
- The subspace Gaussian mixture model - A structured model for speech recognition (2011) (326)
- fMPE: discriminatively trained features for speech recognition (2005) (322)
- Deep neural network-based speaker embeddings for end-to-end speaker verification (2016) (317)
- Improving deep neural network acoustic models using generalized maxout networks (2014) (313)
- A pitch extraction algorithm tuned for automatic speech recognition (2014) (309)
- Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification (2018) (217)
- Improved feature processing for deep neural networks (2013) (215)
- Speaker diarization using deep neural network embeddings (2017) (213)
- Speaker Recognition for Multi-speaker Conversations Using X-vectors (2019) (208)
- Subspace Gaussian Mixture Models for speech recognition (2010) (192)
- Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models (2010) (186)
- Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge (2018) (183)
- Parallel training of DNNs with Natural Gradient and Parameter Averaging (2014) (175)
- Spoken Language Recognition using X-vectors (2018) (166)
- Revisiting Recurrent Neural Networks for robust ASR (2012) (161)
- Minimum Bayes Risk decoding and system combination based on a recursion for edit distance (2011) (159)
- Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs (2018) (150)
- End-to-end Speech Recognition Using Lattice-free MMI (2018) (146)
- Large scale discriminative training for speech recognition (2000) (146)
- Advances in speech transcription at IBM under the DARPA EARS program (2006) (142)
- Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging (2014) (141)
- Krylov Subspace Descent for Deep Learning (2011) (130)
- Time delay deep neural network-based universal background models for speaker recognition (2015) (129)
- The IBM 2004 conversational telephony system for rich transcription (2005) (126)
- A Time-Restricted Self-Attention Layer for ASR (2018) (125)
- Generating exact lattices in the WFST framework (2012) (121)
- Multilingual deep neural network based acoustic modeling for rapid language adaptation (2014) (118)
- JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs (2015) (104)
- Using proxies for OOV keywords in the keyword search task (2013) (100)
- An Exploration of Dropout with LSTMs (2017) (97)
- A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition (2018) (94)
- Emotion Identification from Raw Speech Signals Using DNNs (2018) (91)
- Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training (2007) (91)
- Improving speaker recognition performance in the domain adaptation challenge using deep neural networks (2014) (90)
- Anatomy of an extremely fast LVCSR decoder (2005) (88)
- Acoustic Modelling from the Signal Domain Using CNNs (2016) (86)
- Feature and model space speaker adaptation with full covariance Gaussians (2006) (85)
- State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18 (2019) (84)
- Reverberation robust acoustic modeling using i-vectors with time delay neural networks (2015) (82)
- GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio (2021) (74)
- Investigation of transfer learning for ASR using LF-MMI trained neural networks (2017) (72)
- Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI (2018) (71)
- Neural Network Language Modeling with Letter-Based Features and Importance Sampling (2018) (68)
- MMI-MAP and MPE-MAP for acoustic model adaptation (2003) (63)
- Probing the Information Encoded in X-Vectors (2019) (61)
- The IBM 2006 Gale Arabic ASR System (2007) (60)
- Discriminative map for acoustic model adaptation (2003) (58)
- Improvements to fMPE for discriminative training of features (2005) (56)
- Frame discrimination training for HMMs for large vocabulary speech recognition (1999) (56)
- Morpheme-Based Language Modeling for Arabic LVCSR (2006) (51)
- Pronunciation and silence probability modeling for ASR (2015) (51)
- Quantifying the value of pronunciation lexicons for keyword search in low-resource languages (2013) (50)
- The CU-HTK March 2000 Hub5E Transcription System (2000) (49)
- Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition (2018) (46)
- Universal background model based speech recognition (2008) (46)
- Speaker Diarization with Region Proposal Network (2020) (45)
- Semi-supervised maximum mutual information training of deep neural network acoustic models (2015) (45)
- Far-Field ASR Without Parallel Data (2016) (44)
- Improved discriminative training techniques for large vocabulary continuous speech recognition (2001) (43)
- Feature space Gaussianization (2004) (43)
- A keyword search system using open source software (2014) (43)
- Large scale MMIE training for conversational telephone speech recognition (2000) (41)
- End-to-end Deep Neural Network Age Estimation (2018) (41)
- JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning (2017) (41)
- Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program (2009) (39)
- Penalty function maximization for large margin HMM training (2008) (39)
- x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition (2019) (39)
- A Teacher-Student Learning Approach for Unsupervised Domain Adaptation of Sequence-Trained ASR Models (2018) (38)
- An improved consensus-like method for Minimum Bayes Risk decoding and lattice combination (2010) (37)
- A Tutorial-style Introduction to Subspace Gaussian Mixture Models for Speech Recognition (2009) (37)
- The Kaldi OpenKWS System: Improving Low Resource Keyword Search (2017) (35)
- A basis representation of constrained MLLR transforms for robust adaptation (2012) (35)
- Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR (2018) (34)
- DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs (2020) (33)
- The 1998 HTK broadcast news transcription system: development and results (1999) (32)
- Speaking rate adaptation using continuous frame rate normalization (2010) (31)
- Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network (2019) (29)
- Discriminative training for HMM-based offline handwritten character recognition (2003) (29)
- Approaches to automatic lexicon learning with limited training examples (2010) (29)
- A novel estimation of feature-space MLLR for full-covariance models (2010) (28)
- The JHU Speaker Recognition System for the VOiCES 2019 Challenge (2019) (27)
- State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs (2011) (27)
- Backstitch: Counteracting Finite-Sample Bias via Negative Steps (2017) (27)
- Multistream CNN for Robust Acoustic Modeling (2020) (25)
- Automatic transcription of conversational telephone speech (2005) (25)
- Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition (2005) (25)
- PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR (2020) (25)
- Fast speaker adaptive training for speech recognition (2008) (23)
- Revisiting semi-continuous hidden Markov models (2012) (23)
- Automated Quality Monitoring for Call Centers using Speech and NLP Technologies (2006) (23)
- Some insights from translating conversational telephone speech (2014) (22)
- New features in the CU-HTK system for transcription of conversational telephone speech (2001) (22)
- speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment (2021) (21)
- Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy (2006) (21)
- A basis method for robust estimation of constrained MLLR (2011) (19)
- Output-Gate Projected Gated Recurrent Unit for Speech Recognition (2018) (18)
- Strategies for using MLP based features with limited target-language training data (2011) (18)
- Wake Word Detection with Streaming Transformers (2021) (17)
- A diversity-penalizing ensemble training method for deep learning (2015) (17)
- Using ASR Methods for OCR (2019) (17)
- An Empirical Study of Transformer-Based Neural Language Model Adaptation (2020) (17)
- Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings (2019) (16)
- A symmetrization of the Subspace Gaussian Mixture Model (2011) (16)
- A GPU-based WFST Decoder with Exact Lattice Generation (2018) (16)
- Phone duration modeling for LVCSR (2004) (15)
- The 2009 IBM GALE Mandarin broadcast transcription system (2010) (14)
- Translations of the Callhome Egyptian Arabic corpus for conversational speech translation (2014) (14)
- SPAM and full covariance for speech recognition (2006) (13)
- Porting: SwitchBoard to the VoiceMail task (2003) (13)
- Wake Word Detection with Alignment-Free Lattice-Free MMI (2020) (13)
- Acoustic Modeling from Frequency Domain Representations of Speech (2018) (13)
- Combination of FST and CN search in spoken term detection (2014) (13)
- CU-HTK April 2002 Switchboard System (2002) (12)
- A Parallelizable Lattice Rescoring Strategy with Neural Language Models (2021) (12)
- Quick fmllr for speaker adaptation in speech recognition (2008) (12)
- Automatic transcription of conversational telephone speech: development of the CU-HTK 2002 system (2003) (11)
- Pruned RNN-T for fast, memory-efficient ASR training (2022) (11)
- Improving LF-MMI Using Unconstrained Supervisions for ASR (2018) (11)
- Large margin semi-tied covariance transforms for discriminative training (2009) (11)
- The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings (2006) (10)
- Minimum hypothesis phone error as a decoding method for speech recognition (2009) (9)
- Speaker Recognition Benchmark Using the CHiME-5 Corpus (2019) (9)
- Acoustic data-driven pronunciation lexicon generation for logographic languages (2016) (9)
- A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation (2015) (9)
- Multi-PLDA Diarization on Children's Speech (2019) (9)
- Modeling gender dependency in the Subspace GMM framework (2012) (9)
- Modeling phonetic context with non-random forests for speech recognition (2015) (8)
- Speaker adaptation with an Exponential Transform (2011) (8)
- Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework (2017) (8)
- Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems (2020) (7)
- Secondary Classification for GMM Based Speaker Recognition (2006) (7)
- Feature and score level combination of subspace Gaussians in LVCSR task (2013) (6)
- JHU Diarization System Description (2018) (5)
- The JHU ASR System for VOiCES from a Distance Challenge 2019 (2019) (5)
- The Impact of ASR on Speech-to-Speech Translation Performance (2007) (5)
- Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN (2019) (4)
- Combining forward and backward search in decoding (2013) (4)
- Lhotse: a speech data representation library for the modern deep learning ecosystem (2021) (4)
- Frustratingly Easy Noise-aware Training of Acoustic Models (2020) (3)
- Feature and model space speaker adaptation with full covariance Gaussians (2006) (3)
- Approaches to Speech Recognition based on Speaker Recognition Techniques (2013) (3)
- Phone Duration Modeling for LVCSR Using Neural Networks (2017) (3)
- Mixture of Speaker-type PLDAs for Children's Speech Diarization (2020) (2)
- Low Development Cost, High Quality Speech Recognition for New Languages and Domains: Report from 2009 Johns Hopkins/CLSP Summer Workshop (2009) (2)
- OOV Recovery with Efficient 2nd Pass Decoding and Open-vocabulary Word-level RNNLM Rescoring for Hybrid ASR (2020) (2)
- Fast and parallel decoding for transducer (2022) (2)
- Neural Language Modeling with Implicit Cache Pointers (2020) (2)
- XMLLR for improved speaker adaptation in speech recognition (2008) (1)
- Notes for Affine Transform-Based VTLN (2014) (1)
- Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation (2022) (1)
- An Alternative to MFCCs for ASR (2020) (1)
- LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation (2021) (1)
- Efficient MDI Adaptation for n-gram Language Models (2020) (1)
- Corrections to "Automatic Transcription of Conversational Telephone Speech" (2006) (1)
- GPU-accelerated Guided Source Separation for Meeting Transcription (2022) (1)
- Feature and score level combination of subspace Gaussians in LVCSR task (2013) (1)
- The Symmetric Subspace Gaussian Mixture Model (2010) (1)
- Removing redundancy from lattices (2014) (1)
- Building Keyword Search System from End-to-End ASR Systems (2023) (0)
- Multilingual Speech Identification Using Artificial Neural Network (2015) (0)
- An Asynchronous WFST-Based Decoder for Automatic Speech Recognition (2021) (0)
- Optical Character Recognition with Chinese and Korean Character Decomposition (2019) (0)
- Incremental Lattice Determinization for WFST Decoders (2019) (0)
- SPAM and full covariance for speech recognition (2006) (0)
- Delay-penalized transducer for low-latency streaming ASR (2022) (0)
- Monte Carlo model-space noise adaptation for speech recognition (2008) (0)
- Speaker diarization using deep neural network embeddings (2017) (0)
- A New Family of Extended Baum-Welch Update Rules (2008) (0)
- Speaker Recognition for Multi-speaker Conversations Using X-vectors (2018) (0)
- Adapted Extended Baum-Welch transformations (2007) (0)
- A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition (2017) (0)
What Schools Are Affiliated With Daniel Povey?
Daniel Povey is affiliated with the following schools: