Tara Sainath
Researcher
Tara Sainath's AcademicInfluence.com Rankings
Computer Science
Tara Sainath's Degrees
- PhD Electrical Engineering Massachusetts Institute of Technology
- Masters Electrical Engineering Massachusetts Institute of Technology
Why Is Tara Sainath Influential?
According to Wikipedia, Tara N. Sainath is an American computer scientist whose research involves deep learning applied to speech recognition. She is a principal research scientist at Google Research.
Sainath studied electrical engineering and computer science at the Massachusetts Institute of Technology, where she received a bachelor's degree, a master's degree in 2005, and a Ph.D. in 2009. Her master's thesis, "Acoustic Landmark Detection and Segmentation using the McAulay-Quatieri Sinusoidal Model," was supervised by Timothy Hazen, and her doctoral dissertation, "Applications of Broad Class Knowledge for Noise Robust Speech Recognition," was supervised by Victor Zue.
Tara Sainath's Published Works (year) (citation count)
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups (2012) (8149)
- Deep Neural Networks for Acoustic Modeling in Speech Recognition (2012) (2360)
- Improving deep neural networks for LVCSR using rectified linear units and dropout (2013) (1306)
- Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks (2015) (1151)
- Deep convolutional neural networks for LVCSR (2013) (1083)
- State-of-the-Art Speech Recognition with Sequence-to-Sequence Models (2017) (997)
- Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets (2013) (540)
- Learning the speech front-end with raw waveform CLDNNs (2015) (474)
- Streaming End-to-end Speech Recognition for Mobile Devices (2018) (442)
- Deep Convolutional Neural Networks for Large-scale Speech Tasks (2015) (432)
- Convolutional neural networks for small-footprint keyword spotting (2015) (421)
- Deep Learning for Audio Signal Processing (2019) (331)
- Deep Belief Networks using discriminative features for phone recognition (2011) (317)
- Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization (2012) (247)
- A Comparison of Sequence-to-Sequence Models for Speech Recognition (2017) (238)
- Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home (2017) (212)
- Improvements to Deep Convolutional Neural Networks for LVCSR (2013) (209)
- Structured Transforms for Small-Footprint Deep Learning (2015) (205)
- Making Deep Belief Networks effective for large vocabulary continuous speech recognition (2011) (197)
- Deep Neural Network Language Models (2012) (192)
- Auto-encoder bottleneck features using deep belief networks (2012) (188)
- Multilingual Speech Recognition with a Single End-to-End Model (2017) (188)
- Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling (2019) (179)
- Learning filter banks within a deep neural network framework (2013) (168)
- An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model (2017) (167)
- A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency (2020) (166)
- Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition (2017) (165)
- Acoustic Modeling for Google Home (2017) (143)
- Lower Frame Rate Neural Network Acoustic Models (2016) (138)
- Query-by-example keyword spotting using long short-term memory networks (2015) (135)
- Acoustic modelling with CD-CTC-sMBR LSTM RNNs (2015) (129)
- Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models (2017) (128)
- Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model (2019) (121)
- Kernel methods match Deep Neural Networks on TIMIT (2014) (121)
- A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition (2018) (121)
- The shared views of four research groups (2012) (114)
- Large vocabulary automatic speech recognition for children (2015) (106)
- Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection (2016) (105)
- Two-Pass End-to-End Speech Recognition (2019) (98)
- Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes (2018) (97)
- Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition (2016) (97)
- Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model (2017) (97)
- Recognizing Long-Form Speech Using Streaming End-to-End Models (2019) (91)
- A Spelling Correction Model for End-to-end Speech Recognition (2019) (87)
- Towards Fast and Accurate Streaming End-To-End ASR (2020) (87)
- Deep Context: End-to-end Contextual Speech Recognition (2018) (86)
- Locally-connected and convolutional neural networks for small footprint speaker recognition (2015) (85)
- Learning compact recurrent neural networks (2016) (82)
- Bayesian compressive sensing for phonetic classification (2010) (80)
- Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR (2011) (79)
- Joint training of convolutional and non-convolutional neural networks (2014) (78)
- Shallow-Fusion End-to-End Contextual Biasing (2019) (78)
- A Better and Faster end-to-end Model for Streaming ASR (2020) (76)
- Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks (2015) (68)
- BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition (2021) (68)
- Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms (2015) (68)
- A voice-commandable robotic forklift working alongside humans in minimally-prepared outdoor environments (2010) (66)
- Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection (2018) (62)
- Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks (2013) (61)
- Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks (2016) (61)
- A Comparison of End-to-End Models for Long-Form Speech Recognition (2019) (60)
- Factored spatial and spectral multichannel raw waveform CLDNNs (2016) (59)
- Exemplar-Based Processing for Speech Recognition: An Overview (2012) (58)
- FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization (2020) (54)
- Sparse representation features for speech recognition (2010) (53)
- Deliberation Model Based Two-Pass End-To-End Speech Recognition (2020) (53)
- Parallel Deep Neural Network Training for Big Data on Blue Gene/Q (2014) (52)
- Improving the Performance of Online Neural Transducer Models (2017) (50)
- Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search (2018) (49)
- Cascaded Encoders for Unifying Streaming and Non-Streaming ASR (2020) (48)
- No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models (2017) (46)
- Self-Supervised Speech Representation Learning: A Review (2022) (46)
- Developing speech recognition systems for corpus indexing under the IARPA Babel program (2013) (46)
- An Analysis of "Attention" in Sequence-to-Sequence Models (2017) (44)
- Compression of End-to-End Models (2018) (44)
- An exploration of large vocabulary tools for small vocabulary phonetic recognition (2009) (42)
- Semi-supervised Training for End-to-end Models via Weak Distillation (2019) (41)
- Deep Scattering Spectrum with deep neural networks (2014) (37)
- Scaling End-to-End Models for Large-Scale Multilingual ASR (2021) (35)
- Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition (2019) (34)
- Performance of Mask Based Statistical Beamforming in a Smart Home Scenario (2018) (33)
- Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling (2020) (33)
- An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling (2021) (33)
- Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines (2012) (33)
- Joint Endpointing and Decoding with End-to-end Models (2019) (32)
- Tied & Reduced RNN-T Decoder (2021) (29)
- Highway-LSTM and Recurrent Highway Networks for Speech Recognition (2017) (29)
- Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus (2020) (28)
- Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition (2017) (27)
- Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition (2018) (27)
- Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling (2016) (27)
- N-best entropy based data selection for acoustic modeling (2012) (27)
- An analysis of sparseness and regularization in exemplar-based methods for speech classification (2010) (27)
- Kalman filtering for compressed sensing (2010) (26)
- Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging (2020) (26)
- RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions (2020) (26)
- Improving Performance of End-to-End ASR on Numeric Sequences (2019) (25)
- Contextual Speech Recognition with Difficult Negative Training Examples (2018) (22)
- Joint Unsupervised and Supervised Training for Multilingual ASR (2021) (22)
- Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling (2013) (21)
- Transformer Based Deliberation for Two-Pass Speech Recognition (2021) (21)
- Deep scattering spectra with deep neural networks for LVCSR tasks (2014) (20)
- Unsupervised Audio Segmentation using Extended Baum-Welch Transformations (2007) (19)
- Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling (2020) (18)
- Emitting Word Timings with End-to-End Models (2020) (17)
- Learning Word-Level Confidence for Subword End-To-End ASR (2021) (16)
- Improving The Latency And Quality Of Cascaded Encoders (2022) (15)
- Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models (2019) (15)
- Multitask Training with Text Data for End-to-End Speech Recognition (2020) (14)
- Echo State Speech Recognition (2021) (14)
- Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction (2016) (13)
- Exemplar-based Sparse Representation phone identification features (2011) (13)
- Sparse representations for text categorization (2010) (13)
- Reducing the Computational Complexity of Two-Dimensional LSTMs (2017) (12)
- Improvements to filterbank and delta learning within a deep neural network framework (2014) (12)
- Low Latency Speech Recognition Using End-to-End Prefetching (2020) (12)
- Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow (2017) (12)
- A convex hull approach to sparse representations for exemplar-based speech recognition (2011) (12)
- Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations (2007) (11)
- Improving Proper Noun Recognition in End-To-End ASR by Customization of the MWER Loss Criterion (2020) (11)
- A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition (2006) (11)
- Raw Multichannel Processing Using Deep Neural Networks (2017) (10)
- Massively Multilingual ASR: A Lifelong Learning Solution (2022) (10)
- An Attention-Based Joint Acoustic and Text on-Device End-To-End Model (2020) (10)
- A-Functions: A generalization of Extended Baum-Welch transformations to convex optimization (2011) (9)
- Deliberation of Streaming RNN-Transducer by Non-Autoregressive Decoding (2021) (8)
- Acoustic landmark detection and segmentation using the McAulay-Quatieri Sinusoidal Model (2005) (8)
- JOIST: A Joint Speech and Text Streaming Model for ASR (2022) (8)
- Island-driven search using broad phonetic classes (2009) (8)
- Audio classification using extended Baum-Welch transformations (2007) (7)
- Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition (2011) (7)
- A Deliberation-Based Joint Acoustic and Text Decoder (2021) (7)
- Multistate Encoding with End-To-End Speech RNN Transducer Network (2020) (7)
- Improving the Fusion of Acoustic and Text Representations in RNN-T (2022) (7)
- Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition (2022) (6)
- Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding (2008) (6)
- Parallel deep neural network training for LVCSR tasks using Blue Gene/Q (2014) (6)
- Sparse Representation Phone Identification Features for Speech Recognition (2010) (6)
- Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification (2022) (5)
- E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR (2022) (5)
- A Language Agnostic Multilingual Streaming On-Device ASR System (2022) (5)
- Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks (2012) (5)
- Transducer-Based Streaming Deliberation for Cascaded Encoders (2022) (5)
- Gradient steepness metrics using extended Baum-Welch transformations for universal pattern recognition tasks (2008) (5)
- Turn-Taking Prediction for Natural Conversational Speech (2022) (4)
- Improving Deliberation by Text-Only and Semi-Supervised Training (2022) (4)
- Applications of broad class knowledge for noise robust speech recognition (2009) (4)
- Lookup-Table Recurrent Language Models for Long Tail Speech Recognition (2021) (4)
- An evaluation of posterior modeling techniques for phonetic recognition (2013) (4)
- Convergence of Line Search A-Function Methods (2011) (4)
- Automated Design Data and Rationale Capture (2002) (3)
- The use of isometric transformations and Bayesian estimation in compressive sensing for fMRI classification (2010) (3)
- Improving Rare Word Recognition with LM-aware MWER Training (2022) (3)
- A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition (2008) (3)
- Spectral Distortion Model for Training Phase-Sensitive Deep-Neural Networks for Far-Field Speech Recognition (2018) (3)
- Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training (2017) (2)
- Self-supervised Representation Learning for Speech Processing (2022) (2)
- Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems (2022) (2)
- System and Method for Speech Recognition (2016) (2)
- A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes (2022) (2)
- A generalized family of parameter estimation techniques (2009) (2)
- Streaming Align-Refine for Non-autoregressive Deliberation (2022) (1)
- Reducing Impediments to Collaboration in a Virtual Design World (2002) (1)
- UML: A Universal Monolingual Output Layer for Multilingual ASR (2023) (1)
- Incorporating sparse representation phone identification features in automatic speech recognition using exponential families (2010) (1)
- Improving training time of Hessian-free optimization for deep neural networks using preconditioning and sampling (2013) (1)
- JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition (2023) (1)
- Application specific loss minimization using gradient boosting (2011) (1)
- Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (2023) (1)
- Scaling Up Deliberation For Multilingual ASR (2022) (1)
- E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model (2022) (1)
- Streaming Intended Query Detection using E2E Modeling for Continued Conversation (2022) (1)
- Data selection for language modeling using sparse representations (2010) (1)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition (2023) (1)
- Sparse Representations for Speech Recognition (2014) (1)
- A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System (2023) (1)
- Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models (2023) (0)
- Protecting data a network entity, while maintaining network properties (2012) (0)
- Massively Multilingual Shallow Fusion with Large Language Models (2023) (0)
- Techniques for Improving Training Time of Deep Neural Networks with Applications to Speech Recognition (2014) (0)
- Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks (2023) (0)
- Reducing Impediments to Collaboration in a Virtual World (2002) (0)
- Dual Learning for Large Vocabulary On-Device ASR (2023) (0)
- Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion (2023) (0)
- Improving Contextual Biasing with Text Injection (2023) (0)
- Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing (2022) (0)
- Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR (2023) (0)
- A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale (2023) (0)
- Adapted Extended Baum-Welch transformations (2007) (0)
- A New Family of Extended Baum-Welch Update Rules (2008) (0)
- A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition (2022) (0)
- End-to-End Speech Recognition: A Survey (2023) (0)
- NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR (2023) (0)
- Lego-Features: Exporting modular encoder features for streaming and deliberation ASR (2023) (0)
- Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion (2022) (0)
- Efficient Domain Adaptation for Speech Foundation Models (2023) (0)