Shinji Watanabe
#112,064
Most Influential Person Now
Researcher
Shinji Watanabe's AcademicInfluence.com Rankings
Shinji Watanabe Computer Science Degrees
- Computer Science: World Rank #4261, Historical Rank #4483
- Computational Linguistics: World Rank #374, Historical Rank #380
- Database: World Rank #1483, Historical Rank #1558

Computer Science
Shinji Watanabe's Degrees
- PhD, Computer Science, University of Tokyo
- Masters, Computer Science, University of Tokyo
- Bachelors, Computer Science, University of Tokyo
Why Is Shinji Watanabe Influential?
Shinji Watanabe's Published Works
Citation charts: number of citations in a given year to any of this author's works; total number of citations to the works the author published in a given year, which highlights the author's most important work(s).
Published Works
- Deep clustering: Discriminative embeddings for segmentation and separation (2015) (1036)
- ESPnet: End-to-End Speech Processing Toolkit (2018) (941)
- The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines (2015) (788)
- Joint CTC-attention based end-to-end speech recognition using multi-task learning (2016) (669)
- Hybrid CTC/Attention Architecture for End-to-End Speech Recognition (2017) (529)
- Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR (2015) (525)
- Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks (2015) (492)
- A Comparative Study on Transformer vs RNN in Speech Applications (2019) (486)
- Single-Channel Multi-Speaker Separation Using Deep Clustering (2016) (363)
- An analysis of environment, microphone and data simulation mismatches in robust speech recognition (2017) (317)
- SUPERB: Speech processing Universal PERformance Benchmark (2021) (294)
- The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines (2013) (290)
- Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM (2017) (270)
- Heavy ion synchrotron for medical use —HIMAC project at NIRS-Japan— (1992) (248)
- Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks (2016) (248)
- Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge (2018) (183)
- CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings (2020) (166)
- Ebola Virus Glycoprotein: Proteolytic Processing, Acylation, Cell Tropism, and Detection of Neutralizing Antibodies (2001) (164)
- Deep beamforming networks for multi-channel speech recognition (2016) (162)
- Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration (2019) (159)
- Topic Tracking Model for Analyzing Consumer Purchase Behavior (2009) (152)
- Recent Developments on ESPnet Toolkit Boosted By Conformer (2020) (150)
- End-to-End Neural Speaker Diarization with Permutation-Free Objectives (2019) (141)
- Recurrent deep neural networks for robust speech recognition (2014) (136)
- Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks (2015) (135)
- ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit (2019) (135)
- End-to-End Neural Speaker Diarization with Self-Attention (2019) (134)
- Functional Importance of the Coiled-Coil of the Ebola Virus Glycoprotein (2000) (120)
- Fashion Coordinates Recommender System Using Photographs from Fashion Magazines (2011) (112)
- A Review of Speaker Diarization: Recent Advances with Deep Learning (2021) (112)
- A comprehensive map of the influenza A virus replication cycle (2013) (110)
- ESPnet-ST: All-in-One Speech Translation Toolkit (2020) (108)
- Discriminative NMF and its application to single-channel source separation (2014) (106)
- Language independent end-to-end architecture for joint language identification and speech recognition (2017) (106)
- Variational bayesian estimation and clustering for speech recognition (2004) (103)
- End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors (2020) (102)
- Joint CTC/attention decoding for end-to-end speech recognition (2017) (98)
- The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes (2017) (96)
- The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes (2013) (93)
- End-to-end Speech Recognition With Word-Based RNN Language Models (2018) (92)
- Influenza A Virus Can Undergo Multiple Cycles of Replication without M2 Ion Channel Activity (2001) (91)
- Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques (2019) (87)
- Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict (2020) (87)
- Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition (2014) (86)
- Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling (2018) (86)
- Unsupervised Activity Recognition with User's Physical Characteristics Data (2011) (81)
- MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition (2019) (80)
- Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera (2012) (79)
- Back-Translation-Style Data Augmentation for end-to-end ASR (2018) (78)
- Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming (2017) (77)
- Student-teacher network learning with enhanced features (2017) (75)
- GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio (2021) (74)
- Multichannel End-to-end Speech Recognition (2017) (74)
- Phasebook and Friends: Leveraging Discrete Representations for Source Separation (2018) (69)
- Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition (2017) (68)
- Cycle-consistency Training for End-to-end Speech Recognition (2018) (67)
- Multi-Channel Speech Recognition: LSTMs All the Way Through (2016) (66)
- Duration-Controlled LSTM for Polyphonic Sound Event Detection (2017) (66)
- A Purely End-to-End System for Multi-speaker Speech Recognition (2018) (65)
- Intermediate Loss Regularization for CTC-Based Speech Recognition (2021) (61)
- Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing (2009) (61)
- Sequence summarizing neural network for speaker adaptation (2016) (60)
- Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text (2019) (59)
- End-To-End Multi-Speaker Speech Recognition With Transformer (2020) (57)
- End-to-end Monaural Multi-speaker ASR System without Pretraining (2018) (56)
- Multilingual End-to-End Speech Translation (2019) (56)
- End-to-End Multi-Speaker Speech Recognition (2018) (56)
- Massively Multilingual Adversarial Speech Recognition (2019) (55)
- Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline (2018) (54)
- Multi-Modal Data Augmentation for End-to-end ASR (2018) (54)
- Augmentation adversarial training for unsupervised speaker recognition (2020) (53)
- An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition (1975) (53)
- Bayesian Speech and Language Processing (2015) (53)
- Espresso: A Fast End-to-End Neural Speech Recognition Toolkit (2019) (51)
- The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition (2015) (50)
- Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark (2013) (50)
- An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech (2018) (49)
- Structure discovery of deep neural network based on evolutionary algorithms (2015) (49)
- Torchaudio: Building Blocks for Audio and Speech Processing (2021) (49)
- Weakly-Supervised Sound Event Detection with Self-Attention (2020) (47)
- Vaccination-infection interval determines cross-neutralization potency to SARS-CoV-2 Omicron after breakthrough infection by other variants (2022) (47)
- Self-Supervised Speech Representation Learning: A Review (2022) (46)
- Far-Field Automatic Speech Recognition (2020) (46)
- Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis (2019) (45)
- Speaker Diarization with Region Proposal Network (2020) (45)
- The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement (2014) (45)
- Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis (2020) (44)
- Discriminative training based on an integrated view of MPE and MMI in margin and error space (2010) (44)
- Semi-Supervised End-to-End Speech Recognition (2018) (43)
- Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition (2019) (43)
- Robust speech recognition in unknown reverberant and noisy conditions (2015) (41)
- Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2020) (40)
- Uncertainty propagation through deep neural networks (2015) (40)
- Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation (2011) (39)
- High-accuracy user identification using EEG biometrics (2016) (38)
- Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio (2017) (38)
- Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches (2019) (38)
- Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition (2017) (38)
- Application of Variational Bayesian Approach to Speech Recognition (2002) (38)
- Speech Enhancement Using End-to-End Speech Recognition Objectives (2019) (37)
- Transformer ASR with Contextual Block Processing (2019) (37)
- New Era for Robust Speech Recognition (2017) (37)
- ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration (2020) (37)
- Findings of the IWSLT 2022 Evaluation Campaign (2022) (36)
- A Study of Learning Based Beamforming Methods for Speech Recognition (2016) (36)
- Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation (2018) (35)
- A method for optimization of fuzzy reasoning by genetic algorithms and its application to discrimination of myocardial heart disease (1998) (34)
- Non-Autoregressive Transformer for Speech Recognition (2021) (34)
- Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation (2020) (34)
- Deep unfolding for multichannel source separation (2016) (33)
- DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs (2020) (33)
- Neural Speaker Diarization with Speaker-Wise Chain Rule (2020) (31)
- Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection (2016) (31)
- Convolution-Augmented Transformer for Semi-Supervised Sound Event Detection Technical Report (2020) (31)
- Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives (2020) (30)
- Black box optimization for automatic speech recognition (2014) (30)
- Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders (2019) (30)
- Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement (2019) (30)
- Structured Discriminative Models For Speech Recognition: An Overview (2012) (30)
- Speaker Adaptation for Multichannel End-to-End Speech Recognition (2018) (29)
- Topic tracking language model for speech recognition (2011) (29)
- Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion (2018) (29)
- Conditional Diffusion Probabilistic Model for Speech Enhancement (2022) (29)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities (2022) (29)
- Auxiliary Feature Based Adaptation of End-to-end ASR Systems (2018) (29)
- The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS (2020) (29)
- Statistical Voice Conversion Based on Noisy Channel Model (2012) (29)
- Ensemble learning for speech enhancement (2013) (28)
- Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition (2004) (28)
- End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification (2020) (28)
- Effectiveness of discriminative training and feature transformation for reverberated and noisy speech (2013) (27)
- Dialog state tracking with attention-based sequence-to-sequence learning (2016) (27)
- Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend (2017) (27)
- Towards Online End-to-end Transformer Automatic Speech Recognition (2019) (27)
- Efficient learning for spoken language understanding tasks with word embedding based pre-training (2015) (27)
- Streaming Transformer ASR With Blockwise Synchronous Beam Search (2020) (26)
- Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition (2021) (26)
- Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling (2018) (26)
- Online End-To-End Neural Diarization with Speaker-Tracing Buffer (2020) (26)
- Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models (2019) (25)
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition (2021) (25)
- The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans (2020) (24)
- The CHiME Challenges: Robust Speech Recognition in Everyday Environments (2017) (24)
- Vectorized Beam Search for CTC-Attention-Based Speech Recognition (2019) (24)
- A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation (2021) (24)
- End-To-End Speaker Diarization as Post-Processing (2020) (24)
- Integrated network analysis reveals a novel role for the cell cycle in 2009 pandemic influenza virus-induced inflammation in macaque lungs (2012) (24)
- High‐quality InxGa1–xAs/Al0.30Ga0.70As quantum dots grown in inverted pyramids (2003) (23)
- ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet (2021) (23)
- The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap (2021) (23)
- Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection (2018) (23)
- End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend (2021) (23)
- Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy (2015) (23)
- Insertion-Based Modeling for End-to-End Automatic Speech Recognition (2020) (23)
- Discriminative method for recurrent neural network language models (2015) (22)
- Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation (2012) (22)
- Minimum word error training of long short-term memory recurrent neural network language models for speech recognition (2016) (22)
- HEAR: Holistic Evaluation of Audio Representations (2022) (22)
- Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds (2013) (21)
- ESPnet2-TTS: Extending the Edge of TTS Research (2021) (21)
- SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition (2021) (20)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model (2021) (20)
- Inhibitory mechanism of tranilast in human coronary artery smooth muscle cells proliferation, due to blockade of PDGF‐BB‐receptors (2000) (20)
- On dynamic resource management mechanism using control theoretic approach for wide-area grid computing (2005) (19)
- Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (2022) (19)
- Dual System Combination Approach for Various Reverberant Environments with Dereverberation Techniques (2014) (19)
- Investigating Self-Supervised Learning for Speech Enhancement and Separation (2022) (19)
- End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming (2020) (19)
- An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions (2019) (19)
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (2021) (19)
- Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors (2021) (18)
- Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System (2019) (18)
- Bayesian modelling of the speech spectrum using mixture of Gaussians (2004) (18)
- End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection (2020) (18)
- Real-time meeting recognition and understanding using distant microphones and omni-directional camera (2010) (18)
- Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks (2021) (18)
- Correlation between the structural and antiferromagnetic phase transitions in ZnCr2Se4 (2003) (18)
- Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training (2009) (17)
- Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition (2019) (17)
- Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data (2010) (17)
- Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec (2021) (17)
- Using ASR Methods for OCR (2019) (17)
- Composite embedding systems for ZeroSpeech2017 Track1 (2017) (16)
- Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals (2020) (16)
- BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection (2017) (16)
- Multi-Stream End-to-End Speech Recognition (2019) (16)
- Arabic Speech Recognition by End-to-End, Modular Systems and Human (2021) (16)
- Beamforming networks using spatial covariance features for far-field speech recognition (2016) (16)
- Stream Attention-based Multi-array End-to-end Speech Recognition (2018) (16)
- Improving End-to-End Single-Channel Multi-Talker Speech Recognition (2020) (15)
- Structural Bayesian Linear Regression for Hidden Markov Models (2014) (15)
- End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings (2021) (15)
- Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer (2013) (14)
- The Phasebook: Building Complex Masks via Discrete Representations for Source Separation (2019) (14)
- Bayesian linear regression for Hidden Markov Model based on optimizing variational bounds (2011) (14)
- End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection (2021) (14)
- ESPnet-ST IWSLT 2021 Offline Speech Translation System (2021) (14)
- Multi-Head Decoder for End-to-End Speech Recognition (2018) (14)
- Speaker-Conditional Chain Model for Speech Separation and Extraction (2020) (14)
- Probabilistic integration of joint density model and speaker model for voice conversion (2010) (14)
- DiscreTalk: Text-to-Speech as a Machine Translation Problem (2020) (14)
- Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection (2010) (13)
- S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations (2021) (13)
- Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease (2019) (13)
- Student-Teacher Learning for BLSTM Mask-based Speech Enhancement (2018) (13)
- Driver confusion status detection using recurrent neural networks (2016) (13)
- A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies (2021) (13)
- Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition (2019) (13)
- Analysis of Multilingual Sequence-to-Sequence speech recognition systems (2018) (13)
- Dual-Path RNN for Long Recording Speech Separation (2021) (13)
- ORTHROS: non-autoregressive end-to-end speech translation With dual-decoder (2020) (13)
- EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition (2021) (13)
- A unified view for discriminative objective functions based on negative exponential of difference measure between strings (2009) (13)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization (2020) (12)
- Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task (2004) (12)
- Different efficacies of neutralizing antibodies and antiviral drugs on SARS-CoV-2 Omicron subvariants, BA.1 and BA.2 (2022) (12)
- The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results (2022) (12)
- Promising Accurate Prefix Boosting for Sequence-to-sequence ASR (2018) (12)
- Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features (2015) (12)
- Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs (2016) (12)
- Automated structure discovery and parameter tuning of neural network language model based on evolution strategy (2016) (12)
- Context Sensitive Spoken Language Understanding Using Role Dependent LSTM Layers (2015) (12)
- TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation (2022) (12)
- End-to-End SpeakerBeam for Single Channel Target Speech Recognition (2019) (12)
- Online meeting recognizer with multichannel speaker diarization (2010) (11)
- Multi-encoder multi-resolution framework for end-to-end speech recognition (2018) (11)
- Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers (2021) (11)
- Non-Autoregressive Transformer Automatic Speech Recognition (2019) (11)
- Attention-Based ASR with Lightweight and Dynamic Convolutions (2019) (11)
- Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker (2021) (11)
- Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models (2022) (11)
- End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation (2022) (11)
- ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing (2021) (11)
- Minimum Error Classification with geometric margin control (2010) (10)
- MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments (2012) (10)
- Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains (2019) (10)
- Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios (2021) (10)
- Streaming Transformer ASR with Blockwise Synchronous Inference (2020) (10)
- Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection (2012) (10)
- Bayesian Acoustic Modeling for Spontaneous Speech Recognition (2004) (10)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages (2022) (9)
- Differences in the effects of mutations in GP131, a guinea pig cytomegalovirus homologue of pentameric complex component UL130, on macrophage and epithelial cell infection (2018) (9)
- Speaker Recognition Benchmark Using the CHiME-5 Corpus (2019) (9)
- Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition (2012) (9)
- Leveraging State-of-the-art ASR Techniques to Audio Captioning (2021) (9)
- CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments (2018) (9)
- A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification (2010) (9)
- Encoder-Decoder Based Attractors for End-to-End Neural Diarization (2021) (9)
- Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models (2021) (9)
- Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR (2017) (9)
- Speech Processing for Digital Home Assistants (2019) (9)
- Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models (2020) (9)
- HEAR 2021: Holistic Evaluation of Audio Representations (2022) (9)
- Performance Evaluation to Optimize the UMP System Focusing on Network Transmission Speed (2007) (8)
- Building Corpora for Single-Channel Speech Separation Across Multiple Domains (2018) (8)
- Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information (2017) (8)
- Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain (2021) (8)
- End-to-End ASR with Adaptive Span Self-Attention (2020) (8)
- Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments (2014) (8)
- A discriminative model for continuous speech recognition based on Weighted Finite State Transducers (2010) (8)
- Driver prediction to improve interaction with in-vehicle HMI (2015) (8)
- Statistical Dialogue Management using Intention Dependency Graph (2013) (8)
- Gibbs sampling based Multi-scale Mixture Model for speaker clustering (2011) (8)
- Constructing shared-state hidden Markov models based on a Bayesian approach (2002) (8)
- Dual-Path Modeling for Long Recording Speech Separation in Meetings (2021) (7)
- Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing (2011) (7)
- Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition (2019) (7)
- High-accuracy user identification using EEG biometrics (2016) (7)
- Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording (2020) (7)
- CMU’s IWSLT 2022 Dialect Speech Translation System (2022) (7)
- Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting (2013) (7)
- Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization (2021) (7)
- Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation (2020) (7)
- Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization (2021) (7)
- Decoding network optimization using minimum transition error training (2012) (7)
- Application of variational Bayesian estimation and clustering to acoustic model adaptation (2003) (7)
- A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition (2019) (7)
- CTC Alignments Improve Autoregressive Translation (2022) (7)
- Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021 (2021) (7)
- Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training (2016) (7)
- In Vitro Efficacy of Antiviral Agents against Omicron Subvariant BA.4.6 (2022) (7)
- A generalized discriminative training framework for system combination (2013) (7)
- STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency (2022) (7)
- Computerized analysis for classification of heart diseases in echocardiographic images (1996) (6)
- Multi-mode Transformer Transducer with Stochastic Future Context (2021) (6)
- Subspace pursuit method for kernel-log-linear models (2011) (6)
- Immunogenicity and protective efficacy of SARS-CoV-2 recombinant S-protein vaccine S-268019-b in cynomolgus monkeys (2022) (6)
- Learning Speaker Embedding from Text-to-Speech (2020) (6)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition (2021) (6)
- High density and reliable packaging technology with Non Conductive Film for 3D/TSV (2013) (6)
- Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN (2015) (6)
- Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions (2021) (6)
- Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization (2010) (6)
- Predictor–Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale (2010) (6)
- Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework (2006) (6)
- Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition (2005) (6)
- Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge (2022) (6)
- ASR2K: Speech Recognition for Around 2000 Languages without Audio (2022) (6)
- Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer (2008) (6)
- Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering (2012) (6)
- Discriminative training of acoustic models for system combination (2013) (6)
- Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step (2020) (6)
- Evolutionary optimization of long short-term memory neural network language model (2016) (6)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis (2021) (6)
- Search error risk minimization in Viterbi beam search for speech recognition (2010) (6)
- Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings (2018) (5)
- Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates (2021) (5)
- Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics (2021) (5)
- Roles of GP33, a guinea pig cytomegalovirus-encoded G protein-coupled receptor homolog, in cellular signaling, viral growth and inflammation in vitro and in vivo (2018) (5)
- Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding (2021) (5)
- Fast similarity search on a large speech data set with neighborhood graph indexing (2010) (5)
- Multi-Channel End-To-End Neural Diarization with Distributed Microphones (2021) (5)
- Two-Pass Low Latency End-to-End Spoken Language Understanding (2022) (5)
- Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring (2021) (5)
- Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition (2011) (5)
- Discriminative feature transforms using differenced maximum mutual information (2012) (5)
- Multi-microphone speech recognition in everyday environments (2017) (5)
- A new stabilized zero-crossing representation in the wavelet transform domain and signal reconstruction (1995) (5)
- Layer Pruning on Demand with Intermediate CTC (2021) (5)
- The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge (2020) (5)
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding (2022) (5)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem (2021) (5)
- Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition (2021) (5)
- Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition (2018) (5)
- Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model (2011) (5)
- End-to-End Multilingual Multi-Speaker Speech Recognition (2019) (5)
- Continuous Speech Separation Using Speaker Inventory for Long Recording (2021) (4)
- Sequence Transduction with Graph-Based Supervision (2021) (4)
- Training data selection with user’s physical characteristics data for acceleration-based activity modeling (2013) (4)
- Online Continual Learning of End-to-End Speech Recognition Models (2022) (4)
- Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization (2021) (4)
- Discriminative approach to dynamic variance adaptation for noisy speech recognition (2011) (4)
- E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition (2022) (4)
- Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers (2021) (4)
- A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches (2008) (4)
- TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation (2022) (4)
- Training Data Augmentation and Data Selection (2017) (4)
- LegoNN: Building Modular Encoder-Decoder Models (2022) (4)
- Acoustic Event Detection with Classifier Chains (2021) (4)
- End-to-end ASR to jointly predict transcriptions and linguistic annotations (2021) (4)
- Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning (2022) (4)
- Evolution Strategy Based Neural Network Optimization and LSTM Language Model for Robust Speech Recognition (2016) (4)
- TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement (2023) (4)
- JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification (2021) (4)
- On Compressing Sequences for Self-Supervised Speech Models (2022) (4)
- A Robust Estimation Method of Noise Mixture Model for Noise Suppression (2011) (4)
- A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination (2010) (4)
- Cost-level integration of statistical and rule-based dialog managers (2014) (4)
- Characterization of a new polarity switching negative tone e-beam resist for 14nm and 10nm logic node mask fabrication and beyond (2014) (4)
- Minimum latency training of sequence transducers for streaming end-to-end speech recognition (2022) (4)
- Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text (2017) (4)
- Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis (2022) (4)
- Deep Speech Synthesis from Articulatory Representations (2022) (3)
- Log-linear dialog manager (2014) (3)
- Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis (2022) (3)
- Advanced computational models and learning theories for spoken language processing (2006) (3)
- High accurate model-integration-based voice conversion using dynamic features and model structure optimization (2011) (3)
- Attention-Based Multi-Hypothesis Fusion for Speech Summarization (2021) (3)
- Improving Speech Enhancement through Fine-Grained Speech Characteristics (2022) (3)
- On Prosody Modeling for ASR+TTS Based Voice Conversion (2021) (3)
- Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition (2018) (3)
- Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation (2022) (3)
- Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation (2021) (3)
- SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy (2022) (3)
- SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition (2021) (3)
- Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition (2019) (3)
- Overview of the 2nd 'CHiME' Speech Separation and Recognition Challenge (2013) (3)
- Effectiveness of Single-Channel BLSTM Enhancement for Language Identification (2018) (3)
- Bayesian approaches to acoustic modeling: a review (2012) (3)
- Defining Reasonably Foreseeable Parameter Ranges Using Real-World Traffic Data for Scenario-Based Safety Assessment of Automated Vehicles (2022) (3)
- Train from scratch: Single-stage joint training of speech separation and recognition (2022) (3)
- EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers (2022) (3)
- Incremental Adaptation Based on a Macroscopic Time Evolution System (2007) (3)
- Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation (2022) (3)
- A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding (2022) (3)
- The JHU/KyotoU Speech Translation System for IWSLT 2018 (2018) (3)
- Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion (2005) (3)
- ESPnet2 pretrained model, Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best, fs=16k, lang=en (2020) (3)
- Toward Streaming ASR with Non-Autoregressive Insertion-Based Model (2020) (3)
- END-TO-END ASR AND AUDIO SEGMENTATION WITH NON-AUTOREGRESSIVE INSERTION-BASED MODEL (2020) (2)
- Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks (2021) (2)
- Acoustic Model Adaptation Based on Coarse/Fine Training of Transfer Vectors Using Directional Statistics (2006) (2)
- Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model (2012) (2)
- Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution (2011) (2)
- Influence relation estimation based on lexical entrainment in conversation (2013) (2)
- Stereo-based feature enhancement using dictionary learning (2013) (2)
- Memory-Efficient Training of RNN-Transducer with Sampled Softmax (2022) (2)
- Low-latency meeting recognition and understanding using distant microphones (2011) (2)
- Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR (2022) (2)
- Joint Speech Recognition and Audio Captioning (2022) (2)
- Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity (2021) (2)
- Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors (2022) (2)
- Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms (2020) (2)
- Using online model comparison in the Variational Bayes framework for online unsupervised Voice Activity Detection (2010) (2)
- Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR (2022) (2)
- Learning Influences from Word Use in Polylogue (2011) (2)
- Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble (2022) (2)
- Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing (2022) (2)
- ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper (2019) (2)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech (2023) (2)
- End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party (2022) (2)
- NORMALIZATION AND ADAPTATION BY CONSISTENTLY EMPLOYING MAP ESTIMATION (2012) (2)
- Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis (2022) (2)
- Short range structure of lead-lithium fluoride obtained by XAFS analysis (2005) (2)
- Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge (2023) (2)
- Self-supervised Representation Learning for Speech Processing (2022) (2)
- TriniTTS: Pitch-controllable End-to-end TTS without External Aligner (2022) (2)
- Development of mapping system for distribution facility management (1989) (2)
- In search of strong embedding extractors for speaker diarisation (2022) (2)
- Prior-shared feature and model space speaker adaptation by consistently employing map estimation (2013) (2)
- A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data (2015) (2)
- A stabilized zero-crossing representation in the wavelet transform domain and its extension to image representation for early vision (1996) (2)
- Building Speech Recognition System from Untranscribed Data Report from JHU workshop 2016 (2016) (2)
- Are twindemics occurring? (2022) (2)
- An algorithm for making a correspondence of zero-crossing points in a wavelet transform domain with second-order derivative property (1995) (1)
- Basis vector orthogonalization for an improved kernel gradient matching pursuit method (2012) (1)
- A Kernel Machine Derived by Minimum Relative Entropy Discrimination For Automatic Speech Recognition (2009) (1)
- Bayesian approaches in speech recognition (2011) (1)
- On-line adaptation and Bayesian detection of environmental changes based on a macroscopic time evolution system (2009) (1)
- End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation (2022) (1)
- The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition (2023) (1)
- Better Intermediates Improve CTC Inference (2022) (1)
- Learning influences from word use in polylogue (2011) (1)
- Novel Deep Architectures in Speech Processing (2017) (1)
- Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR (2022) (1)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks (2022) (1)
- Large-scale learning of generalised representations for speaker recognition (2022) (1)
- Sequential maximum mutual information linear discriminant analysis for speech recognition (2014) (1)
- Convolutional bidirectional long short-term memory hidden Markov model hybrid system for polyphonic sound event detection (2016) (1)
- Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study (2023) (1)
- Application of topic tracking model to language model adaptation and meeting analysis (2010) (1)
- Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data (2011) (1)
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2022) (1)
- Low-Resource Contextual Topic Identification on Speech (2018) (1)
- Residual Language Model for End-to-end Speech Recognition (2022) (1)
- Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation (2022) (1)
- Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones (2017) (1)
- Guest Editorial for the special issue on Multi-Microphone Speech Recognition in Everyday Environments (2017) (1)
- An external quality assessment feasibility study; cross laboratory comparison of haemagglutination inhibition assay and microneutralization assay performance for seasonal influenza serology testing: A FLUCOP study (2023) (1)
- Sequence discriminative training for low-rank deep neural networks (2014) (1)
- Multi-blank Transducers for Speech Recognition (2022) (1)
- Bag Of ARCS: New representation of speech segment features based on finite state machines (2012) (1)
- Toolkits for Robust Speech Processing (2017) (1)
- Speaker Verification-Based Evaluation of Single-Channel Speech Separation (2021) (1)
- SpeechLMScore: Evaluating speech generation using speech language model (2022) (1)
- Context-aware Fine-tuning of Self-supervised Speech Models (2022) (1)
- Avoid Overthinking in Self-Supervised Models for Speech Recognition (2022) (1)
- Handling uncertain observations in unsupervised topic-mixture language model adaptation (2012) (1)
- Speaker-Independent Acoustic-to-Articulatory Speech Inversion (2023) (1)
- An efficient plasmid-driven system for the generation of influenza virus-like particles for vaccine (2001) (1)
- Joint speaker diarization and speech recognition based on region proposal networks (2021) (1)
- PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement (2023) (0)
- Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection (2022) (0)
- Streaming Joint Speech Recognition and Disfluency Detection (2022) (0)
- Report on NTT Communications Science Laboratories Open House 2011 (2011) (0)
- A new stabilized zero-crossing representation in the wavelet transform domain and its application to image processing (1996) (0)
- Impact of Reinfection with SARS-CoV-2 Omicron Variants in Previously Infected Hamsters (2022) (0)
- The Method of Evaluating Icon based on Agreement Level with Design Concept Intended for Mobile Phone (2005) (0)
- Bridging Speech and Textual Pre-trained Models with Unsupervised ASR (2022) (0)
- Efficient Sequence Transduction by Jointly Predicting Tokens and Durations (2023) (0)
- Are Depositors Aware of the Governance of their Banks? (2013) (0)
- ESPnet2 pretrained model, Shinji Watanabe/laborotv_asr_train_asr_conformer2_latest33_raw_char_sp_valid.acc.ave, fs=16k, lang=jp (2020) (0)
- Adaptive & discriminative speech modeling to cope with temporal changes of environments (2011) (0)
- Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview (2015) (0)
- Stereo-Input Speech Recognition Using Sparseness-Based Blind Source Separation (2010) (0)
- When Is TTS Augmentation Through a Pivot Language Useful? (2022) (0)
- End-to-End Speech Recognition: A Survey (2023) (0)
- Intrusion of Coastal Oyashio water to Funka Bay and Tsugaru Strait occasionally disturbed by Kuroshio-originating warm core ring (2023) (0)
- Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining (2023) (0)
- A. R. Ammons and Contemporary American Poetry (1990) (0)
- Antiviral Susceptibilities of Distinct Lineages of Influenza C and D Viruses (2023) (0)
- SUPERB: Speech Understanding and PERformance Benchmark (2021) (0)
- Automated Discrimination of Heart Disease Using Artificial (1998) (0)
- ESPnet2 pretrained model, Shinji Watanabe/spgispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe10000_valid.acc.ave, fs=16k, lang=en (2021) (0)
- ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit (2023) (0)
- Voice Activity Detection Using Dirichlet Prior (2009) (0)
- The potential of a universal influenza virus-like particle vaccine expressing a chimeric cytokine (2022) (0)
- Challenges of Corporate Alliance CLOMA toward Plastic Litter (2023) (0)
- An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer (2021) (0)
- Feature-space Adaptation with a Weighted Sum of Multiple Transformation Matrices Based on Regression Tree for Automatic Speech Recognition (2017) (0)
- Holding for the 77th Annual Meeting of the JSMI (2002) (0)
- A stabilized multiscale zero-crossing image representation for image processing tasks at the level of the early vision (1996) (0)
- 21231 A Design Method Considering Earthquake Input Levels for Base Isolated Structure with Variable Oil Damper (2011) (0)
- ESPnet2 pretrained model, Shinji Watanabe/gigaspeech_asr_train_asr_raw_en_bpe5000_valid.acc.ave, fs=16k, lang=en (2021) (0)
- Application of Source Separation to Robust Speech Analysis and Recognition (2018) (0)
- Research note: Residents’ Assessment of Local Government Information Systems (2014) (0)
- Strains in heterostructures detected by standard NMR (2010) (0)
- SUMMARIZING NEURAL NETWORK FOR SPEAKER ADAPTATION (2016) (0)
- BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder (2022) (0)
- Structural Bayesian Linear Regression for Hidden Markov Models (2013) (0)
- ESPnet-ONNX: Bridging a Gap Between Research and Production (2022) (0)
- Vector and matrix formulas (2015) (0)
- Immunogenicity and Protective Efficacy of Replication-Incompetent Influenza Virus-Like Particles (2001) (0)
- MULTILINGUAL END-TO-END SPEECH TRANSLATION (2019) (0)
- Training data selection with user’s physical characteristics data for acceleration-based activity modeling (2011) (0)
- Gear shift control method for an automatic automotive transmission (1994) (0)
- The Purpose of Health and Labour Sciences Research Grants "Study in the Present Status of Alarms of Medical Equipment and Alarm-related Problems"(Alarm System for Medical Devices) (2002) (0)
- Speech recognition based on a Bayesian approach =Beizu-teki shuho ni motozuku onsei ninshiki (2006) (0)
- Bayesian Speech and Language Processing: Bayesian approach (2015) (0)
- Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization (2022) (0)
- Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering (2016) (0)
- A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge (2023) (0)
- Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments (2015) (0)
- Statistical models in speech and language processing (2015) (0)
- Influenza B Virus BM2 Protein Is Transported through the trans-Golgi Network as an Integral Membrane Protein (2003) (0)
- A pattern recognition device and pattern recognition methods (2013) (0)
- AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (2023) (0)
- Saturation time of exposure interval for cross-neutralization response to SARS-CoV-2: Implications for vaccine dose interval (2023) (0)
- Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling (2023) (0)
- Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models (2022) (0)
- Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding (2023) (0)
- End-to-End Multi-Speaker ASR with Independent Vector Analysis (2022) (0)
- A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data (2022) (0)
- Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models (2022) (0)
- InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss (2022) (0)
- Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition (2017) (0)
- Use of the particle agglutination/particle agglutination inhibition test for antigenic analysis of SARS‐CoV‐2 (2022) (0)
- Enhancing Speech-to-Speech Translation with Multiple TTS Targets (2023) (0)
- ESPnet2 pretrained model, Shinji Watanabe/librispeech_asr_train_asr_conformer_raw_bpe_batch_bins30000000_accum_grad3_optim_conflr0.001_sp_valid.acc.ave, fs=16k, lang=en (2020) (0)
- DESIGN DEVELOPMENT ON NEW TOSHIBA RICE COOKER (Proceedings of the 35th Annual Conference of the JSSD) (1988) (0)
- Bayesian Speech and Language Processing: Introduction (2015) (0)
- 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders (2022) (0)
- C-35 Design for Base-Isolated structure with Variable Oil Damper using design optimization method (2010) (0)
- Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text (2019) (0)
- EURO: ESPnet Unsupervised ASR Open-source Toolkit (2022) (0)
- Decentralised control of corrugator line (1990) (0)
- An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion (2022) (0)
- A modeling toward the elucidation of disturbances in the complex system of the global coupled Ionosphere-Thermosphere (2002) (0)
- Integration of speech separation, diarization, and recognition for multi-speaker meetings: Separated LibriCSS dataset (2021) (0)
- Evaluation of Noisy Speech Recognition and Sequence Discriminative Training for Low-rank Deep Neural Network Acoustic Models (2016) (0)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives (2023) (0)
- Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment (2009) (0)
- Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing (2022) (0)
- ESPnet2 pretrained model, Shinji Watanabe/spgispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000_valid.acc.ave, fs=16k, lang=en_unnorm (2021) (0)
- Effect of dialog acts on word use in polylogue (2012) (0)
- VQ-T: RNN Transducers using Vector-Quantized Prediction Network States (2022) (0)
- EAT: Enhanced ASR-TTS Framework for Self-supervised ASR (2020) (0)
- Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR (2019) (0)
- The Aim of the Special Feature (<Special Issue: How should We Manage Medical Equipment?>) (2000) (0)
- Acid Rain: Statistical Analysis of Ionic Correlations Questioned (2005) (0)
- SUPERB: Speech processing Universal PERformance Benchmark (2021) (0)
- Phase diagram and transport properties of Y1−xNdxCo2 pseudo-binary alloys (2013) (0)
- I3D: Transformer architectures with input-dependent dynamic depth for speech recognition (2023) (0)
- The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge (2023) (0)
- Observation of strains caused by heterostructure interfaces (2011) (0)
- 122. Evaluation of Processor SRX-503 (1993) (0)
- VECTORIZATION OF HYPOTHESES AND SPEECH FOR FASTER BEAM SEARCH IN ENCODER DECODER-BASED SPEECH RECOGNITION (2018) (0)
- An Empirical Study of Training Mixture Generation Strategies on Speech Separation: Dynamic Mixing and Augmentation (2022) (0)
- Phone Inventories and Recognition for Every Language (2022) (0)
- ESPnet2 pretrained model, Shinji Watanabe/open_li52_asr_train_asr_raw_bpe7000_valid.acc.ave, fs=16k, lang=noinfo (2021) (0)
- Automatic Dispensing Checking System Using Two-dimensional Barcode Symbols (1995) (0)
- Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History (2023) (0)
- Towards Zero-Shot Code-Switched Speech Recognition (2022) (0)
- Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction (2022) (0)
- Paper index (2020) (0)
- Large Geometric Margin Minimum Error Classification (2009) (0)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units (2022) (0)
- Pathogenicity of two novel human-origin H7N9 highly pathogenic avian influenza viruses in chickens and ducks (2018) (0)
- Integrating Multiple ASR Systems into NLP Backend with Attention Fusion (2022) (0)
- A method for converting a noisy signal in an extended audio signal (2015) (0)
- Acoustic models in speech recognition (recent progress and future prospects of automatic speech recognition research) (2009) (0)
- Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization (2022) (0)
What Schools Are Affiliated With Shinji Watanabe?
Shinji Watanabe is affiliated with the following schools: