Shinji Watanabe

Q: What Schools Are Affiliated With Shinji Watanabe?

Shinji Watanabe is affiliated with the following schools: Hokkaido University, University of Tokyo, University of Wisconsin–Madison, Carnegie Mellon University, Institute of Science Tokyo, Gifu Pharmaceutical University, Johns Hopkins University, Kyoto University, and Waseda University.

Shinji Watanabe's AcademicInfluence.com Rankings

Computer Science: World Rank #4261, Historical Rank #4483
Computational Linguistics: World Rank #374, Historical Rank #380
Database: World Rank #1483, Historical Rank #1558

Shinji Watanabe's Degrees

PhD, Computer Science, University of Tokyo
Masters, Computer Science, University of Tokyo
Bachelors, Computer Science, University of Tokyo

Shinji Watanabe's Published Works

Each entry lists the title, the publication year, and the number of citations.

Deep clustering: Discriminative embeddings for segmentation and separation (2015) (1036)
ESPnet: End-to-End Speech Processing Toolkit (2018) (941)
The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines (2015) (788)
Joint CTC-attention based end-to-end speech recognition using multi-task learning (2016) (669)
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition (2017) (529)
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR (2015) (525)
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks (2015) (492)
A Comparative Study on Transformer vs RNN in Speech Applications (2019) (486)
Single-Channel Multi-Speaker Separation Using Deep Clustering (2016) (363)
An analysis of environment, microphone and data simulation mismatches in robust speech recognition (2017) (317)
SUPERB: Speech processing Universal PERformance Benchmark (2021) (294)
The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines (2013) (290)
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM (2017) (270)
Heavy ion synchrotron for medical use —HIMAC project at NIRS-Japan— (1992) (248)
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks (2016) (248)
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge (2018) (183)
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings (2020) (166)
Ebola Virus Glycoprotein: Proteolytic Processing, Acylation, Cell Tropism, and Detection of Neutralizing Antibodies (2001) (164)
Deep beamforming networks for multi-channel speech recognition (2016) (162)
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration (2019) (159)
Topic Tracking Model for Analyzing Consumer Purchase Behavior (2009) (152)
Recent Developments on ESPnet Toolkit Boosted by Conformer (2020) (150)
End-to-End Neural Speaker Diarization with Permutation-Free Objectives (2019) (141)
Recurrent deep neural networks for robust speech recognition (2014) (136)
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks (2015) (135)
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit (2019) (135)
End-to-End Neural Speaker Diarization with Self-Attention (2019) (134)
Functional Importance of the Coiled-Coil of the Ebola Virus Glycoprotein (2000) (120)
Fashion Coordinates Recommender System Using Photographs from Fashion Magazines (2011) (112)
A Review of Speaker Diarization: Recent Advances with Deep Learning (2021) (112)
A comprehensive map of the influenza A virus replication cycle (2013) (110)
ESPnet-ST: All-in-One Speech Translation Toolkit (2020) (108)
Discriminative NMF and its application to single-channel source separation (2014) (106)
Language independent end-to-end architecture for joint language identification and speech recognition (2017) (106)
Variational bayesian estimation and clustering for speech recognition (2004) (103)
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors (2020) (102)
Joint CTC/attention decoding for end-to-end speech recognition (2017) (98)
The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes (2017) (96)
The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes (2013) (93)
End-to-end Speech Recognition With Word-Based RNN Language Models (2018) (92)
Influenza A Virus Can Undergo Multiple Cycles of Replication without M2 Ion Channel Activity (2001) (91)
Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques (2019) (87)
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict (2020) (87)
Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition (2014) (86)
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling (2018) (86)
Unsupervised Activity Recognition with User's Physical Characteristics Data (2011) (81)
MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition (2019) (80)
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera (2012) (79)
Back-Translation-Style Data Augmentation for end-to-end ASR (2018) (78)
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming (2017) (77)
Student-teacher network learning with enhanced features (2017) (75)
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio (2021) (74)
Multichannel End-to-end Speech Recognition (2017) (74)
Phasebook and Friends: Leveraging Discrete Representations for Source Separation (2018) (69)
Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition (2017) (68)
Cycle-consistency Training for End-to-end Speech Recognition (2018) (67)
Multi-Channel Speech Recognition: LSTMs All the Way Through (2016) (66)
Duration-Controlled LSTM for Polyphonic Sound Event Detection (2017) (66)
A Purely End-to-End System for Multi-speaker Speech Recognition (2018) (65)
Intermediate Loss Regularization for CTC-Based Speech Recognition (2021) (61)
Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing (2009) (61)
Sequence summarizing neural network for speaker adaptation (2016) (60)
Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text (2019) (59)
End-To-End Multi-Speaker Speech Recognition With Transformer (2020) (57)
End-to-end Monaural Multi-speaker ASR System without Pretraining (2018) (56)
Multilingual End-to-End Speech Translation (2019) (56)
End-to-End Multi-Speaker Speech Recognition (2018) (56)
Massively Multilingual Adversarial Speech Recognition (2019) (55)
Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline (2018) (54)
Multi-Modal Data Augmentation for End-to-end ASR (2018) (54)
Augmentation adversarial training for unsupervised speaker recognition (2020) (53)
An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition (1975) (53)
Bayesian Speech and Language Processing (2015) (53)
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit (2019) (51)
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition (2015) (50)
Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark (2013) (50)
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech (2018) (49)
Structure discovery of deep neural network based on evolutionary algorithms (2015) (49)
Torchaudio: Building Blocks for Audio and Speech Processing (2021) (49)
Weakly-Supervised Sound Event Detection with Self-Attention (2020) (47)
Vaccination-infection interval determines cross-neutralization potency to SARS-CoV-2 Omicron after breakthrough infection by other variants (2022) (47)
Self-Supervised Speech Representation Learning: A Review (2022) (46)
Far-Field Automatic Speech Recognition (2020) (46)
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis (2019) (45)
Speaker Diarization with Region Proposal Network (2020) (45)
The MERL/MELCO/TUM system for the REVERB Challenge using Deep Recurrent Neural Network Feature Enhancement (2014) (45)
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis (2020) (44)
Discriminative training based on an integrated view of MPE and MMI in margin and error space (2010) (44)
Semi-Supervised End-to-End Speech Recognition (2018) (43)
Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition (2019) (43)
Robust speech recognition in unknown reverberant and noisy conditions (2015) (41)
Improved Mask-CTC for Non-Autoregressive End-to-End ASR (2020) (40)
Uncertainty propagation through deep neural networks (2015) (40)
Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation (2011) (39)
High-accuracy user identification using EEG biometrics (2016) (38)
Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio (2017) (38)
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches (2019) (38)
Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition (2017) (38)
Application of Variational Bayesian Approach to Speech Recognition (2002) (38)
Speech Enhancement Using End-to-End Speech Recognition Objectives (2019) (37)
Transformer ASR with Contextual Block Processing (2019) (37)
New Era for Robust Speech Recognition (2017) (37)
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration (2020) (37)
Findings of the IWSLT 2022 Evaluation Campaign (2022) (36)
A Study of Learning Based Beamforming Methods for Speech Recognition (2016) (36)
Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation (2018) (35)
A method for optimization of fuzzy reasoning by genetic algorithms and its application to discrimination of myocardial heart disease (1998) (34)
Non-Autoregressive Transformer for Speech Recognition (2021) (34)
Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation (2020) (34)
Deep unfolding for multichannel source separation (2016) (33)
DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs (2020) (33)
Neural Speaker Diarization with Speaker-Wise Chain Rule (2020) (31)
Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection (2016) (31)
Convolution-Augmented Transformer for Semi-Supervised Sound Event Detection Technical Report (2020) (31)
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives (2020) (30)
Black box optimization for automatic speech recognition (2014) (30)
Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders (2019) (30)
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement (2019) (30)
Structured Discriminative Models For Speech Recognition: An Overview (2012) (30)
Speaker Adaptation for Multichannel End-to-End Speech Recognition (2018) (29)
Topic tracking language model for speech recognition (2011) (29)
Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion (2018) (29)
Conditional Diffusion Probabilistic Model for Speech Enhancement (2022) (29)
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities (2022) (29)
Auxiliary Feature Based Adaptation of End-to-end ASR Systems (2018) (29)
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS (2020) (29)
Statistical Voice Conversion Based on Noisy Channel Model (2012) (29)
Ensemble learning for speech enhancement (2013) (28)
Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition (2004) (28)
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification (2020) (28)
Effectiveness of discriminative training and feature transformation for reverberated and noisy speech (2013) (27)
Dialog state tracking with attention-based sequence-to-sequence learning (2016) (27)
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend (2017) (27)
Towards Online End-to-end Transformer Automatic Speech Recognition (2019) (27)
Efficient learning for spoken language understanding tasks with word embedding based pre-training (2015) (27)
Streaming Transformer ASR With Blockwise Synchronous Beam Search (2020) (26)
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition (2021) (26)
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling (2018) (26)
Online End-To-End Neural Diarization with Speaker-Tracing Buffer (2020) (26)
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models (2019) (25)
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition (2021) (25)
The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans (2020) (24)
The CHiME Challenges: Robust Speech Recognition in Everyday Environments (2017) (24)
Vectorized Beam Search for CTC-Attention-Based Speech Recognition (2019) (24)
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation (2021) (24)
End-To-End Speaker Diarization as Post-Processing (2020) (24)
Integrated network analysis reveals a novel role for the cell cycle in 2009 pandemic influenza virus-induced inflammation in macaque lungs (2012) (24)
High‐quality InxGa1–xAs/Al0.30Ga0.70As quantum dots grown in inverted pyramids (2003) (23)
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet (2021) (23)
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap (2021) (23)
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection (2018) (23)
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend (2021) (23)
Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy (2015) (23)
Insertion-Based Modeling for End-to-End Automatic Speech Recognition (2020) (23)
Discriminative method for recurrent neural network language models (2015) (22)
Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation (2012) (22)
Minimum word error training of long short-term memory recurrent neural network language models for speech recognition (2016) (22)
HEAR: Holistic Evaluation of Audio Representations (2022) (22)
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds (2013) (21)
ESPnet2-TTS: Extending the Edge of TTS Research (2021) (21)
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition (2021) (20)
A Study on Speech Enhancement Based on Diffusion Probabilistic Model (2021) (20)
Inhibitory mechanism of tranilast in human coronary artery smooth muscle cells proliferation, due to blockade of PDGF‐BB‐receptors (2000) (20)
On dynamic resource management mechanism using control theoretic approach for wide-area grid computing (2005) (19)
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (2022) (19)
Dual System Combination Approach for Various Reverberant Environments with Dereverberation Techniques (2014) (19)
Investigating Self-Supervised Learning for Speech Enhancement and Separation (2022) (19)
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming (2020) (19)
An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions (2019) (19)
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (2021) (19)
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors (2021) (18)
Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System (2019) (18)
Bayesian modelling of the speech spectrum using mixture of Gaussians (2004) (18)
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection (2020) (18)
Real-time meeting recognition and understanding using distant microphones and omni-directional camera (2010) (18)
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks (2021) (18)
Correlation between the structural and antiferromagnetic phase transitions in ZnCr2Se4 (2003) (18)
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training (2009) (17)
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition (2019) (17)
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data (2010) (17)
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec (2021) (17)
Using ASR Methods for OCR (2019) (17)
Composite embedding systems for ZeroSpeech2017 Track1 (2017) (16)
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals (2020) (16)
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection (2017) (16)
Multi-Stream End-to-End Speech Recognition (2019) (16)
Arabic Speech Recognition by End-to-End, Modular Systems and Human (2021) (16)
Beamforming networks using spatial covariance features for far-field speech recognition (2016) (16)
Stream Attention-based Multi-array End-to-end Speech Recognition (2018) (16)
Improving End-to-End Single-Channel Multi-Talker Speech Recognition (2020) (15)
Structural Bayesian Linear Regression for Hidden Markov Models (2014) (15)
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings (2021) (15)
Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer (2013) (14)
The Phasebook: Building Complex Masks via Discrete Representations for Source Separation (2019) (14)
Bayesian linear regression for Hidden Markov Model based on optimizing variational bounds (2011) (14)
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection (2021) (14)
ESPnet-ST IWSLT 2021 Offline Speech Translation System (2021) (14)
Multi-Head Decoder for End-to-End Speech Recognition (2018) (14)
Speaker-Conditional Chain Model for Speech Separation and Extraction (2020) (14)
Probabilistic integration of joint density model and speaker model for voice conversion (2010) (14)
DiscreTalk: Text-to-Speech as a Machine Translation Problem (2020) (14)
Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection (2010) (13)
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations (2021) (13)
Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease (2019) (13)
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement (2018) (13)
Driver confusion status detection using recurrent neural networks (2016) (13)
A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies (2021) (13)
Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition (2019) (13)
Analysis of Multilingual Sequence-to-Sequence speech recognition systems (2018) (13)
Dual-Path RNN for Long Recording Speech Separation (2021) (13)
ORTHROS: non-autoregressive end-to-end speech translation With dual-decoder (2020) (13)
EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition (2021) (13)
A unified view for discriminative objective functions based on negative exponential of difference measure between strings (2009) (13)
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization (2020) (12)
Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task (2004) (12)
Different efficacies of neutralizing antibodies and antiviral drugs on SARS-CoV-2 Omicron subvariants, BA.1 and BA.2 (2022) (12)
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results (2022) (12)
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR (2018) (12)
Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features (2015) (12)
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs (2016) (12)
Automated structure discovery and parameter tuning of neural network language model based on evolution strategy (2016) (12)
Context Sensitive Spoken Language Understanding Using Role Dependent LSTM Layers (2015) (12)
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation (2022) (12)
End-to-End SpeakerBeam for Single Channel Target Speech Recognition (2019) (12)
Online meeting recognizer with multichannel speaker diarization (2010) (11)
Multi-encoder multi-resolution framework for end-to-end speech recognition (2018) (11)
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers (2021) (11)
Non-Autoregressive Transformer Automatic Speech Recognition (2019) (11)
Attention-Based ASR with Lightweight and Dynamic Convolutions (2019) (11)
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker (2021) (11)
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models (2022) (11)
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation (2022) (11)
ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing (2021) (11)
Minimum Error Classification with geometric margin control (2010) (10)
MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments (2012) (10)
Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains (2019) (10)
Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios (2021) (10)
Streaming Transformer ASR with Blockwise Synchronous Inference (2020) (10)
Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection (2012) (10)
Bayesian Acoustic Modeling for Spontaneous Speech Recognition (2004) (10)
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages (2022) (9)
Differences in the effects of mutations in GP131, a guinea pig cytomegalovirus homologue of pentameric complex component UL130, on macrophage and epithelial cell infection (2018) (9)
Speaker Recognition Benchmark Using the CHiME-5 Corpus (2019) (9)
Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition (2012) (9)
Leveraging State-of-the-art ASR Techniques to Audio Captioning (2021) (9)
CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments (2018) (9)
A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification (2010) (9)
Encoder-Decoder Based Attractors for End-to-End Neural Diarization (2021) (9)
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models (2021) (9)
Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR (2017) (9)
Speech Processing for Digital Home Assistants (2019) (9)
Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models (2020) (9)
HEAR 2021: Holistic Evaluation of Audio Representations (2022) (9)
Performance Evaluation to Optimize the UMP System Focusing on Network Transmission Speed (2007) (8)
Building Corpora for Single-Channel Speech Separation Across Multiple Domains (2018) (8)
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information (2017) (8)
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain (2021) (8)
End-to-End ASR with Adaptive Span Self-Attention (2020) (8)
Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments (2014) (8)
A discriminative model for continuous speech recognition based on Weighted Finite State Transducers (2010) (8)
Driver prediction to improve interaction with in-vehicle HMI (2015) (8)
Statistical Dialogue Management using Intention Dependency Graph (2013) (8)
Gibbs sampling based Multi-scale Mixture Model for speaker clustering (2011) (8)
Constructing shared-state hidden Markov models based on a Bayesian approach (2002) (8)
Dual-Path Modeling for Long Recording Speech Separation in Meetings (2021) (7)
Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing (2011) (7)
Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition (2019) (7)
High-accuracy user identification using EEG biometrics (2016) (7)
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording (2020) (7)
CMU’s IWSLT 2022 Dialect Speech Translation System (2022) (7)
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting (2013) (7)
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization (2021) (7)
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation (2020) (7)
Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization (2021) (7)
Decoding network optimization using minimum transition error training (2012) (7)
Application of variational Bayesian estimation and clustering to acoustic model adaptation (2003) (7)
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition (2019) (7)
CTC Alignments Improve Autoregressive Translation (2022) (7)
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021 (2021) (7)
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training (2016) (7)
In Vitro Efficacy of Antiviral Agents against Omicron Subvariant BA.4.6 (2022) (7)
A generalized discriminative training framework for system combination (2013) (7)
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency (2022) (7)
Computerized analysis for classification of heart diseases in echocardiographic images (1996) (6)
Multi-mode Transformer Transducer with Stochastic Future Context (2021) (6)
Subspace pursuit method for kernel-log-linear models (2011) (6)
Immunogenicity and protective efficacy of SARS-CoV-2 recombinant S-protein vaccine S-268019-b in cynomolgus monkeys (2022) (6)
Learning Speaker Embedding from Text-to-Speech (2020) (6)
Differentiable Allophone Graphs for Language-Universal Speech Recognition (2021) (6)
High density and reliable packaging technology with Non Conductive Film for 3D/TSV (2013) (6)
Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN (2015) (6)
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions (2021) (6)
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization (2010) (6)
Predictor–Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale (2010) (6)
Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework (2006) (6)
Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition (2005) (6)
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge (2022) (6)
ASR2K: Speech Recognition for Around 2000 Languages without Audio (2022) (6)
Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer (2008) (6)
Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering (2012) (6)
Discriminative training of acoustic models for system combination (2013) (6)
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step (2020) (6)
Evolutionary optimization of long short-term memory neural network language model (2016) (6)
Leveraging Pre-trained Language Model for Speech Sentiment Analysis (2021) (6)
Search error risk minimization in Viterbi beam search for speech recognition (2010) (6)
Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings (2018) (5)
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates (2021) (5)
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics (2021) (5)
Roles of GP33, a guinea pig cytomegalovirus-encoded G protein-coupled receptor homolog, in cellular signaling, viral growth and inflammation in vitro and in vivo (2018) (5)
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding (2021) (5)
Fast similarity search on a large speech data set with neighborhood graph indexing (2010) (5)
Multi-Channel End-To-End Neural Diarization with Distributed Microphones (2021) (5)
Two-Pass Low Latency End-to-End Spoken Language Understanding (2022) (5)
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring (2021) (5)
Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition (2011) (5)
Discriminative feature transforms using differenced maximum mutual information (2012) (5)
Multi-microphone speech recognition in everyday environments (2017) (5)
A new stabilized zero-crossing representation in the wavelet transform domain and signal reconstruction (1995) (5)
Layer Pruning on Demand with Intermediate CTC (2021) (5)
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge (2020) (5)
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding (2022) (5)
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem (2021) (5)
Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition (2021) (5)
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition (2018) (5)
Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model (2011) (5)
End-to-End Multilingual Multi-Speaker Speech Recognition (2019) (5)
Continuous Speech Separation Using Speaker Inventory for Long Recording (2021) (4)
Sequence Transduction with Graph-Based Supervision (2021) (4)
Training data selection with user’s physical characteristics data for acceleration-based activity modeling (2013) (4)
Online Continual Learning of End-to-End Speech Recognition Models (2022) (4)
Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization (2021) (4)
Discriminative approach to dynamic variance adaptation for noisy speech recognition (2011) (4)
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition (2022) (4)
Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers (2021) (4)
A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches (2008) (4)
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation (2022) (4)
Training Data Augmentation and Data Selection (2017) (4)
LegoNN: Building Modular Encoder-Decoder Models (2022) (4)
Acoustic Event Detection with Classifier Chains (2021) (4)
End-to-end ASR to jointly predict transcriptions and linguistic annotations (2021) (4)
Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning (2022) (4)
Evolution Strategy Based Neural Network Optimization and LSTM Language Model for Robust Speech Recognition (2016) (4)
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement (2023) (4)
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification (2021) (4)
On Compressing Sequences for Self-Supervised Speech Models (2022) (4)
A Robust Estimation Method of Noise Mixture Model for Noise Suppression (2011) (4)
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination (2010) (4)
Cost-level integration of statistical and rule-based dialog managers (2014) (4)
Characterization of a new polarity switching negative tone e-beam resist for 14nm and 10nm logic node mask fabrication and beyond (2014) (4)
Minimum latency training of sequence transducers for streaming end-to-end speech recognition (2022) (4)
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text (2017) (4)
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis (2022) (4)
Deep Speech Synthesis from Articulatory Representations (2022) (3)
Log-linear dialog manager (2014) (3)
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis (2022) (3)
Advanced computational models and learning theories for spoken language processing (2006) (3)
High accurate model-integration-based voice conversion using dynamic features and model structure optimization (2011) (3)
Attention-Based Multi-Hypothesis Fusion for Speech Summarization (2021) (3)
Improving Speech Enhancement through Fine-Grained Speech Characteristics (2022) (3)
On Prosody Modeling for ASR+TTS Based Voice Conversion (2021) (3)
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition (2018) (3)
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation (2022) (3)
Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation (2021) (3)
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy (2022) (3)
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition (2021) (3)
Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition (2019) (3)
Overview of the 2nd 'CHiME' Speech Separation and Recognition Challenge (2013) (3)
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification (2018) (3)
Bayesian approaches to acoustic modeling: a review (2012) (3)
Defining Reasonably Foreseeable Parameter Ranges Using Real-World Traffic Data for Scenario-Based Safety Assessment of Automated Vehicles (2022) (3)
Train from scratch: Single-stage joint training of speech separation and recognition (2022) (3)
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers (2022) (3)
Incremental Adaptation Based on a Macroscopic Time Evolution System (2007) (3)
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation (2022) (3)
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding (2022) (3)
The JHU/KyotoU Speech Translation System for IWSLT 2018 (2018) (3)
Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion (2005) (3)
ESPnet2 pretrained model, Shinji Watanabe/librispeech_asr_train_asr_transformer_e18_raw_bpe_sp_valid.acc.best, fs=16k, lang=en (2020) (3)
Toward Streaming ASR with Non-Autoregressive Insertion-Based Model (2020) (3)
End-to-End ASR and Audio Segmentation with Non-Autoregressive Insertion-Based Model (2020) (2)
Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks (2021) (2)
Acoustic Model Adaptation Based on Coarse/Fine Training of Transfer Vectors Using Directional Statistics (2006) (2)
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model (2012) (2)
Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution (2011) (2)
Influence relation estimation based on lexical entrainment in conversation (2013) (2)
Stereo-based feature enhancement using dictionary learning (2013) (2)
Memory-Efficient Training of RNN-Transducer with Sampled Softmax (2022) (2)
Low-latency meeting recognition and understanding using distant microphones (2011) (2)
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR (2022) (2)
Joint Speech Recognition and Audio Captioning (2022) (2)
Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity (2021) (2)
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors (2022) (2)
Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms (2020) (2)
Using online model comparison in the Variational Bayes framework for online unsupervised Voice Activity Detection (2010) (2)
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR (2022) (2)
Learning Influences from Word Use in Polylogue (2011) (2)
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble (2022) (2)
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing (2022) (2)
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper (2019) (2)
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech (2023) (2)
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party (2022) (2)
Normalization and Adaptation by Consistently Employing MAP Estimation (2012) (2)
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis (2022) (2)
Short range structure of lead-lithium fluoride obtained by XAFS analysis (2005) (2)
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge (2023) (2)
Self-supervised Representation Learning for Speech Processing (2022) (2)
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner (2022) (2)
Development of mapping system for distribution facility management (1989) (2)
In search of strong embedding extractors for speaker diarisation (2022) (2)
Prior-shared feature and model space speaker adaptation by consistently employing map estimation (2013) (2)
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data (2015) (2)
A stabilized zero-crossing representation in the wavelet transform domain and its extension to image representation for early vision (1996) (2)
Building Speech Recognition System from Untranscribed Data Report from JHU workshop 2016 (2016) (2)
Are twindemics occurring? (2022) (2)
An algorithm for making a correspondence of zero-crossing points in a wavelet transform domain with second-order derivative property (1995) (1)
Basis vector orthogonalization for an improved kernel gradient matching pursuit method (2012) (1)
A Kernel Machine Derived by Minimum Relative Entropy Discrimination For Automatic Speech Recognition (2009) (1)
Bayesian approaches in speech recognition (2011) (1)
On-line adaptation and Bayesian detection of environmental changes based on a macroscopic time evolution system (2009) (1)
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation (2022) (1)
The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition (2023) (1)
Better Intermediates Improve CTC Inference (2022) (1)
Learning influences from word use in polylogue (2011) (1)
Novel Deep Architectures in Speech Processing (2017) (1)
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR (2022) (1)
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks (2022) (1)
Large-scale learning of generalised representations for speaker recognition (2022) (1)
Sequential maximum mutual information linear discriminant analysis for speech recognition (2014) (1)
Convolutional bidirectional long short-term memory hidden Markov model hybrid system for polyphonic sound event detection (2016) (1)
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study (2023) (1)
Application of topic tracking model to language model adaptation and meeting analysis (2010) (1)
Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data (2011) (1)
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model (2022) (1)
Low-Resource Contextual Topic Identification on Speech (2018) (1)
Residual Language Model for End-to-end Speech Recognition (2022) (1)
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation (2022) (1)
Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones (2017) (1)
Guest Editorial for the special issue on Multi-Microphone Speech Recognition in Everyday Environments (2017) (1)
An external quality assessment feasibility study; cross laboratory comparison of haemagglutination inhibition assay and microneutralization assay performance for seasonal influenza serology testing: A FLUCOP study (2023) (1)
Sequence discriminative training for low-rank deep neural networks (2014) (1)
Multi-blank Transducers for Speech Recognition (2022) (1)
Bag Of ARCS: New representation of speech segment features based on finite state machines (2012) (1)
Toolkits for Robust Speech Processing (2017) (1)
Speaker Verification-Based Evaluation of Single-Channel Speech Separation (2021) (1)
SpeechLMScore: Evaluating speech generation using speech language model (2022) (1)
Context-aware Fine-tuning of Self-supervised Speech Models (2022) (1)
Avoid Overthinking in Self-Supervised Models for Speech Recognition (2022) (1)
Handling uncertain observations in unsupervised topic-mixture language model adaptation (2012) (1)
Speaker-Independent Acoustic-to-Articulatory Speech Inversion (2023) (1)
An efficient plasmid-driven system for the generation of influenza virus-like particles for vaccine (2001) (1)
Joint speaker diarization and speech recognition based on region proposal networks (2021) (1)
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement (2023) (0)
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection (2022) (0)
Streaming Joint Speech Recognition and Disfluency Detection (2022) (0)
Report on NTT Communications Science Laboratories Open House 2011 (2011) (0)
A new stabilized zero-crossing representation in the wavelet transform domain and its application to image processing (1996) (0)
Impact of Reinfection with SARS-CoV-2 Omicron Variants in Previously Infected Hamsters (2022) (0)
The Method of Evaluating Icon based on Agreement Level with Design Concept Intended for Mobile Phone (2005) (0)
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR (2022) (0)
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations (2023) (0)
Are Depositors Aware of the Governance of their Banks? (2013) (0)
ESPnet2 pretrained model, Shinji Watanabe/laborotv_asr_train_asr_conformer2_latest33_raw_char_sp_valid.acc.ave, fs=16k, lang=jp (2020) (0)
Adaptive & discriminative speech modeling to cope with temporal changes of environments (2011) (0)
Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview (2015) (0)
Stereo-Input Speech Recognition Using Sparseness-Based Blind Source Separation (2010) (0)
When Is TTS Augmentation Through a Pivot Language Useful? (2022) (0)
End-to-End Speech Recognition: A Survey (2023) (0)
Intrusion of Coastal Oyashio water to Funka Bay and Tsugaru Strait occasionally disturbed by Kuroshio-originating warm core ring (2023) (0)
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining (2023) (0)
A. R. Ammons and Contemporary American Poetry (1990) (0)
Antiviral Susceptibilities of Distinct Lineages of Influenza C and D Viruses (2023) (0)
SUPERB: Speech Understanding and PERformance Benchmark (2021) (0)
Automated Discrimination of Heart Disease Using Artificial (1998) (0)
ESPnet2 pretrained model, Shinji Watanabe/spgispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe10000_valid.acc.ave, fs=16k, lang=en (2021) (0)
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit (2023) (0)
Voice Activity Detection Using Dirichlet Prior (2009) (0)
The potential of a universal influenza virus-like particle vaccine expressing a chimeric cytokine (2022) (0)
Challenges of Corporate Alliance CLOMA toward Plastic Litter (2023) (0)
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer (2021) (0)
Feature-space Adaptation with a Weighted Sum of Multiple Transformation Matrices Based on Regression Tree for Automatic Speech Recognition (2017) (0)
Holding for the 77th Annual Meeting of the JSMI (2002) (0)
A stabilized multiscale zero-crossing image representation for image processing tasks at the level of the early vision (1996) (0)
21231 A Design Method Considering Earthquake Input Levels for Base Isolated Structure with Variable Oil Damper (2011) (0)
ESPnet2 pretrained model, Shinji Watanabe/gigaspeech_asr_train_asr_raw_en_bpe5000_valid.acc.ave, fs=16k, lang=en (2021) (0)
Application of Source Separation to Robust Speech Analysis and Recognition (2018) (0)
Research note: Residents’ Assessment of Local Government Information Systems (2014) (0)
Strains in heterostructures detected by standard NMR (2010) (0)
Summarizing Neural Network for Speaker Adaptation (2016) (0)
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder (2022) (0)
Structural Bayesian Linear Regression for Hidden Markov Models (2013) (0)
ESPnet-ONNX: Bridging a Gap Between Research and Production (2022) (0)
Vector and matrix formulas (2015) (0)
Immunogenicity and Protective Efficacy of Replication-Incompetent Influenza Virus-Like Particles (2001) (0)
Multilingual End-to-End Speech Translation (2019) (0)
Training data selection with user’s physical characteristics data for acceleration-based activity modeling (2011) (0)
Gear shift control method for an automatic automotive transmission (1994) (0)
The Purpose of Health and Labour Sciences Research Grants "Study in the Present Status of Alarms of Medical Equipment and Alarm-related Problems" (Alarm System for Medical Devices) (2002) (0)
Speech recognition based on a Bayesian approach = Beizu-teki shuho ni motozuku onsei ninshiki (2006) (0)
Bayesian Speech and Language Processing: Bayesian approach (2015) (0)
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization (2022) (0)
Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering (2016) (0)
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge (2023) (0)
Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments (2015) (0)
Statistical models in speech and language processing (2015) (0)
Influenza B Virus BM2 Protein Is Transported through the trans-Golgi Network as an Integral Membrane Protein (2003) (0)
A pattern recognition device and pattern recognition methods (2013) (0)
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (2023) (0)
Saturation time of exposure interval for cross-neutralization response to SARS-CoV-2: Implications for vaccine dose interval (2023) (0)
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling (2023) (0)
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models (2022) (0)
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding (2023) (0)
End-to-End Multi-Speaker ASR with Independent Vector Analysis (2022) (0)
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data (2022) (0)
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models (2022) (0)
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss (2022) (0)
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition (2017) (0)
Use of the particle agglutination/particle agglutination inhibition test for antigenic analysis of SARS‐CoV‐2 (2022) (0)
Enhancing Speech-to-Speech Translation with Multiple TTS Targets (2023) (0)
ESPnet2 pretrained model, Shinji Watanabe/librispeech_asr_train_asr_conformer_raw_bpe_batch_bins30000000_accum_grad3_optim_conflr0.001_sp_valid.acc.ave, fs=16k, lang=en (2020) (0)
Design Development on New Toshiba Rice Cooker (Proceedings of the 35th Annual Conference of the JSSD) (1988) (0)
Bayesian Speech and Language Processing: Introduction (2015) (0)
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders (2022) (0)
C-35 Design for Base-Isolated structure with Variable Oil Damper using design optimization method (2010) (0)
Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text (2019) (0)
EURO: ESPnet Unsupervised ASR Open-source Toolkit (2022) (0)
Decentralised control of corrugator line (1990) (0)
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion (2022) (0)
A modeling toward the elucidation of disturbances in the complex system of the global coupled Ionosphere-Thermosphere (2002) (0)
Integration of speech separation, diarization, and recognition for multi-speaker meetings: Separated LibriCSS dataset (2021) (0)
Evaluation of Noisy Speech Recognition and Sequence Discriminative Training for Low-rank Deep Neural Network Acoustic Models (2016) (0)
Improving Massively Multilingual ASR With Auxiliary CTC Objectives (2023) (0)
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment (2009) (0)
Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing (2022) (0)
ESPnet2 pretrained model, Shinji Watanabe/spgispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000_valid.acc.ave, fs=16k, lang=en_unnorm (2021) (0)
Effect of dialog acts on word use in polylogue (2012) (0)
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States (2022) (0)
EAT: Enhanced ASR-TTS Framework for Self-supervised ASR (2020) (0)
Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR (2019) (0)
The Aim of the Special Feature (<Special Issue: How should We Manage Medical Equipment?>) (2000) (0)
Acid Rain: Statistical Analysis of Ionic Correlations Questioned (2005) (0)
SUPERB: Speech processing Universal PERformance Benchmark (2021) (0)
Phase diagram and transport properties of Y1−xNdxCo2 pseudo-binary alloys (2013) (0)
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition (2023) (0)
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge (2023) (0)
Observation of strains caused by heterostructure interfaces (2011) (0)
122. Evaluation of Processor SRX-503 (1993) (0)
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition (2018) (0)
An Empirical Study of Training Mixture Generation Strategies on Speech Separation: Dynamic Mixing and Augmentation (2022) (0)
Phone Inventories and Recognition for Every Language (2022) (0)
ESPnet2 pretrained model, Shinji Watanabe/open_li52_asr_train_asr_raw_bpe7000_valid.acc.ave, fs=16k, lang=noinfo (2021) (0)
Automatic Dispensing Checking System Using Two-dimensional Barcode Symbols (1995) (0)
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History (2023) (0)
Towards Zero-Shot Code-Switched Speech Recognition (2022) (0)
Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction (2022) (0)
Paper index (2020) (0)
Large Geometric Margin Minimum Error Classification (2009) (0)
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units (2022) (0)
Pathogenicity of two novel human-origin H7N9 highly pathogenic avian influenza viruses in chickens and ducks (2018) (0)
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion (2022) (0)
A method for converting a noisy signal in an extended audio signal (2015) (0)
Acoustic models in speech recognition (recent progress and future prospects of automatic speech recognition research) (2009) (0)
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization (2022) (0)