Why Is Yoshua Bengio Influential?
According to Wikipedia, Yoshua Bengio is a Canadian computer scientist, most noted for his work on artificial neural networks and deep learning. He is a professor at the Department of Computer Science and Operations Research at the Université de Montréal and scientific director of the Montreal Institute for Learning Algorithms.
Yoshua Bengio's Published Works
Chart: citations over time, shown as two series.
Series 1: Number of citations in a given year to any of this author's works.
Series 2: Total number of citations to the author's works published in a given year. This highlights the publication of the author's most important works.
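The second measure described above can be reproduced from the paper entries on this page. The following is a minimal sketch, assuming each entry has the form "Title (year) (citations)" and that the second parenthesized number is the citation count, which the page does not state explicitly. The names ENTRIES, ENTRY_RE, and citations_by_publication_year are hypothetical and only for illustration.

```python
import re
from collections import defaultdict

# A few entries copied from the paper list, in the page's
# "Title (year) (citations)" format. Assumption: the second
# parenthesized number is the paper's citation count.
ENTRIES = [
    "Deep Learning (2015) (61305)",
    "Gradient-based learning applied to document recognition (1998) (39031)",
    "Generative Adversarial Nets (2014) (34498)",
    "Neural Machine Translation by Jointly Learning to Align and Translate (2014) (21547)",
]

# Title, then a four-digit year, then a citation count, anchored at the end
# so that parentheses inside a title do not confuse the match.
ENTRY_RE = re.compile(r"^(?P<title>.+) \((?P<year>\d{4})\) \((?P<cites>\d+)\)$")


def citations_by_publication_year(entries):
    """Sum citation counts over all papers published in the same year."""
    totals = defaultdict(int)
    for entry in entries:
        match = ENTRY_RE.match(entry)
        if match is None:
            continue  # skip entries that lack a year or a count
        totals[int(match.group("year"))] += int(match.group("cites"))
    return dict(totals)


totals = citations_by_publication_year(ENTRIES)
for year in sorted(totals):
    print(year, totals[year])
```

With only these four sample entries, 2014 collects the counts of both papers published that year. The first measure (citations received in a given year) cannot be derived from this list, since the page gives only a total per paper.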
Chart axes: year (1940 to 2020) and citations (0 to 112500).
Published Papers
Deep Learning (2015) (61305)
Gradient-based learning applied to document recognition (1998) (39031)
Generative Adversarial Nets (2014) (34498)
Neural Machine Translation by Jointly Learning to Align and Translate (2014) (21547)
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2014) (17093)
Understanding the difficulty of training deep feedforward neural networks (2010) (13880)
Representation Learning: A Review and New Perspectives (2012) (9605)
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (2014) (8549)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015) (8100)
Learning Deep Architectures for AI (2007) (7937)
Graph Attention Networks (2017) (7433)
Learning long-term dependencies with gradient descent is difficult (1994) (6682)
Deep Sparse Rectifier Neural Networks (2011) (6627)
A Neural Probabilistic Language Model (2003) (6527)
How transferable are features in deep neural networks? (2014) (6436)
Random Search for Hyper-Parameter Optimization (2012) (6358)
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (2010) (6070)
Extracting and composing robust features with denoising autoencoders (2008) (6010)
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches (2014) (4749)
Convolutional networks for images, speech, and time series (1998) (4543)
On the difficulty of training recurrent neural networks (2012) (4222)
Pattern Recognition and Neural Networks (1995) (4194)
Greedy Layer-Wise Training of Deep Networks (2006) (3992)
Curriculum learning (2009) (3673)
Algorithms for Hyper-Parameter Optimization (2011) (2846)
FitNets: Hints for Thin Deep Nets (2014) (2369)
BinaryConnect: Training Deep Neural Networks with binary weights during propagations (2015) (2315)
Word Representations: A Simple and General Method for Semi-Supervised Learning (2010) (2247)
Brain tumor segmentation with Deep Neural Networks (2015) (2220)
Theano: A Python framework for fast computation of mathematical expressions (2016) (2195)
Attention-Based Models for Speech Recognition (2015) (2080)
Why Does Unsupervised Pre-training Help Deep Learning? (2010) (2019)
Maxout Networks (2013) (1935)
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (2016) (1913)
Practical Recommendations for Gradient-Based Training of Deep Architectures (2012) (1825)
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (2013) (1792)
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies (2001) (1692)
A Structured Self-attentive Sentence Embedding (2017) (1687)
Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach (2011) (1670)
Learning deep representations by mutual information estimation and maximization (2018) (1631)
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (2015) (1542)
Semi-supervised Learning by Entropy Minimization (2004) (1488)
Binarized Neural Networks (2016) (1442)
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations (2016) (1424)
NICE: Non-linear Independent Components Estimation (2014) (1399)
Theano: new features and speed improvements (2012) (1396)
Contractive Auto-Encoders: Explicit Invariance During Feature Extraction (2011) (1329)
The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (2016) (1324)
Deep Learning of Representations for Unsupervised and Transfer Learning (2011) (1201)
Scaling learning algorithms towards AI (2007) (1183)
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization (2014) (1163)
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering (2003) (1093)
Exploring Strategies for Training Deep Neural Networks (2009) (1082)
Visualizing Higher-Layer Features of a Deep Network (2009) (1075)
A Closer Look at Memorization in Deep Networks (2017) (1059)
An empirical evaluation of deep architectures on problems with many factors of variation (2007) (1040)
Deep Graph Infomax (2018) (1020)
On the Number of Linear Regions of Deep Neural Networks (2014) (1013)
A Recurrent Latent Variable Model for Sequential Data (2015) (975)
Hierarchical Probabilistic Neural Network Language Model (2005) (973)
End-to-end attention-based large vocabulary speech recognition (2015) (961)
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues (2016) (960)
Challenges in representation learning: A report on three machine learning contests (2013) (944)
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (2018) (925)
On Using Very Large Target Vocabulary for Neural Machine Translation (2014) (907)
Inference for the Generalization Error (1999) (900)
An Empirical Investigation of Catastrophic Forgeting in Gradient-Based Neural Networks (2013) (859)
How to Construct Deep Recurrent Neural Networks (2013) (852)
No Unbiased Estimator of the Variance of K-Fold Cross-Validation (2003) (845)
Theano: A CPU and GPU Math Compiler in Python (2010) (839)
Classification using discriminative restricted Boltzmann machines (2008) (830)
Object Recognition with Gradient-Based Learning (1999) (824)
Learning Structured Embeddings of Knowledge Bases (2011) (807)
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks (2008) (707)
Gated Feedback Recurrent Neural Networks (2015) (699)
Mutual Information Neural Estimation (2018) (667)
Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription (2012) (651)
Manifold Mixup: Better Representations by Interpolating Hidden States (2018) (627)
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (2016) (599)
Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon (2018) (594)
Deep Learning of Representations: Looking Forward (2013) (593)
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space (2016) (571)
Neural Probabilistic Language Models (2006) (564)
A semantic matching energy function for learning with multi-relational data (2013) (564)
Unitary Evolution Recurrent Neural Networks (2015) (547)
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism (2016) (538)
Sharp Minima Can Generalize For Deep Nets (2017) (532)
Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding (2015) (515)
An Actor-Critic Algorithm for Sequence Prediction (2016) (515)
Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives (2012) (504)
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (2019) (501)
Training deep neural networks with low precision multiplications (2014) (497)
Understanding the exploding gradient problem (2012) (490)
Convergence Properties of the K-Means Algorithms (1994) (490)
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model (2016) (478)
Advances in optimizing recurrent networks (2012) (477)
Pointing the Unknown Words (2016) (476)
On Using Monolingual Corpora in Neural Machine Translation (2015) (475)
Hierarchical Multiscale Recurrent Neural Networks (2016) (472)
On the Spectral Bias of Neural Networks (2018) (469)
Deep Complex Networks (2017) (464)
A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion (2015) (459)
A deep learning framework for neuroscience (2019) (457)
Understanding intermediate layers using linear classifier probes (2016) (452)
Mode Regularized Generative Adversarial Networks (2016) (442)
Professor Forcing: A New Algorithm for Training Recurrent Networks (2016) (441)
Gradient-Based Optimization of Hyperparameters (2000) (437)
Generalized Denoising Auto-Encoders as Generative Models (2013) (433)
Speaker Recognition from Raw Waveform with SincNet (2018) (430)
Benchmarking Graph Neural Networks (2020) (425)
What regularized auto-encoders learn from the data-generating distribution (2012) (424)
A Parallel Mixture of SVMs for Very Large Scale Problems (2001) (421)
The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training (2009) (416)
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results (2014) (410)
Generative adversarial networks (2020) (399)
Zero-data Learning of New Tasks (2008) (397)
Interpolation Consistency Training for Semi-Supervised Learning (2019) (393)
Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding (2013) (393)
Toward Causal Representation Learning (2021) (386)
Char2Wav: End-to-End Speech Synthesis (2017) (385)
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments (2014) (383)
Deep Generative Stochastic Networks Trainable by Backprop (2013) (373)
An Input Output HMM Architecture (1994) (368)
Tackling Climate Change with Machine Learning (2019) (367)
Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks (2015) (363)
Bayesian Model-Agnostic Meta-Learning (2018) (361)
Hierarchical Recurrent Neural Networks for Long-Term Dependencies (1995) (353)
EmoNets: Multimodal deep learning approaches for emotion recognition in video (2015) (351)
Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing (2012) (347)
Input-output HMMs for sequence processing (1996) (345)
Three Factors Influencing Minima in SGD (2017) (337)
Combining modality specific deep neural networks for emotion recognition in video (2013) (332)
Generalization in Deep Learning (2017) (331)
Incorporating Second-Order Functional Knowledge for Better Option Pricing (2000) (330)
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks (2016) (319)
Kernel Matching Pursuit (2002) (318)
A Character-level Decoder without Explicit Segmentation for Neural Machine Translation (2016) (315)
Learning Eigenfunctions Links Spectral Embedding and Kernel PCA (2004) (311)
Pylearn2: a machine learning research library (2013) (305)
Gradient based sample selection for online continual learning (2019) (305)
Shallow vs. Deep Sum-Product Networks (2011) (305)
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses (2017) (302)
Revisiting Natural Gradient for Deep Networks (2013) (302)
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning (2018) (296)
Neural Networks with Few Multiplications (2015) (295)
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting (2019) (293)
Markovian Models for Sequential Data (2004) (290)
Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation (2016) (289)
Better Mixing via Deep Representations (2012) (287)
Learning Algorithms for the Classification Restricted Boltzmann Machine (2012) (287)
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations (2016) (282)
Boosting Neural Networks (2000) (281)
High quality document image compression with "DjVu" (1998) (279)
Towards Biologically Plausible Deep Learning (2015) (278)
Learning a synaptic learning rule (1991) (276)
Global optimization of a neural network-hidden Markov model hybrid (1991) (273)
The Manifold Tangent Classifier (2011) (264)
Equilibrated adaptive learning rates for non-convex optimization (2015) (263)
MetaGAN: An Adversarial Approach to Few-Shot Learning (2018) (263)
An Empirical Study of Example Forgetting during Deep Neural Network Learning (2018) (263)
Difference Target Propagation (2014) (260)
On the Expressive Power of Deep Architectures (2011) (260)
Drawing and Recognizing Chinese Characters with Recurrent Neural Network (2016) (252)
Theano: Deep Learning on GPUs with Python (2012) (245)
ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks (2015) (245)
On the Optimization of a Synaptic Learning Rule (2007) (245)
Higher Order Contractive Auto-Encoder (2011) (244)
RMSProp and equilibrated adaptive learning rates for non-convex optimization. (2015) (244)
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms (2019) (243)
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus (2016) (242)
Learning deep physiological models of affect (2013) (226)
Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark (2016) (225)
K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms (2001) (222)
Unsupervised and Transfer Learning Challenge: a Deep Learning Approach (2011) (219)
Noisy Activation Functions (2016) (217)
Justifying and Generalizing Contrastive Divergence (2009) (214)
GMNN: Graph Markov Neural Networks (2019) (213)
Recurrent Independent Mechanisms (2019) (212)
ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation (2015) (207)
The problem of learning long-term dependencies in recurrent networks (1993) (206)
The Curse of Highly Variable Functions for Local Kernel Machines (2005) (205)
Efficient Non-Parametric Function Induction in Semi-Supervised Learning (2004) (204)
Speech Model Pre-training for End-to-End Spoken Language Understanding (2019) (204)
Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017) (204)
SpeechBrain: A General-Purpose Speech Toolkit (2021) (202)
Dendritic cortical microcircuits approximate the backpropagation algorithm (2018) (201)
Maximum-Likelihood Augmented Discrete Generative Adversarial Networks (2017) (201)
Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model (2008) (199)
On the number of response regions of deep feed forward networks with piece-wise linear activations (2013) (195)
Disentangling Factors of Variation for Facial Expression Recognition (2012) (192)
Light Gated Recurrent Units for Speech Recognition (2018) (192)
A Deep Reinforcement Learning Chatbot (2017) (191)
Learning normalized inputs for iterative estimation in medical image segmentation (2017) (189)
Batch normalized recurrent neural networks (2015) (189)
Experience Grounds Language (2020) (188)
Topmoumoute Online Natural Gradient Algorithm (2007) (187)
Unsupervised State Representation Learning in Atari (2019) (185)
Image-to-image translation for cross-domain disentanglement (2018) (184)
Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation (2016) (182)
Blocks and Fuel: Frameworks for deep learning (2015) (177)
Deep learning for AI (2021) (176)
Multi-Task Self-Supervised Learning for Robust Speech Recognition (2020) (176)
Convex Neural Networks (2005) (174)
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks (2019) (173)
Improving Generative Adversarial Networks with Denoising Feature Matching (2016) (172)
Hierarchical Neural Network Generative Models for Movie Dialogues (2015) (171)
The Pytorch-kaldi Speech Recognition Toolkit (2018) (170)
Low precision arithmetic for deep learning (2014) (169)
Towards End-to-end Spoken Language Understanding (2018) (165)
The Consciousness Prior (2017) (163)
LeRec: A NN/HMM Hybrid for On-Line Handwriting Recognition (1995) (159)
Z-Forcing: Training Stochastic Recurrent Networks (2017) (159)
Artificial Neural Networks Applied to Taxi Destination Prediction (2015) (157)
Predicting COVID-19 Pneumonia Severity on Chest X-ray With Deep Learning (2020) (153)
Deep Learning for NLP (without Magic) (2012) (153)
Learning to Understand Phrases by Embedding the Dictionary (2015) (152)
Deep Belief Networks Are Compact Universal Approximators (2010) (151)
Audio Chord Recognition with Recurrent Neural Networks (2013) (150)
Knowledge Matters: Importance of Prior Information for Optimization (2013) (150)
Neural networks for speech and sequence recognition (1996) (148)
HeMIS: Hetero-Modal Image Segmentation (2016) (148)
How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation (2014) (147)
Manifold Parzen Windows (2002) (147)
Variance Reduction in SGD by Distributed Importance Sampling (2015) (145)
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (2020) (143)
Architectural Complexity Measures of Recurrent Neural Networks (2016) (142)
Montreal Neural Machine Translation Systems for WMT’15 (2015) (140)
Boundary-Seeking Generative Adversarial Networks (2017) (140)
Denoising Criterion for Variational Auto-Encoding Framework (2015) (138)
Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks (2013) (137)
Reweighted Wake-Sleep (2014) (137)
Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews (2014) (136)
On Multiplicative Integration with Recurrent Neural Networks (2016) (134)
Multi-Prediction Deep Boltzmann Machines (2013) (132)
Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (1999) (130)
Fine-grained attention mechanism for neural machine translation (2018) (126)
Deep Learners Benefit More from Out-of-Distribution Examples (2011) (125)
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning (2018) (125)
FigureQA: An Annotated Figure Dataset for Visual Reasoning (2017) (124)
Marginalized Denoising Auto-encoders for Nonlinear Representations (2014) (123)
InfoBot: Transfer and Exploration via the Information Bottleneck (2019) (122)
Count-ception: Counting by Fully Convolutional Redundant Counting (2017) (119)
Mining (2011) (119)
Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines (2010) (118)
Deep Directed Generative Models with Energy-Based Probability Estimation (2016) (118)
Global training of document processing systems using graph transformer networks (1997) (117)
A Neural Knowledge Language Model (2016) (114)
Model Selection for Small Sample Regression (2002) (113)
Inductive biases for deep learning of higher-level cognition (2020) (112)
Learning Neural Causal Models from Unknown Interventions (2019) (111)
Iterative Alternating Neural Attention for Machine Reading (2016) (110)
BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop (2018) (110)
Neural net language models (2008) (109)
Gated Orthogonal Recurrent Units: On Learning to Forget (2017) (108)
Temporal Pooling and Multiscale Learning for Automatic Annotation and Ranking of Music Audio (2011) (106)
Understanding Representations Learned in Deep Architectures (2010) (102)
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding (2012) (97)
Collaborative Filtering on a Family of Biological Targets (2006) (96)
On integrating a language model into neural machine translation (2017) (96)
Non-Local Manifold Tangent Learning (2004) (96)
Taking on the curse of dimensionality in joint distributions using neural networks (2000) (96)
Revisiting Fundamentals of Experience Replay (2020) (95)
Deep Learning for Patient-Specific Kidney Graft Survival Analysis (2017) (94)
An empirical analysis of dropout in piecewise linear networks (2013) (94)
A Spike and Slab Restricted Boltzmann Machine (2011) (92)
End-to-End Online Writer Identification With Recurrent Neural Network (2017) (92)
On the saddle point problem for non-convex optimization (2014) (92)
Modeling term dependencies with quantum language models for IR (2013) (91)
Globally Trained Handwritten Word Recognizer Using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models (1993) (90)
Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study (2019) (89)
Feature-wise transformations (2018) (89)
Deconstructing the Ladder Network Architecture (2015) (89)
Recurrent Neural Networks for Missing or Asynchronous Data (1995) (88)
Multi-Task Learning for Stock Selection (1996) (88)
A hybrid Pareto model for asymmetric fat-tailed data: the univariate case (2009) (86)
On the number of inference regions of deep feed forward networks with piece-wise linear activations (2013) (86)
The need for privacy with public digital contact tracing during the COVID-19 pandemic (2020) (85)
Disentangling Factors of Variation via Generative Entangling (2012) (85)
Residual Connections Encourage Iterative Inference (2017) (85)
Quickly Generating Representative Samples from an RBM-Derived Process (2011) (84)
BPS: a learning algorithm for capturing the dynamic nature of speech (1989) (83)
The Curse of Dimensionality for Local Kernel Machines (2005) (83)
Unsupervised Models of Images by Spike-and-Slab RBMs (2011) (82)
Gradient Starvation: A Learning Proclivity in Neural Networks (2020) (81)
Estimating or Propagating Gradients Through Stochastic Neurons (2013) (81)
A Walk with SGD (2018) (81)
Context-dependent word representation for neural machine translation (2016) (80)
ChatPainter: Improving Text to Image Generation using Dialogue (2018) (80)
Multi-way, multilingual neural machine translation (2017) (79)
A Generative Process for sampling Contractive Auto-Encoders (2012) (78)
ObamaNet: Photo-realistic lip-sync from text (2017) (78)
Maximum Entropy Generators for Energy-Based Models (2019) (77)
BigBrain 3D atlas of cortical layers: Cortical and laminar thickness gradients diverge in sensory and motor cortices (2019) (77)
Learning Independent Features with Adversarial Nets for Non-linear ICA (2017) (77)
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance (2018) (76)
Parallel Tempering for Training of Restricted Boltzmann Machines (2010) (75)
Quaternion Recurrent Neural Networks (2018) (75)
Combined Reinforcement Learning via Abstract Representations (2018) (74)
Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation (2014) (74)
Spectral Clustering and Kernel PCA are Learning Eigenfunctions (2003) (73)
Wasserstein Dependency Measure for Representation Learning (2019) (73)
Big Neural Networks Waste Capacity (2013) (73)
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length (2018) (73)
Deep convolutional networks for quality assessment of protein folds (2018) (72)
Word-level training of a handwritten word recognizer based on convolutional neural networks (1994) (72)
Adding noise to the input of a model trained with a regularized objective (2011) (72)
Learning Speaker Representations with Mutual Information (2018) (71)
Slow, Decorrelated Features for Pretraining Complex Cell-like Networks (2009) (71)
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction (2018) (71)
Interpretable Convolutional Filters with SincNet (2018) (71)
Compositional generalization in a deep seq2seq model by separating syntax and semantics (2019) (71)
Using a Financial Training Criterion Rather than a Prediction Criterion (1997) (70)
Deep Learning of Representations (2013) (70)
Learning to Compute Word Embeddings On the Fly (2017) (70)
Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer (2018) (70)
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning (2020) (70)
Label Propagation and Quadratic Criterion (2006) (70)
Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling (2020) (69)
Recurrent Neural Networks With Limited Numerical Precision (2016) (69)
Hyperbolic Discounting and Learning over Multiple Horizons (2019) (68)
Entropy Regularization (2006) (68)
Toward Training Recurrent Neural Networks for Lifelong Learning (2018) (68)
Training Methods for Adaptive Boosting of Neural Networks (1997) (68)
Independently Controllable Factors (2017) (67)
STDP-Compatible Approximation of Backpropagation in an Energy-Based Model (2017) (66)
Greedy Spectral Embedding (2005) (66)
Hierarchical Memory Networks (2016) (65)
DECISION TREES DO NOT GENERALIZE TO NEW VARIATIONS (2010) (65)
Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding (2018) (64)
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition (2018) (64)
High-dimensional sequence transduction (2012) (63)
Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery (2012) (61)
Independently Controllable Features (2017) (61)
Invariant Representations for Noisy Speech Recognition (2016) (61)
CLOSURE: Assessing Systematic Generalization of CLEVR Models (2019) (60)
Non-Local Manifold Parzen Windows (2005) (60)
Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization (2021) (59)
Embedding Word Similarity with Neural Machine Translation (2014) (59)
Incorporating Functional Knowledge in Neural Networks (2009) (58)
Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much Accuracy (2019) (58)
Depth with Nonlinearity Creates No Bad Local Minima in ResNets (2018) (57)
Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning (2020) (57)
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition (2017) (57)
Beyond Skill Rating: Advanced Matchmaking in Ghost Recon Online (2012) (57)
Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes (2016) (57)
The Z-coder adaptive binary coder (1998) (56)
Reading checks with multilayer graph transformer networks (1997) (56)
Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization (2019) (56)
Continuous Neural Networks (2007) (55)
Recall Traces: Backtracking Models for Efficient Reinforcement Learning (2018) (55)
On the Spectral Bias of Deep Neural Networks (2018) (55)
Hybrid Models for Learning to Branch (2020) (54)
Torchmeta: A Meta-Learning library for PyTorch (2019) (54)
A Connectionist Approach to Speech Recognition (1993) (54)
GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning (2019) (54)
Artificial neural networks and their application to sequence recognition (1991) (54)
Use of genetic programming for the search of a new learning rule for neural networks (1994) (53)
Credit Assignment through Time: Alternatives to Backpropagation (1993) (53)
HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery (2020) (53)
Systematic generalisation with group invariant predictions (2021) (52)
Cost functions and model combination for VaR-based asset allocation using neural networks (2001) (52)
Twin Networks: Matching the Future for Sequence Generation (2017) (52)
Spectral Dimensionality Reduction (2006) (52)
Extensions to Metric-Based Model Selection (2003) (52)
Bias learning, knowledge sharing (2003) (52)
Memory Augmented Neural Networks with Wormhole Connections (2017) (52)
Learning Anonymized Representations with Adversarial Neural Networks (2018) (51)
Disentangling the independently controllable factors of variation by interacting with the world (2018) (51)
STDP as presynaptic activity times rate of change of postsynaptic activity (2015) (51)
Diffusion of Context and Credit Information in Markovian Models (1995) (51)
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (2021) (51)
Not All Neural Embeddings are Born Equal (2014) (49)
On the Challenges of Physical Implementations of RBMs (2013) (49)
AdaBoosting Neural Networks: Application to on-line Character Recognition (1997) (49)
Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization (2014) (49)
Low precision storage for deep learning (2014) (48)
Selective small molecule peptidomimetic ligands of TrkC and TrkA receptors afford discrete or complete neurotrophic activities. (2005) (47)
Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net (2017) (47)
Experiments on the application of IOHMMs to model financial returns series (2001) (47)
The representational geometry of word meanings acquired by neural machine translation models (2017) (46)
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs (2020) (46)
Use machine learning to find energy materials (2017) (46)
Iterative Neural Autoregressive Distribution Estimator NADE-k (2014) (45)
Learning the dynamic nature of speech with back-propagation for sequences (1992) (45)
Improving Speech Recognition by Revising Gated Recurrent Units (2017) (44)
DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning (2020) (44)
Adversarial Domain Adaptation for Stable Brain-Machine Interfaces (2018) (44)
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics (2019) (44)
Dendritic error backpropagation in deep cortical microcircuits (2017) (44)
On Adversarial Mixup Resynthesis (2019) (43)
Nonlocal Estimation of Manifold Structure (2006) (43)
Diet Networks: Thin Parameters for Fat Genomics (2016) (43)
Large-Scale Learning of Embeddings with Reconstruction Sampling (2011) (42)
Scaling Large Learning Problems with Hard Parallel Mixtures (2002) (42)
Graph Neural Networks with Learnable Structural and Positional Representations (2021) (42)
Coordination Among Neural Modules Through a Shared Global Workspace (2021) (41)
Learning Tags that Vary Within a Song (2010) (41)
Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules (2020) (41)
Bias in Estimating the Variance of K-Fold Cross-Validation (2005) (41)
Word normalization for on-line handwritten word recognition (1994) (41)
Neural Production Systems (2021) (41)
Variational Temporal Abstraction (2019) (40)
Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models (2019) (40)
Introduction to the special issue on neural networks for data mining and knowledge discovery (2000) (40)
Early Inference in Energy-Based Models Approximates Back-Propagation (2015) (40)
Towards a Biologically Plausible Backprop (2016) (40)
Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations (2018) (39)
Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems (2020) (39)
A network of deep neural networks for Distant Speech Recognition (2017) (39)
DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation (2018) (39)
11 Label Propagation and Quadratic Criterion (39)
On the interplay between noise and curvature and its effect on optimization and generalization (2019) (39)
Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes (2018) (38)
GSNs : Generative Stochastic Networks (2015) (38)
Use machine learning to find energy materials. (2017) (37)
Monaural Singing Voice Separation with Skip-Filtering Connections and Recurrent Inference of Time-Frequency Mask (2017) (37)
GraphMix: Improved Training of GNNs for Semi-Supervised Learning (2020) (37)
Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies (2020) (37)
Scaling Up Spike-and-Slab Models for Unsupervised Feature Learning (2013) (36)
ReSeg: A Recurrent Neural Network for Object Segmentation (2015) (36)
Globally trained handwritten word recognizer using spatial representation, space displacement neural networks and hidden Markov models (1993) (36)
Mollifying Networks (2016) (35)
Training End-to-End Analog Neural Networks with Equilibrium Propagation (2020) (35)
Inherent privacy limitations of decentralized contact tracing apps (2020) (35)
Task Loss Estimation for Sequence Prediction (2015) (34)
Quadratic Features and Deep Architectures for Chunking (2009) (34)
Evolving Culture Versus Local Minima (2014) (34)
Probabilistic Planning with Sequential Monte Carlo methods (2018) (34)
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies (2019) (34)
InfoMask: Masked Variational Latent Representation to Localize Chest Disease (2019) (33)
Representation Mixing for TTS Synthesis (2018) (33)
Object-Centric Image Generation from Layouts (2020) (33)
Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future (2019) (33)
Editorial introduction to the Neural Networks special issue on Deep Learning of Representations (2015) (33)
Batch-normalized joint training for DNN-based distant speech recognition (2016) (32)
Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures (2019) (32)
Online continual learning with no task boundaries (2019) (32)
On the Learning Dynamics of Deep Neural Networks (2018) (32)
Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing Its Gradient Estimator Bias (2020) (32)
Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models (2004) (32)
Small-GAN: Speeding Up GAN Training Using Core-sets (2019) (32)
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (2019) (32)
Evolving Culture vs Local Minima (2012) (31)
Adaptive Parallel Tempering for Stochastic Maximum Likelihood Learning of RBMs (2010) (31)
An EM Algorithm for Asynchronous Input/Output Hidden Markov Models (1996) (31)
Contextual tag inference (2011) (31)
Fraternal Dropout (2017) (31)
Learning Eigenfunctions of Similarity: Linking Spectral Clustering and Kernel PCA (2003) (31)
Universal Successor Representations for Transfer Reinforcement Learning (2018) (31)
Equivalence of Equilibrium Propagation and Recurrent Backpropagation (2017) (31)
Interactive Language Learning by Question Answering (2019) (30)
Adaptive Drift-Diffusion Process to Learn Time Intervals (2011) (30)
Convolutional neural networks for mesh-based parcellation of the cerebral cortex (2018) (30)
Meta-learning framework with applications to zero-shot time-series forecasting (2020) (30)
On the search for new learning rules for ANNs (1995) (30)
Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines (2013) (30)
Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions (2012) (29)
HNHN: Hypergraph Networks with Hyperedge Neurons (2020) (29)
Manifold Mixup: Learning Better Representations by Interpolating Hidden States (2018) (28)
DEUP: Direct Epistemic Uncertainty Prediction (2021) (28)
Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning (2014) (28)
A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies (2018) (28)
Joint Training of Deep Boltzmann Machines (2012) (27)
h-detach: Modifying the LSTM Gradient Towards Better Optimization (2018) (27)
Speech and Speaker Recognition from Raw Waveform with SincNet (2018) (27)
Statistical Learning Algorithms Applied to Automobile Insurance Ratemaking (2003) (27)
Finding Flatter Minima with SGD (2018) (27)
Building Musically-relevant Audio Features through Multiple Timescale Representations (2012) (27)
Quick Training of Probabilistic Neural Nets by Importance Sampling (2003) (27)
Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information (2018) (27)
Saliency is a Possible Red Herring When Diagnosing Poor Generalization (2021) (27)
Discriminative Non-negative Matrix Factorization for Multiple Pitch Estimation (2012) (27)
On Tracking The Partition Function (2011) (27)
Regularized Auto-Encoders Estimate Local Statistics (2012) (27)
An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming (2021) (27)
Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference (2001) (26)
Towards Gene Expression Convolutions using Gene Interaction Graphs (2018) (26)
Tractable Multivariate Binary Density Estimation and the Restricted Boltzmann Forest (2010) (26)
Programmable execution of multi-layered networks for automatic speech recognition (1989) (26)
How to Initialize your
Network? Robust Initialization for WeightNorm & ResNets (2019) (26)Brain Inspired Reinforcement Learning (2004) (26)The Spike-and-Slab RBM and Extensions to Discrete and Sparse Data Distributions (2014) (25)Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input (2019) (25)Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning (2021) (25)Commonsense mining as knowledge base completion? A study on the impact of novelty (2018) (25)Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks (2019) (25)Bayesian Structure Learning with Generative Flow Networks (2022) (24)Phonetically motivated acoustic parameters for continuous speech recognition using artificial neural networks (1991) (24)Rethinking Distributional Matching Based Domain Adaptation (2020) (24)Learning from Partial Labels with Minimum Entropy (2004) (24)A Hybrid Pareto Mixture for Conditional Asymmetric Fat-Tailed Distributions (2009) (24)Alternative time representation in dopamine models (2010) (23)Gradient-based Learning Applied to Document Recognition (1998) 
(23)Generalization of Equilibrium Propagation to Vector Field Dynamics (2018) (23)Word normalization for online handwritten word recognition (1994) (23)The Causal-Neural Connection: Expressiveness, Learnability, and Inference (2021) (23)Focused Hierarchical RNNs for Conditional Sequence Processing (2018) (23)Joint Training Deep Boltzmann Machines for Classification (2013) (23)Learning from unexpected events in the neocortical microcircuit (2021) (22)Bounding the Test Log-Likelihood of Generative Models (2013) (22)Bidirectional Helmholtz Machines (2015) (22)On the Iterative Refinement of Densely Connected Representation Levels for Semantic Segmentation (2018) (22)Learning Neural Causal Models with Active Interventions (2021) (22)Robust Regression with Asymmetric Heavy-Tail Noise Distributions (2002) (22)Multi-Image Super-Resolution for Remote Sensing using Deep Recurrent Networks (2020) (22)Discriminative feature and model design for automatic speech recognition (1997) (22)The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach (2018) (21)Diffusion of Credit in Markovian Models (1994) (21)Augmented Functional Time Series Representation and Forecasting with Gaussian Processes (2007) (21)Efficient EM Training of Gaussian Mixtures with Missing Data (2012) (21)Learning Causal Models Online (2020) (21)Deriving Differential Target Propagation from Iterating Approximate Inverses (2020) (21)A robust adaptive stochastic gradient method for deep learning (2017) (20)Attention Based Pruning for Shift Networks (2019) (20)Phonetically-based multi-layered neural networks for vowel classification (1990) (20)On Training Recurrent Neural Networks for Lifelong Learning (2018) (20)Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs (2013) (20)Topic Segmentation: A First Stage to Dialog-Based Information Extraction (2001) (20)Locally Linear Embedding for dimensionality reduction in QSAR (2004) (20)On the challenge of 
learning complex functions. (2007) (20)Learning the 2-D Topology of Images (2007) (20)Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay (2020) (20)Support vector machines for improving the classification of brain PET images (1998) (20)Variational Causal Networks: Approximate Bayesian Inference over Causal Structures (2021) (20)A Highly Adaptive Acoustic Model for Accurate Multi-dialect Speech Recognition (2019) (20)An EM approach to grammatical inference: input/output HMMs (1994) (19)MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation (2018) (19)Deep Self-Taught Learning for Handwritten Character Recognition (2010) (19)Improving First and Second-Order Methods by Modeling Uncertainty (2010) (19)Gradient Based Learning Applied to Document Recognition (2006) (18)Perceptual Generative Autoencoders (2019) (18)Natural Gradient Revisited (2013) (18)Browsing through high quality document images with DjVu (1998) (18)Fast and Slow Learning of Recurrent Independent Mechanisms (2021) (18)Modeling the Long Term Future in Model-Based Reinforcement Learning (2018) (18)How does hemispheric specialization contribute to human-defining cognition? 
(2021) (18)Generative Flow Networks for Discrete Probabilistic Modeling (2022) (18)Predicting Solution Summaries to Integer Linear Programs under Imperfect Information with Machine Learning (2018) (18)On Training Deep Boltzmann Machines (2012) (18)Diet Networks: Thin Parameters for Fat Genomic (2016) (18)Feedforward Initialization for Fast Inference of Deep Generative Networks is biologically plausible (2016) (17)Unsupervised Learning of Semantics of Object Detections for Scene Categorization (2013) (17)Problems in the deployment of machine-learned models in health care (2021) (17)Towards Standardization of Data Licenses: The Montreal Data License (2019) (17)GradMask: Reduce Overfitting by Regularizing Saliency (2019) (17)Discrete-Valued Neural Communication (2021) (17)Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge (1989) (17)The effects of negative adaptation in Model-Agnostic Meta-Learning (2018) (16)Autotagging music with conditional restricted Boltzmann machines (2011) (16)Information matrices and generalization (2019) (16)How can deep learning advance computational modeling of sensory information processing? 
(2018) (16)Equilibrium Propagation with Continual Weight Updates (2019) (16)Variational Bi-LSTMs (2017) (16)ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient (2014) (16)Chunked Autoregressive GAN for Conditional Waveform Synthesis (2021) (16)A Neural Support Vector Network architecture with adaptive kernels (2000) (16)The Benefits of Over-parameterization at Initialization in Deep ReLU Networks (2019) (16)Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network (1990) (16)Keep Drawing It: Iterative language-based image generation and editing (2018) (16)Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio (2018) (16)Continuous optimization of hyper-parameters (2000) (15)Reinforcement Learning for Sustainable Agriculture (2019) (15)Use of neural networks for the recognition of place of articulation (1988) (15)On the Morality of Artificial Intelligence (2019) (15)The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget (2020) (15)Trajectory Balance: Improved Credit Assignment in GFlowNets (2022) (15)Deep Directed Generative Autoencoders (2014) (15)Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks (2017) (14)Twin Regularization for online speech recognition (2018) (14)Generalizable Features From Unsupervised Learning (2016) (14)Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks (2018) (14)RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design (2020) (14)Twin Networks: Using the Future as a Regularizer (2017) (14)Transformers with Competitive Ensembles of Independent Mechanisms (2021) (14)Compositional Generalization by Factorizing Alignment and Translation (2020) (14)Learning semantic representations of objects and their parts (2014) (14)An EM Approach to Learning Sequential (1994) (14)Biological Sequence Design with GFlowNets 
(2022) (14)Learned-norm pooling for deep neural networks (2013) (14)COVI White Paper (2020) (14)The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All) (2011) (13)DETONATION CLASSIFICATION FROM ACOUSTIC SIGNATURE WITH THE RESTRICTED BOLTZMANN MACHINE (2012) (13)A memory-efficient adaptive Huffman coding algorithm for very large sets of symbols (1998) (13)A hybrid coder for hidden Markov models using a recurrent neural networks (1990) (13)NU-GAN: High resolution neural upsampling with GAN (2020) (13)Input decay: simple and effective soft variable selection (2001) (13)Extending the Framework of Equilibrium Propagation to General Dynamics (2018) (13)Reinforced Imitation in Heterogeneous Action Space (2019) (13)Generalization of a Parametric Learning Rule (1993) (13)The First Conversational Intelligence Challenge (2018) (13)An objective function for STDP (2015) (13)Learning invariant features through local space contraction (2011) (13)GibbsNet: Iterative Adversarial Inference for Deep Graphical Models (2017) (12)Target Propagation (2015) (12)S2RMs: Spatially Structured Recurrent Modules (2020) (12)Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences (2018) (12)Plan, Attend, Generate: Planning for Sequence-to-Sequence Models (2017) (12)How Transferable Are Features in Convolutional Neural Network Acoustic Models across Languages? 
(2019) (12)Dynamic Inference with Neural Interpreters (2021) (12)Universal Successor Features for Transfer Reinforcement Learning (2018) (12)Statistical Machine Learning Algorithms for Target Classification from Acoustic Signature (2009) (11)Image Segmentation by Iterative Inference from Conditional Score Estimation (2017) (11)Scaling up deep learning (2014) (11)The effect of task and training on intermediate representations in convolutional neural networks revealed with modified RV similarity analysis (2019) (11)HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion (2019) (11)Conditioning and time representation in long short-term memory networks (2014) (11)Missing Data with Recurrent Networks Handling Asynchronous or Missing Data with Recurrent Networks (1998) (11)On Catastrophic Interference in Atari 2600 Games (2020) (11)BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization (2020) (11)Boundary Seeking GANs (2018) (11)Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders (2012) (11)Locally Weighted Full Covariance Gaussian Density Estimation (2004) (10)A learning-based algorithm to quickly compute good primal solutions for Stochastic Integer Programs (2019) (10)Properties from Mechanisms: An Equivariance Perspective on Identifiable Representation Learning (2021) (10)A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning (2021) (10)Unsupervised and Transfer Learning under Uncertainty - From Object Detections to Scene Categorization (2013) (10)Multimodal Transitions for Generative Stochastic Networks (2013) (10)Predicting Infectiousness for Proactive Contact Tracing (2020) (10)A Deep Reinforcement Learning Chatbot (Short Version) (2018) (10)Dynamic Frame Skipping for Fast Speech Recognition in Recurrent Neural Network Based Acoustic Models (2018) (10)Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach (2020) (10)Weakly-supervised Knowledge Graph 
Alignment with Adversarial Learning (2019) (9)On the Equivalence between Deep NADE and Generative Stochastic Networks (2014) (9)Automated segmentation of cortical layers in BigBrain reveals divergent cortical and laminar thickness gradients in sensory and motor cortices. (2019) (9)Deep learning and cultural evolution (2014) (9)Generalization in Machine Learning via Analytical Learning Theory (2018) (9)The Octopus Approach to the Alexa Competition : A Deep Ensemble-based Socialbot (2017) (8)Compositional Attention: Disentangling Search and Retrieval (2021) (8)Untangling tradeoffs between recurrence and self-attention in artificial neural networks (2020) (8)Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery (2019) (8)Iteratively unveiling new regions of interest in Deep Learning models (2018) (8)Neural Network - Gaussian Mixture Hybrid for Speech Recognition or Density Estimation (1991) (8)A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM (2020) (8)A3T: Adversarially Augmented Adversarial Training (2018) (8)How to construct deep recurrent neural networks: Proceedings of the Second International Conference on Learning Representations (ICLR 2014) (2014) (8)Untangling tradeoffs between recurrence and self-attention in neural networks (2020) (8)Unifying Generative Models with GFlowNets (2022) (8)TRAINING A NEURAL NETWORK WITH A FINANCIAL CRITERION RATHER THAN A PREDICTION CRITERION (2007) (8)Machines Who Learn. 
(2016) (8)Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Accuracy (2019) (8)Multiscale sequence modeling with a learned dictionary (2017) (8)An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism (2009) (8)Oracle Performance for Visual Captioning (2015) (8)Modularity Matters: Learning Invariant Relational Reasoning Tasks (2018) (7)Découpage thématique des conversations : un outil d'aide à l'extraction (2002) (7)Deep Learning for Automatic Summary Scoring (2012) (7)Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments (2021) (7)GFlowNet Foundations (2021) (7)A Generative Process for Contractive Auto-Encoders (2012) (7)Is a Modular Architecture Enough? (2022) (7)An Analysis of the Adaptation Speed of Causal Models (2020) (7)Discovering Shared Structure in Manifold Learning (2004) (7)Conditional Computation for Continual Learning (2019) (7)A simple and general method for semi-supervised learning (2010) (7)ACtuAL: Actor-Critic Under Adversarial Learning (2017) (7)Modeling Cloud Reflectance Fields using Conditional Generative Adversarial Networks (2020) (7)Multi-scale Feature Learning Dynamics: Insights for Double Descent (2021) (7)Non-parametric Regression between Riemannian Manifolds (2009) (7)Plan, Attend, Generate: Character-Level Neural Machine Translation with Planning (2017) (6)Learning the Arrow of Time for Problems in Reinforcement Learning (2020) (6)A Dataset of Topic-Oriented Human-to-Chatbot Dialogues (2018) (6)COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing (2020) (6)Suitability of V1 Energy Models for Object Classification (2011) (6)Structured Sparsity Inducing Adaptive Optimizers for Deep Learning (2021) (6)FloW: A Dataset and Benchmark for Floating Waste Detection in Inland Waters (2021) (6)On random weights for texture generation in one layer CNNS (2017) (6)From Machine Learning to Robotics: Challenges and Opportunities for Embodied 
Intelligence (2021) (6)Big Data: Theoretical Aspects [Scanning the Issue] (2016) (6)Data-Driven Execution of Multi-Layered Networks for Automatic Speech Recognition (1988) (6)CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding (2021) (6)Guest Introduction: Special Issue on New Methods for Model Selection and Model Combination (2002) (6)Machine Learning for Glacier Monitoring in the Hindu Kush Himalaya (2020) (6)NYU-MILA Neural Machine Translation Systems for WMT’16 (2016) (6)Variance Regularizing Adversarial Learning (2017) (6)Use of Multi-Layered Networks for Coding Speech with Phonetic Features (1988) (5)Discussion of "The Neural Autoregressive Distribution Estimator" (2011) (5)Shared Context Probabilistic Transducers (1997) (5)Mastering Rate based Curriculum Learning (2020) (5)On the Generalization Capability of Multi-Layered Networks in the Extraction of Speech Properties (1989) (5)On-line handwriting recognition with neural networks: Spatial representation versus temporal representation (1993) (5)Towards Understanding Generalization via Analytical Learning Theory (2018) (5)On the Morality of Artificial Intelligence [Commentary] (2020) (5)Learning Powerful Policies by Using Consistent Dynamics Model (2019) (5)Modeling Natural Image Covariance with a Spike and Slab Restricted Boltzmann Machine (2010) (5)Towards Open-Text Semantic Parsing via Multi-Task Learning of Structured Embeddings (2011) (5)Understanding deep architectures and the effect of unsupervised pre-training (2011) (5)Binary pseudowavelets and applications to bilevel image processing (1999) (5)hBERT + BiasCorp - Fighting Racism on the Web (2021) (5)Systematicity in a Recurrent Neural Network by Factorizing Syntax and Semantics (2020) (5)Generating Multiscale Amorphous Molecular Structures Using Deep Learning: A Study in 2D. 
(2020) (5)Combining Model-based and Model-free RL via Multi-step Control Variates (2018) (5)Deep Tempering (2014) (5)Underwhelming Generalization Improvements From Controlling Feature Attribution (2019) (5)Towards Scaling Difference Target Propagation by Learning Backprop Targets (2022) (5)Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution (2022) (5)Learning from Learning Machines: Optimisation, Rules, and Social Norms (2019) (4)Weakly Supervised Representation Learning with Sparse Perturbations (2022) (4)Establishing an evaluation metric to quantify climate change image realism (2019) (4)Using Simulated Data to Generate Images of Climate Change (2020) (4)Use of multilayer networks for the recognition of phonetic features and phonemes (1989) (4)On Out-of-Sample Statistics for Time-Series (2002) (4)Forecasting and Trading Commodity Contract Spreads with Gaussian Processes (2007) (4)Probabilistic neural network models for sequential data (2000) (4)ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods (2021) (4)Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks (2019) (4)Using Artificial Intelligence to Visualize the Impacts of Climate Change (2021) (4)A Hybrid Pareto Model for Conditional Density Estimation of Asymmetric Fat-Tail Data (2007) (4)Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond (2021) (4)Cross-Modal Information Maximization for Medical Imaging: CMIM (2020) (4)Combating False Negatives in Adversarial Imitation Learning (2020) (4)Empirical performance upper bounds for image and video captioning (2015) (4)The Variational Walkback Algorithm (2016) (4)A Two-Stream Continual Learning System With Variational Domain-Agnostic Feature Replay (2021) (4)Trainable performance upper bounds for image and video captioning (2015) (4)A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions (2022) (4)Neural Function Modules with Sparse 
Arguments: A Dynamic Approach to Integrating Information across Layers (2020) (4)Learning GFlowNets from partial episodes for improved convergence and stability (2022) (4)RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro (2022) (4)Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL (2022) (4)Discrete Key-Value Bottleneck (2022) (4)Factorized embeddings learns rich and biologically meaningful embedding spaces using factorized tensor decomposition (2020) (3)Visual Concept Reasoning Networks (2020) (3)RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software (2022) (3)CMIM: Cross-Modal Information Maximization For Medical Imaging (2021) (3)A Hybrid Pareto Model for Asymmetric Fat-Tail Data (2006) (3)Training Bidirectional Helmholtz Machines (2015) (3)Exploration-Driven Representation Learning in Reinforcement Learning (2021) (3)The Journey is the Reward: Unsupervised Learning of Influential Trajectories (2019) (3)GFlowNets and variational inference (2022) (3)Connectionist Models and their Application to Automatic Speech Recognition (1991) (3)Unsupervised one-to-many image translation (2018) (3)Continuous-Time Meta-Learning with Forward Mode Differentiation (2022) (3)Valorisation d'Options par Optimisation du Sharpe Ratio (2002) (3)Learning to rank for censored survival data (2018) (3)COVI White Paper-Version 1.1 (2020) (3)State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations (2019) (3)Reinforced Imitation Learning from Observations (2018) (3)Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization (2007) (3)Graph-Based Semi-Supervised Learning (2005) (3)An Actor-Critic Algorithm for Structured Prediction (2016) (3)Stochastic Learning of Strategic Equilibria for Auctions 
(1999) (3)Deep Architectures for Baby AI (2007) (3)Large-Scale Algorithms (2006) (3)Blocks and Fuel (2015) (3)Supplementary material for : How transferable are features in deep neural networks ? (2014) (3)A Walk with SGD: How SGD Explores Regions of Deep Network Loss? (2018) (3)Predicting Unreliable Predictions by Shattering a Neural Network (2021) (3)From STDP towards Biologically Plausible Deep Learning (2015) (3)Learning the Arrow of Time (2019) (3)On the Generalization and Adaption Performance of Causal Models (2022) (3)Applying Knowledge Transfer for Water Body Segmentation in Peru (2019) (3)Distributed Representation Prediction for Generalization to New Words (2006) (2)Learning Simple Non Stationarities with Hyper Parameters (1999) (2)Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models (2021) (2)Gaussian Mixture Densities for Classification of Nuclear Power Plant Data (1998) (2)On Random Weights for Texture Generation in One Layer Neural Networks (2016) (2)USE OF NEURAL NETWORKS FOR THE RECOGNITION OF PLACE (1988) (2)Deep Learning. 
Das umfassende Handbuch (2018) (2)MEMORY-EFFICIENT ADAPTIVE HUFFMAN CODING (1998) (2)Training opposing directed models using geometric mean matching (2015) (2)Document Analysis with Transducers (2015) (2)Radial Basis Functions for Speech Recognition (1992) (2)Multi-Task Learning For Option Pricing (2002) (2)Spatially Structured Recurrent Modules (2021) (2)Agnostic Physics-Driven Deep Learning (2022) (2)Comparative Study of Learning Outcomes for Online Learning Platforms (2021) (2)Unifying Likelihood-free Inference with Black-box Optimization and Beyond (2021) (2)Workshop summary: Workshop on learning feature hierarchies (2009) (2)Joint Learning of Generative Translator and Classifier for Visually Similar Classes (2019) (2)Speech coding with multilayer networks (1989) (2)Low-memory convolutional neural networks through incremental depth-first processing (2018) (2)Task Loss Estimation for Structured Prediction (2016) (2)Generalizing to a zero-data task : a computational chemistry case study (2006) (2)Régularisation du prix des options : Stacking (2002) (2)Estimators of Variance for K-Fold Cross-Validation (2003) (2)SGD Smooths The Sharpest Directions (2018) (2)Marathi Handwritten Numeral Recognition using Fourier Descriptors and Normalized Chain Code (2017) (2)Predictive Inference with Feature Conformal Prediction (2022) (2)A Common GPU n-Dimensional Array for Python and C (2011) (2)Extracting Hidden Sense Probabilities from Bitexts (2003) (2)Sparse Attentive Backtracking : Towards Efficient Credit Assignment In Recurrent Networks (2017) (2)Apprentissage machine efficace: theorie et pratique (2012) (2)Latent Bottlenecked Attentive Neural Processes (2022) (1)Deep Learning for NLP (without Magic) References (2012) (1)AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N (2022) (1)Training neural networks to recognize speech increased their correspondence to the human auditory pathway but did not yield a 
shared hierarchy of acoustic features (2021) (1)Comment améliorer la capacité de généralisation des algorithmes d'apprentissage pour la prise de décisions financières (2003) (1)On Out-of-Sample Statistics for Financial Time-Series (2002) (1)Étude du biais dans le prix des options (2002) (1)Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization (2022) (1)Extended Semantic Tagging for Entity Extraction (1)The Effect of Diversity in Meta-Learning (2022) (1)Metric-based model selection for time-series forecasting (2002) (1)Markovian Models for Sequential (2004) (1)GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning (2020) (1)Deep Learning of Representations: Looking Forward (2013) (1)Segmentation en thèmes de conversations téléphoniques : traitement en amont pour l’extraction d’information (2002) (1)Statistical Language and Speech Processing (2013) (1)Rethinking Learning Dynamics in RL using Adversarial Networks (2022) (1)Pattern Recognition (1998) (1)Exploring the Wasserstein metric for time-to-event analysis (2021) (1)Automated curriculum generation for Policy Gradients from Demonstrations (2019) (1)Avoidance Learning Using Observational Reinforcement Learning (2019) (1)Generative Augmented Flow Networks (2022) (1)InfoBot: Structured Exploration in Reinforcement Learning Using Information Bottleneck (2019) (1)Predicting ice flow using machine learning (2019) (1)Speech coding with multi-layer networks (1989) (1)Mode Regularized Generative Adversarial (2016) (1)Codon arrangement modulates MHC-I peptides presentation (2020) (1)Convergence Properties of Deep Neural Networks on Separable Data (2018) (1)A Neural Network to Detect Homologies in Proteins (1989) (1)Convergence Properties of the K-means Algorithms (1995) (1)Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British 
Columbia, Canada, December 3-6, 2007 (2008) (1)Introduction to NIPS 2017 Competition Track (2018) (1)Quantized Guided Pruning for Efficient Hardware Implementations of Deep Neural Networks (2020) (1)Lookback for Learning to Branch (2022) (1)Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning (2022) (1)Sharp Minima Can Generalize For Deep Nets Supplementary Material (2017) (1)Incorporating complex cells into neural networks for pattern classification (2011) (1)Towards the Latent Transcriptome (2018) (1)Building Robust Ensembles via Margin Boosting (2022) (1)Approche statistique pour le repérage de mots informatifs dans les textes oraux (2004) (1)Gaussian Mixtures with Missing Data: an Efficient EM Training Algorithm (1994) (1)Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel (2022) (0)Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning (2022) (0)Learning Classical Planning Transition Functions by Deep Neural Networks (2020) (0)Learning semantic representations of objects and their parts (2013) (0)SUPPLEMENTARY MATERIAL-LEARNING TO NAVIGATE THE SYNTHETICALLY ACCESSIBLE CHEMICAL SPACE USING REINFORCEMENT LEARNING (2020) (0)FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data (2022) (0)Artificial Intelligence Based Cloud Distributor (AI-CD): Probing Low Cloud Distribution with Generative Adversarial Neural Networks (2019) (0)Artificial Intelligence Pioneers (0)Pruning for efficient hardware implementations of deep neural networks (2020) (0)Extended Abstract Track Object-Centric Causal Representation Learning (2022) (0)Robust and Controllable Object-Centric Learning through Energy-based Models (2022) (0)Latent State Marginalization as a Low-cost Approach for Improving Exploration (2022) 
(0)Posterior samples of source galaxies in strong gravitational lenses with score-based priors (2022) (0)Information Fusion in Deep Convolutional Neural Networks for Biomedical Image Segmentation (2018) (0)Université de Montréal Estimating the probability of a fleet vehicle accident: A deep learning approach using Conditional Variational Auto-Encoders (2020) (0)UOUS AND DISCRETE ADDRESSING SCHEMES (2016) (0)Neural Production Systems: Learning Rule-Governed Visual Dynamics (2021) (0)BabyAI 1.1 (2020) (0)Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning (2020) (0)Machine Learning (2021) (0)Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL (2022) (0)Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes (2022) (0)A General Purpose Neural Architecture for Geospatial Systems (2022) (0)Estimation de densité conditionnelle lorsque l'hypothèse de normalité est insatisfaisante (2004) (0)Proposed Architectural and Representational Modifications (2021) (0)Learning the Arrow of Time for Problems in Reinforcement Learning (2020) (0)Evaluating Long-Term Dependency Benchmark Problems by Random Guessing (2001) (0)The Challenge of Non-Linear Regression on Large Datasets with Asymmetric Heavy Tails (2002) (0)Extending Metric-Based Model Selection and Regularization in the Absence of Unlabeled Data (0)Learning powerful policies and better dynamics models by encouraging consistency (2018) (0)Estimating Car Insurance Premia: a Case Study in High-Dimensional (2013) (0)FitNets: Hints for Thin Deep Nets (2015) (0)Big Data: Theoretical Aspects [Scanning the Issue] (2015) (0)Stacked calibration of off-policy policy evaluation for video game matchmaking (2013) (0)Synergies Between Disentanglement and Sparsity: a Multi-Task Learning Perspective (2022) (0)Depth with nonlinearity creates no bad local minima in ResNets (2019) (0)IAPR keynote lecture IV: Deep 
learning (2015) (0)Image-to-image Mapping with Many Domains by Sparse Attribute Transfer (2020) (0)SGD Smooths the Sharpest Directions (2018) (0)18 Large-Scale Algorithms (0)Learning of Sophisticated Curriculums by viewing them as Graphs over Tasks (2018) (0)Former NASA chief unveils $100 million neural chip maker KnuEdge (2016) (0){COMPANYNAME}11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery (2019) (0)How do We Train Deep Architectures? (2009) (0)Optimization of Artificial Neural Network Hyperparameters For Processing Retrospective Information (2021) (0)Problèmes associés au déploiement des modèles fondés sur l’apprentissage machine en santé (2021) (0)Part I Feature Extraction Fundamentals 11 Ensembles of Regularized Least Squares Classifiers for High-dimensional Problems 15 Tree-based Ensembles with Dynamic Soft Feature Selection 18 Bayesian Support Vector Machines for Feature Ranking and Selection 21 Feature Selection via Sensitivity Analysis w (0)An Energy-Based Recurrent Neural Network for Multiple Fundamental Frequency Estimation (2011) (0)The K Best-Paths Approach to Approximate Dynamic Programming with Application to Portfolio Optimization (2006) (0)Learning Neural Generative Dynamics for Molecular Conformation Generation (2021) (0)Recurrent Neural Networks for Adaptive Temporal Processing (1993) (0)Small-GAN: Speeding Up GAN Training Using Core-sets (2019) (0)Proceedings of the 22nd International Conference on Neural Information Processing Systems (2009) (0)A survey on recent activation functions with emphasis on oscillating activation functions (2022) (0)Conditioning and time representation in long short-term memory networks (2013) (0)FL Games: A federated learning framework for distribution shifts (2022) (0)LATTER MINIMA WITH SGD (2018) (0)Generalization to a zero-data task: an empirical study (0)Learning Generative Models with Locally Disentangled Latent Factors (2018)
(0)Reassuring and Troubling Views on Graph-Based Semi-Supervised Learning (2005) (0)Collaborative filtering techniques for drug discovery (2016) (0)The representational geometry of word meanings acquired by neural machine translation models (2017) (0)Generalization (2020) (0)Meta Attention Networks: Meta Learning Attention To Modulate Information Between Sparsely Interacting Recurrent Modules (2020) (0)Combating False Negatives in Adversarial Imitation Learning (Student Abstract) (2020) (0)Multi-Domain Balanced Sampling Improves Out-of- Generalization of Chest X-ray Pathology Prediction Models (2021) (0)Marathi Handwritten Numeral Recognition using Zernike Moments and Fourier Descriptors (2020) (0)Equivariance with Learned Canonicalization Functions (2022) (0)Deep Meditations: Controlled navigation of latent space (2018) (0)On the Optimization of a Synaptic Learning Rule (1997) (0)Model Selection for Small Sample (2000) (0)Generalization of a Parametric Learning Rule (1993) (0)Supplemental Material for: Deep Generative Stochastic Networks Trainable by Backprop (2014) (0)Graph Priors for Deep Neural Networks (2018) (0)Aprendizaje profundo.
Tras años de decepciones, la inteligencia artificial está empezando a cumplir lo que prometía en sus comienzos gracias a esta potente técnica (2016) (0)On learning distributed representations of semantics (2011) (0)Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One (2022) (0)Forecasting Non-Stationary Volatility with Hyper-Parameters (2002) (0)BigBrain: 1D convolutional neural networks for automated segmentation of cortical layers (2018) (0)Pen-based visitor registration system (PENGUIN) (1994) (0)PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design (2022) (0)Towards more hardware-friendly deep learning (2017) (0)WARDS BETTER OPTIMIZATION (2019) (0)Artificial Intelligence Cytometer in Blood (2019) (0)The AI Driving Olympics at NIPS 2018 (0)MIREX TAGGING CONTEST: A DEEP NEURAL NET APPROACH (DRAFT) (2008) (0)Learning Latent Multiscale Structure Using Recurrent Neural Networks (2016) (0)Proposed Algorithm: Algorithm (2007) (0)2 The Curse of Dimensionality for Classical Non-Parametric Models (0)On summarized validation curves and generalization (2019) (0)»Deep Learning ist keine Religion« (2018) (0)Les données au service du savoir (2017) (0)A comparative study on hybrid acoustic phonetic decoders based on artificial neural networks (1991) (0)On the Use of an Ear Model and Multi-Layered Networks for Automatic Speech Recognition (1990) (0)Markovian Models for Sequential Data (1996) (0)PAST DSAA KEYNOTE SPEAKERS (2020) (0)Contrastive introspection (ConSpec) to rapidly identify invariant steps for success (2022) (0)TRANSFER REINFORCEMENT LEARNING (2018) (0)Stochastic Gradient Descent on a Portfolio Management Training Criterion Using the IPA Gradient Estimator (2003) (0)SPECTRA: Sparse Entity-centric Transitions (2019) (0)Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection (2022) (0)Evaluating Generalization in GFlowNets for Molecule Design (2022)
(0)Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning (2022) (0)Inductive Biases for Relational Tasks (2022) (0)Learning Long-term Dependencies Using Cognitive Inductive Biases in Self-attention RNNs (2020) (0)Leveraging the Third Dimension in Contrastive Learning (2022) (0)Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks (2019) (0)A semantic matching energy function for learning with multi-relational data (2013) (0)Bayesian Structure Learning with Generative Flow Networks (Supplementary material) (2022) (0)Repérage de mots informatifs dans les textes conversationnels (2004) (0)Interventional Causal Representation Learning (2022) (0)Multi-Objective GFlowNets (2022) (0)GFlowOut: Dropout with Generative Flow Networks (2022) (0)Pylearn2: a machine learning research library (2014) (0)Continual Weight Updates and Convolutional Architectures for Equilibrium Propagation (2020) (0)VIM: Variational Independent Modules for Video Prediction (2022) (0)(Private)-Retroactive Carbon Pricing [(P)ReCaP]: A Market-based Approach for Climate Finance and Risk Assessment (2022) (0)CAMAP: Artificial neural networks unveil the role of codon arrangement in modulating MHC-I peptides presentation (2020) (0)MAgNet: Mesh Agnostic Neural PDE Solver (2022) (0)IGURE QA: An Annotated Figure Dataset for Visual Reasoning (2018) (0)Object-Centric Compositional Imagination for Visual Abstract Reasoning (2022) (0)EnGAN: Latent Space MCMC and Maximum Entropy Generators for Energy-based Models (2018) (0)Exploring the Wasserstein metric for survival analysis (2021) (0)Neural Attentive Circuits (2022) (0)Université de Montréal Balancing Signals for Semi-Supervised Sequence Learning (2020) (0)Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions (2022) (0)Episodes
Meta Sequence S 2 Fast Update Slow Update Fast Update Slow Update (2021) (0)Bayesian Dynamic Causal Discovery (2022) (0)Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints (2022) (0)RNNLOGIC: LEARNING LOGIC RULES FOR REASON- (2020) (0)Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization (2022) (0)Proceedings of the 21st International Conference on Neural Information Processing Systems (2008) (0)On Neural Architecture Inductive Biases for Relational Tasks (2022) (0)Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning (2022) (0)
Other Resources About Yoshua Bengio
What Schools Are Affiliated With Yoshua Bengio?
Yoshua Bengio is affiliated with the following schools:
Yoshua Bengio's AcademicInfluence.com Rankings