Csaba Szepesvári
#115,923
Most Influential Person Now
Csaba Szepesvári's AcademicInfluence.com Rankings
Csaba Szepesvári's Computer Science Degrees
Computer Science
#4522
World Rank
#4771
Historical Rank
Machine Learning
#912
World Rank
#925
Historical Rank
Artificial Intelligence
#1129
World Rank
#1149
Historical Rank
Database
#1737
World Rank
#1822
Historical Rank

Csaba Szepesvári's Degrees
- PhD, Computer Science, University of Szeged
Why Is Csaba Szepesvári Influential?
Csaba Szepesvári's Published Works
[Citation charts: (1) citations per year to any of this author's works; (2) total citations to the works the author published in a given year, which highlights when the author's most important works appeared.]
Published Works
- Bandit Based Monte-Carlo Planning (2006) (2872)
- Improved Algorithms for Linear Stochastic Bandits (2011) (1291)
- Algorithms for Reinforcement Learning (2010) (1198)
- Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms (2000) (641)
- Fast gradient-descent methods for temporal-difference learning with linear function approximation (2009) (584)
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (2009) (569)
- Finite-Time Bounds for Fitted Value Iteration (2008) (452)
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (2006) (396)
- Parametric Bandits: The Generalized Linear Case (2010) (367)
- X-Armed Bandits (2010) (360)
- Learning with a Strong Adversary (2015) (319)
- Regret Bounds for the Adaptive Control of Linear Quadratic Systems (2011) (316)
- A Generalized Reinforcement-Learning Model: Convergence and Applications (1996) (267)
- Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation (2009) (258)
- Cascading Bandits: Learning to Rank in the Cascade Model (2015) (239)
- Toward Off-Policy Learning Control with Function Approximation (2010) (238)
- Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods (2007) (238)
- A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation (2008) (235)
- Fitted Q-iteration in continuous action-space MDPs (2007) (225)
- Error Propagation for Approximate Policy and Value Iteration (2010) (210)
- Empirical Bernstein stopping (2008) (208)
- Online Optimization in X-Armed Bandits (2008) (208)
- Improved Rates for the Stochastic Continuum-Armed Bandit Problem (2007) (208)
- Online Learning under Delayed Feedback (2013) (207)
- Tuning Bandit Algorithms in Stochastic Environments (2007) (189)
- Model-Based Reinforcement Learning with Value-Targeted Regression (2020) (188)
- Online Markov Decision Processes Under Bandit Feedback (2010) (184)
- The grand challenge of computer Go (2012) (179)
- Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping (2008) (179)
- A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation (2008) (173)
- A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms (1999) (171)
- On the Global Convergence Rates of Softmax Policy Gradient Methods (2020) (162)
- Partial Monitoring - Classification, Regret Bounds, and Algorithms (2014) (155)
- Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs (2010) (153)
- Improved Monte-Carlo Search (2006) (146)
- Regularized Policy Iteration (2008) (141)
- The Asymptotic Convergence-Rate of Q-learning (1997) (137)
- Model-based reinforcement learning with nearly tight exploration complexity bounds (2010) (136)
- Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits (2012) (130)
- Learning with Good Feature Representations in Bandits and in RL with a Generative Model (2019) (124)
- Behaviour Suite for Reinforcement Learning (2019) (123)
- Finite time bounds for sampling based fitted value iteration (2005) (122)
- Toward Minimax Off-policy Value Estimation (2015) (119)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes (2020) (115)
- Manifold-Adaptive Dimension Estimation (2007) (99)
- Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? (2018) (95)
- POLITEX: Regret Bounds for Policy Iteration using Expert Prediction (2019) (94)
- Toward a classification of finite partial-monitoring games (2010) (93)
- Multi-criteria Reinforcement Learning (1998) (92)
- Regularized Policy Iteration with Nonparametric Function Spaces (2016) (90)
- Bandits with Delayed, Aggregated Anonymous Feedback (2017) (90)
- Online Learning to Rank in Stochastic Click Models (2017) (86)
- Combinatorial Cascading Bandits (2015) (84)
- Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems (2009) (83)
- Interpolation-based Q-learning (2004) (83)
- The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits (2016) (83)
- Analysis of Kernel Mean Matching under Covariate Shift (2012) (80)
- Bernoulli Rank-1 Bandits for Click Feedback (2017) (80)
- DCM Bandits: Learning to Rank with Multiple Clicks (2016) (80)
- Training parsers by inverse reinforcement learning (2009) (80)
- Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions (2013) (80)
- Model-Free Linear Quadratic Control via Reduction to Expert Prediction (2018) (79)
- Partial Monitoring with Side Information (2012) (76)
- Conservative Bandits (2016) (73)
- The adversarial stochastic shortest path problem with unknown transition probabilities (2012) (72)
- Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control (2015) (71)
- Variational Policy Gradient Method for Reinforcement Learning with General Utilities (2020) (70)
- Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities (2017) (70)
- Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions (2020) (66)
- Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path (2015) (66)
- Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures (2018) (64)
- Model Selection in Contextual Stochastic Bandit Problems (2020) (63)
- Online learning for linearly parametrized control problems (2012) (62)
- Online Learning in Markov Decision Processes with Changing Cost Sequences (2014) (62)
- Active learning in heteroscedastic noise (2010) (61)
- Stochastic Rank-1 Bandits (2016) (61)
- Tighter risk certificates for neural networks (2020) (58)
- Active Learning in Multi-armed Bandits (2008) (57)
- CoinDICE: Off-Policy Confidence Interval Estimation (2020) (55)
- (Bandit) Convex Optimization with Biased Noisy Gradient Oracles (2015) (55)
- Randomized Exploration in Generalized Linear Bandits (2019) (55)
- The Online Loop-free Stochastic Shortest-Path Problem (2010) (54)
- Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits (2018) (53)
- PAC-Bayes Analysis Beyond the Usual Bounds (2020) (53)
- An Information-Theoretic Approach to Minimax Regret in Partial Monitoring (2019) (52)
- Model Selection in Reinforcement Learning (2011) (52)
- Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems (2011) (51)
- TopRank: A practical algorithm for online stochastic ranking (2018) (51)
- Adaptive Exploration in Linear Contextual Bandit (2019) (51)
- Module-Based Reinforcement Learning: Experiments with a Real Robot (1998) (50)
- Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits (2014) (50)
- Sequential Learning for Multi-Channel Wireless Network Monitoring With Channel Switching Costs (2014) (49)
- Prediction of protein functional domains from sequences using artificial neural networks. (2001) (48)
- Sequential learning for optimal monitoring of multi-channel wireless networks (2011) (48)
- Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments (2011) (45)
- PAC-Bayes bounds for stable algorithms with instance-dependent priors (2018) (43)
- Linear Multi-Resource Allocation with Semi-Bandit Feedback (2015) (43)
- Cost-sensitive Multiclass Classification Risk Bounds (2013) (43)
- Uncertainty, performance, and model dependency in approximate adaptive nonlinear control (1997) (42)
- Stochastic Optimization in a Cumulative Prospect Theory Framework (2018) (42)
- Near-optimal max-affine estimators for convex regression (2015) (42)
- Regularization in reinforcement learning (2011) (41)
- Deep Representations and Codes for Image Auto-Annotation (2012) (40)
- Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms (2016) (40)
- Online Learning with Gaussian Payoffs and Side Observations (2015) (39)
- An adaptive algorithm for finite stochastic partial monitoring (2012) (38)
- Reinforcement Learning Algorithms for MDPs (2011) (38)
- A General Projection Property for Distribution Families (2009) (36)
- Learning Exercise Policies for American Options (2009) (36)
- Statistical linear estimation with penalized estimators: an application to reinforcement learning (2012) (36)
- Bayesian Optimal Control of Smoothly Parameterized Systems (2015) (36)
- Continuous Time Associative Bandit Problems (2007) (35)
- Topology Learning Solved by Extended Objects: A Neural Network Model (1993) (35)
- Margin Maximizing Discriminant Analysis (2004) (34)
- On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method (2021) (34)
- Perturbed-History Exploration in Stochastic Linear Bandits (2019) (33)
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting (2020) (32)
- Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments (2013) (32)
- Use of variance estimation in the multi-armed bandit problem (2006) (32)
- Model-based and Model-free Reinforcement Learning for Visual Servoing (2009) (31)
- An automatic method for the identification and interpretation of clustered microcalcifications in mammograms. (1999) (31)
- Detecting Overfitting via Adversarial Examples (2019) (30)
- LEAPSANDBOUNDS: A Method for Approximately Optimal Algorithm Configuration (2018) (30)
- Exploration by Optimisation in Partial Monitoring (2019) (29)
- LS-N-IPS: An Improvement of Particle Filters by Means of Local Search (2001) (29)
- Online Learning with Costly Features and Labels (2013) (29)
- Universal Option Models (2014) (29)
- Variance estimates and exploration function in multi-armed bandit (2008) (28)
- A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds (2017) (27)
- Performance of Nonlinear Approximate Adaptive Controllers (2003) (27)
- Stochastic Low-Rank Bandits (2017) (26)
- Regret Bounds for Model-Free Linear Quadratic Control (2018) (26)
- Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities (2016) (25)
- Online Sparse Reinforcement Learning (2020) (25)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient (2020) (25)
- PAC-Bayes with Backprop (2019) (24)
- Regularized Fitted Q-iteration: Application to Planning (2008) (24)
- SDP Relaxation with Randomized Rounding for Energy Disaggregation (2016) (24)
- Neurocontroller using dynamic state feedback for compensatory control (1997) (24)
- Universal parameter optimisation in games based on SPSA (2006) (24)
- Structured Best Arm Identification with Fixed Confidence (2017) (24)
- REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization (2010) (23)
- Efficient Planning in Large MDPs with Weak Linear Function Approximation (2020) (23)
- Online Learning to Rank with Features (2018) (23)
- BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback (2018) (22)
- Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers (2018) (22)
- Learning when to stop thinking and do something! (2009) (22)
- Exploration-Enhanced POLITEX (2019) (22)
- Multi-Step Dyna Planning for Policy Evaluation and Control (2009) (22)
- Multiclass Classification Calibration Functions (2016) (21)
- When Is Partially Observable Reinforcement Learning Not Scary? (2022) (21)
- Escaping the Gravitational Pull of Softmax (2020) (21)
- Perturbed-History Exploration in Stochastic Multi-Armed Bandits (2019) (21)
- Asymptotically Optimal Information-Directed Sampling (2020) (21)
- Approximate geometry representations and sensory fusion (1996) (21)
- A Linearly Relaxed Approximate Linear Program for Markov Decision Processes (2017) (21)
- Optimal Resource Allocation with Semi-Bandit Feedback (2014) (21)
- Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning (2016) (20)
- On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function (2021) (20)
- Bounds and dynamics for empirical game theoretic analysis (2019) (20)
- Adaptive Monte Carlo via Bandit Allocation (2014) (20)
- Pseudo-MDPs and factored linear action models (2014) (20)
- Extending rapidly-exploring random trees for asymptotically optimal anytime motion planning (2010) (20)
- CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration (2019) (19)
- Self-organizing neurocontrol (1994) (19)
- Cleaning up the neighborhood: A full classification for adversarial partial monitoring (2018) (19)
- No Regrets for Learning the Prior in Bandits (2021) (19)
- Efficient approximate planning in continuous space Markovian Decision Problems (2001) (19)
- Agnostic KWIK learning and efficient approximate reinforcement learning (2011) (18)
- Sequential Importance Sampling for Visual Tracking Reconsidered (2003) (18)
- Characterizing the Representer Theorem (2013) (18)
- Shifting Regret, Mirror Descent, and Matrices (2016) (18)
- Efficient Local Planning with Linear Function Approximation (2021) (17)
- Approximate Policy Iteration with Linear Action Models (2012) (17)
- RSPSA: Enhanced Parameter Optimization in Games (2006) (17)
- Meta-Thompson Sampling (2021) (17)
- Self-Organizing Multi-Resolution Grid for Motion Planning and Control (1996) (16)
- Learning With Adversary (2015) (16)
- On Minimax Optimal Offline Policy Evaluation (2014) (15)
- Enhancing Particle Filters Using Local Likelihood Sampling (2004) (15)
- Regularized least-squares regression: Learning from a β-mixing sequence (2012) (15)
- Static and Dynamic Aspects of Optimal Sequential Decision Making (1998) (15)
- Behavior of an Adaptive Self-organizing Autonomous Agent Working with Cues and Competing Concepts (1993) (14)
- Bootstrapping Statistical Inference for Off-Policy Evaluation (2021) (14)
- Approximate Inverse-Dynamics Based Robust Control Using Static And Dynamic Feedback (1998) (14)
- Performance of Nonlinear Approximate Adaptive Controllers: French/Adaptive Controllers (2005) (13)
- Module Based Reinforcement Learning for a Real Robot (1997) (13)
- Provably Efficient Adaptive Approximate Policy Iteration (2020) (13)
- Differentiable Meta-Learning of Bandit Policies (2020) (13)
- A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds (2020) (13)
- Distribution-Dependent Analysis of Gibbs-ERM Principle (2019) (13)
- Empirical Bayes Regret Minimization (2019) (12)
- Shortest Path Discovery Problems: A Framework, Algorithms and Experimental Results (2004) (12)
- Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm (2014) (12)
- A simpler approach to accelerated optimization: iterative averaging meets optimism (2020) (12)
- On using likelihood-adjusted proposals in particle filtering: local importance sampling (2005) (12)
- Learning to segment from a few well-selected training images (2009) (12)
- A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping (2010) (11)
- Improved Regret Bound and Experience Replay in Regularized Policy Iteration (2021) (11)
- BubbleRank: Safe Online Learning to Rerank (2018) (11)
- Inverse Dynamics Controllers for Robust Control: Consequences for Neurocontrollers (1996) (11)
- Finite Time Bounds for Temporal Difference Learning with Function Approximation: Problems with some “state-of-the-art” results (2017) (11)
- Unsupervised Sequential Sensor Acquisition (2017) (11)
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning (2021) (11)
- Learning and Exploitation Do Not Conflict Under Minimax Optimality (1997) (10)
- The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments (1999) (10)
- An Evaluation Criterion for Macro-Learning and Some Results (1999) (10)
- On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments (2015) (10)
- A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning (2012) (10)
- A Simpler Approach to Accelerated Stochastic Optimization: Iterative Averaging Meets Optimism (2020) (10)
- Decision-Theoretic Clustering of Strategies (2015) (10)
- PAC-Bayesian Policy Evaluation for Reinforcement Learning (2011) (10)
- Generalization in an autonomous agent (1994) (9)
- Reduced-Variance Payoff Estimation in Adversarial Bandit Problems (2005) (9)
- Efron-Stein PAC-Bayesian Inequalities (2019) (9)
- Local Importance Sampling: A Novel Technique to Enhance Particle Filtering (2006) (9)
- Bandits with Delayed Anonymous Feedback (2017) (9)
- Optimization Issues in KL-Constrained Approximate Policy Iteration (2021) (9)
- Alignment based kernel learning with a continuous set of base kernels (2011) (9)
- Differentiable Meta-Learning in Contextual Bandits (2020) (8)
- Online Algorithm for Unsupervised Sensor Selection (2019) (8)
- Learning near-optimal policies with fitted policy iteration and a single sample path (2005) (8)
- Non-Markovian Policies in Sequential Decision Problems (1998) (8)
- An integrated architecture for motion-control and path-planning (1998) (8)
- Understanding the Effect of Stochasticity in Policy Optimization (2021) (8)
- A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning (2021) (8)
- Efficient Stopping Rules (2008) (8)
- Computer aided diagnosis of clustered microcalcifications using artificial neural nets (2000) (8)
- Parallel and robust skeletonization built on self-organizing elements (1999) (8)
- Module Based Reinforcement Learning: An Application to a Real Robot (1997) (7)
- Near-Optimal Sample Complexity Bounds for Constrained MDPs (2022) (7)
- Differentiable Bandit Exploration (2020) (7)
- Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging (2017) (7)
- An Exponential Efron-Stein Inequality for Lq Stable Learning Rules (2019) (7)
- Fast Cross-Validation for Incremental Learning (2015) (7)
- Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms (2010) (7)
- Multi-view Matrix Factorization for Linear Dynamical System Estimation (2017) (6)
- A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models (2014) (6)
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data (2021) (6)
- Log-optimal currency portfolios and control Lyapunov exponents (2005) (6)
- Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM (2015) (6)
- An Exponential Tail Bound for Lq Stable Learning Rules. Application to k-Folds Cross-Validation (2019) (6)
- Autonomous exploration for navigating in non-stationary CMPs (2019) (6)
- Value-Aware Loss Function for Model Learning in Reinforcement Learning (2016) (6)
- General Framework for Reinforcement Learning (1995) (6)
- Proceedings of the 22nd International Conference on Algorithmic Learning Theory (2011) (6)
- Markov Decision Processes under Bandit Feedback (2015) (5)
- Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping (2021) (5)
- Module-Based Reinforcement Learning: Experiments with a Real Robot (1998) (5)
- On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization (2020) (5)
- The Curse of Passive Data Collection in Batch Reinforcement Learning (2021) (5)
- Robust control using inverse dynamics neurocontrollers (1997) (5)
- On Learning the Optimal Waiting Time (2014) (5)
- Chaining Bounds for Empirical Risk Minimization (2016) (4)
- Cascading Bandits (2015) (4)
- An Asymptotic Scaling Analysis of LQ Performance for an Approximate Adaptive Control Design (2002) (4)
- Maximum Margin Discriminant Analysis based Face Recognition (2005) (4)
- On the Role of Optimization in Double Descent: A Least Squares Study (2021) (4)
- Optimistic MLE - A Generic Model-based Algorithm for Partially Observable Sequential Decision Making (2022) (4)
- Dynamic concept model learns optimal policies (1994) (4)
- Budgeted Distribution Learning of Belief Net Parameters (2010) (4)
- Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging (2019) (4)
- A Distribution-dependent Analysis of Meta Learning (2020) (3)
- LQ performance bounds for adaptive output feedback controllers for functionally uncertain nonlinear systems (2002) (3)
- Online Algorithm for Unsupervised Sequential Selection with Contextual Information (2020) (3)
- An a Priori Exponential Tail Bound for k-Folds Cross-Validation (2017) (3)
- Modular Reinforcement Learning: A Case Study in a Robot Domain (2000) (3)
- Self-organized learning of 3 dimensions (1994) (3)
- LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS (2009) (3)
- Sample-Efficient Reinforcement Learning of Partially Observable Markov Games (2022) (3)
- Models of active learning in group-structured state spaces (2010) (3)
- On Asymptotic and Finite-Time Optimality of Bayesian Predictors (2019) (3)
- Proceedings of the 10th European Workshop on Reinforcement Learning (2013) (3)
- Sequence Prediction Exploiting Similarity Information (2007) (3)
- Confident Least Square Value Iteration with Local Access to a Simulator (2022) (3)
- ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool (2020) (3)
- Deterministic Independent Component Analysis (2015) (3)
- Adaptive Approximate Policy Iteration (2021) (3)
- Crowdsourcing with Sparsely Interacting Workers (2017) (3)
- Algorithmic Learning Theory (2011) (2)
- Towards Painless Policy Optimization for Constrained MDPs (2022) (2)
- Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstraction (2008) (2)
- European Workshop on Reinforcement Learning (2008) (2)
- Integration of Artificial Neural Networks and Dynamic Concepts to an Adaptive and Self-Organizing Agent (1993) (2)
- Ockham's Razor Modeling of the Matrisome Channels of the Basal Ganglia Thalamocortical Loops (2001) (2)
- Reinforcement Learning: Theory and Practice (2010) (2)
- Convergent Reinforcement Learning with Value Function Interpolation (2001) (2)
- KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal (2022) (2)
- Confident Approximate Policy Iteration for Efficient Local Planning in qπ-realizable MDPs (2022) (2)
- Kernel Machine Based Feature Extraction Algorithms for Regression Problems (2004) (2)
- Complexity of learning: the case of everyday neural networks (1994) (2)
- X-mHMM: an efficient algorithm for training mixtures of HMMs when the number of mixtures is unknown (2005) (2)
- Pathological Effects of Variance on Classification-Based Policy Iteration (2015) (1)
- An Exponential Tail Bound for the Deleted Estimate (2019) (1)
- Parametric Bandits: The Generalized Linear Case (extended version) (2010) (1)
- Non-trivial two-armed partial-monitoring games are bandits (2011) (1)
- Sample Efficient Deep Reinforcement Learning via Local Planning (2023) (1)
- Least Squares Temporal Difference Learning and Galerkin's Method (2011) (1)
- Learning with a Strong Adversary (2016) (1)
- Revisiting Simple Regret Minimization in Multi-Armed Bandits (2022) (1)
- Max-affine estimators for convex stochastic programming (2016) (1)
- Active Learning of Group-Structured Environments (2008) (1)
- Efficient object tracking in video sequences by means of LS-N-IPS (2001) (1)
- Robust Nonparametric Copula Based Dependence Estimators (2011) (1)
- Toward Manifold-Adaptive Learning (2007) (1)
- A unified modular analysis of online and stochastic optimization: adaptivity, optimism, non-convexity (2016) (1)
- Regularized Fitted Q-iteration : Application to Bounded Resource Planning (2009) (1)
- A Randomized Strategy for Learning to Combine Many Features (2012) (1)
- Sequential Learning without Feedback (2016) (1)
- Towards Facial Pose Tracking (2002) (0)
- Scaling of LQ performance in approximate adaptive designs (2000) (0)
- Reinforcement Learning Algorithms in Markov Decision Processes AAAI-10 Tutorial Part II: Learning to predict values (2010) (0)
- Reinforcement Learning Algorithms in Markov Decision Processes AAAI-10 Tutorial Part IV: Take home message (2010) (0)
- Comparison to Alternative Designs (2005) (0)
- Classification with Margin Constraints: A Unification with Applications to Optimization (2015) (0)
- Stochastic Processes and Markov Chains (2020) (0)
- Sztochasztikus Rendszerek és Pénzügyi Piacok Modellezése = Stochastic Systems and Modelling of Financial Markets (2007) (0)
- Function Approximator Designs for the Integrator Chain (2005) (0)
- Guest Editors' introduction (2014) (0)
- Hoeffding Bounds vs. Empirical (2018) (0)
- Automated Detection and Classification of Micro-Calcifications in Mammograms Using Artifical Neural Nets (1998) (0)
- Exponential Hardness of Reinforcement Learning with Linear Function Approximation (2023) (0)
- Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks (2022) (0)
- The Explore-Then-Commit Algorithm (2020) (0)
- Uncertainty Modelling, Control Design and System Performance (2005) (0)
- Workshop summary: On-line learning with limited feedback (2009) (0)
- FlexVoice: A Parametric Approach to High-Quality Speech Synthesis (2000) (0)
- Generalization Bounds for Partially Linear Models (2014) (0)
- Strict Feedback Systems (2005) (0)
- Learning with a Strong Adversary (2015) (0)
- Revisiting Simple Regret: Fast Rates for Returning a Good Arm (2022) (0)
- Partial Monitoring (2020) (0)
- The Exp3 Algorithm (2020) (0)
- Appendix A: Lyapunov's Direct Method (2005) (0)
- Invited Talk: Towards Robust Reinforcement Learning Algorithms (2011) (0)
- Computer Aided Diagnosis of Clustered Microcalcifications Using Artificial Neural Nets (2004) (0)
- Output Feedback Control (2005) (0)
- The return of $\epsilon$-greedy: sublinear regret for model-free linear quadratic control (2018) (0)
- Uncertainty and performance of adaptive controllers for functionally uncertain output feedback systems (1998) (0)
- Prediction of Protein Domain-Types by Backpropagation (2010) (0)
- Editors' Introduction (2011) (0)
- Bounds and dynamics for empirical game theoretic analysis (2019) (0)
- Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization (2022) (0)
- Prediction of Protein Functional Domains from Sequences Using Artificial Neural Networks (2001) (0)
- Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures (2019) (0)
- Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning (2022) (0)
- Performance-Evaluation for Automated Detection of Microcalcifications in Mammograms Using Three Different Film-Digitizers (1998) (0)
- The Role of Baselines in Policy Gradient Optimization (2023) (0)
- The Chain of Integrators (2005) (0)
- Convergent Temporal-Difference Learning with Arbitrary Differentiable Function Approximator (2010) (0)
- Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning (2023) (0)
- Appendix B: Functional Bounds from System Identification (2005) (0)
- Towards Painless Policy Optimization for Constrained MDPs: Supplementary material (2022) (0)
What Schools Are Affiliated With Csaba Szepesvári?
Csaba Szepesvári is affiliated with the following schools: