Lihong Li
#166,067
Most Influential Person Now
Lihong Li's AcademicInfluence.com Rankings
Lihong Li's Computer Science Degrees
Computer Science
#9968
World Rank
#10457
Historical Rank
Machine Learning
#4504
World Rank
#4554
Historical Rank
Artificial Intelligence
#4862
World Rank
#4925
Historical Rank
Database
#6918
World Rank
#7159
Historical Rank

Computer Science
Lihong Li's Degrees
- PhD Computer Science Stanford University
- Masters Computer Science Stanford University
- Bachelors Computer Science Tsinghua University
Why Is Lihong Li Influential?
Lihong Li's Published Works
Published Works
- A contextual-bandit approach to personalized news article recommendation (2010) (2366)
- Parallelized Stochastic Gradient Descent (2010) (1256)
- An Empirical Evaluation of Thompson Sampling (2011) (1234)
- Contextual Bandits with Linear Payoff Functions (2011) (812)
- Doubly Robust Policy Evaluation and Learning (2011) (565)
- Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms (2010) (521)
- Neural Approaches to Conversational AI (2018) (519)
- Sparse Online Learning via Truncated Gradient (2008) (471)
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning (2015) (461)
- PAC model-free reinforcement learning (2006) (453)
- Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits (2014) (422)
- Towards a Unified Theory of State Abstraction for MDPs (2006) (395)
- End-to-End Task-Completion Neural Dialogue Systems (2017) (322)
- Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access (2016) (289)
- Reinforcement Learning in Finite MDPs: PAC Analysis (2009) (288)
- Contextual Bandit Algorithms with Supervised Learning Guarantees (2010) (271)
- Neuro-Symbolic Program Synthesis (2016) (267)
- Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation (2018) (254)
- Knows what it knows: a framework for self-aware learning (2008) (239)
- SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation (2017) (221)
- Learning from Logged Implicit Exploration Data (2010) (221)
- DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections (2019) (218)
- Provably Optimal Algorithms for Generalized Linear Contextual Bandits (2017) (213)
- Doubly Robust Policy Evaluation and Optimization (2014) (204)
- An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning (2008) (200)
- A Bayesian Sampling Approach to Exploration in Reinforcement Learning (2009) (181)
- Deep Reinforcement Learning with a Natural Language Action Space (2015) (179)
- Neural Logic Machines (2019) (162)
- Analyzing feature generation for value-function approximation (2007) (157)
- AlgaeDICE: Policy Gradient from Arbitrary Experience (2019) (155)
- Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning (2017) (148)
- BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems (2016) (148)
- A User Simulator for Task-Completion Dialogues (2016) (129)
- Stochastic Variance Reduction Methods for Policy Evaluation (2017) (127)
- Unbiased online active learning in data streams (2011) (120)
- Toward Minimax Off-policy Value Estimation (2015) (119)
- Neural Contextual Bandits with UCB-based Exploration (2019) (117)
- GenDICE: Generalized Offline Estimation of Stationary Values (2020) (115)
- Policy Certificates: Towards Accountable Reinforcement Learning (2018) (114)
- PAC-inspired Option Discovery in Lifelong Reinforcement Learning (2014) (110)
- Sample Complexity of Multi-task Reinforcement Learning (2013) (109)
- The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning (2009) (97)
- Online Evaluation for Information Retrieval (2016) (97)
- Adversarial Attacks on Stochastic Bandits (2018) (95)
- A unifying framework for computational reinforcement learning theory (2009) (85)
- Lazy Approximation for Solving Continuous Finite-Horizon MDPs (2005) (84)
- Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study (2015) (82)
- Off-Policy Evaluation via the Regularized Lagrangian (2020) (77)
- Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection (2009) (75)
- An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives (2015) (70)
- An unbiased offline evaluation of contextual bandit algorithms with generalized linear models (2011) (65)
- Recurrent Reinforcement Learning: A Hybrid Approach (2015) (60)
- Incremental Model-based Learners With Formal Learning-Time Guarantees (2006) (59)
- Online exploration in least-squares policy iteration (2009) (59)
- Doubly Robust Off-policy Evaluation for Reinforcement Learning (2015) (59)
- Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear (2016) (58)
- Data Poisoning Attacks in Contextual Bandits (2018) (57)
- Learning and planning in environments with delayed feedback (2009) (57)
- A Kernel Loss for Solving the Bellman Equation (2019) (56)
- CoinDICE: Off-Policy Confidence Interval Estimation (2020) (55)
- Randomized Exploration in Generalized Linear Bandits (2019) (55)
- Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation (2019) (54)
- Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking (2016) (52)
- Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking (2016) (48)
- Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads (2016) (48)
- Scalable Bilinear π Learning Using State and Action Features (2018) (46)
- Neural Approaches to Conversational AI: Question Answering, Task-oriented Dialogues and Social Chatbots (2019) (46)
- Subgoal Discovery for Hierarchical Dialogue Policy Learning (2018) (44)
- Boosting the Actor with Dual Critic (2017) (42)
- Toward Predicting the Outcome of an A/B Experiment for Search Relevance (2015) (41)
- Neural Thompson Sampling (2020) (41)
- Counterfactual Estimation and Optimization of Click Metrics for Search Engines (2014) (33)
- Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems (2017) (32)
- Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits (2012) (32)
- An Online Learning Framework for Refining Recency Search Results with User Click Feedback (2011) (32)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders (2020) (30)
- Online learning for recency search ranking using real-time user feedback (2010) (29)
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning (2020) (29)
- Near-optimal Representation Learning for Linear Bandits and Linear RL (2021) (27)
- An Optimal High Probability Algorithm for the Contextual Bandit Problem (2010) (26)
- Provably Efficient Learning with Typed Parametric Models (2009) (26)
- A worst-case comparison between temporal difference and residual gradient with linear function approximation (2008) (25)
- CORL: A Continuous-state Offset-dynamics Reinforcement Learner (2008) (25)
- Smoothed Dual Embedding Control (2017) (25)
- Planning and Learning in Environments with Delayed Feedback (2007) (24)
- Click-based Hot Fixes for Underperforming Torso Queries (2016) (23)
- Linear-Time Estimators for Propensity Scores (2011) (22)
- Maintaining Equilibria During Exploration in Sponsored Search Auctions (2010) (22)
- On the Optimality of Batch Policy Optimization Algorithms (2021) (21)
- Generalized Thompson Sampling for Contextual Bandits (2013) (21)
- On the Prior Sensitivity of Thompson Sampling (2015) (21)
- Escaping the Gravitational Pull of Softmax (2020) (21)
- Sample Complexity Bounds of Exploration (2012) (19)
- Reducing reinforcement learning to KWIK online regression (2010) (18)
- Deep Reinforcement Learning with an Unbounded Action Space (2015) (18)
- Batch Stationary Distribution Estimation (2020) (17)
- Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning (2017) (16)
- Prioritized Sweeping Converges to the Optimal Value Function (2008) (16)
- Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear (2017) (15)
- On Minimax Optimal Offline Policy Evaluation (2014) (15)
- Contextual Bandits with Global Constraints and Objective (2015) (15)
- Maintaining Equilibria During Exploration in Sponsored Search Auctions (2007) (15)
- Temporal supervised learning for inferring a dialog policy from example conversations (2014) (14)
- Open Problem: Regret Bounds for Thompson Sampling (2012) (14)
- Deep Reinforcement Learning with an Action Space Defined by Natural Language (2015) (14)
- Joint relevance and freshness learning from clickthroughs for news search (2012) (12)
- Neural Contextual Bandits with Upper Confidence Bound-Based Exploration (2019) (12)
- Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL (2020) (12)
- Offline Evaluation and Optimization for Interactive Systems (2015) (11)
- Understanding Domain Randomization for Sim-to-real Transfer (2021) (9)
- A Novel Benchmark Methodology and Data Repository for Real-life Reinforcement Learning (2009) (9)
- Exploiting User Preference for Online Learning in Web Content Optimization Systems (2014) (9)
- Efficient Online Bootstrapping for Large Scale Learning (2013) (7)
- Active Learning with Oracle Epiphany (2016) (7)
- Efficient Dialogue Policy Learning with BBQ-Networks (2016) (7)
- An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms (2010) (7)
- Combating Deep Reinforcement Learning's Sisyphean Curse with Reinforcement Learning (2017) (6)
- The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning (2015) (6)
- Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems (2015) (6)
- A perspective on off-policy evaluation in reinforcement learning (2019) (6)
- DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections (2019) (5)
- A Map of Bandits for E-commerce (2021) (5)
- Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes (2017) (5)
- Now I Remember! Episodic Memory For Reinforcement Learning (2018) (4)
- Deep Reinforcement Learning with a Combinatorial Action Space for Predicting and Tracking Popular Discussion Threads (2016) (4)
- Cloud control: voluntary admission control for intranet traffic management (2012) (4)
- Workshop summary: Results of the 2009 reinforcement learning competition (2009) (4)
- Exploration in Least-Squares Policy Iteration (2008) (3)
- Efficient Value-Function Approximation via Online Linear Regression (2008) (2)
- Scaffolding Networks for Teaching and Learning to Comprehend (2017) (2)
- PAC-MDP Reinforcement Learning with Bayesian Priors (2009) (2)
- Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing (2020) (1)
- Scaffolding Networks: Incremental Learning and Teaching Through Questioning (2017) (1)
- The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning (2015) (1)
- Bandits with Generalized Linear Models (2012) (1)
- Neural Approaches to Conversational AI - Tutorial at ACL/SIGIR 2018 (2018) (1)
- Lazy Approximation: A New Approach for Solving Continuous Finite-Horizon MDPs (2005) (1)
- GenDICE: Offline Generalized Stationary Distribution Correction Estimation (2020) (0)
- Reinforcement Learning via Online Linear Regression (2007) (0)
- Doubly Robust Policy Evaluation and Optimization (2015) (0)
- Machine Learning in the Bandit Setting: Algorithms, Evaluation, and Case Studies (CS Seminar Lecture Series) (2012) (0)
- Guest editorial: special issue on reinforcement learning for real life (2021) (0)
- Avoiding Catastrophic States with Intrinsic Fear (2018) (0)
- NeuralUCB: Contextual Bandits with Neural Network-Based Exploration (2019) (0)
- A Reinforcement Learning Approach to Estimating Long-term Treatment Effects (2022) (0)
- Session details: OOEW 2015 (2015) (0)
- Boosting the Actor with Dual Critic (2018) (0)
- Offline Policy Optimization in RL with Variance Regularization (2022) (0)
- INFINITE-HORIZON REINFORCEMENT LEARNING (2020) (0)
- Final Program Report: SAMSI Computational Advertising Program Summer (2012) (0)
- A perspective on off-policy evaluation in reinforcement learning (2019) (0)
- DCS-TR-641 Exploration in Least-Squares Policy Iteration (2008) (0)
- Action Space Defined by Natural Language (2016) (0)