Lihong Li

Lihong Li's AcademicInfluence.com Rankings

Computer Science: World Rank #9968, Historical Rank #10457
Machine Learning: World Rank #4504, Historical Rank #4554
Artificial Intelligence: World Rank #4862, Historical Rank #4925
Database: World Rank #6918, Historical Rank #7159

Lihong Li's Degrees

PhD Computer Science Stanford University
Masters Computer Science Stanford University
Bachelors Computer Science Tsinghua University

Lihong Li's Published Works

Number of citations in a given year to any of this author's works

Total number of citations to an author for the works they published in a given year. This highlights publication of the most important work(s) by the author

Published Works

A contextual-bandit approach to personalized news article recommendation (2010) (2366)
Parallelized Stochastic Gradient Descent (2010) (1256)
An Empirical Evaluation of Thompson Sampling (2011) (1234)
Contextual Bandits with Linear Payoff Functions (2011) (812)
Doubly Robust Policy Evaluation and Learning (2011) (565)
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms (2010) (521)
Neural Approaches to Conversational AI (2018) (519)
Sparse Online Learning via Truncated Gradient (2008) (471)
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning (2015) (461)
PAC model-free reinforcement learning (2006) (453)
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits (2014) (422)
Towards a Unified Theory of State Abstraction for MDPs (2006) (395)
End-to-End Task-Completion Neural Dialogue Systems (2017) (322)
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access (2016) (289)
Reinforcement Learning in Finite MDPs: PAC Analysis (2009) (288)
Contextual Bandit Algorithms with Supervised Learning Guarantees (2010) (271)
Neuro-Symbolic Program Synthesis (2016) (267)
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation (2018) (254)
Knows what it knows: a framework for self-aware learning (2008) (239)
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation (2017) (221)
Learning from Logged Implicit Exploration Data (2010) (221)
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections (2019) (218)
Provably Optimal Algorithms for Generalized Linear Contextual Bandits (2017) (213)
Doubly Robust Policy Evaluation and Optimization (2014) (204)
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning (2008) (200)
A Bayesian Sampling Approach to Exploration in Reinforcement Learning (2009) (181)
Deep Reinforcement Learning with a Natural Language Action Space (2015) (179)
Neural Logic Machines (2019) (162)
Analyzing feature generation for value-function approximation (2007) (157)
AlgaeDICE: Policy Gradient from Arbitrary Experience (2019) (155)
Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning (2017) (148)
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems (2016) (148)
A User Simulator for Task-Completion Dialogues (2016) (129)
Stochastic Variance Reduction Methods for Policy Evaluation (2017) (127)
Unbiased online active learning in data streams (2011) (120)
Toward Minimax Off-policy Value Estimation (2015) (119)
Neural Contextual Bandits with UCB-based Exploration (2019) (117)
GenDICE: Generalized Offline Estimation of Stationary Values (2020) (115)
Policy Certificates: Towards Accountable Reinforcement Learning (2018) (114)
PAC-inspired Option Discovery in Lifelong Reinforcement Learning (2014) (110)
Sample Complexity of Multi-task Reinforcement Learning (2013) (109)
The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning (2009) (97)
Online Evaluation for Information Retrieval (2016) (97)
Adversarial Attacks on Stochastic Bandits (2018) (95)
A unifying framework for computational reinforcement learning theory (2009) (85)
Lazy Approximation for Solving Continuous Finite-Horizon MDPs (2005) (84)
Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study (2015) (82)
Off-Policy Evaluation via the Regularized Lagrangian (2020) (77)
Reinforcement learning for dialog management using least-squares Policy iteration and fast feature selection (2009) (75)
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives (2015) (70)
An unbiased offline evaluation of contextual bandit algorithms with generalized linear models (2011) (65)
Recurrent Reinforcement Learning: A Hybrid Approach (2015) (60)
Incremental Model-based Learners With Formal Learning-Time Guarantees (2006) (59)
Online exploration in least-squares policy iteration (2009) (59)
Doubly Robust Off-policy Evaluation for Reinforcement Learning (2015) (59)
Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear (2016) (58)
Data Poisoning Attacks in Contextual Bandits (2018) (57)
Learning and planning in environments with delayed feedback (2009) (57)
A Kernel Loss for Solving the Bellman Equation (2019) (56)
CoinDICE: Off-Policy Confidence Interval Estimation (2020) (55)
Randomized Exploration in Generalized Linear Bandits (2019) (55)
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation (2019) (54)
Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking (2016) (52)
Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking (2016) (48)
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads (2016) (48)
Scalable Bilinear π Learning Using State and Action Features (2018) (46)
Neural Approaches to Conversational AI: Question Answering, Task-oriented Dialogues and Social Chatbots (2019) (46)
Subgoal Discovery for Hierarchical Dialogue Policy Learning (2018) (44)
Boosting the Actor with Dual Critic (2017) (42)
Toward Predicting the Outcome of an A/B Experiment for Search Relevance (2015) (41)
Neural Thompson Sampling (2020) (41)
Counterfactual Estimation and Optimization of Click Metrics for Search Engines (2014) (33)
Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems (2017) (32)
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits (2012) (32)
An Online Learning Framework for Refining Recency Search Results with User Click Feedback (2011) (32)
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders (2020) (30)
Online learning for recency search ranking using real-time user feedback (2010) (29)
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning (2020) (29)
Near-optimal Representation Learning for Linear Bandits and Linear RL (2021) (27)
An Optimal High Probability Algorithm for the Contextual Bandit Problem (2010) (26)
Provably Efficient Learning with Typed Parametric Models (2009) (26)
A worst-case comparison between temporal difference and residual gradient with linear function approximation (2008) (25)
CORL: A Continuous-state Offset-dynamics Reinforcement Learner (2008) (25)
Smoothed Dual Embedding Control (2017) (25)
Planning and Learning in Environments with Delayed Feedback (2007) (24)
Click-based Hot Fixes for Underperforming Torso Queries (2016) (23)
Linear-Time Estimators for Propensity Scores (2011) (22)
Maintaining Equilibria During Exploration in Sponsored Search Auctions (2010) (22)
On the Optimality of Batch Policy Optimization Algorithms (2021) (21)
Generalized Thompson Sampling for Contextual Bandits (2013) (21)
On the Prior Sensitivity of Thompson Sampling (2015) (21)
Escaping the Gravitational Pull of Softmax (2020) (21)
Sample Complexity Bounds of Exploration (2012) (19)
Reducing reinforcement learning to KWIK online regression (2010) (18)
Deep Reinforcement Learning with an Unbounded Action Space (2015) (18)
Batch Stationary Distribution Estimation (2020) (17)
Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning (2017) (16)
Prioritized Sweeping Converges to the Optimal Value Function (2008) (16)
Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear (2017) (15)
On Minimax Optimal Offline Policy Evaluation (2014) (15)
Contextual Bandits with Global Constraints and Objective (2015) (15)
Maintaining Equilibria During Exploration in Sponsored Search Auctions (2007) (15)
Temporal supervised learning for inferring a dialog policy from example conversations (2014) (14)
Open Problem: Regret Bounds for Thompson Sampling (2012) (14)
Deep Reinforcement Learning with an Action Space Defined by Natural Language (2015) (14)
Joint relevance and freshness learning from clickthroughs for news search (2012) (12)
Neural Contextual Bandits with Upper Confidence Bound-Based Exploration (2019) (12)
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL (2020) (12)
Offline Evaluation and Optimization for Interactive Systems (2015) (11)
Understanding Domain Randomization for Sim-to-real Transfer (2021) (9)
A Novel Benchmark Methodology and Data Repository for Real-life Reinforcement Learning (2009) (9)
Exploiting User Preference for Online Learning in Web Content Optimization Systems (2014) (9)
Efficient Online Bootstrapping for Large Scale Learning (2013) (7)
Active Learning with Oracle Epiphany (2016) (7)
Efficient Dialogue Policy Learning with BBQ-Networks (2016) (7)
An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms (2010) (7)
Combating Deep Reinforcement Learning's Sisyphean Curse with Reinforcement Learning (2017) (6)
The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning (2015) (6)
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems (2015) (6)
A perspective on off-policy evaluation in reinforcement learning (2019) (6)
DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections (2019) (5)
A Map of Bandits for E-commerce (2021) (5)
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes (2017) (5)
Now I Remember! Episodic Memory For Reinforcement Learning (2018) (4)
Deep Reinforcement Learning with a Combinatorial Action Space for Predicting and Tracking Popular Discussion Threads (2016) (4)
Cloud control: voluntary admission control for intranet traffic management (2012) (4)
Workshop summary: Results of the 2009 reinforcement learning competition (2009) (4)
Exploration in Least-Squares Policy Iteration (2008) (3)
Efficient Value-Function Approximation via Online Linear Regression (2008) (2)
Scaffolding Networks for Teaching and Learning to Comprehend (2017) (2)
PAC-MDP Reinforcement Learning with Bayesian Priors (2009) (2)
Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing (2020) (1)
Scaffolding Networks: Incremental Learning and Teaching Through Questioning (2017) (1)
The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning (2015) (1)
Bandits with Generalized Linear Models (2012) (1)
Neural Approaches to Conversational AI - Tutorial at ACL/SIGIR 2018 (2018) (1)
Lazy Approximation: A New Approach for Solving Continuous Finite-Horizon MDPs (2005) (1)
GenDICE: Offline Generalized Stationary Distribution Correction Estimation (2020) (0)
Reinforcement Learning via Online Linear Regression (2007) (0)
Doubly Robust Policy Evaluation and Optimization (2015) (0)
Machine Learning in the Bandit Setting: Algorithms, Evaluation, and Case Studies (CS Seminar Lecture Series) (2012) (0)
Guest editorial: special issue on reinforcement learning for real life (2021) (0)
Avoiding Catastrophic States with Intrinsic Fear (2018) (0)
NeuralUCB: Contextual Bandits with Neural Network-Based Exploration (2019) (0)
A Reinforcement Learning Approach to Estimating Long-term Treatment Effects (2022) (0)
Session details: OOEW 2015 (2015) (0)
Boosting the Actor with Dual Critic (2018) (0)
Offline Policy Optimization in RL with Variance Regularization (2022) (0)
INFINITE-HORIZON REINFORCEMENT LEARNING (2020) (0)
Final Program Report: SAMSI Computational Advertising Program Summer (2012) (0)
A perspective on off-policy evaluation in reinforcement learning (2019) (0)
DCS-TR-641 Exploration in Least-Squares Policy Iteration (2008) (0)
Action Space Defined by Natural Language (2016) (0)
