Jeffrey T. Leek
#46,771 Most Influential Person Now
Biostatistician and data scientist
Jeffrey T. Leek's AcademicInfluence.com Rankings
Computer Science: World Rank #3696, Historical Rank #3885
Data Science: World Rank #63, Historical Rank #64

Mathematics: World Rank #4555, Historical Rank #6452
Statistics: World Rank #291, Historical Rank #352

Jeffrey T. Leek's Degrees
- Bachelor's in Mathematics, University of Delaware
Why Is Jeffrey T. Leek Influential?
According to Wikipedia, Jeffrey Tullis Leek is an American biostatistician and data scientist working as a Vice President, Chief Data Officer, and Professor at Fred Hutchinson Cancer Research Center. He is an author of the Simply Statistics blog and runs several online courses through Coursera as part of their Data Science Specialization. His most popular course is The Data Scientist's Toolbox, which he taught with Roger Peng and Brian Caffo. Leek is best known for his contributions to genomic data analysis and for his critical views on research and on the accuracy of popular statistical methods.
Jeffrey T. Leek's Published Works
Published Works (year of publication, then number of citations)
- Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown (2016) (3385)
- The sva package for removing batch effects and other unwanted variation in high-throughput experiments (2012) (2969)
- Tackling the widespread and critical impact of batch effects in high-throughput data (2010) (1635)
- Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis (2007) (1629)
- Temporal dynamics and genetic control of transcription in the human prefrontal cortex (2011) (645)
- Significance analysis of time course microarray experiments. (2005) (617)
- Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. (2012) (538)
- Ballgown bridges the gap between transcriptome assembly and expression analysis (2015) (523)
- svaseq: removing batch effects and other unwanted noise from sequencing data (2014) (377)
- A general framework for multiple testing dependence (2008) (365)
- Cloud-scale RNA-sequencing differential expression analysis with Myrna (2010) (341)
- Reproducible RNA-seq analysis using recount2 (2017) (311)
- Systems-level dynamic analyses of fate change in murine embryonic stem cells (2009) (291)
- EDGE: extraction and analysis of differential gene expression (2006) (263)
- Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis (2018) (235)
- Polyester: Simulating RNA-Seq Datasets With Differential Transcript Expression (2014) (220)
- Sequencing technology does not eliminate biological variability (2011) (197)
- Transparency and reproducibility in artificial intelligence (2020) (167)
- ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets (2011) (166)
- Opinion: Reproducible research can still be wrong: Adopting a prevention approach (2015) (162)
- Statistics: P values are just the tip of the iceberg (2015) (162)
- What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science (2016) (158)
- The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. (2007) (146)
- An estimate of the science-wise false discovery rate and application to the top medical literature. (2014) (141)
- Developmental regulation of human cortex transcription and its clinical relevance at base resolution (2014) (138)
- On the design and analysis of gene expression studies in human populations (2007) (126)
- Significance analysis and statistical dissection of variably methylated regions. (2012) (94)
- Functional annotation of human long noncoding RNAs via molecular phenotyping (2019) (91)
- Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction (2014) (89)
- Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive (2016) (87)
- Inflammatory molecular signature associated with infectious agents in psychosis. (2014) (86)
- Test set bias affects reproducibility of gene signatures (2015) (85)
- Evolution of cellular morpho-phenotypes in cancer metastasis (2015) (74)
- qSVA framework for RNA quality correction in differential expression analysis (2017) (74)
- A statistical definition for reproducibility and replicability (2016) (70)
- Five ways to fix statistics (2017) (69)
- Surrogate variable analysis (2007) (66)
- Addressing confounding artifacts in reconstruction of gene co-expression networks (2018) (61)
- Dissecting Inflammatory Complications in Critically Injured Patients by Within-Patient Gene Expression Changes: A Longitudinal Clinical Genomics Study (2011) (58)
- Rail‐RNA: scalable analysis of RNA‐seq splicing and coverage (2016) (57)
- A simple and reproducible breast cancer prognostic test (2013) (55)
- Flexible expressed region analysis for RNA-seq with derfinder (2016) (53)
- Asymptotic Conditional Singular Value Decomposition for High‐Dimensional Genomic Data (2011) (52)
- Removing batch effects for prediction problems with frozen surrogate variable analysis (2013) (51)
- Is most published research really false? (2016) (50)
- recount3: summaries and queries for large-scale RNA-seq expression and splicing (2021) (48)
- Differential expression analysis of RNA-seq data at single-base resolution (2014) (47)
- A direct approach to estimating false discovery rates conditional on covariates (2017) (45)
- Gene expression anti-profiles as a basis for accurate universal cancer signatures (2012) (44)
- Improving the value of public RNA-seq expression data by phenotype prediction (2017) (41)
- Cooperation between Referees and Authors Increases Peer Review Accuracy (2011) (40)
- Flexible isoform-level differential expression analysis with Ballgown (2014) (40)
- Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (2021) (39)
- Flexible analysis of transcriptome assemblies with Ballgown (2014) (39)
- BatchQC: interactive software for evaluating sample and batch effects in genomic data (2016) (36)
- The tspair package for finding top scoring pair classifiers in R (2009) (35)
- The Joint Null Criterion for Multiple Hypothesis Tests (2011) (34)
- Erratum: EDGE: Extraction and analysis of differential gene expression (Bioinformatics (2006) vol. 22 (4) (507-508)) (2006) (32)
- Five ways to fix statistics. (2017) (29)
- Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis (2015) (29)
- The practical effect of batch on genomic prediction (2012) (29)
- The Democratization of Data Science Education (2020) (25)
- A visual tool for defining reproducibility and replicability (2019) (24)
- Corrigendum: Differential expression analysis of RNA-seq data at single-base resolution (2014) (23)
- Methods for correcting inference based on outcomes predicted by machine learning (2020) (21)
- How to Share Data for Collaboration (2018) (21)
- The importance of transparency and reproducibility in artificial intelligence research (2020) (20)
- Erratum to: Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis (2015) (19)
- Recounting the FANTOM CAGE-Associated Transcriptome (2019) (18)
- A computationally efficient modular optimal discovery procedure (2011) (17)
- Transcriptional profile of platelets and iPSC-derived megakaryocytes from whole genome and RNA sequencing. (2020) (16)
- Cloud-scale RNA-sequencing differential (2010) (13)
- A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn (2014) (12)
- edge: Extraction of Differential Gene Expression Version 2.8.0 (2016) (11)
- derfinder: Software for annotation-agnostic RNA-seq differential expression analysis (2015) (11)
- A statistical approach to selecting and confirming validation targets in -omics experiments (2012) (10)
- Gene set bagging for estimating the probability a statistically significant result will replicate (2013) (9)
- Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce (2016) (8)
- Integrity of Induced Pluripotent Stem Cell (iPSC) Derived Megakaryocytes as Assessed by Genetic and Transcriptomic Analysis (2017) (8)
- recount: A large-scale resource of analysis-ready RNA-seq expression data (2016) (8)
- Widespread splicing of repetitive element loci into coding regions of gene transcripts. (2016) (8)
- Can MOOC Programs Improve Student Employment Prospects (2018) (8)
- A Significance Method for Time Course Microarray Experiments Applied to Two Human Studies (2004) (7)
- Gene set bagging for estimating replicability of gene set analyses (2013) (7)
- A regression framework for the proportion of true null hypotheses (2017) (7)
- SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies (2013) (7)
- A framework for RNA quality correction in differential expression analysis (2016) (7)
- regionReport: Interactive reports for region-level and feature-level genomic analyses (2016) (6)
- A Decision‐Theory Approach to Interpretable Set Analysis for High‐Dimensional Data (2013) (6)
- Personalized medicine: Keep a way open for tailored treatments (2012) (6)
- Rail-RNA: Scalable analysis of RNA-seq splicing and coverage (2015) (5)
- recount-brain: a curated repository of human brain RNA-seq datasets metadata (2019) (5)
- RNA-seq transcript quantification from reduced-representation data in recount2 (2018) (5)
- Human splicing diversity across the Sequence Read Archive (2016) (5)
- Strategies for cellular deconvolution in human brain RNA sequencing data (2020) (5)
- A glass half full interpretation of the replicability of psychological science (2015) (4)
- Comparison of Beginning R Students’ Perceptions of Peer-Made Plots Created in Two Plotting Systems: A Randomized Experiment (2020) (4)
- Empirical estimates suggest most published medical research is true (2013) (4)
- Sequestration: inadvertently killing biomedical research to score political points (2013) (4)
- Post-prediction Inference (2020) (4)
- Gene and protein expression in human megakaryocytes derived from induced pluripotent stem cells (2021) (4)
- regionReport: Interactive reports for region-based analyses. (2015) (3)
- Diagnosing Data Analytic Problems in the Classroom (2021) (3)
- Analysis of Student Behavior Using the R Package crsra (2019) (3)
- Abstract 2297: Differential analysis of gene expression across the human genome using recount2 and FANTOM-CAT (2018) (2)
- crsra: A package for Cleaning and Analyzing Coursera Research Export Data (2018) (2)
- Data science as a science (2017) (2)
- Rail-dbGaP: a protocol and tool for analyzing protected genomic data in a commercial cloud (2015) (2)
- Genomic and clinical predictors for improving estimator precision in randomized trials of breast cancer treatments (2016) (1)
- Gene expression anti-profiles as a basis for accurate universal cancer signatures (2012) (1)
- Tools for Analyzing R Code the Tidy Way (2019) (1)
- A simple and reproducible breast cancer prognostic test (2013) (1)
- Explanation implies causation? (2017) (1)
- Avoiding test set bias with rank-based prediction (2014) (1)
- Comparison of plotting system outputs in beginner analysts (2019) (1)
- Diversifying the genomic data science research community (2022) (1)
- Open-source Tools for Training Resources – OTTR (2022) (1)
- On the Structure of Multiple Testing Procedures (2006) (1)
- Addressing confounding artifacts in reconstruction of gene co-expression networks (2019) (1)
- Measurement, Summary, and Methodological Variation in RNA-sequencing (2014) (1)
- Erratum to: Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis (2016) (0)
- Measuring the Contribution of Genomic Predictors to Improving Estimator Precision in Randomized trials (2015) (0)
- Explanation of observational data engenders a causal belief about smoking and cancer (2018) (0)
- Statistical processes for facilitating personalized medicine (2013) (0)
- Sequestration: inadvertently killing biomedical research to score political points (2013) (0)
- Medicine is a data science, we should teach like it (2020) (0)
- regionReport: Interactive reports for region-based analyses (2015) (0)
- Mathematical and Computational Methodology for Predicting the Emergence of Insect Pests (2003) (0)
- Discussion of “visualizing statistical models: Removing the blindfold” (2015) (0)
- Bioconductor's tspair package (2009) (0)
- Using the swfdr package to estimate false discovery rates conditional on covariates (2020) (0)
- Gene expression EDGE: extraction and analysis of differential gene expression (2006) (0)
- Reproducible RNA-seq analysis using recount2 (2017) (0)
- Previously titled: regionReport: Interactive reports for region-based analyses (2021) (0)
- recount3: summaries and queries for large-scale RNA-seq expression and splicing (2021) (0)
- Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis (2018) (0)
- 1 ari: The Automated R Instructor (2020) (0)
- A statistical approach to selecting and confirming validation targets in -omics experiments (2012) (0)
- 694. RNA-Seq Samples Beyond the Known Transcriptome with Derfinder Available via Recount (2017) (0)
- Gene set bagging for estimating the probability a statistically significant result will replicate (2013) (0)
- Numerical approaches to solving PDE modelling Mountain Pine Beetle Phenology (2003) (0)
- Widespread splicing of repetitive element loci into coding regions of gene transcripts. (2016) (0)
- Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis (2015) (0)
- A visual tool for defining reproducibility and replicability (2019) (0)
- 258 ari: The Automated R Instructor (2020) (0)
- Linking open-source code commits and MOOC grades to evaluate massive online open peer review (2021) (0)
- Ari: The Automated R Instructor (2020) (0)
- Publisher Correction: A visual tool for defining reproducibility and replicability (2019) (0)
- Abstract 330: Expression of Glycolytic Enzymes in Induced Pluripotent Stem Cells, Derived Megakaryocytes, and Endothelial Cells (2020) (0)
- Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive (2016) (0)