Mike Cafarella | Academic Influence

Mike Cafarella's AcademicInfluence.com Rankings

Mike Cafarella

Computer Science: World Rank #926, Historical Rank #961, USA Rank #503

Database: World Rank #329, Historical Rank #343, USA Rank #154

Mike Cafarella's Degrees

PhD Computer Science University of Washington
Masters Computer Science University of Washington
Bachelors Computer Science Brown University

Why Is Mike Cafarella Influential?

According to Wikipedia, Mike Cafarella is a computer scientist specializing in database management systems. He is a principal research scientist in computer science at the MIT Computer Science and Artificial Intelligence Laboratory. Before coming to MIT, he was a professor of Computer Science and Engineering at the University of Michigan from 2009 to 2020. Along with Doug Cutting, he co-founded the Hadoop and Nutch open-source projects. Cafarella was born in New York City but moved to Westwood, Massachusetts, early in his childhood. After completing his bachelor's degree at Brown University, he earned a Ph.D. specializing in database management systems at the University of Washington under Dan Suciu and Oren Etzioni. He was also involved in several notable start-ups, including Tellme Networks, and was a co-founder of Lattice Data, which was acquired by Apple in 2017.

Mike Cafarella's Published Works

[Chart: number of citations in a given year to any of this author's works]

[Chart: total number of citations to the works this author published in a given year, which highlights when the author's most important works appeared]

Published Works

Open Information Extraction from the Web (2007) (2389)
Unsupervised named-entity extraction from the Web: An experimental study (2005) (1263)
Web-scale information extraction in KnowItAll: (preliminary results) (2004) (911)
WebTables: exploring the power of tables on the web (2008) (687)
TextRunner: Open Information Extraction on the Web (2007) (372)
Data Integration for the Relational Web (2009) (238)
Theoretical Limits of Hydrogen Storage in Metal–Organic Frameworks: Opportunities and Trade-Offs (2013) (193)
Automatic Optimization for MapReduce Programs (2011) (187)
Uncovering the Relational Web (2008) (175)
Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison (2004) (149)
Machine Reading (2006) (147)
KnowItNow: Fast, Scalable Information Extraction from the Web (2005) (146)
Using Social Media to Measure Labor Market Flows (2014) (117)
Brainwash: A Data System for Feature Engineering (2013) (117)
Visualization-aware sampling for very large databases (2015) (113)
Structured Data on the Web (2009) (110)
Automatic web spreadsheet data extraction (2013) (106)
A search engine for natural language applications (2005) (99)
Building Nutch: Open Source Search (2004) (95)
Web-scale extraction of structured data (2009) (94)
Sample-driven schema mapping (2012) (93)
Ontology-Driven Information Extraction with OntoSyphon (2006) (91)
Foofah: Transforming Data By Example (2017) (90)
Database Learning: Toward a Database that Becomes Smarter Every Time (2017) (74)
Structured Querying of Web Text Data: A Technical Challenge (2007) (69)
Integrating spreadsheet data via accurate and low-effort extraction (2014) (66)
Manimal: relational optimization for data-intensive programs (2010) (57)
Extracting and Querying a Comprehensive Web Database (2009) (55)
Relational Web Search (2006) (54)
Web data management (2011) (50)
HARE: Hardware accelerator for regular expressions (2016) (50)
Senbazuru: A Prototype Spreadsheet Database Management System (2013) (49)
Ontology-driven, unsupervised instance population (2008) (49)
Ten Years of WebTables (2018) (49)
MIRIS: Fast Object Track Queries in Video (2020) (45)
Extracting Databases from Dark Data with DeepDive (2016) (40)
Navigating Extracted Data with Schema Discovery (2007) (40)
Long-tail Vocabulary Dictionary Extraction from the Web (2016) (38)
Link-Prediction Enhanced Consensus Clustering for Complex Networks (2015) (37)
Data management projects at Google (2008) (37)
Input selection for fast feature engineering (2016) (36)
Using web corpus statistics for program analysis (2014) (35)
Structured querying of web text (2007) (34)
DiagramFlyer: A Search Engine for Data-Driven Diagrams (2015) (34)
Neighbor-Sensitive Hashing (2015) (32)
Spreadsheet Property Detection With Rule-assisted Active Learning (2017) (30)
Physical Representation-Based Predicate Optimization for a Visual Analytics Database (2018) (30)
Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data (2020) (29)
Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype? (2015) (25)
DeepDive (2017) (24)
Structured Querying of Web Text: A Technical Challenge (2006) (23)
DBExplorer: Exploratory Search in Databases (2016) (22)
HAWK: Hardware support for unstructured log processing (2016) (22)
Leveraging Noisy Lists for Social Feed Ranking (2013) (22)
Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs (2017) (19)
An Integrated Development Environment for Faster Feature Engineering (2014) (16)
Beaver (2018) (16)
Ringtail: Feature Selection For Easier Nowcasting (2013) (16)
Structured Queries Over Web Text (2006) (16)
Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction (2015) (11)
DQBarge: Improving Data-Quality Tradeoffs in Large-Scale Internet Services (2016) (11)
Predicate Optimization for a Visual Analytics Database (2018) (11)
DBOS: A Proposal for a Data-Centric Operating System (2020) (11)
Dark Data: Are we solving the right problems? (2016) (10)
Extracting and managing structured web data (2009) (10)
Duoquest: A Dual-Specification System for Expressive SQL Queries (2020) (10)
Minimizing Remote Accesses in MapReduce Clusters (2013) (10)
Ringtail: A Generalized Nowcasting System (2013) (10)
Context-specific Language Modeling for Human Trafficking Detection from Online Advertisements (2019) (9)
BeeCluster: drone orchestration via predictive optimization (2020) (9)
DBOS: A DBMS-oriented Operating System (2021) (9)
CLX: Towards verifiable PBE data transformation (2019) (8)
A Declarative Query Processing System for Nowcasting (2016) (7)
Sledgehammer: Cluster-Fueled Debugging (2018) (6)
A Method for Optimizing Opaque Filter Queries (2020) (6)
A query system for social media signals (2016) (6)
Runtime Support for Human-in-the-Loop Feature Engineering System (2016) (6)
Searching for Statistical Diagrams (2011) (5)
Constructing Expressive Relational Queries with Dual-Specification Synthesis (2020) (5)
Constraint-based Explanation and Repair of Filter-Based Transformations (2018) (5)
Beaver: Towards a Declarative Schema Mapping (2018) (4)
Reducing MapReduce Abstraction Costs for Text-centric Applications (2014) (4)
Pricing risk in prostitution: Evidence from online sex ads (2019) (4)
Replicated Layout for In-Memory Database Systems (2021) (3)
Ten Years of Web Tables (2018) (3)
Using web corpus statistics for program analysis (2014) (3)
Data Governance in a Database Operating System (DBOS) (2021) (3)
TextRunner (2007) (2)
Knowledge Graph Programming with a Human-in-the-Loop: Preliminary Results (2019) (2)
You can't debug what you can't see: Expanding observability with the OmniTable (2019) (2)
A Progress Report on DBOS: A Database-oriented Operating System (2022) (2)
Towards Data Discovery by Example (2020) (2)
Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework (2022) (2)
Demonstration of a Multiresolution Schema Mapping System (2018) (2)
Infrastructure for Rapid Open Knowledge Network Development (2022) (1)
Debugging the OmniTable Way (2022) (1)
Controlled Intentional Degradation in Analytical Video Systems (2022) (1)
A Polystore Based Database Operating System (DBOS) (2020) (1)
Technical Report: An Overview of Data Integration and Preparation (2020) (1)
Unifacta: Profiling-driven String Pattern Standardization (2018) (1)
On Explaining Confounding Bias (2022) (1)
Rational pricing in prostitution: Evidence from online sex ads (2018) (1)
Technical Report on Data Integration and Preparation (2021) (1)
BE: A search engine for NLP research (2006) (1)
Surveillance Video Querying With A Human-in-the-Loop (2020) (0)
Debugging the OmniTable Way, in Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (2022) (0)
On Automatic Database Management System Tuning Using Machine Learning (2021) (0)
Chapter 10: Web Data. Introduced by Peter Bailis. Selected Readings (2016) (0)
How Best to Build Web-Scale Data Managers? A Panel Discussion (2009) (0)
TagMe: GPS-Assisted Automatic Object Annotation in Videos (2021) (0)
HILDA'22: The SIGMOD 2022 Workshop on Human-in-the-Loop Data Analytics (2022) (0)
KnowItNow (2005) (0)
Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration (2022) (0)
Transactions Make Debugging Easy (2022) (0)
Disambiguating Natural Language Queries with Tuples (2019) (0)
Synthesizing Data Programs (2015) (0)
Faster Feature Engineering by Approximate Evaluation (2016) (0)
DBOS (2021) (0)
CLX: Towards a scalable and comprehensible design of PBE data transformations (2018) (0)
Enabling useful provenance in scripting languages with a human-in-the-loop (2022) (0)
Statistical Learning of ISP Peering Policies (0)
Call for a Shake Up in Search! (2011) (0)
BeeCluster (2020) (0)

Other Resources About Mike Cafarella

en.wikipedia.org
