Mike Cafarella
American computer scientist
Mike Cafarella's AcademicInfluence.com Rankings
Computer Science
Mike Cafarella's Degrees
- PhD Computer Science University of Washington
- Masters Computer Science University of Washington
- Bachelors Computer Science Brown University
Why Is Mike Cafarella Influential?
According to Wikipedia, Mike Cafarella is a computer scientist specializing in database management systems. He is a principal research scientist of computer science at the MIT Computer Science and Artificial Intelligence Laboratory. Before coming to MIT, he was a professor of Computer Science and Engineering at the University of Michigan from 2009 to 2020. Along with Doug Cutting, he is one of the original co-founders of the Hadoop and Nutch open-source projects. Cafarella was born in New York City but moved to Westwood, MA early in his childhood. After completing his bachelor's degree at Brown University, he earned a Ph.D. specializing in database management systems at the University of Washington under Dan Suciu and Oren Etzioni. He was also involved in several notable start-ups, including Tellme Networks, and co-founded Lattice Data, which was acquired by Apple in 2017.
Mike Cafarella's Published Works
- Open Information Extraction from the Web (2007) (2389)
- Unsupervised named-entity extraction from the Web: An experimental study (2005) (1263)
- Web-scale information extraction in KnowItAll (preliminary results) (2004) (911)
- WebTables: exploring the power of tables on the web (2008) (687)
- TextRunner: Open Information Extraction on the Web (2007) (372)
- Data Integration for the Relational Web (2009) (238)
- Theoretical Limits of Hydrogen Storage in Metal–Organic Frameworks: Opportunities and Trade-Offs (2013) (193)
- Automatic Optimization for MapReduce Programs (2011) (187)
- Uncovering the Relational Web (2008) (175)
- Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison (2004) (149)
- Machine Reading (2006) (147)
- KnowItNow: Fast, Scalable Information Extraction from the Web (2005) (146)
- Using Social Media to Measure Labor Market Flows (2014) (117)
- Brainwash: A Data System for Feature Engineering (2013) (117)
- Visualization-aware sampling for very large databases (2015) (113)
- Structured Data on the Web (2009) (110)
- Automatic web spreadsheet data extraction (2013) (106)
- A search engine for natural language applications (2005) (99)
- Building Nutch: Open Source Search (2004) (95)
- Web-scale extraction of structured data (2009) (94)
- Sample-driven schema mapping (2012) (93)
- Ontology-Driven Information Extraction with OntoSyphon (2006) (91)
- Foofah: Transforming Data By Example (2017) (90)
- Database Learning: Toward a Database that Becomes Smarter Every Time (2017) (74)
- Structured Querying of Web Text Data: A Technical Challenge (2007) (69)
- Integrating spreadsheet data via accurate and low-effort extraction (2014) (66)
- Manimal: relational optimization for data-intensive programs (2010) (57)
- Extracting and Querying a Comprehensive Web Database (2009) (55)
- Relational Web Search (2006) (54)
- Web data management (2011) (50)
- HARE: Hardware accelerator for regular expressions (2016) (50)
- Senbazuru: A Prototype Spreadsheet Database Management System (2013) (49)
- Ontology-driven, unsupervised instance population (2008) (49)
- Ten Years of WebTables (2018) (49)
- MIRIS: Fast Object Track Queries in Video (2020) (45)
- Extracting Databases from Dark Data with DeepDive (2016) (40)
- Navigating Extracted Data with Schema Discovery (2007) (40)
- Long-tail Vocabulary Dictionary Extraction from the Web (2016) (38)
- Link-Prediction Enhanced Consensus Clustering for Complex Networks (2015) (37)
- Data management projects at Google (2008) (37)
- Input selection for fast feature engineering (2016) (36)
- Using web corpus statistics for program analysis (2014) (35)
- Structured querying of web text (2007) (34)
- DiagramFlyer: A Search Engine for Data-Driven Diagrams (2015) (34)
- Neighbor-Sensitive Hashing (2015) (32)
- Spreadsheet Property Detection With Rule-assisted Active Learning (2017) (30)
- Physical Representation-Based Predicate Optimization for a Visual Analytics Database (2018) (30)
- Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data (2020) (29)
- Machine Learning and Databases: The Sound of Things to Come or a Cacophony of Hype? (2015) (25)
- DeepDive (2017) (24)
- Structured Querying of Web Text: A Technical Challenge (2006) (23)
- DBExplorer: Exploratory Search in Databases (2016) (22)
- HAWK: Hardware support for unstructured log processing (2016) (22)
- Leveraging Noisy Lists for Social Feed Ranking (2013) (22)
- Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs (2017) (19)
- An Integrated Development Environment for Faster Feature Engineering (2014) (16)
- Beaver (2018) (16)
- Ringtail: Feature Selection For Easier Nowcasting (2013) (16)
- Structured Queries Over Web Text (2006) (16)
- Mindtagger: A Demonstration of Data Labeling in Knowledge Base Construction (2015) (11)
- DQBarge: Improving Data-Quality Tradeoffs in Large-Scale Internet Services (2016) (11)
- Predicate Optimization for a Visual Analytics Database (2018) (11)
- DBOS: A Proposal for a Data-Centric Operating System (2020) (11)
- Dark Data: Are we solving the right problems? (2016) (10)
- Extracting and managing structured web data (2009) (10)
- Duoquest: A Dual-Specification System for Expressive SQL Queries (2020) (10)
- Minimizing Remote Accesses in MapReduce Clusters (2013) (10)
- Ringtail: A Generalized Nowcasting System (2013) (10)
- Context-specific Language Modeling for Human Trafficking Detection from Online Advertisements (2019) (9)
- BeeCluster: drone orchestration via predictive optimization (2020) (9)
- DBOS: A DBMS-oriented Operating System (2021) (9)
- CLX: Towards verifiable PBE data transformation (2019) (8)
- A Declarative Query Processing System for Nowcasting (2016) (7)
- Sledgehammer: Cluster-Fueled Debugging (2018) (6)
- A Method for Optimizing Opaque Filter Queries (2020) (6)
- A query system for social media signals (2016) (6)
- Runtime Support for Human-in-the-Loop Feature Engineering System (2016) (6)
- Searching for Statistical Diagrams (2011) (5)
- Constructing Expressive Relational Queries with Dual-Specification Synthesis (2020) (5)
- Constraint-based Explanation and Repair of Filter-Based Transformations (2018) (5)
- Beaver: Towards a Declarative Schema Mapping (2018) (4)
- Reducing MapReduce Abstraction Costs for Text-centric Applications (2014) (4)
- Pricing risk in prostitution: Evidence from online sex ads (2019) (4)
- Replicated Layout for In-Memory Database Systems (2021) (3)
- Ten Years of Web Tables (2018) (3)
- Using web corpus statistics for program analysis (2014) (3)
- Data Governance in a Database Operating System (DBOS) (2021) (3)
- TextRunner (2007) (2)
- Knowledge Graph Programming with a Human-in-the-Loop: Preliminary Results (2019) (2)
- You can't debug what you can't see: Expanding observability with the OmniTable (2019) (2)
- A Progress Report on DBOS: A Database-oriented Operating System (2022) (2)
- Towards Data Discovery by Example (2020) (2)
- Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework (2022) (2)
- Demonstration of a Multiresolution Schema Mapping System (2018) (2)
- Infrastructure for Rapid Open Knowledge Network Development (2022) (1)
- Debugging the OmniTable Way (2022) (1)
- Controlled Intentional Degradation in Analytical Video Systems (2022) (1)
- A Polystore Based Database Operating System (DBOS) (2020) (1)
- Technical Report: An Overview of Data Integration and Preparation (2020) (1)
- Unifacta: Profiling-driven String Pattern Standardization (2018) (1)
- On Explaining Confounding Bias (2022) (1)
- Rational pricing in prostitution: Evidence from online sex ads (2018) (1)
- Technical Report on Data Integration and Preparation (2021) (1)
- BE: A search engine for NLP research (2006) (1)
- Surveillance Video Querying With A Human-in-the-Loop (2020) (0)
- Debugging the OmniTable Way, Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (2022) (0)
- On Automatic Database Management System Tuning Using Machine Learning (2021) (0)
- Chapter 10: Web Data. Introduced by Peter Bailis. Selected Readings (2016) (0)
- How Best to Build Web-Scale Data Managers? A Panel Discussion (2009) (0)
- TagMe: GPS-Assisted Automatic Object Annotation in Videos (2021) (0)
- HILDA'22: The SIGMOD 2022 Workshop on Human-in-the-Loop Data Analytics (2022) (0)
- KnowItNow (2005) (0)
- Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration (2022) (0)
- Transactions Make Debugging Easy (2022) (0)
- Disambiguating Natural Language Queries with Tuples (2019) (0)
- Synthesizing Data Programs (2015) (0)
- Faster Feature Engineering by Approximate Evaluation (2016) (0)
- DBOS (2021) (0)
- CLX: Towards a scalable and comprehensible design of PBE data transformations (2018) (0)
- Enabling useful provenance in scripting languages with a human-in-the-loop (2022) (0)
- Statistical Learning of ISP Peering Policies (0)
- Call for a Shake Up in Search! (2011) (0)
- BeeCluster (2020) (0)
What Schools Are Affiliated With Mike Cafarella?
Mike Cafarella is affiliated with the following schools: