Kunle Olukotun
American computer scientist
Kunle Olukotun's AcademicInfluence.com Rankings
Computer Science
Kunle Olukotun's Degrees
- Masters, Electrical Engineering and Computer Science, Stanford University
Why Is Kunle Olukotun Influential?
According to Wikipedia, Oyekunle Ayinde "Kunle" Olukotun is a British-born Nigerian computer scientist who is the Cadence Design Systems Professor of the Stanford School of Engineering, Professor of Electrical Engineering and Computer Science at Stanford University, and the director of the Stanford Pervasive Parallelism Lab. Olukotun is known as the “father of the multi-core processor” and as the leader of the Stanford Hydra Chip Multiprocessor research project. Olukotun's achievements include designing the first general-purpose multi-core CPU, innovating single-chip multiprocessor and multi-threaded processor design, and pioneering multicore CPUs and GPUs, transactional memory technology, and domain-specific language programming models. Olukotun's research interests include computer architecture, parallel programming environments and scalable parallel systems, domain-specific languages, and high-level compilers.
Kunle Olukotun's Published Works
- Map-Reduce for Machine Learning on Multicore (2006) (1161)
- Niagara: a 32-way multithreaded Sparc processor (2005) (1081)
- STAMP: Stanford Transactional Applications for Multi-Processing (2008) (995)
- The case for a single-chip multiprocessor (1996) (867)
- Transactional memory coherence and consistency (2004) (760)
- The Future of Microprocessors (2005) (686)
- A Single-Chip Multiprocessor (1997) (449)
- Data speculation support for a chip multiprocessor (1998) (420)
- The Stanford Hydra CMP (2000) (375)
- Accelerating CUDA graph algorithms at maximum warp (2011) (373)
- An effective hybrid transactional memory system with strong isolation guarantees (2007) (307)
- Green-Marl: a DSL for easy and efficient graph analysis (2012) (307)
- Efficient Parallel Graph Exploration on Multi-Core CPU and GPU (2011) (292)
- DAWNBench: An End-to-End Deep Learning Benchmark and Competition (2017) (272)
- OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning (2011) (223)
- A practical concurrent binary search tree (2010) (217)
- A Heterogeneous Parallel Framework for Domain-Specific Languages (2011) (204)
- REMARC: Reconfigurable Multimedia Array Coprocessor (1999) (200)
- EmptyHeaded: A Relational Engine for Graph Processing (2015) (188)
- The Atomos transactional programming language (2006) (187)
- Architectural Semantics for Practical Transactional Memory (2006) (187)
- Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms (2015) (186)
- Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages (2014) (177)
- A domain-specific approach to heterogeneous parallelism (2011) (175)
- Hardware acceleration of database operations (2014) (168)
- A quantitative analysis of reconfigurable coprocessors for multimedia applications (1998) (166)
- Energy-Efficient Abundant-Data Computing: The N3XT 1,000x (2015) (165)
- Plasticine: A reconfigurable architecture for parallel patterns (2017) (164)
- The Jrpm system for dynamically parallelizing Java programs (2003) (161)
- Maximizing CMP throughput with mediocre cores (2005) (159)
- Understanding and optimizing asynchronous low-precision stochastic gradient descent (2017) (155)
- A Scalable, Non-blocking Approach to Transactional Memory (2007) (147)
- Programming with transactional coherence and consistency (TCC) (2004) (141)
- Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems (2014) (140)
- Spatial: a language and compiler for application accelerators (2018) (139)
- Analysis and design of latch-controlled synchronous digital circuits (1990) (124)
- Language virtualization for heterogeneous parallel computing (2010) (122)
- CheckTc and minTc: timing verification and optimal clocking of synchronous digital circuits (1990) (112)
- Evaluation of Design Alternatives for a Multiprocessor Microprocessor (1996) (111)
- Exposing speculative thread parallelism in SPEC2000 (2005) (111)
- Using thread-level speculation to simplify manual parallelization (2003) (108)
- The common case transactional behavior of multithreaded programs (2006) (108)
- Optimizing data structures in high-level programs: new directions for extensible compilers based on staging (2013) (104)
- High-Accuracy Low-Precision Training (2018) (98)
- Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark (2018) (97)
- Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency (2007) (97)
- Automatic Generation of Efficient Accelerators for Reconfigurable Hardware (2016) (96)
- The OpenTM Transactional Application Programming Interface (2007) (95)
- Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor (1997) (92)
- On fast parallel detection of strongly connected components (SCC) in small-world graphs (2013) (92)
- Java as a specification language for hardware-software systems (1997) (91)
- Improving the performance of speculatively parallel applications on the Hydra CMP (1999) (90)
- Implementing Domain-Specific Languages for Heterogeneous Parallel Computing (2011) (89)
- Exploiting method-level parallelism in single-threaded Java programs (1998) (88)
- A highly scalable Restricted Boltzmann Machine FPGA implementation (2009) (86)
- Tradeoffs in transactional memory virtualization (2006) (84)
- GraphOps: A Dataflow Library for Graph Analytics Acceleration (2016) (80)
- Characterization of TCC on chip-multiprocessors (2005) (77)
- Eigenbench: A simple exploration tool for orthogonal TM characteristics (2010) (75)
- A practical FPGA-based framework for novel CMP research (2007) (75)
- Transactional coherence and consistency: simplifying parallel hardware and software (2004) (75)
- REMARC (abstract): reconfigurable multimedia array coprocessor (1998) (69)
- Generating Configurable Hardware from Parallel Patterns (2015) (69)
- Exploring the design space for a shared-cache multiprocessor (1994) (69)
- Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors (1996) (67)
- Building-Blocks for Performance Oriented DSLs (2011) (65)
- Composition and Reuse with Compiled Domain-Specific Languages (2013) (63)
- Hardware system synthesis from Domain-Specific Languages (2014) (62)
- A General Method for Compiling Event-Driven Simulations (1995) (62)
- Practical Design Space Exploration (2018) (61)
- A software-hardware cosynthesis approach to digital system simulation (1994) (57)
- ATLAS: A Chip-Multiprocessor with Transactional Memory Support (2007) (56)
- Transactional predication: high-performance concurrent sets and maps for STM (2010) (55)
- Simplifying Scalable Graph Processing with a Domain-Specific Language (2014) (54)
- TEST: a Tracer for Extracting Speculative Threads (2003) (53)
- Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns (2016) (51)
- Designing High Bandwidth On-chip Caches (1997) (49)
- Multicore Processors and Systems (2009) (49)
- Transactional collection classes (2007) (49)
- Locality-Aware Mapping of Nested Parallel Patterns on GPUs (2014) (48)
- Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling (2016) (47)
- Testing implementations of transactional memory (2006) (45)
- Surgical precision JIT compilers (2014) (43)
- A Large-Scale Architecture for Restricted Boltzmann Machines (2010) (42)
- Runtime automatic speculative parallelization (2011) (41)
- Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data (2018) (40)
- The impact of shared-cache clustering in small-scale shared-memory multiprocessors (1996) (38)
- Exploiting Coarse-Grain Parallelism in the MPEG-2 Algorithm (1998) (38)
- Performance optimization of pipelined primary cache (1992) (35)
- Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford (2010) (34)
- Forge: generating a high performance DSL implementation from a declarative specification (2014) (34)
- A scalable formal verification methodology for pipelined microprocessors (1996) (33)
- The hierarchical multi-bank DRAM: a high-performance architecture for memory integrated with processors (1997) (33)
- Implementing and evaluating nested parallel transactions in software transactional memory (2010) (33)
- Verifying correct pipeline implementation for microprocessors (1997) (32)
- The Information-Form Data Association Filter (2005) (31)
- The Benefits of Clustering in Shared Address Space Multiprocessors: An Applications-Driven Investigation (1995) (31)
- Transactional Execution of Java Programs (2005) (29)
- CCSTM: A Library-Based STM for Scala (2010) (29)
- Digital system simulation: methodologies and examples (1998) (28)
- Elastic RSS: Co-Scheduling Packets and Cores Using Programmable NICs (2019) (28)
- TAPE: a transactional application profiling environment (2005) (28)
- A Preliminary Investigation into Parallel Routing on a Hypercube Computer (1987) (27)
- Infrastructure for Usable Machine Learning: The Stanford DAWN Project (2017) (27)
- Considerations in the Design of Hydra: A Multiprocessor-on-a-Chip Microarchitecture (1998) (27)
- Feedback-directed barrier optimization in a strongly isolated STM (2009) (26)
- Performance Optimization of Pipelined Primary Caches (1992) (26)
- Hardware acceleration of transactional memory on commodity systems (2011) (26)
- EmptyHeaded (2017) (26)
- Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems (2015) (25)
- SARA: Scaling a Reconfigurable Dataflow Accelerator (2021) (23)
- Transactional Memory Coherence and Consistency (TCC) (2004) (22)
- MLSys: The New Frontier of Machine Learning Systems (2019) (21)
- Taurus: An Intelligent Data Plane (2020) (20)
- A Single Chip Multiprocessor Integrated with DRAM (1997) (20)
- Transactional Memory: The Hardware-Software Interface (2007) (20)
- Scalable Interconnects for Reconfigurable Spatial Architectures (2019) (19)
- EMEURO: A framework for generating multi-purpose accelerators via deep learning (2015) (19)
- REMARC: Reconfigurable Multimedia Array Coprocessor (Abstract) (1998) (19)
- Compilation of sparse array programming models (2021) (18)
- Targeting Dynamic Compilation for Embedded Environments (2002) (18)
- The design of a microsupercomputer (1991) (18)
- Executing Java programs with transactional memory (2006) (18)
- Bayesian Optimization with a Prior for the Optimum (2021) (18)
- Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width (2015) (17)
- High Bandwidth On-Chip Cache Design (2001) (17)
- LevelHeaded: A Unified Engine for Business Intelligence and Linear Algebra Querying (2018) (17)
- Gorgon: Accelerating Machine Learning from Relational Data (2020) (17)
- Implementing a cache for a high-performance GaAs microprocessor (1991) (16)
- HyperMapper: a Practical Design Space Exploration Framework (2019) (16)
- FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures (2010) (15)
- Old techniques for new join algorithms: A case study in RDF processing (2016) (15)
- Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator (2019) (15)
- Flare: Native Compilation for Heterogeneous Workloads in Apache Spark (2017) (15)
- EmptyHeaded: Boolean Algebra Based Graph Processing (2015) (15)
- Building and Using the ATLAS Transactional Memory System (2006) (12)
- A chip prototyping substrate: the flexible architecture for simulation and testing (FAST) (2005) (12)
- ATLAS: A Scalable Emulator for Transactional Parallel Systems (2005) (12)
- Efficient state representation for symbolic simulation (2002) (11)
- Delite (2014) (11)
- Towards soft optimization techniques for parallel cognitive applications (2007) (11)
- Capstan: A Vector RDA for Sparsity (2021) (11)
- Are single-chip multiprocessors in reach? (2001) (11)
- Rationale, Design and Performance of the Hydra Multiprocessor (1994) (11)
- Plasticine: A Reconfigurable Accelerator for Parallel Patterns (2018) (11)
- Hardware/software co-design for high performance computing: Challenges and opportunities (2010) (10)
- Mind the Gap: Bridging Multi-Domain Query Workloads with EmptyHeaded (2017) (10)
- The Identity Management Kalman Filter (IMKF) (2006) (10)
- Automatic support for multi-module parallelism from computational patterns (2015) (10)
- Emulation and Prototyping Of Digital Systems (1996) (9)
- Making nested parallel transactions practical using lightweight hardware support (2010) (9)
- Multilevel Optimization of Pipelined Caches (1997) (9)
- TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks (2019) (9)
- Implementing and Evaluating a Model Checker for Transactional Memory Systems (2010) (9)
- Taurus: a data plane architecture for per-packet ML (2020) (9)
- Aurochs: An Architecture for Dataflow Threads (2021) (8)
- Beyond parallel programming with domain specific languages (2014) (8)
- A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware (2012) (8)
- Report for the NSF Workshop on Cross-layer Power Optimization and Management (2012) (7)
- Prior-guided Bayesian Optimization (2020) (7)
- DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning (2019) (6)
- Workshop on Advancing Computer Architecture Research (ACAR-II): Laying a New Foundation for IT: Computer Architecture for 2025 and Beyond (2011) (5)
- High performance embedded domain specific languages (2012) (5)
- Achieving scalable hardware verification with symbolic simulation (2003) (5)
- Polystore++: Accelerated Polystore System for Heterogeneous Workloads (2019) (5)
- Improving the performance of speculatively parallel applications on the Hydra CMP (1999) (5)
- High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors (1995) (5)
- Technology-organization tradeoffs in the architecture of a high-performance processor (1991) (5)
- JMTP: an architecture for exploiting concurrency in embedded Java applications with real-time considerations (1999) (4)
- Workshop on Advancing Computer Architecture Research (ACAR-1): Failure is not an Option: Popular Parallel Programming (2010) (3)
- Hierarchical Gate-Array Routing on a Hypercube Multiprocessor (1990) (3)
- "Can We Still Keep the Faith?": A debate on the Future of Multi-Core Systems (2007) (3)
- Improving software concurrency with hardware-assisted memory snapshot (2008) (3)
- Efficient Multiway Hash Join on Reconfigurable Hardware (2019) (3)
- Domain-Specific Languages for Heterogeneous Parallel Computing (2013) (3)
- TEST: A Tracer for Extracting Speculative Thread (2003) (3)
- The Jrpm System for Dynamically Parallelizing Sequential Java Programs (2003) (3)
- The Software Stack for Transactional Memory Challenges and Opportunities (2006) (2)
- Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks (2022) (2)
- Chip Multiprocessor Architecture (2007) (2)
- The Stanford Hydra CMP (2000) (2)
- The Sparse Abstract Machine (2022) (2)
- Plasticine (2017) (2)
- Utilizing Static Analysis and Code Generation to Accelerate Neural Networks (2012) (1)
- ASeD: availability, security, and debugging support using transactional memory (2008) (1)
- An application analysis framework for polymorphic chip multiprocessors (2005) (1)
- Scaling data analytics with Moore's law (2016) (1)
- New Generation Microprocessor Architecture (2): NIAGARA: A 32-Way Multithreaded SPARC Processor (2005) (1)
- The Stanford Pervasive Parallelism Lab (2009) (1)
- The Impact of Shared-Cache Clustering in (1996) (1)
- Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture (2022) (1)
- BaCO: A Fast and Portable Bayesian Compiler Optimization Framework (2022) (1)
- High-Accuracy Low-Precision Training (2018) (1)
- Efficient Memory Partitioning in Software Defined Hardware (2022) (1)
- Exploring the Utility of Developer Exhaust (2018) (1)
- DCP: an algorithm for datapath/control partitioning of synthesizable RTL models (1998) (1)
- Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (2008) (1)
- A New Approach to Programming and Prototyping Parallel Systems (2005) (1)
- DiffraNet: Automatic Classification of Serial Crystallography Diffraction Patterns (2018) (1)
- Serving Recurrent Neural Networks Efficiently with a Spatial Architecture (2019) (1)
- Guest Editorial (2022) (0)
- 2021 IEEE International Symposium on Workload Characterization (2021) (0)
- Java as a Specification Language for Hardware-Software Systems (1997) (0)
- Author's retrospective for: improving the performance of speculatively parallel applications on the Hydra CMP (2014) (0)
- Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators (2021) (0)
- Session details: Session order 8: programming systems session (2014) (0)
- IPDPS 2011 Wednesday 25th Year Panel: What's ahead? (2011) (0)
- Accelerating SLIDE: Exploiting Sparsity on Accelerator Architectures (2022) (0)
- Why Build a CMP? (2007) (0)
- "Let the Data Flow!" (2021) (0)
- Session details: Multimedia and mobile (2009) (0)
- Coarse-Grained Reconfigurable Architectures (2020) (0)
- What's Ahead? (2011) (0)
- Analysis of the Time-To-Accuracy Metric and Entries in the DAWNBench Deep Learning Benchmark (2018) (0)
- LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case (2017) (0)
- Guest Editorial (2013) (0)
- Global perspectives of diversity, equity, and inclusion (2022) (0)
- Practical Design Space Exploration 1 (2019) (0)
- Chip multiprocessor architecture: A programmability-driven approach (2010) (0)
- Revet: A Language and Compiler for Dataflow Threads (2023) (0)
- Extreme scale computing: Challenges and opportunities (2010) (0)
- Speculation Support for a Chip Multiprocessor (0)
- Plasticine - A Universal Data Analytics Accelerator (2020) (0)
- Ubiquitous Parallel Computing (2011) (0)
- Panel Statement (2011) (0)
- 17th International Conference on Parallel Architectures and Compilation Techniques, PACT 2008, Toronto, Ontario, Canada, October 25-29, 2008 (2008) (0)
- High performance lattice regression on FPGAs via a high level hardware description language (2021) (0)
- Sigma: Compiling Einstein Summations to Locality-Aware Dataflow (2023) (0)
- Hot topic: low power multi-core architectures [Special Session] (2005) (0)
- Train Model using Network Architecture and Log Records Trained Architecture N 2 . Predict Performance of Untrained Architecture (2018) (0)
- Session details: Hot topic - low-power multi-core architectures (2005) (0)
- A Flexible, Efficient Concurrent Garbage Collector for Speculative Thread Processors (2003) (0)
- Digital System Simulation: Methodologies and Examples (2004) (0)
- Bridging the Performance Gap between Manual and Automatic Compilers with Intent-based Compilation (2015) (0)
What Schools Are Affiliated With Kunle Olukotun?
Kunle Olukotun is affiliated with the following schools: