Richard Vuduc
#109,836
Most Influential Person Now
American computer scientist
Richard Vuduc's AcademicInfluence.com Rankings
Richard Vuduc's Computer Science Degrees
- Computer Science: World Rank #4721, Historical Rank #4984
- Parallel Computing: World Rank #65, Historical Rank #67
- Database: World Rank #8228, Historical Rank #8584

Computer Science
Richard Vuduc's Degrees
- PhD Computer Science Stanford University
- Masters Computer Science Stanford University
- Bachelors Computer Science University of California, Berkeley
Why Is Richard Vuduc Influential?
According to Wikipedia, Richard Vuduc is a tenured professor of computer science at the Georgia Institute of Technology. His research lab, The HPC Garage, studies high-performance computing, scientific computing, parallel algorithms, modeling, and engineering. He is a member of the Association for Computing Machinery. As of 2022, Vuduc serves as Vice President of the SIAM Activity Group on Supercomputing. He has co-authored over 200 articles in peer-reviewed journals and conferences.
Richard Vuduc's Published Works
Number of citations in a given year to any of this author's works
Total number of citations to an author for the works they published in a given year. This highlights publication of the most important work(s) by the author
Published Works
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms (2007) (826)
- SUSTain (2018) (558)
- OSKI: A Library of Automatically Tuned Sparse Matrix Kernels (2005) (556)
- Model-driven autotuning of sparse matrix-vector multiply on GPUs (2010) (437)
- Sparsity: Optimization Framework for Sparse Matrix Kernels (2004) (361)
- Automatic performance tuning of sparse matrix kernels (2003) (288)
- Self-Adapting Linear Algebra Algorithms and Software (2005) (234)
- A performance analysis framework for identifying potential benefits in GPGPU applications (2012) (208)
- A massively parallel adaptive fast-multipole method on heterogeneous architectures (2009) (184)
- Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure (2005) (165)
- Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures (2010) (165)
- Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply (2002) (154)
- A Roofline Model of Energy (2013) (144)
- Falcon: fault localization in concurrent programs (2010) (140)
- On the limits of GPU acceleration (2010) (136)
- Many-Thread Aware Prefetching Mechanisms for GPGPU Applications (2010) (133)
- POET: Parameterized Optimizations for Empirical Tuning (2007) (131)
- Statistical Models for Empirical Search-Based Performance Tuning (2004) (118)
- When cache blocking of sparse matrix vector multiply works and why (2007) (116)
- When Prefetching Works, When It Doesn’t, and Why (2012) (116)
- Self-stabilizing iterative solvers (2013) (96)
- Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems (2009) (88)
- Automated Empirical Optimization (2011) (84)
- Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks (2014) (76)
- Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply (2004) (71)
- Performance evaluation of concurrent collections on high-performance multicore computing systems (2010) (69)
- HiCOO: Hierarchical Storage of Sparse Tensors (2018) (68)
- Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures (2010) (65)
- An input-adaptive and in-place approach to dense tensor-times-matrix multiply (2015) (63)
- Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization (2009) (62)
- SPARTan: Scalable PARAFAC2 for Large & Sparse Data (2017) (61)
- Autotuning in High-Performance Computing Applications (2018) (60)
- On the communication complexity of 3D FFTs and its implications for Exascale (2012) (59)
- Communicating Software Architecture using a Unified Single-View Visualization (2007) (51)
- Model-Driven Sparse CP Decomposition for Higher-Order Tensors (2017) (51)
- Undifferentiated facial electromyography responses to dynamic, audio-visual emotion displays in individuals with autism spectrum disorders. (2013) (50)
- A Distributed CPU-GPU Sparse Direct Solver (2014) (47)
- Statistical Models for Automatic Performance Tuning (2001) (45)
- A Unified Approach for Localizing Non-deadlock Concurrency Bugs (2012) (44)
- SWAMI: a framework for collaborative filtering algorithm development and evaluation. (2000) (42)
- Sparse Hierarchical Tucker Factorization and Its Application to Healthcare (2015) (41)
- Direct N-body Kernels for Multicore Platforms (2009) (41)
- CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems (2015) (41)
- Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method (2010) (38)
- Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU) (2012) (32)
- Balance Principles for Algorithm-Architecture Co-Design (2011) (32)
- Improving distributed memory applications testing by message perturbation (2006) (32)
- Load-Balanced Sparse MTTKRP on GPUs (2019) (32)
- Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures (2016) (31)
- Image segmentation using fractal dimension (2002) (29)
- SWAMI (poster session): a framework for collaborative filtering algorithm development and evaluation (2000) (29)
- The Backstroke framework for source level reverse computation applied to parallel discrete event simulation (2011) (27)
- A Distributed Kernel Summation Framework for General-Dimension Machine Learning (2012) (26)
- A Theoretical Framework for Algorithm-Architecture Co-design (2013) (26)
- Improving the energy efficiency of Big Cores (2014) (26)
- Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply (2004) (26)
- Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems (2018) (25)
- Optimizing sparse tensor times matrix on GPUs (2019) (24)
- A type theory for probability density functions (2012) (24)
- Branch-Avoiding Graph Algorithms (2014) (23)
- What GPU Computing Means for High-End Systems (2011) (22)
- Efficient and effective sparse tensor reordering (2019) (22)
- Performance Analysis and Tuning for General Purpose Graphics Processing Units (2012) (22)
- A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method (2014) (21)
- The Optimized Sparse Kernel Interface (OSKI) Library User's Guide for Version 1.0.1h (2007) (20)
- An Initial Characterization of the Emu Chick (2018) (19)
- [Personal health]. (1969) (19)
- Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (2000) (18)
- Parameterizing loop fusion for automated empirical tuning (2005) (18)
- Memory Hierarchy Optimizations and Performance Bounds for Sparse A (2003) (17)
- Annotating user-defined abstractions for optimization (2005) (17)
- Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply (1985) (17)
- Memory Hierarchy Optimizations and Performance Bounds for Sparse A^T Ax (2003) (16)
- Tool Support for Inspecting the Code Quality of HPC Applications (2007) (16)
- A New Method for Program Inversion (2012) (15)
- Modern Accelerator Technologies for Geographic Information Science (2013) (15)
- Griffin: grouping suspicious memory-access patterns to improve understanding of concurrency bugs (2013) (15)
- Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization (2019) (15)
- A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems (2015) (15)
- SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping (2018) (15)
- Efficient Communications in Training Large Scale Neural Networks (2016) (15)
- A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices (2018) (14)
- Techniques for specifying bug patterns (2007) (13)
- A supernodal all-pairs shortest path algorithm (2020) (13)
- Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines (2017) (13)
- Brief announcement: towards a communication optimal fast multipole method and its implications at exascale (2012) (13)
- Optimizing the computation of n-point correlations on large-scale astronomical data (2012) (12)
- A Brief History and Introduction to GPGPU (2013) (12)
- Fast sensitivity computations for trajectory optimization (2010) (12)
- GraSP: distributed streaming graph partitioning (2015) (12)
- Algorithmic Skeletons (2011) (11)
- Hybrid Dynamic Trees for Extreme-Resolution 3D Sparse Data Modeling (2016) (11)
- Polyadic Regression and its Application to Chemogenomics (2017) (10)
- How much (execution) time and energy does my algorithm cost? (2013) (10)
- A GPU-parallel construction of volumetric tree (2015) (10)
- Methods for High-Throughput Computation of Elementary Functions (2013) (9)
- Programming Strategies for Irregular Algorithms on the Emu Chick (2018) (9)
- A massively parallel adaptive fast multipole method on heterogeneous architectures (2012) (9)
- CUP: Cluster Pruning for Compressing Deep Neural Networks (2019) (9)
- A Microbenchmark Characterization of the Emu Chick (2018) (8)
- A Graphical Approach for Freeform Surface Offsetting With GPU Acceleration for Subtractive 3D Printing (2016) (8)
- Prospects for scalable 3D FFTs on heterogeneous exascale systems (2011) (8)
- Statistical Modeling of Feedback Data in an Automatic Tuning System (2000) (8)
- Adaptive Deep Path: Efficient Coverage of a Known Environment under Various Configurations (2019) (8)
- CA-SVM: Communication-Avoiding Support Vector Machines on Clusters (2016) (7)
- Applying the concurrent collections programming model to asynchronous parallel dense linear algebra (2010) (7)
- A Self-Correcting Connected Components Algorithm (2016) (7)
- A Wavelet Collocation Method for Solving PDEs (2001) (6)
- Analyzing the Energy Efficiency of the Fast Multipole Method Using a DVFS-Aware Energy Model (2016) (6)
- Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (2013) (6)
- Understanding the design trade-offs among current multicore systems for numerical computations (2009) (6)
- International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, 2014 (2014) (6)
- A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems (2019) (5)
- Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs (2020) (5)
- Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization (2020) (4)
- A communication-avoiding 3D sparse triangular solver (2019) (4)
- Step Ring Based 3D Path Planning via GPU Simulation for Subtractive 3D Printing (2016) (4)
- Synthesizing Loops for Program Inversion (2012) (4)
- Spatter: A Benchmark Suite for Evaluating Sparse Access Patterns (2018) (4)
- Support for Whole-Program Analysis and the Verification of the One-Definition Rule in C++ (2006) (4)
- An Extensible Open-Source Compiler Infrastructure for Testing (2005) (4)
- Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems (2017) (4)
- Intrepydd: performance, productivity, and portability for data science application kernels (2020) (3)
- Atomic Operations (2011) (3)
- Scalable Knowledge Graph Analytics at 136 Petaflop/s (2020) (3)
- Auto-Tuning Distributed-Memory 3-Dimensional Fast Fourier Transforms on the Cray XT4 (2009) (3)
- Modeling and Analysis for Performance and Power (2012) (3)
- Step Ring-Based Three-Dimensional Path Planning Via Graphics Processing Unit Simulation for Subtractive Three-Dimensional Printing (2017) (2)
- Analyzing and Visualizing Whole Program Architectures (2007) (2)
- Toward interactive statistical modeling (2010) (2)
- Spatter: A Tool for Evaluating Gather / Scatter Performance (2018) (2)
- Wanted: Floating-Point Add Round-off Error instruction (2016) (2)
- An interface for a self-optimizing sparse matrix kernel library (2005) (2)
- Evaluating Gather and Scatter Performance on CPUs and GPUs (2020) (2)
- Sustainable Software Development for Next-Gen Sequencing (NGS) Bioinformatics on Emerging Platforms (2013) (2)
- A distributed kernel summation framework for general‐dimension machine learning (2014) (2)
- Toward a Theory of Algorithm-Architecture Co-design (2012) (2)
- Architectural Visualization of C/C++ Source Code for Program Comprehension (2006) (1)
- Communication-avoiding kernel ridge regression on parallel and distributed systems (2021) (1)
- Proceedings of the First International Workshop on Post Moore's Era Supercomputing (2016) (1)
- An Energy-Efficient Single-Source Shortest Path Algorithm (2018) (1)
- Nimble GNN Embedding with Tensor-Train Decomposition (2022) (1)
- Communication-Optimal Parallel N-body Solvers (2012) (1)
- POET: Parameterized Optimization for Empirical Tuning (2007) (1)
- Polyadic Regression and its Application to Chemogenomics: Supplementary Material (2017) (1)
- CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems (2015) (1)
- Numerical Algorithms with Tunable Parallelism (2008) (1)
- Courses in High-performance Computing for Scientists and Engineers (2012) (1)
- Characterizing Application Runtime Behavior from System Logs and Metrics (2011) (1)
- Faster parallel collision detection at high resolution for CNC milling applications (2019) (1)
- A GPU-Accelerated Freeform Surface Offsetting Method for High-Resolution Subtractive 3D Printing (Machining) (2018) (1)
- Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters (2020) (1)
- An interface for multidimensional arrays in Arkouda (2021) (1)
- Optimizations & Bounds for Sparse Symmetric Matrix-Vector Multiply (2004) (0)
- Online model swapping for architectural simulation (2020) (0)
- Introduction for Special Issue on Autotuning (2013) (0)
- Ab Initio Molecular Dynamics (2011) (0)
- “Smarter” NICs for faster molecular dynamics: a case study (2022) (0)
- Furious.js: a Model for Offloading Compute-Intensive JavaScript Applications (2015) (0)
- Comprehending Software Architecture using a Single-View Visualization (2007) (0)
- Superfluorescence in the presence of inhomogeneous broadening and relaxation (1997) (0)
- Message from the IISWC 2015 General Co-Chairs (2015) (0)
- HPPAC Workshop Introduction (2017) (0)
- Self-stabilizing Connected Components (2019) (0)
- Two Algorithms for Sorting On Heterogeneous Clusters (2012) (0)
- A Simple Methodology for Computing Families of Algorithms (2018) (0)
- Scalable Knowledge-Graph Analytics at 136 Petaflops/s – Data Readme (2020) (0)
- Exaflops Biomedical Knowledge Graph Analytics (2022) (0)
- A Simple Methodology for Computing Families of Algorithms: FLAME Working Note #87 (2018) (0)
- Lawrence Berkeley National Laboratory: Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms (2008) (0)
- ORCA: Outlier detection and Robust Clustering for Attributed graphs (2021) (0)
- Automated Performance Tuning (2011) (0)
- Is it Nemo or Dory? Fast and accurate object detection for IoT and edge devices (2021) (0)
- Parameterization and Search-space Exploitation of Loop Fusion (2005) (0)
- Max orientation coverage: efficient path planning to avoid collisions in the CNC milling of 3D objects (2020) (0)
- Unconventional wisdom in multicore computing (2010) (0)
- AAS 09-337: Fast Sensitivity Computations for Trajectory Optimization (2009) (0)
- ParaGraph: An application-simulator interface and toolkit for hardware-software co-design (2022) (0)
- SPARTan (2017) (0)
- The Sixth International Workshop on Automatic Performance Tuning (iWAPT2011) (2011) (0)
- Jack, The Autotuner (2022) (0)
- 2 Summary of Initial Proposal 3.1 Concurrent Collections (CnC): a New Programming Model for HPC 3.3 Tunable "Fast-and-loose" Synchronization (2010) (0)
- Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From Georgia Tech (2022) (0)
- Performance Optimizations and Bounds for Sparse Matrix Kernels (2002) (0)
- Recovery of superfluorescence in inhomogeneously broadened systems through rapid relaxation (1997) (0)
- Algorithms and software with tunable parallelism (2010) (0)
What Schools Are Affiliated With Richard Vuduc?
Richard Vuduc is affiliated with the following schools: