James Demmel
#6,168
Most Influential Person Now
American mathematician and computer scientist
James Demmel's AcademicInfluence.com Rankings
James Demmel's Computer Science Degrees
- Computer Science: #782 World Rank, #809 Historical Rank, #423 USA Rank
- Numerical Analysis: #5 World Rank, #5 Historical Rank, #4 USA Rank
- Parallel Computing: #13 World Rank, #13 Historical Rank, #11 USA Rank
James Demmel's Mathematics Degrees
- Mathematics: #417 World Rank, #842 Historical Rank, #189 USA Rank
- Linear Algebra: #1 World Rank, #1 Historical Rank, #1 USA Rank
- Measure Theory: #467 World Rank, #663 Historical Rank, #195 USA Rank
James Demmel's Degrees
- PhD Computer Science Stanford University
- Masters Computer Science Stanford University
- Bachelors Mathematics California Institute of Technology
Why Is James Demmel Influential?
According to Wikipedia, James Weldon Demmel Jr. is an American mathematician and computer scientist, the Dr. Richard Carl Dehmel Distinguished Professor of Mathematics and Computer Science at the University of California, Berkeley.
James Demmel's Published Works
- Applied Numerical Linear Algebra (1997) (3157)
- Templates for the Solution of Algebraic Eigenvalue Problems (2000) (1586)
- IEEE Standard for Floating-Point Arithmetic (2008) (1446)
- LAPACK Users' Guide, Third Edition (1999) (1243)
- Benchmarking GPUs to tune dense linear algebra (2008) (919)
- ScaLAPACK Users' Guide (1987) (914)
- A Supernodal Approach to Sparse Partial Pivoting (1999) (873)
- Optimization of sparse matrix-vector multiplication on emerging multicore platforms (2007) (826)
- An updated set of basic linear algebra subprograms (BLAS) (2002) (719)
- LAPACK Users' Guide, 3rd ed. (1999) (669)
- SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems (2003) (657)
- A view of the parallel computing landscape (2009) (653)
- LAPACK: A portable linear algebra library for high-performance computers (1990) (616)
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) (577)
- OSKI: A Library of Automatically Tuned Sparse Matrix Kernels (2005) (556)
- Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology (1997) (495)
- ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance (1995) (466)
- Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects (2009) (460)
- Jacobi's Method is More Accurate than QR (1989) (444)
- Accurate Singular Values of Bidiagonal Matrices (1990) (400)
- Communication-optimal Parallel and Sequential QR and LU Factorizations (2008) (395)
- ImageNet Training in Minutes (2017) (365)
- ScaLAPACK user's guide (1997) (293)
- On condition numbers and the distance to the nearest ill-posed problem (2015) (293)
- Automatic performance tuning of sparse matrix kernels (2003) (288)
- SuperLU Users' Guide (1997) (287)
- Fast $\ell_1$-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime (2012) (278)
- Precimonious: Tuning assistant for floating-point precision (2013) (272)
- An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination (1997) (261)
- Minimizing Communication in Numerical Linear Algebra (2009) (241)
- Design, implementation and testing of extended and mixed precision BLAS (2000) (240)
- Fast linear algebra is stable (2006) (222)
- Computing the Singular Value Decomposition with High Relative Accuracy (1997) (219)
- The generalized Schur decomposition of an arbitrary pencil A–λB—robust software with error bounds and applications. Part I: theory and algorithms (1993) (217)
- Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms (2011) (216)
- LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs (2008) (213)
- LAPACK Users' guide (third ed.) (1999) (210)
- Computing accurate eigensystems of scaled diagonally dominant matrices: LAPACK working note No. 7 (1988) (207)
- Minimizing Polynomials via Sum of Squares over the Gradient Ideal (2004) (194)
- Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology (1997) (190)
- The Probability That a Numerical Analysis Problem Is Difficult (2013) (161)
- Parallel numerical linear algebra (1993) (159)
- A massively parallel tensor contraction framework for coupled-cluster computations (2014) (158)
- Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply (2002) (154)
- Avoiding communication in sparse matrix computations (2008) (150)
- Minimizing communication in sparse matrix solvers (2009) (149)
- The generalized Schur decomposition of an arbitrary pencil A–λB—robust software with error bounds and applications. Part II: software and applications (1993) (147)
- An inverse free parallel spectral divide and conquer algorithm for nonsymmetric eigenproblems (1997) (146)
- Error bounds from extra-precise iterative refinement (2006) (135)
- Communication-optimal parallel algorithm for Strassen's matrix multiplication (2012) (127)
- On swapping diagonal blocks in real Schur form (1993) (123)
- SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization (2010) (120)
- The Accurate and Efficient Solution of a Totally Positive Generalized Vandermonde Linear System (2005) (118)
- Statistical Models for Empirical Search-Based Performance Tuning (2004) (118)
- When cache blocking of sparse matrix vector multiply works and why (2007) (116)
- Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication (2011) (114)
- Communication lower bounds and optimal algorithms for numerical linear algebra (2014) (112)
- Reducing BERT Pre-Training Time from 3 Days to 76 Minutes (2019) (109)
- The Componentwise Distance to the Nearest Singular Matrix (1992) (108)
- Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I (1993) (107)
- Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication (2013) (107)
- On computing Givens rotations reliably and efficiently (2002) (106)
- Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers (2008) (102)
- An American National Standard- IEEE Standard for Binary Floating-Point Arithmetic (1985) (102)
- On a Block Implementation of Hessenberg Multishift QR Iteration (1989) (101)
- Communication-Avoiding QR Decomposition for GPUs (2011) (100)
- Communication optimal parallel multiplication of sparse random matrices (2013) (98)
- Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions (2013) (96)
- Making Sparse Gaussian Elimination Scalable by Static Pivoting (1998) (92)
- Graph expansion and communication costs of fast matrix multiplication (2012) (90)
- The dimension of matrices (matrix pencils) with given Jordan (Kronecker) canonical forms (1995) (89)
- Computing the Generalized Singular Value Decomposition (1993) (89)
- Accurate and Efficient Floating Point Summation (2003) (88)
- The condition number of equivalence transformations that block diagonalize matrix pencils (1983) (87)
- Modeling the benefits of mixed data and task parallelism (1995) (84)
- ScaLAPACK: A Linear Algebra Library for Message-Passing Computers (1997) (82)
- Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication (2015) (80)
- Large-batch training for LSTM and beyond (2019) (79)
- CALU: A Communication Optimal LU Factorization Algorithm (2011) (78)
- Communication Avoiding Rank Revealing QR Factorization with Column Pivoting (2015) (78)
- Representations of positive polynomials on noncompact semialgebraic sets via KKT ideals (2007) (76)
- Fast Reproducible Floating-Point Summation (2013) (76)
- Underflow and the Reliability of Numerical Software (1984) (76)
- Accurate Singular Value Decompositions of Structured Matrices (1999) (75)
- Stability of block algorithms with fast level-3 BLAS (1992) (74)
- Accurate Floating Point Summation (2002) (74)
- The bidiagonal singular value decomposition and Hamiltonian mechanics: LAPACK working note No. 11 (1989) (74)
- Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply (2004) (71)
- Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures (2013) (70)
- Stability of block LU factorization (1995) (69)
- Communication Avoiding Gaussian elimination (2008) (68)
- Fast matrix multiplication is stable (2006) (68)
- Communication-optimal Parallel and Sequential Cholesky Decomposition (2009) (67)
- The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems (1987) (66)
- Sparse Gaussian Elimination on High Performance Computers (1996) (65)
- The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View (2008) (65)
- Scaling Deep Learning on GPU and Knights Landing clusters (2017) (65)
- New Numerical Techniques and Tools in SUGAR for 3D MEMS Simulation (2001) (64)
- Floating-Point Precision Tuning Using Blame Analysis (2016) (63)
- Using the Matrix Sign Function to Compute Invariant Subspaces (1998) (63)
- Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1 (2013) (62)
- On Floating Point Errors in Cholesky (1989) (62)
- Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds (2012) (61)
- Computing stable eigendecompositions of matrices (1986) (61)
- Parallel Symbolic Factorization for Sparse LU with Static Pivoting (2007) (58)
- Solving Sparse Linear Systems with Sparse Backward Error (2015) (58)
- Improving communication performance in dense linear algebra via topology aware collectives (2011) (57)
- The strong stability of algorithms for solving symmetric linear systems (1989) (57)
- Accurate solutions of ill-posed problems in control theory (1986) (57)
- Three methods for refining estimates of invariant subspaces (1987) (57)
- Improved error bounds for underdetermined system solvers (1993) (56)
- Balancing sparse matrices for computing eigenvalues (2000) (56)
- Sparse SOS Relaxations for Minimizing Functions that are Summations of Small Polynomials (2006) (56)
- LAPACK's user's guide (1992) (56)
- Parallel Reproducible Summation (2015) (55)
- Minimizing Communication in All-Pairs Shortest Paths (2013) (55)
- Minimizing Communication in Linear Algebra (2009) (54)
- Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods (2013) (54)
- LAPACK: a portable linear algebra library for supercomputers (1989) (54)
- Communication-Avoiding Parallel Strassen: Implementation and performance (2012) (53)
- A Scalable Sparse Direct Solver Using Static Pivoting (1999) (53)
- Accurate SVDs of weakly diagonally dominant M-matrices (2004) (53)
- Communication-avoiding parallel and sequential QR factorizations (2008) (53)
- Anchor loss simulation in resonators (2005) (51)
- 100-epoch ImageNet Training with AlexNet in 24 Minutes (2017) (51)
- Computing Connecting Orbits via an Improved Algorithm for Continuing Invariant Subspaces (2000) (51)
- Faster numerical algorithms via exception handling (1993) (51)
- The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers (1997) (50)
- Accurate and efficient expression evaluation and linear algebra (2007) (49)
- On computing accurate singular values and eigenvalues of matrices with acyclic graphs (1992) (49)
- Reconstructing Householder Vectors from Tall-Skinny QR (2014) (49)
- Unconstrained Energy Functionals for Electronic Structure Calculations (1998) (48)
- Cache efficient bidiagonalization using BLAS 2.5 operators (2008) (47)
- Avoiding Communication in Computing Krylov Subspaces (2007) (47)
- On computing condition numbers for the nonsymmetric eigenproblem (1993) (46)
- Addressing the needs of complex MEMS design (2002) (46)
- Trading Off Parallelism and Numerical Stability (1992) (46)
- Matrix factorizations at scale: A comparison of scientific data analytics in spark and C+MPI using three case studies (2016) (45)
- Statistical Models for Automatic Performance Tuning (2001) (45)
- Block LU factorization (1992) (44)
- LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version (2012) (43)
- Automatic Performance Tuning and Analysis of Sparse Triangular Solve (2002) (42)
- A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-Step Krylov Subspace Methods (2014) (42)
- CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems (2015) (41)
- On the correctness of some bisection-like parallel eigenvalue algorithms in floating point arithmetic. (1995) (40)
- Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies (2016) (39)
- Autotuning Sparse Matrix-Vector Multiplication for Multicore (2012) (39)
- Accurate SVDs of polynomial Vandermonde matrices involving orthonormal polynomials (2006) (38)
- Continuation of Invariant Subspaces in Large Bifurcation Problems (2008) (38)
- Global minimization of rational functions and the nearest GCDs (2006) (37)
- Communication-avoiding algorithms for linear algebra and beyond (2013) (37)
- 3D MEMS Simulation Modeling Using Modified Nodal Analysis (2001) (37)
- Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems (2009) (36)
- A counterexample for two conjectures about stability (1987) (36)
- Asynchronous Parallel Greedy Coordinate Descent (2016) (36)
- Algorithm 880: A testing infrastructure for symmetric tridiagonal eigensolvers (2008) (36)
- Write-Avoiding Algorithms (2016) (35)
- Using PHiPAC to speed error back-propagation learning (1997) (35)
- Diamagnetically Levitated MEMS Accelerometers (2007) (35)
- CoSA: Scheduling by Constrained Optimization for Spatial Accelerators (2021) (35)
- A Numerical Analyst's Jordan Canonical Form. (1983) (33)
- On computing accurate singular values and eigenvalues of acyclic matrices (1992) (32)
- Accurate and efficient evaluation of Schur and Jack functions (2005) (32)
- Numerical linear algebra (1993) (32)
- Computing Stable Eigendecompositions of Matrix Pencils (2015) (31)
- Practical experience in the numerical dangers of heterogeneous computing (1997) (30)
- Fast and Accurate Floating Point Summation with Application to Computational Geometry (2004) (30)
- Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations (2014) (30)
- Perfect Strong Scaling Using No Additional Energy (2013) (30)
- Prospectus for the Next LAPACK and ScaLAPACK Libraries (2006) (29)
- Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs (2019) (28)
- Efficient Reproducible Floating Point Summation and BLAS (2015) (28)
- Matrix Computations (Gene H. Golub And Charles F. van Loan) (1986) (27)
- Sugar: Advancements in a 3D Multi-domain Simulation Package for MEMS (2001) (27)
- Graph expansion and communication costs of fast matrix multiplication: regular submission (2011) (27)
- s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid (2014) (27)
- Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply (2004) (26)
- Practical Techniques for Measuring MEMS Properties (2004) (26)
- Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs (1997) (26)
- Basic Linear Algebra Subprograms (BLAS) (2011) (25)
- Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems (2018) (25)
- The geometry of ill-conditioning (1987) (23)
- Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication (2012) (23)
- The PHiPAC v1.0 Matrix-Multiply Distribution (1998) (23)
- On the Complexity of Computing Error Bounds (2001) (22)
- Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices (2007) (22)
- Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting (2018) (21)
- Minimizing Communication for Eigenproblems and the Singular Value Decomposition (2010) (21)
- Inverse Free Parallel Spectral Divide and Conquer Algorithms for (1994) (21)
- GPTune: multitask learning for autotuning exascale applications (2021) (21)
- Avoiding Communication in Two-Sided Krylov Subspace Methods (2011) (21)
- A Data Broker for Distributed Computing Environments (2001) (21)
- LAPACK Working Note #5 : Provisional Contents (1988) (20)
- Accurate SVDs of Structured Matrices (1998) (20)
- Computing small singular values of bidiagonal matrices with guaranteed high relative accuracy: LAPACK working note number 3 (1988) (20)
- Programming tools and environments (1998) (20)
- The Optimized Sparse Kernel Interface (OSKI) Library User's Guide for Version 1.0.1h (2007) (20)
- Stably Computing the Kronecker Structure and Reducing Subspaces of Singular Pencils A-λB for Uncertain Data (1986) (20)
- Electro micro-metrology (2005) (20)
- Communication-Optimal Convolutional Neural Nets (2018) (20)
- Communication costs of Strassen's matrix multiplication (2014) (19)
- Communication-Avoiding Symmetric-Indefinite Factorization (2014) (19)
- Implementing Communication-Optimal Parallel and Sequential QR Factorizations (2008) (19)
- The inherent inaccuracy of implicit tridiagonal QR (1992) (18)
- Numerical Reproducibility and Accuracy at ExaScale (2013) (18)
- Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (2000) (18)
- Avoiding Communication in Successive Band Reduction (2015) (18)
- Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout (2013) (18)
- TORCH Computational Reference Kernels - A Testbed for Computer Science Research (2010) (18)
- The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers (1995) (18)
- Model Reduction for RF MEMS Simulation (2004) (18)
- Minimum Ellipsoid Bounds for Solutions of Polynomial Systems via Sum of Squares (2004) (17)
- Percu: a holistic method for evaluating high performance computing systems (2008) (17)
- Preconditioning sparse matrices for computing eigenvalues and solving linear systems of equations (2001) (17)
- Exploiting Data Sparsity in Parallel Matrix Powers Computations (2013) (17)
- Memory Hierarchy Optimizations and Performance Bounds for Sparse A (2003) (17)
- Simulation tools for damping in high frequency resonators (2005) (17)
- Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply (1985) (17)
- Execution time of symmetric eigensolvers (1997) (17)
- LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers (2005) (17)
- Instrumenting Linear Algebra Energy Consumption via On-chip Energy Counters (2012) (17)
- ImageNet Training in 24 Minutes (2017) (17)
- Memory Hierarchy Optimizations and Performance Bounds for Sparse $A^T A x$ (2003) (16)
- Accuracy of the s-Step Lanczos Method for the Symmetric Eigenproblem in Finite Precision (2015) (16)
- A Fast and Stable Nonsymmetric Eigensolver for Certain Structured Matrices (2005) (16)
- Practical Experience in the Dangers of Heterogeneous Computing (1996) (16)
- Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum (2001) (16)
- Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures (2013) (15)
- A preliminary analysis of Cyclops Tensor Framework (2012) (15)
- Prospectus for the Development of a Linear Algebra Library for High-Performance Computers (1997) (15)
- Communication avoiding algorithms (2012) (14)
- Matrix Computations; Second Edition (Gene Golub and Charles F. Van Loan) (1990) (14)
- LAPACK Working Note 70: On the Correctness of Parallel Bisection in Floating Point (1994) (14)
- Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations (2016) (14)
- Simple Fabrication Process for Self-Aligned, High-Performance Microscanners— Demonstrated Use to Generate a 2-D Ablation Pattern (2007) (13)
- Toward accurate polynomial evaluation in rounded arithmetic (2005) (13)
- Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines (2017) (13)
- Brief announcement: communication bounds for heterogeneous architectures (2011) (13)
- Open problems in numerical linear algebra (1992) (13)
- A Project for Developing a Linear Algebra Library for High-Performance Computers (1989) (13)
- LAPACK Working Note 9: A test matrix generation suite (1989) (12)
- Avoiding communication in primal and dual block coordinate descent methods (2016) (12)
- Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem (2010) (12)
- Bifurcation Analysis of Large Equilibrium Systems in Matlab (2005) (12)
- Installation Guide for ScaLAPACK (1992) (12)
- Common Issues (2000) (12)
- Effects of Underflow on Solving Linear Systems (1983) (11)
- A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem (2016) (11)
- 3. Linear Least Squares Problems (1997) (11)
- Communication avoiding successive band reduction (2012) (11)
- LAPACK Working Note 39: On Designing Portable High Performance Numerical Libraries (1991) (11)
- On Holder-Brascamp-Lieb inequalities for torsion-free discrete Abelian groups (2015) (11)
- An Efficient Deflation Technique for the Communication-Avoiding Conjugate Gradient Method (2014) (11)
- Matrix Multiplication Algorithm Selection with Support Vector Machines (2015) (10)
- Communication-optimal parallel and sequential Cholesky decomposition: extended abstract (2009) (10)
- Templates for Linear Algebra Problems (1995) (10)
- On error analysis in arithmetic with varying relative precision (2018) (10)
- Numerical evaluation of the Communication-Avoiding Lanczos algorithm (2012) (9)
- Augmented Arithmetic Operations Proposed for IEEE-754 2018 (2018) (9)
- Tradeoffs between synchronization, communication, and work in parallel linear algebra computations (2014) (9)
- Performance of a Parallel Global Atmospheric Chemical Tracer Model (1995) (9)
- Nonnegative Diagonals and High Performance on Low-Profile Matrices from Householder QR (2009) (9)
- LAPACK Working Note 88: Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems (1994) (9)
- Parallel Symbolic Factorization for Sparse LU Factorization with Static Pivoting (2005) (9)
- Communication-Avoiding Krylov Techniques on Graphic Processing Units (2013) (8)
- FRPA: A Framework for Recursive Parallel Algorithms (2015) (8)
- Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology (2014) (8)
- Extending access to HPC skills through a blended online course (2015) (8)
- Parallelepipeds obtaining HBL lower bounds (2016) (8)
- Statistical Modeling of Feedback Data in an Automatic Tuning System (2000) (8)
- ST-HEC: Reliable and Scalable Software for Linear Algebra Computations on High End Computers (8)
- Structured and parameter-dependent eigensolvers for simulation-based design of resonant mems (2006) (8)
- Fast, MEMS-Based, Phase-Shifting Interferometer (2006) (8)
- LAPACK: A Linear Algebra Library for High-Performance Computers (1992) (8)
- Communication Lower Bounds for Tensor Contraction Algorithms (2015) (7)
- On structured singular values (2018) (7)
- Contracting Symmetric Tensors Using Fewer Multiplications (2015) (7)
- Fast, MEMS-Based, Phase-Shifting Interferometer (2006) (7)
- CA-SVM: Communication-Avoiding Support Vector Machines on Clusters (2016) (7)
- LAPACK for Distributed Memory Architectures: The Next Generation (1993) (7)
- LAPACK Working Note 86: The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers (1994) (7)
- The Limit of the Batch Size (2020) (7)
- Steve Smale and the Geometry of Ill-Conditioning (1993) (7)
- 2. Iterative Methods (1994) (7)
- Reproducible Tall-Skinny QR (2015) (7)
- Avoiding Communication in Proximal Methods for Convex Optimization Problems (2017) (7)
- Accurate and efficient computations with structured matrices (2002) (6)
- Algorithms for Efficient Reproducible Floating Point Summation (2020) (6)
- Chapter 9 Communication Avoiding (CA) and Other Innovative Algorithms (2013) (6)
- The dangers of heterogeneous network computing: heterogeneous networks considered harmful (1996) (6)
- Multitask and Transfer Learning for Autotuning Exascale Applications (2019) (6)
- The Complexity of Accurate Floating Point Computation (2003) (6)
- A Testing Infrastructure for LAPACK's Symmetric Eigensolvers (2007) (6)
- LAPACK User's Guide / E. Anderson ... (1999) (6)
- Automatic Performance Tuning for the Multi-section with Multiple Eigenvalues Method for Symmetric Tridiagonal Eigenproblems (2006) (6)
- Runtime Data Layout Scheduling for Machine Learning Dataset (2017) (6)
- A Principled Kernel Testbed for Hardware/Software Co-Design Research (2010) (6)
- Shape Optimization of Transfer Functions (2004) (6)
- Rethinking the Value of Asynchronous Solvers for Distributed Deep Learning (2020) (5)
- LAPACK Working Note 26: Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library for High-Performance Computers (1990) (5)
- A simple process to fabricate self-aligned, high-performance torsional microscanners; demonstrated use in a two-dimensional scanner (2005) (5)
- Sequential Communication Bounds for Fast Linear Algebra (2012) (5)
- Accurate and Efficient Algorithms for Floating Point Computation (2003) (5)
- Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds (2020) (5)
- A Generalized Randomized Rank-Revealing Factorization (2019) (4)
- Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem (2020) (4)
- Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach (2012) (4)
- A 3D Parallel Algorithm for QR Decomposition (2018) (4)
- Matrix Multiplication on Multidimensional Torus Networks (2012) (4)
- MEMS Process Characterization with an on-Chip Device (2006) (4)
- Non-Hermitian Eigenvalue Problems (2000) (4)
- An arithmetic complexity lower bound for computing rational functions, with applications to linear algebra (2013) (4)
- Error Analysis of the S-Step Lanczos Method in Finite Precision (2014) (4)
- Fast Bilinear Algorithms for Symmetric Tensor Contractions (2020) (4)
- Non-smooth Bayesian Optimization in Tuning Problems (2021) (4)
- Speeding up ImageNet Training on Supercomputers (2018) (4)
- Accelerating Time-To-Solution for Computational Science and Engineering (2009) (4)
- Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization (2017) (4)
- Accurate floating point summation (4)
- Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour (2020) (3)
- Communication-optimal iterative methods (2009) (3)
- An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations (2019) (3)
- ImageNet Training by CPU: AlexNet in 11 Minutes and ResNet-50 in 48 Minutes (2017) (3)
- LAPACK Working Note 93: Installation Guide for ScaLAPACK (1995) (3)
- Automatic Performance Tuning for the Multi-section with Multiple Eigenvalues Method for the Symmetric Eigenproblem (2006) (3)
- Architecting an autograder for parallel code (2014) (3)
- Programming Tools (1998) (3)
- A Kernel Testbed for Parallel Architecture, Language, and Performance Research (2010) (3)
- The Castle Project (2000) (3)
- Fast LSTM by dynamic decomposition on cloud and distributed systems (2020) (3)
- The Parallel Computing Laboratory at U.C. (2008) (3)
- LAPACK Working Note 60, UT CS-93-192: Parallel numerical linear algebra (1997) (3)
- LAPACK Working Note 23: Improved Error Bounds for Underdetermined System Solvers (1990) (3)
- Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions (2017) (3)
- LAPACK Working Note 53: Trading Off Parallelism and Numerical Stability (1992) (3)
- Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems (2018) (2)
- Network Topologies and Inevitable Contention (2016) (2)
- Prospectus for a Dense Linear Algebra Software Library (2007) (2)
- Gram-Schmidt Adaptive Algorithms. (1980) (2)
- 4. Related Issues (1994) (2)
- DEGAS: Dynamic Exascale Global Address Space Programming Environments (2018) (2)
- Towards Optimal Petascale Simulations (2013) (2)
- Auto-Precision Scaling for Distributed Deep Learning (2019) (2)
- A New Algorithm for the Symmetric Tridiagonal Eigenvalue Problem (1993) (2)
- Contention Bounds for Combinations of Computation Graphs and Network Topologies. (2014) (2)
- Analysis of the Finite Precision s-Step Biconjugate Gradient Method (2014) (2)
- Parallel and Communication Avoiding Least Angle Regression (2019) (2)
- LAPACK Working Note 112: Practical Experience in the Dangers of Heterogeneous Computing (1996) (2)
- Representation of Non-Negative Polynomials via the KKT Ideals (2010) (2)
- Techniques for the automatic debugging of scientific floating-point programs (2010) (2)
- An interface for a self-optimizing sparse matrix kernel library (2005) (2)
- Providing a supported online course on parallel computing (2013) (2)
- Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes (2020) (1)
- Enhancing Autotuning Capability with a History Database (2021) (1)
- Enhancing Scalability of Sparse Direct Methods (2007) (1)
- On the Conditioning of the Nonsymmetric Eigenproblem: Theory and Software (2015) (1)
- LAPACK Working Note 103: A Supernodal Approach to Sparse Partial Pivoting (1995) (1)
- On Parallel Numerical Software Libraries (1997) (1)
- Vision-based teleoperation of a stroboscopic microscopic interferometric system for remote dynamic MEMS testing (2005) (1)
- The Condition Number of Similarities that Diagonalize Matrices (1983) (1)
- Reconstructing Householder Vectors from Tall-Skinny QR (2013) (1)
- LAPACK Working Note 91: The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers (1995) (1)
- Graph Expansion and Communication Costs of Algorithms (2010) (1)
- Bounds for Heterogeneous Architectures (2011) (1)
- CA-SVM : Communication-Avoiding Parallel Support Vector Machines on Distributed Systems (2015) (1)
- A Brief Tour of Eigenproblems (2000) (1)
- The SVD, Eigenproblem, and Invariant Subspaces: Algorithms (2010) (1)
- Communication-avoiding kernel ridge regression on parallel and distributed systems (2021) (1)
- LAPACK Working Note 14: On Floating Point Errors in Cholesky (1)
- Dynamic scaling for low-precision learning (2021) (1)
- Implementing a Collaborative Online Course to Extend Access to HPC Skills (2016) (1)
- Distributed-Memory Sparse Kernels for Machine Learning (2022) (1)
- Minimizing Polynomials Over Semialgebraic Sets (2005) (1)
- An efficient algorithm for locating and continuing connecting orbits (1997) (1)
- 2. Linear Equation Solving (1997) (1)
- Conference: Three Decades of Numerical Linear Algebra at Berkeley (1993) (1)
- Toward accurate polynomial evaluation in rounded arithmetic (short report) (2005) (1)
- Communication bounds for convolutional neural networks (2022) (1)
- Non-Negative Diagonals and High Performance on Low-Profile (2008) (1)
- Fast LSTM Inference by Dynamic Decomposition on Cloud Systems (2019) (1)
- Singular Value Decomposition (2000) (1)
- A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of Communication-Avoiding Krylov Subspace Methods (2012) (1)
- Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures (2013) (1)
- Reconstructing Householder Vectors from Tall-Skinny QR (2013) (1)
- Computing the Singular Value Decomposition with High Relative Accuracy, LAPACK Working Note 119, CS-97-348 (1997) (1)
- State Space Search (2011) (1)
- 7. Iterative Methods for Eigenvalue Problems (1997) (1)
- Dynamical Aspects of the Bidiagonal Singular Value Decomposition (1991) (1)
- Performance Tuning of Matrix Triple Products Based on Matrix Structure (2004) (1)
- Accurate and efficient expression evaluation and linear algebra, or why it can be easier to compute accurate eigenvalues of a Vandermonde matrix than the accurate sum of 3 numbers (2012) (1)
- GPTune (2021) (1)
- Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies (2016) (0)
- An arithmetic complexity lower bound for computing rational functions, with applications to structured and sparse linear algebra (2018) (0)
- Bounds for Heterogeneous Architectures (Regular Submission) (2011) (0)
- Communication-Avoiding Optimization of Geometric Multigrid on GPUs (2012) (0)
- 3. Performance of LAPACK (1999) (0)
- 5. Documentation and Software Conventions (1999) (0)
- Communication Avoiding Symmetric Band Reduction (2012) (0)
- Rethinking algorithms for future architectures: Communication-avoiding algorithms (2011) (0)
- Making sparse matrix computations scalable (invited talk abstract) (1999) (0)
- Performance of the Symmetric Eigenproblem (2013) (0)
- An interval algorithm for solving systems of linear equations to prespecified accuracy (1985) (0)
- Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition (2022) (0)
- Edward Beltrami, Mathematical Models for Society and Biology, Academic (0)
- Matrix Computations and Scientific Computing Seminar Organizer: (2017) (0)
- Toolboxes and Templates for Large Scale Linear Algebra Problems (2002) (0)
- 6. Accuracy and Stability (1997) (0)
- 6. Installing LAPACK Routines (1999) (0)
- Optimizations & Bounds for Sparse Symmetric Matrix-Vector Multiply (2004) (0)
- Talk Abstracts Jack (2021) (0)
- Effect on Run-time Auto-tuning for the Multi-section with Multiple Eigenvalues Method (2006) (0)
- Avoiding Gaussian Elimination (2008) (0)
- Program Correctness, Verification and Testing for Exascale (Corvette) (2018) (0)
- 3. Contents of ScaLAPACK (1997) (0)
- 4. Nonsymmetric Eigenvalue Problems (1997) (0)
- 2. Getting Started with ScaLAPACK (1997) (0)
- Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition (2023) (0)
- REVIEWS AND DESCRIPTIONS OF TABLES AND BOOKS (1990) (0)
- Minimizing Polynomials Over Semialgebraic Sets (2005) (0)
- Server Farm (2011) (0)
- An Interval Algorithm for Solving Systems of Linear Equations to (1983) (0)
- Computing Accurate Eigensystems of Scaled Diagonally Dominant Matrices (1980) (0)
- 5. The Symmetric Eigenproblem and Singular Value Decomposition (1997) (0)
- Final Report for UC Berkeley Terascale Optimal PDE Solvers TOPS DOE Award Number DE-FC02-01ER25478 9/15/2001 – 9/14/2006 (2007) (0)
- Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software (2023) (0)
- Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms (2008) (0)
- 7 Acknowledgements (2007) (0)
- Lower Bounds for Tensor Contraction Algorithms (2015) (0)
- LAPACK Working Note UT CS Parallel numerical linear algebra (1997) (0)
- On Statistical Models in Automatic Tuning (2007) (0)
- 4. Accuracy and Stability (1999) (0)
- 5. Remaining Topics (1994) (0)
- Final Report from The University of Texas at Austin for DEGAS: Dynamic Global Address Space programming environments (2018) (0)
- 2016 Dense Linear Algebra Software Packages Survey (2016) (0)
- LAPACK Working Note 46: Computing the Generalized Singular Value Decomposition (1992) (0)
- Automatic Performance Tuning of Sparse Matrix-Multiple Vector Multiply (2002) (0)
- 5. Performance of ScaLAPACK (1997) (0)
- Supernode Partitioning (2011) (0)
- LAPACK Working Note 93: Installation Guide for ScaLAPACK (VERSION 1.0) (1995) (0)
- Avoiding Communication in Numerical Linear Algebra (2011) (0)
- Communication on networks of finite automata: three instances of wormhole routing (1998) (0)
- 4. Data Distributions and Software Conventions (1997) (0)
- Reconstructing Householder vectors from TSQR (2014) (0)
- Number 7 (2017) (0)
- Reproducible Parallel Floating-Point Computations (2012) (0)
- Accurate Solution to Elliptic Eigenvalue Problems using Finite Elements (2005) (0)
- Avoiding Communication in Logistic Regression (2020) (0)
- Proposed Consistent Exception Handling for the BLAS and LAPACK (2022) (0)
- 2. Contents of LAPACK95 (1999) (0)
- Nearly Optimal Block-Jacobi Preconditioning (2023) (0)
- High Productivity Computing Systems (HPCS) Library Study Effort (2008) (0)
- 83% ImageNet Accuracy in One Hour (2020) (0)
- Computation of the Singular Value Decomposition with Applications to Least Squares Problems (1997) (0)
- Theory and Numerics of Matrix Eigenvalue Problems (0)
- Performance Optimizations and Bounds for Sparse Matrix Kernels (2002) (0)
- 6. Iterative Methods for Linear Systems (1997) (0)
- Avoiding Symmetric Band Reduction (2012) (0)
- Special-Purpose Machines (2011) (0)
What Schools Are Affiliated With James Demmel?
James Demmel is affiliated with the following schools: