# Jack Dongarra

American computer scientist

## Jack Dongarra's Degrees

- Bachelor's, Mathematics, Chicago State University

## Why Is Jack Dongarra Influential?

According to Wikipedia, Jack Joseph Dongarra is an American computer scientist and mathematician. He is a University Distinguished Professor of Computer Science in the Electrical Engineering and Computer Science Department at the University of Tennessee. He is also a Distinguished Research Staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory, holds a Turing Fellowship in the School of Mathematics at the University of Manchester, and is an adjunct professor in the Computer Science Department at Rice University. He served as a faculty fellow at the Texas A&M University Institute for Advanced Study. Dongarra is the founding director of the Innovative Computing Laboratory at the University of Tennessee. He received the Turing Award in 2021.

## Jack Dongarra's Published Works

- MPI: The Complete Reference (1996) (2781)
- Accepted for publication (1999) (2354)
- PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing (1995) (2272)
- A set of level 3 basic linear algebra subprograms (1990) (2024)
- Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation (2004) (1598)
- Templates for the Solution of Algebraic Eigenvalue Problems (2000) (1586)
- LAPACK Users' Guide, Third Edition (1999) (1243)
- Automated empirical optimizations of software and the ATLAS project (2001) (1213)
- Automatically Tuned Linear Algebra Software (1998) (1174)
- An extended set of FORTRAN basic linear algebra subprograms (1988) (1080)
- LINPACK Users' Guide (1987) (1007)
- ScaLAPACK Users' Guide (1987) (914)
- Matrix Eigensystem Routines — EISPACK Guide (1974) (876)
- PVM 3 user's guide and reference manual (1993) (830)
- The LINPACK Benchmark: past, present and future (2003) (821)
- Top500 Supercomputer Sites (1997) (775)
- The International Exascale Software Project roadmap (2011) (735)
- A Portable Programming Interface for Performance Evaluation on Modern Processors (2000) (725)
- An updated set of basic linear algebra subprograms (BLAS) (2002) (719)
- Performance of various computers using standard linear equations software (1990) (688)
- LAPACK Users' Guide, 3rd ed. (1999) (669)
- A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures (2007) (553)
- Solving linear systems on vector and shared memory computers (1990) (535)
- Numerical Linear Algebra for High-Performance Computers (1998) (511)
- MPI - The Complete Reference: Volume 1, The MPI Core (1998) (491)
- A User's Guide to PVM Parallel Virtual Machine (1991) (485)
- ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance (1995) (466)
- Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects (2009) (460)
- NetSolve: A Network Server for Solving Computational Science Problems (1996) (447)
- Special Issue on Program Generation, Optimization, and Platform Adaptation (2005) (436)
- Towards dense linear algebra for hybrid GPU accelerated manycore systems (2009) (436)
- Matrix Eigensystem Routines — EISPACK Guide Extension (1977) (427)
- Exascale computing and big data (2015) (413)
- ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers (1992) (407)
- Sourcebook of parallel computing (2003) (401)
- Performance of various computers using standard linear equations software (1990) (384)
- The GrADS Project: Software Support for High-Level Grid Application Development (2001) (377)
- The PVM Concurrent Computing System: Evolution, Experiences, and Trends (1994) (375)
- FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World (2000) (370)
- Netsolve: a Network-Enabled Server for Solving Computational Science Problems (1997) (365)
- DAGuE: A Generic Distributed DAG Engine for High Performance Computing (2011) (358)
- From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming (2012) (357)
- Collecting Performance Data with PAPI-C (2009) (331)
- Distributed and Cloud Computing: From Parallel Processing to the Internet of Things (2011) (327)
- The HPC Challenge (HPCC) benchmark suite (2006) (322)
- Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs (1990) (309)
- Performance analysis of MPI collective operations (2005) (298)
- New Grid Scheduling and Rescheduling Methods in the GrADS Project (2004) (298)
- Guest Editors Introduction to the top 10 algorithms (2000) (298)
- ScaLAPACK user's guide (1997) (293)
- Matrix Market: a web resource for test matrix collections (1996) (281)
- Overview of GridRPC: A Remote Procedure Call API for Grid Computing (2002) (269)
- Introduction to the HPC Challenge Benchmark Suite (2005) (266)
- A fully parallel algorithm for the symmetric eigenvalue problem (1985) (263)
- Dense linear algebra solvers for multicore with GPU accelerators (2010) (259)
- Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine (1984) (258)
- Chebyshev tau-QZ algorithm methods for calculating spectra of hydrodynamic stability problems (1995) (258)
- Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines (1994) (254)
- A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters (2000) (252)
- Toward a New Metric for Ranking High Performance Computing Systems (2013) (239)
- Self-Adapting Linear Algebra Algorithms and Software (2005) (234)
- Condition Numbers of Gaussian Random Matrices (2005) (227)
- Computational Science — ICCS 2003 (2003) (226)
- Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs (1988) (210)
- LAPACK Users' guide (third ed.) (1999) (210)
- PaRSEC: Exploiting Heterogeneity to Enhance Scalability (2013) (207)
- Distribution of mathematical software via electronic mail (1985) (207)
- Computational Science - ICCS 2007, 7th International Conference, Beijing, China, May 27 - 30, 2007, Proceedings, Part III (2007) (205)
- Algorithm-based fault tolerance applied to high performance computing (2009) (199)
- Computational Science - ICCS 2005, 5th International Conference, Atlanta, GA, USA, May 22-25, 2005, Proceedings, Part I (2005) (199)
- Integrated PVM Framework Supports Heterogeneous Network Computing (1993) (198)
- Hierarchical Data Format (2011) (197)
- A Proposal for a Set of Parallel Basic Linear Algebra Subprograms (1995) (196)
- Performance of various computers using standard linear equations software in a Fortran environment (1987) (193)
- The LINPACK Benchmark: An Explanation (1988) (193)
- Parallel tiled QR factorization for multicore architectures (2007) (192)
- Accelerating scientific computations with mixed precision algorithms (2008) (189)
- Automatic Blocking of Nested Loops (1990) (186)
- A Note on Auto-tuning GEMM for GPUs (2009) (185)
- Computational Science — ICCS 2002 (2002) (184)
- Computational Science: Ensuring America's Competitiveness (2005) (183)
- An Improved MAGMA GEMM for Fermi Graphics Processing Units (2010) (183)
- Automatically Tuned Collective Communications (2000) (178)
- Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers (1993) (177)
- Distribution of mathematical software via electronic mail (1987) (164)
- Post-failure recovery of MPI communication capability (2013) (163)
- Experiments with Scheduling Using Simulated Annealing in a Grid Environment (2002) (161)
- A message passing standard for MPP and workstations (1996) (161)
- Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems) (2006) (159)
- Squeezing the most out of an algorithm in CRAY FORTRAN (1984) (158)
- A Test Matrix Collection for Non-Hermitian Eigenvalue Problems (1997) (154)
- Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems (2007) (150)
- Block reduction of matrices to condensed forms for eigenvalue computations (1990) (148)
- Software Libraries for Linear Algebra Computations on High Performance Computers (1995) (145)
- QUARK Users' Guide: QUeueing And Runtime for Kernels (2011) (145)
- Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems (2007) (144)
- Self adaptivity in Grid computing (2005) (144)
- Visualization and debugging in a heterogeneous environment (1993) (143)
- Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers (2018) (142)
- Graphical development tools for network-based concurrent supercomputing (1991) (142)
- An evaluation of User-Level Failure Mitigation support in MPI (2012) (141)
- Toward a framework for preparing and executing adaptive grid programs (2002) (139)
- Algorithm-based fault tolerance for dense matrix factorizations (2012) (138)
- Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA (2011) (138)
- The Impact of Multicore on Math Software (2006) (136)
- PVM: A Users' Guide and Tutorial for Network Parallel Computing (1994) (136)
- Unrolling loops in fortran (1979) (130)
- Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization (2008) (130)
- A proposal for a set of level 3 basic linear algebra subprograms (1987) (129)
- Users' Guide to NetSolve v1.4.1 (2002) (127)
- Algorithm-Based Fault Tolerance for Fail-Stop Failures (2008) (122)
- Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems (2009) (121)
- Fault tolerant high performance computing by a coding approach (2005) (120)
- Using PAPI for Hardware Performance Monitoring on Linux Systems (2001) (119)
- Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community (2011) (118)
- QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators (2011) (117)
- Applied Mathematics Research for Exascale Computing (2014) (115)
- A performance oriented migration framework for the grid (2003) (114)
- An Introduction to the MPI Standard (1995) (114)
- A Proposal for a User-Level, Message-Passing Interface in a Distributed Memory Environment (1993) (112)
- Scheduling workflow applications on processors with different capabilities (2006) (109)
- Vectorizing compilers: a test suite and results (1988) (109)
- A metascheduler for the Grid (2002) (107)
- SRS: A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems (2003) (106)
- Message-Passing Performance of Various Computers (1995) (106)
- Preface: Basic Linear Algebra Subprograms Technical (Blast) Forum Standard (2002) (105)
- HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems (2015) (105)
- Scheduling dense linear algebra operations on multicore processors (2010) (104)
- Handbook of Research on Scalable Computing Technologies (2009) (103)
- Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy (2008) (102)
- A look at scalable dense linear algebra libraries (1992) (102)
- HARNESS and fault tolerant MPI (2001) (101)
- DOE Advanced Scientific Computing Advisory Subcommittee (ASCAC) Report: Top Ten Exascale Research Challenges (2014) (101)
- High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems (2016) (101)
- Report on the Sunway TaihuLight System (2016) (101)
- SCHEDULE: Tools for developing and analyzing parallel Fortran programs (1986) (100)
- On some parallel banded system solvers (1984) (100)
- Performance, Design, and Autotuning of Batched GEMM for GPUs (2016) (99)
- Autotuning GEMM Kernels for the Fermi GPU (2012) (97)
- Improving the Accuracy of Computed Eigenvalues and Eigenvectors (1983) (97)
- The Impact of Multicore on Computational Science Software (2007) (96)
- Accelerating Numerical Dense Linear Algebra Calculations with GPUs (2014) (96)
- LAPACK Working Note 94: A User's Guide to the BLACS v1.0 (1995) (94)
- Visual programming and debugging for parallel computing (1995) (94)
- Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems (2012) (94)
- LAPACK++: A design overview of object-oriented extensions for high performance linear algebra (1993) (93)
- Automatically Tuned Linear Algebra Software (ATLAS) (2011) (93)
- Scheduling Block-Cyclic Array Redistribution (1998) (93)
- HARNESS: a next generation distributed virtual machine (1999) (91)
- Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries (2001) (91)
- Vector and Parallel Processing — VECPAR 2000 (2001) (90)
- Review of Performance Analysis Tools for MPI Parallel Programs (2001) (89)
- A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures (1999) (89)
- Performance of various computers using standard linear equations software in a FORTRAN environment (1988) (87)
- Computer benchmarking: paths and pitfalls (1987) (87)
- Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources (2006) (87)
- Comparative study of one-sided factorizations with multiple software packages on multi-core hardware (2009) (86)
- Applying NetSolve's network-enabled server (1998) (86)
- Towards Efficient MapReduce Using MPI (2009) (85)
- Experiences and lessons learned with a portable interface to hardware performance counters (2003) (85)
- High-performance computing: clusters, constellations, MPPs, and future directions (2003) (84)
- Two Dimensional Basic Linear Algebra Communication Subprograms (1993) (83)
- The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community (2009) (83)
- ScaLAPACK: A Linear Algebra Library for Message-Passing Computers (1997) (82)
- Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs (2010) (82)
- Implementation of mixed precision in solving systems of linear equations on the Cell processor (2007) (81)
- Standards for graph algorithm primitives (2014) (80)
- CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems (2009) (79)
- Parallel matrix transpose algorithms on distributed memory concurrent computers (1993) (78)
- Recovery Patterns for Iterative Methods in a Parallel Unstable Environment (2007) (78)
- LAPACK Working Note No. 2: Block reduction of matrices to condensed forms for eigenvalue computations (1987) (77)
- Unified model for assessing checkpointing protocols at extreme‐scale (2014) (77)
- Algorithm-based diskless checkpointing for fault tolerant matrix operations (1995) (77)
- Scalability Issues Affecting the Design of a Dense Linear Algebra Library (1994) (77)
- Scientific Computing with Multicore and Accelerators (2010) (76)
- Numerical Libraries and the Grid (2001) (75)
- A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors (1990) (74)
- Iterative Sparse Triangular Solves for Preconditioning (2015) (74)
- Adaptive Scheduling for Task Farming with Grid Middleware (1999) (74)
- Self-Adapting Numerical Software for Next Generation Applications (2003) (74)
- Redesigning the message logging model for high performance (2010) (73)
- Big data and extreme-scale computing (2018) (73)
- Introduction to the HPCChallenge Benchmark Suite (2004) (73)
- SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3 (2007) (73)
- Reduction to condensed form for the eigenvalue problem on distributed memory architectures (1992) (73)
- Matrix Eigensystem Routines - EISPACK Guide, Second Edition (1976) (72)
- The Design and Implementation of the Parallel Out-of-core ScaLAPACK LU, QR, and Cholesky Factorization Routines (1997) (71)
- A proposal for an extended set of Fortran Basic Linear Algebra Subprograms (1985) (70)
- NetSolve: Grid enabling scientific computing environments (2004) (70)
- Self-adapting software for numerical linear algebra and LAPACK for clusters (2003) (69)
- Computational Science — ICCS 2001 (2001) (69)
- Accelerating GPU Kernels for Dense Linear Algebra (2010) (69)
- Algorithmic Redistribution Methods for Block-Cyclic Decompositions (1999) (68)
- Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead (2006) (68)
- Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing (1997) (68)
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators (2010) (67)
- Linear algebra on high performance computers (1986) (67)
- Scheduling in the Grid application development software project (2004) (66)
- Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing (2010) (66)
- Developing numerical libraries in Java (1998) (66)
- Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor (2009) (66)
- Solving banded systems on a parallel processor (1987) (65)
- Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels (2011) (65)
- Basic Linear Algebra Subprograms Technical (Blast) Forum Standard (1) (2002) (64)
- NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server (2003) (64)
- Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems (2004) (63)
- A comparison of search heuristics for empirical code optimization (2008) (63)
- Some issues in dense linear algebra for multicore and special purpose architectures (2008) (63)
- Recent Advances in the Message Passing Interface - 17th European MPI Users' Group Meeting, EuroMPI 2010, Stuttgart, Germany, September 12-15, 2010. Proceedings (2010) (63)
- PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability (2013) (63)
- A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures (2002) (62)
- Performance of various computers using standard linear equations software in a Fortran environment (1983) (62)
- Computational Science – ICCS 2009: 9th International Conference Baton Rouge, LA, USA, May 25-27, 2009 Proceedings, Part I (2009) (62)
- End-user Tools for Application Performance Analysis Using Hardware Counters (2001) (62)
- The PlayStation 3 for High-Performance Scientific Computing (2008) (61)
- Generalized QR factorization and its applications (1992) (61)
- Scalable Networked Information Processing Environment (SNIPE) (1997) (61)
- A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU (2014) (61)
- Autotuning in High-Performance Computing Applications (2018) (60)
- LU factorization for accelerator-based systems (2011) (60)
- Implementation of some concurrent algorithms for matrix factorization (1986) (59)
- Batched matrix computations on hardware accelerators based on GPUs (2015) (59)
- Evaluating Block Algorithm Variants in LAPACK (1989) (58)
- An algebra for cross-experiment performance analysis (2004) (58)
- Numerical Libraries And The Grid: The GrADS Experiments With ScaLAPACK (2001) (58)
- The Netlib Mathematical Software Repository (1995) (57)
- Hierarchical DAG Scheduling for Hybrid Distributed Systems (2015) (57)
- Key Concepts for Parallel Out-of-Core LU Factorization (1996) (57)
- The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems (2017) (57)
- HPC Challenge Benchmark (2011) (57)
- Robust task scheduling in non-deterministic heterogeneous computing systems (2006) (56)
- LAPACK's user's guide (1992) (56)
- Optimizing symmetric dense matrix-vector multiplication on GPUs (2011) (56)
- High Performance Computing for Computational Science (2003) (56)
- Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing (2005) (55)
- Recent trends in the marketplace of high performance computing (2005) (55)
- High-performance computing systems: Status and outlook (2012) (55)
- Investigating half precision arithmetic to accelerate dense linear system solvers (2017) (55)
- A comparative study of automatic vectorizing compilers (1991) (54)
- High-Performance Tensor Contractions for GPUs (2016) (53)
- Request Sequencing: Optimizing Communication for the Grid (2000) (53)
- MPI collective algorithm selection and quadtree encoding (2006) (53)
- Environments and Tools for Parallel Scientific Computing (1993) (53)
- Innovations of the NetSolve Grid Computing System (2002) (53)
- Building and Using a Fault-Tolerant MPI Implementation (2004) (52)
- LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU (2014) (52)
- HeNCE: graphical development tools for network-based concurrent computing (1992) (52)
- The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale (2018) (52)
- An Improved MAGMA GEMM for Fermi GPUs (2010) (52)
- Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures (2011) (52)
- High-Performance Matrix-Matrix Multiplications of Very Small Matrices (2016) (51)
- TOP500 Supercomputer sites 11/2000 (2000) (51)
- High Performance Heterogeneous Computing (2009) (51)
- Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures (2012) (51)
- Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs (2015) (50)
- Fully Dynamic Scheduler for Numerical Computing on Multicore Processors (2009) (50)
- Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface (1997) (50)
- The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers (1997) (50)
- The design of scalable software libraries for distributed memory concurrent computers (1993) (50)
- A scalable framework for heterogeneous GPU-based clusters (2012) (49)
- Dynamic task discovery in PaRSEC: a data-flow task-based runtime (2017) (49)
- PVM: Experiences, current status and future direction (1993) (49)
- Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting (2014) (49)
- Improving the Performance of CA-GMRES on Multicores with Multiple GPUs (2014) (48)
- HPCG Technical Specification (2013) (48)
- A Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper-Hessenberg Form (1994) (48)
- Multiprocessing linear algebra algorithms on the CRAY X-MP-2: Experiences with small granularity (1984) (48)
- The marketplace of high-performance computing (1999) (47)
- EZTrace: A Generic Framework for Performance Analysis (2011) (47)
- Towards an Accurate Model for Collective Communications (2001) (47)
- A Parallel Algorithm for the Nonsymmetric Eigenvalue Problem (1993) (47)
- The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software (2008) (47)
- PTG: An Abstraction for Unhindered Parallelism (2014) (47)
- QR factorization for the Cell Broadband Engine (2009) (47)
- The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines (2000) (46)
- The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques (2018) (46)
- Squeezing the most out of eigenvalue solvers on high-performance computers (1986) (46)
- Trends in High Performance Computing (2004) (46)
- Algorithmic Issues on Heterogeneous Computing Platforms (1999) (46)
- The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form (1995) (46)
- LAPACK Working Note 74: A Sparse Matrix Library in C++ for High Performance Architectures (1994) (46)
- Efficient Pattern Search in Large Traces Through Successive Refinement (2004) (45)
- Adaptive precision in block‐Jacobi preconditioning for iterative sparse linear system solvers (2019) (45)
- Cloud Service Reliability: Modeling and Analysis (2010) (45)
- Self-adapting numerical software (SANS) effort (2006) (45)
- A portable environment for developing parallel FORTRAN programs (1987) (44)
- HeNCE: A Heterogeneous Network Computing Environment (1994) (44)
- QR factorization of tall and skinny matrices in a grid computing environment (2009) (44)
- DARPA's HPCS Program- History, Models, Tools, Languages (2008) (43)
- PB-BLAS: a set of Parallel Block Basic Linear Algebra Subprograms (1994) (43)
- Algorithmic bombardment for the iterative solution of linear systems: a poly-iterative approach (1994) (43)
- Recent Developments in GridSolve (2006) (43)
- Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs (2016) (43)
- Accelerating Linear System Solutions Using Randomization Techniques (2013) (42)
- Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures (2011) (42)
- PVMPI: An Integration of the PVM and MPI Systems (1996) (42)
- Limitations of the PlayStation 3 for High Performance Cluster Computing (2007) (42)
- Overview of VPE: A Visual Environment for Message-Passing Parallel Programming (1994) (42)
- Recent Enhancements to PVM (1995) (42)
- PVM and HeNCE: tools for heterogeneous network computing (1993) (41)
- Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems (2014) (41)
- Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing (2009) (41)
- Correlated Set Coordination in Fault Tolerant Message Logging Protocols (2011) (41)
- The TOP500 List and Progress in High-Performance Computing (2015) (40)
- Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery (2009) (40)
- LAPACK for Distributed Memory Architectures: Progress Report (1991) (40)
- Computer benchmarking: Paths and pitfalls: The most popular way of rating computer performance can confuse as well as inform; avoid misunderstanding by asking just what the benchmark is measuring (1987) (40)
- Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs (2011) (40)
- Advanced Architecture Computers (1989) (40)
- The quest for petascale computing (2001) (40)
- A survey of numerical linear algebra methods utilizing mixed-precision arithmetic (2021) (39)
- A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures (2011) (39)
- Determining the idle time of a tiling: new results (1997) (39)
- Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology (2007) (39)
- Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs (2015) (39)
- Numerical linear algebra algorithms and software (2000) (39)
- Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems (2010) (39)
- SLATE: design of a modern distributed and accelerated linear algebra library (2019) (39)
- A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations (2015) (38)
- HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi (2015) (38)
- Recent Advances in Parallel Virtual Machine and Message Passing Interface (2005) (38)
- LAPACK Working Note 24: LAPACK Block Factorization Algorithms on the Intel iPSC/860 (1990) (38)
- Implementing a Sparse Matrix Vector Product for the SELL-C / SELL-C-σ formats on NVIDIA GPUs (2014) (38)
- Tile QR factorization with parallel panel processing for multicore architectures (2010) (38)
- Numerical Considerations in Computing Invariant Subspaces (1992) (37)
- A Proposed API for Batched Basic Linear Algebra Subprograms (2016) (37)
- Soft error resilient QR factorization for hybrid system with GPGPU (2011) (37)
- Incomplete Sparse Approximate Inverses for Parallel Preconditioning (2018) (37)
- Netlib and NA-Net: Building a Scientific Computing Community (2008) (37)
- Hierarchical QR factorization algorithms for multi-core clusters (2013) (36)
- GridSolve: The Evolution of A Network Enabled Solver (2006) (36)
- A Proposal for User-Level Failure Mitigation in the MPI-3 Standard (2012) (36)
- A Fast Batched Cholesky Factorization on a GPU (2014) (36)
- A novel hybrid CPU–GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks (2012) (36)
- A new metric for ranking high-performance computing systems (2016) (35)
- Computer benchmarks (1993) (35)
- LAPACK Working Note No. 28: The IBM RISC System/6000 and Linear Algebra Operations (1990) (35)
- Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment (2014) (35)
- A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems (1999) (35)
- Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA (2011) (35)
- Particle Dynamics (2011) (35)
- AlgoWiki: an Open Encyclopedia of Parallel Algorithmic Features (2015) (35)
- An object oriented design for high performance linear algebra on distributed memory architectures (1993) (35)
- Basic Linear Algebra Communication Subprograms (1991) (35)
- High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures (2013) (35)
- A Block-Asynchronous Relaxation Method for Graphics Processing Units (2011) (34)
- Static tiling for heterogeneous computing platforms (1999) (34)
- Solving Computational Grand Challenges Using a Network of Heterogeneous Supercomputers (1991) (34)
- Investigating power capping toward energy‐efficient scientific applications (2019) (34)
- Performance Portability of a GPU Enabled Factorization with the DAGuE Framework (2011) (34)
- LAPACK Working Note 37: Two Dimensional Basic Linear Algebra Communication Subprograms (1991) (34)
- The NetSolve environment: progressing towards the seamless grid (2000) (34)
- Can Hardware Performance Counters Produce Expected, Deterministic Results? (2010) (34)
- clMAGMA: high performance dense linear algebra with OpenCL (2014) (34)
- A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI (2012) (34)
- Preconditioned Krylov solvers on GPUs (2017) (34)
- LU Factorization with Partial Pivoting for a Multicore System with Accelerators (2013) (33)
- MPI_Connect Managing Heterogeneous MPI Applications Interoperation and Process Control (1998) (33)
- Porting the PLASMA Numerical Library to the OpenMP Standard (2017) (33)
- Numerically Stable Real Number Codes Based on Random Matrices (2005) (33)
- Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging (2007) (33)
- Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning (2018) (33)
- Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems (2011) (33)
- HARNESS: Heterogeneous Adaptable Reconfigurable NEtworked SystemS (1998) (33)
- High Performance Dense Linear System Solver with Soft Error Resilience (2011) (33)
- Evaluation of the HPC Challenge Benchmarks in Virtualized Environments (2011) (33)
- Applied Parallel Computing. State of the Art in Scientific Computing, 8th International Workshop, PARA 2006, Umeå, Sweden, June 18-21, 2006, Revised Selected Papers (2007) (33)
- Logistical computing and internetworking: middleware for the use of storage in communication (2001) (32)
- Accelerating collaborative filtering using concepts from high performance computing (2015) (32)
- State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems (2008) (32)
- The eigenvalue problem for Hermitian matrices with time reversal symmetry (1984) (32)
- Parallel loops - a test suite for parallelizing compilers: description and example results (1991) (32)
- A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic (2020) (32)
- Trends in high performance computing: a historical overview and examination of future developments (2006) (32)
- QR Factorization for the CELL Processor (2008) (32)
- Tools to aid in the analysis of memory access patterns for FORTRAN programs (1988) (32)
- Dynamic Reconfiguration and Virtual Machine Management in the Harness Metacomputing System (1998) (32)
- High-performance high-resolution semi-Lagrangian tracer transport on a sphere (2011) (32)
- Accurate Cache and TLB Characterization Using Hardware Counters (2004) (31)
- A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines (2012) (31)
- Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product (2015) (31)
- JLAPACK-compiling LAPACK Fortran to Java (1999) (31)
- Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy (2015) (31)
- Report on the Fujitsu Fugaku System (2020) (31)
- Sunway TaihuLight supercomputer makes its appearance (2016) (31)
- Power monitoring with PAPI for extreme scale architectures and dataflow-based programming models (2014) (31)
- Process Distance-Aware Adaptive MPI Collective Communications (2011) (31)
- Implementation and Usage of the PERUSE-Interface in Open MPI (2006) (31)
- Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor (2006) (31)
- Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency (2012) (31)
- LAPACK95 Users' Guide (2001) (31)
- Practical experience in the numerical dangers of heterogeneous computing (1997) (30)
- A Scalable Approach to MPI Application Performance Analysis (2005) (30)
- HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters (2012) (30)
- Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems (2012) (30)
- High Performance Computing for Computational Science — VECPAR 2002 (2003) (30)
- A collection of parallel linear equations routines for the Denelcor HEP (1984) (30)
- High performance matrix inversion based on LU factorization for multicore architectures (2011) (30)
- Algorithmic Based Fault Tolerance Applied to High Performance Computing (2008) (30)
- A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs (2012) (30)
- Experimental parallel computing architectures (1987) (29)
- Performance Instrumentation and Measurement for Terascale Systems (2003) (29)
- Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs (2019) (29)
- The Virtual Instrument: Support for Grid-Enabled MCell Simulations (2004) (29)
- Application-Level Tools (2004) (29)
- The design of linear algebra libraries for high performance computers (1993) (29)
- Design for a Soft Error Resilient Dynamic Task-Based Runtime (2015) (29)
- Proceedings of the International Conference on Computational Science, ICCS 2011 (2011) (29)
- Prospectus for the Next LAPACK and ScaLAPACK Libraries (2006) (29)
- Level 3 BLAS for distributed memory concurrent computers (1993) (29)
- LAPACK Working Note 41: Installation Guide for LAPACK (1992) (29)
- Distributed Probabilistic Model-Building Genetic Algorithm (2003) (29)
- Analytical modeling and optimization for affinity based thread scheduling on multicore systems (2009) (28)
- On the Convergence of Computational and Data Grids (2001) (28)
- A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction (2012) (28)
- The National HPCC Software Exchange (1995) (28)
- With Extreme Computing, the Rules Have Changed (2017) (28)
- TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology (2004) (28)
- Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures (2010) (28)
- Exploring New Architectures in Accelerating CFD for Air Force Applications (2008) (28)
- visPerf: Monitoring Tool for Grid Computing (2003) (28)
- Sparse direct solvers with accelerators over DAG runtimes (2012) (28)
- Heterogeneous MPI Application Interoperation and Process Management under PVMPI (1997) (28)
- Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-sigma formats on NVIDIA GPUs (2014) (27)
- Implementing dense linear algebra algorithms using multitasking on the CRAY X-MP-4 (or approaching the Gigaflop) (1986) (27)
- Biological sequence alignment on the computational grid using the GrADS framework (2005) (27)
- Top500 Supercomputer Sites - 13th edition (1998) (27)
- High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors (2012) (27)
- Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing (2009) (26)
- Tiling with limited resources (1997) (26)
- Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures (1998) (26)
- Engineering the grid - status and perspective (2006) (26)
- Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems (1997) (26)
- Multithreading in the PLASMA Library (2014) (26)
- Harnessing the Computing Continuum for Programming Our World (2020) (26)
- Reverse Communication Interface for Linear Algebra Templates for Iterative Methods (1995) (26)
- Reducing the Amount of Pivoting in Symmetric Indefinite Systems (2011) (26)
- Towards Achieving Performance Portability Using Directives for Accelerators (2016) (25)
- Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications (2020) (25)
- Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems (2020) (25)
- Exploiting Mixed Precision Floating Point Hardware in Scientific Computations (2006) (25)
- L2 Cache Modeling for Scientific Applications on Chip Multi-Processors (2007) (25)
- Basic Linear Algebra Subprograms (BLAS) (2011) (25)
- Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs (2017) (25)
- Visual Programming and Parallel Computing (1994) (25)
- Particle Methods (2011) (25)
- Self-Adapting Numerical Software and Automatic Tuning of Heuristics (2003) (25)
- Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives (2017) (24)
- Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems (2011) (24)
- Optimizing Krylov Subspace Solvers on Graphics Processing Units (2014) (24)
- Load-balancing Sparse Matrix Vector Product Kernels on GPUs (2020) (24)
- A survey of recent developments in parallel implementations of Gaussian elimination (2015) (24)
- Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators (2010) (24)
- Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures (2010) (24)
- Scalable Fault Tolerant MPI: Extending the Recovery Algorithm (2005) (24)
- HPCG Benchmark Technical Specification (2013) (24)
- Overlapping Computation and Communication for Advection on Hybrid Parallel Computers (2011) (24)
- Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations (2004) (24)
- Anatomy of a globally recursive embedded LINPACK benchmark (2012) (23)
- Efficient exascale discretizations: High-order finite element methods (2021) (23)
- Feedback-directed thread scheduling with memory considerations (2007) (23)
- Failure Detection and Propagation in HPC systems (2016) (23)
- Experiments with Strassen's Algorithm: From Sequential to Parallel (2006) (23)
- Libraries for linear algebra (1995) (23)
- Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi (2013) (23)
- IML++ v. 1.2 Iterative Methods Library Reference Guide (1996) (23)
- Computing the conditioning of the components of a linear least‐squares solution (2007) (23)
- Users' Guide to NetSolve V2.0 (2004) (23)
- A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures (2012) (23)
- HPC Challenge: Design, History, and Implementation Highlights (2017) (23)
- Sparse Matrix Libraries in C++ for High Performance Architectures (1997) (23)
- Composing resilience techniques: ABFT, periodic and incremental checkpointing (2015) (23)
- Matrix product on heterogeneous master-worker platforms (2008) (22)
- Fault Tolerant Communication Library and Applications for High Performance Computing (2003) (22)
- Blueprint for a New Computing Infrastructure (2nd ed.) (2004) (22)
- Optimal Checkpointing Period: Time vs. Energy (2013) (22)
- Computational Science - ICCS 2004 (2004) (22)
- Towards batched linear solvers on accelerated hardware platforms (2015) (22)
- Parallel Tiled QR Factorization for Multicore Architectures LAPACK Working Note # 190 (2007) (22)
- Performance of various computers using standard linear equations software in a Fortran environment (1983) (22)
- Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs (2017) (22)
- Practical scalable consensus for pseudo-synchronous distributed systems (2015) (22)
- Application-specific tools (1998) (22)
- Fast Cholesky factorization on GPUs for batch and native modes in MAGMA (2017) (22)
- Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems (2015) (22)
- Decision Trees and MPI Collective Algorithm Selection Problem (2007) (22)
- Program analysis environments for parallel language systems: the tau environment (1994) (22)
- Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures (2011) (22)
- A Scalable Checkpoint Encoding Algorithm for Diskless Checkpointing (2008) (22)
- Performance Modeling for Self Adapting Collective Communications for MPI (2001) (22)
- Self adaptivity in Grid computing (2005) (22)
- Recursive approach in sparse matrix LU factorization (2001) (22)
- Problem-solving environments (2003) (22)
- ParILUT - A New Parallel Threshold ILU Factorization (2018) (22)
- Flexible collective communication tuning architecture applied to Open MPI (2006) (22)
- Exploiting Fine-Grain Parallelism in Recursive LU Factorization (2011) (22)
- heFFTe: Highly Efficient FFT for Exascale (2020) (21)
- Building fault survivable MPI programs with FT-MPI using diskless-checkpointing (2005) (21)
- Automatic optimisation of parallel linear algebra routines in systems with variable load (2003) (21)
- Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project (2010) (21)
- Race to Exascale (2019) (21)
- An Effective Empirical Search Method for Automatic Software Tuning (2005) (21)
- PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution (2015) (21)
- Performance of asynchronous optimized Schwarz with one-sided communication (2019) (21)
- Recent Advances in Parallel Virtual Machine and Message Passing Interface (2003) (21)
- Fault tolerant matrix operations using checksum and reverse computation (1996) (21)
- Scalable Fault Tolerant Protocol for Parallel Runtime Environments (2006) (21)
- Improving the Accuracy of Computed Singular Values (1983) (21)
- Heterogeneous network computing (1991) (21)
- Extending the scope of the Checkpoint‐on‐Failure protocol for forward recovery in standard MPI (2013) (21)
- Solving the Secular Equation Including Spin Orbit Coupling for Systems with Inversion and Time Reversal Symmetry (1984) (21)
- Autotuning GEMMs for Fermi (2011) (21)
- Deploying fault tolerance and task migration with NetSolve (1999) (21)
- The Performance of PVM on MPP Systems (1995) (21)
- Updating incomplete factorization preconditioners for model order reduction (2016) (20)
- Heterogeneous Streaming (2016) (20)
- Parallel Processing and Applied Mathematics (2013) (20)
- Automatic blocking of QR and LU factorizations for locality (2004) (20)
- Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications (2005) (20)
- GPU-Aware Non-contiguous Data Movement In Open MPI (2016) (20)
- Fault-tolerant matrix operations for parallel and distributed systems (1996) (20)
- High Performance Computing for Computational Science - VECPAR 2004, 6th International Conference, Valencia, Spain, June 28-30, 2004, Revised Selected and Invited Papers (2005) (20)
- Experiences in autotuning matrix multiplication for energy minimization on GPUs (2015) (20)
- Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion (2009) (20)
- Acceleration of GPU-based Krylov solvers via data transfer reduction (2015) (20)
- HeNCE: A User's Guide, Version 1.2 (1992) (20)
- Corrigenda: “An Extended Set of FORTRAN Basic Linear Algebra Subprograms” (1988) (20)
- ACCT: Automatic Collective Communications Tuning (2000) (20)
- Performance of LAPACK: a portable library of numerical linear algebra routines (1992) (20)
- PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP (2019) (20)
- LAPACK Working Note #5 : Provisional Contents (1988) (20)
- Programming tools and environments (1998) (20)
- An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs (2010) (20)
- Locality and Topology Aware Intra-node Communication among Multicore CPUs (2010) (19)
- Automatic experimental analysis of communication patterns in virtual topologies (2005) (19)
- Automatic analysis of inefficiency patterns in parallel applications (2007) (19)
- GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement (2011) (19)
- Software distribution using Xnetlib (1995) (19)
- Self-healing network for scalable fault-tolerant runtime environments (2010) (19)
- Performance of various computers using standard sparse linear equations solving techniques (1993) (19)
- Communication-Avoiding Symmetric-Indefinite Factorization (2014) (19)
- A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations (2018) (19)
- A Parallel Algorithm for the Non-Symmetric Eigenvalue Problem (1991) (19)
- Algorithm 710: FORTRAN subroutines for computing the eigenvalues and eigenvectors of a general matrix by reduction to general tridiagonal form (1990) (19)
- Correlated set coordination in fault tolerant message logging protocols for many‐core clusters (2013) (19)
- Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI (2010) (19)
- Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction (2011) (19)
- Fault tolerant matrix operations for networks of workstations using multiple checkpointing (1997) (19)
- Programming methodology and performance issues for advanced computer architectures (1988) (19)
- Parallel Ocean Program (POP) (2011) (18)
- A Parallel Solver for Incompressible Fluid Flows (2013) (18)
- PARKBENCH Report-1: Public International Benchmarks for Parallel Computers, Technical Report: UT-CS-93-213 (1994) (18)
- Multithreading for synchronization tolerance in matrix factorization (2007) (18)
- Evaluating the Performance of MPI-2 Dynamic Communicators and One-Sided Communication (2003) (18)
- Preface: Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard (2002) (18)
- OMPIO: A Modular Software Architecture for MPI I/O (2011) (18)
- On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications (2007) (18)
- PERI auto-tuning (2008) (18)
- NetSolve's Network Enabled Server: Examples and Applications (1999) (17)
- Linear algebra libraries for high-performance computers: a personal perspective (1993) (17)
- Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs (2017) (17)
- Parallel Virtual Machine — EuroPVM '96 (1996) (17)
- MIAMI: A framework for application performance diagnosis (2014) (17)
- Enhancing Parallelism of Tile QR Factorization for Multicore Architectures (2010) (17)
- LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers (2005) (17)
- Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs (2016) (17)
- Using Agent-Based Software for Scientific Computing in the NetSolve System (1998) (17)
- Logistical quality of service in NetSolve (1999) (17)
- A look back on 30 years of the Gordon Bell Prize (2017) (17)
- Source book of parallel computing (2003) (17)
- Large Scale Computations in Air Pollution Modelling (1999) (17)
- Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem (2012) (17)
- Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms (2013) (17)
- Applied Parallel Computing Industrial Computation and Optimization (1996) (17)
- Deploying Parallel Numerical Library Routines to Cluster Computing in a Self Adapting Fashion (2002) (16)
- A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures (2011) (16)
- Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues (1982) (16)
- Increasing the performance of mathematical software through high-level modularity (1985) (16)
- Implementation of mixed precision in solving systems of linear equations on the Cell processor (2007) (16)
- Practical Experience in the Dangers of Heterogeneous Computing (1996) (16)
- Proceedings of the 16th international symposium on High performance distributed computing (2007) (16)
- LAPACK Working Note 16: Results from the Initial Release of LAPACK (1989) (16)
- C++ API for BLAS and LAPACK (2017) (16)
- Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum (2001) (16)
- Clusters and Computational Grids for Scientific Computing (1999) (16)
- Dynamically scheduled Cholesky factorization on multicore architectures with GPU accelerators (2010) (15)
- Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor (2008) (15)
- Deploying Fault-Tolerance and Task Migration with NetSolve (1998) (15)
- LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi (2016) (15)
- High-performance Cholesky factorization for GPU-only execution (2017) (15)
- On the performance and energy efficiency of sparse linear algebra on GPUs (2017) (15)
- Assessing the Impact of ABFT and Checkpoint Composite Strategies (2014) (15)
- A fast algorithm for the symmetric eigenvalue problem (1985) (15)
- Revisiting the Double Checkpointing Algorithm (2013) (15)
- Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs (2015) (15)
- Automatic translation of Fortran to JVM bytecode (2001) (15)
- An iterative solver benchmark (2001) (15)
- Comparison of the CRAY X-MP-4, Fujitsu VP-200, and Hitachi S-810/20 (1985) (15)
- Designing SLATE: Software for Linear Algebra Targeting Exascale (2017) (15)
- Fault Tolerant MPI for the HARNESS Meta-computing System (2001) (15)
- Prospectus for the Development of a Linear Algebra Library for High-Performance Computers (1997) (15)
- High Performance Linear Algebra Package LAPACK90 (1998) (15)
- An asynchronous algorithm on the NetSolve global computing system (2006) (15)
- LAPACK Working Note 34: Workshop on the BLACS (1991) (15)
- Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi (2017) (15)
- Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors (2019) (15)
- Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited (2009) (15)
- Power profiling of Cholesky and QR factorizations on distributed memory systems (2014) (15)
- Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q (2013) (15)
- MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing (2015) (15)
- Processes Distribution of Homogeneous Parallel Linear Algebra Routines on Heterogeneous Clusters (2005) (15)
- A Framework for Out of Memory SVD Algorithms (2017) (15)
- Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures (2013) (15)
- Revisiting Matrix Product on Master-Worker Platforms (2006) (15)
- Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations (2013) (15)
- Efficient implementation of quantum materials simulations on distributed CPU-GPU systems (2015) (14)
- Proceedings of the 9th International Conference on Computational Science (2009) (14)
- Software technologies (2003) (14)
- NetSolve: a network-enabled solver; examples and users (1998) (14)
- Enabling interactive and collaborative oil reservoir simulations on the Grid (2005) (14)
- Power Management and Event Verification in PAPI (2016) (14)
- Netlib services and resources (1994) (14)
- A failure detector for HPC platforms (2018) (14)
- Tools for Heterogeneous Network Computing (1993) (14)
- An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems (2014) (14)
- EISPACK: a package for solving matrix eigenvalue problems (1983) (14)
- Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs (2014) (14)
- Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs (2016) (14)
- Parallel reduction to Hessenberg form with Algorithm-Based Fault Tolerance (2013) (14)
- Computational Science — ICCS 2002 (2002) (14)
- Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors (2007) (14)
- Optimization for performance and energy for batched matrix computations on GPUs (2015) (14)
- Seamless Access to Adaptive Solver Algorithms (2000) (14)
- Scalable techniques for fault tolerant high performance computing (2006) (14)
- Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices (2019) (14)
- Strengthening compute and data intensive capacities of Armenia (2015) (14)
- The design and implementation of the parallel out-of-core ScaLAPACK LU, QR and Cholesky factorization routines (1997) (14)
- Location-independent naming for virtual distributed software repositories (1995) (14)
- Reliability Analysis of Self-Healing Network using Discrete-Event Simulation (2007) (14)
- Recent Advances in Parallel Virtual Machine and Message Passing Interface (2002) (13)
- Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling (2011) (13)
- Automating the Large-Scale Collection and Analysis of Performance (2004) (13)
- Tuning stationary iterative solvers for fault resilience (2015) (13)
- Computational Science — ICCS 2002 (2002) (13)
- A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines (2021) (13)
- LAPACK Working Note 61: An Object Oriented Design for High Performance Linear Algebra on Distributed Memory Architectures (1993) (13)
- High Performance Development for High End Computing With Python Language Wrapper (PLW) (2007) (13)
- A scalable approach to solving dense linear algebra problems on hybrid CPU‐GPU systems (2015) (13)
- Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures (2006) (13)
- A Fault-Tolerant Communication Library for Grid Environments (2003) (13)
- The TOP500: History, Trends, and Future Directions in High Performance Computing (2020) (13)
- Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures (2017) (13)
- Batched Generation of Incomplete Sparse Approximate Inverses on GPUs (2016) (13)
- Search Space Generation and Pruning System for Autotuners (2016) (13)
- Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures (2009) (13)
- Out of memory SVD solver for big data (2017) (13)
- Efficiency of General Krylov Methods on GPUs -- An Experimental Study (2016) (13)
- Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs (2014) (13)
- Computational Science — ICCS 2003 (2003) (13)
- A Project for Developing a Linear Algebra Library for High-Performance Computers (1989) (13)
- LAPACK Working Note 58: The Design of Linear Algebra Libraries for High Performance Computers (1993) (13)
- Tiling on systems with communication/computation overlap (1999) (13)
- Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680 (2012) (13)
- A Holistic Approach for Performance Measurement and Analysis for Petascale Applications (2009) (13)
- Multi-GPU Implementation of LU Factorization (2012) (13)
- Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation (2019) (13)
- NetBuild: transparent cross‐platform access to computational software libraries (2002) (12)
- Workshop on Environments and Tools for Parallel Scientific Computing (1997) (12)
- Linear algebra software for large-scale accelerated multicore computing (2016) (12)
- AlgoWiki Project as an Extension of the Top500 Methodology (2018) (12)
- High Performance Computing for Computational Science - VECPAR 2004 (2008) (12)
- Proceedings of the International Conference on Computational Science-Part II (2008) (12)
- Installation Guide for ScaLAPACK (1992) (12)
- GrADSolve - RPC for High Performance Computing on the Grid (2003) (12)
- Why is it Hard to Describe Properties of Algorithms (2016) (12)
- Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads across Accelerators, Coprocessors, and Multicore Processors (2014) (12)
- Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks (2016) (12)
- Model-Driven One-Sided Factorizations on Multicore Accelerated Systems (2014) (12)
- Numerical algorithms for high-performance computational science (2020) (12)
- Selected Results from the ParkBench Benchmark (1996) (12)
- LAPACK Working Note 18: Implementation Guide for LAPACK (1990) (12)
- Predicting the electronic properties of 3D, million-atom semiconductor nanostructure architectures (2006) (12)
- Visualizing execution traces with task dependencies (2015) (12)
- ParILUT - A Parallel Threshold ILU for GPUs (2019) (12)
- Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures (2013) (12)
- PDS: A Performance Database Server (1994) (12)
- Common Issues (2000) (12)
- ADAPT: an event-based adaptive collective communication framework (2018) (12)
- Scheduling two-sided transformations using tile algorithms on multicore architectures (2010) (12)
- Applying aspect-orient programming concepts to a component-based programming model (2003) (12)
- Self Adaptive Application Level Fault Tolerance for Parallel and Distributed Computing (2007) (12)
- High-Performance Computing (1998) (12)
- New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem (2014) (11)
- Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery (2015) (11)
- Dense Linear Algebra for Hybrid GPU-Based Systems (2010) (11)
- LAPACK Working Note 102: IML++ v. 1.2: Iterative Methods Library Reference Guide (1995) (11)
- Distributed-memory lattice H-matrix factorization (2019) (11)
- From Serial Loops to Parallel Execution on Distributed Systems (2012) (11)
- Scalable linear algebra software libraries for distributed memory concurrent computers (1995) (11)
- Big Data Meets Computational Science, Preface for ICCS 2014 (2014) (11)
- Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software (2009) (11)
- LAPACK Working Note 39: On Designing Portable High Performance Numerical Libraries (1991) (11)
- Looking back at dense linear algebra software (2014) (11)
- Packed Storage Extension for ScaLAPACK (1998) (11)
- Performance and reliability trade-offs for the double checkpointing algorithm (2014) (11)
- Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU (2016) (11)
- Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors (1995) (11)
- Poster: Matrices over Runtime Systems at Exascale (2012) (11)
- Improving the Accuracy of Computed Matrix Eigenvalues (1980) (11)
- Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, Chicago, Illinois, USA, December 11-13, 1989 (1990) (11)
- Adaptive precision solvers for sparse linear systems (2015) (11)
- Parallel and Distributed Scientific Computing: A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (1999) (11)
- Energy efficiency and performance frontiers for sparse computations on GPU supercomputers (2015) (11)
- Accelerating the SVD bi-diagonalization of a batch of small matrices using GPUs (2018) (11)
- Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores (2014) (10)
- Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools (2019) (10)
- Evaluating Dynamic Communicators and One-Sided Operations for Current MPI Libraries (2005) (10)
- MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs (2016) (10)
- On Scalability for MPI Runtime Systems (2011) (10)
- Finite-choice algorithm optimization in Conjugate Gradients (2003) (10)
- Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols (2010) (10)
- How Elegant Code Evolves with Hardware: The Case of Gaussian Elimination (2007) (10)
- A linear algebra library for high-performance computers (1989) (10)
- LAPACK Working Note 19: Evaluating Block Algorithm Variants in LAPACK (1990) (10)
- Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs (2014) (10)
- Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster (2015) (10)
- Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem (2014) (10)
- Templates for Linear Algebra Problems (1995) (10)
- Optimized Batched Linear Algebra for Modern Architectures (2017) (10)
- GrADSolve a grid-based RPC system for parallel computing with application-level scheduling (2004) (10)
- SmartGridRPC: The new RPC model for high performance Grid computing (2010) (10)
- Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory (1989) (10)
- MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs (2019) (10)
- Applied Parallel Computing Computations in Physics, Chemistry and Engineering Science (1995) (9)
- From High-Level Specification to High-Performance Code (2018) (9)
- Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures (2018) (9)
- The Parkbench Benchmark Collection (1995) (9)
- Mixed-precision block Gram Schmidt orthogonalization (2015) (9)
- Improving the Performance of the GMRES Method using Mixed-Precision Techniques (2020) (9)
- Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale (2017) (9)
- ScaLAPACK Tutorial (1996) (9)
- Scheduling Two-sided Transformations using Algorithms-by-Tiles on Multicore Architectures LAPACK Working Note # 214 (2009) (9)
- Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators (2012) (9)
- The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot (2007) (9)
- PVMPI Provides Interoperability Between MPI Implementations (1997) (9)
- Numerical Libraries and Tools for Scalable Parallel Cluster Computing (2001) (9)
- NSPCG User's Guide, Version 1.0: A Package for Solving Large Sparse Linear Systems by Various Iterative Methods (1988) (9)
- Towards numerical benchmark for half-precision floating point arithmetic (2017) (9)
- On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties (2012) (9)
- HARNESS fault tolerant MPI design, usage and performance issues (2002) (9)
- The Problem With the Linpack Benchmark 1.0 Matrix Generator (2008) (9)
- QCG-OMPI: MPI applications on grids (2011) (9)
- Array Redistribution in ScaLAPACK Using PVM (1995) (9)
- Network-Enabled Server Systems: Deploying Scientific Simulations on the Grid (2001) (9)
- The Component Structure of a Self-Adapting Numerical Software System (2005) (9)
- Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC (2019) (9)
- Taskers and General Resource Managers: PVM Supporting DCE Process Management (1996) (9)
- Fine-grained bit-flip protection for relaxation methods (2016) (9)
- Interactive grid-access using GridSolve and Giggle (2008) (9)
- Solving Linear Diophantine Systems on Parallel Architectures (2019) (9)
- On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures (2016) (9)
- Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators (2012) (9)
- Domain Overlap for Iterative Sparse Triangular Solves on GPUs (2016) (9)
- ScaLAPACK++: an object oriented linear algebra library for scalable systems (1993) (9)
- The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form (1995) (9)
- Reinventing High Performance Computing: Challenges and Opportunities (2022) (9)
- Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments (1999) (9)
- CRPC Research Into Linear Algebra Software for High Performance Computers (1994) (9)
- Computational Science - ICCS 2004: 4th International Conference, Kraków, Poland, June 6-9, 2004, Proceedings, Part II (Lecture Notes in Computer Science) (2004) (9)
- Special Report: 1989 Gordon Bell Prize (1990) (8)
- BLAS for GPUs (2010) (8)
- Reducing the amount of out‐of‐core data access for GPU‐accelerated randomized SVD (2020) (8)
- POSTER: Utilizing dataflow-based execution for coupled cluster methods (2014) (8)
- Reliability and Performance Models for Grid Computing (2010) (8)
- Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning (2017) (8)
- Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC (2022) (8)
- Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices (2017) (8)
- Tuned: An Open MPI Collective Communications Component (2007) (8)
- Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems (2014) (8)
- Task-Based Cholesky Decomposition on Knights Corner Using OpenMP (2016) (8)
- Symmetric Indefinite Linear Solver Using OpenMP Task on Multicore Architectures (2018) (8)
- GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems (2019) (8)
- Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs (2018) (8)
- 2014 International Conference on Computational Science (2014) (8)
- Providing Access to High Performance Computing Technologies (1996) (8)
- OpenCL Evaluation for Numerical Linear Algebra Library Development (2011) (8)
- Providing Infrastructure and Interface to High Performance Applications in a Distributed Setting (2000) (8)
- High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs (2020) (8)
- Hash Functions for Datatype Signatures in MPI (2005) (8)
- With Extreme Scale Computing the Rules Have Changed (2016) (8)
- Computational Science - ICCS 2006, 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part I (2006) (8)
- Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling (2018) (8)
- A Proposal for a Fortran 90 Interface for LAPACK (1995) (8)
- The use of Java in the NetSolve project (1997) (8)
- Matrix multiplication on batches of small matrices in half and half-complex precisions (2020) (8)
- Constructing Resiliant Communication Infrastructure for Runtime Environments (2009) (8)
- Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs (2018) (8)
- LINPACK working note No. 9: preliminary LINPACK user's guide. [In FORTRAN] (1977) (8)
- PAPI software-defined events for in-depth performance analysis (2019) (8)
- LAPACK: A Linear Algebra Library for High-Performance Computers (1992) (8)
- PARKBENCH: Methodology, Relations and Results (1996) (8)
- Digital Software and Data Repositories for Support of Scientific Computing (1995) (8)
- Network-Enabled Solvers And the NetSolve Project (1998) (8)
- Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations (2020) (8)
- Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization (2013) (7)
- Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs (2020) (7)
- Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms (2019) (7)
- Recent Advances in Parallel Virtual Machine and Message Passing Interface: 13th European PVM/MPI User's Group Meeting, Bonn, Germany, September 17-20, 2006, ... (Lecture Notes in Computer Science) (2006) (7)
- Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms (2008) (7)
- Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters (2018) (7)
- Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing from the First Vector Machines to Today's Cluster-based Systems (2001) (7)
- Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures (2005) (7)
- 2. Iterative Methods (1994) (7)
- CPU-GPU hybrid bidiagonal reduction with soft error resilience (2013) (7)
- BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis (2014) (7)
- Fully Empirical Autotuned QR Factorization For Multicore Architectures (2011) (7)
- Performance Complexity of LU Factorization with Efficient Pipelining and Overlap on a Multiprocessor (1994) (7)
- Design and Implementation for FFT-ECP on Distributed Accelerated Systems (2019) (7)
- Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing (2020) (7)
- Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software (2019) (7)
- Non‐GPU‐resident symmetric indefinite factorization (2017) (7)
- Parallel Simulation of Superscalar Scheduling (2014) (7)
- LAPACK for Distributed Memory Architectures: The Next Generation (1993) (7)
- ORNL Cray X1 evaluation status report (2004) (7)
- Tiling for Heterogeneous Computing Platforms (2006) (7)
- Adding Context and Static Groups into PVMJ (1995) (7)
- Grid-Enabling Problem Solving Environments: A Case Study of SCIRun and NetSolve (7)
- High performance computing : technology, methods and applications (1995) (7)
- Computational science - ICCS 2003 : International conference, Melbourne, Australia and St. Petersburg, Russia, June 2-4, 2003 : proceedings. - Pt III (2003) (7)
- Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms (2018) (7)
- Vector and parallel computing : issues in applied research and development (1989) (7)
- Changing technologies of HPC (1996) (7)
- Evaluation and Design of FFT for Distributed Accelerated Systems (2018) (7)
- Asynchronous SGD for DNN training on Shared-memory Parallel Architectures (2020) (7)
- MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing (2019) (7)
- On block-asynchronous execution on GPUs (2016) (7)
- Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations (2016) (7)
- Reliability and Performance Modeling and Analysis for Grid Computing (2009) (7)
- Scheduling tasks with precedence constraints on heterogeneous distributed computing systems (2006) (7)
- Accelerating Restarted GMRES With Mixed Precision Arithmetic (2021) (7)
- Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation (2006) (6)
- Scalable Dense Linear Algebra on Heterogeneous Hardware (2012) (6)
- Panel Statement (2011) (6)
- The dangers of heterogeneous network computing: heterogeneous networks considered harmful (1996) (6)
- Selected numerical algorithms (2004) (6)
- Technologies for repository interoperation and access control (1998) (6)
- Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures (2015) (6)
- Proceedings of the International Conference on Computational Science, ICCS 2012 (2012) (6)
- Using Advanced Vector Extensions AVX-512 for MPI Reductions (2020) (6)
- Applied Parallel Computing: State of the Art in Scientific Computing (Lecture Notes in Computer Science) (2006) (6)
- High Performance Linear Algebra Package for FORTRAN 90 (1998) (6)
- Optimal Routing in Binomial Graph Networks (2007) (6)
- The LAPACK for clusters project: an example of self adapting numerical software (2004) (6)
- Computational Science – ICCS 2018 (2018) (6)
- Random Sampling to Update Partial Singular Value Decomposition on a Hybrid CPU / GPU Cluster (2015) (6)
- Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs (2019) (6)
- Designing LU-QR Hybrid Solvers for Performance and Stability (2014) (6)
- Parallel Processing and Applied Mathematics (2011) (6)
- SmartGridRPC: The new RPC model for high performance Grid Computing and Its Implementation in SmartGridSolve (2010) (6)
- Data through the Computational Lens (2017) (6)
- Report on the TianHe-2A System (2017) (6)
- Computational Science – ICCS 2019 (2019) (6)
- Working Note 17: Experiments with QR/QL Methods For The Symmetric Tridiagonal Eigenproblem (1989) (6)
- Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers (2015) (6)
- Overview of the HPC Challenge Benchmark Suite (2006) (6)
- Accelerating NWChem Coupled Cluster through dataflow-based execution (2018) (6)
- LAPACK++ V. 1.0: High Performance Linear Algebra Users' Guide (1995) (6)
- Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures (2011) (6)
- Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments (2014) (6)
- More on Scheduling Block-Cyclic Array Redistribution (1998) (6)
- Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve (2008) (6)
- Towards a High-Performance Tensor Algebra Package for Accelerators (2015) (6)
- Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC (2013) (6)
- Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs (2020) (6)
- Computational Science - ICCS 2002 : International Conference, Amsterdam, The Netherlands, April 21-24, 2002 : proceedings (2002) (6)
- Providing GPU Capability to LU and QR within the ScaLAPACK Framework (2012) (6)
- HAN: a Hierarchical AutotuNed Collective Communication Framework (2020) (6)
- Performance Technologies for Peta-Scale Systems: A White Paper Prepared by the Performance Evaluation Research Center and Collaborators (2003) (6)
- LAPACK User's Guide / E. Anderson ... (1999) (6)
- Multi-Elimination ILU Preconditioners on GPUs (2014) (6)
- Simplified grid computing through spreadsheets and NetSolve (2004) (6)
- Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems (2020) (6)
- Evaluating the Performance of Skeleton-Based High Level Parallel Programs (2004) (6)
- LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System (2012) (6)
- Vector and Parallel Processing – VECPAR’98 (1998) (6)
- MAGMA templates for scalable linear algebra on emerging architectures (2020) (6)
- Summary of Software for Linear Algebra Freely Available on the Web (2006) (6)
- On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors (2015) (6)
- NetSolve/D: a massively parallel grid execution system for scalable data intensive collaboration (2005) (6)
- Computational Science at the Gates of Nature, Preface for ICCS 2015 (2015) (6)
- Flexible Linear Algebra Development and Scheduling with Cholesky Factorization (2015) (5)
- Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication (2013) (5)
- Software Repository Interoperability (1996) (5)
- Accelerating NWChem Coupled Cluster through dataflow-based execution (2015) (5)
- Implementation of protein tertiary structure prediction system with NetSolve (2004) (5)
- Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results (2016) (5)
- Parallel IO Support for Meta-computing Applications: MPI_Connect IO Applied to PACX-MPI (2001) (5)
- Design and Implementation of the PULSAR Programming System for Large Scale Computing (2017) (5)
- Scalable, Trustworthy Network Computing Using Untrusted Intermediaries: A Position Paper (2003) (5)
- Proceedings of the Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization (1996) (5)
- Solving dense symmetric indefinite systems using GPUs (2017) (5)
- Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification (2018) (5)
- Basic Linear Algebra Communication Subprograms (1991) (5)
- Trace-based performance analysis for the petascale simulation code FLASH (2011) (5)
- NA-NET: Numerical Analysis NET (1991) (5)
- LAPACK Working Note 26: Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library for High-Performance Computers (1990) (5)
- LAPACK Working Note 30: Reduction to Condensed Form for the Eigenvalue Problem on Distributed Memory Architectures (1991) (5)
- Active Logistical State Management in GridSolve/L (2003) (5)
- Clusters and computational grids for scientific computing - introduction (2001) (5)
- Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures LAPACK Working Note-222 (2009) (5)
- Scalability Issues in FFT Computation (2021) (5)
- Using PVM 3.0 to Run Grand Challenge Applications on a Heterogeneous Network of Parallel Computers (1993) (5)
- A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (LAPACK Working Note 139) (1999) (5)
- Algorithm 656: An Extended Set of Basic Linear Algebra Subprograms: Model Implementation and Test Programs (1988) (5)
- Checkpointing Strategies for Shared High-Performance Computing Platforms (2019) (5)
- The Impact of Multicore on Math Software and Exploiting Single Precision Computing to Obtain Double Precision Results (2006) (5)
- Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures LAPACK Working Note # 209 (2008) (5)
- Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators (2019) (5)
- Installation Guide and Design of the HPF 1.1 interface to ScaLAPACK, SLHPF (1998) (5)
- Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure (2020) (5)
- LINPACK Working Note #3: Fortran BLAS Timing (1980) (5)
- Performance evaluation of eigensolvers in nanostructure computations (2006) (5)
- Virtual Systolic Array for QR Decomposition (2013) (5)
- Improving Time to Solution with Automated Performance Analysis (2004) (5)
- A draft standard for message passing in a distributed memory environment (1994) (5)
- Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2014) (5)
- High Performance Computing Today (2000) (5)
- An Overview of High Performance Computing and Challenges for the Future (2008) (5)
- Preface to the special issue on the basic linear algebra subprograms (BLAS) (2002) (5)
- Integrated Tool Capabilities for Performance Instrumentation and Measurement (5)
- SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks (2018) (5)
- LAPACK working note No. 10: Installing and testing the initial release of LAPACK Unix and non-Unix versions (1989) (5)
- 1988 Gordon Bell Prize (1989) (4)
- 11. The Singular Value Decomposition (1979) (4)
- Case studies on the development of ScaLAPACK and the NAG Numerical PVM Library (1996) (4)
- Parallel Two-Stage Hessenberg Reduction using Tile Algorithms for Multicore Architectures (2009) (4)
- 17th Edition of TOP500 List of World's Fastest Supercomputers Released (2001) (4)
- Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs (2010) (4)
- BlackjackBench: portable hardware characterization (2011) (4)
- Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW (2011) (4)
- Accelerating Time-To-Solution for Computational Science and Engineering (2009) (4)
- Overview of high performance computers (2002) (4)
- Remote Software Toolkit Installer (2005) (4)
- A parallel linear algebra library for the Denelcor HEP (1985) (4)
- Recent Advances in Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) - 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29 - October 2, 2003, Proceedings (2003) (4)
- GPU-accelerated co-design of induced dimension reduction: algorithmic fusion and kernel overlap (2015) (4)
- Users' Guide to GridSolve Version 0.15 (2006) (4)
- Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster (2013) (4)
- Bringing High Performance Computing to Big Data Algorithms (2017) (4)
- Automatic analysis of inefficiency patterns in parallel applications: Research Articles (2007) (4)
- Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems (2012) (4)
- Computational Science – ICCS 2019: 19th International Conference, Faro, Portugal, June 12–14, 2019, Proceedings, Part III (2019) (4)
- Active netlib: an active mathematical software collection for inquiry-based computational science & engineering education (2002) (4)
- Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor (2008) (4)
- Another Architecture: PVM on Windows 95/NT (1996) (4)
- Parallel I/O Library (PIO) (2011) (4)
- Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning (2017) (4)
- Progressive Optimization of Batched LU Factorization on GPUs (2019) (4)
- Preface (2010) (4)
- Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators (2018) (4)
- Recent trends in high performance computing (2009) (4)
- TOP500 Supercomputers for June 2002 (2002) (4)
- Data through the Computational Lens, Preface for ICCS 2016 (2016) (4)
- Building blocks for iterative solution of linear systems (1993) (4)
- Secure Remote Access to Numerical Software and Computation Hardware (2000) (4)
- Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES (2014) (4)
- Guest Editorial: Benchmarking of high performance computers (1991) (4)
- Empirical Performance Tuning of Dense Linear Algebra Software (2010) (4)
- Gordon Bell prize lectures (1991) (4)
- Developing an Architecture to Support the Implementation and Development of Scientific computing Applications (2000) (4)
- High Performance Computing in the U.S. in 1995 An Analysis on the Basis of the TOP500 List (1995) (4)
- Sampling algorithms to update truncated SVD (2017) (4)
- PLASMA (2019) (4)
- Dense Linear Algebra on Accelerated Multicore Hardware (2012) (4)
- Access-averse framework for computing low-rank matrix approximations (2014) (4)
- National Science Foundation Advisory Committee for CyberInfrastructure Task Force on Software for Science and Engineering (2011) (4)
- Parallel BLAS Performance Report (2018) (4)
- GrADSolve - a Grid-based RPC system for Remote Invocation of Parallel Software (2003) (4)
- On The Implementation Of A Fully Parallel Algorithm For The Symmetric Eigenvalue Problem (1986) (4)
- Massively Parallel Automated Software Tuning (2019) (4)
- Structure-Aware Linear Solver for Realtime Convex Optimization for Embedded Systems (2017) (3)
- A New Recursive Implementation of Sparse Cholesky Factorization (2000) (3)
- Algorithms on massively parallel architectures : DPLASMA (2011) (3)
- Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices (2017) (3)
- Design of Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations (2004) (3)
- Chapter in Wiley Encyclopedia of Electrical and Electronics Engineering (1999) (3)
- Using Arm Scalable Vector Extension to Optimize OPEN MPI (2020) (3)
- Reusable software and algorithms (2003) (3)
- The PVM System (1994) (3)
- Translational process: Mathematical software perspective (2021) (3)
- Lightning Talk : Creating a Standardised Set of Batched BLAS Routines (2016) (3)
- Special Report: 1990 Gordon Bell Prize Winners (1991) (3)
- A Preconditioned Conjugate Gradient Method for Solving a Class of Non-Symmetric Linear Systems (1981) (3)
- LAPACK Working Note 93: Installation Guide for ScaLAPACK (1995) (3)
- SLATE Users' Guide (2020) (3)
- Hybrid Multi-elimination ILU Preconditioners on GPUs (2014) (3)
- Prototype of the National High-Performance Software Exchange (1994) (3)
- Experiences with CODE and HeNCE in Visual Programming for Parallel Computing (1995) (3)
- Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation (2019) (3)
- Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures (2016) (3)
- Industrial Application Areas of High-Performance Computing (1997) (3)
- Least squares solvers for distributed-memory machines with GPU accelerators (2019) (3)
- DOE Advanced Scientific Advisory Committee (ASCAC): Workforce Subcommittee Letter (2014) (3)
- The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer - Supercomputing History and the Immortality of Now (2018) (3)
- Algorithm Design for Different Computer Architectures (1989) (3)
- Autotuning dense linear algebra libraries on GPUs (2010) (3)
- Parallel Processing and Applied Mathematics : 11th International Conference, PPAM 2015, Krakow, Poland, September 6-9, 2015. Revised Selected Papers, Part II (2016) (3)
- Programming the LU Factorization for a Multicore System with Accelerators (2012) (3)
- Scheduling for Numerical Linear Algebra Library at Scale (2008) (3)
- LAPACK95 ‐ high performance linear algebra package (2000) (3)
- BlackjackBench: portable hardware characterization (2012) (3)
- A graphics tool to aid in the generation of parallel FORTRAN programs (1989) (3)
- NanoPSE: Nanoscience Problem Solving Environment for atomistic electronic structure of semiconductor nanostructures (2005) (3)
- Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization (2019) (3)
- Parallel and Distributed Scientific Computing (2000) (3)
- Flexible Data Redistribution in a Task-Based Runtime System (2020) (3)
- RIBAPI - Repository in a Box Application Programmer's Interface (2001) (3)
- Algorithmic and software challenges when moving towards exascale (2012) (3)
- Analysis of various scalar, vector, and parallel implementations of RandomAccess (2010) (3)
- Performance Complexity of LU Factorization with Efficient Pipelining and Overlap on a Multiprocessor (2007) (3)
- Parallel Virtual Machine - EuroPVM'96: Third European PVM Conference, Munich, Germany, October, 7 - 9, 1996. Proceedings (1996) (3)
- The Problem with the Linpack Benchmark Matrix Generator (2008) (3)
- Architecture-aware Algorithms and Software for Peta and Exascale Computing (2011) (3)
- TOP500 Sublist for November 2001 (2001) (3)
- Performance Engineering: Understanding and Improving the Performance of Large-Scale Codes (2007) (3)
- A Test Suite for PVM (1995) (3)
- The Design and Implementation of the Reduction Routines in ScaLAPACK (1995) (3)
- Scientific discovery and engineering innovation requires unifying traditionally separated high performance computing and big data analytics. (2015) (3)
- Providing Uniform Dynamic Access to Numerical Software (1999) (3)
- History of PVM Versions (1994) (3)
- Parallel Numerical Linear Algebra (1999) (3)
- Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters (2013) (3)
- Using long vector extensions for MPI reductions (2021) (3)
- Evaluation of directive-based performance portable programming models (2019) (3)
- Bidiagonalization with Parallel Tiled Algorithms (2016) (3)
- Computation at the Frontiers of Science, preface for ICCS 2013 (2013) (3)
- Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters (2018) (3)
- Programming Tools (1998) (3)
- A hybrid Hermitian general eigenvalue solver (2012) (3)
- Sparse Linear Algebra (2010) (3)
- Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques (2020) (3)
- Computational Science - ICCS 2002, Proceedings Part III (2002) (3)
- Computational science: ICCS 2006. Volumes 1-4 (2006) (3)
- Non-GPU-resident Dense Symmetric Indefinite Factorization (2016) (3)
- Post-exascale supercomputing: research opportunities abound (2018) (3)
- Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators (2019) (3)
- Recent Advances in the Message Passing Interface. Proceedings of the 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26 (2012) (3)
- Computational Science - ICCS 2007: 7th International Conference, Beijing, China, Proceedings, Part IV (2007) (3)
- Computational Science – ICCS 2018 (2018) (3)
- Towards bulk based preconditioning for quantum dot computations (2006) (3)
- FFT-ECP Implementation Optimizations and Features Phase (2019) (3)
- The Art of Computational Science, Bridging Gaps - Forming Alloys. Preface for ICCS 2017 (2018) (3)
- Recent advances in the message passing interface : 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011 : proceedings (2011) (3)
- Truss Structural Optimization using NetSolve System (2002) (3)
- Parallel Random Access Machines (PRAM) (2011) (2)
- A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems (2011) (2)
- Mixed-Tool Performance Analysis on Hybrid Multicore Architectures (2010) (2)
- A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs (2018) (2)
- 4. Related Issues (1994) (2)
- Software-Defined Events through PAPI (2019) (2)
- High Performance Computing for Computational Science - VECPAR 2002, 5th International Conference, Porto, Portugal, June 26-28, 2002, Selected Papers and Invited Talks (2003) (2)
- Parallel Scientific Computing (1994) (2)
- High Performance Computing and Communications: First International Conference, HPCC 2005, Sorrento, Italy, September, 21-23, 2005, Proceedings (Lecture Notes in Computer Science) (2005) (2)
- Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms (2013) (2)
- LINPACK working note No. 15: LINPACK, a package for solving linear systems (1982) (2)
- Present and Future Supercomputer Architectures (2004) (2)
- Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation (2017) (2)
- Parallel Scientific Computing, First International Workshop, PARA '94, Lyngby, Denmark, June 20-23, 1994, Proceedings (1994) (2)
- of a Self-Adapting Numerical Software (2005) (2)
- Automatic search for patterns of inefficient behavior in parallel applications (2005) (2)
- Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, Texas, USA, March 25-27, 1991 (1992) (2)
- LAPACK Working Note 81: Quick Installation Guide for LAPACK on UNIX Systems (1994) (2)
- Hands-On Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments (2019) (2)
- QR Factorization for the CELL Processor – LAPACK Working Note 201 (2008) (2)
- Towards a Parallel Tile LDL Factorization for Multicore Architectures (2011) (2)
- Batch QR Factorization on GPUs: Design, Optimization, and Tuning (2022) (2)
- Block-Cyclic Array Redistribution on Networks of Workstations (1997) (2)
- National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community (1998) (2)
- A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (1999) (2)
- Experiences with Windows NT as a Cluster Computing Platform for Parallel Computing (1999) (2)
- Performance analysis and design of a hessenberg reduction using stabilized blocked elementary transformations for new architectures (2015) (2)
- Revisiting Credit Distribution Algorithms for Distributed Termination Detection (2021) (2)
- Fully Empirical Autotuned QR Factorization For (2011) (2)
- Network enabled solvers for scientific computing using the NetSolve system (1997) (2)
- Evolution of Numerical Software for Dense Linear Algebra (2018) (2)
- International Conference on Computational Science, ICCS 2010 (2010) (2)
- Computational Science in the Interconnected World: Selected papers from 2019 International Conference on Computational Science (2020) (2)
- Science at the intersection of data, modelling, and computation (2019) (2)
- Computational Science - ICCS 2006, 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part II (2006) (2)
- Active netlib: an active mathematical software collection for inquiry-based computational science & engineering education (2002) (2)
- High Performance Linear System Solver with Resilience to Multiple Soft Errors (2011) (2)
- Flexible batched sparse matrix-vector product on GPUs (2017) (2)
- Batched Matrix Computations on Hardware Accelerators (2015) (2)
- Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors (2006) (2)
- Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators (2015) (2)
- Small Tensor Operations on Advanced Architectures for High-Order Applications (2017) (2)
- Special section: Grid computing and the message passing interface (2008) (2)
- Bulk Synchronous Parallelism (BSP) (2011) (2)
- Parallel processing for scientific computing. Proceedings (1992) (2)
- 5. Building Blocks in Linear Algebra (1998) (2)
- Do Moldable Applications Perform Better on Failure-Prone HPC Platforms? (2018) (2)
- Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems (2012) (2)
- Special section: Applications of distributed and grid computing (2008) (2)
- Assessing the impact of ABFT & Checkpoint composite strategies (2013) (2)
- Self-Adapting Software for Numerical Linear Algebra Library Routines on Clusters (2003) (2)
- Prospectus for a Dense Linear Algebra Software Library (2007) (2)
- Poster: new features of the PAPI hardware counter library (2011) (2)
- 1989 Gordon Bell Prize (1990) (2)
- Parallel Linear Algebra Software (2006) (2)
- Chapter 11 Collecting Performance Data with PAPI-C (2010) (2)
- On a Direct Algorithm for Computing Invariant Subspaces With. . . (1991) (2)
- Self-healing in Binomial Graph Networks (2007) (2)
- Possibilities for Active Messaging in PVM (1995) (2)
- Recent Advances in the Message Passing Interface - 18th European MPI Users’ Group Meeting, EuroMPI 2011. Proceedings. (2011) (2)
- Automated Empirical Optimization of Software and the ATLAS Project (2000) (2)
- Future linear-algebra libraries (1996) (2)
- Special section: Cluster and computational grids for scientific computing (2008) (2)
- A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling (2009) (2)
- An update notice on the level 3 BLAS (1989) (2)
- Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware (2007) (2)
- LAPACK Working Note 112: Practical Experience in the Dangers of Heterogeneous Computing (1996) (2)
- Performance Analysis of Heterogeneous Algorithms (2009) (2)
- A Pattern-Based Approach to Automated Application Performance Analysis (2005) (2)
- Constructing Numerical Software Libraries for High-Performance Computing Environments (1994) (2)
- A Grid Computing Environment for Enabling Large Scale Quantum Mechanical Simulations (2000) (2)
- Linear Algebra Software (2011) (2)
- Evaluating computers and their performance: Perspectives, pitfalls, and paths (1987) (2)
- Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers (2014) (2)
- New directions in software for advanced computer architectures (1984) (2)
- Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures (2013) (2)
- Parallel programming considerations (2003) (2)
- Latency Hiding (2011) (2)
- Integrating Deep Learning in Domain Sciences at Exascale (2020) (2)
- International Conference On Computational Science, ICCS 2015: Computational Science at the Gates of Nature (2015) (2)
- PLASMA 17 Performance Report (2017) (2)
- Distribution of Computations with Nonconstant Performance Models of Heterogeneous Processors (2009) (2)
- TOP500 Supercomputer sites 11/2000 (2000) (2)
- The evolution of mathematical software (2022) (2)
- Identification of performance characteristics from multi-view trace analysis (2003) (2)
- Management of the NHSE -- a Virtual Distributed Digital Library (1995) (1)
- ECP Milestone Report: FFT-ECP Implementation Optimizations and Features Phase, WBS 2.3.3.09, Milestone FFT-ECP ST-MS-10-1440 (2019) (1)
- Parallel Processing and Applied Mathematics, 6th International Conference, PPAM 2005, Poznan, Poland, September 11-14, 2005, Revised Selected Papers (2006) (1)
- Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform (2015) (1)
- Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime (2014) (1)
- High Performance Computing (HPC) Challenge (HPCC) Benchmark Suite Development (2005) (1)
- High Performance Computers and Algorithms From Linear Algebra (1986) (1)
- LAPACK Working Note 109 BLAS Technical Workshop (1995) (1)
- Dagstuhl Seminar on Instruction-Level Parallelism and Parallelizing Compilation (2008) (1)
- SLATE Working Note 12: Implementing Matrix Inversions (2019) (1)
- Constructing numerical software libraries for HPCC environments (1994) (1)
- Proceedings of the International Conference on Computational Sciences-Part I (2001) (1)
- What should we expect from parallel language standards? Discussion (1992) (1)
- Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU (2014) (1)
- Activities and Results of the Recent Meeting of the International Exascale Software Project (IESP), San Francisco, CA, USA, April 2011 (2011) (1)
- Guest Editorial: Foreword (2009) (1)
- New eigensolvers for large-scale nanoscience simulations (2008) (1)
- Parallel Processing and Applied Mathematics, 4th International Conference, PPAM 2001 Naleczow, Poland, September 9-12, 2001, Revised Papers (2002) (1)
- C++ API for Batch BLAS (2017) (1)
- A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization (2022) (1)
- LAPACK Working Note 117: A FORTRAN 90 Interface for LAPACK: LAPACK90, version 1.0 (1996) (1)
- Parallel Dense Linear Algebra Software in the Multicore Era (2009) (1)
- Applied Parallel Computing, State of the Art in Scientific Computing, 7th International Workshop, PARA 2004, Lyngby, Denmark, June 20-23, 2004, Revised Selected Papers (2006) (1)
- Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100 (2018) (1)
- LAPACK Working Note #224 QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment (2009) (1)
- An efficient distributed randomized solver with application to large dense linear systems (2012) (1)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part V (2020) (1)
- Disaster Survival Guide in Petascale Computing: An Algorithmic Approach (2013) (1)
- Evaluating Data Redistribution in PaRSEC (2021) (1)
- LAPACK Working Note 91: The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers (1995) (1)
- Management of the National HPCC Software Exchange - A Virtual Distributed Digital Library (1995) (1)
- Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations (2015) (1)
- Algorithmic Issues on Heterogeneous Computing (1999) (1)
- The Semantic Conference Organizer (2003) (1)
- Counter Inspection Toolkit: Making Sense Out of Hardware Performance Events (2017) (1)
- Enabling interactive and collaborative oil reservoir simulations on the Grid: Research Articles (2005) (1)
- Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering (2023) (1)
- Characterization of Power Usage and Performance in Data-Intensive Applications Using MapReduce over MPI (2019) (1)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VII (2020) (1)
- Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds (2017) (1)
- Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library . . . (1990) (1)
- Accelerating Multi-Process Communication for Parallel 3-D FFT (2021) (1)
- Computational Science - ICCS 2006, 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part IV (2006) (1)
- Chapter 1 Fault tolerance techniques for high-performance computing (2015) (1)
- LINPACK working note number 3: Fortran BLAS timing (1980) (1)
- Automated Empirical Tuning of a Multiresolution Analysis Kernel (2007) (1)
- Self Adapting Application Level Fault Tolerance for Parallel and Distributed Computing (2007) (1)
- Netlib Services and Resources (Revised) (1994) (1)
- The Boole Lecture Trends in High Performance Computing (2004) (1)
- Optimizing performance and reliability in distributed computing systems through wide spectrum storage (2003) (1)
- Another Architecture: PVM on Windows 95 / (1996) (1)
- A look at the evolution of mathematical software for dense matrix problems over the past fifteen years (1987) (1)
- LAPACK Working Note 31: Generalized QR Factorization and its Applications (1991) (1)
- Request Sequencing: Optimizing Communication for the Grid (1)
- Interdisciplinary and Multidisciplinary Research in Computer Science, IEEE CS Proceeding of the First International Multi-Symposium of Computer and Computational Sciences (IMSCCS|06), June 20-24, 2006, Zhejiang University, Hangzhou, China, Vol. 2 (2006) (1)
- Accelerating Krylov Subspace Solvers on Graphics Processing Units (2014) (1)
- Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC (2009) (1)
- Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization (2018) (1)
- Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations (2016) (1)
- Livermore Loops (2011) (1)
- Computational science for a better future (2022) (1)
- 1990 Gordon Bell Prize Winners (1991) (1)
- Enabling workflows in GridSolve: request sequencing and service trading (2013) (1)
- Using Power Demand and Residual Load Imbalance in the Load Balancing to Save Energy of Parallel Systems (2019) (1)
- High performance computing and trends: connecting computational requirements with computing resources (2001) (1)
- Future Trends in Computing (2009) (1)
- TOP500 Supercomputers for June 2003 (2003) (1)
- Testing Software for LAPACK90 (1998) (1)
- Dynamic Contaminant Identification in Water (2006) (1)
- Computational Science - ICCS 2006: 6th International Conference, Reading, UK, Proceedings, Part III (2006) (1)
- New Grid Scheduling and Rescheduling Methods in the GrADS Project (2005) (1)
- Aasen’s Symmetric Indefinite Linear Solvers in LAPACK (2017) (1)
- Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures (2010) (1)
- Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning (2020) (1)
- Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems (2015) (1)
- Parallel Processing and Applied Mathematics (2002) (1)
- International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland (2017) (1)
- New Multi-Stage Algorithm for Symmetric Eigenvalues and Eigenvectors Achieves Two-Fold Speedup (2014) (1)
- Evaluation of dataflow programming models for electronic structure theory (2018) (1)
- Message‐Passing Software Systems (1999) (1)
- Computational Science - ICCS 2008, 8th International Conference, Kraków, Poland, June 23-25, 2008, Proceedings, Part III (2008) (1)
- Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime (2020) (1)
- Performance and library issues for mathematical software on high performance computers (1984) (1)
- Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem (2014) (1)
- TOP500 Report 1996 (1996) (1)
- Evolution of the HPC Market (1997) (1)
- Shopping for mathematical software electronically (1989) (1)
- Parallel Processing Research in the Former Soviet Union (1992) (1)
- Computational Science – ICCS 2019 (2019) (1)
- State Space Search (2011) (1)
- 2. Overview of Current High-Performance Computers (1998) (1)
- MIXED-PRECISION ALGORITHM FOR FINDING SELECTED (2021) (1)
- PULSAR Users’ Guide, Parallel Ultra-Light Systolic Array Runtime (2014) (1)
- High Performance Computing, Computational Grid, and Numerical Libraries (2002) (1)
- Lightweight Superscalar Task Execution in Distributed Memory (2014) (1)
- Chapter 1: System Models and Enabling Technologies (1)
- Program Graphs (2011) (1)
- Variable-Size Batched Condition Number Calculation on GPUs (2018) (1)
- Targeting multi-core architectures for linear algebra applications (2006) (1)
- Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures (2014) (1)
- 20 years of computational science: Selected papers from 2020 International Conference on Computational Science (2021) (1)
- Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices (2011) (1)
- Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization (2013) (1)
- High-Performance Computing in Industry (1997) (1)
- Supporting Heterogeneous Network Computing: PVM (2007) (1)
- Three Tools to Help with Cluster and Grid Computing: SANS-Effort, PAPI, and NetSolve (2002) (1)
- Recent Advances in the Message Passing Interface (2012) (1)
- Performance tuning of CEED software and 1st and 2nd wave apps (2019) (1)
- Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications (2022) (1)
- POMPEI: Programming with OpenMP4 for Exascale Investigations (2017) (1)
- Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing (2004) (1)
- Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 25-29 June 2007, Monterey, California, USA (2007) (1)
- Implementing Matrix Multiplication on the Cell B.E. (2010) (1)
- Parallel Processing and Applied Mathematics: 5th International Conference, PPAM 2003, Czestochowa, Poland, September 7-10, 2003. Revised Papers (Lecture Notes in Computer Science) (2004) (1)
- Fog Computing (2020) (1)
- Dense Linear Algebra (2012) (1)
- Performance Technologies for Peta-Scale Systems: A White Paper Prepared by the Performance Evaluation Research Center (2003) (1)
- Coming Multicore Revolution (2007) (1)
- Advances in Mixed Precision Algorithms: 2021 Edition (2021) (1)
- An evaluation of User-Level Failure Mitigation support in MPI (2013) (1)
- SCHEDULE: An Environment for Developing Transportable Explicitly Parallel Codes in Fortran-Abstract (1987) (1)
- Selected Papers and Invited Talks from the Third International Conference on Vector and Parallel Processing (1998) (1)
- MAGMA-sparse Interface Design Whitepaper (2017) (1)
- PLASMA 17.1 Functionality Report (2017) (1)
- Parallel Processing for Scientific Computing. (1993) (1)
- The TOP500-Report : Special Issue "Supercomputer" (1997) (0)
- Reducing Out-of-Core Data Access for GPU-accelerated Randomized SVD (2019) (0)
- ATLAS on the BlueGene/L – Preliminary Results (2006) (0)
- Performance Tuning SLATE (2020) (0)
- PVM takes over the world (1993) (0)
- Master Node Slave Node Internal Network External Network PC Cluster User NodeExternal Network Administration Node Repository Node (2003) (0)
- Heterogeneous Network-Based Concurrent Computing Systems (1995) (0)
- Summer institute in parallel computing, September 5--15, 1989 (1989) (0)
- Performance of advanced architectures (1986) (0)
- Guest Editor’s Note: Special Issue on Clusters, Clouds and Data for Scientific Computing (2017) (0)
- Autotuning Techniques for Performance-Portable Point Set Registration in 3D (2018) (0)
- Parallel Computing with Application-Level Scheduling (2003) (0)
- Templates and numerical linear algebra (2003) (0)
- Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems (2021) (0)
- Theory of Mazurkiewicz-Traces (2011) (0)
- TOP500 Supercomputer Sites 1995 (1995) (0)
- Data Movement Interfaces to Support Dataflow Runtimes (2018) (0)
- Distributed Information Management in the National HPCC Software Exchange (1995) (0)
- Lapack for Fortran90 Compiler (1996) (0)
- Chapter 4 Power Management and Event Verification in PAPI (2016) (0)
- Conference Spotlight - Circuits and Devices Sessions at CEATEC (2005) (0)
- Developing Information Power Grid Based Algorithms and Software (1998) (0)
- Chapter 2 Parallel Programming Considerations (0)
- 5. Remaining Topics (1994) (0)
- Congratulations to the winners! (1975) (0)
- Preface To the Special Issue (1997) (0)
- Empowering Science through Computing, Preface for ICCS 2012 (2012) (0)
- 3. Documentation Design and Program Examples (2001) (0)
- SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings (1990) (0)
- A Tribute to Gene Golub (2008) (0)
- Towards an Accurate Model for Collective Communications (2004) (0)
- SLATE Developers' Guide (2019) (0)
- Testing Software for LAPACK 90 (1998) (0)
- Sequential Task Flow Runtime Model Improvements and Limitations (2022) (0)
- Proceedings of the 23rd European MPI Users' Group Meeting (2016) (0)
- Message from the General Chairs (2018) (0)
- 4. Performance: Analysis, Modeling, and Measurements (1998) (0)
- Proceedings of the second workshop on Scalable algorithms for large-scale systems (2011) (0)
- Advances in Mixed Precision Algorithms: 2021 Edition. (2021) (0)
- 3. Implementation Details and Overhead (1998) (0)
- Performances comparées de 80 ordinateurs sur des programmes Fortran (1984) (0)
- Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure (2011) (0)
- 1. General Matrices (1979) (0)
- Editorial introduction to the special issue on computational linear algebra and sparse matrix computations (2007) (0)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part VI (2020) (0)
- PAPI: Counting outside the Box (2018) (0)
- Techniques for Solving Large-Scale Graph Problems on Heterogeneous Platforms (2016) (0)
- 2015 Salishan Final Program (2015) (0)
- LAPACK Working Note 25: Numerical Consideration in Computing Invariant Subspaces (1990) (0)
- Simulation of the Evolution of Clusters of Galaxies on Heterogeneous Computational Grids (2009) (0)
- Accelerating the SVD Bidiagonalization of a Batch of Small Matrices using GPUs (2018) (0)
- 2. Getting Started with ScaLAPACK (1997) (0)
- Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface (2010) (0)
- Toolboxes and Templates for Large Scale Linear Algebra Problems (2002) (0)
- Proceedings of the IEEE/ACM SC95 Conference - Table of Contents (1995) (0)
- NetBuild: Automated Installation and Use of Network-Accessible Software Libraries (2004) (0)
- 3 Directive-based Programming Models for Accelerators 3.1 OpenMP (2017) (0)
- The 2006 HPC challenge awards (2006) (0)
- 7. Driver Routines for Standard Eigenvalue Problems (2001) (0)
- Conclusions of The Nato Arw on Large Scale Computations in Air Pollution Modelling (1999) (0)
- 4. Performance and Troubleshooting (2001) (0)
- Parallel Operating System (2011) (0)
- Linear algebra - software issues (2011) (0)
- What it Takes to keep PAPI Instrumental for the HPC Community (2019) (0)
- FY 2006 LACSI Project Proposal (2005) (0)
- A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms (2021) (0)
- POHLL: Workshop on performance optimization for high-level languages and libraries (2008) (0)
- The Case for Directive Programming for Accelerator Autotuner Optimization (2017) (0)
- Introduction to the Special Issue (2012) (0)
- Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA@SC 2015, Austin, Texas, USA, November 15, 2015 (2015) (0)
- High Performance Computing Trends, Supercomputers, Clusters and Grids (2004) (0)
- Bibliometric Landscape of the ACM Digital Library (2005) (0)
- 3 Shallow Water Equations Solver Developing Scientific Applications in Glu (2007) (0)
- The use of Java in the NetSolve project (1997) (0)
- CISIS 2009 Reviewers List (2009) (0)
- Organizers Put Mathematics to Work For the Math Sciences Community Calling on their experience (0)
- Server Farm (2011) (0)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part II (2020) (0)
- Performance improvements of common sparse numerical linear algebra computations (2003) (0)
- HeNCE: A Heterogeneous Network Computing Environment (1993) (0)
- Linear Systems Performance Report (2018) (0)
- LINPACK working note No. 13: implementation guide for LINPACK (1980) (0)
- More on Scheduling Block-cyclic Array Redistribution (2007) (0)
- Implementing Matrix Factorizations on the Cell B.E. (2010) (0)
- Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs (2017) (0)
- TOP500 Supercomputers for November 2002 (2002) (0)
- Proceedings of the second workshop on Scalable algorithms for large-scale systems, ScalA@SC 2011, Seattle, WA, USA, November 14, 2011 (2011) (0)
- for the HARNESS Meta-computing System (2001) (0)
- Priorities and Strategies (2004) (0)
- Preface (2003) (0)
- IPDPS 2011 Tuesday 25th Year Panel - Looking back (2011) (0)
- Algorithm Design for Large-Scale Computations (1987) (0)
- ICL-UT-1803 Data Movement interfaces to support dataflow runtimes (2018) (0)
- A Framework For Migrating Applications Under Changing Load Conditions In The Grid (0)
- A Not So Simple Matter of Software; The Evolution of Mathematical Software: Software and Algorithms Follow the Hardware (2022) (0)
- New Building Blocks for HPC in 1995 (1996) (0)
- Computational Science – ICCS 2019 (2019) (0)
- Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression (2022) (0)
- Understanding Native Event Semantics (2019) (0)
- Selected papers of the Workshop on Clusters, Clouds and Grids for Scientific Computing (CCGSC) (2011) (0)
- Optimizing Batch HGEMM on Small Sizes Using Tensor Cores (2019) (0)
- Message from the High Performance Computing and Communications 2022 General Chairs (2022) (0)
- A Comparison of 2 x 2 and 3 x 3 Block Saddle Point Formulations of Weak Constraint 4D-Var (2019) (0)
- Special Issue on Tools in the ACTS Collection 2004 (2006) (0)
- Hybrid LU factorization on multi-GPU multi-core heterogeneous platforms (2012) (0)
- Distributed Termination Detection for HPC Task-Based Environments (2018) (0)
- Bsp (2020) (0)
- Exascale Computing Systems in e-Infrastructures (2015) (0)
- Users' Guide to NetSolve, version 1.1.b (Client and Server) (1998) (0)
- Handbook of Research on Scalable Computing Technologies 2-Volumes (2009) (0)
- Static tiling for heterogeneous computing platforms (1998) (0)
- Optimization of Injection Schedule of Diesel Engine Using GridRPC (2003) (0)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part III (2020) (0)
- 11. The Generalized Eigenproblem (1998) (0)
- TOP500 Supercomputers for November 2004 (2004) (0)
- 2. Contents of LAPACK95 (1999) (0)
- Combining Measurement and Stochastic Modelling to Enhance Scheduling Decisions for a Parallel Mean Value Analysis Algorithm (2018) (0)
- Guest Editors' Note: Special Issue on Clusters, Clouds, and Data for Scientific Computing (2013) (0)
- Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs (2022) (0)
- Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II (2009) (0)
- Preface (2007) (0)
- Recent Advances in the Message Passing Interface: 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. ... / Programming and Software Engineering) (2011) (0)
- High Performance Computing : 30th International Conference, ISC High Performance 2015, Frankfurt, Germany, July 12-16, 2015 : proceedings (2015) (0)
- Algorithmic Issues on Heterogeneous Computing Platforms (1998) (0)
- Vector and parallel processing - VECPAR'98 : Third International Conference Porto, Portugal, June 21-23, 1998 : selected papers and invited talks (1999) (0)
- A Scalable Parallel Library for Numerical Linear Algebra. (1996) (0)
- HPC Forecast (2023) (0)
- Abstract: Matrices Over Runtime Systems at Exascale (2012) (0)
- Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems (2009) (0)
- Publishing House "Academic Publications": Founding Publisher Prof. Drumi Bainov Editorial Board (2015) (0)
- Power profiling of Cholesky and QR factorizations on distributed memory systems (2012) (0)
- We Thank CNR and MURST for Financial Support, CASPUR for the Use of Their DEC-alpha Cluster and CINECA for an Allocation of CPU Time on the Cray-C90 (0)
- TOP500 Supercomputers for November 2003 (2003) (0)
- Tiling on Systems with Communication/Computation Overlap (1997) (0)
- Performance Analysis of Parallel FFT on Large Multi-GPU Systems (2022) (0)
- Proceedings of the Third European PVM Conference on Parallel Virtual Machine (1996) (0)
- Solver Interface & Performance on Cori (2018) (0)
- PVM 3 Routines (1994) (0)
- software (SANS) effort (2006) (0)
- From Dinos to Rhinos (1994) (0)
- Chapter 3 Clustered Systems for Massive Parallelism Summary : Clustering (0)
- Preface (2001) (0)
- Introduction for August Special Issue CCDSC (2013) (0)
- Keeneland: Computational Science Using Heterogeneous GPU Computing (2017) (0)
- A Preconditioned Conjugate Gradient Method for Solving a Class of Non-Symmetric Linear Systems (2015) (0)
- Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator (2007) (0)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part I (2020) (0)
- Highly Parallel Computing Solving a System of Dense Linear Equations Top500 -manufacturers (0)
- A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis (2005) (0)
- And Climate Modeling. 4 Current Status and Availability (1991) (0)
- Final Report on LLNL Subcontract B503962 Atlas (2001) (0)
- Changes in Dense Linear Algebra Kernels: Decades-Long Perspective (2011) (0)
- 6. Driver Routines for Least Squares Problems (2001) (0)
- Thread Level Speculation (TLS) Parallelization (2011) (0)
- Randomized Numerical Linear Algebra : A Perspective on the Field With an Eye to Software (2023) (0)
- Parallel Processing and Applied Mathematics, 8th International Conference, PPAM 2009, Wroclaw, Poland, September 13-16, 2009. Revised Selected Papers, Part I (2010) (0)
- Benchmarks to supplant export FPDR (Floating Point Data Rate) calculations (1988) (0)
- Concurrency and Computation: Practice and Experience (2005) (0)
- NetSolve and Its Applications (2001) (0)
- Proposed Consistent Exception Handling for the BLAS and LAPACK (2022) (0)
- Pipelined Shared Memory Implementation of Linear Algebra Routines with Arbitrary Lookahead - LU, Cholesky, QR (0)
- General/Program Co-Chairs: (2008) (0)
- PAQR: Pivoting Avoiding QR factorization (2022) (0)
- Workshop 16: Performance evaluation and prediction (1997) (0)
- The Center for Grid Applications Development Software (1998) (0)
- Parallel Processing of Remotely Sensed Hyperspectral Images on Heterogeneous Clusters (2009) (0)
- D7.8 Release of the NLAFET library (2019) (0)
- Request Sequencing: Enabling Workflow for Efficient Parallel Problem Solving in GridSolve (2008) (0)
- Implementation of the C++ API for Batch BLAS (2018) (0)
- How PVM Works (1994) (0)
- Heterogeneous Platforms and Their Uses (2009) (0)
- TOP500 Supercomputers for June 2005 (2005) (0)
- Book Reviews: The Connection Machine (1987) (0)
- 5. Driver Routines for Linear Systems (2001) (0)
- for High-Performance Computers (1987) (0)
- Editorial (2009) (0)
- SLATE Mixed Precision Performance Report (2019) (0)
- Minimizing System Noise Effects For Extreme-Scale Scientific Simulation Through Function Delegation (2013) (0)
- Parallel and Distributed Processing and Applications, Third International Symposium, ISPA 2005, Nanjing, China, November 2-5, 2005, Proceedings (2005) (0)
- The 20th Heterogeneity in Computing Workshop (HCW 2011) (2011) (0)
- Grades Based on Weekly Homework (2003) (0)
- High Performance Computing Trends and Self Adapting Numerical Software (2003) (0)
- Proceedings of the 20th European MPI Users' Group Meeting (2013) (0)
- Communication Performance Models for High‐Performance Heterogeneous Platforms (2009) (0)
- Computational Science - ICCS 2005, 5th International Conference, Atlanta, GA, USA, May 22-25, 2005, Proceedings, Part III (2005) (0)
- Benchmarking and Analysis of High Productivity Computing (HPCS) (2006) (0)
- Computational Science-ICCS 2003, Melbourne, Australia and St. Petersburg, Russia, Proceedings Part II (2003) (0)
- Autotuning dense linear algebra libraries on multicore architectures (2010) (0)
- Software-Defined Events (SDEs) in MAGMA-Sparse (2018) (0)
- Chapter 13 Parallel Linear Algebra Software (2005) (0)
- Introduction to the HPC Challenge Benchmark Suite (2005) (0)
- High Performance Realtime Convex Solver for Embedded Systems (2016) (0)
- Supernode Partitioning (2011) (0)
- Improvements in the efficient composition of applications built using a component-based programming environment (2004) (0)
- Developing a tuned version of ScaLAPACK's linear equation solver (2000) (0)
- PVM User Interface (1994) (0)
- A Further Proposal for a Fortran 90 Interface for LAPACK (1997) (0)
- High-Performance GMRES Multi-Precision Benchmark: Design, Performance, and Challenges (2022) (0)
- Message from the program chairs of HPCC 2015 (2015) (0)
- Integrating agent-based modelling with copula theory: Preliminary insights and open problems (2020) (0)
- Software development for parallel systems (1991) (0)
- ParILUT - A New Parallel Threshold ILU (2018) (0)
- NetSolve and its application (2001) (0)
- International Workshop on Parallel Matrix Algorithms and Applications (2000) (0)
- Place-Transition Nets (2011) (0)
- Programming the next generation of supercomputers: proceedings for the Argonne workshop (1984) (0)
- 4. Positive Definite Band Matrices (1979) (0)
- Scalable Data Generation for Evaluating Mixed-Precision Solvers (2020) (0)
- Performance evaluation for petascale quantum simulation tools (2009) (0)
- Vector and Parallel Processing - VECPAR'96, Second International Conference, Porto, Portugal, September 25-27, Selected Papers (1997) (0)
- Strategic Use of Data Assimilation for Dynamic Data-Driven Simulation (2020) (0)
- Algebra Development and Scheduling with Cholesky Factorization (2015) (0)
- HPCU '99: New Trends in High Performance Computing, Annual Conference for Vendor-Independent HPC Users Group, Conference Program (2007) (0)
- HCW 2013 Keynote Talk (2013) (0)
- Numerical Linear Algebra Software for Heterogeneous Clusters (2009) (0)
- Appendix A: Appendix to Chapter 4 (2009) (0)
- Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface (2007) (0)
- Message from HPCC2016 Chairs (2017) (0)
- 7. Tridiagonal Matrices (1979) (0)
- 5. Performance of ScaLAPACK (1997) (0)
- Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors (2006) (0)
- Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (2017) (0)
- 8. Driver Routines for Generalized Eigenvalue Problems (2001) (0)
- Selected papers from the Second International Conference on Vector and Parallel Processing (1996) (0)
- GCC 2008 Conference Committee (2008) (0)
- Preface (2001) (0)
- A Draft Standard for Message Passing on Distributed Memory Computers (1993) (0)
- International Conference on Computational Science, ICCS 2013: Barcelona, Spain, June 5- June 7, 2013 (2013) (0)
- Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices (2022) (0)
- 6. Accuracy and Stability (1997) (0)
- Trade-offs in Context Identifier Allocation in MPI (2017) (0)
- Evaluation of dataflow programming models for electronic structure theory (2018) (0)
- Institute in parallel computing: Final report (1988) (0)
- 3. Performance of LAPACK (1999) (0)
- Proc. 5th ICCS, Part III (2005) (0)
- Graphics tools for developing high-performance algorithms (2020) (0)
- Preface (2020) (0)
- Threshold Pivoting for Dense LU Factorization (2022) (0)
- Remembering Ken Kennedy (2007) (0)
- Guest editors’ note (2011) (0)
- Proceedings of the 5th international conference on Computational Science - Volume Part III (2006) (0)
- Parallel and Distributed System Simulation (1998) (0)
- Computational Science - ICCS 2008, 8th International Conference, Kraków, Poland, June 23-25, 2008, Proceedings, Part II (2008) (0)
- Loop Tiling (2011) (0)
- Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications (2006) (0)
- Interior state computation of nano structures (2008) (0)
- Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era (2022) (0)
- A New Approach to Scientific Computation (Ulrich W. Kulisch and Willard L. Miranker, eds.) (1985) (0)
- 3. Contents of ScaLAPACK (1997) (0)
- Message from HPSEC Workshop Co-chairs (2006) (0)
- Vector and parallel processing - VECPAR '96: Second International Conference on Vector and Parallel Processing - Systems and Applications, Porto, Portugal, September 25-27, 1996: selected papers (1997) (0)
- The Component Structure of a Self-Adapting Numerical Software System (2005) (0)
- Workshop on Java and components for parallelism, distribution and concurrency - JAVAPDC (2009) (0)
- Computational Science – ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part IV (2020) (0)
- MPI: The Complete Reference [Book Review] (1997) (0)
- 7. Krylov Subspaces: Projection (1998) (0)
- Porting the PLASMA Numerical Library to the OpenMP Standard (2016) (0)
- POMPEI: Programming with OpenMP 4 for Exascale Investigations (2017) (0)
- A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks (2013) (0)
- Cluster 2003 Conference Organization Committee (2003) (0)
- Proceedings of the 1st international conference on Computational science: Part I (2003) (0)
- Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers (2022) (0)
- Guest editors’ note: Special issue on clusters, clouds, and data for scientific computing (2019) (0)
- Numerical Libraries and the Grid (2001) (0)
- Least Squares Performance Report (2018) (0)
- Proceedings of the First International Workshop on Parallel Scientific Computing (1994) (0)
- Guest Editors' Note: Special Issue on Clusters, Clouds and Data for Scientific Computing (2015) (0)
- High Productivity Computing Systems (HPCS) Library Study Effort (2008) (0)
- Enabling workflows in GridSolve: request sequencing and service trading (2011) (0)
- 5. Documentation and Software Conventions (1999) (0)
- 8. The Cholesky Decomposition (1979) (0)
- LAPACK Working Note 101: A Proposal for a Fortran 90 Interface for LAPACK (2013) (0)
- Creating Software Technology to Harness the Power of Leadership-class Computing Systems (2007) (0)
- Means of Achieving Cross-program Focus, Coordination, and Technology Transfer (1995) (0)
- 1 Reliability and Performance Models for Grid Computing (2010) (0)
- 10. Updating QR & Cholesky Decompositions (1979) (0)
- Designing algorithms in linear algebra for different computer architectures (1984) (0)
- Efficient Eigensolver Algorithms on Accelerator Based Architectures (2015) (0)
- Implementing Matrix Inversions (2019) (0)
- An Iterative Solver Benchmark Lapack working note 152 (0)
- Proceedings of the 2003 international conference on Computational science: Part III (2003) (0)
- Proceedings of the Second Workshop on Environments and Tools for Scientific Computing (1994) (0)
- PLASMA View project Performance API ( PAPI ) View project (2016) (0)
- CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra (2020) (0)
- Updating incomplete factorization preconditioners for model order reduction (2016) (0)
- 2016 Dense Linear Algebra Software Packages Survey (2016) (0)
- 9. The QR Decomposition (1979) (0)
- Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005, Poznan, Poland, September 11-14, 2005 Revised Selected Papers (Lecture Notes in Computer Science) (2006) (0)
- PARA'04, State-of-the-art in scientific computing: LNCS Proceedings (2006) (0)
- I In the midst of rapid development of high performance computing beyond the petascale and the emergence of new (2014) (0)
- LAPACK is now available (1992) (0)
- The Quest for Petascale Computing: High-Performance Computing (0)
- Special Topic: High Performance Computing A new metric for ranking high-performance computing systems (2016) (0)
- Overlap Communication in MPI Implementations (2014) (0)
- LAPACK FOR FORTRAN 90 (2011) (0)
- EduPar Keynote (2017) (0)
- Dam Eguelin (2007) (0)
- ASYNCHRONOUS ITERATIVE SOLVERS FOR EXTREME-SCALE COMPUTING (2021) (0)
- Preconditioning Communication-Avoiding Krylov Methods. (2015) (0)
- Parallel Processing and Applied Mathematics, 7th International Conference, PPAM 2007, Gdansk, Poland, September 9-12, 2007, Revised Selected Papers (2008) (0)
- The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot (2006) (0)
- Providing Access to High Performance Computing Technologies (1996) (0)
- Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007 (2007) (0)
- 6. Direct Solution of Sparse Linear Systems (1998) (0)
- PaRSEC: A Software Framework for Performance and Productivity on Hybrid, Manycore Platforms (2016) (0)
- High Performance Computing for Computational Science - VECPAR 2004: 6th International Conference, Valencia, Spain, June 28-30, 2004, Revised Selected and ... Papers (Lecture Notes in Computer Science) (2005) (0)
- Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title (2005) (0)
- FFT-ECP Fast Fourier Transform (2019) (0)
- Special Issue: Manycore and Accelerator-based High-performance Scientific Computing Introduction (2012) (0)
- Interactive and Dynamic Content in Software Repositories (1997) (0)
- Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption (2018) (0)
- 2. Band Matrices (1979) (0)
- Static Scheduling for Distributed Applications on the Grid Using Genetic Algorithm (2005) (0)
- Proceedings of the First international conference on High Performance Computing and Communications (2005) (0)
- Appendix A: Appendix to Chapter 3 (2009) (0)
- Parallel Prefix Algorithms (2011) (0)
- Linear-Algebra Programs (1982) (0)
- HPCS HPCchallenge Benchmark Suite (2005) (0)
- Basis Programming Techniques (1994) (0)
- Preface (2000) (0)
- Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms (2022) (0)
- 5. Symmetric Indefinite Matrices (1979) (0)
- [2] Edward Beltrami, Mathematical Models for Society and Biology, Academic (0)
- LAPACK Working Note 93: Installation Guide for ScaLAPACK (VERSION 1.0) (1995) (0)
- Preface (1994) (0)
- Preface: Clusters and Computational Grids for Scientific Computing (2001) (0)
- Parallel Norms Performance Report (2018) (0)
- Pentium (1995) (0)
- Parallel Processing and Applied Mathematics, 5th International Conference, PPAM 2003, Czestochowa, Poland, September 7-10, 2003. Revised Papers (2004) (0)
- Context Identifier Allocation in Open MPI (2016) (0)
- Modified Cyclic Algorithms for Solving Triangular Systems on Distributed-memory Multiprocessors, SIAM Complexity of Dense-linear-system Solution on a Multiprocessor Ring (2007) (0)
- On Designing Portable High Performance . . . (1991) (0)
- Position Paper (1995) (0)
- Foreword (2009) (0)
- The TOP500 Report 1995 (1996) (0)
- Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers (2001) (0)
- 4. Data Distributions and Software Conventions (1997) (0)
- Computational Science - ICCS 2004 (2004) (0)
- Evaluation of high-performance computing software (1996) (0)
- Programming Systems for High‐Performance Heterogeneous Computing (2009) (0)
- Tools to aid in the development of high-performance algorithms (1989) (0)
- Matrix Product on Heterogeneous Master-Worker Platforms (2008) (0)
- Computational Science - ICCS 2001: International Conference San Francisco, CA, USA, May 28—30, 2001 Proceedings, Part II (2001) (0)
- Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency (2011) (0)
- 6. Installing LAPACK Routines (1999) (0)
- LAWN 294: Aasen's Symmetric Indefinite Linear Solvers in LAPACK (2017) (0)
- Repository Interoperation and Access Control (1998) (0)
- Proceedings of the Third international conference on Parallel and Distributed Processing and Applications (2005) (0)
- The Future of the BLAS (1999) (0)
- An Empirical View of SLATE Algorithms on Scalable Hybrid System (2019) (0)
- 6. Triangular Matrices (1979) (0)
- 4. Accuracy and Stability (1999) (0)
- Panel: many-task computing meets exascales (2011) (0)
- SLATE Working Note 13: Implementing Singular Value and Symmetric Eigenvalue Solvers (2019) (0)
- Task based Cholesky decomposition on Xeon Phi architectures using OpenMP (2018) (0)
- Coordinated Fault Tolerance for High-Performance Computing (2013) (0)
- High Performance Computing and Communications, First International Conference, HPCC 2005, Sorrento, Italy, September 21-23, 2005, Proceedings (2005) (0)
- Computational Science — ICCS 2003 (2003) (0)
- Extreme-scale Algorithms and Solver Resilience (2016) (0)
- Overview of Recent Supercomputers (1996) (0)
- Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE (2018) (0)
- Scalable Ecosystems for Data Science (SEDS) (0)
- An iterative solver benchmark (2014) (0)
- 9. Driver Routines for Singular Value Problems (2001) (0)
- REVIEWS AND DESCRIPTIONS OF TABLES AND BOOKS (1990) (0)
- The Netsolve Project in Denmark (1997) (0)
- Initial Integration and Evaluation of SLATE and STRUMPACK (2018) (0)
- A Distributed Memory Implementation of the Nonsymmetric QR Algorithm (1997) (0)
- Special-Purpose Machines (2011) (0)
- 10. Linear Eigenvalue Problems Ax=λx (1998) (0)
- Editorial (1992) (0)
- Fault Tolerance in Message Passing and in Action (2004) (0)
- Tensor Contractions using Optimized Batch GEMM Routines (2018) (0)
- Parallel Processing and Applied Mathematics (2011) (0)
- Benchmarks to Supplant Export "Fpdr" Calculations (2017) (0)
- Fast Fourier Transforms (2010) (0)
- AFRL-RY-WP-TR-2012-0137 BLACKJACK (2012) (0)
- Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface (2012) (0)
- Algorithms and Libraries (1998) (0)
- Cholesky Across Accelerators (2015) (0)
- 8. Iterative Methods for Linear Systems (1998) (0)
- Does your tool support PAPI SDEs yet (2019) (0)
- Perfmon: an On-line Performance Monitoring Library for Heterogeneous Environments (1996) (0)
- Computational Science - ICCS 2006, 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part III (2006) (0)
- Waveguides for spin-polarized currents in diluted magnetic semiconductor — nanomagnet hybrids (2009) (0)
- TOP500 Supercomputers for June 2004 (2004) (0)
- Eigenvalue Computation with NetSolve Global Computing System (2005) (0)
- International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA (2016) (0)
- Preface (1970) (0)
- Performance evaluation of LU factorization through hardware counter measurements (2012) (0)
- MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) (2018) (0)
- Scheduling Block-Cyclic Array Redistribution (1997) (0)
- Parallel Processing and Applied Mathematics (2013) (0)
- 10. Computational Routines (2001) (0)
- 7 Acknowledgements (2007) (0)
- 3. Positive Definite Matrices (1979) (0)
- Deep Gaussian process with multitask and transfer learning for performance optimization (2022) (0)
- Trends in high performance computing and using numerical libraries on clusters (2002) (0)
- Algorithm design for high-performance computers (1986) (0)
- Tools for Developing and Analyzing Parallel For (2007) (0)
- Formulation of Requirements for new PAPI++ Software Package: Part I: Survey Results (2020) (0)

## Other Resources About Jack Dongarra

## What Schools Are Affiliated With Jack Dongarra?

Jack Dongarra is affiliated with the following schools: