Pr P. Sadayappan

Pr P. Sadayappan's AcademicInfluence.com Rankings

Computer Science: World Rank #8122, Historical Rank #8543

Parallel Computing: World Rank #42, Historical Rank #44

Algorithms: World Rank #319, Historical Rank #323

Database: World Rank #5155, Historical Rank #5354

Pr P. Sadayappan's Published Works

Chart: Number of citations in a given year to any of this author's works.

Chart: Total number of citations to the author's works published in a given year, which highlights when the author's most important works were published.

Published Works

A practical automatic polyhedral parallelizer and locality optimizer (2008) (914)
Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems (2008) (395)
Scalable work stealing (2009) (284)
High-performance code generation for stencil computations on GPU architectures (2012) (252)
Automatic C-to-CUDA Code Generation for Affine Programs (2010) (244)
Effective automatic parallelization of stencil computations (2007) (238)
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model (2008) (237)
On improving the performance of sparse matrix-vector multiplication (1997) (230)
A compiler framework for optimization of affine loop nests for gpgpus (2008) (227)
Languages and Compilers for Parallel Computing (1992) (216)
Compile-Time Techniques for Data Distribution in Distributed Memory Machines (1991) (199)
Distributed job scheduling on computational Grids using multiple simultaneous requests (2002) (191)
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models (2005) (186)
Characterization of backfilling strategies for parallel job scheduling (2002) (183)
UTS: An Unbalanced Tree Search Benchmark (2006) (178)
Scalable I/O forwarding framework for high-performance computing systems (2009) (171)
Polyhedral-based data reuse optimization for configurable computing (2013) (161)
Task allocation onto a hypercube by recursive mincut bipartitioning (1990) (153)
Annotation-based empirical performance tuning using Orio (2009) (151)
On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines (1993) (137)
Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining (2011) (136)
Nearest-Neighbor Mapping of Finite Element Graphs onto Processor Meshes (1987) (133)
Loop transformations: convexity, pruning and optimization (2011) (132)
Selective Reservation Strategies for Backfill Job Scheduling (2002) (131)
Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications (2014) (129)
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories (2008) (129)
When polyhedral transformations meet SIMD code generation (2013) (125)
A stencil compiler for short-vector SIMD architectures (2013) (123)
PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System (2015) (120)
Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures (2011) (116)
Automatic code generation for many-body electronic structure methods: the tensor contraction engine (2006) (115)
Cluster partitioning approaches to mapping parallel programs onto a hypercube (1987) (112)
Scheduling of Parallel Jobs in a Heterogeneous Multi-site Environment (2003) (111)
Hybrid Hexagonal/Classical Tiling for GPUs (2014) (111)
Communication-Free Hyperplane Partitioning of Nested Loops (1991) (106)
Automatic Selection of Sparse Matrix Representation on GPUs (2015) (104)
Predictive Modeling in a Polyhedral Optimization Space (2011) (101)
Parametric multi-level tiling of imperfectly nested loops (2009) (97)
A reliable multicast algorithm for mobile ad hoc networks (2002) (92)
Tiling Multidimensional Iteration Spaces for Multicomputers (1992) (87)
The rectilinear steiner arborescence problem (1992) (86)
Iterative Algorithms for Solution of Large Sparse Systems of Linear Equations on Hypercubes (1988) (84)
Combined iterative and model-driven optimization in an automatic parallelization framework (2010) (84)
Adaptive sparse tiling for sparse matrix multiplication (2019) (83)
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors (2009) (80)
Tiling multidimensional iteration spaces for nonshared memory machines (1991) (79)
Split tiling for GPUs: automatic parallelization using trapezoidal tiles (2013) (79)
Dynamic Load Balancing of Unbalanced Computations Using Message Passing (2007) (79)
PARDA: A Fast Parallel Reuse Distance Analysis Algorithm (2012) (77)
On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution (1997) (76)
Scioto: A Framework for Global-View Task Parallelism (2008) (76)
Parameterized tiling revisited (2010) (73)
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors (2009) (71)
Parallel FPGA-based all-pairs shortest-paths in a directed graph (2006) (70)
A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction (1993) (70)
Effective automatic parallelization and locality optimization using the polyhedral model (2008) (66)
Optimization by neural networks (1988) (65)
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors (1989) (65)
Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations (1999) (65)
A framework for enhancing data reuse via associative reordering (2014) (63)
Dynamic trace-based analysis of vectorization potential of applications (2012) (62)
Tiling of Iteration Spaces for Multicomputers (1990) (62)
DynTile: Parametric tiled loop generation for parallel execution on multicore processors (2010) (62)
Effective Selection of Partition Sizes for Moldable Scheduling of Parallel Jobs (2002) (62)
An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs (2014) (61)
A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry (2002) (61)
Analytical Bounds for Optimal Tile Size Selection (2012) (61)
Optimal loop unrolling for GPGPU programs (2010) (61)
Using machine learning to improve automatic vectorization (2012) (60)
Moldable Parallel Job Scheduling Using Job Efficiency: An Iterative Approach (2006) (58)
Communication efficient matrix multiplication on hypercubes (1994) (57)
Hybrid parallel programming with MPI and unified parallel C (2010) (57)
Space-time trade-off optimization for a class of electronic structure calculations (2002) (56)
Towards a 'neural' architecture for abductive reasoning (1988) (55)
Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning (2009) (55)
Selective buddy allocation for scheduling parallel jobs on clusters (2002) (54)
An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications (2009) (52)
QoPS: A QoS Based Scheme for Parallel Job Scheduling (2003) (51)
Efficient transposition algorithms for large matrices (1993) (51)
Selective preemption strategies for parallel job scheduling (2002) (51)
Multi-phase array redistribution: modeling and evaluation (1995) (51)
Using overlays for efficient data transfer over shared wide-area networks (2008) (48)
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization (2001) (46)
Characterizing and enhancing global memory data coalescing on GPUs (2015) (45)
Efficient sparse-matrix multi-vector product on GPUs (2018) (45)
Job fairness in non-preemptive job scheduling (2004) (45)
An approach to communication-efficient data redistribution (1994) (43)
Memory-optimal evaluation of expression trees involving large objects (1999) (43)
Enabling software management for multicore caches with a lightweight hardware support (2009) (43)
A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O (2005) (42)
Loop optimization for a class of memory-constrained computations (2001) (42)
A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs (2003) (42)
Register optimizations for stencils on GPUs (2018) (40)
Code generation for parallel execution of a class of irregular loops on distributed memory systems (2012) (40)
Static and Dynamic Frequency Scaling on Multicore CPUs (2016) (39)
Unfairness Metrics for Space-Sharing Parallel Job Schedulers (2005) (39)
Circuit Simulation on Shared-Memory Multiprocessors (1988) (38)
ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation (2018) (37)
Integrating parallel file systems with object-based storage devices (2007) (37)
High-performance sparse matrix-vector multiplication on GPUs for structured grid computations (2012) (36)
Fast NIC-based barrier over Myrinet/GM (2001) (36)
Analytical modeling of cache behavior for affine programs (2017) (36)
Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages (2000) (35)
Automatic mapping of nested loops to FPGAs (2007) (34)
Mapping combinatorial optimization problems onto neural networks (1995) (33)
Performance optimization of tensor contraction expressions for many-body methods in quantum chemistry (2009) (33)
MultiGraph: Efficient Graph Processing on GPUs (2017) (32)
Load-Balanced Sparse MTTKRP on GPUs (2019) (32)
Complete exchange in 2D meshes (1994) (32)
Evaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures (2003) (31)
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations (2005) (31)
Resource conscious reuse-driven tiling for GPUs (2016) (31)
Toward Optimizing Latency Under Throughput Constraints for Application Workflows on Clusters (2007) (31)
The Relation Between Diamond Tiling and Hexagonal Tiling (2014) (31)
A robust scheduling technology for moldable scheduling of parallel jobs (2003) (30)
An Integrated Approach for Processor Allocation and Scheduling of Mixed-Parallel Applications (2006) (30)
Global communication optimization for tensor contraction expressions under memory constraints (2003) (30)
An integrated framework for performance-based optimization of scientific workflows (2009) (30)
Characterization and enhancement of dynamic mapping heuristics for heterogeneous systems (2000) (30)
Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations (2018) (29)
A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP (2008) (29)
Towards provision of quality of service guarantees in job scheduling (2004) (28)
Distributed memory code generation for mixed Irregular/Regular computations (2015) (28)
Performance optimization of a class of loops implementing multidimensional integrals (1999) (28)
A Code Generator for High-Performance Tensor Contractions on GPUs (2019) (28)
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines (1997) (27)
EXTENT: a portable programming environment for designing and implementing high-performance block recursive algorithms (1994) (27)
Towards effective automatic parallelization for multicore systems (2008) (27)
Combining analytical and empirical approaches in tuning matrix transposition (2006) (26)
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths (2006) (26)
A Communication-Optimal Framework for Contracting Distributed Tensors (2014) (25)
Analytical characterization and design space exploration for optimization of CNNs (2021) (25)
Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs (2018) (25)
Hypergraph Partitioning for Automatic Memory Hierarchy Management (2006) (25)
Parallel Job Scheduling Policies to Improve Fairness: A Case Study (2006) (25)
Implementing Fast Fourier Transforms on Distributed-Memory Multiprocessors Using Data Redistributions (1994) (25)
On Characterizing the Data Access Complexity of Programs (2014) (24)
Optimistic Delinearization of Parametrically Sized Arrays (2015) (24)
Optimal Algorithms for All-to-All Personalized Communication on Rings and Two Dimensional Tori (1997) (24)
Neural Network Assisted Tile Size Selection (2010) (24)
A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms (1986) (24)
Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs (2015) (24)
SDSLc: a multi-target domain-specific compiler for stencil computations (2015) (24)
An OSD-based approach to managing directory operations in parallel file systems (2008) (23)
A Duplication Based Algorithm for Optimizing Latency Under Throughput Constraints for Streaming Workflows (2008) (23)
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms (2003) (23)
Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel (2011) (22)
Compile-time techniques for parallel execution of loops on distributed memory multiprocessors (1990) (22)
Parametric Tiling of Affine Loop Nests (2010) (22)
Efficient Sparse Matrix Factorization for Circuit Simulation on Vector Supercomputers (1989) (22)
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions (2012) (22)
Dynamic selection of tile sizes (2011) (22)
PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests (2009) (21)
A framework for load balancing of Tensor Contraction expressions via dynamic task partitioning (2013) (21)
Parallel CCD++ on GPU for Matrix Factorization (2017) (21)
Applying MPI derived datatypes to the NAS benchmarks: A case study (2004) (21)
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals (1999) (20)
Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences (2007) (20)
Analytical cache modeling and tilesize optimization for tensor contractions (2019) (20)
Implementation and performance of a binary lattice gas algorithm on parallel processor systems (1989) (20)
On fusing recursive traversals of K-d trees (2016) (20)
Efficient parallel out-of-core matrix transposition (2004) (20)
PolyCheck: dynamic verification of iteration space transformations on affine programs (2016) (20)
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs (2016) (19)
Optimizing latency and throughput of application workflows on clusters (2011) (19)
Stratification driven placement of complex data: A framework for distributed data analytics (2013) (19)
Assessment and enhancement of meta-schedulers for multi-site job sharing (2005) (19)
Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines (1996) (18)
Effective padding of multidimensional arrays to avoid cache conflict misses (2016) (18)
Parameterized specification, configuration and execution of data-intensive scientific workflows (2010) (18)
Hybrid Iterative and Model-Driven Optimization in the Polyhedral Model (2008) (18)
An Approach to Communication-Efficient Data Redistribution (1994) (18)
Revisiting the metadata architecture of parallel file systems (2008) (17)
All-to-all broadcast on switch-based clusters of workstations (1999) (17)
UPC Implementation of an Unbalanced Tree Search Benchmark (2003) (17)
Memory-Constrained Data Locality Optimization for Tensor Contractions (2003) (17)
Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations (2006) (17)
An algebraic theory for modeling direct interconnection networks (1992) (17)
Modeling and Optimizing Large-Scale Wide-Area Data Transfers (2014) (17)
An efficient mixed-mode representation of sparse tensors (2019) (17)
On improving performance of sparse matrix-matrix multiplication on GPUs (2017) (17)
Register allocation and promotion through combined instruction scheduling and loop unrolling (2016) (17)
Performance benefits of NIC-based barrier on myrinet/GM (2001) (17)
MOLAR: adaptive runtime support for high-end computing operating and runtime systems (2006) (16)
Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs (2013) (16)
Sampled Dense Matrix Multiplication for High-Performance Machine Learning (2018) (16)
A Domain-Specific Compiler for a Parallel Multiresolution Adaptive Numerical Simulation Environment (2016) (16)
Optimization of Memory Usage and Communication Requirements for a Class of Loops Implementing Multi-Dimensional Integrals (1999) (16)
On Optimizing Complex Stencils on GPUs (2019) (16)
VIBe: a micro-benchmark suite for evaluating virtual interface architecture (VIA) implementations (2001) (16)
An elegant sufficiency: load-aware differentiated scheduling of data transfers (2015) (15)
On characterizing the data movement complexity of computational DAGs for parallel execution (2014) (15)
StVEC: A Vector Instruction Extension for High Performance Stencil Computation (2011) (15)
Global Trees: A framework for linked data structures on distributed memory parallel systems (2008) (15)
On fairness in distributed job scheduling across multiple sites (2004) (15)
Practical abduction: characterization, decomposition and concurrency (1995) (15)
Mapping Finite Element Graphs onto Processor Meshes (1987) (14)
Supernodal Sparse Cholesky Factorization on Distributed-Memory Multiprocessors (1993) (14)
Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O (2006) (14)
A performance optimization framework for compilation of tensor contraction expressions into parallel (2002) (14)
Data Access Complexity: The Red/Blue Pebble Game Revisited (2013) (14)
Characterizing dataset dependence for sparse matrix-vector multiplication on GPUs (2015) (14)
Automated derivation of parametric data movement lower bounds for affine programs (2019) (14)
Are nonblocking networks really needed for high-end-computing workloads? (2008) (14)
Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications (2006) (13)
A Compiler Analysis to Determine Useful Cache Size for Energy Efficiency (2013) (13)
Performance modeling and optimization of parallel out-of-core tensor contractions (2005) (13)
A message passing benchmark for unbalanced applications (2008) (13)
Memory-Constrained Communication Minimization for a Class of Array Computations (2002) (12)
PIPES: A Language and Compiler for Task-Based Programming on Distributed-Memory Clusters (2016) (12)
A framework for characterizing overlap of communication and computation in parallel applications (2008) (12)
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver (2004) (12)
Compiling Array Statements for Efficient Execution on Distributed-Memory Machines: Two-Level Mappings (1995) (12)
Scheduling File Transfers for Data-Intensive Jobs on Heterogeneous Clusters (2007) (12)
On Using the Roofline Model with Lower Bounds on Data Movement (2015) (11)
A fast implementation of MLR-MCL algorithm on multi-core processors (2014) (11)
Beyond reuse distance analysis (2013) (11)
Efficient search‐space pruning for integrated fusion and tiling transformations (2005) (11)
Implementing TreadMarks over Virtual Interface Architecture on Myrinet and gigabit Ethernet: Challenges, design experience, and performance evaluation (2001) (11)
Analyzing and Minimizing the Impact of Opportunity Cost in QoS-aware Job Scheduling (2007) (11)
A methodology for generating data distributions to optimize communication (1992) (11)
Low Latency Message-Passing for Reflective Memory Networks (1999) (11)
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions (2005) (11)
Multi-Phase Redistribution: A Communication-Efficient Approach to Array Redistribution (1995) (10)
A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing (2006) (10)
Selective Recovery from Failures in a Task Parallel Programming Model (2010) (10)
Understanding parallelism-inhibiting dependences in sequential Java programs (2010) (10)
A technique for overlapping computation and communication for block recursive algorithms (1998) (10)
Parametric GPU Code Generation for Affine Loop Programs (2013) (10)
Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs (2018) (10)
Efficient static scheduling of loops on synchronous multiprocessors (1989) (10)
Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures (2020) (10)
Associative Instruction Reordering to Alleviate Register Pressure (2018) (10)
On data dependence analysis for compiling programs on distributed-memory machines (extended abstract) (1993) (10)
Polyhedral Model (10)
Multifrontal Factorization of Sparse Matrices on Shared-Memory Multiprocessors (1991) (9)
Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers (2012) (9)
GPU code optimization using abstract kernel emulation and sensitivity analysis (2018) (9)
Memory minimization for tensor contractions using integer linear programming (2006) (9)
Multi-hop path splitting and multi-pathing optimizations for data transfers over shared wide-area networks using gridFTP (2008) (9)
On Recovering Multi-Dimensional Arrays in Polly (2015) (9)
NIC-based rate control for proportional bandwidth allocation in Myrinet clusters (2001) (9)
A global address space approach to automated data management for parallel Quantum Monte Carlo applications (2012) (9)
Optimal Reordering and Mapping of a Class of Nested-Loops for Parallel Execution (1996) (9)
PWCET: Power-Aware Worst Case Execution Time Analysis (2014) (8)
Nested Loop Tiling for Distributed Memory Machines (1990) (8)
An extensible global address space framework with decoupled task and data abstractions (2006) (8)
On Efficient Out-of-core Matrix Transposition (2003) (8)
Compiler-assisted detection of transient memory errors (2014) (8)
Framework for Distributed Contractions of Tensors with Symmetry (2013) (8)
Parallelization and performance evaluation of circuit simulation on a shared-memory multiprocessor (1988) (8)
Access based data decomposition for distributed memory machines (1991) (8)
Performance Optimization of a Class of Loops Involving Sums of Products of Sparse Arrays (1999) (8)
Incremental Generation of Index Sets for Array Statement Execution on Distributed-Memory Machines (1994) (8)
Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation (2001) (8)
Automatic parallelization of a class of irregular loops for distributed memory systems (2014) (7)
Scheduling of tasks with batch-shared I/O on heterogeneous systems (2006) (7)
Communication reduction for distributed sparse matrix factorization on a processor mesh (1989) (7)
Adaptive parallel tiled code generation and accelerated auto-tuning (2013) (7)
TTLG - An Efficient Tensor Transposition Library for GPUs (2018) (7)
Differentiated Scheduling of Response-Critical and Best-Effort Wide-Area Data Transfers (2016) (7)
Partitioning Graphs on Message-Passing Machines by Pairwise Mincut (1998) (7)
On the Synthesis of Parallel Programs from Tensor Product Formulas for Block Recursive Algorithms (1992) (7)
Parallel Direct Solution of Sparse Linear Systems (1993) (7)
Low-latency message passing on workstation clusters using SCRAMNet (1999) (7)
A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas (1993) (7)
PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization (2019) (6)
A Performance Instrumentation Framework to Characterize Computation-Communication Overlap in Message-Passing Systems (2006) (6)
Effective Utilization of Tensor Symmetry in Operation Optimization of Tensor Contraction Expressions (2012) (6)
GAMMA: Global Arrays Meets MATLAB (2006) (6)
Brief Announcement: Approximating the I/O Complexity of One-Shot Red-Blue Pebbling (2016) (6)
An Algebraic Approach to Cache Memory Characterization for Block Recursive Algorithms (1994) (6)
A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs (2015) (6)
Application-Specific Fault Tolerance via Data Access Characterization (2011) (6)
Compiler Support for Software Cache Coherence (2016) (6)
ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization (2020) (6)
On sparse matrix reordering for parallel factorization (1994) (6)
Iteration space tiling for distributed memory machines (1992) (5)
An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model (2006) (5)
Work stealing for GPU‐accelerated parallel programs in a global address space framework (2016) (5)
Revenue Maximization in Market-Based Parallel Job Schedulers (5)
On mapping data and computation for parallel sparse Cholesky factorization (1995) (5)
Robust scheduling of moldable parallel jobs (2004) (5)
Global graphs: A middleware for large scale graph processing (2014) (5)
Fault oblivious eXascale whitepaper (2011) (5)
Efficient multicast algorithms for switch-based irregular heterogeneous networks of workstations (2001) (5)
Cache miss characterization and data locality optimization for imperfectly nested loops on shared memory multiprocessors (2005) (4)
Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems (2009) (4)
Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis (2017) (4)
CAST: Contraction Algorithm for Symmetric Tensors (2014) (4)
IOOpt: automatic derivation of I/O complexity bounds for affine programs (2021) (4)
Parallel Latent Dirichlet Allocation on GPUs (2018) (4)
Opportune Job Shredding: An Effective Approach for Scheduling Parameter Sweep Appli (2003) (4)
The Promises of Hybrid Hexagonal/Classical Tiling for GPU (2013) (4)
Memory-adaptive parallel sparse Cholesky factorization (1994) (4)
Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples (2020) (4)
Data and Computation Abstractions for Dynamic and Irregular Computations (2005) (4)
Compile-Time Characterization of Recurrent Patterns in Irregular Computations (1993) (4)
Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs (2017) (4)
Layout transformation support for the disk resident arrays framework (2006) (4)
A Roofline-Based Performance Estimator for Distributed Matrix-Multiply on Intel CnC (2015) (3)
An Efficient Distributed Shared Memory Toolbox for MATLAB (3)
Domain Specific Language Support for Exascale (2017) (3)
Global‐view coefficients: a data management solution for parallel quantum Monte Carlo applications (2016) (3)
International Conference on Computational Science, ICCS 2012 (2012) (3)
A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures (2013) (3)
QoS in parallel job scheduling (2008) (3)
Electronic Structure Methods: The Tensor Contraction Engine (2015) (3)
Balancing Web server load for adaptable video distribution (2000) (3)
Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings (2019) (3)
Characterizing Computation-Communication Overlap in Message-Passing Systems (2008) (3)
Use of PVFS for efficient execution of jobs with pipeline-shared I/O (2004) (3)
A Clustering Algorithm for Parallel Sparse Cholesky Factorization (1995) (3)
Architecting and Programming a Hardware-Incoherent Multiprocessor Cache Hierarchy (2016) (3)
Non-collective parallel I/O for global address space programming models (2007) (3)
One-to-one mapping of process graphs onto a hypercube (1989) (3)
Empirical Performance-Model Driven Data Layout Optimization (2004) (3)
A high productivity framework for parallel data intensive computing in matlab (2009) (2)
Data access optimizations for parallel computers (1998) (2)
Communication-efficient implementation of block recursive algorithms on distributed-memory machines (1994) (2)
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition (2022) (2)
Techniques for Providing Hard Quality‐of‐Service Guarantees in Job Scheduling (2009) (2)
A reordering and mapping algorithm for parallel sparse Cholesky factorization (1994) (2)
On the automatic generation of data distributions (1993) (2)
A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs (2015) (2)
Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing (2014) (2)
Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations (2001) (2)
Efficient run-time support for global view programming of linked data structures on distributed memory parallel systems (2010) (2)
Integrated Data and Task Management for Scientific Applications (2008) (2)
Final Project Report. Scalable fault tolerance runtime technology for petascale computers (2015) (2)
Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions (2005) (2)
A global address space framework for locality aware scheduling of block-sparse computations (2007) (1)
Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution (2022) (1)
Efficient Distributed Algorithms for Convolutional Neural Networks (2021) (1)
Compile-Time Optimizations for Tensor Contraction Expressions (2003) (1)
Hardware/Software Codesign for All-Pairs Shortest-Paths on a Reconfigurable Supercomputer (2006) (1)
Final Project Report: A Polyhedral Transformation Framework for Compiler Optimization (2015) (1)
Data management and query - Hypergraph partitioning for automatic memory hierarchy management (2006) (1)
An Evaluation of Preemption Strategies for Parallel Job Scheduling (1)
A data-locality aware mapping and scheduling framework for data-intensive computing (2008) (1)
Comparative analysis of approaches to hardware acceleration for sparse-matrix factorization (1988) (1)
A clustered reduced communication element by element preconditioned conjugate gradient algorithm for finite element computations (1994) (1)
Performance modeling for GPUs using abstract kernel emulation (2018) (1)
Integrated compiler optimizations for tensor contractions (2008) (1)
Data layout optimization techniques for modern and emerging architectures (2009) (1)
Efficient Cache Simulation for Affine Computations (2017) (1)
A Parallel Progressive Refinement Image Rendering Algorithm on a Scalable Multithreaded VLSI Processor Array (1993) (1)
Efficient Layout Transformation for Disk-Based Multidimensional Arrays (2004) (1)
Fast Collective Communication Algorithms for Reflective Memory Network Clusters (2000) (1)
An Efficient Distributed Shared Memory Toolbox for MATLAB (2007) (1)
Checksumming Strategies for Data in Volatile Memories (2014) (0)
A Tiling Perspective for Register Optimization (2014) (0)
WOSC 2014: second workshop on optimizing stencil computations (2014) (0)
An Integrated Approach to Task Scheduling and File Replication (2005) (0)
Final Report for Project DE-FC02-06ER25755 [Pmodels2] (2014) (0)
POHLL: Workshop on performance optimization for high-level languages and libraries (2008) (0)
Guest Editors’ Introduction (2016) (0)
Whole-Program Adaptive Error Detection and Mitigation (2020) (0)
An Asymptotically Optimal Minimum Degree Ordering of Regular Grids (1995) (0)
A Special Issue of Journal of Parallel and Distributed Computing: Domain-Specific Languages and High-Level Frameworks for High-Performance Computing (2013) (0)
Session details: Parallel applications (2007) (0)
Automatic code generation for stencil computations on gpu architectures (2012) (0)
iWAPT Invited Talks (2015) (0)
Scalable I/O Forwarding Framework for Petascale Architectures (2009) (0)
Introduction (1996) (0)
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012 (2012) (0)
Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing (2005) (0)
Efficient convolution optimisation by composing micro-kernels (2022) (0)
Languages and Compilers for Parallel Computing: 18th International Workshop, LCPC 2005, Hawthorne, NY, USA, October 20-22, 2005, Revised Selected Papers (Lecture Notes in Computer Science) (2007) (0)
Memory Optimizations in an Array Language (2022) (0)
POSTER: Statement Reordering to Alleviate Register Pressure for Stencils on GPUs (2017) (0)
Design and Implementation of an Efficient Sorting Algorithm on Vector Multiprocessors (2007) (0)
Autotuning Convolutions is Easier Than You Think (2022) (0)
Profile-Guided Object-Level Cache Partitioning (2008) (0)
Loop Transformations for Parallel Execution of a Class of Nested Loops on Shared-Memory Multiprocessors (2007) (0)
Tiling for Optimal Resource Utilization (2008) (0)
BOA: A partitioned view of genome assembly (2022) (0)
Introduction to the Special Issue on PPoPP'12 (2015) (0)
Poster: FOX: a fault-oblivious extreme scale execution environment (2011) (0)
Codesign for All-Pairs Shortest-Paths on a Reconfigurable Supercomputer (2006) (0)
Augmenting the Roofline Model via Lower Bounds on Data Movement (2014) (0)
Characterization of bandwidth requirements of algorithms for extreme scale science (2016) (0)
Are Nonblocking Networks Really Needed for (2008) (0)
Proceedings of the Second Workshop on Optimizing Stencil Computations (2014) (0)
GADBMS: A Framework for Scalable Array Analytics (2012) (0)
Parallel LDA with Over-Decomposition (2017) (0)

What Schools Are Affiliated With Pr P. Sadayappan?

Pr P. Sadayappan is affiliated with the following schools:

Ohio State University
