Pr P. Sadayappan
#152,484
Most Influential Person Now
Pr P. Sadayappan's AcademicInfluence.com Rankings
Pr P. Sadayappan Computer Science Degrees
- Computer Science: World Rank #8122, Historical Rank #8543
- Parallel Computing: World Rank #42, Historical Rank #44
- Algorithms: World Rank #319, Historical Rank #323
- Database: World Rank #5155, Historical Rank #5354

Why Is Pr P. Sadayappan Influential?
Pr P. Sadayappan's Published Works
Number of citations in a given year to any of this author's works
Total number of citations to an author for the works they published in a given year. This highlights publication of the most important work(s) by the author
Published Works
- A practical automatic polyhedral parallelizer and locality optimizer (2008) (914)
- Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems (2008) (395)
- Scalable work stealing (2009) (284)
- High-performance code generation for stencil computations on GPU architectures (2012) (252)
- Automatic C-to-CUDA Code Generation for Affine Programs (2010) (244)
- Effective automatic parallelization of stencil computations (2007) (238)
- Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model (2008) (237)
- On improving the performance of sparse matrix-vector multiplication (1997) (230)
- A compiler framework for optimization of affine loop nests for gpgpus (2008) (227)
- Languages and Compilers for Parallel Computing (1992) (216)
- Compile-Time Techniques for Data Distribution in Distributed Memory Machines (1991) (199)
- Distributed job scheduling on computational Grids using multiple simultaneous requests (2002) (191)
- Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models (2005) (186)
- Characterization of backfilling strategies for parallel job scheduling (2002) (183)
- UTS: An Unbalanced Tree Search Benchmark (2006) (178)
- Scalable I/O forwarding framework for high-performance computing systems (2009) (171)
- Polyhedral-based data reuse optimization for configurable computing (2013) (161)
- Task allocation onto a hypercube by recursive mincut bipartitioning (1990) (153)
- Annotation-based empirical performance tuning using Orio (2009) (151)
- On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines (1993) (137)
- Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining (2011) (136)
- Nearest-Neighbor Mapping of Finite Element Graphs onto Processor Meshes (1987) (133)
- Loop transformations: convexity, pruning and optimization (2011) (132)
- Selective Reservation Strategies for Backfill Job Scheduling (2002) (131)
- Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications (2014) (129)
- Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories (2008) (129)
- When polyhedral transformations meet SIMD code generation (2013) (125)
- A stencil compiler for short-vector SIMD architectures (2013) (123)
- PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System (2015) (120)
- Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures (2011) (116)
- Automatic code generation for many-body electronic structure methods: the tensor contraction engine (2006) (115)
- Cluster partitioning approaches to mapping parallel programs onto a hypercube (1987) (112)
- Scheduling of Parallel Jobs in a Heterogeneous Multi-site Environment (2003) (111)
- Hybrid Hexagonal/Classical Tiling for GPUs (2014) (111)
- Communication-Free Hyperplane Partitioning of Nested Loops (1991) (106)
- Automatic Selection of Sparse Matrix Representation on GPUs (2015) (104)
- Predictive Modeling in a Polyhedral Optimization Space (2011) (101)
- Parametric multi-level tiling of imperfectly nested loops (2009) (97)
- A reliable multicast algorithm for mobile ad hoc networks (2002) (92)
- Tiling Multidimensional Iteration Spaces for Multicomputers (1992) (87)
- The rectilinear steiner arborescence problem (1992) (86)
- Iterative Algorithms for Solution of Large Sparse Systems of Linear Equations on Hypercubes (1988) (84)
- Combined iterative and model-driven optimization in an automatic parallelization framework (2010) (84)
- Adaptive sparse tiling for sparse matrix multiplication (2019) (83)
- Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors (2009) (80)
- Tiling multidimensional iteration spaces for nonshared memory machines (1991) (79)
- Split tiling for GPUs: automatic parallelization using trapezoidal tiles (2013) (79)
- Dynamic Load Balancing of Unbalanced Computations Using Message Passing (2007) (79)
- PARDA: A Fast Parallel Reuse Distance Analysis Algorithm (2012) (77)
- On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution (1997) (76)
- Scioto: A Framework for Global-View Task Parallelism (2008) (76)
- Parameterized tiling revisited (2010) (73)
- Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors (2009) (71)
- Parallel FPGA-based all-pairs shortest-paths in a directed graph (2006) (70)
- A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction (1993) (70)
- Effective automatic parallelization and locality optimization using the polyhedral model (2008) (66)
- Optimization by neural networks (1988) (65)
- A methodology for parallelizing programs for multicomputers and complex memory multiprocessors (1989) (65)
- Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations (1999) (65)
- A framework for enhancing data reuse via associative reordering (2014) (63)
- Dynamic trace-based analysis of vectorization potential of applications (2012) (62)
- Tiling of Iteration Spaces for Multicomputers (1990) (62)
- DynTile: Parametric tiled loop generation for parallel execution on multicore processors (2010) (62)
- Effective Selection of Partition Sizes for Moldable Scheduling of Parallel Jobs (2002) (62)
- An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs (2014) (61)
- A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry (2002) (61)
- Analytical Bounds for Optimal Tile Size Selection (2012) (61)
- Optimal loop unrolling for GPGPU programs (2010) (61)
- Using machine learning to improve automatic vectorization (2012) (60)
- Moldable Parallel Job Scheduling Using Job Efficiency: An Iterative Approach (2006) (58)
- Communication efficient matrix multiplication on hypercubes (1994) (57)
- Hybrid parallel programming with MPI and unified parallel C (2010) (57)
- Space-time trade-off optimization for a class of electronic structure calculations (2002) (56)
- Towards a 'neural' architecture for abductive reasoning (1988) (55)
- Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning (2009) (55)
- Selective buddy allocation for scheduling parallel jobs on clusters (2002) (54)
- An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications (2009) (52)
- QoPS: A QoS Based Scheme for Parallel Job Scheduling (2003) (51)
- Efficient transposition algorithms for large matrices (1993) (51)
- Selective preemption strategies for parallel job scheduling (2002) (51)
- Multi-phase array redistribution: modeling and evaluation (1995) (51)
- Using overlays for efficient data transfer over shared wide-area networks (2008) (48)
- Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization (2001) (46)
- Characterizing and enhancing global memory data coalescing on GPUs (2015) (45)
- Efficient sparse-matrix multi-vector product on GPUs (2018) (45)
- Job fairness in non-preemptive job scheduling (2004) (45)
- An approach to communication-efficient data redistribution (1994) (43)
- Memory-optimal evaluation of expression trees involving large objects (1999) (43)
- Enabling software management for multicore caches with a lightweight hardware support (2009) (43)
- A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O (2005) (42)
- Loop optimization for a class of memory-constrained computations (2001) (42)
- A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs (2003) (42)
- Register optimizations for stencils on GPUs (2018) (40)
- Code generation for parallel execution of a class of irregular loops on distributed memory systems (2012) (40)
- Static and Dynamic Frequency Scaling on Multicore CPUs (2016) (39)
- Unfairness Metrics for Space-Sharing Parallel Job Schedulers (2005) (39)
- Circuit Simulation on Shared-Memory Multiprocessors (1988) (38)
- ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation (2018) (37)
- Integrating parallel file systems with object-based storage devices (2007) (37)
- High-performance sparse matrix-vector multiplication on GPUs for structured grid computations (2012) (36)
- Fast NIC-based barrier over Myrinet/GM (2001) (36)
- Analytical modeling of cache behavior for affine programs (2017) (36)
- Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages (2000) (35)
- Automatic mapping of nested loops to FPGAs (2007) (34)
- Mapping combinatorial optimization problems onto neural networks (1995) (33)
- Performance optimization of tensor contraction expressions for many-body methods in quantum chemistry (2009) (33)
- MultiGraph: Efficient Graph Processing on GPUs (2017) (32)
- Load-Balanced Sparse MTTKRP on GPUs (2019) (32)
- Complete exchange in 2D meshes (1994) (32)
- Evaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures (2003) (31)
- Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations (2005) (31)
- Resource conscious reuse-driven tiling for GPUs (2016) (31)
- Toward Optimizing Latency Under Throughput Constraints for Application Workflows on Clusters (2007) (31)
- The Relation Between Diamond Tiling and Hexagonal Tiling (2014) (31)
- A robust scheduling technology for moldable scheduling of parallel jobs (2003) (30)
- An Integrated Approach for Processor Allocation and Scheduling of Mixed-Parallel Applications (2006) (30)
- Global communication optimization for tensor contraction expressions under memory constraints (2003) (30)
- An integrated framework for performance-based optimization of scientific workflows (2009) (30)
- Characterization and enhancement of dynamic mapping heuristics for heterogeneous systems (2000) (30)
- Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations (2018) (29)
- A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP (2008) (29)
- Towards provision of quality of service guarantees in job scheduling (2004) (28)
- Distributed memory code generation for mixed Irregular/Regular computations (2015) (28)
- Performance optimization of a class of loops implementing multidimensional integrals (1999) (28)
- A Code Generator for High-Performance Tensor Contractions on GPUs (2019) (28)
- Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines (1997) (27)
- EXTENT: a portable programming environment for designing and implementing high-performance block recursive algorithms (1994) (27)
- Towards effective automatic parallelization for multicore systems (2008) (27)
- Combining analytical and empirical approaches in tuning matrix transposition (2006) (26)
- Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths (2006) (26)
- A Communication-Optimal Framework for Contracting Distributed Tensors (2014) (25)
- Analytical characterization and design space exploration for optimization of CNNs (2021) (25)
- Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs (2018) (25)
- Hypergraph Partitioning for Automatic Memory Hierarchy Management (2006) (25)
- Parallel Job Scheduling Policies to Improve Fairness: A Case Study (2006) (25)
- Implementing Fast Fourier Transforms on Distributed-Memory Multiprocessors Using Data Redistributions (1994) (25)
- On Characterizing the Data Access Complexity of Programs (2014) (24)
- Optimistic Delinearization of Parametrically Sized Arrays (2015) (24)
- Optimal Algorithms for All-to-All Personalized Communication on Rings and Two Dimensional Tori (1997) (24)
- Neural Network Assisted Tile Size Selection (2010) (24)
- A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms (1986) (24)
- Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs (2015) (24)
- SDSLc: a multi-target domain-specific compiler for stencil computations (2015) (24)
- An OSD-based approach to managing directory operations in parallel file systems (2008) (23)
- A Duplication Based Algorithm for Optimizing Latency Under Throughput Constraints for Streaming Workflows (2008) (23)
- Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms (2003) (23)
- Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel (2011) (22)
- Compile-time techniques for parallel execution of loops on distributed memory multiprocessors (1990) (22)
- Parametric Tiling of Affine Loop Nests (2010) (22)
- Efficient Sparse Matrix Factorization for Circuit Simulation on Vector Supercomputers (1989) (22)
- Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions (2012) (22)
- Dynamic selection of tile sizes (2011) (22)
- PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests (2009) (21)
- A framework for load balancing of Tensor Contraction expressions via dynamic task partitioning (2013) (21)
- Parallel CCD++ on GPU for Matrix Factorization (2017) (21)
- Applying MPI derived datatypes to the NAS benchmarks: A case study (2004) (21)
- Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals (1999) (20)
- Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences (2007) (20)
- Analytical cache modeling and tilesize optimization for tensor contractions (2019) (20)
- Implementation and performance of a binary lattice gas algorithm on parallel processor systems (1989) (20)
- On fusing recursive traversals of K-d trees (2016) (20)
- Efficient parallel out-of-core matrix transposition (2004) (20)
- PolyCheck: dynamic verification of iteration space transformations on affine programs (2016) (20)
- Effective resource management for enhancing performance of 2D and 3D stencils on GPUs (2016) (19)
- Optimizing latency and throughput of application workflows on clusters (2011) (19)
- Stratification driven placement of complex data: A framework for distributed data analytics (2013) (19)
- Assessment and enhancement of meta-schedulers for multi-site job sharing (2005) (19)
- Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines (1996) (18)
- Effective padding of multidimensional arrays to avoid cache conflict misses (2016) (18)
- Parameterized specification, configuration and execution of data-intensive scientific workflows (2010) (18)
- Hybrid Iterative and Model-Driven Optimization in the Polyhedral Model (2008) (18)
- An Approach to Communication-efficient Data Redistribution (1994) (18)
- Revisiting the metadata architecture of parallel file systems (2008) (17)
- All-to-all broadcast on switch-based clusters of workstations (1999) (17)
- UPC Implementation of an Unbalanced Tree Search Benchmark (2003) (17)
- Memory-Constrained Data Locality Optimization for Tensor Contractions (2003) (17)
- Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations (2006) (17)
- An algebraic theory for modeling direct interconnection networks (1992) (17)
- Modeling and Optimizing Large-Scale Wide-Area Data Transfers (2014) (17)
- An efficient mixed-mode representation of sparse tensors (2019) (17)
- On improving performance of sparse matrix-matrix multiplication on GPUs (2017) (17)
- Register allocation and promotion through combined instruction scheduling and loop unrolling (2016) (17)
- Performance benefits of NIC-based barrier on myrinet/GM (2001) (17)
- MOLAR: adaptive runtime support for high-end computing operating and runtime systems (2006) (16)
- Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs (2013) (16)
- Sampled Dense Matrix Multiplication for High-Performance Machine Learning (2018) (16)
- A Domain-Specific Compiler for a Parallel Multiresolution Adaptive Numerical Simulation Environment (2016) (16)
- Optimization of Memory Usage and Communication Requirements for a Class of Loops Implementing Multi-Dimensional Integrals (1999) (16)
- On Optimizing Complex Stencils on GPUs (2019) (16)
- VIBe: a micro-benchmark suite for evaluating virtual interface architecture (VIA) implementations (2001) (16)
- An elegant sufficiency: load-aware differentiated scheduling of data transfers (2015) (15)
- On characterizing the data movement complexity of computational DAGs for parallel execution (2014) (15)
- StVEC: A Vector Instruction Extension for High Performance Stencil Computation (2011) (15)
- Global Trees: A framework for linked data structures on distributed memory parallel systems (2008) (15)
- On fairness in distributed job scheduling across multiple sites (2004) (15)
- Practical abduction: characterization, decomposition and concurrency (1995) (15)
- Mapping Finite Element Graphs onto Processor Meshes (1987) (14)
- Supernodal Sparse Cholesky Factorization on Distributed-Memory Multiprocessors (1993) (14)
- Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O (2006) (14)
- A performance optimization framework for compilation of tensor contraction expressions into parallel (2002) (14)
- Data Access Complexity: The Red/Blue Pebble Game Revisited (2013) (14)
- Characterizing dataset dependence for sparse matrix-vector multiplication on GPUs (2015) (14)
- Automated derivation of parametric data movement lower bounds for affine programs (2019) (14)
- Are nonblocking networks really needed for high-end-computing workloads? (2008) (14)
- Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications (2006) (13)
- A Compiler Analysis to Determine Useful Cache Size for Energy Efficiency (2013) (13)
- Performance modeling and optimization of parallel out-of-core tensor contractions (2005) (13)
- A message passing benchmark for unbalanced applications (2008) (13)
- Memory-Constrained Communication Minimization for a Class of Array Computations (2002) (12)
- PIPES: A Language and Compiler for Task-Based Programming on Distributed-Memory Clusters (2016) (12)
- A framework for characterizing overlap of communication and computation in parallel applications (2008) (12)
- Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver (2004) (12)
- Compiling Array Statements for Efficient Execution on Distributed-Memory Machines: Two-Level Mappings (1995) (12)
- Scheduling File Transfers for Data-Intensive Jobs on Heterogeneous Clusters (2007) (12)
- On Using the Roofline Model with Lower Bounds on Data Movement (2015) (11)
- A fast implementation of MLR-MCL algorithm on multi-core processors (2014) (11)
- Beyond reuse distance analysis (2013) (11)
- Efficient search‐space pruning for integrated fusion and tiling transformations (2005) (11)
- Implementing TreadMarks over Virtual Interface Architecture on Myrinet and gigabit Ethernet: Challenges, design experience, and performance evaluation (2001) (11)
- Analyzing and Minimizing the Impact of Opportunity Cost in QoS-aware Job Scheduling (2007) (11)
- A methodology for generating data distributions to optimize communication (1992) (11)
- Low Latency Message-Passing for Reflective Memory Networks (1999) (11)
- Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions (2005) (11)
- Multi-Phase Redistribution: A Communication-Efficient Approach to Array Redistribution (1995) (10)
- A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing (2006) (10)
- Selective Recovery from Failures in a Task Parallel Programming Model (2010) (10)
- Understanding parallelism-inhibiting dependences in sequential Java programs (2010) (10)
- A technique for overlapping computation and communication for block recursive algorithms (1998) (10)
- Parametric GPU Code Generation for Affine Loop Programs (2013) (10)
- Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs (2018) (10)
- Efficient static scheduling of loops on synchronous multiprocessors (1989) (10)
- Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures (2020) (10)
- Associative Instruction Reordering to Alleviate Register Pressure (2018) (10)
- On data dependence analysis for compiling programs on distributed-memory machines (extended abstract) (1993) (10)
- Polyhedral Model (10)
- Multifrontal Factorization of Sparse Matrices on Shared-Memory Multiprocessors (1991) (9)
- Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers (2012) (9)
- GPU code optimization using abstract kernel emulation and sensitivity analysis (2018) (9)
- Memory minimization for tensor contractions using integer linear programming (2006) (9)
- Multi-hop path splitting and multi-pathing optimizations for data transfers over shared wide-area networks using gridFTP (2008) (9)
- On Recovering Multi-Dimensional Arrays in Polly (2015) (9)
- NIC-based rate control for proportional bandwidth allocation in Myrinet clusters (2001) (9)
- A global address space approach to automated data management for parallel Quantum Monte Carlo applications (2012) (9)
- Optimal Reordering and Mapping of a Class of Nested-Loops for Parallel Execution (1996) (9)
- PWCET: Power-Aware Worst Case Execution Time Analysis (2014) (8)
- Nested Loop Tiling for Distributed Memory Machines (1990) (8)
- An extensible global address space framework with decoupled task and data abstractions (2006) (8)
- On Efficient Out-of-core Matrix Transposition (2003) (8)
- Compiler-assisted detection of transient memory errors (2014) (8)
- Framework for Distributed Contractions of Tensors with Symmetry (2013) (8)
- Parallelization and performance evaluation of circuit simulation on a shared-memory multiprocessor (1988) (8)
- Access based data decomposition for distributed memory machines (1991) (8)
- Performance Optimization of a Class of Loops Involving Sums of Products of Sparse Arrays (1999) (8)
- Incremental Generation of Index Sets for Array Statement Execution on Distributed-Memory Machines (1994) (8)
- Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation (2001) (8)
- Automatic parallelization of a class of irregular loops for distributed memory systems (2014) (7)
- Scheduling of tasks with batch-shared I/O on heterogeneous systems (2006) (7)
- Communication reduction for distributed sparse matrix factorization on a processor mesh (1989) (7)
- Adaptive parallel tiled code generation and accelerated auto-tuning (2013) (7)
- TTLG - An Efficient Tensor Transposition Library for GPUs (2018) (7)
- Differentiated Scheduling of Response-Critical and Best-Effort Wide-Area Data Transfers (2016) (7)
- Partitioning Graphs on Message-Passing Machines by Pairwise Mincut (1998) (7)
- On the Synthesis of Parallel Programs from Tensor Product Formulas for Block Recursive Algorithms (1992) (7)
- Parallel Direct Solution of Sparse Linear Systems (1993) (7)
- Low-latency message passing on workstation clusters using SCRAMNet (1999) (7)
- A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas (1993) (7)
- PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization (2019) (6)
- A Performance Instrumentation Framework to Characterize Computation-Communication Overlap in Message-Passing Systems (2006) (6)
- Effective Utilization of Tensor Symmetry in Operation Optimization of Tensor Contraction Expressions (2012) (6)
- GAMMA: Global Arrays Meets MATLAB (2006) (6)
- Brief Announcement: Approximating the I/O Complexity of One-Shot Red-Blue Pebbling (2016) (6)
- An Algebraic Approach to Cache Memory Characterization for Block Recursive Algorithms (1994) (6)
- A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs (2015) (6)
- Application-Specific Fault Tolerance via Data Access Characterization (2011) (6)
- Compiler Support for Software Cache Coherence (2016) (6)
- ALO-NMF: Accelerated Locality-Optimized Non-negative Matrix Factorization (2020) (6)
- On sparse matrix reordering for parallel factorization (1994) (6)
- Iteration space tiling for distributed memory machines (1992) (5)
- An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model (2006) (5)
- Work stealing for GPU‐accelerated parallel programs in a global address space framework (2016) (5)
- REVENUE MAXIMIZATION IN MARKET-BASED PARALLEL JOB SCHEDULERS (5)
- On mapping data and computation for parallel sparse Cholesky factorization (1995) (5)
- Robust scheduling of moldable parallel jobs (2004) (5)
- Global graphs: A middleware for large scale graph processing (2014) (5)
- Fault oblivious eXascale whitepaper (2011) (5)
- Efficient multicast algorithms for switch-based irregular heterogeneous networks of workstations (2001) (5)
- Cache miss characterization and data locality optimization for imperfectly nested loops on shared memory multiprocessors (2005) (4)
- Compile-time and run-time optimizations for enhancing locality and parallelism on multi-core and many-core systems (2009) (4)
- Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis (2017) (4)
- CAST: Contraction Algorithm for Symmetric Tensors (2014) (4)
- IOOpt: automatic derivation of I/O complexity bounds for affine programs (2021) (4)
- Parallel Latent Dirichlet Allocation on GPUs (2018) (4)
- Opportune Job Shredding: An Effective Approach for Scheduling Parameter Sweep Appli (2003) (4)
- The Promises of Hybrid Hexagonal/Classical Tiling for GPU (2013) (4)
- Memory-adaptive parallel sparse Cholesky factorization (1994) (4)
- Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples (2020) (4)
- Data and Computation Abstractions for Dynamic and Irregular Computations (2005) (4)
- Compile-Time Characterization of Recurrent Patterns in Irregular Computations (1993) (4)
- Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs (2017) (4)
- Layout transformation support for the disk resident arrays framework (2006) (4)
- A Roofline-Based Performance Estimator for Distributed Matrix-Multiply on Intel CnC (2015) (3)
- An Efficient Distributed Shared Memory Toolbox for MATLAB (3)
- Domain Specific Language Support for Exascale (2017) (3)
- Global‐view coefficients: a data management solution for parallel quantum Monte Carlo applications (2016) (3)
- International Conference on Computational Science, ICCS 2012 (2012) (3)
- A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures (2013) (3)
- QoS in parallel job scheduling (2008) (3)
- Electronic Structure Methods: The Tensor Contraction Engine (2015) (3)
- Balancing Web server load for adaptable video distribution (2000) (3)
- Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings (2019) (3)
- Characterizing Computation-Communication Overlap in Message-Passing Systems (2008) (3)
- Use of PVFS for efficient execution of jobs with pipeline-shared I/O (2004) (3)
- A Clustering Algorithm for Parallel Sparse Cholesky Factorization (1995) (3)
- Architecting and Programming a Hardware-Incoherent Multiprocessor Cache Hierarchy (2016) (3)
- Non-collective parallel I/O for global address space programming models (2007) (3)
- One-to-one mapping of process graphs onto a hypercube (1989) (3)
- Empirical Performance-Model Driven Data Layout Optimization (2004) (3)
- A high productivity framework for parallel data intensive computing in matlab (2009) (2)
- Data access optimizations for parallel computers (1998) (2)
- Communication-efficient implementation of block recursive algorithms on distributed-memory machines (1994) (2)
- TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition (2022) (2)
- Techniques for Providing Hard Quality‐of‐Service Guarantees in Job Scheduling (2009) (2)
- A reordering and mapping algorithm for parallel sparse Cholesky factorization (1994) (2)
- On the automatic generation of data distributions (1993) (2)
- A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs (2015) (2)
- Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing (2014) (2)
- Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations (2001) (2)
- Efficient run-time support for global view programming of linked data structures on distributed memory parallel systems (2010) (2)
- Integrated Data and Task Management for Scientific Applications (2008) (2)
- Final Project Report. Scalable fault tolerance runtime technology for petascale computers (2015) (2)
- Compiler Techniques for Efficient Parallelization of Out-of-Core Tensor Contractions (2005) (2)
- A global address space framework for locality aware scheduling of block-sparse computations (2007) (1)
- Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution (2022) (1)
- Efficient Distributed Algorithms for Convolutional Neural Networks (2021) (1)
- Compile-Time Optimizations for Tensor Contraction Expressions (2003) (1)
- Hardware/Software Codesign for All-Pairs Shortest-Paths on a Reconfigurable Supercomputer (2006) (1)
- Final Project Report: A Polyhedral Transformation Framework for Compiler Optimization (2015) (1)
- Data management and query - Hypergraph partitioning for automatic memory hierarchy management (2006) (1)
- An Evaluation of Preemption Strategies for Parallel Job Scheduling (1)
- A data-locality aware mapping and scheduling framework for data-intensive computing (2008) (1)
- Comparative analysis of approaches to hardware acceleration for sparse-matrix factorization (1988) (1)
- A clustered reduced communication element by element preconditioned conjugate gradient algorithm for finite element computations (1994) (1)
- Performance modeling for GPUs using abstract kernel emulation (2018) (1)
- Integrated compiler optimizations for tensor contractions (2008) (1)
- Data layout optimization techniques for modern and emerging architectures (2009) (1)
- Efficient Cache Simulation for Affine Computations (2017) (1)
- A Parallel Progressive Refinement Image Rendering Algorithm on a Scalable Multithreaded VLSI Processor Array (1993) (1)
- Efficient Layout Transformation for Disk-Based Multidimensional Arrays (2004) (1)
- Fast Collective Communication Algorithms for Reflective Memory Network Clusters (2000) (1)
- An Efficient Distributed Shared Memory Toolbox for MATLAB (2007) (1)
- Checksumming Strategies for Data in Volatile Memories (2014) (0)
- A Tiling Perspective for Register Optimization (2014) (0)
- WOSC 2014: second workshop on optimizing stencil computations (2014) (0)
- An Integrated Approach to Task Scheduling and File Replication (2005) (0)
- Final Report for Project DE-FC02-06ER25755 [Pmodels2] (2014) (0)
- POHLL: Workshop on performance optimization for high-level languages and libraries (2008) (0)
- Guest Editors’ Introduction (2016) (0)
- Whole-Program Adaptive Error Detection and Mitigation (2020) (0)
- An Asymptotically Optimal Minimum Degree Ordering of Regular Grids (1995) (0)
- A Special Issue of Journal of Parallel and Distributed Computing: Domain-Specific Languages and High-Level Frameworks for High-Performance Computing (2013) (0)
- Session details: Parallel applications (2007) (0)
- Automatic code generation for stencil computations on gpu architectures (2012) (0)
- iWAPT Invited Talks (2015) (0)
- Scalable I/O Forwarding Framework for Petascale Architectures (2009) (0)
- Introduction (1996) (0)
- Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, February 25-29, 2012 (2012) (0)
- Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing (2005) (0)
- Efficient convolution optimisation by composing micro-kernels (2022) (0)
- Languages and Compilers for Parallel Computing: 18th International Workshop, LCPC 2005, Hawthorne, NY, USA, October 20-22, 2005, Revised Selected Papers (Lecture Notes in Computer Science) (2007) (0)
- Memory Optimizations in an Array Language (2022) (0)
- POSTER: Statement Reordering to Alleviate Register Pressure for Stencils on GPUs (2017) (0)
- Design and Implementation of an Efficient Sorting Algorithm on Vector Multiprocessors (2007) (0)
- Autotuning Convolutions is Easier Than You Think (2022) (0)
- Profile-Guided Object-Level Cache Partitioning (2008) (0)
- Loop Transformations for Parallel Execution of a Class of Nested Loops on Shared-Memory Multiprocessors (2007) (0)
- Tiling for Optimal Resource Utilization (2008) (0)
- BOA: A partitioned view of genome assembly (2022) (0)
- Introduction to the Special Issue on PPoPP'12 (2015) (0)
- Poster: FOX: a fault-oblivious extreme scale execution environment (2011) (0)
- Codesign for All-Pairs Shortest-Paths on a Reconfigurable Supercomputer (2006) (0)
- Augmenting the Roofline Model via Lower Bounds on Data Movement (2014) (0)
- Characterization of bandwidth requirements of algorithms for extreme scale science (2016) (0)
- Are Nonblocking Networks Really Needed for (2008) (0)
- Proceedings of the Second Workshop on Optimizing Stencil Computations (2014) (0)
- GADBMS: A Framework for Scalable Array Analytics (2012) (0)
- Parallel LDA with Over-Decomposition (2017) (0)
What Schools Are Affiliated With Pr P. Sadayappan?
Pr P. Sadayappan is affiliated with the following schools: