Dhabaleswar K. D. K. Panda

Dhabaleswar K. D. K. Panda's AcademicInfluence.com Rankings

Computer Science: World Rank #5746, Historical Rank #6065
Parallel Computing: World Rank #29, Historical Rank #29
Database: World Rank #2877, Historical Rank #3002

Why Is Dhabaleswar K. D. K. Panda Influential?

Dhabaleswar K. D. K. Panda's Published Works

Chart: number of citations in a given year to any of this author's works.

Chart: total number of citations to the author's works published in a given year, which highlights when the author's most important work(s) appeared.

Published Works

High Performance RDMA-Based MPI Implementation over InfiniBand (2003) (433)
High Performance VMM-Bypass I/O in Virtual Machines (2006) (319)
A case for high performance computing with virtual machines (2006) (317)
EMP: Zero-Copy OS-Bypass NIC-Driven Gigabit Ethernet Message Passing (2001) (204)
Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System (2007) (186)
Memcached Design on High Performance RDMA Capable Interconnects (2011) (184)
High performance RDMA-based design of HDFS over InfiniBand (2012) (179)
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics (2003) (174)
High performance virtual machine migration with RDMA over modern interconnects (2007) (145)
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters (2011) (144)
Scalable Earthquake Simulation on Petascale Supercomputers (2010) (140)
Beyond block I/O: Rethinking traditional storage primitives (2011) (134)
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (2017) (130)
Efficient collective communication on heterogeneous networks of workstations (1998) (129)
Design and implementation of MPICH2 over InfiniBand with RDMA support (2003) (121)
Virtual machine aware communication libraries for high performance computing (2007) (119)
High-Performance Design of Hadoop RPC with RDMA over InfiniBand (2013) (118)
High Performance Remote Memory Access Communication: The ARMCI Approach (2006) (118)
Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs (2013) (112)
Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device (2005) (112)
Fast collective operations using shared and remote memory access protocols on clusters (2003) (110)
Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths (1999) (105)
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits (2006) (104)
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme (1994) (100)
Performance characterization of a 10-Gigabit Ethernet TOE (2005) (97)
MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics (2008) (96)
LiMIC: support for high-performance MPI intra-node communication on Linux cluster (2005) (96)
High-Performance Design of HBase with RDMA over InfiniBand (2012) (95)
Microbenchmark performance comparison of high-speed cluster interconnects (2004) (95)
PVFS over InfiniBand: design and performance evaluation (2003) (95)
A reliable multicast algorithm for mobile ad hoc networks (2002) (92)
High performance MPI-2 one-sided communication over InfiniBand (2004) (91)
Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters (2006) (86)
A trip-based multicasting model for wormhole-routed networks with virtual channels (1993) (84)
Accelerating Spark with RDMA for Big Data Processing: Early Experiences (2014) (84)
Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support (2004) (84)
Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather (2010) (83)
Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture (2015) (79)
Design of High Performance MVAPICH2: MPI2 over InfiniBand (2006) (79)
CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems (2009) (79)
Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand (2006) (78)
Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication (2012) (73)
GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation (2014) (73)
Sockets Direct Protocol over InfiniBand in clusters: is it beneficial? (2004) (72)
High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand (2013) (72)
High Performance Design for HDFS with Byte-Addressability of NVM and RDMA (2016) (70)
Multicast on irregular switch-based networks with wormhole routing (1997) (68)
Towards NIC-based intrusion detection (2003) (67)
Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation (2004) (65)
Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations (1999) (65)
SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience (2013) (64)
Implementing Multidestination Worms In Switch-based Parallel Systems: Architectural Alternatives And Their Impact (1997) (64)
Frontera: The Evolution of Leadership Computing at the National Science Foundation (2020) (63)
High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth performance Analysis (2006) (61)
High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters (2007) (59)
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements (2007) (59)
HIPIQS: a high-performance switch architecture using input queuing (1998) (59)
Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports (2012) (58)
Shared receive queue based scalable MPI design for InfiniBand clusters (2006) (58)
High performance user level sockets over Gigabit Ethernet (2002) (57)
A 1 PB/s file system to checkpoint three million MPI tasks (2013) (57)
Unifying UPC and MPI runtimes: experience with MVAPICH (2010) (56)
Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand (2008) (55)
Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2 (2011) (54)
High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT (2011) (54)
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes (2012) (54)
High performance implementation of MPI derived datatype communication over InfiniBand (2004) (53)
Efficient broadcast and multicast on multistage interconnection networks using multiport encoding (1996) (53)
Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems (2012) (53)
Nomad: migrating OS-bypass networks in virtual machines (2007) (52)
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck (2004) (52)
Scalable NIC-based Reduction on Large-scale Clusters (2003) (52)
Can High-Performance Interconnects Benefit Hadoop Distributed File System? (2010) (51)
Selective preemption strategies for parallel job scheduling (2002) (51)
QoPS: A QoS Based Scheme for Parallel Job Scheduling (2003) (51)
MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand (2008) (50)
Designing multi-leader-based Allgather algorithms for multi-core clusters (2009) (49)
Host-assisted zero-copy remote memory access communication on InfiniBand (2004) (49)
High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters (2005) (49)
Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms (1995) (49)
Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand (2006) (48)
Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms (2007) (48)
High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA (2015) (47)
Scaling alltoall collective on multi-core systems (2008) (46)
Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms (2004) (45)
High-performance design of apache spark with RDMA and its benefits on various workloads (2016) (45)
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures (2017) (45)
HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects (2014) (44)
Stampede 2: The Evolution of an XSEDE Supercomputer (2017) (44)
Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems (2007) (44)
Adaptive connection management for scalable MPI over InfiniBand (2006) (43)
OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters (2012) (43)
Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters (2011) (41)
Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective (2007) (41)
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters (2010) (41)
Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach (2007) (40)
Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach (1997) (39)
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters (2013) (39)
Fast and Scalable Barrier Using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters (2003) (39)
Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters (2004) (39)
High performance and reliable NIC-based multicast over Myrinet/GM-2 (2003) (38)
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems (2001) (38)
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (2016) (37)
How much does network contention affect distributed shared memory performance? (1997) (37)
Reducing cache invalidation overheads in wormhole routed DSMs using multidestination message passing (1996) (37)
RDMA over Ethernet — A preliminary study (2009) (37)
InfiniBand Architecture (2001) (37)
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? (2017) (37)
Enhancing Checkpoint Performance with Staging IO and SSD (2010) (37)
Data intensive computing (2006) (37)
High performance support of parallel virtual file system (PVFS2) over Quadrics (2005) (37)
Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models (2013) (36)
Scalable MPI design over InfiniBand using eXtended Reliable Connection (2008) (36)
Fast NIC-based barrier over Myrinet/GM (2001) (36)
Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast (2006) (36)
Protocols and strategies for optimizing performance of remote memory operations on clusters (2002) (36)
Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages (2000) (35)
Efficient Intra-node Communication on Intel-MIC Clusters (2013) (35)
High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand (2010) (34)
Multicasting in Irregular Networks with Cut-Through Switches Using Tree-Based Multidestination Worms (1997) (34)
Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems (2008) (33)
Complete exchange in 2D meshes (1994) (32)
Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI (2012) (32)
Benefits of high speed interconnects to cluster file systems: a case study with Lustre (2006) (32)
Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand (2007) (32)
Hybrid algorithms for complete exchange in 2D meshes (2001) (32)
Performance Characterization of Hypervisor-and Container-Based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters (2016) (32)
Design and evaluation of benchmarks for financial applications using Advanced Message Queuing Protocol (AMQP) over InfiniBand (2008) (32)
High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations (2007) (31)
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation (2018) (31)
A case for application-oblivious energy-efficient MPI runtime (2015) (31)
Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation (2012) (31)
Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements (1996) (31)
Efficient asynchronous memory copy operations on multi-core systems and I/OAT (2007) (31)
Zero-Copy MPI Derived Datatype Communication over InfiniBand (2004) (31)
MPI over InfiniBand: Early Experiences (2003) (31)
Efficient collective operations using remote memory operations on VIA-based clusters (2003) (31)
Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters (2000) (31)
Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters (2014) (30)
Extending OpenSHMEM for GPU Computing (2013) (30)
Characterization and enhancement of dynamic mapping heuristics for heterogeneous systems (2000) (30)
SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS (2014) (30)
Efficient and scalable barrier over Quadrics and Myrinet with a new NIC-based collective message passing protocol (2004) (29)
Optimal multicast with packetization and network interface support (1997) (29)
A reliable hardware barrier synchronization scheme (1997) (29)
Towards provision of quality of service guarantees in job scheduling (2004) (28)
Can user-level protocols take advantage of multi-CPU NICs? (2002) (28)
Supporting efficient noncontiguous access in PVFS over Infiniband (2003) (28)
RDMA-Based Job Migration Framework for MPI over InfiniBand (2010) (28)
Evaluating InfiniBand performance with PCI Express (2005) (28)
Minimizing node contention in multiple multicast on wormhole k-ary n-cube networks (1996) (27)
Performance evaluation of InfiniBand with PCI Express (2004) (27)
Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine (2010) (27)
High performance MPI library over SR-IOV enabled infiniband clusters (2014) (27)
Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application (2010) (26)
Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters (2015) (26)
Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers (2011) (26)
Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks (2005) (26)
High-radix symbolic substitution and superposition techniques for optical matrix algebraic computations (1992) (26)
Scalable Reduction Collectives with Data Partitioning-based Multi-Leader Design (2017) (26)
Micro-benchmark level performance comparison of high-speed cluster interconnects (2003) (26)
Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand (2006) (26)
System-Level Scalable Checkpoint-Restart for Petascale Computing (2016) (26)
Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines (2005) (25)
Natively Supporting True One-Sided Communication in MPI on Multi-core Systems with InfiniBand (2009) (25)
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters (2013) (25)
High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters (2016) (25)
SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks (2012) (25)
Impact of adaptivity on the behavior of networks of workstations under bursty traffic (1998) (25)
Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store (2015) (25)
Implementing efficient and scalable flow control schemes in MPI over InfiniBand (2004) (25)
Performance modeling of subnet management on fat tree InfiniBand networks using OpenSM (2005) (24)
Congestion avoidance on manycore high performance computing systems (2012) (24)
Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? (2013) (24)
Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits (2005) (24)
OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses (1990) (24)
NIC-Based Reduction in Myrinet Clusters: Is It Beneficial? (2003) (23)
Impact of high performance sockets on data intensive applications (2003) (23)
Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand (2009) (23)
Efficient barrier using remote memory operations on VIA-based clusters (2002) (23)
Automatic Path Migration over InfiniBand: Early Experiences (2007) (23)
Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized Infiniband Clusters? (2014) (23)
Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters (2015) (23)
Impact of on-demand connection management in MPI over VIA (2002) (23)
NIC-based offload of dynamic user-defined modules for Myrinet clusters (2004) (23)
Fast and Scalable Startup of MPI Programs in InfiniBand Clusters (2004) (23)
Designing NFS with RDMA for Security, Performance and Scalability (2007) (22)
A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters (2017) (22)
MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU (2017) (22)
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training (2018) (22)
EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications (2018) (22)
Where to provide support for efficient multicasting in irregular networks: network interface or switch? (1998) (22)
MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds (2015) (22)
Simulation Of Modern Parallel Systems: A CSIM-based Approach (1997) (22)
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture (2009) (22)
Efficient and scalable all-to-all personalized exchange for InfiniBand-based clusters (2004) (22)
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences (2014) (22)
High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits (2016) (22)
ScELA: scalable and extensible launching architecture for clusters (2008) (22)
Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT (2007) (22)
Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds? (2017) (21)
Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication (2008) (21)
Applying MPI derived datatypes to the NAS benchmarks: A case study (2004) (21)
Impact of multiple consumption channels on wormhole routed k-ary n-cube networks (1993) (21)
HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters (2014) (21)
Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support (2007) (21)
Global reduction in wormhole k-ary n-cube networks with multidestination exchange worms (1995) (21)
MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit (2011) (21)
Adaptive and Dynamic Design for MPI Tag Matching (2016) (21)
Issues in Designing Efficient and Practical Algorithms for Collective Communication on Wormhole-Rout (1995) (21)
Benefits of I/O Acceleration Technology (I/OAT) in Clusters (2007) (21)
Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA) (2000) (20)
Unifier: unifying cache management and communication buffer management for PVFS over InfiniBand (2004) (20)
Efficient Barrier and Allreduce on IBA clusters using hardware multicast and adaptive algorithms (2004) (20)
Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems (2010) (20)
High Performance Broadcast Support in LA-MPI Over Quadrics (2005) (20)
Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks (1999) (20)
A Software Based Approach for Providing Network Fault Tolerance in Clusters with uDAPL interface: MPI Level Design and Performance Evaluation (2006) (19)
Scheduling of MPI-2 one sided operations over InfiniBand (2005) (19)
IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand (2008) (19)
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems (2020) (19)
The MVAPICH project: Transforming research into high-performance MPI library for HPC community (2020) (19)
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart (2011) (19)
Application-bypass broadcast in MPICH over GM (2003) (19)
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems (2009) (19)
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (2017) (18)
Zero-copy protocol for MPI using infiniband unreliable datagram (2007) (18)
Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL (2011) (18)
UPC on MIC: Early Experiences with Native and Symmetric Modes (2013) (18)
High Performance Pipelined Process Migration with RDMA (2011) (18)
Performance of HPC Middleware over InfiniBand WAN (2008) (18)
A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters (2012) (18)
A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters (2013) (18)
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters (2019) (18)
Designing passive synchronization for MPI-2 one-sided communication to maximize overlap (2008) (18)
Can memory-less network adapters benefit next-generation infiniband systems? (2005) (18)
Accelerating TensorFlow with Adaptive RDMA-Based gRPC (2018) (18)
MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters (2017) (18)
Special Issue on Workstation Clusters and Network-Based Computing: Guest Editors' Introduction (1997) (17)
High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads (2017) (17)
Supporting iWARP Compatibility and Features for Regular Network Adapters (2005) (17)
MVAPICH2-MIC: A High Performance MPI Library for Xeon Phi Clusters with InfiniBand (2013) (17)
Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems (1998) (17)
Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows (2011) (17)
Implementing efficient MPI on LAPI for IBM RS/6000 SP systems: Experiences and performance evaluation (1999) (17)
Efficient one-copy MPI shared memory communication in Virtual Machines (2008) (17)
All-to-all broadcast on switch-based clusters of workstations (1999) (17)
Performance benefits of NIC-based barrier on myrinet/GM (2001) (17)
Bridging the Ethernet-Ethernot Performance Gap (2006) (17)
Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing (2001) (17)
Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand (2004) (17)
MapReduce over Lustre: Can RDMA-Based Approach Benefit? (2014) (16)
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand (2004) (16)
Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks (2003) (16)
High Performance MPI over iWARP: Early Experiences (2007) (16)
Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem (2016) (16)
Designing high performance DSM systems using InfiniBand features (2004) (16)
Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand (2017) (16)
Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage (2016) (16)
pNFS/PVFS2 over InfiniBand: early experiences (2007) (16)
High performance and scalable MPI intra-node communication middleware for multi-core clusters (2009) (16)
High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters (2017) (16)
VIBe: a micro-benchmark suite for evaluating virtual interface architecture (VIA) implementations (2001) (16)
How Can We Design Better Networks for DSM Systems? (1997) (16)
Reducing network contention with mixed workloads on modern multicore clusters (2009) (16)
In-memory I/O and replication for HDFS with Memcached: Early experiences (2014) (15)
Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL (2013) (15)
An efficient scheme for complete exchange in 2D tori (1995) (15)
Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks (2006) (15)
Multi-threaded UPC runtime with network endpoints: Design alternatives and evaluation on multi-core architectures (2011) (15)
Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data I/O (2016) (15)
DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters (2018) (15)
Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms (2009) (15)
Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters (2010) (15)
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters (2015) (15)
Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP (2007) (14)
Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems (2017) (14)
MIC-Check: a distributed check pointing framework for the intel many integrated cores architecture (2014) (14)
Scalable Graph500 design with MPI-3 RMA (2014) (14)
Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters (2016) (14)
NIC-based atomic operations on Myrinet/GM (2002) (14)
Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores (2018) (14)
Performance Evaluation of RDMA over IP: A Case Study with the Ammasso Gigabit Ethernet NIC (2005) (14)
Design Alternatives for Virtual Interface Architecture and an Implementation on IBM Netfinity NT Cluster (2001) (14)
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters (2015) (14)
Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka (2017) (14)
Designing MPI Library with On-Demand Paging (ODP) of InfiniBand: Challenges and Benefits (2016) (14)
GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training (2020) (14)
Efficient Barrier and Allreduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms (2004) (14)
Optimizing mechanisms for latency tolerance in remote memory access communication on clusters (2003) (14)
CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters (2016) (14)
A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters (2014) (14)
High Performance MPI on IBM 12x InfiniBand Architecture (2007) (13)
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand (2013) (13)
SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives (2018) (13)
MIBA: A Micro-Benchmark Suite for Evaluating InfiniBand Architecture Implementations (2003) (13)
Reconfigurable vector register windows for fast matrix computation on the orthogonal multiprocessor (1990) (13)
DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects (2006) (13)
ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures (2009) (13)
Efficient Asynchronous Communication Progress for MPI without Dedicated Resources (2018) (13)
Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM (2004) (13)
Multicasting on Switch-Based Irregular Networks Using Multi-drop Path-Based Multidestination Worms (1997) (13)
Scalable architectures with k-ary n-cube cluster-c organization (1993) (13)
Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication (2017) (13)
Wide-area overlay networking to manage science DMZ accelerated flows (2014) (13)
Evaluating the Impact of RDMA on Storage I/O over InfiniBand (2004) (13)
An MPI-Stream Hybrid Programming Model for Computational Clusters (2010) (12)
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters (2014) (12)
Designing high-performance and resilient message passing on InfiniBand (2010) (12)
Performance Modeling for RDMA-Enhanced Hadoop MapReduce (2014) (12)
Design of network topology aware scheduling services for large InfiniBand clusters (2013) (12)
Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models (2014) (12)
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects (2020) (12)
Task assignment on distributed-memory systems with adaptive wormhole routing (1993) (12)
TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand (2009) (12)
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures (2019) (12)
Scalable MiniMD Design with Hybrid MPI and OpenSHMEM (2014) (12)
MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers (2016) (12)
Designing Large Hierarchical Multiprocessor Systems under Processor, Interconnection, and Packaging Advancements (1994) (11)
Design and Implementation of High Performance MVAPICH2 (MPI2 over InfiniBand) (11)
A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems (2013) (11)
On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact (2007) (11)
Intra-MIC MPI Communication using MVAPICH2: Early Experience (2012) (11)
Implementing TreadMarks over Virtual Interface Architecture on Myrinet and gigabit Ethernet: Challenges, design experience, and performance evaluation (2001) (11)
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? (2019) (11)
Application-bypass reduction for large-scale clusters (2004) (11)
INAM2: InfiniBand Network Analysis and Monitoring with MPI (2016) (11)
Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud (2017) (11)
Low Latency Message-Passing for Reflective Memory Networks (1999) (11)
Architectural Support for Efficient Multicasting in Irregular Networks (2001) (11)
Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences (2019) (10)
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera (2019) (10)
PMI Extensions for Scalable MPI Startup (2014) (10)
High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits (2015) (10)
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters (2021) (10)
Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems? (2011) (10)
Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached (2017) (10)
Design Alternatives and Performance Trade-Offs for Implementing MPI-2 over InfiniBand (2005) (10)
Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers (2006) (9)
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters (2019) (9)
Designing a high-performance clustered NAS: a case study with pNFS over RDMA on InfiniBand (2008) (9)
Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA (2016) (9)
NIC-based reduction algorithms for large-scale clusters (2006) (9)
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework (2012) (9)
Design and implementation of open MPI over Quadrics/Elan4 (2005) (9)
Efficient and truly passive MPI-3 RMA using InfiniBand atomics (2013) (9)
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast (2019) (9)
Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand (2006) (9)
Network-Based Parallel Computing. Communication, Architecture, and Applications (1999) (9)
NIC-based intrusion detection : A feasibility study (2002) (9)
NIC-based rate control for proportional bandwidth allocation in Myrinet clusters (2001) (9)
Non-Blocking PMI Extensions for Fast MPI Startup (2015) (9)
Fast Broadcast and Multicast in Wormhole Multistage Networks with Multidestination Worms (1995) (9)
Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? (2012) (9)
On the provision of prioritization and soft qos in dynamically reconfigurable shared data-centers over infiniband (2005) (9)
Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over Infiniband: Alternative Approaches and Performance Evaluation (2005) (9)
Wormhole-Routed Networks with Virtual Channels (1996) (9)
INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool (2011) (9)
High performance distributed deep learning: a beginner's guide (2019) (8)
Analyzing, Modeling, and Provisioning QoS for NVMe SSDs (2018) (8)
Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers (2007) (8)
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters (2016) (8)
BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs (2021) (8)
Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation (2001) (8)
FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures (2020) (8)
Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems (2014) (8)
MPI over uDAPL: Can High Performance and Portability Exist Across Architectures? (2006) (8)
Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences (2018) (8)
FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures (2019) (8)
Can software reliability outperform hardware reliability on high performance interconnects?: a case study with MPI over infiniband (2008) (8)
Cooperative Rendezvous Protocols for Improved Performance and Overlap (2018) (8)
Understanding the communication characteristics in HBase: What are the fundamental bottlenecks? (2012) (8)
Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand (2007) (8)
Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase (2016) (8)
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems (2016) (8)
A Case for UDP Offload Engines in LambdaGrids (2006) (7)
Experiences with software MPEG-2 video decompression on an SMP PC (1998) (7)
Message-ordering for wormhole-routed multiport systems with link contention and routing adaptivity (1994) (7)
Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over (2004) (7)
Implementing TreadMarks over GM on Myrinet: challenges, design experience, and performance evaluation (2003) (7)
Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand (2005) (7)
Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters (2009) (7)
Community Climate System Model (CCSM) (2011) (7)
High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design (2014) (7)
Low-latency message passing on workstation clusters using SCRAMNet (1999) (7)
Efficient design for MPI asynchronous progress without dedicated resources (2019) (7)
Designing and Evaluating MPI-2 Dynamic Process Management Support for InfiniBand (2009) (7)
Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters (2014) (7)
Analysis of design considerations for optimizing multi-channel MPI over InfiniBand (2005) (7)
MPI-IO on DAFs over VIA: implementation and performance evaluation (2002) (7)
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow (2019) (7)
Designing next generation data-centers with advanced communication protocols and systems services (2006) (7)
Barrier synchronization in distributed-memory multiprocessors using rendezvous primitives (1993) (7)
Designing high performance communication runtime for GPU managed memory: early experiences (2016) (7)
Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing (2018) (6)
Enhancing the Performance of NFSv4 with RDMA (2007) (6)
Resource requirements for digital computations on electrooptical systems. (1991) (6)
Optimizing Collective Communication in UPC (2014) (6)
Can RDMA benefit online data processing workloads on memcached and MySQL? (2015) (6)
An efficient hardware-software approach to network fault tolerance with InfiniBand (2009) (6)
High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters (2014) (6)
Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads (2015) (6)
Prioritized demand multiplexing (PDM): a low-latency virtual channel flow control framework for prioritized traffic (1997) (6)
Performance Characterization of Hadoop Workloads on SR-IOV-Enabled Virtualized InfiniBand Clusters (2016) (6)
NemC: A Network Emulator for Cluster-of-Clusters (2006) (6)
Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging? (2011) (6)
Communication and Architectural Support for Network-Based Parallel Computing (1997) (6)
Impact of Node Level Caching in MPI Job Launch Mechanisms (2009) (6)
Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface (2005) (6)
Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface (2014) (6)
Clustering and intra-processor scheduling for explicitly-parallel programs on distributed-memory systems (1994) (6)
Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications (2008) (6)
Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand (2005) (6)
MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC (2007) (5)
Redesigning MPI shared memory communication for large multi-core architecture (2013) (5)
Efficient multicast algorithms for switch-based irregular heterogeneous networks of workstations (2001) (5)
Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter (2019) (5)
Performance Evaluation of MM5 on Clusters with Modern Interconnects: Scalability and Impact (2005) (5)
Benefits of processor clustering in designing large parallel systems: when and how? (1996) (5)
Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet (2016) (5)
MetaData persistence using storage class memory: experiences with flash-backed DRAM (2013) (5)
Topology agnostic hot‐spot avoidance with InfiniBand (2009) (5)
Optimisation and performance evaluation of mechanisms for latency tolerance in remote memory access communication on clusters (2004) (5)
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems (2019) (5)
SHMEMPMI -- Shared Memory Based PMI for Improved Performance and Scalability (2016) (5)
Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters (2020) (5)
Designing high performance and scalable mpi over infiniband (2004) (5)
Optimizing synchronization operations for remote memory communication systems (2003) (5)
High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences (2018) (5)
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks (2013) (5)
Design and Characterization of InfiniBand Hardware Tag Matching in MPI (2020) (4)
NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems (2017) (4)
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems (2021) (4)
Adaptive Routing on the New Switch Chip for IBM SP Systems (2001) (4)
Designing QoS Aware MPI for InfiniBand - Technical Report (2008) (4)
A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks (2014) (4)
High performance network i/o in virtual machines over modern interconnects (2008) (4)
SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures (2019) (4)
DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding (2021) (4)
A QoS framework for clusters to support applications with resource adaptivity and predictable performance (2002) (4)
Scalable and high performance collective communication for next generation multicore infiniband clusters (2008) (4)
Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM (2020) (4)
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (2008) (4)
Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences (2021) (4)
Designing high-end computing systems with InfiniBand and 10-Gigabit Ethernet iWARP (2007) (4)
High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR (2015) (4)
A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI (2012) (4)
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks (2015) (4)
High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA (2006) (4)
Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models (2016) (4)
MPI and communication - High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis (2006) (4)
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects (2019) (4)
Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks (4)
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC (2016) (4)
C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks (2019) (4)
Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks (2017) (4)
Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters? (2016) (4)
Optimal Phase Barrier Synchronization in K-ary N-cube Wormhole-routed Systems Using Multirendezvous Primitives (1993) (4)
Issues in Designing Scalable Systems with K-ary N-cube Cluster-c Organization (1994) (4)
Codesign for InfiniBand Clusters (2011) (4)
Architectural support for efficient communication in scalable parallel systems (1998) (3)
Scalable systems software - A software based approach for providing network fault tolerance in clusters with uDAPL interface: MPI level design and performance evaluation (2006) (3)
A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X (2015) (3)
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM (2015) (3)
Reducing Diff Overhead in Software DSM Systems using RDMA Operations in InfiniBand (2004) (3)
Profile-Based Load Balancing for Heterogeneous Clusters (2007) (3)
High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2 (2010) (3)
Can NIC Memory in InfiniBand Benefit Communication Performance? — A Study with Mellanox Adapter (2004) (3)
UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems (2019) (3)
Accelerating Big Data Processing on Modern Clusters (2015) (3)
Designing communication strategies for heterogeneous parallel systems (1998) (3)
Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand (2017) (3)
Exploiting the Benefits of Multiple-Path Network DSM Systems: Architectural Alternatives and Performance Evaluation (1999) (3)
MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI (2017) (3)
Balancing Web server load for adaptable video distribution (2000) (3)
Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand (2011) (3)
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters (2016) (3)
Communication-Aware Hardware-Assisted MPI Overlap Engine (2020) (3)
Interactive medical data on demand: a high-performance imaged-based approach across heterogeneous environments. (2000) (3)
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR (2020) (3)
Performance characterization and acceleration of big data workloads on OpenPOWER system (2017) (3)
USC orthogonal multiprocessor for image processing with neural networks (1990) (3)
Architectural issues in designing heterogeneous parallel systems with passive star-coupled optical interconnection (1994) (3)
A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS (2015) (3)
HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications (2017) (3)
Demotion-based exclusive caching through demote buffering: design and evaluations over different networks (2003) (3)
High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters (2015) (3)
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters (2022) (3)
Adaptive routing in RS/6000 SP-like bidirectional multistage interconnection networks (2000) (3)
Communication mechanisms and algorithms for supporting scalable collective communication on parallel systems (1998) (3)
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems (2017) (3)
Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters (2012) (3)
EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures (2018) (3)
Jodia (L5) and Mahadevpur (H4/5): Two Recent Ordinary Chondrite Falls in India (2009) (3)
High Performance MPI over the Slingshot Interconnect: Early Experiences (2022) (3)
Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System (2020) (3)
Co-Designing MPI Library and Applications for InfiniBand Clusters (2011) (2)
SUPER: SUb-Graph Parallelism for TransformERs (2021) (2)
Exploiting Remote Memory in InfiniBand Clusters using a High Performance Network Block Device (HPBD) (2005) (2)
Can Scatter Communication Take Advantage of Multidestination Message Passing? (2000) (2)
Thinking Beyond the RAM Disk for In-Memory Checkpointing of HPC Applications (2013) (2)
Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures (2020) (2)
Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast (2019) (2)
Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters (2015) (2)
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications (2016) (2)
QoS-aware middleware for cluster-based servers to support interactive and resource-adaptive applications (2003) (2)
Architectural Design of Orthogonal Multiprocessor for Multidimensional Information Processing (1991) (2)
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI (2015) (2)
Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs (2021) (2)
Designing High-End Computing Systems with InfiniBand and High-Speed Ethernet (2010) (2)
Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations (2001) (2)
Complete Exchange in 2D Meshes (1994) (2)
Designing Processor-cluster Based Systems: Interplay Between Cluster Organizations and Collective Co (1996) (2)
Characterizing and accelerating indexing techniques on distributed ordered tables (2017) (2)
Advanced RDMA-Based Admission Control for Modern Data-Centers (2008) (2)
Algorithm-Driven Simulation and Performance Projection of a RISC-based Orthogonal Multiprocessor (1990) (2)
Networking and communication challenges for post-exascale systems (2018) (2)
Exploring Hybrid MPI+Kokkos Tasks Programming Model (2020) (2)
Sockets direct protocol for hybrid network stacks: a case study with iWARP over 10G Ethernet (2008) (2)
Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures (2019) (2)
Does RDMA-based enhanced Hadoop MapReduce need a new performance model? (2013) (2)
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling (2017) (2)
Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR (2020) (2)
Designing Virtualization-Aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds (2016) (2)
Fast Data Manipulation in Multiprocessors Using Parallel Pipelined Memories (1991) (2)
Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries (2022) (1)
Characterization of Structure-Borne Road/Tire Noise Inside a Passenger Car Cabin Using Path Based Analysis (2013) (1)
SCOR-KV: SIMD-Aware Client-Centric and Optimistic RDMA-Based Key-Value Store for Emerging CPU Architectures (2019) (1)
Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI (2020) (1)
Active network interface: opportunities and challenges (2002) (1)
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures (2018) (1)
High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation (2015) (1)
Designing processor-cluster based systems: interplay between cluster organizations and broadcasting algorithms (1996) (1)
Efficient Collective Communication on Heterogeneous Networks of Workstations (1998) (1)
Design and Implementation of Open MPI over QsNet/Elan4 (2004) (1)
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters (2022) (1)
Fast Collective Communication Algorithms for Reflective Memory Network Clusters (2000) (1)
Configurable, Highly Parallel Computer (2011) (1)
INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications (2021) (1)
Designing high-performance and scalable clustered network attached storage with infiniband (2008) (1)
Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach (2017) (1)
A Parallel-Serial Binary Arbitration Scheme for Collision-Free Multi-Access Techniques (1988) (1)
Message Vectorization for Converting Multicomputer Programs to Shared-Memory Multiprocessors (1991) (1)
Designing next generation clusters with InfiniBand and 10GE/iWARP: Opportunities and challenges (2008) (1)
A Portable InfiniBand Module for MPICH2/Nemesis: Design and Evaluation (2009) (1)
Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms (2015) (1)
An Evaluation of Preemption Strategies for Parallel Job Scheduling (1)
Cluster File Systems (2011) (1)
Benefits of Dedicating Resource Sharing Services in DataCenters for Emerging Multi-Core Systems (2007) (1)
Improving cluster performance through the use of programmable network interfaces (2003) (1)
Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters (2016) (1)
Analysis of routing in pyramid architectures (1993) (1)
Designing efficient communication subsystems for distributed shared memory (dsm) systems (1999) (1)
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware (2015) (1)
Can Streaming SIMD Non-Temporal Instructions Benefit Intra-node MPI Communication on Modern Multi-core Platforms? (2010) (1)
Designing high performance parallel systems: a processor-cluster based approach (1996) (1)
Community Ice Code (CICE) (2011) (1)
Topology agnostic hot-spot avoidance with InfiniBand (2009) (1)
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs (2022) (1)
Can high performance software DSM systems designed with InfiniBand features benefit from PCI-Express? (2005) (1)
MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure (2020) (1)
Optimizing a Stencil-Based Application for Earthquake Modeling on Modern InfiniBand Clusters (2009) (1)
On the Hardware Requirement for 2-D Image Convolution in Electro-Optical Systems (1989) (1)
Geochemical Evidence for the Meteorite Impact Origin of Ramgarh Structure, India (2008) (1)
Layout-aware Hardware-assisted Designs for Derived Data Types in MPI (2021) (1)
Reliable Hardware Barrier Synchronization Schemes (1997) (1)
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter (2022) (1)
Efficient MPI-based Communication for GPU-Accelerated Dask Applications (2021) (1)
Critical Sections (2011) (1)
Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? (1995) (1)
Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems (2021) (1)
Optical arithmetic using high-radix symbolic substitution rules (1989) (1)
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications (2020) (1)
Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters (2022) (0)
Design and Performance of PVFS over InfiniBand (0)
Distributed Memory Parallel Systems (1999) (0)
Panel: Data intensive computing. (2006) (0)
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version (2023) (0)
Proceedings of the International Conference on Parallel Processing Workshops: Welcome Message (2010) (0)
Functional Partitioning: Enabling and Optimizing Exascale Runtime Services (2012) (0)
Computer Performance Evaluation. Modelling Techniques and Tools (2003) (0)
Special Issue on Communication Architecture for Clusters: Guest Editors Introduction (2004) (0)
Feature estimation for efficient streaming (2002) (0)
Final Report for Project DE-FC02-06ER25755 [Pmodels2] (2014) (0)
Designing Next Generation Clusters, Cluster-Based Servers and Datacenters with InfiniBand: Opportunities and Challenges (2003) (0)
4.2 Multi-phase Barrier Synchronization 4.3 Optimal Number of Phases 3.1.2 Multi-path-based Schemes 3.2 Synchronization-worm-based Scheme 4 Barrier Synchronization Using Multiren- Dezvous Primitives 4.1 Single-phase Barrier Synchronization (1993) (0)
Networking and communication challenges for post-exascale systems (2018) (0)
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries (2022) (0)
Tutorial: Designing High-End Computing Systems with Infiniband and 10-Gigabit Ethernet (2009) (0)
Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI (2022) (0)
Message from the general co-chairs IEEE ICPADS 2014 (2014) (0)
Proceedings of the First International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (1997) (0)
Adaptive Receiver Window Scaling: Minimizing MPI Communication Memory over InfiniBand (2006) (0)
Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors (2017) (0)
Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications (1998) (0)
High Performance MPI over the Slingshot Interconnect (2023) (0)
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters (2022) (0)
Challenges and Opportunities in Designing High-Performance and Scalable Middleware for HPC and AI: Past, Present, and Future (2022) (0)
Proceedings, 1999 International Conference on Parallel Processing, 21-24 September 1999 Aizu-Wakamatsu City, Japan (1999) (0)
Tutorial III (2006) (0)
Large-Message Nonblocking MPI_Iallgather and MPI Ibcast Offload via BlueField-2 DPU (2021) (0)
Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet (2005) (0)
Designing Multi-Core Aware Inter-Communicator Operations for MPI-2 Dynamic Process Management (2009) (0)
Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems (2022) (0)
Designing High Performance DSM Systems using InfiniBand: Opportunities, Challenges and Experiences (0)
Introduction to HPBDC Workshop (2017) (0)
Optimized MPI Gather Collective for Many Integrated Core (MIC) InfiniBand Clusters (2013) (0)
Introduction to HPBDC 2019 (2019) (0)
Workshop Introduction (2002) (0)
IPDPS 2007: Comments from the Guest Editor (2009) (0)
Efficient and Scalable NIC-Based Barrier over Quadrics and Myrinet (0)
Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier DataCenters over (0)
Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters (2016) (0)
IBM Power (2011) (0)
Lightning Talks of EduHPC 2022 (2022) (0)
A Portable Client/Server Communication Middleware over SANs: Design and Performance Evaluation with Virtual Interface and InfiniBand (2003) (0)
Designing Hierarchical Multi-HCA Aware Allgather in MPI (2022) (0)
Towards Java-based HPC using the MVAPICH2 Library: Early Experiences (2022) (0)
TC on Parallel Processing (1995) (0)
Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks (2018) (0)
Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries (2023) (0)
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report (2011) (0)
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters (2021) (0)
Additional Studies of Materials from the Ramgarh Structure, India (2009) (0)
Architectural and Communication Issues in Designing Heterogeneous Parallel Systems with Optical Interconnection (1994) (0)
Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2 (2019) (0)
Performance Analysis and Improved Communication Overlap for a Seismic Modeling Application on Large InfiniBand Clusters (2010) (0)
CCF THPC inaugural issue editorial (2019) (0)
Scalable and high-performance mpi design for very large infiniband clusters (2007) (0)
Efficient Multicast on Irregular Switch-based Networks with Cut-through Switching (2007) (0)
Tutorials - HOTI 2012 (2012) (0)
Enhancing mpi with modern networking mechanisms in cluster interconnects (2006) (0)
"Hey CAI" - Conversational AI Enabled User Interface for HPC Tools (2022) (0)
State of InfiniBand in designing HPC clusters, storage/file systems, and datacenters [datacenters read as data centers] (2004) (0)
Scalable Out-of-core OpenSHMEM Library for HPC (2015) (0)
Numerical Modeling of Critical Path Contributions for NVH Prediction of Vehicle (2013) (0)
Introduction to HPBDC 2018 (2018) (0)
IPDPS 2007 Organization (2007) (0)
Commodity High Performance Interconnects (2009) (0)
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning (2023) (0)
Design and implementation of high performance communication subsystems for clusters (2000) (0)
Tutorial: Infiniband and 10-Gigabit Ethernet for Dummies (2009) (0)
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems (2021) (0)
Redesigning MPI shared memory communication for large multi-core architecture (2012) (0)
Supercomputing Frontiers: 6th Asian Conference, SCFA 2020, Singapore, February 24–27, 2020, Proceedings (2020) (0)
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2008, San Jose, California, USA, November 6-7, 2008 (2008) (0)
NFS/RDMA over InfiniBand: Is It Beneficial? (0)
Designing Zero-Copy FTP Mechanisms to Achieve High Performance Data-Transfer over InfiniBand WAN (2008) (0)
Table of Contents (2003) (0)
Designing Next Generation Clusters with Infiniband: Opportunities and Challenges (2003) (0)
HPBDC Introduction and Committees (2016) (0)
An Enhanced MPI-2 Dynamic Process Management Support for InfiniBand (2009) (0)
under Pa Technological Advancements (1996) (0)
High performance and network fault tolerant mpi with multi-pathing over infiniband (2007) (0)
OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences (2016) (0)
Designing high performance and scalable distributed datacenter services over modern interconnects (2008) (0)
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR (2020) (0)
Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services (2006) (0)
An Architectural study of Cluster-Based Multi-Tier DataCenters (2004) (0)
Better NFS through RDMA and Efficient Memory Registration (2007) (0)
Second workshop on system-level Virtualization for High Performance Computing (HPCVirt 2008) (2008) (0)
Veloblock: Efficient and Scalable RDMA Fast Path for InfiniBand (2009) (0)
Architectural Support for Efficient Multicasting in Irregular Networks (2001) (0)
PVFS2 over Quadrics: Design, Implementation and Performance Evaluation (2005) (0)
Advanced RDMA-based Admission Control for Modern Data-Centers (2008) (0)
InfiniBand (2011) (0)
18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018, Washington, DC, USA, May 1-4, 2018 (2018) (0)
Tutorials (2018) (0)
Efficient Checkpoint/Restart for Multi-Channel MPI over Multi-core Clusters (2009) (0)
Vectorized interprocessor communication and data movement in shared-memory multiprocessors (1992) (0)
Exploiting and Evaluating OpenSHMEM on KNL Architecture (2017) (0)
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads (2022) (0)
CCGrid 2020 Committees (2020) (0)
Tutorial 2: InfiniBand Architecture and Where it is Headed (2002) (0)
Community Climate Model (CCM) (2011) (0)
Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program (2020) (0)
The emergence of workstation clusters: Should we continue to build mpps? [panel session] (1998) (0)
Communication and memory management in networked storage systems (2004) (0)
Designing High-Performance Communication Subsystems: Top Five Problems to Solve and Five Problems Not to Solve During the Next Five Years (Panel) (1997) (0)
Adaptive Routing on the New Switch Chip for IBM SP (2007) (0)
Efficient Scatter Communication in Wormhole K-ary N-cubes with Multidestination Message Passing (1996) (0)
A Case for UDP Offload Engines in LambdaGrids (2006) (0)
Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences (2021) (0)
Collective Communication, Network Support For (2011) (0)

What Schools Are Affiliated With Dhabaleswar K. D. K. Panda?

Dhabaleswar K. D. K. Panda is affiliated with the following schools:

Ohio State University
