Dhabaleswar K. D. K. Panda
#130,849
Most Influential Person Now
Dhabaleswar K. D. K. Panda's AcademicInfluence.com Rankings
Dhabaleswar K. D. K. Panda's Computer Science Degrees
- Computer Science: World Rank #5746, Historical Rank #6065
- Parallel Computing: World Rank #29, Historical Rank #29
- Database: World Rank #2877, Historical Rank #3002

Why Is Dhabaleswar K. D. K. Panda Influential?
Dhabaleswar K. D. K. Panda's Published Works
Number of citations in a given year to any of this author's works
Total number of citations to an author for the works they published in a given year. This highlights publication of the most important work(s) by the author
Published Works
- High Performance RDMA-Based MPI Implementation over InfiniBand (2003) (433)
- High Performance VMM-Bypass I/O in Virtual Machines (2006) (319)
- A case for high performance computing with virtual machines (2006) (317)
- EMP: Zero-Copy OS-Bypass NIC-Driven Gigabit Ethernet Message Passing (2001) (204)
- Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System (2007) (186)
- Memcached Design on High Performance RDMA Capable Interconnects (2011) (184)
- High performance RDMA-based design of HDFS over InfiniBand (2012) (179)
- Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics (2003) (174)
- High performance virtual machine migration with RDMA over modern interconnects (2007) (145)
- MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters (2011) (144)
- Scalable Earthquake Simulation on Petascale Supercomputers (2010) (140)
- Beyond block I/O: Rethinking traditional storage primitives (2011) (134)
- S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters (2017) (130)
- Efficient collective communication on heterogeneous networks of workstations (1998) (129)
- Design and implementation of MPICH2 over InfiniBand with RDMA support (2003) (121)
- Virtual machine aware communication libraries for high performance computing (2007) (119)
- High-Performance Design of Hadoop RPC with RDMA over InfiniBand (2013) (118)
- High Performance Remote Memory Access Communication: The ARMCI Approach (2006) (118)
- Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs (2013) (112)
- Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device (2005) (112)
- Fast collective operations using shared and remote memory access protocols on clusters (2003) (110)
- Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths (1999) (105)
- RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits (2006) (104)
- Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme (1994) (100)
- Performance characterization of a 10-Gigabit Ethernet TOE (2005) (97)
- MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics (2008) (96)
- LiMIC: support for high-performance MPI intra-node communication on Linux cluster (2005) (96)
- High-Performance Design of HBase with RDMA over InfiniBand (2012) (95)
- Microbenchmark performance comparison of high-speed cluster interconnects (2004) (95)
- PVFS over InfiniBand: design and performance evaluation (2003) (95)
- A reliable multicast algorithm for mobile ad hoc networks (2002) (92)
- High performance MPI-2 one-sided communication over InfiniBand (2004) (91)
- Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters (2006) (86)
- A trip-based multicasting model for wormhole-routed networks with virtual channels (1993) (84)
- Accelerating Spark with RDMA for Big Data Processing: Early Experiences (2014) (84)
- Fast and scalable MPI-level broadcast using InfiniBand's hardware multicast support (2004) (84)
- Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather (2010) (83)
- Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture (2015) (79)
- Design of High Performance MVAPICH2: MPI2 over InfiniBand (2006) (79)
- CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems (2009) (79)
- Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand (2006) (78)
- Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication (2012) (73)
- GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation (2014) (73)
- Sockets Direct Protocol over InfiniBand in clusters: is it beneficial? (2004) (72)
- High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand (2013) (72)
- High Performance Design for HDFS with Byte-Addressability of NVM and RDMA (2016) (70)
- Multicast on irregular switch-based networks with wormhole routing (1997) (68)
- Towards NIC-based intrusion detection (2003) (67)
- Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation (2004) (65)
- Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations (1999) (65)
- SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience (2013) (64)
- Implementing Multidestination Worms In Switch-based Parallel Systems: Architectural Alternatives And Their Impact (1997) (64)
- Frontera: The Evolution of Leadership Computing at the National Science Foundation (2020) (63)
- High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth Performance Analysis (2006) (61)
- High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters (2007) (59)
- DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements (2007) (59)
- HIPIQS: a high-performance switch architecture using input queuing (1998) (59)
- Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports (2012) (58)
- Shared receive queue based scalable MPI design for InfiniBand clusters (2006) (58)
- High performance user level sockets over Gigabit Ethernet (2002) (57)
- A 1 PB/s file system to checkpoint three million MPI tasks (2013) (57)
- Unifying UPC and MPI runtimes: experience with MVAPICH (2010) (56)
- Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand (2008) (55)
- Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2 (2011) (54)
- High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT (2011) (54)
- Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes (2012) (54)
- High performance implementation of MPI derived datatype communication over InfiniBand (2004) (53)
- Efficient broadcast and multicast on multistage interconnection networks using multiport encoding (1996) (53)
- Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems (2012) (53)
- Nomad: migrating OS-bypass networks in virtual machines (2007) (52)
- Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck (2004) (52)
- Scalable NIC-based Reduction on Large-scale Clusters (2003) (52)
- Can High-Performance Interconnects Benefit Hadoop Distributed File System? (2010) (51)
- Selective preemption strategies for parallel job scheduling (2002) (51)
- QoPS: A QoS Based Scheme for Parallel Job Scheduling (2003) (51)
- MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand (2008) (50)
- Designing multi-leader-based Allgather algorithms for multi-core clusters (2009) (49)
- Host-assisted zero-copy remote memory access communication on InfiniBand (2004) (49)
- High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters (2005) (49)
- Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms (1995) (49)
- Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand (2006) (48)
- Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms (2007) (48)
- High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA (2015) (47)
- Scaling alltoall collective on multi-core systems (2008) (46)
- Efficient Barrier and Allreduce on InfiniBand clusters using multicast and adaptive algorithms (2004) (45)
- High-performance design of Apache Spark with RDMA and its benefits on various workloads (2016) (45)
- An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures (2017) (45)
- HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects (2014) (44)
- Stampede 2: The Evolution of an XSEDE Supercomputer (2017) (44)
- Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems (2007) (44)
- Adaptive connection management for scalable MPI over InfiniBand (2006) (43)
- OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters (2012) (43)
- Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters (2011) (41)
- Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective (2007) (41)
- Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters (2010) (41)
- Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach (2007) (40)
- Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach (1997) (39)
- MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters (2013) (39)
- Fast and Scalable Barrier Using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters (2003) (39)
- Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters (2004) (39)
- High performance and reliable NIC-based multicast over Myrinet/GM-2 (2003) (38)
- MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems (2001) (38)
- Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning (2016) (37)
- How much does network contention affect distributed shared memory performance? (1997) (37)
- Reducing cache invalidation overheads in wormhole routed DSMs using multidestination message passing (1996) (37)
- RDMA over Ethernet — A preliminary study (2009) (37)
- InfiniBand Architecture (2001) (37)
- Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? (2017) (37)
- Enhancing Checkpoint Performance with Staging IO and SSD (2010) (37)
- Data intensive computing (2006) (37)
- High performance support of parallel virtual file system (PVFS2) over Quadrics (2005) (37)
- Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models (2013) (36)
- Scalable MPI design over InfiniBand using eXtended Reliable Connection (2008) (36)
- Fast NIC-based barrier over Myrinet/GM (2001) (36)
- Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast (2006) (36)
- Protocols and strategies for optimizing performance of remote memory operations on clusters (2002) (36)
- Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages (2000) (35)
- Efficient Intra-node Communication on Intel-MIC Clusters (2013) (35)
- High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand (2010) (34)
- Multicasting in Irregular Networks with Cut-Through Switches Using Tree-Based Multidestination Worms (1997) (34)
- Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems (2008) (33)
- Complete exchange in 2D meshes (1994) (32)
- Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI (2012) (32)
- Benefits of high speed interconnects to cluster file systems: a case study with Lustre (2006) (32)
- Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand (2007) (32)
- Hybrid algorithms for complete exchange in 2D meshes (2001) (32)
- Performance Characterization of Hypervisor- and Container-Based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters (2016) (32)
- Design and evaluation of benchmarks for financial applications using Advanced Message Queuing Protocol (AMQP) over InfiniBand (2008) (32)
- High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations (2007) (31)
- Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation (2018) (31)
- A case for application-oblivious energy-efficient MPI runtime (2015) (31)
- Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation (2012) (31)
- Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements (1996) (31)
- Efficient asynchronous memory copy operations on multi-core systems and I/OAT (2007) (31)
- Zero-Copy MPI Derived Datatype Communication over InfiniBand (2004) (31)
- MPI over InfiniBand: Early Experiences (2003) (31)
- Efficient collective operations using remote memory operations on VIA-based clusters (2003) (31)
- Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters (2000) (31)
- Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters (2014) (30)
- Extending OpenSHMEM for GPU Computing (2013) (30)
- Characterization and enhancement of dynamic mapping heuristics for heterogeneous systems (2000) (30)
- SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS (2014) (30)
- Efficient and scalable barrier over Quadrics and Myrinet with a new NIC-based collective message passing protocol (2004) (29)
- Optimal multicast with packetization and network interface support (1997) (29)
- A reliable hardware barrier synchronization scheme (1997) (29)
- Towards provision of quality of service guarantees in job scheduling (2004) (28)
- Can user-level protocols take advantage of multi-CPU NICs? (2002) (28)
- Supporting efficient noncontiguous access in PVFS over InfiniBand (2003) (28)
- RDMA-Based Job Migration Framework for MPI over InfiniBand (2010) (28)
- Evaluating InfiniBand performance with PCI Express (2005) (28)
- Minimizing node contention in multiple multicast on wormhole k-ary n-cube networks (1996) (27)
- Performance evaluation of InfiniBand with PCI Express (2004) (27)
- Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine (2010) (27)
- High performance MPI library over SR-IOV enabled InfiniBand clusters (2014) (27)
- Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application (2010) (26)
- Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters (2015) (26)
- Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers (2011) (26)
- Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks (2005) (26)
- High-radix symbolic substitution and superposition techniques for optical matrix algebraic computations (1992) (26)
- Scalable Reduction Collectives with Data Partitioning-based Multi-Leader Design (2017) (26)
- Micro-benchmark level performance comparison of high-speed cluster interconnects (2003) (26)
- Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand (2006) (26)
- System-Level Scalable Checkpoint-Restart for Petascale Computing (2016) (26)
- Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines (2005) (25)
- Natively Supporting True One-Sided Communication in MPI on Multi-core Systems with InfiniBand (2009) (25)
- Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters (2013) (25)
- High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters (2016) (25)
- SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks (2012) (25)
- Impact of adaptivity on the behavior of networks of workstations under bursty traffic (1998) (25)
- Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store (2015) (25)
- Implementing efficient and scalable flow control schemes in MPI over InfiniBand (2004) (25)
- Performance modeling of subnet management on fat tree InfiniBand networks using OpenSM (2005) (24)
- Congestion avoidance on manycore high performance computing systems (2012) (24)
- Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? (2013) (24)
- Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits (2005) (24)
- OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses (1990) (24)
- NIC-Based Reduction in Myrinet Clusters: Is It Beneficial? (2003) (23)
- Impact of high performance sockets on data intensive applications (2003) (23)
- Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand (2009) (23)
- Efficient barrier using remote memory operations on VIA-based clusters (2002) (23)
- Automatic Path Migration over InfiniBand: Early Experiences (2007) (23)
- Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized InfiniBand Clusters? (2014) (23)
- Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters (2015) (23)
- Impact of on-demand connection management in MPI over VIA (2002) (23)
- NIC-based offload of dynamic user-defined modules for Myrinet clusters (2004) (23)
- Fast and Scalable Startup of MPI Programs in InfiniBand Clusters (2004) (23)
- Designing NFS with RDMA for Security, Performance and Scalability (2007) (22)
- A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters (2017) (22)
- MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU (2017) (22)
- OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training (2018) (22)
- EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications (2018) (22)
- Where to provide support for efficient multicasting in irregular networks: network interface or switch? (1998) (22)
- MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds (2015) (22)
- Simulation Of Modern Parallel Systems: A CSIM-based Approach (1997) (22)
- Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture (2009) (22)
- Efficient and scalable all-to-all personalized exchange for InfiniBand-based clusters (2004) (22)
- Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences (2014) (22)
- High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits (2016) (22)
- ScELA: scalable and extensible launching architecture for clusters (2008) (22)
- Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT (2007) (22)
- Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds? (2017) (21)
- Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication (2008) (21)
- Applying MPI derived datatypes to the NAS benchmarks: A case study (2004) (21)
- Impact of multiple consumption channels on wormhole routed k-ary n-cube networks (1993) (21)
- HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters (2014) (21)
- Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support (2007) (21)
- Global reduction in wormhole k-ary n-cube networks with multidestination exchange worms (1995) (21)
- MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit (2011) (21)
- Adaptive and Dynamic Design for MPI Tag Matching (2016) (21)
- Issues in Designing Efficient and Practical Algorithms for Collective Communication on Wormhole-Rout (1995) (21)
- Benefits of I/O Acceleration Technology (I/OAT) in Clusters (2007) (21)
- Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA) (2000) (20)
- Unifier: unifying cache management and communication buffer management for PVFS over InfiniBand (2004) (20)
- Efficient Barrier and Allreduce on IBA clusters using hardware multicast and adaptive algorithms (2004) (20)
- Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems (2010) (20)
- High Performance Broadcast Support in LA-MPI Over Quadrics (2005) (20)
- Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks (1999) (20)
- A Software Based Approach for Providing Network Fault Tolerance in Clusters with uDAPL interface: MPI Level Design and Performance Evaluation (2006) (19)
- Scheduling of MPI-2 one sided operations over InfiniBand (2005) (19)
- IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand (2008) (19)
- NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems (2020) (19)
- The MVAPICH project: Transforming research into high-performance MPI library for HPC community (2020) (19)
- CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart (2011) (19)
- Application-bypass broadcast in MPICH over GM (2003) (19)
- Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems (2009) (19)
- Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning (2017) (18)
- Zero-copy protocol for MPI using InfiniBand unreliable datagram (2007) (18)
- Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL (2011) (18)
- UPC on MIC: Early Experiences with Native and Symmetric Modes (2013) (18)
- High Performance Pipelined Process Migration with RDMA (2011) (18)
- Performance of HPC Middleware over InfiniBand WAN (2008) (18)
- A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters (2012) (18)
- A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters (2013) (18)
- Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters (2019) (18)
- Designing passive synchronization for MPI-2 one-sided communication to maximize overlap (2008) (18)
- Can memory-less network adapters benefit next-generation InfiniBand systems? (2005) (18)
- Accelerating TensorFlow with Adaptive RDMA-Based gRPC (2018) (18)
- MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters (2017) (18)
- Special Issue on Workstation Clusters and Network-Based Computing: Guest Editors' Introduction (1997) (17)
- High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads (2017) (17)
- Supporting iWARP Compatibility and Features for Regular Network Adapters (2005) (17)
- MVAPICH2-MIC: A High Performance MPI Library for Xeon Phi Clusters with InfiniBand (2013) (17)
- Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems (1998) (17)
- Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows (2011) (17)
- Implementing efficient MPI on LAPI for IBM RS/6000 SP systems: Experiences and performance evaluation (1999) (17)
- Efficient one-copy MPI shared memory communication in Virtual Machines (2008) (17)
- All-to-all broadcast on switch-based clusters of workstations (1999) (17)
- Performance benefits of NIC-based barrier on Myrinet/GM (2001) (17)
- Bridging the Ethernet-Ethernot Performance Gap (2006) (17)
- Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing (2001) (17)
- Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand (2004) (17)
- MapReduce over Lustre: Can RDMA-Based Approach Benefit? (2014) (16)
- Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand (2004) (16)
- Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks (2003) (16)
- High Performance MPI over iWARP: Early Experiences (2007) (16)
- Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem (2016) (16)
- Designing high performance DSM systems using InfiniBand features (2004) (16)
- Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand (2017) (16)
- Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage (2016) (16)
- pNFS/PVFS2 over InfiniBand: early experiences (2007) (16)
- High performance and scalable MPI intra-node communication middleware for multi-core clusters (2009) (16)
- High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters (2017) (16)
- VIBe: a micro-benchmark suite for evaluating virtual interface architecture (VIA) implementations (2001) (16)
- How Can We Design Better Networks for DSM Systems? (1997) (16)
- Reducing network contention with mixed workloads on modern multicore clusters (2009) (16)
- In-memory I/O and replication for HDFS with Memcached: Early experiences (2014) (15)
- Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL (2013) (15)
- An efficient scheme for complete exchange in 2D tori (1995) (15)
- Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks (2006) (15)
- Multi-threaded UPC runtime with network endpoints: Design alternatives and evaluation on multi-core architectures (2011) (15)
- Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data I/O (2016) (15)
- DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters (2018) (15)
- Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms (2009) (15)
- Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters (2010) (15)
- Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters (2015) (15)
- Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP (2007) (14)
- Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems (2017) (14)
- MIC-Check: a distributed check pointing framework for the Intel many integrated cores architecture (2014) (14)
- Scalable Graph500 design with MPI-3 RMA (2014) (14)
- Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters (2016) (14)
- NIC-based atomic operations on Myrinet/GM (2002) (14)
- Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores (2018) (14)
- Performance Evaluation of RDMA over IP: A Case Study with the Ammasso Gigabit Ethernet NIC (2005) (14)
- Design Alternatives for Virtual Interface Architecture and an Implementation on IBM Netfinity NT Cluster (2001) (14)
- Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters (2015) (14)
- Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka (2017) (14)
- Designing MPI Library with On-Demand Paging (ODP) of InfiniBand: Challenges and Benefits (2016) (14)
- GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training (2020) (14)
- Efficient Barrier and Allreduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms (2004) (14)
- Optimizing mechanisms for latency tolerance in remote memory access communication on clusters (2003) (14)
- CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters (2016) (14)
- A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters (2014) (14)
- High Performance MPI on IBM 12x InfiniBand Architecture (2007) (13)
- MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand (2013) (13)
- SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives (2018) (13)
- MIBA: A Micro-Benchmark Suite for Evaluating InfiniBand Architecture Implementations (2003) (13)
- Reconfigurable vector register windows for fast matrix computation on the orthogonal multiprocessor (1990) (13)
- DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects (2006) (13)
- ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures (2009) (13)
- Efficient Asynchronous Communication Progress for MPI without Dedicated Resources (2018) (13)
- Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM (2004) (13)
- Multicasting on Switch-Based Irregular Networks Using Multi-drop Path-Based Multidestination Worms (1997) (13)
- Scalable architectures with k-ary n-cube cluster-c organization (1993) (13)
- Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication (2017) (13)
- Wide-area overlay networking to manage science DMZ accelerated flows (2014) (13)
- Evaluating the Impact of RDMA on Storage I/O over InfiniBand (2004) (13)
- An MPI-Stream Hybrid Programming Model for Computational Clusters (2010) (12)
- A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters (2014) (12)
- Designing high-performance and resilient message passing on InfiniBand (2010) (12)
- Performance Modeling for RDMA-Enhanced Hadoop MapReduce (2014) (12)
- Design of network topology aware scheduling services for large InfiniBand clusters (2013) (12)
- Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models (2014) (12)
- Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects (2020) (12)
- Task assignment on distributed-memory systems with adaptive wormhole routing (1993) (12)
- TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand (2009) (12)
- Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures (2019) (12)
- Scalable MiniMD Design with Hybrid MPI and OpenSHMEM (2014) (12)
- MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers (2016) (12)
- Designing Large Hierarchical Multiprocessor Systems under Processor, Interconnection, and Packaging Advancements (1994) (11)
- Design and Implementation of High Performance MVAPICH2 (MPI2 over InfiniBand) (11)
- A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems (2013) (11)
- On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact (2007) (11)
- Intra-MIC MPI Communication using MVAPICH2: Early Experience (2012) (11)
- Implementing TreadMarks over Virtual Interface Architecture on Myrinet and gigabit Ethernet: Challenges, design experience, and performance evaluation (2001) (11)
- Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? (2019) (11)
- Application-bypass reduction for large-scale clusters (2004) (11)
- INAM2: InfiniBand Network Analysis and Monitoring with MPI (2016) (11)
- Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud (2017) (11)
- Low Latency Message-Passing for Reflective Memory Networks (1999) (11)
- Architectural Support for Efficient Multicasting in Irregular Networks (2001) (11)
- Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences (2019) (10)
- Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera (2019) (10)
- PMI Extensions for Scalable MPI Startup (2014) (10)
- High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits (2015) (10)
- Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters (2021) (10)
- Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems? (2011) (10)
- Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached (2017) (10)
- Design Alternatives and Performance Trade-Offs for Implementing MPI-2 over InfiniBand (2005) (10)
- Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers (2006) (9)
- Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters (2019) (9)
- Designing a high-performance clustered NAS: a case study with pNFS over RDMA on InfiniBand (2008) (9)
- Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA (2016) (9)
- NIC-based reduction algorithms for large-scale clusters (2006) (9)
- Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework (2012) (9)
- Design and implementation of open MPI over Quadrics/Elan4 (2005) (9)
- Efficient and truly passive MPI-3 RMA using InfiniBand atomics (2013) (9)
- Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast (2019) (9)
- Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand (2006) (9)
- Network-Based Parallel Computing. Communication, Architecture, and Applications (1999) (9)
- NIC-based intrusion detection: A feasibility study (2002) (9)
- NIC-based rate control for proportional bandwidth allocation in Myrinet clusters (2001) (9)
- Non-Blocking PMI Extensions for Fast MPI Startup (2015) (9)
- Fast Broadcast and Multicast in Wormhole Multistage Networks with Multidestination Worms (1995) (9)
- Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? (2012) (9)
- On the provision of prioritization and soft QoS in dynamically reconfigurable shared data-centers over InfiniBand (2005) (9)
- Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over Infiniband: Alternative Approaches and Performance Evaluation (2005) (9)
- Wormhole-Routed Networks with Virtual Channels (1996) (9)
- INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool (2011) (9)
- High performance distributed deep learning: a beginner's guide (2019) (8)
- Analyzing, Modeling, and Provisioning QoS for NVMe SSDs (2018) (8)
- Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers (2007) (8)
- CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters (2016) (8)
- BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs (2021) (8)
- Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation (2001) (8)
- FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures (2020) (8)
- Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems (2014) (8)
- MPI over uDAPL: Can High Performance and Portability Exist Across Architectures? (2006) (8)
- Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences (2018) (8)
- FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures (2019) (8)
- Can software reliability outperform hardware reliability on high performance interconnects?: a case study with MPI over infiniband (2008) (8)
- Cooperative Rendezvous Protocols for Improved Performance and Overlap (2018) (8)
- Understanding the communication characteristics in HBase: What are the fundamental bottlenecks? (2012) (8)
- Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand (2007) (8)
- Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase (2016) (8)
- Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems (2016) (8)
- A Case for UDP Offload Engines in LambdaGrids (2006) (7)
- Experiences with software MPEG-2 video decompression on an SMP PC (1998) (7)
- Message-ordering for wormhole-routed multiport systems with link contention and routing adaptivity (1994) (7)
- Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over (2004) (7)
- Implementing TreadMarks over GM on Myrinet: challenges, design experience, and performance evaluation (2003) (7)
- Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand (2005) (7)
- Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters (2009) (7)
- Community Climate System Model (CCSM) (2011) (7)
- High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design (2014) (7)
- Low-latency message passing on workstation clusters using SCRAMNet (1999) (7)
- Efficient design for MPI asynchronous progress without dedicated resources (2019) (7)
- Designing and Evaluating MPI-2 Dynamic Process Management Support for InfiniBand (2009) (7)
- Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters (2014) (7)
- Analysis of design considerations for optimizing multi-channel MPI over InfiniBand (2005) (7)
- MPI-IO on DAFs over VIA: implementation and performance evaluation (2002) (7)
- HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow (2019) (7)
- Designing next generation data-centers with advanced communication protocols and systems services (2006) (7)
- Barrier synchronization in distributed-memory multiprocessors using rendezvous primitives (1993) (7)
- Designing high performance communication runtime for GPU managed memory: early experiences (2016) (7)
- Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing (2018) (6)
- Enhancing the Performance of NFSv4 with RDMA (2007) (6)
- Resource requirements for digital computations on electrooptical systems. (1991) (6)
- Optimizing Collective Communication in UPC (2014) (6)
- Can RDMA benefit online data processing workloads on memcached and MySQL? (2015) (6)
- An efficient hardware-software approach to network fault tolerance with InfiniBand (2009) (6)
- High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters (2014) (6)
- Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads (2015) (6)
- Prioritized demand multiplexing (PDM): a low-latency virtual channel flow control framework for prioritized traffic (1997) (6)
- Performance Characterization of Hadoop Workloads on SR-IOV-Enabled Virtualized InfiniBand Clusters (2016) (6)
- NemC: A Network Emulator for Cluster-of-Clusters (2006) (6)
- Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging? (2011) (6)
- Communication and Architectural Support for Network-Based Parallel Computing (1997) (6)
- Impact of Node Level Caching in MPI Job Launch Mechanisms (2009) (6)
- Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface (2005) (6)
- Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface (2014) (6)
- Clustering and intra-processor scheduling for explicitly-parallel programs on distributed-memory systems (1994) (6)
- Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications (2008) (6)
- Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand (2005) (6)
- MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC (2007) (5)
- Redesigning MPI shared memory communication for large multi-core architecture (2013) (5)
- Efficient multicast algorithms for switch-based irregular heterogeneous networks of workstations (2001) (5)
- Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter (2019) (5)
- Performance Evaluation of MM5 on Clusters with Modern Interconnects: Scalability and Impact (2005) (5)
- Benefits of processor clustering in designing large parallel systems: when and how? (1996) (5)
- Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet (2016) (5)
- MetaData persistence using storage class memory: experiences with flash-backed DRAM (2013) (5)
- Topology agnostic hot‐spot avoidance with InfiniBand (2009) (5)
- Optimisation and performance evaluation of mechanisms for latency tolerance in remote memory access communication on clusters (2004) (5)
- High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems (2019) (5)
- SHMEMPMI -- Shared Memory Based PMI for Improved Performance and Scalability (2016) (5)
- Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters (2020) (5)
- Designing high performance and scalable MPI over InfiniBand (2004) (5)
- Optimizing synchronization operations for remote memory communication systems (2003) (5)
- High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences (2018) (5)
- A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks (2013) (5)
- Design and Characterization of InfiniBand Hardware Tag Matching in MPI (2020) (4)
- NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems (2017) (4)
- Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems (2021) (4)
- Adaptive Routing on the New Switch Chip for IBM SP Systems (2001) (4)
- Designing QoS Aware MPI for InfiniBand: Technical Report (2008) (4)
- A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks (2014) (4)
- High performance network i/o in virtual machines over modern interconnects (2008) (4)
- SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures (2019) (4)
- DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding (2021) (4)
- A QoS framework for clusters to support applications with resource adaptivity and predictable performance (2002) (4)
- Scalable and high performance collective communication for next generation multicore InfiniBand clusters (2008) (4)
- Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM (2020) (4)
- Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (2008) (4)
- Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences (2021) (4)
- Designing high-end computing systems with InfiniBand and 10-Gigabit Ethernet iWARP (2007) (4)
- High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR (2015) (4)
- A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI (2012) (4)
- GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks (2015) (4)
- High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA (2006) (4)
- Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models (2016) (4)
- MPI and communication - High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis (2006) (4)
- Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects (2019) (4)
- Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks (4)
- CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC (2016) (4)
- C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks (2019) (4)
- Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks (2017) (4)
- Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters? (2016) (4)
- Optimal Phase Barrier Synchronization in K-ary N-cube Wormhole-routed Systems Using Multirendezvous Primitives (1993) (4)
- Issues in Designing Scalable Systems with K-ary N-cube Cluster-c Organization (1994) (4)
- Codesign for InfiniBand Clusters (2011) (4)
- Architectural support for efficient communication in scalable parallel systems (1998) (3)
- Scalable systems software - A software based approach for providing network fault tolerance in clusters with uDAPL interface: MPI level design and performance evaluation (2006) (3)
- A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X (2015) (3)
- Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM (2015) (3)
- Reducing Diff Overhead in Software DSM Systems using RDMA Operations in InfiniBand (2004) (3)
- Profile-Based Load Balancing for Heterogeneous Clusters (2007) (3)
- High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2 (2010) (3)
- Can NIC Memory in InfiniBand Benefit Communication Performance? — A Study with Mellanox Adapter (2004) (3)
- UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems (2019) (3)
- Accelerating Big Data Processing on Modern Clusters (2015) (3)
- Designing communication strategies for heterogeneous parallel systems (1998) (3)
- Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand (2017) (3)
- Exploiting the Benefits of Multiple-Path Network DSM Systems: Architectural Alternatives and Performance Evaluation (1999) (3)
- MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI (2017) (3)
- Balancing Web server load for adaptable video distribution (2000) (3)
- Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand (2011) (3)
- Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters (2016) (3)
- Communication-Aware Hardware-Assisted MPI Overlap Engine (2020) (3)
- Interactive medical data on demand: a high-performance imaged-based approach across heterogeneous environments. (2000) (3)
- Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR (2020) (3)
- Performance characterization and acceleration of big data workloads on OpenPOWER system (2017) (3)
- USC orthogonal multiprocessor for image processing with neural networks (1990) (3)
- Architectural issues in designing heterogeneous parallel systems with passive star-coupled optical interconnection (1994) (3)
- A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS (2015) (3)
- HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications (2017) (3)
- Demotion-based exclusive caching through demote buffering: design and evaluations over different networks (2003) (3)
- High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters (2015) (3)
- Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters (2022) (3)
- Adaptive routing in RS/6000 SP-like bidirectional multistage interconnection networks (2000) (3)
- Communication mechanisms and algorithms for supporting scalable collective communication on parallel systems (1998) (3)
- A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems (2017) (3)
- Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters (2012) (3)
- EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures (2018) (3)
- Jodia (L5) and Mahadevpur (H4/5): Two Recent Ordinary Chondrite Falls in India (2009) (3)
- High Performance MPI over the Slingshot Interconnect: Early Experiences (2022) (3)
- Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System (2020) (3)
- Co-Designing MPI Library and Applications for InfiniBand Clusters (2011) (2)
- SUPER: SUb-Graph Parallelism for TransformERs (2021) (2)
- Exploiting Remote Memory in InfiniBand Clusters using a High Performance Network Block Device (HPBD) (2005) (2)
- Can Scatter Communication Take Advantage of Multidestination Message Passing? (2000) (2)
- Thinking Beyond the RAM Disk for In-Memory Checkpointing of HPC Applications (2013) (2)
- Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures (2020) (2)
- Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast (2019) (2)
- Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters (2015) (2)
- Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications (2016) (2)
- QoS-aware middleware for cluster-based servers to support interactive and resource-adaptive applications (2003) (2)
- Architectural Design of Orthogonal Multiprocessor for Multidimensional Information Processing (1991) (2)
- On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI (2015) (2)
- Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs (2021) (2)
- Designing High-End Computing Systems with InfiniBand and High-Speed Ethernet (2010) (2)
- Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations (2001) (2)
- Complete Exchange in 2D Meshes (1994) (2)
- Designing Processor-cluster Based Systems: Interplay Between Cluster Organizations and Collective Co (1996) (2)
- Characterizing and accelerating indexing techniques on distributed ordered tables (2017) (2)
- Advanced RDMA-Based Admission Control for Modern Data-Centers (2008) (2)
- Algorithm-Driven Simulation and Performance Projection of a RISC-based Orthogonal Multiprocessor (1990) (2)
- Networking and communication challenges for post-exascale systems (2018) (2)
- Exploring Hybrid MPI+Kokkos Tasks Programming Model (2020) (2)
- Sockets direct protocol for hybrid network stacks: a case study with iWARP over 10G Ethernet (2008) (2)
- Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures (2019) (2)
- Does RDMA-based enhanced Hadoop MapReduce need a new performance model? (2013) (2)
- MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling (2017) (2)
- Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR (2020) (2)
- Designing Virtualization-Aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds (2016) (2)
- Fast Data Manipulation in Multiprocessors Using Parallel Pipelined Memories (1991) (2)
- Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries (2022) (1)
- Characterization of Structure-Borne Road/Tire Noise Inside a Passenger Car Cabin Using Path Based Analysis (2013) (1)
- SCOR-KV: SIMD-Aware Client-Centric and Optimistic RDMA-Based Key-Value Store for Emerging CPU Architectures (2019) (1)
- Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI (2020) (1)
- Active network interface: opportunities and challenges (2002) (1)
- Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures (2018) (1)
- High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation (2015) (1)
- Designing processor-cluster based systems: interplay between cluster organizations and broadcasting algorithms (1996) (1)
- Efficient Collective Communication on Heterogeneous Networks of Workstations (1998) (1)
- Design and Implementation of Open MPI over QsNet/Elan4 (2004) (1)
- Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters (2022) (1)
- Fast Collective Communication Algorithms for Reflective Memory Network Clusters (2000) (1)
- Configurable, Highly Parallel Computer (2011) (1)
- INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications (2021) (1)
- Designing high-performance and scalable clustered network attached storage with InfiniBand (2008) (1)
- Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach (2017) (1)
- A Parallel-Serial Binary Arbitration Scheme for Collision-Free Multi-Access Techniques (1988) (1)
- Message Vectorization for Converting Multicomputer Programs to Shared-Memory Multiprocessors (1991) (1)
- Designing next generation clusters with InfiniBand and 10GE/iWARP: Opportunities and challenges (2008) (1)
- A Portable InfiniBand Module for MPICH2/Nemesis: Design and Evaluation (2009) (1)
- Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms (2015) (1)
- An Evaluation of Preemption Strategies for Parallel Job Scheduling (1)
- Cluster File Systems (2011) (1)
- Benefits of Dedicating Resource Sharing Services in DataCenters for Emerging Multi-Core Systems (2007) (1)
- Improving cluster performance through the use of programmable network interfaces (2003) (1)
- Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters (2016) (1)
- Analysis of routing in pyramid architectures (1993) (1)
- Designing efficient communication subsystems for distributed shared memory (dsm) systems (1999) (1)
- Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware (2015) (1)
- Can Streaming SIMD Non-Temporal Instructions Benefit Intra-node MPI Communication on Modern Multi-core Platforms? (2010) (1)
- Designing high performance parallel systems: a processor-cluster based approach (1996) (1)
- Community Ice Code (CICE) (2011) (1)
- Topology agnostic hot-spot avoidance with InfiniBand (2009) (1)
- Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs (2022) (1)
- Can high performance software DSM systems designed with InfiniBand features benefit from PCI-Express? (2005) (1)
- MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure (2020) (1)
- Optimizing a Stencil-Based Application for Earthquake Modeling on Modern InfiniBand Clusters (2009) (1)
- On the Hardware Requirement for 2-D Image Convolution in Electro-Optical Systems (1989) (1)
- Geochemical Evidence for the Meteorite Impact Origin of Ramgarh Structure, India (2008) (1)
- Layout-aware Hardware-assisted Designs for Derived Data Types in MPI (2021) (1)
- Reliable Hardware Barrier Synchronization Schemes (1997) (1)
- Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter (2022) (1)
- Efficient MPI-based Communication for GPU-Accelerated Dask Applications (2021) (1)
- Critical Sections (2011) (1)
- Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? (1995) (1)
- Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems (2021) (1)
- Optical arithmetic using high-radix symbolic substitution rules (1989) (1)
- Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications (2020) (1)
- Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters (2022) (0)
- Design and Performance of PVFS over InfiniBand (0)
- Distributed Memory Parallel Systems (1999) (0)
- Panel: Data intensive computing. (2006) (0)
- Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version (2023) (0)
- Proceedings of the International Conference on Parallel Processing Workshops: Welcome Message (2010) (0)
- Functional Partitioning: Enabling and Optimizing Exascale Runtime Services (2012) (0)
- Computer Performance Evaluation. Modelling Techniques and Tools (2003) (0)
- Special Issue on Communication Architecture for Clusters: Guest Editors Introduction (2004) (0)
- Feature estimation for efficient streaming (2002) (0)
- Final Report for Project DE-FC02-06ER25755 [Pmodels2] (2014) (0)
- Designing Next Generation Clusters, Cluster-Based Servers and Datacenters with InfiniBand: Opportunities and Challenges (2003) (0)
- 4.2 Multi-phase Barrier Synchronization 4.3 Optimal Number of Phases 3.1.2 Multi-path-based Schemes 3.2 Synchronization-worm-based Scheme 4 Barrier Synchronization Using Multirendezvous Primitives 4.1 Single-phase Barrier Synchronization (1993) (0)
- Networking and communication challenges for post-exascale systems (2018) (0)
- Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries (2022) (0)
- Tutorial: Designing High-End Computing Systems with Infiniband and 10-Gigabit Ethernet (2009) (0)
- Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI (2022) (0)
- Message from the general co-chairs IEEE ICPADS 2014 (2014) (0)
- Proceedings of the First International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (1997) (0)
- Adaptive Receiver Window Scaling: Minimizing MPI Communication Memory over InfiniBand (2006) (0)
- Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors (2017) (0)
- Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications (1998) (0)
- High Performance MPI over the Slingshot Interconnect (2023) (0)
- AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters (2022) (0)
- Challenges and Opportunities in Designing High-Performance and Scalable Middleware for HPC and AI: Past, Present, and Future (2022) (0)
- Proceedings, 1999 International Conference on Parallel Processing, 21-24 September 1999 Aizu-Wakamatsu City, Japan (1999) (0)
- Tutorial III (2006) (0)
- Large-Message Nonblocking MPI_Iallgather and MPI_Ibcast Offload via BlueField-2 DPU (2021) (0)
- Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet (2005) (0)
- Designing Multi-Core Aware Inter-Communicator Operations for MPI-2 Dynamic Process Management (2009) (0)
- Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems (2022) (0)
- Designing High Performance DSM Systems using InfiniBand: Opportunities, Challenges and Experiences (0)
- Introduction to HPBDC Workshop (2017) (0)
- Optimized MPI Gather Collective for Many Integrated Core (MIC) InfiniBand Clusters (2013) (0)
- Introduction to HPBDC 2019 (2019) (0)
- Workshop Introduction (2002) (0)
- IPDPS 2007: Comments from the Guest Editor (2009) (0)
- Efficient and Scalable NIC-Based Barrier over Quadrics and Myrinet (0)
- Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier DataCenters over (0)
- Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters (2016) (0)
- IBM Power (2011) (0)
- Lightning Talks of EduHPC 2022 (2022) (0)
- A Portable Client/Server Communication Middleware over SANs: Design and Performance Evaluation with Virtual Interface and InfiniBand (2003) (0)
- Designing Hierarchical Multi-HCA Aware Allgather in MPI (2022) (0)
- Towards Java-based HPC using the MVAPICH2 Library: Early Experiences (2022) (0)
- TC on Parallel Processing (1995) (0)
- Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks (2018) (0)
- Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries (2023) (0)
- Coordinated Fault-Tolerance for High-Performance Computing Final Project Report (2011) (0)
- Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters (2021) (0)
- Additional Studies of Materials from the Ramgarh Structure, India (2009) (0)
- Architectural and Communication Issues in Designing Heterogeneous Parallel Systems with Optical Interconnection (1994) (0)
- Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2 (2019) (0)
- Performance Analysis and Improved Communication Overlap for a Seismic Modeling Application on Large InfiniBand Clusters (2010) (0)
- CCF THPC inaugural issue editorial (2019) (0)
- Scalable and high-performance MPI design for very large InfiniBand clusters (2007) (0)
- Efficient Multicast on Irregular Switch-based Networks with Cut-through Switching (2007) (0)
- Tutorials - HOTI 2012 (2012) (0)
- Enhancing MPI with modern networking mechanisms in cluster interconnects (2006) (0)
- "Hey CAI" - Conversational AI Enabled User Interface for HPC Tools (2022) (0)
- State of InfiniBand in designing HPC clusters, storage/file systems, and datacenters [datacenters read as data centers] (2004) (0)
- Scalable Out-of-core OpenSHMEM Library for HPC (2015) (0)
- Numerical Modeling of Critical Path Contributions for NVH Prediction of Vehicle (2013) (0)
- Introduction to HPBDC 2018 (2018) (0)
- IPDPS 2007 Organization (2007) (0)
- Commodity High Performance Interconnects (2009) (0)
- MCR-DL: Mix-and-Match Communication Runtime for Deep Learning (2023) (0)
- Design and implementation of high performance communication subsystems for clusters (2000) (0)
- Tutorial: Infiniband and 10-Gigabit Ethernet for Dummies (2009) (0)
- OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems (2021) (0)
- Redesigning MPI shared memory communication for large multi-core architecture (2012) (0)
- Supercomputing Frontiers: 6th Asian Conference, SCFA 2020, Singapore, February 24–27, 2020, Proceedings (2020) (0)
- Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2008, San Jose, California, USA, November 6-7, 2008 (2008) (0)
- NFS/RDMA over InfiniBand: Is It Beneficial? (0)
- Designing Zero-Copy FTP Mechanisms to Achieve High Performance Data-Transfer over InfiniBand WAN (2008) (0)
- Table of Contents (2003) (0)
- Designing Next Generation Clusters with Infiniband: Opportunities and Challenges (2003) (0)
- HPBDC Introduction and Committees (2016) (0)
- An Enhanced MPI-2 Dynamic Process Management Support for InfiniBand (2009) (0)
- under Pa Technological Advancements (1996) (0)
- High performance and network fault tolerant MPI with multi-pathing over InfiniBand (2007) (0)
- OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences (2016) (0)
- Designing high performance and scalable distributed datacenter services over modern interconnects (2008) (0)
- Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR (2020) (0)
- Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services (2006) (0)
- An Architectural study of Cluster-Based Multi-Tier DataCenters (2004) (0)
- Better NFS through RDMA and Efficient Memory Registration (2007) (0)
- Second workshop on system-level Virtualization for High Performance Computing (HPCVirt 2008) (2008) (0)
- Veloblock: Efficient and Scalable RDMA Fast Path for InfiniBand (2009) (0)
- Architectural Support for Efficient Multicasting in Irregular Networks (2001) (0)
- PVFS2 over Quadrics: Design, Implementation and Performance Evaluation (2005) (0)
- Advanced RDMA-based Admission Control for Modern Data-Centers (2008) (0)
- InfiniBand (2011) (0)
- 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018, Washington, DC, USA, May 1-4, 2018 (2018) (0)
- Tutorials (2018) (0)
- Efficient Checkpoint/Restart for Multi-Channel MPI over Multi-core Clusters (2009) (0)
- Vectorized interprocessor communication and data movement in shared-memory multiprocessors (1992) (0)
- Exploiting and Evaluating OpenSHMEM on KNL Architecture (2017) (0)
- Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads (2022) (0)
- CCGrid 2020 Committees (2020) (0)
- Tutorial 2: InfiniBand Architecture and Where it is Headed (2002) (0)
- Community Climate Model (CCM) (2011) (0)
- Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program (2020) (0)
- The emergence of workstation clusters: Should we continue to build mpps? [panel session] (1998) (0)
- Communication and memory management in networked storage systems (2004) (0)
- Designing High-Performance Communication Subsystems: Top Five Problems to Solve and Five Problems Not to Solve During the Next Five Years (Panel) (1997) (0)
- Adaptive Routing on the New Switch Chip for IBM SP (2007) (0)
- Efficient Scatter Communication in Wormhole K-ary N-cubes with Multidestination Message Passing (1996) (0)
- A Case for UDP Offload Engines in LambdaGrids (2006) (0)
- Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences (2021) (0)
- Collective Communication, Network Support For (2011) (0)
What Schools Are Affiliated With Dhabaleswar K. D. K. Panda?
Dhabaleswar K. D. K. Panda is affiliated with the following schools: