Bill Dally
#11,910
Most Influential Person Now
American computer scientist
Bill Dally's AcademicInfluence.com Rankings
Bill Dally's Engineering Degrees
- Engineering: #396 World Rank, #686 Historical Rank, #182 USA Rank
- Electrical Engineering: #101 World Rank, #117 Historical Rank, #47 USA Rank
Bill Dally's Computer Science Degrees
- Computer Science: #795 World Rank, #822 Historical Rank, #431 USA Rank
- Parallel Computing: #4 World Rank, #4 Historical Rank, #4 USA Rank
- Computer Architecture: #7 World Rank, #7 Historical Rank, #6 USA Rank
- Algorithms: #41 World Rank, #41 Historical Rank, #15 USA Rank
Bill Dally's Degrees
- PhD, Computer Science, California Institute of Technology
- Masters, Electrical Engineering, Stanford University
- Bachelors, Electrical Engineering, Virginia Tech
Why Is Bill Dally Influential?
According to Wikipedia, William James Dally is an American computer scientist and educator. Formerly a professor of Electrical Engineering and Computer Science at Stanford University and MIT, he is the chief scientist and senior vice president at Nvidia, where he leads the company's research efforts in high-performance computing and artificial intelligence. Since 2021, he has been a member of the President's Council of Advisors on Science and Technology.
Bill Dally's Published Works
- Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding (2015) (6725)
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size (2016) (5225)
- Learning both Weights and Connections for Efficient Neural Network (2015) (4910)
- Principles and Practices of Interconnection Networks (2004) (3386)
- Route packets, not wires: on-chip interconnection networks (2001) (2636)
- Deadlock-Free Message Routing in Multiprocessor Interconnection Networks (1987) (2274)
- EIE: Efficient Inference Engine on Compressed Deep Neural Network (2016) (2092)
- Virtual-channel flow control (1990) (1658)
- Route packets, not wires: on-chip interconnection networks (2001) (1382)
- Performance Analysis of k-Ary n-Cube Interconnection Networks (1987) (1045)
- Memory access scheduling (2000) (989)
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training (2017) (980)
- The GPU Computing Era (2010) (971)
- Trained Ternary Quantization (2016) (914)
- SCNN: An accelerator for compressed-sparse convolutional neural networks (2017) (863)
- Digital systems engineering (1998) (843)
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016) (672)
- Technology-Driven, Highly-Scalable Dragonfly Topology (2008) (630)
- A detailed and flexible cycle-accurate Network-on-Chip simulator (2013) (609)
- GPUs and the Future of Parallel Computing (2011) (581)
- A delay model and speculative architecture for pipelined routers (2001) (576)
- Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels (1993) (570)
- Design tradeoffs for tiled CMP on-chip networks (2006) (566)
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA (2016) (546)
- Sequoia: Programming the Memory Hierarchy (2006) (533)
- Research Challenges for On-Chip Interconnection Networks (2007) (495)
- Smart Memories: a modular reconfigurable architecture (2000) (490)
- The torus routing chip (2005) (462)
- Flattened butterfly: a cost-efficient topology for high-radix networks (2007) (460)
- Flattened Butterfly Topology for On-Chip Networks (2007) (445)
- Imagine: Media Processing with Streams (2001) (382)
- Wave height variation across beaches of arbitrary profile (1985) (365)
- Point Sample Rendering (1998) (360)
- Merrimac: Supercomputing with Streams (2003) (341)
- Programmable Stream Processors (2003) (336)
- Register organization for media processing (2000) (308)
- The J-machine Multicomputer: An Architectural Evaluation (1993) (297)
- A bandwidth-efficient architecture for media processing (1998) (293)
- Energy-efficient mechanisms for managing thread context in throughput processors (2011) (264)
- A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS (2007) (258)
- The Imagine Stream Processor (2002) (256)
- A VLSI Architecture for Concurrent Data Structures (1987) (233)
- Low-power area-efficient high-speed I/O circuit techniques (2000) (231)
- The message-driven processor: a multicomputer processing node with efficient mechanisms (1992) (214)
- A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips (2002) (214)
- Express Cubes: Improving the Performance of k-Ary n-Cube Interconnection Networks (1989) (214)
- Exploring the Regularity of Sparse Structure in Convolutional Neural Networks (2017) (211)
- The BlackWidow High-Radix Clos Network (2006) (190)
- Microarchitecture of a high radix router (2005) (188)
- Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture (2019) (186)
- GOAL: a load-balanced adaptive routing algorithm for torus networks (2003) (179)
- J-machine: A fine-grain concurrent computer (1989) (178)
- Efficient Embedded Computing (2008) (175)
- Flit-reservation flow control (2000) (168)
- Suspended Sediment Transport and Beach Profile Evolution (1984) (165)
- Transmitter equalization for 4-Gbps signaling (1997) (156)
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks (2016) (155)
- Jitter transfer characteristics of delay-locked loops - theories and design techniques (2003) (151)
- Evaluating the Imagine stream architecture (2004) (150)
- Deep Generative Adversarial Networks for Compressed Sensing Automates MRI (2017) (144)
- A Delay Model for Router Microarchitectures (2001) (140)
- The M-machine multicomputer (1997) (138)
- How scaling will change processor architecture (2004) (138)
- Evaluating Bufferless Flow Control for On-chip Networks (2010) (137)
- Elastic-buffer flow control for on-chip networks (2009) (135)
- The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers (1994) (132)
- Hardware support for fast capability-based addressing (1994) (128)
- Guaranteed scheduling for switches with configuration overhead (2002) (128)
- President’s Council of Advisors on Science and Technology (PCAST) (2015) (126)
- A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing (2007) (122)
- Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism (1992) (122)
- SpArch: Efficient Architecture for Sparse Matrix Multiplication (2020) (121)
- Indirect adaptive routing on large scale interconnection networks (2009) (118)
- A modeling investigation of the breaking wave roller with application to cross‐shore currents (1995) (118)
- Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems (2017) (117)
- Domain-specific hardware accelerators (2020) (116)
- A MODEL FOR BREAKER DECAY ON BEACHES (1984) (116)
- Efficient conditional operations for data-parallel architectures (2000) (114)
- Scaling the Power Wall: A Path to Exascale (2014) (114)
- Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor (2012) (110)
- Architecture of a message-driven processor (1987) (110)
- Concurrent aggregates (CA) (1990) (108)
- Exploring the Granularity of Sparsity in Convolutional Neural Networks (2017) (106)
- Efficient Sparse-Winograd Convolutional Neural Networks (2018) (103)
- A programming system for the imagine media processor (2002) (101)
- DOE Advanced Scientific Computing Advisory Subcommittee (ASCAC) Report: Top Ten Exascale Research Challenges (2014) (101)
- Network and processor architecture for message-driven computers (1990) (99)
- System design of the J-Machine (1990) (99)
- Compiling for stream processing (2006) (98)
- The M-Machine multicomputer (1995) (91)
- Worst-case Traffic for Oblivious Routing Functions (2002) (89)
- Compilation for explicitly managed memory hierarchies (2007) (89)
- Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor (1998) (88)
- Media processing applications on the Imagine stream processor (2002) (88)
- VLSI architecture: past, present, and future (1999) (83)
- Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly (2018) (82)
- Random breaking waves: Field verification of a wave-by-wave algorithm for engineering application (1992) (81)
- Allocator implementations for network-on-chip routers (2009) (80)
- Design of a Self-Timed VLSI Multicomputer Communication Controller (1987) (80)
- Globally Adaptive Load-Balanced Routing on Tori (2004) (79)
- DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow (2016) (77)
- Power, Programmability, and Granularity: The Challenges of ExaScale Computing (2011) (76)
- A compile-time managed multi-level register file hierarchy (2011) (75)
- A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications (2013) (75)
- Exploring the VLSI scalability of stream processors (2003) (75)
- The role of custom design in ASIC chips (2000) (73)
- Cost-Efficient Dragonfly Topology for Large-Scale Systems (2009) (72)
- MAGNet: A Modular Accelerator Generator for Neural Networks (2019) (71)
- An Energy-Efficient Processor Architecture for Embedded Systems (2008) (71)
- Adaptive Routing in High-Radix Clos Network (2006) (70)
- Experience with CST: programming and implementation (1989) (69)
- Architecting an Energy-Efficient DRAM System for GPUs (2017) (69)
- Stream Processors: Programmability and Efficiency (2004) (63)
- Evaluation Of Mechanisms For Fine-grained Parallel Programs In The J-machine And The Cm-5 (1993) (59)
- A 14mW 6.25Gb/s Transceiver in 90nm CMOS for Serial Chip-to-Chip Communications (2007) (59)
- A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding (2015) (58)
- GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link (1999) (57)
- Microarchitecture of a High-Radix Router (2005) (57)
- Architecture and implementation of the reliable router (1994) (57)
- Efficient methods and hardware for deep learning (2017) (56)
- The J-machine network (1992) (56)
- Network congestion avoidance through Speculative Reservation (2012) (56)
- Finite-grain message passing concurrent computers (1988) (55)
- Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly (2018) (54)
- The J-Machine: System Support for Actors (1988) (54)
- Stream register files with indexed access (2004) (53)
- Random breaking waves: A closed-form solution for planar beaches (1990) (53)
- MARS: A Multiprocessor-Based Programmable Accelerator (1987) (51)
- Interconnect-limited VLSI architecture (1999) (51)
- Polygon rendering on a stream architecture (2000) (51)
- 0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization (2003) (51)
- Scatter-add in data parallel architectures (2005) (51)
- Longshore Bar Formation—Surf Beat or Undertow? (1987) (50)
- Throughput-centric routing algorithm design (2003) (49)
- TRANSFORMATION OF RANDOM BREAKING WAVES ON SURF BEAT (1986) (48)
- The message-driven processor (1992) (47)
- Universal Mechanisms for Concurrency (1989) (47)
- A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation (2000) (46)
- Comparing Reyes and OpenGL on a stream architecture (2002) (45)
- A second-order semi-digital clock recovery circuit based on injection locking (2003) (45)
- The VLSI implementation and evaluation of area-and energy-efficient streaming media processors (2003) (45)
- A 33-mW 8-Gb/s CMOS clock multiplier and CDR for highly integrated I/Os (2004) (44)
- Concurrent Event Handling through Multithreading (1999) (44)
- Undistorted Froude Model for Surf Zone Sediment Transport (1986) (43)
- Architectural Support for the Stream Execution Model on General-Purpose Processors (2007) (43)
- A second-order semidigital clock recovery circuit based on injection locking (2003) (43)
- Deep compression and EIE: Efficient inference engine on compressed deep neural network (2016) (43)
- A tuning framework for software-managed memory hierarchies (2008) (42)
- A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips (2002) (41)
- A hardware logic simulation system (1990) (41)
- ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA (2016) (41)
- Analysis and Performance Results of a Molecular Modeling Application on Merrimac (2004) (41)
- A portable runtime interface for multi-level memory hierarchies (2008) (40)
- Network endpoint congestion control for fine-grained communication (2015) (40)
- The Delta Tree: An Object-Centered Approach to Image-Based Rendering (1996) (40)
- The Design Space of Data-Parallel Memory Systems (2006) (40)
- Executing irregular scientific applications on stream architectures (2007) (40)
- Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference (2019) (39)
- Locality-preserving randomized oblivious routing on torus networks (2002) (39)
- A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors (2012) (39)
- A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm (2019) (39)
- Stream Scheduling (2001) (38)
- Object-oriented concurrent programming in CST (1988) (37)
- An 84-mW 4-Gb/s clock and data recovery circuit for serial link applications (2001) (36)
- Elastic interconnects: repeater-inserted long wiring capable of compressing and decompressing data (2001) (35)
- Simultaneous bidirectional signalling for IC systems (1990) (34)
- A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator (2018) (34)
- Topology optimization of interconnection networks (2006) (34)
- VLSI design and verification of the Imagine processor (2002) (33)
- A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm (2020) (33)
- Adaptive Backpressure: Efficient buffer management for on-chip networks (2012) (32)
- Adaptive channel queue routing on k-ary n-cubes (2004) (32)
- An object oriented architecture (1985) (32)
- Communication scheduling (2000) (31)
- A mechanism for efficient context switching (1991) (31)
- Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures (2010) (30)
- The Named-State Register File: implementation and performance (1995) (30)
- The Even/Odd Synchronizer: A Fast, All-Digital, Periodic Synchronizer (2010) (30)
- Operand Registers and Explicit Operand Forwarding (2009) (29)
- CMOS high-speed I/Os - present and future (2003) (29)
- A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator (2019) (29)
- Accelerating Chip Design With Machine Learning (2020) (29)
- SCNN (2017) (29)
- FPGAs versus GPUs in Data centers (2017) (29)
- Communication Scheduling (2000) (28)
- The J-Machine: A Fine-Grain Concurrent Computer (1989) (28)
- Migration in Single Chip Multiprocessors (2002) (28)
- Hardware-Enabled Artificial Intelligence (2018) (27)
- Memory hierarchy design for stream computing (2005) (27)
- High-performance bidirectional signalling in VLSI systems (1993) (27)
- The end of denial architecture and the rise of throughput computing (2009) (26)
- A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation (2016) (26)
- Coastal Dynamics '95 (1996) (26)
- A 20-Gb/s 0.13-μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer (2005) (25)
- THE ROLE OF ROLLERS IN THE SURF ZONE CURRENTS (1995) (24)
- Router designs for elastic buffer on-chip networks (2009) (24)
- A Delay Metric for Video Object Detection: What Average Precision Fails to Tell (2019) (23)
- Packet chaining: Efficient single-cycle allocation for on-chip networks (2011) (23)
- Low-latency plesiochronous data retiming (1995) (23)
- A 0.54pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications (2013) (22)
- A Hardware Architecture for Switch-Level Simulation (1985) (22)
- Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks (2011) (22)
- The J-machine system (1991) (22)
- MLSys: The New Frontier of Machine Learning Systems (2019) (21)
- SysML: The New Frontier of Machine Learning Systems (2019) (20)
- 21st century digital design tools (2013) (20)
- Throughput computing (2010) (20)
- Enabling Technology for On-Chip Interconnection Networks (2007) (20)
- Channel reservation protocol for over-subscribed channels and destinations (2013) (20)
- Micro-optimization of floating-point operations (1989) (20)
- Detached Breakwaters for Shore Protection (2018) (20)
- An Efficient, Protected Message Interface (1998) (19)
- CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video (2018) (19)
- Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment (2017) (19)
- Reuse Distance-Based Probabilistic Cache Replacement (2015) (19)
- High-performance electrical signaling (1998) (19)
- Designing Optimum One-Level Carry-Skip Adders (1993) (17)
- Observations and simulation of winds, surge, and currents on Florida's east coast during hurricane Jeanne (2004) (2012) (17)
- Architecture and Design of the MARS Hardware Accelerator (1987) (17)
- Explaining the gap between ASIC and custom power: a custom perspective (2005) (17)
- VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference (2021) (17)
- Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup (2019) (17)
- A tracking clock recovery receiver for 4-Gbps signaling (1998) (17)
- SLIP: Reducing wire energy in the memory hierarchy (2015) (17)
- Observation and simulation of winds and hydrodynamics in St. Johns and Nassau Rivers (2012) (16)
- Performance evaluation of ephemeral logging (1993) (16)
- Mass flux and undertow in a surf zone, by I.A. Svendsen — Discussion (1986) (16)
- Fault Tolerance Techniques for the Merrimac Streaming Supercomputer (2005) (16)
- Digital Systems Engineering: TIMING CONVENTIONS (1998) (16)
- Computer Architecture in the Many-Core Era (2006) (15)
- Evaluating Elastic Buffer and Wormhole Flow Control (2011) (15)
- Register Pointer Architecture for Efficient Embedded Processors (2007) (15)
- Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects (2018) (15)
- Hierarchical Instruction Register Organization (2008) (15)
- High-radix interconnection networks (2008) (15)
- Experience with Concurrent Aggregates (CA): Implementation and Programming (1990) (14)
- Object-oriented concurrent programming in CST (1988) (14)
- A stream processor development platform (2002) (14)
- THE INFLUENCE OF ROLLERS ON LONGSHORE CURRENTS (1997) (14)
- Elastic Buffer Flow Control for On-Chip Networks (2013) (13)
- Tradeoff between data-, instruction-, and thread-level parallelism in stream processors (2007) (13)
- Retrospective: the J-machine (1998) (13)
- Block parallel programming for real-time applications on multi-core processors (2008) (11)
- Evolution of the Graphics Processing Unit (GPU) (2021) (11)
- The Message Driven Processor: an integrated multicomputer processing element (1992) (11)
- Modeling Wave Transformation in the Surf Zone. (1984) (11)
- Merrimac-high-performance and highly-efficient scientific computing with streams (2006) (11)
- Digital Design: A Systems Approach (2012) (11)
- Conditional techniques for stream processing kernels (2004) (10)
- Extended ephemeral logging: log storage management for applications with long lived transactions (1997) (10)
- A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET (2019) (10)
- Experiments with Dataflow on a General-Purpose Parallel Computer (1991) (9)
- Named State and Efficient Context Switching (1994) (9)
- The J-Machine architecture and evaluation (1993) (9)
- Optimal Operation of a Plug-in Hybrid Vehicle with Battery Thermal and Degradation Model (2020) (9)
- Fine-grain dynamic instruction placement for L0 scratch-pad memory (2010) (9)
- On-Chip Active Messages for Speed, Scalability, and Efficiency (2015) (9)
- A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm (2022) (9)
- Detached breakwaters for shore protection / by William R. Dally, Joan Pope. (1986) (9)
- Monolithic chaotic communications system (2001) (9)
- Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors (1995) (8)
- FINE-SCALE MEASUREMENT OF SEDIMENT SUSPENSION BY BREAKING WAVES AT SUPERTANK (1993) (8)
- Bandwidth-efficient deep learning (2018) (8)
- Pi: a parallel architecture interface (1992) (8)
- Long Wave Effects in Laboratory Studies of Cross-Shore Transport (1991) (8)
- Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training (2022) (8)
- Embracing heterogeneity: parallel programming for changing hardware (2009) (8)
- The reconfigurable arithmetic processor (1988) (8)
- Digital Design Using VHDL: A Systems Approach (2016) (8)
- A Fine-Grain, Message-Passing Processing Node (1988) (8)
- MDP design tools and methods (1992) (7)
- Multi-Core for HPC: breakthrough or breakdown? (2006) (7)
- Architecture of a message-driven processor (1987) (7)
- The Longshore Transport Enigma and Analysis of a 10-Year Record of Wind-Driven Nearshore Currents (2017) (7)
- Darwin (2018) (7)
- A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os (2003) (7)
- A Multicomputer Processing Node with Efficient Mechanisms (1992) (7)
- A 0.11 PJ/OP, 0.32-128 Tops, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with A High-Productivity vlsi Methodology (2019) (7)
- The Subspace Model: A Theory of Shapes for Parallel Systems (1995) (7)
- 20Gb/s 0.13μm CMOS serial link transmitter using an LC-PLL to directly drive the output multiplexer (2004) (6)
- Wave transformation in the surf zone (1989) (6)
- Algorithms for Accuracy Enhancement in a Hardware Logic Simulator (1989) (6)
- The Balanced Cube: A Concurrent Data Structure (1985) (6)
- Simba: scaling deep-learning inference with chiplet-based architecture (2021) (6)
- Champagne: Whole-genome phylogenomic character matrix method places Myomorpha basal in Rodentia (2019) (6)
- Modeling Nearshore Currents on Reef-Fronted Beaches (2001) (6)
- Interconnect-Centric Computing (2007) (6)
- Evaluating the Impact of Beach Nourishment on Surfing: Surf City, Long Beach Island, New Jersey, U.S.A. (2018) (6)
- Media processing using streams (1998) (6)
- Panel Statement (2011) (6)
- Surf Zone Processes (2019) (6)
- Evaluating the locality benefits of active messages (1995) (6)
- Concurrent Computer Architecture (1987) (6)
- On-Demand Dynamic Branch Prediction (2015) (6)
- Efficient topologies for large-scale cluster networks (2010) (6)
- How to Choose the Grain Size of a Parallel Computer (1994) (5)
- Micro-Optimization of Floating Point Operations (1989) (5)
- Transmitter Equalization for 4 Gb/s Signalling (2007) (5)
- Guaranteeing Forward Progress of Unified Register Allocation and Instruction Scheduling (2011) (5)
- Moving the needle, computer architecture research in academe and industry (2010) (5)
- Spills, Fills, and Kills: An Architecture for Reducing Register-Memory Traffic (2000) (5)
- 8.6 A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring (2016) (5)
- Scalable opto-electronic network (SOENet) (2002) (5)
- Memory and control organizations of stream processors (2007) (5)
- Small Unit Operations (1998) (5)
- The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline (1998) (5)
- Optimal Operation of a Plug-In Hybrid Vehicle (2018) (5)
- High-Resolution Measurements of Sand Suspension by Plunging Breakers in a Large Wave Channel (1994) (5)
- COSMOS: an operating system for a fine-grain concurrent computer (1993) (5)
- Proceedings of the sixth MIT conference on Advanced research in VLSI (1990) (4)
- Message-Driven Processor Architecture (1988) (4)
- Experiences Implementing Dataflow on a General-Purpose Parallel Computer (1991) (4)
- A Data-Driven IDCT Architecture for Low Power Video Applications (1996) (4)
- Darwin: A Genomics Coprocessor (2019) (4)
- Message-Driven Processor Architecture, Version 11 (1988) (4)
- NEARSHORE WAVE AND CURRENT MEASUREMENTS DURING HURRICANE JEANNE (2005) (4)
- Analysis of a 10-Year Record of Nearshore Directional Wave Spectra and Implications to Littoral Processes Research and Engineering Practice (2016) (4)
- LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update (2021) (4)
- The MOSSIM Simulation Engine Architecture and Design (1984) (4)
- On the Performance of k-ary n-cube Interconnection Networks (1986) (4)
- Energy Efficient On-Demand Dynamic Branch Prediction Models (2020) (4)
- Virtual-Channel Flow Control (2004) (3)
- It's about the power: An architect's view of interconnect (2012) (3)
- Data parallel address architecture (2006) (3)
- Data Mining and the Human Genome (2000) (3)
- Concurrent VLSI Architecture Memo 122 Stanford University Computer Systems Laboratory Stream Scheduling (2001) (3)
- Comparison of a mid-shelf wave hindcast to ADCP-measured directional spectra and their transformation to shallow water (2018) (3)
- Circuit challenges for future computing systems (2011) (3)
- NOCS 2007 Keynote 1 Enabling Technology for On-Chip Interconnection Networks (2007) (3)
- Multiple Input Multiple Output (MIMO) Control of a Novel Three Phase Multilevel Inverter (2020) (3)
- BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage (2022) (3)
- Cost-efficient dragonfly topology for large-scale systems (2009) (3)
- Advanced research in VLSI : proceedings of the sixth MIT Conference (1990) (3)
- A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS (2022) (3)
- The bleeding edge (1998) (3)
- A Fast Translation Method for Paging on top of Segmentation (1992) (3)
- Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning (2009) (3)
- PROBABILISTIC MODELING OF LONG-TERM WAVE CLIMATE (1997) (3)
- CLOSED-FORM SOLUTIONS FOR THE PROBABILITY DENSITY OF WAVE HEIGHT IN THE SURF ZONE (1988) (3)
- Champagne: Automated Whole-Genome Phylogenomic Character Matrix Method Using Large Genomic Indels for Homoplasy-Free Inference (2022) (2)
- Digital ultrasonic instrument for ophthalmic use (1990) (2)
- Some considerations on coastal processes relevant to sea level rise (1987) (2)
- Block-Parallel Programming for Real-Time Embedded Applications (2010) (2)
- PatchNet - Short-range Template Matching for Efficient Video Processing (2021) (2)
- Digital Systems Engineering: POWER DISTRIBUTION (1998) (2)
- Proceedings of the seventh international conference on Architectural support for programming languages and operating systems (1996) (2)
- An Unconventional, Highly Multipath-Resistant, Modulation Scheme (1997) (2)
- NOISE IN DIGITAL SYSTEMS (1998) (2)
- Development of a Unique Instrumentation System to Monitor Underwater Noise Due to Pile Driving (2020) (2)
- Guest Editors' Introduction: Hot Chips 12 (2001) (2)
- High-Speed Logic, Circuits, Libraries and Layout (2004) (2)
- The J-Machine: A fine-grain parallel computer (1992) (2)
- Estimation of Ebb-Tidal Shoal Sediment Transport Based on a Roller Inclusive Boussinesq Breaking Model (1999) (2)
- The J-Machine: A Retrospective (1998) (2)
- Subspace Optimizations (1994) (2)
- Future of on-chip interconnection architectures (2007) (2)
- Digital Systems Engineering: INTRODUCTION TO DIGITAL SYSTEMS ENGINEERING (1998) (2)
- Simba (2019) (2)
- Mechanisms for Parallel Computers (1993) (2)
- Concurrent aggregates (CA) (1990) (2)
- Meningeal irritation due to tetracycline administration. (1966) (2)
- Mechanisms for Concurrent Computing (1988) (2)
- Architecture and Implementation of the Reliable Router (1994) (2)
- Structured Application-Specific Integrated Circuit (ASIC) Study (2008) (1)
- XEL: extended ephemeral logging for log storage management (1994) (1)
- CG-OoO (2016) (1)
- Architectural and implementation issues for multithreading (panel session I) (1994) (1)
- Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy (2007) (1)
- Author retrospective for design tradeoffs for tiled CMP on-chip networks (2014) (1)
- Efficient, Protected Message Interface in the MIT M-Machine (2008) (1)
- Exploiting Structure and Managing Wires to Increase Density and Performance (2004) (1)
- Digital Systems Engineering: PACKAGING OF DIGITAL SYSTEMS (1998) (1)
- Hot Chips 16: Power, Parallelism, and Memory Performance (2005) (1)
- M-Machine Microarchitecture v1.1 (1993) (1)
- The M-Machine Multicomputer (1995) (1)
- Graph Streams (2009) (1)
- The case for broader computer architecture education: keynote address (2004) (1)
- INVITED: Bandwidth-Efficient Deep Learning (2018) (1)
- A Novel High-Efficiency Three-Phase Multilevel PV Inverter With Reduced DC-Link Capacitance (2023) (1)
- MODELING AND ANALYSIS OF WIRES (1998) (1)
- Agile Multi-Function Arrays (2007) (1)
- Analysis and utilization of long-term data from a nearshore ADCP (2011) (1)
- A universal parallel computer architecture (1993) (1)
- Current parking regulator for zero droop/overshoot load transient response (2016) (1)
- Performance Implications of Age-Based Allocation in On-Chip-Networks CVA MEMO 129 (2011) (1)
- INLET DYNAMICS FROM SEMI-ANNUAL SURVEYS (1999) (0)
- Digital Communication Devices Based on Nonlinear Dynamics and Chaos (2003) (0)
- Critical problems in very-large-scale computer systems. Semiannual technical report, 1 October 1989-31 March 1990 (1990) (0)
- Gpu Computing Is at a Tipping Point, Becoming More Widely Used in Demanding Consumer Applications and High-performance Computing. This Article Describes the Rapid Evolution of Gpu Architectures—from Graphics Processors to Massively Parallel Many-core Multiprocessors, Recent Developments in Gpu Compu (2010) (0)
- Implementation of An Effective Router Architecture for NoC on FPGA (2011) (0)
- Sikker: A High-Performance Distributed System Architecture for Secure Service-Oriented Computing (2016) (0)
- Foreword to the First Printing (2003) (0)
- Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies (2008) (0)
- Gated precharging of flip-flop sense nodes for reduced power dissipation (2012) (0)
- Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) (1994) (0)
- Acknowledgments 7 Conclusions and Future Work (2007) (0)
- Molecular Electronics: Interfacing the Nano- and Micro-Worlds (2000) (0)
- Mass-referenced single-end signaling-associated graphics processing unit multi-chip module (2013) (0)
- HIGH PERFORMANCE BIOCOMPUTATION (2005) (0)
- Composite remote connection (1999) (0)
- Experiments with data flow on a general-purpose parallel computer. Memorandum report (1991) (0)
- Globally Adaptive Load-Balanced Routing on kary n-cubes (2004) (0)
- Roller Momentum-Thickness and Residual Turbulence (2001) (0)
- Security and Privacy in the NII (1995) (0)
- Input Offset Cancellation (2000) (0)
- Unconventional Systems Integration (1996) (0)
- Pulsenet - A Parallel Flash Sampler and Digital Processor IC for Optical SETI (2006) (0)
- Frontier vs the Exascale Report: Why so long? and Are We Really There Yet? (2022) (0)
- Double Trigger low power flip-flop circuit (2012) (0)
- Digital Systems Engineering: TIMING CIRCUITS (1998) (0)
- IPDPS 2011 Tuesday 25th Year Panel - Looking back (2011) (0)
- CHAPTER 40 STOCHASTIC MODELING OF SURFING CLIMATE (2010) (0)
- OP-VENT (2022) (0)
- A 0.297-pJ/Bit 50.4-Gb/s/Wire Inverter-Based Short-Reach Simultaneous Bi-Directional Transceiver for Die-to-Die Interface in 5-nm CMOS (2023) (0)
- Stream Processing: Hardware and Software Discussion Lecture (2002) (0)
- Northeast Florida—A New Hotspot for Hurricane Damage? (2021) (0)
- OBLIQUE BREAKWATERS FOR MULTI-PURPOSE BEACH RECREATION AND SHORE PROTECTION (2018) (0)
- Integrated circuit and method for performing address-based SRAM access aids (2014) (0)
- An Unconventional, Highly Multipath-Resistant, Modulation Scheme Study (2002) (0)
- Flexible Memory Systems. (AASERT Fellowship). (1996) (0)
- SPAA'21 Panel Paper: Architecture-Friendly Algorithms versus Algorithm-Friendly Architectures (2021) (0)
- Hardware Demonstration of a Novel Three-Phase Multilevel Inverter (2022) (0)
- Digital Systems Engineering: REFERENCES (1998) (0)
- Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1989 (1988) (0)
- Elastic Buffer Networks-on-Chip (2008) (0)
- Microsurveillance of the Urban Battlefield (1995) (0)
- Sixteenth Conference on Advanced Research in VLSI : proceedings : March 27-29, 1995, Chapel Hill, North Carolina (1995) (0)
- GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture (2022) (0)
- An apparatus for transferring data on the occurrence of asynchronous signals. (1986) (0)
- The Balanced Cube (1987) (0)
- Sikker: A Distributed System Architecture for Secure High Performance Computing (2015) (0)
- The J-Machine: System Support for Actors (0)
- Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Proposal for Thesis Research in Partial Fulfillment Of the Requirements for the Degree of Doctor of Philosophy (2003) (0)
- ( 1 ) of Engineers DETACHED BREAKWATERS FOR SHORE PROTECTION by 0 (0)
- A DISSERTATION SUBMITTED TO THE DEPARTMENT OF MECHANICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY (1995) (0)
- CHAPTER SIX A Model for Breaker Decay on Beaches (2010) (0)
- Memory and Control Organizations of Stream Processors a Dissertation Submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (2007) (0)
- A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm (2023) (0)
- SatIn: Hardware for Boolean Satisfiability Inference (2023) (0)
- Closure of "Suspended Sediment Transport and Beach Profile Evolution" (1987) (0)
- CHAPTER 64 PROBABILISTIC MODELING OF LONG-TERM WAVE CLIMATE (2010) (0)
- Lab41 Reading Group: SqueezeNet (2016) (0)
- On the model of computation (2022) (0)
- Buffer and Delay Bounds in High Radix Interconnection Networks (2004) (0)
- Coastal Dynamics '95: Proceedings of the International Conference on Coastal Research in Terms of Large Scale Experiments : Gdansk, Poland September 4-8, 1995 (1996) (0)
- Stream Processors (2009) (0)
- HoLiSwap: Reducing Wire Energy in L1 Caches CVA MEMO 136 (2015) (0)
- Digital Systems Engineering: ADVANCED SIGNALING TECHNIQUES (1998) (0)
- A Quick Overview (2020) (0)
- The utility of fast active messages on many-core chips: Efficient supercomputing project (2011) (0)
- GPUs and the Future of Parallel Computing This Article Discusses the Capabilities of State-of-the-Art GPU-based High-Throughput Computing Systems and Considers the Challenges to Scaling Single-chip Parallel-computing Systems, Highlighting High-impact Areas That the Computing Research Community Can (2011) (0)
- Point Sample Rendering Libraries Point Sample Rendering Title: Professor of Electrical Engineering and Computer Science (2009) (0)
- A static CMOS flip-flop with low power consumption (2012) (0)
- [18] Masahiro Yasugi and Akinori Yonezawa. an Object-oriented Parallel Algorithm for the Newtonian N-body Problem. Yonezawa: Abcl/onem-4: a New Software/hardware Architecture for Object-oriented Concurrent Computing on an Extended Data (1992) (0)
- The practice of digital system design (2015) (0)
- Session details: On-chip interconnection networks (2009) (0)
- Network Management Unit (NMU): A Network Interface Architecture for Job-Level Protection Domains CVA MEMO 133 (2013) (0)
- Specialized versus general-purpose hardware (2011) (0)
- Digital Systems Engineering: SIGNALING CIRCUITS (1998) (0)
- A Coherent VLSI Environment (1987) (0)
- HoLiSwap: Reducing Wire Energy in L1 Caches (2017) (0)
- Image Captioning with Sparse LSTM (2017) (0)
- A 2-to-20 GHz Multi-Phase Clock Generator with Phase Interpolators Using Injection-Locked Oscillation Buffers for High-Speed IOs in 16nm FinFET (2019) (0)
- Analysis of Anthropogenic Noise due to Pile Driving Using Computational Fluid Dynamics (2022) (0)
- Conceptual design of a remotely operated vehicle for beach surveying : final report (1989) (0)
- Stream Processing for High-Performance Embedded Systems (2002) (0)
- Communication-oriented computer architecture: data choreography (1997) (0)
- Presentation of gavel (0)
- Session details: Streams to physics processors (2007) (0)
- Concurrent Algorithms for the Max-Flow Problem (1985) (0)
- COST-EFFICIENT DRAGONFLY TOPOLOGY FOR LARGE-SCALE (2009) (0)
What Schools Are Affiliated With Bill Dally?
Bill Dally is affiliated with the following schools: