Guang Rong Gao

Guang Rong Gao's AcademicInfluence.com Rankings

Biology: World Rank #12742, Historical Rank #16238
Bioinformatics: World Rank #155, Historical Rank #157
Computational Biology: World Rank #281, Historical Rank #282

Guang Rong Gao's Degrees

PhD Bioinformatics University of California, Santa Cruz
Masters Bioinformatics University of California, Santa Cruz
Bachelors Bioinformatics University of California, Santa Cruz

Guang Rong Gao's Published Works

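Published Works

Each entry gives the title, the publication year where known, and the citation count, the last two in parentheses.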

TROLL-Tandem Repeat Occurrence Locator (2002) (213)
Dynamic load balancing on single- and multi-GPU systems (2010) (151)
A novel framework of register allocation for software pipelining (1993) (123)
Using a "codelet" program execution model for exascale machines: position paper (2011) (122)
Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform (2007) (121)
A design study of the EARTH multiprocessor (1995) (115)
Location Consistency-A New Memory Model and Cache Consistency Protocol (2000) (112)
Software pipelining showdown: optimal vs. heuristic methods in a production compiler (1996) (111)
Advances in the dataflow computational model (1999) (108)
Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations (1992) (104)
A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs (1992) (100)
Identifying loops using DJ graphs (1996) (100)
Minimizing register requirements under resource-constrained rate-optimal software pipelining (1994) (98)
Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures (2007) (96)
Earth: an efficient architecture for running threads (1999) (93)
On The Limits Of Program Parallelism And Its Smoothability (1992) (93)
A linear time algorithm for placing φ-nodes (1995) (92)
Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling (1996) (88)
Exploiting short-lived variables in superscalar processors (1995) (86)
An efficient pipelined dataflow processor architecture (1988) (80)
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks (2002) (74)
TiNy threads: a thread virtual machine for the Cyclops64 cellular architecture (2005) (73)
Well-behaved dataflow programs for DSP computation (1992) (73)
ParalleX: A Study of A New Parallel Computation Model (2007) (73)
FAST : A Functionally Accurate Simulation Toolset for the Cyclops 64 Cellular Architecture (2005) (70)
Single-dimension software pipelining for multi-dimensional loops (2004) (69)
A parallel dynamic programming algorithm on a multi-core architecture (2007) (68)
A Framework for Resource-Constrained Rate-Optimal Software Pipelining (1994) (67)
An Implementation of the Codelet Model (2013) (66)
A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison (2000) (66)
Optimizing the Fast Fourier Transform on a Multi-core Architecture (2007) (65)
TERAFLUX: Harnessing dataflow in next generation teradevices (2014) (63)
Minimizing memory requirements in rate-optimal schedules (1994) (61)
Optimization of array accesses by collective loop transformations (1991) (59)
Advanced topics in dataflow computing and multithreading (1994) (59)
Modeling the Weather with a Data Flow Supercomputer (1984) (58)
Mapping the FDTD Application to Many-Core Chip Architectures (2009) (56)
Compiling C for the EARTH multithreaded architecture (1996) (55)
A Polynomial Time Method for Optimal Software Pipelining (1992) (54)
Building multithreaded architectures with off-the-shelf microprocessors (1994) (54)
Scheduling and mapping: software pipelining in the presence of structural hazards (1995) (54)
Hybrid technology multithreaded architecture (1996) (53)
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture (2006) (52)
Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture (2006) (51)
A timed Petri-net model for fine-grain loop scheduling (1991) (51)
ABC++: Concurrency by Inheritance in C++ (1995) (50)
Parts that add up to a whole : a framework for the analysis of tables (2007) (48)
Multithreaded Architectures: Principles, Projects, and Issues (1994) (47)
Multithreaded algorithms for pricing a class of complex options (2001) (42)
Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures (2003) (42)
Costs and Benefits of Multithreading with Off-the-Shelf RISC Processors (1995) (41)
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices (2013) (38)
Incremental computation of dominator trees (1995) (37)
Self‐avoiding walks over adaptive unstructured grids (2000) (37)
On achieving balanced power consumption in software pipelined loops (2002) (36)
Maximum Pipelining of Array Operations on Static Data Flow Machine (1983) (36)
A comparative study of multiprocessor list scheduling heuristics (1994) (35)
An energy efficient TLB design methodology (2005) (34)
Processing In Memory: Chips to Petaflops (1997) (34)
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences (2006) (33)
Analysis of multithreaded multiprocessors with distributed shared memory (1993) (33)
Locality Optimization of Stencil Applications Using Data Dependency Graphs (2010) (32)
Multithreaded Computer Architecture (1994) (32)
Speculative Prefetching of Induction Pointers (2001) (32)
A code mapping scheme for dataflow software pipelining (1990) (31)
Mapping the LU decomposition on a many-core architecture: challenges and solutions (2009) (31)
Minimal register requirements under resource-constrained software pipelining (1994) (30)
Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures (2009) (30)
TIDeFlow: The Time Iterated Dependency Flow Execution Model (2011) (30)
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip (2006) (30)
Rate-optimal schedule for multi-rate DSP computations (1995) (30)
Designing programming languages for analyzability: a fresh look at pointer data structures (1992) (29)
Well-Behaved Programs for DSP Computation (1992) (29)
A design framework for hybrid-access caches (1995) (29)
Optimized Dense Matrix Multiplication on a Many-Core Architecture (2010) (28)
Location consistency: stepping beyond the barriers of memory coherence and serializability (1993) (28)
Software-Pipelining on Multi-Core Architectures (2007) (28)
A new framework for exhaustive and incremental data flow analysis using DJ graphs (1996) (28)
Register allocation using cyclic interval graphs: a new approach to an old problem (1992) (27)
An Efficient Hybrid Dataflow Architecture Model (1993) (27)
A novel framework for multi-rate scheduling in DSP applications (1993) (27)
Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers (2007) (27)
Elastic history buffer: a low-cost method to improve branch prediction accuracy (1997) (26)
Heap analysis and optimizations for threaded programs (1997) (26)
Automatic data and computation decomposition for distributed memory machines (1995) (24)
Locality Analysis for Distributed Shared-Memory Multiprocessors (1996) (24)
Thread partitioning and scheduling based on cost model (1997) (24)
Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections (2008) (23)
Self-Avoiding Walks over Adaptive Unstructured Grids (1999) (23)
Analysis and performance results of computing betweenness centrality on IBM Cyclops64 (2009) (23)
Overview of the Threaded-C Language (1998) (22)
A pipelined code mapping scheme for static data flow computers (1986) (22)
Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems (2011) (22)
HTMT program execution model (2002) (22)
Algorithmic Aspects of Balancing Techniques for Pipelined Data Flow Code Generation (1989) (22)
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures (2012) (22)
Power and Energy Impact by Loop Transformations (2000) (22)
Code generation for single-dimension software pipelining of multi-dimensional loops (2004) (21)
Automatically Partitioning Threads for Multithreaded Architectures (1999) (21)
Design of an Efficient Dataflow Architecture without Data Flow (1988) (21)
Register allocation for software pipelined multi-dimensional loops (2005) (20)
Energy efficient tiling on a Many-Core Architecture (2011) (20)
Experiments with the Fresh Breeze tree-based memory model (2011) (20)
Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining (2002) (20)
A new framework for elimination-based data flow analysis using DJ graphs (1998) (20)
Whole Genome Alignment using a Multithreaded Parallel Implementation (2001) (19)
A register pressure sensitive instruction scheduler for dynamic issue processors (1997) (18)
Measurement and modeling of EARTH-MANNA multithreaded architecture (1996) (18)
Speculative execution and branch prediction on parallel machines (1993) (18)
Load adaptive algorithms and implementations for the 2D discrete wavelet transform on fine-grain multithreaded architectures (1999) (18)
On memory models and cache management for shared-memory multiprocessors (1995) (18)
Position Paper: Using a "Codelet" Program Execution Model for Exascale Machines (2011) (17)
A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures (2010) (17)
Multithreaded algorithms for the fast Fourier transform (2000) (17)
DIMES: an iterative emulation platform for Multiprocessor-System-On-Chip designs (2003) (17)
Toward high-throughput algorithms on many-core architectures (2012) (17)
Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path (2000) (17)
Parallel FEM Simulation of Crack Propagation - Challenges, Status, and Perspectives (2000) (17)
Optimal Modulo Scheduling Through Enumeration (1998) (16)
A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing (1991) (16)
Developing a Communication Intensive Application on the EARTH Multithreaded Architecture (Distinguished Paper) (2000) (16)
Experience on optimizing irregular computation for memory hierarchy in manycore architecture (2008) (16)
Towards an Efficient Hybrid Dataflow Architecture Model (1991) (16)
Location Consistency: Stepping Beyond the Memory Coherence Barrier (1995) (16)
Minimum register instruction sequence problem: revisiting optimal code generation for DAGs (2001) (15)
Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation (2003) (15)
An executable analytical performance evaluation approach for early performance prediction (2003) (15)
Parallel function invocation in a dynamic argument-fetching dataflow architecture (1990) (15)
Implementing parallel hmm-pfam on the EARTH multithreaded architecture (2003) (15)
Compiling for dataflow software pipelining (1990) (15)
E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs (2021) (15)
Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture (2005) (15)
Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures (2012) (14)
Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor (2009) (14)
On the Importance of an End-To-End View of Memory Consistency in Future Computer Systems (1997) (14)
Computing phi-nodes in linear time using DJ graphs (1995) (14)
A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures (2012) (14)
A high-speed memory organization for hybrid dataflow / von Neumann computing (1992) (13)
How “hard” is thread partitioning and how “bad” is a list scheduling based partitioning algorithm? (1998) (13)
Co-scheduling hardware and software pipelines (1996) (13)
A fine-grain load-adaptive algorithm of the 2D discrete wavelet transform for multithreaded architectures (2004) (13)
Performance portability on EARTH: a case study across several parallel architectures (2005) (13)
A Maximally Pipelined Tridiagonal Linear Equation Solver (1986) (12)
Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections (2007) (12)
Implementing parallel conjugate gradient on the EARTH multithreaded architecture (2004) (12)
Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops (1996) (12)
Locality aware concurrent start for stencil applications (2015) (12)
A Study of the EARTH-MANNA Multithreaded System (1996) (12)
Application characterization at scale: lessons learned from developing a distributed open community runtime system for high performance computing (2016) (12)
Determinacy and Repeatability of Parallel Program Schemata (2012) (12)
Experiences with non-numeric applications on multithreaded architectures (1997) (11)
A Novel Methodology Using Genetic Algorithms for the Design of Caches and Cache Replacement Policy (1993) (11)
Toward a Self-aware System for Exascale Architectures (2013) (11)
DJ-graphs and their application to flow graph analyses (1994) (11)
Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture (2013) (11)
Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading (2014) (11)
Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era (2007) (11)
Minimizing communication in rate-optimal software pipelining for stream programs (2010) (11)
A Kahn Principle for Networks of Nonmonotonic Real-time Processes (1993) (11)
Parallel Turing Machine, a Proposal (2017) (11)
Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures (2011) (10)
Loop Storage Optimization for Dataflow Machines (1991) (10)
Towards a Portable Parallel Programming Environment (1992) (10)
A cluster-based solution for high performance hmmpfam using EARTH execution model (2003) (10)
A Dataflow Programming Language and its Compiler for Streaming Systems (2014) (10)
A New Framework for Analysis and Optimization of Shared Memory Parallel Programs (2005) (10)
Supporting a Dynamic Spmd Model in a Multi-threaded Architecture (1993) (10)
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP (2009) (9)
Automatic Locality Exploitation in the Codelet Model (2013) (9)
Implementation and evaluation of a communication intensive application on the EARTH multithreaded system (2002) (9)
Sequential Consistency Revisit: The Sufficient Condition and Method to Reason the Consistency Model of a Multiprocessor-on-a-Chip Architecture (2005) (9)
Programming models and system software for future high-end computing systems: work-in-progress (2003) (9)
Dynamic Load Balancers for a Multithreaded Multiprocessor System (2001) (9)
Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-grain Multithreaded Execution Model (1997) (9)
Data parallelism with high performance C (1994) (9)
swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer (2021) (9)
Towards efficient fine-grain software pipelining (1990) (9)
FreshBreeze: A Data Flow Approach for Meeting DDDAS Challenges (2015) (9)
A Refinement of the HTMT Program Execution Model (1998) (9)
Incremental computation of dominator trees (1997) (8)
Optimizing the LU Benchmark for the Cyclops-64 Architecture (2009) (8)
Toward a Self-Aware Codelet Execution Model (2014) (8)
Iterative layer-based raytracing on CUDA (2009) (8)
Implementation of the EARTH programming model on SMP clusters: a multi‐threaded language and runtime system (2003) (8)
The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures (2011) (8)
A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops (1998) (8)
Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture (1991) (8)
Extending Software Pipelining Techniques for Scheduling Nested Loops (1993) (8)
Analyzable Atomic Sections: Integrating Fine-grained Synchronization and Weak Consistency Models for Scalable Parallelism (2006) (8)
Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design (2005) (8)
Automatic compiler techniques for thread coarsening for multithreaded architectures (2000) (8)
Power-performance trade-offs for energy-efficient architectures: A quantitative study (2002) (8)
Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture (2010) (8)
Improving power efficiency with compiler-assisted cache replacement (2005) (8)
Self-avoiding Walks over Two-dimensional Adaptive Unstructured Grids (1998) (8)
Hierarchical multithreading: programming model and system software (2006) (8)
A Design Frame for Hybrid Access Caches (1995) (8)
Polytasks: A Compressed Task Representation for HPC Runtimes (2011) (8)
Performance evaluation of latency tolerant architectures (1992) (8)
Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture (2008) (8)
Exploiting fine-grain parallelism on dataflow architectures (1990) (7)
DEEP: an iterative fpga-based many-core emulation system for chip verification and architecture research (2011) (7)
Programming Experience on Cyclops-64 Multi-Core Chip Architecture (7)
Fine-Grain Stacked Register Allocation for the Itanium Architecture (2002) (7)
Exploitation of locality for energy efficiency for breadth first search in fine-grain execution models (2013) (7)
Massively parallel breadth first search using a tree-structured memory model (2012) (7)
Asynchronous Runtimes in Action: An Introspective Framework for a Next Gen Runtime (2016) (7)
A stability classification method and its application to pipelined solution of linear recurrences (1987) (7)
FTL: a multithreaded environment for parallel computation (1994) (7)
Compiling several classes of communication patterns on a multithreaded architecture (2002) (7)
Quantitative studies of data-locality sensitivity on the EARTH multithreaded architecture: preliminary results (1996) (6)
Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach (2017) (6)
The Importance of Efficient Fine-Grain Synchronization for Many-Core Systems (2016) (6)
Visualizing biosequence data using texture mapping (2002) (6)
The Fresh Breeze Program Execution Model (2011) (6)
Register allocation for software pipelined multidimensional loops (2008) (6)
An efficient parallel algorithm for all pairs examination (1991) (6)
If-Conversion in SSA Form (2004) (6)
Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications (2009) (6)
Efficient State-Diagram Construction Methods for Software Pipelining (1999) (6)
Supporting a dynamic SPMD in a multi-threaded architecture (1993) (6)
Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (2001) (6)
Design and Implementation of an Efficient Thread Partitioning Algorithm (2000) (6)
A dynamic schema to increase performance in many-core architectures through percolation operations (2013) (6)
Power-aware compilation techniques for high performance processors (2004) (6)
Guest Editors Introduction: Special Issue on OpenMP (2008) (6)
Compiling for multithreaded architectures (2000) (6)
Performance modeling and analysis of multithreaded architectures (1996) (6)
The Multi-Threaded Architecture Multiprocessor (1994) (6)
Latency tolerance: a metric for performance analysis of multithreaded architectures (1997) (5)
Superconducting processors for HTMT: issues and challenges (1999) (5)
Extending the Roofline Model for Asynchronous Many-Task Runtimes (2016) (5)
An enhanced Co-scheduling method using reduced MS-state diagrams (1998) (5)
Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions (2001) (5)
Performance analysis of the I-structure software cache on multi-threading systems (2000) (5)
Using Multithreading for the Automatic Load Balancing of Adaptive Finite Element Meshes (1998) (5)
Strategies for improving performance and energy efficiency on a many-core (2013) (5)
An Efficient Scheme for Fine-Grain Software Pipelining (1990) (5)
P3I: the Delaware programmability, productivity and proficiency inquiry (2005) (5)
Overview of eppp - an environment for portable parallel programming (1994) (5)
StreamTMC: Stream compilation for tiled multi-core architectures (2013) (5)
Caching single-assignment structures to build a robust fine-grain multi-threading system (2000) (5)
Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement (2007) (5)
Designing Programming Languages for the Analyzability of Pointer Data Structures (1993) (5)
A Pipelined Solution Method of Tridiagonal Linear Equation Systems (1986) (5)
An HTMT Performance Prediction Case Study: Implementing Cannon's Dense Matrix Multiply Algorithm (1999) (5)
Source Code Partitioning in Program Optimization (2011) (5)
Leveraging access port positions to accelerate page table walk in DWM-based main memory (2017) (5)
Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial (2008) (5)
Dataflow Accelerator Architecture for Autonomous Machine Computing (2021) (4)
The SuperCodelet architecture (2022) (4)
On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era (2007) (4)
Optimized Lock Assignment and Allocation for Productivity: A Method for Exploiting Concurrency among Critical Sections (2006) (4)
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining (2016) (4)
The Challenges of Efficient Code-Generation for Massively Parallel Architectures (2006) (4)
HAMR: A dataflow-based real-time in-memory cluster computing engine (2017) (4)
A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards (1998) (4)
A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures (2021) (4)
An Efficient Parallel Algorithm (1991) (4)
Efficient Interprocessor Synchronization/Communication on a Dataflow Multiprocessor Architecture (1992) (4)
Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor (2003) (4)
Exploring Financial Applications on Many-Core-on-a-Chip Architecture: A First Experiment (2006) (4)
Designing Scalable Distributed Memory Models: A Case Study (2017) (4)
Inter-procedural stacked register allocation for itanium® like architecture (2003) (4)
PDAWL: Profile-Based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures (2020) (4)
Towards an integrated multiscale simulation of turbulent clouds on PetaScale computers (2011) (4)
Semantics of timed dataflow networks (1993) (4)
Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64 (2005) (4)
A new approach to parallel dynamic partitioning for adaptive unstructured meshes (1999) (4)
Analysis of multithreaded architectures with distributed shared memory (1993) (4)
A Study of Parallel Betweenness Centrality Algorithm on a Manycore Architecture (2007) (4)
Dynamic Percolation-Mapping Dense Matrix Multiplication on a Many-Core Architecture (2010) (4)
Design of the Runtime System for the Portable Threaded-C Language (1998) (4)
Sequential Codelet Model of Program Execution. A Super-Codelet model based on the Hierarchical Turing Machine. (2019) (4)
Partial sampling with reverse state reconstruction: A new technique for branch predictor performance estimation (1998) (4)
DEMAC: A Modular Platform for HW-SW Co-Design (2020) (3)
Efficient data flow analysis using DJ-graphs: Elimination methods revisited (1995) (3)
Automatically partitioning threads based on remote paths (1998) (3)
A User-Friendly Methodology for Automatic Exploration of Compiler Options (2006) (3)
A comparative performance study of a fine-grain multi-threading model on distributed memory machines (2000) (3)
Beyond the data parallel paradigm: issues and options (1993) (3)
Optimal Software Pipelining Through Enumeration of Schedules (1996) (3)
ALPHA: A family of structured intermediate representations for a parallelizing C compiler (1992) (3)
Proceedings of the 12th International Workshop on High-Level Parallel Programming Models and Supportive Environments (2007) (3)
swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer (2019) (3)
International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, USA - September 19 - 23, 2012 (2012) (3)
Study on the Low Power Technology of Software Pipeline (2003) (3)
Minimizing Loop Storage Allocation for An Argument-Fetching Dataflow Architecture Model (1992) (3)
Minimum Register Instruction Scheduling: A New Approach for Dynamic Instruction Issue Processors (1999) (3)
Executable Performance Model and Evaluation of High Performance Architectures with Percolation (2002) (3)
Atomic Section : Concept and Implementation (2005) (3)
Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture (2015) (3)
Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading (2017) (3)
High Throughput Queue Algorithms (2011) (3)
Maximizing Pipelined Functional Units Usage for Minimum Power Software Pipelining (2001) (3)
CODIR: Towards an MLIR Codelet Model Dialect (2020) (3)
Coping with very High Latencies in Petaflop Computer Systems (1999) (3)
Experience of Optimizing FFT on Intel Architectures (2007) (3)
Performance of Interconnection Network in Multithreaded Architectures (1994) (2)
Automatic decomposition in EPPP compiler (1994) (2)
A strict monolithic array constructor (1990) (2)
Multiprocessor Implementation of Nondeterminate Computation in a Functional Programming Framework (1995) (2)
TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS (2010) (2)
The Effects of Resource Limitations on Program Parallelism (1993) (2)
Parallel Architectures and Compilation Techniques, Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT'94, Montréal, Canada, 24-26 August, 1994 (1994) (2)
The threaded communication library: preliminary experiences on a multiprocessor with dual-processor nodes (1995) (2)
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory (2000) (2)
A framework for rate-optimal resource-constrained software pipelining (1994) (2)
A compiler framework for loop nest software-pipelining (2006) (2)
A theory for software-hardware co-scheduling for ASIPs and embedded processors (2000) (2)
Multi-dimensional Kernel Generation for Loop Nest Software Pipelining (2006) (2)
Algorithmic Aspects of Pipeline Balancing (1991) (2)
Lamport Order Revisit: a Study on How to Efficiently Achieve Sequential Consistency on a Modern Multiprocessor-on-a-chip Architecture (2006) (2)
Instruction set architecture of an efficient pipelined dataflow architecture (1989) (2)
Landing Containment Domains on SWARM: Toward a Robust Resiliency Solution on a Dynamic Adaptive Runtime Machine (2015) (2)
Dataflow software pipelining: a case study (1990) (2)
Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models (2013) (2)
On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era (2007) (2)
Performance Prediction for the HTMT : A Programming Example (2007) (2)
Next generation system software for future high-end computing systems (2002) (2)
A dynamically scheduled parallel DSP architecture for stream flow programming (1994) (2)
Multithreaded Execution Architecture and Compilation (1999) (2)
Synchronization for Dynamic Task Parallelism on Manycore Architectures (2010) (2)
Establishing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs (2009) (2)
Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture (2008) (2)
Demystifying Performance Predictions of Distributed FFT3D Implementations (2012) (2)
An Implementation of a Hopfield Network Kernel on EARTH (1998) (2)
An Automatic Methodology for Program Segment-based Compiler Optimization Search (2)
A Holistic Dataflow-Inspired System Design (2014) (2)
Bridging the gap between ISA compilers and silicon compilers: a challenge for future SoC design (2001) (2)
DCF: A Dataflow-Based Collaborative Filtering Training Algorithm (2018) (2)
Optimized Dense Matrix Multiplication on a Many-Core Architecture (2010) (2)
Efficient Multithreaded Algorithms for the Fast Fourier Transform (2002) (2)
EPPP - an integrated environment for portable parallel programming (1994) (1)
Design and Integration of New Architecture Features into a Many-Core Chip Architecture - A Report on a Novel Architecture/Software Co-Verification Platform (2010) (1)
Workshop on parallel and distributed Computing in Finance - PDCoF (2010) (1)
Leveraging compiler optimizations to reduce runtime fault recovery overhead (2017) (1)
User-Friendly Methodology for Automatic Exploration of Compiler Options: A Case Study on the Intel XScale Microarchitecture (2006) (1)
Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor (1996) (1)
Toward a Parallel Turing Machine Model (2016) (1)
Implementing a Non-Strict Functional Programming Language on a Threaded Architecture (1999) (1)
Brain-Flow : A brain inspired dataflow implementation using DEMAC (2020) (1)
Towards An Energy-Efficient Scheduler in the Codelet Model (2013) (1)
Dynamic Optimization Option Search in GCC (2014) (1)
Programming Models and Storage System for High Performance Computation with Many-Core Processors Future generation (2009) (1)
Languages and Compilers for Parallel Computing - Toc (2015) (1)
ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution (2014) (1)
OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures (2011) (1)
Order Free Consistency: Towards a Fully Asynchronous Memory Model (2007) (1)
Gregarious Data Re-structuring in a Many Core Architecture (2015) (1)
Compiling Issues of Monolithic Arrays (1991) (1)
An Experimental Study of an ILP-based Exact Solution Method for Software Pipelining (1995) (1)
The High Performance Open Community Runtime : Explorations on Asynchronous Many Task Runtime Systems (2016) (1)
Structured Hints : Extracting and Abstracting Domain Expertise (2009) (1)
Position Paper: Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-Design (2019) (1)
Java/Jini Technologies and High-Performance Pervasive Computing (2002) (1)
Toward efficient fine-grain software pipelining and the limited balancing technique (1991) (1)
Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look (2006) (1)
CARE: Overview of an Adaptive Multithreaded Architecture (2003) (1)
ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks (2014) (1)
Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges? (2005) (1)
Parallel Reconstruction for Parallel Imaging SPACERIP on Cellular Computer Architecture (2004) (1)
Efficient Dataflow Software Pipelining (1991) (1)
On the Feasibility of a Codelet Based Multi-core Operating System (2014) (1)
Code Size Oriented Memory Allocation for Temporary Variables (1)
Evaluation and choice of various branch predictors for low-power embedded processor (2003) (1)
Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading (2014) (1)
Performance Analysis of Multithreaded Architectures using an Integrated System Model (1996) (1)
Special Issue on DataFlow and Multithreaded Architectures - Guest Editors' Introduction (1993) (1)
Irregular Computations on Fine-Grain Multithreaded Architecture (2009) (1)
A problem formulation of assisting cache replacement by compiler (2003) (1)
Architecture and Programming Models for High Performance Intensive Computation (2016) (1)
Energy Avoiding Matrix Multiply (2016) (1)
An Elimination-Based Approach to Incremental Data Flow Analysis (1995) (1)
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture (2007) (1)
Design and evaluation of a novel dataflow based bigdata solution (2015) (1)
MULTITHREADED PARALLEL IMPLEMENTATION OF HMMPFAM ON EARTH by Weirong Zhu (0)
Structured hints : extracting and abstracting domain expertise. (2009) (0)
Mapping Scheme for One-Level FORALL Expressions (1991) (0)
Maximum Pipelining Of Array Computation: A Pipelined Code Mapping Scheme For Dataflow Computers (1989) (0)
A high-speed dataflow / von memory organization (2002) (0)
Tile Reduction : an OpenMP Extension for Tile Aware Parallelization (2009) (0)
Architecture and Parallel Systems Laboratory Toward Efficient Fine-grained Dynamic Scheduling on Many-Core Architectures (2012) (0)
Problem Formulation 6 4 Solution Strategy 7 5 Reducing the Number of Useless Commits 9 (2007) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory DEMAC and CODIR: A whole stack solution for a HW/SW co-design using an MLIR Codelet Model Dialect (2020) (0)
Implementation of a non-strict functional programming language V on a threaded architecture EARTH (1998) (0)
Concurrency Analysis and Its Applications (2005) (0)
The Era of Multi-core Chips -A Fresh Look on Software Challenges (2006) (0)
Register Stack and Optimal Allocation Instruction Placement (2005) (0)
Software Pipelining for Nested Loops (1993) (0)
FAME: Financial Application with Many-core-on-a-chip architecturE (2006) (0)
Mapping Scheme for FOR-CONSTRUCT Expressions (1991) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-design (2019) (0)
The Role of Non-strict Fine-grain Synchronization (2012) (0)
Author Rebuttal to Rocha et al. “Comments on Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks” (2015) (0)
On the Marriage of Asynchronous Many Task Runtimes and Big Data: A Glance (2020) (0)
Formalizing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs (2009) (0)
An efficient incremental algorithm for maintaining dominator trees and its application to $\phi$-node (1994) (0)
The HTMT Program Execution Model ( Extended (1998) (0)
The Elephant and the Mouse : Non-Strict Fine-Grain Synchronization for Many-Core Architectures (2010) (0)
From EARTH to HTMT: An Evolution of a Multithreaded Architecture Model (Abstract) (1999) (0)
I Contents 1 Introduction 1 2 Monolithic Arrays 1 3 from Macs to Double Loops: an Example 4 4 Problem Formulation and Solution Strategy 6 (2007) (0)
Preface: 6th IFIP International Conference on Network and Parallel Computing (2009) (0)
Organizing & Program Committees (2007) (0)
Towards Maximum Throughput of Dataflow Software Pipeline under Resource Constraints (2023) (0)
Code Partition and Overlays : A reintroduction to High Performance Computing (2011) (0)
Identifying Multiply-Add Operations in Kylin Compiler (2005) (0)
Parallel Turing Machine, a Proposal (2017) (0)
Memory Optimization in Codelet Execution Model on Many-core Architectures (2014) (0)
Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing (2009) (0)
Mapping Scheme for Multi-Level FORALL Expressions (1991) (0)
Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes (1998) (0)
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, PACT '95, Limassol, Cyprus, June 27-29, 1995 (1994) (0)
Mapping Rules for Expressions without Array Creation Constructs (1991) (0)
A Multithreaded Runtime System For a Multiprocessor/Multinode Cluster (2003) (0)
Acceleration Using an Embedded Multi-core Architecture (2007) (0)
Summary of the workshop on frontiers in functional programming and dataflow architecture (1988) (0)
Implementation of a Correlation Algorithm on the Cyclops-64 Architecture (2009) (0)
The Static Data Flow Model (1991) (0)
Explore Be-Nice Instruction Scheduling in Open64 for an Embedded SMT Processor (2008) (0)
Computer Architecture and Parallel Systems Laboratory An Efficient Communication Infrastructure for IBM Cyclops-64 Computer System (2006) (0)
Theory of Modulo-scheduled Pipelines (2007) (0)
An Efficient Monolithic Array Constructor Advanced Computer Architecture and Program Structures Group (2007) (0)
New design paradigms (2001) (0)
Welcome message from the ICPP 2011 chairs (2011) (0)
Related Optimization Techniques (1991) (0)
Parallelization and performance optimization of bioinformatics and biomedical applications targeted to advanced computer architectures (2005) (0)
Towards Exascale Performance Using The Codelet Model (2012) (0)
Maximum pipelining linear recurrence on static data flow computers (2005) (0)
Topic 08+13: Instruction-Level Parallelism and Computer Architecture (2001) (0)
Can Systems Requiring Unbounded Memory Further Work (0)
An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams (1998) (0)
New design paradigms: what needs to be standardized? (2001) (0)
Introduction to ILP workshop (1996) (0)
Program Structure, Compilation, and Machine Design (1991) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Multidimensional Kernel Generation for Loop Nest Software Pipelining (2006) (0)
Guest Editorial: Special issue on Network and Parallel Computing for Emerging Architectures and Applications (2021) (0)
Self-Avoiding Walks Over Adaptive Triangular Grids (1999) (0)
Optimal loop storage allocation for argument-fetching dataflow machines (1992) (0)
Editor’s Note: Special Section on Data-Flow for Multicore (2016) (0)
Final Project Report, DynAX Innovations in Programming Models, Compilers and Runtime Systems for Dynamic Adaptive Event Driven Execution Models (2015) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Sequential Codelet Model for Parallel Execution (2019) (0)
Collective Loop Fusion for Array Contraction (1992) (0)
A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors (2002) (0)
Computer Architecture and Parallel Systems Laboratory Massively Multi-Core Systems and Virtual Memory (2014) (0)
Recursive and Iterative Multithreaded Algorithms for Pricing American Securities (2000) (0)
Costs and Benefits of Multithreading with Off-the-Shelf RISC Processors (0)
On Parallel Models of Computation (2007) (0)
Mcgraw and Et Al. Sisal: Streams and Iteration in a Single Assignment Language| Language Reference Manual Version 1.2. Technical Report M-146 (1991) (0)
The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining (2015) (0)
Concurrent Execution of Heterogeneous Threads in the Super-Actor Machine (1994) (0)
Special issue on compilers, architecture, and synthesis for embedded systems (2003) (0)
Center for Programming Models for Scalable Parallel Computing: Future Programming Models (2008) (0)
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the IBM Cyclops-64 (2007) (0)
A New Cache Protocol Based On The Order Free Consistency Memory Model (2008) (0)
Architecture and Parallel Systems Laboratory The Benefits of Hardware-Assisted Fine-Grain Multithreading (2012) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Algorithms, Applications, and Environments for Emerging Petascale Architectures (2006) (0)
Editorial for the special issue on innovations in supercomputing techniques (2019) (0)
Source Program Structure and Notation (1991) (0)
Madd Operation Aware Redundancy Elimination (2005) (0)
Minimizing Buffer Requirements under Rate-optimal Schedule in Regular Dataflow Networks (1994) (0)
The Feasibility of Adaptive Unstructured Computations on Petaflops Systems (2013) (0)
High Performance Computing (2003) (0)
[90] G. C. Sih and E.A. Lee, “A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous (1997) (0)
Basic Pipelined Code Mapping Schemes (1991) (0)
A Framework for Resource Aware Multithreading (2014) (0)
Hybrid Technology Multithreaded Architecture (1996) (0)
Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes (2017) (0)
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Multigrain Parallelism: Compiling Coarse-Grain Parallel Programs for Fine-Grain Execution (2016) (0)
A pipelined code mapping scheme for tridiagonal linear equation systems (1987) (0)
Computer Architecture and Parallel Systems Laboratory FAME: Financial Application with Many-core-on-a-chip architecturE (2007) (0)


What Schools Are Affiliated With Guang Rong Gao?

Guang Rong Gao is affiliated with the following schools:

University of Delaware
