Guang Rong Gao
#160,938
Most Influential Person Now
Guang Rong Gao's AcademicInfluence.com Rankings
Guang Rong Gao Biology Degrees
Biology
#12742
World Rank
#16238
Historical Rank
Bioinformatics
#155
World Rank
#157
Historical Rank
Computational Biology
#281
World Rank
#282
Historical Rank

Guang Rong Gao's Degrees
- PhD Bioinformatics University of California, Santa Cruz
- Masters Bioinformatics University of California, Santa Cruz
- Bachelors Bioinformatics University of California, Santa Cruz
Why Is Guang Rong Gao Influential?
Guang Rong Gao's Published Works
Number of citations in a given year to any of this author's works
Total number of citations to an author for the works they published in a given year. This highlights publication of the most important work(s) by the author
Published Works
- TROLL-Tandem Repeat Occurrence Locator (2002) (213)
- Dynamic load balancing on single- and multi-GPU systems (2010) (151)
- A novel framework of register allocation for software pipelining (1993) (123)
- Using a "codelet" program execution model for exascale machines: position paper (2011) (122)
- Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform (2007) (121)
- A design study of the EARTH multiprocessor (1995) (115)
- Location Consistency-A New Memory Model and Cache Consistency Protocol (2000) (112)
- Software pipelining showdown: optimal vs. heuristic methods in a production compiler (1996) (111)
- Advances in the dataflow computational model (1999) (108)
- Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations (1992) (104)
- A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs (1992) (100)
- Identifying loops using DJ graphs (1996) (100)
- Minimizing register requirements under resource-constrained rate-optimal software pipelining (1994) (98)
- Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures (2007) (96)
- Earth: an efficient architecture for running threads (1999) (93)
- On The Limits Of Program Parallelism And Its Smoothability (1992) (93)
- A linear time algorithm for placing φ-nodes (1995) (92)
- Polling Watchdog: Combining Polling and Interrupts for Efficient Message Handling (1996) (88)
- Exploiting short-lived variables in superscalar processors (1995) (86)
- An efficient pipelined dataflow processor architecture (1988) (80)
- Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks (2002) (74)
- TiNy threads: a thread virtual machine for the Cyclops64 cellular architecture (2005) (73)
- Well-behaved dataflow programs for DSP computation (1992) (73)
- ParalleX: A Study of A New Parallel Computation Model (2007) (73)
- FAST : A Functionally Accurate Simulation Toolset for the Cyclops 64 Cellular Architecture (2005) (70)
- Single-dimension software pipelining for multi-dimensional loops (2004) (69)
- A parallel dynamic programming algorithm on a multi-core architecture (2007) (68)
- A Framework for Resource-Constrained Rate-Optimal Software Pipelining (1994) (67)
- An Implementation of the Codelet Model (2013) (66)
- A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison (2000) (66)
- Optimizing the Fast Fourier Transform on a Multi-core Architecture (2007) (65)
- TERAFLUX: Harnessing dataflow in next generation teradevices (2014) (63)
- Minimizing memory requirements in rate-optimal schedules (1994) (61)
- Optimization of array accesses by collective loop transformations (1991) (59)
- Advanced topics in dataflow computing and multithreading (1994) (59)
- Modeling the Weather with a Data Flow Supercomputer (1984) (58)
- Mapping the FDTD Application to Many-Core Chip Architectures (2009) (56)
- Compiling C for the EARTH multithreaded architecture (1996) (55)
- A Polynomial Time Method for Optimal Software Pipelining (1992) (54)
- Building multithreaded architectures with off-the-shelf microprocessors (1994) (54)
- Scheduling and mapping: software pipelining in the presence of structural hazards (1995) (54)
- Hybrid technology multithreaded architecture (1996) (53)
- A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture (2006) (52)
- Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture (2006) (51)
- A timed Petri-net model for fine-grain loop scheduling (1991) (51)
- ABC++: Concurrency by Inheritance in C++ (1995) (50)
- Parts that add up to a whole : a framework for the analysis of tables (2007) (48)
- Multithreaded Architectures: Principles, Projects, and Issues (1994) (47)
- Multithreaded algorithms for pricing a class of complex options (2001) (42)
- Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures (2003) (42)
- Costs and Benefits of Multithreading with Off-the-Shelf RISC Processors (1995) (41)
- The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices (2013) (38)
- Incremental computation of dominator trees (1995) (37)
- Self‐avoiding walks over adaptive unstructured grids (2000) (37)
- On achieving balanced power consumption in software pipelined loops (2002) (36)
- Maximum Pipelining of Array Operations on Static Data Flow Machine (1983) (36)
- A comparative study of multiprocessor list scheduling heuristics (1994) (35)
- An energy efficient TLB design methodology (2005) (34)
- Processing In Memory: Chips to Petaflops (1997) (34)
- Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences (2006) (33)
- Analysis of multithreaded multiprocessors with distributed shared memory (1993) (33)
- Locality Optimization of Stencil Applications Using Data Dependency Graphs (2010) (32)
- Multithreaded Computer Architecture (1994) (32)
- Speculative Prefetching of Induction Pointers (2001) (32)
- A code mapping scheme for dataflow software pipelining (1990) (31)
- Mapping the LU decomposition on a many-core architecture: challenges and solutions (2009) (31)
- Minimal register requirements under resource-constrained software pipelining (1994) (30)
- Improving Performance of Dynamic Programming via Parallelism and Locality on Multicore Architectures (2009) (30)
- TIDeFlow: The Time Iterated Dependency Flow Execution Model (2011) (30)
- Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip (2006) (30)
- Rate-optimal schedule for multi-rate DSP computations (1995) (30)
- Designing programming languages for analyzability: a fresh look at pointer data structures (1992) (29)
- Well-Behaved Programs for DSP Computation (1992) (29)
- A design framework for hybrid-access caches (1995) (29)
- Optimized Dense Matrix Multiplication on a Many-Core Architecture (2010) (28)
- Location consistency: stepping beyond the barriers of memory coherence and serializability (1993) (28)
- Software-Pipelining on Multi-Core Architectures (2007) (28)
- A new framework for exhaustive and incremental data flow analysis using DJ graphs (1996) (28)
- Register allocation using cyclic interval graphs: a new approach to an old problem (1992) (27)
- An Efficient Hybrid Dataflow Architecture Model (1993) (27)
- A novel framework for multi-rate scheduling in DSP applications (1993) (27)
- Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers (2007) (27)
- Elastic history buffer: a low-cost method to improve branch prediction accuracy (1997) (26)
- Heap analysis and optimizations for threaded programs (1997) (26)
- Automatic data and computation decomposition for distributed memory machines (1995) (24)
- Locality Analysis for Distributed Shared-Memory Multiprocessors (1996) (24)
- Thread partitioning and scheduling based on cost model (1997) (24)
- Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections (2008) (23)
- Self-Avoiding Walks over Adaptive Unstructured Grids (1999) (23)
- Analysis and performance results of computing betweenness centrality on IBM Cyclops64 (2009) (23)
- Overview of the Threaded-C Language (1998) (22)
- A pipelined code mapping scheme for static data flow computers (1986) (22)
- Exploring Fine-Grained Task-Based Execution on Multi-GPU Systems (2011) (22)
- HTMT program execution model (2002) (22)
- Algorithmic Aspects of Balancing Techniques for Pipelined Data Flow Code Generation (1989) (22)
- Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures (2012) (22)
- Power and Energy Impact by Loop Transformations (2000) (22)
- Code generation for single-dimension software pipelining of multi-dimensional loops (2004) (21)
- Automatically Partitioning Threads for Multithreaded Architectures (1999) (21)
- Design of an Efficient Dataflow Architecture without Data Flow (1988) (21)
- Register allocation for software pipelined multi-dimensional loops (2005) (20)
- Energy efficient tiling on a Many-Core Architecture (2011) (20)
- Experiments with the Fresh Breeze tree-based memory model (2011) (20)
- Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining (2002) (20)
- A new framework for elimination-based data flow analysis using DJ graphs (1998) (20)
- Whole Genome Alignment using a Multithreaded Parallel Implementation (2001) (19)
- A register pressure sensitive instruction scheduler for dynamic issue processors (1997) (18)
- Measurement and modeling of EARTH-MANNA multithreaded architecture (1996) (18)
- Speculative execution and branch prediction on parallel machines (1993) (18)
- Load adaptive algorithms and implementations for the 2D discrete wavelet transform on fine-grain multithreaded architectures (1999) (18)
- On memory models and cache management for shared-memory multiprocessors (1995) (18)
- Position Paper: Using a "Codelet" Program Execution Model for Exascale Machines (2011) (17)
- A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures (2010) (17)
- Multithreaded algorithms for the fast Fourier transform (2000) (17)
- DIMES: an iterative emulation platform for Multiprocessor-System-On-Chip designs (2003) (17)
- Toward high-throughput algorithms on many-core architectures (2012) (17)
- Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path (2000) (17)
- Parallel FEM Simulation of Crack Propagation - Challenges, Status, and Perspectives (2000) (17)
- Optimal Modulo Scheduling Through Enumeration (1998) (16)
- A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing (1991) (16)
- Developing a Communication Intensive Application on the EARTH Multithreaded Architecture (Distinguished Paper) (2000) (16)
- Experience on optimizing irregular computation for memory hierarchy in manycore architecture (2008) (16)
- Towards an Efficient Hybrid Dataflow Architecture Model (1991) (16)
- Location Consistency: Stepping Beyond the Memory Coherence Barrier (1995) (16)
- Minimum register instruction sequence problem: revisiting optimal code generation for DAGs (2001) (15)
- Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation (2003) (15)
- An executable analytical performance evaluation approach for early performance prediction (2003) (15)
- Parallel function invocation in a dynamic argument-fetching dataflow architecture (1990) (15)
- Implementing parallel hmm-pfam on the EARTH multithreaded architecture (2003) (15)
- Compiling for dataflow software pipelining (1990) (15)
- E.T.: Re-Thinking Self-Attention for Transformer Models on GPUs (2021) (15)
- Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture (2005) (15)
- Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures (2012) (14)
- Tile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor (2009) (14)
- On the Importance of an End-To-End View of Memory Consistency in Future Computer Systems (1997) (14)
- Computing phi-nodes in linear time using DJ graphs (1995) (14)
- A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures (2012) (14)
- A high-speed memory organization for hybrid dataflow / von Neumann computing (1992) (13)
- How “hard” is thread partitioning and how “bad” is a list scheduling based partitioning algorithm? (1998) (13)
- Co-scheduling hardware and software pipelines (1996) (13)
- A fine-grain load-adaptive algorithm of the 2D discrete wavelet transform for multithreaded architectures (2004) (13)
- Performance portability on EARTH: a case study across several parallel architectures (2005) (13)
- A Maximally Pipelined Tridiagonal Linear Equation Solver (1986) (12)
- Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections (2007) (12)
- Implementing parallel conjugate gradient on the EARTH multithreaded architecture (2004) (12)
- Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops (1996) (12)
- Locality aware concurrent start for stencil applications (2015) (12)
- A Study of the EARTH-MANNA Multithreaded System (1996) (12)
- Application characterization at scale: lessons learned from developing a distributed open community runtime system for high performance computing (2016) (12)
- Determinacy and Repeatability of Parallel Program Schemata (2012) (12)
- Experiences with non-numeric applications on multithreaded architectures (1997) (11)
- A Novel Methodology Using Genetic Algorithms for the Design of Caches and Cache Replacement Policy (1993) (11)
- Toward a Self-aware System for Exascale Architectures (2013) (11)
- DJ-graphs and their application to flow graph analyses (1994) (11)
- Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture (2013) (11)
- Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading (2014) (11)
- Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era (2007) (11)
- Minimizing communication in rate-optimal software pipelining for stream programs (2010) (11)
- A Kahn Principle for Networks of Nonmonotonic Real-time Processes (1993) (11)
- Parallel Turing Machine, a Proposal (2017) (11)
- Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures (2011) (10)
- Loop Storage Optimization for Dataflow Machines (1991) (10)
- Towards a Portable Parallel Programming Environment (1992) (10)
- A cluster-based solution for high performance hmmpfam using EARTH execution model (2003) (10)
- A Dataflow Programming Language and its Compiler for Streaming Systems (2014) (10)
- A New Framework for Analysis and Optimization of Shared Memory Parallel Programs (2005) (10)
- Supporting a Dynamic Spmd Model in a Multi-threaded Architecture (1993) (10)
- Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP (2009) (9)
- Automatic Locality Exploitation in the Codelet Model (2013) (9)
- Implementation and evaluation of a communication intensive application on the EARTH multithreaded system (2002) (9)
- Sequential Consistency Revisit: The Sufficient Condition and Method to Reason the Consistency Model of a Multiprocessor-on-a-Chip Architecture (2005) (9)
- Programming models and system software for future high-end computing systems: work-in-progress (2003) (9)
- Dynamic Load Balancers for a Multithreaded Multiprocessor System (2001) (9)
- Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-grain Multithreaded Execution Model (1997) (9)
- Data parallelism with high performance C (1994) (9)
- swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer (2021) (9)
- Towards efficient fine-grain software pipelining (1990) (9)
- FreshBreeze: A Data Flow Approach for Meeting DDDAS Challenges (2015) (9)
- A Refinement of the HTMT Program Execution Model (1998) (9)
- Incremental computation of dominator trees (1997) (8)
- Optimizing the LU Benchmark for the Cyclops-64 Architecture (2009) (8)
- Toward a Self-Aware Codelet Execution Model (2014) (8)
- Iterative layer-based raytracing on CUDA (2009) (8)
- Implementation of the EARTH programming model on SMP clusters: a multi‐threaded language and runtime system (2003) (8)
- The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures (2011) (8)
- A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops (1998) (8)
- Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture (1991) (8)
- Extending Software Pipelining Techniques for Scheduling Nested Loops (1993) (8)
- Analyzable Atomic Sections: Integrating Fine-grained Synchronization and Weak Consistency Models for Scalable Parallelism (2006) (8)
- Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design (2005) (8)
- Automatic compiler techniques for thread coarsening for multithreaded architectures (2000) (8)
- Power-performance trade-offs for energy-efficient architectures: A quantitative study (2002) (8)
- Performance analysis of Cooley-Tukey FFT algorithms for a many-core architecture (2010) (8)
- Improving power efficiency with compiler-assisted cache replacement (2005) (8)
- Self-avoiding Walks over Two-dimensional Adaptive Unstructured Grids (1998) (8)
- Hierarchical multithreading: programming model and system software (2006) (8)
- A Design Frame for Hybrid Access Caches (1995) (8)
- Polytasks: A Compressed Task Representation for HPC Runtimes (2011) (8)
- Performance evaluation of latency tolerant architectures (1992) (8)
- Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture (2008) (8)
- Exploiting fine-grain parallelism on dataflow architectures (1990) (7)
- DEEP: an iterative fpga-based many-core emulation system for chip verification and architecture research (2011) (7)
- Programming Experience on Cyclops-64 Multi-Core Chip Architecture (7)
- Fine-Grain Stacked Register Allocation for the Itanium Architecture (2002) (7)
- Exploitation of locality for energy efficiency for breadth first search in fine-grain execution models (2013) (7)
- Massively parallel breadth first search using a tree-structured memory model (2012) (7)
- Asynchronous Runtimes in Action: An Introspective Framework for a Next Gen Runtime (2016) (7)
- A stability classification method and its application to pipelined solution of linear recurrences (1987) (7)
- FTL: a multithreaded environment for parallel computation (1994) (7)
- Compiling several classes of communication patterns on a multithreaded architecture (2002) (7)
- Quantitative studies of data-locality sensitivity on the EARTH multithreaded architecture: preliminary results (1996) (6)
- Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach (2017) (6)
- The Importance of Efficient Fine-Grain Synchronization for Many-Core Systems (2016) (6)
- Visualizing biosequence data using texture mapping (2002) (6)
- The Fresh Breeze Program Execution Model (2011) (6)
- Register allocation for software pipelined multidimensional loops (2008) (6)
- An efficient parallel algorithm for all pairs examination (1991) (6)
- If-Conversion in SSA Form (2004) (6)
- Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications (2009) (6)
- Efficient State-Diagram Construction Methods for Software Pipelining (1999) (6)
- Supporting a dynamic SPMD in a multi-threaded architecture (1993) (6)
- Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (2001) (6)
- Design and Implementation of an Efficient Thread Partitioning Algorithm (2000) (6)
- A dynamic schema to increase performance in many-core architectures through percolation operations (2013) (6)
- Power-aware compilation techniques for high performance processors (2004) (6)
- Guest Editors Introduction: Special Issue on OpenMP (2008) (6)
- Compiling for multithreaded architectures (2000) (6)
- Performance modeling and analysis of multithreaded architectures (1996) (6)
- The Multi-Threaded Architecture Multiprocessor (1994) (6)
- Latency tolerance: a metric for performance analysis of multithreaded architectures (1997) (5)
- Superconducting processors for HTMT: issues and challenges (1999) (5)
- Extending the Roofline Model for Asynchronous Many-Task Runtimes (2016) (5)
- An enhanced Co-scheduling method using reduced MS-state diagrams (1998) (5)
- Exploiting Locality in Single Assignment Data Structures Updated Through Split-Phase Transactions (2001) (5)
- Performance analysis of the I-structure software cache on multi-threading systems (2000) (5)
- Using Multithreading for the Automatic Load Balancing of Adaptive Finite Element Meshes (1998) (5)
- Strategies for improving performance and energy efficiency on a many-core (2013) (5)
- An Efficient Scheme for Fine-Grain Software Pipelining (1990) (5)
- P3I: the Delaware programmability, productivity and proficiency inquiry (2005) (5)
- Overview of eppp - an environment for portable parallel programming (1994) (5)
- StreamTMC: Stream compilation for tiled multi-core architectures (2013) (5)
- Caching single-assignment structures to build a robust fine-grain multi-threading system (2000) (5)
- Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement (2007) (5)
- Designing Programming Languages for the Analyzability of Pointer Data Structures (1993) (5)
- A Pipelined Solution Method of Tridiagonal Linear Equation Systems (1986) (5)
- An HTMT Performance Prediction Case Study: Implementing Cannon's Dense Matrix Multiply Algorithm (1999) (5)
- Source Code Partitioning in Program Optimization (2011) (5)
- Leveraging access port positions to accelerate page table walk in DWM-based main memory (2017) (5)
- Open64 compiler infrastructure for emerging multicore/manycore architecture All Symposium Tutorial (2008) (5)
- Dataflow Accelerator Architecture for Autonomous Machine Computing (2021) (4)
- The SuperCodelet architecture (2022) (4)
- On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era (2007) (4)
- Optimized Lock Assignment and Allocation for Productivity: A Method for Exploiting Concurrency among Critical Sections (2006) (4)
- The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining (2016) (4)
- The Challenges of Efficient Code-Generation for Massively Parallel Architectures (2006) (4)
- HAMR: A dataflow-based real-time in-memory cluster computing engine (2017) (4)
- A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards (1998) (4)
- A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures (2021) (4)
- An Efficient Parallel Algorithm (1991) (4)
- Efficient Interprocessor Synchronization/Communication on a Dataflow Multiprocessor Architecture (1992) (4)
- Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor (2003) (4)
- Exploring Financial Applications on Many-Core-on-a-Chip Architecture: A First Experiment (2006) (4)
- Designing Scalable Distributed Memory Models: A Case Study (2017) (4)
- Inter-procedural stacked register allocation for itanium® like architecture (2003) (4)
- PDAWL: Profile-Based Iterative Dynamic Adaptive WorkLoad Balance on Heterogeneous Architectures (2020) (4)
- Towards an integrated multiscale simulation of turbulent clouds on PetaScale computers (2011) (4)
- Semantics of timed dataflow networks (1993) (4)
- Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64 (2005) (4)
- A new approach to parallel dynamic partitioning for adaptive unstructured meshes (1999) (4)
- Analysis of multithreaded architectures with distributed shared memory (1993) (4)
- A Study of Parallel Betweenness Centrality Algorithm on a Manycore Architecture (2007) (4)
- Dynamic Percolation-Mapping Dense Matrix Multiplication on a Many-Core Architecture (2010) (4)
- Design of the Runtime System for the Portable Threaded-C Language (1998) (4)
- Sequential Codelet Model of Program Execution. A Super-Codelet model based on the Hierarchical Turing Machine. (2019) (4)
- Partial sampling with reverse state reconstruction: A new technique for branch predictor performance estimation (1998) (4)
- DEMAC: A Modular Platform for HW-SW Co-Design (2020) (3)
- Efficient data flow analysis using DJ-graphs: Elimination methods revisited (1995) (3)
- Automatically partitioning threads based on remote paths (1998) (3)
- A User-Friendly Methodology for Automatic Exploration of Compiler Options (2006) (3)
- A comparative performance study of a fine-grain multi-threading model on distributed memory machines (2000) (3)
- Beyond the data parallel paradigm: issues and options (1993) (3)
- Optimal Software Pipelining Through Enumeration of Schedules (1996) (3)
- ALPHA: A family of structured intermediate representations for a parallelizing C compiler (1992) (3)
- Proceedings of the 12th International Workshop on High-Level Parallel Programming Models and Supportive Environments (2007) (3)
- swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer (2019) (3)
- International Conference on Parallel Architectures and Compilation Techniques, PACT '12, Minneapolis, MN, USA - September 19 - 23, 2012 (2012) (3)
- Study on the Low Power Technology of Software Pipeline (2003) (3)
- Minimizing Loop Storage Allocation for An Argument-Fetching Dataflow Architecture Model (1992) (3)
- Minimum Register Instruction Scheduling: A New Approach for Dynamic Instruction Issue Processors (1999) (3)
- Executable Performance Model and Evaluation of High Performance Architectures with Percolation (2002) (3)
- Atomic Section : Concept and Implementation (2005) (3)
- Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture (2015) (3)
- Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading (2017) (3)
- High Throughput Queue Algorithms (2011) (3)
- Maximizing Pipelined Functional Units Usage for Minimum Power Software Pipelining (2001) (3)
- CODIR: Towards an MLIR Codelet Model Dialect (2020) (3)
- Coping with very High Latencies in Petaflop Computer Systems (1999) (3)
- Experience of Optimizing FFT on Intel Architectures (2007) (3)
- Performance of Interconnection Network in Multithreaded Architectures (1994) (2)
- Automatic decomposition in EPPP compiler (1994) (2)
- A strict monolithic array constructor (1990) (2)
- Multiprocessor Implementation of Nondeterminate Computation in a Functional Programming Framework (1995) (2)
- TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS (2010) (2)
- The Effects of Resource Limitations on Program Parallelism (1993) (2)
- Parallel Architectures and Compilation Techniques, Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT'94, Montréal, Canada, 24-26 August, 1994 (1994) (2)
- The threaded communication library: preliminary experiences on a multiprocessor with dual-processor nodes (1995) (2)
- Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory (2000) (2)
- A framework for rate-optimal resource-constrained software pipelining (1994) (2)
- A compiler framework for loop nest software-pipelining (2006) (2)
- A theory for software-hardware co-scheduling for ASIPs and embedded processors (2000) (2)
- Multi-dimensional Kernel Generation for Loop Nest Software Pipelining (2006) (2)
- Algorithmic Aspects of Pipeline Balancing (1991) (2)
- Lamport Order Revisit: a Study on How to Efficiently Achieve Sequential Consistency on a Modern Multiprocessor-on-a-chip Architecture (2006) (2)
- Instruction set architecture of an efficient pipelined dataflow architecture (1989) (2)
- Landing Containment Domains on SWARM: Toward a Robust Resiliency Solution on a Dynamic Adaptive Runtime Machine (2015) (2)
- Dataflow software pipelining: a case study (1990) (2)
- Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models (2013) (2)
- On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era (2007) (2)
- Performance Prediction for the HTMT : A Programming Example (2007) (2)
- Next generation system software for future high-end computing systems (2002) (2)
- A dynamically scheduled parallel DSP architecture for stream flow programming (1994) (2)
- Multithreaded Execution Architecture and Compilation (1999) (2)
- Synchronization for Dynamic Task Parallelism on Manycore Architectures (2010) (2)
- Establishing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs (2009) (2)
- Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture (2008) (2)
- Demystifying Performance Predictions of Distributed FFT3D Implementations (2012) (2)
- An Implementation of a Hopfield Network Kernel on EARTH (1998) (2)
- An Automatic Methodology for Program Segment-based Compiler Optimization Search (2)
- A Holistic Dataflow-Inspired System Design (2014) (2)
- Bridging the gap between ISA compilers and silicon compilers: a challenge for future SoC design (2001) (2)
- DCF: A Dataflow-Based Collaborative Filtering Training Algorithm (2018) (2)
- Optimized Dense Matrix Multiplication on a Many-Core Architecture (2010) (2)
- Efficient Multithreaded Algorithms for the Fast Fourier Transform (2002) (2)
- EPPP - an integrated environment for portable parallel programming (1994) (1)
- Design and Integration of New Architecture Features into a Many-Core Chip Architecture - A Report on a Novel Architecture/Software Co-Verification Platform (2010) (1)
- Workshop on parallel and distributed Computing in Finance - PDCoF (2010) (1)
- Leveraging compiler optimizations to reduce runtime fault recovery overhead (2017) (1)
- User-Friendly Methodology for Automatic Exploration of Compiler Options: A Case Study on the Intel XScale Microarchitecture (2006) (1)
- Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor (1996) (1)
- Toward a Parallel Turing Machine Model (2016) (1)
- Implementing a Non-Strict Functional Programming Language on a Threaded Architecture (1999) (1)
- Brain-Flow : A brain inspired dataflow implementation using DEMAC (2020) (1)
- Towards An Energy-Efficient Scheduler in the Codelet Model (2013) (1)
- Dynamic Optimization Option Search in GCC (2014) (1)
- Programming Models and Storage System for High Performance Computation with Many-Core Processors Future generation (2009) (1)
- Languages and Compilers for Parallel Computing - Toc (2015) (1)
- ACDT: Architected Composite Data Types trading-in unfettered data access for improved execution (2014) (1)
- OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures (2011) (1)
- Order Free Consistency: Towards a Fully Asynchronous Memory Model (2007) (1)
- Gregarious Data Re-structuring in a Many Core Architecture (2015) (1)
- Compiling Issues of Monolithic Arrays (1991) (1)
- An Experimental Study of an ILP-based Exact Solution Method for Software Pipelining (1995) (1)
- The High Performance Open Community Runtime: Explorations on Asynchronous Many Task Runtime Systems (2016) (1)
- Structured Hints: Extracting and Abstracting Domain Expertise (2009) (1)
- Position Paper: Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-Design (2019) (1)
- Java/Jini Technologies and High-Performance Pervasive Computing (2002) (1)
- Toward efficient fine-grain software pipelining and the limited balancing technique (1991) (1)
- Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look (2006) (1)
- CARE: Overview of an Adaptive Multithreaded Architecture (2003) (1)
- ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks (2014) (1)
- Sustained Petaflop and Beyond: Can Parallel Computing Systems Meet The Challenges? (2005) (1)
- Parallel Reconstruction for Parallel Imaging SPACERIP on Cellular Computer Architecture (2004) (1)
- Efficient Dataflow Software Pipelining (1991) (1)
- On the Feasibility of a Codelet Based Multi-core Operating System (2014) (1)
- Code Size Oriented Memory Allocation for Temporary Variables (1)
- Evaluation and choice of various branch predictors for low-power embedded processor (2003) (1)
- Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading (2014) (1)
- Performance Analysis of Multithreaded Architectures using an Integrated System Model (1996) (1)
- Special Issue on DataFlow and Multithreaded Architectures - Guest Editors' Introduction (1993) (1)
- Irregular Computations on Fine-Grain Multithreaded Architecture (2009) (1)
- A problem formulation of assisting cache replacement by compiler (2003) (1)
- Architecture and Programming Models for High Performance Intensive Computation (2016) (1)
- Energy Avoiding Matrix Multiply (2016) (1)
- An Elimination-Based Approach to Incremental Data Flow Analysis (1995) (1)
- Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture (2007) (1)
- Design and evaluation of a novel dataflow based bigdata solution (2015) (1)
- MULTITHREADED PARALLEL IMPLEMENTATION OF HMMPFAM ON EARTH by Weirong Zhu (0)
- Structured hints: extracting and abstracting domain expertise. (2009) (0)
- Mapping Scheme for One-Level FORALL Expressions (1991) (0)
- Maximum Pipelining Of Array Computation: A Pipelined Code Mapping Scheme For Dataflow Computers (1989) (0)
- A high-speed dataflow / von memory organization (2002) (0)
- Tile Reduction: an OpenMP Extension for Tile Aware Parallelization (2009) (0)
- Toward Efficient Fine-grained Dynamic Scheduling on Many-Core Architectures (2012) (0)
- Problem Formulation 6 4 Solution Strategy 7 5 Reducing the Number of Useless Commits 9 (2007) (0)
- DEMAC and CODIR: A whole stack solution for a HW/SW co-design using an MLIR Codelet Model Dialect (2020) (0)
- Implementation of a non-strict functional programming language V on a threaded architecture EARTH (1998) (0)
- Concurrency Analysis and Its Applications (2005) (0)
- The Era of Multi-core Chips - A Fresh Look on Software Challenges (2006) (0)
- Register Stack and Optimal Allocation Instruction Placement (2005) (0)
- Software Pipelining for Nested Loops (1993) (0)
- FAME: Financial Application with Many-core-on-a-chip architecturE (2006) (0)
- Mapping Scheme for FOR-CONSTRUCT Expressions (1991) (0)
- Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-design (2019) (0)
- The Role of Non-strict Fine-grain Synchronization (2012) (0)
- Author Rebuttal to Rocha et al. “Comments on Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks” (2015) (0)
- On the Marriage of Asynchronous Many Task Runtimes and Big Data: A Glance (2020) (0)
- Formalizing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs (2009) (0)
- An efficient incremental algorithm for maintaining dominator trees and its application to $\phi$-node (1994) (0)
- The HTMT Program Execution Model ( Extended (1998) (0)
- The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures (2010) (0)
- From EARTH to HTMT: An Evolution of a Multithreaded Architecture Model (Abstract) (1999) (0)
- I Contents 1 Introduction 1 2 Monolithic Arrays 1 3 from Macs to Double Loops: an Example 4 4 Problem Formulation and Solution Strategy 6 (2007) (0)
- Preface: 6th IFIP International Conference on Network and Parallel Computing (2009) (0)
- Organizing & Program Committees (2007) (0)
- Towards Maximum Throughput of Dataflow Software Pipeline under Resource Constraints (2023) (0)
- Code Partition and Overlays: A reintroduction to High Performance Computing (2011) (0)
- Identifying Multiply-Add Operations in Kylin Compiler (2005) (0)
- Parallel Turing Machine, a Proposal (2017) (0)
- Memory Optimization in Codelet Execution Model on Many-core Architectures (2014) (0)
- Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing (2009) (0)
- Mapping Scheme for Multi-Level FORALL Expressions (1991) (0)
- Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes (1998) (0)
- Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, PACT '95, Limassol, Cyprus, June 27-29, 1995 (1994) (0)
- Mapping Rules for Expressions without Array Creation Constructs (1991) (0)
- A Multithreaded Runtime System For a Multiprocessor/Multinode Cluster (2003) (0)
- Acceleration Using an Embedded Multi-core Architecture (2007) (0)
- Summary of the workshop on frontiers in functional programming and dataflow architecture (1988) (0)
- Implementation of a Correlation Algorithm on the Cyclops-64 Architecture (2009) (0)
- The Static Data Flow Model (1991) (0)
- Explore Be-Nice Instruction Scheduling in Open64 for an Embedded SMT Processor (2008) (0)
- An Efficient Communication Infrastructure for IBM Cyclops-64 Computer System (2006) (0)
- Theory of Modulo-scheduled Pipelines (2007) (0)
- An Efficient Monolithic Array Constructor (2007) (0)
- New design paradigms (2001) (0)
- Welcome message from the ICPP 2011 chairs (2011) (0)
- Related Optimization Techniques (1991) (0)
- Parallelization and performance optimization of bioinformatics and biomedical applications targeted to advanced computer architectures (2005) (0)
- Towards Exascale Performance Using The Codelet Model (2012) (0)
- Maximum pipelining linear recurrence on static data flow computers (2005) (0)
- Topic 08+13: Instruction-Level Parallelism and Computer Architecture (2001) (0)
- Can Systems Requiring Unbounded Memory Further Work (0)
- An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams (1998) (0)
- New design paradigms: what needs to be standardized? (2001) (0)
- Introduction to ILP workshop (1996) (0)
- Program Structure, Compilation, and Machine Design (1991) (0)
- Multidimensional Kernel Generation for Loop Nest Software Pipelining (2006) (0)
- Guest Editorial: Special issue on Network and Parallel Computing for Emerging Architectures and Applications (2021) (0)
- Self-Avoiding Walks Over Adaptive Triangular Grids (1999) (0)
- Optimal loop storage allocation for argument-fetching dataflow machines (1992) (0)
- Editor’s Note: Special Section on Data-Flow for Multicore (2016) (0)
- Final Project Report, DynAX Innovations in Programming Models, Compilers and Runtime Systems for Dynamic Adaptive Event Driven Execution Models (2015) (0)
- Sequential Codelet Model for Parallel Execution (2019) (0)
- Collective Loop Fusion for Array Contraction I Contents 1 Introduction 1 2 Program Representation 2 3 Problem Statement 4 4 a Network-flow Formulation of the Partitioning Problem 4 (1992) (0)
- A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors (2002) (0)
- Massively Multi-Core Systems and Virtual Memory (2014) (0)
- Recursive and Iterative Multithreaded Algorithms for Pricing American Securities (2000) (0)
- Costs and Benefits of Multithreading with Off-the-Shelf RISC Processors (0)
- On Parallel Models of Computation (2007) (0)
- Mcgraw and Et Al. Sisal: Streams and Iteration in a Single Assignment Language| Language Reference Manual Version 1.2. Technical Report M-146 (1991) (0)
- The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining (2015) (0)
- Concurrent Execution of Heterogeneous Threads in the Super-Actor Machine (1994) (0)
- Special issue on compilers, architecture, and synthesis for embedded systems (2003) (0)
- Center for Programming Models for Scalable Parallel Computing: Future Programming Models (2008) (0)
- Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the IBM Cyclops-64 (2007) (0)
- A New Cache Protocol Based On The Order Free Consistency Memory Model (2008) (0)
- The Benefits of Hardware-Assisted Fine-Grain Multithreading (2012) (0)
- Algorithms, Applications, and Environments for Emerging Petascale Architectures (2006) (0)
- Editorial for the special issue on innovations in supercomputing techniques (2019) (0)
- Source Program Structure and Notation (1991) (0)
- Madd Operation Aware Redundancy Elimination (2005) (0)
- Minimizing Buffer Requirements under Rate-optimal Schedule in Regular Dataflow Networks (1994) (0)
- The Feasibility of Adaptive Unstructured Computations on Petaflops Systems (2013) (0)
- High Performance Computing (2003) (0)
- [90] G. C. Sih and E.A. Lee, “A Compile-Time Scheduling Heuristic for Interconnection-Constrained Heterogeneous (1997) (0)
- Basic Pipelined Code Mapping Schemes (1991) (0)
- A Framework for Resource Aware Multithreading (2014) (0)
- Hybrid Technology Multithreaded Architecture (1996) (0)
- Verification of the Extended Roofline Model for Asynchronous Many Task Runtimes (2017) (0)
- Multigrain Parallelism: Compiling Coarse-Grain Parallel Programs for Fine-Grain Execution (2016) (0)
- A pipelined code mapping scheme for tridiagonal linear equation systems (1987) (0)
- FAME: Financial Application with Many-core-on-a-chip architecturE (2007) (0)
What Schools Are Affiliated With Guang Rong Gao?
Guang Rong Gao is affiliated with the following schools: