ML p(r)ior | A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps

2016-02-03
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This work describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.
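As a rough illustration of the compute-for-bandwidth trade described above, the sketch below compresses one 128-byte block of 32-bit words before it would be transferred. The abstract does not name a compression algorithm, so the base-plus-delta scheme here is an assumption chosen for brevity, and the function names `compress_block` and `decompress_block` are hypothetical. It is plain host-side Python, not GPU code and not the authors' implementation.

```python
import struct

# Hypothetical helpers: a stand-in base-plus-delta scheme (an assumption;
# the abstract does not say which compression algorithm CABA uses).
_DELTA_FMT = {1: "b", 2: "h"}  # signed 1-byte or 2-byte deltas


def compress_block(values, delta_bytes=1):
    """Encode unsigned 32-bit words as one 4-byte base plus small deltas.

    Returns ("bd", payload) when every delta from values[0] fits in a
    signed integer of delta_bytes bytes, otherwise ("raw", payload).
    """
    raw = struct.pack(f"<{len(values)}I", *values)
    if not values:
        return ("raw", raw)
    base = values[0]
    deltas = [v - base for v in values]
    limit = 1 << (8 * delta_bytes - 1)
    if all(-limit <= d < limit for d in deltas):
        fmt = _DELTA_FMT[delta_bytes]
        return ("bd", struct.pack(f"<I{len(deltas)}{fmt}", base, *deltas))
    return ("raw", raw)  # incompressible under this scheme: send as-is


def decompress_block(kind, payload, count, delta_bytes=1):
    """Invert compress_block for a block of `count` words."""
    if kind == "raw":
        return list(struct.unpack(f"<{count}I", payload))
    fmt = _DELTA_FMT[delta_bytes]
    base, *deltas = struct.unpack(f"<I{count}{fmt}", payload)
    return [base + d for d in deltas]


# A block of 32 nearby values, e.g. consecutive addresses with stride 4.
block = [0x10000000 + 4 * i for i in range(32)]
kind, payload = compress_block(block)
restored = decompress_block(kind, payload, len(block))
```

For this block the payload is 36 bytes (a 4-byte base plus 32 one-byte deltas) instead of 128, at the cost of the extra arithmetic in the two helpers. That extra work is the kind of task the abstract proposes offloading to assist warps on otherwise idle compute units.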

Related Articles

2019-05-21

The last decade has seen a shift in the computer systems industry where heterogeneous computing has …

2019-03-10

Today's systems are overwhelmingly designed to move data to computation. This design choice goes dir…

2019-04-10
1904.04953 | cs.AR

The rapid progress and advancement in electronic chips technology provide a variety of new implement…

2019-03-18

In 2019, the rapid rate at which GPU manufacturers refresh their designs, coupled with their relucta…

2019-04-24

Given its high integration density, high speed, byte addressability, and low standby power, non-vola…

2019-04-07

We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed f…

2018-10-15

The rapidly growing popularity and scale of data-parallel workloads demand a corresponding increase …

2019-03-06

GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, G…

2018-02-11

Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms for…

2019-03-13

Scalable nonvolatile memory DIMMs will finally be commercially available with the release of the Int…

2018-10-30

Heterogeneous systems appear as a viable design alternative for the dark silicon era. In this paradi…

2019-06-05

Recent rapid strides in memory safety tools and hardware have improved software quality and security…

2018-10-08
1810.04201 | cs.DC

In this paper we describe a single-node, double precision FPGA implementation of the Conjugate Gradi…

2019-02-13

This paper presents a novel, high-performance, graphical processing unit-based algorithm for efficie…

2018-12-11
1812.04514 | cs.AR

Modern societies have developed insatiable demands for more computation capabilities. Exploiting imp…