Heterogeneous-Reliability Memory: Exploiting Application-Level Memory Error Tolerance

2016-02-01
This paper summarizes our work on characterizing application memory error vulnerability to optimize datacenter cost via Heterogeneous-Reliability Memory (HRM), which was published in DSN 2014, and examines the work's significance and future potential. Memory devices represent a key component of datacenter total cost of ownership (TCO), and techniques used to reduce errors that occur on these devices increase this cost. Existing approaches to providing reliability for memory devices pessimistically treat all data as equally vulnerable to memory errors. Our key insight is that there exists a diverse spectrum of tolerance to memory errors in new data-intensive applications, and that traditional one-size-fits-all memory reliability techniques are inefficient in terms of cost. This presents an opportunity to greatly reduce server hardware cost by provisioning the right amount of memory reliability for different applications. Toward this end, in our DSN 2014 paper, we make three main contributions to enable highly reliable servers at low datacenter cost. First, we develop a new methodology to quantify the tolerance of applications to memory errors. Second, using our methodology, we perform a case study of three new data-intensive workloads (an interactive web search application, an in-memory key-value store, and a graph mining framework) to identify new insights into the nature of application memory error vulnerability. Third, based on our insights, we propose several new hardware/software heterogeneous-reliability memory system designs that lower datacenter cost while achieving high reliability, and we discuss their trade-offs. We show that our new techniques can reduce server hardware cost by 4.7% while achieving 99.90% single-server availability.
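The measurement methodology itself is described in the DSN 2014 paper and is not reproduced here. As a rough illustration of the underlying idea, that different data within one application can tolerate memory errors very differently, the sketch below injects single-bit flips into two regions of a toy workload and classifies each run as masked (same output as an error-free run), incorrect, or crash. The workload, the region names `index` and `scores`, and the `run_app`/`flip_bit`/`inject` functions are hypothetical choices made for illustration, not the authors' framework.

```python
import random
from collections import Counter


def run_app(index: bytearray, scores: bytearray, threshold: int = 128) -> int:
    """Hypothetical toy workload: count indexed items whose score meets a threshold."""
    hits = 0
    for i in index:
        # An out-of-range index raises IndexError, which stands in for a crash.
        if scores[i] >= threshold:
            hits += 1
    return hits


def flip_bit(buf: bytearray, bit: int) -> None:
    """Flip one bit of buf in place, modeling a single-bit memory error."""
    buf[bit // 8] ^= 1 << (bit % 8)


def inject(region: str, trials: int, seed: int = 0) -> Counter:
    """Inject one random bit flip per trial into the named region ("index" or "scores")."""
    rng = random.Random(seed)
    base_index = bytearray(range(16))  # 16 valid indices, 0..15
    base_scores = bytearray(rng.randrange(256) for _ in range(16))
    golden = run_app(base_index, base_scores)  # error-free reference output
    outcomes = Counter()
    for _ in range(trials):
        # Work on fresh copies so each trial sees exactly one error.
        index, scores = bytearray(base_index), bytearray(base_scores)
        target = index if region == "index" else scores
        flip_bit(target, rng.randrange(len(target) * 8))
        try:
            result = run_app(index, scores)
        except IndexError:
            outcomes["crash"] += 1
            continue
        outcomes["masked" if result == golden else "incorrect"] += 1
    return outcomes


if __name__ == "__main__":
    print("index: ", inject("index", 1000))
    print("scores:", inject("scores", 1000))
```

In this toy, any flip in the upper four bits of an index pushes it out of range and crashes, whereas a flip in a score changes the result only when it hits the top bit and crosses the threshold, so most score errors are masked and none crash. Under these assumptions, `scores`-like data is the kind of data that could tolerate weaker (cheaper) protection, while `index`-like data would warrant stronger protection.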
