Physically Dense Server Architectures.

by Anthony Thomas Gutierrez




Institution: University of Michigan
Department: Computer Science and Engineering
Degree: PhD
Year: 2015
Keywords: Distributed Systems; Computer Science; Engineering
Record ID: 2059143
Full text PDF: http://hdl.handle.net/2027.42/111414


Abstract

Distributed, in-memory key-value stores have emerged as one of today's most important data center workloads. Because they are critical to the scalability of modern web services, vast resources are dedicated to key-value stores to ensure that quality-of-service guarantees are met. These resources include many server racks to store terabytes of key-value data, the power necessary to run all of the machines, networking equipment and bandwidth, and the data center warehouses used to house the racks. There is, however, a mismatch between the key-value store software and the commodity servers on which it is run, leading to inefficient use of resources. The primary cause of inefficiency is the overhead incurred from processing individual network packets, which typically carry small payloads and require minimal compute resources. Thus, one of the key challenges as we enter the exascale era is how best to adjust to the paradigm shift from compute-centric to storage-centric data centers. This dissertation presents a hardware/software solution that addresses the inefficiencies of the modern data centers on which key-value stores are currently deployed. First, it proposes two physical server designs, both of which use 3D-stacking technology and low-power CPUs to improve density and efficiency. The first 3D architecture – Mercury – consists of stacks of low-power CPUs with 3D-stacked DRAM. The second architecture – Iridium – replaces DRAM with 3D NAND Flash to improve density. The second portion of this dissertation proposes an enhanced version of the Mercury server design – called KeyVault – that incorporates integrated, zero-copy network interfaces along with an integrated switching fabric. In order to utilize the integrated networking hardware, as well as reduce the response time of requests, a custom networking protocol is proposed.
Unlike prior work on accelerating key-value stores – e.g., by completely bypassing the CPU and OS when processing requests – this work bypasses the CPU and OS only when placing network payloads into a process's memory. The insight behind this is that because most of the overhead comes from processing packets in the OS kernel – and not the request processing itself – direct placement of a packet's payload is sufficient to provide higher throughput and lower latency than prior approaches.