AbstractsComputer Science

Optimizing Hierarchical Storage Management For Database System

by Xin Liu




Institution: University of Waterloo
Department:
Year: 2014
Keywords: Database, Storage management, Cache management, Hybrid storage, SSD
Record ID: 2055658
Full text PDF: http://hdl.handle.net/10012/8531


Abstract

Caching is a classical but effective way to improve system performance. To improve system performance, servers, such as database servers and storage servers, contain significant amounts of memory that act as a fast cache. Meanwhile, as new storage devices such as flash-based solid state drives (SSDs) are added to storage systems over time, using the memory cache is not the only way to improve system performance. In this thesis, we address the problems of how to manage the cache of a storage server and how to utilize the SSD in a hybrid storage system. Traditional caching policies are known to perform poorly for storage server caches. One promising approach to solving this problem is to use hints from the storage clients to manage the storage server cache. Previous hinting approaches are ad hoc, in that a predefined reaction to specific types of hints is hard-coded into the caching policy. With ad hoc approaches, it is difficult to ensure that the best hints are being used, and it is difficult to accommodate multiple types of hints and multiple client applications. In this thesis, we propose CLient-Informed Caching (CLIC), a generic hint-based technique for managing storage server caches. CLIC automatically interprets hints generated by storage clients and translates them into a server caching policy. It does this without explicit knowledge of the application-specific hint semantics. We demonstrate using trace-based simulation of database workloads that CLIC outperforms hint-oblivious and state-of-the-art hint-aware caching policies. We also demonstrate that the space required to track and interpret hints is small. SSDs are becoming a part of the storage system. Adding SSD to a storage system not only raises the question of how to manage the SSD, but also raises the question of whether current buffer pool algorithms will still work effectively. We are interested in the use of hybrid storage systems, consisting of SSDs and hard disk drives (HDD), for database management. We present cost-aware replacement algorithms for both the DBMS buffer pool and the SSD. These algorithms are aware of the different I/O performance of HDD and SSD. In such a hybrid storage system, the physical access pattern to the SSD depends on the management of the DBMS buffer pool. We studied the impact of the buffer pool caching policies on the access patterns of the SSD. Based on these studies, we designed a caching policy to effectively manage the SSD. We implemented these algorithms in MySQL's InnoDB storage engine and used the TPC-C workload to demonstrate that these cost-aware algorithms outperform previous algorithms.