|Institution:||University of Michigan|
|Department:||Computer Science & Engineering|
|Keywords:||Chip Multiprocessor Cache Memory Systems; Computer Science; Electrical Engineering; Engineering|
|Full text PDF:||http://hdl.handle.net/2027.42/64727|
Chip multiprocessors (CMPs) have become virtually ubiquitous due to the increasing impact of power and thermal constraints being placed on processor design, as well as the diminishing returns of building ever more complex uniprocessors. While the number of cores on a chip has increased rapidly, changes to other aspects of system design have been slower in coming. Namely, the on-chip memory hierarchy has largely been unchanged despite the shift to multicore designs. The last level of cache on chip is generally shared by all hardware threads on the chip, creating a ripe environment for resource allocation problems. This dissertation examines cache resource allocation in large-scale chip multiprocessors. It begins by performing extensive supporting research in the area of shared cache metric analysis, concluding that there is no single optimal shared cache design metric. This result supports the idea that shared caches ought not explicitly attempt to achieve optimal partitions; rather they should only react when unfavorable performance is detected. This study is followed by some studies using machine learning analysis to extract salient characteristics in predicting poor shared cache performance. The culmination of this dissertation is a shared cache management framework called SLAM (Scalable, Lightweight, Adaptive Management). SLAM is a scalable and feasible mechanism which detects inefficiency of cache usage by hardware threads sharing the cache. An inefficient thread can be easily punished by effectively reducing its cache occupancy via a modified cache insertion policy. The crux of SLAM is the detection of inefficient threads, which relies on two novel performance monitors in the cache which stem from the results of the machine learning studies: the Misses Per Access Counter (MPAC), and the Relative Insertion Tracker (RIT), which each requires only tens of bits in storage per thread. SLAM not only provides a means for extracting significant performance gains over current cache designs (up to 13.1% improvement), but SLAM also provides a means for granting differentiated quality of service to various cache sharers. Particularly as commercialized virtual servers become increasingly common, being able to provide differentiated quality of service at low cost potentially has significant value.