
Techniques for low overhead fences and sequential consistency violation recording

by Yue Lu Duan




Institution: University of Illinois – Urbana-Champaign
Department: 0112
Degree: PhD
Year: 2015
Record ID: 2061816
Full text PDF: http://hdl.handle.net/2142/72747


Abstract

Fences are instructions that programmers or compilers insert in the code to prevent the compiler or the hardware from reordering memory accesses [20, 43]. Fences can be expensive because all of the accesses before the fence have to finish (i.e., the loads have to be retired and the writes drained from the write buffer) before any access after the fence can be observed by any other processor. This thesis seeks to reduce fence overhead in relaxed-consistency machines. It first introduces WeeFence, a fence that is very cheap because it allows post-fence accesses to skip it. Such accesses can typically complete and retire before the pre-fence writes have drained from the write buffer. The hardware stalls only when an incorrect reordering of accesses is about to happen. However, WeeFence is difficult to implement because it relies on global state and structures.
This thesis then introduces the Unbalanced Fence, which improves both the performance and the implementability of fences. The Unbalanced Fence starts from a design like WeeFence but without the global state, called a Weak Fence. Since the concurrent execution of multiple Weak Fences induces deadlock, a Weak Fence is combined with conventional fences (i.e., Strong Fences) in the less performance-critical threads. The resulting combinations are called Unbalanced Fence groups. Unbalanced Fences are substantially easier to implement than WeeFence, yet deliver comparable or higher performance.
In programs without sufficient fences, Sequential Consistency Violations (SCVs) can occur; they cause programs to malfunction and are hard to debug. Existing proposals for detecting and recording SCVs are limited in that they end program execution after detecting the first SCV, because the program is now non-SC. Therefore, they cannot be used in production runs. In addition, such proposals rely on expensive hardware.
To address this problem, this thesis introduces SCtame, an architecture for SCV detection and recording that operates non-stop. SCtame reuses some of the techniques of WeeFence and the Unbalanced Fence to detect SCVs. SCtame operates continuously because, after SCV detection and logging, it recovers and resumes execution while retaining SC. Hence, it can be used in production runs. In addition, SCtame is precise in that it identifies only true SCVs, rather than dependence cycles due to false sharing. Finally, SCtame's hardware is not too costly because it is mostly local to each processor and uses known recovery techniques.
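
As a rough illustration of why a conventional fence matters, and of what an SCV looks like, the following Python sketch exhaustively explores a toy machine with per-thread write buffers running the classic store-buffering test (thread 0: x = 1; r0 = y, thread 1: y = 1; r1 = x). The model, the function name sb_outcomes, and the state encoding are assumptions made for this example only; it models a conventional fence that stalls until the write buffer has drained, and it does not model WeeFence, the Unbalanced Fence, or SCtame.

```python
def sb_outcomes(use_fence):
    """Return every reachable (r0, r1) for the store-buffering litmus test.

    Thread 0: x = 1; [fence]; r0 = y
    Thread 1: y = 1; [fence]; r1 = x
    Toy model (an assumption of this sketch): each thread has a FIFO write
    buffer that the hardware may drain to memory at any time.
    """
    programs = []
    for mine, other in (("x", "y"), ("y", "x")):
        prog = [("store", mine)]
        if use_fence:
            prog.append(("fence", None))
        prog.append(("load", other))
        programs.append(prog)

    # State: program counters, per-thread write buffers (variable names;
    # every store writes 1), memory as (x, y), and the two registers.
    index = {"x": 0, "y": 1}
    start = ((0, 0), ((), ()), (0, 0), (None, None))
    seen, stack, finals = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        successors = []
        for t in (0, 1):
            # Hardware step: drain the oldest buffered write to memory.
            if bufs[t]:
                new_mem = list(mem)
                new_mem[index[bufs[t][0]]] = 1
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t][1:]
                successors.append((pcs, tuple(new_bufs), tuple(new_mem), regs))
            # Program step: execute the thread's next instruction, if any.
            if pcs[t] == len(programs[t]):
                continue
            op, var = programs[t][pcs[t]]
            if op == "fence" and bufs[t]:
                continue  # conventional fence: stall until the buffer drains
            new_pcs, new_bufs, new_regs = list(pcs), list(bufs), list(regs)
            new_pcs[t] += 1
            if op == "store":
                new_bufs[t] = bufs[t] + (var,)
            elif op == "load":
                # Forward from the thread's own buffer, else read memory.
                new_regs[t] = 1 if var in bufs[t] else mem[index[var]]
            successors.append(
                (tuple(new_pcs), tuple(new_bufs), mem, tuple(new_regs))
            )
        if successors:
            stack.extend(successors)
        else:
            finals.add(regs)  # both threads finished and buffers are empty
    return finals
```

In this toy model, the outcome (0, 0) is reachable only when use_fence is False. No sequentially consistent interleaving of the two threads produces that outcome, so it is an example of an SCV; with the fences in place, each store must drain before the following load, which also shows where the stall cost of a conventional fence comes from.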