
Record and deterministic replay of parallel programs on multiprocessors

by Nima Honarmand

Institution: University of Illinois – Urbana-Champaign
Department: 0112
Degree: PhD
Year: 2015
Keywords: Record and Replay
Record ID: 2060624
Full text PDF: http://hdl.handle.net/2142/72886


Record and deterministic Replay (RnR) is a primitive with many proposed applications in computer systems, including debugging, security, and fault tolerance. RnR is typically a two-phase process: in the first phase (record), enough information about an execution is logged; this log is then used in the second phase (replay) to re-create the execution. Application-level RnR seeks to record and replay single programs (or sets of programs) in isolation from the rest of the system. In this environment, there are typically two sources of non-determinism that an RnR solution should capture: program inputs (such as the results of the system calls the program makes to the OS or the signals the program receives) and the memory-access interleaving of concurrent threads that results in inter-thread data dependences. In order to enjoy wide acceptance, RnR solutions should be practical to build and, at the same time, enable a diverse range of use cases (such as debugging and security analysis).
Low recording overhead is a key requirement for many use cases of RnR. While software can often record program inputs with low overhead, it can incur significant overhead when recording memory-access interleaving. To reduce this overhead, hardware-assisted RnR techniques have been proposed. The main challenge here is devising hardware mechanisms that are simple enough to be embraced by processor vendors and, at the same time, powerful enough to work for the complex architectures of today.
The first part of this thesis is a step in this direction, i.e., building practical and low-overhead hardware-assisted RnR systems. We focus on the hardware-assisted RnR of parallel programs on multiprocessor machines. First, we introduce QuickRec, the first physical realization of a hardware-assisted RnR system including new hardware and software. The focus of this project is understanding and evaluating the implementation issues of RnR on a real platform.
We demonstrate that RnR can be implemented efficiently on a real multicore Intel Architecture (IA) system. We show that the rate of memory log generation is insignificant, and that the recording hardware has negligible performance overhead, as expected. The evaluations, however, point to the software stack as the major source of overhead (incurring an average recording overhead of nearly 13%), an issue that was largely ignored by previous work on hardware-assisted RnR. We then address the problem of replay speed by introducing Cyrus, an RnR scheme that can record programs and replay them in parallel without making any changes to the cache coherence protocol and messages. The proposal uses a novel hybrid hardware/software mechanism for recording memory-access interleaving. The hardware component records a raw and incomplete log that is then processed and transformed into a complete log by an on-the-fly software Backend Pass. As the raw log is being generated, this pass transforms it for high replay parallelism. The pass can also flexibly trade off replay parallelism for log size. We evaluate Cyrus…
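
The two-phase record/replay idea described in the abstract can be illustrated with a toy sketch. It is not QuickRec or Cyrus and covers only the program-input source of non-determinism, not memory-access interleaving. The names `Recorder`, `Replayer`, and `program` are hypothetical, and a random number generator stands in for non-deterministic inputs such as system call results.

```python
import random


class Recorder:
    """Record phase: call the real non-deterministic source and log each result."""

    def __init__(self, source):
        self.source = source  # hypothetical stand-in for, e.g., a system call
        self.log = []

    def read(self):
        value = self.source()
        self.log.append(value)  # log enough information to re-create the run
        return value


class Replayer:
    """Replay phase: return the logged results instead of calling the source."""

    def __init__(self, log):
        self._entries = iter(log)

    def read(self):
        return next(self._entries)


def program(env, steps=5):
    """A toy single-threaded program whose result depends only on its inputs."""
    total = 0
    for _ in range(steps):
        total = total * 31 + env.read()
    return total


# Record one execution, then replay it from the log alone.
recorder = Recorder(lambda: random.randint(0, 1000))
recorded_result = program(recorder)
replayed_result = program(Replayer(recorder.log))
assert recorded_result == replayed_result
```

Because the program here is single-threaded, logging its inputs is sufficient for deterministic replay. A multithreaded program would additionally need the inter-thread memory-access order to be recorded, which is the part the abstract argues is costly in software.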