
Efficient hardware/software co-designed schemes for low-power processors

by Pedro López Muñoz




Institution: Universitat Politècnica de Catalunya
Year: 2014
Record ID: 1126876
Full text PDF: http://hdl.handle.net/10803/144619


Abstract

We are reaching a point where single-thread performance can be improved further only at the expense of significantly higher power consumption. Industry and the scientific community have therefore adopted multi-core chips as a proven way to improve performance within a limited power budget. However, the number of cores that can be integrated on a single die is limited by its area and power constraints, so the thread-level parallelism (TLP) that can be exploited is limited as well. One way to keep increasing the core count is to reduce the complexity of each individual core, at the cost of sacrificing instruction-level parallelism (ILP). This is a design trade-off: the available die area can be spent on many simple cores, favoring TLP, or on fewer, more complex cores, favoring ILP.

Among the solutions already studied in the literature for this challenge, we selected hybrid hardware/software co-designed processors. This approach provides high single-thread performance on simple low-power cores through a software dynamic binary optimizer tightly coupled with the underlying hardware. For this reason, we believe that hardware/software co-designed processors deserve special attention in the design of multi-core systems: they allow many simple cores, well suited to maximizing TLP, while sustaining better ILP than conventional pure-hardware approaches. In particular, this thesis explores three techniques that address some of the most relevant challenges in the design of a simple low-power hardware/software co-designed processor.

The first technique is a profiling mechanism, named LIU Profiler, that detects hot code regions. It consists of a small hardware table with a novel replacement policy aimed at detecting hot code. This simple hardware structure allows the software to apply heuristics when building code regions and applying optimizations. The LIU Profiler achieves 85.5% code coverage detection, whereas similar profilers with traditional replacement policies reach up to 60% coverage while requiring a table 4x larger. Moreover, the LIU Profiler increases the total area of a simple low-power processor by only 1% and consumes less than 0.87% of the total processor power. It thus enables better single-thread performance without significantly increasing the area or power of the processor.

The second technique is a rollback scheme that supports code reordering and aggressive speculative optimizations on hot code regions. Named HRC, it combines software and hardware mechanisms to checkpoint and recover the architectural register state of the processor. Compared with pure-hardware solutions that require doubling the number of registers, the proposal reduces processor area by 11% and register file power consumption by 24.4%, at the cost of only a 1% performance degradation.

The third technique is a loop…