Project Proposal for 15-740/18-740: Computer Architecture

Exploiting Thread Motion on a CMP with Private L1 Caches

Group Members: Athula Balachandran (abalacha@cs.cmu.edu)
Lavanya Subramanian (lsubrama@andrew.cmu.edu)
Project Home Page: http://www.cs.cmu.edu/~abalacha/15740/15740_project.html

Introduction:
Dynamic Voltage and Frequency Scaling (DVFS) is a traditional technique for exploiting run-time variability to conserve power with minimal performance degradation. DVFS is typically applied at OS scheduler intervals. However, recent research shows that application variability is finer-grained and cannot be exploited effectively by performing DVFS at OS scheduler intervals. Applying DVFS at fine-grained intervals incurs a large delay overhead for regulator voltage-level transitions and is practically impossible with off-chip regulators. In this light, [1] proposes a scheme in which different cores are assigned different voltage/performance levels and threads are moved between cores according to the applications' performance requirements. The authors call this mechanism Thread Motion. Our project looks into the challenges and bottlenecks in applying it to a generic chip multiprocessor.

Project Description:
In [1], the authors employ an architecture similar to the Sun ROCK processor, which groups processors into clusters that each share an L1 cache. Migrations performed within a cluster therefore do not suffer the penalty of missing L1 cache data. However, in most chip multiprocessor systems, each processor has a private L1 cache, so we aim to explore the effectiveness of the Thread Motion scheme in this scenario. Specifically, we would like to quantify the performance degradation that results from L1 misses when a migration is performed. We observe that the distinction between intra-cluster and inter-cluster migration does not apply in this scenario.
Migration to a far-off core would also result in increased L2 access latency. We also plan to fine-tune the migration algorithm/strategy to minimize this.
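The two costs described above can be illustrated with a rough back-of-the-envelope model. The sketch below is not part of [1] or of the BLESS simulator; it assumes a 2D mesh of cores, a hypothetical "home" core near which the thread's L2 data resides, and made-up latency parameters (base_cycles, per_hop_cycles). It estimates an upper bound on the cold-miss penalty after a migration and picks, among cores that meet a required frequency level, the one closest to the home core.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Core:
    core_id: int
    x: int             # mesh column (assumed 2D mesh topology)
    y: int             # mesh row
    freq_ghz: float    # hypothetical fixed voltage/frequency level of this core


def hop_distance(a: Core, b: Core) -> int:
    """Manhattan distance between two cores on the assumed 2D mesh."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def l2_latency_cycles(core: Core, home: Core,
                      base_cycles: int = 10, per_hop_cycles: int = 2) -> int:
    """L2 access latency seen from `core` when the data lives near `home`.

    base_cycles and per_hop_cycles are illustrative placeholders, not measured values.
    """
    return base_cycles + per_hop_cycles * hop_distance(core, home)


def cold_miss_penalty_cycles(working_set_lines: int, latency_cycles: int) -> int:
    """Upper bound: every line of the L1 working set misses once after migration."""
    return working_set_lines * latency_cycles


def pick_target(candidates: Iterable[Core], home: Core,
                min_freq_ghz: float) -> Optional[Core]:
    """Among cores fast enough for the thread, choose the one nearest to `home`."""
    eligible = [c for c in candidates if c.freq_ghz >= min_freq_ghz]
    if not eligible:
        return None
    # Ties on distance are broken by core_id so the choice is deterministic.
    return min(eligible, key=lambda c: (hop_distance(c, home), c.core_id))


if __name__ == "__main__":
    cores = [Core(0, 0, 0, 1.0), Core(1, 1, 0, 2.0), Core(2, 3, 3, 2.0)]
    home = cores[0]
    target = pick_target(cores, home, min_freq_ghz=2.0)
    latency = l2_latency_cycles(target, home)
    print(target.core_id, latency, cold_miss_penalty_cycles(512, latency))
```

A model like this only bounds the cost; the actual degradation depends on how much of the working set is reused after migration, which is what the simulator-based evaluation is meant to measure.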

75% Goal : Preliminary Evaluation with Thread Motion Manager implemented
100% Goal : Thorough Performance evaluation with/without migration with private L1 architecture
125% Goal : Algorithm fine tuning and evaluations for L2 access latency minimization

Related Work:
[1] looks at migration at fine-grained intervals, as described above. Previous work performs migration either at OS intervals, as in [2] for process-variation-aware application mapping combined with DVFS, or during thermal hotspots/emergencies, as in [3]. Apart from [1], we are not aware of any work that looks at migration at intervals finer than the OS scheduling interval.

Resources:

BLESS simulator for simulating the chip multiprocessor
SPEC 2006 benchmarks for evaluation
Our own PCs and laptops for carrying out the work

Schedule:

Week 1 : Understanding the BLESS CMP simulator. Identifying changes to BLESS to build a Thread Motion Manager
Week 2, 3 : Implementing and testing the Thread Motion Manager in BLESS
Week 4 : Preliminary Evaluation to compare performance with and without migration
Week 5 : Detailed Throughput evaluations to study and quantify performance degradation with migration
Week 6 : Fine tune algorithms to minimize L2 access latency

We will both work on the design and implementation of the Thread Motion Manager in the simulator and on the evaluation process. Once the design is complete, we may modularize the implementation work and divide it between the two of us.

Milestone: Preliminary Evaluation with Thread Motion Manager implemented

Getting Started: We have gone through the existing literature in this area. We have also collected some of the resources required for the project.

References:

[1] Rangan, K. K., Wei, G., and Brooks, D.
Thread motion: fine-grained power management for multi-core systems.
In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA) '09.
[2] Teodorescu, R. and Torrellas, J.
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors.
In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA) '08.
[3] Coskun, A. K., Strong, R., Tullsen, D. M., and Simunic Rosing, T.
Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors.
In Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) '08.