Journal of Central South University (English Edition)

J. Cent. South Univ. (2012) 19: 193-199

DOI: 10.1007/s11771-012-0991-8

Dynamic thermal management by greedy scheduling algorithm

QU Shuang-xi(屈双喜), ZHANG Min-xuan(张民选), LIU Guang-hui(刘光辉), LIU Tao(刘涛)

School of Computer Science, National University of Defense Technology, Changsha 410073, China

© Central South University Press and Springer-Verlag Berlin Heidelberg 2012

Abstract:

Chip multiprocessors (CMPs) exploit thread-level parallelism, thus increasing performance. However, this comes at the cost of thermal problems: CMPs require more power, creating non-uniform power maps and hotspots. Aiming at this problem, a thread scheduling algorithm, the greedy scheduling algorithm, was proposed to reduce thermal emergencies and to improve throughput. The greedy scheduling algorithm was implemented in the Linux kernel on an Intel Quad-Core system. The experimental results show that the greedy scheduling algorithm can reduce hardware dynamic thermal management (DTM) events by 9.6%-78.5% for various combinations of workloads, and achieves an average of 5.2% and up to 9.7% higher throughput than the Linux standard scheduler.

Key words:

greedy scheduling algorithm; chip multiprocessor; thermal-aware

1 Introduction

Chip multiprocessors (CMPs) have become the main trend in new-generation processors. However, shrinking device geometry coupled with a high level of integration and power dissipation has led to high power density and temperature [1]. High temperatures reduce the reliability of the chip and significantly impact its performance. Moreover, leakage power increases exponentially with operating temperature, and increasing leakage power can further raise the temperature, resulting in thermal runaway [2]. The large spatial and temporal variation of chip temperature also creates great challenges for cooling and packaging, which, for the sake of cost-effectiveness [3], are designed for typical thermal conditions. Leveraging dynamic thermal management (DTM) to regulate chip temperature at runtime is therefore necessary.

Many hardware- and software-based dynamic thermal management (DTM) techniques have been proposed. Hardware-based DTM (HDTM) techniques, such as dynamic voltage and frequency scaling (DVFS) and clock gating, which are applied in modern processors, start to control the temperature only after the current temperature reaches the critical threshold [2, 4-9]. However, the benefit of this class of techniques is limited, since they usually cause high performance overhead. In addition to the hardware-level techniques, software-based DTM (SDTM) techniques, such as thread migration, leverage the temperature variations between different jobs and swap them at appropriate times to control the chip temperature [10-22]. This work continues this direction of research.

In this work, a greedy scheduling algorithm (GSA), a thread scheduling algorithm for CMPs, is proposed to minimize the performance degradation caused by DTM thermal emergencies (the peak temperature of a core overshooting the threshold) and to improve reliability, with only trivial overhead introduced. GSA is based on the observation that more heat is dissipated from the processor if the processor reaches a high temperature level earlier, because the heat dissipation rate is proportional to the temperature difference between the processor and the environment. Thus, the objective can be achieved by letting the hot job (the job that will induce high temperature) be executed earlier. GSA therefore selects the hottest job to be executed at each scheduling interval, as long as the hot job itself does not induce a thermal emergency.

To the best of our knowledge, no prior work has proposed a similar algorithm and implemented it in a real Quad-Core system. The primary contributions of this work can be summarized as: 1) A new thread scheduling algorithm, GSA, is introduced; 2) GSA is implemented on an Intel Quad-Core Q8300 processor system; 3) Most importantly, no additional hardware unit is required, so this thermal-aware scheduling algorithm is seamlessly applicable to any multicore environment in real-world CMP products.

2 Related work

Many advanced DTM studies [12-13, 16-17, 23-25] have been conducted to provide thermal fairness and reduce the peak temperature through thermal-aware thread migration schemes.

In Ref. [12], a suite of DTM techniques, job migration policies, and control granularities are jointly investigated to maximize chip throughput. Thermal management through workload scheduling has been studied in various scenarios. In CMPs, the "Heat-and-Run" technique performs thread assignment and migration to balance the chip temperature at runtime [16]. Although these works use performance-counter-based information to determine migration, such information is not a direct representation of thermal behavior.

In Ref. [17], YANG et al developed a heuristic scheduling algorithm, ThreshHot, to alleviate the thermal pressure of a processor. ThreshHot is based on the observation that, given two jobs, one hot and one cool, executing the hot job before the cool one results in a lower final temperature than executing them in the reverse order. However, this work targets a single-core architecture. Moreover, it is difficult to prove whether the heuristic remains locally optimal on a CMP.

A temperature-aware scheduler based on thermal behavior grouping in multicore systems is proposed in Ref. [25]. The steady-state temperature (Tss) is used as a factor to classify applications according to their thermal behavior. The proposed scheduler selects, as the migration target, the core that takes the longest time to reach the temperature threshold instead of the coolest core. However, it does not take into account that the temperature depends not only on the job running on the core, but also on the job about to be migrated in.

HybDTM, a methodology for fine-grained, coordinated thermal management using both software (priority scheduling) and hardware (clock gating) techniques, is proposed in Ref. [13]. To estimate the temperature, a regression-based thermal model built on hardware performance counters is used. However, HybDTM cannot effectively reduce over-threshold temperatures without noticeable performance overhead. This is because the real temperature cannot be estimated solely from hardware performance counters, and both priority scheduling and clock gating introduce high performance overhead.

In Ref. [24], the thread with the smaller task size tends to be migrated in preference to other threads, to reduce the performance overhead caused by migration; but task size does not necessarily reflect memory usage, especially cache usage in the core. Moreover, the difference in migration cost between tasks with distinct memory usage is only within 10 ms, and the migration cost mainly comes from task suspension and resumption. Most of the prior works above are based on simulation results and neglect the thermal correlation between cores: the power dissipated by the rest of the chip is assumed to be negligible.

3 Greedy scheduling algorithm

The processor must slow down or even stop when it is too hot, which reduces throughput and utilization and increases response time, thus degrading performance. Maximizing throughput and CPU utilization is the main objective of GSA. The OS periodically interrupts job execution and determines whether different jobs should be swapped in and, if so, which one. By modifying the Linux scheduling algorithm, thermal-aware thread scheduling can be implemented. In every scheduling interval, GSA selects the job whose execution is anticipated to cause the fewest thermal emergencies overall.

3.1 Principle

To keep the temperature from exceeding the threshold, the conventional thermal-aware scheduling algorithm migrates the thread running on a hot core to a cold core, allowing the hot core to cool down. When the processor does not run in a harsh thermal environment and there is a certain number of cold threads, this approach can indeed reduce the average temperature and the number of thermal emergencies. However, when the processor runs in a relatively harsh environment or there are fewer cold threads, this method may achieve good results only at the beginning of the execution. Once all the cold threads have finished and only hot threads are waiting to be scheduled, thermal emergencies will inevitably occur frequently. A sketch map illustrating this problem is shown in Fig. 1.

When the temperature of some cores is close to the threshold, the traditional scheduling strategy migrates the hot thread out of the core, allowing the hot core to cool down rapidly. This strategy may induce thermal emergencies frequently when there is only a small number of cold threads. The main reason for this phenomenon is that the traditional thermal-aware scheduling algorithm does not take full advantage of the temperature 'slack' of the processor (here, slack is defined as the temperature difference between the threshold and the core temperature). Slack can be considered a 'cool resource'. The traditional scheduling algorithm wastes too much slack at the beginning, resulting in a slack shortage when it is really needed.

Fig. 1 Sketch map of temperature response using conventional thermal-aware scheduling and GSA

Different from the traditional scheduling algorithm, GSA selects the hottest thread to run from the beginning, as long as there are no thermal emergencies, and preserves the cold threads for future use. This policy saves as much slack as possible and spends it only when it is really needed.

3.2 Theoretical analysis

There exists a well-known duality between heat transfer and electrical phenomena. This duality provides a convenient basis for an architecture-level thermal model [1]:

C·dT/dt + T/R = P                               (1)

where T is the temperature relative to the ambient air, R and C are the effective vertical thermal resistance and capacitance of the chip, respectively, t is the time and P is the power consumption.

When both sides of Eq. (1) are multiplied by dt and integrated from t0 to t2, Eq. (1) can be expressed as

C·[T(t2) − T(t0)] + (1/R)·∫_{t0}^{t2} T(t)dt = ∫_{t0}^{t2} P(t)dt          (2)

The right-hand side of Eq. (2) is the energy E. Separating the interval (t0, t2) into two parts, (t0, t1) and (t1, t2), with t1∈(t0, t2), Eq. (2) can be written as

C·[T(t2) − T(t0)] + (S1 + S2)/R = E             (3)

where S1 = ∫_{t0}^{t1} T(t)dt and S2 = ∫_{t1}^{t2} T(t)dt represent the two areas under the temperature curve in Fig. 2.

Fig. 2 T curve with time

In Eq. (3), R and C are constants. For a specific scheduling epoch, E can be treated as a constant as well. The objective is to reduce the probability of T(t) ≥ Tthreshold for t∈(t1, t2), so S1, and hence T(t) for t∈(t0, t1), should be as large as possible, as long as there are no thermal emergencies. Thus, GSA can be described as follows: select the hottest thread at each schedule step as long as it causes no thermal emergency. It should be noted that GSA is not a globally optimal algorithm and does not guarantee to minimize the total number of thermal emergencies for a given set of job intervals (in fact, this problem can be shown to be NP-hard [26]).
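The greedy choice can be read directly off Eq. (3). Treating T(t0), T(t2) and E as fixed over the scheduling epoch (a simplifying assumption), the two areas trade off one for one, so enlarging S1 in (t0, t1) shrinks S2 and thus the room for T(t) to sit above the threshold in (t1, t2):

```latex
% Rearranged from Eq. (3), with T(t_0), T(t_2) and E fixed for the epoch:
S_2 = R\,E - R\,C\,\bigl[T(t_2) - T(t_0)\bigr] - S_1,
\qquad
S_1 = \int_{t_0}^{t_1} T(t)\,\mathrm{d}t,\quad
S_2 = \int_{t_1}^{t_2} T(t)\,\mathrm{d}t
```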

In Eq. (1), when dT/dt = 0, the chip reaches its steady-state temperature Tss (Tss = RP). In this case, P denotes the rate of heat generation and dissipation. As P increases, Tss increases as well, which means that the heat dissipation rate is proportional to T. Thus, GSA can be intuitively understood through the fact that more heat is dissipated from the processor if the processor reaches a high temperature level earlier.

3.3 Temperature prediction

To select the thread correctly, GSA must know, at run time, which thread is the hottest one whose execution will not push the temperature past the threshold. This is done by a temperature predictor.

Equation (1) can be written as

dT/dt = b·(Tss − T)                             (4)

where b = 1/(RC) is a thermal parameter dependent on hardware specifications, and Tss is the steady-state temperature of an application. Solving Eq. (4) with T(0)=Tinit and T(∞)=Tss, the following equation is obtained:

T(t) = Tss − (Tss − Tinit)·e^(−bt)                        (5)
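For reference, Eq. (5) follows from Eq. (4) by separation of variables; this is a standard step and introduces no new assumption:

```latex
\int_{T_{\rm init}}^{T(t)} \frac{{\rm d}T}{T_{\rm ss}-T} = \int_{0}^{t} b\,{\rm d}\tau
\;\Longrightarrow\;
\ln\frac{T_{\rm ss}-T_{\rm init}}{T_{\rm ss}-T(t)} = b\,t
\;\Longrightarrow\;
T(t) = T_{\rm ss} - (T_{\rm ss}-T_{\rm init})\,e^{-bt}
```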

The steady-state temperature Tss can be obtained by running each SPEC CPU2000 benchmark on each core until the temperature no longer changes. Using Eq. (5), the thermal parameter b can be calculated by reading the real temperature from the digital thermal sensor (DTS) within a core. Once b and Tss are obtained, the future temperature of each core can be calculated with Eq. (5).
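As an illustration, the predictor reduces to a few lines of C; the function names and the two-sample calibration below are a sketch of ours, not the paper's exact code:

```c
#include <math.h>

/* Eq. (5): exponential approach to the steady-state temperature Tss. */
double predict_temp(double t_ss, double t_init, double b, double t)
{
    return t_ss - (t_ss - t_init) * exp(-b * t);
}

/* Calibrate the thermal parameter b from two DTS readings temp1 and temp2
 * taken at times t1 and t2 (t2 > t1) while the application runs: by Eq. (5)
 * the residual (Tss - T) decays as exp(-b*t), so the log-ratio of the two
 * residuals divided by the elapsed time yields b. */
double calibrate_b(double t_ss, double temp1, double t1,
                   double temp2, double t2)
{
    return log((t_ss - temp1) / (t_ss - temp2)) / (t2 - t1);
}
```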

3.4 Workflow of GSA

In the Intel Core architecture, the digital thermal sensor (DTS) can be accessed through a model-specific register (MSR). Using these registers, the operating temperature on multicore systems can be obtained.
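For concreteness, a minimal user-space sketch of such an access is shown below. It assumes the Linux msr driver is loaded, that IA32_THERM_STATUS (0x19C) carries the digital readout in bits 22:16, and a nominal TjMax of 100 °C; the actual TjMax must be taken from the datasheet of the specific part:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define IA32_THERM_STATUS 0x19C /* per-core thermal status MSR */
#define TJ_MAX 100              /* assumed junction temperature, deg C */

/* Read the DTS of one core via /dev/cpu/<core>/msr (requires root and the
 * msr kernel module). The DTS reports the offset below TjMax, so the
 * absolute temperature is TjMax minus bits 22:16. Returns -1 on error. */
int read_core_temp(int core)
{
    char path[64];
    uint64_t msr;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", core);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (pread(fd, &msr, sizeof(msr), IA32_THERM_STATUS) != sizeof(msr)) {
        close(fd);
        return -1;
    }
    close(fd);
    return TJ_MAX - (int)((msr >> 16) & 0x7F);
}
```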

According to the DTS values and the pre-calculated parameters, the temperature predictor calculates the future temperature of every core at the end of each scheduling interval. All these future temperatures are sent to the GSA scheduler to determine the next thread selection. The workflow of GSA is shown in Fig. 3.

Fig. 3 Workflow of GSA

4 Implementation

4.1 Linux standard scheduler

The Linux OS distinguishes three classes of jobs: interactive jobs, batch jobs and real-time jobs. Real-time jobs are given the highest priorities, while the other two are initialized with the same default priorities. The standard scheduler is designed to balance two opposing goals: response time and throughput. Real-time jobs and interactive jobs such as shell programs must run with a satisfactory response time; batch jobs, on the other hand, need to sustain throughput. Consistent with this, the Linux standard scheduler rarely migrates a process to another core, mainly because an active process builds up per-core state such as TLB entries and cache contents [27]. Only when the workload is noticeably unbalanced does the Linux standard scheduler initiate process migrations.

4.2 Other schedulers

1) Random scheduler

This scheduler randomly selects a job to execute in every scheduling interval. Hot and cool jobs are treated evenly and arbitrarily. However, it is not smart enough to avoid scenarios in which hot jobs are assigned to hot cores continuously when there are many hot jobs, resulting in many thermal emergencies [17].

2) Round-Robin scheduler

The power of each job is distributed evenly over the N cores in time, which also evens out the spatial temperature distribution. However, because the power of a job varies over time, Round-Robin fails to take full advantage of the temperature 'slack' of the processor. Although DTM actions and performance degradation are reduced, the context-switch overhead and the thermal-cycling effect of Round-Robin are large [28].

3) PDTM scheduler

PDTM (predictive dynamic thermal management) determines that migration is necessary when the predicted temperature exceeds the migration threshold. When the current temperature reaches the trigger threshold, the time Δt to reach the migration threshold is calculated. PDTM then calculates the future temperature of the other cores after Δt. The core with the minimum predicted temperature is selected as the new core for migration [29].

4.3 GSA scheduler

The Linux standard scheduler does not take temperature behavior into account. To resolve this issue, the proposed GSA extends the scheduling policy to accommodate temperature behavior as well as workloads in a multicore environment. The GSA thermal-aware policy is designed for batch jobs, and the candidate jobs eligible for thermal scheduling fall within the priority range of batch jobs. This ensures that real-time jobs are scheduled in the same way as before. For interactive jobs, there is no easy way to distinguish them from batch jobs; Linux implements a sophisticated heuristic based on the past behavior of a job to decide whether it should be considered interactive or batch.

At each schedule step, the current temperature of the core is obtained from the DTS at run time. This temperature information is fed into the temperature predictor, along with the pre-calculated parameters, to calculate the future temperature of the core. GSA then selects the thread that will drive the core to the highest temperature still below the threshold. If several threads have predicted temperatures that are very close, the thread on the current core is selected first. There are two reasons for this: one is that the precision of the temperature predictor is limited; the other is that migrating a thread from another core introduces some overhead.

4.4 Algorithm description

GSA is described as Algorithm 1. The variables and constants are defined as follows:

N — the number of threads on the current core waiting to be scheduled;

M — the number of threads on all other cores waiting to be scheduled;

TemPre(i) — predicted temperature of thread i;

SLACK — temperature difference between the threshold and the predicted temperature;

Tthreshold — the temperature that triggers hardware DTM;

Tmigration — migration hysteresis, a temperature difference decided by experience or experimental results.

Algorithm 1: Greedy scheduling algorithm

for every core do
    Tpre ← 0
    SLACK ← 0
    SLACKtemp ← 0
    for i = 1 to N do
        SLACK ← Tthreshold − TemPre(i)
        if SLACK ≤ SLACKtemp then
            SLACKtemp ← SLACK
            candidate_temp ← i
        end if
        if Tpre < TemPre(i) and SLACK > 0 then
            Tpre ← TemPre(i)
            candidate ← i
        end if
    end for
    for j = N+1 to N+M do
        SLACK ← Tthreshold − TemPre(j)
        if (TemPre(j) − Tpre) > Tmigration and SLACK > 0 then
            Tpre ← TemPre(j)
            candidate ← j
        end if
    end for
    if all SLACK ≤ 0 then
        candidate ← candidate_temp
    end if
    select thread(candidate) to run
end for
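A compact C rendering of one selection step is sketched below; the array layout, the constants, and the robustness guards are assumptions made for illustration rather than the kernel patch itself:

```c
#define T_THRESHOLD 70.0 /* assumed hardware-DTM trigger, deg C */
#define T_MIGRATION 2.0  /* assumed migration hysteresis, deg C */

/* One GSA selection step for one core, mirroring Algorithm 1.
 * tem_pre[0..n-1]   : predicted temperatures of threads on this core
 * tem_pre[n..n+m-1] : predicted temperatures of threads on other cores
 * Returns the index of the thread to run next (-1 if there is none). */
int gsa_select(const double *tem_pre, int n, int m)
{
    double t_pre = 0.0, slack_temp = 0.0;
    int candidate = -1, candidate_temp = -1;

    for (int i = 0; i < n; i++) { /* local threads */
        double slack = T_THRESHOLD - tem_pre[i];
        if (slack <= slack_temp || candidate_temp < 0) {
            slack_temp = slack; /* remember the least-slack thread */
            candidate_temp = i; /* as the fallback choice */
        }
        if (slack > 0 && tem_pre[i] > t_pre) {
            t_pre = tem_pre[i]; /* hottest thread still under threshold */
            candidate = i;
        }
    }
    for (int j = n; j < n + m; j++) { /* threads on other cores */
        double slack = T_THRESHOLD - tem_pre[j];
        if (slack > 0 && tem_pre[j] - t_pre > T_MIGRATION) {
            t_pre = tem_pre[j]; /* hot enough to justify migration */
            candidate = j;
        }
    }
    /* If every thread would overshoot the threshold, fall back to the
     * least-slack candidate, as in the final step of Algorithm 1. */
    return candidate >= 0 ? candidate : candidate_temp;
}
```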

5 Experiment results and analysis

Most previous works aimed to reduce the peak temperature and average temperature of the processor. Unlike them, the main objective of this work is to reduce thermal emergencies and improve system throughput. Quantitative measurements of the performance with and without GSA were performed on a Fedora 7 machine with a Q8300 processor. During the experiments, the ambient temperature was kept as close to 26 °C as possible.

The execution of a time quantum is periodically interrupted by the interrupt handler of the kernel, typically once every 1-10 ms. This is when a context switch may happen. We chose to insert our scheduling into this interrupt handler to force a context switch at every thermal scheduling interval. To keep the peak temperature below the threshold, the scheduling interval should not be much longer than the RC time constant of the hottest unit. Previous works assumed 10 ms as the RC constant of the hottest unit on a CMP processor; in this work, the thermal scheduling interval was chosen to be 8 ms.
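In user space, the effect of an 8 ms scheduling tick can be approximated with a POSIX interval timer; the following sketch is an illustration of ours (the paper hooks the kernel's own tick handler instead) that wakes up every 8 ms to run one scheduling pass:

```c
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t tick; /* set once per scheduling interval */

static void on_alarm(int sig)
{
    (void)sig;
    tick = 1;
}

int main(void)
{
    struct sigaction sa = { .sa_handler = on_alarm };
    sigaction(SIGALRM, &sa, NULL);

    /* Fire every 8 ms, matching the chosen thermal scheduling interval. */
    struct itimerval it = {
        .it_interval = { .tv_sec = 0, .tv_usec = 8000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 8000 },
    };
    setitimer(ITIMER_REAL, &it, NULL);

    for (;;) {
        pause(); /* sleep until the next timer signal */
        if (tick) {
            tick = 0;
            /* read the DTS, update predictions, and invoke the
             * GSA selection step here */
        }
    }
}
```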

Twenty-two SPEC CPU2000 benchmarks were run to collect their temperature profiles. According to the thermal profiles, the programs can be classified into three groups: hot, warm, and cool. The classification of these programs is listed in Table 1.

Table 1 Benchmark classification

To evaluate the proposed GSA, different combinations of workloads were run and the results were compared with the Linux base scheduler, the Random scheduler, the Round-Robin scheduler and the PDTM scheduler. The results show that GSA outperforms the other policies in minimizing thermal emergencies as well as in improving throughput. The workload combinations are listed in Table 2.

Table 2 Workload combinations

Figure 4 shows the number of thermal emergencies for the different workloads when executed under the five schedulers: the Linux base scheduler, the Random scheduler, the Round-Robin scheduler, the PDTM scheduler and the GSA scheduler. The results are normalized to the Linux base scheduler.

In all workloads, the GSA scheduler outperforms the other schedulers, reducing thermal emergencies by 9.6%-78.5%, 41% on average. When there are many cold tasks, the Random scheduler can effectively reduce thermal emergencies by randomly selecting threads to execute, because random selection yields a more uniform heat distribution to a certain extent. But when there are few cold tasks, the overall operating temperature is higher, and if two hot tasks are executed back to back, a thermal emergency is likely to appear. The PDTM scheduler performs better than the Random scheduler, especially when the number of hot tasks increases. However, unlike GSA, none of these schedulers takes full advantage of the temperature 'slack'.

Fig. 4 Thermal emergencies of different schedulers, normalized to the Linux base scheduler

Fig. 5 Evaluation of performance improvement of different schedulers

Figure 5 shows the performance improvements of the schedulers relative to the Linux base scheduler. GSA obtains an average of 5.2% and up to 9.7% higher throughput than the Linux standard scheduler. It can be seen from Fig. 5 that the maximum benefit of the thermal-aware schedulers is achieved neither for the coldest workload nor for the hottest one. This is not surprising: when there are many cold tasks, thermal emergencies rarely occur and a thermal-aware scheduler has little effect; when there are many hot tasks, the optimization space is very limited because the chip is simply too hot.

6 Conclusions

A thermal-aware greedy scheduling algorithm (GSA) was proposed for chip multiprocessors and implemented on a Fedora 7 machine with a Quad-Core Q8300 processor. It is demonstrated that GSA reduces thermal emergencies and improves system throughput compared with the base Linux scheduler and several other thermal-aware schedulers. The reason is that GSA raises the operating temperature as quickly as possible and keeps it just below the threshold, which speeds up heat removal from the processor. No additional hardware unit is required for the GSA scheduler.

References

[1] SKADRON K, STAN M R, HUANG W, VELUSAMY S, SANKARANARAYANAN K, TARJAN D. Temperature-aware microarchitecture [C]// ISCA’03: The thirtieth International Symposium on Computer Architecture. New York: ACM, 2003: 2-13.

[2] BROOKS D, MARTONOSI M. Dynamic thermal management for high-performance microprocessors [C]// HPCA'01: Proceedings of the Seventh International Symposium on High-Performance Computer Architecture. Nuevo León, Mexico: IEEE Computer Society, 2001: 171.

[3] Intel Pentium 4 processor in the 478-pin package thermal design guidelines [R]. Intel Document, 2002: 48.

[4] GUNTHER S, BINNS F, CARMEAN D M, HALL J C. Managing the impact of increasing microprocessor power consumption [J]. Intel Technology Journal, 2001, 5(1): 1-9.

[5] HEO S, BARR K, ASANOVIĆ K. Reducing power density through activity migration [C]// ISLPED'03: The International Symposium on Low Power Electronics and Design. New York: ACM, 2003: 217-222.

[6] SKADRON K. Hybrid architectural dynamic thermal management [C]// DATE'04: Design, Automation and Test in Europe Conference and Exhibition, 2004: 10010.

[7] HANSON H, KECKLER S W, GHIASI S, RAJAMANI K, RAWSON F, RUBIO J. Thermal response to DVFS: Analysis with an Intel Pentium M [C]// ISLPED'07: The International Symposium on Low Power Electronics and Design. New York: ACM, 2007: 219-224.

[8] YAO C, SALUJA K K, RAMANATHAN P. Thermal-aware test scheduling using on-chip temperature sensors [C]// 24th International Conference on VLSI Design. Chennai, India: IEEE, 2011: 376-381.

[9] ZHAO J, DATTA B, BURLESON W, TESSIER R. Thermal-aware voltage droop compensation for multi-core architectures [C]// Proceedings of the 20th Symposium on Great Lakes on VLSI. New York: ACM, 2010: 335-340.

[10] SRINIVASAN J, ADVE S V. Predictive dynamic thermal management for multimedia applications [C]// Proceedings of the 17th annual international conference on Supercomputing. San Francisco: ACM, 2003: 109-120.

[11] CHOI J, CHER C Y, FRANKE H, HAMANN H, WEGER A, BOSE P. Thermal-aware task scheduling at the system software level [C]// ISLPED’07. The International Symposium on Low Power Electronics and Design. New York: ACM, 2007: 213-218.

[12] DONALD J, MARTONOSI M. Techniques for multicore thermal management: Classification and new exploration [C]// ISCA’06: 33rd International Symposium on Computer Architecture. Boston, Massachusetts: IEEE, 2006: 78-88.

[13] KUMAR A, SHANG L, PEH L S, JHA N K. HybDTM: A coordinated hardware-software approach for dynamic thermal management [C]// DAC’06: Design Automation Conference. New York: ACM, 2006: 548-553.

[14] DENG L, DOU Y. Multi-core optimization for conjugate gradient benchmark on heterogeneous processors [J]. Journal of Central South University of Technology, 2011, 18(2): 170-179.

[15] KURSUN E, CHER C Y, BUYUKTOSUNOGLU A, BOSE P. Investigating the effects of task scheduling on thermal behavior [C]// TACS ’06: Workshop on TACS. Citeseer, 2006: 63-72.

[16] GOMAA M, POWELL M D, VIJAYKUMAR T N. Heat-and-run: leveraging SMT and CMP to manage power density through the operating system [C]// ASPLOS-XI. New York: ACM, 2004: 260-270.

[17] YANG J, ZHOU X, CHROBAK M, ZHANG Y, JIN L. Dynamic thermal management through task scheduling [C]// ISPASS’08: IEEE International Symposium on Performance Analysis of Systems and Software. Washington: IEEE, 2008: 191-201.

[18] ABBASI Z, VARSAMOPOULOS G, GUPTA S K S. Thermal aware server provisioning and workload distribution for internet data centers [C]// Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. Chicago, Illinois, 2010: 130-141.

[19] ZHOU X, YANG J, XU Y, ZHANG Y, ZHAO J. Thermal-aware task scheduling for 3D multicore processors [J]. IEEE Transactions on Parallel and Distributed Systems, 2010, 21(1): 60-71.

[20] CHAO C H, JHENG K Y, WANG H Y, WU J C, WU A Y. Traffic- and thermal-aware run-time thermal management scheme for 3D NoC systems [C]// Proceedings of the Fourth ACM/IEEE International Symposium on Networks-on-Chip. Grenoble, France: IEEE, 2010: 223-230.

[21] CUI J, MASKELL D L. High level event driven thermal estimation for thermal aware task allocation and scheduling [C]// Proceedings of the 2010 Asia and South Pacific Design Automation Conference. Taipei, Taiwan, 2010: 793-798.

[22] FISHER N, CHEN J J, WANG S, THIELE L. Thermal-aware global real-time scheduling and analysis on multicore systems [J]. Journal of Systems Architecture, 2010, 57(5): 547-560.

[23] WANG S, BETTATI R. Reactive speed control in temperature-constrained real-time systems [J]. Real-Time Systems, 2008, 39(1): 73-95.

[24] MULAS F, PITTAU M, BUTTU M, CARTA S, ACQUAVIVA A, BENINI L, ATIENZA D, MICHELI G D. Thermal balancing policy for streaming computing on multiprocessor architectures [C]// DATE’08: Design, Automation and Test in Europe. Munich, Germany: IEEE, 2008: 734-739.

[25] YEO I, KIM E J. Temperature-aware scheduler based on thermal behavior grouping in multicore systems [C]// DATE’09: Proceedings of the Conference on Design, Automation and Test in Europe. Leuven, Belgium: European Design and Automation Association, 2009: 946-951.

[26] CHROBAK M, DÜRR C, HURAND M, ROBERT J. Algorithms for temperature-aware task scheduling in microprocessor systems [J]. Algorithmic Aspects in Information and Management, 2008, 5034: 120-130.

[27] BOVET D, CESATI M. Understanding the Linux kernel [M]. Sebastopol: O'Reilly & Associates, Inc, 2002: 136-150.

[28] LI L, ZHOU X, YANG J. ThresHot: An aggressive task scheduling approach in CMP thermal design [J]. Algorithmic Aspects in Information and Management, 2008, 5034: 92-99.

[29] YEO I, LIU C C, KIM E J. Predictive dynamic thermal management for multicore systems [C]// DAC'08: Design Automation Conference. Anaheim: ACM, 2008: 734-739.

(Edited by YANG Bing)

Foundation item: Projects(2009AA01Z124, 2009AA01Z102) supported by the National High Technology Research and Development Program of China; Projects(60970036, 61076025) supported by the National Natural Science Foundation of China

Received date: 2011-05-03; Accepted date: 2011-07-22

Corresponding author: QU Shuang-xi, PhD Candidate; Tel: +86-13574832272; E-mail: shuangxiqu@nudt.edu.cn
