APT Wiki | Jamaica / SreekanthBellur

Project Description

JAM.15 Low-Power Garbage Collection (CCK/JS)*

Garbage collection (GC) is automatic memory management that takes place in Java virtual machines as programs are executing. Since Java is now available on embedded devices, power consumption is becoming a key issue in selecting and tuning GC algorithms. This project will examine a number of standard GC algorithms in the Jikes RVM system running on x86 Linux machines. It should be possible to get estimates of power consumption by reading hardware performance counter values and feeding these to a power modelling tool.

Project aims:

build an infrastructure to measure power consumption of programs in Jikes RVM
evaluate existing GC algorithms in terms of power
consider modifications to GC algorithms to reduce power consumption

Useful links:

Jikes RVM
Rabbit Performance Counter Library
Wattch Power Modelling tool [download] [paper]
The Garbage Collection Page

Project Progress

11 Apr 08

background research on existing GC algorithms (read textbooks, examined code)
experimented with Jikes RVM, various GC algs and benchmarks
Created a profiling framework for capturing benchmark times with diff GC algs, at different heap sizes

Project Objectives

11 Apr 08

adapt GC measurement script to measure cache misses instead of execution time (using oprofile tool on antigua) (done)
check out background material for power modelling (references supplied on 22 Apr 08 - Sreekanth to review these papers)

Project Objectives

22 Apr 08

try all the other DaCapo benchmarks (jython, eclipse etc - to see if they work ok in our framework) (done)
calculate means and stddevs for each set of 10 measurements for each benchmark - graph these (in Excel?) (done)
change perl script to measure number of samples (rather than percent), for JikesRVM process not kernel. (done)
chase up references for relating hardware performance counter values to energy consumption.(Should be useful to include in background research report.) (done)
Jeremy to check background report contents, Sreekanth to draft some ideas for background report. (to do, soon!)

Project Objectives

29 Apr

modify setup to run on the machine mona rather than antigua after kernel bugs+crashes on antigua (done - but still some crashes - need to investigate further, possibly crash is caused by Jikes RVM seg-faults whilst being profiled by opcontrol?)
change perl script to run benchmarks in 2x min heap size, and check for OutOfMemory exceptions after each benchmark run (done)
draft out background report, before weekend, for Jeremy to check (done!)

Project Progress

6 May

completed first draft of background report, minor modifications suggested by jeremy.
on schedule for submitting bg report by Fri 9 May, then 1 month off for exams!

Short-term things to try

6 Jun 08

install Linux on own machine, install oprofile (Now we have abaco.cs.man.ac.uk working ok)
try running N iterations of each benchmark, divide perf counter results by N to get counter results for a single run of a benchmark. (should be less stress on opcontrol profiler) (done)
try larger heap sizes to avoid out-of-memory errors (done)
try interleaving profiled runs of benchmarks with unprofiled runs (to avoid turning counters off and on immediately) (not needed)

Long-term things to try

(jeremy) Work out how to model power from hardware performance counter values (look at Martonosi papers)

Project Objectives

17 Jun 08

monitor 5 different events for each bm (instr retired, L1,2,3 cache misses, ITLB misses) (done - but needs redoing!)
plug these numbers into a power model to find low-power GC algorithms or applications
find alternative benchmarks, more low-power suitable

Project Objectvies

24 Jun 08

edit getStatistics perl script to correctly use different GC algorithms each iteration. ongoing!
investigate cacti - http://quid.hpl.hp.com:9082/cacti/ - for getting cache access energy estimates (done)
write a summary of Cacti and how it works (for thesis) (done)
Still need to work out instruction execution energy and mem access energy.
energy = 1 x num_instrs + 0.5 x (L2_hits+L2_misses) + 15 x(L2_misses)

Problems with this:

no accounting for TLB misses
no proper differentiation between read and write accesses
figures based on Intel XScale, and Cacti approximations, probably not really applicable to Pentium 4... but it's the best we can find at present. At the end of the day, we just want to compare different GC algorithms in this model, we are only comparing against ourselves so only relative differences matter, not absolute values.

Long term goals

set up all parameters for energy estimation formula
plug in each GC algorithm, and work out which GC algorithms are low-power

3 Jul 08

Objectives

re-run getstats script, now with correct GC algorithm changes on each iteration (done)
calculate power estimates using this initial model above (done)
(jeremy) get in touch with someone who might be able to give us time on an XScale Linux box. (not yet)

Rough Thesis chapter plan

introduction - short and interesting description
background - different GC algorithms, low power concerns
infrastructure - Jikes RVM, benchmarks, oprofile statistics gathering, equations for energy estimation (references to other papers in here)
investigation - Report on GC algorithms and benchmarks' energy consumption (lots of graphs or tables in this chapter). Work out trends in energy consumption for various algorithms. Tradeoffs?
extensions - tweak an existing algorithm, or make a new one... show that it has better power characteristics than existing algorithms.
related work (?) what have other people done in this area. (maybe in background chapter instead)
conclusions - very short summary, outline future work

Objectives

8 Jul 08

concentrate on three lowest energy algorithms - marksweep, genms and refcount. Can we justify why these should be low-energy? These three do the least amount of memory copying. (done)
see how energy consumption varies with heap size, for these three GC algorithms. Draw graphs for 1, 1.25,1.5,1.75,2,2.5,3 ... times minimum heap size in which GenMS runs. (done - but heap sizes can be smaller - so redo)
Find min heap size in which MarkSweep and RefCount run, work out what this heapsize is in relation to GenMS's minimum heap size. (do again, for small input DaCapo benchmark execution)
Draw these graphs for benchmarks except eclipse and hsqldb, we have decided these are not suitable ok
use small input size instead of default, more suitable for low-power workloads (Harness -s small Benchmark) ok
choose which GC algorithm we want to tweak... (change source code in MMTk and recompile) thinking about different parameters to change
investigate whether using opcontrol corrupts MMTk timing Harness information (jeremy look at this)

Why is our approach novel?

because we use relative heapsizes, rather than absolute
consider refcount GC algorithm
tweak parameters to GC algorithms, maybe on command line (-X:gc:options)
tweak source code of GC algorithms, recompile MMTk ...

Objectives

15 Jul 08

recalculate min heap times for DaCapo small inputs, for each GC algorithm. Jeremy to dig out script, Sreekanth to re-calibrate on small inputs
re-run benchmarks, regenerate graphs (which look very nice!) (done)
check you can recompile MMTk properly (done)
think about hacking MMTk
1. triggers for garbage collection (all algorithms, in Plan.java) (thinking)
2. relative sizes for each separate Space (all algorithms)
3. Allocator policies - e.g. for MarkSweep, change size classes (done)
4. Ref counting bits - change colour scheme (read paper on this!) (thinking)
evaluate optimizations on one or two benchmarks initially, but eventually we will have to do a full evaluation on all bms

Objectives

12 Aug 08

redraft first half of dissertation, which is almost ready. Then email to js and ck
write up tweaks to GenMS and MarkSweep, evaluate the optimizations (chapter 5)
investigation changing the Jikes RVM GC trigger heuristic, in Plan.java (check email)
(jeremy) think about tweaking Ref count algorithm
(jeremy) categorization of DaCapo benchmarks