
Parallelisation

Minutes of Subgroup Meetings

Some thoughts on Parallelisation on Jamaica by CCK.

I am being deliberately provocative here, to try to get us to focus on an important issue. I claim that so far we have not yet done any adaptive parallelisation! Yes - Jisheng has implemented some parallelisation in Jikes and got it to work, and of course Jikes is an adaptive compiler. But I believe that what Jisheng has implemented in Jikes are parallelisation strategies that were worked out for static compilers.

When asked, we always say that, by delaying compilation until the latest possible moment, adaptive compilation can do better optimisation than static compilation. If pressed, in the context of parallelisation we might say things like "e.g. if you compile the code for a method when you know what the parameters are, you can determine with certainty that two array references are distinct, and this enables you to parallelise a loop in a way which is not possible if they might alias". But this requires the compiler to inspect the values of the actual call parameters, and I am fairly sure we don't do such a thing!

This example is, of course, derived from a static optimisation, where the compiler would plant code to do the necessary check, and would then plant code for the two different cases. All that being adaptive in this way would achieve is saving the time to generate the code for the unused branch, and saving the space that such code would occupy. The compiler would need to keep track of which version it had planted, and be prepared to plant the alternative one on a subsequent call for which that check went the other way. It is not so difficult to design a mechanism for this - but the savings over the more obvious static way are not so clear. Using less space for cases never compiled is likely to be only a second-order effect (we try to run benchmarks which don't garbage collect if we can), and the compilation time saved is not easy to boast about (we tend to emphasise the runtime of the compiled code, and rather hide the compilation time, because many of the others who quote benchmark runtimes, with static compilers, do the same). So it is not clear that being adaptive here would be worth doing anyway.
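
For concreteness, here is a minimal Java sketch of the static loop-versioning pattern described above (hypothetical code, not anything our compiler actually plants): the disjointness check and both loop bodies are generated up front, and the adaptive variant would merely defer generating the branch that has not yet been taken. Note that in Java two distinct arrays can never partially overlap, so a reference comparison is a complete disjointness test.

    public class LoopVersioning {
        // a[i] = k * b[i+1]: if a and b are the same array, iteration i reads
        // the element that iteration i+1 overwrites, so running the iterations
        // in parallel is unsafe.
        static void shiftScale(int[] a, int[] b, int k) {
            if (a != b) {
                // Versioned "parallel" loop: iterations are independent.
                java.util.stream.IntStream.range(0, a.length - 1)
                        .parallel()
                        .forEach(i -> a[i] = k * b[i + 1]);
            } else {
                // Fallback: a and b alias, so keep the sequential order.
                for (int i = 0; i < a.length - 1; i++) {
                    a[i] = k * b[i + 1];
                }
            }
        }

        public static void main(String[] args) {
            int[] x = {1, 2, 3, 4};
            shiftScale(x, x.clone(), 2); // distinct arrays: parallel version
            shiftScale(x, x, 2);         // aliasing call: sequential version
        }
    }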

Perhaps I have taken a bad example? Consider instead the question of whether, if it is possible, a loop should be parallelised. Does the compiler assess the loading of the machine at the moment of compilation, and as a result choose between planting parallel and sequential forms of the loop code? I would suggest that, in fact, it doesn't. It plants some sophisticated code which exports parallel work if there are any takers for it, and otherwise executes it locally. (This is pretty much what SLAM was about, a long time ago.) In the absence of a simple measure of machine loading which the compiler can consult, perhaps this is entirely reasonable. But it is not adaptive compilation.
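
A hedged sketch of that export-if-takers behaviour in plain Java (hypothetical code; the real mechanism is planted by the compiler, and this is not the SLAM implementation): a SynchronousQueue hands a chunk of the iteration space over only if an idle worker is already waiting for it, otherwise the loop executes the chunk itself.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.SynchronousQueue;

    public class WorkExport {
        // Hand-off queue: offer() succeeds only when a worker is already
        // waiting to take, modelling "export work if there are any takers".
        static final SynchronousQueue<Runnable> QUEUE = new SynchronousQueue<>();

        static {
            int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
            for (int w = 0; w < workers; w++) {
                Thread t = new Thread(() -> {
                    try {
                        while (true) QUEUE.take().run();
                    } catch (InterruptedException e) {
                        // worker shut down
                    }
                });
                t.setDaemon(true);
                t.start();
            }
        }

        static void squareAll(double[] a) throws InterruptedException {
            int chunk = 1024;
            CountDownLatch done = new CountDownLatch((a.length + chunk - 1) / chunk);
            for (int lo = 0; lo < a.length; lo += chunk) {
                final int from = lo, to = Math.min(lo + chunk, a.length);
                Runnable work = () -> {
                    for (int i = from; i < to; i++) a[i] *= a[i];
                    done.countDown();
                };
                if (!QUEUE.offer(work)) work.run(); // no taker: run it locally
            }
            done.await(); // wait for any exported chunks to complete
        }
    }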

So what do we get out of the adaptivity of Jikes? At present, just the fact that optimisations are only applied to hot methods. Perhaps that is all we are likely to get!

Comments by IR

I agree with Chris. We're in a position to try out adaptive parallelisation now, which is progress, as previously there was no parallelisation phase at all. In the Jikes RVM there are two means of compiling code to take advantage of runtime data:

  1. create code that tests the condition and branches to the optimized code if the condition is met. This is how we perform loop versioning, which is what makes sensible parallelisation possible. Its drawback is that it bloats the code.
  2. create an on-stack replacement (OSR) point that forces recompilation and on-stack replacement if the condition isn't met. This is currently only used for method inlining, as recompilation and replacement are expensive. (Both mechanisms are contrasted in the sketch after this list.)
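
A conceptual Java sketch contrasting the two mechanisms (every name here is a hypothetical stand-in; the real machinery lives inside the optimizing compiler and runtime):

    public class AdaptationStyles {
        // (1) Loop versioning: both bodies exist in the compiled code and the
        //     guard picks one on every call - cheap, but it bloats the code.
        static void versioned(int[] a, int[] b) {
            if (a != b) optimisedBody(a, b);
            else genericBody(a, b);
        }

        // (2) OSR-style: only the optimised body is emitted. If the guard
        //     fails, control returns to the runtime, which recompiles the
        //     method and resumes it in generic code - expensive, but no
        //     duplicate body is kept around.
        static void osrStyle(int[] a, int[] b) {
            if (a == b) throw new DeoptimiseRequest(); // stand-in for an OSR point
            optimisedBody(a, b);
        }

        static class DeoptimiseRequest extends RuntimeException { }
        static void optimisedBody(int[] a, int[] b) { /* parallel version */ }
        static void genericBody(int[] a, int[] b)  { /* sequential version */ }
    }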

I have long argued that the adaptive compiler isn't making sensible optimization plans. When the heat of a method increases it merely ups the optimization level: even if we could determine that a phase had no effect at level 0, we would still re-run the same phase two more times at the higher levels. The organization of phases resembles a diagram in the front of Muchnick's book, but that book doesn't cover the optimisations now within the Jikes RVM. As the bug log will attest, some of the Jikes RVM optimisation phases are broken (importantly for us, the loop unrolling phase).
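
To illustrate the kind of plan I would prefer, here is a hypothetical sketch (none of these names are Jikes RVM API) of an effect-aware planner that remembers which phases were no-ops at a lower level and drops them on recompilation, rather than blindly re-running the whole sequence:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class PhasePlanner {
        // Whether each named phase changed the IR the last time it ran.
        private final Map<String, Boolean> hadEffect = new HashMap<>();

        void record(String phase, boolean changedIR) {
            hadEffect.put(phase, changedIR);
        }

        // Build a plan for recompilation: skip phases observed to do nothing.
        List<String> plan(List<String> allPhases, boolean recompilation) {
            List<String> result = new ArrayList<>();
            for (String phase : allPhases) {
                if (recompilation && Boolean.FALSE.equals(hadEffect.get(phase))) {
                    continue; // a no-op at the lower level: don't re-run it
                }
                result.add(phase);
            }
            return result;
        }
    }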

So it seems we need a multi-pronged attack on parallelisation:

  1. better optimisations - handle more cases of loops that a static compiler can optimise, as well as speculative parallelisations
  2. sampling - an adaptive sampling network which will try to gather statistics on aliasing that can be used to drive the parallelising compilation better (a sketch of such profiling follows this list)
  3. better optimization planning - Jisheng's MSc looked at one technique for this but was also limited by the framework
  4. hardware support - in particular speculation requires hardware support
  5. compiler support - if we decide to use on-stack replacement then I believe it needs implementing
  6. compilation cost - over time, on a benchmark that runs forever, compilation time as a fraction of total runtime tends toward 0. For this reason I usually think of compilation cost as a secondary concern. For short-running benchmarks, parallelising the compiler itself may reduce the compilation cost. Again, better optimization planning is going to help out.
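
As a concrete (and entirely hypothetical) illustration of point 2, a sampled alias profile might look like the following: instrumented code occasionally records whether two references reaching a candidate loop are the same object, and the recompilation policy speculates "no alias" only once enough samples have shown no aliasing at all.

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicLong;

    public class AliasProfile {
        private final AtomicLong samples = new AtomicLong();
        private final AtomicLong aliased = new AtomicLong();

        // Called from instrumented code at a candidate loop's entry;
        // samples roughly 1 in 64 executions to keep the overhead low.
        void sample(Object a, Object b) {
            if (ThreadLocalRandom.current().nextInt(64) != 0) return;
            samples.incrementAndGet();
            if (a == b) aliased.incrementAndGet();
        }

        // The recompiler might speculate "no alias" (and plant a versioned
        // parallel loop) once enough samples show no aliasing.
        boolean speculateNoAlias() {
            return samples.get() >= 100 && aliased.get() == 0;
        }
    }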