APT Wiki | Jamaica / APlanForSpeculationResearch

As discussed in the project meeting of 17/11/05, it is probably a good idea if we examine the whole area of speculative parallelism and formulate a more coherent plan of action to investigate those areas in which we have interest. This is therefore an attempt to identify the various areas which we should be thinking about. It is initially intended as input to a meeting proposed for 24/11/05.

Motivation

A major theme of the Jamaica project, for the last 3 or so years, has been the use of dynamic compilation to exploit parallelism in chip multiprocessors. Initially we (I?) thought that the right approach was to perform non-speculative parallelisation wherever possible and resort only to speculation when that became impossible. However, there are some strong reasons why this view may need a re-think.

The complexity and hence cost of doing the analysis necessary to identify and extract non-speculative parallelism from complex programs may be too great to make it cost-effective in a dynamic compilation system.
The either/or view is too simplistic. It may be productive to think of parallelisation having a probability of correctness. Any program which can be parallelised with a known correctness probability of 1 could run just as efficently on a speculative system if effective parallelisation decisions can be taken with much less analysis (as long as the costs of supporting speculation do not intrude when the mechanisms are not required). Of course probability of correctness and effectiveness of parallelisation are not the same thing at all. It is very easy to produce a fully correct program with no effective parallelisation!
Although dynamic compilation may be useful in tuning both parallelisation techniques and run-time scheduling in a non-speculative system, it seems likely that it will have a major role in the analysis and tuning of speculative systems.

The Field

Any speculative system will have the following components

A mechanism for deciding where, in the original source program, it may be appropriate to fork parallel threads. This will normally involve one non-speculative thread and one or more speculative threads. It will also involve the specification of join points where the threads will converge.
A mechanism for storing (for each speculative thread) speculative state until it is known that the thread has become non-speculative.
A mechanism for committing speculative state when the thread becomes non-speculative.
A mechanism detecting that the commit of one thread's state has interfered with the execution of a still speculative thread.
A mechanism for abandoning some or all of a speculative thread's state in the presence of interference.
A mechanism for restarting threads (either completely or with partial rollback) when state has been discarded due to interference.

Many systems will have refinements of or additions, but it is thought that the above captures the essence of speculation for parallelism.

It is possible to implement all the above in software. However, it is almost certainly the case that hardware support is needed, for at least some of the above mechanisms, to achieve any degree of performance. Many proposed systems use software only for 1 and regard the rest as largely hardware functions.

However, the use of a dynamic compilation virtual machine may mean that considerably more flexibility could be achieved by performing some of 5 & 6 in software.

Different Approaches

[Matt has compiled a fairly complete list of active groups in this area of research, along with links to some of the more interesting papers they have published. He will be adding his personal thoughts on TLS on these pages. If I have missed a group/project please feel free to amend. Matt]

Matt's survey is comprehensive and I will not duplicate it here. Instead I will add my thoughts on those areas where I feel there might be scope for worthwhile future work.

Random Thoughts

The first major difference is between systems which perfom and monitor memory operations at the individual memory operation level and those that operate in terms of bulk transactions (TCC).

There are quite a few proposals for various sorts of hardware support the individual memory operation approach. Many of these involve modification of bus based cache coherence protocols. Bus snooping is the usual mechanism for observing conflicts.

Less detail is available on hardware to support TCC. However, the basis of the support involves hardware buffers to keep speculative writes and mechanisms for each processor to scan commit packets to see if there are conflicts. Hardware support for arbitration between threads wanting to commit seems also to be assumed. It is not entirely clear to me whether the operations concerned with abandoning and restarting of threads is all provided in hardware or not.

There are several descriptions of compiler support for automatic (both loop and Method) speculative parallelisation. However, most (all?) of these are done by static analysis. (Despite the title Jrpm is a JIT not a dynamic compilation system).

Currently it appears that the TCC approach provides minimal software support. All published results so far use parallel applications written either with barrier synchronisation or Java serialisation; a simple apporach is taken to declaring the basic units of these programs as transactions. There is also a suggestion that programmers might write their programs explicitly in terms of transactions.

Clearly we don't want to re-invent the wheel although there is probably some 'catching up' work that we need to do before we can pursue new research. I feel that the general area in which we ought to be seeking to make a contribution is the use of a dynamic compilation virtual machine to support speculative parallelisation and the subsequent run time behaviour. This doesn't mean that all the work is in the software area, it seems likely that some hardware functions could be simplified significantly by software support while extra support may be needed to provide the information on which the VM can operate.

I think there is scope to pursue both the 'conventional' speculation approach and the TCC approach. Clearly the latter area has more virgin territory but there are still a lot of interesting areas to investigate in the former. Therefore, I think that we work we need to do falls broadly into four obvious areas, although there are clearly several sub areas in most of these.

Hardware support for TLS
Dynamic software support for TLS
Hardware support for TCC
Dynamic software support for TCC

Matt's page MattHorsnell discusses the TLS area in much more detail. I was hoping to flesh out my thoughts on the TCC area. I started to do this in AdaptiveDynamicTransactionalCoherenceAndConsistency but it is incomplete.

---

Software Support for Speculation

[ Jeremy's thoughts ] As Matt and IanW have already said, we need enough novelty to distinguish our research from existing work in the field. I feel that there is good scope for improving software support for speculation. At present, I think there are two major weaknesses with published speculation schemes:

use of ad-hoc heuristics. These should (and can easily!) be replaced by formal metrics. I think we should apply formalisms such as Entropy ? (from information theory) and ProbabilisticSlicing (combinining probability theory with data flow analysis).
use of static estimates, or at best offline feedback-directed quasi-dynamic estimates of runtime behaviour. With JaVM, it is possible to take measurements while the program is running, and use this information to guide the application of optimizations online. Also in an adaptive runtime system, it will be possible to undo/reapply transformations as the program behaviour varies over time. (No-one has really studied whether program behaviour goes through "phases" in which speculation may or may not be effective. [ though I'm working on this at the moment! ] [ Jeremy ]