Thu9Mar06

Speculation and Transactional Memory subgroup meeting. IT413, 14:30, Thu 9 Mar 2006.

Present

Ian Watson, Ian Rogers, Matt Horsnell, Simon Wilkinson, Jisheng Zhao, Behram Khan, Gavin Brown, Mohammad Ansari, Kostas Nikas, Jeremy Singer

Agenda

  • Gavin presents an overview of value prediction for speculation, based on recent work with Jeremy.

Discussion

(gb) We will see how to apply machine learning to value prediction for speculative method-level parallelism (SMLP). This talk focuses on prediction rather than the low-level mechanisms of SMLP. For reference, though, the mechanism in outline looks like this (diagram on whiteboard).

The main issue is: how accurate can predictions be? Most research quotes program speedup figures, which ultimately rest on prediction accuracy.

We can apply useful concepts from information theory, such as entropy. Entropy is a measure of randomness in a signal. High entropy indicates fewer patterns in a sequence of symbols (imagine a black box emitting 0s and 1s). A zero-entropy sequence is 11111... or 00000...
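
For illustration, here is a minimal Python sketch of this frequency-based entropy measure (the function name is hypothetical, not part of the study):

    from collections import Counter
    from math import log2

    def entropy(trace):
        # Shannon entropy, in bits per symbol, estimated from symbol frequencies.
        counts = Counter(trace)
        n = len(trace)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    entropy([1] * 100)    # 0.0 - no randomness at all
    entropy([0, 1] * 50)  # 1.0 by this frequency-only measure, even though the
                          # alternating pattern is perfectly predictable from history

The second example is exactly why history matters, which motivates conditional entropy.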

We measured the entropy of return values for each method in the SPECjvm98 benchmark suite. In fact we measured entropy taking into account previous elements in the sequence, since such structure allows better predictions. Looking at some history bits may give us context to help determine what the value of the next bit will be. This is the idea of conditional entropy.
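
As a sketch of that measurement (again hypothetical code, not the tooling used for SPECjvm98), the conditional entropy of the next bit given the previous k bits can be estimated from a trace as H(X,Y) - H(X):

    from collections import Counter
    from math import log2

    def conditional_entropy(trace, k):
        # Estimate H(Y|X): entropy of the next bit Y given the previous k bits X.
        contexts = Counter()   # counts of each k-bit history
        joint = Counter()      # counts of (history, next bit) pairs
        for i in range(k, len(trace)):
            x, y = tuple(trace[i - k:i]), trace[i]
            contexts[x] += 1
            joint[(x, y)] += 1
        n = sum(contexts.values())
        h_joint = -sum((c / n) * log2(c / n) for c in joint.values())
        h_x = -sum((c / n) * log2(c / n) for c in contexts.values())
        return h_joint - h_x   # H(Y|X) = H(X,Y) - H(X)

    conditional_entropy([0, 1] * 50, 1)   # ~0: one bit of history makes the
                                          # alternating trace fully predictable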

Fano's bound relates conditional entropy to prediction accuracy. Basically, there is a ceiling on the accuracy of any prediction technique. Consider boolean method return values - each value is either 0 or 1. The conditional entropy of the return value trace will be between 0 (entirely predictable) and 1 (entirely random). Fano's bound says that when the conditional entropy is 1, the best-case predictability is 50%. This makes sense since the best you can hope to do when the outcome is random is to flip a coin! Fano's bound shows that as the conditional entropy decreases, the best-case predictability increases. When the conditional entropy is 0, the outcome is entirely predictable so the best-case predictability is 100%.

Some notation. Y indicates the future: the next bit to be predicted. X indicates the history, which may be n bits long: the context to use as input to the predictor.
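
For the boolean case this ceiling can be computed directly: Fano's inequality gives H(Y|X) <= Hb(Pe), where Hb is the binary entropy function and Pe the prediction error probability, so the best-case accuracy is 1 - Pe for the smallest Pe the inequality allows. A hypothetical sketch, inverting Hb numerically:

    from math import log2

    def binary_entropy(p):
        # Hb(p) = -p*log2(p) - (1-p)*log2(1-p), with Hb(0) = Hb(1) = 0.
        if p in (0.0, 1.0):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def fano_accuracy_ceiling(h_cond):
        # Find, by bisection, the smallest error probability Pe in [0, 0.5]
        # with Hb(Pe) >= h_cond; the accuracy ceiling is then 1 - Pe.
        lo, hi = 0.0, 0.5
        for _ in range(60):
            mid = (lo + hi) / 2
            if binary_entropy(mid) < h_cond:
                lo = mid
            else:
                hi = mid
        return 1.0 - hi

    fano_accuracy_ceiling(1.0)   # 0.5  - coin-flip accuracy at best
    fano_accuracy_ceiling(0.0)   # 1.0  - entirely predictable
    fano_accuracy_ceiling(0.5)   # ~0.89 - intermediate entropy, intermediate ceiling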

Our studies showed how well different existing predictor designs (last value, stride, finite context) performed in relation to this Fano bound on predictability.
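
For reference, minimal sketches of the three predictor families (hypothetical code, not the implementations used in the study); each is driven by calling predict() and then update(actual) on every value in the trace:

    from collections import Counter

    class LastValuePredictor:
        # Predicts that the next value equals the most recently seen value.
        def __init__(self):
            self.last = 0
        def predict(self):
            return self.last
        def update(self, actual):
            self.last = actual

    class StridePredictor:
        # Predicts the last value plus the most recently observed difference.
        def __init__(self):
            self.last, self.stride = 0, 0
        def predict(self):
            return self.last + self.stride
        def update(self, actual):
            self.stride, self.last = actual - self.last, actual

    class FiniteContextPredictor:
        # Predicts the value seen most often after the current k-value history.
        def __init__(self, k):
            self.k, self.history, self.table = k, (), {}
        def predict(self):
            seen = self.table.get(self.history)
            return seen.most_common(1)[0][0] if seen else 0
        def update(self, actual):
            self.table.setdefault(self.history, Counter())[actual] += 1
            self.history = (self.history + (actual,))[-self.k:]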

This is the important point of our work. Everyone else measures predictability of methods relative to a particular predictor. The Fano bound is predictor-independent.

(ir) Did you investigate the range of conditional entropies in your benchmark programs?

(gb) No, but different benchmark programs obviously have methods with different conditional entropies. The methods we used for our investigations had a nice spread of entropy values, but all clearly obeyed the Fano bound.

It's possible to extend this study from boolean method return values to integer types, though this is more difficult.

It's also possible to take into account more history. We observed that increased history (context) reduced the conditional entropy and increased the predictability, using a finite-context-method predictor - which gets very close to the Fano bound.

The Fano bound enables objective assessment of predictor performance. A perfect predictor is not necessarily 100% accurate. This depends on the Fano bound!

(iw) How many samples of method return values do you need to take before you can be confident in your measure of the entropy?

(gb) Conditional entropy is based on a measurement of probability distributions. We try to approximate the probability distribution based on the observations seen so far. (posterior probability assignment -js) The problem gets harder as the number of possible values increases. (read int type, not boolean! -js) The literature all agrees that integer return value prediction is a hard problem! My future work will focus on how to approximate the distribution using machine learning techniques, interpolating the full distribution from the few observations seen so far.
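
One very simple illustration of approximating a distribution from few samples (not the machine learning approach in question) is add-one (Laplace) smoothing, which reserves a little probability mass for values not yet observed; the more possible values there are, the thinner that mass is spread:

    from collections import Counter

    def smoothed_distribution(observations, num_values):
        # Add-one (Laplace) estimate over the values 0 .. num_values-1.
        counts = Counter(observations)
        total = len(observations) + num_values
        return {v: (counts.get(v, 0) + 1) / total for v in range(num_values)}

    smoothed_distribution([3, 3, 7], num_values=16)
    # value 3 gets 3/19, value 7 gets 2/19, every unseen value gets 1/19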

(iw) And these techniques have to be fast enough to do this on-the-fly, if you want to employ them in an adaptive speculation system!

(gb) Yes, we have to learn the probability distribution quickly, and be able to calculate the conditional entropy quickly.

(iw) What about the problem of strides?

(gb) Consider the sequence 1,2,3,4,5,... Then consider the sequence 3,2,5,1,4,... Standard entropy does not consider any kind of relationship between different symbols, even though mathematically the first sequence has far more structure than the second! We need some kind of extra knowledge to see that the first sequence is predictable!
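
A tiny illustration of the point, reusing the hypothetical entropy function sketched earlier: measured over raw symbols, 1,2,3,4,5,... looks maximally random, yet its first differences have zero entropy - differencing is exactly the extra knowledge a stride predictor builds in.

    seq = list(range(1, 101))                       # 1, 2, 3, ..., 100
    entropy(seq)                                    # ~6.64 bits: every symbol is distinct
    entropy([b - a for a, b in zip(seq, seq[1:])])  # 0.0: the differences are all 1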

(ir) What about considering callers - method X may have a much lower entropy if you only consider the times it is called by method Y. (- This is context-sensitive analysis -js)

(iw) Yes, that might explain your interesting phased behaviour of predictability...

(ir) And compiler analysis can help. If the compiler can see that a method's return value is computed early, mis-speculated computations can be squashed early, before the method returns.

(gb) Yes, we only consider method entry/exits as speculation points at present. IanWatson has pointed out other work that uses arbitrary program points (MitosisSpeculativeCompilation for instance! -js)

Next week

(iw) Suggestion to look at the paper Efficient Object-Based Software Transactions (pdf) by Ananian and Rinard, presented at SCOOL '05.
