AdaptiveDynamicTransactionalCoherenceAndConsistency

In our quest to produce a general purpose parallelising virtual machine, it is clear (to me anyway) that speculative parallelisation is important. At one time, I thought that adaptive non-speculative parallelisation would be all that was required and that speculation was unnecessary.

However, I have become more convinced that, given the overheads of doing complex compiler analysis within a dynamic compilation system, this may not be the best way forward. Instead, the best way to use adaptive dynamic compilation may be to regard it as a way of tuning a speculatively parallelised program to ensure minimal runtime conflict.

One way forward is to use an approach similar to previous hardware approaches (Hydra etc.) but to couple this with a dynamic compilation system that will monitor runtime conflicts, deal with thread squashing and restarting, and adaptively alter parallelisation strategies where feasible. This will require the inclusion of hardware support for speculative memory using the sort of modified writeback protocols used by previous attempts. Matt is currently investigating the potential of this approach.
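
The shape of such a monitor might look something like the Java sketch below. All of the names here (ConflictMonitor, recordSquash, retune) are invented for illustration, and the squash notification would in reality come from the hardware's speculative-memory protocol and feed the VM's dynamic compiler, not a plain software call.

```java
// Hypothetical sketch of an adaptive conflict monitor inside the runtime.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConflictMonitor {
    // Squashes observed per speculatively parallelised region.
    private final Map<String, Integer> squashes = new ConcurrentHashMap<>();
    private static final int THRESHOLD = 8; // retune after this many squashes

    // Called by the runtime when the hardware reports a speculation violation.
    void recordSquash(String regionId) {
        int count = squashes.merge(regionId, 1, Integer::sum);
        if (count > THRESHOLD) {
            retune(regionId);
            squashes.put(regionId, 0);  // start counting afresh after retuning
        }
    }

    // Adaptively alter the parallelisation strategy: e.g. recompile the
    // region with coarser work units, or fall back to sequential execution.
    private void retune(String regionId) {
        System.out.println("Recompiling " + regionId + " more conservatively");
    }
}
```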

However, since Hydra, the Stanford group has moved on to investigate a scheme which they call Transactional Coherence and Consistency - TCC. A problem with the previous approaches is that they rely on a snooping protocol to observe all memory writes, in order to determine whether data that a speculative thread is using has been modified by another parallel thread. If so, the speculative thread is invalid and must be squashed and restarted. This requirement for snooping and identifying critical data, at the level of individual bus transactions, adds significantly to the complexity of the memory system. In addition, it does not sit well with the idea of scalable parallel systems.
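
To make that cost concrete, here is a minimal Java sketch of the bookkeeping that snooping implies: every write observed on the bus must be checked against every speculative thread's read set. The class and method names are invented for illustration only.

```java
// Hypothetical per-thread state in a snooping-based speculative scheme.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SpeculativeThreadState {
    private final Set<Long> readSet = ConcurrentHashMap.newKeySet();
    private volatile boolean squashed = false;

    void recordRead(long address) { readSet.add(address); }

    // Invoked for EVERY write snooped on the bus - this per-transaction
    // filtering is the complexity (and scalability limit) noted above.
    void snoopWrite(long address) {
        if (readSet.contains(address)) {
            squashed = true;   // stale data: this thread must be restarted
        }
    }

    boolean isSquashed() { return squashed; }
}
```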

TCC is a resurrection of the idea of Bulk Synchronous Parallelism, where programs are regarded as collections of parallel threads which operate without altering global data but then commit and synchronise with other threads at the end of their execution (just like database transactions). However, at the commit point it is possible that there is a conflict and that threads may need to be squashed and restarted.
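
A minimal Java sketch of this transactional execution model, with invented names: reads are tracked, writes are buffered locally, and nothing reaches global memory until a successful commit; a conflict detected at commit forces a squash and restart.

```java
// Hypothetical sketch of a TCC-style transaction.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Transaction {
    private final Map<Long, Long> writeBuffer = new HashMap<>();
    private final Set<Long> readSet = new HashSet<>();

    long read(long addr, Map<Long, Long> memory) {
        readSet.add(addr);
        // Reads see this transaction's own buffered writes first.
        return writeBuffer.getOrDefault(addr, memory.getOrDefault(addr, 0L));
    }

    void write(long addr, long value) {
        writeBuffer.put(addr, value);    // no global side effects yet
    }

    // At commit, check whether writes committed by other threads overlap
    // our read set; if so, this transaction's work is invalid.
    boolean tryCommit(Map<Long, Long> memory, Set<Long> othersCommittedWrites) {
        for (long addr : readSet) {
            if (othersCommittedWrites.contains(addr)) {
                return false;            // conflict: squash and restart
            }
        }
        memory.putAll(writeBuffer);      // publish all writes atomically
        return true;
    }
}
```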

In many ways, therefore, the effect is little different from current speculative parallelism. However, current TCC systems propose that this can lead to less complex hardware because it is no longer necessary to deal with data at the level of individual memory transactions. Instead, a thread produces a packet of memory transactions which it needs to commit to main memory when it terminates. Other threads need to observe the contents of this packet to determine if it contains any writes which would invalidate their current work.
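
The difference from snooping can be sketched as follows: the committing thread broadcasts the address set of its write buffer as a single packet, and each other thread intersects that packet with its read set in one step, rather than filtering individual bus writes. Again, all names here are illustrative.

```java
// Hypothetical sketch of commit-packet observation in TCC.
import java.util.Set;

class CommitPacket {
    final int committerId;
    final Set<Long> writtenAddresses;   // address set of the write buffer
    CommitPacket(int committerId, Set<Long> writtenAddresses) {
        this.committerId = committerId;
        this.writtenAddresses = writtenAddresses;
    }
}

class TccThread {
    private final Set<Long> readSet;
    private boolean valid = true;

    TccThread(Set<Long> readSet) { this.readSet = readSet; }

    // Invoked once per committing transaction, not once per memory write:
    // the whole packet is intersected with this thread's read set.
    void observe(CommitPacket packet) {
        for (long addr : packet.writtenAddresses) {
            if (readSet.contains(addr)) {
                valid = false;   // a committed write invalidates our work
                break;
            }
        }
    }

    boolean isValid() { return valid; }
}
```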

I cannot see how this makes the system any more scalable. The transactional commits still need to be checked by each processing core's cache, as each core needs to know whether or not it is still processing a valid transaction; therefore, in a scalable network-based topology, would it not be true to say that each core would need to be informed about the transaction commits? Also, if conflicts are only detected when a transaction commits, there is the distinct possibility that cores will continue processing a transaction that will in all likelihood be squashed. A benefit of snooping-based speculative schemes is that, as soon as the violation is discovered, the thread can be squashed, or restarted from the point of the violation.

Another concern that I personally have with the TCC system is how priorities are maintained. If a small transactional routine keeps committing, how will a large transactional routine that is affected by the small routine get a look-in and manage to commit? I think that TCC is a nice paradigm, but there are too few implementation details to convince me that this is the right way to go. Matt

In the current Stanford proposals, it is envisaged that programmers would be responsible for writing their programs in this transactional style using new constructs provided in a programming language. These may specify either ordered or unordered transactions, depending on the form of the data dependencies.
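
As a hedged illustration (the Stanford proposals are language extensions; the API below is invented), ordered and unordered transactional loops might look something like this in a Java-flavoured setting. The loop bodies here are sequential stand-ins for what would really be speculative parallel execution:

```java
// Hypothetical programmer-facing transactional loop constructs.
import java.util.function.IntConsumer;

class Tcc {
    // Each iteration runs as its own transaction; commit order is free.
    static void unorderedFor(int lo, int hi, IntConsumer body) {
        for (int i = lo; i < hi; i++) body.accept(i);  // sequential stand-in
    }

    // Iterations still run speculatively, but must commit in program
    // order, preserving any cross-iteration data dependencies.
    static void orderedFor(int lo, int hi, IntConsumer body) {
        for (int i = lo; i < hi; i++) body.accept(i);  // sequential stand-in
    }
}

class Example {
    public static void main(String[] args) {
        int[] histogram = new int[16];
        // Unordered: bin increments commute, so any commit order is correct.
        Tcc.unorderedFor(0, 1000, i -> histogram[i % 16]++);
        // Ordered: a prefix sum carries a dependency between iterations;
        // overlooking it and marking this unordered would silently give a
        // wrong answer - exactly the first drawback listed below.
        Tcc.orderedFor(1, histogram.length, i -> histogram[i] += histogram[i - 1]);
        System.out.println(histogram[15]);
    }
}
```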

There are a number of drawbacks to this approach:

  • The onus is on the programmer to produce correct parallel programs (e.g. overlooking a data dependency when specifying ordering can result in an incorrect program).
  • The specification of non-optimal transaction structure can result in poor performance.
  • The memory mechanisms still rely on broadcast.
  • A thread must still do a lot of work to examine the contents of each packet.

What we might like to achieve:

  • A way of identifying (potential) transactions automatically
  • A way of tuning transaction boundaries using adaptive dynamic techniques (see the sketch after this list)
  • A way of identifying the need for only limited multicast of commit information
  • A way of limiting the data which needs to be examined for updates
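
On the second point, a minimal sketch of what adaptive tuning of transaction boundaries might mean in practice: grow transactions while the abort rate stays low (fewer commits, less commit overhead) and shrink them when conflicts start squashing work. The policy, thresholds and names are all invented for illustration; a real system would drive this from the dynamic compiler using hardware conflict feedback.

```java
// Hypothetical abort-rate-driven tuner for transaction boundaries.
class BoundaryTuner {
    private int iterationsPerTransaction = 1;
    private int commits = 0, aborts = 0;

    void recordCommit() { commits++; maybeAdjust(); }
    void recordAbort()  { aborts++;  maybeAdjust(); }

    int transactionSize() { return iterationsPerTransaction; }

    private void maybeAdjust() {
        int total = commits + aborts;
        if (total < 32) return;              // wait for a stable sample
        double abortRate = (double) aborts / total;
        if (abortRate < 0.05) {
            iterationsPerTransaction *= 2;   // conflicts rare: coarsen
        } else if (abortRate > 0.25) {
            // Conflicts frequent: shrink transactions to waste less work.
            iterationsPerTransaction = Math.max(1, iterationsPerTransaction / 2);
        }
        commits = aborts = 0;                // start a fresh sample window
    }
}
```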