
BrainStorming

Lest we forget: a random collection of notes and ideas on (weird) processor architecture things, plus a list of things to do.

Notes

Want static call-tree analysis! Should it work from source code or from object code? Object code is probably easier to parse, although architectures such as ARM (with conditional RETs) may be messier, and contrived CALLs will be a pain.

Extend to include loops as similar structures (+ others?).
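A minimal sketch of the back end of such an analyser, assuming the CALL edges have already been extracted from the object code (the decoding step, including ARM's conditional returns, is not shown). Given caller-to-callee pairs, it computes the maximum static call depth below a root procedure and reports recursion (a cycle in the call graph) instead of following it. The names `call_edge` and `max_call_depth`, the integer procedure IDs, and the `MAX_FUNCS` limit are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 64  /* hypothetical: procedure IDs are 0..MAX_FUNCS-1 */

/* One static CALL site found in the object code: caller -> callee. */
typedef struct {
    int caller;
    int callee;
} call_edge;

/* Depth-first walk; returns -1 if a cycle (recursion) is reachable. */
static int depth_from(int f, const call_edge *edges, size_t n_edges,
                      bool on_path[MAX_FUNCS])
{
    if (on_path[f])
        return -1;                       /* recursion: no static bound */
    on_path[f] = true;
    int best = 0;
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].caller != f)
            continue;
        int d = depth_from(edges[i].callee, edges, n_edges, on_path);
        if (d < 0) {
            on_path[f] = false;
            return -1;
        }
        if (d + 1 > best)
            best = d + 1;
    }
    on_path[f] = false;
    return best;
}

/* Maximum call depth below 'root' (0 = leaf), or -1 if recursive. */
int max_call_depth(int root, const call_edge *edges, size_t n_edges)
{
    bool on_path[MAX_FUNCS] = { false };
    return depth_from(root, edges, n_edges, on_path);
}
```

For example, with edges {0,1}, {0,2}, {1,3}, {2,3}, procedure 0 has depth 2 and leaf 3 has depth 0; the walk is exponential in the worst case, which is acceptable for a sketch but not for large programs.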

The programming model in (e.g.) C discourages, and often forbids, the use of efficient constructs except in hand-made libraries, e.g. LOOP, BOUND and INTO on x86. There are also things which won't be (obviously) visible in compiled code because the programmer has had to do them by hand; array bound checking is one example.
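To illustrate the hand-coded case: x86 BOUND compares a signed index against an inclusive lower/upper pair held in memory and traps if it is outside. In C the same check has to be written out as an ordinary compare-and-branch, so compiled code shows only generic instructions. A minimal sketch; the `bounds` struct and `checked_get` helper are hypothetical, and aborting stands in for the trap.

```c
#include <stdio.h>
#include <stdlib.h>

/* Bounds pair in the order BOUND expects it: lower, then upper
   (both inclusive). */
typedef struct {
    int lower;
    int upper;
} bounds;

/* Hand-written equivalent of "BOUND idx, [b]" followed by the access. */
int checked_get(const int *array, bounds b, int idx)
{
    if (idx < b.lower || idx > b.upper) {   /* BOUND would trap here */
        fprintf(stderr, "index %d outside [%d, %d]\n", idx, b.lower, b.upper);
        abort();
    }
    return array[idx];
}
```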

Object handling. What are the implications?

What is the "working set" of:
 - compiler scratch variables?
 - user-declared variables?

How do we measure it? What does the graph look like???
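One possible measure, borrowed from Denning's working-set definition for paging: at each point t in a trace of variable references, count the distinct variables touched in the last tau references. Plotting that count against t (for a few values of tau) would give the graph. A minimal sketch, assuming the trace is already available as small integer variable IDs; obtaining it from compiled code is the hard part and is not shown. `working_set_sizes` and `MAX_VARS` are hypothetical names.

```c
#include <stddef.h>

#define MAX_VARS 256  /* variable IDs fit in an unsigned char */

/* For each position t, out[t] = number of distinct IDs in
   trace[t-tau+1 .. t] (fewer references at the start of the trace). */
void working_set_sizes(const unsigned char *trace, size_t n, size_t tau,
                       size_t *out)
{
    size_t count[MAX_VARS] = { 0 };   /* occurrences inside the window */
    size_t distinct = 0;

    for (size_t t = 0; t < n; t++) {
        if (count[trace[t]]++ == 0)
            distinct++;               /* ID has just entered the window */
        if (t >= tau) {               /* drop the reference leaving it */
            if (--count[trace[t - tau]] == 0)
                distinct--;
        }
        out[t] = distinct;
    }
}
```

For the trace {1, 2, 1, 3, 3, 4} with tau = 2 this yields 1, 2, 2, 2, 1, 2.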

All sorts of implications for programming style; should the programmer be writing `clean' code in the language or hacking for `efficiency'?

Hardware stacks/register windows have got to be good, but spilling is a pain. Who should choose/set when they can be used? The compiler? The dynamic linker? Dynamic compilation?

Exceptions - both H/W and S/W (like Java?). These won't show up in C, because C doesn't have them.

Ideas

Variable typing; use this to influence instruction decoder. How many types? Sizes?

       integer (signed/unsigned) (8/16/24/32/64/...)
       Boolean
       bit vector (& other vectors, like MMX?)
       float (various)
       base address/pointer
       address offset (index)
       (code address)
       +++

Type can influence which flags/conditions are used; signed/unsigned pairs such as {BVS & BCS} (overflow) and {BHI & BGT} (greater than) coalesce.
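To make the coalescing concrete: after a compare, ARM-style NZCV flags are set once, but "greater than" is HI (C set and Z clear) for unsigned operands and GT (Z clear and N == V) for signed ones. If the decoder knew the operand type, one "greater" condition would do. A minimal sketch modelling ARM CMP flag semantics on 32-bit values; `cmp_flags`, `greater` and the `flags` struct are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Flags as an ARM-style CMP a, b would set them (computing a - b). */
flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                        /* wraps modulo 2^32 */
    flags f;
    f.n = (r >> 31) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                            /* C = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;     /* signed overflow */
    return f;
}

/* Two encodings of "greater than" today... */
bool cond_hi(flags f) { return f.c && !f.z; }          /* unsigned */
bool cond_gt(flags f) { return !f.z && f.n == f.v; }   /* signed   */

/* ...which collapse into one if the operand type is known. */
bool greater(flags f, bool is_signed)
{
    return is_signed ? cond_gt(f) : cond_hi(f);
}
```

With a = 0xFFFFFFFF and b = 1 the same flags give "greater" for unsigned (4294967295 > 1) but not for signed (-1 > 1).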

Vary instruction decoding according to preceding instruction(/sequence).

Associate a secondary register with each (?) primary register (only enabled some of the time). Also, perhaps, a third (implicitly 0?). According to type this could be used for:

        Bound checking (array indexing/jump tables, stack overflow, ...)
        Modulo indexing (DSP, ring buffers)
        Arithmetic overflow
        Loop termination

...
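A software model of the secondary-register idea, as a sketch only: each primary register carries a limit (the secondary register), and the register's type selects whether an update that reaches the limit wraps around (modulo indexing for ring buffers) or is flagged (bound checking / loop termination). The type names, `reg_add`, and returning a flag rather than trapping are all assumptions, not a worked-out ISA.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { REG_PLAIN, REG_MODULO, REG_BOUNDED } reg_type;

typedef struct {
    uint32_t value;
    uint32_t limit;    /* the secondary register; must be nonzero for
                          REG_MODULO, ignored for REG_PLAIN */
    reg_type type;
} tagged_reg;

/* Add 'step' to the register; returns true if the limit was hit
   (bound violation or loop termination), false otherwise. */
bool reg_add(tagged_reg *r, uint32_t step)
{
    uint32_t next = r->value + step;
    switch (r->type) {
    case REG_MODULO:                 /* ring-buffer index: wrap around */
        r->value = next % r->limit;
        return false;
    case REG_BOUNDED:                /* comparison as a side effect */
        r->value = next;
        return next >= r->limit;
    default:
        r->value = next;
        return false;
    }
}
```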

How about just one, which compares with the ALU output (ST)? Designatable? (Could this interfere with other ops? Should it also check the destination address?)

Could specify the (designatable) comparison register elsewhere (e.g. in status register field). Opportunities for other functions ... (?)

"Blocks" need identifying; a procedure is clearly a block, as is a loop. Are there any others? How are these exploited? (e.g. below)

Precalculate `immediate' values and save whenever possible. Immediates include things such as branch targets - useful in loops! Saves both fetch bandwidth and repeated calculation.

Ideally these should be saved after the block(/context?) is left so they can be reused. This requires prescience!

        Idea (not very good?): put `immediate registers' into a lump and
        cache a set of such lumps.  At least for CALL these can be
        tagged with an entry address.  When a CALL is encountered, see
        if one is there.  The lump also contains the address of the end
        of the init. code, so the PC can be diverted directly to this
        point.
        For nesting, a flag can indicate whether a lump is currently in
        use, so that it is not recycled.
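The lump cache might be modelled like this. On a CALL, a hit supplies the precomputed immediates and the end-of-init address, so the PC can skip straight there; on a miss, a lump not belonging to any active call is claimed. A sketch only: the sizes, the round-robin replacement, and the function names are hypothetical, and it uses a count of active users instead of a single in-use flag so that a recursive call cannot release a lump its caller still needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_LUMPS 4          /* hypothetical number of cached lumps */
#define N_IMMS  8          /* hypothetical immediates per lump */

typedef struct {
    bool     valid;
    unsigned users;        /* active calls using this lump (0 = recyclable) */
    uint32_t entry;        /* tag: procedure entry address */
    uint32_t init_end;     /* where the PC resumes when the lump hits */
    uint32_t imm[N_IMMS];  /* precalculated immediates, branch targets etc. */
} lump;

static lump lumps[N_LUMPS];
static size_t next_victim;  /* round-robin replacement pointer */

/* On CALL: return the lump tagged with 'entry', or NULL on a miss. */
lump *lump_lookup(uint32_t entry)
{
    for (size_t i = 0; i < N_LUMPS; i++)
        if (lumps[i].valid && lumps[i].entry == entry) {
            lumps[i].users++;
            return &lumps[i];
        }
    return NULL;
}

/* On a miss: claim a lump no active call is using (NULL if none);
   the init code then fills imm[] as it runs. */
lump *lump_allocate(uint32_t entry, uint32_t init_end)
{
    for (size_t tries = 0; tries < N_LUMPS; tries++) {
        lump *l = &lumps[next_victim];
        next_victim = (next_victim + 1) % N_LUMPS;
        if (l->users == 0) {
            l->valid = true;
            l->entry = entry;
            l->init_end = init_end;
            l->users = 1;
            return l;
        }
    }
    return NULL;            /* every lump belongs to an active call */
}

/* On RET: release the lump so it may be recycled later. */
void lump_release(lump *l)
{
    if (l != NULL && l->users > 0)
        l->users--;
}
```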

Code analysis. Compiler problem with procedure call conventions. Ideally(?!) blocks should have registers allocated at link time so that restrictions on what is in use and saved can be determined accurately, rather than relying on some generalised PCS.

CASE should be easily compilable to an array of CALLs. Having a visible LR (/top of return stack) is useful here.
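In C terms, such a CASE amounts to one bounds check plus one indirect CALL through a table of procedures; each case body then returns via the link register straight to the code after the CASE. A minimal sketch; the case bodies, `case_table` and `dispatch` are hypothetical.

```c
#include <stddef.h>

typedef int (*case_fn)(int);

/* Hypothetical case bodies, each compiled as its own procedure. */
static int case_zero(int x)    { return x; }
static int case_one(int x)     { return x + 1; }
static int case_two(int x)     { return x * 2; }
static int case_default(int x) { return -x; }

static const case_fn case_table[] = { case_zero, case_one, case_two };

/* switch (sel) { case 0: ... case 1: ... case 2: ... default: ... }
   as a bounds check plus a single indirect CALL. */
int dispatch(unsigned sel, int x)
{
    size_t n = sizeof case_table / sizeof case_table[0];
    return sel < n ? case_table[sel](x) : case_default(x);
}
```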

Register hierarchy? Accumulator? Load target register? Designatable registers for short-form addressing.

(According to type) allow each register to have an address tag. When valid, this means that the register is a cache of a memory variable, which allows something to be left in a register without fear of aliasing problems. Of course, this is a work-around for a C-type nastiness. :-( [AB refers to Dan Owen's thesis on aliasing; available?] Can this be used to eliminate (some/most) store operations? Just leave the value lying about until it is loaded over. Of course, when it needs to go it can still be in a write buffer/victim cache, possibly allowing a subsequent load to be bypassed. This would need a dirty bit, and also an IMB for regularising state.
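A toy model of address-tagged registers, as a sketch under several assumptions: word-addressed memory, at most one register tagged with any given address, and a store that merely tags the register and marks it dirty. Memory is written only when a dirty register is overwritten or when `flush_tagged_regs` runs, which stands in for the state-regularising barrier. All names and sizes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_REGS    8
#define MEM_WORDS 64                /* addresses must be < MEM_WORDS */

typedef struct {
    uint32_t value;
    uint32_t tag;                   /* memory word this register caches */
    bool     tag_valid;
    bool     dirty;                 /* value newer than memory */
} reg;

static reg regs[N_REGS];
static uint32_t mem[MEM_WORDS];     /* toy word-addressed memory */
static unsigned mem_writes;         /* stores that actually reached memory */

/* Write a dirty register back (e.g. into a write buffer); drop its tag. */
static void evict(reg *r)
{
    if (r->tag_valid && r->dirty) {
        mem[r->tag] = r->value;
        mem_writes++;
    }
    r->tag_valid = r->dirty = false;
}

/* Ordinary ALU write: the register stops caching memory. */
void reg_write(int rd, uint32_t v)
{
    evict(&regs[rd]);
    regs[rd].value = v;
}

/* "STR rd, [addr]": tag the register, no memory traffic yet.
   Any other register caching the same word is superseded. */
void tagged_store(int rd, uint32_t addr)
{
    for (int i = 0; i < N_REGS; i++)
        if (i != rd && regs[i].tag_valid && regs[i].tag == addr)
            regs[i].tag_valid = regs[i].dirty = false;
    if (regs[rd].tag_valid && regs[rd].tag != addr)
        evict(&regs[rd]);
    regs[rd].tag = addr;
    regs[rd].tag_valid = regs[rd].dirty = true;
}

/* "LDR rd, [addr]": bypass memory if a register already holds the word. */
void tagged_load(int rd, uint32_t addr)
{
    for (int i = 0; i < N_REGS; i++) {
        if (regs[i].tag_valid && regs[i].tag == addr) {
            if (i != rd) {            /* register-to-register copy */
                uint32_t v = regs[i].value;
                evict(&regs[rd]);
                regs[rd].value = v;
            }
            return;                   /* no memory access needed */
        }
    }
    evict(&regs[rd]);
    regs[rd].value = mem[addr];
    regs[rd].tag = addr;
    regs[rd].tag_valid = true;
    regs[rd].dirty = false;
}

/* What a barrier for regularising state would have to do. */
void flush_tagged_regs(void)
{
    for (int i = 0; i < N_REGS; i++)
        evict(&regs[i]);
}
```

A store followed by a load of the same word then costs no memory traffic at all until the flush.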

Would such a scheme help with context switching (as a real-address, rather than virtual-address, cache does for memory)?

(ST) Use some form of `dirty bit' to detect which registers are corrupted by a block (call) and save fewer registers the next time. This doesn't quite work because of the different paths through the routine. Perhaps the routine could have a `dirtied' mask at return time (ORed with anything returned to it). Not quite there, but ...
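The mask arithmetic might look like this, as a sketch assuming 16 registers and one bit per register. A routine's dirtied mask is what it writes itself ORed with its callees' masks, and the caller saves only registers that are both live across the call and in that mask. This does not solve the multiple-paths problem: a mask from one call is only safe to rely on if it has been ORed over every path taken so far. The names `regmask`, `dirtied_mask`, `regs_to_save` and `count_regs` are hypothetical.

```c
#include <stdint.h>

typedef uint16_t regmask;   /* bit i set = register ri (16 registers assumed) */

/* A routine's dirtied mask: registers it writes itself, ORed with the
   masks returned by everything it calls. */
regmask dirtied_mask(regmask own_writes, const regmask *callee_masks, int n)
{
    regmask m = own_writes;
    for (int i = 0; i < n; i++)
        m |= callee_masks[i];
    return m;
}

/* Caller side: save only registers live across the call AND dirtied. */
regmask regs_to_save(regmask live_across_call, regmask dirtied)
{
    return live_across_call & dirtied;
}

/* Number of registers in a mask (clears the lowest set bit each time). */
int count_regs(regmask m)
{
    int n = 0;
    for (; m != 0; m &= m - 1)
        n++;
    return n;
}
```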

Flags: C doesn't use overflow (assertion). The meaning of flags depends on the data types. The number of conditions can be reduced if you know what you were comparing for (`overflow' is V for signed and C for unsigned arithmetic). Zero is still needed, but this could be held with a register. Some form of extend is needed, but the same "error" flag could be used. At least one user-definable Boolean is very handy. Flag or 1-bit register?

Three-input operations? A three-input add can cost little more than a two-input one in power (compress:add) and definitely less than two ops. It may be most useful in address generation: array + (scaled) index + constant. Pentiums can do this.
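The "compress:add" point can be shown in C: one carry-save (3:2 compressor) stage reduces three addends to a sum word and a carry word using only bitwise logic, leaving a single carry-propagating add, which is why a three-input add costs little more than a two-input one. The address-generation wrapper mirrors the base + (scaled) index + constant case. A sketch on 32-bit values; the function names are hypothetical.

```c
#include <stdint.h>

/* One carry-save stage: three addends -> sum word + carry word,
   with no carry propagation between bit positions. */
static void csa(uint32_t a, uint32_t b, uint32_t c,
                uint32_t *sum, uint32_t *carry)
{
    *sum   = a ^ b ^ c;                               /* per-bit sum */
    *carry = ((a & b) | (a & c) | (b & c)) << 1;      /* per-bit majority */
}

/* Three-input add: one compress stage, then one ordinary add. */
uint32_t add3(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t s, k;
    csa(a, b, c, &s, &k);
    return s + k;          /* the only carry-propagating adder */
}

/* Address generation: base + (index << scale) + displacement. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, uint32_t disp)
{
    return add3(base, index << scale, disp);
}
```

The result is exact modulo 2^32, the same as two ordinary additions, e.g. `effective_address(0x1000, 3, 2, 8)` gives 0x1014.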

To Do

More benchmarks! Also try to choose a sensible (sub) suite.

Sort out & document armcc make process, including for Thumb.

Sort out & document arm_gcc make process. (in progress).

Find/try some other ARM compilers?

Emulate other architectures (e.g. x86); importable code? Instrumentation?

Call-tree/structure analyser. [Register allocation/usage analyser ... in progress]

Pull apart some code not written in C. Start with Java (A. Dinn). Assembler??

Page last modified on September 08, 2005, at 03:34 PM