alto {option}*
The following options are available. Please note, that each option should appear at most once. Otherwise, the latest occurence will overwrite all previous ones.
In order to optimize binaries with alto they have to be linked using the flags listed below. Unfortunately, this does not quite work for the gnu compiler due to a stupid bug that might have been fixed by now. the problem is that gcc and g++ pass the -r flag (from -W,-r) too late to the linker.
Here is a workaround. Run "gcc -v" this tells you which spec file gcc is using. Patch the spec file as follows. Goto the line following "*link:". make "-r " the first three characters in this line. Omit the option "-Wl,-r" in the table below.
Sloop reads in a description of a flow graph for the functions in a program (usually generated by alto using the "-P loop" option) and prints out the graphs and/or subgraphs for selected functions, loops, or basic blocks, as specified using the options below. The critical path(s) in the (sub)graph are displayed in green, while any function call out of a basic block is indicated in red.
sloop [option*]
The selector options -f, -l, and -b can be used together, e.g., the query
sloop -f -l wt = :3
specifies that the 3 heaviest loops should be printed out, together with the functions they occur in.
Options must be specified separately, each option preceded by a "-". E.g., the command "sloop -f -l -b 1000" specifies that basic block 1000 should be printed, together with the function and the loop it occurs in; but "sloop -fl -b 1000" asks for basic block 1000 and function "l".
summarize is a tool that takes the often lengthy output produced by alto and compresses it into a few lines
summarize.perl <alto-output
If you modify ALTO stick with the formatting and nameing conventions used in the code.
Every function is either static or extern. If it is extern please describe the function before its declaration. The description will be extracted into texinfo document if it is nested between `@FUN' and `@END'.
You need gnu-make (gmake) to compile alto because we need so features not available with the ordinary make. Make sure that the environment variable `OSTYPE' has the value `osf1' if you plattform is Digital Unix and some other value if you plattform is Linux.
If you want to produce a fast version of `alto' under DEC UNIX that can be optimized by itself, type `gmake alto_fast'.
If you add you add your own optimizations to Alto you will sooner or later end up debugging your code.
Here is a nice way to debugging alto that we use all the time.
Find a small program that alto fails to optimize correctly.
Now we need to find the optimization that is to blame.
Disable the hard optimizations and see whether it still does not work. Also reduce the number of easy optimization rounds so that the problem just barely shows up. Now disable each of the easy optimizations until you found the faulty one.
No we try to locate the place in the input program that causes the problem.
[NEEDS WORK: Use binary search on bbl,ins or fun use the PRINT_OPT_NUM argument ]
(A port to Linux ELF is under way !!) Currently, ALTO works with COFF binaries only. It assumes that those binaries are statically compiled. It also expects relocation information and symbol table information to be present.
A COFF binary consists of 3 segments: text, data, bss. Each segment consists of sections.
Not all section of the text segment contain code but only sections of the text segment contain code. ALTO merges the code containing sections into a single sections. It is helpful that those sections are adjacent in the text segment.
All sections in the text segment are readonly but there are sections in the data segment that are also readonly. It is necessary to know which sections are readonly so that loads from these sections can be "evaluated".
ALTO uses relocation information to to determine basic block boundaries and to find basic blocks that can be the target of indirect jumps. ALTO will also "fix" relocatable data before the binary is written back.
ALTO only considers relocation entries referencing the text segment. Luckily, there aren't all this many. There 8 byte addresses (REFQUAD) and 4 byte gp relative addresses (GPREL32) and one other class of relocation entries.
Refquads indicate the beginning of a basic block that can be jumped to from anywhere.
This is a rather coarse (but safe) assumption. Most of the time the refquad is used to describe a function address which is used in an non-computed jump, it would be nice to know which ones are used for computed jumps.
Gprel32s indicate the beginning of a basic block that is target of a switch statement (computed jump).
It is a big pain to figure out what the possible targets of a switch statement are and it would be nice if the symbol table provided this information.
There is a third type of relocation information used with init and fini sections.
The symbol table is not used all this much. ALTO uses it to associate names with addresses (eg. function name) and to find function boundaries. The latter could also be achieved using procedure descriptors.
Procedure descriptors are not used anymore but they contain important information and I might look into them again.
Some of this information is approximated in ALTO
As more people work on ALTO (at the UofA) we will need to switch to a less restrictive versioning mechanism. CVS seems to be a good candidate.
This is only relevant for shared delopment (at the UofA) Currently, we are using rcs but we are planning on swiching to cvs soon.
To check out a file you load the file from your shadow directory (NOT the master directory!) into Emacs. You then check it out using C-x C-q or the menu option `Tools:Version Control:Check Out'.
This will make the file writable for you and the corresponding file in the master directory is now "locked" for everyone else (pure magic!). They can still read the original version but they cannot change it.
If you now make changes and save the file, the new file will be saved in your shadow directory and not in the master directory. Everyone else will still see the (locked) original.
When you have updated a file and made sure that it still works, you want to put it back in the master directory for everyone else to see. You do this i two steps.
First you check in the file in Emacs using C-x C-q again. This gives you the opportunity to write a comment on your changes. Type C-c C-c when your comment is written. The file in the master RCS directory is now updated. No update of the work file in the master directory itself taken place. To make this happen you must use the shell script `fix.csh' in your shadow directory. Run it in your shadow directory with the name of the file you just checked in as the only argument.
`./fix.csh file'
This script also replaces the file in your directory with a symbolic link to the file in the master directory again. This will make future changes made by others visible to you.
Change into the main src directory.
` rcs -n snapshot-name: *'
Checking Out a Snapshot
`co -rsnapshot-name src/RCS/*'
Stealing a Lock
`rcs -u file '
The files generated in the "altotestdir" directory are as follows (since they are often very big, the script might delete some of them):
Testing alto for speed improvement works similar to the correctness test. Only a different script is used:
`/r/sin/home/debray/ALTO/bin/speed.tcsh'
Instead of comparing the timings wiht the "unprocessed" versions of the programs you can specify a second alto binary in the environment variable "altotestexe2" to generate programs with.
`setenv 'altotestexe2 ~user/alto/new_alto -O loop'' This enables you to either compare different versions of alto, or the same version but with different command line options (turning particular optimizations on or off for instance.)
`/r/sin/home/debray/ALTO/bin/nocheck.tcsh'
`/r/sin/home/debray/ALTO/bin/scrutinize.tcsh'
This script has not been updated recently so it might be broken. Please check ftp://ftp.cs.arizona.edu/people/muth/pfm.tar.gz for the latest version of the pfm tool which is used by scrutinize.
You can check only a subset of the benchmark suite by giving an extra argument to "check.tcsh" or speed.tcsh.
The argument is interpreted as a file name. Only those benchmarks will be used that have a file whose name matches that argument in their subdirectory.
The benchmarks reside in "/r/sin/home/debray/ALTO/benchmark".
Supose you have trouble with the "go" and xlisp benchmark. Do
`touch /r/sin/home/debray/ALTO/benchmark/go/badbadbad'
`touch /r/sin/home/debray/ALTO/benchmark/xlisp/badbadbad'
Now you can use
`/r/sin/home/debray/ALTO/bin/check.tcsh badbadbad'
to just check those two benchmarks.
In order to figure out what subsets have been defined already use:
`/r/sin/home/debray/ALTO/bin/info.tcsh'
The alpha is a 64 bit architecture. Therefore, the use of pointers is rather expensive. In alto all objects are accessed by a 32 bit index relative to a base address rather that directly by a 64 bit address, ie. the objects are stored in an array. The drawbacks is are that (1) one has to know in advance how many objects of a certain type will be used during program execution and that (2) accessing objects is a little cumbersome.
We assume rather high upper bounds on the number of objects to remedy the first problem. They can also be changed using comand line options. Since the OS uses demand paging that is allocated memory does not really consume RAM unless it is used there is no big disadvantage. We address the second problem by providing macros to access object members.
Having all objects of a certain type in an array increases locality and allows us to iterate over them without an additional data structure.
The elements of an object are accessed indirectly using macros this allows us to change the underlying datastructures with having to change the entire program, eg. one of the plans is to stripe the individual components of a datastructure. In fact if one needs additional elements of an object for one particular optimization we suggest to allocate a separate array just for these new elements and define the necessary macros. An example can be found in "eval.c".
Since alto is rather space efficient we can affort the luxury of never deleting an object in order to reuse its space. Instead we just allocate a new object. This eliminates bugs to dangling pointers and simplifies iterating over lists which are simultaneously modified. New objects will always have higher numbers than old ones!
For each function a FUN object is allocated.
A function consists of a doubly linked list of basic blocks. The first bbl of this list is the init bbl and the last bbl is the exit bbl
AFUN_FLAGS
AFUN_BBL_FIRST
AFUN_BBL_LAST
AFUN_NAME(i)
AFUN_STK_MAX
AFUN_STK_MIN
AFUN_REGS_USED
AFUN_REGS_THROUGH
AFUN_REGS_SAVED
AFUN_REGS_CHANGED
AFUN_IS_RECURSIVE
FOREACH_FUN(f)
FOREACH_LIVE_FUN(f)
FOREACH_LIVE_FUN_R(f)
FF_STK_INVALID
FF_STK_ARG
FF_STK_ALIAS
FF_STK_BAD_ALIAS
FF_SETJMP
FF_LONGJMP
FF_PSEUDO
For each basic block a BBL object is allocated. BBLs belonging to one FUN are connected in a doubly linked list.
BT_CALL
BT_JUMP
BT_SWITCH
BT_NORM
BT_UNCOND
BT_BRANCH
BT_EXTRA
BT_RET
BT_HALT
BT_EXIT
BT_PAL
ABBL_FLAGS
ABBL_FUN
ABBL_NEXT
ABBL_PREV
ABBL_PRED
ABBL_SUCC
ABBL_INS_FIRST
ABBL_INS_LAST
FOREACH_BBL(i)
FOREACH_FUN_BBL(f,b)
FOREACH_FUN_BBL_R(f,b)
FB_ALIGNED
FB_CRIT_PATH
FB_CRITICAL
FB_SCHEDULED
FB_LOOP_HEADER
FB_CRIT_ESC
FB_PROG_ENTRY
For each instruction a INS object is allocated. Instructions belonging to one BBL are connected in a doubly linked list.
The instruction type is derrived from the opcode. The types are choose to simplify optimizations and scheduling.
IT_MSC
IT_LDA
IT_ILD
IT_FLD
IT_ULD
IT_LLD
IT_IST
IT_FST
IT_UST
IT_CST
IT_IBR
IT_FBR
IT_BRA
IT_JMP
IT_ICM
IT_FCM
IT_ICP
IT_FCP
IT_FM4
IT_FM8
IT_FD4
IT_FD8
IT_F2I
IT_I2F
IT_LOG
IT_SFT
IT_MLL
IT_MLQ
IT_MLH
IT_IOP
IT_FOP
IT_RX
IT_FCY
IT_MEM
IT_EXT
IT_PAL
IT_F2R
IT_R2F
IT_ERR
AINS_FLAGS
AINS_BBL
AINS_NEXT
AINS_PREV
FOREACH_INS(i)
FOREACH_BBL_INS(f,b)
FOREACH_BBL_INS_R(f,b)
Alto uses the DEC Alpha machine instructions as its intermediate code. Therefore, it inherits all its peculiarities.
A typical Alpha instruction reads (uses) two registers and writes (defines) a third one.
In ALTO you can access those registers using the macros AINS_REG_A(iins), AINS_REG_B(iins) and AINS_REG_C(iins) where "iins" is an instruction id. The macros return an 8-bit-signed number (type REG) specifying the register. 0-31 represents an integer register, 32-63 represents a floating-point register, 64 represents an integer intermediate value. (The file "regs.h" contains the corresponfing symbolic definitions.) When an instruction does not make use of all three registers the unused ones contain either 31 or 63 which are the integer/floatin-point zero registers. Two instruction subgroups do not adhere to this convention:
conditional-move-instructions (IT_ICM,IT_FCM) implicitly read the register AINS_REG_C(iins) besides writing it.
In order to shield the programer form these exceptions two functions have been introduced to ALTO:
AinsDef(iins) returns a 64-bit bit-mask (type REGLIST) representing the read registers of the instruction. AinsUSe(iins) returns a 64-bit bit-mask representing the written registers of the instruction. Here bit number i in the mask represent register number i.
The shielding is not perfect though since a conditional-move-instructions does not always define its target register.
For each edge in the control flow graph an EDG object is allocated. Edges that are successors of one BBL are connected in a single linked list. Edges that are preddecessors of one BBL are connected in a single linked list.
<dt> ET_BR_TARGET <dd> Used With: JMP and BR. <br> Head: BBL containing jsr/bsr instruction. Tail: Init BBL of target function. <br> Stored in: ABBL_BR_TARGET <br> <dt> ET_FALL_THRU <dd> Used With: Fall Throughs <br> Head: this BBL <br> Tail: Next BBL <br> Stored in: ABBL_FALL_THRU <br>
<dt> ET_EXIT_LINK <dd> Used With: RET <br> Head: BBL containing RET instruction, BBL ABBL_JMP (0) <br> Tail: Exit BBL (dummy), relocatable BBL <br>
<dt> ET_SWITCH <dd> Used With: RET <br> Head: BBL containing RET instruction, BBL ABBL_JMP (0) <br> Tail: Exit BBL (dummy), relocatable BBL <br>
<dt> ET_BR_DUMMY <dd> Used With: RELOC and setjmp/longjmp type functions <br> Head: BBL containing RET instruction, BBL ABBL_JMP (0) <br> Tail: Exit BBL (dummy), relocatable BBL <br>
<dt> ET_COPENSATE <dd> Used With: RELOC bbls <br> Head: BBL containing RET instruction, BBL ABBL_JMP (0) <br> Tail: Exit BBL (dummy), relocatable BBL <br> </dl>
ET_FUN_CALL
ET_JUMP
ET_TRUE
ET_FALSE
ET_UNCOND
ET_NORM
ET_FUN_RETURN
ET_SWITCH
ET_FUN_LINK
ET_HELL
ET_EXIT_LINK
ET_SETLONGJMP
ET_COMPENSATE
ET_EXTRA
AEDG_FLAGS
AEDG_HEAD
AEDG_FROM
AEDG_TAIL
AEDG_TO
AEDG_SUCC
AEDG_PRED
The AEDG_FROM and AEDG_TO were originally called AEDG_HEAD and AEDG_TAIL but since they were defined in the opposite way from the Dragon book their use is not encouraged (although they still exists):
FOREACH_EDG(e)
FOREACH_BBL_SUCC(b,e)
FOREACH_BBL_PRED(b,e)
Given:
ibbl callsite next returnsite target entry node of callee fun
task: insert next_new as new returnsite
Code:
INDEX next_new = AbblNew( ); ABBL_TYPE(next_new) = BT_NORM; ABBL_FUN(next_new) = ifun; AbblAppendFancy(next_new); AedgLinkFancy(AedgNewFancy(next_new,next,ET_NORM)); AedgKillFancy( ABBL_EDG_CALL(ibbl)); AedgKillFancy( ABBL_EDG_LINK(ibbl)); AedgLinkFancy(AedgNewFancy(ibbl,next_new,ET_FUN_LINK)); AedgLinkFancy( AedgNewFancy(ibbl,target,ET_FUN_CALL));
Registers are numbered 0-63 and correspond to the alpha register with the same number. Register 64 is special and indicates that an instruction carries an immediate value which can be found using AINS_DATA().
There is also a register set type REGSET which uses a bit representation for each of the 64 registers.
See also regs.h .
Resource follow the same use,def paradigm as registers. The effect is that "uses" of the same resource maybe reordered, eg. during rescheduling. The following resources might be associated with an instruction:
RES_TR (trap)
RES_PC
RES_LD (load)
RES_ST (store)
AEDG_MM (memory)
Great effort has been made to make the control-flow-graphs (CFGs) produced by ALTO as precise and useful as possible. Precise means all possible controlflow is modelled. Useful means the design of interprocedural analyses is simplified. The preciseness requirement lead to the introduction of a variety of edges that might be confusing at first.
The general flavor is based on interprocedural control-flow- graphs (cite something). So a function call is modelled using three edges:
A number of pseudo functions are added to model unknown controlflow.
PSEUDO-HELL
PSEUDO-PAL
We plan on adding a additional Pseudo function for setjmp longjmp
Each edge that crosses procedures boudaries has an associate edge wich models the control flow back to the function. If i is the id of the interprocedural edge than i+1 is the id of the associated edge. <br> For a call edge (ET_FUN_CALL), the associated edge is a return edge (ET_FUN_RETURN) and leads from the exit bbl of the called function to the bbl after the callsite. <br> For all other edges, the associated edge is a compensation edge (ET_COPENSATE) and leads from the exit bbl of the called to the exit bbl of the calling function. <br> The existance of interprocedural edges that are not call edges is due to handcoded assembly langugage in some of the libraries.
For some functions the ususual call and return edges are not enough to model the control flow accurately. For setjmp type functions an extra edge of type ET_SETLONGJMP is added from AFUN_JSR to the exit bbl of the function. For longjmp type functions an extra edge of type ET_SETLONGJMP is added from the exit bbl of the function to AFUN_JSR.
Relocation information is needed to indentify possible target of computed jumps like jmp,jsr.
If the target address of a jmp or jsr instruction can be computed the instruction will be "strength reduced" to its non-computed counterpart jmp->br,jsr->bsr:
This conversion in done as part of the "partial evaluater optimization", so the CFG might change over time.
If this strength reduction is not possible we assume that all relocatable bbls are possibe targets (via PSEUDO_HELL).
We exploit the fact that there are different types of relocation entries:
BinaryConstructFlowgraph
)BinaryProcess
)BinaryAssemble
)extern void Usage()
Print Alto usage information
extern void ParseCommandLine(int argc, char *argv[])
Parse command line and set internal flags accordingly. Allocate space for main datastructures (the size of which should evetually be configurable by commandline options)
extern NUMBER AfunRegisterSaveOffset(INDEX ifun, REG reg)
Assuming register reg is a saved register of ifun find its stack offset. NEEDS WORK!
extern INDEX AfunRegSaveIns(INDEX ifun, REG reg)
Assuming register reg is a saved register of ifun find its save instruction. NEEDS WORK!
extern void AfunBypassSaveRegister(INDEX ifun,REG reg_saved, REG reg_bypass)
Assuming register reg_saved is a saved register of ifun avoid the save and restore operations by bypassing it through reg_bypass
extern void AfunKillLoadStoreSaveRegister(INDEX ifun,REG reg)
Assuming reg is a saved register in ifun delete its save/restore pair.
extern REGLIST AfunScratchRegisters(INDEX ifun)
Find the registers in ifun which are neither used nore defined
extern BOOL AfunIsLeaf(INDEX ifun)
Return True if ifun does not call any other function.
extern BOOL AfunIsLoopAnalyzable(INDEX ifun)
Return True if ifun can be loop-optimized without making the optimizer go into an infinite loop
extern NUMBER AfunReachableFirst(INDEX ifun)
Return True if all bbls of ifun can be reached from the first (init) bbl
extern NUMBER AfunReachableLast(INDEX ifun)
Return True if all bbls of ifun can be (back) reached from the last (exit) bbl
extern NUMBER AfunNSwitches(INDEX ifun)
Return the number of bbls of type BT_SWITCH in ifun
extern NUMBER AfunNSwitchesCritical(INDEX ifun)
Return the number of critical bbls of type BT_SWITCH in ifun
extern NUMBER AfunNJumps(INDEX ifun)
Return the number of bbls of type BT_JUMP in ifun
extern NUMBER AfunNJumpsCritical(INDEX ifun)
Return the number of critical bbls of type BT_JUMP in ifun
extern BOOL AfunHasEsc2(INDEX ifun)
Return True if ifun has outgoing escaping edges
extern BOOL AfunHasEsc2Critical(INDEX ifun)
Return True if ifun has outgoing escaping edges from critical bbls
extern BOOL AfunHas2Esc(INDEX ifun)
Return True if ifun has incoming escaping edges
extern BOOL AfunHas2EscNotFirst(INDEX ifun)
Return True if ifun has incoming escaping edges to bbls other than the init bbl
extern BOOL AfunIsSetjmp(INDEX ifun)
obsolete
extern BOOL AfunIsLongjmp(INDEX ifun)
obsolete
extern REG AfunReturnRegister(INDEX ifun)
Return the return register of ifun
extern NUMBER AfunNCriticalCallSites(INDEX ifun)
Return the number of call edges calling ifun from critical bbls, does include incoming escapes.
extern NUMBER AfunNHotCallSites(INDEX ifun)
Return the number of call edges calling ifun from hot bbls, does include incoming escapes.
extern NUMBER AfunNCallSites(INDEX ifun)
Return the number of call edges calling ifun, does include incoming escapes.
extern NUMBER AfunNInssCritical(INDEX ifun)
Return the number of critical inss in ifun
extern NUMBER AfunNInss(INDEX ifun)
Return the number of inss in ifun
extern NUMBER AfunNBblsCritical(INDEX ifun)
Return the number of critical bbls in ifun
extern NUMBER AfunNBblsHot(INDEX ifun)
Return the number of hot bbls in ifun
extern NUMBER AfunNBbls(INDEX ifun)
Return the number of bbls in ifun
extern void AfunKill(INDEX ifun)
Kill the function including it bbls edgs and ins.
extern INDEX AfunNewFancy()
Create a new essentially empty function.
extern void AbblBranchTwist(INDEX ibbl)
Assumning that ibbl has a conditional branch reverse the true and false branch.
extern BOOL OptBranchForwarding(FILE *fp)
Follow chains of empty basic blocks until a nonempty basic block is found which is returned.
extern INDEX AfunDupCriticalPath(INDEX ifun,double flow_frac)
Duplicate The Critical Path of a Function
extern INDEX AfunDup(INDEX ifun, double flow_frac)
Duplicate (Clone) a Function
extern BOOL OptDuplicateFunctions(FILE *fp)
Inline a function "callee" at the basic block "callsite"
extern BOOL AbblIsReachable(INDEX callfrom, INDEX callto)
Returns TRUE if it is possible to reach callto from callfrom via a sequence of function calls followed backwards, i.e., from callee to caller. This code is not very efficient, but can (and should) be made faster if it's found to be a bottleneck.
extern NUMBER AedgMarkRecursiveCalls()
extern BOOL AfunWillNotBeInlined(INDEX ifun,BOOL usepath)
extern INT32 AbblLoopNCriticalInss(INDEX loophdr)
extern INT32 AfunNCriticalInss(INDEX ifun)
extern BOOL AfunNotRecursivelyInlinable(INDEX callee, BOOL usepath)
Returns TRUE if function callee should not be considered for recursive inlining. At this time, a recursive function callee is not considered for inlining if any of the following hold:
(1) callee is a pseudo-function (shouldn't happen, but included for safety);
(2) callee contains a setjmp or longjmp;
(3) callee does not have a "good" callsite (hence, noweher for it to be inlined to);
(4) callee contains an indirect jump;
(5) callee contains a jump through a jump table (i.e., a switch);
(6) callee contains an "escaping" edge to another function;
If the boolean argument usepath is TRUE, only the critical path is considered.
extern REG RegsGetFirst(REGLIST rl)
Return the lowest positon of a set bit in rl.
extern BOOL AfunCallsBadFunction(INDEX ifun)
obsolete
extern BOOL OptimizeEasy(FILE *fp)
Iteratively Execute Easy Optimization This function calls all the easy optimizations.
extern BOOL OptimizeHard(FILE *fp)
The Hard Optimization. This function is called only once sandwiched between calls to OptimizeEasy. It calls all the optimizations that should be applied only once. Because of problems with jumptable an ugly call to OptimizeEasy was necessary.
extern BOOL OptBranchTwist(FILE *fp)
obsolete (is now part of layouter)
extern BOOL OptBranchConditional(FILE *fp)
If both branches of a conditional point to the same basic block the conditional is useless.
extern BOOL OptBranchForwarding(FILE *fp)
Forward branches over empty basic blocks.
extern BOOL OptBranchMerge(FILE *fp)
Merge basic blocks which are connected by a normal edge if the target basic block has no other incoming edges.
extern BOOL OptBlockDuplicate(FILE *fp)
currently disabled
extern BOOL OptConstantsOne(FILE *fp)
Compute constants faster.
extern BOOL OptConstantsTwo(FILE *fp)
Sophisticated constant propagation. We resuse a constant in some other register to (hopefully) reduce register usage. This is only done on the basic block level. We do not kill instructions but expect useless code to be eliminated by other optimizations. We also allow construction of values from other using either explict lda's or doing it implicitly for memory format instruction. Care must be taken to not construct values that point into code sections since those values may change!!!
extern BOOL OptDuplicateFunctions(FILE *fp)
Duplicate Functions. To reduce "bad data flow" into function called from unknown callsites, eg. "hell". All functions which have "normal" callsite and are called by hell are cloned (duplicated). The normal callsites will call one clone and hell will call the other.
extern BOOL OptInline(FILE *fp,BOOL usepath )
Inline Frequently Called Functions.
extern BOOL OptAvoidLoads(FILE *fp)
Avoid reloading of a value from memory. Pairs of store/load instruction that provably reference the same memory location are replace by a store/move pair.
extern BOOL OptCopyPropagation(FILE *fp)
Propagate copies. Uses of registers are replaced with uses of the oldest copy of that register. This optimization could be extended to make use of lda with nonzero offsets. To "freeze" the changes made by this optimization. Dead code should be removed right after it.
extern BOOL OptMoveElimination(FILE *fp)
Eliminate Moves
This optimization is a reasonable hack until we have a real register reallocator. The idea is that if we eliminate a move operation we
o save an instruction
o can potentially reduce the register usage
Given move x,y We have two options.
o we can make the defintion of x a defintion of y
o we can make all uses of y uses of x
The first option is explores here the second option is implicitly performed by the copy propagation elimination.
The interactions with the Copy Propagation Optimization are unclear.
Conditional moves are a pain in the butt and need very careful coding.
Once we have determined that optimization can be applied we need to do the following:
o the old defining instruction for x should now define y o each use of x should be converted into a use of y once target becomes redefined we are done
The problems is with conditional move instructions which do not really define their target register:
o when we look for the defining instruction for x r we will ignore condtional moves.
Some bad cases:
def x use x use y (bad) move x,y
def x move x,y use y redef y use x (bad)
the following code appears in cos
def x move x,y condmove a,x condmove y,z (bad)
extern BOOL OptStackSaveRegs(FILE *fp)
Avoid Unneccessary Callee Register Savings Not implemented yet but code is scanning for opportunities and prints them.
extern BOOL OptDeleteStack(FILE *fp)
Delete Unused Stack. A stackframe which is never used will be deleted. This optimization seems to be not very effective. and prints them.
extern BOOL OptLeafSaveRegisters(FILE *fp)
Avoid Callee Save Registers in Leaf Functions CURRENTLY NOT USED.
extern BOOL OptCalleeSaveRegisters(FILE *fp)
Delete unused stacks. Currently almost identical to OptStackSaveRegs.
extern BOOL OptStackMerge(FILE *fp)
Merge stacks. This optimization tries to reduce the number of stack pointer manipulations to one at the beginning of the entry bbl and one before each return instruction.
extern BOOL OptTableJumps(FILE *fp)
Identify possible targets in a BT_JUMP basic block. If all targets can be identifies the block changes its type to BT_SWITCH
extern BOOL OptGprel(FILE *fp)
If all BT_JUMP block in a function have been converted into BT_SWITCH blocks. We feel it is safe to eliminate edges from HELL to GPREL32 basic blocks in this function.
extern void AbblInstallNewJumpTable(INDEX ibbl)
Given a basic block of type BT_SWITCH generate a new jumptable for it in the enlarged data section. This useful when we duplicated this basic block and now need to use a different jumptable than the original.
extern BOOL OptNormalizeInstructions(FILE *fp)
Normalize Instructions. Often there are several instruction which achieve the same effect eg, lda r0,5(r1) or add r1,5,r0. In order to reduce the number of cases to be considered by other optimizations we normalize instructions. Those normlization should be (but aren't) undone before scheduling picking the instruction which has the nicer scheduling behaviour, eg. use bis r5,r5,r6 instead of lda r6,0(r5)
extern BOOL OptEliminateNoops(FILE *fp)
Eliminate No-ops
extern BOOL OptRemoveUnreachableCode(FILE *fp)
Eliminate Unreachable Code .
extern BOOL OptUnlinkEmptyBbls(FILE *fp)
Eliminate Empty Basic Blocks. Remove empty basic blocks which must be a bt_norm basic block by forwarding its incoming edges to the next bbl Except if the basic block is a dummy entry basic block
extern BOOL OptPeepHole(FILE *fp)
Peephole optimizations and strength reductions
extern BOOL OptCondMoveOne(FILE *fp)
Conditional move generation Conditional moves are introduced for the basic block on the left hand side
O |\ | O |/ O
extern BOOL OptCondMoveTwo(FILE *fp)
Conditional move generation. Conditional moves are introduced for the basic blocks in the middle
O / \ O O \ / O
extern void ComputeStackInfo(FILE *fp)
Analyze stack size and usage. This anlysis determines the stack behaviour of all the functions. Upon completion
AFUN_MIN(fun) contains the maximum stackframe size used my the function (this is a negative number). AFUN_MAX(fun) if the stackframe size varies this is the smallest stack framesize encountered. (this is a negative number).
For each basic block in fun
ABBL_STACK_CHANGE_IN(bbl) contains the stackpointer relative to what it was at function entry. Stackmerging invalidates this field.
A variety of flags are set or cleared for functions:
AFUN_HAS_STK_ARG(fun) the function accesses the stack outside of its frame AFUN_HAS_STK_ALIAS(ifun) the function copies the value of the stackpointer plus an offset to another register (using lda) AFUN_HAS_STK_BAD_ALIAS(ifun) the function uses the value of the stack pointer with an instruction different from lda. AFUN_HAS_STK_INVALID the function does bad things with the stack
This document was generated on 18 November 1998 using the texi2html translator version 1.52.