Currently, ALTO works only with COFF binaries (a port to Linux ELF is under way). It assumes that these binaries are statically linked, and it expects relocation and symbol table information to be present.
A COFF binary consists of 3 segments: text, data, bss. Each segment consists of sections.
Only sections of the text segment contain code, but not every section of the text segment does. ALTO merges the code-containing sections into a single section. It helps that these sections are adjacent in the text segment.
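The merge step can be pictured as collapsing a contiguous run of sections into one address range. The following is a minimal sketch, not ALTO's code: the `Section` record and its `has_code` flag are hypothetical, the section names in the usage note are only illustrative, and it assumes there is no alignment padding between the code sections.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    vaddr: int       # start virtual address
    size: int        # size in bytes
    has_code: bool   # hypothetical flag: does this section hold instructions?


def merge_code_sections(text_sections):
    """Merge the code-containing sections of the text segment into one
    (start, end) range. Raises ValueError if they are not adjacent, because
    a single merged section only makes sense for a contiguous run."""
    code = sorted((s for s in text_sections if s.has_code), key=lambda s: s.vaddr)
    if not code:
        raise ValueError("no code-containing sections")
    start = end = code[0].vaddr
    for s in code:
        if s.vaddr != end:  # a gap (or a non-code section) sits in between
            raise ValueError(f"gap before section {s.name}")
        end = s.vaddr + s.size
    return start, end
```

For example, code sections `.init`, `.text` and `.fini` laid out back to back at 0x1000, 0x1020 and 0x1120 merge into the single range (0x1000, 0x1130), while a following read-only data section is ignored.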
All sections in the text segment are read-only, but some sections in the data segment are read-only as well. ALTO needs to know which sections are read-only so that loads from them can be "evaluated", i.e. replaced by the value they are known to load.
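To illustrate why read-only information matters: a load whose address is known and falls entirely inside a read-only section always yields the same value, so it can be computed at optimization time. This is a minimal sketch under stated assumptions; the `LoadedSection` record is hypothetical, and the function returns the raw unsigned bytes (little-endian, as on Alpha), leaving any sign extension a particular load instruction performs to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadedSection:
    name: str
    vaddr: int       # start virtual address
    data: bytes      # section contents as stored in the binary
    readonly: bool


def evaluate_load(sections, addr: int, width: int) -> Optional[int]:
    """Return the value a `width`-byte load from `addr` would produce, or
    None if the access is not fully inside a read-only section (the memory
    could change at run time, so the load must not be folded)."""
    for s in sections:
        if s.vaddr <= addr and addr + width <= s.vaddr + len(s.data):
            if not s.readonly:
                return None
            off = addr - s.vaddr
            # Raw unsigned little-endian value; sign extension is up to the caller.
            return int.from_bytes(s.data[off:off + width], "little")
    return None
```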
ALTO uses relocation information to determine basic block boundaries and to find basic blocks that can be the target of indirect jumps. It also "fixes" relocatable data before the binary is written back.
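One way to picture the "fixing" of relocatable data: once code has been moved, every 8-byte code address stored in data must be rewritten to the block's new location. The sketch below handles only 8-byte (REFQUAD-style) slots and assumes a hypothetical `new_addr` map from old code addresses to new ones produced by the layout phase; GP-relative entries would need a different adjustment and are not shown.

```python
import struct


def fix_refquads(data: bytearray, data_vaddr: int, refquad_sites, new_addr):
    """Rewrite each 8-byte code address stored in `data` (in place).

    data_vaddr:    virtual address at which `data` starts
    refquad_sites: virtual addresses of the 8-byte relocated slots in `data`
    new_addr:      hypothetical map from old code addresses to new ones
    """
    for site in refquad_sites:
        off = site - data_vaddr
        (old,) = struct.unpack_from("<Q", data, off)    # little-endian quadword
        struct.pack_into("<Q", data, off, new_addr[old])
```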
ALTO only considers relocation entries that reference the text segment. Fortunately, there are not many kinds of these: 8-byte addresses (REFQUAD), 4-byte GP-relative addresses (GPREL32), and one other class of relocation entries.
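The filtering step can be sketched as keeping only entries whose target lies in the text segment and grouping them by type. This sketch assumes each entry has already been resolved to the address it refers to (the hypothetical `target` field); in a real COFF file that address has to be computed from the relocated word and the referenced symbol or section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reloc:
    site: int     # address of the relocated word
    rtype: str    # e.g. "REFQUAD" or "GPREL32"
    target: int   # hypothetical: address the relocated word refers to


def text_relocs_by_type(relocs, text_start, text_end):
    """Keep only relocations whose target lies in [text_start, text_end)
    and group them by relocation type."""
    groups = defaultdict(list)
    for r in relocs:
        if text_start <= r.target < text_end:
            groups[r.rtype].append(r)
    return dict(groups)
```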
A REFQUAD marks the beginning of a basic block that can be jumped to from anywhere.
This is a rather coarse (but safe) assumption. Most of the time a REFQUAD describes a function address that is used in a non-computed jump; it would be nice to know which ones are actually used for computed jumps.
A GPREL32 marks the beginning of a basic block that is the target of a switch statement (a computed jump).
It is a real pain to figure out the possible targets of a switch statement, and it would be nice if the symbol table provided this information.
A third type of relocation entry is used with the init and fini sections.
ALTO does not use the symbol table very much: only to associate names with addresses (e.g. function names) and to find function boundaries. The latter could also be achieved using procedure descriptors.
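Finding function boundaries from the symbol table can be sketched as sorting the procedure symbols by address and letting each function extend up to the next one. This assumes the (hypothetical) input already contains only procedure symbols, that functions are laid out back to back, and that the last function ends at the end of the code; it is an approximation, not necessarily how ALTO does it.

```python
def function_ranges(func_symbols, text_end):
    """func_symbols: iterable of (name, address) pairs for procedure symbols.
    Returns {name: (start, end)}, where each function ends where the next
    one starts and the last one ends at text_end."""
    syms = sorted(func_symbols, key=lambda s: s[1])
    ranges = {}
    for i, (name, addr) in enumerate(syms):
        end = syms[i + 1][1] if i + 1 < len(syms) else text_end
        ranges[name] = (addr, end)
    return ranges
```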
ALTO no longer uses procedure descriptors, but they contain important information and I might look into them again.
Some of this information is approximated in ALTO.