I'm cleaning out files and found these half-written notes. See also earlier: some simple assembly
The first difference between -O0 and -O1 is that -O0 starts with pushq %rbp followed by movq %rsp, %rbp, and -O1 doesn't. -O1 immediately starts with movqs from registers onto the stack, including extra registers %r12-%r15 that -O0 does not seem to be aware of.
Interesting: this link lists registers B and r12-r15 as preserved, while r8-r11 are "scratch registers". Definition of "scratch registers": "their contents should be considered (from caller's perspective) clobbered after a function call".
Lawler lists ESI and EDI as "scratch registers" that you can use for any purpose. They seem to have been imagined as "index registers" that contain the offset from an address in another register.
The -O1 code also uses more branching and is hard for me to follow. In a diff, there are few parts which are not changed.
-O1 turns on the following optimization flags:
-fdefer-pop -- waits to pop arguments after a function call. Should have minimal effect. No change in .s diff.
-fdelayed-branch -- "this target machine does not have delayed branches"
-fguess-branch-probability -- literally no change in .s diff
-fcprop-registers -- literally no change in .s diff
-floop-optimize -- literally no change in .s diff
-fif-conversion -- literally no change in .s diff
-fif-conversion2 -- literally no change in .s diff
-ftree-ccp -ftree-copyrename -ftree-ch -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-lrs -ftree-sra -ftree-ter -- all -ftree flags: literally no change in .s diff
-funit-at-a-time -- Changes to constant memory, but not code.
-fmerge-constants -- Changes to constant memory, but not code.
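So it's not clear what caused the use of the new registers and all of the code rearranging. It must have been an undocumented flag or a combination of several of the above. (Another possibility: the GCC manual says that most optimizations are completely disabled at -O0 even when individual -f flags are given, so if these flags were tested one at a time on top of -O0, little or no change would be expected.)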
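The flag-by-flag comparison could be scripted roughly like this. It is only a sketch: the function names changed_lines and compile_to_asm are made up for illustration, it assumes a gcc binary on the PATH, and it uses Python's difflib rather than whatever diff tool produced the notes above.

```python
import difflib
import subprocess


def changed_lines(base_asm, other_asm):
    """Return only the lines that differ between two assembly listings.

    Lines removed from base_asm are prefixed with "- ", lines added in
    other_asm with "+ ". Unchanged lines and difflib hint lines are dropped.
    """
    diff = difflib.ndiff(base_asm.splitlines(), other_asm.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


def compile_to_asm(source_path, flags):
    """Run gcc -S with the given flags and return the assembly text.

    Hypothetical helper: requires gcc to be installed and on the PATH.
    """
    result = subprocess.run(
        ["gcc", "-S", "-o", "-", *flags, source_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A possible use, with prog.c standing in for the real source file: changed_lines(compile_to_asm("prog.c", ["-O1"]), compile_to_asm("prog.c", ["-O1", "-fno-defer-pop"])). Turning one flag off under -O1 may be more informative than turning it on under -O0.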
Differences between -O0 and -O1:
-O0 runs pushq %rbp followed by movq %rsp, %rbp and then subtracts $32832 (2^15+64) from the stack pointer %rsp. It does not use the preserved registers %rbx and %r12-%r15, and it uses %rbp only as the frame pointer.
-O1 copies all of the preserved registers %rbx, %rbp, and %r12-%r15 onto the stack before subtracting $32856 (2^15+88) from %rsp.
-O0 has a leave instruction, which I assume undoes the first two instructions of pushq %rbp and movq %rsp, %rbp.
-O1 does not use a leave instruction. I assume it restores %rsp and %rbp on its own.
(Note: I once tried coding a very small assembler program without leave or those first two lines, and it crashed. Will have to try that again.)
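To check the assumption about leave, here is a toy model of the stack in Python. It is a sketch, not an emulator: the function name run and the starting values for %rsp and %rbp are invented, and only the four instructions discussed above are modelled. It relies on leave being equivalent to movq %rbp, %rsp followed by popq %rbp.

```python
def run(instructions, rsp=0x100000, rbp=0xBEEF):
    """Tiny model of a downward-growing stack with 8-byte slots."""
    memory = {}
    regs = {"rsp": rsp, "rbp": rbp}
    for ins in instructions:
        if ins == "pushq %rbp":
            regs["rsp"] -= 8
            memory[regs["rsp"]] = regs["rbp"]
        elif ins == "movq %rsp, %rbp":
            regs["rbp"] = regs["rsp"]
        elif ins.startswith("subq $") and ins.endswith(", %rsp"):
            # e.g. "subq $32832, %rsp" reserves the local frame
            regs["rsp"] -= int(ins[len("subq $"):].split(",")[0])
        elif ins == "leave":
            # leave == movq %rbp, %rsp ; popq %rbp
            regs["rsp"] = regs["rbp"]
            regs["rbp"] = memory[regs["rsp"]]
            regs["rsp"] += 8
        else:
            raise ValueError("instruction not modelled: " + ins)
    return regs


prologue = ["pushq %rbp", "movq %rsp, %rbp", "subq $32832, %rsp"]
before = run([])
after = run(prologue + ["leave"])
```

In this model, before and after hold the same %rsp and %rbp, so leave undoes both the two-instruction prologue and the subq. Without leave, %rsp would still point 32840 bytes below where it started, and a ret at that point would pop the wrong return address.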
-O0 uses cmp $0, %reg and je to test for zero.
-O1 uses test %reg, %reg (itself) and je to test for zero.
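The two zero tests set the zero flag the same way, which a small model makes concrete. This is a sketch with invented function names; it models only ZF for a 64-bit register, not the other flags.

```python
MASK64 = (1 << 64) - 1


def zf_after_cmp_zero(reg):
    """ZF after `cmp $0, %reg`: flags reflect the subtraction reg - 0."""
    return ((reg - 0) & MASK64) == 0


def zf_after_test_self(reg):
    """ZF after `test %reg, %reg`: flags reflect the bitwise AND reg & reg."""
    return ((reg & reg) & MASK64) == 0


samples = [0, 1, 2, 0x7FFFFFFFFFFFFFFF, 0x8000000000000000, MASK64]
agree = all(zf_after_cmp_zero(r) == zf_after_test_self(r) for r in samples)
```

Since reg - 0 and reg & reg both equal reg, je takes the same branch in either case. The test form needs no immediate operand, which is presumably why the optimizer prefers it.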
-O0 moves function arguments to the stack at -32792(%rbp) through -32812(%rbp).
-O1 moves function arguments to registers %rbp, %r12d, %r13, and %ebx. This cuts down on the number of lines of code because there are fewer movls to and from the stack.
-O0 allocates space for a variable and sets it immediately.
-O1 waits until the variable is first used.
Concept for a renaming tool.
#pragma RENAME $varname %eax
- Parse the next line containing that register.
- Determine which register is modified by the instruction, if any.
- Go forwards until the target register is modified again, renaming the register to a variable.
- If the register was not modified on this line, also go backwards until the target register was last modified; and rename that line.
Result: Your viewer can display registers as variables named $foo, $bar, etc, to make assembly code easier to read. Changing them back to registers for assembly is done in the same way. The difficult part is determining "is modified" when jumps are involved. If it is assumed that any jump outside of the original range could modify any register, the tool would be little help for reading the spaghetti code produced by -O1. This idea might only work for code that is already cleanly written.
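A minimal sketch of the renaming steps for straight-line code, in Python. Everything here is an assumption made for illustration: the function names, the rule that the last operand in AT&T syntax is the one written, the short list of read-only mnemonics, and the choice to stop at any label, jump, call, or ret. It matches register names exactly (so %eax and %rax are treated as unrelated) and does not know about implicit writes such as mul.

```python
import re

# Assumption: mnemonics starting with these read their operands only.
READ_ONLY = ("cmp", "test", "push")
REGISTER = re.compile(r"%[a-z0-9]+")


def parse(line):
    """Split 'movq %rsp, %rbp' into ('movq', ['%rsp', '%rbp'])."""
    parts = line.strip().split(None, 1)
    if not parts:
        return "", []
    if len(parts) == 1:
        return parts[0], []
    # Split on commas, except those inside a memory operand like (%rax,%rbx,8).
    operands = re.split(r",\s*(?![^()]*\))", parts[1].strip())
    return parts[0], operands


def written_register(line):
    """Return the register this line overwrites, or None."""
    mnemonic, operands = parse(line)
    if not operands or mnemonic.startswith(READ_ONLY):
        return None
    dest = operands[-1]
    return dest if REGISTER.fullmatch(dest) else None


def is_control_flow(line):
    """Labels, jumps, calls, and returns end the region we can reason about."""
    mnemonic, _ = parse(line)
    return mnemonic.endswith(":") or mnemonic.startswith(("j", "call", "ret"))


def rename(lines, start, reg, name):
    """Rename reg to name around lines[start], following the steps above."""
    out = list(lines)
    pattern = re.compile(re.escape(reg) + r"\b")

    # Go forwards until the register is modified again.
    for i in range(start, len(lines)):
        if is_control_flow(lines[i]):
            break
        if i > start and written_register(lines[i]) == reg:
            break
        out[i] = pattern.sub(lambda m: name, lines[i])

    # If not modified on the starting line, go backwards to the last write.
    if written_register(lines[start]) != reg:
        for i in range(start - 1, -1, -1):
            if is_control_flow(lines[i]):
                break
            out[i] = pattern.sub(lambda m: name, lines[i])
            if written_register(lines[i]) == reg:
                break
    return out


listing = [
    "movl $0, %eax",
    "movl -4(%rbp), %ebx",
    "movl %eax, %edx",
    "movl $5, %eax",
]
renamed = rename(listing, 2, "%eax", "$count")
```

Here renamed has $count in the first and third lines, while the fourth line keeps %eax because that is where the register gets a new value. Stopping at every jump and label is the conservative choice described above, which is why this would do little for -O1 output.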