The stack reuse trick

23-04-2020

Not much has been written about code golf in general—let alone about golfing in C, of all languages. My contribution here is to share with the world an old but spicy technique that is, in my opinion, quite unique to C.

Disclaimer: Since this is code golf, no guarantee on portability or safety should be expected. The snippets here should work when compiled without optimisations for a 32-bit linux system on x86. You can compile 32-bit binaries on a 64-bit machine with GCC or Clang using the -m32 option.

Consider this simple program that echos a line from standard inputHow dangerous is writing to &s? Since s is a parameter of main, it is stored on the stack before anything else. Since the stack grows towards lower addresses, the gets call will not touch the part of the stack we care about. It may overwrite data stored on the stack before main is called such as the frame pointer to the program entry point but this doesn’t matter since the program exits right after main returns. Empirically, we are good if the input isn’t too long—a reasonable assumption because having to allocate memory makes a boring game of golf..

main(s){gets(&s);puts(&s);}

I claim that it can be shortened even further.

main(s){gets(&s);puts();}

You should verify that this compiles and behaves as expected. As a fun puzzle, pause for a moment and think about why this works before reading on. Hint: the title is a spoiler.

Implicit function declarations

First of all, why does this even compile?

There exists an old language rule from the days of K&R C that deals with implicit declarations. That is, how a compiler should handle calls to an arbitrary function f whose declaration cannot be found. Most people expect this to be a compile error, but the standard states that an implicit declaration of int f() should be generated by defaultThis rule has been banished from the standard since C99. Nonetheless, GCC and clang continue to compile code that rely on this behavior (even on -std=c99), which explains the warnings if you tried to compile the example..

Question: What does int f(); mean exactly?

Surprisingly, it is not a declaration for a function that takes no arguments and returns an int. That would be int f(void);. According to the standard, this instead declares a function with an unspecified but fixed list of parameters. Since parameter information is not provided by the programmer, it is not possible for the compiler to type-check the arguments in a call to such a functionA feature that worsens type safety for no gain has got to be legacy cruft. The existence of such a backwards rule can be traced all the way back to K&R C. More details can be found in this stack overflow answer..

The above rules explain some things about our example:

Since functions have implicit declarations, we can use puts and gets without the appropriate #includeOmitting header includes is one of the first tricks a new golfer learns. Now, we know why it is allowed..
Without the prototypes in the header files, we are allowed to call puts with no arguments and our code will still compile.

This explains why the program compiles. But why does it behave correctly even with a missing argument to puts()? As the semantics of such a function call is undefined, the most reliable way to figure out what happens during runtime is to investigate the generated binary.

i386 calling convention

Our answers lie in the calling convention followed by the compiler which can be found in section 2.2 of the i386 System V ABI document.

We will assume our arguments have 32-bit size types like int or pointers. This is the default argument width for implicit declarationsThis means we can’t use functions that take char arguments. In practice, it doesn’t matter since standard library functions that should take char arguments are defined to take ints instead.Example: int putchar(int c);. The following steps are involved in a function call:

The caller shifts the stack pointer upwards.
The caller pushes the arguments onto the stack from left to right.
The function is called, which also pushes the return address onto the stack. The offset in step 1 is chosen such that the stack pointer is now properly alignedAn invariant maintained by the calling convention is that stack frames are aligned to 16-byte addresses..
The callee pushes the old base pointer onto the stack and sets its base pointer

Execution continues within this new frame. The arguments are accessed through fixed offsets from the base pointer. For reference, this is the stack layout:

frame     address   content
                    ...            (may contain padding)
caller    ebp+12    arg 2
          ebp+8     arg 1
-------------------------------------- (16-byte aligned)
callee    ebp+4     return address
          ebp       base pointer of caller
                    ...

During cleanup, the callee and caller take turns to shift the stack pointer such that it is restored to the address before any of this happened.

We can see this in the disassembly:

puts:
        push    ebp
        mov     ebp, esp
        sub     esp, 8
        ...                ; expects argument at [ebp+8]
        leave
        ret

main:
        ; main's prologue
        ...

        ; gets(&s)
        sub     esp, 12
        push    eax        ; push &s
        call    gets
        add     esp, 16    ; restore stack pointer

        ; puts()
        call    puts

        ; main's epilogue
        ...

Putting it together

We now have enough information to explain how it all works.

Notice that even though we did not call puts with any argument, both stack frames start at the same address (and so do their arguments at ebp+8). This is a side effect of the alignment requirement. Since both gets and puts are called with fewer than four arguments, the addresses of their stack frames are rounded up to the same 16-byte boundary.

Between the two calls, no other stack frame is created at the same address. This means that argument given to gets still exists on the stack by the time puts is called. puts ends up retrieving the value of the gets parameter outside its lifetime since they overlap in memory, and this value is also the expected value &s. Observationally, the stack is set up correctly for puts even though the caller messed up!

Since stack allocations are predictable, we can reliably line up function calls to save bytes by reusing transient values that still exist on the stack. The gets-puts pairing is a common instance of this technique applied. Some others that I remember seeing in the wild include:

read and write
gets and atoi
scanf and vprintf