Coding in Assembly for an Apple II
In my last post, I described how I tried to port Conway’s Game of Life to my Apple II. This port was written in C and relied on a cross-compiler suite (cc65) that generates binaries for the Apple’s CPU: the MOS 6502. The conclusion was that the port was viable but, unfortunately, also far too slow. Indeed, the 6502 is a primitive 8-bit CPU that is poorly suited as a target for a high-level compiled language such as C.
It was time to speed things up and rewrite the inner loops in assembly!
I will begin by presenting the results I obtained, before explaining the “workflow” I followed to produce binaries, upload them and debug them.
The code can be cloned from my GitHub repository:
Results obtained
I focused my optimization efforts on the main “update” function: it determines which cells die and where new ones spawn, and is therefore the most time-consuming part of the program. I proceeded in several successive steps:
- Rewriting the innermost function, “count_neighbours”, in assembly.
  - In my previous post, I noted that the code generated by the compiler was around 220 opcodes long, which was huge.
  - After my rewrite, the function shrank to only 30 opcodes!
- Rewriting the C code of the outer loop to make it more 6502-friendly:
  - Using the __fastcall__ calling convention, so that the last parameter is passed in the accumulator and the X register instead of on the “fake” software stack.
  - Eliminating indirect addressing (i.e. [X][Y] accesses), which is especially expensive.
- Rewriting the whole update function in assembly.
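To make the last two steps more concrete, here is a minimal C sketch of a 6502-friendly neighbour count. It is not the code from my repository: the 40×24 playfield, the always-dead one-cell border, the flat-index signature of `count_neighbours` and the helper names `grid`, `cell_index` and `next_state` are assumptions for illustration. With a flat array, every neighbour sits at a constant offset from the current cell, so counting needs no `y * width` multiplication, an operation the 6502 has no instruction for.

```c
#include <stdint.h>

/* Hypothetical layout: a 40x24 playfield surrounded by a one-cell border
   that always stays dead, so the count needs no edge checks. */
#define COLS   40
#define ROWS   24
#define STRIDE (COLS + 2)              /* bytes per row, border included */

uint8_t grid[(ROWS + 2) * STRIDE];     /* 1 = alive, 0 = dead */

/* Flat index of playfield cell (x, y), both 0-based, border excluded.
   This is the only place a multiplication appears. */
uint16_t cell_index(uint8_t x, uint8_t y)
{
    return (uint16_t)((y + 1) * STRIDE + (x + 1));
}

/* Count the live neighbours of the cell at flat index i:
   the eight neighbours are all at fixed offsets from i. */
uint8_t count_neighbours(uint16_t i)
{
    const uint8_t *p = &grid[i];
    return (uint8_t)(p[-STRIDE - 1] + p[-STRIDE] + p[-STRIDE + 1]
                   + p[-1]                       + p[1]
                   + p[STRIDE - 1]  + p[STRIDE]  + p[STRIDE + 1]);
}

/* Conway's rules: a live cell survives with 2 or 3 neighbours,
   a dead cell comes to life with exactly 3. */
uint8_t next_state(uint8_t alive, uint8_t neighbours)
{
    return neighbours == 3 || (alive && neighbours == 2);
}
```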
Incidentally, I also dropped the ASCII display functions provided by the compiler suite and wrote some assembly functions to replace them. I then abandoned the original text mode and drew the cells in color 😉
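Writing directly to screen memory mostly requires knowing its layout, which is interleaved: consecutive rows are not contiguous in memory. The C sketch below assumes page 1 at $0400 and, for the colored cells, the low-resolution (lores) mode, 40×48 blocks in 16 colors, where each byte holds two vertically stacked blocks. To keep it runnable on a PC, it writes into an ordinary array standing in for screen memory; `screen`, `row_offset` and `plot` are hypothetical names, and on the real machine the base would be address $0400 itself.

```c
#include <stdint.h>

#define SCREEN_SIZE 0x400          /* page 1 spans $0400-$07FF */

/* Stand-in for screen memory so the sketch runs on a host machine;
   index 0 corresponds to address $0400. */
uint8_t screen[SCREEN_SIZE];

/* Offset of text row `row` (0-23) from the start of page 1.
   Rows form three groups of 8: groups start 40 bytes apart,
   and consecutive rows within a group are 128 bytes apart. */
uint16_t row_offset(uint8_t row)
{
    return (uint16_t)(0x80 * (row % 8) + 0x28 * (row / 8));
}

/* Plot a lores block at (x, y), x in 0-39, y in 0-47, color in 0-15.
   Each byte holds two blocks: low nibble = even y (upper block),
   high nibble = odd y (lower block). */
void plot(uint8_t x, uint8_t y, uint8_t color)
{
    uint8_t *cell = &screen[row_offset(y / 2) + x];
    if (y & 1)
        *cell = (uint8_t)((*cell & 0x0F) | ((color & 0x0F) << 4));
    else
        *cell = (uint8_t)((*cell & 0xF0) | (color & 0x0F));
}
```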
Here is the normalized execution time of all these versions over 30 iterations. Of course, lower is better.
Normalized execution time of iterated versions
An 18-fold speed-up! Writing assembly is not always easy, but it is definitely worth it!
The 6502 has few opcodes and they are “logical”, so they are quite easy to learn and use. However, as I mentioned in my previous post, this CPU is primitive by today’s standards. Its resources are scarce, and I have to admit that I was not used to 8-bit arithmetic: a ceiling at 255 can be quite frustrating!
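To show what living under this ceiling means in practice, here is a C model of how a 16-bit addition is done on the 6502: a CLC followed by two ADC instructions, with the carry flag linking the low bytes to the high bytes. The functions `adc` and `add16` are hypothetical helpers, and the model deliberately ignores decimal mode and the overflow (V) flag.

```c
#include <stdint.h>

/* Model of the 6502 ADC instruction in binary mode: adds two bytes plus
   the incoming carry, returns the 8-bit result and updates the carry.
   (Decimal mode and the V flag are deliberately left out.) */
uint8_t adc(uint8_t a, uint8_t operand, uint8_t *carry)
{
    uint16_t sum = (uint16_t)(a + operand + *carry);
    *carry = sum > 0xFF;            /* set when the result exceeds 255 */
    return (uint8_t)sum;            /* keep only the low 8 bits */
}

/* 16-bit addition done the 6502 way: CLC, ADC on the low bytes,
   then ADC on the high bytes, which picks up the carry. */
uint16_t add16(uint16_t x, uint16_t y)
{
    uint8_t carry = 0;                                        /* CLC */
    uint8_t lo = adc((uint8_t)(x & 0xFF), (uint8_t)(y & 0xFF), &carry);
    uint8_t hi = adc((uint8_t)(x >> 8), (uint8_t)(y >> 8), &carry);
    return (uint16_t)((hi << 8) | lo);
}
```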
This 8-bit limit is felt especially when branching. The 6502 provides very efficient conditional branch instructions, but they can only jump 128 bytes backward or 127 bytes forward. As most instructions are 2 or 3 bytes long, you have to write tight code and carefully lay out your instructions if you want to use them!
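The limit comes from the encoding: a conditional branch is 2 bytes long, an opcode followed by a signed 8-bit offset measured from the address of the next instruction. The hypothetical helper `branch_offset` below performs roughly the check an assembler does, computing the offset byte or reporting that the target is out of reach.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 6502 conditional branch (BEQ, BNE, BCC...) is the opcode followed by
   a signed 8-bit offset, relative to the next instruction: branch_addr + 2.
   Returns false if the target cannot be reached. */
bool branch_offset(uint16_t branch_addr, uint16_t target, uint8_t *encoded)
{
    int32_t delta = (int32_t)target - ((int32_t)branch_addr + 2);
    if (delta < -128 || delta > 127)
        return false;               /* out of range */
    *encoded = (uint8_t)delta;      /* two's complement byte */
    return true;
}
```

When a target is out of range, the usual workaround is to invert the condition and branch over a 3-byte JMP to the distant target.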
Keep in mind that most of the code (all the logic, the state machine, the textual displays…) is still written in C. This remaining code is less systematic and would be much harder to write in ASM. But that does not matter much, since far less time is spent in these parts.
To conclude, I would say that nowadays it is not too difficult to write a program for the Apple II. Thanks to cc65, you can write the whole program in C and, if you need more performance, rewrite the most time-consuming functions in ASM. And thanks to modern editors and emulators, writing, testing and debugging are far easier and more pleasant than they were thirty years ago!
Quick workflow
When coding such a small project in C, I could live without a debugger. That was no longer the case once I started rewriting some parts in assembly:
- I was not familiar with the 6502 architecture and opcodes, so there was no way I could produce bug-free binaries on my first attempts.
- I don’t know how to debug a low-level program without a proper debugger.
So I set up my own (quick and dirty) workflow.
The assembler
That’s the easy part, as the cc65 compiler suite provides a macro assembler simply called ca65. But I did not want to rewrite my whole program, only the inner loops. So I had to learn the compiler’s calling convention, i.e. how to call a subroutine written in ASM from the C code, and how to return from it… Fortunately, this part is well documented.
The only obstacle I could not overcome was accessing the symbols declared in C from assembly. So I introduced a function, init_asm, to pass their addresses to my “ASM world”.
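Here is a minimal sketch of the C side of this arrangement. The name init_asm comes from my program, but its signature here, the grid size and the names `grid`, `GRID_SIZE` and `asm_cells` are assumptions for illustration. Under cc65 (detected with the predefined `__CC65__` macro), the function would be implemented in a ca65 source file and exported under the C name prefixed with an underscore (`_init_asm`); with `__fastcall__`, its pointer argument arrives in A (low byte) and X (high byte). The `#else` branch is a host stand-in that only mimics the assembly side storing the address, so the sketch also compiles and runs on a PC.

```c
#include <stdint.h>

#define GRID_SIZE (42 * 26)   /* hypothetical: 40x24 playfield plus border */

/* C global that the assembly code needs to reach. */
uint8_t grid[GRID_SIZE];

#ifdef __CC65__
/* Implemented in a ca65 file as _init_asm. With __fastcall__, the
   (last) argument is passed in A = low byte, X = high byte. */
extern void __fastcall__ init_asm(uint8_t *cells);
#else
/* Host stand-in for the assembly side: it just remembers the address.
   On the 6502, such a pointer would live in two zero-page bytes so that
   indirect indexed addressing, LDA (ptr),Y, can use it. */
uint8_t *asm_cells;
void init_asm(uint8_t *cells) { asm_cells = cells; }
#endif
```

The C code then calls `init_asm(grid);` once at startup, before any assembly routine that reads the grid.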
Producing a disk image
The linker produces a perfectly valid Apple II executable file, but loading it in an emulator requires embedding it in a disk image. For that purpose, I used AppleCommander, a utility (unfortunately written in Java) whose purpose is to manipulate Apple II disk images. One bright spot is that it can be used from the command line, so it can be invoked as a build step in the Makefile.
Command to add the executable to the disk image:
Command to remove the executable from the disk image:
The debugger
To test and debug my program, I used an emulator: AppleWin. It is quite accurate but, most importantly, it also features a competent debugger! Of course, it is rougher to use than a modern one. But hey, if you’re coding in 6502 assembly, that won’t stop you! 😉
You can enter this debugger at any moment by pressing F7. From there, it is quite easy to place breakpoints, run up to a specific address or inspect memory. Unfortunately, after leaving the debugger, the program often does not resume correctly, so I could not reliably set my breakpoints before launching my program :/ Instead, I often placed an infinite loop at the beginning of my code. Once the program was stuck in it, I entered the debugger and manually modified the program counter to exit the loop. I could then ask the debugger to run up to my area of interest.
AppleWin’s integrated debugger
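A variant of this infinite-loop trick can be written in C, sketched below with hypothetical names (`debug_hold`, `wait_for_debugger`): rather than editing the program counter, the loop waits on a flag that you clear by editing memory from the debugger. The flag’s address can be looked up, for example, in the map file the linker can generate. The `volatile` qualifier forces the compiler to re-read the flag on every iteration.

```c
#include <stdint.h>

/* Hypothetical debug hook: non-zero means "hold here". `volatile` tells
   the compiler the value may change behind its back (here: edited by
   hand from the emulator's debugger), so the loop must re-read it. */
volatile uint8_t debug_hold = 1;

void wait_for_debugger(void)
{
    while (debug_hold) {
        /* Spin. While the program is held here, break into the debugger,
           set debug_hold to 0 in memory, then run to the area of interest. */
    }
}
```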