Tigger C: an efficient 6502 C compiler

Tigger C is very close to C and generates efficient 6502 code: fast and compact. Tigger C is written in Hopper and takes advantage of the Hopper 6502 Assembly toolchain.
Tigger C documentation on GitHub:
github.com/sillycowvalley/Hop...
Samples shown in this video:
github.com/sillycowvalley/Hop...
Hopper releases on GitHub:
github.com/sillycowvalley/Hop...

Comments: 25

  • @notexactlysiev
    3 days ago

    This is very cool! That llvm-mos project can generate fairly impressive code nowadays, but I still like seeing languages made specifically to target the 6502.

  • @DrMortalWombat
    7 days ago

    The 6502 is in fact a nice target for a C compiler. And it does help if the compiler plays to its strengths, e.g. using the zero page instead of stack-allocated variables. Most code is non-recursive, so static allocation of local variables can be done with a simple call graph analysis.
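
    A minimal sketch of that call-graph test, in plain C rather than compiler internals (the function names and call edges below are made up for illustration): a function's locals can live at fixed addresses exactly when the function can never be active twice at once, i.e. it is not on any cycle in the call graph.

        /* Hypothetical sketch of "simple call graph analysis": mark functions
         * that sit on a cycle (recursive, directly or mutually) as needing a
         * stack; everything else can get statically placed locals. */
        #include <stdio.h>

        #define N 4
        static const char *name[N] = { "main", "sieve", "print", "fib" };

        /* call[a][b] != 0 means "a may call b" (made-up edges; fib recurses) */
        static const int call[N][N] = {
            /* main  */ { 0, 1, 1, 1 },
            /* sieve */ { 0, 0, 0, 0 },
            /* print */ { 0, 0, 0, 0 },
            /* fib   */ { 0, 0, 0, 1 },
        };

        /* Returns 1 if function 'to' is reachable from 'from'. */
        static int reaches(int from, int to, int visited[N]) {
            if (visited[from]) return 0;
            visited[from] = 1;
            for (int next = 0; next < N; next++) {
                if (!call[from][next]) continue;
                if (next == to || reaches(next, to, visited)) return 1;
            }
            return 0;
        }

        int main(void) {
            for (int f = 0; f < N; f++) {
                int visited[N] = { 0 };
                int recursive = reaches(f, f, visited); /* on a cycle through itself? */
                printf("%-5s locals: %s\n", name[f],
                       recursive ? "stack (recursive)" : "static (e.g. zero page)");
            }
            return 0;
        }

    A real compiler would use the same graph again to let functions that never appear on the same call chain share slots; the recursion test also lines up with the Fibonacci sample mentioned later in the thread being the one case that still needs stack storage.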

  • @biggertigger
    5 days ago

    Excellent suggestion, thanks. Globals can now be placed at a configurable location (any starting point and range in the zero page, or just somewhere else in memory). I added call graph analysis (it required a lightweight first pass) and now allocate most locals statically in the same region as the globals. With this and other improvements, the Sieve benchmark already runs more than 2x faster than in this video (2.3 seconds).

  • @espfusion
    4 days ago

    First time I've heard someone actually think the 6502 is C friendly, but I respect this perspective! If your whole program's live local variable footprint fits in 256 bytes you can manage okay, but if not, things could get tough. Not sure what the best strategy is then, maybe relocation? I don't think MOS made the wrong choice, but if they had found a way to fit a 16-bit SP or zero-page relocation, that may have been a game changer. Then again, the 6809 was never really that big, so who knows.

  • @biggertigger
    4 days ago

    I've moved on from benchmarks to use Tigger C to implement something useful: a little file system for the serial EEPROM on my 6502 SBC that runs the Hopper Runtime (virtual machine). Since Tigger C emits my Hopper 6502 Assembly syntax, I'll easily be able to prototype in C and then integrate the resulting assembly into the Hopper Runtime (which is all 6502 assembly). This exercise should prove one way or the other if Tigger C is actually good for anything in the real world ...

  • @DrMortalWombat
    4 days ago

    @espfusion A stack is only one way to implement the calling semantics of C. Most real-world programs that you want to run on these machines are not recursive, and thus local variables can be assigned to a "static" stack using call graph analysis. I have spent almost three years now implementing a C++ compiler for the 6502 (Oscar64), and given the limitations of the programs that will realistically be developed for the platform, the CPU is an excellent design for a high-level-language target.

  • @espfusion
    3 days ago

    @DrMortalWombat Understood, it's a good strategy. Still, even storing just the maximum live variable depth, you're probably going to run out of zero page space pretty quickly, especially with competition from global/static variables and other housekeeping that needs the zero page. You can start allocating the rest with absolute addressing, but you'd still need pointers to sit somewhere in the zero page at least temporarily, so I wonder if it's not usually better to just start spilling the oldest variables to and from absolute space in big blocks and writing the new stuff there... Also, while I do expect actual recursion isn't that common (and is pretty easily special-cased), it's probably a lot more common to see the same function appear at multiple sites in a call trace. I assume this would be handled at least to a point by aliasing the function into multiple versions, but that becomes a bit of a code bloat problem.

  • @dr.ignacioglez.9677
    6 days ago

    I REALLY LOVE 6502 ❤❤❤

  • @billchatfield3064
    4 days ago

    What are the tall yellow bars on your bar graph? They're not labeled, so I can't see what you're comparing them to.

  • @biggertigger
    3 days ago

    Yellow is the times for Hopper (which runs on a VM), orange is Tigger C.

  • @DavidLatham-productiondave
    9 days ago

    8 MHz on an 8086 could well be slower than your 8 MHz 6502. The 8086 had many more cycles per instruction than the 6502; most 6502 instructions are only 2 or 3 cycles long. Then you start getting into wait states for I/O and other complexities the 6502 just doesn't have, and these benchmark comparisons become hard to reason about.

  • @biggertigger
    9 days ago

    Exactly. The 6502 shares its small instruction set and minimal cycles per instruction with later RISC processors. It's clear that the engineers at Acorn drew inspiration from the 6502 (used in their Atom and BBC Micro computers), influencing their design philosophy for the first ARM processors. Creating benchmarks where the 8086 excels would be straightforward: just perform multiplication or division using the MUL and DIV instructions, or leverage string-specific instructions like SCAS and CMPS. I'm pretty sure the Mandelbrot demo / benchmark would be way faster on an 8086 (thanks to IMUL and IDIV).
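
    As a hedged illustration of that point (the function name and iteration count below are invented, not from the video): a benchmark whose inner loop is dominated by 16-bit multiplies strongly favours the 8086, which has a MUL instruction, slow as it is, while the 6502 has no multiply at all and must call a shift-and-add routine for every product.

        /* Hypothetical multiply-heavy micro-benchmark: one hardware MUL per
         * iteration on an 8086, but a software multiply routine call per
         * iteration on a 6502. */
        #include <stdio.h>

        unsigned int mulsum(void) {
            unsigned int sum = 0;
            for (unsigned int i = 1; i <= 1000; i++) {
                sum += i * i;   /* 16-bit multiply in the inner loop */
            }
            return sum;         /* wraps modulo 65536 with 16-bit ints */
        }

        int main(void) {
            printf("%u\n", mulsum());
            return 0;
        }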

  • @DavidLatham-productiondave
    9 days ago

    @biggertigger Yeah, I very conveniently failed to acknowledge the superior ALU in the later processors. LoL. I should be in marketing.

  • @MechanicaMenace
    6 days ago

    @biggertigger Both at the same clock speed? I'm not sure. The MUL and DIV operations aren't very efficient. They helped reduce memory operations, which back in the day did help the 8088 and 8086 make up for their lack of bandwidth and win out on multiplication and division, but back then we were comparing ≈1 MHz 6502s to ≈5 MHz 8088s/8086s, and in most other cases we thought of them as roughly equivalent. Both at 8 MHz would see what? The 6502 at 8 MiB/s, the 8086 at 4 MiB/s, and the 8088 at 2 MiB/s. I'm not sure even the native 16-bit ALU could make up for that. Edit: this is me being curious rather than argumentative btw. I was more of a 68k weenie anyway 😋
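
    Those bandwidth figures can be reproduced from the bus basics, under the usual simplifications (assumed here: no wait states, and ignoring the 8086/8088 prefetch queue): the 6502 makes one byte-wide memory access per clock, while an 8086/8088 bus cycle takes four clocks and moves two bytes (8086) or one byte (8088). A throwaway check of the arithmetic:

        /* Peak bus bandwidth at 8 MHz under the stated simplifications:
         *   6502: 1 byte per clock cycle
         *   8086: 2 bytes per 4-clock bus cycle
         *   8088: 1 byte per 4-clock bus cycle                          */
        #include <stdio.h>

        int main(void) {
            const double clock_hz = 8e6;
            printf("6502: %.0f MB/s\n", clock_hz * 1.0 / 1e6);    /* 8 */
            printf("8086: %.0f MB/s\n", clock_hz / 4 * 2 / 1e6);  /* 4 */
            printf("8088: %.0f MB/s\n", clock_hz / 4 * 1 / 1e6);  /* 2 */
            return 0;
        }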

  • @espfusion
    4 days ago

    It's not really a fair comparison because the 6502 basically needs 4-8x faster memory. At 8 MHz in the early '80s you're probably already limited to SRAM, hence why you didn't see >4 MHz 65xx parts until you got cards with small SRAM caches.

  • @biggertigger
    4 days ago

    @espfusion I've already got it running about 2x as fast as it was a week ago (2.2 seconds for the Sieve benchmark compared to 4.5 seconds in this video). Even after multiplying that by 8 to make up for my 8 MHz clock, that's faster than anything on the 6502 from the BYTE magazine article other than a pure assembly solution. Most of the speed improvement came from implementing a suggestion from @DrMortalWombat to use simple call graph analysis to determine which locals get static storage (zero page of course). The compiler uses what is available and then overflows into regular memory (which is still faster than BP+offset stack storage). With the exception of the Fibonacci sample, all local variables can be static. Arguments are still on the stack (obviously). One massive advantage we have over the gods of the '70s, like Woz for example, is our completely overkill development platforms. Back then it was mostly self-hosted, which is incredibly impressive. It's so much easier for me to just focus on the outcome without worrying about how efficient the tools are.

  • @billchatfield3064
    4 days ago

    You might as well use Pascal or Modula-2, which have these features. It's going to be hard to remember the differences between this pseudo-C and real C.

  • @tiggerbiggo
    4 days ago

    nice name

  • @biggertigger
    4 days ago

    Couldn't agree more. :-)

  • @tmbarral664
    5 days ago

    Just a thought: you may try LDA $FF,X instead of DEX / LDA $100,X ;)

  • @biggertigger
    5 days ago

    Thanks, well spotted. I had a problem with $00FF (probably because the Hopper 6502 Assembler saw it as $FF, which would make it the zero page version of the instruction, where adding X wraps back into the zero page). However, even $00FF,X is suboptimal because it crosses a page boundary (+1 cycle). So the eventual implementation of your idea was to put the offset of the MSB in X instead. Here's your DEX-less / INX-less version without the zero page wrap or the extra page-boundary cycle cost:

        // POPL [BP+0x00]
        // POP [0x01FF - BP - 0] (16 bit)
        LDA ZP.BP
        SEC
        SBC # 0x01
        TAX
        PLA
        STA 0x0100, X // MSB
        PLA
        STA 0x0101, X // LSB

    Or:

        // INCLI [BP+0x00] # 0x0001
        LDA ZP.BP
        SEC
        SBC # 0x01
        TAX
        INC 0x0101, X // LSB
        if (Z)
        {
            INC 0x0100, X // MSB
        }

    (And yes, the preamble in these two cases should be LDX ZP.BP followed by DEX - this is what the code looks like between the peephole optimizer and the whole-program optimizer.)

  • @tmbarral664
    5 days ago

    @biggertigger I believe you were mentioning something similar in the video, so I may be repeating you... 😀 But a further optimization here would be, for a value of 1, to replace the SEC / SBC #$01 with a TAX / DEX ;)

  • @biggertigger
    5 days ago

    Yes. See the note at the end of my (long) reply above. Here's an invite link to the Hopper Discord server if you are interested in further optimizations ... :-) discord.gg/H8cVAvhK

  • @DrMortalWombat
    5 days ago

    @biggertigger The main problem is the code representation on which you try to perform the optimizations. A stack-based evaluation hides most interdependencies - you notice this in your video at around the 29-minute mark. Since the 70s, compilers have used intermediate representations that expose the dataflow, which culminated in SSA in the late 80s. The other problem is the low level at which you start optimizing. Compilers usually go through several levels of lowering, from AST over SSA with various levels of IL, and finally assembly with basic blocks. Each of these levels provides insight and optimization opportunities for the compiler that are not available to later stages. So your compiler may benefit a lot if you go with a more dataflow-oriented virtual-register representation before generating the actual assembly.
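
    As a toy illustration of that point (the expression and the IR spellings below are invented, not Oscar64 or Tigger C output): the same statement in a stack-based IR hides which operation consumes which value, while a three-address / virtual-register form names every intermediate, exposing the dataflow that SSA-era optimizers rely on.

        /* Same source statement shown in two intermediate forms (as comments). */
        #include <stdio.h>

        int example(int b, int c) {
            int a = b + c * 2;
            /*  Stack-based IR          Virtual-register IR (pre-SSA)
             *  --------------          -----------------------------
             *  PUSH b                  t0 = c * 2
             *  PUSH c                  t1 = b + t0
             *  PUSH #2                 a  = t1
             *  MUL
             *  ADD
             *  POP  a                                                  */
            return a;
        }

        int main(void) {
            printf("%d\n", example(3, 4));   /* prints 11 */
            return 0;
        }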