Cool to see F# here! Emulators are a great way to learn a language. At first sight, you chose well between more and less idiomatic F# for each job.
Some low hanging fruit to reduce allocations: the discriminated unions in Instructions.fs could be [<Struct>], reusing field names to reuse internal fields.
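To make the struct suggestion concrete, here is a minimal sketch. The `Instruction` type and its cases are hypothetical (the real Instructions.fs will look different), and it assumes F# 9 or later, where struct union cases may reuse a field name as long as the type matches, so the `value` field below can share storage between `LoadImm` and `AddImm`.

```fsharp
// Hypothetical instruction type for illustration only; not the project's actual definition.
// [<Struct>] makes the union a value type, so creating one does not allocate on the heap.
[<Struct>]
type Instruction =
    | Nop
    | LoadImm of reg: byte * value: byte // `value` is declared here...
    | AddImm of value: byte              // ...and reused here (same name, same type)
    | Jump of address: uint16

// Pattern matching works the same as for a reference-type union.
let describe (instr: Instruction) : string =
    match instr with
    | Nop -> "NOP"
    | LoadImm (reg, value) -> sprintf "LD r%d, 0x%02X" reg value
    | AddImm value -> sprintf "ADD A, 0x%02X" value
    | Jump address -> sprintf "JP 0x%04X" address
```

Whether this actually helps depends on how large the union gets and how often it is copied, so it is worth measuring before and after.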
Also, minor nitpick but I'm confused about some of the registers. They are already of type byte, the setters with `a &&& 0xFFuy` don't add anything over `member val A = 0uy with get, set`. I'm guessing this changed over time.
// Registers can't be a record type because the values need to be truncated to 8 bits when writing, so setters are needed
// This is for the web renderer as Fable transpiles uint8 to Number (more than 8 bits) in JS and doesn't apply any truncation
// Known non-standard behaviour in Fable (https://fable.io/docs/javascript/compatibility.html#numeric-types)
So I think it's just conservatively cleaning the data, because Fable widens uint8 to a JS Number on the web target.
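For anyone following along, a minimal sketch of the setter pattern being discussed. The `Registers` type here is a hypothetical stand-in, not the project's actual code. On .NET the mask is a no-op because `byte` already wraps at 8 bits; per the quoted comments, it only matters once Fable has turned the value into a JS Number.

```fsharp
// Hypothetical register file with a single 8-bit register, for illustration only.
type Registers() =
    let mutable a = 0uy

    member _.A
        with get () = a
        // On .NET this mask changes nothing: a byte cannot exceed 0xFF.
        // It is kept so the value stays truncated under a JS Number representation.
        and set (value: byte) = a <- value &&& 0xFFuy

let regs = Registers()
regs.A <- 0xFFuy + 1uy // byte arithmetic wraps on .NET, so this stores 0
```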
Even if you use AI, there's a certain point where it's not clear that an AI would make you faster. F# is my favorite language, and I've been programming in it so long (since 2012) that I feel like I think in F#. Asking an AI for something can be faster if I can state my requirements informally; but if I need to specify many things precisely to an AI... why not just write the code in F#? Part of the beauty of good functional designs is that they are declarative, not imperative, so in some sense you're really just stating what you want, at finer and finer granularities, until what you want is trivial.
Even when I want code written in a different language (e.g., C/C++), I often still start by making a prototype in F#. This helps me nail down the logic without having to worry about things like allocation or layouts. Perhaps I could ask an AI to do this second step for me, and then use the F# implementation as an oracle. Anyway.
> I probably spent over 20 hours debugging, scanning the emu-dev Discord, creating tests, and even throwing the issue at earlier AI models. Nothing worked. But then after a few weeks away from the emulator I tried Claude Opus, and it found the issue in just a few minutes.
Even if you want to write all the code yourself (which is a fine decision), the only reason in 2026 to bang your head against a problem like this for 20 hours is if you really enjoy doing so.
(I'm surprised that "earlier AI models" didn't work for the author. For me, free-tier Gemini gets stuff like this correct all the time.)
I'm of the mindset that you can use AI however you want to get the speed improvements you're looking for. Personally, I use Agile methods to incrementally implement manually testable features, refine and debug, then commit. Then I use another chat/agent to keep tabs on the overall progress (giving it a summary from the agent that did the work), and then move to the next task by asking the coordinator to draft a prompt for the next bit of work I describe.
As a longtime F# developer and longtime recipient of STEM academic bullying[1] I refuse to use LLMs in large part because ChatGPT-3.5 was so ridiculously bad and obvious about copy-pasting from F# GitHub repos. I never felt the AGI, I just saw a plagiarism machine whose decorations had fallen off.
Eventually I am sure someone at Microsoft noticed and rang the RLHF alarm, so GPT improved substantially. It seems pretty usable for F#. I am sure some unprincipled F#er is crushing it with agents these days. But I didn't think "oh boy they solved the plagiarism problem, let's go generate some slop!" I thought "oh great, now it's no longer going to be blatantly obvious when ChatGPT plagiarizes." I really don't want to roll a d100, or even a d1000, to completely compromise a core value of mine in exchange for a productivity benefit. I'll just be slow and jobless, thanks. This is serious: I am getting into solar installations and junk hauling.
[1] The "students don't want to think" problem is much older than LLMs. In 2007 I took a senior-level PDEs class, and almost everyone copied my homework because I was actually motivated to study PDEs, and too psychologically weak to resist those mean lazy math majors. Then it happened again in math grad school! Actually unbelievable. Why are you even in the program?
That's so cool! I love F#, but I wrote a little Smalltalk interpreter in it and I can confirm it isn't exactly a speed demon for that kind of thing if you use it as intended lol
I've found that with F#, I get better performance if I do dumb imperative stuff, but keep the side effects within a function. At that point, the functions can basically be "pure" but you can get decent speed.
For example, I usually like using the `Map` data structure, and that's a pretty neat immutable structure and is usually fine for most stuff, but when performance becomes critical, it's easy enough to break into a boring imperative loop with a regular hash map. If I keep everything contained in one function, I usually can avoid feeling super dirty about it.
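A minimal sketch of that pattern, assuming a simple counting job (`countItems` is a made-up name): the mutable `Dictionary` never leaves the function, and callers only ever see an immutable `Map`.

```fsharp
open System.Collections.Generic

// Counts occurrences using a local mutable Dictionary, then returns an immutable Map.
// From the outside the function is pure: same input, same output, no visible side effects.
let countItems (items: seq<string>) : Map<string, int> =
    let counts = Dictionary<string, int>()
    for item in items do
        match counts.TryGetValue item with
        | true, n -> counts.[item] <- n + 1
        | false, _ -> counts.[item] <- 1
    // Convert once at the boundary so no mutable state escapes.
    counts |> Seq.map (fun kv -> kv.Key, kv.Value) |> Map.ofSeq

let result = countItems [ "a"; "b"; "a" ]
```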
Yes! That's exactly how you should do it while working with a language that doesn't have a compiler that will aggressively analyze, and rewrite and optimize your code for you. (So, most languages with "heavy runtimes" that support a bunch of dynamic stuff and JITs)
There are basically two points to programming with immutable-first data. One, eliminate certain classes of data race concurrency bugs. Two, less mutable state in a given context makes it easier to reason about.
So, if you're inside a function scope and you aren't launching any concurrent operations from inside that function, you don't have to worry about benefit #1. If you're inside a function (and you're not reaching out for global mutable state), then the context you need to keep in your working memory is likely fairly small, so a few local mutable variables don't significantly harm "understandability" of the implementation (in most cases). So, you really don't have to worry about #2, either. Make your functions black boxes with solid "APIs" (type signatures), and let the inside do whatever it needs to make it work the best.
Just because premature optimization is the root of all evil, it doesn't mean we need to jump right to premature pessimization...
Yeah, and even if you need concurrency/parallelism within the function, it can be forgivable to use ConcurrentDictionary or ConcurrentBag or one of the many, many other thread safe mutable data structures built directly into .NET.
I will personally almost always prefer the pretty functional versions of things, and that's almost always what I start with. I like immutable data structures, and they are usually more than fast enough. Occasionally, though, you hit a bottleneck of some kind (usually in some form of loop), and you have to avoid all the beautiful functional stuff and go back to sad imperative stuff. When I do that, I usually try and keep it scoped to one function. Even within one function, I do find the persistent structures easier to reason about, but as you stated it's a small enough surface area to not be too irritating.
There are exceptions to this, of course. Sometimes for caching/memoizing I will make a global ConcurrentDictionary, and I'll use the interlocked thing to do global counters sometimes.
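Roughly what those two exceptions can look like, as a sketch only. `expensiveSquare`, `MissCounter`, and `memoSquare` are hypothetical names; `ConcurrentDictionary.GetOrAdd` and `Interlocked.Increment` are the standard .NET calls.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading

// Hypothetical stand-in for some expensive computation.
let expensiveSquare (n: int) : int64 = int64 n * int64 n

// Thread-safe counter; the mutable field is private to the type.
type MissCounter() =
    let mutable misses = 0
    member _.Increment() = Interlocked.Increment(&misses) |> ignore
    member _.Value = misses

// Global cache shared by all callers.
let cache = ConcurrentDictionary<int, int64>()
let misses = MissCounter()

let memoSquare (n: int) : int64 =
    let compute (key: int) : int64 =
        misses.Increment()
        expensiveSquare key
    // Note: under contention GetOrAdd may run the factory more than once for the same key,
    // so the factory should be side-effect tolerant (the miss count is approximate).
    cache.GetOrAdd(n, Func<int, int64>(compute))
```

Called twice with the same argument from a single thread, the second call is served from the cache and the miss counter stays at 1.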
Out of curiosity, when did you write that interpreter? The entire .NET ecosystem has seen massive speed improvements over the years, particularly for anyone who last tried it during the Framework era. Hell, they even put work into improving tail calls, which the C# compiler doesn't even take advantage of. (Also, in the .NET 8 timeframe if I remember right, F# added a `[<TailCall>]` attribute that makes the compiler flag a recursive call that isn't a tail call, so you can't accidentally screw that up.)
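For reference, a small sketch of the attribute in question, assuming FSharp.Core 8 or later where `[<TailCall>]` is available. As far as I know the compiler reports a warning rather than a hard error for a non-tail recursive call, unless warnings are treated as errors. `sumTo` is just an illustrative function.

```fsharp
// Accumulator-style recursion: the recursive call is the last thing the function does,
// so it is a tail call and the attribute has nothing to complain about.
[<TailCall>]
let rec sumTo (acc: int64) (n: int64) : int64 =
    if n = 0L then acc else sumTo (acc + n) (n - 1L)

// A version written as `n + sumTo (n - 1L)` would not be a tail call,
// because the addition happens after the recursive call returns.
let total = sumTo 0L 1000000L
```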
It's .NET 10 lol. It's not so slow you can't write stuff for it, I have implementations of Conway's game of life, Huffman compression, and a minimal TUI. The main problem is doing almost anything in it involves a method lookup. And there are almost certainly places I could have done things more smartly.
One thing I do want to try out is publishing it with native AOT. I had a lot of luck with that on one of my other F# projects, I got like a 75% speedup out of it. I understand the JIT is supposed to outperform native AOT in the long term but I haven't seen it reach that speed.
With some care about what features to use and when, F# can be very fast. Which is nice, use functional paradigm when you want, or low level imperative code in hot loops if you need. But yeah if you use linked lists and sequences and immutable data types everywhere it sure isn't Rust.
I always find emulators written in functional languages impressive. It tends to be much easier to map hardware to an imperative language. I enjoy seeing the functional abstractions people come up with.
Yeah, I did see that part. Although he mentioned his Chip8 emulator, which was fully immutable. Still interesting to see when people use the mutability escape hatches.
Insanely cool. I've had it in the back of my mind to write a Rust compiler for the Game Boy for a long time, and every time I see something like this I think about brushing off that project.
Mildly related, but wasn't there an emulator (maybe not GB but NES or SNES?) which had a visual panel showing each CPU cycle step by step? AFAIK it was very slow, but the 1000% accuracy was the goal, not playability.
Sorry for the tangent - does anyone have some really zoomed in views of GB, GBColor, GBA screens in operation? I'd love for retro shaders to be able to more faithfully reproduce.
I mean, ideally, we'd run different color test patterns through, in different lighting conditions, to build a really detailed model, right?
I'm actually starting a new project to create a GBA emulator in Zig, and also starting with Chip8. I'm going to skip Nand to Tetris because I played Turing Complete. Cool to see I'm on the right track!
There is some hope for humanity after all I suppose.
https://gbatemp.net/threads/no-gmb-2-5-dos-full-version.6039...
I've got fond memories of using this to get a preview of Pokemon Gold before it was released in NA!
I've been going through a lot of very old stuff recently and a lot of it is well preserved in a way but given enough years everything changes.
I don't think any original Gameboys have been made in twenty years or more.
Speak for yourself