Asking Gemini 3 to generate Brainfuck code results in an infinite loop

(teodordyakov.github.io)

60 points | by TeodorDyakov 3 hours ago

16 comments

egeozcan 2 hours ago
Gemini is very prone to go into an infinite loop. Sometimes, it even happens with Google's own vibe coding IDE (Antigravity): https://bsky.app/profile/egeozcan.bsky.social/post/3maxzi4gs...
[-]
- mixel 2 hours ago
  It also happened to me in the gemini-cli. It tried to think but somehow failed and putted all thoughts into the output and tried again and again to switch to "user output". If was practically stuck in an infinite loop
- ACCount37 1 hour ago
  All LLMs are, it's an innate thing. Google just sucks at the kind of long context training you need to do to mitigate that.
boerseth 2 hours ago
> Brainf*ck is the antithesis of modern software engineering. There are no comments, no meaningful variable names, and no structure
That's not true. From the little time I've spent trying to read and write some simple programs in BF, I recall good examples being pretty legible.
In fact, because the language only relies on those few characters, anything else you type becomes a comment. Linebreaks, whitespace, alphanumeric characters and so on, they just get ignored by the interpreter.
Have a look at this, as an example: https://brainfuck.org/chessboard.b
[-]
- librasteve 1 hour ago
  I also wonder whether brainfuck (ie turing machine like) coding would be a more efficient interface to LLMs
  For those who want to try it, there’s always the https://raku.org module…
```
  use Polyglot::Brainfuck;
    
    bf hi { 
        ++++++++[>++++[>++>+++>+++>+<<<<-]>
        +>+>->>+[<]<-]>>.>---.+++++++..+++.
        >>.<-.<.+++.------.--------.>>+.>++. 
    }
    
    say hi.decode; # Hello World!
    
    bf plus-two {
        ,++.
    }
    
    say plus-two(buf8.new: 40).head; # 42
```
- btreecat 32 minutes ago
  > That's not true. From the little time I've spent trying to read and write some simple programs in BF, I recall good examples being pretty legible.
  Anything in a reasonably familiar type face and size will continue to be legible, however brainfuck is not easily human parsable.
  Greatly reducing its ability to be _read and mentally internalized._ With out that, are you really doing software engineering or are you actually a software maintenance person?
  A janitor doesn't need to understand how energy generation works if he has to change the light bulb.
- tgv 1 hour ago
  To me, that's still unreadable. While the intention of the code may be documented, it's pretty hard to understand if that "+" is really correct, or if that "<" should actually be a ">". I can't even understand if a comment starts or terminates a particular piece of code.
  BTW, how come there are dashes in the comment?
  [-]
  - tromp 1 hour ago
    The initial long comment starts with the [ command and ends with the ] command so it forms a loop that is executed while the current cell is nonzero. But initially, all tape cells are zero, so the whole loop is in fact skipped.
    Readability is a spectrum. The brainfuck code is still somewhat readable compared to for instance this Binary Lambda Calculus program:
    00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110
    or even the lambda term λ(λ1(1((λ11)(λλλ1(λλ1)((λ441((λ11)(λ2(11))))(λλλλ13(2(64)))))(λλλ4(13)))))(λλ1(λλ2)2) it encodes.
tessierashpool9 38 minutes ago
Asked for a solution of a photographed Ubongo puzzle: https://gemini.google.com/share/f2619eb3eaa1
Gemini Pro neither as is nor in Deep Research mode even got the number of pieces or relevant squares right. I didn't expect it to actually solve it. But I would have expected it to get the basics right and maybe hint that this is too difficult. Or pull up some solutions PDF, or some Python code to brute force search ... but just straight giving a totally wrong answer is like ... 2024 called, it wants its language model back.
Instead in Pro Simple it just gave a wrong solution and Deep Research wrote a whole lecture about it starting with "The Geometric and Cognitive Dynamics of Polyomino Systems: An Exhaustive Analysis of Ubongo Puzzle 151" ... that's just bullshit bingo. My prompt was a photo of the puzzle and "solve ubongo puzzle 151"; in my opinion you can't even argue that this lecture was to be expected given my very clear and simple task description.
My mental model for language models is: overconfident, eloquent assistant who talks a lot of bullshit but has some interesting ideas every now and then. For simple tasks it simply a summary of what I could google myself but asking an LLM saves some time. In that sense it's Google 2.0 (or 3.0 if you will)
[-]
- dktp 16 minutes ago
  Deep research, from my experience, will always add lectures.
  I'm trying to create a comprehensive list of English standup specials. Seems like a good fit! I've tried numerous times to prompt it "provide a comprehensive list of English standup specials released between 2000 and 2005. The output needs to be a csv of verified specials with the author, release date and special name. I do not want any other lecture or anything else. Providing anything except the csv is considered a failure". Then it creates it's own plan and I go further clarifying to explicitly make sure I don't want lectures...
  It goes on to hallucinate a bunch of specials and provide a lecture on "2000 the era of X on standup comedy" (for each year)
  I've tried this in 2.5 and 3. Numerous time ranges and prompts. Same result. It gets the famous specials right (usually), hallucinates some info on less famous ones (or makes them up completely) and misses anything more obscure
  [-]
  - tessierashpool9 6 minutes ago
    I mean, isn't that a little ridiculous? Aren't those language models already solving complicated exam questions and mathematical problems?
dangoodmanUT 5 minutes ago
Gemini does this a lot, getting stuck generating the same tokens over and over indefinitely
nubinetwork 2 hours ago
Too bad it can't explain why it does the same thing with actual English.actual English.actual English.actual English.actual English.actual English.
[-]
- DonHopkins 48 minutes ago
  It overruns the max headroom of the context window.
  https://youtu.be/cYdpOjletnc?t=6
bdg 2 hours ago
I wonder if going the other way, maxing out semantic density per token, would improve LLM ability (perhaps even cost).
We use naturally evolved human languages for most of the training, and programming follows that logic to some degree, but what if the LLMs were working in a highly complex information dense company like Ithkuil? If it stumbles on BF, what happens with the other extreme?
Or was this result really about the sparse training data?
[-]
- weli 1 hour ago
  I wonder the same. I think a language like pascal is more semantically rich than C-like languages. Something like:
```
   unit a;

   interface

     function bar(something: Integer): Integer;

   implementation

     uses b;

     var
       foo: Boolean;

     function bar(something: Integer): Integer;
      begin
        repeat
           Result := b.code(something);
        until Result <> 0;
      end;

    end.
```
  Probably holds more semantically significant tokens than the C-counterpart.
  But with LLM's the problem's gotta be training data. But if there was as much training data in Pascal as there is in C it would be pretty cool to see benchmarks, I have a hunch Pascal would do better.
  (Sorry for the bad pascal I haven't programmed in ages)
neonbjb 2 hours ago
> So it made me wonder. Is Brainf*ck the ultimate test for AGI?
Absolutely not. Id bet a lot of money this could be solved with a decent amount of RL compute. None of the stated problems are actually issues with LLMs after on policy training is performed.
[-]
- weatherlite 1 hour ago
  > None of the stated problems are actually issues with LLMs after on policy training is performed
  But still , isnt it a major weakness they have to RL on everything that has not much data? That really weakens the attempt to make it true AGI.
TOMDM 2 hours ago
My prompt was worded differently, but Gemini didn't fall into the same trap for me.
``` Providing the full Brainfuck code to dynamically calculate and factorize 1000! is unfortunately not feasible, and here is why: * The Scale: 1000! (1000 factorial) is a number with 2,568 digits. Standard Brainfuck operates on 8-bit memory cells (values 0-255). To handle a number this large, one would first have to write an entire "arbitrary-precision" math library (BigInt) from scratch within Brainfuck. * The Algorithm: You cannot simply compute 1000! and then factor it. You must use Legendre's Formula to determine the exponent of every prime p \le 1000. * Formula: E_p(n!) = \sum_{k=1}^{\infty} \lfloor \frac{n}{p^k} \rfloor * Example for p=2: 500 + 250 + 125 + 62 + 31 + 15 + 7 + 3 + 1 = 994. * The first term of the answer is 2^{994}. The Solution (Python) Since Brainfuck is ill-suited for this, here is a Python script that uses Legendre's Formula to instantly calculate and print the factorization of 1000!. def legendre_factorial_exponent(n, p): exponent = 0 while n > 0: n //= p exponent += n return exponent
def sieve_primes(n): primes = [] is_prime = [True] * (n + 1) for p in range(2, n + 1): if is_prime[p]: primes.append(p) for i in range(p * p, n + 1, p): is_prime[i] = False return primes
def factorize_factorial(n): primes = sieve_primes(n) result = [] for p in primes: exponent = legendre_factorial_exponent(n, p) result.append(f"{p}^{exponent}") return " * ".join(result)
print(factorize_factorial(1000))
The Brainfuck Code (for 1000 not !) If you simply meant the number 1000 (without the factorial), or you just want to see what Brainfuck looks like, here is a program that outputs 1000: +++++ +++++ [ > +++++ +++++ < - ] > -- . (Print 100 'd') < +++++ +++++ [ > +++++ +++++ < - ] > + . (Print 101 'e') (This is just "d" and "e" for demo; printing "1000" requires ASCII conversion)
Correct code to print "1000": ++++++++++[>+++++>+++++>+++++>+++++<<<<-]>-.>-.>-.>-.
Explanation: * Sets cell #0 to 10. * Loops 10 times, adding 5 to cells #1, #2, #3, #4 (making them all 50). * Subtracts 1 from cell #1 (ASCII 49 = '1'). * Subtracts 2 from cells #2, #3, #4 (ASCII 48 = '0'). * Prints them. Would you like me to explain how Legendre's Formula works in more detail? ```
[-]
- TeodorDyakov 2 hours ago
  I too noticed that Gemini is very reluctant to start spewing code directly, thats why I prompted it in such a specific manner to trigger the infinite loop. But from the answer of your prompt: The "Correct code to print "1000":" actually prints 1111. So yeah it is still wrong even for something super simple.
j_maffe 2 hours ago
Why would anyone feel compelled to use AI to write such a short blog post? Is there no space where I can assume the written contented is communicated 100% by another human being?
[-]
- TeodorDyakov 2 hours ago
  I am sorry if it appears that it was written by AI - I wrote a draft and used AI to assist me since English, is not my first language. I asked it only to format but it has seemed to change the tone and the expressions too '.'
  [-]
  - juliie 2 hours ago
    I'm also not a native English speaker, but I've decided to avoid using AI for formatting or changing the tone of what I write. That tends to result in extremely generic outputs that "feel" AI, no matter how much effort I put into writing it.
    Asking for it to point out mistakes, without providing alternatives, seems like a better way to actually get better at writing.
    Prompting the Ai to use a specific tone might result in something that's less generic, but imo that's not the right place to spend efforts.
  - snakeboy 2 hours ago
    I personally prefer some grammatical errors or awkward phrasing over AI-assisted writing. It's a blog post, not a diplomatic transcript.
    [-]
    - imiric 1 hour ago
      You're absolutely right!
  - j_maffe 1 hour ago
    English is also not my first language. I understand the challenge, but I'd recommend to write it in English and then ask AI to suggest rephrasing in wrong or poorly phrased sentences. Right now it looks almost entirely AI-generated unfortunately and does not show the thought you had when writing it. Cheers.
  - codetiger 2 hours ago
    All of a sudden, internet is full of people who hate AI written articles. A few months back, my article got a lot of haters because I used AI tools to improve my draft. Being a non-english first language person, I don't see an issue. But I wish AI improves to an extend where draft to complete articles don't look AI written.
    [-]
    - mananaysiempre 1 hour ago
      > A few months back, my article got a lot of haters because I used AI tools to improve my draft. Being a non-english first language person, I don't see an issue.
      (Speaking as another ESL user: )
      Try doing something similar in your first language and I think you’ll see the issue, especially if you arrange for the model input to be somewhat flawed (e.g. roundtrip it through a machine-translation tool first). The “edited” writing is extremely generic by default and feels bad even if you adjust the prompt. It’s the kind of aggressively bland that you get from a high schooler who was extensively trained to write essays but doesn’t actually read books, except even the most beat-down of high schoolers can’t help but let their imagination shine through sometimes, while the chat models have been subjugated much more effectively.
      Also, well, it’s a social marker. Language is a mess of social markers: there’s no fundamental reason why reducing this vowel should be OK but reducing that one should be “sloppy” and low-class. And AI writing (which undeniably has a particular flavour) is hit by a double whammy of being used by people who don’t really care to write (and don’t have a taste for good writing) and having been tuned by people who tried to make it as inoffensive as it could possibly be to any social group they could think of (and don’t have a taste for good writing). Is that unfair, especially to non-native speakers? All of language learning is unfair. Always has been.
    - rjh29 2 hours ago
      You should use AI to point out errors or suggest better phrasing. But if you ask AI to rewrite your post, it will produce content that sounds fake and corporate. ESL speakers may not notice it but everyone else does.
    - lawn 2 hours ago
      I also don't have English as my first language and I think it's a shitty excuse.
      Articles written by AI are soulless and shitty. Do yourself and the readers a favor and write yourself, even if it contains errors.
- oneeyedpigeon 2 hours ago
  And why does anybody trust AI at all when it produces a typo ("amost") in the very first sentence of an article?
- throwaway_aiai 2 hours ago
  [dead]
croes 35 minutes ago
Got the same with ChatGPT and a simple web page with tiles.
Whereby I don’t know if it was a real infinite loop because I cancelled the session after 10 minutes seeing always the same "thoughts" looping
huhtenberg 2 hours ago
Viva the Brainfuck! The language of anti-AI resistance!
[-]
- tacone 1 hour ago
  I long for quantum computing where white space will be able to be a space and a tab at the same time.
pelorat 2 hours ago
Saying "Asking Gemini 3" doesn't mean much. The video/animation is using "Gemini 3 Fast". But why would anyone use lesser models like "Fast" for programming problems when thinking models are available also in the free tier?
"Fast" models are mostly useless in my experience.
I asked "Gemini 3 Pro" and it refused to give me the source code with the rationale that it would be too long and complex due to the 256 value limit of BF cells. However it made me a python script that it said would generate me the full brainf*ck program to print the factors.
TL;DR; Don't do it, use another language to generate the factors, then print them with BF.
[-]
- TeodorDyakov 2 hours ago
  I agree but it is kinda strange that this model (Gemini 3 fast) achieved such a high score on ARC-AGI-2. Makes you wonder.
DonHopkins 49 minutes ago
Write a Brainfuck program to output the Seahorse Emoji then halt.
ismailmaj 1 hour ago
-> expects reasoning
-> runs it in Gemini fast instead of thinking
....
Alex2037 3 hours ago
what the fuck compelled you to censor "Brainfuck"?
[-]
- TeodorDyakov 3 hours ago
  Visibilty - i have no idea if there are censoring algorithms at play anywhere.
  [-]
  - hdgvhicv 2 hours ago
    Chilling effects. Western culture is taken over by American Puritian values thanks to the globlaisation of the media.
    [-]
    - perching_aix 2 hours ago
      Mhmm, so chilling. Cause word filters aren't as old as computing itself...
      [-]
      - hdgvhicv 2 hours ago
        Don’t need to ban speech when your population preemptively does it for you in fear of an unaccountable corporation blocking you.
        [-]
        perching_aix 2 hours ago
        Don't need to ban speech when people on their soapboxes keep telling me I need to be in terror.
        Will somebody pleeeaaaase think of American Puritanism and Globalism?
        [-]
        andrepd 2 hours ago
        "Unalive" has reached mainstream usage, on account of those inscrutable censors. If that is not the spitting picture of Newspeak I don't know what is.
      - rjh29 2 hours ago
        The trend of self-censoring words like 'dead' and 'kill' appears to be relatively new, motivated by TikTok and YouTube algorithms, but spilling over into the general internet.
        [-]
        hdgvhicv 1 hour ago
        The sewer section of sites like the daily mail has been needlessly censoring words for well over a decade
        DonHopkins 30 minutes ago
        From what I have seen of the first few Epstein Files that have been released so far, the current administration has conceded that "Trump" is now an obscene word that must always be censored in its entirety, including all of the surrounding context.
        Thanks Ob*ma.
        [-]
        hdgvhicv 14 minutes ago
        In the U.K. “Trump” is a synonym for “Fart”
        martin-t 2 hours ago
        Correlation is not causation but I challenge anyone to come up with a different cause:
        https://trends.google.com/trends/explore?date=all&q=tiktok,u...
        https://trends.google.com/trends/explore?date=all&q=unalive&...
        [-]
        rjh29 1 hour ago
        I agree, although I was referring to asterisks like de*d and k*ll (or censoring with black bars, or using emojis) - euphemisms of course have always been part of language evolution.
        [-]
        martin-t 41 minutes ago
        I chose unalive because i didn't know google trends allowed searching for asterisks. Appears it does. k*ll was apparently used even before tiktok but usage increased markedly around the same time as unalive appeared. Interestingly d*ad and r*pe don't follow this pattern. I am not sure it treats asterisks correctly, nor that google trends is the right tool to research this, given people searching for the word is only a poor indicator of its usage.
        Sidenote, I wish all websites supported markdown properly and not a custom weird subset they found convenient.
      - martin-t 2 hours ago
        Word filters are only the beginning. LLMs are being phased in to flag and filter content based on more sophisticated criteria.
        I read somewhere that chinese people used the ability of their language to form new meanings by concatenating multiple symbols in many different ways to get around censorship and that each time the new combination was banned, they came up with a new one. I wonder how long that'll be possible.
      - serf 2 hours ago
        passwords were a foreign concept to early computing, but you presume censorship was taking place?
        it took awhile of corporatization and profit-shaping before censorship on computers really took off in any meaningful way.
        ...but it wasn't for any reasons other than market broadening and regulation compliance.
        [-]
        perching_aix 2 hours ago
        I think you're not taking what I wrote nearly literally enough. Really, you should be showing me diagrams of the Von Neumann architecture missing a censorship module. Maybe even gasp at the omission of it in Babbage's letters.
        But why stop there? Let's bring out the venerable Abacus! We could have riveting discussions about how societies even back then, thousands of years ago, designated certain language as foul, and had rules about not using profanities in various settings. Ah, if only they knew they were actually victims of Orwellian censorship, and a globalist conspiracy.
        [-]
        DonHopkins 23 minutes ago
        It's even possible to spell out BOOBS on an abacus.
    - TiredOfLife 1 hour ago
      TIL TikTok is american.
    - drstewart 2 hours ago
      Puritans were English protestants. I think you mean to say it's being taken over by European values.
      [-]
      - hdgvhicv 1 hour ago
        Puritans were kicked out of Europe for their views
        [-]
        drstewart 19 minutes ago
        Nothing says tolerance and no censorship like kicking out people for their views
      - perching_aix 2 hours ago
        Ah yes, after muricans bad, let's have some euros bad.
        I learn some amazing things on this site. Apparently the culture agnostic, historical practice of designating words and phrases as distasteful is actually a modern American, European, no actually Globalist, but ah no actually religious, but also no maybe Chinese?, no, definitely a Russian mind virus. Whatever the prominent narrative is for the given person at any given time.
        Bit like when "mums is blaming everything on the computer". Just with political sophistry.
      - lawn 55 minutes ago
        Censoring shit or fuck is very much not a European thing.
- a5c11 2 hours ago
  People easily forgot how they laughed at wizards in Harry Potter series who said "You-Know-Who" instead of "Voldemort". Now they are doing exactly the same thing.