AGI would be absolutely terrifying and that is how you'll know AGI is here
- You would prompt "Ok AGI, read through the last 26978894356 research papers on cancer and tell me what are some unexplored angles" and it would tell you
- You would prompt "Show me the last 10 emails on Sam Altman's inbox" and it would actually show you
- You would prompt "Give me a list of people who have murdered someone in the USA and haven't been caught yet" and it would give you a list of suspects that fit the profile
Some researchers proposed using, instead of the term "AI", the much more fitting "self-parametrising probabilistic model" or just advanced auto-complete - that would certainly take the hype-inducing marketing PR away.
That’s like arguing that washing machines should be called rapid-rotation water agitators.
It’s the result that consumers are interested in, not the mechanics of how it’s achieved. Software engineers are often extraordinarily bad at seeing the difference because they’re so interested in the implementation details.
The problem is that intelligence isn't the result, or at the very least the ideas that word evokes in people don't match the actual capabilities of the machine.
Washing is a useful word to describe what that machine does. Our current setup is like if washing machines were called "badness removers," and there was a widespread belief that we were only a few years out from a new model of washing machine being able to cure diseases.
Arguably there isn't even a widely shared, coherent definition of intelligence: To some people, it might mean pure problem solving without in-task learning; others equate it with encyclopedic knowledge etc.
Given that, I consider it quite possible that we'll reach a point where even more people will consider LLMs to have reached or surpassed AGI, while others still only consider them "sufficiently advanced autocomplete".
I'd believe this more if companies weren't continuing to use words like reason, understand, learn, and genius when talking about these systems.
I buy that there's disagreement on what intelligence means in the enthusiast space, but "thinks like people" is pretty clearly the general understanding of the word, and the one that tech companies are hoping to leverage.
What about letting customers actually try the products and figure out for themselves what it does and whether that's useful to them?
I don't understand this mindset that because someone stuck the label "AI" on it, consumers are suddenly unable to think for themselves. AI as a marketing label has been used for decades, yet only now is it taking off like crazy. The word hasn't changed - what it's actually capable of doing has.
"Washer" and "dryer" are accepted colloquial terms for these appliances.
I could even see the humour in "washer-bot" and "dryer-bot" if they did anything notably more complex. But we don't need/want appliances to become more complex than is necessary. We usually just call such things programmable.
I can accept calling our new, over-hyped, hallucinating overlords chatbots. But to be fair to the technology, it is we chatty humans doing all the hyping and hallucinating.
The market capitalisation for this sector is sickly feverish — all we have done is to have built a significantly better ELIZA [1]. Not a HIGGINS and certainly not AGI. If this results in the construction of new nuclear power facilities, maybe we can do the latter with significant improvement too. (I hope.)
My toaster and oven will never be bots to me. Although my current vehicle is better than earlier generations, it contains plenty of bad code and it spews telemetry. It should not be trusted with any important task.
A woman from 1825 would probably happily accept that description though (notwithstanding that the word “robot” wasn’t invented yet).
A machine that magically replaces several hours of her manual work? As far as she’s concerned, it’s a specialized maid that doesn’t eat at her table and never gets sick.
Negligible cost compared to a real maid in 1825. The washing machine also doesn’t get pregnant by your teenage son and doesn’t run away one night with your silver spoons — the upkeep risks and replacement costs are much lower.
In 1825 both electricity prices and replacement costs would have been unaffordable for anyone, though, because there was literally no price you could pay to get these things.
The point is that, as far as development of AI is concerned, 2025 consumers are in the same position as the 1825 housewife.
In both cases, automation of what was previously human labor is very early and they’ve seen almost nothing yet.
I agree that in the year 2225 people are not going to consider basic LLMs artificial intelligences, just like we don’t consider a washing machine a maid replacement anymore.
Businesses are interested in something that can work for them. And the way the LLM-based agentic systems are going, they might actually deliver on "Automated Knowledge Workers". Probably not with full autonomy, but in teams led by a human. The human needs to tend the AKW, much like we do with washing machines and industrial automation machines.
The term "AI" didn't make sense from the beginning, but I guess it sounded cool and that's why everything is "AI" now. And I doubt it will change, regardless of its correctness.
John McCarthy coined the term "Artificial Intelligence" in the 1950s. I doubt he was trying to be cool. The whole field of research involved in getting computers to do intelligent things has been referred to as AI for many decades.
AI is intermittent wipers, for words,
and the two are completely tied, as the perfect test for AI will be to run intermittent wipers to everybody's satisfaction.
I am quite happy with LLMs being more and more available 24/7 to be useful to humankind ... rather than some sentient being that never sleeps and is more intelligent than me, with its own agenda.
I think what Terry is saying is that with the current set of tools, there are classes of problems requiring cleverness where you can guess and check: guess (glorified autocomplete), check the answer, fail, add information from the failure, and repeat.
I guess ultimately what is intelligence? We compact our memories, forget things, and try repeatedly. Our inputs are a bit more diverse but ultimately we autocomplete our lives. Hmm… maybe we’ve already achieved this.
Someone recently told me that their definition of intelligence was data-efficient extrapolation & I think that definition is pretty good as it separates intelligence from knowledge, sentience, & sapience.
These things work well on the extremely limited task impetus that we give them. Even if we sidestep the question of whether or not LLMs are actually on the path to AGI, imagine instead the amount of computing and electrical power required with current computing methods and hardware in order to respond to and process all the input handled by a person at every moment of the day. Somewhere in between current inputs and handling the full load of inputs the brain handles may lie “AGI” but it’s not clear there is anything like that on the near horizon, if only because of computing power constraints.
Terry Tao is a genius, and I am not. So I probably have no standing to claim to disagree with him. But I find this post less than fulfilling.
For starters, I think we can rightly ask what it means to say "genuine artificial general intelligence", as opposed to just "artificial general intelligence". Actually, I think it's fair to ask what "genuine artificial" $ANYTHING would be.
I suspect that what he means is something like "artificial intelligence, but that works just like human intelligence". Something like that seems to be what a lot of people are saying when they talk about AI and make claims like "that's not real AI". But for myself, I reject the notion that we need "genuine artificial general intelligence" that works like human intelligence in order to say we have artificial general intelligence. Human intelligence is a nice existence proof that some sort of "general intelligence" is possible, and a nice example to model after, but the marquee sign does say artificial at the end of the day.
Beyond that... I know, I know - it's the oldest cliche in the world, but I will fall back on it because it's still valid, no matter how trite. We don't say "airplanes don't really fly" because they don't use the exact same mechanism as birds. And I don't see any reason to say that an AI system isn't "really intelligent" if it doesn't use the same mechanism as humans.
Now maybe I'm wrong and Terry meant something altogether different, and all of this is moot. But it felt worth writing this out, because I feel like a lot of commenters on this subject engage in a line of thinking like what is described above, and I think it's a poor way of viewing the issue no matter who is doing it.
That does seem awfully specific though, in the context of talking about "general" intelligence. But I suppose it could rightly be argued that any intelligence capable of "discovering new areas of mathematics" would inherently need to be fairly general.
I agree with /u/AnimalMuppet, FWIW. As long as I've been doing this stuff (and I've been doing it for quite some time) AGI has been interpreted (somewhat loosely) as something like "Intelligence equivalent to an average human adult" or just "human level intelligence". But as /u/AnimalMuppet points out, there's quite a bit of variance to human intelligence, and nobody ever really specified in detail exactly which "human intelligence" AGI was meant to correspond to.
SuperIntelligence (or ASI), OTOH, has - so far as I can recall - always been even more loosely specified, and translates roughly to "an intelligence beyond any human intelligence".
Another term you might hear, although not as frequently, is "Universal Artificial Intelligence". This comes mostly from the work of Marcus Hutter[1] and means something approximately like "an intelligence that can solve any problem that can, in principle, be solved".
I interpret “artificial” in “artificial general intelligence” as “non-biological”.
So in Tao’s statement I interpret “genuine” not as an adverb modifying the “artificial” adjective but as an attributive adjective modifying the noun “intelligence”, describing its quality… “genuine intelligence that is non-biological in nature”
That's definitely possible. But it seems redundant to phrase it that way. That is to say, the goal (the end goal anyway) of the AI enterprise has always been, at least as I've always understood it, to make "genuine intelligence that is non-biological in nature". That said, Terry is a mathematician, not an "AI person" so maybe it makes more sense when you look at it from that perspective. I've been immersed in AI stuff for 35+ years, so I may have developed a bit of myopia in some regards.
I agree, it’s redundant.
To us humans - to me at least - intelligence is always general (calculator: not; chimpanzee: a little), so “general intelligence” can also already be considered redundant. Using “genuine” is more redundancy being heaped on (with the assumed goal of making a distinction between “genuine” AGI and tools that appear smart in limited domains).
I find it odd that the post above is downvoted to grey, feels like some sort of latent war of viewpoints going on, like below some other AI posts. (Although these misvotes are usually fixed when the US wakes up.)
The point above is valid. I'd like to deconstruct the concept of intelligence even more. What humans are able to do is a relatively artificial collection of skills a physical and social organism needs. The so highly valued intelligence around math etc. is a corner case of those abilities.
There's no reason to think that human mathematical intelligence is unique by its structure, an isolated well-defined skill. Artificial systems are likely to be able to do much more, maybe not exactly the same peak ability, but adjacent ones, many of which will be superhuman and augmentative to what humans do. This will likely include "new math" in some sense too.
What everybody is looking for is imagination and invention. Current AI systems can give a best-guess statistical answer from the dataset they've been fed. It is always compression.
The problem, and what most people intuitively understand, is that this compression is not enough. There is something more going on because people can come up with novel ideas/solutions and, what's more important, they can judge and figure out if the solution will work. So even if the core of the idea is “compressed” or “mixed” from past knowledge there is some other process going on that leads to the important part of invention-progress.
That is why people hate the term AI: because it is just a partial capability of “intelligence”, or it might even be a complete illusion of intelligence that is nowhere close to what people would expect.
Finding variations in a constrained haystack with measurable, defined results is what machine learning has always been good at. Tracing the most efficient Trackmania route is impressive, and the resulting route might be original in the sense that a human would never come up with it. But is it actually novel in a creative, critical way? Isn't it simply computational brute force? How big would that force have to be in the physical or a less constrained world?
The airplane analogy is a good one. Ultimately, if it quacks like a duck and walks like a duck, does it really matter if it’s a real duck or an artificial one? Perhaps only if something tries to eat it, or another duck tries to mate with it. In most other contexts though it could be a valid replacement.
Just out of interest though, can you suggest some of these other contexts where you might want a valid replacement for a duck that looked like one, walked like one and quacked like one but was not one?
> This results in the somewhat unintuitive combination of a technology that can be very useful and impressive, while simultaneously being fundamentally unsatisfying and disappointing
Useful = great. We've made incredible progress in the past 3-5 years.
The people who are disappointed have their standards and expectations set at "science fiction".
I think many people are now learning that their definition of intelligence was actually not very precise.
From what I've seen, in response to that, goalposts are then often moved in the way that requires least updating of somebody's political, societal, metaphysical etc. worldview. (This also includes updates in favor of "this will definitely achieve AGI soon", fwiw.)
Or the people who are disappointed were listening to the AI hype men like Sam Altman, who have, in fact, been promising AGI or something very like it for years now.
I don't think it's fair to deride people who are disappointed in LLMs for not being AGI when many very prominent proponents have been claiming they are or soon will be exactly that.
We seem to be moving the goalposts on AGI, are we not? 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess. If you wanted that, you had to do a new training run with new training data.
But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
How does that not fit the definition of "General Intelligence"? It's literally as capable as a high school student for almost any general task you throw it at.
I think the games tasks are worth exploring more. If you look at that recent Pokemon post - it's not as capable as a high school student - it took a long, long time. I have a private set of tests that any 8 year old could easily solve but that any LLM just absolutely fails on. I suspect that plenty of the people claiming AGI isn't here yet have similar personal tests.
Arc-Agi 3 is coming soon, I'm very excited for that because it's a true test of multimodality, spatial reasoning, and goal planning. I think there was some preliminary post somewhere that did show that current models basically try to brute-force their way through and don't actually "learn the rules of the game" as efficiently as humans do.
How do you think they are training for the spatial part of the tests? It doesn’t seem to lend itself well to token based “reasoning”. I wonder if they are just synthetically creating training data and hoping a new emergent spatial reasoning ability appears.
>think they are training for the spatial part of the tests
I'm not sure which party "they" refers to here, since the arc-agi-3 dataset isn't released yet and labs probably have not begun targeting it. For arc-agi-2, possibly just synthetic data might have been enough to saturate the benchmark, since most frontier models do well on it yet we haven't seen any corresponding jump in multimodal skill use, with maybe the exception of "nano banana".
>lend itself well to token based “reasoning”
One could perhaps do reasoning/COT with vision tokens instead of just text tokens. Or reasoning in latent space which I guess might be even better. There have been papers on both, but I don't know if it's an approach that scales. Regardless gemini 3 / nano banana have had big gains on visual and spatial reasoning, so they must have done something to get multimodality with cross-domain transfer in a way that 4o/gpt-image wasn't able to.
For arc-agi-3, the missing pieces seem to be both "temporal reasoning" and efficient in-context learning. If they can train for this, it'd have benefits for things like tool-calling as well, which is why it's an exciting benchmark.
I think we're noticing that our goalposts for AGI were largely "we'll recognize it when we see it", and now as we are getting to some interesting places, it turns out that different people actually understood very different things by that.
> 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess.
No; that was one, extremely limited example of a broader idea. If I point out that your machine is not a general calculator because it gives the wrong answer for six times nine, and then you fix the result it gives in that case, you have not refuted me. If I now find that the answer is incorrect in some other case, I am not "moving goalposts" by pointing it out.
(But also, what lxgr said.)
> But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
The AI systems that do most of these things are not "LLMs".
> It's literally as capable as a high school student for almost any general task you throw it at.
And yet embarrassing deficiencies are found all the time ("how many r's in strawberry", getting duped by straightforward problems dressed up to resemble classic riddles but without the actual gotcha, etc.).
> The AI systems that do most of these things are not "LLMs".
Uh, every single example that I listed except for the 'playing video games' example is something that I regularly use frontier models to do for myself. I have ChatGPT and Gemini help me find flight routes, tutor me in Spanish (Gemini 3 is really good at this), write poetry and code, solve professional math problems (usually related to finance and trading), help me fix technical issues with my phone and laptop, etc etc.
If you say to yourself, "hey this thing is a general intelligence, I should try to throw it at problems I have generally", you'll find yourself astonished at the range of tasks with which it can outperform you.
No. You are misrepresenting the test's purpose, the argument made around it and the results people have gotten. Turing was explicit that the question was ill-posed in the first place, and proposed a test of useful capability. But even then, hypothetical imagining of what a "passing" agent's responses might look like, was radically different from what we get today. And the supposed "passes" we've seen recently are highly suspect.
Last I checked the Turing test stands. I've only seen reports of LLMs winning under some weird conditions. Interestingly, these were a year or two ago, and nobody seems to have tried Turing tests lately with newer LLMs.
There’s a guaranteed path to AGI, but it’s blocked behind computational complexity. Finding an efficient algorithm to simulate Quantum Mechanics should be top priority for those seeking AGI. A more promising way around it is using Quantum Computing, but we’ll have to wait for that to become good enough.
Or speed. I think Frank Herbert was on to something in Dune. The energy efficiency of the human brain is hard to beat. Perhaps we should invest in discovering "spice." I think it might be more worthwhile.
At least the solar system I would say. Quantum mechanics will help you do that in the correct way to obtain what Nature already obtained: general intelligence.
The text continues "with current AI tools", which is not clearly defined to me (does it mean current gen + scaffold? Anything which is an LLM reasoning model? Anything built with a large LLM inside?). In any case, the title is misleading for not containing the end of the sentence. Please can we fix the title?
You really don't want AGI
Yikes. I’m guessing you’ve never lost anyone to “alternative” medical treatments.
[1] https://en.wikipedia.org/wiki/ELIZA
They were not called maids nor personified.
AI (supervised).
But more seriously, this is ELIZA with network effects. Credulous multitudes chatting with a system that they believe is sentient.
https://aeon.co/essays/generative-ai-has-access-to-a-small-s...
I think he means "something that can discover new areas of mathematics".
How many software engineers with a good math education can do this?
It's one of a large set of attributes you would expect in something called "AGI."
[1]: https://www.hutter1.net/ai/uaibook.htm
Superintelligence is smarter than Terence Tao, or any other human.
Counterpoint: ChatGPT came up with the new idiom "The confetti has left the cannon"
What about reinforcement learning? RL models don't train on an existing dataset, they try their own solutions and learn from feedback.
RL models can definitely "invent" new things. Here's an example where they design novel molecules that bind with a protein: https://academic.oup.com/bioinformatics/article/39/4/btad157...
That's certainly not coming back.
The only question remaining is what is the end point of AGI capability.
What’s the final IQ we’ll hit, and more importantly why will it end there?
Power limits? Hardware bandwidth limits? Storage limits? The AI creation math scales to infinity, so that’s not an issue.
Source data limits? Most likely. We should have recorded more. We should have recorded more.
Okay, enough eggnog and posting.
It also seems orders of magnitude less resource efficient than higher-level approaches.