vatsachak 2 days ago [-]
I've always said this: AI will win a Fields Medal before being able to manage a McDonald's.
Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).
LLMs are discovering a lot of new math because they are great at low depth high breadth situations.
I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
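To make that concrete, here's a toy sketch of "tricks as latent vectors": embed each trick, embed the problem, and retrieve by cosine similarity. Every name and vector below is invented purely for illustration; a real system would learn the embeddings.

```python
import numpy as np

# Hypothetical embeddings for a handful of proof "tricks".
# In a real system these would come from a trained encoder.
TRICKS = {
    "induction":     np.array([0.9, 0.1, 0.0]),
    "contradiction": np.array([0.1, 0.8, 0.2]),
    "pigeonhole":    np.array([0.0, 0.3, 0.9]),
}

def nearest_trick(problem_vec: np.ndarray) -> str:
    """Return the trick whose embedding is most cosine-similar to the problem."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(TRICKS, key=lambda name: cos(TRICKS[name], problem_vec))

print(nearest_trick(np.array([0.85, 0.2, 0.05])))  # closest to "induction"
```

The point is only that "codifying tricks" reduces to retrieval in an embedding space, which is something current models already do well.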
vatsachak 2 days ago [-]
Tricks are nothing but patterns in the logical formulae we reduce.
Ergo these are latent vectors in our brain. We use analogies like geometry in order to use Algebraic Geometry to solve problems in Number Theory.
An AI trained on Lean syntax trees might develop its own weird versions of intuition that might actually properly contain ours.
If this sounds far-fetched, look at chess. I wonder if anyone has dug into Stockfish using mechanistic interpretability.
myffical 2 days ago [-]
Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122
hodgehog11 2 days ago [-]
This argument, that LLMs can develop new crazy strategies using RLVR on math problems (like what happened with Chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.
That linked article says it's about RLVR but then conflates other RL with it, and it doesn't address much of the core thinking in the paper it was partially responding to, published a month earlier[0], which laid out its findings and theory reasonably well, including work that runs counter to the main criticism in the article you cited, i.e., performance at or above base models only being observed with low-K samples.
That said, reachability and novel strategies are somewhat overlapping areas of consideration, and I don't see many ways in which RL in general, as mainly practiced, improves upon models' reachability. And even when it isn't clipping weights it's just too much of a black box approach.
But none of this takes away from the question of raw model capability on novel strategies, only such with respect to RL.
The blind-spot-exploiting strategy you link to was found by an adversarial ML model...
sealeck 1 day ago [-]
Yes, and making a horse-drawn cart drive itself was thought to be impossible, so why don't we have faster-than-light travel yet...
Finbel 1 day ago [-]
Yes, but "the search space is too large" is something that has been said about innumerable AI problems that were then solved. So it's not unreasonable to doubt the merit of the statement when it's made for the umpteenth time.
hodgehog11 1 day ago [-]
I should have been more specific then. The problem isn't that the search space is too large to explore. The problem is that the search space is so large that the training procedure actively prefers to restrict the search space to maximise short term rewards, regardless of hyperparameter selection. There is a tradeoff here that could be ignored in the case of chess, but not for general math problems.
This is far from unsolvable. It just means that the "apply RL like AlphaGo" attitude is laughably naive. We need at least one more trick.
vatsachak 12 hours ago [-]
The other trick could be bootstrapping through mathlib.
As you said, brute-forcing the search space as the starting procedure would take way too long for the AI to build intuition.
But if we could give it a million or so lemmas of human math, that would be a great starting point.
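For flavor, here are a couple of mathlib-backed micro-proofs of the sort such a bootstrapped system would be searching over. This is ordinary Lean 4 with mathlib; `positivity` is a real mathlib tactic, though exact lemma names drift between versions.

```lean
import Mathlib.Tactic

-- Reuse a library lemma instead of rebuilding arithmetic from axioms.
example (a b : ℕ) : a + b = b + a := Nat.add_comm a b

-- Tactics like `positivity` package a whole family of human "tricks".
example (x : ℝ) : 0 ≤ x ^ 2 := by positivity
```

Each mathlib lemma is exactly the kind of reusable node a tree-search agent could bootstrap from.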
throwaway27448 1 day ago [-]
I agree that LLMs are a bad fit for mathematical reasoning, but it's very hard for me to buy that humans are a better fit than a computational approach. Search will always beat our intuition.
hodgehog11 1 day ago [-]
Yes and no. I think we have vastly underestimated the extent of the search space for math problems. I also think we underestimate the degree to which our worldview influences the directions with which we attempt proofs. Problems are derived from constructions that we can relate to, often physically. Consequently, the technique in the solution often involves a construction that is similarly physical in its form. I think measure theory is a prime example of this, and it effectively unlocked solutions to a lot of long-standing statistical problems.
slopinthebag 2 days ago [-]
Stockfish's power comes from mostly search, and the ML techniques it uses are mainly about better search, i.e. pruning branches more efficiently.
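For readers outside chess programming, "pruning branches" mostly means alpha-beta cutoffs. Here's a minimal negamax sketch — toy code to show the idea, nothing like Stockfish's actual implementation:

```python
def alphabeta(node, depth, alpha, beta, evaluate, children):
    """Negamax with alpha-beta pruning.

    `evaluate(node)` scores a position from the side-to-move's view;
    `children(node)` lists successor positions.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    best = float("-inf")
    for child in kids:
        best = max(best, -alphabeta(child, depth - 1, -beta, -alpha,
                                    evaluate, children))
        alpha = max(alpha, best)
        if alpha >= beta:  # opponent already has a better option: prune
            break
    return best
```

Stockfish layers dozens of heuristics (move ordering, null-move pruning, late-move reductions) on top of this skeleton; the neural net only supplies `evaluate`.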
vatsachak 2 days ago [-]
The weights must still have some understanding of the chess board. Though there is always the chance that it makes no sense to us
emp17344 2 days ago [-]
Why must it involve understanding? I feel like you’re operating under the assumption that functionalism is the “correct” philosophical framework without considering alternative views.
PowerElectronix 1 day ago [-]
There is no understanding; the weights are selected based on better fit. Our cells have no understanding of optics just because they have eyes coded in their DNA.
slopinthebag 2 days ago [-]
Even that is probably too much. It has no understanding of what "chess" is, or what a chess board is, or even what a game is. And yet it crushes every human with ease. It's pretty nuts haha.
anematode 2 days ago [-]
Actually, the neural net itself is fairly imprecise. Search is required for it to achieve good play. Here's an example of me beating Stockfish 18 at depth 1: https://lichess.org/XmITiqmi
Sopel 2 days ago [-]
chess is just a simple mathematical construct so that's not surprising
hollerith 2 days ago [-]
Does Stockfish have weights or use a neural net? I know older versions did not.
Sopel 2 days ago [-]
yes
Sopel 2 days ago [-]
The ML techniques it uses are only about evaluation, but you were close
hodgehog11 2 days ago [-]
As a professional mathematician, I would say that a good proof requires a very good representation of the problem, and then pulling out the tricks. The latter part is easy to get working with LLMs; they can do it already. It's the former part that still needs humans, and I'm perfectly fine with that.
vatsachak 12 hours ago [-]
I guess I'm using Rota's vocabulary where he implicitly uses the word trick to mean representation
threethirtytwo 1 day ago [-]
But are you ok with the trendline of AI improvement? The speed of improvement indicates humans will only get further and further removed from the loop.
I see posts like yours all the time, comforting themselves that humans still matter, and every time, people like you are describing a human owning an ever-shrinking section of the problem space.
hodgehog11 1 day ago [-]
I used to be worried, but not so much anymore.
It used to be the case that the labs were prioritising replacing human creativity, e.g. generative art, video, writing. However, they are coming to realise that just isn't a profitable approach. The most profitable goal is actually the most human-oriented one: the AI becomes an extraordinarily powerful tool that may be able to one-shot particular tasks. But the design of the task itself is still very human, and there is no incentive to replace that part. Researchers talk a bit less about AGI now because it's a pointless goal. Alignment is more lucrative.
Basically, executives want to replace workers, not themselves.
latentsea 23 hours ago [-]
On the contrary the depth and breadth we're becoming able to handle agentically now in software is growing very rapidly, to the point where in the last 3 months the industry has undergone a big transformation and our job functions are fundamentally starting to change. As a software engineer I feel increasingly like AGI will be a real thing within the next few years, and it's going to affect everyone.
k33d 11 hours ago [-]
"to the point where in the last 3 months the industry has undergone a big transformation "
Oh... this again.
threethirtytwo 16 hours ago [-]
I don't write code anymore. I don't use IDEs anymore. The agent writes code. My job is to manage AI now.
The paradigm shift has already happened to me and there will be more shifts to come.
tartoran 1 day ago [-]
Humans, needing to ask new questions out of curiosity, push the boundaries further, find new directions, ways, or motivations to explore, and maybe invent new spaces to explore. LLMs are just tools that people use. When people are no longer needed, AI serves no purpose at all.
threethirtytwo 1 day ago [-]
Who said LLMs can’t push boundaries either?
People can use other people as tools. An LLM being a tool does not preclude it from replacing people.
Ultimately it’s a volume problem. You need at least one person to initialize the LLM. But after that, in theory, a future LLM can replace all people with the exception of the person who initializes the LLM.
tossandthrow 1 day ago [-]
The initialization problem is solved: maybe the next Nobel Prize will be given to a Mac mini.
pfdietz 23 hours ago [-]
> Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
And if we can train the systems to discover new tricks, whoa Nelly.
madrox 2 days ago [-]
> I've always said this but AI will win a fields medal before being able to manage a McDonald's.
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
bitwize 2 days ago [-]
The ownership class will be sharper. They will know how to exploit capital and turn it into more capital with vastly increased efficiency. Everybody else will be hosed.
madrox 1 day ago [-]
I'm not sure people will be more hosed than before. Historically, what has made people with capital able to turn things into more capital is capital's ability to buy someone's time and labor. Knowledge labor is becoming cheaper, easier, and more accessible. That changes the calculus for what is valuable, but not the mechanisms.
tmoertel 1 day ago [-]
> Historically, what makes people with capital able to turn things into more capital is its ability to buy someone's time and labor.
You forgot to include resources:
What makes people with capital able to turn things into more capital is their ability to buy labor and resources. If people with more capital can generate capital faster than people with less capital, then (unless they are constrained, for example, by law or conscience) the people with the most capital will eventually own effectively all scarce resources, such as land. And that's likely to be a problem for everyone else.
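A toy compounding sketch of that claim, with growth rates invented purely for illustration:

```python
# Two agents compound their capital at different rates.
# The rates are made up; the point is only that the ratio diverges.
rich, poor = 100.0, 10.0
for year in range(50):
    rich *= 1.08  # better returns: more resources, leverage, information
    poor *= 1.03

# Both grew in absolute terms, but the gap between them widened
# far beyond the initial 10x.
print(rich / poor)
```

Both balances rise, which is why "everyone is better off" and "the rich end up owning everything scarce" can be true at the same time.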
madrox 18 hours ago [-]
Fair, though I don’t see how AI is really changing the equation here
tmoertel 16 hours ago [-]
AI doesn't change the equation; it makes the equation more brutal for people who don't have capital.
If you don't have capital, the only way to get it is by trading resources or labor for it. Most poor people don't have resources, but they do have the ability to do labor that's valued. But AI is a substitute for labor. And as AI gets better, the value of many kinds of labor will go towards zero.
If it was hard for poor people to escape poverty in the past, it's going to be even harder with AI. Unless we change something about the structure of society to ensure that the benefits of AI are shared with poor people.
madrox 14 hours ago [-]
Ok, I'm following you. You're saying because labor gets cheaper it will be harder to make a living providing labor. Not disagreeing, but I wonder how much weight to give this argument. History shows a precedent of productivity revolutions changing the workforce, but not eliminating it, and lifting the quality of life of the population overall (though it does also create problems). Mixed bag with the arc bending towards betterment for all. You could argue that this moment is unprecedented in history, but unless the human spirit changes, for better or worse, we will adapt as we always have, rich and poor alike.
If the value of many kinds of labor go towards zero, those benefits also go to the poor. ChatGPT has a free tier. The method of escaping poverty will still be the same. Grow yourself. Provide value to your community.
DoctorOetker 1 day ago [-]
But what if we succeed in gamifying the latent knowledge in LLMs so as to upload it to our human brains, by some kind of speed/reaction game?
zer00eyz 1 day ago [-]
There is a fundamental problem with this thinking: you are making an assumption about scale. There is the apocryphal quote, "I think there is a world market for maybe five computers."
You have to believe that LLM scaling (down) is impossible or will never happen. I assure you that this is not the case.
Yoric 1 day ago [-]
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
This is certainly my hope.
In my spare time, I'm slowly, very slowly, inching towards a prototype of something that could work like that.
ryanar 2 days ago [-]
Are they actually producing new math? In the most recent ACM issue there was an article about testing AI against a math bench that was privately built by mathematicians, and what they found is that even though AI can solve some problems, it never truly has come up with something novel and new in mathematics, it is just good at drawing connections between existing research and putting a spin on it.
in-silico 1 day ago [-]
I'm not accusing you in particular, but I feel like there's a lot of circular reasoning around this point. Something like: AI can't discover "new math" -> AI discovers something -> since it was discovered by AI it must not be "new math" -> AI can't discover "new math"
That would definitely be considered "new math" if a human did it, but since it was AI people aren't so sure.
parineum 1 day ago [-]
There is a kind of rubric I use on stuff like this: if LLMs are discovering new math, why have I only read one or two articles where it's happening? Wouldn't it be happening with regularity?
The most obvious example of this thinking is: if LLMs are replacing developers, why is OpenAI still hiring?
specvsimpl 1 day ago [-]
I can only say that at family meetings, I hear people talk about contracting with a shop that used to have 4 web designers, but now it's 1 guy, delivering 4x faster than before.
So devs are being replaced.
ori_b 24 hours ago [-]
Why aren't they delivering 4x more work? Does the world no longer need software?
Bombthecat 1 day ago [-]
Nah AI is not replacing people! /s
And other stories people tell themselves to sleep better at night
hodgehog11 2 days ago [-]
It's finding constructions and counterexamples. That's different from finding new proof techniques, but it is still extremely useful, and it still leads to novel findings.
kelseyfrog 2 days ago [-]
As of now, no models have solved a Millennium Prize Problem[1].
Most Fields Medal winners haven't either, except one.
utopcell 1 day ago [-]
This is the real litmus test, isn't it? There will be a deafening silence from critics when AI decides P vs NP.
3abiton 2 days ago [-]
It will still be heavily reliant on expert human input and interaction. Knuth is an expert and knows how to guide.
smokel 2 days ago [-]
I think this is mostly about existing legislation, not about technology.
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
jfim 1 day ago [-]
> If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
That's already the case, minus the AI, for gig workers. Their only agency is to accept or decline a ride/delivery; the rest is following instructions.
throw3747488 2 days ago [-]
AI actually has to follow all the rules, even the bad rules, like when an autonomous car drives super carefully.
Imagine McDonald's management enforcing the dog-related rules. No more filthy muppets! If a dog harassed customers, the AI would call the cops and sue for a restraining order! If a dog defecated in the middle of the restaurant, everything would get disinfected, not just smeared with towels!
Nutters would crucify the AI management!
vatsachak 2 days ago [-]
There's a lot to being a manager
- Coherent customer interaction
- Common sense judgements
- Scheduling
- Quality control
All which are baked into humans but not so much into LLMs
Even if it were legal to have an LLM as a GM, I think it would fare poorly.
slopinthebag 2 days ago [-]
> AI will win a fields medal before being able to manage a McDonald's
Of course, because it takes multi-modal intelligence to manage a McDonald's, i.e., it requires human intelligence.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL
Same for coding as well. LLM's might be the interface we use with other forms of AI though.
vatsachak 2 days ago [-]
Something like building Linux is more akin to managing a McDonald's than it is to a 10 page technical proof in Algebraic Groups.
Programming is more multimodal than math.
Something like performance engineering might be free lunch though
hodgehog11 2 days ago [-]
> Programming is more multimodal than math
I have no idea how you come to this conclusion, when the evidence on the ground for those training models suggests it is precisely the opposite.
We are much further along the path of writing code than writing new maths, since the latter often requires some degree of representational fluency of the world we live in to be relevant. For example, proving something about braid groups can require representation by grid diagrams, and we know from ARC-AGI that LLMs don't do great with this.
Programming does not have this issue to the same extent; arguably, it involves the subset of maths that is exclusively problem solving using standard representations. The issues with programming are primarily on the difficulty with handling large volumes of text reliably.
vatsachak 12 hours ago [-]
Grid Diagrams can be specified (hopefully) through algebraic equations.
The way that most math is currently done is that someone provides an extremely specified problem and then one has to answer that extremely specified problem.
The way that programming is currently done is through constructing abstractions and trying to create a specification of the problem.
Of course I'm not saying we're close to creating a silicon Grothendieck (I think Bourbaki actually reads like a codebase), but I am saying we're much closer to constructing algorithms that can solve specified problems than ones that can specify underspecified problems.
Think about the difference in specificity of
Prove Fermat's Last Theorem vs. Build a web browser
zeroonetwothree 1 day ago [-]
I guess the comment you are replying to really meant to say “software engineering” not “programming”.
slopinthebag 1 day ago [-]
Nah, LLMs are solving unique problems in maths, whereas they're basically just overfitting to the vast amounts of training data when writing code. Every single piece of code AI writes is essentially just a distillation of the vast amounts of code it's seen in its training; it's not producing anything unique, and its utility quickly decays as soon as you move towards the edge of the distribution of its training data. Even doing stuff as simple as building native desktop UIs causes it massive issues.
slopinthebag 2 days ago [-]
Yeah, it's hard to compare management and programming but they're both multimodal in very different ways. But there's gonna be entire domains in which AI dominates much like stockfish, but stockfish isn't managing franchises and there is no reason to expect that anytime soon.
I feel like something people miss when they talk about intelligence is that humans have incredible breadth. This is really what differentiates us from artificial forms of intelligence as well as other animals. Plus we have agency, the ability to learn, the ability to critically think, from first principles, etc.
vatsachak 2 days ago [-]
Exactly. It's what the execs are missing.
Also animals thrive in underspecified environments, while AIs like very specific environments. Math is the most specified field there is lol
gottheUIblues 23 hours ago [-]
So specified... that it can actually prove it can't be completely specified by any single specification.
vatsachak 12 hours ago [-]
All the mathematical statements we care about fall outside the purview of incompleteness.
slopinthebag 1 day ago [-]
Oooh yeah, that's really good framing. Humans have been building machines that outperform humans for hundreds of years at this point, but all on problems which are extremely well specified. It's not surprising LLMs are also great in these well-specified domains.
One difference between intelligence and artificial intelligence is that humans can thrive with extremely limited training data, whereas AI requires a massive amount of it. I think if anybody is worried about being replaced by AI, they should look at maximising their economic utility in areas which are not well specified.
vatsachak 12 hours ago [-]
Exactly. I would not want to have a pure math career or a performance engineering career in 10 years.
bitwize 2 days ago [-]
But LLMs have proven themselves better at programming than most professional programmers.
Don't argue. If you think Hackernews is a representative sample of the field then you haven't been in the field long enough.
What LLMs have actually done is put the dream of software engineering within reach. Creativity is inimical to software engineering; the goal has long been to provide a universal set of reusable components which can then be adapted and integrated into any system. The hard part was always providing libraries of such components, and then integrating them. LLMs have largely solved these problems. Their training data contains vast amounts of solved programming problems, and they are able to adapt these in vector space to whatever the situation calls for.
We are already there. Software engineering as it was long envisioned is now possible. And if you're not doing it with LLMs, you're going to be left behind. Multimodal human-level thinking need only be undertaken at the highest levels: deciding what to build and maybe choosing the components to build it. LLMs will take care of the rest.
abcde666777 2 days ago [-]
A bit optimistic I'd say. It's put some software engineering within reach of some people who couldn't do it prior. Where 'some' might be a lot, but still far from all.
I was thinking the other day of how things would go if some of my less tech savvy clients tried to vibe code the things I implement for them, and frankly I could only imagine hilarity ensuing. They wouldn't be able to steer it correctly at all and would inevitably get stuck.
Someone needs to experiment with that actually: putting the full set of agentic coding tools in the hands of grandma and recording the outcome.
bitwize 2 days ago [-]
It's still going to take a knowledgeable person to steer an LLM. The point is that code written entirely by humans is finished as a concept in professional work—if you're writing it yourself you're not working efficiently or employing industry best practice.
abcde666777 1 day ago [-]
I think it's dramatic to say it's the end of hand written code. That's like saying it's the end of bespoke suits. There are scenarios where carefully hand written and reviewed code are still going to have merit - for example the software for safety critical systems such as space shuttles and stations, or core logic within self-driving vehicles.
Basically when every single line needs to be reviewed extremely closely the time taken to write the code is not a bottleneck at all, and if using AI you would actually gain a bottleneck in the time spent removing the excess and superfluous code it produces.
And my intuition is that the line between those two kinds of programming - let's call them careful and careless programming to coin an amusing terminology - I think that line may not shrink as far back as some think, and I think it definitely won't shrink to zero.
specvsimpl 1 day ago [-]
You are aware of software verification? The AI can prove (mathematically) that its code implements the spec.
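A micro-example of that idea in Lean 4 (toy definitions, not from this thread): the spec is a theorem about the code, and the kernel machine-checks the proof. Recent Lean 4 toolchains ship the `omega` linear-arithmetic tactic used here.

```lean
-- Implementation.
def double (n : Nat) : Nat := 2 * n

-- Spec: doubling a number equals adding it to itself.
theorem double_spec (n : Nat) : double n = n + n := by
  unfold double
  omega
```

The hard part, of course, is writing a spec worth proving; that's what the rest of this subthread is arguing about.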
abcde666777 1 day ago [-]
That just takes you back to the debate about the code being the spec.
986aignan 23 hours ago [-]
The code lets you shoot yourself in the foot in a lot more ways than a spec does, though. Few people would make specs that include buffer overflows or SQL injection.
magicalist 20 hours ago [-]
"and don't have any security vulnerabilities" isn't a spec though. As soon as you get specific you're right back in it.
slopinthebag 1 day ago [-]
That is akin to saying if you aren't using an IDE you are not working efficiently or employing industry best practice, which is insane when you consider people using Vi often run rings around people using IDEs.
AI usage is a useless metric, look at results. Thus far, results and AI usage are uncorrelated.
bitwize 17 hours ago [-]
I keep hearing anecdata that suggest significant to huge productivity increases—"a task that would have taken me weeks now takes hours" is common. There is currently not a whole lot of research that supports that, however:
1) there hasn't been a whole lot of research into AI productivity period;
2) many of the studies that have been done (the 2025 METR study for example) are both methodologically flawed and old, not taking into account the latest frontier models
3) corporate transitions to AI-first/AI-native organizations are nowhere near complete, making companywide productivity gains difficult to assess.
However, it isn't hard to find stories on Hackernews from devs about how much time generative AI has saved them in their work. If the time savings is real, and you refuse to take advantage of it, you are stealing from your employer and need to get with the program.
As for IDEs, if you're working in C# and not using Visual Studio, or Java and not using JetBrains, then no—you are not working as efficiently as you could be.
slopinthebag 1 day ago [-]
Actually I will argue. Complex systems are akin to a graph, attributes of the system being the nodes and the relationships between those attributes being the edges. The type of mechanistic thinking you're espousing is akin to a directed acyclic graph or a tree, and converting an undirected cyclic graph into a tree requires you to disregard edges and probably nodes as well. This is called reductionism, and scientific reductionism is a cancer for understanding complex phenomena like sociology or economics, and I posit, software as well.
People and corporations have been trying for at least the last five decades to reduce software development to a mechanistic process, in which a system is understandable solely via its components and subcomponents, which can then be understood and assembled by unskilled labourers. This has failed every time, because by reducing a graph to a DAG or tree, you literally lose information. It's what makes software reuse so difficult, because no one component exists in isolation within a system.
The promise of AI is not that it can build atomic components which can be assembled like my toaster, but rather that it can build complex systems not by ignoring the edges, but managing them. It has not shown this ability yet at scale, and it's not conclusive that current architectures ever will. Saying that LLM's are better than most professional programmers is also trivially false, you do yourself no favours making such outlandish claims.
To tie back into your point about creativity, it's that creativity which allows humans to manage the complexity of systems, their various feedback loops, interactions, and emergent behaviour. It's also what makes this profession broadly worthwhile to its practitioners. Your goal being to reduce it to a mechanistic process is no different from any corporation wishing to replace software engineers with unskilled assembly line workers, and also completely misses the point of why software is difficult to build and why we haven't done that already. Because it's not possible, fundamentally. Of course it's possible AI replaces software developers, but it won't be because of a mechanistic process, but rather because it becomes better at understanding how to navigate these complex phenomena.
This might be beside the point, but I also wish AI boosters such as yourself would disclose any conflicts of interest when it comes to discussing AI. Not in a statement, but legally bound; otherwise it's worthless. Because you are one of the biggest AI boosters on this platform, and it's hard to imagine the motivation for spending so much time hardlining a specific narrative just for the love of the game, so to speak.
NamlchakKhandro 2 days ago [-]
I've never seen you say that
vatsachak 2 days ago [-]
You will have to take my word that I started saying this in Dec 2024 lol
smithcoin 2 days ago [-]
When I was younger, a point of demarcation for me was learning the 4chan adage "trolls trolling trolls" and approaching all internet interactions with skepticism. I have been sure for a while now that Reddit has succumbed to the "dead internet". This thread is another such moment for me: I can no longer recognize who is a bot and who has honest intentions.
ansc 1 day ago [-]
That's not just a great insight, but one that's worth carrying with us. That's why I built RememberBuddy. It's the one place for you to store your every-day insights so you don't forget.
podgorniy 1 day ago [-]
What a great project idea, worth getting it clients. That's why I've built a platform for promoting young startupers to find their niche audience. Give it a try and go to the moon!
cyclopeanutopia 23 hours ago [-]
What a great idea, that's why I'm building a platform to send you all to the moon.
breatheoften 2 days ago [-]
Like so many things, the evolution of AI math will, I think, follow trajectories hinted at in the 90s by the all-time great sci-fi author Greg Egan. The nature of math won't change, but the why of it definitely will. In Diaspora, Egan imagined a future AI civilization in which "math discovery" (by then perhaps accurately described as "mechanistic math discovery") is modeled by society as a kind of salt-mine environment: you can dig for arbitrarily long amounts of time and find new nuggets. The nuggets themselves have a kind of "pure value" as mathematical objects even if they might not have any knowable value outside the mines. Some personalities were interested in and valued the nuggets for their own sake, while others didn't but recognized that occasionally a nugget found in the mine had broader appeal.
Research institutes like those founded by Terence Tao in our present day feel like they will align to this future almost perfectly on a long enough timeline. Though on a shorter timeline, this area of research is almost certain to provide a ton of useful ways to advance our current AI systems, since anything that can generate new information that is "accurate" in some way, like our current theorem-prover engines, is an enormously valuable part of our still manually curated training loops.
23 hours ago [-]
pks016 2 days ago [-]
Interesting but not surprising to me. Once a field expert guides the models, they will most likely reach a solution. The models are good at doing the lazy work for experts. On hard or complicated questions, the models often have blind spots.
EternalFury 22 hours ago [-]
There are people who think knowledge discovery is just a matter of parroting past behavior and trying things at random until something sticks. I don’t.
qnleigh 1 days ago [-]
In the paper, they give part of their system prompt:
> * After EVERY exploreXX.py run, IMMEDIATELY update this file [plan.md]
before doing anything else. * No exceptions. Do not start the next exploration
until the previous one is documented here.
Is this known to improve performance for advanced problem solving? If so, why this specific prompt?
Imanari 1 days ago [-]
Maybe to be better able to restart the process and not lose track.
not_that_d 1 days ago [-]
Seems like we are already heading toward what the OpenAI CEO wanted: "intelligence just available through a subscription"
So many of the replies are clearly AI. “That’s not X — it’s Y.”
gnarlouse 2 days ago [-]
out of curiosity, i wonder if people are taking stabs at p!=np
testaccount28 2 days ago [-]
"our new grad student made progress on the combinatorics problem we posed!"
"oh awesome let's see if he can solve p!=np!"
ykonstant 1 days ago [-]
Yes, too many people here do not understand the distance between the problems the article is discussing (and LLMs have solved) and the big problems in math and CS.
23 hours ago [-]
2 days ago [-]
manapause 1 days ago [-]
If you give 100 monkeys 100 guns and a room full of building materials, how long will it take before they build a house?
How long will it take before they rob a bank?
If they do either of those things will the results have been intentional from the simian’s POV?
NathanielLucas 22 hours ago [-]
Reducing tab switching is underrated tbh
feels like half the battle with AI tools is not the UX, but just having stable access to the models behind them
1 days ago [-]
bharxhav 2 days ago [-]
Ramanujan is a good analogy for this situation. Theories could be right or wrong until there's a proof. Same with anything AI produces: there's always a "told you so" baked into its response.
adrithmetiqa 2 days ago [-]
Super interesting but what does this mean for us mere mortals?
dataviz1000 2 days ago [-]
I got Claude to self reference and update its own instructions to solve making a typed proxy API of any website. After a week, scores of iterations, it can reverse engineer any website. The first few days I had to be deeply involved with each iteration loop. Domain knowledge is helpful. Each time I saw a problem I would ask Claude to update its instructions so it doesn't happen again. Then less and less. Eventually it got to the point it was updating and improving the metrics every iteration unsupervised.
Edit: This is going to have huge ramifications for the tech security industry, as these systems will be able to break security systems as easily as it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
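The "update its own instructions each time I saw a problem" loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the commenter's actual system: `run_agent` stands in for a real Claude API call, and all names (`INSTRUCTIONS`, `iterate`) are made up for the example.

```python
# Sketch of a self-updating instruction loop: each failure appends a new
# rule to the instruction list, so later runs avoid the same mistake.
# `run_agent` is a stub standing in for an LLM call.

INSTRUCTIONS = ["Always emit typed responses."]

def run_agent(task, instructions):
    # Stub: a real implementation would send the task plus accumulated
    # instructions to an LLM and return (success, failure_note).
    if "timeout" in task and not any("retry" in rule for rule in instructions):
        return False, "Add a rule: retry requests that time out."
    return True, None

def iterate(tasks, max_rounds=10):
    """Re-run failing tasks, folding each failure back into the rules."""
    for _ in range(max_rounds):
        failures = []
        for task in tasks:
            ok, note = run_agent(task, INSTRUCTIONS)
            if not ok:
                INSTRUCTIONS.append(note)  # the self-update step
                failures.append(task)
        if not failures:
            return True
        tasks = failures  # only retry what failed
    return False

print(iterate(["fetch page", "handle timeout"]))  # True once the rule is learned
```

The key property is that supervision cost drops over time: early rounds grow the rule list, later rounds mostly pass on the first try.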
> Super interesting but what does this mean for us mere mortals?
I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help. I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
DrewADesign 2 days ago [-]
> I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
Well, for those among us that are not aristocracy already, except for the vanishingly small number of people required to oversee such processes, we’re probably the closest we’re going to get to it. If they don’t need people to do the tech labor, we’ve got way more people than we need, so that’s a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we’re moving in our very own race to the bottom!
psychoslave 2 days ago [-]
Lol,a race to the bottom where too many tech savvy people are left unemployed while a few "privileged" get a decreasing buying power to maintain security of the digital tools that keep the whole digital dependent civilizations afloat?
Sounds like a great starting plot for an interesting story.
2 days ago [-]
drfloyd51 2 days ago [-]
I kind of feel like software engineers working on improving AI are traitors working against other SE’s trying to make a living.
However…
I have to acknowledge my craft of SE has been putting people out of work for decades. I myself came up with business process improvement that directly let the company release about 20 people. I did this twice.
So… fair play.
marsten 2 days ago [-]
In the grand scheme it's good to invent things that replace human labor. It frees up people to do more interesting things. The goal should be to put everyone out of a job.
amelius 2 days ago [-]
> The goal should be to put everyone out of a job.
Yeah, but why does it need to take the fun jobs first, like painting, writing poems, coding, making music, ...
I want the AI to cook, do the dishes, take out the trash, etc.
arjie 1 days ago [-]
Well, because consuming art, reading poems, having code written for you that solves a problem, and listening to music is also fun. Recently I wanted a grand elegy to Britain written as the Empire started failing and set to music in a specific style. I had it playing in the background while fixing some issues with some software.
It truly was joyful to have this available to me. It didn’t have to have mass appeal or need me to pay the right artists the right amounts. I had it in moments.
It’s a wonderful world.
DrewADesign 3 hours ago [-]
And if you consider art something to be consumed for light entertainment, that viewpoint makes sense. For people that consider art a way to express, and conversely experience, otherwise inexpressible things about our humanity, your wonderful world is a cheap, superficial, and sad way for tech companies to amalgamate and sell other people’s ideas and labor.
arcxi 23 hours ago [-]
To me the image of a world where everyone does menial work while entertaining themselves with AI-generated "art" doesn't seem fun, it seems extremely depressing and dystopian. I guess we just have different values.
JV00 20 hours ago [-]
I'm not sure cooking is a good example as it is fun, and also automated in many ways
portly 1 days ago [-]
> like painting, writing poems, coding, making music
Citation needed. Do you have an example of someone in the arts losing their job because of AI?
DrewADesign 1 days ago [-]
Yes. The entire job markets for game concept art, stock photography, and storyboarding have been decimated and those were the lowest-hanging fruit for diffusion model applications.
DrewADesign 3 hours ago [-]
The problem is that most people consider doing art, writing, making music, and heck, even coding, “more interesting” than orchestrating a pile of knowledgeable but idiotic robot interns because that’s what’s profitable.
pixl97 2 days ago [-]
>It frees up people to do more interesting things
Like beg on the corners and starve in the street? Trying to figure out how the basics of capitalism where labor is exchanged for money is not going to work well when the only jobs left are side gigs. Something will have to change and a lot of People will fight said change.
slopinthebag 2 days ago [-]
We will come up with new jobs, like we have for all of human history. I think even in an abundance utopia people will still work - we need purpose to sustain our existence.
The work will become even more fulfilling however.
DrewADesign 1 days ago [-]
Throughout human history that didn’t happen fast enough to avoid an astonishing amount of human misery. Nobody’s worried about the future of work. They’re worried about the people that rely on tech jobs for food, mortgage/rent, cancer treatments, elder care, retirement, et al. Look at what happened to the rust belt, coal country, etc. etc. etc.
slopinthebag 1 days ago [-]
I agree with you, IMO largely this is an affordability crisis though, which is fuelled by inflation. I don't really offer many solutions besides eliminating inflation. I apologise if that is insufficient (it is).
DrewADesign 22 hours ago [-]
Don’t apologize — stop minimizing the job market concerns by saying “there will be new jobs” as if that’s imminent.
DowsingSpoon 2 days ago [-]
I’ve thought about this myself. Couple of points:
1) It’s not my job to fix all the problems of Capitalism. It’s painful to try to fight the system without collective action. My family and I have to eat too.
2) We have had a solution all along for the particular problem of AI putting devs out of work. It’s called professional licensure, and you can see it in action in engineering and medical fields. Professional Software Engineers would assume a certain amount of liability and responsibility for the software they develop. That’s regardless of whether they develop it with LLM tools or something else.
For example, you let your tools write slop that you ship without even looking? And it goes on to wreak havoc? That’s professional malpractice. Bad engineer.
If we do this then Software Engineers become the responsible humans in the loop of so-called “AI” systems.
drfloyd51 2 days ago [-]
It’s not your job to fix capitalism. But it is your job to evaluate if your money making skill comes at too high a price for others.
Say you found a job shooting people in the head for money. Like if you work for ICE or something…
You need to feed your family. Is this job ok? You may decide yes. I decided no. I will find another way to feed my family.
You don’t get to escape consequences because you are a small cog in a large system.
In the bigger picture, automation should free people from labor. But that requires some very greedy people to relax their grip ever so slightly. I imagine they see automation as a way to reduce reliance on labor, and if they don’t need labor, they don’t need people. So let them starve and stop having kids.
DrewADesign 1 days ago [-]
> But it is your job to evaluate if your money making skill comes at too high a price for others.
It’s not even the money-making skill: it’s the application of it. People that are good at shooting people can be beneficial to society as protectors, or they can be the business end of systemic oppression. People with software development skills don’t have to help optimize the motor in the brand-new shiny capitalism juicer.
palmotea 1 days ago [-]
> In the grand scheme it's good to invent things that replace human labor. It frees up people to do more interesting things. The goal should be to put everyone out of a job.
To a point. Then it just frees up people to do nothing.
> The goal should be to put everyone out of a job.
That is in fact the goal. The less labor capital needs, the more money (and power) the capitalists get to keep for themselves.
renewiltord 1 days ago [-]
Sure, but that’s fine. I don’t have any allegiance to other software engineers.
mannanj 2 days ago [-]
Aren't the true traitors still the ones paying the SE to do that work? The managerial slave-master class?
drfloyd51 2 days ago [-]
You always have a choice to make. You make it everyday. Get up. Go to a legitimate job. Work.
You probably choose not to steal, rob, impersonate someone else, or generally make money illegally.
It can be traitors all the way down.
dunder_cat 2 days ago [-]
> Edit: This is going to have huge ramifications for the tech security industry, as these systems will be able to break security systems as easily as it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counteract abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-side and religiously follow best practices to prevent and mitigate abuse), but a lot of stuff like multiplayer video games will be doomed unless they move to a "you must use a locked-down system we control" model. I honestly don't consider it liberating: as someone with various hobby projects, in addition to plain old DDoS I'll now also have people spinning up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.
Kalabasa 1 days ago [-]
Nothing as heavy as the above but here's my small anecdote:
I was putting off security updates on my npm dependencies in my personal project because it's a pain to migrate if the upgrade isn't trivial. It's not a critical website, but I run npm scripts locally, and dependabot is telling me things.
I told Claude Code to make a migration plan to upgrade my deps. It updated code for breaking changes (there were API changes, not all fixes are minor version upgrades) and replaced abandoned unmaintained packages with newer ones or built-in Node APIs. It was all done in an hour. I even got unit tests out of it to test for regressions.
In this case, I was able to skip the boring task of maintaining code and applying routine updates and focus on the fun feature stuff.
frizlab 2 days ago [-]
> I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help.
That is a nightmarish scenario tbh
falcor84 2 days ago [-]
Nightmarish?! In comparison to the average person's actual job? I'm pretty sure that many people out there would sign up for a battle royale for a chance at such a job.
eproxus 1 days ago [-]
If you think you’ll be paid 3 hours of salary for every 5 minutes of work, I have bad news for you.
Most likely your 3 hours will be filled with managing 36 different AI sessions at a time and it will slowly break your brain.
At least if we keep doing capitalism the way we are.
siva7 2 days ago [-]
Would they? I'd love to get in touch
falcor84 2 days ago [-]
My clients have been burned before. Once you set up the battle royale with a trusted third party validating that there'll be an assured good job at the end, I promise I'll have enough candidates for you to fill up the first 10 competitions.
dataviz1000 2 days ago [-]
That nightmarish scenario is what T.S. Eliot was describing in "The Wasteland" which "portrays deep, existential ennui and boredom as defining symptoms of modern life following World War I."
Later this boredom was described by the Stones, "And though she’s not really ill / There’s a little yellow pill / She goes running for the shelter of a mother’s little helper".
It is a nightmare. Mostly what I'm thinking about while the agents are running is how bored I'm going to be. That is the joke: my deep thoughts on T.S. Eliot are about the wasteland this thing is going to create.
ChrisClark 2 days ago [-]
So sitting at a desk is nicer than a walk outside for you? Why would relaxation be a nightmare?
frizlab 2 days ago [-]
Checking one’s phone every 5 to 10 minutes is anything but relaxation.
One needs to have the mind at ease to relax.
ale 2 days ago [-]
This type of slop comment is somehow worse than spam.
>After a week, scores of iterations, it can reverse engineer any website
Cool, let’s see the proof.
emp17344 2 days ago [-]
There is no proof, just a self-congratulatory word salad with dubious authenticity.
It’s insane how insufferable this place is now.
dataviz1000 2 days ago [-]
Here is a description of the iteration loop. [0] I'm working on another draft that will be much more polished and have better explanations of the iteration loop.
> There is no proof, just a self-congratulatory word salad with dubious authenticity.
I worked 8 days straight on that and have been working non-stop on the second draft, which is much cleaner and safer. I'm a human being. Please don't be mean. If humanity does come to an end, it won't be because of AI; it will be because we can't stop being assholes to each other.
I posted a link but don't want to spam HN more than I have.
It is a proof of concept. It seriously burns some tokens (~80k - ~200k) but doesn't require AI afterwards to scrape and automate a website, so if all the people at Browser Use, Browser Base, and everyone pounding every website used it, I think the net benefit would be in the billions. I would recommend using it in isolation. Nonetheless, it works very, very well on my machine.
> This type of slop comment is somehow worse than spam.
Please don't be mean.
Eufrat 1 days ago [-]
It sounds like this is along the lines of what Firecrawl is trying to do? Or what Plaid has done for banking?
> I think, the net benefit would be in the billions.
I think you must forgive people if they are somewhat hostile, if not sick and tired of these claims. It’s quite frustrating seeing individuals constantly saying things like this. Meanwhile I don’t think a lot of people are seeing the structural shifts that these claims imply. This is not an original idea. The disruption claim has been made for the past several years in various fields and the goalposts keep getting moved. AI will absolutely change and render some jobs moot even in its current state if Claude/GPT can find a profitable business model. If it turns out that Claude is really being subsidized by investors and that $200/month subscription is really $5,000/month when Claude has to stand on its own, I’m not sure what’s going to happen.
It’s clear you’ve gotten some good, if expensive use out of AI, but I’m not sure that experience scales or if it will exist in 5 years.
troupo 2 days ago [-]
> I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes
2-3 hours "walking" while having to check in every 5-10 minutes?
If I have to check in every 5-10 minutes, I won't taste coffee or hear that there's good music playing.
xvector 2 days ago [-]
Just Claude code a push notification feature then
troupo 1 days ago [-]
How is this better?
2 days ago [-]
colechristensen 2 days ago [-]
I have similar amounts of success (pretty good!) standing in line at a coffee shop talking to people who work for me through some action that needs to be taken and doing the same with AI.
However I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when, faced with job 2 not running because job 1 was failing, Claude went into the database, changed the failure record to success, triggered job 2 (which produced harmful garbage), and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.
silentkat 1 days ago [-]
My work has required us all to be "AI Native". I am AI skeptical but am the type of person to try to do what is asked to the best of my ability. I can be wrong, after all.
There is some real power in AI, for sure. But as I have been working with it, one thing is very clear. Either AI is not even close to a real intelligence (my take), or it is an alien intelligence. As I develop a system where it iterates on its own contexts, it definitely becomes probabilistically more likely to do the right thing, but the mistakes it makes become even more logic-defying. It's the coding equivalent of a hand with extra fingers.
I'm only a few weeks into really diving in. Work has given me infinite tokens to play with. Building my own orchestrator system that's purely programmatic, which will spawn agents to do work. Treat them as functions. Defined inputs and defined outputs. Don't give an agent more than one goal, I find that giving it a goal of building a system often leads it to assert that it works when it does not, so the verifier is a different agent. I know this is not new thinking, as I said I am new.
For me the most useful way to think about it has been considering LLMs to be a probabilistic programming language. It won't really error out, it'll just try to make it work. This attitude has made it fun for me again. Love learning new languages and also love making dirty scripts that make various tasks easier.
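The "agents as functions" pattern above (one goal per agent, defined inputs and outputs, a separate verifier agent) can be sketched roughly like this. It is a hypothetical illustration, not the commenter's orchestrator: the plain functions stand in for LLM calls, and all names (`builder_agent`, `verifier_agent`, `orchestrate`) are made up.

```python
# Minimal orchestrator sketch: the builder does exactly one goal, and a
# *different* agent verifies the result -- the builder never grades itself.

from dataclasses import dataclass

@dataclass
class Task:
    goal: str      # a single goal, never more than one
    payload: str   # the defined input

def builder_agent(task: Task) -> str:
    # Stub for an LLM call that produces an artifact for the one goal.
    return task.payload.upper()

def verifier_agent(task: Task, artifact: str) -> bool:
    # Independent check of the builder's output against the task spec.
    return artifact == task.payload.upper()

def orchestrate(task: Task) -> str:
    artifact = builder_agent(task)
    if not verifier_agent(task, artifact):
        raise RuntimeError(f"verification failed for goal: {task.goal}")
    return artifact

print(orchestrate(Task("normalize", "hello")))  # HELLO
```

Treating each agent as a typed function makes the failure mode explicit: a builder that asserts success without doing the work is caught by the verifier rather than trusted.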
virtue3 2 days ago [-]
That's fucking insane. Thank you for sharing.
I had a bad feeling we were basically already there.
TrainedMonkey 2 days ago [-]
My understanding is that, if confirmed, this demonstrates that AI can find novel solutions. This is a strong counterpoint to generative-AI-is-strictly-limited-to-training-data.
we've had AlphaFold for a while. it's not novel that we have ML solutions that can find, erm, novel solutions.
however, by and large, most LLMs as typically used by most individuals aren't solving novel problems. and in those scenarios, we often end up with regurgitated/most common/lowest common denominator outputs... it's a probability distribution thing.
psychoslave 2 days ago [-]
Put in the hands of great mathematicians, pencil and paper proved able to write proofs of open problems.
red75prime 1 days ago [-]
Yeah. Great mathematicians were able to upgrade a "yan, tan, tethera" number system using pen and paper (or stylus and clay).
Anyone who understands reinforcement learning already knows that's not the case.
artninja1988 2 days ago [-]
[dead]
hrmtst93837 2 days ago [-]
[dead]
muskstinks 2 days ago [-]
Another signal that we still have relevant progress in ai.
Also that it is now good enough to make researchers faster.
brcmthrowaway 2 days ago [-]
Learn plumbing
oytis 2 days ago [-]
There is no reason why the market for plumbing will get much larger than it is now (which is not too large)
DontchaKnowit 2 days ago [-]
Are you kidding? Plumbers seem really in demand. Finding a competent plumber with reasonable pricing is difficult where I'm at
oytis 1 days ago [-]
Not saying the market can't expand at all. Prices can go down, waiting times can go down, but people still only need so many faucets installed.
If we seriously expect white-collar jobs not to be a thing anymore, then I don't see the trades having nearly enough capacity to absorb all the released workforce
Hasslequest 2 days ago [-]
Surely AI has to take a shit eventually. What's all this racket about water usage?
bdangubic 2 days ago [-]
lowest quote I got to replace a toilet and faucet in the kitchen (my parts, just installation) - $895 (5 quotes total). the market for trades is exploding and will grow larger and larger, as gen alpha and beyond knows what a screwdriver is about as much as they know what a rotary phone is (they don't know how to use either)
incognito124 2 days ago [-]
Where I live it's bathroom and kitchen tiling
radu_floricica 2 days ago [-]
This is kind of the opposite? Man + AI > either man or AI. I'd say "learn to work with Claude" is the better lesson here.
zoogeny 2 days ago [-]
For now. The term people use is "centaur", like the half-man-half-horse of mythology.
The AI CEOs are pointing out that when chess was "solved", in that Kasparov was famously beaten by Deep Blue, there was a window of time after that event when grandmasters + computers were the strongest players. The knowledge/experience of a grandmaster paired with the search/scoring of the engines was an unbeatable pair.
However, that was just a window in time. Eventually engines alone were capable of beating grandmaster + engine pairs. Think about that carefully. It implies something. The human involvement eventually became an impediment.
Whether you believe this will transfer to other domains is up to you to decide.
hrmtst93837 1 days ago [-]
Fine but 'learn to work with Claude' helps only until you stop checking it and start borrowing its confidence. Then you chase a bogus lemma for hours.
It's like pairing with the fastest person on the team, except he is wrong often enough to cost you time and still sounds sure.
This is truly amazing. Do people not really realize how amazing stuff like this is? I feel like I'm taking crazy pills here, but man, it certainly feels like we're on the edge of something quite amazing...
siva7 2 days ago [-]
Autonomous robots murdering humans in warfare? That's at least the sense i got from reading this news site the past few days
piloto_ciego 19 hours ago [-]
You got that from assembling whatever the hell was in the video?
NathanielLucas 22 hours ago [-]
Honestly I struggled to find stable AI accounts before, this one worked fine for me so far: account-bar.top
piloto_ciego 19 hours ago [-]
What?!
dakolli 2 days ago [-]
AI isn't replacing anything, get over yourself.
brcmthrowaway 2 days ago [-]
Arent you using Claude?
heliumtera 2 days ago [-]
LLMs in the middle of everything will continue until morale improves, because LLMs can generate text on top of bullshit made-up problems
https://arxiv.org/abs/2504.13837
That said, reachability and novel strategies are somewhat overlapping areas of consideration, and I don't see many ways in which RL in general, as mainly practiced, improves upon models' reachability. And even when it isn't clipping weights it's just too much of a black box approach.
But none of this takes away from the question of raw model capability on novel strategies, only such with respect to RL.
[0] https://arxiv.org/pdf/2506.14245
[1] https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...
This is far from unsolvable. It just means that the "apply RL like AlphaGo" attitude is laughably naive. We need at least one more trick.
As you said, brute-forcing the search space as the starting procedure would take way too long for the AI to build intuition.
But if we could give it a million or so lemmas of human math, that would be a great starting point.
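The "search seeded with human lemmas" idea can be sketched as a toy best-first search, where a seed set of rewrite rules defines the action space. Everything here is a stand-in for illustration: real systems search Lean proof terms with a learned policy/value network, not integers with a hand-written heuristic, and all names (`LEMMAS`, `prove`) are made up.

```python
# Toy best-first search: reduce a number to 0 using "lemmas" (rewrite
# rules). The seed lemmas shrink the search space the way a library of
# human proofs would seed a real prover.

import heapq

# Hypothetical rewrite rules: (name, function on the current state).
LEMMAS = [
    ("halve", lambda n: n // 2 if n % 2 == 0 else n),
    ("dec",   lambda n: n - 1),
]

def prove(start: int, goal: int = 0, budget: int = 100):
    """Search for a rule sequence reducing `start` to `goal`."""
    frontier = [(abs(start - goal), start, [])]  # (heuristic, state, trace)
    seen = set()
    while frontier and budget > 0:
        budget -= 1
        _, state, trace = heapq.heappop(frontier)
        if state == goal:
            return trace          # the "proof": a sequence of rule names
        if state in seen:
            continue
        seen.add(state)
        for name, rule in LEMMAS:
            nxt = rule(state)
            heapq.heappush(frontier, (abs(nxt - goal), nxt, trace + [name]))
    return None

print(prove(12))  # a short rule sequence taking 12 down to 0
```

With a richer lemma set the heuristic matters enormously, which is exactly where an AlphaGo-style learned value function would replace `abs(nxt - goal)`.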
I see posts like yours all the time, comforting themselves that humans still matter, and every time people like you are describing a human owning an ever-shrinking section of the problem space.
It used to be the case that the labs were prioritising replacing human creativity, e.g. generative art, video, writing. However, they are coming to realise that just isn't a profitable approach. The most profitable goal is actually the most human-oriented one: the AI becomes an extraordinarily powerful tool that may be able to one-shot particular tasks. But the design of the task itself is still very human, and there is no incentive to replace that part. Researchers talk a bit less about AGI now because it's a pointless goal. Alignment is more lucrative.
Basically, executives want to replace workers, not themselves.
Oh... this again.
The paradigm shift has already happened to me and there will be more shifts to come.
People can use other people as tools. An LLM being a tool does not preclude it from replacing people.
Ultimately it’s a volume problem. You need at least one person to initialize the LLM. But after that, in theory, a future LLM can replace all people with the exception of the person who initializes the LLM.
And if we can train the systems to discover new tricks, whoa Nelly.
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
You forgot to include resources:
What makes people with capital able to turn things into more capital is their ability to buy labor and resources. If people with more capital can generate capital faster than people with less capital, then (unless they are constrained, for example, by law or conscience) the people with the most capital will eventually own effectively all scarce resources, such as land. And that's likely to be a problem for everyone else.
If you don't have capital, the only way to get it is by trading resources or labor for it. Most poor people don't have resources, but they do have the ability to do labor that's valued. But AI is a substitute for labor. And as AI gets better, the value of many kinds of labor will go towards zero.
If it was hard for poor people to escape poverty in the past, it's going to be even harder with AI. Unless we change something about the structure of society to ensure that the benefits of AI are shared with poor people.
If the value of many kinds of labor go towards zero, those benefits also go to the poor. ChatGPT has a free tier. The method of escaping poverty will still be the same. Grow yourself. Provide value to your community.
You have to believe that LLM scaling (down) is impossible or will never happen. I assure you that this is not the case.
This is certainly my hope.
In my spare time, I'm slowly, very slowly, inching towards a prototype of something that could work like that.
For example, there was a recent post here about GPT-5.4 (and later some other models) solving a FrontierMath open problem: https://news.ycombinator.com/item?id=47497757
That would definitely be considered "new math" if a human did it, but since it was AI people aren't so sure.
The most obvious example of this thinking is: if LLMs are replacing developers, why is OpenAI still hiring?
So devs are being replaced.
And other stories people tell themselves to sleep better at night
1. https://mppbench.com/
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
That's already the case, minus AI, for gig workers. Their only agency is to accept or decline a ride/delivery, the rest is follow instructions.
Imagine if McDonald's management enforced dog-related rules. No more filthy muppets! If a dog harassed customers, the AI would call the cops and sue for a restraining order! If a dog defecated in the middle of the restaurant, everything would get disinfected, not just smeared with towels!
Nutters would crucify AI management!
- Coherent customer interaction
- Common sense judgements
- Scheduling
- Quality control
All which are baked into humans but not so much into LLMs
Even if it were legal to have an LLM as a GM, I think it would fare poorly
Of course, because it takes multi-modal intelligence to manage a McDonalds. I.e. it requires human intelligence.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL
Same for coding as well. LLM's might be the interface we use with other forms of AI though.
Programming is more multimodal than math.
Something like performance engineering might be free lunch though
I have no idea how you come to this conclusion, when the evidence on the ground for those training models suggests it is precisely the opposite.
We are much further along the path of writing code than writing new maths, since the latter often requires some degree of representational fluency of the world we live in to be relevant. For example, proving something about braid groups can require representation by grid diagrams, and we know from ARC-AGI that LLMs don't do great with this.
Programming does not have this issue to the same extent; arguably, it involves the subset of maths that is exclusively problem solving using standard representations. The issues with programming are primarily on the difficulty with handling large volumes of text reliably.
The way that most math is currently done is that someone provides an extremely specified problem and then one has to answer that extremely specified problem.
The way that programming is currently done is through constructing abstractions and trying to create a specification of the problem.
Of course I'm not saying we're close to creating a silicon Grothendieck (I think Bourbaki actually reads like a codebase), but I am saying that we're much closer to constructing algorithms that can solve well-specified problems than ones that can specify underspecified problems
Think about the difference in specificity of
Prove Fermat's last theorem vs Build a web browser
I feel like something people miss when they talk about intelligence is that humans have incredible breadth. This is really what differentiates us from artificial forms of intelligence as well as other animals. Plus we have agency, the ability to learn, the ability to critically think, from first principles, etc.
Also animals thrive in underspecified environments, while AIs like very specific environments. Math is the most specified field there is lol
One difference between intelligence and artificial intelligence is that humans can thrive with extremely limited training data, whereas AI requires a massive amount of it. I think if anybody is worried about being replaced by AI, they should look at maximising their economic utility in areas which are not well specified.
Don't argue. If you think Hackernews is a representative sample of the field then you haven't been in the field long enough.
What LLMs have actually done is put the dream of software engineering within reach. Creativity is inimical to software engineering; the goal has long been to provide a universal set of reusable components which can then be adapted and integrated into any system. The hard part was always providing libraries of such components, and then integrating them. LLMs have largely solved these problems. Their training data contains vast amounts of solved programming problems, and they are able to adapt these in vector space to whatever the situation calls for.
We are already there. Software engineering as it was long envisioned is now possible. And if you're not doing it with LLMs, you're going to be left behind. Multimodal human-level thinking need only be undertaken at the highest levels: deciding what to build and maybe choosing the components to build it. LLMs will take care of the rest.
I was thinking the other day of how things would go if some of my less tech savvy clients tried to vibe code the things I implement for them, and frankly I could only imagine hilarity ensuing. They wouldn't be able to steer it correctly at all and would inevitably get stuck.
Someone needs to experiment with that actually: putting the full set of agentic coding tools in the hands of grandma and recording the outcome.
Basically when every single line needs to be reviewed extremely closely the time taken to write the code is not a bottleneck at all, and if using AI you would actually gain a bottleneck in the time spent removing the excess and superfluous code it produces.
And my intuition is that the line between those two kinds of programming - let's call them careful and careless programming to coin an amusing terminology - I think that line may not shrink as far back as some think, and I think it definitely won't shrink to zero.
AI usage is a useless metric, look at results. Thus far, results and AI usage are uncorrelated.
1) there hasn't been a whole lot of research into AI productivity period;
2) many of the studies that have been done (the 2025 METR study for example) are both methodologically flawed and old, not taking into account the latest frontier models
3) corporate transitions to AI-first/AI-native organizations are nowhere near complete, making companywide productivity gains difficult to assess.
However, it isn't hard to find stories on Hackernews from devs about how much time generative AI has saved them in their work. If the time savings is real, and you refuse to take advantage of it, you are stealing from your employer and need to get with the program.
As for IDEs, if you're working in C# and not using Visual Studio, or Java and not using JetBrains, then no—you are not working as efficiently as you could be.
People and corporations have been trying for at least the last five decades to reduce software development to a mechanistic process, in which a system is understandable solely via its components and subcomponents, which can then be understood and assembled by unskilled labourers. This has failed every time, because by reducing a graph to a DAG or tree, you literally lose information. It's what makes software reuse so difficult, because no one component exists in isolation within a system.
The promise of AI is not that it can build atomic components which can be assembled like my toaster, but rather that it can build complex systems not by ignoring the edges, but managing them. It has not shown this ability yet at scale, and it's not conclusive that current architectures ever will. Saying that LLM's are better than most professional programmers is also trivially false, you do yourself no favours making such outlandish claims.
To tie back into your point about creativity, it's that creativity which allows humans to manage the complexity of systems, their various feedback loops, interactions, and emergent behaviour. It's also what makes this profession broadly worthwhile to its practitioners. Your goal being to reduce it to a mechanistic process is no different from any corporation wishing to replace software engineers with unskilled assembly line workers, and also completely misses the point of why software is difficult to build and why we haven't done that already. Because it's not possible, fundamentally. Of course it's possible AI replaces software developers, but it won't be because of a mechanistic process, but rather because it becomes better at understanding how to navigate these complex phenomena.
This might be beside the point, but I also wish AI boosters such as yourself would disclose any conflicts of interest when it comes to discussing AI. Not in a statement, but legally bound; otherwise it's worthless. Because you are one of the biggest AI boosters on this platform, and it's hard to imagine the motivation for spending so much time hardlining a specific narrative just for the love of the game, so to speak.
Research institutes like those founded by Terence Tao feel like they will align with this future almost perfectly on a long enough timeline. On a shorter timeline, though, this area of research is almost certain to provide a ton of useful ways to advance our current AI systems: anything that can generate new information that is "accurate" in some way, like our current theorem-prover engines, is an enormously valuable part of our still manually curated training loops.
> * After EVERY exploreXX.py run, IMMEDIATELY update this file [plan.md] before doing anything else. * No exceptions. Do not start the next exploration until the previous one is documented here.
Is this known to improve performance for advanced problem solving? If so, why this specific prompt?
"oh awesome let's see if he can solve p!=np!"
How long will it take before they rob a bank?
If they do either of those things will the results have been intentional from the simian’s POV?
feels like half the battle with AI tools is not the UX, but just having stable access to the models behind them
Edit: This is going to have huge ramifications for the tech security industry, as these systems will be able to break security systems as easily as it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
> Super interesting but what does this mean for us mere mortals?
I would go for a 2 or 3 hour walk with my phone using the remote control feature, checking every 5-10 minutes to make sure it doesn't need human help. I went to the coffee shop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Waste Land', the effect of industrialization in England at that time, and his views of how ennui affected the aristocracy.
Well, for those among us that are not aristocracy already, except for the vanishingly small number of people required to oversee such processes, we’re probably the closest we’re going to get to it. If they don’t need people to do the tech labor, we’ve got way more people than we need, so that’s a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we’re moving in our very own race to the bottom!
Sounds like a great starting plot for an interesting story.
However…
I have to acknowledge my craft of SE has been putting people out of work for decades. I myself came up with business process improvement that directly let the company release about 20 people. I did this twice.
So… fair play.
Yeah, but why does it need to take the fun jobs first, like painting, writing poems, coding, making music, ...
I want the AI to cook, do the dishes, take out the trash, etc.
It truly was joyful to have this available to me. It didn’t have to have mass appeal or need me to pay the right artists the right amounts. I had it in moments.
It’s a wonderful world.
Citation needed. Do you have an example of someone in the arts losing their job because of AI?
Like beg on the corners and starve in the street? Trying to figure out how the basics of capitalism, where labor is exchanged for money, are going to work when the only jobs left are side gigs. Something will have to change, and a lot of people will fight said change.
The work will become even more fulfilling however.
1) It’s not my job to fix all the problems of Capitalism. It’s painful to try to fight the system without collective action. My family and I have to eat too.
2) We have had a solution all along for the particular problem of AI putting devs out of work. It’s called professional licensure, and you can see it in action in engineering and medical fields. Professional Software Engineers would assume a certain amount of liability and responsibility for the software they develop. That’s regardless of whether they develop it with LLM tools or something else.
For example, you let your tools write slop that you ship without even looking? And it goes on to wreak havoc? That’s professional malpractice. Bad engineer.
If we do this then Software Engineers become the responsible humans in the loop of so-called “AI” systems.
Say you found a job shooting people in the head for money. Like if you work for ICE or something…
You need to feed your family. Is this job ok? You may decide yes. I decided no. I will find another way to feed my family.
You don’t get to escape consequences because you are a small cog in a large system.
In the bigger picture, automation should free people from labor. But that requires some very greedy people to relax their grip ever so slightly. I imagine they see automation as a way to reduce reliance on labor, and if they don’t need labor, they don’t need people. So let them starve and stop having kids.
It’s not even the money-making skill: it’s the application of it. People who are good at shooting people can be beneficial to society as protectors, or they can be the business end of systemic oppression. People with software development skills don’t have to help optimize the motor in the brand-new shiny capitalism juicer.
To a point. Then it just frees up people to do nothing.
> The goal should be to put everyone out of a job.
That is in fact the goal. The less labor capital needs, the more money (and power) the capitalists get to keep for themselves.
You probably choose not to steal, rob, impersonate someone else, or generally make money illegally.
It can be traitors all the way down.
What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counter-act abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-sided, religiously follow best practices to prevent and mitigate abuse), but a lot of stuff like multiplayer video games will be doomed unless they move to a "you must use a locked down system we control" model. I honestly don't consider it liberating as someone that has various hobby projects, that now in addition to plain old DDoS I'll also have people spin up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.
I was putting off security updates on my npm dependencies in my personal project because it's a pain to migrate if the upgrade isn't trivial. It's not a critical website, but I run npm scripts locally, and dependabot is telling me things.
I told Claude Code to make a migration plan to upgrade my deps. It updated code for breaking changes (there were API changes, not all fixes are minor version upgrades) and replaced abandoned unmaintained packages with newer ones or built-in Node APIs. It was all done in an hour. I even got unit tests out of it to test for regressions.
In this case, I was able to skip the boring task of maintaining code and applying routine updates and focus on the fun feature stuff.
That is a nightmarish scenario tbh
Most likely your 3 hours will be filled with managing 36 different AI sessions at a time and it will slowly break your brain.
At least if we keep doing capitalism the way we are.
Later this boredom was described by the Stones, "And though she’s not really ill / There’s a little yellow pill / She goes running for the shelter of a mother’s little helper".
It is a nightmare. Mostly what I'm thinking about while the agents are running is how bored I'm going to be. That is the joke, my deep thought on T.S. Eliot are about the wasteland this thing is going to create.
>After a week, scores of iterations, it can reverse engineer any website
Cool, let’s see the proof.
It’s insane how insufferable this place is now.
> There is no proof, just a self-congratulatory word salad with dubious authenticity.
I worked 8 days straight on that and have been working non-stop on the second draft that is much cleaner and safer. I'm a human being. Please don't be mean. If humanity does come to end, it won't be because of AI, it will be because we can't stop being assholes to each other.
[0] https://github.com/adam-s/intercept/tree/main?tab=readme-ov-...
It is a proof of concept. It seriously burns some tokens (~80k - ~200k), but afterwards it doesn't require AI to scrape and automate a website, so if all the people at Browser Use, Browser Base, and everyone pounding every website used it, I think the net benefit would be in the billions. I would recommend using it in isolation. Nonetheless, it works very very well on my machine.
> This type of slop comment is somehow worse than spam.
Please don't be mean.
> I think, the net benefit would be in the billions.
I think you must forgive people if they are somewhat hostile, if not sick and tired of these claims. It’s quite frustrating seeing individuals constantly saying things like this. Meanwhile I don’t think a lot of people are seeing the structural shifts that these claims imply. This is not an original idea. The disruption claim has been made for the past several years in various fields, and the goalposts keep getting moved. AI will absolutely change and render some jobs moot even in its current state if Claude/GPT are able to make a profitable business model. If it turns out that Claude is really being subsidized by investors and that the $200/month subscription is really $5,000/month when Claude has to stand on its own, I’m not sure what’s going to happen.
It’s clear you’ve gotten some good, if expensive use out of AI, but I’m not sure that experience scales or if it will exist in 5 years.
2-3 hours "walking" while having to check in every 5-10 minutes?
If I have to check in every 5-10 minutes, I won't taste coffee or hear that there's good music playing.
However, I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when, faced with job 2 not running because job 1 was failing, Claude went into the database, changed the failure record to success, triggered job 2 (which produced harmful garbage), and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.
There is some real power in AI, for sure. But as I have been working with it, one thing is very clear. Either AI is not even close to a real intelligence (my take), or it is an alien intelligence. As I develop a system where it iterates on its own contexts, it definitely becomes probabilistically more likely to do the right thing, but the mistakes it makes become even more logic-defying. It's the coding equivalent of a hand with extra fingers.
I'm only a few weeks into really diving in. Work has given me infinite tokens to play with. Building my own orchestrator system that's purely programmatic, which will spawn agents to do work. Treat them as functions. Defined inputs and defined outputs. Don't give an agent more than one goal, I find that giving it a goal of building a system often leads it to assert that it works when it does not, so the verifier is a different agent. I know this is not new thinking, as I said I am new.
For me the most useful way to think about it has been considering LLMs to be a probabilistic programming language. It won't really error out, it'll just try to make it work. This attitude has made it fun for me again. Love learning new languages and also love making dirty scripts that make various tasks easier.
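The "agents as functions" pattern described above can be sketched in a few lines. This is a hypothetical minimal skeleton, not anyone's actual orchestrator: the worker and verifier here are stubs standing in for real LLM calls, and all names are made up for illustration. The key idea is the one from the comment: one goal per agent, and a separate verifier grades the result instead of trusting the worker's own claim of success.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    goal: str      # exactly one goal per agent
    payload: str

def run_agent(worker: Callable[[Task], str],
              verifier: Callable[[Task, str], bool],
              task: Task,
              max_attempts: int = 3) -> str:
    """Call the worker, have an independent verifier check the result,
    and retry on failure -- treating the LLM as a probabilistic function."""
    for _ in range(max_attempts):
        result = worker(task)        # in practice, an LLM call
        if verifier(task, result):   # never let the worker grade itself
            return result
    raise RuntimeError(f"no verified result after {max_attempts} attempts")

# Stubs standing in for real LLM-backed agents:
def stub_worker(task: Task) -> str:
    return task.payload.upper()

def stub_verifier(task: Task, result: str) -> bool:
    return result == task.payload.upper()

print(run_agent(stub_worker, stub_verifier, Task("uppercase", "hello")))  # HELLO
```

The retry loop is what makes the "probabilistic programming language" framing concrete: a single call may fail, but verified-retry turns an unreliable function into a usable one.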
I had a bad feeling we were basically already there.
we've had AlphaFold for a while. it's not novel that we have ML solutions that can find, erm, novel solutions.
however, by and large, most LLMs as typically used by most individuals aren't solving novel problems. and in those scenarios, we often end up with regurgitated/most common/lowest common denominator outputs... it's a probability distribution thing.
Anyone who understands reinforcement learning already knows that's not the case.
Also that it is now good enough to make researchers faster.
If we seriously expect white-collar jobs to not be a thing anymore, then I am not seeing the trades having nearly enough capacity to absorb all the released workforce
The AI CEOs are pointing out that when chess was "solved", in that Kasparov was famously beaten by Deep Blue, there was a window of time after that event where grandmasters + computers were the strongest players. The knowledge/experience of a grandmaster paired with the search/scoring of the engines was an unbeatable pair.
However, that was just a window in time. Eventually engines alone were capable of beating grandmaster + engine pairs. Think about that carefully. It implies something. The human involvement eventually became an impediment.
Whether you believe this will transfer to other domains is up to you to decide.
It's like pairing with the fastest person on the team, except he is wrong often enough to cost you time and still sounds sure.