No shit “this decline [in the capability of LLMs to perform logical reasoning through multiple clauses] is due to the fact that current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data.”
Paper from Apple engineers showing that genAI models don’t actually understand what they read but just regurgitate what they’ve seen like a puppy wanting to please its owner:
https://arxiv.org/pdf/2410.05229
This is apparently doing some numbers so I want to make it clear that I think this paper is purely part of a strategic move by Apple to save face and gracefully exit from the genAI hype cycle. Hopefully other companies will do the same.
@CatherineFlick Microsoft definitely won't, they've bet the entire company on this garbage and will not weather its collapse well. Google and Facebook and Amazon are actually in somewhat better positions to back away from it, but it still feels like whether they do or not will mostly be a question of executive ego.
@jplebreton yep, classic big CEO energy
@CatherineFlick as much as I don't like corporate face-saving, I'm relieved to see Apple being unenthusiastic about genAI. I use a Mac for music work, and I would not be happy to see "AI" features keep getting incorporated into the OS. It would be extra cool if they backpedal for macOS 16 and remove or at least lessen the "Apple Intelligence" features.
That said, I've been hearing people like Timnit Gebru use the phrase "stochastic parrot" for years, so this paper is definitely funny.
@CatherineFlick ...why? With the proper implementation and safeguards, current models can do amazing things (even if they are dumb and cannot reason); good examples are NotebookLM or OpenAI's o1 (try it with math).
@ErikJonker they’re very, very expensive and environmentally destructive toys. Their “amazing things” are not worth the cost, and the safeguards are not sustainable - keeping safeguards up to date will be yet another forever game of whack-a-mole that companies won’t take seriously. They return bad results too often for use as decision-making tools, even with humans in the loop - humans suck at challenging them. Are these harms worth it? Really?
@ErikJonker @CatherineFlick
The chain-of-thought approach that the o1 models have adopted illustrates the diminishing returns of the LLM approach. You now have to wait minutes for it to generate multiple token sequences and pick the best chain in order to get a significant improvement over plain GPT-4. And still the results cannot always be relied upon.
https://youtu.be/PH-qIyqwY4U
Will the o2 models have to generate multiple chains of multiple chains of thought, squaring the thinking time?
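The generate-many-chains-then-pick-one loop described above can be sketched roughly as follows. This is a hypothetical illustration only: `generate_chain` and its quality score are stand-in stubs, not OpenAI's actual method. The point it shows is that cost grows linearly with the number of sampled chains, since each extra chain is a full generation pass.

```python
import random

def generate_chain(prompt, seed):
    # Stand-in for sampling one chain of thought from an LLM.
    # A real model would decode a token sequence conditioned on
    # the prompt; this stub just fabricates a chain of "steps"
    # with a pseudo-random quality score (the prompt is unused).
    rng = random.Random(seed)
    steps = [f"step {i}" for i in range(rng.randint(2, 6))]
    return {"steps": steps, "score": rng.random()}

def best_of_n(prompt, n):
    # Sample n independent chains and keep the highest-scoring one.
    # Each chain costs a full generation pass, so wall-clock time
    # scales with n -- the diminishing-returns point made above.
    chains = [generate_chain(prompt, seed) for seed in range(n)]
    return max(chains, key=lambda c: c["score"])
```

Squaring the effort, as speculated for "o2", would mean running this loop over chains of chains: n passes per candidate, n candidates, so n² generations.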
@bornach @CatherineFlick …for me NotebookLM is an example of why those things aren’t necessarily a problem with the right application and the right context
@bornach @CatherineFlick kinda reminds me of the late days of 3dfx, the once-world-beating 3D graphics accelerator card company in the late 90s. for a while their cards were the best around, but they didn't really improve their architecture and the final card they didn't quite manage to ship was this absurd monster with 4 of the same chip crammed onto a single card. meanwhile, Nvidia's chips were doing more per clock and getting better architecturally with each generation.
@bornach @CatherineFlick (fwiw, i don't think there is an Nvidia analog here, i agree with various experts that LLMs are going to run their course and have already more or less hit the limits of what they can do, and it doesn't seem like these companies have anything else up their sleeves at the moment.)
@CatherineFlick but humans can't hold and utilise all that data. We need something to provide the knowledge on demand.
I don't know about the hype aspects but this is damn good technology to pursue.
@marksun do we really need genAI to do this though? Why not just build good search engines that actually work well?
@CatherineFlick I think genAI like ChatGPT functions like such an engine. It is just that we are finding many more uses for them, and it gets harder to define what they are in simple terms.
@marksun @CatherineFlick except a search index can be updated daily or hourly with fresh content. You can't retrain an LLM at that pace just to update it; the costs would be astronomical. This, and so many more problems
@agorakit @CatherineFlick indeed, fresh content is a thing. But remember there is a lot more data in history than in the last few hours or days. Still, we are free to find ways around issues like the one you mentioned. It is not compulsory to use the same thing for everything.