Hacker News

Why is learning an appropriate metaphor for changing weights but not for context? There are certainly major differences in what they are good or bad at and especially how much data you can feed them this way effectively. They both have plenty of properties we wish the other had. But they are both ways to take an artifact that behaves as if it doesn't know something and produce an artifact that behaves as if it does.

I've learned how to solve a Rubik's cube before, and forgot it almost immediately.

I'm not personally fond of metaphors to human intelligence now that we are getting a better understanding of the specific strengths and weaknesses these models have. But if we're gonna use metaphors I don't see how context isn't a type of learning.



I suppose ultimately, the external behaviour of the system is what matters. You can see the LLM as the system, on a low level, or even the entire organisation of e.g. OpenAI at a high level.

If it's the former: yeah, I'd argue they don't "learn" much (!) beyond what happens at inference. But I'd find it hard to argue context isn't learning at all. It's just pretty limited in how much can be learned this way.

If you look at the entire organisation, there's clearly learning, even if relatively slow with humans in the loop. They test, they analyse usage data, and they retrain based on that. That's not a system that works without humans, but it's a system that I would argue genuinely learns. Can we build a version of that that "learns" faster and without any human input? Not sure, but doesn't seem entirely impossible.

Do either of these systems "learn like a human"? Dunno, probably not really. Artificial neural networks aren't all that much like our brains, they're just inspired by them. Does it really matter beyond philosophical discussions?

I don't find it too valuable to get obsessed with the terms. Borrowed terminology is always a bit off. Doesn't mean it's not meaningful in the right context.


To stretch the human analogy, it's short term memory that's completely disconnected from long term memory.

The models currently have anterograde amnesia.


It’s not very good in context, for one thing. Context isn’t that big, and RAG is clumsy. Working with an LLM agent is like working with someone who can’t form new long term memories. You have to get them up to speed from scratch every time. You can accelerate this by putting important stuff into the context, but that slows things down and can’t handle very much stuff.


The article does demonstrate how bad it is in context.

Context has a lot of big advantages over training though, too, it's not one-sided. Upfront cost and time are the big obvious ones, but context also works better than training on small amounts of data, and it's easier to delete or modify.
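To make that concrete, here's a minimal sketch of "learning via context": few-shot prompting just concatenates examples into the prompt, so the model picks up the task only for the duration of that context window. (Illustrative only; the function name and prompt format here are made up, not any particular provider's API.)

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt. The model 'learns' the task pattern
    from these examples at inference time; nothing persists afterward."""
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    # The final line leaves the output blank for the model to complete.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

examples = [("2 + 2", "4"), ("3 + 5", "8")]
prompt = build_few_shot_prompt(examples, "7 + 1")
```

Deleting or modifying what was "learned" is just editing the `examples` list, which is exactly the kind of cheap update that retraining can't match.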

Even for a big product like Claude Code, built by the company that controls the model: I'm sure they do a lot of training to make the product better, but they're not gonna rely entirely on training and ship a nearly blank system prompt.


You got this exactly backwards.

"I'm not fond of metaphors to human intelligence".

You're assuming that learning during inference is something specific to humans and that the suggestion is to add human elements into the model that are missing.

That isn't the case at all. The training process is already entirely human specific by way of training on human data. You're already special casing the model as hard as possible.

Human DNA doesn't contain all the information that fully describes the human brain, including the memories stored within it. It only contains the blueprints for a general-purpose element known as the neuron, and this building block is shared by basically any animal with a nervous system.

This means if you want to get away from humans you will have to build a model architecture that is more general and more capable of doing anything imaginable than the current model architectures.

Context is not suitable for learning because it wasn't built for that purpose. The entire point of transformers is that you specify a sequence and the model learns on the entire sequence. This means that any in-context learning you want to perform must be inside the training distribution, which is a different way of saying that it was just pretraining after all.


The fact that DNA doesn't store all the connections in the brain doesn't mean that enormous parts of the brain, and by extension behaviour, aren't specified in the DNA. Tons of animals have innate knowledge encoded in their DNA, humans among them.


I don't think it's specific to humans at all, I just think the properties of learning are different in humans than they are in training an LLM, and injecting context is different still. I'd rather talk about the exact properties than bemoan that context isn't learning. We should just talk about the specific things we see as problems.


Models gain information from context but probably not knowledge and definitely not wisdom.




