Understanding Neural Network, Visually (visualrambling.space)
342 points by surprisetalk 1 day ago | 53 comments




For the visual learners, here's a classic intro to how LLMs work: https://bbycroft.net/llm

If you're interested in an easy explainer for backprop, I highly recommend Math for Deep Learning by Kneusel. I finished it recently; it has you apply the chain rule backwards by hand on a tiny NN, and code one too.
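For a taste of what that looks like (my own toy sketch, not the book's code): a one-neuron "network" y = sigmoid(w*x + b) with squared-error loss, gradients worked out by the chain rule and checked against a numerical derivative.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # One-neuron "network": y = sigmoid(w*x + b), loss L = (y - t)^2
    w, b = 0.5, 0.1
    x, t = 2.0, 1.0

    # Forward pass
    z = w * x + b
    y = sigmoid(z)
    L = (y - t) ** 2

    # Backward pass: the chain rule applied by hand, right to left
    dL_dy = 2 * (y - t)      # dL/dy
    dy_dz = y * (1 - y)      # sigmoid'(z), written in terms of y
    dL_dz = dL_dy * dy_dz
    dL_dw = dL_dz * x        # dz/dw = x
    dL_db = dL_dz * 1.0      # dz/db = 1

    # Sanity check against a finite difference
    eps = 1e-6
    L_plus = (sigmoid((w + eps) * x + b) - t) ** 2
    print(dL_dw, (L_plus - L) / eps)  # the two numbers should nearly match

Everything in a real network is this same idea repeated layer by layer.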

While impressive, it still doesn't tell me why a neural network is architected the way it is, and that, my bois, is where this guy comes in: https://threads.championswimmer.in/p/why-are-neural-networks...

Make a visualization of the article above and it would be the biggest aha moment in tech.


Regarding architecture, I don't believe a satisfying "why" is in the cards.

Conceptually neural networks are quite simple. You can think of each neural net as a daisy chain of functions that can be efficiently tuned to fulfill some objective via backpropagation.

Their effectiveness (in the dimensions we care about) is more a consequence of the explosion of compute and data that occurred in the 2010s.

In my view, every hyped architecture was what yielded the best accuracy given the compute resources available at the time. It's not a given that these architectures are optimal, and we certainly don't always fully understand why they work. Most of the innovations in this space over the past 15 years have come from private companies that have lacked a strong research focus but are resource rich (endless compute and data capacity).


Lovely visualization. I like the very concrete depiction of middle layers "recognizing features", which makes the whole machine feel more plausible. I'm also a fan of visualizing things, but I think it's important to appreciate that some things (like a 10,000-dimensional vector as the input, or even a 100-dimensional vector as the output) can't be concretely visualized, and you have to develop intuitions in more roundabout ways.

I hope they make more of these; I'd love to see a transformer presented more clearly.



Super cool visualization. Found this vid by 3Blue1Brown super helpful for visualizing transformers as well: https://www.youtube.com/watch?v=wjZofJX0v4M&t=1198s

Their series on LLMs, neural nets, etc., is amazing.

This is just scratching the surface -- where neural networks were thirty years ago: https://en.wikipedia.org/wiki/MNIST_database

If you want to understand neural networks, keep going.


Which, if you are trying to learn the basics, is actually a great place to start ...

This reminds me of a "web site" (remember those) I used to visit a lot years ago, trying to understand Neural Networks and genetic algorithms:

http://www.ai-junkie.com/ann/evolved/nnt1.html

This is old. Perhaps late '90s or early '00s. The top domain still uses Flash. But the same OCR example is used to teach the concept. For some reason, that site made it all click for me.


I made a similar thing recently: https://lighthousesoftware.co.uk/projects/neural-network/

I wanted to get a feel for what specific neurons are actually looking at, and how disabling/enabling them affects the final output.

It runs a little MNIST model in the browser, but lets you turn pixels and neurons on/off, and examine the weight and activation patterns of each neuron and how it contributes to each prediction. Helped me get more of an intuitive sense of what is going on inside.
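If you want to poke at the same idea in code, here's a rough sketch (with made-up random weights, not the site's actual model) of what "disabling a neuron" amounts to: zero out one hidden activation and watch the output shift.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 hidden
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # 4 hidden -> 2 outputs

    def forward(x, disabled=()):
        h = np.maximum(0, W1 @ x + b1)  # ReLU hidden layer
        for i in disabled:
            h[i] = 0.0                  # "turn off" a hidden neuron
        return W2 @ h + b2

    x = np.array([1.0, -0.5, 2.0])
    print(forward(x))                 # all neurons active
    print(forward(x, disabled=[2]))   # hidden neuron 2 ablated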


This Welch Labs video is very helpful: https://www.youtube.com/watch?v=qx7hirqgfuU

I have a question. With the logic of neural networks and pattern recognition, is it not then possible to "predict" everything in everything? Like predicting the future to an exact "thing"? Is this not a tool to manipulate, for instance, the stock market?

It is possible to try it, and some people do (high-speed trading is just that, plus taking advantage of the privileged information that speed provides to react before anyone else).

However, there are two fundamental problems with computational predictions. The first one, obviously, is accuracy. A model is a compressed memorization of everything observed so far; a prediction with it is just projecting the observed patterns into the future. In a chaotic system, that only goes so far; the most regular, predictable patterns are obvious to everybody and give less return, while the chaotic system states where prediction would be most valuable are the least reliable. You cannot build a perfect oracle that would fix that.

The second problem is more insidious. Even if you were able to build a perfect oracle, acting on its predictions would become part of the system itself. That would change the outcomes, making the system behave differently from the way it behaved when the model was trained, and thus less reliably. If several people do it at the same time, there's no way to retrain the model to take the new behaviour into account.

There's the possibility (but no guarantee) of reaching a fixed point, where a Nash equilibrium appears and the system settles into a stable cycle, but that's not likely in a changing environment where everybody tries to outdo everyone else.


Ah, this actually connects a few dots for me. It helps explain why models seem to have a natural lifetime: once deployed at scale, they start interacting with and shaping the environment they were trained on. Over time, data distributions, usage patterns, and incentives shift enough that the model no longer functions as the one originally created, even if the weights themselves haven't changed.

That also makes sense of the common perception that a model feels “decayed” right before a new release. It’s probably not that the model is getting worse, but that expectations and use cases have moved on, people push it into new regimes, and feedback loops expose mismatches between current tasks and what it was originally tuned for.

In that light, releasing a new model isn’t just about incremental improvements in architecture or scale; it’s also a reset against drift, reflexivity, and a changing world. Prediction and performance don’t disappear, but they’re transient, bounded by how long the underlying assumptions remain valid.

So when AI companies "retire" a model, it's not only because of their new, better model, but also because of this decay?

PS: I wrote the above with AI help (not a native English speaker).


Correct me if I'm wrong, but I think this is related to the terms "covariate shift" (a change in the model's input distribution x) and "concept drift".

The interesting part is that true AGI is then not possible with the current approach, since there are no ceilings/boundaries to "contain" it?

Well, nothing is stopping you from attempting to predict everything with neural networks, but that doesn't mean your predictions will be (1) good, (2) consistently useful, or (3) economical. Transformer models, for example, suffer from (2) and especially (3) in their current iteration.

DNNs learn patterns; for them to work, there must be some. The stock market is almost entirely driven by random real-world events that aren't recurrent, so you can't predict much at all.

Great explanation, but the last question is quite simple. You determine the weights via brute force. Simply running a large amount of data where you have the input as well as the correct output (handwriting to text in this case).

"Brute force" would be trying random weights and keeping the best performing model. Backpropagation is compute-intensive but I wouldn't call it "brute force".

"Brute force" here is about the amount of data you're ingesting. It's no Alpha Zero, that will learn from scratch.

What? Either option requires sufficient data. Brute force implies iterating over all combinations until you find the best weights. Back-prop is an optimization technique.

In the context of the grandparent's post:

     > You determine the weights via brute force. Simply running a large amount of data where you have the input as well as the correct output 
Brute force just means guessing all possible combinations. A dataset containing most human knowledge is about as brute force as you can get.

I'm fairly sure that AlphaZero's data is generated by AlphaZero itself. But it's not an LLM.


No, a large dataset does not make something brute force. Rather than backprop, an example of brute force might be taking a single input/output pair and then systematically sampling the model parameter space to search for a sufficiently close match.

The sampling stage of Evolution Strategies at least bears a resemblance, but even that is still a strategic gradient descent algorithm. Meanwhile, backprop is about as far from brute force as you can get.
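To make the contrast concrete, here's a toy sketch of that kind of brute force: randomly sample the parameter space and keep the best guess, using no gradient information at all.

    import random

    # Toy "model": y = w1*x + w2, data generated with true weights (2.0, -1.0)
    data = [(x, 2.0 * x - 1.0) for x in range(10)]

    def loss(w1, w2):
        return sum((w1 * x + w2 - y) ** 2 for x, y in data)

    # Brute force: sample weights at random, keep whatever scores best
    best, best_loss = None, float("inf")
    for _ in range(100_000):
        w1, w2 = random.uniform(-5, 5), random.uniform(-5, 5)
        l = loss(w1, w2)
        if l < best_loss:
            best, best_loss = (w1, w2), l

    print(best, best_loss)  # slowly homes in on (2.0, -1.0)

With two parameters this eventually works; with millions of weights the volume to sample grows exponentially, which is why nothing is actually trained this way.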


I love this visual article as well:

https://mlu-explain.github.io/neural-networks/


Spent 10 minutes on the site, and I think this is where I'll start my day next week! I just love visual-based learning.

I like the style of the site; it has a "vintage" look.

Don't think it's a moiré effect, but yeah, looking at the pattern.



Oh god my eyes! As it zooms in (ha)

That's cool, that's how shades were rendered in the old days.

Man, those graphics are so damn good.


Oh wow, this looks like a 3D render of a perceptron from when I started reading about neural networks. I guess neural networks are essentially built on that idea? Inputs > weight function to adjust the final output to desired values?

The layers themselves are basically perceptrons, not really any different from a generalized linear model.

The ‘secret sauce’ in a deep network is the hidden layer with a non-linear activation function. Without that, you could simplify all the layers down to a single linear model.
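A quick numerical check of that claim (a sketch with arbitrary random matrices): two stacked linear layers are exactly equivalent to one, and a ReLU in between breaks the equivalence.

    import numpy as np

    rng = np.random.default_rng(1)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    x = rng.normal(size=3)

    # Two stacked linear layers collapse into one matrix (W2 @ W1):
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))               # True

    # With a non-linearity in between, no single matrix reproduces it:
    print(np.allclose(W2 @ np.maximum(0, W1 @ x), (W2 @ W1) @ x))  # False (in general)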


A neural network is basically a multilayer perceptron

https://en.wikipedia.org/wiki/Multilayer_perceptron


Yes, vanilla neural networks are just lots of perceptrons

As someone who does not use Twitter, I suggest adding RSS to your site.

This visualization reminds me of the 3blue1brown videos.

I was thinking the same thing. It's at least the same description.

Cool intro and visuals.

Where it ends, "how do we calculate the weights?", is fairly simple.

Start completely randomly and compare the output to the known truth. When it's incorrect, you beat the model up pretty badly and repeat. Eventually you get the correct answer pretty consistently.

... and by "beat it up" I mean tweak the weights. Tweaking totally randomly will work but will take a long time (brute force), so we add a bit of intelligence to see which direction to tweak via some algorithms (backpropagation, gradient descent).
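A toy version of that loop, with a single weight and the "which direction" step worked out by hand (my sketch, not from the article):

    # Fit y = w*x to data from y = 3*x by gradient descent on squared error
    data = [(x, 3.0 * x) for x in range(1, 6)]
    w = 0.0    # the random starting guess
    lr = 0.01  # how hard we "beat it up" each round

    for step in range(100):
        # dL/dw for L = sum (w*x - y)^2 is sum 2*(w*x - y)*x
        grad = sum(2 * (w * x - y) * x for x, y in data)
        w -= lr * grad  # tweak in the direction that lowers the loss

    print(w)  # converges to ~3.0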


Really cool. The animations within a frame work well.

Nice visuals, but misses the mark. Neural networks transform vector spaces and collect points into bins. This visualization shows the structure of the computation. This is akin to displaying a matrix-vector multiplication in Wx + b notation, except W, x, and b have more exciting displays.

It completely misses the mark on what it means to 'weight' (linearly transform), bias (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.


> but misses the mark

It doesn't match the pictures in your head, but it nevertheless does present a mental representation the author (and presumably some readers) find useful.

Instead of nitpicking, perhaps pointing to a better visualization (like maybe this video: https://www.youtube.com/watch?v=ChfEO8l-fas) could help others learn. Otherwise it's just frustrating to read comments like this.


It's not nitpicking to point out major missing pieces. Comments like this might come across as critical, but they are incredibly valuable for any reader who doesn't know what they don't know.

It just sucks to put a ton of work into something and then show it off to people, only for the first reaction to be someone coming out of the woodwork to loudly crow that it "misses the mark" and is somehow crap.

It's a completely avoidable experience when the community has a generally more positive attitude. All it takes is a slightly different phrasing of exactly the same feedback, with a positive, encouraging emotional tone.

For example, instead of writing:

> Nice visuals, but misses the mark. Neural networks transform vector spaces and collect points into bins. This visualization shows the structure of the computation. This is akin to displaying a matrix-vector multiplication in Wx + b notation, except W, x, and b have more exciting displays.

> It completely misses the mark on what it means to 'weight' (linearly transform), bias (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.

Here's more or less the same comment but with a completely different attitude:

> Oh wow, that's cool! That must have been a ton of work to put together. That got me thinking about how it's akin to a matrix-vector multiplication in Wx + b notation, except W, x, and b have more exciting displays.

> An idea I am wondering about but don't know how to solve is what it means to 'weight' (linearly transform), bias (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.

> Here's some other links that are related and cool: ...

> Cheers, nice work!

Let's not crap on people's work so readily. After all, we have no idea about who the author is. Maybe it's a teenager or a university student and this was their first project. It's really a jarring and demoralizing experience to have your first visualization immediately crapped on.


When it comes to most in-person interactions, I approximately agree with you. But on HN, brutal honesty seems to be the norm, and at least personally I appreciate it for that.

A large part of the problem is a cultural mismatch, I think. People have a tendency to interpret even entirely valid criticism as negativity. One of the nice things about a more analytical environment (e.g. STEM research labs IRL, HN on the net) is that you don't need to worry about that so much. The expectation is that things will be critiqued - that this is a good thing that helps further personal growth and intellectual endeavors more generally.

I'll grant the original comment could have been worded a bit more gently without losing the intended meaning. That said, the alternate example you gave there changes the meaning, sounds rather sycophantic, and honestly reads like corpo-posi-speak or LLM prose to me.

Regarding the original criticism. Notice that the title implies this to be an illustration of how a network does what it does. And the visualization flows through internal to output cells. Yet a number of key concepts aren't explained at all. Vaguely analogous to throwing up some ASM on a PPT slide and remarking "so you see, that's how it works". There's a matmul there, but _why_? What's the _point_ of an activation function? Unless I missed something the visualization doesn't even mention nonlinearity despite it being an essential property.


I agree. This visualization gets the basic idea across, but it doesn't actually tell you how they are implemented mathematically.

It doesn't tell you that each neuron calculates a dot product of the input and the neuron's weights, that the bias is simply added rather than used as a threshold, nor that there is an activation function acting as a differentiable threshold.

Without this critical information, there is no easy way to explain how to train a neural network, since you can't use gradient descent; you'd be forced to use evolutionary algorithms for non-differentiable networks.
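For anyone following along, the missing piece is small. A single neuron, in sketch form:

    import math

    def neuron(inputs, weights, bias):
        # Dot product of inputs and weights; the bias is simply added on
        z = sum(i * w for i, w in zip(inputs, weights)) + bias
        # Sigmoid: a smooth, differentiable stand-in for a hard threshold
        return 1.0 / (1.0 + math.exp(-z))

    print(neuron([0.5, -1.0, 2.0], [0.4, 0.7, -0.2], bias=0.1))

It's the differentiability of that last step that makes gradient descent possible at all.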


I like the CRT-like filter effect.

I get 3 fps in Chrome, most likely due to disabled HW acceleration.

High FPS in Safari on an M2 MBP.

Am looking for Pegasus, trying to hack into a phone. Can you help me with that?

Great visualization!

Nice work

very cool stuff


