The Mechanics of Learning

A visual exploration of how neural networks actually "learn".
Scroll down to interact with the machine.

1. The Forward Pass

We start with a simple goal: text goes in, a prediction comes out. A neural network is essentially layers of "mixers" (neurons). Each line connecting them has a weight, which works like a volume slider: it sets how strongly one neuron's signal carries into the next.

In the visualization, the inputs are "The", "Capital", "Is". We want the network to predict "Paris".

Hover over the lines to see their weights. Hover over nodes to see the math. Click Run Forward Pass to watch the data flow. Notice that the output looks arbitrary? That's because the weights start out random, so the network has not learned anything yet.
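The forward pass can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the visualization: it assumes a single layer, a made-up three-word vocabulary, input values of 1.0 standing in for the three words, and a softmax at the end to turn scores into probabilities. The names forward, softmax, and vocab are illustrative.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(inputs, weights):
    """One layer: each output score is a weighted sum of the inputs."""
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return softmax(scores)

# Hypothetical setup: three input signals and three candidate words.
vocab = ["London", "Paris", "Banana"]
inputs = [1.0, 1.0, 1.0]  # stand-ins for "The", "Capital", "Is"

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [[rng.uniform(-1, 1) for _ in inputs] for _ in vocab]

probs = forward(inputs, weights)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.1%}")
```

Because the weights are random, the probability given to "Paris" here has nothing to do with geography. Change the seed and the guess changes with it.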

Paris Probability: 0.0%

2. The Scorecard (Loss)

The network guessed wrong. But how wrong? We need a number to penalize the machine. This is the Loss Function.

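We use a formula called Cross Entropy, simplified here as -log(Probability), where log is the natural logarithm and Probability is the probability the network assigned to the correct answer.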

Drag the slider below. Imagine this is the network's confidence in the correct answer ("Paris").

Notice: if the network is 100% confident (1.0), the Loss is 0. But as confidence drops toward 0, the Loss grows without bound. The network hates being wrong.

Loss = -log(0.10) = 2.30

Prediction (Confidence) 0.10
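To check the slider readout by hand, the simplified loss can be written as a short Python function. This is a sketch of the single-example formula used above, with the natural logarithm; the function name cross_entropy is just a label chosen here.

```python
import math

def cross_entropy(p_correct):
    """Loss for one example: -ln of the probability given to the right answer."""
    return -math.log(p_correct)

# Walk the "slider" from confident to very unsure.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"confidence {p:.2f} -> loss {cross_entropy(p):.2f}")
```

The value for a confidence of 0.10 matches the 2.30 shown above, and each further drop in confidence costs more than the last.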

3. The "Wiggle" (Gradient)

To fix the high loss, we have to blame specific connections. We calculate the Gradient.

Think of this as "wiggling" a weight to see what happens. If I increase this weight slightly, does the Loss go up or down, and by how much?

In the visualization, you are looking at a single connection. Use the slider to change the weight w. Try to find the "sweet spot" (the valley) where Loss is lowest.

The Ghost Dot shows you the slope (Gradient) at your current position.

Loss: 0.00

Weight (w) 0.0

Gradient = Slope of the curve
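The "wiggle" can be taken literally: nudge the weight a tiny amount each way and measure how the Loss changes. The sketch below does this with a central difference. The bowl-shaped loss with its valley at w = 2 is a made-up stand-in for the curve in the visualization, and the names loss and wiggle_gradient are illustrative.

```python
def loss(w):
    """Hypothetical bowl-shaped loss whose valley sits at w = 2."""
    return (w - 2.0) ** 2

def wiggle_gradient(f, w, eps=1e-5):
    """Estimate the slope at w by nudging w slightly in both directions."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

for w in [0.0, 2.0, 3.5]:
    g = wiggle_gradient(loss, w)
    print(f"w = {w:.1f}  loss = {loss(w):.2f}  slope = {g:+.2f}")
```

A negative slope means increasing the weight lowers the Loss, a positive slope means the opposite, and a slope near zero means you are sitting in the valley.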

4. The Loop (Optimization)

We don't just wiggle one weight. Using calculus (the chain rule, applied by an algorithm called Backpropagation), we calculate the gradient for every weight in the network in a single backward pass.

Then we nudge every weight a small step in the direction opposite its gradient, which reduces the error.
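For a single weight, stepping against the gradient looks like the sketch below. It assumes a made-up bowl-shaped loss with its valley at w = 2 and a step size (learning_rate) of 0.1. Neither value comes from the visualization.

```python
def loss(w):
    """Hypothetical bowl-shaped loss whose valley sits at w = 2."""
    return (w - 2.0) ** 2

def gradient(w):
    """Exact slope of the loss above."""
    return 2.0 * (w - 2.0)

learning_rate = 0.1  # assumed step size
w = 0.0              # starting weight

for step in range(1, 26):
    # Move against the slope: downhill on the loss curve.
    w = w - learning_rate * gradient(w)
    if step % 5 == 0:
        print(f"step {step:2d}  w = {w:.4f}  loss = {loss(w):.6f}")
```

The steps shrink as the weight nears the valley, because the slope itself gets smaller there.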

Interactive Training:

Click Step (Train).
Watch the signal go forward.
See the Loss flash red.
See the Arrows appear. Green Up means increase weight, Red Down means decrease.
Watch the weights physically shift.

Repeat this 10-20 times and watch the "Paris" probability climb.
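The whole loop fits in a short Python sketch. It is a toy under stated assumptions, not the code behind the visualization: one layer, a made-up three-word vocabulary, inputs of 1.0, and a learning rate of 0.5. Instead of wiggling each weight, it uses the known gradient for softmax combined with cross entropy, which is the predicted probability minus 1 for the correct word and minus 0 for every other word.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical setup, chosen only for illustration.
vocab = ["London", "Paris", "Banana"]
target = vocab.index("Paris")
inputs = [1.0, 1.0, 1.0]  # stand-ins for "The", "Capital", "Is"
learning_rate = 0.5       # assumed step size

rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in inputs] for _ in vocab]

history = []  # "Paris" probability before each update
for step in range(1, 21):
    # Guess: forward pass.
    scores = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    probs = softmax(scores)

    # Measure: cross-entropy loss on the correct word.
    loss = -math.log(probs[target])
    history.append(probs[target])

    # Wiggle and Step: gradient of the loss for each weight is
    # (probability - 1 if correct word else probability - 0) * input.
    for k, row in enumerate(weights):
        error = probs[k] - (1.0 if k == target else 0.0)
        for j, x in enumerate(inputs):
            row[j] -= learning_rate * error * x

    print(f"Step: {step:2d}  Loss: {loss:.3f}  Paris: {probs[target]:.1%}")
```

Each printed line mirrors the readout below: the Loss falls and the "Paris" probability rises as the weights shift.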

Step: 0 Loss: --- Paris: 0%

5. Generalization

Why does this work? When the loss is minimized over millions of examples, the network can't get by on memorizing each one. It is pushed to pick up the patterns the examples share.

It creates a high-dimensional landscape where concepts like "Capital" and "France" naturally funnel the signal down into the valley of "Paris".

The ball represents the state of our network. Through training (Gradient Descent), it rolls down the hill and settles into a low point of error, though not necessarily the lowest one on the whole landscape.

You have mastered the loop: Guess, Measure, Wiggle, Step.