Email autocomplete memory lab

Watch RNN, LSTM, and GRU compete to remember context.

SequenceLab AI turns recurrent neural networks into a guided, gamified lab: one email draft, three models, live memory curves, rigorous equations, and plain-language explanations.

Live sequence trace

Hi → Maya → during → Monday's → kickoff → the → client → specifically → requested → the → compliance → appendix → but …

Step 1

Choose an email prediction challenge

The app does not auto-run. Pick a preset or write your own email fragment, then decide when the models start processing.

Auto-suggested target: add a longer email fragment to get a suggestion. The target works for a single word or phrase; for presets, edit this field to test a different expected completion.
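As a rough illustration of how a word-or-phrase target could be checked against a model's completion, here is a minimal TypeScript sketch; the function name and the trailing-match rule are assumptions for this example, not the app's actual logic.

```ts
// Hypothetical target check (assumption, not the app's code):
// normalize whitespace and case, then accept if the completion ends with the target phrase.
function matchesTarget(completion: string, target: string): boolean {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return norm(completion).endsWith(norm(target));
}
```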

Step 2

Run the three-model arena

All metrics come from the simplified TypeScript simulation. The goal is explainable behavior: memory, confidence, prediction quality, latency, and parameter complexity all arise from the same run.
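As a sketch of what "all metrics arise from the same run" could look like in data terms, the shapes below are illustrative assumptions; the field names and types are not taken from the app.

```ts
// Hypothetical per-step record and run summary (illustrative assumptions only).
interface StepMetrics {
  step: number;            // 1-based position in the token sequence
  token: string;           // word processed at this step
  memoryRetention: number; // 0..1 estimate of how much of the early clue survives
  confidence: number;      // 0..1 confidence in the running prediction
}

interface ModelRun {
  model: "RNN" | "LSTM" | "GRU";
  parameterMultiplier: number; // e.g. 1, 4, or 3 relative to the RNN baseline
  latencyMs: number;           // simulated processing time for the whole run
  steps: StepMetrics[];        // one entry per token; drives the live charts
  prediction?: string;         // revealed only after the final token
}
```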

Long Clue: Meeting Follow-up

Hi Maya, during Monday's kickoff the client specifically requested the compliance appendix, but after several unrelated updates could you please attach the final

Target word or phrase: appendix

Predictions are hidden until the run reaches the final token. This keeps the lab honest: first watch the memory behavior, then reveal each model's completion.

The Step button advances one word at a time for classroom explanation. Watch the glow labels: context fading means the model is losing the earlier clue.
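One plausible way such a glow label could be derived from the retention value, shown purely as an assumption about the mechanism rather than the app's implementation:

```ts
// Hypothetical mapping from memory retention to the glow label (assumption only).
type ContextLabel = "context strong" | "context fading" | "context lost";

function contextLabel(memoryRetention: number): ContextLabel {
  if (memoryRetention > 0.6) return "context strong"; // early clue still dominant
  if (memoryRetention > 0.1) return "context fading"; // clue weakening, as in the cards below
  return "context lost";                              // clue effectively overwritten
}
```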

Recurrent Neural Network

RNN

1x params

An RNN reads a sequence one token at a time and compresses everything it has seen into a hidden state.

Current token: Hi · Context fading · Step 1 / 23

The vanilla RNN updates its hidden state, but older clues fade as newer words overwrite the memory.

Memory retention: 15% · Confidence: 46%
Prediction locked until the run reaches the end. Press Start or Step through the sequence to reveal how this model arrives at its answer.

Long Short-Term Memory

LSTM

4x params

An LSTM adds a cell state and gates so it can choose what to write, forget, and reveal.

Current token: Hi · Context fading · Step 1 / 23

The LSTM balances new input with preserved cell memory, reducing long-context drift.

Memory retention: 16% · Confidence: 40%
Prediction locked until the run reaches the end. Press Start or Step through the sequence to reveal how this model arrives at its answer.
Input gate: 54% · Forget gate: 80% · Output gate: 66%

Gated Recurrent Unit

GRU

3x params

A GRU keeps the gating idea but merges memory and hidden state into a simpler structure.

Current token: Hi · Context fading · Step 1 / 23

The GRU blends previous memory and new context through a compact gated update.

Memory retention: 20% · Confidence: 45%
Prediction locked until the run reaches the end. Press Start or Step through the sequence to reveal how this model arrives at its answer.
Update gate: 67% · Reset gate: 69%

Step 3

Compare the outcome

The charts translate the run into visible evidence: where memory fades, where gates help, and what each model trades off among stability, prediction quality, latency, and complexity.

Complete the arena run to unlock memory, confidence, error, capability, latency, and complexity charts.

Step 4

Theory, math, architecture, strengths, and weaknesses

Each model is explained twice: first in classroom language, then with formal equations suitable for technical study.

Recurrent Neural Network

RNN

An RNN reads a sequence one token at a time and compresses everything it has seen into a hidden state.

Input x_t
Hidden h_t
Output y_t

A single recurrent hidden state carries compressed context forward.

Input x_t

The current word represented as numbers.

Hidden h_t

A compressed memory of what has been read so far.

Output y_t

The model's next-word probability guess.

How to read this math: x_t is the current word, h_t is the memory after reading it, and the softmax output becomes the next-word guess.
h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
\hat{y}_t = \operatorname{softmax}(W_{hy} h_t + b_y)
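To connect the formulas to runnable code, here is a minimal TypeScript sketch of one RNN step. The helper functions and shapes are assumptions for illustration; only the update rules mirror the equations above.

```ts
// Minimal RNN step (illustrative sketch, not the app's implementation).
type Vec = number[];
type Mat = number[][];

const matVec = (W: Mat, v: Vec): Vec =>
  W.map(row => row.reduce((sum, w, i) => sum + w * v[i], 0));
const addVecs = (...vs: Vec[]): Vec =>
  vs[0].map((_, i) => vs.reduce((sum, v) => sum + v[i], 0));
const tanhVec = (v: Vec): Vec => v.map(Math.tanh);
const softmax = (v: Vec): Vec => {
  const m = Math.max(...v);
  const e = v.map(x => Math.exp(x - m));
  const z = e.reduce((sum, x) => sum + x, 0);
  return e.map(x => x / z);
};

// h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h);  y_t = softmax(W_hy h_t + b_y)
function rnnStep(
  xt: Vec, hPrev: Vec,
  Wxh: Mat, Whh: Mat, Why: Mat, bh: Vec, by: Vec
): { h: Vec; y: Vec } {
  const h = tanhVec(addVecs(matVec(Wxh, xt), matVec(Whh, hPrev), bh));
  const y = softmax(addVecs(matVec(Why, h), by));
  return { h, y };
}
```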

Strengths

  • Simple architecture
  • Fast baseline
  • Good for short dependencies

Weaknesses

  • Vanishing gradients
  • Weak long-term memory
  • Hidden state can be overwritten

Use cases

  • Small sequence classification
  • Simple autocomplete demos
  • Educational baselines

Long Short-Term Memory

LSTM

An LSTM adds a cell state and gates so it can choose what to write, forget, and reveal.

Input gate
Forget gate
Cell state
Output gate

Separate gates protect the cell state so important email clues survive longer.

Input gate

Decides how much new information should enter memory.

Forget gate

Decides what old information should be weakened or removed.

Cell state

The long-term memory highway that carries important clues forward.

Output gate

Decides which part of memory should influence the prediction.

How to read this math: The gates are small decision makers. They choose what enters memory, what gets forgotten, and what is exposed for prediction.
f_t = \sigma(W_f [x_t, h_{t-1}] + b_f)
i_t = \sigma(W_i [x_t, h_{t-1}] + b_i)
\tilde{C}_t = \tanh(W_C [x_t, h_{t-1}] + b_C)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma(W_o [x_t, h_{t-1}] + b_o)
h_t = o_t \odot \tanh(C_t)
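A matching TypeScript sketch of one LSTM step, reusing the vector helpers from the RNN sketch; the names and shapes are again assumptions, while the gate logic follows the equations above.

```ts
// Minimal LSTM step (illustrative sketch; reuses Vec, Mat, matVec, addVecs, tanhVec from the RNN sketch).
const sigmoidVec = (v: Vec): Vec => v.map(x => 1 / (1 + Math.exp(-x)));
const hadamard = (a: Vec, b: Vec): Vec => a.map((x, i) => x * b[i]);

function lstmStep(
  xt: Vec, hPrev: Vec, cPrev: Vec,
  Wf: Mat, Wi: Mat, Wc: Mat, Wo: Mat,
  bf: Vec, bi: Vec, bc: Vec, bo: Vec
): { h: Vec; c: Vec } {
  const xh = [...xt, ...hPrev];                               // [x_t, h_{t-1}]
  const f = sigmoidVec(addVecs(matVec(Wf, xh), bf));          // forget gate
  const i = sigmoidVec(addVecs(matVec(Wi, xh), bi));          // input gate
  const cTilde = tanhVec(addVecs(matVec(Wc, xh), bc));        // candidate cell
  const c = addVecs(hadamard(f, cPrev), hadamard(i, cTilde)); // new cell state
  const o = sigmoidVec(addVecs(matVec(Wo, xh), bo));          // output gate
  const h = hadamard(o, tanhVec(c));                          // new hidden state
  return { h, c };
}
```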

Strengths

  • Excellent long-context memory
  • Controls forgetting
  • Handles delayed clues well

Weaknesses

  • More parameters
  • Higher latency
  • More complex to explain and tune

Use cases

  • Language modeling
  • Speech recognition
  • Long-range time-series forecasting

Gated Recurrent Unit

GRU

A GRU keeps the gating idea but merges memory and hidden state into a simpler structure.

Update gate
Reset gate
Candidate state
Hidden h_t

Compact gates decide how much old context to keep and how much new evidence to write.

Update gate

Controls how much previous memory should be kept.

Reset gate

Controls how much past context should be ignored for the current update.

Candidate state

A proposed new memory based on the current word.

Hidden h_t

The final compact memory used for prediction.

How to read this math: The update gate decides what to keep, the reset gate decides what to ignore, and the hidden state becomes the prediction memory.
z_t = \sigma(W_z [x_t, h_{t-1}])
r_t = \sigma(W_r [x_t, h_{t-1}])
\tilde{h}_t = \tanh(W [x_t, r_t \odot h_{t-1}])
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
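And the corresponding GRU step, again as an illustrative TypeScript sketch that reuses the same helpers rather than the app's actual code:

```ts
// Minimal GRU step (illustrative sketch; reuses Vec, Mat, matVec, tanhVec, sigmoidVec, hadamard).
function gruStep(xt: Vec, hPrev: Vec, Wz: Mat, Wr: Mat, W: Mat): Vec {
  const xh = [...xt, ...hPrev];               // [x_t, h_{t-1}]
  const z = sigmoidVec(matVec(Wz, xh));       // update gate
  const r = sigmoidVec(matVec(Wr, xh));       // reset gate
  const xrh = [...xt, ...hadamard(r, hPrev)]; // [x_t, r_t ⊙ h_{t-1}]
  const hTilde = tanhVec(matVec(W, xrh));     // candidate state
  // h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
  return hPrev.map((hp, i) => (1 - z[i]) * hp + z[i] * hTilde[i]);
}
```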

Strengths

  • Fewer parameters than LSTM
  • Strong practical performance
  • Good speed-memory tradeoff

Weaknesses

  • Less explicit memory control than LSTM
  • Can underperform on very long dependencies

Use cases

  • Chat features
  • Mobile NLP
  • Real-time sequence prediction