Type a prompt and watch a from-scratch GPT-style language model — trained entirely on Wikipedia — generate text in real time. Adjust temperature, top-k, and top-p to explore different sampling behaviors.
~50K English Wikipedia articles are downloaded, cleaned of markup and metadata, and prepared as a training corpus.
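The markup cleanup step might look something like the following minimal sketch. The regex rules here are illustrative assumptions, not the project's exact pipeline: they strip templates, wiki links, bold/italic quotes, HTML tags, and heading markers.

```python
import re

def clean_wikitext(text):
    """Strip common wiki markup from an article (illustrative rules only)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)                      # templates like {{cite ...}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)   # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                               # ''italic'' / '''bold''' quotes
    text = re.sub(r"<[^>]+>", "", text)                             # inline HTML tags
    text = re.sub(r"==+\s*([^=]+?)\s*==+", r"\1", text)             # == Heading == -> Heading
    return re.sub(r"[ \t]+", " ", text).strip()                     # collapse runs of spaces
```

For example, `clean_wikitext("'''Paris''' is the [[capital city|capital]] of [[France]].")` yields plain prose with links and emphasis markup removed.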
A Byte-Pair Encoding tokenizer is trained from scratch on the corpus, learning subword units that compactly represent the text with a fixed-size vocabulary.
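The core of BPE training is simple to sketch: repeatedly count adjacent symbol pairs across the corpus (weighted by word frequency) and merge the most frequent pair into a new symbol. A minimal, assumption-laden version over a toy word-frequency dict:

```python
from collections import Counter

def bpe_train(word_counts, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict (toy sketch)."""
    # Each word starts as a tuple of single characters.
    vocab = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

merges = bpe_train({"low": 5, "lower": 2, "lowest": 2}, num_merges=3)
# First merge is ('l', 'o'): it appears in every word, 9 times total.
```

A production tokenizer adds byte-level fallback, special tokens, and a fast merge lookup, but the learned artifact is exactly this ordered list of merge rules.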
A GPT-style transformer decoder learns to predict the next token by training on millions of token sequences with cosine-scheduled AdamW optimization.
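The cosine schedule mentioned above typically means linear warmup followed by cosine decay of the learning rate; the resulting value is fed to AdamW each step. A small sketch (parameter names are illustrative, not the project's actual config):

```python
import math

def cosine_lr(step, max_steps, peak_lr, warmup_steps, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from ~0 up to peak_lr over the warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Cosine curve from peak_lr at progress=0 to min_lr at progress=1.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch loop this would set `group["lr"] = cosine_lr(step, ...)` on each optimizer parameter group before `optimizer.step()`.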
Given a prompt, the model autoregressively generates tokens using configurable sampling — temperature, top-k, and nucleus (top-p) — for varied output.
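The three sampling controls compose naturally: temperature rescales the logits, top-k truncates to the k most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches p. A dependency-free sketch of one decoding step (the real model would supply the logits):

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Sample one token id from raw logits with temperature / top-k / top-p."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    probs = softmax([l / temperature for l in logits])
    # Token ids ordered by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        order = order[:top_k]
    # Top-p (nucleus): keep the smallest prefix reaching cumulative mass p.
    if top_p < 1.0:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # Renormalize over the surviving tokens and draw one.
    total = sum(probs[i] for i in order)
    r = rng.random() * total
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]
```

With `top_k=1` this reduces to greedy decoding; a very small `top_p` behaves the same way whenever one token dominates the distribution.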