GPT

Generative Pre-trained Transformer

GPT (Generative Pre-trained Transformer) is OpenAI's family of decoder-only transformer large language models (LLMs). The models are pre-trained on massive text corpora via next-token prediction and, in later generations, aligned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This recipe became the template that most modern LLMs follow.

GPT models are "pre-trained" on hundreds of billions of tokens of web text, books, and code to acquire broad world knowledge, reasoning ability, and language fluency. A fine-tuning phase then adapts them for instruction following and tool use, which in turn supports prompt-driven applications, RAG pipelines, and multimodal tasks.
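The next-token prediction objective mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not OpenAI's training code: the function names `softmax` and `next_token_loss` are hypothetical, and the logits are assumed to come from some model that scores every vocabulary entry at each position.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from the prefix up to t.

    logits_per_position[t] holds the (assumed) model scores over the
    vocabulary after reading token_ids[: t + 1].
    """
    targets = token_ids[1:]  # the target at position t is the token at t + 1
    losses = []
    # zip stops at the shorter sequence, so the last position (which has
    # no following token) contributes nothing to the loss.
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# A model that knows nothing (uniform scores over a 4-token vocabulary)
# pays log(4) per predicted token.
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
print(next_token_loss(uniform, [0, 1, 2]))
```

Pre-training amounts to lowering this average loss over a very large corpus; a model that assigns higher probability to the token that actually follows gets a lower loss.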

The GPT lineage runs from GPT-1 (117M parameters, 2018) through GPT-3 (175B, 2020, which demonstrated few-shot and zero-shot learning at scale) to GPT-4 (2023). It helped establish empirical scaling laws showing that LLM capability improves predictably with more parameters, more data, and more compute.
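To make "improves predictably" concrete, scaling laws are commonly written as a power law in model size. The sketch below assumes that functional form; the function name `power_law_loss` and the constants `n_c` and `alpha` are illustrative placeholders chosen for this example, not fitted values from any published study.

```python
def power_law_loss(num_params, n_c=1.0e14, alpha=0.08):
    """Hypothetical loss as a power law in parameter count.

    n_c and alpha are placeholder constants for illustration only.
    """
    return (n_c / num_params) ** alpha

# Parameter counts quoted in the text for GPT-1 and GPT-3.
for name, n in [("GPT-1", 117e6), ("GPT-3", 175e9)]:
    print(name, round(power_law_loss(n), 3))

# Under a power law, every 10x increase in parameters multiplies the
# loss by the same constant factor, 10 ** -alpha.
ratio = power_law_loss(1e10) / power_law_loss(1e9)
print(ratio)
```

The useful property is the constant ratio: on a log-log plot the curve is a straight line, which is what lets the loss of a larger model be extrapolated from smaller ones.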

[Figure: GPT, generative pre-trained transformer]

Frequently Asked Questions

What is the difference between GPT and BERT?

GPT uses the decoder part of the Transformer and is trained to predict the next token (left-to-right), making it well-suited for text generation. BERT uses the encoder part and is trained to predict masked tokens using bidirectional context (seeing both left and right), making it better suited for classification and understanding tasks. In short, GPT is typically used as a generator, while BERT is typically used as an encoder for discriminative tasks.
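The left-to-right versus bidirectional distinction comes down to which positions each token is allowed to attend to. The following is a minimal sketch of the two attention masks in plain Python; the function names `causal_mask` and `bidirectional_mask` are hypothetical, and real implementations build these as tensors rather than nested lists.

```python
def causal_mask(n):
    """GPT-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """BERT-style mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]

def show(mask):
    """Render a mask as rows of 1 (visible) and 0 (hidden)."""
    return "\n".join(" ".join("1" if v else "0" for v in row) for row in mask)

print(show(causal_mask(3)))
# 1 0 0
# 1 1 0
# 1 1 1
print(show(bidirectional_mask(3)))
# 1 1 1
# 1 1 1
# 1 1 1
```

The lower-triangular causal mask is what makes next-token training and generation consistent: a position never sees the token it is asked to predict.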

What does the "pre-trained" in GPT mean?

"Pre-trained" means the model was first trained on a massive general-purpose text corpus (hundreds of billions of tokens from the web, books, and code) before any task-specific training. This pre-training phase teaches the model broad world knowledge, grammar, and reasoning patterns. A separate fine-tuning phase then adapts the pre-trained model for instruction following, helpfulness, and safety using techniques like SFT and RLHF.

How has GPT evolved from version 1 to GPT-4?

GPT-1 (2018) had 117 million parameters and demonstrated basic transfer learning for NLP tasks. GPT-2 (2019) scaled to 1.5 billion parameters and could generate coherent multi-paragraph text, raising concerns about misuse. GPT-3 (2020) reached 175 billion parameters and exhibited strong few-shot learning. GPT-4 (2023) is multimodal (accepting images and text), shows improved reasoning, and has significantly lower hallucination rates than GPT-3.5.

See Also