LLM

LLM

Large Language Model

A Large Language Model (LLM) is a deep learning model built on the transformer architecture and trained on hundreds of billions of tokens of text and code — giving it the ability to understand, reason about, and generate human language, enabling zero-shot and few-shot generalisation across tasks it was never explicitly trained for.

LLMs are built through two main phases: large-scale pre-training using a next-token prediction objective, followed by supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align the model with human preferences and make it accurate, helpful, and safe.

LLMs power a wide range of applications — including conversational assistants, code generation, RAG-based knowledge retrieval, prompt engineering workflows, and multimodal AI systems — and serve as the foundation model underlying most modern AI products.

🔍 Click image to zoom

Large Language Models — how they work

Frequently Asked Questions

What makes a language model "large"?

A language model is considered "large" when it has enough parameters — typically above one billion — to exhibit emergent capabilities such as in-context learning, chain-of-thought reasoning, and zero-shot task generalisation. The threshold is not fixed; as the field progresses, "large" is a relative term compared to the models of the time.

What is the difference between an LLM and a chatbot?

An LLM is the underlying AI model — a mathematical system trained on text data. A chatbot is a product layer built on top of an LLM, which adds a user interface, memory management, and system prompts to shape how the model behaves. ChatGPT and Claude are chatbots; GPT-4 and Claude 3 Sonnet are the LLMs powering them.

How are LLMs evaluated?

LLMs are evaluated on standardised benchmarks such as MMLU (multiple-choice knowledge), HumanEval (code generation), GSM8K (maths reasoning), and MT-Bench (instruction following). Human evaluation — where annotators compare model outputs — is also widely used, as benchmarks can be gamed through overfitting to test sets. No single benchmark fully captures real-world LLM usefulness.

Frequently Asked Questions

What makes a language model "large"?

What is the difference between an LLM and a chatbot?

How are LLMs evaluated?

See Also