Embedding

Vector Embedding

A vector embedding is a dense numerical representation of an object (a word, sentence, document, image, or user) as a point in a high-dimensional vector space. Semantically similar objects map to nearby points, so machine learning models can compare meaning through geometric distance or angle rather than exact keyword matching.
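As a rough illustration of measuring meaning through geometry, the sketch below computes cosine similarity between small vectors. The three-dimensional values are hand-picked assumptions for the example, not the output of any real model; real embeddings usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
invoice = [0.1, 0.2, 0.95]

sim_related = cosine_similarity(cat, kitten)     # vectors point the same way
sim_unrelated = cosine_similarity(cat, invoice)  # vectors point apart
```

With these made-up values, the related pair scores much closer to 1 than the unrelated pair, which is the property a trained embedding model aims for.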

Modern text embeddings are typically produced by transformer-based encoder models, such as the BERT family or dedicated embedding models. Each input is tokenized, passed through multiple self-attention layers that yield one contextual vector per token, and then pooled into a single fixed-length vector. The same encoder architecture underlies contextual NLP tasks, and its pooled output is what the retrieval stage of a RAG pipeline searches over.
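The pooling step can be sketched in a few lines. The example assumes mean pooling with an attention mask, which is one common choice; the text above does not say which pooling strategy a given model uses, and the token vectors here are invented stand-ins for real encoder output.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padding positions."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                totals[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in totals]

# Hypothetical per-token outputs: two real tokens followed by one padding slot.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
attention_mask = [1, 1, 0]

sentence_vector = mean_pool(token_vectors, attention_mask)  # [2.0, 3.0]
```

Whatever the input length, the result has the same number of dimensions as a single token vector, which is what makes the output fixed-length.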

Embeddings are the backbone of retrieval-augmented generation (RAG). At index time, document chunks are embedded and stored in a vector database. At query time, the user's question is embedded with the same model, and approximate nearest-neighbour search retrieves the most relevant chunks, which ground the large language model's response in citable source material.

Semantic vs episodic memory in AI agents

Frequently Asked Questions

How are embeddings used in RAG?

In a RAG pipeline, embeddings are used at two stages. At index time, each document chunk is converted to an embedding vector and stored in a vector database. At query time, each incoming query is converted to an embedding vector by the same model and compared against the stored embeddings using cosine similarity or dot product. The chunks whose embeddings are closest to the query embedding are retrieved as context for the LLM. The quality of the embedding model therefore strongly affects the quality of retrieval: a relevant chunk that is never retrieved cannot inform the answer.
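The two stages might look like the following minimal sketch. It uses exact brute-force search over a Python dict in place of a vector database with approximate nearest-neighbour search, and the chunk IDs and vectors are hypothetical values chosen by hand rather than produced by an embedding model.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, index, k=2):
    """Return the IDs of the k stored chunks most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for chunk_id, vec in index.items():
        score = sum(a * b for a, b in zip(q, vec))
        scored.append((score, chunk_id))
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]

# Index time: store one normalized vector per chunk (hypothetical values).
raw_chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-keys": [0.0, 0.1, 0.9],
}
index = {chunk_id: normalize(vec) for chunk_id, vec in raw_chunks.items()}

# Query time: embed the question (hand-written here) and retrieve context.
query_vec = [0.8, 0.3, 0.0]
retrieved = top_k(query_vec, index, k=2)
```

Normalizing at index time is a design choice in this sketch: it lets the query loop use a plain dot product, which gives the same ranking as cosine similarity.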

What is the difference between an embedding and a token?

A token is a discrete unit of text (typically a word, subword, or character) that a model processes as input. A token embedding is the continuous numerical vector that represents that token inside the model. Every token is first mapped to an integer ID and looked up in an embedding table to get its vector; the model then operates on these vectors. Tokens are the input format; embeddings are the internal mathematical representation the model actually computes over. The embeddings used for retrieval are one step further along: a single vector for a whole text, pooled from the model's per-token outputs.
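The lookup from token to vector can be sketched as a list indexed by token ID. The vocabulary, the subword split, and the vector values below are all invented for illustration; real models learn the table during training and have vocabularies of tens of thousands of entries.

```python
# Hypothetical vocabulary mapping each token string to an integer ID.
vocab = {"[UNK]": 0, "embed": 1, "##ding": 2, "##s": 3}

# Hypothetical embedding table: row i holds the vector for token ID i.
embedding_table = [
    [0.0, 0.0, 0.0],    # [UNK]
    [0.2, -0.1, 0.7],   # embed
    [0.5, 0.3, -0.2],   # ##ding
    [-0.4, 0.1, 0.1],   # ##s
]

def lookup(tokens):
    """Map tokens to IDs, then IDs to their rows in the embedding table."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    return ids, [embedding_table[i] for i in ids]

token_ids, vectors = lookup(["embed", "##ding", "##s"])
```

The tokens are discrete symbols and the IDs are just indices; only the rows of the table carry numerical content the model can compute with.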

How do you choose an embedding model?

Choose an embedding model based on three factors: domain fit (a model trained on scientific text will usually produce better embeddings for scientific queries than a general-purpose model), dimensionality (higher dimensions can capture more nuance but require more storage and compute), and benchmark performance on tasks matching your use case (the MTEB leaderboard is a widely used reference for text embedding benchmarks as of 2025). For multilingual use cases, verify that the model was trained on your target languages.
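The storage side of the dimensionality trade-off is simple arithmetic. The sketch below assumes vectors stored as 32-bit floats (4 bytes per value) and counts raw vector data only; index structures, metadata, and any compression a vector database applies are not included, and the corpus size is a hypothetical figure.

```python
def raw_index_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value

# Hypothetical corpus of one million chunks at two common dimensionalities.
small = raw_index_bytes(1_000_000, 768)    # 3_072_000_000 bytes, about 3.1 GB
large = raw_index_bytes(1_000_000, 1536)   # 6_144_000_000 bytes, about 6.1 GB
```

Doubling the dimensionality doubles raw storage, and the cost of each brute-force similarity comparison grows in the same proportion.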

See Also