AI Glossary

Every AI term explained clearly, with real-world use cases from developers who build production AI apps every day. No academic jargon.

75 terms defined

Fundamentals

Large Language Model (LLM)

A neural network trained on massive text datasets that can generate, understand, and reason about human language.

Transformer

The neural network architecture behind virtually all modern LLMs, using self-attention mechanisms to process sequences in parallel.

Tokenization

The process of breaking text into smaller units (tokens) that an AI model can process, typically subwords or word pieces.
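To make the idea concrete, here is a toy greedy subword tokenizer. Real tokenizers (such as BPE) learn their vocabularies from data; the hard-coded vocabulary below is hypothetical and exists only to show how text splits into known pieces.

```python
# Hypothetical subword vocabulary (real tokenizers learn this from data).
VOCAB = {"un", "believ", "able", "token", "ization", "s"}

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                               # no match: fall back to one char
            tokens.append(word[i])
            i += 1
    return tokens
```

For example, `tokenize("unbelievable")` splits into `["un", "believ", "able"]`, which is why token counts rarely match word counts.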

Context Window

The maximum amount of text (measured in tokens) that an AI model can process in a single request, including both input and output.

Hallucination

When an AI model generates information that sounds plausible but is factually incorrect, fabricated, or not grounded in its training data.

Multimodal AI

AI models that can process and generate multiple types of data: text, images, audio, video, and code.

Temperature

A parameter that controls the randomness of AI model outputs, with lower values producing more deterministic responses and higher values producing more creative ones.
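The effect is easy to see in code. This sketch applies temperature to a softmax over raw model scores (logits); the logit values are made up for illustration.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Scale logits by 1/temperature before softmax.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At very low temperature the top logit gets nearly all the probability mass (near-deterministic sampling); at high temperature the distribution approaches uniform, so sampling becomes more varied.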

Attention Mechanism

A neural network component that allows models to dynamically focus on the most relevant parts of the input when generating each token of output.

Long Context

The ability of AI models to process and reason over very large inputs — hundreds of thousands or millions of tokens — in a single request.

Model Collapse

A degradation phenomenon where AI models trained on AI-generated data progressively lose quality, diversity, and accuracy over successive generations.

Neural Network

A computational system inspired by the brain, composed of layers of interconnected nodes (neurons) that learn patterns from data through training.

Responsible AI

A framework for developing and deploying AI systems that are fair, transparent, safe, privacy-preserving, and accountable.

Techniques

RAG (Retrieval-Augmented Generation)

A technique that enhances AI responses by retrieving relevant information from a knowledge base before generating an answer.
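A minimal sketch of the retrieve-then-generate flow: pick the most relevant document, then stuff it into the prompt. Simple word overlap stands in for real embedding similarity here, and the three-entry knowledge base is hypothetical.

```python
import re

# Hypothetical knowledge base (in production: chunked docs in a vector store).
KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 3-5 business days within the US.",
    "Support is available 24/7 via chat and email.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    return sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt built by `build_prompt("What is the refund policy?")` carries the refund document as context, so the model answers from your data rather than from memory.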

Embeddings

Numerical vector representations of text that capture semantic meaning, enabling similarity search and clustering.
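The standard way to compare two embeddings is cosine similarity. The two-dimensional vectors below are toy examples; real embedding models produce hundreds or thousands of dimensions, but the math is identical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction
    (similar meaning), 0.0 means unrelated, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```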

Fine-Tuning

The process of further training a pre-trained AI model on your specific data to improve performance on domain-specific tasks.

Prompt Engineering

The practice of crafting effective instructions for AI models to produce desired outputs consistently.

Function Calling (Tool Use)

An AI capability where the model can decide to invoke external functions or APIs based on the conversation context.
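On the application side, your code declares tools to the provider's API and then executes whatever call the model returns. This sketch shows only the dispatch half; the tool name, schema shape, and weather stub are all hypothetical.

```python
import json

# Hypothetical tool declaration (sent to the model alongside the conversation).
TOOLS = {
    "get_weather": {
        "description": "Get current weather for a city",
        "parameters": {"city": "string"},
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"           # stub; a real tool would call a weather API

def dispatch(tool_call: str) -> str:
    """Execute the function call the model asked for (as a JSON string)."""
    call = json.loads(tool_call)
    if call["name"] == "get_weather":
        return get_weather(**call["arguments"])
    raise ValueError(f"Unknown tool: {call['name']}")
```

The result of `dispatch(...)` is sent back to the model, which uses it to compose its final answer.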

AI Agent

An AI system that can autonomously plan, reason, use tools, and take actions to accomplish goals with minimal human intervention.

Chain of Thought (CoT)

A prompting technique that improves AI reasoning by instructing the model to break down complex problems into intermediate steps before giving a final answer.

Few-Shot Learning

A prompting technique where you provide a small number of input-output examples in the prompt to teach the model the desired behavior.
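In practice this is just string assembly: a handful of labeled examples go before the real input so the model imitates the format. The sentiment examples here are made up for illustration.

```python
# Hypothetical labeled examples demonstrating the desired behavior.
EXAMPLES = [
    ("I love this product!", "positive"),
    ("Terrible, broke after one day.", "negative"),
]

def few_shot_prompt(text: str) -> str:
    """Prefix the real input with example input-output pairs."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"
```

The prompt ends right where the model's answer should begin, so the completion is just the label.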

Zero-Shot Learning

The ability of an AI model to perform a task based solely on instructions, without any training examples provided in the prompt.

Transfer Learning

A machine learning technique where a model trained on one task is adapted to perform a different but related task, reducing the data and compute needed.

Reinforcement Learning from Human Feedback (RLHF)

A training technique that aligns AI model behavior with human preferences by using human feedback to reward desired outputs and penalize undesired ones.

Semantic Search

A search approach that finds results based on meaning rather than exact keyword matches, using embeddings to understand the intent behind queries.

Structured Output / JSON Mode

A feature that forces AI models to return responses in a specific format like JSON, ensuring parseable and type-safe outputs for programmatic use.
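Even with JSON mode enabled, it pays to validate the reply before using it. A minimal consumer, where the reply string is a hypothetical example of a JSON-mode response:

```python
import json

def parse_invoice(reply: str) -> dict:
    """Parse a model's JSON reply and check the fields we depend on."""
    data = json.loads(reply)               # raises ValueError if not valid JSON
    assert isinstance(data["total"], (int, float)), "total must be a number"
    assert isinstance(data["currency"], str), "currency must be a string"
    return data

reply = '{"total": 42.5, "currency": "USD"}'   # hypothetical model output
invoice = parse_invoice(reply)
```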

Streaming

A method of receiving AI model output token-by-token in real time as it is generated, rather than waiting for the complete response.
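The consumption pattern looks like iterating a generator. Here a plain Python generator stands in for a provider's streaming API, yielding one token at a time so the caller can render partial output immediately.

```python
def fake_stream(text: str):
    """Stand-in for a streaming API: yields the response token by token."""
    for token in text.split():
        yield token + " "

received = []
for chunk in fake_stream("Hello from a streaming response"):
    received.append(chunk)                 # in a UI, render each chunk as it arrives

full = "".join(received).strip()           # the complete response, reassembled
```

The user sees the first words after one chunk instead of waiting for the whole response, which is why chat UIs feel fast even when total generation time is unchanged.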

Batch Processing

Processing multiple AI requests together as a group, typically at lower cost and higher throughput than real-time individual requests.

Agentic Workflow

A multi-step AI process where an LLM autonomously plans, executes, and iterates on tasks using tools and feedback loops.

AI Hallucination Detection

Techniques and systems for identifying when an AI model generates false, fabricated, or unsupported information that appears plausible.

Data Labeling

The process of annotating raw data (text, images, audio) with labels or tags so it can be used to train and evaluate machine learning models.

Grounding (AI)

Connecting AI model outputs to verifiable sources of truth — such as retrieved documents, databases, or real-time data — to reduce hallucination and increase factual accuracy.

Multimodal RAG

An extension of RAG that retrieves and reasons over multiple data types — text, images, tables, charts, and audio — not just text documents.

RAG Pipeline (Detailed)

The complete end-to-end system for Retrieval-Augmented Generation, including document ingestion, chunking, embedding, indexing, retrieval, reranking, and generation.

Synthetic Data

Artificially generated data that mimics real-world data, used for training AI models when real data is scarce, expensive, private, or biased.

Tool Use (AI)

The capability of AI models to interact with external tools, APIs, and systems by generating structured function calls based on natural language instructions.

Infrastructure

MCP (Model Context Protocol)

An open standard by Anthropic that provides a universal way for AI models to connect to external data sources and tools.

Vector Database

A specialized database optimized for storing and searching high-dimensional vector embeddings, enabling semantic similarity search.

Inference

The process of running a trained AI model to generate predictions or outputs from new inputs, as opposed to training the model.

API Gateway

A server that acts as a single entry point for AI API requests, handling routing, rate limiting, authentication, and load balancing across multiple AI providers.

Model Serving

The infrastructure and process of hosting a trained AI model and exposing it as an API endpoint for real-time or batch inference.

Edge AI / On-Device AI

Running AI models directly on user devices (phones, laptops, IoT) rather than sending data to cloud servers for processing.

GPU / TPU

Specialized processors designed for the parallel mathematical operations that AI models require for training and inference.

Quantization

A technique that reduces AI model size and memory requirements by using lower-precision numbers to represent model weights, trading a small accuracy loss for major efficiency gains.
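A toy symmetric int8 scheme shows the core idea: map float weights to 8-bit integers with one shared scale factor, then reconstruct approximate floats at inference time.

```python
def quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 range [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate floats from the int8 values."""
    return [x * scale for x in q]

weights = [0.31, -1.27, 0.05, 0.88]        # illustrative weight values
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

Each weight now fits in 1 byte instead of 4 (fp32), at the cost of a reconstruction error bounded by the scale factor. Production schemes (per-channel scales, 4-bit formats) refine this trade-off.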

Distillation

A technique where a smaller "student" model is trained to replicate the behavior of a larger "teacher" model, achieving comparable quality at lower cost.

Latency

The time delay between sending a request to an AI model and receiving the response, critical for real-time user-facing applications.

Token Limits / Rate Limiting

Restrictions imposed by AI API providers on the number of tokens processed or requests made within a given time period.

AI Guardrails

Safety mechanisms that constrain AI system behavior, preventing harmful outputs, prompt injection, data leaks, and off-topic responses.

AI Orchestration

The coordination of multiple AI models, tools, and data sources in a unified pipeline to accomplish complex tasks that no single model can handle alone.

AI Pipeline

A sequence of data processing and AI inference steps that transforms raw input into a useful output, typically involving preprocessing, model inference, and post-processing.

Inference Optimization

Techniques to make AI model predictions faster, cheaper, and more efficient in production, including quantization, batching, caching, and model distillation.

Knowledge Graph

A structured representation of information as a network of entities and their relationships, used to give AI systems organized, queryable world knowledge.

Token Economy / AI Pricing

The cost structure of AI APIs based on token consumption, where pricing is determined by the number of input and output tokens processed per request.
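Estimating spend is simple arithmetic once you know the per-token rates. The prices below are hypothetical placeholders; always check your provider's current price sheet.

```python
# Hypothetical rates in USD per million tokens (not any provider's real prices).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request given input and output token counts."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
```

Note that output tokens are typically several times more expensive than input tokens, so verbose responses dominate the bill.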

Models

GPT

OpenAI's family of generative pre-trained transformer models, among the most widely adopted LLMs for commercial AI applications.

Claude

Anthropic's family of AI models known for long context windows, strong reasoning, and instruction-following capabilities.

Gemini

Google's multimodal AI model family optimized for text, image, audio, and video understanding and generation.

Llama

Meta's openly licensed (open-weight) large language model family that can be downloaded, modified, and self-hosted without API fees.

Open-Source AI

AI models whose weights and architecture are publicly available, allowing anyone to inspect, modify, run, and build upon them.

Diffusion Model

A generative AI model that creates images, video, or audio by gradually removing noise from random static, guided by a text or image prompt.

Embedding Model

A specialized AI model that converts text, images, or other data into numerical vectors (embeddings) that capture semantic meaning for search and comparison.

Foundation Model

A large, general-purpose AI model trained on broad data that serves as a base for many downstream tasks through fine-tuning, prompting, or adaptation.

Mixture of Experts (MoE)

A model architecture where multiple specialized sub-networks ("experts") are combined, with a gating mechanism that routes each input to the most relevant experts.
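A toy top-1 router illustrates the routing idea. Real MoE gates are small learned networks operating on hidden states inside the model; the keyword scores and two "expert" functions below are purely illustrative.

```python
# Hypothetical experts: in a real MoE these are feed-forward sub-networks.
EXPERTS = {
    "math": lambda x: f"math expert handles: {x}",
    "code": lambda x: f"code expert handles: {x}",
}

def gate(text: str) -> str:
    """Score each expert and pick the best one (top-1 routing)."""
    scores = {
        "math": sum(w in text for w in ("sum", "integral", "equation")),
        "code": sum(w in text for w in ("bug", "function", "compile")),
    }
    return max(scores, key=scores.get)

def route(text: str) -> str:
    return EXPERTS[gate(text)](text)
```

Because only the selected expert runs, an MoE model can have far more total parameters than it activates per token, which is the architecture's efficiency win.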

Transformer Architecture (Detailed)

The complete technical architecture of the Transformer, including multi-head self-attention, positional encoding, feed-forward layers, and the encoder-decoder structure.

Vision Language Model (VLM)

An AI model that can process and reason about both images and text simultaneously, enabling visual question answering, image description, and multimodal analysis.

Applications

Computer Vision

The field of AI that enables machines to interpret and understand visual information from images and video.

Natural Language Processing (NLP)

The branch of AI focused on enabling computers to understand, interpret, and generate human language in useful ways.

Text-to-Speech (TTS)

AI technology that converts written text into natural-sounding spoken audio, enabling voice interfaces and audio content generation.

Speech-to-Text (STT)

AI technology that converts spoken audio into written text, enabling voice input, transcription, and voice-controlled interfaces.

Image Generation

AI models that create new images from text descriptions (prompts), enabling automated visual content creation.

Video Generation

AI models that create video content from text prompts or images, an emerging capability for automated video production.

Sentiment Analysis

An NLP technique that determines the emotional tone of text, classifying it as positive, negative, neutral, or more granular emotions.

Named Entity Recognition (NER)

An NLP technique that identifies and classifies named entities in text, such as people, organizations, locations, dates, and monetary values.

Autonomous Agents

AI systems that can independently plan, execute multi-step tasks, use tools, and adapt their approach based on results, with minimal human oversight.

Chatbot

An AI application that conducts conversations with users through text or voice, handling questions, tasks, and interactions in natural language.

Copilot

An AI assistant integrated into a user's workflow that provides real-time suggestions, completions, and assistance alongside the user's work.

Conversational AI

AI systems designed for natural, multi-turn dialogue with humans, maintaining context across exchanges and handling follow-up questions naturally.

Want to put these concepts into practice?

AI 4U ships production AI apps using these technologies every week. Let's build yours.

Start a Project