AI Glossary

Every AI term explained in plain language. No PhD required.

💡 Core Concepts

AI Agent / Autonomous Agent

Beginner

An AI system that can take actions on its own — browsing the web, writing files, running code, sending messages — not just answering questions. It has goals and can use tools to achieve them.

Example

OpenClaw, AutoGPT, and CrewAI are agent frameworks that let AI act independently.

Alignment

Beginner

The challenge of ensuring AI systems behave according to human values and intended purposes. This involves developing techniques to make AI understand and act upon what humans actually want, not just what they literally ask for.

Context Window / Context Length

Beginner

The maximum amount of text a model can 'see' at once — both your input and its output combined. Larger context = can process longer documents.

Example

GPT-4o has 128k tokens (~300 pages). Gemini 2.5 Pro has 1M tokens (~2,500 pages).

Related: Token, RAG
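
The "1 token ≈ 3/4 of a word" rule of thumb lets you estimate whether a document fits in a context window before sending it. A rough sketch (the heuristic and the helper names are illustrative, not any real API; exact counts come from the model's own tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English: about 4/3 tokens per word."""
    return round(len(text.split()) * 4 / 3)

def fits_in_context(text: str, context_window: int = 128_000) -> bool:
    """Check whether a text likely fits in a model's context window."""
    return estimate_tokens(text) <= context_window

essay = "word " * 90_000                 # ~90k words, roughly 120k tokens
print(fits_in_context(essay))            # fits in a 128k window
print(fits_in_context(essay, 32_000))    # too long for a 32k window
```

Real tokenizers count differently per model, so treat this only as a ballpark check.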

Hallucination / AI Hallucination

Beginner

When an AI confidently generates false or made-up information as if it were true. A major challenge — the AI doesn't 'know' it's wrong.

Example

An AI might cite a research paper that doesn't exist, or give you wrong code that looks correct.

Related: Grounding, RAG

LLM (Large Language Model)

Beginner

An AI model trained on massive amounts of text data that can understand and generate human-like language. Think of it as a very sophisticated autocomplete that can write essays, code, and answer questions.

Example

GPT-4, Claude, and Llama 3 are all LLMs.

Prompt

Beginner

The text you type to tell an AI what you want. The better your prompt, the better the response. Prompt engineering is the art of writing good prompts.

Example

'Write a Python function that sorts a list' is a simple prompt.

Retrieval-Augmented Generation

Beginner

RAG combines AI language models with a knowledge database, allowing the AI to search and retrieve relevant information before generating responses. This helps the AI provide accurate, up-to-date answers grounded in real data rather than just relying on training knowledge.

Token

Beginner

The basic unit of text that AI models process. Roughly 1 token ≈ 3/4 of a word in English. Models have token limits (context windows), and API pricing is per token.

Example

The sentence 'Hello world' is 2 tokens. GPT-4o costs $2.50 per million input tokens.

Tokenization

Beginner

The process of converting text into smaller pieces (tokens) that AI models can understand. These tokens might be whole words, parts of words, or even characters. The AI processes language token by token, similar to how we read word by word.
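A toy illustration of the idea: real tokenizers (BPE, SentencePiece) learn their vocabulary from data, but a hand-picked vocabulary shows how one word can split into several subword tokens. The vocabulary below is invented for this sketch:

```python
# Hand-picked toy vocabulary; real tokenizers learn tens of thousands of pieces.
VOCAB = {"un", "believ", "able", "token", "ization", "hello", "world", " "}

def tokenize(text: str) -> list[str]:
    """Greedily split text into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    text = text.lower()
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:                               # unknown character: keep it as-is
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
print(tokenize("tokenization"))   # ['token', 'ization']
```

This is why unusual words cost more tokens than common ones: they get split into more pieces.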

⚙️ Technical

Chain-of-thought Reasoning (CoT)

Intermediate

A prompting technique where you ask the AI to 'think step by step' before giving the final answer. Significantly improves accuracy on math, logic, and complex problems.

Example

Adding 'Let's think step by step' to a math problem can improve accuracy from 60% to 90%+.

Constitutional AI

Intermediate

A training approach where AI models learn to behave according to a set of predefined principles or 'constitution'. The AI critiques its own responses against these rules, gradually improving its behavior without requiring extensive human oversight for each interaction.

DPO

Intermediate

Direct Preference Optimization is a more efficient alternative to RLHF that directly optimizes language models based on human preferences. It's less complex to implement while achieving similar alignment improvements in AI behavior.

Distillation

Intermediate

A knowledge transfer technique where a large, capable 'teacher' model trains a smaller 'student' model to replicate its behavior. This creates smaller, faster models that retain most of the original's capabilities while being more practical for deployment.

Embedding / Vector Embedding

Intermediate

Converting text (or images) into a list of numbers (a vector) that captures its meaning. Similar texts get similar numbers. This enables semantic search — finding content by meaning, not just keywords.

Example

'Happy dog' and 'joyful puppy' would have very similar embeddings even though the words are different.

Federated Learning

Intermediate

A distributed training method where AI models learn from data spread across multiple devices or locations without centralizing that data. This allows AI to improve from diverse real-world usage while maintaining user privacy by keeping data on original devices.

Few-shot Learning / Few-shot Prompting

Intermediate

Giving the AI a few examples of what you want before asking it to do the task. Dramatically improves accuracy for specific formats or styles.

Example

Showing the AI 3 examples of product descriptions before asking it to write one for your product.
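Few-shot prompts are usually just examples concatenated in a fixed format before the real task. A sketch of the assembly step (the "Input:/Output:" labels are one common convention, not a requirement):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: labelled examples first, then the real task."""
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    parts.append(f"Input: {query}\nOutput:")   # the model completes this line
    return "\n\n".join(parts)

examples = [
    ("wireless mouse", "Ergonomic wireless mouse with silent clicks."),
    ("desk lamp", "Adjustable LED desk lamp with three brightness levels."),
]
print(few_shot_prompt(examples, "mechanical keyboard"))
```

The trailing "Output:" is the trick: the model continues the pattern it has just seen.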

Fine-tuning / Fine-tune

Intermediate

Taking a pre-trained model and training it further on specific data to make it better at a particular task. Like teaching a college graduate to specialize.

Example

Fine-tuning Llama 3 on medical texts to create a medical AI assistant.

Instruction Tuning

Intermediate

Training method where models learn to follow human instructions by practicing on example tasks with clear directions. This technique teaches AI to understand and execute specific tasks rather than just generating text, improving reliability on practical applications.

Model Merging

Intermediate

Combining multiple trained models or parameters to create new models with enhanced capabilities. This can involve blending different fine-tuned models or merging training techniques to achieve better performance than individual models alone.

Prompt Engineering

Beginner

The skill of crafting effective prompts to get the best results from AI models. Includes techniques like few-shot examples, chain-of-thought, and role-playing.

Example

Instead of 'Write code', a better prompt is: 'You are a senior Python developer. Write a clean, well-documented function that...'

QLoRA

Intermediate

Quantized Low-Rank Adaptation combines LoRA with model quantization, allowing fine-tuning of massive AI models on consumer hardware. This technique can reduce memory requirements by 3-4x while maintaining model quality.

RAG (Retrieval-Augmented Generation)

Intermediate

A technique where the AI first searches a knowledge base for relevant information, then uses it to generate accurate answers. Reduces hallucinations and keeps answers grounded in real data.

Example

A customer support AI that searches your company's docs before answering questions.
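A minimal sketch of the retrieve-then-generate loop. Word overlap stands in for real embedding search, and the documents and query are invented for illustration:

```python
import string

def words(text: str) -> set[str]:
    """Lowercase words with punctuation stripped."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy stand-in for embeddings)."""
    return sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)[:k]

docs = [
    "Refunds are available within 30 days of purchase.",
    "Our office is open Monday to Friday.",
    "Shipping takes 3 to 5 business days.",
]
query = "How many days do I have to request a refund of my purchase?"
context = retrieve(query, docs)[0]

# The retrieved passage is pasted into the prompt sent to the LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Production RAG replaces the overlap score with embedding similarity against a vector database, but the structure (search first, then generate from the retrieved context) is the same.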

RLHF

Intermediate

Reinforcement Learning from Human Feedback is a training method where AI models learn by receiving feedback from humans. Instead of just learning from data, the AI learns what humans consider good or bad responses, leading to more helpful and aligned behavior.

System Prompt / System Message

Beginner

A hidden instruction given to the AI before the conversation starts. Sets the AI's personality, rules, and behavior. The user usually can't see it.

Example

A system prompt might say: 'You are a helpful Norwegian customer service agent. Always respond in Norwegian.'

Transformer Architecture

Intermediate

The neural network architecture behind virtually all modern AI models. Introduced by Google researchers in the 2017 paper 'Attention Is All You Need'. Uses 'attention' to understand relationships between words regardless of distance in text.

Example

GPT = Generative Pre-trained Transformer. BERT, LLaMA, Claude all use transformer architecture.

Vector Database / Vector Store

Intermediate

A database designed to store and search embeddings efficiently. Instead of matching keywords, it finds the most semantically similar content.

Example

Pinecone, Weaviate, ChromaDB, Qdrant are popular vector databases.

🖥️ Hardware & Local AI

Batch Processing

Intermediate

Processing multiple inputs together efficiently by grouping them into batches. This approach optimizes hardware usage and can be more cost-effective for high-volume applications compared to processing individual requests one at a time.
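The grouping step itself is simple; the payoff is that one forward pass (or one API request) then handles the whole batch. A sketch with invented prompts:

```python
def batched(items: list, batch_size: int) -> list[list]:
    """Group items into fixed-size batches; the last batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

prompts = [f"Summarize document {n}" for n in range(10)]

for batch in batched(prompts, batch_size=4):
    # One model invocation per batch instead of one per prompt.
    print(len(batch), "prompts in this batch")
```

Batch size trades latency for throughput: larger batches use the hardware more fully but make each individual request wait longer.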

CUDA

Intermediate

NVIDIA's programming platform and API that enables AI models to run on graphics cards. It's the foundation that allows AI software to access GPU acceleration for the intensive mathematical calculations required by machine learning models.

Edge AI

Intermediate

AI processing performed locally on devices rather than in centralized cloud servers. This approach improves privacy, reduces latency, and enables AI functionality without internet connectivity, though it's limited by device computational capacity.

GGUF (GPT-Generated Unified Format)

Intermediate

A file format for quantized AI models, designed for efficient local inference. The standard format used by llama.cpp and Ollama. Replaced the older GGML format.

Example

Download a .gguf file from Hugging Face, load it in Ollama, and you're running AI locally.

GPU (Graphics Processing Unit)

Beginner

The chip that runs AI models. Originally designed for gaming graphics, but the parallel processing power is perfect for AI math. NVIDIA dominates the AI GPU market.

Example

Popular AI GPUs: RTX 3090 (24GB, ~$800), RTX 4090 (24GB, ~$1600), A100 (80GB, ~$10,000).

Local Inference / Running Models Locally

Beginner

Running AI models on your own computer instead of using a cloud API. Free, private, no internet needed — but requires good hardware (especially VRAM).

Example

Using Ollama to run Llama 3 on your gaming PC with an RTX 3090.

On-device AI

Intermediate

AI capabilities that operate completely on user devices like smartphones, laptops, or smart cameras without requiring external servers. This ensures privacy for sensitive data and enables AI features even in environments with poor or no internet connectivity.

Quantization / Model Quantization

Intermediate

Compressing an AI model by using less precise numbers. Like converting a RAW photo to JPEG — smaller file, slightly lower quality, but usually barely noticeable. Q4 = 4-bit, Q8 = 8-bit, FP16 = 16-bit (full).

Example

A 70B model needs 140GB in FP16 but only 40GB in Q4 — fits on 2x RTX 3090 instead of needing an A100.
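The core idea can be shown with symmetric 8-bit quantization: store each weight as a small integer plus one shared scale factor. A simplified sketch (real schemes quantize per block and keep outliers in higher precision):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)          # small integers (1 byte each) instead of 32-bit floats
print(restored)   # close to, but not exactly, the original weights
```

Each weight now takes 1 byte instead of 4, a 4x saving, at the cost of a rounding error bounded by the scale factor. 4-bit formats push the same idea further.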

Tensor Cores

Intermediate

Specialized processing units on NVIDIA GPUs designed specifically for AI matrix operations. These provide dramatically faster performance than traditional GPU cores for the mathematical operations used in deep learning, essential for efficient AI inference and training.

VRAM (Video RAM / GPU Memory)

Beginner

Memory on your graphics card (GPU). Determines the largest AI model you can run locally. More VRAM = bigger and smarter models. The #1 bottleneck for local AI.

Example

RTX 3090 has 24GB VRAM — can run most 7B-14B models. For 70B models you need 2 GPUs (48GB+).
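The arithmetic behind "will it fit" is just parameters × bits per parameter. A back-of-the-envelope sketch for the weights alone (KV cache, activations, and quantization metadata add several GB on top):

```python
def model_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough weight-memory estimate in GB; runtime overhead is extra."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return round(bytes_total / 1e9, 1)

print(model_vram_gb(70, 16))  # 70B at FP16: ~140 GB of weights
print(model_vram_gb(70, 4))   # 70B at Q4:   ~35 GB of weights
print(model_vram_gb(7, 4))    # 7B at Q4:    ~3.5 GB, fits almost anywhere
```

This is why quantization is the default for local inference: the same 70B model drops from data-center territory to a dual-GPU desktop.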

🏗️ Architecture

Attention Mechanism

Advanced

The core innovation of transformers. Lets the model weigh how important each word is relative to every other word. 'The cat sat on the mat because it was tired' — attention helps the AI understand 'it' refers to 'the cat'.

Example

Self-attention is why models can understand context across long texts.

Flash Attention

Intermediate

An optimized attention mechanism that works more efficiently with limited GPU memory. It achieves faster processing and enables use of longer context windows by being more memory-efficient, removing traditional bottlenecks in processing long sequences.

KV Cache

Intermediate

A performance optimization that stores pre-computed values (keys and values) from the attention mechanism, avoiding redundant calculations for tokens that have been seen before. This dramatically speeds up inference when generating long responses.

LoRA (Low-Rank Adaptation)

Advanced

A lightweight fine-tuning technique that only trains a tiny fraction of the model's parameters. Makes fine-tuning accessible — you can customize a 70B model on a single GPU.

Example

Fine-tune Stable Diffusion with LoRA to generate images in your specific art style using just 20 example images.

MoE (Mixture of Experts)

Advanced

A model architecture where only a fraction of the network is active for each task. Like a company with specialists — the right expert handles each question. More efficient than activating the entire model.

Example

Llama 4 Scout has 109B total params but only 17B active. DeepSeek V3 has 671B total but only ~37B active per query.

Multi-agent System

Intermediate

Multiple AI agents working together, each with different roles or specializations. Like a team where one agent does research, another writes code, and a third reviews quality.

Example

BerserKI uses multiple agents: Gønnar (strategy), Saga (research), Codex (coding), Pixel (design).

Multi-modal

Intermediate

AI systems that can process and understand multiple types of media simultaneously: text, images, audio, and video. These models can describe images, answer questions about sounds, or generate content that spans different media formats.

Parameters / Model Parameters

Beginner

The 'knowledge' of an AI model, stored as numbers. More parameters generally = more capable but requires more hardware. Think of them as the synapses in a brain.

Example

GPT-4 reportedly has ~1.8 trillion parameters. Llama 3 70B has 70 billion. Phi-4 has 14 billion.

Speculative Decoding

Intermediate

An optimization technique that uses a smaller, faster model to predict the next tokens, then a larger model corrects any mistakes. This significantly speeds up text generation while maintaining quality, like having a quick assistant do first drafts for review by an expert.
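A heavily simplified sketch of the accept-until-disagreement loop, with both models stubbed out as lookup tables (real implementations verify all draft tokens in one parallel pass of the large model and use probabilistic acceptance, not exact matching):

```python
def draft_model(prefix: list[str], k: int = 4) -> list[str]:
    """Tiny, fast 'draft' model: guesses the next k tokens (stub)."""
    guesses = {"the": ["cat", "sat", "on", "a"]}
    return guesses.get(prefix[-1], ["..."] * k)

def target_model_next(prefix: list[str]) -> str:
    """Large, slow 'target' model: the trusted next token (stub)."""
    truth = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return truth.get(prefix[-1], "...")

def speculative_step(prefix: list[str]) -> list[str]:
    """Accept the draft's tokens one by one until the target model disagrees."""
    accepted = []
    for guess in draft_model(prefix):
        correct = target_model_next(prefix + accepted)
        if guess != correct:
            accepted.append(correct)   # fix the first mistake, then stop
            break
        accepted.append(guess)
    return accepted

print(speculative_step(["the"]))  # multiple tokens emitted from one step
```

The speed-up comes from emitting several tokens per step whenever the cheap draft model guesses well, while the expensive model still guarantees the final output.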

Top-P

Intermediate

A sampling method that considers only the most probable next tokens whose combined probability reaches a certain threshold. This provides more nuanced control than temperature alone, balancing diversity with relevance in generated text.
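Concretely, top-p keeps the smallest set of most-likely tokens whose probabilities sum to at least p, then renormalizes before sampling. A sketch with an invented next-token distribution:

```python
def top_p_filter(probs: dict[str, float], p: float = 0.9) -> dict[str, float]:
    """Keep the most probable tokens until cumulative probability reaches p,
    then renormalize. Sampling then draws only from this reduced set."""
    total, kept = 0.0, {}
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return {t: prob / total for t, prob in kept.items()}

next_token_probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "zebra": 0.05}
print(top_p_filter(next_token_probs, p=0.8))  # 'banana' and 'zebra' are cut
```

Unlike a fixed top-k, the cutoff adapts: when the model is confident, few tokens survive; when it's uncertain, more do.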

Vision Language Model (VLM)

Intermediate

AI models specifically designed to understand both visual and textual information together. They can describe images, answer questions about what's in pictures, read text from images, and perform tasks requiring understanding of visual context.

🔧 Workflow & Tools

API (Application Programming Interface)

Beginner

A way for software to communicate with AI models programmatically. Instead of typing in a chat window, your code sends requests and gets responses. Pay-per-token pricing.

Example

Using OpenAI's API to integrate GPT-4 into your app, paying $2.50 per million tokens.
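Pay-per-token pricing is simple arithmetic once you know the per-million-token rates. A sketch using GPT-4o's list prices as defaults ($2.50 per million input tokens, $10.00 per million output tokens; check the provider's current pricing before relying on these numbers):

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price: float = 2.50, out_price: float = 10.00) -> float:
    """API cost in USD; prices are quoted per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 2,000-token prompt with a 500-token answer:
print(f"${api_cost_usd(2_000, 500):.4f}")   # about a cent
```

Note that output tokens usually cost several times more than input tokens, so long answers dominate the bill.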

Autonomous Agent

Beginner

AI systems capable of independent decision-making and action-taking without constant human supervision. These agents can set goals, plan actions, use tools, and adapt strategies to achieve objectives in complex environments over extended periods.

Differential Privacy

Beginner

A technique that adds carefully calculated noise to data or model outputs to prevent individual user information from being identifiable while preserving overall utility. This enables using sensitive data for AI training without compromising privacy.

Function Calling

Beginner

A structured way for AI models to invoke specific functions or APIs with defined parameters. This allows seamless integration between AI conversations and external applications, enabling automated workflows where AI can trigger specific software actions.

Grounding

Beginner

The process of connecting AI responses to factual, verified information or source data. This helps reduce hallucinations by ensuring the AI's outputs are anchored in real, substantiated knowledge rather than potentially incorrect assumptions.

Guardrails

Beginner

Safety mechanisms built into AI systems that prevent harmful, inappropriate, or unwanted outputs. These can include content filters, rule-based constraints, or automated monitoring that ensure AI behavior stays within acceptable boundaries.

Inference

Beginner

The process of getting an AI model to generate a response. When you send a prompt and get an answer — that's inference. Training creates the model; inference uses it.

Example

Every time ChatGPT answers a question, that's one inference call.

Jailbreak

Beginner

Techniques or prompts designed to bypass AI safety measures and restrictions, causing the model to generate content it was specifically trained not to produce. This ongoing security challenge requires continuously updating defensive measures.

Open Source / Open Weights

Beginner

Models where the code and/or weights are freely available. 'Open source' means full code access. 'Open weights' means you can use the model but may have license restrictions.

Example

Llama 3 is open-weights (Meta license). Qwen 2.5 is open-source (most sizes under the permissive Apache 2.0 license).

Prompt Injection

Beginner

A security vulnerability where malicious instructions are embedded within user inputs, potentially causing AI systems to behave in unintended ways. This can occur when systems fail to properly separate user content from system instructions.

Red Teaming

Beginner

Systematic testing of AI systems by simulating adversarial attacks to identify vulnerabilities, biases, or harmful behaviors. This proactive approach helps improve safety by discovering potential failure modes before they can be exploited by malicious actors.

Streaming

Beginner

Receiving a model's output incrementally, token by token, as it is generated rather than after the full response is complete. This is why chat interfaces appear to 'type' their answers, and it also enables real-time applications like live transcription and translation.

Temperature (Sampling)

Intermediate

A setting that controls how creative/random the AI's responses are. Low (0.0) = deterministic and focused. High (1.0+) = creative and varied. Like a creativity dial.

Example

Use temperature 0 for code generation (precise), temperature 0.8 for creative writing (varied).
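Under the hood, temperature divides the model's raw scores (logits) before the softmax that turns them into probabilities. A self-contained sketch with invented logits:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw model scores into probabilities.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # near-certain top choice
print(softmax_with_temperature(logits, 1.5))  # much more spread out
```

At temperature near 0 the top token wins almost every time (deterministic in the limit); higher values give lower-ranked tokens a real chance, which reads as creativity.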

Tool Use / Function Calling

Intermediate

An AI model's ability to call external tools — search the web, run code, read files, use APIs. Transforms a chatbot into an agent that can actually do things.

Example

Claude using a web browser to search for information, or calling a calculator for precise math.
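The application side of tool use is a dispatch step: the model emits a structured "tool call" (commonly JSON), and your code executes the matching function and returns the result. A sketch with hypothetical tools; real APIs also require tool schemas sent to the model up front:

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny, 18°C in {city}"       # stub; a real tool would call an API

def calculator(expression: str) -> str:
    # Toy only: eval is unsafe for untrusted input, even with empty builtins.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"get_weather": get_weather, "calculator": calculator}

def run_tool_call(model_output: str) -> str:
    """Parse the model's tool-call JSON and dispatch to the right function."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# The model decided it needs a tool and emitted this instead of plain text:
print(run_tool_call('{"name": "calculator", "arguments": {"expression": "17 * 23"}}'))
```

The tool's return value is then fed back to the model, which uses it to write the final answer.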

Model Training / Pre-training

Beginner

The process of teaching an AI model by feeding it vast amounts of data. Pre-training uses huge swaths of internet-scale text; fine-tuning uses specific data. Costs millions of dollars for large models.

Example

Training GPT-4 reportedly cost over $100 million in compute.

🎨 Image AI

ControlNet

Advanced

An add-on for image AI that gives you precise control over the output — pose, edges, depth. Instead of hoping the AI gets the composition right, you guide it.

Example

Upload a stick figure pose → ControlNet generates a photorealistic person in that exact pose.

Diffusion Model

Intermediate

The technique behind modern image generation AI. Starts with random noise and gradually 'denoises' it into an image guided by your text prompt. Like sculpting a statue from a block of marble.

Example

Stable Diffusion, DALL-E 3, Midjourney, and Flux all use diffusion.

Negative Prompt

Beginner

Text telling the image AI what you DON'T want in the image. Helps avoid common artifacts and unwanted elements.

Example

'blurry, low quality, extra fingers, deformed' is a common negative prompt for portraits.

AI Upscaling / Super Resolution

Beginner

Using AI to increase image resolution while adding realistic detail. Turns a small 1024px image into a 4096px print-ready file; the added detail is plausible reconstruction, not recovered information.

Example

Real-ESRGAN 4x upscales a 1024×1024 image to 4096×4096, ready for large-format printing.

Related: Diffusion Model