Look-Alike, Mean-Different: A Field Guide to AI’s Most Confused Terms
Ever sat in a meeting wondering, “Wait… is Mixtral a company or a model?” You’re not alone.
The world of AI and ML is packed with terms that sound like twins but mean very different things. It’s easy to mix up Mistral and Mixtral, confuse embeddings with encodings, or think GPTQ is a model when it’s really a quantization method.
This is your no-fluff, human-first guide to 25+ AI terms that keep showing up and tripping people up.
Mistral vs Mixtral
One is the company; the other is one of its models.
Mistral is an AI company known for building small, fast, high-performing open-weight language models.
Mixtral is a specific mixture-of-experts (MoE) model released by Mistral. It activates only a few expert sub-networks per token to stay efficient while being powerful.
Fine-Tuning vs Pretraining
Pretraining is training from scratch; fine-tuning is customizing a trained model.
Pretraining teaches a model language and world knowledge using huge, generic datasets.
Fine-tuning adapts that pretrained model to specific tasks, like understanding legal text or chatting like a doctor.
ChatGPT vs GPT
GPT is the raw model; ChatGPT is the product you interact with.
GPT (like GPT-4) is a large language model trained to predict and generate text.
ChatGPT is the chat-based interface built on top of GPT, with memory, tools, and a friendly user experience.
Token vs Word
A word is what humans read; a token is what models see.
A word like “playing” is simple to us.
A tokenizer might split it into “play” and “ing,” depending on the vocabulary the model uses.
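You can see the split yourself. A minimal sketch using OpenAI’s tiktoken library (assuming it’s installed via pip install tiktoken; exact splits vary by tokenizer):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

token_ids = enc.encode("Tokenization is unbelievably confusing")
pieces = [enc.decode([tid]) for tid in token_ids]

print(token_ids)  # integer IDs - what the model actually sees
print(pieces)     # the text fragment behind each ID; rare words split into parts
```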
Foundation Model vs Base Model
A base model is raw and pretrained; a foundation model is general-purpose and reusable.
A base model is the untouched result of pretraining, like GPT before you do anything with it.
A foundation model is powerful and flexible, designed to be adapted across many domains and tasks.
Open-source vs Open-weight
Open-source includes code and weights; open-weight means only the weights are available.
Open-source projects release everything - training code, architecture, data, and weights.
Open-weight models give you the trained model but may hide the training process or dataset.
Model vs Architecture
Architecture is the design; a model is a trained version of that design.
Architecture describes how a model is built, like transformers or convolutional networks.
A model is the actual trained product using that architecture, like GPT-3 or BERT.
Zero-Shot vs Few-Shot
Zero-shot uses no examples; few-shot includes a few to guide the model.
In zero-shot, you give the model a task with just instructions.
In few-shot, you include sample inputs and outputs to help the model understand what you want.
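Here’s what that looks like in practice - just hypothetical prompt strings, no API calls:

```python
# The task is identical; only the in-prompt examples differ

zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Loved the camera quality. Sentiment: positive\n"
    "Review: Shipping took forever. Sentiment: negative\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
```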
GPT vs GPTQ
GPT is the original model; GPTQ is a way to make models like it smaller and faster.
GPT (Generative Pre-trained Transformer) is the core architecture used in models like GPT-3.5 and GPT-4.
GPTQ is a post-training quantization method that reduces model size and speeds up inference - great for running LLMs locally or on edge devices.
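For intuition, here’s a toy NumPy sketch of 4-bit weight quantization. This is not the actual GPTQ algorithm (which uses calibration data to minimize per-layer reconstruction error), just the core idea of trading precision for size:

```python
import numpy as np

weights = np.random.randn(8).astype(np.float32)    # pretend: fp32 model weights

scale = np.abs(weights).max() / 7                  # map into the 4-bit range [-8, 7]
quantized = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)

dequantized = quantized * scale                    # what inference actually uses
print("original:   ", weights)
print("dequantized:", dequantized)                 # close to original, ~8x less storage
```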
RAG vs Fine-tuning
RAG keeps external knowledge separate; fine-tuning puts knowledge inside the model.
RAG (Retrieval-Augmented Generation) fetches relevant documents at query time, then generates answers using both the retrieved context and the model’s own knowledge.
Fine-tuning permanently adjusts the model’s weights using new data, embedding the knowledge directly into the model.
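A minimal sketch of the RAG flow, using a toy word-overlap retriever (a real system would rank by embedding similarity, and llm below is a hypothetical stand-in for an actual model call):

```python
import re

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, k=1):
    # Toy lexical-overlap retriever; real systems use embedding similarity
    ranked = sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)
    return ranked[:k]

query = "What is your refund policy?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = llm(prompt)  # hypothetical call - the knowledge stays outside the model
print(prompt)
```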
MoE vs LoRA
MoE is about activating different parts of a model; LoRA is about fine-tuning a model efficiently.
MoE (Mixture of Experts) uses multiple expert sub-models and activates only a few at a time to save compute.
LoRA (Low-Rank Adaptation) adds small trainable low-rank matrices to a frozen base model so it can be fine-tuned quickly and cheaply.
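MoE routing is a whole model design, but the LoRA idea fits in a few lines. A NumPy sketch (just the math, no training loop): the big pretrained matrix W stays frozen, and only two skinny matrices A and B get trained.

```python
import numpy as np

d, r = 1024, 8                      # hidden size vs. LoRA rank

W = np.random.randn(d, d)           # frozen pretrained weight - never updated
A = np.random.randn(d, r) * 0.01    # trainable, d x r
B = np.zeros((r, d))                # trainable, r x d (zeros, so training starts from W)

def forward(x):
    # Effective weight is W + A @ B, without ever modifying W itself
    return x @ W + (x @ A) @ B

x = np.random.randn(1, d)
y = forward(x)                      # same shape as a normal linear layer's output

print(f"full fine-tune: {d * d:,} params; LoRA trains only {2 * d * r:,}")
```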
Self-supervised vs Unsupervised
Self-supervised creates labels from the data; unsupervised doesn’t use labels at all.
Self-supervised learning trains a model to predict part of the data using another part. For example, predicting the next word in a sentence.
Unsupervised learning finds structure in unlabeled data, like clustering similar items without any predefined categories.
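A tiny sketch of the difference. In the first half the labels are manufactured from the data itself; in the second there are no labels at all (the threshold split is a hypothetical stand-in for something like k-means):

```python
# Self-supervised: next-word prediction - the "label" comes from the data itself
text = "the cat sat on the mat".split()
examples = [(text[:i], text[i]) for i in range(1, len(text))]
print(examples[2])  # (['the', 'cat', 'sat'], 'on') - no human annotation needed

# Unsupervised: no labels anywhere, just structure
points = [1.0, 1.2, 0.9, 8.8, 9.1, 9.3]
clusters = [[p for p in points if p < 5], [p for p in points if p >= 5]]
print(clusters)     # two natural groups, no predefined categories
```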
Model Card vs Datasheet
Model cards describe the model; datasheets describe the data used to train it.
A model card describes what the model does, how it performs, its limitations, and ethical considerations.
A datasheet explains the dataset: where it came from, what it contains, potential biases, and licensing.
In-context Learning vs Few-shot Learning
Few-shot is a subset of in-context learning.
In-context learning is when the model learns a task just by reading examples in the prompt, without updating its weights.
Few-shot learning is one type of in-context learning where the model sees a few examples before generating its own output.
Multi-modal vs Cross-modal
Multi-modal means the model can handle multiple data types; cross-modal means it can connect them.
Multi-modal models take in different modalities - like text + image - and process them together.
Cross-modal models learn to translate or generate across modalities, like producing a caption from an image, or an image from text.
Embedding vs Encoding
Both turn data into numbers, but for different goals.
Encoding usually refers to converting data into a structured numerical form for processing, like tokenizing text into token IDs.
Embedding creates dense vector representations that capture meaning, so that similar things are close together in space.
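A toy contrast, with a hypothetical three-word vocabulary and hand-made vectors:

```python
import numpy as np

# Encoding: arbitrary IDs - 0 and 1 being adjacent means nothing
vocab = {"cat": 0, "dog": 1, "car": 2}
encoded = [vocab[w] for w in ["cat", "dog", "car"]]   # [0, 1, 2]

# Embedding: dense vectors where distance reflects meaning
emb = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.85, 0.75, 0.2]),   # near "cat" - both animals
    "car": np.array([0.1, 0.2, 0.95]),    # far from both
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["cat"], emb["dog"]))  # high - similar meaning
print(cosine(emb["cat"], emb["car"]))  # low - different meaning
```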
Prompt vs Prompt Template
A prompt is what you send to the model; a template is a reusable pattern for prompts.
A prompt is a specific instruction or question you type into the model, like “Summarize this article.”
A prompt template includes placeholders and structure - like a form you fill in dynamically - useful in automation and tools like LangChain.
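In plain Python, a template is just a string with holes in it (LangChain’s PromptTemplate is the same idea with more machinery):

```python
template = (
    "You are a helpful assistant.\n"
    "Summarize the following {doc_type} in {num_sentences} sentences:\n\n"
    "{content}"
)

# Filling the template produces the concrete prompt the model actually receives
prompt = template.format(
    doc_type="article",
    num_sentences=2,
    content="AI terminology is full of look-alike terms...",
)
print(prompt)
```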
Safety vs Alignment
Safety is about avoiding harm; alignment is about matching model behavior with human intent.
Safety measures reduce toxic, biased, or dangerous outputs from the model.
Alignment ensures the model behaves according to intended values or objectives, especially important for powerful general-purpose models.
Output vs Completion
All completions are outputs, but not all outputs are completions.
Output is a general term for what the model returns - text, image, action, etc.
Completion specifically refers to text generated after a prompt, usually in a language model setting (e.g., “complete this sentence…”).
AGI vs LLM
LLM is a current tool; AGI is a future goal.
An LLM (Large Language Model) is a specialized AI that predicts and generates language.
AGI (Artificial General Intelligence) is a hypothetical system that can reason, learn, and perform across any domain like a human.
Reasoning vs Memorization
Memorization repeats; reasoning connects ideas.
Memorization is when the model recalls facts it has seen during training.
Reasoning is when the model connects concepts or follows logical steps it wasn’t explicitly trained on - like solving a riddle or coding from scratch.
Hope this gave you a few “ahhh okay” moments and saved you from saying Mixtral when you meant Mistral.
-amrita