LLM Hallucinations Are Compression Artifacts — And That Explains Everything
Imagine being handed 10 terabytes of text and being told to compress it into a 70-gigabyte file. Not just store it—but make it usable. At any moment, someone might ask a question, and you’d need to reconstruct a meaningful answer from that compressed version.
Not perfectly. Not bit-by-bit. But close enough to make sense.
At that point, one realization becomes unavoidable: this is lossy compression. Some information will be lost. It’s not a flaw—it’s a mathematical inevitability.
And that’s exactly what large language models (LLMs) are doing.
Prediction Is Compression (Not a Metaphor)
This idea might sound poetic, but it’s not. It’s grounded in information theory.
Back in 1948, Claude Shannon demonstrated something profound: predicting the next symbol in a sequence and compressing data are mathematically equivalent problems. If you can predict well, you can compress well. And if you can compress well, you inherently understand patterns in the data.
That means when a model like GPT predicts the next token, it’s not just generating text—it’s effectively decompressing a compressed representation of knowledge.
At its most fundamental level, this is what an LLM does:
def predict_next_token(context: str) -> Distribution:
    """This is simultaneously prediction and decompression."""
    pass
The better the prediction, the fewer bits are required to encode information. And fewer bits mean better compression.
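Shannon's equivalence can be made concrete with a toy calculation. A minimal sketch, assuming only the standard formula: under an optimal code, an event with probability p costs -log2(p) bits, so a model that assigns high probability to the actual next token pays far fewer bits than one that doesn't.

```python
import math

def bits_to_encode(prob: float) -> float:
    """Optimal code length for an event with probability `prob`: -log2(p) bits."""
    return -math.log2(prob)

# A good predictor assigns high probability to the token that actually comes next.
good_predictor = bits_to_encode(0.9)   # ~0.15 bits
# A poor predictor is surprised by it, and surprise is expensive.
poor_predictor = bits_to_encode(0.01)  # ~6.64 bits
```

Summed over an entire corpus, that per-token gap is exactly the difference between a good compressor and a bad one.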
So here’s the shift in perspective:
The weights of a language model are not just parameters—they are a compressed version of its training data.
The JPEG Analogy for Language Models
If you’ve ever over-compressed a JPEG image, you’ve seen what happens.
Large, simple structures—like a face or a blue sky—remain recognizable. But small details? They vanish first. Text becomes unreadable. Edges get weird halos. Colors appear that were never there.
And yet, the image still looks plausible.
Now replace pixels with knowledge.
- Large structures → common patterns, general knowledge
- Fine details → rare facts, exact numbers, specific dates
- Artifacts → hallucinations
A hallucination, then, isn’t random nonsense. It’s what happens when the model knows something should be there—a number, a citation, a fact—but the exact information wasn’t preserved during compression. So it reconstructs a plausible approximation.
Just like JPEG invents pixels, LLMs invent facts.
Why LLMs Excel at Code but Struggle with Math
This compression perspective suddenly clarifies something many people notice.
Why are LLMs so good at writing code?
Because code is highly compressible. It’s structured, repetitive, and follows strict rules. Patterns like for i in range(n) appear millions of times. That makes them easy to encode efficiently with minimal loss.
Math, however, is a different story.
Precise numbers don’t compress well. There’s no shortcut for something like 1847 × 9283. It’s not a pattern—it’s a specific computation. Either you compute it exactly, or you store it explicitly.
LLMs do neither. They approximate.
That’s why small calculations might work, but larger ones often fail in subtle ways—off by a digit, slightly incorrect, yet still believable.
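The asymmetry between code and numbers is easy to demonstrate with an ordinary compressor. A sketch using Python's zlib (the sample strings are illustrative): repetitive code-like text shrinks to a small fraction of its size, while a string of arbitrary digits barely compresses at all.

```python
import random
import zlib

random.seed(0)

# Highly patterned, code-like text: the same structure repeats over and over.
code_like = ("for i in range(n):\n    total += data[i]\n" * 500).encode()

# Arbitrary digits of the same length: no repeating pattern to exploit.
digits = "".join(random.choice("0123456789") for _ in range(len(code_like))).encode()

code_ratio = len(zlib.compress(code_like)) / len(code_like)
digit_ratio = len(zlib.compress(digits)) / len(digits)
```

The code-like text compresses to a few percent of its original size; the digit string stays close to its entropy limit. An LLM faces the same wall: patterns fit in the weights, specific numbers mostly don't.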
Model Size, Temperature, and “Creativity”
If hallucinations are compression artifacts, then increasing model size is essentially increasing bitrate.
Think of it like moving from a low-quality JPEG to a high-quality one. More parameters mean more capacity to preserve detail. Fewer artifacts. Better reconstruction.
But never perfect—because compression still exists.
Then there’s temperature, often misunderstood as “creativity.”
In reality, it behaves more like a quality slider:
- Low temperature → sharp, deterministic output (but rigid artifacts)
- Medium → balanced sampling
- High → noisy, diverse, but less accurate
What we call creativity is often just sampling from less probable reconstructions—not true invention, but variation within compressed knowledge.
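The quality-slider behavior falls directly out of the sampling formula. A minimal sketch (the function name and example logits are illustrative): temperature divides the logits before softmax, so low T sharpens the distribution toward the single most probable reconstruction, while high T flattens it toward less probable ones.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Scale logits by 1/T before softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.1)   # nearly all mass on the top token
hot = softmax_with_temperature(logits, 10.0)   # close to uniform
```

At T near 0 the model almost always emits its best reconstruction; at high T it samples freely from the tail, which reads as "creative" and fails as "inaccurate" for the same reason.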
RAG, Fine-Tuning, and Prompting Through the Lens of Compression
Once you adopt this framework, many AI techniques become surprisingly intuitive.
- RAG (Retrieval-Augmented Generation)
  Injects lossless data into the process. Instead of relying on compressed memory, the model gets access to the original information.
- Fine-tuning
  Reallocates compression priorities. It’s like saying: “Preserve legal language better, even if something else degrades.”
- Prompt engineering
  Guides the decompression process—telling the model where to “look” within its compressed representation.
- RLHF
  Adjusts perceived quality, similar to how audio codecs optimize for human perception.
Seen this way, we’re not “teaching intelligence.” We’re managing compression and reconstruction.
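The RAG idea can be sketched in a few lines. This is a deliberately naive illustration, not any real library's API: retrieval here is plain keyword overlap, and the prompt simply prepends the retrieved source text so the model reconstructs from lossless context instead of compressed weights.

```python
def retrieve(query: str, documents: list[str]) -> str:
    """Naive retrieval: return the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved (lossless) text so the model need not rely on memory."""
    context = retrieve(query, documents)
    return f"Context: {context}\n\nQuestion: {query}"

docs = [
    "Paris is the capital of France.",
    "Python is a programming language.",
]
prompt = build_prompt("What is the capital of France?", docs)
```

Real systems swap keyword overlap for embedding similarity, but the principle is the same: move the hard-to-compress specifics out of the weights and into the context window.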
Can Hallucinations Ever Be Eliminated?
Here’s the uncomfortable truth.
If hallucinations are artifacts of lossy compression, then they cannot be completely eliminated.
You can:
- Increase model size (more bits)
- Add external memory (RAG)
- Improve architecture (better codec)
But as long as you’re compressing massive datasets into finite models, information loss is unavoidable.
Anyone claiming otherwise is either oversimplifying—or ignoring information theory.
Humans Are Lossy Codecs Too
This is where things get interesting.
Try recalling what you had for lunch last Thursday. Or what was on slide 14 of yesterday’s presentation.
Chances are, you can’t.
Human memory works the same way. We compress experiences into patterns, discard details, and reconstruct them later. Psychologists call this confabulation—filling in gaps with plausible information.
In other words, we hallucinate too.
The difference is time. Our “codec” has been optimized over millions of years. LLMs have had only a few.
Final Perspective: LLM as Artificial Memory
Maybe the biggest misconception is thinking of LLMs as thinking machines.
They’re not.
They’re closer to something else: artificial memory systems. Extremely dense, incredibly powerful, but inherently imperfect.
Once you accept that, a lot of confusion disappears.
You stop expecting perfect accuracy. You stop fearing sudden sentience. And you start treating LLMs like what they are:
A tool for reconstructing meaning from compressed knowledge.
Not truth. Not understanding. But something surprisingly useful in between.