<aside> 🔗 Source https://web.archive.org/web/20220610192713/https://ceur-ws.org/Vol-1419/paper0045.pdf

Phil Maguire, Oisín Mulhall, Department of Computer Science, National University of Ireland, Maynooth

Rebecca Maguire, School of Business, National College of Ireland, IFSC, Dublin 1, Ireland

Jessica Taylor, Department of Computer Science, Stanford University, California


Compressionism is a theory of the mind, specifically of intelligence & consciousness. It hypothesises that intelligent, conscious systems are sophisticated data compression systems.

Here, data compression refers to the broad umbrella of “pattern recognition”, not merely to a “compression algorithm”. This may feel obvious, but the distinction is helpful: hearing “compression algorithm” prompts “computing system” analogies, while hearing “pattern recognition” prompts “intelligence” analogies. The latter frame of mind is closer to the themes of the paper.

Pattern recognition

So, better compression systems have better pattern recognition & abstraction. For example, the sequence $2, 3, 5, 7, 11, 13, 17, 19, 23, 29$ can be summarised as “first 10 primes”. Assuming a fixed-width encoding of 4 bytes per integer (native Python `int` objects are larger) & 1 byte per character, storing the ten integers takes $\sim\text{40 B}$, whereas storing the string that represents the abstraction (“utf-8” encoded) takes $\sim\text{15 B}$: roughly a 2.7x improvement.
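The byte counts above can be checked with a minimal sketch. It assumes a fixed-width 32-bit encoding via `struct` (a choice made here for illustration, not something the paper specifies), and `first_n_primes` is a hypothetical helper that plays the role of the decompressor.

```python
import struct

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# Raw storage: ten 32-bit little-endian integers, 4 bytes each (assumed encoding).
raw = struct.pack("<10i", *primes)

# Abstract description of the same data, UTF-8 encoded.
description = "first 10 primes".encode("utf-8")

def first_n_primes(n):
    """Hypothetical 'decompressor': rebuild the sequence by trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        # candidate is prime if no earlier prime divides it
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

print(len(raw), len(description))    # sizes in bytes
print(first_n_primes(10) == primes)  # the description expands back to the data
```

The short description only counts as compression if the receiver already knows how to expand it, i.e. already has the concept of a prime.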

Occam’s razor

Per Occam’s razor, among models that explain the data equally well, the one with the fewest assumptions (the simplest model) has the highest probability. This is because each additional assumption carries a multiplicative penalty. To rephrase, the most likely model is the most compressed one.
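The multiplicative penalty can be written out as a short sketch. It assumes, purely for illustration, that a model $M$ rests on $n$ independent assumptions with probabilities $p_1, \dots, p_n$, each below 1; these symbols and the independence assumption are not from the paper.

$$
P(M) = \prod_{i=1}^{n} p_i, \qquad -\log_2 P(M) = \sum_{i=1}^{n} \left(-\log_2 p_i\right)
$$

Each extra assumption multiplies $P(M)$ by a factor below 1 and adds $-\log_2 p_i$ bits to the length of an ideal code for $M$, so under this reading the shortest description and the most probable model coincide.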

The authors state that this process of identifying deep patterns, through compression, is what people are referring to when they talk about “intelligence” or “consciousness”.


Schmidhuber proposes that data becomes more interesting once the observer compresses it better, resulting in beauty through concision. From this perspective, intelligent systems are motivated by curiosity to discover patterns.

In a similar vein, both Maguire’s and Dessalles’ theories view data compression as a key explanatory construct in the phenomenon of surprise. Essentially, the gestalt moment, acting as a dopamine trigger, motivates systems to find hidden patterns in seemingly random data.
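As a rough illustration of compressibility as a proxy for the presence of patterns, the sketch below compares how well a general-purpose compressor handles patterned versus pseudo-random bytes. It uses Python's standard `zlib` module as a stand-in; the paper does not prescribe any particular compressor, and `compression_ratio` is a hypothetical helper.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more pattern found)."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

patterned = bytes(i % 10 for i in range(n))          # repeating 0..9
noisy = bytes(rng.getrandbits(8) for _ in range(n))  # pseudo-random bytes

print(f"patterned: {compression_ratio(patterned):.3f}")
print(f"noisy:     {compression_ratio(noisy):.3f}")
```

The noisy bytes come from a seeded generator, so a short program does describe them; `zlib` simply cannot find that pattern. A better compressor, in the compressionist sense, is one that would.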


Another theme is the benefit of unifying information via compression. One line of reasoning: compression speeds up identification. For example, instead of remembering a bunch of red particles in a roughly circular shape, you just tag it as an “apple” & attach a bunch of experiences about how apples fall, what they do, etc. Essentially, evolution has shaped intelligent systems to develop heuristics for dealing with situations quickly, consequently driving them to become better data compression systems.

Integrated information theory

Consider a camera with a million binary photodiodes. The camera can represent a photo as 1 million bits of information (with $2^{1{,}000{,}000}$ possible states). However, the camera is not considered conscious because it does not represent that information in an integrated manner: each photodiode is disjoint.

On the other hand, when a human views a photo of, say, their “pet dog”, the brain activates neural patterns related to “animal”, “cat”, “teeth”, “cute”, “bond”, etc. And we can’t later erase just “pet dog” from the brain, because every bit of information is (loosely or tightly) integrated into an existing framework of thoughts & patterns.

In this manner, the authors associate consciousness with this unavoidable difficulty of breaking down people’s behaviour into disjoint, independent (uncompressed) components.
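The contrast between disjoint and integrated representations can be sketched with a toy measure. The code below computes total correlation (sum of per-unit entropies minus joint entropy) for three binary units. This is a much simpler quantity than the measure used in integrated information theory, chosen here only for illustration; the function names and the three-unit setup are assumptions, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

def entropy(samples):
    """Shannon entropy in bits of the empirical distribution over samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def total_correlation(states):
    """Sum of per-unit entropies minus joint entropy, in bits."""
    n_units = len(states[0])
    marginals = sum(entropy([s[i] for s in states]) for i in range(n_units))
    return marginals - entropy(states)

# Disjoint "photodiodes": every combination of three bits is equally likely.
independent = list(product([0, 1], repeat=3))

# Fully coupled units: all three always agree.
coupled = [(0, 0, 0), (1, 1, 1)]

print(total_correlation(independent))
print(total_correlation(coupled))
```

The independent units share no information (0 bits), so the whole is exactly the sum of its parts, while the coupled units share 2 bits and cannot be described part by part without redundancy.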


An obvious follow-up question is: “if compression systems indicate intelligence & consciousness, are artificial compression systems (like LLMs) conscious?”

The authors argue that the brain (the conscious system) has two additional ingredients which set it apart from artificial compression systems: (a) it is embodied and (b) it compresses observations of its own behaviour.