Compressionism: A Theory of Mind Based on Data Compression

This is part of my 'Journal Club' series, where I review and break down interesting papers. Here, I'm reviewing "Compressionism: A Theory of Mind Based on Data Compression" by Phil Maguire et al.

https://web.archive.org/web/20220610192713/https://ceur-ws.org/Vol-1419/paper0045.pdf

Phil Maguire (pmaguire@cs.nuim.ie)
Oisín Mulhall (oisinmulhall@imagine.ie)
Department of Computer Science
National University of Ireland, Maynooth

Rebecca Maguire (rebecca.maguire@ncirl.ie)
School of Business, National College of Ireland
IFSC, Dublin 1, Ireland

Jessica Taylor (jessica.liu.taylor@gmail.com)
Department of Computer Science,
Stanford University, California

Compressionism is a theory of the mind that views intelligence and consciousness as sophisticated data compression systems.

It is important to note that data compression here refers to the broader concept of pattern recognition rather than just compression algorithms. This may feel obvious, but the distinction is crucial because your brain associates “compression algorithm” with computing systems and “pattern recognition” with intelligence, and the latter framing is closer to the themes of the paper.

Pattern recognition

Improved compression systems lead to better pattern recognition and abstraction. For instance, the sequence 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 can be simplified as "first 10 primes". Assuming a fixed-width encoding with 4 bytes per integer and 1 byte per character (Python's own int and str objects carry extra overhead on top of this), storing the ten integers would require 40 bytes, whereas storing the string “first 10 primes” would require only 15 bytes, a roughly threefold improvement.
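Here is a rough sketch of that arithmetic. It assumes fixed-width 4-byte integers packed with `struct` and a 1-byte-per-character ASCII encoding. The `first_n_primes` helper is hypothetical and only there to show that the short label can be expanded back into the full sequence.

```python
import struct

def first_n_primes(n):
    """Hypothetical 'decompressor': rebuild the sequence from its description."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # A candidate is prime if no smaller prime divides it.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

primes = first_n_primes(10)

# Uncompressed form: ten 4-byte little-endian integers.
raw = struct.pack("<10i", *primes)

# Compressed form: the description itself, one byte per ASCII character.
label = "first 10 primes".encode("ascii")

raw_size = len(raw)
label_size = len(label)
ratio = raw_size / label_size

print(raw_size, label_size, round(ratio, 2))
```

Note that this comparison ignores the size of the decoder: the label is only shorter if the reader already knows what a prime is, which is exactly the kind of shared pattern knowledge the paper is concerned with.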

Occam’s razor

According to Occam's razor, the models with the fewest assumptions, or simplest models, are the most probable. This is because every additional assumption carries a multiplicative penalty. In other words, the most compressed model is the most likely one.
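A toy illustration of the multiplicative penalty, assuming every assumption is independent and holds with the same probability. The 0.9 figure and both function names are hypothetical, not taken from the paper. The second function uses the standard relation that an event with probability p needs about -log2(p) bits to encode, which is what ties "most probable" to "most compressed".

```python
import math

def model_probability(num_assumptions, p_each=0.9):
    """Probability that all independent assumptions hold at once."""
    return p_each ** num_assumptions

def code_length_bits(probability):
    """Ideal code length in bits for an event of the given probability."""
    return -math.log2(probability)

simple_model = model_probability(2)     # two assumptions
complex_model = model_probability(10)   # ten assumptions

# Each extra assumption multiplies the probability by p_each
# and adds a fixed number of bits to the description length.
print(simple_model, code_length_bits(simple_model))
print(complex_model, code_length_bits(complex_model))
```

Under these assumptions the model with fewer assumptions is both more probable and cheaper to describe.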

The authors argue that this ability to identify deep patterns through compression is what people associate with “intelligence” or “consciousness”.[1]

Motivation

Schmidhuber suggests that data becomes more interesting when it is compressed better, resulting in beauty through concision. From this perspective, intelligent systems are motivated by curiosity to discover patterns.

Similarly, Maguire's and Dessalles' theories view data compression as a crucial explanatory construct in the phenomenon of surprise. Essentially, the gestalt moment of a pattern clicking into place acts as a dopamine reward, motivating systems to find hidden patterns in seemingly random data.

Heuristics

Another theme explored in the paper is the advantages of unifying information through compression. The idea is that compression improves the speed of identification. For example, instead of trying to remember a bunch of individual details about red particles arranged in a circular shape, one could simply tag it as an "apple" and categorize it with other experiences related to apples, such as how they fall or what they taste like. In this sense, intelligent systems have evolved to develop heuristics that allow them to quickly deal with situations, driving them to become better data compression systems.

Integrated information theory

Consider a camera with one million binary photodiodes. It can represent a photo using one million bits of information, with 2^1,000,000 possible states. However, the camera is not considered conscious because it does not represent the information in an integrated manner. Each photodiode is disjoint, lacking the coherence of a unified representation.
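To get a feel for the size of the camera's state space, here is a small sketch of the arithmetic. One million independent binary photodiodes give 2^1,000,000 joint states. The variable names are my own.

```python
import math

NUM_PHOTODIODES = 1_000_000  # figure taken from the camera example

# Each binary photodiode doubles the number of joint states.
num_states = 2 ** NUM_PHOTODIODES

# Count decimal digits with logarithms rather than str(), since recent
# Python versions cap int-to-str conversion length by default.
decimal_digits = math.floor(NUM_PHOTODIODES * math.log10(2)) + 1

print(num_states.bit_length())  # 1000001
print(decimal_digits)
```

The number of states is astronomically large, yet describing any one of them still takes a full million bits, because nothing relates one photodiode to another.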

On the other hand, when a human views a photo of their "pet dog," it triggers neural patterns related to "animal," "pet," "cute," "bond," and other related concepts. The information is not merely represented as isolated bits but is instead integrated into a broader framework of thoughts and patterns. This integration makes it impossible to erase the memory of just the "pet dog" component.

The authors use this observation to suggest that consciousness is related to the difficulty of breaking down a person's behavior into disjoint, independent, and uncompressed components.

Consciousness

A natural question that follows is, "If compression systems indicate intelligence and consciousness, can artificial compression systems such as language models be considered conscious?"

The authors argue that the brain, as a conscious system, possesses two crucial components that distinguish it from artificial compression systems: embodiment and the ability to compress observations of its own behavior.

Embodiment means that the brain is connected to a physical body, which gives it a situational context and perspective on the world. Self-compression means that the brain compresses its own observations of its behavior to create a model of itself, which leads to a rich understanding of selfhood and consciousness.

When we observe others, we notice that their behavior often repeats itself. Instead of remembering every action, we create a simplified "theory of self" that explains why they behave in a certain way. This allows us to compress their behavior and make predictions about their actions in different situations.

However, people's behavior is influenced not only by their own motivations but also by how they react to the observer. To better understand and manipulate others, it's useful to have a model of our own behavior. We build this model by compressing observations of our own past behavior.

This recursive modeling process, which involves compressing data, helps us develop a rich understanding of ourselves and others. As each human has a different compression model based on their unique experiences, their understanding of the world is differentiated, which we term “qualia”.

Therefore, the ability to compress data is a necessary but not sufficient condition for consciousness, and artificial compression systems lack the embodiment and unique perspectives that make humans conscious.

Philosophical zombie

The philosophy of Compressionism asserts that consciousness is equivalent to compression. This implies that any two systems that perform the same compression, while being aware of the same situation, must be equally conscious. This argument serves as a response to the Chinese Room thought experiment, which questions whether a computer program can be considered truly intelligent or conscious.

References

A/N: One criticism of this paper is that it implicitly links intelligence & consciousness. While it may be valid to attribute intelligence to a data-compression system, since data-compression involves pattern recognition, which enables prediction, I don't think consciousness necessarily follows from this. This is not to say that consciousness is substrate dependent, but rather that the authors have shifted the burden of proof by associating the two.