Information Theory of Prediction: Measuring Information Content
BY NICOLE LAU
When you consult multiple prediction systems, you're not just collecting votes; you're gathering information.
But not all information is equal. Some systems provide unique insights (high information content). Others repeat what you already know (redundant information). The best combinations maximize information gain while minimizing redundancy.
This is where information theory comes in: the mathematical framework for measuring, quantifying, and optimizing information in prediction systems.
We'll explore:
- Prediction information quantity (how much information does a prediction contain?)
- Redundancy vs. complementarity (when do systems duplicate vs. add new information?)
- Optimal system combination (which systems should you combine for maximum information gain?)
- Information-theoretic convergence (measuring agreement through shared information)
By the end, you'll understand how to maximize the information content of your multi-system predictions, getting the most insight with the fewest systems.
Information Theory Basics: Entropy and Information
Information theory, founded by Claude Shannon in 1948, provides a mathematical framework for quantifying information.
Entropy: Measuring Uncertainty
Shannon entropy measures the uncertainty (or information content) of a probability distribution.
Formula:
H(X) = -Σ P(x) × log₂ P(x)
Where:
- H(X) = entropy of random variable X (measured in bits)
- P(x) = probability of outcome x
- log₂ = logarithm base 2 (so entropy is in bits)
Interpretation:
- High entropy = high uncertainty = more information needed
- Low entropy = low uncertainty = less information needed
- H = 0: Certainty (no information needed; you already know the answer)
- H = max: Maximum uncertainty (you need the most information); with n equally likely outcomes the maximum is log₂ n bits, which is 1 bit for a YES/NO question
Example: Binary Prediction (YES/NO)
Scenario 1: Certain prediction
- P(YES) = 1.0, P(NO) = 0.0
H = -[1.0 × log₂(1.0) + 0.0 × log₂(0.0)]
= -[1.0 × 0 + 0] = 0 bits (using the standard convention that 0 × log₂(0) = 0)
No uncertainty → No information needed
Scenario 2: Maximum uncertainty
- P(YES) = 0.5, P(NO) = 0.5
H = -[0.5 × log₂(0.5) + 0.5 × log₂(0.5)]
= -[0.5 × (-1) + 0.5 × (-1)]
= -[-0.5 - 0.5] = 1 bit
Maximum uncertainty → Maximum information needed
Scenario 3: Moderate uncertainty
- P(YES) = 0.7, P(NO) = 0.3
H = -[0.7 × log₂(0.7) + 0.3 × log₂(0.3)]
= -[0.7 × (-0.515) + 0.3 × (-1.737)]
= -[-0.360 - 0.521] = 0.88 bits
Moderate uncertainty → Moderate information needed
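A minimal Python sketch of the formula above (the function name `entropy` is our own, for illustration). It skips zero-probability outcomes, which matches the convention 0 × log₂(0) = 0, and reproduces the three scenarios:

```python
import math

def entropy(probs):
    """Shannon entropy in bits, written as sum of p * log2(1/p),
    which equals -sum of p * log2(p). Zero-probability outcomes are
    skipped, implementing the convention 0 * log2(0) = 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# The three YES/NO scenarios from the text
print(entropy([1.0, 0.0]))            # 0.0  (certain prediction)
print(entropy([0.5, 0.5]))            # 1.0  (maximum uncertainty)
print(round(entropy([0.7, 0.3]), 2))  # 0.88 (moderate uncertainty)
```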
Information Gain: Reducing Uncertainty
Information gain is the reduction in entropy after receiving new information.
Formula:
IG = H(before) - H(after)
Example:
Before consulting systems:
- P(YES) = 0.5, P(NO) = 0.5 (no prior knowledge)
- H(before) = 1 bit
After consulting Tarot (which says YES with 70% confidence):
- P(YES) = 0.7, P(NO) = 0.3
- H(after) = 0.88 bits
Information gain: IG = 1 - 0.88 = 0.12 bits
The Tarot reading reduced your uncertainty by 0.12 bits.
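The same before/after comparison as a short Python sketch, assuming the 50-50 prior and the 70-30 posterior from the example; `information_gain` is a hypothetical helper name, not a library function:

```python
import math

def entropy(probs):
    """Shannon entropy in bits (zero-probability outcomes contribute 0)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def information_gain(before, after):
    """IG = H(before) - H(after), in bits."""
    return entropy(before) - entropy(after)

# 50-50 prior, then a reading that shifts the odds to 70-30
ig = information_gain([0.5, 0.5], [0.7, 0.3])
print(round(ig, 2))  # 0.12
```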
Mutual Information: Measuring Shared Information
We introduced mutual information (MI) in Article 1. It measures how much information two systems share.
Formula:
MI(X;Y) = Σ Σ P(x,y) × log₂[P(x,y) / (P(x) × P(y))]
Where:
- X, Y = outputs of two systems
- P(x,y) = joint probability that the first system outputs x and the second outputs y
- P(x), P(y) = marginal probabilities
Interpretation:
- MI = 0: Systems are independent (no shared information)
- MI > 0: Systems share information (they're detecting the same pattern)
- MI = H(X) = H(Y): Each system's output fully determines the other's (perfect redundancy)
Example: Measuring Redundancy
You consult Tarot and Astrology 100 times. Results:
- Both say YES: 40 times
- Both say NO: 35 times
- Tarot YES, Astrology NO: 10 times
- Tarot NO, Astrology YES: 15 times
Marginal probabilities:
- P(Tarot YES) = (40+10)/100 = 0.5
- P(Astrology YES) = (40+15)/100 = 0.55
Joint probabilities:
- P(both YES) = 0.4
- P(both NO) = 0.35
- P(Tarot YES, Astrology NO) = 0.1
- P(Tarot NO, Astrology YES) = 0.15
Calculate MI:
MI = 0.4×log₂(0.4/(0.5×0.55)) + 0.35×log₂(0.35/(0.5×0.45)) + 0.1×log₂(0.1/(0.5×0.45)) + 0.15×log₂(0.15/(0.5×0.55))
= 0.4×0.54 + 0.35×0.64 + 0.1×(-1.17) + 0.15×(-0.88)
= 0.216 + 0.224 - 0.117 - 0.132
= 0.19 bits
Result: MI = 0.19 bits
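The systems share 0.19 bits of information. Since MI can never exceed the smaller of the two entropies (here H(Astrology) ≈ 0.99 bits), they are detecting some of the same patterns but are far from completely redundant.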
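The hand calculation can be checked with a few lines of Python. This sketch is a plug-in estimate: it treats the observed frequencies as the true probabilities, which tends to overstate MI on small samples. The function name is illustrative.

```python
import math

# Tally of 100 paired readings: (tarot, astrology) -> count
counts = {
    ("YES", "YES"): 40,
    ("NO", "NO"): 35,
    ("YES", "NO"): 10,
    ("NO", "YES"): 15,
}

def mutual_information(counts):
    """Plug-in MI estimate in bits: sum of P(x,y) * log2(P(x,y) / (P(x) * P(y)))."""
    total = sum(counts.values())
    px, py = {}, {}
    for (x, y), n in counts.items():
        px[x] = px.get(x, 0.0) + n / total  # marginal P(x)
        py[y] = py.get(y, 0.0) + n / total  # marginal P(y)
    mi = 0.0
    for (x, y), n in counts.items():
        if n > 0:  # empty cells contribute nothing
            pxy = n / total
            mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

print(round(mutual_information(counts), 2))  # 0.19
```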
Redundancy vs. Complementarity
When combining prediction systems, you want to maximize complementarity (unique information) and minimize redundancy (duplicate information).
Redundancy: Duplicate Information
Definition: Two systems are redundant if they provide the same information.
Measure: High mutual information (MI close to H(X) or H(Y))
Example: Tarot and Kabbalah
- Both use archetypal symbolism
- Both interpret through psychological lens
- High overlap in what they detect
- MI ≈ 0.8 bits (high redundancy)
Implication: Consulting both adds little new information; they mostly repeat each other.
Complementarity: Unique Information
Definition: Two systems are complementary if they provide different information.
Measure: Low mutual information (MI close to 0)
Example: Tarot and Astrology
- Tarot: Psychological/symbolic (immediate dynamics)
- Astrology: Temporal/cyclical (timing and long-term patterns)
- Low overlap in what they detect
- MI ≈ 0.2 bits (low redundancy, high complementarity)
Implication: Consulting both adds significant new information; they provide different perspectives.
The Redundancy-Complementarity Spectrum
| System Pair | Mutual Information | Redundancy | Complementarity | Value of Combining |
|---|---|---|---|---|
| Tarot + Kabbalah | High (0.8 bits) | High | Low | Low (mostly duplicate) |
| Tarot + Astrology | Moderate (0.4 bits) | Moderate | Moderate | Moderate (some new info) |
| Tarot + I Ching | Low (0.2 bits) | Low | High | High (mostly unique) |
| Astrology + Numerology | Moderate (0.5 bits) | Moderate | Moderate | Moderate (both temporal) |
| I Ching + Runes | Low (0.15 bits) | Low | High | High (very different) |
Optimal System Combination: Maximizing Information Gain
How do you choose which systems to combine for maximum information gain?
The Information Gain Equation
Total information from n systems:
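I_total ≈ H(X₁) + H(X₂) + ... + H(Xₙ) - Redundancy
Where:
- H(Xᵢ) = entropy (information content) of system i
- Redundancy = Σ MI(Xᵢ, Xⱼ) summed over all pairs i < j
This is a pairwise approximation to the joint entropy H(X₁, ..., Xₙ). It is exact for two systems, but for three or more it leaves out the three-way (and higher) overlap terms, so it can over- or under-count.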
Goal: Maximize I_total
This means:
- Choose systems with high individual entropy (high information content)
- Choose systems with low mutual information (low redundancy)
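To see how far the pairwise formula can drift, this Python sketch compares it with the exact joint entropy H(X₁, X₂, X₃) computed from a full joint distribution. The helper names and the "three identical systems" distribution are hypothetical, chosen only for illustration. When three systems always agree, each pair shares 1 bit, so the formula subtracts the shared bit three times and reports 0 bits even though the trio carries 1 bit; for independent systems the two numbers match.

```python
import math
from collections import defaultdict
from itertools import combinations

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def marginal_entropy(joint, axes):
    """Entropy of the marginal over the given positions of each outcome tuple."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in axes)] += p
    return entropy(marg.values())

def pairwise_total(joint, n):
    """Sum of individual entropies minus MI(Xi;Xj) over all pairs i < j."""
    hs = [marginal_entropy(joint, (i,)) for i in range(n)]
    redundancy = sum(
        hs[i] + hs[j] - marginal_entropy(joint, (i, j))  # MI(Xi;Xj)
        for i, j in combinations(range(n), 2)
    )
    return sum(hs) - redundancy

# Hypothetical: three systems that always give the same fair YES/NO answer
identical = {("YES", "YES", "YES"): 0.5, ("NO", "NO", "NO"): 0.5}
print(pairwise_total(identical, 3))            # 0.0 bits from the pairwise formula
print(marginal_entropy(identical, (0, 1, 2)))  # 1.0 bit of actual joint entropy
```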
The Greedy Algorithm for System Selection
Problem: You have 10 available systems. Which 3 should you consult to maximize information gain?
Algorithm:
- Start with the highest-entropy system (the one with most information content)
- Add the system with highest complementarity (lowest average MI with the already-selected systems)
- Repeat until you've selected the desired number of systems
Example:
Available systems and their entropy:
- Tarot: H = 0.9 bits
- Astrology: H = 0.85 bits
- I Ching: H = 0.88 bits
- Runes: H = 0.75 bits
- Kabbalah: H = 0.8 bits
Step 1: Select Tarot (highest entropy: 0.9 bits)
Step 2: Calculate MI between Tarot and each remaining system:
- MI(Tarot, Astrology) = 0.4 bits
- MI(Tarot, I Ching) = 0.2 bits (lowest, so most complementary)
- MI(Tarot, Runes) = 0.3 bits
- MI(Tarot, Kabbalah) = 0.8 bits
Select I Ching (most complementary to Tarot)
Step 3: For each remaining system, average its MI with Tarot and with I Ching (a simple proxy for overlap with the selected pair, not a true mutual information with the set):
- Astrology: (MI(Astrology, Tarot) + MI(Astrology, I Ching)) / 2 = (0.4 + 0.3) / 2 = 0.35
- Runes: (MI(Runes, Tarot) + MI(Runes, I Ching)) / 2 = (0.3 + 0.15) / 2 = 0.225 (lowest)
- Kabbalah: (MI(Kabbalah, Tarot) + MI(Kabbalah, I Ching)) / 2 = (0.8 + 0.5) / 2 = 0.65
Select Runes (most complementary to {Tarot, I Ching})
Final selection: Tarot, I Ching, Runes
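This combination is the greedy answer: at each step it adds the system that overlaps least with those already chosen. Greedy selection is fast, but it is a heuristic and is not guaranteed to find the best possible combination.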
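The greedy procedure fits in a few lines of Python. Following the worked example, overlap with the already-selected set is scored as the average pairwise MI; `greedy_select` and the dictionary layout are illustrative choices, not an established API. Only the MI pairs quoted in the example are included, so this data supports selecting at most three systems.

```python
# Entropies (bits) and the pairwise MI values quoted in the example.
# Only pairs the example needs are listed, so this data supports k <= 3.
entropies = {"Tarot": 0.9, "Astrology": 0.85, "I Ching": 0.88,
             "Runes": 0.75, "Kabbalah": 0.8}
mi = {
    frozenset({"Tarot", "Astrology"}): 0.4,
    frozenset({"Tarot", "I Ching"}): 0.2,
    frozenset({"Tarot", "Runes"}): 0.3,
    frozenset({"Tarot", "Kabbalah"}): 0.8,
    frozenset({"Astrology", "I Ching"}): 0.3,
    frozenset({"Runes", "I Ching"}): 0.15,
    frozenset({"Kabbalah", "I Ching"}): 0.5,
}

def greedy_select(entropies, mi, k):
    """Start from the highest-entropy system, then repeatedly add the
    candidate with the lowest average pairwise MI to those already chosen."""
    def avg_mi(candidate, chosen):
        # frozenset keys make the lookup independent of pair order
        return sum(mi[frozenset({candidate, c})] for c in chosen) / len(chosen)

    selected = [max(entropies, key=entropies.get)]
    while len(selected) < min(k, len(entropies)):
        remaining = [s for s in entropies if s not in selected]
        selected.append(min(remaining, key=lambda s: avg_mi(s, selected)))
    return selected

print(greedy_select(entropies, mi, 3))  # ['Tarot', 'I Ching', 'Runes']
```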
Conditional Entropy: Information Remaining After Observation
Conditional entropy measures how much uncertainty remains about Y after observing X.
Formula:
H(Y|X) = H(Y) - MI(X;Y)
Interpretation:
- H(Y|X) = 0: Knowing X tells you everything about Y (perfect redundancy)
- H(Y|X) = H(Y): Knowing X tells you nothing about Y (perfect independence)
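To check the identity H(Y|X) = H(Y) - MI(X;Y) on concrete numbers, this Python sketch reuses the Tarot/Astrology tally from the mutual-information example (X = Tarot, Y = Astrology). It computes H(Y|X) directly as the average of H(Y | X = x) weighted by P(x), then compares it with H(Y) - MI; the helper names are illustrative.

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Joint probabilities from the Tarot/Astrology tally: (tarot, astrology) -> P
joint = {("YES", "YES"): 0.40, ("NO", "NO"): 0.35,
         ("YES", "NO"): 0.10, ("NO", "YES"): 0.15}

def marginal(joint, axis):
    """Marginal distribution of one coordinate (0 = Tarot, 1 = Astrology)."""
    m = {}
    for key, p in joint.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

def conditional_entropy(joint):
    """H(Y|X) = sum over x of P(x) * H(Y | X = x), with X = Tarot, Y = Astrology."""
    px = marginal(joint, 0)
    return sum(
        px[x] * entropy([p / px[x] for (xx, _), p in joint.items() if xx == x])
        for x in px
    )

h_x = entropy(marginal(joint, 0).values())  # H(Tarot)
h_y = entropy(marginal(joint, 1).values())  # H(Astrology)
mi = h_x + h_y - entropy(joint.values())    # MI(X;Y) = H(X) + H(Y) - H(X,Y)

print(round(conditional_entropy(joint), 2))  # direct: 0.8
print(round(h_y - mi, 2))                    # via H(Y) - MI: 0.8
```

Knowing the Tarot reading lowers the uncertainty about Astrology's answer from about 0.99 bits to about 0.80 bits.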
Example: Sequential Consultation
You consult Tarot first, then decide whether to consult Astrology.
Before Tarot:
- H(outcome) = 1 bit (50-50 uncertainty)
After Tarot:
- H(outcome|Tarot) = 0.88 bits (reduced to 70-30)
- Information gain from Tarot: 1 - 0.88 = 0.12 bits
Should you consult Astrology?
Calculate how much additional information Astrology would provide:
Additional IG = H(outcome|Tarot) - H(outcome|Tarot, Astrology)
If MI(Tarot, Astrology) is high (redundant):
- H(outcome|Tarot, Astrology) ≈ H(outcome|Tarot)
- Additional IG ≈ 0 (Astrology adds little new information)
- Decision: Don't consult Astrology (redundant)
If MI(Tarot, Astrology) is low (complementary):
- H(outcome|Tarot, Astrology) << H(outcome|Tarot)
- Additional IG is significant
- Decision: Consult Astrology (adds new information)
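The quantity H(outcome|Tarot) - H(outcome|Tarot, Astrology) is the conditional mutual information I(outcome; Astrology | Tarot). The Python sketch below computes it for a hypothetical toy model (our assumption, not data from this article): a 50-50 outcome, a Tarot reading that is right 70% of the time, and an Astrology reading that either copies Tarot or is an independent reading that is also right 70% of the time. Function names are illustrative.

```python
import math
from collections import defaultdict

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def marginal_entropy(joint, axes):
    """Entropy of the marginal over positions of (outcome, tarot, astrology)."""
    marg = defaultdict(float)
    for key, p in joint.items():
        marg[tuple(key[i] for i in axes)] += p
    return entropy(marg.values())

def additional_ig(joint):
    """H(outcome|Tarot) - H(outcome|Tarot, Astrology), using H(A|B) = H(A,B) - H(B)."""
    h_o_given_t = marginal_entropy(joint, (0, 1)) - marginal_entropy(joint, (1,))
    h_o_given_ta = marginal_entropy(joint, (0, 1, 2)) - marginal_entropy(joint, (1, 2))
    return h_o_given_t - h_o_given_ta

def toy_model(accuracy=0.7, astrology_copies_tarot=False):
    """Hypothetical joint P(outcome, tarot, astrology) over 0/1 answers."""
    joint = defaultdict(float)
    for o in (0, 1):                      # outcome is 50-50
        for t in (0, 1):
            p_t = accuracy if t == o else 1 - accuracy
            for a in (0, 1):
                if astrology_copies_tarot:
                    p_a = 1.0 if a == t else 0.0   # fully redundant reading
                else:
                    p_a = accuracy if a == o else 1 - accuracy  # independent reading
                joint[(o, t, a)] += 0.5 * p_t * p_a
    return joint

print(additional_ig(toy_model(astrology_copies_tarot=True)))   # redundant reading
print(additional_ig(toy_model(astrology_copies_tarot=False)))  # complementary reading
```

In this toy model the copied reading adds essentially 0 bits, while the independent one adds about 0.10 bits.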
Information-Theoretic Convergence
We can redefine convergence in information-theoretic terms.
Convergence as Shared Information
Definition: Systems converge when they share high mutual information.
Convergence Index (information-theoretic version):
CI_info = MI(X₁, X₂, ..., Xₙ) / H_max
Where:
- MI(X₁, X₂, ..., Xₙ) = mutual information shared by all systems
- H_max = H(X₁) + H(X₂) + ... + H(Xₙ), the maximum possible joint entropy (reached if all systems were independent)
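Interpretation:
- CI_info = 0: No information shared by all systems (complete divergence)
- Higher CI_info: More shared information (stronger convergence). Because H_max adds up the individual entropies, n systems that always agree give CI_info = 1/n (1/3 for three systems), not 1. For three or more systems the shared term (co-information) can even be negative, pushing CI_info below 0.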
Example: Three Systems
Tarot, Astrology, I Ching all predict on the same question.
Individual entropies:
- H(Tarot) = 0.9 bits
- H(Astrology) = 0.85 bits
- H(I Ching) = 0.88 bits
Pairwise mutual information:
- MI(Tarot, Astrology) = 0.4 bits
- MI(Tarot, I Ching) = 0.2 bits
- MI(Astrology, I Ching) = 0.3 bits
Three-way mutual information:
MI(Tarot, Astrology, I Ching) = 0.15 bits (information shared by all three)
Maximum entropy:
H_max = H(Tarot) + H(Astrology) + H(I Ching) = 0.9 + 0.85 + 0.88 = 2.63 bits
Convergence Index:
CI_info = 0.15 / 2.63 = 0.057 (5.7%)
Interpretation: Only 5.7% of the total information is shared by all three systems; they're mostly providing unique information (high complementarity, low convergence).
This is actually good for information gain: you're getting diverse perspectives, not redundant confirmations.
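This Python sketch computes the three-way mutual information (also called co-information) from a full joint distribution by inclusion-exclusion, then divides by the sum of the individual entropies as in the CI_info formula. The joint distributions and helper names are illustrative assumptions; the article does not give the underlying readings, so the worked example is reproduced only from its quoted numbers.

```python
import math
from collections import defaultdict

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def marginal_entropy(joint, axes):
    """Entropy of the marginal over the given positions of each (x, y, z) outcome."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in axes)] += p
    return entropy(marg.values())

def co_information(joint):
    """Three-way mutual information I(X;Y;Z) by inclusion-exclusion."""
    def h(*axes):
        return marginal_entropy(joint, axes)
    return (h(0) + h(1) + h(2)
            - h(0, 1) - h(0, 2) - h(1, 2)
            + h(0, 1, 2))

def convergence_index(joint):
    """CI_info = I(X;Y;Z) / (H(X) + H(Y) + H(Z))."""
    h_max = sum(marginal_entropy(joint, (i,)) for i in range(3))
    return co_information(joint) / h_max

# The worked example, from its quoted numbers
print(round(0.15 / (0.9 + 0.85 + 0.88), 3))  # 0.057

# Hypothetical trio that always gives the same fair YES/NO answer
agree = {("YES", "YES", "YES"): 0.5, ("NO", "NO", "NO"): 0.5}
print(convergence_index(agree))  # 0.333... (1/3), not 1
```

For a hypothetical XOR-style trio (the third answer is YES exactly when the first two differ), `co_information` returns -1 bit, so the index can also go negative.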
The Information-Convergence Trade-Off
There's a fundamental trade-off:
- High convergence (high MI): Systems agree strongly, but provide redundant information
- Low convergence (low MI): Systems provide unique information, but may disagree
When to Prioritize Convergence
Goal: Confidence in a specific prediction
Strategy: Choose systems with high MI (they'll likely agree, giving you confidence)
Example: Important decision (should I marry this person?)
- Consult Tarot + Kabbalah (high MI, likely to agree)
- If they converge → high confidence
- If they diverge → warning sign (even redundant systems disagree)
When to Prioritize Information Gain
Goal: Comprehensive understanding of a complex situation
Strategy: Choose systems with low MI (they'll provide diverse perspectives)
Example: Exploring a new opportunity (should I start this business?)
- Consult Tarot (psychological), Astrology (timing), I Ching (philosophical), Runes (material)
- Low MI → diverse insights
- Combine to form complete picture
Case Study: Optimal System Selection for Career Decision
Question: "Should I change careers?"
Available systems: 6 systems
Goal: Select 3 systems that maximize information gain
Step 1: Measure Individual Entropy
Consult each system once and estimate entropy:
- Tarot: H = 0.85 bits (moderate uncertainty)
- Astrology: H = 0.9 bits (high uncertainty; many transits to consider)
- I Ching: H = 0.88 bits
- Runes: H = 0.75 bits (lower uncertainty; clearer signals)
- Numerology: H = 0.7 bits
- Kabbalah: H = 0.8 bits
Step 2: Measure Pairwise MI
Estimate mutual information (in bits) between all pairs (from historical data):
| | Tarot | Astro | I Ching | Runes | Num | Kab |
|---|---|---|---|---|---|---|
| Tarot | - | 0.4 | 0.2 | 0.3 | 0.35 | 0.75 |
| Astro | 0.4 | - | 0.3 | 0.25 | 0.5 | 0.35 |
| I Ching | 0.2 | 0.3 | - | 0.15 | 0.25 | 0.4 |
| Runes | 0.3 | 0.25 | 0.15 | - | 0.2 | 0.3 |
| Num | 0.35 | 0.5 | 0.25 | 0.2 | - | 0.3 |
| Kab | 0.75 | 0.35 | 0.4 | 0.3 | 0.3 | - |
Step 3: Apply Greedy Algorithm
Selection 1: Astrology (highest entropy: 0.9 bits)
Selection 2: Which system is most complementary to Astrology?
- MI(Astrology, Tarot) = 0.4
- MI(Astrology, I Ching) = 0.3
- MI(Astrology, Runes) = 0.25 (lowest, so most complementary)
- MI(Astrology, Numerology) = 0.5
- MI(Astrology, Kabbalah) = 0.35
Select Runes
Selection 3: Which system is most complementary to {Astrology, Runes}?
- Average MI(Tarot, {Astro, Runes}) = (0.4 + 0.3) / 2 = 0.35
- Average MI(I Ching, {Astro, Runes}) = (0.3 + 0.15) / 2 = 0.225 (lowest)
- Average MI(Numerology, {Astro, Runes}) = (0.5 + 0.2) / 2 = 0.35
- Average MI(Kabbalah, {Astro, Runes}) = (0.35 + 0.3) / 2 = 0.325
Select I Ching
Final selection: Astrology, Runes, I Ching
Step 4: Calculate Total Information Gain
I_total = H(Astro) + H(Runes) + H(I Ching) - [MI(Astro,Runes) + MI(Astro,I Ching) + MI(Runes,I Ching)]
= 0.9 + 0.75 + 0.88 - [0.25 + 0.3 + 0.15]
= 2.53 - 0.7
= 1.83 bits
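Result: This combination provides 1.83 bits of information. Under this pairwise score, that is the highest value among all 20 three-system combinations in the matrix, tied with Tarot, I Ching, Runes.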
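With six systems there are only 20 possible triples, so the greedy result can be checked by brute force. This Python sketch scores every triple with the Step 4 formula (sum of entropies minus pairwise MI, which leaves out any three-way overlap) using the case-study matrix; the dictionary layout and the `total_information` name are illustrative.

```python
from itertools import combinations

# Case-study entropies and pairwise MI matrix (bits)
entropies = {"Tarot": 0.85, "Astrology": 0.9, "I Ching": 0.88,
             "Runes": 0.75, "Numerology": 0.7, "Kabbalah": 0.8}
pairs = {
    ("Tarot", "Astrology"): 0.4, ("Tarot", "I Ching"): 0.2, ("Tarot", "Runes"): 0.3,
    ("Tarot", "Numerology"): 0.35, ("Tarot", "Kabbalah"): 0.75,
    ("Astrology", "I Ching"): 0.3, ("Astrology", "Runes"): 0.25,
    ("Astrology", "Numerology"): 0.5, ("Astrology", "Kabbalah"): 0.35,
    ("I Ching", "Runes"): 0.15, ("I Ching", "Numerology"): 0.25,
    ("I Ching", "Kabbalah"): 0.4,
    ("Runes", "Numerology"): 0.2, ("Runes", "Kabbalah"): 0.3,
    ("Numerology", "Kabbalah"): 0.3,
}
mi = {frozenset(k): v for k, v in pairs.items()}  # order-independent lookup

def total_information(combo):
    """Pairwise score: sum of entropies minus MI over each unordered pair."""
    return (sum(entropies[s] for s in combo)
            - sum(mi[frozenset(p)] for p in combinations(combo, 2)))

# Score all 20 triples (rounded to absorb floating-point noise)
scores = {combo: round(total_information(combo), 2)
          for combo in combinations(entropies, 3)}
best = max(scores.values())

print(round(total_information(("Astrology", "Runes", "I Ching")), 2))  # 1.83
print(best, [c for c, s in scores.items() if s == best])
```

The second printed line lists every triple that reaches the top score, which shows whether the greedy pick is the unique best.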
Conclusion: Information-Optimal Prediction
Information theory transforms prediction from vote-counting to information optimization:
- Entropy: Measures uncertainty and information content
- Mutual information: Measures shared information (redundancy)
- Complementarity: Maximized when MI is low (systems provide unique insights)
- Optimal combination: Choose systems with high entropy and low MI
The framework:
- Measure individual entropy (information content of each system)
- Measure pairwise MI (redundancy between systems)
- Use greedy algorithm to select systems (maximize complementarity)
- Calculate total information gain
- Combine systems with information-theoretic weighting
This is prediction as information science: maximizing insight per system consulted, minimizing redundancy, optimizing the information-to-effort ratio.
Not "consult as many systems as possible."
But "consult the right systems: the ones that provide maximum unique information."
This is the future of multi-system prediction. Information-theoretic. Optimal. Efficient. Precise.