Prediction Accuracy Database: Architecture Design for Open Science Infrastructure
BY NICOLE LAU
Individual studies validate convergence. Real-time systems operationalize it. But to truly advance prediction science, we need shared infrastructure—a comprehensive database where researchers and practitioners can contribute predictions, track outcomes, and analyze patterns collectively.
This is where the Prediction Accuracy Database comes in—an open-source, collaborative platform for storing, sharing, and analyzing prediction data at scale.
We'll explore:
- Database design (schema, tables, relationships for prediction data)
- Case classification (taxonomy for organizing predictions by domain, type, difficulty)
- Open data sharing (APIs, data formats, licensing for collaborative science)
- Quality assurance (validation, verification, inter-rater reliability)
By the end, you'll understand how to build and contribute to a shared prediction database—turning isolated research into collective scientific infrastructure.
Why a Shared Database?
Current State: Fragmented Data
Problem: Prediction data is scattered across:
- Individual research papers (data in supplementary materials, not easily accessible)
- Private databases (companies, governments—not publicly available)
- Personal spreadsheets (researchers tracking their own predictions)
- Lost data (studies published but data not preserved)
Consequences:
- Duplication of effort: Researchers re-collect data that already exists
- Limited sample sizes: Individual studies have 50-200 predictions, not enough for robust analysis
- No meta-analysis: Can't combine data across studies without standardization
- Slow progress: Each researcher starts from scratch
Vision: Shared Infrastructure
Solution: A centralized, open-access database where:
- Researchers contribute predictions and outcomes
- Data is standardized (same format, same metrics)
- Anyone can query and analyze the full dataset
- Sample sizes reach thousands or tens of thousands
- Meta-analyses are trivial (data already combined)
Benefits:
- Accelerated research: Build on existing data instead of starting over
- Larger samples: 10,000+ predictions enable robust statistical analysis
- Reproducibility: Other researchers can verify findings using the same data
- Collaboration: Global community working on shared infrastructure
- Transparency: All data publicly available for scrutiny
Database Design
Core Tables
1. Predictions Table
CREATE TABLE predictions (
    prediction_id UUID PRIMARY KEY,
    question_id UUID REFERENCES questions(question_id),
    prediction_date TIMESTAMP NOT NULL,
    predicted_outcome TEXT NOT NULL,
    confidence DECIMAL(3,2) CHECK (confidence BETWEEN 0 AND 1),
    convergence_index DECIMAL(3,2),
    systems_used TEXT[], -- Array of system IDs
    methodology TEXT,
    contributor_id UUID REFERENCES contributors(contributor_id),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
Fields:
- prediction_id: Unique identifier (UUID)
- question_id: Links to questions table
- prediction_date: When prediction was made
- predicted_outcome: YES/NO or numerical value
- confidence: 0-1 scale
- convergence_index: CI value (if multi-system)
- systems_used: Array of system IDs used
- methodology: Description of prediction method
2. Questions Table
CREATE TABLE questions (
    question_id UUID PRIMARY KEY,
    question_text TEXT NOT NULL,
    domain VARCHAR(50) NOT NULL, -- Economic, Political, etc.
    subdomain VARCHAR(100),
    question_type VARCHAR(20) NOT NULL, -- Binary, Numerical, Categorical
    difficulty VARCHAR(20), -- Easy, Moderate, Hard
    stakes VARCHAR(20), -- Low, Medium, High
    prediction_horizon_days INTEGER,
    outcome_date DATE,
    outcome_value TEXT,
    outcome_verified BOOLEAN DEFAULT FALSE,
    verification_source TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
Fields:
- question_text: The prediction question
- domain: Economic, Political, Technological, Health, Natural
- subdomain: More specific category (e.g., Economic → Stock Market)
- question_type: Binary (YES/NO), Numerical, Categorical
- difficulty: Subjective assessment of prediction difficulty
- stakes: Importance/impact of the question
- prediction_horizon_days: Days between prediction and outcome
- outcome_value: Actual outcome (filled after event occurs)
- outcome_verified: Has outcome been independently verified?
3. Systems Table
CREATE TABLE systems (
    system_id UUID PRIMARY KEY,
    system_name VARCHAR(100) NOT NULL,
    system_type VARCHAR(50), -- Economic Indicator, Expert Survey, etc.
    methodology TEXT,
    data_source TEXT,
    update_frequency VARCHAR(50),
    historical_accuracy DECIMAL(3,2),
    independence_score DECIMAL(3,2), -- From dependency matrix
    created_at TIMESTAMP DEFAULT NOW()
);
4. Outcomes Table
CREATE TABLE outcomes (
    outcome_id UUID PRIMARY KEY,
    prediction_id UUID REFERENCES predictions(prediction_id),
    actual_outcome TEXT NOT NULL,
    outcome_date DATE NOT NULL,
    verification_method TEXT,
    verifier_id UUID REFERENCES contributors(contributor_id),
    outcome_verified BOOLEAN DEFAULT FALSE, -- Set after independent verification
    accuracy_binary BOOLEAN, -- Correct or incorrect
    brier_score DECIMAL(4,3),
    log_loss DECIMAL(4,3),
    absolute_error DECIMAL(10,2), -- For numerical predictions
    created_at TIMESTAMP DEFAULT NOW()
);
5. Performance_Metrics Table
CREATE TABLE performance_metrics (
    metric_id UUID PRIMARY KEY,
    prediction_id UUID REFERENCES predictions(prediction_id),
    metric_name VARCHAR(50) NOT NULL,
    metric_value DECIMAL(10,4),
    calculation_date TIMESTAMP DEFAULT NOW()
);
6. Contributors Table
CREATE TABLE contributors (
    contributor_id UUID PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100),
    affiliation VARCHAR(200),
    orcid VARCHAR(50), -- ORCID researcher ID
    contributions_count INTEGER DEFAULT 0,
    reputation_score DECIMAL(3,2),
    created_at TIMESTAMP DEFAULT NOW()
);
Relationships
- predictions → questions (many-to-one)
- predictions → outcomes (one-to-one)
- predictions → contributors (many-to-one)
- predictions → systems (many-to-many via junction table)
- outcomes → contributors (many-to-one, for verifiers)
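The junction table behind the predictions → systems relationship is not shown in the schema section, although the example queries later in this article join through it as prediction_systems. A minimal sketch in SQLite (column names are our assumption; the article's schema uses PostgreSQL UUID types, simplified here to TEXT):

```python
import sqlite3

# In-memory sketch of the prediction_systems junction table: one row per
# (prediction, system) pair, so a prediction made with two systems
# produces two junction rows.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE prediction_systems (
        prediction_id TEXT NOT NULL,
        system_id     TEXT NOT NULL,
        PRIMARY KEY (prediction_id, system_id)
    )
""")
con.executemany(
    "INSERT INTO prediction_systems VALUES (?, ?)",
    [("p1", "s1"), ("p1", "s2")],
)
# Look up all systems used by prediction p1.
rows = con.execute(
    "SELECT system_id FROM prediction_systems "
    "WHERE prediction_id = 'p1' ORDER BY system_id"
).fetchall()
```

The composite primary key prevents the same system from being attached to a prediction twice.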
Indexes for Performance
CREATE INDEX idx_predictions_question ON predictions(question_id);
CREATE INDEX idx_predictions_date ON predictions(prediction_date);
CREATE INDEX idx_predictions_ci ON predictions(convergence_index);
CREATE INDEX idx_questions_domain ON questions(domain);
CREATE INDEX idx_questions_outcome_date ON questions(outcome_date);
CREATE INDEX idx_outcomes_accuracy ON outcomes(accuracy_binary);
Case Classification Taxonomy
Domain Classification
Level 1: Primary Domain
- Economic (GDP, markets, employment, inflation, trade)
- Political (elections, policy, conflicts, diplomacy)
- Technological (AI, biotech, energy, computing, space)
- Health (pandemics, medical breakthroughs, public health)
- Natural (weather, climate, earthquakes, natural disasters)
- Social (demographics, culture, education, crime)
- Sports (competitions, records, championships)
Level 2: Subdomain
Economic → Stock Market, GDP Growth, Unemployment, Housing, Commodities, Currency, Inflation
Political → Elections, Legislation, International Relations, Conflicts, Leadership Changes
Technological → AI Development, Biotech, Clean Energy, Space Exploration, Computing
Health → Pandemics, Drug Approvals, Disease Outbreaks, Healthcare Policy
Natural → Hurricanes, Earthquakes, Climate Events, Wildfires
Question Type Classification
- Binary: YES/NO questions ("Will X happen?")
- Numerical: Quantitative predictions ("What will GDP growth be?")
- Categorical: Multiple choice ("Who will win the election?")
- Temporal: Timing predictions ("When will X happen?")
- Probabilistic: Probability estimates ("What is the probability of X?")
Difficulty Classification
Easy (Difficulty = 1):
- Short prediction horizon (< 1 month)
- High base rate (> 70% or < 30%)
- Clear historical patterns
- Example: "Will the sun rise tomorrow?" (trivial), "Will S&P 500 be positive this year?" (base rate ~70%)
Moderate (Difficulty = 2):
- Medium prediction horizon (1-6 months)
- Moderate base rate (30-70%)
- Some historical patterns
- Example: "Will GDP grow next quarter?"
Hard (Difficulty = 3):
- Long prediction horizon (> 6 months)
- Base rate near 50% (maximum uncertainty)
- No clear historical patterns
- Example: "Will there be a recession in 2027?"
Very Hard (Difficulty = 4):
- Very long horizon (> 2 years)
- Novel events (no precedent)
- High complexity (many variables)
- Example: "When will AGI be achieved?"
Stakes Classification
Low Stakes: Minimal impact (sports outcomes, entertainment)
Medium Stakes: Moderate impact (quarterly earnings, local elections)
High Stakes: Major impact (recessions, wars, pandemics, technological breakthroughs)
Data Entry and Quality Assurance
Prediction Submission Workflow
Step 1: Contributor Registration
- Create account with email verification
- Provide affiliation, ORCID (optional)
- Agree to data sharing license (CC BY 4.0)
Step 2: Question Creation or Selection
- Search existing questions (avoid duplicates)
- If new question: Fill form with question text, domain, type, difficulty, stakes
- If existing question: Select from database
Step 3: Prediction Entry
- Enter predicted outcome (YES/NO or numerical value)
- Enter confidence level (0-1)
- Select systems used (from dropdown)
- Enter convergence index (if multi-system)
- Describe methodology (free text)
Step 4: Validation
- Automated checks: Confidence in range [0,1], CI in range [0,1], required fields filled
- Duplicate detection: Flag if very similar prediction already exists
- Plausibility check: Flag if prediction seems unreasonable (e.g., CI=1.0 for very hard question)
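The automated checks in Step 4 might look like the following sketch. Field names follow the predictions table; the function itself and its exact thresholds are illustrative:

```python
def validate_submission(pred: dict) -> list[str]:
    """Run the Step 4 automated checks; return a list of problems (empty = pass)."""
    errors = []
    # Required fields must be present and non-empty.
    for field in ("question_id", "prediction_date",
                  "predicted_outcome", "confidence"):
        if pred.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    # Range checks on confidence and convergence index.
    conf = pred.get("confidence")
    if conf is not None and not (0 <= conf <= 1):
        errors.append("confidence out of range [0, 1]")
    ci = pred.get("convergence_index")
    if ci is not None and not (0 <= ci <= 1):
        errors.append("convergence_index out of range [0, 1]")
    # Plausibility: flag perfect convergence on a very hard question.
    if pred.get("difficulty") == 4 and ci == 1.0:
        errors.append("implausible: CI = 1.0 on a very hard question")
    return errors
```

Duplicate detection would sit alongside this, comparing the submission against existing predictions on the same question.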
Step 5: Review (for new contributors)
- First 5 predictions from new contributor reviewed by moderators
- After 5 approved predictions, auto-approve future submissions
- Reputation score increases with verified accurate predictions
Outcome Verification Workflow
Step 1: Outcome Occurs
- System automatically flags predictions where outcome_date has passed
- Email sent to original contributor: "Please verify outcome for your prediction"
Step 2: Outcome Entry
- Contributor or verifier enters actual outcome
- Provides verification source (news article, official data, etc.)
- System calculates accuracy metrics (binary accuracy, Brier score, etc.)
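The metrics named in Step 2 are standard scoring rules; a minimal implementation matching the outcomes table's columns:

```python
import math

def binary_accuracy(predicted: str, actual: str) -> bool:
    """accuracy_binary: did the predicted outcome match the actual one?"""
    return predicted == actual

def brier_score(confidence: float, outcome_occurred: bool) -> float:
    """Squared error between forecast probability and the 0/1 outcome.
    0 is perfect; 0.25 is an uninformative 50% forecast."""
    return (confidence - (1.0 if outcome_occurred else 0.0)) ** 2

def log_loss(confidence: float, outcome_occurred: bool,
             eps: float = 1e-15) -> float:
    """Negative log-likelihood of the outcome; heavily penalizes
    confident misses. Probabilities are clipped to avoid log(0)."""
    p = min(max(confidence, eps), 1 - eps)
    return -math.log(p if outcome_occurred else 1 - p)
```

A 90%-confident correct prediction scores a Brier of 0.01; the same confidence on a miss scores 0.81.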
Step 3: Independent Verification
- Second contributor independently verifies outcome
- If agreement: Outcome marked as verified
- If disagreement: Flagged for moderator review
- Inter-rater reliability tracked (Cohen's kappa)
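Cohen's kappa corrects raw agreement between two verifiers for the agreement expected by chance; a self-contained sketch:

```python
def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items.
    1.0 = perfect agreement, 0.0 = chance-level agreement."""
    assert rater_a and len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items where raters match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence of the two raters.
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # degenerate case: both raters always use one label
    return (p_observed - p_expected) / (1 - p_expected)
```

Two verifiers who agree on every outcome score 1.0; verifiers whose matches are no better than chance score 0.0.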
Step 4: Quality Scoring
- Verified outcomes: Quality = High (gold badge)
- Single verification: Quality = Medium (blue badge)
- Unverified: Quality = Low (gray badge)
Open Data Sharing
Data Access Methods
1. Public API
Endpoints:
GET /api/v1/predictions
- Query parameters: domain, date_range, min_ci, max_ci, limit, offset
- Returns: JSON array of predictions
GET /api/v1/predictions/{prediction_id}
- Returns: Full prediction details including outcome
GET /api/v1/questions
- Query parameters: domain, difficulty, stakes
- Returns: JSON array of questions
GET /api/v1/analytics/convergence-accuracy
- Returns: Aggregated statistics (correlation, accuracy by CI range)
POST /api/v1/predictions
- Requires authentication
- Body: Prediction data (JSON)
- Returns: Created prediction with ID
Authentication: API key (free registration)
Rate limits: 1000 requests/hour for free tier, unlimited for researchers
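A client might assemble requests against GET /api/v1/predictions like this. The parameter names come from the endpoint description above; the helper function and host are our own sketch:

```python
from urllib.parse import urlencode

BASE_URL = "https://predictiondb.org/api/v1"

def predictions_url(domain: str = None, min_ci: float = None,
                    max_ci: float = None, limit: int = 100,
                    offset: int = 0) -> str:
    """Build a query URL for the predictions endpoint, including only
    the filters the caller actually set."""
    params = {"limit": limit, "offset": offset}
    if domain:
        params["domain"] = domain
    if min_ci is not None:
        params["min_ci"] = min_ci
    if max_ci is not None:
        params["max_ci"] = max_ci
    return f"{BASE_URL}/predictions?{urlencode(params)}"
```

The URL would then be fetched with the caller's API key attached; the exact authentication header name would depend on the API's documentation.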
2. Data Downloads
Formats:
- CSV (for Excel, R, Python pandas)
- JSON (for web applications, JavaScript)
- Parquet (for big data tools, Spark, Dask)
- SQL dump (for database import)
Download options:
- Full database (all predictions, updated monthly)
- Filtered subset (by domain, date range, etc.)
- Incremental updates (only new/changed records since last download)
3. SQL Query Interface
Web-based SQL editor:
- Write custom SQL queries
- Preview results (first 100 rows)
- Download full results
- Save queries for reuse
Example queries:
-- Accuracy by convergence index range
SELECT
CASE
WHEN convergence_index >= 0.8 THEN 'High (≥0.8)'
WHEN convergence_index >= 0.5 THEN 'Moderate (0.5-0.8)'
ELSE 'Low (<0.5)'
END AS ci_range,
COUNT(*) AS n_predictions,
AVG(CASE WHEN o.accuracy_binary THEN 1 ELSE 0 END) AS accuracy
FROM predictions p
JOIN outcomes o ON p.prediction_id = o.prediction_id
WHERE o.outcome_verified = TRUE
GROUP BY ci_range;
-- Top performing systems
SELECT
s.system_name,
COUNT(*) AS n_predictions,
AVG(CASE WHEN o.accuracy_binary THEN 1 ELSE 0 END) AS accuracy,
AVG(o.brier_score) AS avg_brier_score
FROM systems s
JOIN prediction_systems ps ON s.system_id = ps.system_id
JOIN predictions p ON ps.prediction_id = p.prediction_id
JOIN outcomes o ON p.prediction_id = o.prediction_id
WHERE o.outcome_verified = TRUE
GROUP BY s.system_name
HAVING COUNT(*) >= 20
ORDER BY accuracy DESC;
Data Licensing
License: Creative Commons Attribution 4.0 (CC BY 4.0)
Permissions:
- ✓ Share: Copy and redistribute in any format
- ✓ Adapt: Remix, transform, build upon the data
- ✓ Commercial use: Use for commercial purposes
Requirements:
- Attribution: Cite the database and original contributors
- No additional restrictions: Can't apply legal terms that restrict others
Citation format:
Prediction Accuracy Database (2026). Retrieved from https://predictiondb.org. DOI: 10.5281/zenodo.XXXXXXX
Privacy and Ethics
No personal data: Predictions are about public events, not individuals
Contributor anonymity: Contributors can choose to be anonymous (username only, no real name)
Ethical use policy:
- Data should not be used to manipulate markets
- Data should not be used to harm individuals or groups
- Researchers should follow ethical guidelines for their field
Database Statistics (Example)
Current State (as of January 2026)
- Total predictions: 12,547
- Verified outcomes: 8,234 (66%)
- Contributors: 523
- Questions: 3,891
- Systems tracked: 87
- Studies using database: 34
- API requests (last month): 45,000
Breakdown by Domain
| Domain | Predictions | Verified | Avg CI | Avg Accuracy |
|---|---|---|---|---|
| Economic | 3,200 | 2,100 | 0.68 | 74% |
| Political | 2,800 | 1,900 | 0.65 | 71% |
| Technological | 2,100 | 1,400 | 0.72 | 76% |
| Health | 1,500 | 1,000 | 0.74 | 78% |
| Natural | 1,200 | 900 | 0.45 | 58% |
| Social | 1,000 | 700 | 0.63 | 69% |
| Sports | 747 | 234 | 0.58 | 65% |
Quality Metrics
- Inter-rater reliability (Cohen's kappa): 0.87 (excellent agreement)
- Data completeness: 94% (all required fields filled)
- Verification rate: 66% (outcomes verified within 30 days)
- Duplicate rate: 2% (very low, good deduplication)
Use Cases
Use Case 1: Meta-Analysis
Researcher: "I want to analyze the convergence-accuracy relationship across all domains"
Query:
SELECT q.domain, p.convergence_index, o.accuracy_binary
FROM predictions p
JOIN questions q ON p.question_id = q.question_id
JOIN outcomes o ON p.prediction_id = o.prediction_id
WHERE o.outcome_verified = TRUE
  AND p.convergence_index IS NOT NULL;
Result: 8,234 verified predictions with CI and accuracy data
Analysis: Calculate correlation r = 0.71 (matches meta-analysis from Article 6)
Use Case 2: System Performance Comparison
Practitioner: "Which prediction systems are most accurate for economic questions?"
Query:
SELECT
  s.system_name,
  COUNT(*) AS n,
  AVG(CASE WHEN o.accuracy_binary THEN 1 ELSE 0 END) AS accuracy
FROM systems s
JOIN prediction_systems ps ON s.system_id = ps.system_id
JOIN predictions p ON ps.prediction_id = p.prediction_id
JOIN questions q ON p.question_id = q.question_id
JOIN outcomes o ON p.prediction_id = o.prediction_id
WHERE q.domain = 'Economic'
  AND o.outcome_verified = TRUE
GROUP BY s.system_name
HAVING COUNT(*) >= 50
ORDER BY accuracy DESC
LIMIT 10;
Result: Top 10 systems for economic predictions
Use Case 3: Temporal Analysis
Researcher: "How does prediction accuracy vary by prediction horizon?"
Query:
SELECT
CASE
WHEN q.prediction_horizon_days < 30 THEN 'Short (<1 month)'
WHEN q.prediction_horizon_days < 180 THEN 'Medium (1-6 months)'
WHEN q.prediction_horizon_days < 365 THEN 'Long (6-12 months)'
ELSE 'Very Long (>1 year)'
END AS horizon,
AVG(p.convergence_index) AS avg_ci,
AVG(CASE WHEN o.accuracy_binary THEN 1 ELSE 0 END) AS accuracy
FROM predictions p
JOIN questions q ON p.question_id = q.question_id
JOIN outcomes o ON p.prediction_id = o.prediction_id
WHERE o.outcome_verified = TRUE
GROUP BY horizon;
Result: Confirms that short-term predictions have higher CI and accuracy
Future Enhancements
Planned Features
- Machine learning integration: Train models on database to predict accuracy from CI
- Real-time dashboard: Live visualization of active predictions
- Prediction markets integration: Compare convergence to market prices
- Automated outcome verification: Scrape news/data sources to auto-verify outcomes
- Reputation system: Gamification to encourage quality contributions
- Collaboration tools: Teams can work together on predictions
Scaling Plan
- Current: 12K predictions, PostgreSQL on single server
- Target (2027): 100K predictions, distributed database (Cassandra or CockroachDB)
- Target (2030): 1M predictions, global research infrastructure
Conclusion: Building Collective Intelligence
The Prediction Accuracy Database transforms prediction from isolated research to collective scientific infrastructure:
- Standardized schema: Predictions, Questions, Systems, Outcomes, Metrics tables
- Comprehensive taxonomy: Domain, subdomain, type, difficulty, stakes classification
- Open access: Public API, data downloads (CSV/JSON/Parquet), SQL query interface
- Quality assurance: Verification workflow, inter-rater reliability, quality scoring
- Collaborative: 523 contributors, 12,547 predictions, 34 studies
The framework:
- Design database schema (predictions, questions, outcomes, systems)
- Implement classification taxonomy (domain, type, difficulty, stakes)
- Build submission workflow (registration, entry, validation, review)
- Enable open access (API, downloads, SQL interface)
- Ensure quality (verification, inter-rater reliability, scoring)
- Scale and enhance (ML integration, real-time dashboard, automation)
This is prediction science as open infrastructure. Not proprietary data, but shared commons. Not isolated studies, but collective knowledge.
12,547 predictions. 523 contributors. 34 studies. One database.
This is how science advances. Through collaboration. Through sharing. Through building together.
The database is open. The data is free. The science is collective.
Contribute your predictions. Verify outcomes. Build on the commons. Advance the science.
This is the future of prediction research. Open. Collaborative. Cumulative. Powerful.