Can AI Agents Predict What
Consultants Cannot?
A systematic evaluation of multi-agent simulation for strategic decision-making, built on the MiroFish swarm-intelligence architecture and the peer-reviewed OASIS simulation engine.
All seed documents, ground truth data, and accuracy scores in this research are derived from publicly verifiable sources - news reports, SEC filings, earnings calls, regulatory announcements, and official company statements. Every claim can be independently verified. No proprietary or privileged data was used in any simulation.
88%
Overall Accuracy
10
Case Studies
103
Dimensions Tested
60%
Beat Human Experts
Accuracy by Domain
Performance varies by scenario complexity - multi-stakeholder domains yield the highest accuracy
All 10 Case Studies
Each row links to the full case study with per-dimension prediction details
Zomato Fee Hike
Liberation Day Tariffs
Meta Policy Change
IIMA Blended MBA Launch
Oracle 30K Layoffs
HP-Autonomy Acquisition
Quibi Launch
JC Penney Fair & Square
Meta Metaverse Pivot
S&P 500 2024 Forecast
Methodology
Every simulation follows the same reproducible 5-step pipeline
Seed Document
Publicly available pre-event data compiled into a structured scenario brief
Agent Generation
MiroFish creates 30-50+ agents with distinct personas, incentives, and behavioral patterns
Multi-Round Simulation
Agents interact over 30 rounds in Twitter- and Reddit-like environments via the OASIS engine
Report Synthesis
GPT-5.2 synthesizes agent interactions into a strategic forecast with specific predictions
Accuracy Scoring
Each prediction dimension scored as Hit, Partial, or Miss against documented outcomes
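The scoring step above can be sketched as a small aggregation function. The numeric weights (Hit = 1, Partial = 0.5, Miss = 0) are an illustrative assumption, not the study's published rubric:

```python
# Assumed weights for illustration -- the study's exact rubric may differ.
SCORE = {"hit": 1.0, "partial": 0.5, "miss": 0.0}

def accuracy(labels):
    """Aggregate per-dimension Hit/Partial/Miss labels into a percentage."""
    if not labels:
        return 0.0
    return 100.0 * sum(SCORE[label] for label in labels) / len(labels)
```

Under these weights, a case scored ["hit", "hit", "partial", "miss"] aggregates to 62.5%.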
Key Findings
What we learned from running 10 simulations with 393 agents in total
88% Directional Accuracy
Across 103 dimensions in 10 case studies, the simulation correctly predicted the direction of outcomes 88% of the time.
Non-Obvious Insights in Every Case
Every simulation surfaced at least one insight that was not part of mainstream expert or media analysis at the time.
Beat Human Experts 60% of the Time
In the 5 cases where a named expert made a public prediction, the simulation would have provided a materially better forecast in 3 of them.
Strongest in Multi-Stakeholder Domains
Accuracy was highest (95-100%) in scenarios with complex stakeholder interaction - pricing, trade policy, platform governance.
Weakest in Magnitude Prediction
The simulation predicts direction well but struggles with precise magnitude (e.g., S&P 500 level predictions).
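The direction-versus-magnitude distinction can be made concrete: a forecast counts as a directional hit when the predicted and realized changes share a sign, regardless of size. A minimal sketch (function name and flat-call handling are illustrative assumptions):

```python
def directional_hit(predicted_change, actual_change):
    """True when the forecast got the sign of the move right.

    Magnitude is ignored: a predicted +2% against a realized +20%
    still counts as a directional hit, though it is a magnitude miss.
    """
    if predicted_change == 0 or actual_change == 0:
        # Illustrative choice: a flat call or flat outcome makes no directional claim.
        return False
    return (predicted_change > 0) == (actual_change > 0)
```

This is why an index-level forecast can score well directionally while still missing the actual S&P 500 closing level by a wide margin.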
Limitations
Transparency about what this system can and cannot do
Retrospective Evaluation
Simulations were run retrospectively on past events whose outcomes are already known, not as live forward forecasts.
Direction, Not Magnitude
The simulation predicts 'what happens,' not 'by exactly how much.'
LLM-Dependent
Agent quality is bounded by the underlying model's knowledge cutoff and reasoning.
Single Evaluator
Accuracy scoring performed by one research team, not independently replicated.
Want the full paper?
Get the complete research document with methodology details, raw data, and granular scoring for all 103 dimensions.