Built on MiroFish + the peer-reviewed OASIS simulation engine

Can AI Agents Predict What Consultants Cannot?

A systematic evaluation of multi-agent simulation for strategic decision-making, built on MiroFish swarm intelligence architecture and the peer-reviewed OASIS simulation engine.

All seed documents, ground truth data, and accuracy scores in this research are derived from publicly verifiable sources - news reports, SEC filings, earnings calls, regulatory announcements, and official company statements. Every claim can be independently verified. No proprietary or privileged data was used in any simulation.

88%

Overall Accuracy

10

Case Studies

103

Dimensions Tested

60%

Beat Human Experts

Accuracy by Domain

Performance varies by scenario complexity - multi-stakeholder domains yield the highest accuracy

Multi-Stakeholder (Consumer Pricing, Trade Policy, Platform Governance) - 97%
Corporate / M&A (Corporate Strategy, M&A Advisory, CEO Strategy) - 90%
Institutional / Education (Higher Education) - 95%
Product / Market (Product Launch, Strategic Pivot) - 82%
Financial Markets - 60%

All 10 Case Studies


Case Study | Domain | Year | Accuracy | Hits | Partials | Misses
Zomato Fee Hike | Consumer Pricing | 2026 | 100% | 11 | 0 | 0
Liberation Day Tariffs | Trade Policy | 2025 | 95% | 9 | 1 | 0
Meta Policy Change | Platform Governance | 2025 | 95% | 10 | 1 | 0
IIMA Blended MBA Launch | Higher Education | 2024 | 95% | 10 | 1 | 0
Oracle 30K Layoffs | Corporate Strategy | 2026 | 94% | 8 | 1 | 0
HP-Autonomy Acquisition | M&A Advisory | 2011 | 90% | 9 | 1 | 0
Quibi Launch | Product Launch | 2020 | 90% | 9 | 1 | 0
JC Penney Fair & Square | CEO Strategy | 2012 | 85% | 8 | 1 | 1
Meta Metaverse Pivot | Strategic Pivot | 2021 | 73% | 7 | 2 | 2
S&P 500 2024 Forecast | Financial Markets | 2024 | 60% | 5 | 2 | 3
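The per-domain figures in the chart above are consistent with simple unweighted means of the per-case accuracies in this table. Assuming that aggregation (an assumption for illustration; the underlying report may weight cases differently), the grouping can be reproduced in a few lines of Python:

```python
# Per-case accuracy scores (%) from the table above, grouped by domain.
cases = {
    "Multi-Stakeholder": [100, 95, 95],   # Zomato, Liberation Day, Meta Policy
    "Corporate / M&A": [94, 90, 85],      # Oracle, HP-Autonomy, JC Penney
    "Institutional / Education": [95],    # IIMA
    "Product / Market": [90, 73],         # Quibi, Meta Metaverse
    "Financial Markets": [60],            # S&P 500
}

# Unweighted mean per domain, rounded to a whole percent.
domain_accuracy = {
    domain: round(sum(scores) / len(scores))
    for domain, scores in cases.items()
}

print(domain_accuracy)
# Multi-Stakeholder -> 97, Corporate / M&A -> 90, Product / Market -> 82, ...
```

Each computed mean matches the published domain figure, which suggests the domain chart is a straight roll-up of the case table rather than a separately weighted metric.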

Methodology

Every simulation follows the same reproducible 5-step pipeline

STEP 1

Seed Document

Publicly available pre-event data compiled into a structured scenario brief

STEP 2

Agent Generation

MiroFish creates 30-50+ agents with distinct personas, incentives, and behavioral patterns

STEP 3

Multi-Round Simulation

Agents interact over 30 rounds in Twitter- and Reddit-like environments via the OASIS engine

STEP 4

Report Synthesis

GPT-5.2 synthesizes agent interactions into a strategic forecast with specific predictions

STEP 5

Accuracy Scoring

Each prediction dimension scored as Hit, Partial, or Miss against documented outcomes
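The scoring step can be sketched in Python. The Hit/Partial/Miss weights (1 / 0.5 / 0) and the mean-of-cases aggregation below are illustrative assumptions, not the report's confirmed formula, though they reproduce the headline figures:

```python
# Hedged sketch of Step 5. Assumed weights: Hit = 1.0, Partial = 0.5,
# Miss = 0.0 -- an illustration, not the report's documented rubric.
WEIGHTS = {"hit": 1.0, "partial": 0.5, "miss": 0.0}

def case_score(hits: int, partials: int, misses: int) -> float:
    """Score one case as a percentage of its prediction dimensions."""
    total = hits + partials + misses
    earned = (hits * WEIGHTS["hit"]
              + partials * WEIGHTS["partial"]
              + misses * WEIGHTS["miss"])
    return 100 * earned / total

# Example: JC Penney Fair & Square (8 hits, 1 partial, 1 miss over 10 dims).
print(round(case_score(8, 1, 1)))  # 85, matching the published figure

# Overall accuracy as the unweighted mean of the 10 published case scores.
published = [100, 95, 95, 95, 94, 90, 90, 85, 73, 60]
print(round(sum(published) / len(published)))  # 88
```

Under these assumptions the mean of the ten published case scores comes out at 87.7%, rounding to the reported 88% overall accuracy.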

Key Findings

What we learned from running 10 simulations across 393 agents

88%

88% Directional Accuracy

Across 103 dimensions in 10 case studies, the simulation correctly predicted the direction of outcomes 88% of the time.

9/9

Non-Obvious Insights in Every Case

Every simulation surfaced at least one insight that was not part of mainstream expert or media analysis at the time.

60%

Beat Human Experts 60% of the Time

In the 5 cases where a named expert made a public prediction, the simulation would have provided a materially better forecast in 3 of them.

95%+

Strongest in Multi-Stakeholder Domains

Accuracy was highest (95-100%) in scenarios with complex stakeholder interaction - pricing, trade policy, platform governance.

60%

Weakest in Magnitude Prediction

The simulation predicts direction well but struggles with precise magnitude (e.g., S&P 500 level predictions).

Limitations

Transparency about what this system can and cannot do

Retrospective Evaluation

Simulations were run retrospectively on past events whose outcomes are already known; prospective forecasting accuracy has not yet been tested.

Direction, Not Magnitude

The simulation predicts 'what happens,' not 'by exactly how much.'

LLM-Dependent

Agent quality is bounded by the underlying model's knowledge cutoff and reasoning.

Single Evaluator

Accuracy scoring performed by one research team, not independently replicated.

Want the full paper?

Get the complete research document with methodology details, raw data, and granular scoring for all 103 dimensions.