Can AI Agents Predict What
Consultants Cannot?
A systematic evaluation of multi-agent simulation for strategic decision-making, built on the MiroFish swarm-intelligence architecture and the peer-reviewed OASIS simulation engine.
All seed documents, ground truth data, and accuracy scores in this research are derived from publicly verifiable sources - news reports, SEC filings, earnings calls, regulatory announcements, and official company statements. Every claim can be independently verified. No proprietary or privileged data was used in any simulation.
88%
Overall Accuracy
10
Case Studies
103
Dimensions Tested
60%
Beat Human Experts
Accuracy by Domain
Performance varies by scenario complexity - multi-stakeholder domains yield the highest accuracy
All 10 Case Studies
Each row links to the full case study with per-dimension prediction details
Zomato Fee Hike
Liberation Day Tariffs
Meta Policy Change
IIMA Blended MBA Launch
Oracle 30K Layoffs
HP-Autonomy Acquisition
Quibi Launch
JC Penney Fair & Square
Meta Metaverse Pivot
S&P 500 2024 Forecast
Methodology
Every simulation follows the same reproducible 5-step pipeline
Seed Document
Publicly available pre-event data compiled into a structured scenario brief
Agent Generation
MiroFish creates 30-50+ agents with distinct personas, incentives, and behavioral patterns
Multi-Round Simulation
Agents interact over 30 rounds in Twitter- and Reddit-like environments via the OASIS engine
Report Synthesis
GPT-5.2 synthesizes agent interactions into a strategic forecast with specific predictions
Accuracy Scoring
Each prediction dimension scored as Hit, Partial, or Miss against documented outcomes
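The scoring step above can be sketched as a small aggregation function. The numeric weights (Hit = 1, Partial = 0.5, Miss = 0) are an illustrative assumption, not the study's published rubric:

```python
# Assumed weights for illustration -- the study's exact rubric may differ.
SCORE = {"hit": 1.0, "partial": 0.5, "miss": 0.0}

def accuracy(labels):
    """Aggregate per-dimension Hit/Partial/Miss labels into a percentage."""
    if not labels:
        return 0.0
    return 100.0 * sum(SCORE[label] for label in labels) / len(labels)
```

Under these weights, a case scored ["hit", "hit", "partial", "miss"] aggregates to 62.5%.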
Key Findings
What we learned from running 10 simulations with 393 agents in total
88% Directional Accuracy
Across 103 dimensions in 10 case studies, the simulation correctly predicted the direction of outcomes 88% of the time.
Non-Obvious Insights in Every Case
Every simulation surfaced at least one insight that was not part of mainstream expert or media analysis at the time.
Beat Human Experts 60% of the Time
In the 5 cases where a named expert made a public prediction, the simulation would have provided a materially better forecast in 3 of them.
Strongest in Multi-Stakeholder Domains
Accuracy was highest (95-100%) in scenarios with complex stakeholder interaction - pricing, trade policy, platform governance.
Weakest in Magnitude Prediction
The simulation predicts direction well but struggles with precise magnitude (e.g., S&P 500 level predictions).
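The direction-versus-magnitude distinction can be made concrete: a forecast counts as a directional hit when the predicted and realized changes share a sign, regardless of size. A minimal sketch (function name and flat-call handling are illustrative assumptions):

```python
def directional_hit(predicted_change, actual_change):
    """True when the forecast got the sign of the move right.

    Magnitude is ignored: a predicted +2% against a realized +20%
    still counts as a directional hit, though it is a magnitude miss.
    """
    if predicted_change == 0 or actual_change == 0:
        # Illustrative choice: a flat call or flat outcome makes no directional claim.
        return False
    return (predicted_change > 0) == (actual_change > 0)
```

This is why an index-level forecast can score well directionally while still missing the actual S&P 500 closing level by a wide margin.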
Limitations
Transparency about what this system can and cannot do
Retrospective Evaluation
Simulations were run retrospectively on past events whose outcomes are already known, not as live forward forecasts.
Direction, Not Magnitude
The simulation predicts 'what happens,' not 'by exactly how much.'
LLM-Dependent
Agent quality is bounded by the underlying model's knowledge cutoff and reasoning.
Single Evaluator
Accuracy scoring performed by one research team, not independently replicated.
Want the full paper?
Get the complete research document with methodology details, raw data, and granular scoring for all 103 dimensions.