The Numbers
Across 10 case studies spanning consumer pricing, trade policy, platform governance, corporate strategy, M&A, product launches, higher education, and financial markets, our MiroFish-powered simulations achieved:
- 103 total dimensions scored
- 90.5 correct (HITs + 0.5 × PARTIALs)
- 88% overall directional accuracy
- Strongest domain: Multi-stakeholder dynamics (96%)
- Weakest domain: Quantitative market forecasting (60%)
The 10 Cases at a Glance
| Case | Domain | Accuracy | Dimensions |
|---|---|---|---|
| Zomato Fee Hike | Consumer Pricing | 100% | 11/11 |
| Liberation Day Tariffs | Trade Policy | 95% | 9.5/10 |
| Meta Policy Change | Platform Governance | 95% | 10.5/11 |
| Oracle Layoffs | Corporate Strategy | 94% | 8.5/9 |
| HP-Autonomy | M&A Advisory | 90% | 9/10 |
| Quibi Launch | Product Launch | 90% | 9/10 |
| JC Penney EDLP | Retail Strategy | 85% | 8.5/10 |
| Meta Metaverse | Strategic Pivot | 73% | 8/11 |
| IIMA Blended MBA | Higher Education | 95% | 10.5/11 |
| S&P 500 Forecast | Financial Markets | 60% | 6/10 |
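The headline figure is just a weighted roll-up of the rows above. A minimal sketch of that arithmetic (case names and scores taken from the table; the `HITs + 0.5 × PARTIALs` credit is already folded into each "correct" count):

```python
# Roll up per-case scores into overall directional accuracy.
cases = {
    "Zomato Fee Hike": (11, 11),
    "Liberation Day Tariffs": (9.5, 10),
    "Meta Policy Change": (10.5, 11),
    "Oracle Layoffs": (8.5, 9),
    "HP-Autonomy": (9, 10),
    "Quibi Launch": (9, 10),
    "JC Penney EDLP": (8.5, 10),
    "Meta Metaverse": (8, 11),
    "IIMA Blended MBA": (10.5, 11),
    "S&P 500 Forecast": (6, 10),
}

correct = sum(c for c, _ in cases.values())
total = sum(t for _, t in cases.values())
print(f"{correct}/{total} = {correct / total:.0%}")  # 90.5/103 = 88%
```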
Where Simulation Excels
Multi-Stakeholder Dynamics: 96% Accuracy
When the scenario involves multiple stakeholder groups reacting to each other - consumers, investors, competitors, regulators, media - simulation performs with near-perfect accuracy. The Zomato, tariffs, Meta policy, and Oracle cases all scored at 94% or above.
The reason is structural: multi-agent simulation is specifically designed to model these dynamics. Each agent has distinct incentives, and the emergent behavior that arises from their interactions captures the cascading effects that no single-stakeholder analysis can predict.
Pattern: The more stakeholders involved, the better simulation performs relative to traditional analysis.
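The structural point can be seen in a toy loop. This is not MiroFish's actual model - the agent names, incentive weights, and update rule below are all illustrative - but it shows why coupling agents to each other produces cascades that no single-agent analysis would predict:

```python
import random

random.seed(7)

# Illustrative incentive weights: how each stakeholder privately
# reacts to, say, a price hike. All values are made up for the sketch.
AGENTS = {
    "consumer": -0.6,   # reacts against the hike
    "investor": +0.4,   # rewards margin expansion
    "competitor": -0.3, # exploits the opening
    "regulator": -0.2,
    "media": -0.4,
}

def run(rounds=30):
    sentiment = {name: 0.0 for name in AGENTS}
    for _ in range(rounds):
        # Shared signal: every agent observes the crowd's average mood.
        mood = sum(sentiment.values()) / len(sentiment)
        for name, weight in AGENTS.items():
            # Each agent blends its private incentive with the crowd
            # mood, plus noise. Cascades emerge from this coupling,
            # not from any single agent's rule.
            sentiment[name] += 0.1 * (weight + 0.5 * mood) + random.gauss(0, 0.02)
    return sentiment

print(run())
```

Even the bullish investor gets dragged down by the negative crowd mood - a second-order effect that exists only because the agents interact.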
Behavioral Predictions Over Stated Preferences: Consistently Strong
Across all 10 cases, simulation was consistently better at predicting what people do than what they say. In the Zomato case, the simulation correctly predicted that consumers would adapt (basket inflation, frequency reduction) rather than boycott - the opposite of what surveys would have found.
In the JC Penney case, the simulation predicted that customers would miss coupons for their motivational value (the "hunt"), not their monetary value - a behavioral insight that explained the traffic collapse better than any pricing analysis.
Pattern: Simulation captures revealed preferences, not stated preferences.
Non-Obvious Insight Generation: The Most Valuable Output
Every case study surfaced at least one non-obvious insight that went beyond simple directional prediction:
- Zomato: "Silent frequency collapse" - the real risk is quiet behavior change, not loud backlash
- Tariffs: "Bond market stress as policy constraint" - bonds, not stocks, forced the walk-back
- Meta: "Notes wars" - the new battleground is annotation legitimacy, not content removal
- Oracle: "Paper capacity vs rack capacity" - backlog is a financial artifact, not delivery proof
- HP: "Governance shortcuts as credibility accelerants" - process failures amplify outcome failures
- Quibi: "Conversion hinge" - synchronized churn cliff when free trials end
- JC Penney: "Caught between Walmart and Nordstrom" - positioning death in no-man's-land
Pattern: Emergence is the simulation's superpower. Non-obvious insights arise from agent interactions, not from any individual agent's logic.
Where Simulation Struggles
Quantitative Precision: 60% Accuracy
Our worst performance was the S&P 500 forecast, at 60% accuracy - barely better than a coin flip on several dimensions. The simulation correctly identified the bull-over-bear direction but missed badly on magnitude: we predicted ~5,068, and the actual close was 5,881 - an underestimate of roughly 14%.
This isn't surprising. Multi-agent simulation models social dynamics, not quantitative market mechanics. It can tell you "investors will be bullish" - it can't tell you the S&P will reach exactly 5,881. That requires quantitative models with different architectures.
Pattern: Use simulation for direction, not digits.
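The direction/magnitude split can be made concrete with the S&P numbers above. (The start-of-period baseline here is an illustrative assumption, not part of our scoring.)

```python
# Direction vs magnitude on the S&P 500 case: the bullish call was
# right, but the point forecast missed the close by ~14%.
predicted, actual = 5068, 5881
baseline = 4770  # assumed start-of-period level, for illustration only

direction_hit = (predicted > baseline) == (actual > baseline)
magnitude_error = abs(actual - predicted) / actual

print(direction_hit, f"{magnitude_error:.1%}")  # True 13.8%
```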
Complex Strategic Pivots: 73% Accuracy
The Meta Metaverse case (73%) was our second-weakest performance. The simulation correctly identified many dynamics (regulatory risk, creator resistance, antitrust constraints) but missed the magnitude of financial losses and the eventual pivot to AI that saved the company.
The challenge: strategic pivots play out over years, not weeks. Our simulation models 30 rounds (~14-30 days). The Meta Metaverse story took 3+ years to unfold. The simulation captured the initial dynamics accurately but couldn't model the multi-year trajectory.
Pattern: Simulation is strongest over 2-week to 3-month horizons. Multi-year predictions require different methodologies.
Specific Magnitudes: Consistent Weakness
Across all 10 cases, the simulation's most common miss was magnitude, not direction. We correctly predicted HP's stock would drop, but not the ~20% size of the fall. We correctly predicted JC Penney's sales would decline, but underestimated the -32% same-store crash.
Pattern: "Which direction" is reliable. "How much" is not.
Systematic Biases We've Identified
LLM herd bias: Agent opinions tend to converge faster than real human populations. We partially mitigate this with diverse persona specifications, but it remains a factor. In the Oracle case, the simulation was more cautious about investor bullishness than the actual market turned out to be, likely because the bear-case agents were too persuasive relative to real-world investment dynamics.
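One way we can watch for herd bias is to track how fast the spread of agent opinions collapses across rounds: a steadily shrinking dispersion is the herding signature. A minimal diagnostic sketch (the opinion trajectory below is made-up data, not from a real run):

```python
import statistics

def dispersion(opinions_by_round):
    """Population std-dev of agent opinions at each simulation round.

    A monotone, rapid collapse toward zero suggests the agent
    population is herding faster than a human panel would.
    """
    return [statistics.pstdev(opinions) for opinions in opinions_by_round]

# Three agents converging over four rounds (illustrative numbers):
trajectory = [
    [0.90, 0.10, 0.50],
    [0.80, 0.20, 0.50],
    [0.65, 0.35, 0.50],
    [0.55, 0.45, 0.50],
]
print(dispersion(trajectory))
```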
Recency bias from training data: GPT-5.2's training data includes information about events we're simulating. While we use strict pre-event seed documents, the model's prior knowledge can subtly influence agent behavior. This is the biggest methodological concern and the reason we're transparent about it.
Western-centric behavioral patterns: When simulating Indian market dynamics, agents occasionally default to behavioral patterns more typical of US consumers. Our human analysts correct for this, but it requires vigilance.
What 88% Means in Practice
Is 88% good? It depends on the alternative.
Wall Street consensus forecasts for the S&P 500 have been wrong by an average of 13-15% annually over the past decade. CEO predictions about their own strategic initiatives fail at rates well above 50% (see: JC Penney, HP, Quibi, Meta Metaverse).
88% directional accuracy across diverse domains - when the typical alternative is confidence-weighted guessing - is a meaningful improvement. Not because 88% is perfect, but because the realistic alternatives fall far short of it.
The value proposition isn't "we're always right." It's "we're right more often than the alternatives, we're faster, we're cheaper, and we surface risks that the alternatives systematically miss."
That's a proposition worth building a firm on.
Explore the individual case studies and their scoring details at /case-studies. Want to add simulation to your decision process? Contact us.