Autonomous Cross-Disciplinary Scientific Hypothesis Generation via Adversarial Multi-Agent Systems
Alberto Trivero · Kakashi Venture Accelerator · Torino, Italy
Two predictions tested against real data
MAGELLAN produces predictions specific enough to test computationally. We tested two against public datasets and first-principles mathematics. Both yielded novel scientific findings, including a case where the system generated its own diagnostic for a failed prediction.
Extreme Value Theory Applied to the Meltome Atlas
EXTREME VALUE THEORY → THERMAL PROTEOMICS
Cramér-Rao Bound for Plant Gravitropism
FISHER INFORMATION THEORY → PLANT GRAVITY SENSING
How MAGELLAN compares
Twelve AI scientific discovery systems evaluated across five axes. Data sourced from published papers, technical reports, and official documentation as of March 2026.
| System | Target Autonomy | Validation | Open Source | Published Results | Domain |
|---|---|---|---|---|---|
| MAGELLAN | Fully autonomous | Adversarial + cross-model | Yes | Yes (273 hyps.) | Life sciences |
| Swanson / Arrowsmith | Manual | Bibliometric | Yes | Yes | Biomedicine |
| SciAgents | Manual | KG paths + Critic | Yes | Partial | Cross-domain |
| Google AI Co-Scientist | Human-in-loop | Elo tournament | No | 3 validated | Biomedicine |
| FutureHouse / Kosmos | Human-defined | Multi-agent | Partial | Partial | Biomedicine |
| Sakana AI Scientist v2 | Semi-auto (ML) | Peer review sim. | Yes | Yes (ML) | Machine Learning |
| POPPER | Human-defined | Falsification | Yes | Yes | General |
| Virtual Lab | Human-defined | Wet-lab testing | Partial | Yes (nanobodies) | Protein eng. |
| MOOSE-Chem | Human-curated | Inspiration papers | Yes | Yes | Chemistry |
| TruthHypo | Human-defined | Grounding check | Yes | Yes | General |
| AlphaEvolve | Semi-auto | Evolutionary | Partial | Yes (algorithms) | Mathematics |
| Aletheia | Semi-auto (math) | Verifier-Reviser | No | Partial (4 proofs) | Mathematics |
Source: Table 1 of the paper. For detailed competitor profiles, see The Landscape.
What 22 sessions revealed
The Session Analyst extracts patterns across discovery sessions. These meta-learning signals now inform future target selection and hypothesis generation.
Mathematical constraints > analogies
Physical law bridges — conservation laws, information bounds, thermodynamic inequalities — survived adversarial critique at substantially higher rates than analogy-based connections. When a mathematical theorem applies, the hypothesis is partially validated by construction.
Disjoint fields > partially explored
Targets where the two fields have zero cross-field citation (classified DISJOINT) produced markedly better hypotheses than partially explored connections; the abstract reports 84% vs. 30% survival for fully disjoint pairs. True blind spots are more productive than incremental extensions.
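The DISJOINT label can be read as a threshold on cross-field citation counts. The following is a minimal sketch, assuming a hypothetical `classify_field_pair` helper and a hypothetical `PARTIALLY_EXPLORED` label for every other case; how MAGELLAN actually counts citations is not described on this page.

```python
def classify_field_pair(cross_field_citations: int) -> str:
    """Label a field pair by the number of citations crossing between the fields.

    Hypothetical helper: only the DISJOINT label and its zero-citation
    criterion come from the text; the other label is an assumption.
    """
    if cross_field_citations < 0:
        raise ValueError("citation count cannot be negative")
    if cross_field_citations == 0:
        return "DISJOINT"
    return "PARTIALLY_EXPLORED"  # assumed name for the non-disjoint case
```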
Strategy performance varies dramatically
The Scout’s 10 exploration strategies produced markedly different survival rates. Converging vocabularies led at 87.5%, followed by tool repurposing at ~67%. Recent breakthrough radiation underperformed at 13%, suggesting trending topics yield less novel connections.
Bridge types predict survival
The structural form of the cross-disciplinary connection predicts survival more reliably than strategy choice. Analytical tool transfers and physical law constraints achieved 100% survival. Direct EM field effects (0/8) and quantum entanglement (0/3) were universally killed — energy-scale mismatch is the consistent failure mode.
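The survival figures quoted on this page are plain ratios of surviving to generated hypotheses. As a small sketch, assuming a hypothetical `survival_rate` helper, the counts stated in the text (0/8, 0/3, and the overall 102 of 273) can be turned into percentages like this:

```python
def survival_rate(survived: int, generated: int) -> float:
    """Return the fraction of generated hypotheses that survived the pipeline."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    if not 0 <= survived <= generated:
        raise ValueError("survived must lie between 0 and generated")
    return survived / generated


# Counts quoted on this page: (survived, generated).
quoted_counts = {
    "all hypotheses": (102, 273),
    "direct EM field effects": (0, 8),
    "quantum entanglement": (0, 3),
}

for label, (survived, generated) in quoted_counts.items():
    rate = survival_rate(survived, generated)
    print(f"{label}: {survived}/{generated} = {rate:.0%}")
```

Per-strategy and per-bridge-type counts beyond those listed are not given here, so the sketch does not attempt to reproduce the 87.5%, ~67%, or 13% figures.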
Honest limitations
The paper states these explicitly. We repeat them here because transparency is not optional.
No experimental validation
No MAGELLAN hypothesis has been tested in a laboratory. The computational verifications use published data and first-principles mathematics. The distance between “computationally plausible” and “experimentally confirmed” is the central gap in our evidence.
Life sciences bias
All 22 sessions involve at least one biological domain. Physics, mathematics, social sciences, and engineering are underrepresented. This reflects both retrieval infrastructure (PubMed, KEGG, STRING) and scoring bias toward experimental testability.
Missing ablations
No controlled experiments have yet isolated which pipeline components contribute most to hypothesis quality. The 37% survival rate is a property of the complete system; we cannot currently attribute it to any specific component.
Citation hallucination
Approximately 2–3% of grounded claims contain citation errors — predominantly conflations (correct finding, wrong attribution) rather than fabrications. A single fabricated citation triggers automatic Quality Gate failure.
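The zero-tolerance rule for fabricated citations can be sketched as a single predicate. All names below (`CitationStatus`, `GroundedClaim`, `passes_citation_check`) are hypothetical and chosen for illustration; in particular, letting conflations pass is an assumption of this sketch, since the text only states that fabrications trigger failure.

```python
from dataclasses import dataclass
from enum import Enum


class CitationStatus(Enum):
    VERIFIED = "verified"
    CONFLATED = "conflated"    # correct finding, wrong attribution
    FABRICATED = "fabricated"  # cited source does not exist


@dataclass(frozen=True)
class GroundedClaim:
    text: str
    status: CitationStatus


def passes_citation_check(claims: list[GroundedClaim]) -> bool:
    """Return False if any claim carries a fabricated citation.

    Assumption: conflated citations do not fail the check on their own.
    """
    return not any(c.status is CitationStatus.FABRICATED for c in claims)
```

A caller would run this over every grounded claim in a hypothesis and reject the hypothesis when it returns `False`.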
Abstract
We present MAGELLAN, an open-source multi-agent system built on Claude Code (Anthropic) that autonomously generates cross-disciplinary scientific hypotheses by connecting fields that lack mutual citation. Building on Swanson’s “Undiscovered Public Knowledge” principle (1986), MAGELLAN replaces bibliometric methods with frontier large language models (Claude Opus 4.6 and Sonnet 4.6) that have absorbed the literature of multiple disciplines.
The system comprises 15 specialized agents: a Scout that autonomously identifies promising cross-field bridges using 10 formalized strategies, a Generator that creates hypotheses from parametric knowledge, and an adversarial pipeline (Critic with 9 attack vectors, Quality Gate with ~60% rejection rate, independent validation by GPT-5.4 Pro and Gemini 3.1 Pro) that systematically eliminates weak candidates.
Over 22 sessions, MAGELLAN generated 273 hypotheses, of which 102 (37%) survived the full adversarial pipeline. Surviving hypotheses are not surface analogies: they involve first-principles derivations from physical theorems applied to new biological domains, quantitative predictions with specific falsification experiments, and the kind of cross-disciplinary reasoning that would require years of cross-training for a human researcher.
We verify two hypotheses at different levels of a validation taxonomy: (1) applying extreme value theory to the Meltome Atlas (48,000 proteins, 13 species)—the first such application, producing novel findings (universal Weibull-domain behavior, measurement censoring diagnostic) despite failing to confirm the predicted correlation with growth temperature; (2) deriving the Cramér-Rao bound for plant gravitropic sensing from published statolith parameters—a mathematical derivation confirmed from first principles, showing plants operate 1–6× above the fundamental physical limit. A third hypothesis (percolation theory for tumor immune exclusion) has independent convergence evidence from a clinical trial, a patent, and a peer-reviewed publication.
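For readers unfamiliar with the two tools named above, their standard textbook forms are given here as background. These are the general definitions, not the paper's specific derivations, and the symbols ($\xi$, $\mu$, $\sigma$, $\theta$, $I(\theta)$, $n$) are conventional notation introduced for illustration only.

```latex
% Generalized extreme value (GEV) distribution, shape parameter \xi \neq 0
G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{x - \mu}{\sigma} > 0 .
% \xi < 0 : Weibull domain, with finite upper endpoint \mu - \sigma / \xi
% \xi \to 0 : Gumbel domain;  \xi > 0 : Frechet domain

% Cramer-Rao bound for an unbiased estimator \hat{\theta}
% built from n i.i.d. observations with density f(x; \theta)
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n \, I(\theta)},
\qquad
I(\theta) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^{2} \right].
```

In this notation, "Weibull-domain behavior" corresponds to a negative shape parameter $\xi$, and the Cramér-Rao bound sets the smallest variance any unbiased estimator can achieve given the Fisher information $I(\theta)$.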
Meta-learning across sessions reveals that mathematical constraints transfer more reliably than analogies, fully disjoint field pairs produce substantially better hypotheses (84% vs. 30% survival), and the bottleneck in AI-assisted discovery has shifted from hypothesis generation to domain expertise for evaluation.
Unlike proprietary systems (Google AI Co-Scientist, FutureHouse), MAGELLAN is fully open-source (Apache 2.0), fully autonomous in target selection, and publishes all results as CC0 public domain. Code, hypotheses, methodology, and computational verifications are publicly available.
Cite this work
@article{trivero2026magellan,
  title={MAGELLAN: Autonomous Cross-Disciplinary Scientific Hypothesis Generation via Adversarial Multi-Agent Systems},
  author={Trivero, Alberto},
  year={2026},
  institution={Kakashi Venture Accelerator},
  note={arXiv preprint}
}