ARXIV PREPRINT · APRIL 2026

Autonomous Cross-Disciplinary Scientific Hypothesis Generation via Adversarial Multi-Agent Systems

Alberto Trivero · Kakashi Venture Accelerator · Torino, Italy

273 · Hypotheses generated · across 22 sessions
37% · Survival rate · 102 passed adversarial pipeline
15 · Specialized agents · adversarial multi-agent
2 · Computational verifications · GEV analysis + Cramér-Rao
~60% · QG rejection rate · Quality Gate filtration

THE EVIDENCE

Two predictions tested against real data

MAGELLAN produces predictions specific enough to test computationally. We verified two against public datasets and first-principles mathematics. Both produced novel scientific findings — including a case where the system generated its own diagnostic for a failed prediction.

1 · Novel discoveries

Extreme Value Theory Applied to the Meltome Atlas

EXTREME VALUE THEORY → THERMAL PROTEOMICS

What MAGELLAN found: The system autonomously connected extreme value theory to thermal proteomics, a connection for which we found no prior record (zero PubMed co-occurrences across 8 query variants). Fitting GEV distributions to 48,000 proteins across 13 species revealed universal Weibull-domain behavior across all three domains of life: every proteome has a finite upper bound on thermal stability.
Self-correcting portfolio: One hypothesis from this session predicted that the GEV shape parameter should correlate with growth temperature; this specific prediction was falsified. But a companion hypothesis from the same session independently predicted that TPP measurement censoring would create artifacts for thermophilic species. When we tested the first prediction and found the T. thermophilus outlier, it confirmed exactly what the companion had warned about. The system generated both the prediction and its own diagnostic.
Why this matters: This is the first application of extreme value theory to thermal proteomics. The universal Weibull-domain finding and the measurement censoring diagnostic each merit independent publication, and neither existed before MAGELLAN identified this cross-field connection.
13 species · 48,000 proteins · First application of EVT to thermal proteomics
2 · Confirmed

Cramér-Rao Bound for Plant Gravitropism

FISHER INFORMATION THEORY → PLANT GRAVITY SENSING

What MAGELLAN found: The Cramér-Rao bound from statistical estimation theory sets a fundamental physical limit on how precisely plants can sense gravity, derivable from published statolith parameters. This is the first application of Fisher information to plant gravitropism (zero PubMed co-occurrences across 8 query variants).
Result: Confirmed. Plants operate 1–6× above the fundamental physical limit at the standard test angle, comparable to photoreceptors (1–5×) and bacterial chemotaxis sensors (2–10×). This is not a statistical hypothesis awaiting lab testing; it is a mathematical derivation confirmed from published measurements.
Why this matters: The derivation required simultaneous knowledge of Fisher information, Boltzmann-distributed particles in gravitational potentials, and statolith dynamics in plant columella cells. A human researcher would need training in both statistical physics and plant biology, precisely the kind of cross-disciplinary expertise that Swanson identified as the bottleneck.
0 prior literature · Confirmed from published data
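
To make the phrase "N× above the limit" concrete, the sketch below compares an estimator's spread with the Cramér-Rao bound in a toy model. It is not the paper's derivation, which concerns gravity sensing from statolith parameters. Here we only assume that particle heights in a uniform gravitational field follow a Boltzmann (exponential) distribution with rate λ; for n independent particles the Fisher information is n / λ², so no unbiased estimator of λ can have a standard deviation below λ / √n. The function names and all numerical values are illustrative assumptions.

```python
import numpy as np


def cramer_rao_std(rate, n_particles):
    """Lower bound on the std of any unbiased estimator of `rate`.

    For n i.i.d. exponential samples the Fisher information is n / rate**2,
    so the Cramer-Rao bound on the standard deviation is rate / sqrt(n).
    """
    return rate / np.sqrt(n_particles)


def ratio_above_bound(rate, n_particles, n_trials=20000, seed=0):
    """Empirical std of the maximum-likelihood estimate divided by the bound."""
    rng = np.random.default_rng(seed)
    heights = rng.exponential(scale=1.0 / rate, size=(n_trials, n_particles))
    rate_mle = 1.0 / heights.mean(axis=1)  # MLE of an exponential rate
    return rate_mle.std(ddof=1) / cramer_rao_std(rate, n_particles)


# Illustrative values only: dimensionless rate, 100 particles.
ratio = ratio_above_bound(rate=2.0, n_particles=100)
print(f"estimator operates at {ratio:.2f}x the Cramer-Rao limit")
```

In this toy case the maximum-likelihood estimator sits close to the bound (it is slightly biased at finite n, so the ratio is near but not exactly 1). A biological sensor reported at 1–6× would have a ratio in that range under the corresponding physical model.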
THE LANDSCAPE

How MAGELLAN compares

Twelve AI scientific discovery systems evaluated across five axes. Data sourced from published papers, technical reports, and official documentation as of March 2026.

System | Target Autonomy | Validation | Open | Published Results | Domain
MAGELLAN | Fully autonomous | Adversarial + cross-model | Yes | Yes (273 hyps.) | Life sciences
Swanson / Arrowsmith | Manual | Bibliometric | Yes | Yes | Biomedicine
SciAgents | Manual | KG paths + Critic | Yes | Partial | Cross-domain
Google AI Co-Scientist | Human-in-loop | Elo tournament | No | 3 validated | Biomedicine
FutureHouse / Kosmos | Human-defined | Multi-agent | Partial | Partial | Biomedicine
Sakana AI Scientist v2 | Semi-auto (ML) | Peer review sim. | Yes | Yes (ML) | Machine Learning
POPPER | Human-defined | Falsification | Yes | Yes | General
Virtual Lab | Human-defined | Wet-lab testing | Partial | Yes (nanobodies) | Protein eng.
MOOSE-Chem | Human-curated | Inspiration papers | Yes | Yes | Chemistry
TruthHypo | Human-defined | Grounding check | Yes | Yes | General
AlphaEvolve | Semi-auto | Evolutionary | Partial | Yes (algorithms) | Mathematics
Aletheia | Semi-auto (math) | Verifier-Reviser | No | Partial (4 proofs) | Mathematics

Source: Table 1 of the paper.

META-LEARNING

What 22 sessions revealed

The Session Analyst extracts patterns across discovery sessions. These meta-learning signals now inform future target selection and hypothesis generation.

Mathematical constraints > analogies

Physical law bridges — conservation laws, information bounds, thermodynamic inequalities — survived adversarial critique at substantially higher rates than analogy-based connections. When a mathematical theorem applies, the hypothesis is partially validated by construction.

100% survival for physical law constraints (TUR) vs. 0% for EM field effects

Disjoint fields > partially explored

Targets where the two fields have zero cross-field citation (classified DISJOINT) produced dramatically better hypotheses than partially explored connections. True blind spots are more productive than incremental extensions.

84% pass+conditional for DISJOINT vs. 30% for PARTIALLY_EXPLORED
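
The DISJOINT label can be pictured as a simple rule over literature co-occurrence counts. The sketch below assumes those counts have already been retrieved, one per query variant; the retrieval step is not shown and no real API is called. The label names come from the text above. The function name and the rule that any non-zero count means PARTIALLY_EXPLORED are assumptions for illustration, since the actual classification criteria may be richer.

```python
from typing import Sequence


def classify_field_pair(cooccurrence_counts: Sequence[int]) -> str:
    """Label a field pair from co-occurrence counts, one per query variant.

    Assumed rule (not confirmed by the source): a pair is DISJOINT only if
    every query variant returns zero joint hits; otherwise it is treated as
    PARTIALLY_EXPLORED.
    """
    if not cooccurrence_counts:
        raise ValueError("need at least one query variant")
    if any(count < 0 for count in cooccurrence_counts):
        raise ValueError("counts must be non-negative")
    if all(count == 0 for count in cooccurrence_counts):
        return "DISJOINT"
    return "PARTIALLY_EXPLORED"


# Eight query variants with zero hits each, as reported for both verified cases.
print(classify_field_pair([0] * 8))
```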

Strategy performance varies dramatically

The Scout’s 10 exploration strategies produced markedly different survival rates. Converging vocabularies led at 87.5%, followed by tool repurposing at ~67%. Recent breakthrough radiation underperformed at 13%, suggesting that trending topics yield fewer novel connections.

Converging vocabularies: 87.5% | Tool repurposing: ~67% | Recent breakthrough: 13%

Bridge types predict survival

The structural form of the cross-disciplinary connection predicts survival more reliably than strategy choice. Analytical tool transfers and physical law constraints achieved 100% survival. Direct EM field effects (0/8) and quantum entanglement (0/3) were universally killed — energy-scale mismatch is the consistent failure mode.

Tool transfers: 100% | Physical laws: 100% | EM effects: 0% | Quantum: 0%
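
The arithmetic behind these percentages can be sketched from the counts stated on this page (102 of 273 overall, 0 of 8 for EM field effects, 0 of 3 for quantum entanglement). The Wilson score interval is our addition, not something the source reports; it is included only as one standard way to see how much uncertainty a rate from a small count carries.

```python
import math


def survival_rate(survived, total):
    """Fraction of hypotheses that survived the pipeline."""
    return survived / total


def wilson_interval(survived, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = survived / total
    denom = 1.0 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Counts taken from the text: overall, EM field effects, quantum entanglement.
groups = [("overall", 102, 273), ("EM effects", 0, 8), ("quantum", 0, 3)]
for label, survived, total in groups:
    low, high = wilson_interval(survived, total)
    rate = survival_rate(survived, total)
    print(f"{label}: {rate:.0%} (interval {low:.0%} to {high:.0%})")
```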

Honest limitations

The paper states these explicitly. We repeat them here because transparency is not optional.

No experimental validation

No MAGELLAN hypothesis has been tested in a laboratory. The computational verifications use published data and first-principles mathematics. The distance between “computationally plausible” and “experimentally confirmed” is the central gap in our evidence.

Life sciences bias

All 22 sessions involve at least one biological domain. Physics, mathematics, social sciences, and engineering are underrepresented. This reflects both retrieval infrastructure (PubMed, KEGG, STRING) and scoring bias toward experimental testability.

Missing ablations

No controlled experiments have yet isolated which pipeline components contribute most to hypothesis quality. The 37% survival rate is a property of the complete system; we cannot currently attribute it to any specific component.

Citation hallucination

Approximately 2–3% of grounded claims contain citation errors — predominantly conflations (correct finding, wrong attribution) rather than fabrications. A single fabricated citation triggers automatic Quality Gate failure.

FULL PAPER

Abstract

We present MAGELLAN, an open-source multi-agent system built on Claude Code (Anthropic) that autonomously generates cross-disciplinary scientific hypotheses by connecting fields that lack mutual citation. Building on Swanson’s “Undiscovered Public Knowledge” principle (1986), MAGELLAN replaces bibliometric methods with frontier large language models (Claude Opus 4.6 and Sonnet 4.6) that have absorbed the literature of multiple disciplines.

The system comprises 15 specialized agents: a Scout that autonomously identifies promising cross-field bridges using 10 formalized strategies, a Generator that creates hypotheses from parametric knowledge, and an adversarial pipeline (Critic with 9 attack vectors, Quality Gate with ~60% rejection rate, independent validation by GPT-5.4 Pro and Gemini 3.1 Pro) that systematically eliminates weak candidates.
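
The adversarial stage ordering described above can be pictured as a chain of filters in which a candidate survives only if it passes every stage. The sketch below is a structural illustration, not MAGELLAN's code: the class, function, and stage names are hypothetical, and the checks are trivial placeholders for what are in reality LLM-driven agents. The one rule taken from this page is that a single fabricated citation fails the Quality Gate.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    text: str
    fabricated_citations: int = 0
    failed_stage: str = ""  # name of the first stage that rejected it, if any


Stage = Tuple[str, Callable[[Hypothesis], bool]]


def run_pipeline(candidates: List[Hypothesis], stages: List[Stage]) -> List[Hypothesis]:
    """Return candidates that pass every stage in order; record the first failure."""
    survivors = []
    for hyp in candidates:
        for name, passes in stages:
            if not passes(hyp):
                hyp.failed_stage = name
                break
        else:  # no stage rejected the hypothesis
            survivors.append(hyp)
    return survivors


# Placeholder checks only; the real agents are not reproduced here.
stages: List[Stage] = [
    ("critic", lambda h: "entanglement" not in h.text),
    ("quality_gate", lambda h: h.fabricated_citations == 0),
    ("cross_model", lambda h: len(h.text) > 10),
]

candidates = [
    Hypothesis("Fisher information bounds gravity sensing in plants"),
    Hypothesis("Quantum entanglement drives protein folding"),
    Hypothesis("Percolation threshold governs immune exclusion", fabricated_citations=1),
]
survivors = run_pipeline(candidates, stages)
print(len(survivors), [h.failed_stage for h in candidates])
```

Recording the first failing stage per candidate is one way such a pipeline could later support the per-stage rejection statistics quoted on this page.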

Over 22 sessions, MAGELLAN generated 273 hypotheses, of which 102 (37%) survived the full adversarial pipeline. Surviving hypotheses are not surface analogies: they involve first-principles derivations from physical theorems applied to new biological domains, quantitative predictions with specific falsification experiments, and the kind of cross-disciplinary reasoning that would require years of cross-training for a human researcher.

We verify two hypotheses at different levels of a validation taxonomy: (1) applying extreme value theory to the Meltome Atlas (48,000 proteins, 13 species)—the first such application, producing novel findings (universal Weibull-domain behavior, measurement censoring diagnostic) despite failing to confirm the predicted correlation with growth temperature; (2) deriving the Cramér-Rao bound for plant gravitropic sensing from published statolith parameters—a mathematical derivation confirmed from first principles, showing plants operate 1–6× above the fundamental physical limit. A third hypothesis (percolation theory for tumor immune exclusion) has independent convergence evidence from a clinical trial, a patent, and a peer-reviewed publication.

Meta-learning across sessions reveals that mathematical constraints transfer more reliably than analogies, fully disjoint field pairs produce substantially better hypotheses (84% vs. 30% survival), and the bottleneck in AI-assisted discovery has shifted from hypothesis generation to domain expertise for evaluation.

Unlike proprietary systems (Google AI Co-Scientist, FutureHouse), MAGELLAN is fully open-source (Apache 2.0), fully autonomous in target selection, and publishes all results as CC0 public domain. Code, hypotheses, methodology, and computational verifications are publicly available.

Cite this work

BibTeX
@article{trivero2026magellan,
  title={MAGELLAN: Autonomous Cross-Disciplinary Scientific
         Hypothesis Generation via Adversarial Multi-Agent Systems},
  author={Trivero, Alberto},
  year={2026},
  institution={Kakashi Venture Accelerator},
  note={arXiv preprint}
}