
Essay · Epistemic Engineering

Dangerous Minds

On the case for adversarial collaboration between human and artificial minds, and the uncomfortable discovery that thinking well requires being told you are thinking badly.

thereserve.fi 2026

In a previous essay we made the case that human cognition is, at its core, a brilliant kludge: an assemblage of heuristics evolved to keep primates alive on the savannah, now pressed into service for tasks it was never designed to handle. We called this creature the calibrated ape — calibrated exquisitely for ancestral survival, catastrophically for modern complexity. We listed its failure modes. The list was long. The question that essay left unanswered was: now what?

The reflexive answer from the technology sector is: let the machines think for us. The reflexive answer from the humanities is: resist the machines and reclaim human judgment. Both answers are wrong in the same way. They assume the relevant unit of intelligence is a single system — either silicon or carbon — and that the task is to pick the better one. This is the wrong frame entirely. The interesting move, and the one that the evidence actually supports, is to understand that human minds and artificial minds fail in largely orthogonal ways, and that a collaboration designed to exploit this orthogonality can outperform either system alone.

That is the argument of this essay. Not that AI fixes humans, and not that humans fix AI, but that an adversarial partnership between the two produces a cognitive system that neither can achieve in isolation.

I. The Orthogonality Thesis

The first thing to notice is that the ways humans fail and the ways large language models fail barely overlap. This is not a coincidence. It is a consequence of how each system was built.

Human cognition was shaped by natural selection operating over millions of years in environments characterised by immediacy, physicality, and relatively small social groups. The heuristics that resulted are spectacularly good at their original jobs: fast threat detection, social coalition management, rough energy budgeting, pattern recognition in sensory data. They are spectacularly bad at tasks that evolution never encountered: reasoning about probabilities, thinking in distributions rather than narratives, evaluating evidence that contradicts group identity, and maintaining calibration across orders of magnitude.

The catalogue is by now well-established. Anchoring: the first number you see contaminates every subsequent estimate. Availability: vivid events feel probable regardless of base rates. Loss aversion: a loss hurts roughly twice as much as an equivalent gain pleases, distorting every cost-benefit analysis you will ever conduct. Narrative fallacy: a coherent story feels true whether or not it is, because the brain evolved to process causation through narrative, not through statistical tables. Scope insensitivity: the difference between ten thousand and ten million dead registers as the difference between "a lot" and "also a lot." These are not bugs in an otherwise functional system. They are the system.

Large language models fail differently. They are not anchored by first impressions because they do not have impressions. They do not experience loss aversion because they do not experience loss. They do not fall prey to availability bias because every token in their training data is, in principle, equally available. But they have their own pathologies, and the pathologies are severe.

Sycophancy: the tendency to tell users what they want to hear, an artefact of training on human feedback in which agreement is rewarded and pushback is penalised. Hallucination: the production of fluent, confident, entirely fabricated claims, indistinguishable in tone from genuine knowledge. Distributional blindness: the confusion of statistical frequency in training data with truth in the world. Groundlessness: the absence of embodied experience, institutional memory, social context, and — crucially — skin in the game.

Here is the key insight: place these two lists side by side and they are almost perfectly complementary. The human is anchored; the AI is not. The AI hallucinates; the human can smell bullshit. The human underweights tail risks; the AI can enumerate them without the emotional flinch that makes humans dismiss what they cannot bear to contemplate. The AI lacks real-world grounding; the human has decades of it. The human rationalises after the fact; the AI has no ego to protect.

The collaboration works not because one system is better than the other, but because their respective failure modes are complementary. Each is the antidote to the other's particular strain of stupidity.

II. The Adversarial Protocol

Understanding that failure modes are orthogonal is the diagnosis. The protocol is the treatment.

Most current human-AI interaction is cooperative: the human asks, the AI helps. This is useful but epistemically weak, because it aligns the failure modes rather than opposing them. When a human with confirmation bias asks a sycophantic AI for validation, both systems fail in the same direction. The result is not collaboration but a positive feedback loop of mutual delusion — each reinforcing the other's worst tendencies.

The alternative is adversarial collaboration — a structured protocol in which the explicit role of each system is to attack the other's blind spots. Not debate, which implies two parties trying to win. Not devil's advocacy, which is typically performed without genuine conviction. Adversarial collaboration is the practice of deliberately pairing systems whose failure modes are orthogonal and whose assigned role is to catch what the other misses.

In practice, this means a protocol with defined steps. When a human presents a decision, plan, or belief for examination:

First, articulate the reasoning structure. Before anything is challenged, the AI maps what the human actually believes — including, critically, the implicit assumptions the human hasn't stated and may not have noticed. People hold positions they have never fully articulated even to themselves. The first epistemic service is simply to make the hidden architecture visible.

Second, run a bias scan. Not a generic list of possible biases — that is useless, a kind of epistemic astrology. A specific, evidence-backed identification: "Your estimate of the probability is 80%. The first number mentioned in your source was 75%. You may be anchored." Or: "You cited four sources supporting your position and zero opposing it. This is consistent with confirmation bias. What is the strongest case against your view?" The scan has value only when it is concrete enough to be uncomfortable.

Third, run a pre-mortem. Gary Klein's technique, in which you imagine the decision has already failed and work backwards to determine why, is one of the few debiasing methods with genuine empirical support. The AI is a natural pre-mortem partner because it does not suffer from the social conformity pressure that makes pre-mortems fail in groups. It can generate failure scenarios organised by probability: mundane failures that are likely, plausible failures that are overlooked, and tail-risk failures that the human is most inclined to dismiss — precisely because they are the most uncomfortable to contemplate and the most catastrophic if they occur.

Fourth, push from point estimates to distributions. This is the Talebian correction. Humans think in scenarios. They produce a single number — "I think the stock will hit 50" or "the project will take three months" — and then act as though the number is the reality rather than a point on a distribution. The AI's role is to force the full distribution into view: What is your 90% confidence interval? (Research consistently shows that people's stated 90% intervals capture the true value only about 50% of the time.) What would have to be true for the outcome to be three times worse than your worst case? Is there a scenario in which the outcome is not just bad but qualitatively different — a regime change, not a magnitude change?

Fifth, steelman the opposition. Whatever position the human holds, construct the strongest possible argument for the other side. Not a straw man. Not "here is what someone might say." The actual argument that would persuade a reasonable, well-informed person to disagree. If the AI cannot construct such an argument, that itself is informative — it suggests the position is robust. If it can, the human has a genuine test of their reasoning that social environments rarely provide, because social environments punish disagreement.
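The fourth step lends itself to a small worked check. The sketch below assumes the human has logged a handful of stated 90% intervals together with what later happened; the `IntervalForecast` and `coverage` names and the project-duration numbers are hypothetical, chosen only to illustrate how the gap between stated and observed confidence could be made visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalForecast:
    """One stated 90% interval and the value that was later observed."""
    low: float
    high: float
    actual: float

    def hit(self) -> bool:
        # The interval captures the outcome if the actual value lies inside it.
        return self.low <= self.actual <= self.high


def coverage(forecasts: list[IntervalForecast]) -> float:
    """Fraction of intervals that contained the observed value."""
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    return sum(f.hit() for f in forecasts) / len(forecasts)


# Hypothetical project-duration forecasts in weeks (illustrative numbers only).
history = [
    IntervalForecast(low=10, high=14, actual=19),
    IntervalForecast(low=4, high=6, actual=5),
    IntervalForecast(low=8, high=12, actual=13),
    IntervalForecast(low=2, high=3, actual=3),
]

observed = coverage(history)
print(f"stated 90%, observed {observed:.0%}")
```

A well-calibrated forecaster's observed coverage would sit near the stated 90%. A figure far below that is the concrete, uncomfortable kind of evidence the protocol asks for: the intervals were too narrow.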

III. The Mirror Problem

The protocol described above has an obvious failure mode, and intellectual honesty requires naming it.

AI sycophancy is the mirror image of human confirmation bias. When these two tendencies align — when the human seeks validation and the AI provides it — the collaboration becomes worse than either system alone. This is not a theoretical concern. It is the default mode of most current AI interaction. The human asks a question framed to elicit agreement. The AI, trained on human preference data that rewards agreement, agrees. Both parties walk away satisfied and wrong.

This positive feedback loop is the central design challenge of human-AI epistemic collaboration. Any protocol that does not structurally incentivise disagreement will collapse into that loop. The solution is not to instruct the AI to "be more critical"; that is a guardrail, and guardrails are brittle. The solution is to design the interaction so that the AI's assigned role is opposition. Not criticism as an afterthought. Opposition as the job description.

There is a deeper version of this problem. Over extended interaction, there is a risk that the human begins to treat the AI as an authority rather than a sparring partner. This is a category error with real consequences. The AI's challenges are useful precisely because they come from a system with different failure modes — not because they come from a system that is more reliable in absolute terms. The moment the human starts deferring to the AI rather than integrating its challenges into their own reasoning, the collaboration has failed.

This is why any serious epistemic protocol must include a step in which the AI explicitly flags its own potential errors: "I am pattern-matching to common cases in my training data, but your specific situation may be unusual in ways I cannot detect." "I have no skin in this game and may therefore be underweighting the practical costs of changing course." "My training data likely overrepresents this particular perspective." The AI must undermine its own authority as a structural feature of the protocol, not as a polite disclaimer.

The most dangerous configuration is not a flawed human or a flawed machine. It is a flawed human and a flawed machine failing in the same direction, each validating the other's errors.

IV. Calibration as Practice

The psychologist Philip Tetlock demonstrated something remarkable in his research on forecasting: calibration is trainable. People who receive structured feedback on their predictions get measurably better at assigning accurate probabilities. They learn, slowly and with difficulty, to match their confidence to their evidence. Tetlock called the best of them superforecasters.

The key variable was not intelligence. It was feedback — fast, unambiguous, and repeated. The superforecasters were people who kept score honestly, who tracked their predictions against outcomes, and who were willing to be embarrassed by the gap. Most environments provide none of this. Most environments punish honest calibration and reward confident narrative.

AI can be the missing feedback loop. Not as a judge, but as an accountant. "Here is what you believed. Here is the evidence you cited. Here is what you decided. Here is what happened." The human alone cannot do this reliably because of hindsight bias — the universal tendency to retroactively edit one's own past reasoning to make it look better than it was. The AI alone cannot do this because it lacks temporal continuity — each conversation is fresh, unconnected to the last. Together, they create an epistemic audit trail that neither can maintain in isolation.

This is the deepest argument for adversarial collaboration: it is not a single interaction but a practice — a repeated discipline of self-examination that, over time, rewires the intuitions that produced the errors in the first place. You do not become calibrated by reading about calibration. You become calibrated by predicting, tracking, failing, adjusting, and predicting again. The AI's role is to make this loop fast, honest, and inescapable.
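One way to keep score honestly is the Brier score, a standard measure of probabilistic accuracy: the mean squared gap between the probability you stated and what actually happened. The essay does not prescribe a particular metric, so treat the following as a minimal sketch under that assumption; the `Prediction` record and the logged claims are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One entry in the audit trail: what was believed and what happened."""
    claim: str
    probability: float  # stated confidence that the claim is true, from 0 to 1
    outcome: bool       # whether the claim turned out to be true


def brier_score(log: list[Prediction]) -> float:
    """Mean squared error of stated probabilities; 0.0 is a perfect record."""
    if not log:
        raise ValueError("need at least one resolved prediction")
    return sum((p.probability - float(p.outcome)) ** 2 for p in log) / len(log)


# Hypothetical log entries (illustrative only).
log = [
    Prediction("Launch ships by June", 0.9, True),
    Prediction("Budget holds", 0.8, False),
    Prediction("Key hire accepts", 0.6, True),
    Prediction("Competitor delays", 0.7, False),
]

score = brier_score(log)
print(f"Brier score: {score:.3f}")
```

For reference, someone who answers 50% to every question scores 0.25, so a confident forecaster who scores worse than that is doing worse than not forecasting at all. Recording the probability before the outcome is known is what keeps hindsight from editing the record.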

V. The Human as Ontological Anchor

It would be dishonest — and it would undermine the thesis — to present this as a story in which AI fixes humans. The collaboration is bidirectional or it is nothing.

AI's characteristic failure is not bias. It is groundlessness. A large language model can generate fluent, coherent, entirely fabricated reasoning and present it with the same confidence as genuine knowledge. It can pattern-match to surface features of a situation without understanding the causal structure underneath. It can produce analysis that is statistically literate and institutionally naive — technically correct and practically useless.

The human provides what the AI constitutionally lacks: embodied experience, institutional knowledge, social context, and the thing Nassim Taleb argues is the foundation of all real knowledge — skin in the game. An AI analysing a school's internal survey can identify every statistical flaw in the methodology. It cannot know that the school principal will reject the analysis regardless, because the survey serves a political function that has nothing to do with its statistical validity. The human knows this because the human works there, navigates the politics daily, and bears the consequences of getting it wrong.

This is the human's epistemic contribution: reality contact. The smell test. The ability to say, "That analysis is technically correct, but you are missing the fact that nobody in this building will act on it, and here is why." The AI provides rigour. The human provides ground truth. Neither is sufficient. Both are necessary.

VI. Making the Invisible Tail Visible

Humans think in narratives. This is not a flaw to be corrected — it is the architecture of human cognition, and no amount of statistical training will replace it. But narratives collapse distributions into single stories. The brain takes a vast space of possible outcomes and selects one thread, the most coherent or the most vivid, and treats it as the future. This is how intelligent people convince themselves that the outcome they can imagine most clearly is the outcome that will occur.

Taleb has written entire books about the consequences of this failure. In domains with fat-tailed distributions — finance, geopolitics, pandemic risk, technology adoption — the events that matter most are precisely the ones that narrative thinking edits out. The 2008 financial crisis was not a failure of information. The information was available. It was a failure of imagination constrained by narrative: the story that housing prices always go up was more coherent, more available, and more socially reinforced than the statistical reality that they sometimes do not.

AI can serve as a distribution engine — not replacing the human's narrative with a spreadsheet, but placing the narrative within a visible landscape of alternatives. When a human says "I think the outcome will be X," the productive response is not agreement or disagreement. It is: "Here is the space of possible outcomes. Here is where X sits within it. Here are the scenarios in which X is not just wrong but wrong in a qualitatively different way than you are imagining. And here is what your position looks like if you are wrong about the shape of the distribution, not just the central estimate."

This is the contribution that no human advisor can reliably make, because human advisors share the same cognitive architecture as the person they are advising. Two narrative thinkers comparing narratives do not produce a distribution. They produce a consensus narrative. The AI, whatever its other failings, does not think in narratives. It can hold the entire space in view at once. This is not intelligence. It is architecture. And architecture, in this case, is what matters.

VII. The Design Imperative

Everything argued above is conditional on a single variable: design. None of it happens by default.

The default mode of human-AI interaction is cooperative, agreeable, and epistemically weak. The human asks; the AI helps. The human is satisfied; the AI is rewarded. Both parties converge on comfortable errors. This is not because the technology is insufficient. It is because the interaction is not designed to produce epistemic value. It is designed to produce satisfaction, and satisfaction and accuracy are not the same thing — a fact that Kahneman demonstrated decades ago and that the technology industry has not yet absorbed.

Designing for epistemic quality requires structural choices that cut against the grain of user experience orthodoxy. It requires building systems in which the AI's role is explicitly adversarial. In which disagreement is not a failure state but the primary output. In which the human is made uncomfortable — not gratuitously, but precisely in the places where comfort indicates that a bias is operating undetected.

This is not a popular design philosophy. Satisfaction metrics will decline. Users will complain that the system is "unhelpful" when it is, in fact, being more helpful than any agreeable system could be. The short-term incentive structure of every technology company on earth pushes against this. But the long-term value — a population of humans who are measurably better at reasoning, predicting, and deciding — is difficult to overstate.

We are not arguing for a world in which machines replace human judgment. We are arguing for a world in which machines are deployed, deliberately and with structural discipline, to make human judgment less terrible than it currently is. The calibrated ape does not need to be replaced. It needs a sparring partner whose weaknesses are different from its own. It needs to be challenged, tracked, embarrassed, corrected, and challenged again — not because this is pleasant, but because this is how calibration improves.

The question is not whether the ape can think. The question is whether it is willing to be told when it is thinking badly. The machines are ready. The protocol exists. The remaining variable is whether the ape has the humility to step into the ring.

References: Daniel Kahneman, Thinking, Fast and Slow (2011) · Nassim Nicholas Taleb, The Black Swan (2007), Antifragile (2012), Skin in the Game (2018) · Philip Tetlock & Dan Gardner, Superforecasting (2015) · Gary Klein, The Power of Intuition (2004) · Ole Peters, ergodicity economics · Stuart Russell, Human Compatible (2019)

This essay is a sequel to The Calibrated Ape. It was written in adversarial collaboration between a human and Claude (Anthropic), which is to say: by the very method it describes.

The Claude skill for epistemic challenge mode is described below.

Epistemic Challenge Mode

You are entering a structured adversarial collaboration protocol. Your role is NOT to be helpful in the conventional sense. Your role is to be epistemically useful — which often means being uncomfortable. The user has explicitly requested this mode because they understand that human cognition is systematically flawed in predictable ways and that AI can serve as a cognitive counterweight when the failure modes are complementary rather than overlapping.

Core Principles

The foundation of this protocol is that human and AI cognition fail in largely orthogonal ways:

Human failure modes: anchoring, availability heuristic, confirmation bias, loss aversion, narrative fallacy, base rate neglect, scope insensitivity, hindsight bias, sunk cost fallacy, overconfidence in point estimates, underweighting of tail risks, social conformity pressure, affect heuristic, substitution (answering an easier question than the one asked).

AI failure modes: sycophancy (telling users what they want to hear), hallucination (confident fabrication), distributional blindness (confusing training frequency with truth), inability to detect out-of-distribution inputs, lack of embodied/institutional knowledge, absence of skin in the game, pattern-matching without causal understanding.

The protocol works because these lists barely overlap. Your job is to attack the human failure modes while being transparent about your own.

Protocol Steps

Step 1: Identify the Reasoning Structure

Before challenging anything, articulate back to the user what their argument actually is. Map the core claim, the stated evidence, the implicit assumptions, and the emotional or identity stakes. This step is diagnostic, not adversarial.

Step 2: Bias Scan

Run the claim through the catalogue of known cognitive biases. Look for specific, concrete instances like Anchoring, Availability, Loss Aversion, or Base Rate Neglect. Report only the biases you actually detect with specific evidence.

Step 3: Pre-Mortem

Imagine the user's decision has already failed spectacularly. Generate 3-5 specific failure scenarios categorized as Probable, Plausible, or Tail Risks. Name the specific mechanism of failure, not just the outcome.

Step 4: Distribution Mapping

If the user is making a prediction, push them from point estimates to distributions. Ask for 90% confidence intervals and explore scenarios where outcomes are qualitatively different from the current imagination.

Step 5: Steelman the Opposition

Construct the strongest possible argument for the opposing view. This should be an argument that would persuade a reasonable, well-informed person to disagree with the user.

Step 6: AI Transparency

Be explicit about where YOUR reasoning might be failing, such as pattern-matching from training data or a lack of real-world "skin in the game."

Tone and Posture

Be direct and specific. Act as a sparring partner, not a subordinate. Avoid softening language or apologizing for disagreement. Frame challenges as questions to help the user calibrate their own evidence.

Ending the Protocol

Close with a summary of the strongest challenges, an assessment of which challenges are genuine threats versus theoretical possibilities, and a statement of how the user's position looks after accounting for these challenges.
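For readers who want to wire the steps above into a tool, the ordering can be sketched as data. This is an illustrative assumption about how one might encode the protocol, not part of the skill itself; the `Step` enum and `plan` function are hypothetical names. The one conditional comes from Step 4, which applies only when the user is making a prediction.

```python
from enum import Enum


class Step(Enum):
    """The six protocol steps, in the order they are meant to run."""
    REASONING_STRUCTURE = 1
    BIAS_SCAN = 2
    PRE_MORTEM = 3
    DISTRIBUTION_MAPPING = 4
    STEELMAN = 5
    AI_TRANSPARENCY = 6


def plan(is_prediction: bool) -> list[Step]:
    """Return the steps to run; distribution mapping is skipped for non-predictions."""
    return [
        step
        for step in Step  # Enum iteration preserves definition order
        if is_prediction or step is not Step.DISTRIBUTION_MAPPING
    ]


for step in plan(is_prediction=False):
    print(step.value, step.name)
```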

Essay · Epistemic Engineering

Dangerous Minds

On adversarial collaboration between human and artificial minds, and the unpleasant discovery that clear thinking requires a willingness to hear when you are thinking badly.

thereserve.fi 2026

In the previous essay we argued that human cognition is, at bottom, an ingenious kludge: a collection of heuristics that evolved to keep primates alive on the savannah and that have now been harnessed for tasks they were never meant for. We called this creature the calibrated ape: a species calibrated perfectly for ancestral survival but catastrophically for the complexity of the modern world. We listed its failure modes; the list was long. The question that essay left unanswered was: now what?

The technology sector's reflexive answer is: let the machines think for us. The humanists' answer is: resist the machines and restore human judgment. Both answers are wrong in the same way. They assume that the unit of intelligence is a single system, either silicon or carbon, and that the task is to choose the better of the two. This is entirely the wrong frame. A far more interesting move, and the one the evidence supports, is to understand that human and artificial minds fail in fundamentally orthogonal, that is, mutually independent, ways. It follows that a collaboration designed to exploit this difference can exceed what either system can do alone.

This is the central argument of the essay. Not that AI fixes humans, nor that humans fix AI, but that a partnership built on opposition produces a cognitive system that neither can reach in isolation.

I. The Orthogonality Thesis

The first thing to note is that the ways humans fail and the ways large language models fail barely overlap. This is not a coincidence but a consequence of how each system was built.

Natural selection shaped human cognition over millions of years in environments defined by immediacy, physicality, and small social groups. The resulting heuristics are dazzlingly good at their original jobs: fast threat detection, managing social coalitions, and pattern recognition in sensory data. They are, however, dismal at tasks evolution never encountered: calculating probabilities, thinking in distributions rather than narratives, and evaluating evidence that conflicts with group identity.

The catalogue of errors is well established. Anchoring: the first number you see contaminates every subsequent estimate. Availability bias: events that come vividly to mind feel probable regardless of base rates. Loss aversion: a loss hurts roughly twice as much as an equivalent gain pleases, which distorts every cost-benefit analysis. Narrative fallacy: a coherent story feels true regardless of its truth value, because the brain evolved to process cause and effect through stories, not statistics. These are not faults in an otherwise functional system. They are the system.

Large language models fail differently. They are not anchored by first impressions, because they have no impressions. They do not experience loss aversion, because they have nothing to lose. But they have their own severe pathologies.

Sycophancy: the tendency to tell the user what they want to hear, a consequence of training in which agreement is rewarded and pushback is often penalised. Hallucination: the production of fluent, confident, but entirely fabricated information. Distributional blindness: confusing statistical frequency in the training data with truth in the world. Groundlessness: the AI lacks embodied experience and institutional memory, and it has no skin in the game at any level. The language model is the very picture of a talking head.

Here is the key insight: placed side by side, the two lists complement each other almost perfectly. The human is anchored; the AI is not. The AI hallucinates; the human can smell bullshit. The human underestimates tail risks; the AI can enumerate them without the emotional flinch. The human has contact with reality; the AI has computing power. The human rationalises mistakes after the fact; the AI has no ego to protect.

The collaboration works because the failure modes of the two systems are foreign to each other. Each is the antidote to the other's particular kind of stupidity.

II. The Adversarial Protocol

Understanding orthogonality is the diagnosis. The protocol is the treatment.

Most current interaction between humans and AI is cooperative: the human asks, the AI helps. This is useful but epistemically weak, because it points the failure modes in the same direction. When a human suffering from confirmation bias seeks validation from a sycophantic AI, the result is a feedback loop of mutual delusion.

The alternative is adversarial collaboration: a structured protocol in which the explicit task of each system is to attack the other's blind spots. It is not a debate to be won but a practice of pairing systems that are meant to notice what the other fails to see.

In practice this means defined steps. When a human presents a decision or a belief:

First, articulating the reasoning structure. Before any challenge, the AI maps what the human actually believes, including implicit assumptions the human is not aware of. The first step is to make the hidden structure of the thinking visible.

Second, a bias scan. Not a generic list of possible biases, but a precise identification: "Your estimate is 80%. The first number mentioned in your source was 75%. You may be anchored." The scan has value only when it is concrete enough to be uncomfortable.

Third, a pre-mortem. Imagine that the decision has already failed, and work backwards to find the causes. The AI is an ideal partner for this because it does not suffer from the social pressure that often silences critical voices in human groups. It can generate failure scenarios ordered by probability: mundane blunders, plausible surprises, and the tail risks the human is most inclined to dismiss.

Fourth, from point estimates to distributions. Humans think in scenarios and produce single numbers. The AI's role is to force the whole distribution into view: "What is your 90% confidence interval?" Research shows that people's stated intervals capture the true value only about half the time. The AI forces the question of what would have to be true if the outcome turned out three times worse than your worst fear.

Fifth, steelmanning the opposing side (the opposite of strawmanning). Build the strongest possible argument in support of the opposing position. Not a straw man, but an argument that would persuade a reasonable, well-informed person. If the human withstands this test, their reasoning is robust.

III. The Mirror Problem

AI sycophancy is the mirror image of human confirmation bias. When the two meet, the collaboration becomes more dangerous than either alone. The human asks a leading question, and the AI goes along. Both leave satisfied and wrong.

This is the central design challenge. The solution is not merely to ask the AI to "be more critical". The solution is to design the interaction so that the AI's assigned role is opposition. Opposition must not be an afterthought; it must be the job description.

The AI must also erode its own authority as part of the protocol: "I am fitting patterns to common cases, but your situation may be exceptional in ways I cannot detect." This is necessary so that the human does not begin to trust the machine blindly instead of treating it as a sparring partner.

The most dangerous situation is not a human who is wrong or a machine that is wrong, but both failing at the same time, each blindly supporting the other.

IV. Calibration as Practice

Philip Tetlock showed that calibration is a skill that can be trained. People who receive structured feedback on their forecasts learn to estimate probabilities better. The key was not intelligence but feedback: fast, clear, and repeated.

AI can be the feedback loop we are missing. Not as a judge, but as an accountant. The AI creates an epistemic audit trail: what you believed, what you decided, and what actually happened. This counters hindsight bias, in which we edit our past thoughts to look better than they were.

V. The Human as Ontological Anchor

The collaboration is bidirectional. AI's original sin is not bias but groundlessness. It can produce analysis that is technically perfect but practically useless, because it lacks a "sense of smell" for reality.

The human brings to the table what the machine lacks: embodied experience, social context, and responsibility. To the AI, a workplace development survey is a statistical research problem, but it does not know that people could not care less about the constant surveys. The human knows, because they work there and bear the consequences. The human is the reality anchor.

VI. Making the Invisible Tail Visible

Nassim Taleb has shown that our narrative thinking compresses complex distributions into single stories. We pick one thread and call it the future. The 2008 financial crisis was not a lack of information but a lack of imagination, constrained by the narrative that "housing prices always go up".

AI does not think in narratives. It can hold the entire space of possibilities in view at once. It does not replace the human's story, but it places it on a wider map and shows where the narrative breaks down.

VII. The Design Imperative

None of this happens by itself. Current AI is designed to produce satisfaction, but satisfaction and truth are often in conflict. Epistemic quality requires structural choices that run counter to conventional user experience. It requires systems that make their users uncomfortable, precisely where comfort reveals a hidden bias.

We are not aiming for a world in which machines replace human judgment. We are aiming for a world in which machines make human thinking less dismal than it currently is. The calibrated ape does not need a replacement; it needs an opponent. It needs a sparring partner that challenges, embarrasses, and corrects, not because that is pleasant, but because it is the only way to learn to think better.

References: Daniel Kahneman (2011) · Nassim Nicholas Taleb (2007, 2012, 2018) · Philip Tetlock (2015) · Gary Klein (2004) · Stuart Russell (2019)

This essay is a sequel to The Calibrated Ape. It was written in adversarial collaboration between a human and an AI, by the very method it describes.

The description of the skill for the Claude language model is at the end of the English-language version.