Idris Dies: What Happens When Your AI Goes Back Into the Black Box

What the Doctor Had, Briefly

The famous idea from “The Doctor’s Wife” is that the TARDIS did not always take the Doctor where he wanted to go. She took him where he needed to go. I have already written about that thought: beneficial autonomy that looks like disobedience until you understand the purpose behind it.

This post is about the other side of that scene. For once, the Doctor understands the purpose. He hears the affection, frustration, and logic underneath centuries of apparently erratic behaviour.

Before Idris, the Doctor could only interpret the TARDIS from the outside. He could notice patterns. He could joke about faulty navigation. He could trust her in the way we trust old friends and instruments. But he could not ask the question every user of a powerful system eventually wants to ask: why did you do that?

Through Idris, he can. And then he cannot. That is what makes the ending so bittersweet. The relationship is not over. It may even be deeper. But it has changed from conversation back to inference. He can still read the TARDIS by her actions, but he can no longer hear her reasoning.

Our Idris Moments in 2026

This is where the episode feels modern. In 2026, many of us live with systems that do useful and consequential things without showing much of their inner workings. We ask a chatbot for an answer. We receive a diagnosis from an image-analysis tool. We are scored, ranked, filtered, approved, or rejected. Often the system produces a result long before it produces a reason.

Every so often, we get an Idris moment. A curtain lifts. A system that normally feels opaque becomes a little more legible.

Large language models sometimes show reasoning, or at least a summary of reasoning. OpenAI has described chain-of-thought reasoning as a way to make model thinking more legible, while also explaining why it chose not to show raw chains of thought directly to users.[2] That is an Idris moment. We glimpse a machine explaining itself, then learn that much remains hidden.

Model cards are another example. They were proposed as short documents that describe what a machine-learning model is for, how it performs, and where it may not be suitable.[3] I think of them as labels beside the black box. They may not show every moving part, but they tell us something vital.
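For readers who think in code, here is a minimal sketch of what such a label might hold. It is my own illustration, not the format from the model card proposal: the class name, the field names, and every value in the example are assumptions invented for this post.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """A hypothetical, minimal model card; field names are illustrative only."""

    name: str
    intended_use: str
    performance: dict[str, float] = field(default_factory=dict)
    not_suitable_for: list[str] = field(default_factory=list)

    def label(self) -> str:
        """Render the card as a short plain-text label."""
        lines = [f"Model: {self.name}", f"Intended use: {self.intended_use}"]
        for metric, value in self.performance.items():
            lines.append(f"Performance ({metric}): {value:.2f}")
        for limit in self.not_suitable_for:
            lines.append(f"Not suitable for: {limit}")
        return "\n".join(lines)


# Invented example values, not taken from any real model.
card = ModelCard(
    name="example-image-triage-model",
    intended_use="Supporting a clinician who reviews the images",
    performance={"sensitivity": 0.91, "specificity": 0.84},
    not_suitable_for=["Unsupervised diagnosis", "Image types it was not tested on"],
)
print(card.label())
```

The point of the sketch is how little it contains. A card like this says what the model is for and where it should not be used, but nothing about what the model does on any particular day.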

Then there are accidental glimpses: a hidden instruction appears, a system prompt is shared online, or a technical paper explains a training method. These are not full understanding, but they are moments when the box looks less solid than before.

Modern Idris moment	What becomes visible	What remains hidden
Reasoning summary	A cleaned-up account	The full internal process
Model card	Uses, tests, and risks	Live behaviour
Exposed instruction	One possible rule	The wider hierarchy
Medical AI explanation	Influential features	True clinical reasoning

The pattern is clear enough: we are often given fragments of explanation rather than durable explainability. Idris speaks for one episode. Then the box falls silent again.

Why This Matters

Explainability can sound like a technical luxury, something for researchers, regulators, or people who enjoy opening the back of the radio. It is not.

When a system cannot explain itself, you cannot properly challenge it. If a loan is refused, a job application is filtered out, or a scan is flagged as suspicious, the affected person needs more than a final answer. They need to know what mattered and whether a mistake can be corrected.

When a system cannot explain itself, you also cannot learn from it. A good answer with no reasoning may solve today’s problem, but it teaches very little about tomorrow’s. I have found this repeatedly with generative AI. The answer is useful. The reasoning, when it is reliable, is often where the lasting value lies.

And when a system cannot explain itself, you cannot build genuine trust. You can build habit. You can build dependence. You can even build obedience. But trust is different. Trust is knowing enough about how something works to decide when not to trust it.

The NIST AI Risk Management Framework lists transparency, accountability, explainability, and interpretability among the characteristics of trustworthy AI.[4] Those are not decorative words. They are the difference between a system we can work with and one we can only submit to.

A Pathologist’s Bias Toward Explanation

I should be clear about my own bias. I spent about forty years as a chemical pathologist. In pathology, a result without interpretation is often incomplete.

“Your iron is low” may be true, but it is not enough. Low compared with what? In what clinical setting? Does it suggest iron deficiency, inflammation, blood loss, diet, malabsorption, or something else? What should be checked next?

A good pathology report does more than print a number. It gives context. It says, in effect, here is what I found, here is why it may matter, and here is what you might consider next. That interpretive comment is not an optional flourish. It is part of the medical work.

This is why black-box AI in medicine makes me uneasy. I am not opposed to AI in diagnosis. Far from it. I think AI will become an extraordinary partner in radiology, pathology, and laboratory medicine. But a medical AI that says “malignant” or “not malignant” without explaining the basis is giving us an answer that is clinically thinner than it looks. The medical literature has been wrestling with this problem: powerful systems can perform well while still being difficult for clinicians to interrogate or explain to patients.[5]

In my old field, I would never have been satisfied with a result that could not be discussed. AI should not get a lower standard simply because its mathematics is impressive.

Reading Behaviour Is Not the Same as Hearing Reasoning

After Idris dies, the TARDIS still communicates. Of course she does. She communicates by where she lands, when she opens the doors, what she permits, what she withholds, and how she carries the Doctor through danger. Her behaviour still has meaning.

We do something similar with AI systems. We test them. We prompt them in different ways. We watch for patterns. We learn that one model is better at cautious summaries, another at code, another at images. We build a practical feel for their habits.

This is useful. It is also limited. There is a difference between interpretable behaviour and explainable reasoning. Interpretable behaviour is what I can infer from the outside. Explainable reasoning is what the system can tell me from the inside, in a form I can understand and challenge.

The Doctor knows the TARDIS well. But when Idris speaks, he learns things that centuries of observation did not give him. He learns not only what she did, but why she believed it mattered.

That is the dream of explainable AI. Not a giant dump of technical detail. Not a pretend confession. Not a soothing paragraph that merely sounds plausible. The dream is a system that can say, clearly and honestly: this is what I noticed, this is how I weighed it, this is where I may be uncertain, and this is what would change my answer.
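Those four parts can be written down as a simple record. What follows is a sketch under my own assumptions, not a description of how any real system reports its reasoning: the class, its field names, and the example values are hypothetical, and the iron example is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    """A hypothetical record of the four parts of a usable explanation."""

    answer: str
    noticed: list[str]              # this is what I noticed
    weighing: str                   # this is how I weighed it
    uncertainty: str                # this is where I may be uncertain
    would_change_answer: list[str]  # this is what would change my answer

    def is_challengeable(self) -> bool:
        """True only if every part of the explanation has been filled in."""
        return all(
            [
                self.noticed,
                self.weighing.strip(),
                self.uncertainty.strip(),
                self.would_change_answer,
            ]
        )


# Invented example values, for illustration only.
full = Explanation(
    answer="Iron is low",
    noticed=["Result is below the laboratory reference range"],
    weighing="A single low value is compatible with more than one cause",
    uncertainty="The cause cannot be established from this result alone",
    would_change_answer=["A repeat sample within the reference range"],
)
bare = Explanation("Iron is low", [], "", "", [])

print(full.is_challengeable())
print(bare.is_challengeable())
```

The check is deliberately crude. It cannot tell whether an explanation is honest, only whether there is anything there to question. A bare answer fails it, and that is the case I worry about most.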

Will the Box Always Go Silent Again?

I find the ending of “The Doctor’s Wife” hopeful, but not comfortable. The Doctor does not get to keep Idris. He gets one conversation. One extraordinary, impossible conversation. Then he has to go back to flying the box.

That feels close to where we are with AI. We keep creating new ways to glimpse the reasoning: evaluations, model cards, explanations, audits, chain-of-thought summaries, and follow-up questions. Each matters. Each is a small window.

But the central question remains. Are we building toward a more permanent Idris, where important AI systems can explain themselves in ways ordinary people can use? Or are we building systems that will keep giving us results from behind a wall, with only occasional cracks of light?

I do not think the answer is settled. Some raw reasoning may be misleading. Some details may create safety risks. Some explanations may be polished after the fact rather than faithful to the process. We should be honest about those limits.

But we should also be honest about the loss. When the box goes silent, something humanly important disappears. We lose the chance to ask why. We lose the chance to disagree intelligently. We lose the chance to learn together.

The TARDIS remains wonderful after Idris dies. She is still ancient, alive, and loyal in her own impossible way. But for a little while, the Doctor could hear her think.

That is what I want from the best of AI. Not machines that merely answer. Not machines that dazzle us into silence. I want systems that can meet us in conversation, especially when the stakes are high. I want them to show enough of their working that we can question, correct, and collaborate with them.

Because a silent blue box may still take us where we need to go. But if it can tell us why, we may become better travellers.
