Philosophy of Explainability (Part 1)

What Counts as an “Explanation” in AI?

Introduction: The Human Demand for Explanations

We humans are explanation-seeking creatures. From children asking “Why is the sky blue?” to scientists probing the origins of the universe, we crave reasons. An explanation is not just extra information—it is the bridge between cause and understanding. But when it comes to AI (Artificial Intelligence), particularly modern LLMs (Large Language Models) and neural networks, this demand runs into trouble: what does it even mean to explain the output of a system whose inner workings defy human comprehension?

Philosophical Roots: What Is an Explanation?

In the philosophy of science, explanations are usually grouped into a few major types:

  • Deductive-nomological: the outcome is shown to follow from general laws plus initial conditions (the classic “covering law” model).

  • Causal-mechanical: the outcome is traced to the causes and mechanisms that actually produced it.

  • Unificationist: the outcome is explained by fitting it into a pattern that accounts for many other phenomena.

  • Pragmatic: what counts as a good explanation depends on the question asked and the interests of the person asking.

These categories remind us that explanations are not one-size-fits-all. They are tools designed to match the questioner’s need.

Explanations in AI: Whose Explanation?

AI complicates this further. Consider a loan approval system trained on millions of data points. If you’re denied a loan, what explanation do you deserve?

  • To a developer, an explanation might mean: “The model assigned you a creditworthiness score below 0.55 because of how features like income-to-debt ratio interacted in the hidden layers.”

  • To a regulator, it might mean: “The system’s training data produced disparate impact by gender, violating fairness guidelines.”

  • To a customer, it might mean: “You were denied because your income-to-debt ratio is too low compared to similar applicants.”

All three are explanations—but they differ in audience and purpose.

This highlights a crucial point: in AI, an explanation is audience-relative. What counts as an explanation depends not just on the system, but on who is asking and why.
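
To make this concrete, here is a minimal sketch in Python (the score, threshold, attribution numbers, and the render_explanation function are all invented for illustration, not taken from any real lending system) that renders the same underlying model facts for three different audiences:

```python
# Hypothetical sketch: one set of model facts, three audience-specific renderings.
# All numbers and names are invented for illustration.
def render_explanation(audience, score, threshold, attributions):
    # Pick the feature with the largest (absolute) contribution to the decision.
    top_feature = max(attributions, key=lambda f: abs(attributions[f]))
    if audience == "developer":
        return (f"Score {score:.2f} fell below the {threshold:.2f} cutoff; "
                f"largest attribution: {top_feature} ({attributions[top_feature]:+.2f}).")
    if audience == "regulator":
        return (f"Decision driven primarily by {top_feature}; "
                f"attributions should be audited across protected groups for disparate impact.")
    # Default: plain-language, customer-facing explanation.
    return (f"Your application was declined mainly because of your {top_feature}, "
            f"which compares unfavorably with similar approved applicants.")

facts = {"income-to-debt ratio": -0.21, "credit history length": -0.05}
for audience in ("developer", "regulator", "customer"):
    print(render_explanation(audience, score=0.48, threshold=0.55, attributions=facts))
```

The underlying facts never change; only the framing does, which is what audience-relativity means in practice.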

The Problem of Post Hoc Explanations

Many AI systems—particularly deep learning models—are “black boxes.” We can measure inputs and outputs, but the path in between is too complex to map step by step. To cope, researchers use post hoc explanations like:

  • Saliency maps: Heatmaps that show which features of an image influenced a classification.

  • LIME (Local Interpretable Model-agnostic Explanations): A method that perturbs input data and observes output changes to approximate “why” a model made a prediction.

  • SHAP values (SHapley Additive exPlanations): Borrowed from cooperative game theory, assigning each feature a “contribution” to the outcome.

But these are approximations. They don’t reveal the true inner logic—they provide a simplified, human-readable story.
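
To see the perturb-and-observe idea behind LIME in miniature, here is a sketch of a local surrogate explanation. It is not the actual lime package: `model` stands for any black box exposing a scikit-learn style predict_proba method, `x` is the single instance being explained, and the Gaussian sampling and proximity weighting are deliberately simplified.

```python
# Minimal sketch of a LIME-style local surrogate (simplified; not the lime package).
# Assumes `model` is any black box with predict_proba and `x` is a 1-D feature vector.
import numpy as np
from sklearn.linear_model import Ridge

def local_explanation(model, x, feature_names, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with small Gaussian noise around x.
    perturbed = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # 2. Query the black box for its predictions on the perturbed points.
    preds = model.predict_proba(perturbed)[:, 1]
    # 3. Weight samples by proximity to x, so the surrogate stays local.
    weights = np.exp(-np.linalg.norm(perturbed - x, axis=1) ** 2 / (2 * scale ** 2))
    # 4. Fit an interpretable linear model that mimics the black box near x.
    surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    # The coefficients are the "explanation": each feature's approximate local weight.
    return sorted(zip(feature_names, surrogate.coef_), key=lambda fc: -abs(fc[1]))
```

The sketch also makes the limitation visible: the coefficients explain the linear surrogate, not the model itself, so the explanation is only as faithful as the local approximation.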

This mirrors cognitive science research on humans: when asked “Why did you do that?”, people often construct explanations after the fact that are plausible but not necessarily causal. AI, in a strange way, is just holding up a mirror.

Case Study: COMPAS Recidivism Predictions

The COMPAS algorithm, used in US courts to predict re-offense risk, offers a sobering example. Investigations revealed racial bias: Black defendants were more likely to be labeled “high risk” even when they did not re-offend.

Here the demand for explanation was both technical and moral: How did COMPAS make this judgment? The company claimed “trade secret” protection, so full transparency was unavailable. Instead, journalists and researchers reverse-engineered explanations using input-output analysis.

This case shows that explanation is not just about technical clarity—it is about social trust. Without accessible explanations, public legitimacy collapses.

Critical Thinking Prompts

  • When is an explanation “good enough”?

  • Should AI explanations aim for truth (the real causal story) or usefulness (a simplified model humans can act on)?

  • How should we handle cases where different audiences (developers, policymakers, end-users) require fundamentally different explanations?

Conclusion: Explanation as Dialogue

Philosophically, the lesson is this: explanations are not static facts waiting to be uncovered. They are dialogues between systems, humans, and contexts. For AI, this means:

  1. We need pluralistic explanations tailored to audience needs.

  2. We must accept that some explanations will be approximations, not exact causal maps.

  3. The ultimate test of explanation is not whether it is complete, but whether it supports human understanding and decision-making.

In short: What counts as an explanation in AI? The answer depends on who you are, what you need, and how much complexity you can handle.
