Can We See How Large Language Models Think?

A Topological Perspective on Explainable AI

Large Language Models (LLMs) have become astonishingly capable, but also deeply opaque.
They generate fluent text, reason across domains, and assist in high-stakes settings, yet their internal decision-making remains largely a black box.

In our newly published paper, "Exploring the Potential of Topological Data Analysis for Explainable Large Language Models: A Scoping Review," we take a step back and ask:

👉 What if we stopped looking at individual tokens or attention scores — and instead studied the shape of how LLMs reason?

🧠 From Attention Maps to Geometry

Most explainability methods today focus on local effects:

  • attention heatmaps
  • saliency scores
  • probing tasks

These are useful, but they often miss the global structure of how representations evolve across layers, tasks, and training regimes.

This is where Topological Data Analysis (TDA) comes in.

Rooted in algebraic topology, TDA provides tools such as:

  • Persistent homology
  • Betti numbers and Betti curves
  • Mapper graphs
  • Zigzag persistence

These tools are designed to uncover stable patterns in high-dimensional data, even in the presence of noise, making them surprisingly well-suited for studying neural representations.
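To make the first two items concrete: in dimension zero, persistent homology tracks connected components of a point cloud as a distance threshold grows, and the resulting death times are exactly the edge lengths of the minimum spanning tree. The sketch below is an illustrative pure-Python implementation of that special case (function names are our own, not from any TDA library), plus a Betti-0 count at a given scale:

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistent homology of a point cloud.

    Every point is born at scale 0; when two components merge at
    distance d, one component dies, giving a (birth=0, death=d) pair.
    Implemented as Kruskal's algorithm with union-find: the finite
    death times are the edge lengths of the minimum spanning tree.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # one component dies per merge
    # n-1 finite deaths plus one component that never dies
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]

def betti0(diagram, t):
    """Number of connected components alive at scale t (Betti-0)."""
    return sum(1 for birth, death in diagram if birth <= t < death)
```

Sweeping `betti0` over a range of thresholds yields a Betti curve; production work would use a library such as GUDHI or Ripser, which also handle loops (k=1) and higher dimensions.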

🧩 What We Reviewed

In this scoping review, we systematically analyzed 26 peer-reviewed studies applying TDA to:

  • attention mechanisms
  • latent embedding spaces
  • robustness and out-of-distribution behavior
  • training dynamics and representation shift
  • interactive, human-in-the-loop explanations

Rather than benchmarking models, our goal was to map and organize the mathematical landscape of topology-based interpretability.

We introduce a formal taxonomy that classifies existing work along:

  • homological dimension (what kind of structure is studied)
  • representation manifold (attention, embeddings, activations)
  • how persistence information is used (explicit vs aggregated)

To our knowledge, this is the first review to structure LLM interpretability research along topological and algebraic axes.
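The three axes combine naturally into a small data structure. The sketch below is a hypothetical encoding of the taxonomy (the class and field names are ours, not notation from the paper), showing how a single study would be classified along all three axes at once:

```python
from dataclasses import dataclass
from enum import Enum

class Manifold(Enum):
    """Which representation the topology is computed on."""
    ATTENTION = "attention"
    EMBEDDINGS = "embeddings"
    ACTIVATIONS = "activations"

class PersistenceUse(Enum):
    """How persistence information enters the explanation."""
    EXPLICIT = "explicit"        # diagrams/barcodes inspected directly
    AGGREGATED = "aggregated"    # summarized into scalar or vector features

@dataclass(frozen=True)
class TaxonomyEntry:
    homological_dim: int         # k = 0 (components), k = 1 (loops), ...
    manifold: Manifold
    persistence_use: PersistenceUse

# Hypothetical entry: a study tracking loops in attention graphs,
# with persistence summarized into aggregated features.
entry = TaxonomyEntry(homological_dim=1,
                      manifold=Manifold.ATTENTION,
                      persistence_use=PersistenceUse.AGGREGATED)
```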

🚨 Key Insights

A few patterns stood out clearly:

🔹 Topology captures global structure
TDA reveals how semantic clusters form, merge, or collapse across layers, something local explanations often miss.

🔹 Robustness has a geometric signature
Hallucinations, adversarial attacks, and OOD inputs often correspond to measurable topological instability in attention graphs or embedding spaces.
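One crude way to quantify such instability: compare a topological summary of clean versus perturbed embeddings. The reviewed papers typically use distances between persistence diagrams (bottleneck or Wasserstein); the sketch below substitutes a simpler proxy, total H0 persistence (the sum of finite death times, which equals the minimum-spanning-tree length), computed with Prim's algorithm. Function names are illustrative, not from any library:

```python
import math

def total_persistence(points):
    """Sum of finite H0 death times = total MST length (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest connection of each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total

def instability(clean, perturbed):
    """Topological-instability proxy: change in H0 total persistence."""
    return abs(total_persistence(clean) - total_persistence(perturbed))
```

A perturbation that flings one embedding far from its cluster inflates the score sharply, while a perturbation that stays inside the cluster barely moves it.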

🔹 Explainability vs scalability is a real trade-off
Full persistent homology is mathematically rich but computationally heavy; Mapper and approximate methods are more scalable and human-friendly.
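Mapper's scalability comes from its simple recipe: project the data through a filter function, cover the filter range with overlapping intervals, cluster each preimage, and connect clusters that share points. The sketch below is a minimal, assumption-laden version (filter = first coordinate, clustering = single-linkage at a fixed threshold; real pipelines like KeplerMapper expose all of these as choices):

```python
import math
from collections import defaultdict

def mapper_graph(points, n_intervals=4, overlap=0.3, eps=1.0):
    """Minimal Mapper sketch: overlapping 1-D cover + single-linkage."""
    f = [p[0] for p in points]            # filter: first coordinate
    lo, hi = min(f), max(f)
    length = (hi - lo) / n_intervals
    nodes = []                            # each node: a set of point indices
    membership = defaultdict(set)         # point index -> node ids
    for k in range(n_intervals):
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        unvisited = {i for i, v in enumerate(f) if a <= v <= b}
        while unvisited:                  # connected components under eps
            start = unvisited.pop()
            comp, stack = {start}, [start]
            while stack:
                i = stack.pop()
                for j in [j for j in unvisited
                          if math.dist(points[i], points[j]) <= eps]:
                    unvisited.remove(j)
                    comp.add(j)
                    stack.append(j)
            node_id = len(nodes)
            nodes.append(comp)
            for i in comp:
                membership[i].add(node_id)
    # connect clusters that share at least one point
    edges = {(u, v) for ids in membership.values()
             for u in ids for v in ids if u < v}
    return nodes, sorted(edges)
```

Run on points along a line, this recovers a path graph; the cost is dominated by per-interval clustering, which is why Mapper scales where full persistent homology struggles.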

🔹 Higher-dimensional topology is largely unexplored
Almost all existing work focuses on connectivity (k=0) and loops (k=1). Richer topological structures remain an open frontier.

🔭 Why This Matters

If we want LLMs that are:

  • trustworthy
  • auditable
  • safe in high-risk domains (healthcare, law, public policy)

…we need explanations that go beyond token-level heuristics.

Topology offers a rigorous, geometry-aware lens on how models organize knowledge, generalize, and fail — and it complements existing XAI methods rather than replacing them.

📌 What’s Next?

We see exciting opportunities ahead:

  • linking topological instability to optimization dynamics
  • integrating TDA into training and monitoring pipelines
  • developing topology-aware hallucination and robustness detectors
  • bridging mathematical guarantees with user-facing explanations

Big thanks to the co-authors for this deep and genuinely interdisciplinary effort.

To top