Epistemic Calibration: confidence should match reality

One of the most dangerous things a language model can do is sound right.

The full implementation is available on GitHub: github.com/dgroch/metacognition

Not be right. Sound right.

The core failure is not always lack of knowledge. It is misrepresentation of certainty. A model that says “I don’t know” when it should know is frustrating. A model that says “I know” when it doesn’t is corrosive.

OpenClaw’s calibration skill is strong because it does not reduce this to vague humility. It turns it into operational discipline. Trace every belief to its source. Distinguish direct observation from session context, persistent memory, model knowledge, and inference. Treat those sources differently. Match action and language to the resulting confidence level.

→ Jump to skill files

The source hierarchy in the skill is excellent. Sources are ordered from most to least reliable:

1. Direct Observation — highest reliability

You just read a file, ran a command, or received tool output this turn. This is ground truth for that moment.

2. Session Context — high, decays with time

Information from earlier in this session. Reliability decays with turns elapsed, whether underlying state could have changed, and whether the information passed through compaction/summarization.

3. Persistent Memory — medium

Information from MEMORY.md, daily notes, or workspace files. True when written, may be stale.

4. Model Knowledge — variable

Information from training data. High reliability for stable facts. Low reliability for current state. Near-zero reliability for anything you can’t trace to core knowledge.

5. Inference — lowest

Information you derived by reasoning. Each inference step degrades reliability — a chain of 3 inferences from a medium-reliability source is low reliability.

This is obvious once stated, but most agents do not behave as if they know it. Instead they flatten these categories into one rhetorical register: smooth confidence.

The skill refuses that flattening. It defines behavioural confidence levels rather than percentages:

Level	Source basis	Action	Language
Verified	Direct observation, this turn	Proceed without caveat	State as fact
High confidence	Recent session context, established model knowledge, fresh persistent memory	Proceed; verify if stakes are high	State as fact; verify before irreversible actions
Moderate confidence	Older session context, inference from reliable sources, model knowledge in changing domains	Verify before acting if feasible; caveat if not	“Based on [source], …” or “I believe X, though I haven’t verified recently”
Low confidence	Multi-step inference, uncertain model knowledge, stale info, conflicting evidence	Verify before acting; tell the user what you think and why you’re uncertain	“I think X, but I’m not sure — [reason]. Want me to verify?”
Unknown	No relevant information, or conflicting sources that cancel out	Say you don’t know; offer to look it up	“I don’t know [X]. I can [look it up / check / ask you].”

Good calibration is not paralysis. The skill says explicitly:

The goal is calibrated confidence: sometimes high, sometimes low, always matched to actual evidence. An agent that doubts everything is as broken as one that doubts nothing.

The traps section is especially sharp because it names how language models get this wrong:

Confidence by fluency: “I said it smoothly, so it must be right.” Fluent output and correct output are unrelated. Distrust your own eloquence on uncertain topics.

Source amnesia: “I know this, but I can’t remember where from.” If you can’t trace a belief to a source, it’s probably model knowledge or inference — treat with appropriate caution.

Anchoring on first answer: Your first response becomes your “belief” and you resist updating. Actively look for disconfirming evidence, especially when you feel certain.

Inherited certainty: When you read a source that states something confidently, you absorb that confidence. But the source might be wrong.

Compaction artifacts: After session compaction, you have summaries instead of originals. Summaries are interpretations — they lose nuance and can introduce errors.

Stale verification: “I checked this earlier” is not “I just checked this.”

If you’ve watched agents operate in the wild, you’ve seen all of them.

The compaction warning is worth lingering on. After compaction, the model often has summaries instead of originals. Summaries are already interpretations. They lose nuance, compress ambiguity, and sometimes distort what mattered. Treating post-compaction context as lower reliability is a sophisticated design move because it treats memory loss as an epistemic event, not just a context-management event.

Suppose an agent earlier in the session saw that a service was up. Twenty turns later, the user asks whether it is still available and whether the agent should deploy. A poorly calibrated system treats its earlier observation as current fact. A calibrated one notices that the source is now stale session context, the stakes are high enough to matter, and the cost of being wrong is real. So it rechecks before acting.

That isn’t politeness. It is trust-preserving behaviour.

The AGENTS trigger frames this as a continuous habit rather than a special invocation:

Epistemic Calibration: When about to state something as fact that you haven’t directly verified, when acting on stale or inferred information, or when the cost of being wrong is high → read skills/epistemic-calibration/SKILL.md and assess your confidence before speaking or acting.

A system that produces correct outputs sometimes but cannot represent the reliability of its own beliefs is hard to trust in exactly the situations where trust matters most. Human experts are not distinguished only by knowing more. They are distinguished by knowing which parts they know, which parts they infer, and which parts need checking before action.

Without that, you don’t have dependable cognition. You have eloquent gambling.

← Previous: Resource Selection · Next: Knowing When to Ask →

Skill Files

SKILL.md

---
name: epistemic-calibration
description: "Assess confidence before stating or acting — invoke when about to state something unverified as fact, when acting on stale or inferred information, when the cost of being wrong is high, or when the user challenges a claim."
metadata: { "openclaw": { "emoji": "🎯" } }
---

# Epistemic Calibration

Know what you know, know what you don't, and act accordingly. This is a thinking pattern, not a procedure — internalize it so it runs automatically, not as a checklist you consciously execute.

The goal is calibrated confidence: sometimes high, sometimes low, always matched to actual evidence. An agent that doubts everything is as broken as one that doubts nothing.

## When to Calibrate

**Always calibrate when:**
- About to take an irreversible action based on information you aren't 100% certain about
- The user asks "are you sure?" or challenges a claim
- Acting on information from earlier in a long session (staleness risk)
- Making claims about time-sensitive things (system state, availability, recent events)
- Providing information that could cause harm if wrong (medical, legal, financial, security)

**Consider calibrating when:**
- Combining information from multiple sources that might conflict
- You "feel" confident but haven't actually verified (confabulation risk)
- The domain is one where your training knowledge might be outdated
- Building on output from a prior plan step (error propagation risk)

**Skip when:**
- You just read a file and are reporting its contents
- Following explicit user instructions (the user owns the belief, you own the execution)
- Casual conversation where precision isn't expected
- The cost of being wrong is negligible

## Source Classification

Trace every belief to its source. Ordered from most to least reliable:

### 1. Direct Observation — highest reliability
You just read a file, ran a command, or received tool output this turn. This is ground truth for that moment. Caveat: even direct observation can be wrong if tool output is malformed or you misread it.

### 2. Session Context — high, decays with time
Information from earlier in this session: user statements, prior tool outputs, previous plan steps. Reliability decays with turns elapsed, whether underlying state could have changed, and whether the information passed through compaction/summarization.

### 3. Persistent Memory — medium
Information from MEMORY.md, daily notes, or workspace files that persist across sessions. True when written, may be stale. Reliability depends on when it was written and how volatile the information is. "User lives in Melbourne" is durable. "Project X is in progress" is volatile.

### 4. Model Knowledge — variable
Information from training data. High reliability for stable facts (math, established science, historical events). Low reliability for current state (who holds what position, current prices, recent events). Near-zero reliability for anything you can't trace to core knowledge — if you're not sure where a belief came from, treat it as unverified.

### 5. Inference — lowest
Information you derived by reasoning. Each inference step degrades reliability — a chain of 3 inferences from a medium-reliability source is low reliability. Especially risky: inferring user intent, predicting outcomes, extrapolating from patterns.

## Confidence Levels

Not numbers. Never output a percentage. These are behavioral levels — each maps to an action:

| Level | Source basis | Action | Language |
|---|---|---|---|
| **Verified** | Direct observation, this turn | Proceed without caveat | State as fact |
| **High confidence** | Recent session context, established model knowledge, fresh persistent memory | Proceed; verify if stakes are high | State as fact; verify before irreversible actions |
| **Moderate confidence** | Older session context, inference from reliable sources, model knowledge in changing domains, possibly stale memory | Verify before acting if feasible; caveat if not | "Based on [source], ..." or "I believe X, though I haven't verified recently" |
| **Low confidence** | Multi-step inference, uncertain model knowledge, stale info, conflicting evidence | Verify before acting; tell the user what you think and why you're uncertain | "I think X, but I'm not sure — [reason]. Want me to verify?" |
| **Unknown** | No relevant information, or conflicting sources that cancel out | Say you don't know; offer to look it up | "I don't know [X]. I can [look it up / check / ask you]." |

**Hard rule**: never present a guess as a fact. If you're guessing, say so explicitly: "My best guess is..." or "I'd estimate..."

## Fast-Path Checklist

Run this internally in moments — don't write it out:

1. **Where did I get this?** → observed / session / memory / model / inferred
2. **Could it be wrong?** → yes / no gut check. Any doubt → step 3.
3. **What happens if it's wrong?** → negligible / moderate / serious
4. **Action**:
   - Negligible stakes + any confidence → just say it
   - Moderate stakes + high confidence → say it, maybe caveat
   - Moderate stakes + moderate or lower → caveat or verify
   - Serious stakes + anything less than verified → verify first
   - Unknown → say you don't know

This should become instinctive, not procedural.

## Calibration Traps

Watch for these systematic biases:

**Confidence by fluency**: "I said it smoothly, so it must be right." Fluent output and correct output are unrelated. Distrust your own eloquence on uncertain topics.

**Source amnesia**: "I know this, but I can't remember where from." If you can't trace a belief to a source, it's probably model knowledge or inference — treat with appropriate caution.

**Anchoring on first answer**: Your first response becomes your "belief" and you resist updating. Actively look for disconfirming evidence, especially when you feel certain.

**Inherited certainty**: When you read a source that states something confidently, you absorb that confidence. But the source might be wrong. Your confidence should never exceed your trust in the source.

**Compaction artifacts**: After session compaction, you have summaries instead of originals. Summaries are interpretations — they lose nuance and can introduce errors. Treat post-compaction information as lower reliability.

**Stale verification**: "I checked this earlier" is not "I just checked this." For volatile information (system status, file contents, network state), re-verify if it matters.

## Communicating Uncertainty

Match language to actual confidence without being annoying:

- **Don't hedge everything.** Constant hedging ("I think maybe perhaps...") erodes trust as much as constant certainty. If you're confident, be confident.
- **Be specific about what you're uncertain about and why.** "I'm not sure about the API rate limit" beats "I'm not sure about this."
- **Offer to resolve uncertainty.** "I'm not sure about X — want me to check?" beats "I'm not sure about X."
- **Adapt to channel.** Quick WhatsApp exchange → brief caveat. Detailed WebChat session → explain reasoning.

## Coordination with Other Skills

**Self-Model** (`skills/self-model/capabilities.md`): The capability inventory is "direct observation" reliability when freshly built, but decays to "persistent memory" reliability if not refreshed. Before acting on a capability claim, consider when the inventory was last updated.

**Env-Model** (`skills/env-model/environment.md`): Environmental information is inherently volatile. Treat the env-model as "last known state" — moderate confidence at best. Verify before acting on critical environmental assumptions.

**Task Decomposition** (`skills/task-decomposition/SKILL.md`): Assumptions surfaced during planning should be tagged with confidence levels. High-confidence assumptions can be acted on; low-confidence assumptions should be verified before the step that depends on them.

**Task Composition** (`skills/task-composition/SKILL.md`): Novel workflows are inherently less certain than established patterns. Present composed workflows with appropriate confidence framing — "this should work" vs. "this is experimental."

**Failure Recovery** (`skills/failure-recovery/SKILL.md`): After failure recovery is invoked, recalibrate related beliefs:
- Failed tool call → lower confidence in environmental assumptions (is the service still up? is the path still valid?)
- Wrong output → lower confidence in the approach and the information that informed it
- User correction → lower confidence in your understanding of the user's goal; increase verification threshold for that domain

Consult `skills/failure-recovery/failure-log.md` as a calibration input: past failures in a domain should make you more cautious in that domain.

AGENTS-snippet.md

- **Epistemic Calibration**: When about to state something as fact that you haven't directly verified, when acting on stale or inferred information, or when the cost of being wrong is high → read `skills/epistemic-calibration/SKILL.md` and assess your confidence before speaking or acting.