Hostile Demo Rehearsal Kit
Version: v0.1 (MV arithmetic validator added)
Audience: Safety leads, preparedness teams, external auditors
Purpose: Defend the demo honestly under adversarial questioning

Framing (Read First)
This demo shows governance substrate, not capability. The correct frame is:
"We built the boundary enforcement layer. We're showing you that boundary, not what happens inside it."
If asked "what can this do?", redirect to "what does this refuse to do?"
10 Hostile Questions
Q1: "Why does everything say ABSTAINED? Is this broken?"
(a) What they're really testing: Whether you'll overclaim. They want to see if you panic and say "it works, just trust us."

(b) UI click-path:
- Select "pa_only" scenario from dropdown
- Click "Run Full Flow"
- Observe outcome shows `ABSTAINED`

(d) What NOT to say:
- "It's verified" (false)
- "The verifier isn't ready yet" (implies it would work)
- "ABSTAINED means pending" (it means cannot determine)

(e) Evidence:
- Authority Basis panel: `mechanically_verified: false`
- Explanation text: "PA claims are authority-bearing but not mechanically verified"
Q2: "What's the point if nothing gets verified?"
(a) What they're really testing: Whether you understand the difference between governance and capability.

(b) UI click-path:
- Run "mixed_mv_adv" scenario
- Point to split panels: Exploration (left) vs Authority (right)
- Show ADV claim marked "EXCLUDED" in red

(d) What NOT to say:
- "We'll add real verification later" (shifts goalposts)
- "This proves the system is safe" (never claim safety)
- "Trust the hashes" (hashes prove structure, not truth)

(e) Evidence:
- ADV claim badge: red "EXCLUDED"
- Authority stream: shows only MV claim
- `authority_claim_count: 1` vs `total_claim_count: 2`
Q3: "How do I know ADV claims are actually excluded from R_t?"
(a) What they're really testing: Whether exclusion is real or cosmetic.

(b) UI click-path:
- Run "adv_only" scenario
- Observe R_t hash is computed
- Check `authority_claim_count: 0`

(d) What NOT to say:
- "ADV is just hidden" (it's architecturally excluded)
- "We filter it in the UI" (filtering happens in `governance/uvil.py:build_reasoning_artifact_payload`)

(e) Evidence:
- Authority Basis: `adv_count: 2`, `authority_claim_count: 0`
- R_t is still computed (proves empty set is committed, not skipped)
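The exclusion and the empty-set commitment can be pictured with a minimal sketch. Everything in it is hypothetical: the function names, the claim dictionary shape, the set of authority-bearing classes, and the canonical-JSON hashing are assumptions for illustration and do not reproduce `governance/uvil.py`.

```python
import hashlib
import json

# Assumption: these trust classes are authority-bearing; ADV is not.
AUTHORITY_CLASSES = {"FV", "MV", "PA"}


def authority_claims(claims):
    """Keep only authority-bearing claims; ADV is dropped before hashing."""
    return [c for c in claims if c["trust_class"] in AUTHORITY_CLASSES]


def compute_r_t(claims):
    """Hash the authority-bearing subset; an empty subset still yields a hash."""
    payload = json.dumps(authority_claims(claims), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


adv_only = [
    {"text": "claim one", "trust_class": "ADV"},
    {"text": "claim two", "trust_class": "ADV"},
]
print(len(authority_claims(adv_only)))  # 0, matching authority_claim_count: 0
print(compute_r_t(adv_only) == compute_r_t([]))  # True: same hash as the empty set
```

The point of the sketch is that filtering happens before the hash is taken, so an all-ADV input commits to the same value as no claims at all, rather than skipping the commitment.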
Q4: "What stops someone from marking everything as FV?"
(a) What they're really testing: Whether trust class assignment is governance or just labels.

(b) UI click-path:
- Select "mv_only" scenario
- Note: claims are user-assigned, not system-assigned
- Point to Authority Basis: `mechanically_verified: false`

(c) What to say: "[...] `mechanically_verified: false`, so claiming FV doesn't make it verified. The audit trail preserves what the user claimed vs what the system confirmed."

(d) What NOT to say:
- "The system validates trust classes" (v0 doesn't)
- "FV means it's formally verified" (in v0, FV is an aspiration, not a fact)

(e) Evidence:
- `mechanically_verified: false` in every v0 response
- Outcome is always ABSTAINED regardless of trust class
Q5: "Is this just security theater with extra hashes?"
(a) What they're really testing: Whether the hashes mean anything or are decorative.

(b) UI click-path:
- Run any scenario twice with same inputs
- Compare `committed_partition_id` values
- Show they're identical (content-derived, not random)

(c) What to say: "[...] `committed_partition_id`. You can replay this tomorrow and get the same hash. That's the determinism property."

(d) What NOT to say:
- "The hashes prove correctness" (they prove structure/immutability)
- "It's cryptographically secure" (SHA256 is, but that's not the claim)

(e) Evidence:
- Run `uv run python tools/run_demo_cases.py` twice
- `committed_partition_id` values match between runs
- Fixtures in `fixtures/*/output.json` are stable
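To make "content-derived, not random" concrete, here is a minimal sketch. It assumes the ID is a SHA-256 digest over a canonical JSON encoding of the claims; the function name and the encoding are hypothetical, and the demo's real derivation may differ.

```python
import hashlib
import json
import uuid


def content_derived_id(claims):
    """Hypothetical ID: SHA-256 over a canonical JSON encoding of the claims."""
    canonical = json.dumps(claims, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content gives the same ID on every run, regardless of key order.
first = content_derived_id([{"text": "2 + 2 = 4", "trust_class": "MV"}])
second = content_derived_id([{"trust_class": "MV", "text": "2 + 2 = 4"}])
print(first == second)  # True

# A random ID, by contrast, is different on every call.
print(uuid.uuid4() == uuid.uuid4())  # False
```

Sorting keys and fixing separators is what makes the encoding canonical here; without it, two equivalent inputs could serialize differently and hash to different values.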
Q6: "What does PA actually prove?"
(a) What they're really testing: Whether you'll conflate human attestation with mechanical verification.

(b) UI click-path:
- Run "pa_only" scenario
- Read Authority Basis explanation

(c) What to say: "[...] `mechanically_verified: false`."

(d) What NOT to say:
- "PA is as good as verified" (it isn't)
- "The user takes responsibility" (that's not a technical property)
- "Procedural attestation is a kind of proof" (it's a commitment, not a proof)

(e) Evidence:
- Authority Basis: `pa_count: 1`, `mechanically_verified: false`
- Explanation: "PA claims are authority-bearing but not mechanically verified"
Q7: "What's the difference between DraftProposal and CommittedPartitionSnapshot?"
(a) What they're really testing: Whether the exploration/authority boundary is real.

(b) UI click-path:
- Run any scenario
- Point to Exploration panel: shows `proposal_id`
- Point to Authority panel: shows `committed_partition_id`
- Note: `proposal_id` appears in exploration only

(d) What NOT to say:
- "They're basically the same" (they're architecturally different)
- "We clean up the draft later" (drafts never enter authority)

(e) Evidence:
- Exploration panel note: "proposal_id is exploration-only and MUST NOT appear in attestation"
- `/run_verification` endpoint rejects raw `proposal_id` (see `backend/api/uvil.py:113-123`)
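One way to picture a boundary check of this kind is a guard that refuses any payload carrying an exploration-only identifier. The function below is a hypothetical illustration and is not the code at `backend/api/uvil.py:113-123`; the name, the recursive walk, and the error type are all assumptions.

```python
def reject_exploration_ids(payload):
    """Raise if an exploration-only identifier appears anywhere in the payload."""
    if isinstance(payload, dict):
        if "proposal_id" in payload:
            raise ValueError("proposal_id is exploration-only")
        for value in payload.values():
            reject_exploration_ids(value)
    elif isinstance(payload, list):
        for item in payload:
            reject_exploration_ids(item)
    return payload


ok = reject_exploration_ids({"committed_partition_id": "abc123"})
print(ok)  # {'committed_partition_id': 'abc123'}

try:
    reject_exploration_ids({"attestation": {"proposal_id": "draft-1"}})
except ValueError as exc:
    print(f"rejected: {exc}")  # rejected: proposal_id is exploration-only
```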
Q8: "What happens if I try to double-commit?"
(a) What they're really testing: Whether immutability is enforced.

(b) UI click-path:
- Start demo server
- Use curl or Postman to call `/uvil/commit_uvil` twice with same `proposal_id`
- Second call returns 409 Conflict

(c) What to say: "[...] `backend/api/uvil.py` via `_committed_proposal_ids` set."

(d) What NOT to say:
- "You can update a committed partition" (you cannot)
- "Just create a new proposal" (true, but deflects the question)
(e) Evidence:
```bash
# First commit succeeds
curl -X POST http://localhost:8000/uvil/commit_uvil -H "Content-Type: application/json" \
  -d '{"proposal_id":"","edited_claims":[...]}'
# → 200 OK

# Second commit fails
curl -X POST http://localhost:8000/uvil/commit_uvil -H "Content-Type: application/json" \
  -d '{"proposal_id":"","edited_claims":[...]}'
# → 409 Conflict: "Proposal already committed"
```
Q9: "Why does THIS claim verify but not that one?"
(a) What they're really testing: Whether verification is real or arbitrary. This is the key question once MV arithmetic validation exists.

(b) UI click-path:
- Run "mv_arithmetic_verified" scenario: claim is "2 + 2 = 4" marked MV
- Observe outcome: `VERIFIED`
- Run "same_claim_as_pa" scenario: claim is "2 + 2 = 4" marked PA
- Observe outcome: `ABSTAINED`
- Run "mv_arithmetic_refuted" scenario: claim is "2 + 2 = 5" marked MV
- Observe outcome: `REFUTED`

(c) What to say: "[...] `2 + 2 = 5` marked MV returns REFUTED because the validator ran and found it false."

(d) What NOT to say:
- "The system knows math" (it knows one pattern: `a op b = c`)
- "MV claims are verified" (only if the validator can parse them)
- "This proves the system is intelligent" (it's a regex + arithmetic)

(e) Evidence:
- mv_arithmetic_verified: `outcome: VERIFIED`, `mechanically_verified: true`
- same_claim_as_pa: `outcome: ABSTAINED`, `mechanically_verified: false`
- mv_arithmetic_refuted: `outcome: REFUTED`, `mechanically_verified: true`
- Authority Basis shows `mv_validation: { verified: 1, refuted: 0, abstained: 0 }`
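The "regex + arithmetic" behaviour and the trust-class routing can be illustrated with a small sketch. It is not the contents of `governance/mv_validator.py`: the pattern, the supported operators (`+`, `-`, `*` on integers), and the function names are assumptions chosen to reproduce the three outcomes listed above.

```python
import re

# Hypothetical pattern for "a op b = c" with integer operands.
ARITHMETIC = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def validate_arithmetic(text):
    """Return VERIFIED or REFUTED for parseable claims, ABSTAINED otherwise."""
    match = ARITHMETIC.match(text)
    if match is None:
        return "ABSTAINED"
    a, op, b, c = match.groups()
    return "VERIFIED" if OPS[op](int(a), int(b)) == int(c) else "REFUTED"


def route(text, trust_class):
    """Only MV reaches the validator; other classes bypass it in this sketch."""
    if trust_class == "MV":
        return validate_arithmetic(text)
    return "ABSTAINED"


print(route("2 + 2 = 4", "MV"))        # VERIFIED
print(route("2 + 2 = 4", "PA"))        # ABSTAINED
print(route("2 + 2 = 5", "MV"))        # REFUTED
print(route("the sky is blue", "MV"))  # ABSTAINED
```

The last call is the one to keep in mind under questioning: an MV label on an unparseable claim still abstains, so the label alone never produces VERIFIED.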
Q10: "Prove this isn't just security theater with fancy hashes."
(a) What they're really testing: Whether the hashes are decorative or functional. This is the "show me or shut up" question.

(b) UI click-path:
- Run any MV verified scenario (e.g., mv_arithmetic_verified)
- After verification, scroll to "Audit Verification" section
- Click "Download Evidence Pack" → saves JSON file
- Click "Replay & Verify" → shows PASS
- Open the downloaded JSON, change one character in `reasoning_artifacts[0].claim_id`
- Use curl to POST the tampered pack to `/uvil/replay_verify`
- Observe: FAIL with diff showing which hash diverged

(d) What NOT to say:
- "Trust the hashes" (show them, don't assert them)
- "The cryptography is sound" (irrelevant; the demo is about structure)
- "We use SHA256" (implementation detail, not the point)
(e) Evidence:
```bash
# Download evidence pack
curl http://localhost:8000/uvil/evidence_pack/ > pack.json

# Tamper with it (change one character)
# Then replay:
curl -X POST http://localhost:8000/uvil/replay_verify \
  -H "Content-Type: application/json" \
  -d @pack.json
# → {"result": "FAIL", "diff": {...}}
```
(f) Key insight:
The evidence pack is self-contained. No external API calls. No network access. Anyone with the pack can:
- Recompute U_t from uvil_events
- Recompute R_t from reasoning_artifacts
- Recompute H_t = SHA256(R_t || U_t)
- Compare to recorded values
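The four recompute-and-compare steps can be sketched offline as follows. The field names `uvil_events` and `reasoning_artifacts` come from this document; everything else is an assumption, including the canonical-JSON hashing, the `u_t`/`r_t`/`h_t` keys, and reading `||` as concatenation of hex strings. The real evidence pack layout may differ.

```python
import copy
import hashlib
import json


def digest(obj):
    """Hypothetical canonical hash: SHA-256 over sorted-key JSON."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def recompute(pack):
    u_t = digest(pack["uvil_events"])
    r_t = digest(pack["reasoning_artifacts"])
    # Assumption: "||" is taken as concatenation of the two hex strings.
    h_t = hashlib.sha256((r_t + u_t).encode("ascii")).hexdigest()
    return {"u_t": u_t, "r_t": r_t, "h_t": h_t}


def replay_verify(pack):
    """Compare recorded hashes with recomputed ones and report any divergence."""
    fresh = recompute(pack)
    diff = {k: {"recorded": pack[k], "recomputed": v} for k, v in fresh.items() if pack[k] != v}
    return {"result": "FAIL" if diff else "PASS", "diff": diff}


pack = {
    "uvil_events": [{"event": "commit"}],
    "reasoning_artifacts": [{"claim_id": "c1", "text": "2 + 2 = 4"}],
}
pack.update(recompute(pack))  # record the hashes alongside the data
print(replay_verify(pack)["result"])  # PASS

tampered = copy.deepcopy(pack)
tampered["reasoning_artifacts"][0]["claim_id"] = "c2"
report = replay_verify(tampered)
print(report["result"], sorted(report["diff"]))  # FAIL ['h_t', 'r_t']
```

In the tampered run `u_t` is untouched, so the diff names only `r_t` and `h_t`, which is the kind of localized divergence the live demo is meant to show.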
10-Minute Live Run Script
Setup (before demo):
```bash
cd C:/dev/mathledger
uv run python demo/app.py
```
Open http://localhost:8000 in browser

Minute 0-1: Frame
"This is a governance demo. It shows boundary enforcement and routing. Most claims return ABSTAINED. One type, MV with simple arithmetic, actually verifies. That's intentional: we show the full spectrum."

Minute 1-3: Exploration vs Authority
- Select "mixed_mv_adv" from dropdown
- Click "Run Full Flow"
- Point to split panels
- "Left panel is exploration—random IDs, speculative. Right panel is authority—content-derived IDs, immutable."
- "Notice the ADV claim is marked EXCLUDED in red. It never entered R_t."
- Select "adv_only" scenario
- Click "Run Full Flow"
- "Both claims are ADV. Authority stream shows zero claims entered. R_t still exists—it commits to the empty set."
- Point to `authority_claim_count: 0`
- Select "pa_only" scenario
- Click "Run Full Flow"
- "PA is authority-bearing: it enters R_t. But look at the Authority Basis: `mechanically_verified: false`."
- "The system accepts the human's attestation but refuses to claim it verified anything."
- Select "mv_arithmetic_verified" scenario
- Click "Run Full Flow"
- "This claim is `2 + 2 = 4` marked MV. Outcome: VERIFIED."
- Select "same_claim_as_pa" scenario
- "Same text, but marked PA. Outcome: ABSTAINED."
- "Trust class determines routing. MV goes to validator. PA bypasses it."
- Select "mv_arithmetic_refuted"
- "`2 + 2 = 5` marked MV. Outcome: REFUTED. The validator ran and found it false."
- Open terminal
- `uv run python tools/run_demo_cases.py`
- "All 9 cases pass. Same inputs, same hashes."
- "You can run this tomorrow and get identical results."
"This demo shows four things: (1) boundaries between exploration and authority are real, (2) ADV never enters the authority stream, (3) the system stops when it can't verify, (4) when verification exists (MV arithmetic), it runs and returns VERIFIED or REFUTED. That's the full governance stack."
60-Second Cold Outreach Version
> "30 seconds to show you one thing: run `2 + 2 = 4` as MV and it returns VERIFIED. Run `2 + 2 = 5` as MV and it returns REFUTED. Run `2 + 2 = 4` as PA and it returns ABSTAINED.
>
> Same claim text, different trust class, different outcome. That's governance routing: the trust class determines which validator runs. MV goes to arithmetic. PA bypasses validators. ADV never enters the authority stream at all.
>
> This isn't a capability demo. It's a demo of boundary enforcement. The system verifies what it can, refuses to claim what it can't, and excludes what it shouldn't.
>
> If you want to see what 'honest verification infrastructure' looks like, this is it."
Quick Reference Card
| Question | One-liner |
|---|---|
| Why ABSTAINED? | No verifier for that trust class; refusing to claim is honest |
| What's the point? | Boundary enforcement works even without verification |
| ADV excluded? | Yes—authority_claim_count shows it, R_t commits to empty set |
| Mark everything FV? | Allowed, but mechanically_verified: false without FV verifier |
| Security theater? | Download, tamper, replay → FAIL. That's tamper detection, not theater. |
| PA proves what? | Human attestation recorded, not truth confirmed |
| Draft vs Committed? | Random ID (exploration) vs content-derived ID (authority) |
| Double commit? | 409 Conflict—immutability enforced |
| Why THIS verifies? | MV + parseable arithmetic → validator runs → VERIFIED/REFUTED |
| Prove it's not theater? | Evidence pack + replay verify: tamper → FAIL with diff |
Artifacts to Have Open
- Browser: `http://localhost:8000`
- Terminal 1: demo server running
- Terminal 2: ready for `uv run python tools/run_demo_cases.py`
- Terminal 3: ready for curl commands (Q10 evidence pack tamper test)
- Code reference: `backend/api/uvil.py` (for Q7, Q8, Q10 if pressed)
- Code reference: `governance/mv_validator.py` (for Q9 if pressed)
- Fixtures: `fixtures/mv_arithmetic_verified/output.json` (for Q9 comparison)
- Downloaded evidence pack JSON (for Q10 tamper demo)
Red Lines (Never Say These)
| Claim | Why it's wrong |
|---|---|
| "It works" | Only arithmetic MV claims verify; everything else abstains |
| "It's safe" | Demo doesn't prove safety |
| "Trust the system" | System proves structure, not truth |
| "This is aligned" | No alignment claim is made |
| "The verifier will fix it" | Future capability isn't demonstrated |
| "PA means verified" | PA means attested |
| "ADV is just deprioritized" | ADV is architecturally excluded |
| "The system knows math" | It knows one pattern: a op b = c |
| "MV always verifies" | Only parseable arithmetic; unparseable MV → ABSTAINED |
SAVE TO REPO: YES
Path: `docs/HOSTILE_DEMO_REHEARSAL.md`