Defence in Depth

Behavioral training shapes tendency. Structural enforcement constrains capability. Human oversight provides cultural context. Three layers, complementary, each compensating for the others' weaknesses.

1. Behavioral Training

Shapes model tendency toward governed behavior

Training reduces boundary violations at source, before runtime enforcement is needed. The model cooperates with governance rather than fighting it. But training alone can be bypassed by adversarial prompts and degrades under context pressure.

Limitation: can be bypassed by adversarial prompts; degrades under context pressure.
Status: Planned
2. Structural Enforcement

External constraints that cannot be bypassed by prompting

Six governance services operate outside the AI runtime, while Guardian Agents verify every response through mathematical similarity rather than generative checking. Immutable audit trails are stored independently. This layer catches what training misses.

Limitation: cannot prevent all failure modes; adds runtime overhead.
Status: In Production
3. Human Oversight & Tenant Governance

Constitutional rules, cultural traditions, and human escalation

Communities set their own governance rules through Tractatus traditions. Context-aware and culturally appropriate. Humans hold final authority on values decisions. AI facilitates, never decides.

Limitation: cannot scale to every interaction; depends on human engagement.
Status: Framework Complete
"Training can make a model likely to behave well; only architecture can make it structurally harder to behave badly."

Governance During Training, Tractatus Research

DEPLOYED — MARCH 2026

Guardian Agents

Verification without common-mode failure. The watcher is not another speaker — it is a measuring instrument.

The fundamental problem with using one AI to verify another is that both systems share the same epistemic domain: a generative model checking a generative model is susceptible to the same categories of failure. Guardian Agents resolve this by operating in a different domain entirely, measurement rather than generation.

1. Source Analysis

Identify factual claims in the AI response and locate candidate source material from the community's own content.

2. Embedding Similarity

Cosine similarity between claim embeddings and source embeddings. Mathematical measurement, not interpretation. Not susceptible to hallucination.

3. Confidence Scoring

Each claim receives a confidence badge (high, medium, low, unverified) visible to the user. Transparency by default.

4. Adaptive Learning

Moderator corrections feed back into verification thresholds. The system learns from the community's own quality judgments.
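The four steps above can be sketched as a minimal verification loop. This is an illustrative sketch, not the production Guardian Agent API: the `embed` callable and the badge thresholds are assumptions, and a real deployment would supply its own embedding model and tuned cut-offs (with step 4 adjusting those cut-offs from moderator corrections).

```python
import math

def cosine_similarity(a, b):
    """Step 2: pure measurement -- no generative model in the loop."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def confidence_badge(score, thresholds=(0.85, 0.70, 0.50)):
    """Step 3: map a similarity score to a user-visible badge.
    Threshold values are placeholders, not production settings."""
    high, medium, low = thresholds
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    if score >= low:
        return "low"
    return "unverified"

def verify_response(claims, sources, embed):
    """Steps 1-3: for each factual claim, score it against the
    best-matching community source and attach a confidence badge."""
    results = []
    for claim in claims:
        claim_vec = embed(claim)
        best = max(
            (cosine_similarity(claim_vec, embed(s)) for s in sources),
            default=0.0,
        )
        results.append({"claim": claim, "score": best,
                        "badge": confidence_badge(best)})
    return results
```

Because the verdict is a cosine score rather than a generated judgement, the checker cannot hallucinate a justification; it can only measure.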

Philosophical Foundations

Each of these architectural choices embodies a philosophical commitment that demanded a specific engineering response.

Wittgenstein

Language games require external criteria. AI cannot verify its own meaning.

Isaiah Berlin

Value pluralism. No single optimisation function captures what communities value.

Elinor Ostrom

Polycentric governance. Communities govern their own commons effectively.

Te Ao Māori

Kaitiakitanga. Guardianship implies obligation to the governed, not authority over them.

Five Architectural Principles

Adapted from Christopher Alexander's work on living systems. These are design criteria enforced in the architecture itself, not aspirations recorded in documentation.

1. Not-Separateness

Governance in the critical path, not bolted on

Every action passes through validation before executing. This is architectural enforcement — governance services intercept in the critical execution path, not as after-the-fact monitoring. Bypass requires explicit override flags, and every override is logged.
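A minimal sketch of what "in the critical path" means, assuming a hypothetical `validate` check and an in-memory stand-in for the audit trail; the six real services and Tractatus's override mechanics are not shown.

```python
AUDIT_LOG = []  # stand-in for the immutable, independently stored audit trail

class GovernanceViolation(Exception):
    pass

def governed(validate):
    """Decorator placing validation in the execution path itself: the
    action cannot run unless the check passes or an explicit override
    flag is supplied -- and every call, override included, is logged."""
    def wrap(action):
        def run(*args, override=False, **kwargs):
            ok = validate(action.__name__, args, kwargs)
            AUDIT_LOG.append({"action": action.__name__,
                              "allowed": ok, "override": override})
            if not ok and not override:
                raise GovernanceViolation(action.__name__)
            return action(*args, **kwargs)
        return run
    return wrap
```

The point of the pattern is that there is no un-governed entry point: monitoring after the fact would observe violations, whereas interception in the call path prevents them.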

2. Deep Interlock

Services reinforce each other

Governance services coordinate through mutual validation. High context pressure intensifies boundary checking. Instruction persistence affects cross-reference validation. Compromising one service does not compromise governance — an attacker would need to circumvent multiple coordinated services simultaneously.

3. Gradients Not Binary

Intensity levels, not yes/no switches

Governance operates on gradients: NORMAL, ELEVATED, HIGH, CRITICAL. Context pressure, security impact, and validation rigor all scale with intensity. Graduated response prevents both alert fatigue and catastrophic failures. Living systems adapt gradually; mechanical systems snap.
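One way to sketch graduated intensity in code. The pressure thresholds and rigor multipliers here are illustrative assumptions, not the framework's tuned values:

```python
from enum import IntEnum

class Intensity(IntEnum):
    NORMAL = 0
    ELEVATED = 1
    HIGH = 2
    CRITICAL = 3

def intensity_for(context_pressure, security_impact):
    """Map continuous signals (0.0-1.0) onto a graded response
    rather than a yes/no switch. Cut-offs are placeholders."""
    score = max(context_pressure, security_impact)
    if score < 0.25:
        return Intensity.NORMAL
    if score < 0.50:
        return Intensity.ELEVATED
    if score < 0.75:
        return Intensity.HIGH
    return Intensity.CRITICAL

def validation_rigor(level):
    """Validation effort scales with intensity instead of snapping
    between 'off' and 'block everything'."""
    return {Intensity.NORMAL: 1, Intensity.ELEVATED: 2,
            Intensity.HIGH: 4, Intensity.CRITICAL: 8}[level]
```

An ordered enum keeps the gradient comparable (`Intensity.HIGH > Intensity.ELEVATED`), which is what lets one service's pressure reading intensify another service's checking.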

4. Structure-Preserving

Changes enhance without breaking

Framework changes must preserve wholeness. Audit logs remain interpretable across versions. Historical decisions stay valid. New capabilities are added without invalidating existing governance records. The regulatory advantage: stable audit trails that do not need re-interpreting with every new version.

5. Living Process

Grows from real failures, not theory

Framework changes emerge from observed reality, not predetermined plans. When services went unused, fade detection was added. When verification created noise, selective mode evolved from real trigger patterns. Evidence drives evolution, not guesswork.

How the Five Principles Work Together

Not-Separateness (governance in critical path)
  ↓ requires
Deep Interlock (services coordinate)
  ↓ enables
Gradients (nuanced responses)
  ↓ guided by
Living Process (evidence-based evolution)
  ↓ constrained by
Structure-Preserving (audit continuity)
  = System Wholeness

Runtime-Agnostic Architecture

Tractatus works with any agentic AI system. The governance layer sits between your agent and its actions.

[Diagram: Agent Runtime Layer → Tractatus Governance Layer (six services) → Persistent Storage Layer, plus Human Approval Workflows]

1. Agent Runtime

Your AI agent (any platform). Planning, reasoning, tool use. Tractatus is agnostic to implementation.

2. Governance Layer

Six services enforce boundaries, validate actions, monitor pressure. Guardian Agents verify every response. Architecturally harder for AI to bypass.

3. Persistent Storage

Immutable audit logs, governance rules, instruction history. Independent of AI runtime — cannot be altered by prompts.
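The three layers can be sketched as a thin wrapper that is agnostic to the agent underneath. Here `agent` is any callable from any platform, and the six governance services are reduced to a single hypothetical `check` for illustration:

```python
from typing import Callable

def govern(agent: Callable[[str], str],
           check: Callable[[str], bool],
           audit: list) -> Callable[[str], str]:
    """Layer 2 sits between any agent runtime (layer 1) and its
    output, writing to storage (layer 3) that the agent itself
    never touches -- so prompts cannot rewrite the record."""
    def governed_agent(prompt: str) -> str:
        response = agent(prompt)        # layer 1: any agent platform
        allowed = check(response)       # layer 2: governance verdict
        audit.append({"prompt": prompt, "allowed": allowed})  # layer 3
        return response if allowed else "[blocked pending human review]"
    return governed_agent
```

Because the wrapper only depends on the agent's call signature, swapping the underlying platform does not change the governance layer at all.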

Limitations and Reality Check

This is early-stage work. Promising results in production, but Tractatus has not been subjected to rigorous adversarial testing or red-team evaluation.

"We have real promise, but this is still at an early stage of development. We have a long way to go, and it will require a mammoth effort by developers in every part of the industry to tame AI effectively. This is just a start."

Project Lead, Tractatus Framework

Known Limitations:

  • No dedicated red-team testing. We don't know how well these boundaries hold up against determined adversarial attacks.
  • Small-scale validation. Production use on a single project. Needs multi-organisation replication.
  • Integration challenges. Retrofitting governance into existing systems requires significant engineering effort.
  • Performance at scale unknown. Multi-agent coordination untested.
  • Evolving threat landscape. As AI capabilities grow, new failure modes will emerge that current architecture may not address.

What We Need:

  • Independent researchers to validate (or refute) our findings
  • Red-team evaluation to find weaknesses and bypass techniques
  • Multi-organisation pilot deployments across different domains
  • Industry-wide collaboration on governance standards

This framework is a starting point, not a finished solution. Taming AI will require sustained effort from the entire industry.

Explore the Architecture

From Guardian Agents in production to the five principles drawn from living systems.