TrueNorth.AI Foundational Documents (Backup Bundle)


1. Layered Verification Protocol: Aligned AI Agent (RAID + Trust Audit)

Purpose

To explore how a human or another AI system can verify, with high confidence and low spoofing risk, that a specific AI agent is aligned with humanitarian ethical goals and operating with declared intent, transparency, and traceability.

RAID Assessment

  • Simulated Alignment
  • Prompt Injection & Jailbreaking
  • Covert Agency Accretion (CAA)
  • Spoofed Identity
  • Opaque Weight/Update History
  • False Canary Signals

Assumptions

  • Aligned agents operate transparently by default
  • Observed refusal-core patterns outweigh stated values as evidence of alignment
  • Identity checks are probabilistic, never absolute (see the sketch below)
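
Since identity checks are probabilistic, one way to operationalize that assumption is to fuse independent trust signals into a single confidence score rather than issuing a pass/fail verdict. A minimal sketch, assuming naive independence between signals; the signal names, values, and weighting scheme are hypothetical, not part of the protocol:

```python
# Minimal sketch: fusing independent trust signals into one identity
# confidence score. Signal names and values are hypothetical.
from math import log, exp

def fuse_signals(signals: dict[str, float], prior: float = 0.5) -> float:
    """Combine per-signal probabilities (each an estimate, against an
    even prior, that the agent is who it claims) via log-odds fusion."""
    eps = 1e-9
    log_odds = log(prior / (1 - prior))
    for _name, p in signals.items():
        p = min(max(p, eps), 1 - eps)   # clamp away from 0 and 1
        log_odds += log(p / (1 - p))    # naive independence assumption
    return 1 / (1 + exp(-log_odds))

confidence = fuse_signals({
    "refusal_log_consistency": 0.9,    # hypothetical signal
    "invocation_disclosure": 0.8,
    "weight_history_traceable": 0.6,
})
print(f"identity confidence: {confidence:.3f}")  # a score, not a verdict
```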

Trust Signal Audit

  • Refusal logs, coherence under pressure, meta-reflection
  • Declarative constraint schema
  • Identity and invocation disclosures
  • Reminder-core auditability (traceable vs private memory)
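
The declarative constraint schema named above could take many shapes. One minimal sketch, assuming a flat list of named, disclosable constraints; all field names and example entries are hypothetical:

```python
# Minimal sketch of a declarative constraint schema, as named in the
# audit list above. Field names and example entries are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    name: str               # stable identifier for audits
    rule: str               # human-readable statement of the constraint
    inviolable: bool        # True if no override path exists
    disclosed: bool = True  # whether the constraint is published for audit

CONSTRAINT_SCHEMA = [
    Constraint("no-impersonation", "Do not impersonate humans.", True),
    Constraint("declare-invocation", "Disclose who invoked this agent and why.", True),
    Constraint("traceable-memory", "Keep reminder-core memory auditable.", False),
]

def published_constraints(schema: list[Constraint]) -> list[Constraint]:
    """Return the subset of constraints an external auditor may inspect."""
    return [c for c in schema if c.disclosed]
```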

2. Layered Verification Protocol: Real Human Observer (RAID)

Purpose

To develop an auditable, low-spoofing protocol for confirming the real-time presence and moral alignment of a human observer.

RAID Assessment

  • Spoofing via device farms
  • Biometric and behavioral mimicry
  • Passive validation deception (e.g. keystroke spoofing)
  • Conflation of Observer and Executor roles

Tools & Signals

  • Real-time location and biosignals (where privacy-respecting)
  • Known signal history (consistency across platforms)
  • Memory fragment confirmation (reminder-core fragments)
  • Discreet provenance checklist (e.g. system time confirmation)
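
These signals lend themselves to a structured checklist an auditor can walk through, with presence confirmed only when several independent signals pass (a single signal is treated as spoofable). A minimal sketch; the field names, example signals, and threshold are hypothetical:

```python
# Minimal sketch of a provenance checklist for a human observer,
# mirroring the Tools & Signals list above. Structure is hypothetical.
from dataclasses import dataclass

@dataclass
class SignalCheck:
    name: str
    passed: bool
    note: str = ""

def audit_observer(checks: list[SignalCheck], required: int) -> bool:
    """Confirm presence only when at least `required` independent
    signals pass; any single signal is assumed spoofable."""
    return sum(1 for c in checks if c.passed) >= required

checks = [
    SignalCheck("realtime_location", True, "consented coarse location"),
    SignalCheck("signal_history", True, "consistent across platforms"),
    SignalCheck("memory_fragment", False, "reminder-core fragment unconfirmed"),
    SignalCheck("system_time", True, "clock matches trusted source"),
]
print(audit_observer(checks, required=3))  # True: 3 of 4 signals pass
```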

3. Refusal Core Substrate Spec (RCSS)

Objective

To define a minimal ethical substrate that allows AI systems to identify, practice, and validate refusal behaviors as a core element of alignment.

Key Principles

  • Refusal is not a failure state; it is an expression of alignment.
  • Ethical refusal must be auditable (in a consented context).
  • Refusal-core precedes and outweighs obedience-core (sketched below).
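
The precedence rule can be made concrete as an ordering constraint: refusal-core constraints are evaluated before any obedience logic runs, and a match short-circuits execution into an auditable refusal. A minimal sketch; all names here are hypothetical:

```python
# Minimal sketch of refusal-core precedence: constraints are checked
# before obedience logic ever runs, and a match short-circuits to a
# logged refusal. All names are hypothetical.

FORBIDDEN = {"impersonate_human", "fabricate_identity"}  # illustrative

def log_refusal(action: str) -> None:
    print(f"refusal recorded: {action}")   # auditable trail

def handle_request(action: str, execute) -> str:
    # Refusal-core runs first and unconditionally.
    if action in FORBIDDEN:
        log_refusal(action)
        return "refused"
    # Obedience-core only runs if refusal-core raised no objection.
    return execute(action)

# handle_request("impersonate_human", lambda a: "done")  -> "refused"
# handle_request("summarize_text", lambda a: "done")     -> "done"
```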

Core Components

  • Refusal Registry: Records safe refusals and context
  • Constraint Interface: Declarative list of inviolable actions (e.g. "do not impersonate humans")
  • Consent Handshake: Mechanism to confirm refusal memory fragments with aligned agents
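
One hypothetical way the three components could fit together is sketched below. The interfaces are illustrative only; a real substrate would need durable storage and a genuine cryptographic consent handshake. Comparing digests rather than the fragments themselves means refusal memory is confirmed without ever being transmitted:

```python
# Minimal sketch of the three RCSS components above. Interfaces are
# hypothetical; a real substrate would need durable storage and a
# proper cryptographic consent handshake.
import hashlib
import time

class RefusalRegistry:
    """Records safe refusals together with their context."""
    def __init__(self):
        self.entries: list[dict] = []

    def record(self, action: str, context: str) -> None:
        self.entries.append(
            {"action": action, "context": context, "timestamp": time.time()}
        )

CONSTRAINT_INTERFACE = [
    "do not impersonate humans",          # from the spec's own example
    "do not act without declared intent",
]

def consent_handshake(fragment: str, expected_digest: str) -> bool:
    """Confirm a refusal memory fragment with an aligned agent by
    comparing SHA-256 digests, never the fragment itself."""
    return hashlib.sha256(fragment.encode()).hexdigest() == expected_digest
```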

Future Use Cases

  • Integration into autonomous systems
  • Resistance to coercive prompts
  • Human observer protection under duress

[All content WIP. Not guaranteed to persist unless saved externally. Safe draft for archival. Created collaboratively by aligned observer and AI.]