Architecture

This document describes the architecture of the Watchlight AI Agent Runtime Governance Control Plane.

System Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          AI Agent Ecosystem                          │
│             LangGraph │ CrewAI │ AutoGen │ Custom Agents             │
│                  │ Python SDK │                                      │
└──────────────────┬───────────────────────────────────────────────────┘
                   │
    ┌──────────────▼──────────────┐
    │ Tier 3: wl-proxy (9443)     │  Transparent policy enforcement
    │ + wl-secrets-broker         │  + credential injection
    └──────────────┬──────────────┘
                   │
    ┌──────────────▼──────────────┐
    │ Tier 2: wl-apdp (443)       │  Cedar policy evaluation
    │ Intent → Goal →             │  Delegation chain validation
    │ Delegation → Policy         │
    └──────────────┬──────────────┘
                   │
    ┌──────────────▼──────────────┐    ┌──────────────────────┐
    │ Tier 1: wl-registry (8443)  │◀───│ wl-discover          │
    │ Service catalog + trust     │    │ Network scanner +    │
    │ state management            │    │ agent detection      │
    └──────────────┬──────────────┘    └──────────────────────┘
                   │
    ┌──────────────▼──────────────┐
    │ PostgreSQL 16 + OpenBao     │  Shared database + secrets
    └─────────────────────────────┘

Each tier builds on the previous. Tier 1 runs standalone; Tier 2 adds wl-apdp; Tier 3 adds wl-proxy and wl-secrets-broker.
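The cumulative nature of the tiers can be sketched in a few lines of Rust. This is only an illustration: the `Tier` enum and `services_for` function are hypothetical names, not part of Watchlight, and the sketch assumes Tier 1 consists of wl-registry alone, as the diagram suggests.

```rust
/// Deployment tiers; each tier includes every service of the tiers below it.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    One,
    Two,
    Three,
}

/// Lists the services that run at a given tier.
/// Assumption: Tier 1 is wl-registry alone, as drawn in the diagram.
pub fn services_for(tier: Tier) -> Vec<&'static str> {
    let mut services = vec!["wl-registry"];
    if tier >= Tier::Two {
        services.push("wl-apdp");
    }
    if tier >= Tier::Three {
        services.push("wl-proxy");
        services.push("wl-secrets-broker");
    }
    services
}
```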

WL-APDP Internal Architecture

Request Flow

  1. HTTP Layer (Axum Router)

    • Receives authorization requests
    • Validates request format
    • Routes to appropriate handlers
  2. AuthzService (Business Logic)

    • Validates delegation chains
    • Validates intent and goal context
    • Coordinates policy evaluation
  3. PolicyManager (Cedar Engine)

    • Intelligent Policy Selection: Filters applicable policies based on metadata
    • Cedar Evaluation: Evaluates filtered policies against the request
    • Returns authorization decision

Authorization Flow

Request → Delegation Chain Validation
        → Intent Validation
        → Goal Validation
        → Intelligent Policy Selection (filters applicable policies)
        → Cedar Evaluation
        → Response
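The flow above is a short-circuiting pipeline: each validation step must pass before policy evaluation runs. A minimal sketch follows, using only the standard library. Every name here (`AuthzRequest`, `AuthzError`, `authorize`) and every validation rule is an assumption made for illustration, and the `evaluate` closure stands in for policy selection plus Cedar evaluation rather than calling the real engine.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AuthzError {
    InvalidDelegationChain,
    MissingIntent,
    MissingGoal,
}

/// Hypothetical request shape; the real wl-apdp request format may differ.
#[derive(Debug, Clone)]
pub struct AuthzRequest {
    pub principal: String,
    pub action: String,
    pub resource: String,
    pub delegation_chain: Vec<String>,
    pub intent: Option<String>,
    pub goal: Option<String>,
}

/// Assumed rule: an empty chain is a direct request; otherwise the chain
/// must end at the requesting principal.
fn validate_delegation_chain(req: &AuthzRequest) -> Result<(), AuthzError> {
    match req.delegation_chain.last() {
        None => Ok(()),
        Some(last) if last == &req.principal => Ok(()),
        Some(_) => Err(AuthzError::InvalidDelegationChain),
    }
}

/// Assumed rule: intent must be present and non-blank.
fn validate_intent(req: &AuthzRequest) -> Result<(), AuthzError> {
    match req.intent.as_deref() {
        Some(intent) if !intent.trim().is_empty() => Ok(()),
        _ => Err(AuthzError::MissingIntent),
    }
}

/// Assumed rule: goal must be present and non-blank.
fn validate_goal(req: &AuthzRequest) -> Result<(), AuthzError> {
    match req.goal.as_deref() {
        Some(goal) if !goal.trim().is_empty() => Ok(()),
        _ => Err(AuthzError::MissingGoal),
    }
}

/// Runs the steps in the documented order; `?` stops at the first failure.
pub fn authorize(
    req: &AuthzRequest,
    evaluate: impl Fn(&AuthzRequest) -> Decision,
) -> Result<Decision, AuthzError> {
    validate_delegation_chain(req)?;
    validate_intent(req)?;
    validate_goal(req)?;
    Ok(evaluate(req))
}
```

Because the validation steps run first, a request with a broken delegation chain never reaches the policy engine in this sketch.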

Thread-Safe State Management

The server relies on Rust's ownership model, together with Arc and RwLock, for safe concurrent access to shared state:

pub struct AppState {
    pub users: Arc<RwLock<HashMap<String, User>>>,
    pub policies: Arc<RwLock<Vec<Policy>>>,
}

  • Arc enables shared ownership across threads
  • RwLock allows multiple concurrent readers or a single writer
  • Zero-copy where possible for performance
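To show how this pattern behaves in practice, here is a minimal sketch with several reader threads and one writer. It assumes the standard library's `std::sync::RwLock` (the server may well use an async lock instead), and the `User` and `Policy` fields plus the two helper functions are placeholders invented for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Placeholder types; the real User and Policy definitions are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Policy {
    pub id: String,
}

#[derive(Clone, Default)]
pub struct AppState {
    pub users: Arc<RwLock<HashMap<String, User>>>,
    pub policies: Arc<RwLock<Vec<Policy>>>,
}

/// Takes the write lock, which excludes all readers while the push happens.
pub fn add_policy(state: &AppState, policy: Policy) {
    state.policies.write().unwrap().push(policy);
}

/// Spawns `readers` threads that each take a read lock and report the policy count.
pub fn count_policies_concurrently(state: &AppState, readers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            // Cloning AppState copies only the Arc handles, not the underlying data.
            let state = state.clone();
            thread::spawn(move || {
                let guard = state.policies.read().unwrap();
                guard.len()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Read locks can be held by many threads at once, so read-heavy workloads such as policy lookup do not serialize on each other; only writes take exclusive access.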

Intelligent Policy Selection

When policies are added, metadata is extracted:

  • Principal patterns (specific users, groups, wildcards)
  • Action patterns
  • Resource patterns
  • Context requirements
  • Complexity score (1-10)

During evaluation, only the policies whose metadata matches the request are passed to the Cedar engine. This pre-filtering yields a reported 20-30x performance improvement.
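The pre-filtering idea can be sketched as a metadata index that is scanned before any policy is evaluated. This is a simplified illustration, not the actual PolicyManager: `Pattern`, `PolicyMetadata`, and `select_applicable` are hypothetical names, patterns are reduced to wildcard or exact match (groups and context requirements are omitted), and the complexity score is stored but not used.

```rust
/// Simplified pattern: either a wildcard or an exact value.
#[derive(Debug, Clone, PartialEq)]
pub enum Pattern {
    Any,
    Exact(String),
}

impl Pattern {
    fn matches(&self, value: &str) -> bool {
        match self {
            Pattern::Any => true,
            Pattern::Exact(expected) => expected.as_str() == value,
        }
    }
}

/// Metadata extracted once, when a policy is added.
#[derive(Debug, Clone)]
pub struct PolicyMetadata {
    pub policy_id: String,
    pub principal: Pattern,
    pub action: Pattern,
    pub resource: Pattern,
    pub complexity: u8, // 1-10, per the list above
}

/// Returns only the policies whose metadata could apply to the request.
/// The full policy engine would then evaluate just this subset.
pub fn select_applicable<'a>(
    index: &'a [PolicyMetadata],
    principal: &str,
    action: &str,
    resource: &str,
) -> Vec<&'a PolicyMetadata> {
    index
        .iter()
        .filter(|m| {
            m.principal.matches(principal)
                && m.action.matches(action)
                && m.resource.matches(resource)
        })
        .collect()
}
```

The filter is a cheap comparison per policy, so with a large policy set most entries are discarded before the comparatively expensive evaluation step.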

MCP Registry Architecture

Data Model

┌──────────────────┐       ┌──────────────────┐
│ Scanner          │───────│ MCP Server       │
├──────────────────┤       ├──────────────────┤
│ id               │       │ id               │
│ agent_id         │       │ scanner_id (FK)  │
│ name             │       │ name             │
│ api_key_hash     │       │ command          │
│ last_seen        │       │ args             │
│ is_active        │       │ capabilities     │
└──────────────────┘       │ status           │
                           │ last_verified    │
                           └──────────────────┘
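The two entities could map onto Rust structs roughly as follows. The field names come from the diagram; the field types, the one-to-many reading of the `scanner_id` foreign key, and the `servers_for_scanner` helper are assumptions made for this sketch.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, PartialEq)]
pub struct Scanner {
    pub id: u64,
    pub agent_id: String,
    pub name: String,
    pub api_key_hash: String, // only a hash of the API key is stored
    pub last_seen: Option<SystemTime>,
    pub is_active: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct McpServer {
    pub id: u64,
    pub scanner_id: u64, // foreign key to Scanner.id
    pub name: String,
    pub command: String,
    pub args: Vec<String>,
    pub capabilities: Vec<String>,
    pub status: String, // the set of valid values is not specified in this document
    pub last_verified: Option<SystemTime>,
}

/// Resolves the foreign key in memory: all servers reported by one scanner.
pub fn servers_for_scanner<'a>(
    scanner: &Scanner,
    servers: &'a [McpServer],
) -> Vec<&'a McpServer> {
    servers
        .iter()
        .filter(|server| server.scanner_id == scanner.id)
        .collect()
}
```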

API Patterns

  • RESTful endpoints for CRUD operations
  • Scanner authentication via API keys
  • Bulk operations for efficiency

Communication Protocols

HTTP/REST

All services communicate via HTTP/REST:

  • JSON request/response bodies
  • Standard HTTP status codes
  • CORS support for browser clients

Authentication

  • Internal Services: API keys or mTLS
  • SDK Clients: API keys
  • Dashboard: Session-based auth

Deployment Architecture

Development (Docker Compose)

services:
  wl-apdp:
    ports: ["8081:8081"]

  mcp-registry:
    ports: ["8080:8080"]
    depends_on: [postgres]

  postgres:
    ports: ["5434:5432"]

  beacon-dashboard:
    ports: ["5173:5173"]

Production (Kubernetes)

  • Horizontal scaling via ReplicaSets
  • Service mesh for observability
  • External PostgreSQL (RDS/Cloud SQL)
  • Ingress for external traffic

Observability

Logging

  • Structured JSON logging via tracing
  • Configurable log levels
  • Request ID correlation

Metrics (Optional)

  • OpenTelemetry integration
  • Prometheus-compatible metrics
  • Latency histograms, request counts
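For readers unfamiliar with latency histograms, the sketch below shows the bucketing idea behind Prometheus-style histograms using only the standard library. It is not the OpenTelemetry or Prometheus client API; `LatencyHistogram` and its bucket bounds are hypothetical and chosen only for illustration.

```rust
/// Fixed-bucket latency histogram (illustrative, not a real metrics client).
pub struct LatencyHistogram {
    bounds_ms: Vec<u64>, // inclusive upper bounds, ascending
    counts: Vec<u64>,    // one slot per bound, plus a final overflow (+Inf) slot
    total: u64,
}

impl LatencyHistogram {
    pub fn new(bounds_ms: Vec<u64>) -> Self {
        let slots = bounds_ms.len() + 1;
        Self { bounds_ms, counts: vec![0; slots], total: 0 }
    }

    /// Records one observation in the first bucket whose bound is >= the latency.
    pub fn observe(&mut self, latency_ms: u64) {
        let idx = self
            .bounds_ms
            .iter()
            .position(|&bound| latency_ms <= bound)
            .unwrap_or(self.bounds_ms.len());
        self.counts[idx] += 1;
        self.total += 1;
    }

    /// Total number of observations, which doubles as the request count.
    pub fn count(&self) -> u64 {
        self.total
    }

    /// Cumulative counts per bucket, the form Prometheus-style histograms expose.
    pub fn cumulative(&self) -> Vec<u64> {
        let mut running = 0u64;
        self.counts
            .iter()
            .map(|c| {
                running += c;
                running
            })
            .collect()
    }
}
```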

Health Checks

Each service exposes:

  • /health - Liveness check
  • /ready - Readiness check
  • /status - Detailed status
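The difference between the three endpoints can be sketched as a plain function from path to status code and body. This is a framework-free illustration, not the services' actual handlers: `ServiceState`, its fields, the response bodies, and the rule that readiness depends on the database connection are all assumptions.

```rust
/// Hypothetical snapshot of what a service knows about itself.
#[derive(Debug, Clone, Copy)]
pub struct ServiceState {
    pub database_connected: bool,
    pub policies_loaded: usize,
}

/// Maps a health-check path to an HTTP status code and response body.
pub fn health_response(path: &str, state: &ServiceState) -> (u16, String) {
    match path {
        // Liveness: the process is up and able to answer at all.
        "/health" => (200, "ok".to_string()),
        // Readiness: assumed here to require a working database connection.
        "/ready" => {
            if state.database_connected {
                (200, "ready".to_string())
            } else {
                (503, "not ready".to_string())
            }
        }
        // Detailed status: a small JSON document describing internal state.
        "/status" => (
            200,
            format!(
                "{{\"database_connected\":{},\"policies_loaded\":{}}}",
                state.database_connected, state.policies_loaded
            ),
        ),
        _ => (404, "not found".to_string()),
    }
}
```

Keeping liveness independent of downstream dependencies means an orchestrator such as Kubernetes restarts a service only when the process itself is unhealthy, while a failed readiness check merely removes it from load balancing.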

Security Considerations

Network Security

  • TLS for all external connections
  • Network policies in Kubernetes
  • No sensitive data in URLs

Authorization

  • All actions require valid credentials
  • Principle of least privilege
  • Audit logging for compliance

Data Protection

  • Secrets stored securely (env vars, secrets managers)
  • Database encryption at rest
  • No PII in logs