Goal
Make behavioural properties of systems testable even when behaviour is weakly specified, systems evolve over time, and components are treated as black boxes. Vigil provides a minimal but explicit verification loop based on declared variation, provenance capture, and structured behavioural checks.
Stack
Vigil is a Python CLI tool and library built around a small set of core abstractions:
- Inputs: executed under declared variation
- Variations: declared changes to inputs, functions, or the execution environment
- Backends: adapters for executing the system under test
- Slices: provenance-rich execution records
- Checks: reference, relational, and diagnostic checks over slice groups
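To make the abstractions concrete, here is a minimal sketch of what a slice and a relational check over a slice group might look like. All names (`Slice`, `outputs_agree`, field names) are illustrative assumptions, not Vigil's actual API:

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical sketch: a provenance-rich record of one execution
# under a declared variation.
@dataclass
class Slice:
    input_id: str
    variation: dict[str, Any]   # what was varied (input / function / environment)
    output: Any                 # what the system under test produced
    metadata: dict[str, Any] = field(default_factory=dict)

# A relational check operates on a group of slices rather than a
# single output: here, "the output is invariant under the variation".
def outputs_agree(slices: list[Slice]) -> bool:
    outputs = [s.output for s in slices]
    return all(o == outputs[0] for o in outputs)
```

A check like this needs no ground-truth reference output; it only asserts a relation between executions, which is what makes weakly specified behaviour testable.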
The framework is intentionally lightweight: a CLI orchestration layer, a small core execution engine, and a set of built-in checks and variations used by the spec parser and engine.
Key Decisions
Vigil is built around a closed verification pipeline: variation, execution, capture, and checking are composed explicitly, and behavioural claims are traceable to declared expectations and variation structure.
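The four pipeline stages compose explicitly. A minimal sketch of that loop, with a callable standing in for a backend (all names here are assumptions for illustration):

```python
# Hypothetical sketch of the closed verification loop:
# variation -> execution -> capture -> checking.
def verify(inputs, variations, backend, checks):
    slices = []
    for x in inputs:                 # declared inputs
        for v in variations:         # declared variations
            output = backend(x, v)   # execution via a backend adapter
            slices.append({"input": x, "variation": v, "output": output})  # capture
    # Checking: each check sees the whole slice group, so every verdict
    # is traceable to the declared expectations and variation structure.
    return {check.__name__: check(slices) for check in checks}
```

Because every stage is explicit, a failing check can always be traced back to the exact variation and execution that produced it.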
The framework is backend agnostic. Systems can be integrated locally or via remote execution, but all verification is expressed through the same abstractions (inputs, variations, slices, checks).
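Backend agnosticism can be expressed as a small adapter interface: checks only ever see slices, never how an execution was produced. The sketch below uses Python's structural `Protocol`; the interface name and method are assumptions, not Vigil's actual API:

```python
from typing import Any, Protocol

# Hypothetical adapter interface for executing the system under test.
class Backend(Protocol):
    def execute(self, input_: Any, variation: dict[str, Any]) -> Any: ...

# An in-process adapter; a remote adapter could wrap an HTTP call
# behind the same method, and the rest of the pipeline is unchanged.
class LocalBackend:
    def __init__(self, fn):
        self.fn = fn

    def execute(self, input_, variation):
        return self.fn(input_, **variation)
```

The design choice here is that integration effort is confined to one adapter class per system, while inputs, variations, slices, and checks stay identical across local and remote targets.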
Case studies are treated as first-class thesis artefacts. The repository includes multiple case studies that exercise the same verification structure against different backends and system integration styles.
Outcome
Vigil was developed as part of my Master’s thesis and released as open source in 2026. It provides a concrete implementation of the verification model, plus multiple case studies that demonstrate applicability across different systems and variation types.
Further Details
Vigil’s focus is not on model accuracy evaluation but on behavioural consistency under controlled variation. Instead of producing a single aggregate metric, it produces structured evidence: what was varied, what was executed, what changed, and which declared expectations held or failed.
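A structured evidence record of this kind might look roughly like the following. The shape and field names are purely illustrative assumptions about what "what was varied, what was executed, what changed" could mean concretely:

```python
# Hypothetical evidence record for one verification run; every field
# name here is an assumption for illustration, not Vigil's output format.
evidence = {
    "varied": {"environment": ["py3.11", "py3.12"]},   # what was varied
    "executed": ["input-001 x py3.11", "input-001 x py3.12"],  # what ran
    "changed": [],                                      # outputs that differed
    "expectations": {                                   # declared checks
        "output_stable_under_env": "held",
    },
}
```

The point of the structure is auditability: rather than compressing a run into one number, each declared expectation keeps its own verdict alongside the variation that tested it.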
The framework is designed for iterative use. Specifications can be extended with new variations and checks, and case studies can expand naturally as new behavioural questions emerge.