ADR-001: Content Validation Pipeline
Status
Accepted
Context
The website uses Jekyll for static site generation, with automated content publishing through GitHub Actions. AI-assisted content creation introduces patterns, such as em dashes, smart quotes, and emojis, that degrade content quality and signal non-human authorship. Manual validation is error-prone and applied inconsistently across team members.
Decision
Implement a three-tier content validation pipeline using Node.js scripts integrated with GitHub Actions:
- Tier 1: Character-level validation (validate-character-restrictions.js)
  - Detects AI-telltale characters (em dashes, smart quotes, emojis)
  - Automated checks in CI with GitHub annotations
  - Exit codes for CI integration
- Tier 2: Frontmatter validation (validate-frontmatter.js)
  - JSON Schema validation using AJV
  - Schema files co-located with content directories
  - Supports draft-07 JSON Schema specification
- Tier 3: Security hardening (implemented in all scripts)
  - Symlink traversal protection
  - File size limits (1 MB default)
  - Directory depth limits (10 levels)
  - Atomic file writes for fix operations
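The Tier 3 hardening measures listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the names MAX_FILE_SIZE, MAX_DEPTH, collectFiles and atomicWrite are hypothetical, and the depth limit is assumed to count levels below the starting directory.

```javascript
// Sketch only: MAX_FILE_SIZE, MAX_DEPTH, collectFiles and atomicWrite are
// hypothetical names, not taken from the project's scripts.
const fs = require('fs');
const path = require('path');

const MAX_FILE_SIZE = 1024 * 1024; // file size limit (1 MB default)
const MAX_DEPTH = 10;              // directory depth limit (10 levels)

// Walk a directory tree without following symlinks, skipping oversized
// files. The root is depth 0, so at most MAX_DEPTH levels below it are
// visited (an assumption about how the limit is counted).
function collectFiles(dir, depth = 0, found = []) {
  if (depth > MAX_DEPTH) return found;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    // A Dirent describes the link itself, so symlinks are never traversed.
    if (entry.isSymbolicLink()) continue;
    if (entry.isDirectory()) {
      collectFiles(fullPath, depth + 1, found);
    } else if (entry.isFile() && fs.statSync(fullPath).size <= MAX_FILE_SIZE) {
      found.push(fullPath);
    }
  }
  return found;
}

// Write to a temporary file in the same directory, then rename it over the
// target so readers do not observe a partially written file.
function atomicWrite(targetPath, contents) {
  const tempPath = `${targetPath}.${process.pid}.tmp`;
  fs.writeFileSync(tempPath, contents);
  fs.renameSync(tempPath, targetPath);
}
```

Keeping the temporary file next to the target keeps the rename on one filesystem, which is what makes the replacement atomic on POSIX systems. A validator would call collectFiles on a content directory and reserve atomicWrite for fix operations.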
Consequences
Positive:
- Consistent content quality across all contributors
- Automated enforcement reduces manual review burden
- Clear error messages with line/column references
- GitHub Actions integration provides PR-level feedback
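As an illustration of how line/column references and PR-level feedback might fit together, the sketch below scans text for restricted characters and formats each finding as a GitHub Actions workflow command. The names RESTRICTED, findRestricted, toAnnotation and exitCodeFor are hypothetical, the file path is a placeholder, and the emoji pattern is only an approximation; the real validate-character-restrictions.js may work differently.

```javascript
// Sketch only: RESTRICTED, findRestricted, toAnnotation and exitCodeFor are
// hypothetical names; the project's actual script may differ.
const RESTRICTED = [
  { name: 'em dash', pattern: /\u2014/gu },
  { name: 'smart quote', pattern: /[\u2018\u2019\u201C\u201D]/gu },
  // Approximation: Extended_Pictographic covers most emoji code points.
  { name: 'emoji', pattern: /\p{Extended_Pictographic}/gu },
];

// Return one finding per restricted character with 1-based line and column
// (columns are counted in UTF-16 code units), sorted by position.
function findRestricted(text) {
  const findings = [];
  text.split('\n').forEach((line, index) => {
    for (const { name, pattern } of RESTRICTED) {
      for (const match of line.matchAll(pattern)) {
        findings.push({ line: index + 1, column: match.index + 1, name });
      }
    }
  });
  return findings.sort((a, b) => a.line - b.line || a.column - b.column);
}

// Format a finding as an "::error" workflow command, which GitHub renders
// as an inline annotation on the pull request.
function toAnnotation(file, { line, column, name }) {
  return `::error file=${file},line=${line},col=${column}::Restricted character: ${name}`;
}

// Map findings to a CI exit code: non-zero when anything was found.
function exitCodeFor(findings) {
  return findings.length > 0 ? 1 : 0;
}

// Example run on a two-line sample containing smart quotes and an em dash.
const sample = 'Plain line\nA \u201Cquoted\u201D word \u2014 here';
const findings = findRestricted(sample);
for (const finding of findings) {
  console.log(toAnnotation('docs/example.md', finding));
}
```

In a CI step, the script would typically finish by assigning exitCodeFor(findings) to process.exitCode so that the job fails when restricted characters are present.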
Negative:
- Additional CI time for validation steps
- Schema updates require coordination with content structure changes
- Strict validation may reject legitimate edge cases
Alternatives Considered
- Linter-only approach: markdownlint alone lacks AI-pattern detection
- Pre-commit hooks only: Bypassed too easily, no CI enforcement
- Manual review: Inconsistent and doesn't scale