ADR-001: Content Validation Pipeline

Status

Accepted

Context

The website uses Jekyll for static site generation with automated content publishing through GitHub Actions. AI-assisted content creation introduces patterns that degrade content quality and signal non-human authorship. Manual validation is error-prone and inconsistent across team members.

Decision

Implement a three-tier content validation pipeline using Node.js scripts integrated with GitHub Actions:

  1. Tier 1: Character-level validation (validate-character-restrictions.js)
    • Detects AI-telltale characters (em dashes, smart quotes, emojis)
    • Automated checks in CI with GitHub annotations
    • Exit codes for CI integration
  2. Tier 2: Frontmatter validation (validate-frontmatter.js)
    • JSON Schema validation using AJV
    • Schema files co-located with content directories
    • Schemas target the JSON Schema draft-07 specification
  3. Tier 3: Security hardening (implemented in all scripts)
    • Symlink traversal protection
    • File size limits (1MB default)
    • Directory depth limits (10 levels)
    • Atomic file writes for fix operations
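
The Tier 3 safeguards can be sketched in a similar way. This is an assumption-laden illustration rather than the project's code: collectFiles and writeFileAtomic are hypothetical helpers, the root directory is assumed to count as depth 0, and symlinks are assumed to be skipped outright rather than resolved and checked.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const MAX_FILE_BYTES = 1024 * 1024; // 1MB default, per the decision above
const MAX_DEPTH = 10;               // directory levels below the root (assumed)

// Recursively collect regular files under `root`, skipping symlinks,
// files over the size limit, and directories deeper than MAX_DEPTH.
function collectFiles(root, depth = 0, out = []) {
  if (depth > MAX_DEPTH) return out;
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isSymbolicLink()) continue; // never follow symlinks
    if (entry.isDirectory()) {
      collectFiles(full, depth + 1, out);
    } else if (entry.isFile() && fs.lstatSync(full).size <= MAX_FILE_BYTES) {
      out.push(full);
    }
  }
  return out;
}

// Write to a temporary file in the same directory, then rename it over the
// target. A rename within one filesystem replaces the file in a single step,
// so readers never observe a half-written file.
function writeFileAtomic(target, contents) {
  const tmp = path.join(
    path.dirname(target),
    `.${path.basename(target)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmp, contents);
  fs.renameSync(tmp, target);
}
```

Creating the temporary file next to the target, rather than in the system temp directory, keeps the rename on the same filesystem, which is what makes the replacement atomic.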

Consequences

Positive:

Negative:

Alternatives Considered

  1. Linter-only approach: markdownlint alone lacks AI-pattern detection
  2. Pre-commit hooks only: Easily bypassed (for example, with git commit --no-verify) and provide no CI enforcement
  3. Manual review: Inconsistent and doesn't scale