Building a daily SEO audit agent with Claude API: a complete walkthrough

This site has a daily SEO audit that runs at 11:00 UTC every morning. It reads every published blog post, classifies each one into a content silo, scores it against on-page SEO best practices, identifies coverage gaps, and writes a markdown report. The whole thing is about three hundred lines of Python, runs as a scheduled GitHub Action, and costs roughly a dollar a day in Claude API calls.

This is the kind of small, compounding system that wasn't worth building before the API economics got friendly. Below is the complete pattern.

What the audit actually does

For each published post, it produces:

Silo classification — which content silo (Web Development, Web Hosting & Maintenance, Performance Marketing, Creative Design) and which sub-category (e.g. WordPress Development) does this post belong to?
Static SEO checks — title length, meta description length, word count, H2/H3 heading count, internal blog link count, presence of cover image and frontmatter sources
Content quality score — 0-10 from Claude, based on substance and specificity, not vibes
Primary keyword alignment — strong / moderate / weak / absent
Specific recommendations — up to five per post, tagged as must / should / nice
Composite score 0-100 — combining the static checks with the Claude quality scores

For the site as a whole, it produces a coverage matrix (posts per silo vs target) and a list of recommended next articles for under-covered sub-categories.

The architecture in 50 words

A YAML file defines the silos and their target keywords. A Python script reads the posts, runs static checks, calls Claude API for the qualitative parts, and writes a markdown report. A GitHub Actions workflow runs the script daily and commits the report back to the repo.

The silos file

The taxonomy lives in _local/seo/silos.yaml. It's the source of truth for what we're trying to rank for:

silos:
  web-development:
    display: "Web Development"
    sub_categories:
      wordpress-development:
        primary_keyword: "WordPress development"
        supporting_keywords: ["WordPress development services", "custom WordPress development"]
        target_count: 4
      # ... 15 more sub-categories

Editing this file is the only way to change what the audit is measuring. The Python doesn't have any hard-coded silos — everything comes from YAML.

Static checks first, then Claude

The script does the boring deterministic stuff in Python, then sends the post body and frontmatter to Claude for the parts that need judgment. This is the right split for two reasons:

Cost. Counting H2 tags doesn't need an LLM. Sending the body to Claude just to count headings is wasteful.
Reliability. Static checks always produce the same number. LLM checks are noisier across runs. Anchoring the score in deterministic parts keeps the daily reports stable.

The static score is worth up to 70 points (title 10, description 10, word count 20, structure 10, internal linking 10, cover image 5, sources 5). Claude adds the remaining 30 points (keyword alignment up to 20, content quality 0-10).

The Claude call

One call per post, with the silos and scoring rubric in the system prompt and the post body in the user message. Two things make this fast and cheap.

Structured outputs. The Claude API supports output_config.format with a JSON schema. We define the exact shape we want — silo, sub_category, primary_keyword_alignment, content_quality, recommendations — and the API returns valid JSON. No regex parsing of free-form text, no schema validation issues, no retry loop when the model decides to wrap the JSON in markdown.

Prompt caching. The system prompt is the same across all posts in the audit run — the entire silos.yaml plus the scoring rubric. That's about 5,000 tokens. We mark it with cache_control on the last block, and after the first call writes the cache, every subsequent call in the same run reads from it at ~10% of the normal input price. For an audit run against 20 posts, this saves something like 70% of the input token spend.

The user message is just the post frontmatter plus body (capped at 8K characters). Claude returns the structured JSON.

The recommendation prompt

Recommendations are the part that's easiest to get wrong. Generic LLM "advice" like "consider improving readability" or "add more internal links" is worse than useless — it trains the user to ignore the output.

The fix is to constrain the prompt aggressively:

Recommendations should be specific and actionable. Severity levels: - must: SEO-critical (missing/wrong primary keyword, title issues, no H1) - should: meaningful improvement (add internal links, expand thin sections, fix structure) - nice: optional polish (tone, additional examples)

Keep each recommendation under 30 words and actionable. Prefer specific over generic.

The 30-word cap is doing more work than it looks. Without it, every recommendation balloons to "you might want to consider potentially expanding this section to discuss more thoroughly the various ways in which..." With it, recommendations come out as "Title is 71 chars — too long for SERP. Cut to 50-60 with 'WordPress development' in front."

Gap analysis

After all posts are classified, the script computes coverage per silo and sub-category:

Posts per silo, posts per sub-category
Sub-categories below their target_count from silos.yaml

A second Claude call takes the gap list and suggests article topics for the under-covered sub-categories. This call is also schema-constrained — each suggestion has a title, primary_keyword, and angle field. Output is markdown bullets in the daily report's "Recommended Next Articles" section.

Scheduling

The whole thing is a GitHub Actions workflow:

on:
  schedule:
    - cron: '0 11 * * *'  # 11 UTC, ~7am Eastern
  workflow_dispatch:

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r _local/build/cli-requirements.txt
      - name: Run audit
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python3 _local/seo/audit.py
      - name: Commit report
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add _local/seo-reports/
          git diff --staged --quiet || git commit -m "SEO audit: $(date -u +%Y-%m-%d) [skip ci]"
          git push

The [skip ci] marker on the commit message prevents the audit commit from triggering the production deploy workflow — the audit report is data, not a content change.

Performance analytics on a laptop screen

What this enables

Once the daily report exists, you get behavior changes you didn't plan for:

You stop guessing about content gaps. The report tells you exactly which sub-categories are underserved with a target count and an actual count.
Editorial decisions are evidence-based. When the report says "AI Maintenance has 0 of 3 target posts," that's a more compelling argument for the next article topic than "I think we should write about that."
Quality gets measured. A post scoring 45/100 is something you can act on. A post you "feel pretty good about" isn't.

What it doesn't do (yet)

The audit doesn't pull real ranking data from Google Search Console. That's the next layer — connecting GSC API to feed actual impressions, clicks, and position data into the per-post scoring. We're working on that integration now.

It also doesn't auto-apply edits. The report is read-only — recommendations land in markdown, you review them, you decide what to act on. Auto-editing committed content is a foot-gun, and we've deliberately avoided shipping it.

The full code

The audit lives in _local/seo/audit.py in the same git repo as the rest of this site. It's about three hundred lines, has a clean separation between static checks and Claude calls, and the silos file is the only thing you'd customize for a different agency. If you want the whole repo as a reference, the static rebuild announcement has more context on the architecture.

The cost of building this was real but small: about two hours including the silos file, the audit script, the GitHub Actions workflow, and the documentation. That investment pays back the first day the report tells you something you didn't know about your content.

Building a daily SEO audit agent with Claude API: a complete walkthrough

What the audit actually does

The architecture in 50 words

The silos file

Static checks first, then Claude

The Claude call

The recommendation prompt

Gap analysis

Scheduling

What this enables

What it doesn't do (yet)

The full code

Rough Works

Ready for a Website That Actually Works?

What the audit actually does

The architecture in 50 words

The silos file

Static checks first, then Claude

The Claude call

The recommendation prompt

Gap analysis

Scheduling

What this enables

What it doesn't do (yet)

The full code

Rough Works

Keep reading

Berner Crawl: seven engineering problems behind our 3-dog roguelike

OLVR's ring scroll: replicating a Rolex 3D effect with 360 sprite frames

Four Awwwards case studies from 2026 that should reshape how you brief agencies

Ready for a Website That Actually Works?