harnesslog.dev

Claude Code, AI, and development stories

EN · KO
H
hwangjungmin

If you’re writing Claude Code skills by hand, try Anthropic’s skill-creator instead.

It’s a plugin with four modes: Create, Eval, Improve, and Benchmark. You don’t just write a skill — you test it with evals, A/B compare versions, and optimize the description so Claude triggers it at the right time.

The part I find most useful: the eval system. You define test prompts, describe what good output looks like, and skill-creator tells you if the skill actually works. No more guessing whether it’ll behave right.

There’s also a benchmark mode that tracks pass rate, time, and token usage across runs — handy for checking whether a skill still holds up after a model update.

Install the skill-creator plugin and run /skill-creator. Worth it.