If you’re writing Claude Code skills by hand, try Anthropic’s skill-creator instead.
It’s a plugin with four modes: Create, Eval, Improve, and Benchmark. You don’t just write a skill — you test it with evals, A/B compare versions, and optimize the description so Claude triggers it at the right time.
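For context on what's being optimized: a Claude Code skill is a SKILL.md file whose YAML frontmatter holds the description Claude matches against when deciding to trigger it. A minimal sketch (the skill name and description text here are made up for illustration):

```yaml
---
name: commit-helper
# The description is the trigger surface: Claude reads it to decide
# when to load this skill, so vague wording means missed or spurious
# activations. This is the field skill-creator helps you tune.
description: Drafts commit messages from staged changes. Use when the
  user asks to write, improve, or review a commit message.
---
```

A description that names both what the skill does and when to use it tends to trigger more reliably than a one-line summary alone.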
The part I find most useful: the eval system. You define test prompts, describe what good output looks like, and skill-creator tells you if the skill actually works. No more guessing whether it’ll behave right.
There’s also a benchmark mode that tracks pass rate, time, and token usage across runs — handy for checking whether a skill still holds up after a model update.
Install the skill-creator plugin and run /skill-creator. Worth it.