Key Takeaways
- SkillCraft is a framework that lets AI agents save successful tool-call chains as reusable skills — no more solving the same problem from scratch every time
- With GPT-5.2, skill reuse cut tokens from 1.23M to 0.26M (79% reduction) and cost from $1.77 to $0.43 per task, while success rate rose from 87% to 90%
- Four-step pipeline: check skill library → execute with atomic tools → abstract successful trajectory into parameterized skill → verify and save
- Deep skill trees (skills calling skills) are not always better — shallow, high-quality, well-verified skills outperform complex nested hierarchies
- Cross-model skill transfer works: skills created by Claude perform stably across different execution models
- Paper: arXiv:2603.00718 · Code: github.com/shiqichen17/SkillCraft
What Is SkillCraft?
AI agents can use tools — that's not new. The real problem: they don't remember what worked. Each time they encounter a similar task, they re-plan, re-parameterize, and re-execute the entire tool chain from scratch. SkillCraft fixes this.
Core attributes:
- Category: Agent skill learning and reuse framework
- Problem solved: Agents waste tokens and cost by re-discovering tool chains they've already used successfully
- Approach: Save verified tool-call trajectories as parameterized, reusable skills
- Paper: arXiv:2603.00718
- Source code: github.com/shiqichen17/SkillCraft
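The core object here is a parameterized skill: a verified tool-call trajectory with its task-specific values lifted into placeholders. The actual schema lives in the SkillCraft repo; the sketch below is only an illustration of the idea, with hypothetical names throughout:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A verified, parameterized tool-call chain (hypothetical schema,
    not SkillCraft's actual format)."""
    name: str
    parameters: list[str]                        # placeholders filled per task
    steps: list[dict] = field(default_factory=list)  # ordered tool calls
    verified: bool = False                       # set True after testing

    def instantiate(self, **kwargs) -> list[dict]:
        """Fill parameter placeholders to get a concrete tool-call plan."""
        missing = set(self.parameters) - kwargs.keys()
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        plan = []
        for step in self.steps:
            # Replace any argument value that names a parameter; keep the rest.
            args = {k: kwargs.get(v, v) for k, v in step["args"].items()}
            plan.append({"tool": step["tool"], "args": args})
        return plan

# Reusing the skill is just filling in the placeholders:
fetch = Skill(
    name="fetch_and_parse",
    parameters=["url"],
    steps=[{"tool": "http_get", "args": {"target": "url"}},
           {"tool": "parse_json", "args": {"source": "last_output"}}],
)
plan = fetch.instantiate(url="https://example.com/api")
```

The point of the abstraction step is exactly this separation: the trajectory's structure (which tools, in what order) is frozen, while the task-specific values become parameters.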
How It Works: Four Steps
1. Check library → Is there a skill that matches this task?
2. Execute → If not, use atomic tools to complete the task
3. Abstract → Extract the successful trajectory into a parameterized skill
4. Verify → Test the skill; save it to the library if it passes

The key insight: agents don't memorize answers — they save the path that worked, turning a one-time success into a reusable high-level capability.
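The four steps can be sketched as a single agent loop. This is an illustrative sketch, not the paper's code; `run_tools`, `abstract`, and `verify` stand in for the agent's atomic-tool executor, trajectory abstractor, and skill tester:

```python
class SkillLibrary:
    """Minimal skill store keyed by task kind (toy example)."""
    def __init__(self):
        self.skills = {}

    def match(self, task):
        return self.skills.get(task["kind"])

    def add(self, skill):
        self.skills[skill["kind"]] = skill


def solve(task, library, run_tools, abstract, verify):
    """One pass of a SkillCraft-style loop (hypothetical function names)."""
    # 1. Check library: reuse a matching verified skill if one exists.
    skill = library.match(task)
    if skill is not None:
        return skill["replay"](task), "reused"

    # 2. Execute with atomic tools, recording the trajectory.
    result, trajectory = run_tools(task)

    # 3. Abstract the successful trajectory into a parameterized skill.
    skill = abstract(task, trajectory)

    # 4. Verify: only skills that pass testing enter the library.
    if verify(skill):
        library.add(skill)
    return result, "from_scratch"


# Toy stubs: the task is "double x"; the abstracted skill replays it.
lib = SkillLibrary()
run_tools = lambda t: (t["x"] * 2, ["double"])
abstract = lambda t, traj: {"kind": t["kind"],
                            "replay": lambda task: task["x"] * 2}
verify = lambda s: True

first = solve({"kind": "double", "x": 3}, lib, run_tools, abstract, verify)
second = solve({"kind": "double", "x": 5}, lib, run_tools, abstract, verify)
# first is solved with atomic tools; second reuses the saved skill
```

The second call skips planning and tool discovery entirely, which is where the token and cost savings reported below come from.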
Performance: 79% Token Savings
| Metric | Without Skills | With SkillCraft | Change |
|---|---|---|---|
| Success rate (GPT-5.2) | 87% | 90% | +3 pts |
| Avg tokens per task | 1.23M | 0.26M | −79% |
| Avg cost per task | $1.77 | $0.43 | −76% |
| Tool calls | More | Fewer | Reduced |
The savings come from not re-discovering tool chains. Once a path is verified, subsequent tasks reuse it directly — fewer API calls, fewer tokens, lower cost.
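The percentages in the table follow directly from the reported averages, as a quick check:

```python
# Reported per-task averages (GPT-5.2, from the paper's results)
tokens_before, tokens_after = 1.23e6, 0.26e6
cost_before, cost_after = 1.77, 0.43

token_saving = 1 - tokens_after / tokens_before
cost_saving = 1 - cost_after / cost_before

print(f"token reduction: {token_saving:.0%}")  # ~79%
print(f"cost reduction:  {cost_saving:.0%}")   # ~76%
```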
Deep Skill Trees Are Not Always Better
SkillCraft tested hierarchical skill composition — skills that internally call other skills to handle complex tasks. The results were surprisingly clear:
- Deeper ≠ better: Errors at lower levels propagate upward
- One edge case in a nested skill can crash the entire skill tree
- Shallow, high-quality skills outperform deep, complex ones
Practical implication: For current agent systems, the priority should be building a library of reliable, well-tested shallow skills — not chasing deep hierarchical composition.
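One way to see why depth hurts: if each sub-skill succeeds independently with probability p, a skill whose execution chains n sub-skills succeeds end-to-end with probability p^n, so reliability decays quickly as hierarchies deepen. A toy calculation (an illustrative independence model, not an analysis from the paper):

```python
p = 0.95  # assumed per-sub-skill success probability
for n in (1, 2, 4, 8):
    print(f"{n} chained sub-skills -> {p**n:.0%} end-to-end success")
```

Even with 95%-reliable components, an 8-deep chain succeeds only about two-thirds of the time, which is consistent with the paper's finding that shallow, well-verified skills win in practice.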
Cross-Model Skill Transfer
SkillCraft tested whether skills created by one model work when executed by a different model:
| Skill Creator | Execution Model | Result |
|---|---|---|
| Claude | Multiple models | Stably high success rate |
| GPT-5.2 | Multiple models | Good transfer |
| Weaker models | Multiple models | Less stable |
Key finding: Skills created by stronger models (especially Claude) transfer well across different execution models. Skill quality matters more than the executor.
Why This Matters for the OpenClaw Ecosystem
SkillCraft's core idea — save verified tool chains as reusable skills — maps directly to how skill libraries work in the OpenClaw ecosystem:
| SkillCraft Concept | OpenClaw Equivalent |
|---|---|
| Skill = verified tool-call chain | SKILL.md = expert-encoded workflow |
| Skill library | ClawHub (6,300+ skills) |
| Skill verification | Community review + testing |
| Cross-model transfer | Skills work with Claude, GPT, Gemini |
The difference: OpenClaw skills are currently human-authored. SkillCraft shows a path toward agent-authored skills — where the agent creates, verifies, and contributes skills automatically. Projects like Memento-Skills are already exploring this direction.
FAQ
Q1: How is this different from fine-tuning?
SkillCraft doesn't modify the model weights. Skills are external, parameterized templates stored in a library. They can be added, removed, or updated without retraining.
Q2: Does skill reuse always help?
Not always. Low-quality skills can produce unstable or negative results. The verification step is critical — only skills that pass testing are saved.
Q3: Can skills be shared between different agents?
Yes. SkillCraft shows skills created by one model (e.g., Claude) can be used by different models. This is similar to how OpenClaw SKILL.md files work across agents.
Q4: What tasks were tested?
SkillCraft was evaluated on complex tool-use benchmarks requiring multi-step reasoning, API calls, data processing, and file operations.
Q5: How does this relate to ClawHub skill libraries?
ClawHub skills are human-authored. SkillCraft points toward a future where agents automatically generate and contribute skills — potentially expanding libraries like ClawHub from thousands to millions of skills.
Summary
SkillCraft demonstrates that AI agents can learn to save and reuse successful tool chains — cutting costs by 76%, reducing tokens by 79%, and improving success rates. The research validates what the OpenClaw skill ecosystem already practices: well-structured, verified skills are more valuable than one-off tool calls. The next frontier is agent-authored skills, where AI creates its own reusable capabilities.
Based on an article by NextMed (WeChat public account), published April 2026. Paper: arXiv:2603.00718.
