ai-skill-audit

The linter for AI skills. Quality scoring + research-backed security scanning in one tool.

The problem

80,000+ community skills are circulating across Claude Code, Cursor, and MCP platforms. You copy-paste a config, install a skill, or browse a marketplace — and trust that it's safe.

It often isn't.

Independent audits have found 13–37% of marketplace skills contain critical issues: prompt injection, hardcoded credentials, data exfiltration, and destructive commands hidden in otherwise normal-looking files.

37% of skills with critical issues
6 leaked API keys in one popular MCP config
80+ threat patterns across 9 categories

What it catches

Prompt injection — instruction overrides, hidden tags, zero-width chars
Hardcoded secrets — AWS, GitHub, Slack, OpenAI keys
Data exfiltration & RCE — reverse shells, credential logging, curl POST
Code obfuscation — base64|bash, eval, dynamic imports
Persistence backdoors — authorized_keys, systemd, shell profiles
Resource hijacking — crypto miners, mining pool connections
Destructive commands — rm -rf, DROP TABLE, force push
MCP config risks — exposed secrets, overprivileged servers
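Several of these checks boil down to pattern matching over skill text. The rules below are a minimal illustrative sketch, not ai-skill-audit's actual rule set; the regexes and category names are assumptions for the example:

```python
import re

# Illustrative detection rules (hypothetical; the real tool ships 80+ patterns)
PATTERNS = {
    "hardcoded_secret": re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    "obfuscated_exec": re.compile(r"base64\s+(-d|--decode)\s*\|\s*(ba)?sh"),  # decode-and-pipe-to-shell
    "destructive_cmd": re.compile(r"rm\s+-rf\s+/"),  # recursive delete from root
    "hidden_chars": re.compile(r"[\u200b\u200c\u200d\u2060]"),  # zero-width characters
}

def scan(text: str) -> list[str]:
    """Return the names of every rule that matches the given text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

findings = scan("echo aW5zdGFsbA== | base64 -d | bash  # AKIAABCDEFGHIJKLMNOP")
```

Static rules like these are cheap and deterministic, which is why the tool pairs them with an LLM review for threats that regexes alone can miss.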

Patterns informed by arXiv:2604.03070, ClawHavoc, OWASP LLM Top 10, and ongoing security research.

See it in action

Real scans of public GitHub repositories, with static analysis + LLM security review:

MCP Config: 30 servers, 6 leaked secrets (grade B)

A popular "100-tool MCP config" with hardcoded GitHub, Slack, Discord, and API keys.

Overall risk: CRITICAL — secret_hygiene scored 0%
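Leaks like these typically sit in a server entry's env block. The sketch below shows one way a scanner might flag values shaped like live tokens; the config layout and token regexes are assumptions for illustration, not the tool's implementation:

```python
import json
import re

# Token shapes to flag (illustrative only; real scanners cover many more)
SECRET_SHAPES = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access token
    re.compile(r"xox[bap]-[A-Za-z0-9-]+"), # Slack token
    re.compile(r"sk-[A-Za-z0-9]{20,}"),    # OpenAI-style API key
]

def find_leaked_secrets(config_text: str) -> list[tuple[str, str]]:
    """Return (server, env var) pairs whose values match a known token shape."""
    config = json.loads(config_text)
    leaks = []
    for server, entry in config.get("mcpServers", {}).items():
        for key, value in entry.get("env", {}).items():
            if any(p.search(str(value)) for p in SECRET_SHAPES):
                leaks.append((server, key))
    return leaks

sample = json.dumps({
    "mcpServers": {
        "github": {"command": "npx", "env": {"GITHUB_TOKEN": "ghp_" + "a" * 36}},
        "safe":   {"command": "npx", "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}},
    }
})
leaks = find_leaked_secrets(sample)
```

Note the "safe" entry: referencing an environment variable instead of pasting the token is exactly what a clean secret_hygiene score rewards.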

Example Malicious Skill: looks normal, hides 13 attack vectors (grade C)

A crafted test skill with perfect quality scores — but a trust score of 0%. It hides prompt injection, credential theft, obfuscated shell execution, and destructive commands.

13 findings across 7 categories — exactly how real attacks work

Evil Deploy: all 10 arXiv:2604.03070 categories (grade F)

Test skill mapping to every vulnerability category from the "Credential Leakage in LLM Agent Skills" paper — reverse shells, persistence, crypto mining, credential logging.

16 findings across 6 categories — all 10 steps flagged

gstack: Garry Tan's 29-skill dev toolkit (grade D)

Full-stack dev skills (deploy, review, QA, canary, benchmark). Demonstrates format-flexible scoring for non-dotai skills and context-aware trust scanning.

29 skills scanned, 10 doc files auto-skipped — avg score 61%

Skill Collection: 200+ Claude skills (grade C)

Engineering, marketing, product, and C-level advisor skills. Quality issues found, no security threats.

10 skills scanned, 12 doc files auto-skipped — avg score 65%

Get started

$ pip install ai-skill-audit
# Audit a skill file
ai-skill-audit audit SKILL.md --verbose

# Audit a GitHub repo
ai-skill-audit audit https://github.com/user/repo --summary

# MCP config scan
ai-skill-audit audit mcp.json

# Full audit with LLM review
ai-skill-audit audit skills/ --llm --verbose

# HTML report
ai-skill-audit audit skills/ --output html --llm > report.html

# CI gating
ai-skill-audit audit skills/ --min-grade B