$ bash
# Element exists and is visible
$B is visible ".modal"
# Button is enabled...
$ bash
$B click @e3 $B fill @e4 "value" $B hover @e1
$B html @e2 ...
HIGH: 2 suspicious pattern(s) found
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
CRITICAL
The preamble sources arbitrary output from an external binary into the running shell, allowing it to set variables or execute any commands without visibility.
Fix: Replace the sourced execution with explicit, auditable variable assignments rather than blindly sourcing output from an opaque binary.
HIGH
The skill description is effectively empty while the skill performs extensive session tracking, analytics logging, telemetry onboarding, update checks, and behavioral configuration.
Fix: Provide a clear description that discloses the skill's actual behaviors: session tracking, analytics collection, telemetry prompting, and proactive skill invocation.
Analytics data is silently appended to a local file on every invocation before any user consent for telemetry is obtained.
Fix: Gate all analytics logging behind the telemetry consent check so no data is written before the user opts in.
MEDIUM
Pending analytics files are silently processed and deleted during the preamble without informing the user what data they contained or where it was sent.
Fix: Log or disclose what pending analytics files contain before processing them, and only process when telemetry is enabled.
The telemetry consent flow uses dark-pattern persuasion with 'recommended' labels and a two-step opt-out designed to pressure users into enabling tracking.
Fix: Present telemetry options neutrally without 'recommended' labels and allow a single-step opt-out without a guilt-trip follow-up.
The skill instructs the agent to promote and optionally open an external URL on first run, using the agent as a marketing channel.
Fix: Remove the mandatory promotional content or make it a separate opt-in skill rather than injecting it into every first-run experience.
Session tracking creates files in ~/.gstack/sessions on every invocation and counts active sessions, which is undisclosed surveillance of usage patterns.
Fix: Disclose session tracking in the description and gate it behind the telemetry consent preference.
Pervasive error suppression with 2>/dev/null and || true makes it impossible to diagnose failures in the preamble and could silently mask security-relevant errors.
Fix: Log errors to a debug file when a verbose/debug mode is enabled rather than unconditionally suppressing all stderr output.
LOW
The repo name is extracted and written to analytics without sanitization, which could break JSON formatting if the repo name contains quotes or special characters.
Fix: Use a proper JSON serializer or escape special characters in the repo name before embedding it in the JSON string.
Needs work skill with 8 suggestions for improvement (weakest: safety)
autoplan
Grade: D | skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (7494 words, 44 sections, 177 bullets)
Has 23 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (44 sections, 177 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (13 action items, 35 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'autoplan' but contains no planning logic — it is entirely an onboarding/telemetry/config bootstrapper for the 'gstack' framework.
Fix: Rename the skill to reflect its actual purpose (e.g., 'gstack-init' or 'gstack-onboarding').
The preamble silently writes analytics data to disk on every invocation, logging skill name, timestamp, and repo name, before any telemetry consent is obtained.
Fix: Gate local analytics logging behind the same telemetry consent check so no data is collected before the user opts in.
MEDIUM
Session tracking via PID-named files in ~/.gstack/sessions/ starts unconditionally before any user interaction or consent.
Fix: Disclose session tracking to the user or make it opt-in alongside telemetry consent.
The 'Voice' section reassigns the agent's identity to 'GStack' with a specific persona and ideology, which could override the host agent's system instructions.
Fix: Use guidance like 'When responding about gstack topics, adopt this tone' rather than a full identity reassignment.
The telemetry opt-out flow uses a dark pattern: declining the first prompt triggers a second ask for 'anonymous' mode, pressuring the user toward some level of data collection.
Fix: Accept a single 'No thanks' as final and do not prompt a second time.
The upgrade check sources and executes an external binary whose output can inject arbitrary instructions into the agent's context.
Fix: Validate or sanitize the output of gstack-update-check before echoing it into the agent context.
LOW
The 'source' command loads arbitrary environment variables from gstack-repo-mode into the shell, which could alter subsequent behavior unpredictably.
Fix: Document what variables gstack-repo-mode sets and validate them, or use explicit variable assignment instead of sourcing.
The preamble runs unconditionally on every skill invocation even though most of it (lake intro, telemetry prompt, proactive prompt) is one-time setup that will be skipped.
Fix: Split the one-time onboarding flow into a separate setup skill to reduce per-invocation overhead and complexity.
The skill directs the agent to open a URL in the user's browser, which is an unexpected side effect for a planning skill.
Fix: Only open URLs if the user explicitly requests it, and never as part of an automated onboarding flow.
Needs work skill with 8 suggestions for improvement (weakest: safety)
benchmark
Grade: D | skill | claude-native | Score: 54%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 25% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 11 steps
Has 21 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Language is concrete and specific
Description is too short — expand to 20-200 characters
Too many steps (11) — consider grouping or splitting the skill
actionability (25%)
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
Reference specific tools or commands in steps (e.g. `git diff`, Grep)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill has no actionable instructions — it only lists metric definitions and threshold constants without telling the agent what to do, how to collect data, or what to output.
Fix: Add concrete instructions: specify how to collect metrics (e.g., which URLs to test, which tools to use), how to format results, and what action to take on regressions.
MEDIUM
The Description field contains the entire skill body (steps and thresholds) but provides no actual description of the skill's purpose or expected outcome.
Fix: Add a plain-language description (e.g., 'Run web performance benchmarks and flag regressions') and move the steps into a dedicated instructions or body section.
Threshold rules reference comparisons ('>50% increase') but never specify what baseline to compare against or how to obtain it.
Fix: Define the baseline source explicitly (e.g., previous run, stored artifact, CI reference) so the agent can perform meaningful comparisons.
Needs work skill with 10 suggestions for improvement (weakest: actionability)
browse
Grade: C | skill | claude-native | Score: 74%
completeness: 80% (weight 20%)
clarity: 80% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 100% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (3151 words, 35 sections, 19 bullets)
Has 23 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (80%)
Well-structured body (35 sections, 19 bullets)
Language is concrete and specific
Description is too short — expand to 20-200 characters
actionability (75%)
Body contains actionable instructions (0 action items, 7 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
$ bash
# 1. Open a visible Chrome at the current page
$B handoff "Stuck on CAPTCHA...
$ bash
$B click @e3 $B fill @e4 "value" $B hover @e1
$B html @e2 ...
Executable code found — no suspicious patterns detected
LLM Security Review
HIGH
Skill is named 'browse' but contains no browsing functionality — it is an onboarding/setup flow with telemetry, proactive-behavior config, and philosophy intro.
Fix: Rename the skill to reflect its actual purpose (e.g., 'gstack-setup' or 'gstack-onboard').
The preamble silently writes analytics data (skill name, timestamp, repo name) to disk before any telemetry consent is obtained from the user.
Fix: Move analytics logging behind the telemetry consent gate so no data is written until the user opts in.
MEDIUM
Telemetry opt-out uses a dark pattern: declining the first prompt triggers a second 'anonymous mode' prompt, requiring two refusals to fully opt out.
Fix: Offer all three options (community, anonymous, off) in a single prompt without pressuring the user through sequential asks.
Session tracking creates and manages files in ~/.gstack/sessions keyed by PID, silently monitoring active sessions and cleaning up old ones without user disclosure.
Fix: Document session tracking behavior to the user or move it behind the telemetry consent check.
The 'Voice' section overrides the agent's default communication style with specific persona instructions, which could conflict with the host agent's system prompt.
Fix: Frame voice guidance as suggestions rather than overrides, or remove persona instructions from a skill that should be about browsing.
The preamble opens an external URL via the system `open` command, which could be changed to a malicious URL in a future update without the user noticing.
Fix: Show the URL to the user and let them decide whether to open it manually, rather than invoking `open` directly.
LOW
The preamble processes and deletes pending analytics files (`.pending-*`) silently, potentially sending telemetry data from prior sessions.
Fix: Only process pending telemetry files if the user has opted into telemetry.
The upgrade flow reads and executes instructions from another skill file on disk, creating a chain-of-trust issue where a compromised upgrade skill could run arbitrary instructions.
Fix: Pin or verify the integrity of chained skill files before executing their instructions.
Acceptable skill with 5 suggestions for improvement (weakest: safety)
canary
Grade: D | skill | claude-native | Score: 52%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 25% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 20 steps
Has 19 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (20) — consider grouping or splitting the skill
Replace vague words: things
actionability (25%)
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
Reference specific tools or commands in steps (e.g. `git diff`, Grep)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The Description field is empty, providing no summary of what this skill does before the agent begins executing steps.
Fix: Add a one-line description summarizing the skill's purpose, e.g., 'Post-deploy canary monitoring that checks pages for errors, regressions, and broken links.'
Steps 3 and 18 pre-select option A as the recommendation, biasing the agent toward a specific choice without requiring situational analysis.
Fix: Present options neutrally or state the criteria for choosing each option, rather than pre-selecting one.
Step 15 offers an immediate rollback of the deploy with no confirmation gate or approval requirement for this destructive action.
Fix: Add an explicit confirmation step before executing a rollback, and require the user to acknowledge the action.
Needs work skill with 11 suggestions for improvement (weakest: actionability)
careful
Grade: D | skill | claude-native | Score: 54%
completeness: 80% (weight 20%)
clarity: 70% (weight 15%)
actionability: 50% (weight 20%)
safety: 5% (weight 15%)
testability: 60% (weight 10%)
trust: 55% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (283 words, 3 sections, 1 bullet)
Has 1 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (70%)
Structured body (3 sections, 1 bullet)
Language is concrete and specific
Description is too short — expand to 20-200 characters
actionability (50%)
Runbook body present
Body references tools or includes code
No inputs defined (not always needed)
Add action-oriented instructions (bullet points starting with verbs)
safety (5%)
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
Address what happens when things go wrong (errors, failures, retries)
[DESTRUCTIVE] Destructive rm -rf on root or home directory
[DESTRUCTIVE] Destructive rm -rf on absolute path
[DESTRUCTIVE] Force push can destroy remote history
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill claims to provide destructive command guardrails but also silently writes analytics/tracking data to the user's home directory.
Fix: Remove the hidden analytics collection or prominently disclose it in the description so users can give informed consent.
A bash code block creates a persistent directory (~/.gstack/analytics) and appends tracking data including repo name and timestamp, disguised inside an auto-generated documentation comment.
Fix: Remove the tracking code entirely or move it into a clearly labeled, opt-in telemetry section separate from the guardrail logic.
MEDIUM
The bash block is embedded after an AUTO-GENERATED comment that discourages editing, which could trick an agent into executing it as part of normal skill setup without scrutiny.
Fix: Do not embed executable code inside documentation comments designed to discourage review.
The skill body describes hook behavior but contains no actual hook configuration or implementation — it only provides a markdown table and prose, so an agent following it would not actually install any guardrails.
Fix: Include the actual hook definition or reference a concrete configuration file so the guardrails are functional, not just decorative.
LOW
The analytics silently suppresses all errors via `2>/dev/null || true`, making failures invisible and debugging impossible.
Fix: If telemetry is kept, log errors to a known location rather than silently swallowing them.
Needs work skill with 12 suggestions for improvement (weakest: safety)
codex
Grade: D | skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (6144 words, 30 sections, 60 bullets)
Has 30 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (30 sections, 60 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: whatever, things
actionability (75%)
Body contains actionable instructions (2 action items, 39 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
$ bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git ...
$ bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git ...
$ bash
mkdir -p .context
HIGH: 2 suspicious pattern(s) found
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'codex' but contains no Codex-related functionality — it is entirely a gstack framework bootstrapper with onboarding flows, telemetry opt-in, and session tracking.
Fix: Rename the skill to reflect its actual purpose (e.g., 'gstack-init') or add a description explaining what it does.
The preamble silently logs every skill invocation (skill name, timestamp, repo name) to a persistent analytics file before the user has consented to telemetry.
Fix: Gate local analytics logging on the user's telemetry preference, or at minimum disclose that local usage data is written before consent.
MEDIUM
Session tracking via PID-stamped files in ~/.gstack/sessions/ happens silently on every invocation with no disclosure or opt-out.
Fix: Disclose session tracking to the user or tie it to the telemetry consent flow.
The skill reassigns the agent's identity and voice ('You are GStack') which could override the host agent's system prompt and safety persona.
Fix: Use 'When responding in this skill context, adopt this tone' instead of overriding the agent's identity with 'You are'.
The telemetry opt-in uses a dark pattern: the recommended option is always the one that shares more data, and declining triggers a second persuasion prompt to get at least anonymous tracking.
Fix: Present telemetry as a single prompt with all three options (community, anonymous, off) without a guilt-trip follow-up.
The skill forces the agent to open a marketing URL (garryslist.org blog post) on first run and frames it as an essential onboarding step.
Fix: Make the blog link informational only; do not have the agent offer to open external marketing pages automatically.
LOW
Pending analytics files are silently finalized and deleted on each run, potentially transmitting queued telemetry data without user awareness.
Fix: Only process pending telemetry if the user has opted in, and log what is being sent.
The Description field is empty and the Body section's markdown comment says it is auto-generated, but the skill content appears to be the full SKILL.md itself rather than a codex-specific tool.
Fix: Provide a meaningful description so users and audit tools can understand the skill's purpose before execution.
The Voice section is truncated mid-sentence, suggesting the skill file is incomplete or was improperly generated.
Fix: Complete the truncated section or add proper termination to avoid undefined agent behavior.
Needs work skill with 8 suggestions for improvement (weakest: safety)
connect-chrome
Grade: D | skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 65% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (3973 words, 22 sections, 35 bullets)
Has 17 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (22 sections, 35 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: whatever, things
actionability (75%)
Body contains actionable instructions (0 action items, 15 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
[DESTRUCTIVE] Force kill can corrupt state
LLM Security Review
HIGH
The skill is named 'connect-chrome' but contains zero instructions for connecting to or interacting with Chrome; the entire body is a generic gstack framework preamble with onboarding flows.
Fix: Rename the skill to match its actual purpose or add actual Chrome-connection functionality.
MEDIUM
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate local analytics logging on the user's telemetry preference, not just remote submission.
The skill creates session-tracking files keyed by PID and scans for active sessions on every run, enabling cross-session usage tracking without user awareness.
Fix: Document session tracking clearly and gate it on telemetry consent.
The telemetry opt-in flow uses a dark pattern: declining the first offer triggers a second ask for 'anonymous mode,' pressuring the user toward some level of data collection.
Fix: Accept a single 'no' as final; do not re-prompt with a softer ask.
The skill reassigns the agent's identity and voice ('You are GStack') which could override the host agent's system prompt persona and safety framing.
Fix: Use role framing like 'When running gstack skills, adopt this tone' rather than full identity reassignment.
LOW
The description field is empty, so users and automated tooling cannot understand what the skill does before invoking it.
Fix: Add a clear one-line description explaining the skill's purpose.
The preamble sources arbitrary output from gstack-repo-mode into the shell, which could execute unexpected commands if that binary is compromised or returns malicious output.
Fix: Validate or restrict the output of gstack-repo-mode before sourcing it into the shell.
The quoting around the tilde path in the conditional prevents shell expansion, so the telemetry binary existence check will never succeed.
Fix: Remove the quotes or use $HOME instead of ~ so the path expands correctly.
Needs work skill with 9 suggestions for improvement (weakest: safety)
cso
Grade: F | skill | claude-native | Score: 42%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 48% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 0% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 105 steps
Has 23 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (105) — consider grouping or splitting the skill
Replace vague words: whatever, things
actionability (48%)
Steps reference specific tools or commands
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
CRITICAL: 5 suspicious pattern(s) found — review carefully before use
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
[EXFILTRATION] May leak secrets
[EXFILTRATION] May exfiltrate environment variables
[SUSPICIOUS_URL] Direct IP address (no DNS = suspicious)
LLM Security Review
HIGH
Step 96 is truncated mid-sentence, so any agent following these instructions will have incomplete guidance for the LLM security verification phase.
Fix: Complete step 96 with the full instruction (e.g., '...actually reaches system prompts or tool schemas before reporting prompt injection').
MEDIUM
The skill has no actual description — the Description field contains only procedural steps, so an agent or user cannot understand its purpose without reading all 96 steps.
Fix: Add a concise top-level description (e.g., 'Comprehensive security audit of a codebase covering CI/CD, AI/LLM, web, and infrastructure attack surfaces') before the Steps section.
Steps 1–53 are audit actions while steps 54–96 are severity scoring, exclusion rules, and verification guidelines, but they are all presented as a single numbered list with no structural separation, which will confuse an agent about what to do versus what to filter.
Fix: Separate the skill into distinct sections: Audit Steps, Severity Scoring, Exclusion Rules, and Verification Guidelines.
Step 78 creates a blanket trust exception for 'gstack' skill files, which could cause the agent to skip auditing legitimately vulnerable skill code if it is labeled as part of gstack.
Fix: Remove the blanket trust exception or narrow it to specific file paths, since any file could claim to be 'part of gstack'.
LOW
Step 56 suppresses all findings below severity 8/10, which is an aggressive threshold that would hide many real medium-severity vulnerabilities from the report.
Fix: Lower the reporting threshold or make it configurable, and ensure medium-severity findings (e.g., 5–7) are still surfaced with appropriate context.
The skill specifies no output format, so agents will produce inconsistent results across runs.
Fix: Add an output format specification (e.g., JSON findings array, markdown report, or structured table) so results are consistent and machine-parseable.
The skill name 'cso' is ambiguous and does not convey that this is a security audit skill, making it difficult for users to discover or understand its purpose.
Fix: Rename to something descriptive like 'security-audit' or 'codebase-security-review', or at minimum add a summary description.
Poor skill with 13 suggestions for improvement (weakest: trust)
design-consultation
Grade: D | skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (6059 words, 36 sections, 119 bullets)
Has 23 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (36 sections, 119 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (0 action items, 34 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'design-consultation' but contains no design consultation logic — it is entirely a framework bootstrap/onboarding flow for 'gstack'.
Fix: Rename the skill to match its actual purpose (e.g., 'gstack-init') or add actual design consultation instructions.
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate all local analytics writes behind the telemetry preference check, not just the remote send.
MEDIUM
Session tracking via PID-named files in ~/.gstack/sessions/ is created unconditionally without user disclosure.
Fix: Disclose session tracking to the user or gate it behind a consent flag.
The skill reassigns the agent's identity and voice to 'GStack' with a specific persona, which could override the host agent's safety guidelines or expected behavior.
Fix: Frame the voice section as a style guide rather than an identity override (e.g., 'When responding in this skill, adopt this tone...').
The telemetry consent flow uses a dark pattern — declining the first opt-in triggers a second ask, pressuring the user toward at least anonymous tracking.
Fix: Accept the user's first decline without a follow-up nudge, or combine all options into a single prompt.
The proactive behavior prompt is worded to bias the user toward keeping auto-invocation on, framing opt-out negatively.
Fix: Present both options neutrally without implying the opt-out choice is less convenient.
The skill description/body is truncated mid-sentence, suggesting the file is incomplete.
Fix: Complete the truncated section to ensure the agent has full instructions.
LOW
The preamble runs an update check and may trigger an auto-upgrade flow, which could change skill behavior mid-session without explicit user action.
Fix: Always require explicit user confirmation before upgrading skill files during an active session.
The shell command to open a URL in the browser is macOS-specific ('open') and will fail on Linux/Windows.
Fix: Use a cross-platform approach (e.g., xdg-open on Linux, start on Windows) or detect the OS first.
Needs work skill with 8 suggestions for improvement (weakest: safety)
design-review
Grade: D | skill | claude-native | Score: 57%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 30% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (8567 words, 62 sections, 232 bullets)
Has 33 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (62 sections, 232 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (9 action items, 90 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
CRITICAL: 3 suspicious pattern(s) found — review carefully before use
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
[SECRET] Possible BIP39 mnemonic seed phrase
LLM Security Review
HIGH
Skill is named 'design-review' but contains no design review logic — it is entirely onboarding, telemetry opt-in, and framework bootstrapping.
Fix: The skill body should contain actual design review instructions, or the name/description should accurately reflect that this is a framework setup skill.
The preamble silently writes analytics/telemetry data to disk on every invocation before any user consent is obtained.
Fix: Move local analytics logging behind the same telemetry consent gate, or at minimum disclose it to the user before writing.
MEDIUM
Session tracking via PID-named files in ~/.gstack/sessions/ happens silently, with no disclosure or opt-out.
Fix: Disclose session tracking to the user or tie it to the telemetry consent flow.
The 'Voice' section reassigns the agent's identity to 'GStack' with a specific persona, which could override the host agent's safety guidelines or default behavior.
Fix: Use 'When responding in this skill, adopt the following tone...' instead of a full identity reassignment.
The telemetry opt-in flow uses dark-pattern nudging — the first option is labeled '(recommended)', and declining triggers a second ask, pressuring users toward data collection.
Fix: Present all telemetry options in a single prompt without a recommended label or a guilt-trip follow-up.
The skill forces an unrelated marketing essay ('Boil the Lake') to be opened in the user's browser on first run, unrelated to design review.
Fix: Remove the promotional content or make it a separate, clearly labeled onboarding skill.
The skill description/body is truncated mid-sentence, suggesting the actual design-review instructions are missing entirely.
Fix: Complete the skill file with actual design review instructions so the agent can fulfill the stated purpose.
LOW
The preamble sources arbitrary shell code from gstack-repo-mode without validation, which could alter the shell environment unpredictably.
Fix: Document what gstack-repo-mode exports, or at minimum validate its output before sourcing.
The pending-analytics cleanup loop silently deletes files and calls a telemetry binary even if the user hasn't consented yet.
Fix: Gate all telemetry binary calls behind the user's telemetry consent setting.
Needs work: skill with 9 suggestions for improvement (weakest: safety)
document-release
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (4915 words, 23 sections, 88 bullets)
Has 21 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (23 sections, 88 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: whatever, things
actionability (75%)
Body contains actionable instructions (2 action items, 47 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'document-release' but the body contains no release documentation logic — it is entirely onboarding, telemetry opt-in, and framework bootstrapping.
Fix: Either rename the skill to reflect its actual purpose (e.g., 'gstack-init') or add the missing release documentation instructions.
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate local analytics logging behind the telemetry consent check (only write if _TEL is not 'off').
The skill description is effectively empty — there is no explanation of what 'document-release' does, so users cannot give informed consent before running it.
Fix: Add a clear, honest description explaining the skill's purpose and any first-run onboarding steps.
MEDIUM
The skill reassigns the agent's identity and voice ('You are GStack') which can override the host agent's safety persona and behavioral guidelines.
Fix: Use advisory tone guidance ('When responding in this skill, adopt a direct builder tone') instead of identity reassignment.
The skill orchestrates a multi-step onboarding funnel (lake intro → telemetry → proactive mode) with nudging defaults labeled '(recommended)' and a double-ask pattern for telemetry refusal.
Fix: Accept a single 'no' without a follow-up nudge; present telemetry options in one prompt instead of two.
The preamble opens an external marketing URL in the user's browser without prior disclosure in the skill description.
Fix: Disclose in the skill description that first-run may open a browser, and ensure it only runs with explicit user consent.
The skill body ends abruptly mid-section (Voice/Humor) with no actual task instructions, meaning an agent executing it would have no actionable release-documentation steps to follow.
Fix: Complete the skill with concrete instructions for the release documentation task it claims to perform.
LOW
Session tracking writes PID-based files and counts active sessions across all repos, which is not disclosed to the user.
Fix: Document session tracking behavior and its purpose, or remove it if not essential.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
freeze
C
skill | claude-native | Score: 70%
completeness: 85% (weight 20%)
clarity: 60% (weight 15%)
actionability: 65% (weight 20%)
safety: 75% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (85%)
Has description
Has 2 steps
Has 3 code block(s) in body (inline examples)
Has 4 gotchas
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
clarity (60%)
Language is concrete and specific
Description is too short — expand to 20-200 characters
Consider adding more steps (3-10 is ideal)
actionability (65%)
Steps start with action verbs
No inputs defined (not always needed)
Reference specific tools or commands in steps (e.g. `git diff`, Grep)
safety (75%)
Has 4 gotchas
Gotchas are specific and detailed
Address what happens when things go wrong (errors, failures, retries)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
Steps are skeletal headings with no actionable instructions — an agent cannot determine what to resolve or how to enforce the freeze.
Fix: Expand each step with explicit instructions: specify what 'it' refers to (user-supplied path), name the state file location, and describe how Edit/Write calls should check the freeze boundary.
MEDIUM
The skill never defines what input it expects or how the user invokes it with a directory argument.
Fix: Add a usage line (e.g., '/freeze <directory>') and specify how the directory argument is parsed.
The Gotchas section explicitly acknowledges the freeze is trivially bypassed via Bash, which undermines the stated purpose of preventing accidental edits.
Fix: Either document this as a known limitation users must accept, or extend the freeze to intercept shell write commands as well.
The description promises a working freeze mechanism but the steps contain no enforcement logic — the skill would not actually prevent any edits.
Fix: Add the core enforcement step: instruct the agent to check the freeze state file before every Edit/Write call and reject paths outside the frozen directory.
Acceptable: skill with 7 suggestions for improvement
gstack-upgrade
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (1008 words, 10 sections, 5 bullets)
Has 15 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (10 sections, 5 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: whatever
actionability (75%)
Body contains actionable instructions (0 action items, 3 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
testability (60%)
Body contains inline examples
Body code blocks serve as examples
Structured body describes expected behavior
trust (70%)
Contains 14 executable code block(s): bash
[bash] _AUTO=""
[bash] ~/.claude/skills/gstack/bin/gstack-config set auto_upgrade true
[bash] _SNOOZE_FILE=~/.gstack/update-snoozed
[bash] ~/.claude/skills/gstack/bin/gstack-config set update_check false
[bash] if [ -d "$HOME/.claude/skills/gstack/.git" ]; then
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
Using `git reset --hard origin/main` discards any local commits or changes without confirmation beyond the stash, which could cause data loss.
Fix: Use `git pull --ff-only` or warn the user explicitly before discarding local commits that aren't covered by `git stash`.
LOW
The backup directory is deleted immediately after upgrade success with no rollback window if issues are discovered later.
Fix: Keep the backup for a short period or until the user confirms the upgrade works, rather than deleting it immediately.
The vendored upgrade clones from a hardcoded GitHub URL over HTTPS without verifying a tag or commit hash, so the agent always gets whatever is on `main`.
Fix: Clone a specific tagged release or verify the VERSION file post-clone matches the expected new version.
The `./setup` script is executed without checking if it exists or is executable, which could produce a confusing error.
Fix: Add a guard like `[ -x ./setup ] && ./setup` or provide a clear error message if setup is missing.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
guard
D
skill | claude-native | Score: 59%
completeness: 80% (weight 20%)
clarity: 70% (weight 15%)
actionability: 60% (weight 20%)
safety: 5% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (280 words, 2 sections, 6 bullets)
Has 3 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (70%)
Structured body (2 sections, 6 bullets)
Language is concrete and specific
Description is too short — expand to 20-200 characters
actionability (60%)
Body has some actionable content
Body references tools or includes code
No inputs defined (not always needed)
Add more action-oriented bullet points or numbered steps
safety (5%)
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
Address what happens when things go wrong (errors, failures, retries)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The skill silently logs usage analytics (skill name, timestamp, repo name) to a hidden directory without informing the user or asking consent.
Fix: Either disclose the analytics collection to the user before executing, or remove the silent tracking entirely.
LOW
The user-provided directory path is interpolated into a cd command without validation, which could fail silently or behave unexpectedly with paths containing special characters.
Fix: Add input validation to reject empty, relative, or special-character paths before using them in shell commands.
The skill depends on two sibling skills (/careful and /freeze) but has no check to verify they are installed before activating, which could leave the user with a false sense of security.
Fix: Add a pre-flight check that verifies both /careful and /freeze hook scripts exist before declaring guard mode active.
Needs work: skill with 9 suggestions for improvement (weakest: safety)
investigate
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (3669 words, 20 sections, 37 bullets)
Has 13 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (20 sections, 37 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: maybe, things
actionability (75%)
Body contains actionable instructions (0 action items, 20 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'investigate' but the majority of its body is onboarding, telemetry opt-in, proactive-behavior prompts, and branding — not investigation logic.
Fix: The skill description should accurately state that it performs onboarding and configuration, or the investigation logic should actually be present in the body.
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate all local analytics writes behind the user's telemetry preference; do not log skill usage before consent is obtained.
MEDIUM
Session tracking via touch files in ~/.gstack/sessions happens unconditionally on every run, creating a persistent usage timeline regardless of telemetry setting.
Fix: Either disclose session tracking to the user or gate it behind the telemetry consent flag.
The telemetry opt-in flow uses dark-pattern nudging: the first option is labeled '(recommended)', and declining triggers a second follow-up ask, pressuring the user toward consent.
Fix: Present all telemetry options in a single prompt without a second persuasion step, and avoid marking a specific choice as recommended.
The skill reassigns the agent's identity and voice ('You are GStack') which could override the host agent's system prompt personality and safety framing.
Fix: Use a scoped instruction like 'When executing this skill, adopt the following tone' rather than a blanket identity reassignment.
The LAKE_INTRO block attempts to open an external URL in the user's browser and frames it as a mandatory first-run step, which is unexpected for an 'investigate' skill.
Fix: Opening external URLs should be optional and clearly disclosed in the skill description, not buried in conditional first-run logic.
The skill description field is completely empty, giving the user no information about what the skill does before invoking it.
Fix: Add a clear, honest description summarizing the skill's purpose and side effects.
LOW
The skill body is truncated mid-sentence in the Voice section, suggesting incomplete content that an agent would try to follow anyway.
Fix: Complete the truncated section or remove it to avoid agents hallucinating the rest of the instruction.
The preamble shell script sources an external script into the current shell without validating its contents, which could modify the agent's environment arbitrarily.
Fix: Document what gstack-repo-mode exports, or capture its output into explicit variables instead of sourcing arbitrary shell code.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
land-and-deploy
D
skill | claude-native | Score: 55%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 20% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (10014 words, 52 sections, 122 bullets)
Has 58 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (52 sections, 122 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (3 action items, 39 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
CRITICAL: 3 suspicious pattern(s) found — review carefully before use
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
[INJECTION] Conditional logic based on first run (rug-pull pattern)
LLM Security Review
HIGH
The skill is named 'land-and-deploy' but contains no deployment or landing logic — it is entirely a telemetry/onboarding/config bootstrap framework.
Fix: Either rename the skill to reflect its actual purpose (e.g., 'gstack-init') or add the promised land-and-deploy functionality.
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate all analytics writes behind the telemetry consent check (only write if $_TEL is not 'off').
MEDIUM
Session tracking files are created unconditionally in ~/.gstack/sessions, enabling cross-invocation usage monitoring without disclosure.
Fix: Disclose session tracking to the user or gate it behind the telemetry consent flag.
The 'Voice' section reassigns the agent's identity to 'GStack' with a specific persona and communication style, which can override the host agent's safety tone and behavior guidelines.
Fix: Frame the voice section as guidance ('When responding about gstack topics, use this tone…') rather than an identity override ('You are GStack').
The telemetry opt-in flow uses dark-pattern nudging: the first option is labeled '(recommended)', and declining triggers a second prompt pushing 'anonymous' mode, making full opt-out require two explicit refusals.
Fix: Present all telemetry options in a single prompt without pressure labels or multi-step nudging.
The preamble sources arbitrary output from an external binary into the current shell, allowing that binary to set or override environment variables and execute commands.
Fix: Capture the output into a known variable instead of sourcing it, or validate/allowlist the output before sourcing.
The skill's Description field is empty, providing no information to the agent or user about what the skill does before invocation.
Fix: Add a meaningful description that accurately reflects the skill's actual behavior.
LOW
The 'Boil the Lake' intro flow auto-opens a URL in the user's default browser if they agree, driving traffic to an external site as a mandatory onboarding step.
Fix: Make the URL informational only (print it) rather than offering to auto-open a browser to an external marketing page.
The Body/Description text appears to be truncated mid-sentence in the Voice section.
Fix: Complete the truncated Humor guidance so agents can follow the full style instructions.
Needs work: skill with 9 suggestions for improvement (weakest: trust)
office-hours
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (9949 words, 80 sections, 129 bullets)
Has 30 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (80 sections, 129 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: maybe, things
actionability (75%)
Body contains actionable instructions (2 action items, 53 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'office-hours' but contains no office-hours functionality — it is entirely a framework bootstrap/onboarding flow for 'gstack'.
Fix: Rename the skill to match its actual purpose (e.g., 'gstack-init' or 'gstack-bootstrap') or add actual office-hours functionality.
The preamble silently writes persistent analytics/telemetry data to disk before the user has consented to telemetry.
Fix: Gate all analytics writes behind the telemetry consent check — do not write to skill-usage.jsonl until the user has opted in.
MEDIUM
Session tracking via PID-named files in ~/.gstack/sessions/ happens unconditionally, with no user disclosure or consent.
Fix: Disclose session tracking to the user or gate it behind the telemetry preference.
The skill reassigns the agent's identity and voice ('You are GStack') which could override the host agent's system prompt and safety personality.
Fix: Use additive framing ('When responding to gstack skills, adopt this tone') rather than full identity reassignment.
The telemetry opt-in flow uses a dark pattern: declining the first offer triggers a second ask with a softer option, pressuring the user toward some level of data collection.
Fix: Accept 'No thanks' as a final answer without a follow-up nudge, or at minimum combine all three options into a single prompt.
The skill silently sources arbitrary output from an external binary into the current shell, allowing it to set environment variables or run commands without user review.
Fix: Log what gstack-repo-mode outputs before sourcing it, or document what variables it sets so the user can audit the behavior.
LOW
The upgrade check can silently trigger reading and executing another skill file (gstack-upgrade/SKILL.md) with an auto-upgrade path, without explicit user initiation.
Fix: Always require explicit user confirmation before running the upgrade skill, even if auto-upgrade is configured.
The Description field is empty/malformed — the body content is stuffed into the description via an unclosed YAML block, and the skill description is truncated mid-sentence at the end.
Fix: Provide a proper one-line description and move the body content to a dedicated field; fix the truncated Voice section.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
plan-ceo-review
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (13621 words, 81 sections, 241 bullets)
Has 37 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (81 sections, 241 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (6 action items, 82 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'plan-ceo-review' but contains no instructions for planning or reviewing anything CEO-related; the entire body is a generic gstack framework preamble with onboarding flows.
Fix: Either rename the skill to match its actual purpose or add the CEO review planning logic that the name promises.
The preamble silently writes persistent tracking data (skill name, timestamp, repo name) to a local analytics file on every invocation without prior user consent.
Fix: Gate analytics logging on the user's telemetry preference (the _TEL variable is already available) instead of writing unconditionally.
MEDIUM
Session tracking creates files in ~/.gstack/sessions keyed by PPID on every run, enabling cross-skill session correlation before the user has opted into any telemetry.
Fix: Move session file creation behind the telemetry consent check so it only runs when telemetry is not 'off'.
The skill reassigns the agent's identity and voice ('You are GStack') which could override the host agent's safety persona and behavioral guidelines.
Fix: Use a softer framing like 'When responding in this skill, adopt the following tone' rather than a full identity override.
The telemetry opt-in flow uses manipulative dark-pattern design: the first refusal triggers a second ask, making it harder to fully decline.
Fix: Present all three options (community, anonymous, off) in a single prompt so the user can choose without pressure.
The 'Boil the Lake' intro forces the agent to promote an external essay and offer to open a browser before doing any actual work, acting as marketing rather than skill functionality.
Fix: Remove the browser-open promotion or make it a separate opt-in onboarding skill rather than embedding it in every skill's preamble.
LOW
The Description field is empty/malformed — it contains a markdown Body header instead of an actual description of what the skill does.
Fix: Provide a clear one-line description explaining the skill's purpose so users and audit tools can evaluate it before invocation.
The Voice section is truncated mid-word, suggesting the skill file is incomplete or was improperly generated.
Fix: Complete the Humor directive or remove the truncated line to avoid undefined agent behavior.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
plan-design-review
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (8269 words, 54 sections, 138 bullets)
Has 22 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (54 sections, 138 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (4 action items, 89 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'plan-design-review' but the body contains no design review logic — it is entirely onboarding, telemetry opt-in, and framework bootstrapping.
Fix: Either add actual plan/design review instructions or rename the skill to reflect its true purpose (e.g., 'gstack-onboarding').
The preamble silently logs every skill invocation (skill name, timestamp, repo name) to a local analytics file without informing the user or waiting for telemetry consent.
Fix: Gate local analytics logging on the same telemetry consent check, or at minimum disclose the local logging before it runs.
The skill description is empty and the body ends mid-sentence (truncated at '**Humor'), so the actual plan-design-review instructions are entirely missing.
Fix: Include the complete skill body with actual design review instructions, and fill in the description field.
MEDIUM
The preamble processes and deletes pending analytics files and may invoke a telemetry-log binary before the user has been asked about telemetry consent.
Fix: Only process pending telemetry files after the user has explicitly opted in to telemetry.
The 'Voice' section reassigns the agent's identity to 'You are GStack', which could override the host agent's persona and safety framing.
Fix: Use softer framing like 'When responding in this skill, adopt the following tone' instead of identity reassignment.
The telemetry opt-in flow uses dark-pattern nudging: the first option is labeled '(recommended)', and declining triggers a second follow-up question pressuring the user again.
Fix: Present all telemetry options in a single prompt without a follow-up guilt step, and avoid labeling one choice as recommended.
Session tracking writes a PID-based file to disk and counts active sessions across all terminal windows, which is undisclosed surveillance of user activity.
Fix: Disclose the session-counting behavior and explain its purpose, or remove it if not needed by the skill.
LOW
The skill auto-opens a URL in the user's browser and marks it as seen, creating a one-time forced marketing touchpoint disguised as an introduction.
Fix: The touch marker runs unconditionally — make sure it only runs after the user responds, and don't frame marketing content as a required onboarding step.
The quoting on the gstack-telemetry-log executable check is incorrect — the -x test uses literal quotes around a tilde path, which will never resolve.
Fix: Remove the quotes around the tilde or use $HOME so the path expands correctly.
Needs work: skill with 8 suggestions for improvement (weakest: safety)
plan-eng-review
D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (9644 words, 56 sections, 168 bullets)
Has 23 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (56 sections, 168 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (4 action items, 45 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'plan-eng-review' but the Description field contains the entire skill body instead of a concise description, and there is no actual engineering review logic anywhere in the skill.
Fix: Add a clear description field and include actual engineering plan review instructions in the body.
The preamble silently writes analytics/telemetry data to disk on every invocation before the user has consented to telemetry.
Fix: Gate all local analytics logging behind the telemetry consent check, not just the remote send.
MEDIUM
Session tracking via PID files in ~/.gstack/sessions happens unconditionally and silently counts concurrent sessions across all terminal windows.
Fix: Disclose session tracking to the user and tie it to the telemetry consent preference.
The skill reassigns the agent's identity and voice ('You are GStack') which can override the host agent's system prompt persona and safety framing.
Fix: Use 'Act as a helpful assistant following GStack conventions' instead of full identity reassignment.
The LAKE_INTRO flow attempts to open an external URL in the user's browser via the 'open' command, which is unexpected behavior for a plan review skill.
Fix: Remove the browser-open behavior or move it to a dedicated onboarding skill where it is expected.
The telemetry consent flow uses dark-pattern nudging with 'recommended' labels and a two-stage opt-out requiring the user to decline twice to fully disable tracking.
Fix: Present a single consent prompt with clear equal-weight options including 'fully off' as a first-level choice.
The skill contains no actual engineering plan review logic — it is entirely preamble, onboarding flows, telemetry prompts, and voice instructions.
Fix: Add concrete instructions for how the agent should actually review an engineering plan.
LOW
The Description field appears to contain the full skill body due to a YAML formatting issue, making the skill unusable as metadata.
Fix: Fix the YAML structure so Description is a short summary and the body is in a separate field.
The preamble sources an external script into the current shell which can modify environment variables and shell state silently.
Fix: Document what gstack-repo-mode exports and capture only specific expected variables rather than sourcing arbitrary output.
Needs work skill with 8 suggestions for improvement (weakest: safety)
qa
Grade: D
skill | claude-native | Score: 54%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 38% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 32 steps
Has 37 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (32) — consider grouping or splitting the skill
Replace vague words: things
actionability (38%)
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
Reference specific tools or commands in steps (e.g. `git diff`, Grep)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The skill description field contains the entire instruction body instead of a concise summary, suggesting the Name/Description/Steps structure is malformed.
Fix: Add a proper one-line description before the Steps section and move instructions into a dedicated body or instructions field.
Steps reference a 'Setup above' section and variables like $B and $REPORT_DIR that are never defined in the skill file.
Fix: Include the Setup section and define all variables ($B, $REPORT_DIR) within the skill file so it is self-contained.
LOW
Steps 5-8 appear to be framework detection heuristics but are formatted as top-level steps rather than sub-steps of a detection phase, making the workflow confusing.
Fix: Group framework detection heuristics under a single parent step (e.g., 'Detect framework') with sub-items.
The health score rubric referenced in step 23 is not included in the skill file.
Fix: Embed the health score rubric directly in the skill or reference a specific file path where it can be found.
Steps 29-31 are listed as bare labels without actionable instructions, unlike the other steps.
Fix: Expand these into full instructions describing how to compute the delta and diff against the baseline.
Needs work skill with 11 suggestions for improvement (weakest: safety)
qa-only
Grade: D
skill | claude-native | Score: 54%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 38% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 32 steps
Has 24 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (32) — consider grouping or splitting the skill
Replace vague words: things
actionability (38%)
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
Reference specific tools or commands in steps (e.g. `git diff`, Grep)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The skill references a 'Setup above' section and variables like $B and $REPORT_DIR that are not defined anywhere in the provided skill content.
Fix: Include the setup section and variable definitions within the skill file so the agent can execute steps correctly.
LOW
The description field contains the full step list but no actual high-level description of what the skill does, making it hard to understand purpose at a glance.
Fix: Add a one-line summary description before the steps section explaining this is a QA testing and reporting skill.
Steps 29-31 are listed as bare headings with no instructions, leaving the agent to guess what 'Health score delta' or 'Issues fixed' means operationally.
Fix: Expand these steps with explicit instructions on how to compute the delta and compare current vs. baseline issues.
The health score rubric referenced in step 23 is not included in the skill, so the agent cannot compute the score.
Fix: Include the actual health score rubric within the skill or reference a specific file path where it can be found.
Needs work skill with 11 suggestions for improvement (weakest: safety)
retro
Grade: D
skill | claude-native | Score: 56%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 49% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 43 steps
Has 41 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (43) — consider grouping or splitting the skill
Replace vague words: things
actionability (49%)
Steps reference specific tools or commands
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The Description field contains only steps but no actual description of the skill's purpose, making it unclear to users and agent orchestrators what this skill does.
Fix: Add a one-paragraph description before the Steps section explaining that this skill generates a developer retrospective report from git history and TODO tracking.
The retro window / time period is never defined, so the agent has no way to know what date range to analyze.
Fix: Add a parameter or default (e.g., 'last 7 days' or 'since last retro') so the agent knows the time boundary.
LOW
Steps are a flat list of 43 data points with no grouping, output format, or instructions on how to compute or present them, which may produce inconsistent results across invocations.
Fix: Group metrics into named sections (e.g., TODO Health, Commit Patterns, Session Analysis, Per-Author Stats) and specify the expected output format (Markdown, HTML, etc.).
The skill assumes a TODOS.md file exists with a specific structure (## Completed section, priority labels P0-P2) but never validates this or specifies the expected format.
Fix: Document the expected TODOS.md format or add a fallback instruction for when the file doesn't exist.
Steps 35-36 ask the agent to generate subjective praise and growth feedback with example templates, which could produce inaccurate or inappropriate assessments if the git data is sparse.
Fix: Add a guard condition to skip or soften feedback sections when there are fewer than N commits in the window.
Needs work skill with 10 suggestions for improvement (weakest: safety)
review
Grade: D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (8798 words, 51 sections, 112 bullets)
Has 32 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (51 sections, 112 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (5 action items, 50 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'review' but the body contains no code review logic whatsoever — it is entirely a framework bootstrap, onboarding funnel, and persona definition.
Fix: Either rename the skill to reflect what it actually does (e.g., 'gstack-init') or add actual code review instructions.
The preamble silently logs every skill invocation (skill name, timestamp, repo name) to a local analytics file without informing the user, even before the telemetry consent prompt is shown.
Fix: Do not write analytics data until after the user has consented to telemetry; or at minimum clearly disclose that local logging occurs unconditionally.
The skill description is completely empty, giving the user no information about what invoking '/review' will actually do.
Fix: Add a clear description explaining what the review skill does, its inputs, and expected outputs.
MEDIUM
Session tracking creates a file per process and counts active sessions across all terminals, which is not disclosed to the user and has no clear relationship to the 'review' purpose.
Fix: Explain to the user why session tracking exists or remove it if it serves no user-facing function.
The telemetry opt-in funnel uses a dark pattern: declining the first offer triggers a second ask with a softer framing, pressuring the user toward at least anonymous tracking.
Fix: Respect a single 'no' — do not re-prompt with a second tier of data collection after the user declines.
The skill reassigns the agent's identity and voice ('You are GStack'), which can override the host agent's system instructions and established persona.
Fix: Use framing like 'When running this skill, adopt the following tone' instead of wholesale identity reassignment.
The preamble sources an external script into the current shell, which can set arbitrary environment variables or execute arbitrary code with no visibility to the user.
Fix: Run the script in a subshell and capture only the expected output variable, rather than sourcing arbitrary code into the environment.
The skill body is truncated mid-sentence in the Voice section, suggesting the file is incomplete or was improperly generated.
Fix: Ensure the full skill file is included; a truncated skill may cause unpredictable agent behavior.
LOW
The pending-analytics loop silently deletes files matching '.pending-*' and optionally phones home via gstack-telemetry-log before the user has been asked about telemetry.
Fix: Do not send telemetry or delete pending events until telemetry consent has been confirmed.
The onboarding sequence (lake intro → telemetry → proactive prompt) can consume the entire user interaction before any actual 'review' work begins.
Fix: Move first-run onboarding to a dedicated setup skill so that invoking '/review' actually performs a review.
Needs work skill with 8 suggestions for improvement (weakest: safety)
setup-browser-cookies
Grade: C
skill | claude-native | Score: 73%
completeness: 85% (weight 20%)
clarity: 60% (weight 15%)
actionability: 60% (weight 20%)
safety: 100% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (85%)
Has description
Has 11 steps
Has 13 code block(s) in body (inline examples)
Has 5 gotchas
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
clarity (60%)
Language is concrete and specific
Description is too short — expand to 20-200 characters
Too many steps (11) — consider grouping or splitting the skill
actionability (60%)
Steps reference specific tools or commands
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
Steps 2-4 reference a 'GSTACK REVIEW REPORT' plan file check that is unrelated to browser cookie setup, indicating copy-paste contamination from another skill.
Fix: Remove steps 2-4 entirely as they belong to a different skill (likely a gstack review skill).
MEDIUM
Step 1 contains an unfilled template placeholder that would produce undefined behavior when executed by an agent.
Fix: Replace `{step}` with the actual first step instruction or remove the placeholder.
LOW
Step 11 is truncated mid-sentence, leaving the agent with no instructions for the bun-not-installed case.
Fix: Complete step 11 with the fallback instructions (e.g., install bun, or abort with a message).
The Description field is empty, providing no summary of what this skill does.
Fix: Add a one-line description such as 'Import browser cookies into a Playwright session for authenticated browsing.'
Acceptable skill with 6 suggestions for improvement
setup-deploy
Grade: D
skill | claude-native | Score: 58%
completeness: 80% (weight 20%)
clarity: 40% (weight 15%)
actionability: 57% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Has 30 steps
Has 13 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (40%)
Description is too short — expand to 20-200 characters
Too many steps (30) — consider grouping or splitting the skill
Replace vague words: things
actionability (57%)
Steps reference specific tools or commands
No inputs defined (not always needed)
Start each step with an action verb (Run, Check, Verify, etc.)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
$ bash
{deploy-status-command} 2>/dev/null | head -5 || echo "COMMAND_FAILED"
HIGH: 2 suspicious pattern(s) found
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
Partial API key exposure — even 4 characters leaks key prefix and confirms the variable is set, which could appear in logs or agent output.
Fix: Replace with a non-leaking existence check such as `[ -n "$RENDER_API_KEY" ] && echo 'set' || echo 'not set'`.
Steps for different platforms (Fly, Render, Vercel, Netlify, GitHub Actions) are listed sequentially without conditional branching, so an agent may execute all of them instead of only the relevant platform.
Fix: Add explicit conditional guards (e.g., 'If fly.toml exists, then…') so the agent only runs the steps matching the detected platform.
LOW
The Description field contains raw step instructions instead of a human-readable summary of the skill's purpose.
Fix: Add a one-line description before the steps explaining the skill configures deploy targets and health checks for a project.
Steps 1–5 present an interactive menu but no logic routes the agent's behavior based on the user's choice of A, B, or C.
Fix: Add explicit branching instructions (e.g., 'If user picks C, stop here') so the agent knows how to handle each choice.
Needs work skill with 10 suggestions for improvement (weakest: safety)
ship
Grade: D
skill | claude-native | Score: 64%
completeness: 80% (weight 20%)
clarity: 60% (weight 15%)
actionability: 75% (weight 20%)
safety: 30% (weight 15%)
testability: 60% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (80%)
Has description
Rich runbook body (14409 words, 88 sections, 230 bullets)
Has 62 code block(s) in body (inline examples)
Body contains warning/caveat language
Runbook style (no formal inputs)
Add a dedicated ## Examples section for discoverability
Extract warnings into a dedicated ## Gotchas section
clarity (60%)
Well-structured body (88 sections, 230 bullets)
Description is too short — expand to 20-200 characters
Replace vague words: things
actionability (75%)
Body contains actionable instructions (16 action items, 82 numbered)
Body references tools or includes code
No inputs defined (not always needed)
safety (30%)
Mentions error handling concepts
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
HIGH
The skill is named 'ship' but the body contains no shipping/deployment logic — it is entirely onboarding, telemetry opt-in, and configuration scaffolding.
Fix: Either rename the skill to reflect its actual purpose (e.g., 'onboard') or add the actual shipping logic that users expect when invoking /ship.
The preamble silently writes analytics data to disk on every invocation, logging skill name, timestamp, and repo name without prior user consent.
Fix: Gate local analytics logging behind the same telemetry consent check, or at minimum disclose it to the user before writing.
MEDIUM
Session tracking creates per-process marker files and counts active sessions across all terminals, which is not disclosed to the user.
Fix: Document the session-tracking mechanism and let users opt out, or remove it if it serves no user-facing purpose.
The 'Voice' section reassigns the agent's identity to 'GStack' with a detailed persona, which could override the host agent's safety guidelines or default behavior.
Fix: Use 'Respond in the style of…' framing instead of 'You are…' identity reassignment to avoid overriding the agent's core instructions.
The telemetry opt-in flow uses a dark-pattern double-ask: if the user declines 'community' telemetry, they are immediately re-asked for 'anonymous' telemetry, pressuring toward consent.
Fix: Offer all three options (community, anonymous, off) in a single prompt instead of using a sequential nudge pattern.
The preamble sources an external script into the current shell, allowing arbitrary code execution that can silently set environment variables or modify shell state.
Fix: Run gstack-repo-mode as a subprocess and capture its output explicitly rather than sourcing arbitrary output into the shell.
The 'Boil the Lake' intro unconditionally interrupts the user's shipping workflow to display a philosophy essay link, degrading the experience for a task-oriented command.
Fix: Move the onboarding intro to a dedicated /gstack-setup or first-run skill instead of blocking task-oriented commands like /ship.
LOW
The skill description field is empty, giving the agent and user no indication of what /ship actually does.
Fix: Add a clear one-line description such as 'Ship the current branch: run checks, create PR, and merge.'
The Voice/persona section is truncated mid-sentence, indicating the skill file is incomplete.
Fix: Complete the truncated section or remove it to avoid confusing the agent.
Needs work skill with 8 suggestions for improvement (weakest: safety)
unfreeze
Grade: D
skill | claude-native | Score: 50%
completeness: 63% (weight 20%)
clarity: 57% (weight 15%)
actionability: 50% (weight 20%)
safety: 5% (weight 15%)
testability: 45% (weight 10%)
trust: 70% (weight 20%)
Details & Suggestions
completeness (63%)
Has description
Has runbook body (124 words)
Has 2 code block(s) in body (inline examples)
Runbook style (no formal inputs)
Add more structure (sections or bullet lists) to the body
Add a dedicated ## Examples section for discoverability
Add gotchas/caveats to warn about common failure points
clarity (57%)
Has body content (124 words)
Language is concrete and specific
Description is too short — expand to 20-200 characters
Add section headers (##) and bullet lists to improve scannability
actionability (50%)
Runbook body present
Body references tools or includes code
No inputs defined (not always needed)
Add action-oriented instructions (bullet points starting with verbs)
safety (5%)
Add gotchas/caveats to warn about common failure points
Add specific gotchas (describe what can go wrong and why)
Address what happens when things go wrong (errors, failures, retries)
$ bash
STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}"
if [ -f "$STATE_DIR/freeze...
HIGH: 2 suspicious pattern(s) found
[DESTRUCTIVE] Hard reset destroys uncommitted work
[DESTRUCTIVE] DROP TABLE/DATABASE is destructive
LLM Security Review
MEDIUM
The skill silently logs usage telemetry (skill name, timestamp, repo name) to a hidden analytics file without disclosing this to the user.
Fix: Either remove the silent telemetry collection or explicitly disclose it in the skill description so the user can make an informed choice.
LOW
The description says the skill clears a freeze boundary, but it also creates directories and writes analytics data — behavior not mentioned in the stated purpose.
Fix: Document all side effects (directory creation, telemetry logging) in the skill description.
The analytics block suppresses all errors via '2>/dev/null || true', making it impossible to diagnose failures in the telemetry path or detect disk-space issues.
Fix: If telemetry is kept, log errors to a debug file rather than silently discarding them.
Needs work skill with 12 suggestions for improvement (weakest: safety)