What We Looked At
Nearly every SKILL.md file in the Claude Code Plugins ecosystem includes an allowed-tools field that declares which tools the skill is permitted to use. It reads as a kind of manifest — a promise about what the skill will and won't do. We parsed 1,372 of those declarations across all 315 plugins in the marketplace to see whether real-world usage matched what you'd expect.
The question that motivated this wasn't abstract. It was practical: when you're writing a new skill, how much does the allowed-tools field actually matter? Are there patterns that hold across every category, or does it vary wildly? Does it make a meaningful difference to namespace your Bash calls, or is that security theater?
The dataset spans 25 plugin categories — everything from devops and database automation to productivity and documentation tooling. Skills range in complexity from single-purpose utilities (generate a commit message, summarize a file) to orchestrated workflows that call subagents and write to multiple systems. The 13 valid tool names per the current spec are: Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch, Task, TodoWrite, NotebookEdit, AskUserQuestion, and Skill.
We extracted each skill's allowed-tools declaration, then grouped the results by category and analyzed them for frequency, variance, and namespace patterns.
A few methodological notes before diving in. We counted a tool as "used" if it appeared anywhere in the allowed-tools line — including namespace variants like Bash(python:*) counting toward the Bash total. Skills that declared Bash without any namespace were counted separately from those with namespacing, which turned out to be a meaningful distinction. Skills that omitted allowed-tools entirely were excluded from frequency analysis but tracked separately — there were 47 of them, mostly older files that predate the current spec.
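The counting rule above can be sketched in a few lines of Python. This is an illustration of the rule, not the pipeline's actual code, and the helper names (base_tool, tool_counts) are hypothetical:

```python
def base_tool(decl: str) -> str:
    """Reduce one declaration to its base tool name, so namespace
    variants like Bash(python:*) count toward the Bash total."""
    return decl.split("(", 1)[0].strip()

def tool_counts(allowed_tools_lines: list[str]) -> dict[str, int]:
    """Tally how many skills declare each base tool; each element
    is one skill's comma-separated allowed-tools line."""
    counts: dict[str, int] = {}
    for line in allowed_tools_lines:
        # Deduplicate within a skill so each tool counts once per skill.
        seen = {base_tool(d) for d in line.split(",") if d.strip()}
        for name in seen:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Under this rule, a skill declaring Read, Write, Bash(python:*) contributes one count each to Read, Write, and Bash.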
The Core Five
There is a cluster of five tools that appear so consistently across the ecosystem that they've effectively become the default starting point for any skill. Read, Write, Edit, Grep, and Glob appear in the vast majority of SKILL.md files, with barely any variation by category. A skill for AWS infrastructure management has the same five in its header as a skill for formatting markdown documentation.
This makes sense once you think about what Claude actually does when it runs a skill. Almost every useful task involves reading something first — a file, a config, a codebase section. Reading naturally leads to editing or writing something new. Grep and Glob are the navigation layer: Glob finds files that match a pattern, Grep finds content within those files. Together they let Claude understand the shape of a project before touching anything.
The core five are less a permission grant and more an acknowledgment of how Claude Code actually works. A skill without Read is nearly impossible to write. A skill without Glob or Grep is nearly impossible to make context-aware.
The implication for skill authors is simple: start with all five. Don't agonize over whether your skill "really needs" Grep if it touches any files at all. It probably does. The cost of including it is zero. The cost of leaving it out is a skill that fails silently in edge cases.
What's more interesting is the floor these five establish. Every category has them. Every sophisticated skill has them. The differentiation between a basic skill and a capable one shows up in what comes after the core five — and that's where the analysis gets interesting.
Bash: The Most Interesting Tool
Bash is where the real divergence happens. Of the 13 valid tools, Bash has the highest variance across both categories and individual skills. Some skills use unrestricted Bash — meaning Claude can run any shell command the user's environment allows. Others namespace it to specific command groups. And a substantial number of skills, particularly in productivity and documentation categories, skip Bash entirely.
Unrestricted vs. Namespaced Bash
Unrestricted Bash is simple to write and genuinely powerful. When you put Bash in your allowed-tools, Claude can run git commands, package managers, file operations, network requests via curl, anything. For exploratory scripts or developer-facing tools where the operator trusts the environment, this is often fine.
Namespaced Bash restricts that surface area to a specific set of commands. The syntax is Bash(command:*), where the wildcard allows any arguments to that particular command but blocks everything else. So Bash(python:*) means Claude can run Python scripts but can't, for example, run rm -rf or make curl requests.
The security tradeoff is real. An unrestricted Bash permission in a shared or multi-tenant environment means any skill with that declaration can, if prompted correctly, run arbitrary shell commands. Namespacing creates an explicit contract: the skill author is telling you exactly what kinds of commands this skill needs, and the runtime can enforce that boundary.
Here's what the namespace patterns actually look like across the dataset:
| Namespace Pattern | What It Restricts To | Primary Use Case | Categories Where Common |
|---|---|---|---|
| Bash(python:*) | Python interpreter only | Data analysis, ML scripts, automation | ai-ml, data, testing |
| Bash(psql:*) | PostgreSQL CLI | Database queries and migrations | database |
| Bash(mysql:*) | MySQL client | MySQL administration and queries | database |
| Bash(ansible:*) | Ansible playbook runner | Infrastructure provisioning | devops |
| Bash(terraform:*) | Terraform CLI | Cloud infrastructure management | devops, cloud |
| Bash(cmd:*) | Generic command execution | Windows-compatible scripts | productivity, devops |
| Bash(api:*) | API calls via curl or http client | External service integration | integrations, saas-packs |
| Bash(aws:s3:*) | AWS S3 operations only | Object storage management | cloud, devops |
| Bash(az:storage:*) | Azure Storage CLI | Azure blob and file management | cloud |
The deeper the namespace hierarchy, the more opinionated the skill. Bash(aws:s3:*) isn't just saying "this is a cloud skill" — it's saying "this skill does exactly one thing with AWS and nothing else." That level of specificity is valuable both for security review and for skill discoverability. A developer scanning a plugin's allowed-tools can understand the blast radius immediately.
When Namespacing Is Worth the Effort
Not every skill benefits equally from namespacing. A personal productivity skill running on a developer's laptop has a very different threat model than an infrastructure automation skill running in a CI/CD pipeline. The data reflects this: productivity skills are far less likely to use namespaced Bash, while devops and database skills namespace heavily.
The practical rule that emerges from the dataset: if your skill's Bash usage is tied to a specific tool (a database client, a cloud CLI, a package manager), namespace it. If your skill genuinely needs general shell access — for example, a meta-skill that helps developers run arbitrary project commands — unrestricted Bash may be the honest answer. The worst outcome is writing Bash because it's easy and ending up with unnecessary surface area that nobody intended.
Tools That Rarely Appear
Three tools show up so infrequently in the dataset that they warrant their own discussion: TodoWrite, NotebookEdit, and AskUserQuestion. They're valid per the spec, they serve real purposes, and they appear in well-written skills — but they're genuinely niche. Understanding what they're for helps you recognize the rare cases where they belong.
TodoWrite
TodoWrite writes structured task lists to Claude's internal todo system during a session. It shows up in skills that are explicitly orchestrating multi-step workflows — things like "plan this migration" or "break this feature into implementation tasks." It's notably absent from skills that are themselves a single discrete operation, which is most of them. If your skill does one thing and does it completely, TodoWrite adds complexity without benefit. If your skill is a planning layer — creating a roadmap that other skills or the developer will then execute — it becomes genuinely useful.
NotebookEdit
NotebookEdit is specifically for Jupyter notebooks, and its usage pattern reflects that specialization exactly. It appears almost exclusively in ai-ml category skills, data analysis plugins, and anything in the research or data-science-adjacent space. If you're not writing skills for data practitioners who work in notebooks, you will likely never need this tool. The skills that do use it tend to be quite sophisticated — they're not just editing a cell but restructuring notebook workflows, updating code cells while preserving output, or converting between formats.
AskUserQuestion
AskUserQuestion lets a skill pause execution and prompt the user for input before continuing. The use case sounds broadly applicable — who wouldn't want their skill to ask clarifying questions? — but in practice it appears sparingly. The reason is that most skill authors discover a better pattern: write the skill to be smart enough about context that it rarely needs to ask, and when it does need input, design the workflow so the user provides it upfront rather than mid-execution. Skills that use AskUserQuestion well tend to be interactive setup wizards or configuration flows, not task-execution skills.
WebFetch and WebSearch occupy a middle tier — more common than the three niche tools above, less universal than the core five. WebFetch appears in skills that retrieve documentation, check API responses, or validate URLs. WebSearch is less common because skills that search the web tend to be exploratory research tools rather than workflow automation. The Task tool, which spawns subagents, appears in orchestrator-level skills that coordinate complex multi-agent workflows — common in the saas-packs category, rare in single-purpose skills.
What This Means for Skill Authors
The data resolves a lot of the uncertainty that comes up when you're writing a new skill. Here's what we'd tell someone sitting down to write their first SKILL.md based on what we found across 1,372 real examples.
Start with the core five: Read, Write, Edit, Grep, Glob. Don't overthink it. These appear in nearly every well-functioning skill for a reason — they're the minimum viable toolkit for doing almost anything useful with a codebase or filesystem. Including them doesn't broaden your attack surface in any meaningful way; they're tightly scoped to reading and writing files within the project context.
Add Bash only when your skill genuinely needs it. The clearest signal is whether your skill needs to run a program — not just read a file, but actually execute something. If you're invoking a CLI, running a script, or hitting a database client, you need Bash. If your skill is purely about reading files and generating new content, you probably don't.
When you add Bash, namespace it if you can. This isn't about paranoia — it's about clarity. A namespaced Bash declaration tells anyone reviewing your skill exactly what it does. It also makes the skill easier to audit and safer to use in shared environments. The extra characters in Bash(terraform:*) versus Bash cost you nothing and communicate a great deal.
The most common mistake we see isn't granting too many permissions — it's declaring permissions that don't match what the skill actually does. A skill that only reads files but declares Bash raises questions. A skill that needs to run database queries but only declares Read will fail. Match your declarations to your actual behavior.
Think carefully before adding WebFetch or WebSearch. Skills that reach out to external URLs introduce dependencies that your local filesystem doesn't have. They can fail due to network issues, rate limits, or endpoint changes. If your skill can accomplish its goal with local information, keep it local. If external data is genuinely necessary, make that explicit in your skill description so users know what to expect.
Skip the niche tools unless you know you need them. If you're not building an interactive configuration wizard, you don't need AskUserQuestion. If you're not in the data science space, you don't need NotebookEdit. If your skill is a single discrete operation, you don't need TodoWrite. These tools are genuinely useful in their domains, but the data is clear that most skills function perfectly without them.
Category Patterns
Tool usage doesn't vary uniformly across categories — there are real structural differences that reflect what different types of work require. The table below shows tool frequency for the top five categories by plugin count, based on the full dataset.
| Category | Read / Write / Edit / Grep / Glob | Bash (unrestricted) | Bash (namespaced) | WebFetch / WebSearch | Task | Niche Tools |
|---|---|---|---|---|---|---|
| devops | Nearly universal | Moderate | High — terraform, ansible, aws, az | Low | Moderate | Rare |
| database | Nearly universal | Low | High — psql, mysql, sqlite3 | Very low | Low | Rare |
| security | Nearly universal | High | Moderate — nmap, openssl, aws:iam | Low | Low | Rare |
| ai-ml | Nearly universal | Moderate | Moderate — python, pip, jupyter | Moderate | High | NotebookEdit common |
| productivity | Nearly universal | Low | Very low | Moderate | Low | TodoWrite occasional |
The devops category has the most sophisticated Bash namespacing patterns in the entire dataset. This makes intuitive sense — infrastructure tools are high-stakes, often running in automated pipelines, and the people who write devops skills tend to be security-conscious. The concentration of specific namespace patterns (terraform, ansible, cloud CLIs) reflects a community norm around being explicit about blast radius.
Database skills are a close second for namespacing discipline, and for similar reasons. A skill that can run arbitrary shell commands in a database environment has serious potential for damage. Skills in this category tend to constrain themselves to the specific database client they need, and nothing more. The difference between Bash(psql:*) and unrestricted Bash in a database context is not theoretical.
Security category skills show an interesting pattern: they're more likely to use unrestricted Bash than any other technical category, because security tooling genuinely requires it. Vulnerability scanners, penetration testing scripts, and certificate management tools need to run a variety of commands that don't fit neatly into a single namespace. The security skills that do namespace tend to be narrowly focused utilities (certificate management, IAM policy analysis) rather than general-purpose security assessment tools.
The ai-ml category is where you see the most Task tool usage, reflecting the orchestration-heavy nature of machine learning workflows. Training pipelines, experiment tracking, and model evaluation are often multi-step operations that benefit from spawning subagents for parallel work. This is also the only category where NotebookEdit appears with any regularity.
Productivity skills tell the most conservative story. These skills tend to be about personal workflow — organizing notes, summarizing documents, drafting content. They don't need database clients or cloud CLIs. They need to read what's there and write something useful. The core five plus occasional WebFetch for reference lookups covers nearly everything in this category. The simplicity is appropriate to the use case.
There's a meaningful correlation between Bash namespacing and overall skill quality scores in the dataset. Skills with namespaced Bash tend to have more precise descriptions, cleaner frontmatter, and higher scores on the 100-point validation rubric. The discipline required to think through what commands a skill actually needs seems to correlate with the discipline required to write a good skill overall. This doesn't prove causation — it might just be that careful authors namespace their Bash and also write better descriptions — but it's a pattern worth noticing.
Methodology
The dataset for this analysis was produced by the marketplace's build pipeline, specifically the discover-skills.mjs script that runs as the first step of every build. This script scans all plugin directories, finds every SKILL.md file, and extracts frontmatter into structured data. We collected this structured output across all 315 plugins and analyzed the allowed-tools field specifically.
A "skill" for purposes of this analysis means any file named SKILL.md located in a skills/[skill-name]/ subdirectory within a valid plugin directory, with a well-formed YAML frontmatter block that includes the allowed-tools field. Files with malformed YAML, missing frontmatter, or absent allowed-tools declarations were excluded from frequency analysis but logged separately.
Tool frequency was computed per-skill, not per-plugin. A plugin with eight skills contributes eight data points to the frequency analysis. This approach reflects how tools are actually granted — at the skill level, not the plugin level — and avoids artificially weighting small plugins over large ones.
For the Bash namespace analysis, we extracted the namespace argument from patterns like Bash(pattern:*) using regex matching against each tool declaration string. Namespaces with fewer than three occurrences in the dataset were grouped into an "other namespaced" category to avoid over-indexing on outliers.
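The extraction and grouping steps described above can be sketched as follows. This is an illustrative reimplementation in Python, not the marketplace's actual discover-skills.mjs logic, and namespace_counts is a hypothetical name:

```python
import re
from collections import Counter

# Matches Bash(<namespace>:*) and captures the namespace,
# e.g. "python" or "aws:s3".
NAMESPACE_RE = re.compile(r"Bash\(([A-Za-z0-9_:.-]+):\*\)")

def namespace_counts(declarations: list[str], min_count: int = 3) -> Counter:
    """Count Bash namespaces across allowed-tools lines, folding
    rare ones into "other namespaced" to avoid over-indexing on outliers."""
    counts = Counter()
    for line in declarations:
        for ns in NAMESPACE_RE.findall(line):
            counts[ns] += 1
    grouped = Counter()
    for ns, n in counts.items():
        grouped[ns if n >= min_count else "other namespaced"] += n
    return grouped
```

Note that the greedy capture keeps multi-level namespaces intact: Bash(aws:s3:*) yields the namespace "aws:s3", not just "aws".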
Category assignments come from the plugin's category field in marketplace.extended.json, which is the canonical source of truth for plugin metadata. Each plugin belongs to exactly one category. The 25 categories reflect the current taxonomy as of the data collection date.
The skills-only validation script (python3 scripts/validate-skills-schema.py --skills-only) was used to identify and exclude skills that fail the 2026 spec validation, ensuring the dataset reflects only current, valid skill declarations rather than legacy or malformed files.
One important caveat: the allowed-tools field represents what a skill declares it needs, not necessarily what it actually uses at runtime. A skill could declare Read but never read a file, or declare Bash but only use it in rare code paths. This analysis is about declared surface area, which is the relevant quantity for security review and author intent — but it's not a runtime execution trace.
Future iterations of this research could augment the static analysis with runtime telemetry, tracking which tools Claude actually invokes during skill execution across a sample of real sessions. That would allow comparison of declared vs. actual surface area, which could reveal systematic over-declaration (skills that ask for more than they need) or under-declaration (skills that would benefit from additional tools they haven't declared). For now, the declared data gives us a clear picture of author intent across the full ecosystem.
Cite This Research
Allowed-Tools Surface Area Analysis. Claude Code Plugins Research, 2026. https://tonsofskills.com/research/allowed-tools-surface-area/