Most quality problems in software are discovered too late. A missed validation, a broken link, a secret accidentally committed — these are the kinds of things that slip through when the only check is "does it look right to me right now?" The Claude Code Plugins ecosystem was designed around the opposite assumption: that every change should be challenged automatically, at multiple layers, before it ever ships.
What follows is a detailed look at how that challenge system actually works — the five parallel jobs that run on every pull request, the local scripts that catch problems in 30 seconds, the grading rubric that scores every skill out of 100 points, and the two-catalog sync that ensures what contributors edit and what the CLI downloads are never allowed to drift apart.
This isn't a tutorial on setting up CI/CD. It's an analysis of the specific patterns that emerged from maintaining 270+ plugins and 1,500+ skills — and what each layer of validation actually protects against.
## The Five-Job Pipeline
Every pull request triggers a GitHub Actions workflow that runs five jobs in parallel. "Parallel" is the key word — each job is independent and starts immediately, so the feedback loop is bounded by the slowest job rather than the sum of all jobs. In practice, this means contributors get a full quality report in roughly the time it takes a single build to complete.
Here's the full breakdown of what each job checks and why it exists:
| Job | What It Checks | Why It Exists |
|---|---|---|
| `validate` | JSON validity, plugin structure, catalog sync, secret scanning, dangerous patterns | Catches structural problems and security issues before any build runs. This is the cheapest check to run and the most expensive to miss. |
| `test` (matrix) | MCP plugin builds + vitest unit tests, Python pytest, `ccpi validate --strict`, SKILL.md frontmatter | Runs as a matrix so each plugin type gets appropriate tests. TypeScript MCP plugins get a real build; Python utilities get pytest; all skills get strict CLI validation. |
| `check-package-manager` | Enforces pnpm everywhere except `marketplace/`, which must use npm | Prevents lockfile conflicts. The marketplace is excluded from the pnpm workspace intentionally — mixing package managers mid-dependency-tree causes subtle, hard-to-diagnose failures. |
| `marketplace-validation` | Astro build, route validation, internal link checking, smoke tests, cowork download integrity, security scan of zip contents | The website is a build artifact. This job treats it like one — verifying every route exists, every link resolves, and every downloadable zip is clean before it's ever served. |
| `playwright-tests` | End-to-end tests on Chromium, WebKit, and mobile viewports (depends on `marketplace-validation`) | Runs after `marketplace-validation` passes, because there's no point testing browser behavior against a broken build. Tests cover the T1–T9 test suite across all viewport types. |
| `cli-smoke-tests` | CLI build, `--help`, `--version`, `npm pack`, no `workspace:` dependencies | The `ccpi` CLI is published to npm. It must never reference internal workspace packages that don't exist on the public registry. This job catches that before publish. |
### The Dependency Between Jobs
The one explicit dependency in the pipeline — Playwright waiting on marketplace-validation — is worth examining. It's not there because of technical necessity alone. It's there because running browser tests against a build that already failed validation would generate misleading failures. You'd see Playwright reporting that certain pages don't exist, when the real problem is that the Astro build never completed successfully.
This "gate-then-test" pattern shows up repeatedly in the ecosystem's design: validate structure before exercising behavior. Structural failures are cheaper to diagnose and fix. Behavioral failures on top of structural failures compound into noise.
## Validation Before You Commit
The CI pipeline is the last line of defense. The first line is local — a pair of scripts that contributors can run before a single line is pushed.
### `quick-test.sh`: The 30-Second Check

The design goal for `./scripts/quick-test.sh` was simple: give contributors a complete sanity check in roughly the time it takes to make a cup of coffee. In practice it runs in about 30 seconds. It covers the build, linting, and core validation — enough to catch the most common categories of error without requiring a full marketplace build.
Thirty seconds is a psychological threshold. If a local check takes longer than that, developers start skipping it. Shorter than that, and it often misses too much to be trustworthy. Thirty seconds is the sweet spot where the check is both fast enough to run habitually and thorough enough to matter.
What `quick-test.sh` covers:

- Full workspace build via `pnpm build`
- ESLint across all packages
- Core plugin structure validation
- JSON well-formedness checks
- Frontmatter presence and format
### `validate-all-plugins.sh`: The Full Check

When a contributor is adding a new plugin or making significant changes to an existing one, `./scripts/validate-all-plugins.sh` is the appropriate tool. It takes longer but covers the complete validation surface:
- JSON validity for every `plugin.json` and catalog file
- Required field presence in plugin manifests
- Catalog sync verification (extended vs generated)
- Secret scanning for accidentally committed credentials
- Dangerous pattern detection (shell injection vectors, unsafe permissions)
- SKILL.md frontmatter completeness
- Internal reference integrity (skills referencing tools that don't exist, etc.)
Both scripts can be scoped to a single plugin by passing its path as an argument. This matters in a repo with 270+ plugins — validating the whole tree when you've only touched one directory is wasteful. The scripts are designed to support focused, incremental validation as the natural workflow.
The project's CLAUDE.md treats `pnpm run sync-marketplace` and `./scripts/validate-all-plugins.sh` as mandatory pre-commit steps, not optional suggestions. The expectation is that CI catches things you miss locally — not things you didn't bother to check.
## The Skills Grading System
Skills are the most complex content type in the ecosystem. Unlike plugin manifests (which are largely structural) or slash commands (which have a simpler schema), skills involve instructional content that Claude actually uses at runtime. Bad skill design doesn't just fail validation — it produces AI behavior that's subtly wrong in ways that are hard to debug.
The 100-point grading rubric in `scripts/validate-skills-schema.py` exists to surface that quality gradient before it reaches production.
### What the Rubric Measures
The rubric grades along two axes: structure and substance. Structure checks whether required sections exist and frontmatter is complete. Substance checks whether the content within those sections actually does what it's supposed to do.
Required SKILL.md sections:
- Overview — What this skill does and when to use it
- Prerequisites — What must be true before the skill can run
- Instructions — The actual procedural content Claude executes
- Output — What the skill produces and in what format
- Error Handling — What to do when things go wrong
- Examples — Concrete usage examples with expected results
- Resources — References and further reading
Required frontmatter fields:
- `name` — Kebab-case skill identifier
- `description` — Multi-line trigger phrase description (this is what Claude reads to decide whether to activate the skill)
- `allowed-tools` — Explicit tool permission list
- `version` — Semantic version string
- `author` — Name and email
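As a rough illustration, the structural half of the frontmatter check can be sketched in a few lines. The field names come from the list above; `checkFrontmatter` is a hypothetical helper, not the real Python validator:

```javascript
// Required fields per the rubric above; the actual enforcement lives in
// scripts/validate-skills-schema.py — this is an illustrative sketch only.
const REQUIRED_FRONTMATTER = ['name', 'description', 'allowed-tools', 'version', 'author'];

function checkFrontmatter(frontmatter) {
  const problems = [];
  for (const field of REQUIRED_FRONTMATTER) {
    if (!frontmatter[field] || String(frontmatter[field]).trim() === '') {
      problems.push(`missing required field: ${field}`);
    }
  }
  // name must be kebab-case: lowercase segments separated by single hyphens
  if (frontmatter.name && !/^[a-z0-9]+(-[a-z0-9]+)*$/.test(frontmatter.name)) {
    problems.push('name is not kebab-case');
  }
  return problems;
}
```

A call like `checkFrontmatter({ name: 'My Skill' })` would report the missing fields and the non-kebab-case name in one pass — the same shape of feedback the grader's verbose mode gives per skill.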
### What Earns an A vs. What Gets an F
| Grade | Score Range | Typical Profile |
|---|---|---|
| A | 90–100 | All sections present and substantive. Description includes multiple trigger phrases. Error handling covers at least 3 failure modes. Examples show both happy path and edge cases. |
| B | 80–89 | All required sections present. One or two are thin (e.g., minimal examples, sparse error handling). Frontmatter complete. |
| C | 70–79 | Most sections present but some are placeholders. Description is functional but lacks trigger phrase variety. May be missing one non-critical section. |
| D | 60–69 | Multiple sections missing or near-empty. Frontmatter may be incomplete. Skill would likely activate but produce poor results. |
| F | Below 60 | Structural failure. Missing required frontmatter fields, missing critical sections, or content that indicates the skill was never actually tested. |
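The score bands above translate directly into code. A minimal sketch (the actual thresholds live in the Python grader; `scoreToGrade` is a hypothetical name):

```javascript
// Map a 0–100 rubric score to its letter grade, per the table above.
function scoreToGrade(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}
```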
### Running the Grader
The validator supports several modes depending on what you want to check:
- `python3 scripts/validate-skills-schema.py --verbose` — Full report on all content (skills, commands, agents)
- `python3 scripts/validate-skills-schema.py --skills-only` — SKILL.md files only
- `python3 scripts/validate-skills-schema.py --commands-only` — Commands only
The verbose output shows per-section scores, which is where the diagnostic value actually lives. Knowing a skill scored 74 overall is less useful than knowing it scored full marks on structure and lost points specifically on Error Handling and Examples.
## Package Manager Enforcement

One of the stranger-looking constraints in the repository is the split package manager policy: pnpm everywhere, except the `marketplace/` directory, which must use npm. This isn't an accident and it isn't arbitrary — it reflects a real architectural boundary.
### Why the Split Exists
The marketplace is an Astro 5 website that lives in a directory explicitly excluded from the pnpm workspace. This exclusion is intentional: the marketplace has its own dependency graph, its own build pipeline, and its own deployment target. Treating it as a workspace member would pull it into the pnpm lockfile alongside CLI packages and plugin utilities, creating cross-contamination between completely unrelated dependency trees.
Using npm inside `marketplace/` and pnpm everywhere else creates a clean boundary. The two package managers manage two separate `node_modules` trees with no shared state.
### How CI Enforces It

The `check-package-manager` job runs `scripts/check-package-manager.mjs`, which inspects each directory and verifies the correct package manager is in use. If a contributor accidentally runs `pnpm install` inside `marketplace/` (creating a `pnpm-lock.yaml` where there should be a `package-lock.json`), this job fails immediately with a clear message.
Package manager drift is one of those problems that's invisible until it isn't. A mismatched lockfile won't cause an error during development if everyone on the team happens to be running the same version. It will cause reproducibility failures in CI, in deployment, and for any contributor who doesn't know to look for the drift.
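The decision logic of that check can be sketched as a pure function. This is an assumption about the real script's internals — `lockfileViolations` is a hypothetical name, and the real `check-package-manager.mjs` walks the filesystem rather than taking arguments:

```javascript
// Encode the policy: npm inside marketplace/, pnpm everywhere else.
// Takes a directory path and the lockfile names found in it, returns
// human-readable violations (a sketch, not the real implementation).
function lockfileViolations(dir, lockfiles) {
  const isMarketplace = dir === 'marketplace' || dir.startsWith('marketplace/');
  const violations = [];
  if (isMarketplace) {
    if (lockfiles.includes('pnpm-lock.yaml')) {
      violations.push(`${dir}: pnpm lockfile found, but marketplace/ must use npm`);
    }
  } else if (lockfiles.includes('package-lock.json')) {
    violations.push(`${dir}: npm lockfile found, but the workspace uses pnpm`);
  }
  return violations;
}
```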
## The Marketplace Build Pipeline

The marketplace build is not a single command — it's a five-step sequential pipeline, where each step produces output that the next step depends on. Running `npm run build` inside `marketplace/` kicks off all five steps in order via `scripts/build.mjs`.
### The Five Steps
| Step | Script | What It Produces |
|---|---|---|
| 1 | `discover-skills.mjs` | Scans all plugins and extracts SKILL.md data into `src/data/` — the structured representation of every skill in the ecosystem |
| 2 | `sync-catalog.mjs` | Copies catalog JSON (from the auto-generated `marketplace.json`) into the marketplace data directory so Astro can read it at build time |
| 3 | `generate-unified-search.mjs` | Builds the Fuse.js search index from the discovered skills and catalog data — the search index that powers the site's search UI |
| 4 | `build-cowork-zips.mjs` | Generates plugin zips, category bundles, a mega-zip of everything, and a manifest with download links and checksums for the /cowork page |
| 5 | `astro build` | Static site generation — takes all the data produced in steps 1–4 and renders every page |
### Why Order Matters
The sequential ordering isn't just convenience — each step's output is an input to later steps. Step 1 must run before step 3 because the unified search index is built from the discovered skill data. Step 4 must run before step 5 because Astro needs the cowork manifest to generate the correct download links on the cowork page.
If these steps ran in parallel, you'd get race conditions: Astro trying to render pages that reference a search index that hasn't been built yet, or download links pointing to zips that aren't generated. Sequential execution trades some parallelism for deterministic correctness.
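The "sequential, each step feeds the next" contract can be sketched as a tiny runner. This is an assumption about the shape of the orchestration — `runPipeline` is a hypothetical name, and the real logic lives in `scripts/build.mjs`:

```javascript
// Run pipeline steps strictly in order: each step sees which steps have
// already completed, and nothing starts until its predecessor finishes.
function runPipeline(steps) {
  const completed = [];
  for (const step of steps) {
    step.run(completed);       // earlier outputs are guaranteed to exist here
    completed.push(step.name); // record completion for later steps
  }
  return completed;
}
```

With the five steps from the table above as entries, the runner guarantees `discover-skills.mjs` finishes before `generate-unified-search.mjs` reads its output, and `build-cowork-zips.mjs` finishes before `astro build` renders download links.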
`compressHTML` is disabled in the Astro config. This is not an oversight — iOS Safari fails to render HTML lines longer than 5,000 characters. Enabling compression would concatenate HTML onto single lines, producing exactly that condition. The CI pipeline includes a smoke test that verifies this setting remains disabled.
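That smoke test can be approximated by a line-length scan over the built HTML. A sketch under stated assumptions — `findOverlongLines` is a hypothetical helper, and the 5,000-character threshold is the iOS Safari limit described above:

```javascript
// Flag any HTML line long enough to trip the iOS Safari rendering bug.
// The real CI smoke test would run this (or something like it) over every
// file in the built dist/ directory.
const MAX_HTML_LINE_LENGTH = 5000;

function findOverlongLines(html) {
  return html
    .split('\n')
    .map((line, i) => ({ line: i + 1, length: line.length }))
    .filter(entry => entry.length > MAX_HTML_LINE_LENGTH);
}
```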
## Post-Build Safety Nets
After Astro completes the static site build, a second layer of validation runs against the generated output — not the source files, but the actual HTML, routes, and assets that would be served in production. This is the layer where assumptions get tested against reality.
### What Runs After the Build
- `validate-routes.mjs` — Verifies that every plugin page referenced in the catalog actually has a corresponding route in the built output. A plugin in the catalog without a page is a broken link waiting to happen.
- `validate-playbook-routes.mjs` — Same check specifically for playbook pages, which have their own URL structure and are validated separately.
- `validate-internal-links.mjs` — Crawls the built site starting from seed pages (index, playbooks, explore, skills, cowork) and verifies every internal link resolves. This catches link rot introduced by page renames, route restructuring, or deleted content.
- `validate-links.mjs` — Specifically checks skill-to-plugin link integrity: does every skill page correctly link back to its parent plugin?
- `validate-cowork-downloads.mjs` — Verifies the cowork zip build output: manifest existence, checksums, download link correctness. Every file advertised for download must exist and match its checksum.
- `validate-cowork-security.mjs` — Scans zip contents for security issues: no secrets, no `node_modules`, no files that should never be distributed.
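The route check reduces to a set difference. A sketch, not the real `validate-routes.mjs` (which reads `dist/` from disk): `findMissingRoutes` is a hypothetical helper, and the `/plugins/<slug>/` URL shape is an assumption about the site's routing:

```javascript
// Every catalog slug must have a corresponding route in the built output.
// The built routes are passed in as a Set so the comparison logic stays
// separate from the filesystem walk the real script would do.
function findMissingRoutes(catalogSlugs, builtRoutes) {
  return catalogSlugs.filter(slug => !builtRoutes.has(`/plugins/${slug}/`));
}
```

Anything this returns is a plugin that shipped to the catalog but never got a page — exactly the "broken link waiting to happen" the list above describes.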
### What Breaks Without These
Each of these checks exists because the corresponding failure actually happened — or the failure mode was identified as high-probability enough to warrant preemptive detection. Route validation catches the case where a plugin is added to the catalog but the Astro page generation logic doesn't pick it up correctly. Link validation catches the case where a page is renamed without updating all the references to it.
The cowork security scan is particularly important. The cowork feature lets teams download plugin bundles as zips. If a contributor accidentally includes a .env file in their plugin directory, that file would end up in the zip without the security scan. This is the kind of mistake that's obvious in retrospect and invisible in the moment.
Post-build validation is fundamentally different from pre-build validation. Pre-build checks verify that source files are correct. Post-build checks verify that the build process itself worked correctly — that the transformation from source to artifact didn't introduce new problems. Both layers are necessary.
## The Two-Catalog Sync Check
The most unusual part of the ecosystem's CI design is the two-catalog system — and the sync check that keeps it honest.
### Why Two Catalogs
The ecosystem maintains two versions of the plugin catalog:
| File | Purpose | Who Edits It |
|---|---|---|
| `.claude-plugin/marketplace.extended.json` | Source of truth with full metadata including extended fields: `featured`, `mcpTools`, `pluginCount`, `pricing`, `components`, `zcf_metadata`, `external_sync` | Contributors and maintainers — this is the file you edit |
| `.claude-plugin/marketplace.json` | CLI-compatible catalog with extended fields stripped out — exactly what the `ccpi` CLI fetches and caches | Never — auto-generated by `pnpm run sync-marketplace` |
The split exists because the CLI and the website need different things from the catalog. The website benefits from rich metadata about pricing, featured status, and component structure. The CLI needs a clean, minimal schema that won't break on fields it doesn't understand. Rather than maintaining two separate files manually (a recipe for drift), the extended file is the canonical source and the CLI file is always derived from it.
### The Sync Command

Running `pnpm run sync-marketplace` strips the extended-only fields from `marketplace.extended.json` and writes the result to `marketplace.json`. It's a deterministic transformation: same input always produces the same output.
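The transformation might look roughly like this. The field list comes from the catalog table above; the catalog shape (a top-level `plugins` array) and the `stripExtendedFields` name are assumptions for illustration, not the real sync script:

```javascript
// Extended-only fields that must never reach the CLI-facing catalog.
const EXTENDED_FIELDS = ['featured', 'mcpTools', 'pluginCount', 'pricing',
                         'components', 'zcf_metadata', 'external_sync'];

// Derive the minimal CLI catalog from the extended source of truth.
// Pure and deterministic: same input always yields the same output.
function stripExtendedFields(extendedCatalog) {
  return {
    ...extendedCatalog,
    plugins: extendedCatalog.plugins.map(plugin => {
      const cleaned = { ...plugin };
      for (const field of EXTENDED_FIELDS) delete cleaned[field];
      return cleaned;
    }),
  };
}
```

Because the derivation is deterministic, CI can simply re-run it and byte-compare the result against the committed file — which is exactly what the sync check does.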
This command is marked as mandatory before every commit in the project's CLAUDE.md. But mandatory instructions get forgotten. The CI check is what actually enforces it.
### What the CI Check Does

During the `validate` job, CI runs the sync command and compares the result to the committed `marketplace.json`. If they differ — if someone edited `marketplace.extended.json` and committed without running sync — the job fails with a clear message explaining that the catalogs are out of sync.
### The Data Flow
Understanding the full data flow explains why the sync matters so much:
1. Contributor edits `marketplace.extended.json`
2. Contributor runs `pnpm run sync-marketplace` to regenerate `marketplace.json`
3. Both files are committed
4. CI verifies the files are in sync
5. On merge to main, CI deploys `marketplace.json` to GitHub Pages
6. The `ccpi` CLI fetches from GitHub Pages and caches locally
7. Plugin installations use the cached catalog
Any break in this chain — especially between steps 2 and 3, or between steps 3 and 4 — produces a stale CLI experience for everyone who runs ccpi after the broken merge. The sync check at step 4 is the last opportunity to catch it.
## Setting Up Your Own Gates
If you're a plugin author working in this ecosystem, or building something inspired by these patterns, here's the local workflow that mirrors what CI does — so you're never surprised by a CI failure on something you could have caught in 30 seconds.
### Before Any Commit
Two commands cover the mandatory pre-commit surface:
- `pnpm run sync-marketplace` — Regenerates the CLI catalog from the extended catalog. Run this every time you touch `marketplace.extended.json`.
- `./scripts/validate-all-plugins.sh` — Full plugin validation. Run this when adding or significantly modifying a plugin. For quick changes, `./scripts/quick-test.sh` is faster.
### For Skill Authors
If you're writing or revising a SKILL.md, run the grader before pushing:
- `python3 scripts/validate-skills-schema.py --verbose` — Full verbose output so you can see exactly which sections are losing points
- `python3 scripts/validate-skills-schema.py --skills-only` — Scoped to skills if you want faster feedback
Target an A or high B before submitting. A C-grade skill will pass validation but will produce inconsistent behavior at runtime — it has enough structure to not fail, but not enough substance to work reliably.
### For the `ccpi` CLI

The `ccpi validate` command is the same tool CI uses. Running it locally gives you the same feedback CI would give, before CI has to tell you:
- `ccpi validate --strict` — Full validation, fails on warnings as well as errors
- `ccpi validate --skills` — Skills-only validation
- `ccpi validate --frontmatter` — Frontmatter-only check
- `ccpi validate --json` — Outputs results as JSON, useful for scripting or integrating with other tools
### For MCP Plugin Authors
MCP plugins have additional requirements that pure-markdown plugins don't. The full local workflow for an MCP plugin:
1. `cd plugins/mcp/[name]/ && pnpm build` — Build the TypeScript source to `dist/index.js`
2. `chmod +x dist/index.js` — Make the built file executable (the shebang line is not enough without this)
3. Verify the `.mcp.json` file references the correct entry point
4. Verify `plugin.json` uses only the 8 allowed fields — CI will reject any others
The 8 allowed fields are `name`, `version`, `description`, `author`, `repository`, `homepage`, `license`, and `keywords`. Any additional field in a submitted `plugin.json` causes CI to reject the PR. This keeps the CLI schema stable and prevents plugins from encoding runtime state or extended metadata in a file that's meant to be minimal.
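The whitelist check is a one-liner over the manifest's keys. A sketch of the rule as stated above — `findDisallowedFields` is a hypothetical name, not the actual CI code:

```javascript
// The 8 fields a plugin.json may contain, per the policy above.
const ALLOWED_PLUGIN_FIELDS = new Set(['name', 'version', 'description', 'author',
                                       'repository', 'homepage', 'license', 'keywords']);

// Return every key in the manifest that the schema does not allow.
function findDisallowedFields(manifest) {
  return Object.keys(manifest).filter(key => !ALLOWED_PLUGIN_FIELDS.has(key));
}
```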
### The Marketplace Dev Loop
For contributors working on the marketplace website itself, the dev loop is:
- `cd marketplace/ && npm run dev` — Dev server at `localhost:4321`
- `cd marketplace/ && npm run build` — Full production build (all five pipeline steps)
- `cd marketplace/ && npm run validate` — Route and link validation against the built output
- `cd marketplace/ && npx playwright test` — E2E tests on Chromium and WebKit
Running the full build locally before pushing is the single most effective way to avoid marketplace-validation CI failures. The dev server runs fast but skips some pipeline steps. The production build is slower but validates the complete artifact.
The pattern that runs through all of these gates is the same: validate at the cheapest possible point, as early as possible, against the most complete model of what "correct" looks like. Local scripts catch things in 30 seconds that would otherwise surface as CI failures 5 minutes after a push. Post-build checks catch things that only become visible in the artifact that would otherwise surface as production errors. The two-catalog sync check catches drift that would otherwise be invisible until someone's CLI installation breaks.
None of this is novel. It's just disciplined application of the principle that quality is cheaper to maintain than to repair — and that the earlier you catch a problem, the less it costs.
## Cite This Research
CI/CD Integration Patterns. Claude Code Plugins Research, 2026. https://tonsofskills.com/research/cicd-integration-patterns/