Production Deployment

CI/CD Integration Patterns

Validation gates, pre-commit hooks, and automated quality checks — how the plugin ecosystem enforces quality at every stage.

Most quality problems in software are discovered too late. A missed validation, a broken link, a secret accidentally committed — these are the kinds of things that slip through when the only check is "does it look right to me right now?" The Claude Code Plugins ecosystem was designed around the opposite assumption: that every change should be challenged automatically, at multiple layers, before it ever ships.

What follows is a detailed look at how that challenge system actually works — the five parallel jobs that run on every pull request, the local scripts that catch problems in 30 seconds, the grading rubric that scores every skill out of 100 points, and the two-catalog sync that ensures what contributors edit and what the CLI downloads are never allowed to drift apart.

This isn't a tutorial on setting up CI/CD. It's an analysis of the specific patterns that emerged from maintaining 270+ plugins and 1,500+ skills — and what each layer of validation actually protects against.

The Five-Job Pipeline

Every pull request triggers a GitHub Actions workflow that runs five primary jobs plus a CLI smoke-test job. "Parallel" is the key word: every job except the Playwright tests is independent and starts immediately, so the feedback loop is bounded by the slowest job rather than the sum of all jobs. In practice, this means contributors get a full quality report in roughly the time it takes a single build to complete.

Here's the full breakdown of what each job checks and why it exists:

  • validate: JSON validity, plugin structure, catalog sync, secret scanning, dangerous patterns. Catches structural problems and security issues before any build runs. This is the cheapest check to run and the most expensive to miss.
  • test (matrix): MCP plugin builds plus vitest unit tests, Python pytest, ccpi validate --strict, SKILL.md frontmatter. Runs as a matrix so each plugin type gets appropriate tests: TypeScript MCP plugins get a real build, Python utilities get pytest, and all skills get strict CLI validation.
  • check-package-manager: Enforces pnpm everywhere except marketplace/, which must use npm. Prevents lockfile conflicts. The marketplace is excluded from the pnpm workspace intentionally; mixing package managers mid-dependency-tree causes subtle, hard-to-diagnose failures.
  • marketplace-validation: Astro build, route validation, internal link checking, smoke tests, cowork download integrity, security scan of zip contents. The website is a build artifact, and this job treats it like one: verifying every route exists, every link resolves, and every downloadable zip is clean before it's ever served.
  • playwright-tests: End-to-end tests on Chromium, WebKit, and mobile viewports (depends on marketplace-validation). Runs only after marketplace-validation passes, because there's no point testing browser behavior against a broken build. Tests cover the T1–T9 test suite across all viewport types.
  • cli-smoke-tests: CLI build, --help, --version, npm pack, no workspace: dependencies. The ccpi CLI is published to npm and must never reference internal workspace packages that don't exist on the public registry. This job catches that before publish.

6 jobs, 1 PR. Five of the six jobs launch the moment the workflow triggers; only the Playwright tests wait, gating on marketplace-validation. A contributor knows within minutes whether their change is structurally sound.
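
The fan-out is small enough to model in a few lines. This sketch uses the job names from the table plus the single documented `needs` edge; the edge list is otherwise an illustration, not the workflow file itself:

```javascript
// The six PR jobs, each mapped to the jobs it waits on ("needs").
const jobs = {
  'validate': [],
  'test': [],
  'check-package-manager': [],
  'marketplace-validation': [],
  'playwright-tests': ['marketplace-validation'], // the one documented gate
  'cli-smoke-tests': [],
};

// A job with no needs entries starts the moment the workflow triggers.
const immediate = Object.keys(jobs).filter((name) => jobs[name].length === 0);
const gated = Object.keys(jobs).filter((name) => jobs[name].length > 0);

console.log(`${immediate.length} start immediately; ${gated.length} gated`);
// → 5 start immediately; 1 gated
```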

The Dependency Between Jobs

The one explicit dependency in the pipeline — Playwright waiting on marketplace-validation — is worth examining. It's not there because of technical necessity alone. It's there because running browser tests against a build that already failed validation would generate misleading failures. You'd see Playwright reporting that certain pages don't exist, when the real problem is that the Astro build never completed successfully.

This "gate-then-test" pattern shows up repeatedly in the ecosystem's design: validate structure before exercising behavior. Structural failures are cheaper to diagnose and fix. Behavioral failures on top of structural failures compound into noise.

Validation Before You Commit

The CI pipeline is the last line of defense. The first line is local — a pair of scripts that contributors can run before a single line is pushed.

quick-test.sh: The 30-Second Check

The design goal for ./scripts/quick-test.sh was simple: give contributors a complete sanity check in roughly the time it takes to make a cup of coffee. In practice it runs in about 30 seconds. It covers the build, linting, and core validation — enough to catch the most common categories of error without requiring a full marketplace build.

Thirty seconds is a psychological threshold. If a local check takes longer than that, developers start skipping it. Shorter than that, and it often misses too much to be trustworthy. Thirty seconds is the sweet spot where the check is both fast enough to run habitually and thorough enough to matter.

What quick-test.sh covers:

  • Full workspace build via pnpm build
  • ESLint across all packages
  • Core plugin structure validation
  • JSON well-formedness checks
  • Frontmatter presence and format

validate-all-plugins.sh: The Full Check

When a contributor is adding a new plugin or making significant changes to an existing one, ./scripts/validate-all-plugins.sh is the appropriate tool. It takes longer but covers the complete validation surface:

  • JSON validity for every plugin.json and catalog file
  • Required field presence in plugin manifests
  • Catalog sync verification (extended vs generated)
  • Secret scanning for accidentally committed credentials
  • Dangerous pattern detection (shell injection vectors, unsafe permissions)
  • SKILL.md frontmatter completeness
  • Internal reference integrity (skills referencing tools that don't exist, etc.)

Both scripts can be scoped to a single plugin by passing its path as an argument. This matters in a repo with 270+ plugins — validating the whole tree when you've only touched one directory is wasteful. The scripts are designed to support focused, incremental validation as the natural workflow.

Mandatory before commit. The CLAUDE.md for this repository marks pnpm run sync-marketplace and ./scripts/validate-all-plugins.sh as mandatory pre-commit steps, not optional suggestions. The expectation is that CI catches things you miss locally — not things you didn't bother to check.

The Skills Grading System

Skills are the most complex content type in the ecosystem. Unlike plugin manifests (which are largely structural) or slash commands (which have a simpler schema), skills involve instructional content that Claude actually uses at runtime. Bad skill design doesn't just fail validation — it produces AI behavior that's subtly wrong in ways that are hard to debug.

The 100-point grading rubric in scripts/validate-skills-schema.py exists to surface that quality gradient before it reaches production.

What the Rubric Measures

The rubric grades along two axes: structure and substance. Structure checks whether required sections exist and frontmatter is complete. Substance checks whether the content within those sections actually does what it's supposed to do.

Required SKILL.md sections:

  • Overview — What this skill does and when to use it
  • Prerequisites — What must be true before the skill can run
  • Instructions — The actual procedural content Claude executes
  • Output — What the skill produces and in what format
  • Error Handling — What to do when things go wrong
  • Examples — Concrete usage examples with expected results
  • Resources — References and further reading

Required frontmatter fields:

  • name — Kebab-case skill identifier
  • description — Multi-line trigger phrase description (this is what Claude reads to decide whether to activate the skill)
  • allowed-tools — Explicit tool permission list
  • version — Semantic version string
  • author — Name and email
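
The actual grader is scripts/validate-skills-schema.py, written in Python. Purely as an illustration of the required-field check, a minimal version might look like this (the kebab-case rule for `name` comes from the field list above; the function shape is invented):

```javascript
// Hypothetical sketch of the required-frontmatter check (the real grader
// is the Python script scripts/validate-skills-schema.py).
const REQUIRED_FIELDS = ['name', 'description', 'allowed-tools', 'version', 'author'];

function checkFrontmatter(fm) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (!(field in fm) || String(fm[field]).trim() === '') {
      errors.push(`missing required field: ${field}`);
    }
  }
  // The name field must be a kebab-case identifier.
  if (fm.name && !/^[a-z0-9]+(-[a-z0-9]+)*$/.test(fm.name)) {
    errors.push('name must be kebab-case');
  }
  return errors;
}

// A manifest with a bad name and no version fails on both counts:
console.log(checkFrontmatter({
  name: 'Extract_Method',
  description: 'Use when the user asks to refactor',
  'allowed-tools': 'Read, Edit',
  author: 'Jane <jane@example.com>',
}));
// → [ 'missing required field: version', 'name must be kebab-case' ]
```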

What Earns an A vs. What Gets an F

  • A (90–100): All sections present and substantive. Description includes multiple trigger phrases. Error handling covers at least 3 failure modes. Examples show both the happy path and edge cases.
  • B (80–89): All required sections present. One or two are thin (e.g., minimal examples, sparse error handling). Frontmatter complete.
  • C (70–79): Most sections present but some are placeholders. Description is functional but lacks trigger phrase variety. May be missing one non-critical section.
  • D (60–69): Multiple sections missing or near-empty. Frontmatter may be incomplete. Skill would likely activate but produce poor results.
  • F (below 60): Structural failure. Missing required frontmatter fields, missing critical sections, or content that indicates the skill was never actually tested.

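
The score bands translate directly into a banding function. This is a restatement of the table above, not the grader's actual code:

```javascript
// Letter bands for the 100-point rubric, as documented in the grade table.
function letterGrade(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F'; // below 60: structural failure
}

console.log(letterGrade(74)); // → C
```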
The description field is disproportionately important. Claude reads the skill description to determine whether the current task matches what the skill does. A description that says "Helps with code" will never activate reliably. A description that lists specific trigger phrases — "Use when the user asks to refactor a function, extract a method, or improve code readability" — gives Claude the signal it needs to match intent to tool.

Running the Grader

The validator supports several modes depending on what you want to check:

  • python3 scripts/validate-skills-schema.py --verbose — Full report on all content (skills, commands, agents)
  • python3 scripts/validate-skills-schema.py --skills-only — SKILL.md files only
  • python3 scripts/validate-skills-schema.py --commands-only — Commands only

The verbose output shows per-section scores, which is where the diagnostic value actually lives. Knowing a skill scored 74 overall is less useful than knowing it scored full marks on structure and lost points specifically on Error Handling and Examples.

Package Manager Enforcement

One of the stranger-looking constraints in the repository is the split package manager policy: pnpm everywhere, except the marketplace/ directory which must use npm. This isn't an accident and it isn't arbitrary — it reflects a real architectural boundary.

Why the Split Exists

The marketplace is an Astro 5 website that lives in a directory explicitly excluded from the pnpm workspace. This exclusion is intentional: the marketplace has its own dependency graph, its own build pipeline, and its own deployment target. Treating it as a workspace member would pull it into the pnpm lockfile alongside CLI packages and plugin utilities, creating cross-contamination between completely unrelated dependency trees.

npm inside marketplace/ and pnpm everywhere else creates a clean boundary. The two package managers manage two separate node_modules trees with no shared state.

How CI Enforces It

The check-package-manager job runs scripts/check-package-manager.mjs, which inspects each directory and verifies the correct package manager is in use. If a contributor accidentally runs pnpm install inside marketplace/ (creating a pnpm-lock.yaml where there should be a package-lock.json), this job fails immediately with a clear message.

Package manager drift is one of those problems that's invisible until it isn't. A mismatched lockfile won't cause an error during development if everyone on the team happens to be running the same version. It will cause reproducibility failures in CI, in deployment, and for any contributor who doesn't know to look for the drift.

The enforcement is directory-aware. The check script doesn't just look for one package manager globally — it applies the rule per directory. Root and packages use pnpm. marketplace/ uses npm. Anything else triggers a failure. This makes it easy to add new directories with explicit package manager assignments without rewriting the entire enforcement script.
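
A minimal sketch of that directory-aware idea, using the policy described in the text (the real script is scripts/check-package-manager.mjs; the function shape and the lockfile-only heuristic here are assumptions):

```javascript
// Per-directory package manager policy (from the documented rules);
// checking only for the wrong lockfile is a simplification.
const POLICY = {
  'marketplace': 'npm', // must use npm: package-lock.json, never pnpm-lock.yaml
  '.': 'pnpm',          // root and packages: pnpm-lock.yaml only
};

function checkDir(dir, filesPresent) {
  const expected = POLICY[dir] ?? 'pnpm';
  const hasNpmLock = filesPresent.includes('package-lock.json');
  const hasPnpmLock = filesPresent.includes('pnpm-lock.yaml');
  if (expected === 'npm' && hasPnpmLock) {
    return `${dir}: found pnpm-lock.yaml, expected npm`;
  }
  if (expected === 'pnpm' && hasNpmLock) {
    return `${dir}: found package-lock.json, expected pnpm`;
  }
  return null; // clean
}

// Running pnpm install inside marketplace/ would be flagged immediately:
console.log(checkDir('marketplace', ['pnpm-lock.yaml']));
// → marketplace: found pnpm-lock.yaml, expected npm
```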

The Marketplace Build Pipeline

The marketplace build is not a single command — it's a five-step sequential pipeline, where each step produces output that the next step depends on. Running npm run build inside marketplace/ kicks off all five steps in order via scripts/build.mjs.

The Five Steps

  1. discover-skills.mjs: Scans all plugins and extracts SKILL.md data into src/data/, the structured representation of every skill in the ecosystem.
  2. sync-catalog.mjs: Copies catalog JSON (from the auto-generated marketplace.json) into the marketplace data directory so Astro can read it at build time.
  3. generate-unified-search.mjs: Builds the Fuse.js search index from the discovered skills and catalog data; this index powers the site's search UI.
  4. build-cowork-zips.mjs: Generates plugin zips, category bundles, a mega-zip of everything, and a manifest with download links and checksums for the /cowork page.
  5. astro build: Static site generation; takes all the data produced in steps 1–4 and renders every page.

Why Order Matters

The sequential ordering isn't just convenience — each step's output is an input to later steps. Step 1 must run before step 3 because the unified search index is built from the discovered skill data. Step 4 must run before step 5 because Astro needs the cowork manifest to generate the correct download links on the cowork page.

If these steps ran in parallel, you'd get race conditions: Astro trying to render pages that reference a search index that hasn't been built yet, or download links pointing to zips that aren't generated. Sequential execution trades some parallelism for deterministic correctness.

One known gotcha in the pipeline: compressHTML is disabled in the Astro config. This is not an oversight — iOS Safari fails to render HTML lines longer than 5,000 characters. Enabling compression would concatenate HTML onto single lines, producing exactly that condition. The CI pipeline includes a smoke test that verifies this setting remains disabled.
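
The smoke test for that setting is easy to picture: scan every line of the built HTML and flag anything over the threshold. This is a sketch of the idea, not the actual CI script; the 5,000-character limit is the one stated above:

```javascript
// Flag lines in built HTML that exceed the iOS Safari rendering threshold.
const MAX_LINE_LENGTH = 5000;

function findOverlongLines(html) {
  return html
    .split('\n')
    .map((line, i) => ({ line: i + 1, length: line.length }))
    .filter((entry) => entry.length > MAX_LINE_LENGTH);
}

// Compressed output concatenates markup onto one line, so it fails:
console.log(findOverlongLines('<html>' + '<div></div>'.repeat(500) + '</html>'));
// → [ { line: 1, length: 5513 } ]
```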

Post-Build Safety Nets

After Astro completes the static site build, a second layer of validation runs against the generated output — not the source files, but the actual HTML, routes, and assets that would be served in production. This is the layer where assumptions get tested against reality.

What Runs After the Build

  • validate-routes.mjs — Verifies that every plugin page referenced in the catalog actually has a corresponding route in the built output. A plugin in the catalog without a page is a broken link waiting to happen.
  • validate-playbook-routes.mjs — Same check specifically for playbook pages, which have their own URL structure and are validated separately.
  • validate-internal-links.mjs — Crawls the built site starting from seed pages (index, playbooks, explore, skills, cowork) and verifies every internal link resolves. This catches link rot introduced by page renames, route restructuring, or deleted content.
  • validate-links.mjs — Specifically checks skill-to-plugin link integrity: does every skill page correctly link back to its parent plugin?
  • validate-cowork-downloads.mjs — Verifies the cowork zip build output: manifest existence, checksums, download link correctness. Every file advertised for download must exist and match its checksum.
  • validate-cowork-security.mjs — Scans zip contents for security issues: no secrets, no node_modules, no files that should never be distributed.

What Breaks Without These

Each of these checks exists because the corresponding failure actually happened — or the failure mode was identified as high-probability enough to warrant preemptive detection. Route validation catches the case where a plugin is added to the catalog but the Astro page generation logic doesn't pick it up correctly. Link validation catches the case where a page is renamed without updating all the references to it.

The cowork security scan is particularly important. The cowork feature lets teams download plugin bundles as zips. If a contributor accidentally includes a .env file in their plugin directory, that file would end up in the zip without the security scan. This is the kind of mistake that's obvious in retrospect and invisible in the moment.

Post-build validation is fundamentally different from pre-build validation. Pre-build checks verify that source files are correct. Post-build checks verify that the build process itself worked correctly — that the transformation from source to artifact didn't introduce new problems. Both layers are necessary.

The Two-Catalog Sync Check

The most unusual part of the ecosystem's CI design is the two-catalog system — and the sync check that keeps it honest.

Why Two Catalogs

The ecosystem maintains two versions of the plugin catalog:

  • .claude-plugin/marketplace.extended.json: The source of truth, carrying full metadata including the extended fields featured, mcpTools, pluginCount, pricing, components, zcf_metadata, and external_sync. Edited by contributors and maintainers; this is the file you touch.
  • .claude-plugin/marketplace.json: The CLI-compatible catalog with the extended fields stripped out, exactly what the ccpi CLI fetches and caches. Never edited by hand; always auto-generated by pnpm run sync-marketplace.

The split exists because the CLI and the website need different things from the catalog. The website benefits from rich metadata about pricing, featured status, and component structure. The CLI needs a clean, minimal schema that won't break on fields it doesn't understand. Rather than maintaining two separate files manually (a recipe for drift), the extended file is the canonical source and the CLI file is always derived from it.

The Sync Command

Running pnpm run sync-marketplace strips the extended-only fields from marketplace.extended.json and writes the result to marketplace.json. It's a deterministic transformation: same input always produces the same output.
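
As a sketch, the stripping step might look like this. The extended field names are the documented ones; the catalog's exact JSON shape is an assumption:

```javascript
// Strip extended-only fields from each plugin entry to produce the
// CLI-compatible catalog (catalog shape is hypothetical).
const EXTENDED_FIELDS = [
  'featured', 'mcpTools', 'pluginCount', 'pricing',
  'components', 'zcf_metadata', 'external_sync',
];

function stripExtended(extendedCatalog) {
  return {
    ...extendedCatalog,
    plugins: extendedCatalog.plugins.map((plugin) => {
      const cleaned = { ...plugin };
      for (const field of EXTENDED_FIELDS) delete cleaned[field];
      return cleaned;
    }),
  };
}

const extended = { plugins: [{ name: 'demo', version: '1.0.0', featured: true, pricing: 'free' }] };
console.log(JSON.stringify(stripExtended(extended)));
// → {"plugins":[{"name":"demo","version":"1.0.0"}]}
```

Because the transformation is a pure function of its input, regenerating is always safe: the same extended catalog produces the same CLI catalog every time.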

This command is marked as mandatory before every commit in the project's CLAUDE.md. But mandatory instructions get forgotten. The CI check is what actually enforces it.

What the CI Check Does

During the validate job, CI runs the sync command and compares the result to the committed marketplace.json. If they differ — if someone edited marketplace.extended.json and committed without running sync — the job fails with a clear message explaining that the catalogs are out of sync.

Out-of-sync catalogs are a silent failure without this check. If the catalogs drift, the CLI will fetch an outdated version of the catalog — potentially missing new plugins, showing stale pricing, or referencing removed entries. Contributors who forget to run sync don't see any local error. The CI check is the only reliable detector.
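
The check itself reduces to regenerate-and-compare. This is a sketch of the behavior the text describes, not the actual workflow step:

```javascript
// Drift detection: regenerate the CLI catalog and compare byte-for-byte
// against what was committed.
function checkSync(committedJson, regenerate) {
  const regenerated = regenerate();
  if (committedJson !== regenerated) {
    return 'marketplace.json is out of sync: run `pnpm run sync-marketplace` and commit the result';
  }
  return null; // catalogs match
}

console.log(checkSync('{"plugins":[]}', () => '{"plugins":[]}')); // → null
```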

The Data Flow

Understanding the full data flow explains why the sync matters so much:

  1. Contributor edits marketplace.extended.json
  2. Contributor runs pnpm run sync-marketplace to regenerate marketplace.json
  3. Both files are committed
  4. CI verifies the files are in sync
  5. On merge to main, CI deploys marketplace.json to GitHub Pages
  6. ccpi CLI fetches from GitHub Pages and caches locally
  7. Plugin installations use the cached catalog

Any break in this chain — especially between steps 2 and 3, or between steps 3 and 4 — produces a stale CLI experience for everyone who runs ccpi after the broken merge. The sync check at step 4 is the last opportunity to catch it.

Setting Up Your Own Gates

If you're a plugin author working in this ecosystem, or building something inspired by these patterns, here's the local workflow that mirrors what CI does — so you're never surprised by a CI failure on something you could have caught in 30 seconds.

Before Any Commit

Two commands cover the mandatory pre-commit surface:

  • pnpm run sync-marketplace — Regenerates the CLI catalog from the extended catalog. Run this every time you touch marketplace.extended.json.
  • ./scripts/validate-all-plugins.sh — Full plugin validation. Run this when adding or significantly modifying a plugin. For quick changes, ./scripts/quick-test.sh is faster.

For Skill Authors

If you're writing or revising a SKILL.md, run the grader before pushing:

  • python3 scripts/validate-skills-schema.py --verbose — Full verbose output so you can see exactly which sections are losing points
  • python3 scripts/validate-skills-schema.py --skills-only — Scoped to skills if you want faster feedback

Target an A or high B before submitting. A C-grade skill will pass validation but will produce inconsistent behavior at runtime — it has enough structure to not fail, but not enough substance to work reliably.

For the ccpi CLI

The ccpi validate command is the same tool CI uses. Running it locally gives you the same feedback CI would give, before CI has to tell you:

  • ccpi validate --strict — Full validation, fails on warnings as well as errors
  • ccpi validate --skills — Skills-only validation
  • ccpi validate --frontmatter — Frontmatter-only check
  • ccpi validate --json — Outputs results as JSON, useful for scripting or integrating with other tools

For MCP Plugin Authors

MCP plugins have additional requirements that pure-markdown plugins don't. The full local workflow for an MCP plugin:

  • cd plugins/mcp/[name]/ && pnpm build — Build the TypeScript source to dist/index.js
  • chmod +x dist/index.js — Make the built file executable (the shebang line is not enough without this)
  • Verify the .mcp.json file references the correct entry point
  • Verify plugin.json uses only the 8 allowed fields — CI will reject any others

The 8-field constraint in plugin.json is a design decision, not an oversight. The allowed fields are: name, version, description, author, repository, homepage, license, keywords. Any additional field in a submitted plugin.json causes CI to reject the PR. This keeps the CLI schema stable and prevents plugins from encoding runtime state or extended metadata in a file that's meant to be minimal.
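
That allowlist is mechanical to enforce. A sketch (the eight field names are from the text; the function itself is illustrative):

```javascript
// Reject any plugin.json key outside the eight allowed fields.
const ALLOWED = new Set([
  'name', 'version', 'description', 'author',
  'repository', 'homepage', 'license', 'keywords',
]);

function extraFields(manifest) {
  return Object.keys(manifest).filter((key) => !ALLOWED.has(key));
}

// A manifest smuggling runtime state would be flagged:
console.log(extraFields({ name: 'demo', version: '1.0.0', runtimeState: {} }));
// → [ 'runtimeState' ]
```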

The Marketplace Dev Loop

For contributors working on the marketplace website itself, the dev loop is:

  • cd marketplace/ && npm run dev — Dev server at localhost:4321
  • cd marketplace/ && npm run build — Full production build (all five pipeline steps)
  • cd marketplace/ && npm run validate — Route and link validation against the built output
  • cd marketplace/ && npx playwright test — E2E tests on Chromium and WebKit

Running the full build locally before pushing is the single most effective way to avoid marketplace-validation CI failures. The dev server runs fast but skips some pipeline steps. The production build is slower but validates the complete artifact.


The pattern that runs through all of these gates is the same: validate at the cheapest possible point, as early as possible, against the most complete model of what "correct" looks like. Local scripts catch things in 30 seconds that would otherwise surface as CI failures 5 minutes after a push. Post-build checks catch things that only become visible in the artifact that would otherwise surface as production errors. The two-catalog sync check catches drift that would otherwise be invisible until someone's CLI installation breaks.

None of this is novel. It's just disciplined application of the principle that quality is cheaper to maintain than to repair — and that the earlier you catch a problem, the less it costs.

Cite This Research

CI/CD Integration Patterns. Claude Code Plugins Research, 2026. https://tonsofskills.com/research/cicd-integration-patterns/