bug-clustering
Internal process for the bug-clusterer agent. Defines the step-by-step procedure for parsing, classifying, redacting, scoring, and clustering bug candidates from raw X/Twitter posts. Not user-invocable — loaded by the bug-clusterer agent through its skills frontmatter.
Allowed Tools
Provided by Plugin
x-bug-triage-plugin
Closed-loop bug triage: X complaints → clusters → repo evidence → owner routing → terminal review → filed issues
Installation
This skill is included in the x-bug-triage-plugin plugin:
/plugin install x-bug-triage-plugin@claude-code-plugins-plus
Instructions
Bug Clustering Process
Step-by-step procedure for transforming raw XPost objects into structured, clustered bug candidates with PII redaction and reliability scoring.
Overview
Loaded by the bug-clusterer agent inside the x-bug-triage plugin. Transforms raw XPost records ingested from X/Twitter into structured BugCandidate rows, deduplicates near-identical candidates, classifies each candidate into one of 12 bug families, redacts six categories of PII, scores reporter reliability across four dimensions, and groups results into bug clusters by deterministic signature. The output feeds the downstream repo-scanning, owner-routing, and triage-display stages.
Prerequisites
- Database initialized via `lib/db.ts` schema migrations
- `config/cluster-matching-thresholds.json` present (controls dedup + cluster overlap thresholds)
- `config/approved-accounts.json` present (drives reporter category tagging)
- Raw `XPost[]` array passed in by the orchestrator (not fetched here)
Instructions
Step 1: Parse
For each XPost, produce a BugCandidate with all 33 fields using lib/parser.ts:
- Extract product_surface, feature_area, symptoms, error_strings, repro_hints
- Extract urls, media_keys, language, conversation references
- Determine source_type (mention, reply, quote_post, search_hit)
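Two of the parse extractions can be sketched as simple heuristics. The function names and regexes below are illustrative assumptions, not the actual `lib/parser.ts` logic; they only show the shape of the field extraction:

```typescript
// Hypothetical extraction helpers for two BugCandidate fields.
// The heuristics (uppercase error-code pattern, URL pattern) are
// placeholders for whatever lib/parser.ts actually uses.

function extractErrorStrings(text: string): string[] {
  // Matches SHOUTY_SNAKE_CASE tokens such as "ERR_TIMEOUT" or "E11000_DUP".
  return text.match(/\b[A-Z][A-Z0-9_]{4,}\b/g) ?? [];
}

function extractUrls(text: string): string[] {
  // Grabs anything that looks like an http(s) URL up to the next whitespace.
  return text.match(/https?:\/\/\S+/g) ?? [];
}
```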
Step 1.5: Deduplicate
Before classification, run content-similarity deduplication using lib/dedupe.ts:
- Call `deduplicateCandidates()` with parsed candidates and the `candidate_dedup.hybrid_similarity_threshold` from `config/cluster-matching-thresholds.json` (default 0.70)
- Uses char-trigram + token-Jaccard hybrid similarity
- Does NOT remove posts — tags them as duplicate groups with a canonical post (highest engagement)
- Only canonical posts and non-duplicates (`forward_ids`) proceed to classification
- Log dedup stats in the form: "N posts (M unique, K duplicate groups)" with N/M/K replaced by integer counts
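A char-trigram + token-Jaccard hybrid can be sketched as below. The equal 50/50 weighting and function names are assumptions for illustration; `lib/dedupe.ts` defines the real blend:

```typescript
// Sketch of a hybrid similarity: Jaccard overlap of character trigrams
// blended with Jaccard overlap of whitespace tokens. Weights are assumed.

function charTrigrams(text: string): Set<string> {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  return inter / (a.size + b.size - inter);
}

function hybridSimilarity(a: string, b: string): number {
  const tokensA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tokensB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  // Equal-weight blend of character-level and token-level overlap.
  return 0.5 * jaccard(charTrigrams(a), charTrigrams(b))
       + 0.5 * jaccard(tokensA, tokensB);
}
```

Two candidates would be tagged into the same duplicate group when `hybridSimilarity` meets or exceeds the configured 0.70 threshold.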
Step 2: Classify
Run lib/classifier.ts on each candidate:
- Assign one of 12 classifications with confidence score (0.0-1.0) and rationale
- Sarcastic bug reports get classified separately — still treated as signal
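The classification contract (family + confidence + rationale) can be illustrated with a toy keyword heuristic. The family names and keywords below are invented placeholders — the actual 12 families and classification logic live in `lib/classifier.ts`:

```typescript
// Toy illustration of the classify-with-confidence contract only.
// Family names ("crash", "performance", "other") are hypothetical.

interface Classification {
  family: string;
  confidence: number; // 0.0-1.0
  rationale: string;
}

function classify(text: string): Classification {
  const lower = text.toLowerCase();
  if (/crash|segfault|force.?close/.test(lower)) {
    return { family: "crash", confidence: 0.9, rationale: "crash keyword present" };
  }
  if (/slow|lag|freez/.test(lower)) {
    return { family: "performance", confidence: 0.75, rationale: "latency keyword present" };
  }
  return { family: "other", confidence: 0.3, rationale: "no strong signal" };
}
```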
Step 3: Redact PII
Run lib/redactor.ts on each candidate:
- Detect 6 PII types: email, API key, phone, account ID, media flag, URL token
- Replace with [REDACTED:type] tags
- Set pii_flags array and raw_text_storage_policy
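A redaction pass over two of the six PII types might look like the following. The patterns, the `redactPII` name, and the return shape are assumptions; the real detectors in `lib/redactor.ts` also cover API keys, account IDs, media flags, and URL tokens:

```typescript
// Illustrative redaction for email and phone only. Each detected type is
// recorded in piiFlags and replaced in-text with a [REDACTED:type] tag.

const PII_PATTERNS: Array<[string, RegExp]> = [
  ["email", /[\w.+-]+@[\w-]+\.[\w.]+/g],
  ["phone", /\+?\d[\d\s().-]{8,}\d/g],
];

function redactPII(text: string): { clean: string; piiFlags: string[] } {
  const piiFlags: string[] = [];
  let clean = text;
  for (const [type, pattern] of PII_PATTERNS) {
    if (pattern.test(clean)) {
      piiFlags.push(type);
      pattern.lastIndex = 0; // reset the /g/ regex cursor after .test()
      clean = clean.replace(pattern, `[REDACTED:${type}]`);
    }
  }
  return { clean, piiFlags };
}
```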
Step 4: Score Reliability
Run lib/reporter-scorer.ts on each candidate:
- 4 dimensions: report quality, independence, account authenticity, historical accuracy
- Composite reporter_reliability_score (0.0-1.0)
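A composite over the four dimensions can be sketched as a weighted sum. The equal weights below are an assumption for illustration; `lib/reporter-scorer.ts` defines the real weighting:

```typescript
// Hypothetical equal-weight composite of the four reliability dimensions,
// clamped to the documented 0.0-1.0 range.

interface ReliabilityDimensions {
  reportQuality: number;       // 0.0-1.0
  independence: number;        // 0.0-1.0
  accountAuthenticity: number; // 0.0-1.0
  historicalAccuracy: number;  // 0.0-1.0
}

function reporterReliabilityScore(d: ReliabilityDimensions): number {
  const raw =
    0.25 * d.reportQuality +
    0.25 * d.independence +
    0.25 * d.accountAuthenticity +
    0.25 * d.historicalAccuracy;
  return Math.min(1, Math.max(0, raw));
}
```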
Step 5: Tag Reporter Category
Match author against approved_accounts config:
- Categories: public, internal, partner, tester
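Assuming `approved-accounts.json` maps handles to categories (the file's actual shape is not specified here), the lookup with a "public" default might be:

```typescript
// Hypothetical category lookup: approved handles get their configured
// category; any author not in the config defaults to "public".

type ReporterCategory = "public" | "internal" | "partner" | "tester";

function tagReporterCategory(
  authorHandle: string,
  approvedAccounts: Record<string, ReporterCategory>,
): ReporterCategory {
  return approvedAccounts[authorHandle.toLowerCase()] ?? "public";
}
```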
Step 6: Cluster
Using lib/clusterer.ts and lib/signatures.ts:
- Generate deterministic bug signature from error_strings + symptoms + feature_area
- Match against active_clusters at >=70% signature overlap
- Family-first guard: different ClusterFamilies NEVER cluster together
- New match: create cluster (initial severity "low")
- Existing match: update report_count, last_seen, sub_status
- Resolved match: reopen with sub_status "regression_reopened"
- Suppressed match: skip, log to audit
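The signature-and-match flow above can be sketched as follows. The token-set signature representation, the overlap metric, and the function names are assumptions about `lib/signatures.ts` and `lib/clusterer.ts`, shown only to make the >=70% rule and the family-first guard concrete:

```typescript
// Deterministic signature: a normalized token set, so identical inputs
// always yield the same signature regardless of ordering.

interface ClusterKey {
  family: string;         // ClusterFamily — never matched across families
  signature: Set<string>; // tokens from error_strings + symptoms + feature_area
}

function bugSignature(errorStrings: string[], symptoms: string[], featureArea: string): Set<string> {
  const tokens = [...errorStrings, ...symptoms, featureArea]
    .flatMap((s) => s.toLowerCase().split(/[^a-z0-9]+/))
    .filter(Boolean);
  return new Set(tokens);
}

// One plausible overlap metric: shared tokens over the smaller signature.
function signatureOverlap(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  for (const t of a) if (b.has(t)) inter++;
  return inter / Math.min(a.size, b.size || 1);
}

function matchesCluster(candidate: ClusterKey, cluster: ClusterKey, threshold = 0.7): boolean {
  if (candidate.family !== cluster.family) return false; // family-first guard
  return signatureOverlap(candidate.signature, cluster.signature) >= threshold;
}
```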
Step 7: Persist
- Insert candidates to DB via `lib/db.ts`
- Insert/update clusters and the cluster_posts junction table
- Write audit events for each classification, redaction, and cluster action
Output
- bug_candidates rows with classification, PII flags, and reporter reliability scores
- bug_clusters rows (new or updated) with severity, sub_status, and report_count
- cluster_posts junction rows linking candidates to clusters
- Audit-event rows recording every classification, redaction, and cluster action
Error Handling
- Parser failure on a single XPost: log + skip, continue with remaining posts (degraded mode)
- Classification confidence below threshold: still recorded, flagged for human review
- Cluster signature collision across families: hard-blocked by the family-first guard — never cross-clusters
- Persist failure mid-batch: rollback uncommitted writes for that batch only; preserve already-committed work
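The per-batch rollback semantics can be illustrated with a toy staged store: writes within a batch are buffered and either committed atomically or discarded, leaving earlier committed batches untouched. The `Store` class is a stand-in, not the `lib/db.ts` API:

```typescript
// Toy staged-write store demonstrating "rollback uncommitted writes for
// that batch only; preserve already-committed work".

class Store {
  private committed: string[] = [];
  private staged: string[] = [];
  insert(row: string): void { this.staged.push(row); }
  commit(): void { this.committed.push(...this.staged); this.staged = []; }
  rollback(): void { this.staged = []; } // drops only the current batch
  rows(): string[] { return [...this.committed]; }
}
```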
Examples
The bug-clusterer agent invokes this skill after the X/Twitter ingest phase completes. A typical batch processes 50–500 candidates per run, producing 5–30 clusters depending on overlap. Sample audit log line: "127 posts (89 unique, 14 duplicate groups, 6 new clusters, 8 cluster updates)".
Resources
Load evidence tier definitions for proper cluster evidence assessment:
!cat skills/x-bug-triage/references/evidence-policy.md
Load data model reference for BugCandidate fields and cluster schemas:
!cat skills/x-bug-triage/references/schemas.md