bug-clustering

Internal process for the bug-clusterer agent. Defines the step-by-step procedure for parsing, classifying, redacting, scoring, and clustering bug candidates from raw X/Twitter posts. Not user-invocable — loaded by the bug-clusterer agent through its skills frontmatter.

Allowed Tools

Read, Bash(cat:*), Grep, Glob

Provided by Plugin

x-bug-triage-plugin

Closed-loop bug triage: X complaints → clusters → repo evidence → owner routing → terminal review → filed issues

mcp v0.3.0

Installation

This skill is included in the x-bug-triage-plugin plugin:

/plugin install x-bug-triage-plugin@claude-code-plugins-plus


Instructions

Bug Clustering Process

Step-by-step procedure for transforming raw XPost objects into structured, clustered bug candidates with PII redaction and reliability scoring.

Overview

This process is loaded by the bug-clusterer agent inside the x-bug-triage plugin. It transforms raw XPost records ingested from X/Twitter into structured BugCandidate rows, deduplicates near-identical candidates, classifies each candidate into one of 12 bug families, redacts six categories of PII, scores reporter reliability across four dimensions, and groups the results into bug clusters keyed by a deterministic signature. The output feeds the downstream repo-scanning, owner-routing, and triage-display stages.

Prerequisites

  • Database initialized via lib/db.ts schema migrations
  • config/cluster-matching-thresholds.json present (controls dedup + cluster overlap thresholds)
  • config/approved-accounts.json present (drives reporter category tagging)
  • Raw XPost[] array passed in by the orchestrator (not fetched here)

Instructions

Step 1: Parse

For each XPost, produce a BugCandidate with all 33 fields using lib/parser.ts:

  • Extract product_surface, feature_area, symptoms, error_strings, repro_hints
  • Extract urls, media_keys, language, conversation references
  • Determine source_type (mention, reply, quote_post, search_hit)
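The source-type determination can be sketched as follows. This is an illustrative stand-in, not the real lib/parser.ts (which extracts all 33 fields); the `XPostLike` field names and the `determineSourceType` helper are assumptions.

```typescript
// Hypothetical sketch of source_type determination.
type SourceType = "mention" | "reply" | "quote_post" | "search_hit";

interface XPostLike {
  text: string;
  in_reply_to_id?: string;  // present when the post replies to another post
  quoted_post_id?: string;  // present when the post quotes another post
}

function determineSourceType(post: XPostLike, botHandle: string): SourceType {
  if (post.in_reply_to_id) return "reply";
  if (post.quoted_post_id) return "quote_post";
  if (post.text.includes(`@${botHandle}`)) return "mention";
  return "search_hit"; // default: the post surfaced via a search query
}
```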

Step 1.5: Deduplicate

Before classification, run content-similarity deduplication using lib/dedupe.ts:

  • Call deduplicateCandidates() with parsed candidates and the candidate_dedup.hybrid_similarity_threshold from config/cluster-matching-thresholds.json (default 0.70)
  • Uses char-trigram + token-Jaccard hybrid similarity
  • Does NOT remove posts — tags them as duplicate groups with a canonical post (highest engagement)
  • Only canonical posts and non-duplicates (forward_ids) proceed to classification
  • Log dedup stats in the form: "N posts (M unique, K duplicate groups)" with N/M/K replaced by integer counts
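The hybrid similarity measure can be sketched as below. The real implementation lives in lib/dedupe.ts; the function names and the equal weighting of the two components are assumptions.

```typescript
// Char-trigram set of a string (lowercased).
function trigrams(s: string): Set<string> {
  const n = s.toLowerCase();
  const t = new Set<string>();
  for (let i = 0; i <= n.length - 3; i++) t.add(n.slice(i, i + 3));
  return t;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  return inter / (a.size + b.size - inter);
}

// Hybrid similarity: char-trigram Jaccard blended with token Jaccard.
// Equal 50/50 weighting is an assumption; the real weights may differ.
function hybridSimilarity(a: string, b: string): number {
  const charSim = jaccard(trigrams(a), trigrams(b));
  const tokSim = jaccard(
    new Set(a.toLowerCase().split(/\s+/)),
    new Set(b.toLowerCase().split(/\s+/)),
  );
  return 0.5 * charSim + 0.5 * tokSim;
}
```

Two candidates would be tagged as a duplicate group when `hybridSimilarity` meets or exceeds the configured threshold (default 0.70).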

Step 2: Classify

Run lib/classifier.ts on each candidate:

  • Assign one of 12 classifications with confidence score (0.0-1.0) and rationale
  • Sarcastic bug reports get classified separately — still treated as signal
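A hypothetical shape for the classifier's output, and the low-confidence flagging mentioned under Error Handling. The 12 family names live in lib/classifier.ts and are not reproduced here; the `Classification` interface and the 0.6 review threshold are assumptions.

```typescript
interface Classification {
  family: string;      // one of the 12 bug families
  confidence: number;  // 0.0 - 1.0
  rationale: string;   // short explanation of the assignment
  sarcastic: boolean;  // sarcastic reports are classified separately but kept as signal
}

// Low-confidence classifications are still recorded, just flagged for review.
function needsHumanReview(c: Classification, threshold = 0.6): boolean {
  return c.confidence < threshold;
}
```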

Step 3: Redact PII

Run lib/redactor.ts on each candidate:

  • Detect 6 PII types: email, API key, phone, account ID, media flag, URL token
  • Replace with [REDACTED:type] tags
  • Set pii_flags array and raw_text_storage_policy
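The redaction step can be sketched like this for two of the six PII types (email and phone). The `[REDACTED:type]` tag format follows the step above; the regex patterns are deliberately simplified assumptions, and the real lib/redactor.ts covers all six categories.

```typescript
// Illustrative PII patterns (email, phone only).
const PII_PATTERNS: Array<[string, RegExp]> = [
  ["email", /[\w.+-]+@[\w-]+\.[\w.]+/g],
  ["phone", /\+?\d[\d\s().-]{7,}\d/g],
];

function redact(text: string): { text: string; pii_flags: string[] } {
  const flags: string[] = [];
  let out = text;
  for (const [type, re] of PII_PATTERNS) {
    const replaced = out.replace(re, `[REDACTED:${type}]`);
    if (replaced !== out) flags.push(type); // record which PII types were found
    out = replaced;
  }
  return { text: out, pii_flags: flags };
}
```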

Step 4: Score Reliability

Run lib/reporter-scorer.ts on each candidate:

  • 4 dimensions: report quality, independence, account authenticity, historical accuracy
  • Composite reporter_reliability_score (0.0-1.0)
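A minimal sketch of combining the four dimensions into the composite score. The uniform weighting is an assumption; the actual weights belong to lib/reporter-scorer.ts.

```typescript
interface ReliabilityDimensions {
  report_quality: number;       // 0.0 - 1.0
  independence: number;         // 0.0 - 1.0
  account_authenticity: number; // 0.0 - 1.0
  historical_accuracy: number;  // 0.0 - 1.0
}

// Simple mean of the four dimensions, clamped defensively to [0, 1].
function compositeScore(d: ReliabilityDimensions): number {
  const mean =
    (d.report_quality +
      d.independence +
      d.account_authenticity +
      d.historical_accuracy) / 4;
  return Math.min(1, Math.max(0, mean));
}
```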

Step 5: Tag Reporter Category

Match author against approved_accounts config:

  • Categories: public, internal, partner, tester
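Category tagging might look like the following; the shape of config/approved-accounts.json shown here is an assumption, but the default-to-public behavior follows from the category list above.

```typescript
type ReporterCategory = "public" | "internal" | "partner" | "tester";

// Assumed config shape: lists of approved handles per non-public category.
interface ApprovedAccounts {
  internal: string[];
  partner: string[];
  tester: string[];
}

function tagReporter(handle: string, approved: ApprovedAccounts): ReporterCategory {
  const h = handle.toLowerCase();
  if (approved.internal.some((a) => a.toLowerCase() === h)) return "internal";
  if (approved.partner.some((a) => a.toLowerCase() === h)) return "partner";
  if (approved.tester.some((a) => a.toLowerCase() === h)) return "tester";
  return "public"; // anyone not in the config is a public reporter
}
```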

Step 6: Cluster

Using lib/clusterer.ts and lib/signatures.ts:

  • Generate deterministic bug signature from error_strings + symptoms + feature_area
  • Match against active_clusters at >=70% signature overlap
  • Family-first guard: different ClusterFamilies NEVER cluster together
  • No existing match: create a new cluster (initial severity "low")
  • Existing match: update report_count, last_seen, sub_status
  • Resolved match: reopen with sub_status "regression_reopened"
  • Suppressed match: skip, log to audit

Step 7: Persist

  • Insert candidates to DB via lib/db.ts
  • Insert/update clusters and cluster_posts junction
  • Write audit events for each classification, redaction, and cluster action

Output

  • bug_candidates rows with classification, PII flags, and reporter reliability scores
  • bug_clusters rows (new or updated) with severity, sub_status, and report_count
  • cluster_posts junction rows linking candidates to clusters
  • Audit-event rows recording every classification, redaction, and cluster action

Error Handling

  • Parser failure on a single XPost: log + skip, continue with remaining posts (degraded mode)
  • Classification confidence below threshold: still recorded, flagged for human review
  • Cluster signature collision across families: hard-blocked by the family-first guard — never cross-clusters
  • Persist failure mid-batch: rollback uncommitted writes for that batch only; preserve already-committed work
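The per-post skip behavior (degraded mode) can be sketched as a generic batch wrapper. The `parseOne` and `log` parameters are hypothetical stand-ins; batch rollback on persist failure would be handled separately at the database layer.

```typescript
// Parse a batch, skipping individual failures instead of aborting the run.
function processBatch<T, R>(
  posts: T[],
  parseOne: (p: T) => R,
  log: (msg: string) => void,
): R[] {
  const parsed: R[] = [];
  for (const post of posts) {
    try {
      parsed.push(parseOne(post));
    } catch (err) {
      // Degraded mode: log and continue with the remaining posts.
      log(`parser failure, skipping post: ${String(err)}`);
    }
  }
  return parsed;
}
```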

Examples

The bug-clusterer agent invokes this skill after the X/Twitter ingest phase completes. A typical batch processes 50–500 candidates per run, producing 5–30 clusters depending on overlap. Sample audit log line: "127 posts (89 unique, 14 duplicate groups, 6 new clusters, 8 cluster updates)".

Resources

Load evidence tier definitions for proper cluster evidence assessment:


!cat skills/x-bug-triage/references/evidence-policy.md

Load data model reference for BugCandidate fields and cluster schemas:


!cat skills/x-bug-triage/references/schemas.md
