bug-clustering
Internal process for the bug-clusterer agent. Defines the step-by-step procedure for parsing, classifying, redacting, scoring, and clustering bug candidates from raw X/Twitter posts. Not user-invocable — loaded by the bug-clusterer agent through its skills frontmatter.
Allowed Tools
Provided by Plugin
x-bug-triage-plugin
Closed-loop bug triage: X complaints → clusters → repo evidence → owner routing → terminal review → filed issues
Installation
This skill is included in the x-bug-triage-plugin plugin:
/plugin install x-bug-triage-plugin@claude-code-plugins-plus
Instructions
Bug Clustering Process
Step-by-step procedure for transforming raw XPost objects into structured, clustered bug candidates with PII redaction and reliability scoring.
Overview
Loaded by the bug-clusterer agent inside the x-bug-triage plugin. Transforms raw XPost records ingested from X/Twitter into structured BugCandidate rows, deduplicates near-identical candidates, classifies each candidate into one of 12 bug families, redacts six categories of PII, scores reporter reliability across four dimensions, and groups results into bug clusters by deterministic signature. The output feeds the downstream repo-scanning, owner-routing, and triage-display stages.
Prerequisites
- Database initialized via `lib/db.ts` schema migrations
- `config/cluster-matching-thresholds.json` present (controls dedup + cluster overlap thresholds)
- `config/approved-accounts.json` present (drives reporter category tagging)
- Raw `XPost[]` array passed in by the orchestrator (not fetched here)
Instructions
Step 1: Parse
For each XPost, produce a BugCandidate with all 33 fields using lib/parser.ts:
- Extract product_surface, feature_area, symptoms, error_strings, repro_hints
- Extract urls, media_keys, language, conversation references
- Determine source_type (mention, reply, quote_post, search_hit)
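Two of the parse extractions can be sketched as simple heuristics. The function names and regexes below are illustrative assumptions, not the actual `lib/parser.ts` logic; they only show the shape of the field extraction:

```typescript
// Hypothetical extraction helpers for two BugCandidate fields.
// The heuristics (uppercase error-code pattern, URL pattern) are
// placeholders for whatever lib/parser.ts actually uses.

function extractErrorStrings(text: string): string[] {
  // Matches SHOUTY_SNAKE_CASE tokens such as "ERR_TIMEOUT" or "E11000_DUP".
  return text.match(/\b[A-Z][A-Z0-9_]{4,}\b/g) ?? [];
}

function extractUrls(text: string): string[] {
  // Grabs anything that looks like an http(s) URL up to the next whitespace.
  return text.match(/https?:\/\/\S+/g) ?? [];
}
```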
Step 1.5: Deduplicate
Before classification, run content-similarity deduplication using lib/dedupe.ts:
- Call `deduplicateCandidates()` with parsed candidates and the `candidate_dedup.hybrid_similarity_threshold` from `config/cluster-matching-thresholds.json` (default 0.70)
- Uses char-trigram + token-Jaccard hybrid similarity
- Does NOT remove posts — tags them as duplicate groups with a canonical post (highest engagement)
- Only canonical posts and non-duplicates (`forward_ids`) proceed to classification
- Log dedup stats in the form: "N posts (M unique, K duplicate groups)" with N/M/K replaced by integer counts
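A char-trigram + token-Jaccard hybrid can be sketched as below. The equal 50/50 weighting and function names are assumptions for illustration; `lib/dedupe.ts` defines the real blend:

```typescript
// Sketch of a hybrid similarity: Jaccard overlap of character trigrams
// blended with Jaccard overlap of whitespace tokens. Weights are assumed.

function charTrigrams(text: string): Set<string> {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  return inter / (a.size + b.size - inter);
}

function hybridSimilarity(a: string, b: string): number {
  const tokensA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const tokensB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  // Equal-weight blend of character-level and token-level overlap.
  return 0.5 * jaccard(charTrigrams(a), charTrigrams(b))
       + 0.5 * jaccard(tokensA, tokensB);
}
```

Two candidates would be tagged into the same duplicate group when `hybridSimilarity` meets or exceeds the configured 0.70 threshold.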
Step 2: Classify
Run lib/classifier.ts on each candidate:
- Assign one of 12 classifications with confidence score (0.0-1.0) and rationale
- Sarcastic bug reports get classified separately — still treated as signal
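The classification contract (family + confidence + rationale) can be illustrated with a toy keyword heuristic. The family names and keywords below are invented placeholders — the actual 12 families and classification logic live in `lib/classifier.ts`:

```typescript
// Toy illustration of the classify-with-confidence contract only.
// Family names ("crash", "performance", "other") are hypothetical.

interface Classification {
  family: string;
  confidence: number; // 0.0-1.0
  rationale: string;
}

function classify(text: string): Classification {
  const lower = text.toLowerCase();
  if (/crash|segfault|force.?close/.test(lower)) {
    return { family: "crash", confidence: 0.9, rationale: "crash keyword present" };
  }
  if (/slow|lag|freez/.test(lower)) {
    return { family: "performance", confidence: 0.75, rationale: "latency keyword present" };
  }
  return { family: "other", confidence: 0.3, rationale: "no strong signal" };
}
```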
Step 3: Redact PII
Run lib/redactor.ts on each candidate:
- Detect 6 PII types: email, API key, phone, account ID, media flag, URL token
- Replace with [REDACTED:type] tags
- Set pii_flags array and raw_text_storage_policy
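A redaction pass over two of the six PII types might look like the following. The patterns, the `redactPII` name, and the return shape are assumptions; the real detectors in `lib/redactor.ts` also cover API keys, account IDs, media flags, and URL tokens:

```typescript
// Illustrative redaction for email and phone only. Each detected type is
// recorded in piiFlags and replaced in-text with a [REDACTED:type] tag.

const PII_PATTERNS: Array<[string, RegExp]> = [
  ["email", /[\w.+-]+@[\w-]+\.[\w.]+/g],
  ["phone", /\+?\d[\d\s().-]{8,}\d/g],
];

function redactPII(text: string): { clean: string; piiFlags: string[] } {
  const piiFlags: string[] = [];
  let clean = text;
  for (const [type, pattern] of PII_PATTERNS) {
    if (pattern.test(clean)) {
      piiFlags.push(type);
      pattern.lastIndex = 0; // reset the /g/ regex cursor after .test()
      clean = clean.replace(pattern, `[REDACTED:${type}]`);
    }
  }
  return { clean, piiFlags };
}
```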
Step 4: Score Reliability
Run lib/reporter-scorer.ts on each candidate:
- 4 dimensions: report quality, independence, account authenticity, historical accuracy
- Composite reporter_reliability_score (0.0-1.0)
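A composite over the four dimensions can be sketched as a weighted sum. The equal weights below are an assumption for illustration; `lib/reporter-scorer.ts` defines the real weighting:

```typescript
// Hypothetical equal-weight composite of the four reliability dimensions,
// clamped to the documented 0.0-1.0 range.

interface ReliabilityDimensions {
  reportQuality: number;       // 0.0-1.0
  independence: number;        // 0.0-1.0
  accountAuthenticity: number; // 0.0-1.0
  historicalAccuracy: number;  // 0.0-1.0
}

function reporterReliabilityScore(d: ReliabilityDimensions): number {
  const raw =
    0.25 * d.reportQuality +
    0.25 * d.independence +
    0.25 * d.accountAuthenticity +
    0.25 * d.historicalAccuracy;
  return Math.min(1, Math.max(0, raw));
}
```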
Step 5: Tag Reporter Category
Match author against approved_accounts config:
- Categories: public, internal, partner, tester
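Assuming `approved-accounts.json` maps handles to categories (the file's actual shape is not specified here), the lookup with a "public" default might be:

```typescript
// Hypothetical category lookup: approved handles get their configured
// category; any author not in the config defaults to "public".

type ReporterCategory = "public" | "internal" | "partner" | "tester";

function tagReporterCategory(
  authorHandle: string,
  approvedAccounts: Record<string, ReporterCategory>,
): ReporterCategory {
  return approvedAccounts[authorHandle.toLowerCase()] ?? "public";
}
```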
Step 6: Cluster
Using lib/clusterer.ts and lib/signatures.ts:
- Generate deterministic bug signature from error_strings + symptoms + feature_area
- Match against active_clusters at >=70% signature overlap
- Family-first guard: different ClusterFamilies NEVER cluster together
- New match: create cluster (initial severity "low")
- Existing match: update report_count, last_seen, sub_status
- Resolved match: reopen with sub_status "regression_reopened"
- Suppressed match: skip, log to audit
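The signature-and-match flow above can be sketched as follows. The token-set signature representation, the overlap metric, and the function names are assumptions about `lib/signatures.ts` and `lib/clusterer.ts`, shown only to make the >=70% rule and the family-first guard concrete:

```typescript
// Deterministic signature: a normalized token set, so identical inputs
// always yield the same signature regardless of ordering.

interface ClusterKey {
  family: string;         // ClusterFamily — never matched across families
  signature: Set<string>; // tokens from error_strings + symptoms + feature_area
}

function bugSignature(errorStrings: string[], symptoms: string[], featureArea: string): Set<string> {
  const tokens = [...errorStrings, ...symptoms, featureArea]
    .flatMap((s) => s.toLowerCase().split(/[^a-z0-9]+/))
    .filter(Boolean);
  return new Set(tokens);
}

// One plausible overlap metric: shared tokens over the smaller signature.
function signatureOverlap(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  for (const t of a) if (b.has(t)) inter++;
  return inter / Math.min(a.size, b.size || 1);
}

function matchesCluster(candidate: ClusterKey, cluster: ClusterKey, threshold = 0.7): boolean {
  if (candidate.family !== cluster.family) return false; // family-first guard
  return signatureOverlap(candidate.signature, cluster.signature) >= threshold;
}
```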
Step 7: Persist
- Insert candidates to DB via `lib/db.ts`
- Insert/update clusters and the cluster_posts junction table
- Write audit events for each classification, redaction, and cluster action
Output
- bug_candidates rows with classification, PII flags, and reporter reliability scores
- bug_clusters rows (new or updated) with severity, sub_status, and report_count
- cluster_posts junction rows linking candidates to clusters
- Audit-event rows recording every classification, redaction, and cluster action
Error Handling
- Parser failure on a single XPost: log + skip, continue with remaining posts (degraded mode)
- Classification confidence below threshold: still recorded, flagged for human review
- Cluster signature collision across families: hard-blocked by the family-first guard — never cross-clusters
- Persist failure mid-batch: rollback uncommitted writes for that batch only; preserve already-committed work
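The per-batch rollback semantics can be illustrated with a toy staged store: writes within a batch are buffered and either committed atomically or discarded, leaving earlier committed batches untouched. The `Store` class is a stand-in, not the `lib/db.ts` API:

```typescript
// Toy staged-write store demonstrating "rollback uncommitted writes for
// that batch only; preserve already-committed work".

class Store {
  private committed: string[] = [];
  private staged: string[] = [];
  insert(row: string): void { this.staged.push(row); }
  commit(): void { this.committed.push(...this.staged); this.staged = []; }
  rollback(): void { this.staged = []; } // drops only the current batch
  rows(): string[] { return [...this.committed]; }
}
```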
Examples
The bug-clusterer agent invokes this skill after the X/Twitter ingest phase completes. A typical batch processes 50–500 candidates per run, producing 5–30 clusters depending on overlap. Sample audit log line: "127 posts (89 unique, 14 duplicate groups, 6 new clusters, 8 cluster updates)".
Resources
Load evidence tier definitions for proper cluster evidence assessment:
!cat skills/x-bug-triage/references/evidence-policy.md
Load data model reference for BugCandidate fields and cluster schemas:
!cat skills/x-bug-triage/references/schemas.md