Claude Code skill pack for ClickHouse (24 skills)
Installation
Open Claude Code and run this command:
/plugin install clickhouse-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for current project only.
What It Does
> 24 skills for building, operating, and scaling ClickHouse-powered analytics — real @clickhouse/client code, real SQL, real MergeTree engines.
Every skill uses the official ClickHouse Node.js client (@clickhouse/client with createClient), actual ClickHouse SQL syntax (MergeTree, ReplacingMergeTree, AggregatingMergeTree), real system tables (system.parts, system.query_log, system.merges), and production patterns (parameterized queries, streaming inserts, materialized views).
Links: tonsofskills.com | ClickHouse Docs | @clickhouse/client
Skills (24)
Run ClickHouse integration tests in CI with GitHub Actions and Docker.
ClickHouse CI Integration
Overview
Run integration tests against a real ClickHouse server in GitHub Actions using
Docker service containers. No mocks needed for schema and query validation.
Prerequisites
- GitHub repository with Actions enabled
- @clickhouse/client in project dependencies
- Test suite (vitest or jest)
Instructions
Step 1: GitHub Actions Workflow with ClickHouse Service
# .github/workflows/clickhouse-tests.yml
name: ClickHouse Integration Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      clickhouse:
        image: clickhouse/clickhouse-server:latest
        ports:
          - 8123:8123
          - 9000:9000
        options: >-
          --health-cmd "wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    env:
      CLICKHOUSE_HOST: http://localhost:8123
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      # Apply schema before tests
      - name: Apply schema
        run: |
          curl -s 'http://localhost:8123/' -d 'CREATE DATABASE IF NOT EXISTS test_db'
          for f in init-db/*.sql; do
            echo "Applying $f..."
            curl -s 'http://localhost:8123/?database=test_db' --data-binary @"$f"
          done
      - name: Run unit tests
        run: npm test -- --coverage
      - name: Run integration tests
        run: npm run test:integration
Step 2: Integration Test Setup
// tests/setup-integration.ts
import { createClient, ClickHouseClient } from '@clickhouse/client';
import { beforeAll, afterAll, beforeEach } from 'vitest';
let client: ClickHouseClient;
beforeAll(async () => {
client = createClient({
url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123',
database: 'test_db',
});
// Verify connection
const { success } = await client.ping();
if (!success) throw new Error('ClickHouse not reachable');
});
beforeEach(async () => {
// Clean test data between tests
await client.command({ query: 'TRUNCATE TABLE IF EXISTS test_db.events' });
});
afterAll(async () => {
await client.close();
});
export { client };
Step 3: Write Real Integration Tests
// tests/events.integration.test.ts
import { describe, it, expect } from 'vitest';
import { client } from './setup-integration';
describe('Events ta"Diagnose and fix the top 15 ClickHouse errors \u2014 query failures,\.
ClickHouse Common Errors
Overview
Quick reference for the most common ClickHouse errors with real error codes,
diagnostic queries, and proven solutions.
Prerequisites
- Access to ClickHouse (client or HTTP interface)
- Ability to query system.* tables
Error Reference
1. Too Many Parts (Code 252)
DB::Exception: Too many parts (600). Merges are processing significantly slower than inserts.
Cause: Each INSERT creates a new data part. Hundreds of tiny inserts per second
overwhelm the merge process.
Fix:
-- Check current part count per table
SELECT database, table, count() AS part_count
FROM system.parts WHERE active GROUP BY database, table ORDER BY part_count DESC;
-- Temporary: raise the limit
ALTER TABLE events MODIFY SETTING parts_to_throw_insert = 1000;
-- Permanent: batch your inserts (10K+ rows per INSERT)
-- See clickhouse-sdk-patterns for batching code
2. Memory Limit Exceeded (Code 241)
DB::Exception: Memory limit (for query) exceeded: ... (MEMORY_LIMIT_EXCEEDED)
Cause: Query allocates more RAM than max_memory_usage (default ~10GB).
Fix:
-- Check what's consuming memory
SELECT query, memory_usage, peak_memory_usage
FROM system.processes ORDER BY peak_memory_usage DESC;
-- Option A: Increase limit for this query
SET max_memory_usage = 20000000000; -- 20GB
-- Option B: Reduce data scanned
SELECT ... FROM events
WHERE created_at >= today() - 7 -- Add time filters
LIMIT 10000; -- Cap result size
-- Option C: Enable disk spill for large sorts/GROUP BY
SET max_bytes_before_external_sort = 10000000000;
SET max_bytes_before_external_group_by = 10000000000;
3. Syntax Error (Code 62)
DB::Exception: Syntax error: ... Expected ... before ... (SYNTAX_ERROR)
Common causes and portability notes:
-- Backticks (a MySQL habit) are accepted by ClickHouse, but double quotes or bare identifiers are preferred
SELECT `user_id` FROM events;
SELECT "user_id" FROM events;
SELECT user_id FROM events;
-- MySQL-style LIMIT offset, count is accepted; the explicit OFFSET form is clearer
SELECT * FROM events LIMIT 10, 20;   -- skip 10 rows, return 20
SELECT * FROM events LIMIT 20 OFFSET 10;
-- != and <> are both supported; <> is the SQL-standard spelling
WHERE status != 'active';
WHERE status <> 'active';
4. Unknown Table (Code 60)
DB::Exception: Table default.events does not exist. (UNKNOWN_TABLE)
Fix:
-- List all tables in the database
SHOW TABLES FROM default;
-- Check all databases
SHOW DATABASES;
-- The table might be in a different database
Design ClickHouse schemas with MergeTree engines, ORDER BY keys, and ...
ClickHouse Schema Design (Core Workflow A)
Overview
Design ClickHouse tables with correct engine selection, ORDER BY keys,
partitioning, and codec choices for analytical workloads.
Prerequisites
- @clickhouse/client connected (see clickhouse-install-auth)
- Understanding of your query patterns (what you filter and group on)
Instructions
Step 1: Choose the Right Engine
| Engine | Best For | Dedup? | Example |
|---|---|---|---|
| MergeTree | General analytics, append-only logs | No | Clickstream, IoT |
| ReplacingMergeTree | Mutable rows (upserts) | Yes (on merge) | User profiles, state |
| SummingMergeTree | Pre-aggregated counters | Sums numerics | Page view counts |
| AggregatingMergeTree | Materialized view targets | Merges states | Dashboards |
| CollapsingMergeTree | Stateful row updates | Collapses +/-1 | Shopping carts |
ClickHouse Cloud uses SharedMergeTree — it is a drop-in replacement for
MergeTree on Cloud. You do not need to change your DDL.
Step 2: Design the ORDER BY (Sort Key)
The ORDER BY clause is the single most important schema decision. It defines:
- Primary index — sparse index over sort-key granules (8192 rows default)
- Data layout on disk — rows sorted physically by these columns
- Query speed — queries filtering on ORDER BY prefix columns hit fewer granules
Rules of thumb:
- Put low-cardinality filter columns first (event_type, status)
- Then high-cardinality columns you filter on (user_id, tenant_id)
- End with a time column if you use range filters (created_at)
- Do NOT put high-cardinality columns you never filter on in ORDER BY
-- Good: filter by tenant, then by time ranges
ORDER BY (tenant_id, event_type, created_at)
-- Bad: UUID first means every query scans the full index
ORDER BY (event_id, created_at) -- event_id is random UUID
Step 3: Schema Examples
Event Analytics Table
CREATE TABLE analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
tenant_id UInt32,
event_type LowCardinality(String),
user_id UInt64,
session_id String,
properties String CODEC(ZSTD(3)), -- JSON blob, compress well
url String CODEC(ZSTD(1)),
ip_address IPv4,
country ...
Insert, query, and aggregate data in ClickHouse with real SQL patterns.
ClickHouse Insert & Query (Core Workflow B)
Overview
Insert data efficiently and write analytical queries with aggregations,
window functions, and materialized views.
Prerequisites
- Tables created (see clickhouse-core-workflow-a)
- @clickhouse/client connected
Instructions
Step 1: Bulk Insert Patterns
import { createClient } from '@clickhouse/client';
const client = createClient({
url: process.env.CLICKHOUSE_HOST!,
username: process.env.CLICKHOUSE_USER ?? 'default',
password: process.env.CLICKHOUSE_PASSWORD ?? '',
});
// Insert many rows efficiently — @clickhouse/client buffers internally
await client.insert({
table: 'analytics.events',
values: events, // Array of objects matching table columns
format: 'JSONEachRow',
});
// Insert from file (CSV, Parquet, etc.)
import { createReadStream } from 'fs';
await client.insert({
table: 'analytics.events',
values: createReadStream('./data/events.csv'),
format: 'CSVWithNames',
});
Insert best practices:
- Batch rows: aim for 10K-100K rows per INSERT (not one at a time)
- ClickHouse creates a new "part" per INSERT — too many small inserts cause "too many parts"
- For real-time streams, buffer 1-5 seconds then flush (see the sketch below)
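As a concrete illustration of the batching advice above, here is a minimal sketch of a size-and-interval insert buffer. The EventRow shape, the thresholds, and the analytics.events table are assumptions for the example, not part of the pack.
// insert-buffer.ts -- hypothetical helper: flush when the buffer is full OR every flushMs
import { createClient } from '@clickhouse/client';

const client = createClient({ url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123' });

type EventRow = { event_type: string; user_id: number; payload: string };

const buffer: EventRow[] = [];
const maxRows = 10_000;   // size-based flush threshold
const flushMs = 2_000;    // time-based flush interval

async function flush(): Promise<void> {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length); // take everything, reset the buffer
  await client.insert({ table: 'analytics.events', values: batch, format: 'JSONEachRow' });
}

export function enqueue(row: EventRow): void {
  buffer.push(row);
  if (buffer.length >= maxRows) void flush();
}

setInterval(() => void flush(), flushMs);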
Step 2: Analytical Queries
-- Top events by tenant in the last 7 days
SELECT
tenant_id,
event_type,
count() AS event_count,
uniqExact(user_id) AS unique_users,
min(created_at) AS first_seen,
max(created_at) AS last_seen
FROM analytics.events
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY tenant_id, event_type
ORDER BY event_count DESC
LIMIT 100;
-- Funnel analysis: signup → activation → purchase
SELECT
level,
count() AS users
FROM (
SELECT
user_id,
groupArray(event_type) AS journey
FROM analytics.events
WHERE event_type IN ('signup', 'activation', 'purchase')
AND created_at >= today() - 30
GROUP BY user_id
)
ARRAY JOIN arrayEnumerate(journey) AS level
GROUP BY level
ORDER BY level;
-- Retention: users active this week who were also active last week
SELECT
count(DISTINCT curr.user_id) AS retained_users
FROM analytics.events AS curr
INNER JOIN analytics.events AS prev
ON curr.user_id = prev.user_id
WHERE curr.created_at >= toMonday(today())
AND prev.created_at >= toMonday(today()) - 7
AND prev.created_at < toMonday(today());
Step 3: Parameterized Queries in Node.js
// Use {param:Type} syntax for safe parameterized queries
const rs = await client.query({"Optimize ClickHouse Cloud costs \u2014 compute scaling, storage tiering,\.
ClickHouse Cost Tuning
Overview
Reduce ClickHouse Cloud costs through storage optimization, compression tuning,
TTL policies, compute scaling, and query efficiency improvements.
Prerequisites
- ClickHouse Cloud account with billing access
- Understanding of current data volumes and query patterns
Instructions
Step 1: Understand ClickHouse Cloud Pricing
| Component | Pricing Model | Key Driver |
|---|---|---|
| Compute | Per-hour per replica | vCPU + memory tier |
| Storage | Per GB-month | Compressed data on disk |
| Network | Per GB egress | Query result sizes |
| Backups | Per GB stored | Backup retention |
Key insight: ClickHouse bills on compressed storage, and ClickHouse
compresses extremely well (often 10-20x). Your cost driver is usually compute,
not storage.
Step 2: Analyze Storage Usage
-- Storage cost breakdown by table
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS raw_size,
round(sum(data_uncompressed_bytes) / sum(bytes_on_disk), 1) AS compression_ratio,
sum(rows) AS total_rows,
count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
-- Storage by column (find bloated columns)
SELECT
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS raw,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics'
GROUP BY table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 30;
Step 3: Improve Compression
-- Check current codec per column
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
-- Apply better codecs to large columns
ALTER TABLE analytics.events
MODIFY COLUMN properties String CODEC(ZSTD(3)); -- JSON blobs
ALTER TABLE analytics.events
MODIFY COLUMN created_at DateTime CODEC(DoubleDelta, ZSTD); -- Timestamps
ALTER TABLE analytics.events
MODIFY COLUMN user_id UInt64 CODEC(Delta, ZSTD); -- Sequential IDs
-- Verify improvement after next merge
OPTIMIZE TABLE analytics.events FINAL;
-- Check new compression ratio
SELECT
column,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND datab"Handle data lifecycle in ClickHouse \u2014 TTL expiration, data deletion\.
ClickHouse Data Handling
Overview
Manage the full data lifecycle in ClickHouse: TTL-based expiration, GDPR/CCPA
deletion, data masking, partition management, and audit trails.
Prerequisites
- ClickHouse tables with data (see clickhouse-core-workflow-a)
- Understanding of your data retention requirements
Instructions
Step 1: TTL-Based Data Expiration
-- Add TTL to expire data automatically
CREATE TABLE analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
event_type LowCardinality(String),
user_id UInt64,
properties String CODEC(ZSTD(3)),
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY; -- Auto-delete after 90 days
-- Add TTL to existing table
ALTER TABLE analytics.events
MODIFY TTL created_at + INTERVAL 90 DAY;
-- Tiered storage TTL (hot → cold → delete)
ALTER TABLE analytics.events
MODIFY TTL
created_at + INTERVAL 7 DAY TO VOLUME 'hot',
created_at + INTERVAL 30 DAY TO VOLUME 'cold',
created_at + INTERVAL 365 DAY DELETE;
-- Column-level TTL (null out PII after 30 days, keep the row)
ALTER TABLE analytics.events
MODIFY COLUMN email String DEFAULT ''
TTL created_at + INTERVAL 30 DAY;
-- Force TTL cleanup now (normally runs during merges)
OPTIMIZE TABLE analytics.events FINAL;
Step 2: Data Deletion for GDPR/CCPA
-- Option A: Lightweight DELETE (ClickHouse 23.3+)
-- Marks rows as deleted without rewriting parts immediately
DELETE FROM analytics.events WHERE user_id = 42;
-- Option B: ALTER TABLE DELETE (mutation — rewrites parts in background)
ALTER TABLE analytics.events DELETE WHERE user_id = 42;
-- Check mutation progress
SELECT
database, table, mutation_id, command,
is_done, parts_to_do, create_time
FROM system.mutations
WHERE NOT is_done
ORDER BY create_time DESC;
-- Option C: Drop entire partitions (fastest for bulk deletion)
-- First, check what partitions exist
SELECT partition, count() AS parts, sum(rows) AS rows,
min(min_time) AS from_time, max(max_time) AS to_time
FROM system.parts
WHERE database = 'analytics' AND table = 'events' AND active
GROUP BY partition ORDER BY partition;
ALTER TABLE analytics.events DROP PARTITION '202401';
Important notes on ClickHouse deletions:
- DELETE FROM is lightweight but still creates mutations internally
- Mutations rewrite data parts in the background — not instant
- For GDPR compliance, use ALTER TABLE DELETE and verify via system.mutations
- Partitioned data is fastest to bulk-delete via DROP PARTITION
Step 3: Data Masking and Anonymization
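The preview of this step is cut off. As a rough sketch of one common approach (an assumption, not necessarily the skill's exact content, and it presumes an email column as in the column-TTL example above):
-- Mask PII in place with a mutation (irreversible)
ALTER TABLE analytics.events
UPDATE email = concat(substring(hex(sipHash64(email)), 1, 12), '@masked.invalid')
WHERE email != '';

-- Or keep the raw table restricted and expose a masked view to analysts
CREATE VIEW analytics.events_masked AS
SELECT
    event_id,
    event_type,
    user_id,
    if(email != '', concat(substring(email, 1, 2), '***'), '') AS email_masked,
    created_at
FROM analytics.events;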
"Collect ClickHouse diagnostic data \u2014 system tables, query logs,\.
ClickHouse Debug Bundle
Overview
Collect comprehensive diagnostic data from ClickHouse system tables for
troubleshooting performance issues, merge problems, or support escalation.
Prerequisites
- Access to ClickHouse with system.* table read permissions
- curl or clickhouse-client available
Instructions
Step 1: Server Health Overview
-- Server version and uptime
SELECT
version() AS version,
uptime() AS uptime_seconds,
formatReadableTimeDelta(uptime()) AS uptime_human,
currentDatabase() AS current_db;
-- Global metrics snapshot
SELECT metric, value, description
FROM system.metrics
WHERE metric IN (
'Query', 'Merge', 'PartMutation', 'ReplicatedFetch',
'TCPConnection', 'HTTPConnection', 'MemoryTracking',
'BackgroundMergesAndMutationsPoolTask'
);
Step 2: Disk and Table Health
-- Disk usage by table (top 20)
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS disk_size,
sum(rows) AS total_rows,
count() AS active_parts,
max(modification_time) AS last_modified
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC
LIMIT 20;
-- Tables with too many parts (merge pressure)
SELECT database, table, count() AS parts
FROM system.parts WHERE active
GROUP BY database, table
HAVING parts > 100
ORDER BY parts DESC;
-- Disk space per disk
SELECT
name,
path,
formatReadableSize(total_space) AS total,
formatReadableSize(free_space) AS free,
round(free_space / total_space * 100, 1) AS free_pct
FROM system.disks;
Step 3: Query Performance Analysis
-- Slowest queries in the last 24 hours
SELECT
event_time,
query_duration_ms,
read_rows,
read_bytes,
result_rows,
memory_usage,
substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - INTERVAL 24 HOUR
ORDER BY query_duration_ms DESC
LIMIT 20;
-- Failed queries (last 24h)
SELECT
event_time,
exception_code,
exception,
substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time >= now() - INTERVAL 24 HOUR
ORDER BY event_time DESC
LIMIT 20;
-- Query patterns (group by normalized query)
SELECT
normalized_query_hash,
count() AS executions,
avg(query_duration_ms) AS avg_ms,
max(query_duration_ms) AS max_ms,
sum(read_rows) AS total_rows_read,
formatReadableSize(sum(read_bytes)) AS ...
Deploy ClickHouse-backed applications to Vercel, Fly.io, and ...
ClickHouse Deploy Integration
Overview
Deploy applications that connect to ClickHouse Cloud from serverless and
container platforms with proper connection management and secrets handling.
Prerequisites
- ClickHouse Cloud instance (or self-hosted with public endpoint)
- Platform CLI installed (vercel, fly, or gcloud)
- Application tested locally against ClickHouse
Instructions
Step 1: ClickHouse Connection Module (Platform-Agnostic)
// src/db.ts — singleton for serverless-safe connections
import { createClient, ClickHouseClient } from '@clickhouse/client';
let client: ClickHouseClient | null = null;
export function getClickHouse(): ClickHouseClient {
if (!client) {
client = createClient({
url: process.env.CLICKHOUSE_HOST!, // https://<host>:8443
username: process.env.CLICKHOUSE_USER!,
password: process.env.CLICKHOUSE_PASSWORD!,
database: process.env.CLICKHOUSE_DATABASE ?? 'default',
request_timeout: 30_000,
max_open_connections: 5, // Low for serverless (many cold starts)
compression: {
request: true, // Saves egress bandwidth
response: true,
},
});
}
return client;
}
Step 2: Vercel (Serverless Functions)
# Set secrets
vercel env add CLICKHOUSE_HOST production
vercel env add CLICKHOUSE_USER production
vercel env add CLICKHOUSE_PASSWORD production
vercel env add CLICKHOUSE_DATABASE production
// api/events/route.ts (Next.js App Router)
import { getClickHouse } from '@/src/db';
import { NextResponse } from 'next/server';
export async function GET(request: Request) {
const { searchParams } = new URL(request.url);
const days = Number(searchParams.get('days') ?? 7);
const client = getClickHouse();
const rs = await client.query({
query: `
SELECT event_type, count() AS cnt
FROM events
WHERE created_at >= now() - INTERVAL {days:UInt32} DAY
GROUP BY event_type ORDER BY cnt DESC
`,
query_params: { days },
format: 'JSONEachRow',
});
return NextResponse.json(await rs.json());
}
Vercel gotchas:
- Serverless function timeout: 30s (Pro) / 10s (Hobby)
- Each invocation may create a new connection — set max_open_connections low
- Use Edge Runtime only with HTTP-based clients (the ClickHouse client works fine)
Step 3: Fly.io (Containers)
# fly.toml
app = "my-clickhouse-app"
primary_region = "iad"
[env]
NODE_ENV = "production"
CLICKHOUSE_DATABASE = "analytics"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines"Configure ClickHouse enterprise RBAC \u2014 SQL-based users, roles,\.
ClickHouse Enterprise RBAC
Overview
Implement enterprise-grade role-based access control in ClickHouse using SQL-based
user management, hierarchical roles, row-level policies, and quotas.
Prerequisites
- ClickHouse with access_management = 1 enabled (default in Cloud)
- Admin user with GRANT OPTION
Instructions
Step 1: Create Users with Authentication
-- SHA256 password (standard)
CREATE USER app_backend
IDENTIFIED WITH sha256_password BY 'strong-password-here'
DEFAULT DATABASE analytics
HOST IP '10.0.0.0/8' -- Restrict to VPC
SETTINGS max_memory_usage = 10000000000, -- 10GB per query
max_execution_time = 60; -- 60s timeout
-- Double SHA1 (MySQL wire protocol compatible)
CREATE USER legacy_app
IDENTIFIED WITH double_sha1_password BY 'password'
DEFAULT DATABASE analytics;
-- bcrypt (strongest, slowest — use for admin accounts)
CREATE USER admin_user
IDENTIFIED WITH bcrypt_password BY 'admin-password';
-- Verify user was created
SHOW CREATE USER app_backend;
SELECT name, host_ip, default_database FROM system.users;
Step 2: Create Role Hierarchy
-- Base roles (leaf-level permissions)
CREATE ROLE data_reader;
GRANT SELECT ON analytics.* TO data_reader;
CREATE ROLE data_writer;
GRANT INSERT ON analytics.* TO data_writer;
CREATE ROLE schema_manager;
GRANT CREATE TABLE, ALTER TABLE, DROP TABLE ON analytics.* TO schema_manager;
-- Composite roles (inherit from base roles)
CREATE ROLE analyst;
GRANT data_reader TO analyst;
-- Analysts can also create temporary tables for ad-hoc work
GRANT CREATE TEMPORARY TABLE ON *.* TO analyst;
CREATE ROLE developer;
GRANT data_reader, data_writer TO developer;
CREATE ROLE platform_admin;
GRANT data_reader, data_writer, schema_manager TO platform_admin;
GRANT SYSTEM RELOAD, SYSTEM FLUSH LOGS ON *.* TO platform_admin;
-- Assign roles to users
GRANT analyst TO app_backend; -- Read-only
GRANT developer TO app_backend; -- Read + write
GRANT platform_admin TO admin_user; -- Full access
-- Set default role (active when user connects)
SET DEFAULT ROLE developer TO app_backend;
-- Verify the full permission chain
SHOW GRANTS FOR app_backend;
SHOW ACCESS; -- All users, roles, policies
Step 3: Row-Level Security
-- Multi-tenant isolation: each user sees only their tenant's data
CREATE USER tenant_acme
IDENTIFIED WITH sha256_password BY 'pass'
DEFAULT DATABASE analytics;
CREATE USER tenant_globex
IDENTIFIED WITH sha256_password BY 'pass'
DEFAULT DATABASE analytics;
-- Row policy: restrict by tenant_id
CREATE ROW POLICY acme_isolation ON analytics.events
FOR SELECT
USING tenant_id = 1
TO tenant_acme;
CREATE ROW POLICY globex_i...
Create your first ClickHouse table, insert data, and run analytical queries.
ClickHouse Hello World
Overview
Create a MergeTree table, insert rows with JSONEachRow, and run your first
analytical query -- all using the official @clickhouse/client.
Prerequisites
- @clickhouse/client installed and connected (see clickhouse-install-auth)
Instructions
Step 1: Create a MergeTree Table
import { createClient } from '@clickhouse/client';
const client = createClient({
url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123',
username: process.env.CLICKHOUSE_USER ?? 'default',
password: process.env.CLICKHOUSE_PASSWORD ?? '',
});
await client.command({
query: `
CREATE TABLE IF NOT EXISTS events (
event_id UUID DEFAULT generateUUIDv4(),
event_type LowCardinality(String),
user_id UInt64,
payload String,
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY
`,
});
console.log('Table "events" created.');
Key concepts:
- MergeTree() -- the foundational ClickHouse engine for analytics
- ORDER BY -- defines the primary index (sort key); pick columns you filter/group on
- PARTITION BY -- splits data into parts by month for efficient pruning
- TTL -- automatic data expiration
- LowCardinality(String) -- dictionary-encoded string, ideal for columns with < 10K distinct values
Step 2: Insert Data with JSONEachRow
await client.insert({
table: 'events',
values: [
{ event_type: 'page_view', user_id: 1001, payload: '{"url":"/home"}' },
{ event_type: 'click', user_id: 1001, payload: '{"button":"signup"}' },
{ event_type: 'page_view', user_id: 1002, payload: '{"url":"/pricing"}' },
{ event_type: 'purchase', user_id: 1002, payload: '{"amount":49.99}' },
{ event_type: 'page_view', user_id: 1003, payload: '{"url":"/docs"}' },
],
format: 'JSONEachRow',
});
console.log('Inserted 5 events.');
Step 3: Query the Data
// Count events by type
const rs = await client.query({
query: `
SELECT
event_type,
count() AS total,
uniqExact(user_id) AS unique_users
FROM events
GROUP BY event_type
ORDER BY total DESC
`,
format: 'JSONEachRow',
});
const rows = await rs.json<{
event_type: string;
total: string; // ClickHouse returns numbers as strings in JS ...
ClickHouse incident response - triage, diagnose, and remediate ...
ClickHouse Incident Runbook
Overview
Step-by-step procedures for triaging and resolving ClickHouse incidents
using built-in system tables and SQL commands.
Severity Levels
| Level | Definition | Response | Examples |
|---|---|---|---|
| P1 | ClickHouse unreachable / all queries failing | < 15 min | Server down, OOM, disk full |
| P2 | Degraded performance / partial failures | < 1 hour | Slow queries, merge backlog |
| P3 | Minor impact / non-critical errors | < 4 hours | Single table issue, warnings |
| P4 | No user impact | Next business day | Monitoring gaps, optimization |
Quick Triage (Run First)
# 1. Is ClickHouse alive?
curl -sf 'http://localhost:8123/ping' && echo "UP" || echo "DOWN"
# 2. Can it answer a query?
curl -sf 'http://localhost:8123/?query=SELECT+1' && echo "OK" || echo "QUERY FAILED"
# 3. Check ClickHouse Cloud status
curl -sf 'https://status.clickhouse.cloud' | head -5
-- 4. Server health snapshot (run if server responds)
SELECT
version() AS version,
formatReadableTimeDelta(uptime()) AS uptime,
(SELECT count() FROM system.processes) AS running_queries,
(SELECT value FROM system.metrics WHERE metric = 'MemoryTracking')
AS memory_bytes,
(SELECT count() FROM system.merges) AS active_merges;
-- 5. Recent errors
SELECT event_time, exception_code, exception, substring(query, 1, 200) AS q
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time >= now() - INTERVAL 10 MINUTE
ORDER BY event_time DESC
LIMIT 10;
Decision Tree
Server responds to ping?
├─ NO → Check process/container status, disk space, OOM killer logs
│ └─ Container/process dead → Restart, check logs
│ └─ Disk full → Emergency: drop old partitions, expand disk
│ └─ OOM killed → Reduce max_memory_usage, add RAM
└─ YES → Queries succeeding?
├─ NO → Check error codes below
│ └─ Auth errors (516) → Verify credentials, check user exists
│ └─ Too many queries (202) → Kill stuck queries, reduce concurrency
│ └─ Memory exceeded (241) → Kill large queries, reduce max_threads
└─ YES but slow → Performance triage below
Remediation Procedures
P1: Server Down / OOM
# Check if process was OOM-killed
dmesg | grep -i "out of memory" | tail -5
journalctl -u clickhouse-server --since "10 minutes ago" | tail -20
# Restart
sudo systemctl restart clickhouse-server
# or for Docker:
docker restart clickhouse
# Verify recovery
curl ...
Install @clickhouse/client and configure authentication to ClickHouse.
ClickHouse Install & Auth
Overview
Set up the official ClickHouse client for Node.js or Python and configure authentication
to ClickHouse Cloud or a self-hosted instance.
Prerequisites
- Node.js 18+ or Python 3.8+
- A running ClickHouse instance (Cloud or self-hosted)
- Connection credentials (host, port, user, password)
Instructions
Step 1: Install the Official Client
# Node.js — official client (HTTP-based, supports streaming)
npm install @clickhouse/client
# Python — official client
pip install clickhouse-connect
Step 2: Configure Environment Variables
# .env (NEVER commit — add to .gitignore)
CLICKHOUSE_HOST=https://abc123.us-east-1.aws.clickhouse.cloud:8443
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=your-password-here
# Self-hosted (HTTP interface on port 8123, native on 9000)
# CLICKHOUSE_HOST=http://localhost:8123
Step 3: Create the Client (Node.js)
import { createClient } from '@clickhouse/client';
// ClickHouse Cloud
const client = createClient({
url: process.env.CLICKHOUSE_HOST, // https://<host>:8443
username: process.env.CLICKHOUSE_USER, // default
password: process.env.CLICKHOUSE_PASSWORD,
// ClickHouse Cloud requires TLS — the client handles it via https:// URL
});
// Self-hosted (no TLS)
const localClient = createClient({
url: 'http://localhost:8123',
username: 'default',
password: '',
});
Step 4: Verify Connection
async function verifyConnection() {
// Ping returns true if the server is reachable
const alive = await client.ping();
console.log('ClickHouse ping:', alive.success); // true
// Run a test query
const rs = await client.query({
query: 'SELECT version() AS ver, uptime() AS uptime_sec',
format: 'JSONEachRow',
});
const rows = await rs.json<{ ver: string; uptime_sec: number }>();
console.log('Server version:', rows[0].ver);
console.log('Uptime (sec):', rows[0].uptime_sec);
}
verifyConnection().catch(console.error);
Step 5: Python Alternative
import clickhouse_connect
client = clickhouse_connect.get_client(
host='abc123.us-east-1.aws.clickhouse.cloud',
port=8443,
username='default',
password='your-password-here',
secure=True,
)
result = client.query('SELECT version(), uptime()')
print(f"Version: {result.result_rows[0][0]}")
Connection Options Reference
| Option | Default | Description |
|---|---|---|
| url | http://localhost:8123 | Full URL including protocol and port |
Run ClickHouse locally with Docker, configure test fixtures, and iterate.
Tools: Read, Write, Edit, Bash(npm:*), Bash(docker:*), Bash(docker-compose:*)
ClickHouse Local Dev Loop
Overview
Run ClickHouse in Docker for local development with fast schema iteration, seed data, and integration testing using vitest.
Prerequisites
Instructions
Step 1: Docker Compose Setup
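The compose file itself is not shown in the preview; a minimal sketch of what such a setup typically looks like (image tag, database name, and volume paths are assumptions):
# docker-compose.yml -- local ClickHouse for development (sketch)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol
    environment:
      CLICKHOUSE_DB: analytics
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""
    volumes:
      - ./init-db:/docker-entrypoint-initdb.d   # SQL files run on first start
      - clickhouse-data:/var/lib/clickhouse
volumes:
  clickhouse-data: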
Step 2: Init Script (Auto-Run on First Start)
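Also omitted from the preview: the official clickhouse/clickhouse-server image executes any *.sql files mounted under /docker-entrypoint-initdb.d on first start, so a sketch of an init file could be:
-- init-db/01_schema.sql (sketch; database and table names are illustrative)
CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.events (
    event_id UUID DEFAULT generateUUIDv4(),
    event_type LowCardinality(String),
    user_id UInt64,
    payload String,
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at);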
Step 3: Seed Data Script
Step 4: Project Structure"Execute ClickHouse schema migrations \u2014 ALTER TABLE operations,\.
Tools: Read, Write, Edit, Bash(npm:*), Bash(node:*), Bash(kubectl:*)
ClickHouse Migration Deep Dive
Overview
Plan and execute ClickHouse schema migrations: column changes, engine migrations, ORDER BY modifications, and versioned migration runners.
Prerequisites
Instructions
Step 1: Understanding ClickHouse DDL
ClickHouse ALTER operations are mutations — they run asynchronously and rewrite data parts in the background. This is fundamentally different from PostgreSQL/MySQL, where ALTER is often instant or blocking.
Step 2: Column Operations
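The examples are omitted in the preview; typical column operations look roughly like this (the analytics.events table is assumed from earlier skills):
-- Add a column (metadata-only, effectively instant)
ALTER TABLE analytics.events ADD COLUMN IF NOT EXISTS referrer String DEFAULT '';

-- Change a column's codec or type (a mutation: parts are rewritten in the background)
ALTER TABLE analytics.events MODIFY COLUMN payload String CODEC(ZSTD(3));

-- Drop a column
ALTER TABLE analytics.events DROP COLUMN IF EXISTS referrer;

-- Watch mutation progress
SELECT mutation_id, command, is_done, parts_to_do
FROM system.mutations
WHERE table = 'events' AND NOT is_done;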
Step 3: Change ORDER BY (Requires Table Recreation)
ClickHouse does not support changing the ORDER BY key of existing data in place; create a new table with the desired key and migrate the data.
Configure ClickHouse across dev, staging, and production with environment-specific ...
Tools: Read, Write, Edit, Bash(aws:*), Bash(gcloud:*), Bash(vault:*)
ClickHouse Multi-Environment Setup
Overview
Configure separate ClickHouse instances for development, staging, and production with proper secrets management, environment detection, and infrastructure-as-code.
Prerequisites
Instructions
Step 1: Environment Strategy
Step 2: Configuration Module
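The module body is not shown in the preview; a minimal sketch of an environment-aware configuration module (environment names and variables are assumptions):
// src/config.ts -- pick ClickHouse settings per environment (sketch)
type Env = 'development' | 'staging' | 'production';

const env = (process.env.APP_ENV ?? 'development') as Env;

export const clickhouseConfig = {
  development: {
    url: 'http://localhost:8123',
    database: 'analytics_dev',
  },
  staging: {
    url: process.env.CLICKHOUSE_STAGING_HOST ?? '',
    database: 'analytics_staging',
  },
  production: {
    url: process.env.CLICKHOUSE_HOST ?? '',
    database: 'analytics',
  },
}[env];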
Step 3: Client Factory
Monitor ClickHouse with Prometheus metrics, Grafana dashboards, system tables, and alerting.
Tools: Read, Write, Edit
ClickHouse Observability
Overview
Set up comprehensive monitoring for ClickHouse using built-in system tables, Prometheus integration, Grafana dashboards, and alerting rules.
Prerequisites
Instructions
Step 1: Key Metrics from System Tables
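The queries are omitted in the preview; the usual starting points are system.metrics, system.asynchronous_metrics, and system.events (a sketch, not necessarily the skill's exact queries):
-- Point-in-time gauges (running queries, connections, tracked memory)
SELECT metric, value FROM system.metrics
WHERE metric IN ('Query', 'TCPConnection', 'HTTPConnection', 'MemoryTracking');

-- Slower-moving gauges (uptime, disk usage)
SELECT metric, value FROM system.asynchronous_metrics
WHERE metric = 'Uptime' OR metric LIKE '%Disk%';

-- Monotonic counters since server start
SELECT event, value FROM system.events
WHERE event IN ('Query', 'FailedQuery', 'InsertedRows', 'MergedRows');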
Step 2: Prometheus Integration
ClickHouse Cloud exposes a managed Prometheus endpoint:
Self-hosted — use clickhouse-exporter or the built-in metrics endpoint: ...
Optimize ClickHouse query performance with indexing, projections, settings ...
Tools: Read, Write, Edit
ClickHouse Performance Tuning
Overview
Diagnose and fix ClickHouse performance issues using query analysis, proper indexing, projections, materialized views, and server settings tuning.
Prerequisites
Instructions
Step 1: Diagnose Slow Queries
Step 2: ORDER BY Key Optimization
The ORDER BY key is ClickHouse's primary performance lever. Queries that filter on the ORDER BY prefix skip entire granules (8192-row chunks).
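One quick way to verify the claim above: EXPLAIN with indexes = 1 reports how many parts and granules survive the primary-key filter (the analytics.events table and filter columns are assumptions):
EXPLAIN indexes = 1
SELECT count()
FROM analytics.events
WHERE tenant_id = 42
  AND created_at >= now() - INTERVAL 7 DAY;
-- In the output, look at the PrimaryKey section: "Parts: x/y" and "Granules: x/y"
-- show how much data the ORDER BY prefix filter allowed ClickHouse to skip.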
Step 3: Data Skipping Indexes"Production readiness checklist for ClickHouse \u2014 server tuning,\.
Tools: Read, Bash(kubectl:*), Bash(curl:*), Grep
ClickHouse Production Checklist
Overview
Comprehensive go-live checklist for ClickHouse covering server tuning, schema design, backup configuration, monitoring, and operational readiness.
Prerequisites
Checklist
1. Schema & Engine Design
2. Server Configuration (Self-Hosted)
3. Backup Configuration
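The checklist details are not in the preview. For self-hosted servers, the native BACKUP command is the usual starting point; a sketch with placeholder bucket and credentials:
-- Back up one table to S3 (bucket URL and credentials are placeholders)
BACKUP TABLE analytics.events
TO S3('https://my-bucket.s3.amazonaws.com/clickhouse-backups/events', 'ACCESS_KEY_ID', 'SECRET_KEY');

-- Restore it under a different name
RESTORE TABLE analytics.events AS analytics.events_restored
FROM S3('https://my-bucket.s3.amazonaws.com/clickhouse-backups/events', 'ACCESS_KEY_ID', 'SECRET_KEY');

-- Check backup/restore status
SELECT id, name, status, error FROM system.backups ORDER BY start_time DESC LIMIT 5;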
ClickHouse Cloud: Backups are automatic. Configure retention and frequency in the Cloud console under Service Settings.
4. Monitoring & Ale'Configure ClickHouse query concurrency, memory quotas, and connection.
Tools: Read, Write, Edit
ClickHouse Rate Limits & ConcurrencyOverviewClickHouse does not have REST API rate limits like a SaaS product. Instead, it has server-side concurrency limits, memory quotas, and per-user settings that control resource usage. This skill covers how to configure and work within those limits. Prerequisites
Instructions
Step 1: Understand Server-Side Limits
ClickHouse Cloud API limit: The Cloud management API (not the query interface) is limited to 10 requests per 10 seconds.
Step 2: Configure Per-User Quotas
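The SQL is cut off in the preview; per-user quotas are created with CREATE QUOTA (the user and limit values below are illustrative):
-- Cap what a user can do per hour
CREATE QUOTA IF NOT EXISTS api_quota
FOR INTERVAL 1 HOUR
    MAX queries = 10000, read_rows = 1000000000, execution_time = 3600
TO app_backend;

-- Check quota consumption
SELECT quota_name, queries, max_queries, read_rows, max_read_rows
FROM system.quotas_usage;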
Step 3: Client-Side Connection Pooling
Step 4: Application-Level Concurrency Control
Production reference architecture for ClickHouse-backed applications.
Tools: Read, Grep
ClickHouse Reference Architecture
Overview
Production-grade architecture for ClickHouse analytics platforms covering project layout, data flow, multi-tenancy, and operational patterns.
Prerequisites
Instructions
Step 1: Project Structure
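The layout itself is not shown in the preview; a plausible shape for such a project (directory names are assumptions, not the pack's prescribed structure):
src/
  db/            # createClient factory, typed query helpers
  ingest/        # webhook + Kafka consumers, insert buffering
  queries/       # parameterized analytical queries
  api/           # HTTP handlers that call into queries/
migrations/      # versioned .sql files, applied in order
init-db/         # local dev schema + seed data
tests/
  integration/   # run against a real ClickHouse (Docker or CI service)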
Step 2: Data Flow Architecture"Production-ready patterns for @clickhouse/client \u2014 streaming inserts,\.
Tools: Read, Write, Edit
ClickHouse SDK Patterns
Overview
Production patterns for streaming inserts, error handling, and connection lifecycle management.
Prerequisites
Instructions
Pattern 1: Typed Query Helper
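The helper's code is not shown in the preview; a minimal sketch of the idea (the wrapper name and generic signature are assumptions):
// A small typed wrapper around client.query (sketch)
import { createClient } from '@clickhouse/client';

const client = createClient({ url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123' });

export async function queryRows<T>(
  query: string,
  params: Record<string, unknown> = {},
): Promise<T[]> {
  const rs = await client.query({ query, query_params: params, format: 'JSONEachRow' });
  return rs.json<T>();
}

// Usage: the row type is checked at the call site
type EventCount = { event_type: string; cnt: string }; // count() (UInt64) comes back as a string
async function example() {
  const rows = await queryRows<EventCount>(
    'SELECT event_type, count() AS cnt FROM events GROUP BY event_type',
  );
  console.log(rows);
}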
Note on parameterized queries: ClickHouse uses {name:Type} placeholders, not positional ? parameters.
Pattern 2: Streaming Insert (Backpressure-Safe)
Pattern 3: Batch Insert with Retry
Secure ClickHouse with user management, network restrictions, TLS, and query audit logging.
Tools: Read, Write, Grep
ClickHouse Security Basics
Overview
Secure a ClickHouse deployment with SQL-based user management, network restrictions, TLS encryption, and query audit logging.
Prerequisites
Instructions
Step 1: Create Restricted Users (SQL-Based RBAC)
Step 2: Use Roles for Permission Groups
Step 3: Row-Level Security
Step 4: Network Security
Upgrade ClickHouse server versions and the @clickhouse/client SDK safely.
Tools: Read, Write, Edit, Bash(npm:*), Bash(git:*)
ClickHouse Upgrade & MigrationOverviewSafely upgrade ClickHouse server and the with rollback procedures and breaking change detection. Prerequisites
Instructions
Step 1: Check Current Versions
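The commands are omitted in the preview; checking both sides is straightforward (a sketch, assuming a local HTTP endpoint):
# Server version over the HTTP interface
curl -s 'http://localhost:8123/?query=SELECT%20version()'

# Installed @clickhouse/client version in the project
npm ls @clickhouse/client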
Step 2: Review Changelog
Key breaking changes to watch for:
Step 3: Upgrade the Node.js Client
Common migration patterns:
Step 4: Upgrade ClickHouse Server
ClickHouse Cloud: Upgrades happen automatically. Check release notes in the Cloud console.
Self-hosted upgrade procedure: ...
Ingest data into ClickHouse from webhooks, Kafka, and streaming sources.
Tools: Read, Write, Edit, Bash(curl:*)
ClickHouse Data Ingestion
Overview
Build data ingestion pipelines into ClickHouse from HTTP webhooks, Kafka, and streaming sources with proper batching, deduplication, and error handling.
Prerequisites
Instructions
Step 1: Webhook Receiver with Batched Inserts
Step 2: Kafka Table Engine (Server-Side Ingestion)
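The preview ends before the SQL; the server-side pattern is a Kafka engine table plus a materialized view that moves consumed rows into a MergeTree target (broker, topic, and column names below are placeholders):
-- 1. Kafka engine table: a streaming source over the topic
CREATE TABLE analytics.events_kafka (
    event_type String,
    user_id UInt64,
    payload String,
    created_at DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow';

-- 2. Materialized view: continuously copies consumed rows into the MergeTree table
CREATE MATERIALIZED VIEW analytics.events_kafka_mv TO analytics.events AS
SELECT event_type, user_id, payload, created_at
FROM analytics.events_kafka;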
Ready to use clickhouse-pack?
Related Plugins
- supabase-pack: Complete Supabase integration skill pack with 30 skills covering authentication, database, storage, realtime, edge functions, and production operations. Flagship+ tier vendor pack.
- vercel-pack: Complete Vercel integration skill pack with 30 skills covering deployments, edge functions, preview environments, performance optimization, and production operations. Flagship+ tier vendor pack.
- clay-pack: Complete Clay integration skill pack with 30 skills covering data enrichment, waterfall workflows, AI agents, and GTM automation. Flagship+ tier vendor pack.
- cursor-pack: Complete Cursor integration skill pack with 30 skills covering AI code editing, composer workflows, codebase indexing, and productivity features. Flagship+ tier vendor pack.
- exa-pack: Complete Exa integration skill pack with 30 skills covering neural search, semantic retrieval, web search API, and AI-powered discovery. Flagship+ tier vendor pack.
- firecrawl-pack: Complete Firecrawl integration skill pack with 30 skills covering web scraping, crawling, markdown conversion, and LLM-ready data extraction. Flagship+ tier vendor pack.
Tags
clickhouse, saas, sdk, integration