Claude Code skill pack for ClickHouse (24 skills)
Installation
Open Claude Code and run this command:
/plugin install clickhouse-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for current project only.
What It Does
> 24 skills for building, operating, and scaling ClickHouse-powered analytics — real @clickhouse/client code, real SQL, real MergeTree engines.
Every skill uses the official ClickHouse Node.js client (@clickhouse/client with createClient), actual ClickHouse SQL syntax (MergeTree, ReplacingMergeTree, AggregatingMergeTree), real system tables (system.parts, system.query_log, system.merges), and production patterns (parameterized queries, streaming inserts, materialized views).
Links: tonsofskills.com | ClickHouse Docs | @clickhouse/client
Skills (24)
Run ClickHouse integration tests in CI with GitHub Actions and Docker.
ClickHouse CI Integration
Overview
Run integration tests against a real ClickHouse server in GitHub Actions using
Docker service containers. No mocks needed for schema and query validation.
Prerequisites
- GitHub repository with Actions enabled
- @clickhouse/client in project dependencies
- Test suite (vitest or jest)
Instructions
Step 1: GitHub Actions Workflow with ClickHouse Service
# .github/workflows/clickhouse-tests.yml
name: ClickHouse Integration Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      clickhouse:
        image: clickhouse/clickhouse-server:latest
        ports:
          - 8123:8123
          - 9000:9000
        options: >-
          --health-cmd "wget --no-verbose --tries=1 --spider http://localhost:8123/ping || exit 1"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    env:
      CLICKHOUSE_HOST: http://localhost:8123
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      # Apply schema before tests
      - name: Apply schema
        run: |
          curl -s 'http://localhost:8123/' -d 'CREATE DATABASE IF NOT EXISTS test_db'
          for f in init-db/*.sql; do
            echo "Applying $f..."
            curl -s 'http://localhost:8123/?database=test_db' --data-binary @"$f"
          done
      - name: Run unit tests
        run: npm test -- --coverage
      - name: Run integration tests
        run: npm run test:integration
Step 2: Integration Test Setup
// tests/setup-integration.ts
import { createClient, ClickHouseClient } from '@clickhouse/client';
import { beforeAll, afterAll, beforeEach } from 'vitest';
let client: ClickHouseClient;
beforeAll(async () => {
client = createClient({
url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123',
database: 'test_db',
});
// Verify connection
const { success } = await client.ping();
if (!success) throw new Error('ClickHouse not reachable');
});
beforeEach(async () => {
// Clean test data between tests
await client.command({ query: 'TRUNCATE TABLE IF EXISTS test_db.events' });
});
afterAll(async () => {
await client.close();
});
export { client };
Step 3: Write Real Integration Tests
// tests/events.integration.test.ts
import { describe, it, expect } from 'vitest';
import { client } from './setup-integration';
describe('Events ta"Diagnose and fix the top 15 ClickHouse errors \u2014 query failures,\.
ClickHouse Common Errors
Overview
Quick reference for the most common ClickHouse errors with real error codes,
diagnostic queries, and proven solutions.
Prerequisites
- Access to ClickHouse (client or HTTP interface)
- Ability to query system.* tables
Error Reference
1. Too Many Parts (Code 252)
DB::Exception: Too many parts (600). Merges are processing significantly slower than inserts.
Cause: Each INSERT creates a new data part. Hundreds of tiny inserts per second
overwhelm the merge process.
Fix:
-- Check current part count per table
SELECT database, table, count() AS part_count
FROM system.parts WHERE active GROUP BY database, table ORDER BY part_count DESC;
-- Temporary: raise the limit
ALTER TABLE events MODIFY SETTING parts_to_throw_insert = 1000;
-- Permanent: batch your inserts (10K+ rows per INSERT)
-- See clickhouse-sdk-patterns for batching code
2. Memory Limit Exceeded (Code 241)
DB::Exception: Memory limit (for query) exceeded: ... (MEMORY_LIMIT_EXCEEDED)
Cause: Query allocates more RAM than max_memory_usage (default ~10GB).
Fix:
-- Check what's consuming memory
SELECT query, memory_usage, peak_memory_usage
FROM system.processes ORDER BY peak_memory_usage DESC;
-- Option A: Increase limit for this query
SET max_memory_usage = 20000000000; -- 20GB
-- Option B: Reduce data scanned
SELECT ... FROM events
WHERE created_at >= today() - 7 -- Add time filters
LIMIT 10000; -- Cap result size
-- Option C: Enable disk spill for large sorts/GROUP BY
SET max_bytes_before_external_sort = 10000000000;
SET max_bytes_before_external_group_by = 10000000000;
3. Syntax Error (Code 62)
DB::Exception: Syntax error: ... Expected ... before ... (SYNTAX_ERROR)
Common causes and portability notes:
-- Backticks (a MySQL habit) are accepted by ClickHouse, but double quotes or bare identifiers are preferred
SELECT `user_id` FROM events;
SELECT "user_id" FROM events;
SELECT user_id FROM events;
-- MySQL-style LIMIT offset, count is accepted; the explicit OFFSET form is clearer
SELECT * FROM events LIMIT 10, 20;   -- skip 10 rows, return 20
SELECT * FROM events LIMIT 20 OFFSET 10;
-- != and <> are both supported; <> is the SQL-standard spelling
WHERE status != 'active';
WHERE status <> 'active';
4. Unknown Table (Code 60)
DB::Exception: Table default.events does not exist. (UNKNOWN_TABLE)
Fix:
-- List all tables in the database
SHOW TABLES FROM default;
-- Check all databases
SHOW DATABASES;
-- The table might be in a different database
Design ClickHouse schemas with MergeTree engines, ORDER BY keys, and ...
ClickHouse Schema Design (Core Workflow A)
Overview
Design ClickHouse tables with correct engine selection, ORDER BY keys,
partitioning, and codec choices for analytical workloads.
Prerequisites
- @clickhouse/client connected (see clickhouse-install-auth)
- Understanding of your query patterns (what you filter and group on)
Instructions
Step 1: Choose the Right Engine
| Engine | Best For | Dedup? | Example |
|---|---|---|---|
| MergeTree | General analytics, append-only logs | No | Clickstream, IoT |
| ReplacingMergeTree | Mutable rows (upserts) | Yes (on merge) | User profiles, state |
| SummingMergeTree | Pre-aggregated counters | Sums numerics | Page view counts |
| AggregatingMergeTree | Materialized view targets | Merges states | Dashboards |
| CollapsingMergeTree | Stateful row updates | Collapses +/-1 | Shopping carts |
ClickHouse Cloud uses SharedMergeTree — it is a drop-in replacement for
MergeTree on Cloud. You do not need to change your DDL.
Step 2: Design the ORDER BY (Sort Key)
The ORDER BY clause is the single most important schema decision. It defines:
- Primary index — sparse index over sort-key granules (8192 rows default)
- Data layout on disk — rows sorted physically by these columns
- Query speed — queries filtering on ORDER BY prefix columns hit fewer granules
Rules of thumb:
- Put low-cardinality filter columns first (event_type, status)
- Then high-cardinality columns you filter on (user_id, tenant_id)
- End with a time column if you use range filters (created_at)
- Do NOT put high-cardinality columns you never filter on in ORDER BY
-- Good: filter by tenant, then by time ranges
ORDER BY (tenant_id, event_type, created_at)
-- Bad: UUID first means every query scans the full index
ORDER BY (event_id, created_at) -- event_id is random UUID
Step 3: Schema Examples
Event Analytics Table
CREATE TABLE analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
tenant_id UInt32,
event_type LowCardinality(String),
user_id UInt64,
session_id String,
properties String CODEC(ZSTD(3)), -- JSON blob, compress well
url String CODEC(ZSTD(1)),
ip_address IPv4,
country ...
Insert, query, and aggregate data in ClickHouse with real SQL patterns.
ClickHouse Insert & Query (Core Workflow B)
Overview
Insert data efficiently and write analytical queries with aggregations,
window functions, and materialized views.
Prerequisites
- Tables created (see clickhouse-core-workflow-a)
- @clickhouse/client connected
Instructions
Step 1: Bulk Insert Patterns
import { createClient } from '@clickhouse/client';
const client = createClient({
url: process.env.CLICKHOUSE_HOST!,
username: process.env.CLICKHOUSE_USER ?? 'default',
password: process.env.CLICKHOUSE_PASSWORD ?? '',
});
// Insert many rows efficiently — @clickhouse/client buffers internally
await client.insert({
table: 'analytics.events',
values: events, // Array of objects matching table columns
format: 'JSONEachRow',
});
// Insert from file (CSV, Parquet, etc.)
import { createReadStream } from 'fs';
await client.insert({
table: 'analytics.events',
values: createReadStream('./data/events.csv'),
format: 'CSVWithNames',
});
Insert best practices:
- Batch rows: aim for 10K-100K rows per INSERT (not one at a time)
- ClickHouse creates a new "part" per INSERT — too many small inserts cause "too many parts"
- For real-time streams, buffer 1-5 seconds then flush (see the sketch below)
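As a concrete illustration of the batching advice above, here is a minimal sketch of a size-and-interval insert buffer. The EventRow shape, the thresholds, and the analytics.events table are assumptions for the example, not part of the pack.
// insert-buffer.ts -- hypothetical helper: flush when the buffer is full OR every flushMs
import { createClient } from '@clickhouse/client';

const client = createClient({ url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123' });

type EventRow = { event_type: string; user_id: number; payload: string };

const buffer: EventRow[] = [];
const maxRows = 10_000;   // size-based flush threshold
const flushMs = 2_000;    // time-based flush interval

async function flush(): Promise<void> {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length); // take everything, reset the buffer
  await client.insert({ table: 'analytics.events', values: batch, format: 'JSONEachRow' });
}

export function enqueue(row: EventRow): void {
  buffer.push(row);
  if (buffer.length >= maxRows) void flush();
}

setInterval(() => void flush(), flushMs);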
Step 2: Analytical Queries
-- Top events by tenant in the last 7 days
SELECT
tenant_id,
event_type,
count() AS event_count,
uniqExact(user_id) AS unique_users,
min(created_at) AS first_seen,
max(created_at) AS last_seen
FROM analytics.events
WHERE created_at >= now() - INTERVAL 7 DAY
GROUP BY tenant_id, event_type
ORDER BY event_count DESC
LIMIT 100;
-- Funnel analysis: signup → activation → purchase
SELECT
level,
count() AS users
FROM (
SELECT
user_id,
groupArray(event_type) AS journey
FROM analytics.events
WHERE event_type IN ('signup', 'activation', 'purchase')
AND created_at >= today() - 30
GROUP BY user_id
)
ARRAY JOIN arrayEnumerate(journey) AS level
GROUP BY level
ORDER BY level;
-- Retention: users active this week who were also active last week
SELECT
count(DISTINCT curr.user_id) AS retained_users
FROM analytics.events AS curr
INNER JOIN analytics.events AS prev
ON curr.user_id = prev.user_id
WHERE curr.created_at >= toMonday(today())
AND prev.created_at >= toMonday(today()) - 7
AND prev.created_at < toMonday(today());
Step 3: Parameterized Queries in Node.js
// Use {param:Type} syntax for safe parameterized queries
const rs = await client.query({"Optimize ClickHouse Cloud costs \u2014 compute scaling, storage tiering,\.
ClickHouse Cost Tuning
Overview
Reduce ClickHouse Cloud costs through storage optimization, compression tuning,
TTL policies, compute scaling, and query efficiency improvements.
Prerequisites
- ClickHouse Cloud account with billing access
- Understanding of current data volumes and query patterns
Instructions
Step 1: Understand ClickHouse Cloud Pricing
| Component | Pricing Model | Key Driver |
|---|---|---|
| Compute | Per-hour per replica | vCPU + memory tier |
| Storage | Per GB-month | Compressed data on disk |
| Network | Per GB egress | Query result sizes |
| Backups | Per GB stored | Backup retention |
Key insight: ClickHouse bills on compressed storage, and ClickHouse
compresses extremely well (often 10-20x). Your cost driver is usually compute,
not storage.
Step 2: Analyze Storage Usage
-- Storage cost breakdown by table
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS raw_size,
round(sum(data_uncompressed_bytes) / sum(bytes_on_disk), 1) AS compression_ratio,
sum(rows) AS total_rows,
count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
-- Storage by column (find bloated columns)
SELECT
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS raw,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics'
GROUP BY table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 30;
Step 3: Improve Compression
-- Check current codec per column
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
-- Apply better codecs to large columns
ALTER TABLE analytics.events
MODIFY COLUMN properties String CODEC(ZSTD(3)); -- JSON blobs
ALTER TABLE analytics.events
MODIFY COLUMN created_at DateTime CODEC(DoubleDelta, ZSTD); -- Timestamps
ALTER TABLE analytics.events
MODIFY COLUMN user_id UInt64 CODEC(Delta, ZSTD); -- Sequential IDs
-- Verify improvement after next merge
OPTIMIZE TABLE analytics.events FINAL;
-- Check new compression ratio
SELECT
column,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND datab"Handle data lifecycle in ClickHouse \u2014 TTL expiration, data deletion\.
ClickHouse Data Handling
Overview
Manage the full data lifecycle in ClickHouse: TTL-based expiration, GDPR/CCPA
deletion, data masking, partition management, and audit trails.
Prerequisites
- ClickHouse tables with data (see clickhouse-core-workflow-a)
- Understanding of your data retention requirements
Instructions
Step 1: TTL-Based Data Expiration
-- Add TTL to expire data automatically
CREATE TABLE analytics.events (
event_id UUID DEFAULT generateUUIDv4(),
event_type LowCardinality(String),
user_id UInt64,
properties String CODEC(ZSTD(3)),
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY; -- Auto-delete after 90 days
-- Add TTL to existing table
ALTER TABLE analytics.events
MODIFY TTL created_at + INTERVAL 90 DAY;
-- Tiered storage TTL (hot → cold → delete)
ALTER TABLE analytics.events
MODIFY TTL
created_at + INTERVAL 7 DAY TO VOLUME 'hot',
created_at + INTERVAL 30 DAY TO VOLUME 'cold',
created_at + INTERVAL 365 DAY DELETE;
-- Column-level TTL (null out PII after 30 days, keep the row)
ALTER TABLE analytics.events
MODIFY COLUMN email String DEFAULT ''
TTL created_at + INTERVAL 30 DAY;
-- Force TTL cleanup now (normally runs during merges)
OPTIMIZE TABLE analytics.events FINAL;
Step 2: Data Deletion for GDPR/CCPA
-- Option A: Lightweight DELETE (ClickHouse 23.3+)
-- Marks rows as deleted without rewriting parts immediately
DELETE FROM analytics.events WHERE user_id = 42;
-- Option B: ALTER TABLE DELETE (mutation — rewrites parts in background)
ALTER TABLE analytics.events DELETE WHERE user_id = 42;
-- Check mutation progress
SELECT
database, table, mutation_id, command,
is_done, parts_to_do, create_time
FROM system.mutations
WHERE NOT is_done
ORDER BY create_time DESC;
-- Option C: Drop entire partitions (fastest for bulk deletion)
-- First, check what partitions exist
SELECT partition, count() AS parts, sum(rows) AS rows,
min(min_time) AS from_time, max(max_time) AS to_time
FROM system.parts
WHERE database = 'analytics' AND table = 'events' AND active
GROUP BY partition ORDER BY partition;
ALTER TABLE analytics.events DROP PARTITION '202401';
Important notes on ClickHouse deletions:
- DELETE FROM is lightweight but still creates mutations internally
- Mutations rewrite data parts in the background — not instant
- For GDPR compliance, use ALTER TABLE DELETE and verify via system.mutations
- Partitioned data is fastest to bulk-delete via DROP PARTITION
Step 3: Data Masking and Anonymization
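The preview of this step is cut off. As a rough sketch of one common approach (an assumption, not necessarily the skill's exact content, and it presumes an email column as in the column-TTL example above):
-- Mask PII in place with a mutation (irreversible)
ALTER TABLE analytics.events
UPDATE email = concat(substring(hex(sipHash64(email)), 1, 12), '@masked.invalid')
WHERE email != '';

-- Or keep the raw table restricted and expose a masked view to analysts
CREATE VIEW analytics.events_masked AS
SELECT
    event_id,
    event_type,
    user_id,
    if(email != '', concat(substring(email, 1, 2), '***'), '') AS email_masked,
    created_at
FROM analytics.events;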
"Collect ClickHouse diagnostic data \u2014 system tables, query logs,\.
ClickHouse Debug Bundle
Overview
Collect comprehensive diagnostic data from ClickHouse system tables for
troubleshooting performance issues, merge problems, or support escalation.
Prerequisites
- Access to ClickHouse with system.* table read permissions
- curl or clickhouse-client available
Instructions
Step 1: Server Health Overview
-- Server version and uptime
SELECT
version() AS version,
uptime() AS uptime_seconds,
formatReadableTimeDelta(uptime()) AS uptime_human,
currentDatabase() AS current_db;
-- Global metrics snapshot
SELECT metric, value, description
FROM system.metrics
WHERE metric IN (
'Query', 'Merge', 'PartMutation', 'ReplicatedFetch',
'TCPConnection', 'HTTPConnection', 'MemoryTracking',
'BackgroundMergesAndMutationsPoolTask'
);
Step 2: Disk and Table Health
-- Disk usage by table (top 20)
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS disk_size,
sum(rows) AS total_rows,
count() AS active_parts,
max(modification_time) AS last_modified
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC
LIMIT 20;
-- Tables with too many parts (merge pressure)
SELECT database, table, count() AS parts
FROM system.parts WHERE active
GROUP BY database, table
HAVING parts > 100
ORDER BY parts DESC;
-- Disk space per disk
SELECT
name,
path,
formatReadableSize(total_space) AS total,
formatReadableSize(free_space) AS free,
round(free_space / total_space * 100, 1) AS free_pct
FROM system.disks;
Step 3: Query Performance Analysis
-- Slowest queries in the last 24 hours
SELECT
event_time,
query_duration_ms,
read_rows,
read_bytes,
result_rows,
memory_usage,
substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - INTERVAL 24 HOUR
ORDER BY query_duration_ms DESC
LIMIT 20;
-- Failed queries (last 24h)
SELECT
event_time,
exception_code,
exception,
substring(query, 1, 200) AS query_preview
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time >= now() - INTERVAL 24 HOUR
ORDER BY event_time DESC
LIMIT 20;
-- Query patterns (group by normalized query)
SELECT
normalized_query_hash,
count() AS executions,
avg(query_duration_ms) AS avg_ms,
max(query_duration_ms) AS max_ms,
sum(read_rows) AS total_rows_read,
formatReadableSize(sum(read_bytes)) AS ...
Deploy ClickHouse-backed applications to Vercel, Fly.io, and ...
ClickHouse Deploy Integration
Overview
Deploy applications that connect to ClickHouse Cloud from serverless and
container platforms with proper connection management and secrets handling.
Prerequisites
- ClickHouse Cloud instance (or self-hosted with public endpoint)
- Platform CLI installed (vercel, fly, or gcloud)
- Application tested locally against ClickHouse
Instructions
Step 1: ClickHouse Connection Module (Platform-Agnostic)
// src/db.ts — singleton for serverless-safe connections
import { createClient, ClickHouseClient } from '@clickhouse/client';
let client: ClickHouseClient | null = null;
export function getClickHouse(): ClickHouseClient {
if (!client) {
client = createClient({
url: process.env.CLICKHOUSE_HOST!, // https://<host>:8443
username: process.env.CLICKHOUSE_USER!,
password: process.env.CLICKHOUSE_PASSWORD!,
database: process.env.CLICKHOUSE_DATABASE ?? 'default',
request_timeout: 30_000,
max_open_connections: 5, // Low for serverless (many cold starts)
compression: {
request: true, // Saves egress bandwidth
response: true,
},
});
}
return client;
}
Step 2: Vercel (Serverless Functions)
# Set secrets
vercel env add CLICKHOUSE_HOST production
vercel env add CLICKHOUSE_USER production
vercel env add CLICKHOUSE_PASSWORD production
vercel env add CLICKHOUSE_DATABASE production
// api/events/route.ts (Next.js App Router)
import { getClickHouse } from '@/src/db';
import { NextResponse } from 'next/server';
export async function GET(request: Request) {
const { searchParams } = new URL(request.url);
const days = Number(searchParams.get('days') ?? 7);
const client = getClickHouse();
const rs = await client.query({
query: `
SELECT event_type, count() AS cnt
FROM events
WHERE created_at >= now() - INTERVAL {days:UInt32} DAY
GROUP BY event_type ORDER BY cnt DESC
`,
query_params: { days },
format: 'JSONEachRow',
});
return NextResponse.json(await rs.json());
}
Vercel gotchas:
- Serverless function timeout: 30s (Pro) / 10s (Hobby)
- Each invocation may create a new connection — set max_open_connections low
- Use Edge Runtime only with HTTP-based clients (the ClickHouse client works fine)
Step 3: Fly.io (Containers)
# fly.toml
app = "my-clickhouse-app"
primary_region = "iad"
[env]
NODE_ENV = "production"
CLICKHOUSE_DATABASE = "analytics"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines"Configure ClickHouse enterprise RBAC \u2014 SQL-based users, roles,\.
ClickHouse Enterprise RBAC
Overview
Implement enterprise-grade role-based access control in ClickHouse using SQL-based
user management, hierarchical roles, row-level policies, and quotas.
Prerequisites
- ClickHouse with access_management = 1 enabled (default in Cloud)
- Admin user with GRANT OPTION
Instructions
Step 1: Create Users with Authentication
-- SHA256 password (standard)
CREATE USER app_backend
IDENTIFIED WITH sha256_password BY 'strong-password-here'
DEFAULT DATABASE analytics
HOST IP '10.0.0.0/8' -- Restrict to VPC
SETTINGS max_memory_usage = 10000000000, -- 10GB per query
max_execution_time = 60; -- 60s timeout
-- Double SHA1 (MySQL wire protocol compatible)
CREATE USER legacy_app
IDENTIFIED WITH double_sha1_password BY 'password'
DEFAULT DATABASE analytics;
-- bcrypt (strongest, slowest — use for admin accounts)
CREATE USER admin_user
IDENTIFIED WITH bcrypt_password BY 'admin-password';
-- Verify user was created
SHOW CREATE USER app_backend;
SELECT name, host_ip, default_database FROM system.users;
Step 2: Create Role Hierarchy
-- Base roles (leaf-level permissions)
CREATE ROLE data_reader;
GRANT SELECT ON analytics.* TO data_reader;
CREATE ROLE data_writer;
GRANT INSERT ON analytics.* TO data_writer;
CREATE ROLE schema_manager;
GRANT CREATE TABLE, ALTER TABLE, DROP TABLE ON analytics.* TO schema_manager;
-- Composite roles (inherit from base roles)
CREATE ROLE analyst;
GRANT data_reader TO analyst;
-- Analysts can also create temporary tables for ad-hoc work
GRANT CREATE TEMPORARY TABLE ON *.* TO analyst;
CREATE ROLE developer;
GRANT data_reader, data_writer TO developer;
CREATE ROLE platform_admin;
GRANT data_reader, data_writer, schema_manager TO platform_admin;
GRANT SYSTEM RELOAD, SYSTEM FLUSH LOGS ON *.* TO platform_admin;
-- Assign roles to users
GRANT analyst TO app_backend; -- Read-only
GRANT developer TO app_backend; -- Read + write
GRANT platform_admin TO admin_user; -- Full access
-- Set default role (active when user connects)
SET DEFAULT ROLE developer TO app_backend;
-- Verify the full permission chain
SHOW GRANTS FOR app_backend;
SHOW ACCESS; -- All users, roles, policies
Step 3: Row-Level Security
-- Multi-tenant isolation: each user sees only their tenant's data
CREATE USER tenant_acme
IDENTIFIED WITH sha256_password BY 'pass'
DEFAULT DATABASE analytics;
CREATE USER tenant_globex
IDENTIFIED WITH sha256_password BY 'pass'
DEFAULT DATABASE analytics;
-- Row policy: restrict by tenant_id
CREATE ROW POLICY acme_isolation ON analytics.events
FOR SELECT
USING tenant_id = 1
TO tenant_acme;
CREATE ROW POLICY globex_i...
Create your first ClickHouse table, insert data, and run analytical queries.
ClickHouse Hello World
Overview
Create a MergeTree table, insert rows with JSONEachRow, and run your first
analytical query -- all using the official @clickhouse/client.
Prerequisites
- @clickhouse/client installed and connected (see clickhouse-install-auth)
Instructions
Step 1: Create a MergeTree Table
import { createClient } from '@clickhouse/client';
const client = createClient({
url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123',
username: process.env.CLICKHOUSE_USER ?? 'default',
password: process.env.CLICKHOUSE_PASSWORD ?? '',
});
await client.command({
query: `
CREATE TABLE IF NOT EXISTS events (
event_id UUID DEFAULT generateUUIDv4(),
event_type LowCardinality(String),
user_id UInt64,
payload String,
created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 90 DAY
`,
});
console.log('Table "events" created.');
Key concepts:
- MergeTree() -- the foundational ClickHouse engine for analytics
- ORDER BY -- defines the primary index (sort key); pick columns you filter/group on
- PARTITION BY -- splits data into parts by month for efficient pruning
- TTL -- automatic data expiration
- LowCardinality(String) -- dictionary-encoded string, ideal for columns with < 10K distinct values
Step 2: Insert Data with JSONEachRow
await client.insert({
table: 'events',
values: [
{ event_type: 'page_view', user_id: 1001, payload: '{"url":"/home"}' },
{ event_type: 'click', user_id: 1001, payload: '{"button":"signup"}' },
{ event_type: 'page_view', user_id: 1002, payload: '{"url":"/pricing"}' },
{ event_type: 'purchase', user_id: 1002, payload: '{"amount":49.99}' },
{ event_type: 'page_view', user_id: 1003, payload: '{"url":"/docs"}' },
],
format: 'JSONEachRow',
});
console.log('Inserted 5 events.');
Step 3: Query the Data
// Count events by type
const rs = await client.query({
query: `
SELECT
event_type,
count() AS total,
uniqExact(user_id) AS unique_users
FROM events
GROUP BY event_type
ORDER BY total DESC
`,
format: 'JSONEachRow',
});
const rows = await rs.json<{
event_type: string;
total: string; // ClickHouse returns numbers as strings in JS ...
ClickHouse incident response - triage, diagnose, and remediate ...
ClickHouse Incident Runbook
Overview
Step-by-step procedures for triaging and resolving ClickHouse incidents
using built-in system tables and SQL commands.
Severity Levels
| Level | Definition | Response | Examples |
|---|---|---|---|
| P1 | ClickHouse unreachable / all queries failing | < 15 min | Server down, OOM, disk full |
| P2 | Degraded performance / partial failures | < 1 hour | Slow queries, merge backlog |
| P3 | Minor impact / non-critical errors | < 4 hours | Single table issue, warnings |
| P4 | No user impact | Next business day | Monitoring gaps, optimization |
Quick Triage (Run First)
# 1. Is ClickHouse alive?
curl -sf 'http://localhost:8123/ping' && echo "UP" || echo "DOWN"
# 2. Can it answer a query?
curl -sf 'http://localhost:8123/?query=SELECT+1' && echo "OK" || echo "QUERY FAILED"
# 3. Check ClickHouse Cloud status
curl -sf 'https://status.clickhouse.cloud' | head -5
-- 4. Server health snapshot (run if server responds)
SELECT
version() AS version,
formatReadableTimeDelta(uptime()) AS uptime,
(SELECT count() FROM system.processes) AS running_queries,
(SELECT value FROM system.metrics WHERE metric = 'MemoryTracking')
AS memory_bytes,
(SELECT count() FROM system.merges) AS active_merges;
-- 5. Recent errors
SELECT event_time, exception_code, exception, substring(query, 1, 200) AS q
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
AND event_time >= now() - INTERVAL 10 MINUTE
ORDER BY event_time DESC
LIMIT 10;
Decision Tree
Server responds to ping?
├─ NO → Check process/container status, disk space, OOM killer logs
│ └─ Container/process dead → Restart, check logs
│ └─ Disk full → Emergency: drop old partitions, expand disk
│ └─ OOM killed → Reduce max_memory_usage, add RAM
└─ YES → Queries succeeding?
├─ NO → Check error codes below
│ └─ Auth errors (516) → Verify credentials, check user exists
│ └─ Too many queries (202) → Kill stuck queries, reduce concurrency
│ └─ Memory exceeded (241) → Kill large queries, reduce max_threads
└─ YES but slow → Performance triage below
Remediation Procedures
P1: Server Down / OOM
# Check if process was OOM-killed
dmesg | grep -i "out of memory" | tail -5
journalctl -u clickhouse-server --since "10 minutes ago" | tail -20
# Restart
sudo systemctl restart clickhouse-server
# or for Docker:
docker restart clickhouse
# Verify recovery
curl ...
Install @clickhouse/client and configure authentication to ClickHouse.
ClickHouse Install & Auth
Overview
Set up the official ClickHouse client for Node.js or Python and configure authentication
to ClickHouse Cloud or a self-hosted instance.
Prerequisites
- Node.js 18+ or Python 3.8+
- A running ClickHouse instance (Cloud or self-hosted)
- Connection credentials (host, port, user, password)
Instructions
Step 1: Install the Official Client
# Node.js — official client (HTTP-based, supports streaming)
npm install @clickhouse/client
# Python — official client
pip install clickhouse-connect
Step 2: Configure Environment Variables
# .env (NEVER commit — add to .gitignore)
CLICKHOUSE_HOST=https://abc123.us-east-1.aws.clickhouse.cloud:8443
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=your-password-here
# Self-hosted (HTTP interface on port 8123, native on 9000)
# CLICKHOUSE_HOST=http://localhost:8123
Step 3: Create the Client (Node.js)
import { createClient } from '@clickhouse/client';
// ClickHouse Cloud
const client = createClient({
url: process.env.CLICKHOUSE_HOST, // https://<host>:8443
username: process.env.CLICKHOUSE_USER, // default
password: process.env.CLICKHOUSE_PASSWORD,
// ClickHouse Cloud requires TLS — the client handles it via https:// URL
});
// Self-hosted (no TLS)
const localClient = createClient({
url: 'http://localhost:8123',
username: 'default',
password: '',
});
Step 4: Verify Connection
async function verifyConnection() {
// Ping returns true if the server is reachable
const alive = await client.ping();
console.log('ClickHouse ping:', alive.success); // true
// Run a test query
const rs = await client.query({
query: 'SELECT version() AS ver, uptime() AS uptime_sec',
format: 'JSONEachRow',
});
const rows = await rs.json<{ ver: string; uptime_sec: number }>();
console.log('Server version:', rows[0].ver);
console.log('Uptime (sec):', rows[0].uptime_sec);
}
verifyConnection().catch(console.error);
Step 5: Python Alternative
import clickhouse_connect
client = clickhouse_connect.get_client(
host='abc123.us-east-1.aws.clickhouse.cloud',
port=8443,
username='default',
password='your-password-here',
secure=True,
)
result = client.query('SELECT version(), uptime()')
print(f"Version: {result.result_rows[0][0]}")
Connection Options Reference
| Option | Default | Description |
|---|---|---|
| url | http://localhost:8123 | Full URL including protocol and port |
Run ClickHouse locally with Docker, configure test fixtures, and iterate.
Tools: Read, Write, Edit, Bash(npm:*), Bash(docker:*), Bash(docker-compose:*)
ClickHouse Local Dev Loop
Overview
Run ClickHouse in Docker for local development with fast schema iteration, seed data, and integration testing using vitest.
Prerequisites
Instructions
Step 1: Docker Compose Setup
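The compose file itself is not shown in the preview; a minimal sketch of what such a setup typically looks like (image tag, database name, and volume paths are assumptions):
# docker-compose.yml -- local ClickHouse for development (sketch)
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # native protocol
    environment:
      CLICKHOUSE_DB: analytics
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""
    volumes:
      - ./init-db:/docker-entrypoint-initdb.d   # SQL files run on first start
      - clickhouse-data:/var/lib/clickhouse
volumes:
  clickhouse-data: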
Step 2: Init Script (Auto-Run on First Start)
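Also omitted from the preview: the official clickhouse/clickhouse-server image executes any *.sql files mounted under /docker-entrypoint-initdb.d on first start, so a sketch of an init file could be:
-- init-db/01_schema.sql (sketch; database and table names are illustrative)
CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.events (
    event_id UUID DEFAULT generateUUIDv4(),
    event_type LowCardinality(String),
    user_id UInt64,
    payload String,
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree()
ORDER BY (event_type, created_at);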
Step 3: Seed Data Script
Step 4: Project Structure"Execute ClickHouse schema migrations \u2014 ALTER TABLE operations,\.
Tools: Read, Write, Edit, Bash(npm:*), Bash(node:*), Bash(kubectl:*)
ClickHouse Migration Deep Dive
Overview
Plan and execute ClickHouse schema migrations: column changes, engine migrations, ORDER BY modifications, and versioned migration runners.
Prerequisites
Instructions
Step 1: Understanding ClickHouse DDL
ClickHouse ALTER operations are mutations — they run asynchronously and rewrite data parts in the background. This is fundamentally different from PostgreSQL/MySQL, where ALTER is often instant or blocking.
Step 2: Column Operations
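The examples are omitted in the preview; typical column operations look roughly like this (the analytics.events table is assumed from earlier skills):
-- Add a column (metadata-only, effectively instant)
ALTER TABLE analytics.events ADD COLUMN IF NOT EXISTS referrer String DEFAULT '';

-- Change a column's codec or type (a mutation: parts are rewritten in the background)
ALTER TABLE analytics.events MODIFY COLUMN payload String CODEC(ZSTD(3));

-- Drop a column
ALTER TABLE analytics.events DROP COLUMN IF EXISTS referrer;

-- Watch mutation progress
SELECT mutation_id, command, is_done, parts_to_do
FROM system.mutations
WHERE table = 'events' AND NOT is_done;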
Step 3: Change ORDER BY (Requires Table Recreation)
ClickHouse does not support changing the ORDER BY key of existing data in place; create a new table with the desired key and migrate the data.
Configure ClickHouse across dev, staging, and production with environment-specific ...
Tools: Read, Write, Edit, Bash(aws:*), Bash(gcloud:*), Bash(vault:*)
ClickHouse Multi-Environment Setup
Overview
Configure separate ClickHouse instances for development, staging, and production with proper secrets management, environment detection, and infrastructure-as-code.
Prerequisites
Instructions
Step 1: Environment Strategy
Step 2: Configuration Module
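The module body is not shown in the preview; a minimal sketch of an environment-aware configuration module (environment names and variables are assumptions):
// src/config.ts -- pick ClickHouse settings per environment (sketch)
type Env = 'development' | 'staging' | 'production';

const env = (process.env.APP_ENV ?? 'development') as Env;

export const clickhouseConfig = {
  development: {
    url: 'http://localhost:8123',
    database: 'analytics_dev',
  },
  staging: {
    url: process.env.CLICKHOUSE_STAGING_HOST ?? '',
    database: 'analytics_staging',
  },
  production: {
    url: process.env.CLICKHOUSE_HOST ?? '',
    database: 'analytics',
  },
}[env];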
Step 3: Client Factory
Monitor ClickHouse with Prometheus metrics, Grafana dashboards, system tables, and alerting.
Tools: Read, Write, Edit
ClickHouse Observability
Overview
Set up comprehensive monitoring for ClickHouse using built-in system tables, Prometheus integration, Grafana dashboards, and alerting rules.
Prerequisites
Instructions
Step 1: Key Metrics from System Tables
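The queries are omitted in the preview; the usual starting points are system.metrics, system.asynchronous_metrics, and system.events (a sketch, not necessarily the skill's exact queries):
-- Point-in-time gauges (running queries, connections, tracked memory)
SELECT metric, value FROM system.metrics
WHERE metric IN ('Query', 'TCPConnection', 'HTTPConnection', 'MemoryTracking');

-- Slower-moving gauges (uptime, disk usage)
SELECT metric, value FROM system.asynchronous_metrics
WHERE metric = 'Uptime' OR metric LIKE '%Disk%';

-- Monotonic counters since server start
SELECT event, value FROM system.events
WHERE event IN ('Query', 'FailedQuery', 'InsertedRows', 'MergedRows');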
Step 2: Prometheus Integration
ClickHouse Cloud exposes a managed Prometheus endpoint:
Self-hosted — use clickhouse-exporter or the built-in metrics endpoint: ...
Optimize ClickHouse query performance with indexing, projections, settings ...
Tools: Read, Write, Edit
ClickHouse Performance Tuning
Overview
Diagnose and fix ClickHouse performance issues using query analysis, proper indexing, projections, materialized views, and server settings tuning.
Prerequisites
Instructions
Step 1: Diagnose Slow Queries
Step 2: ORDER BY Key Optimization
The ORDER BY key is ClickHouse's primary performance lever. Queries that filter on the ORDER BY prefix skip entire granules (8192-row chunks).
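One quick way to verify the claim above: EXPLAIN with indexes = 1 reports how many parts and granules survive the primary-key filter (the analytics.events table and filter columns are assumptions):
EXPLAIN indexes = 1
SELECT count()
FROM analytics.events
WHERE tenant_id = 42
  AND created_at >= now() - INTERVAL 7 DAY;
-- In the output, look at the PrimaryKey section: "Parts: x/y" and "Granules: x/y"
-- show how much data the ORDER BY prefix filter allowed ClickHouse to skip.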
Step 3: Data Skipping Indexes"Production readiness checklist for ClickHouse \u2014 server tuning,\.
Tools: Read, Bash(kubectl:*), Bash(curl:*), Grep
ClickHouse Production Checklist
Overview
Comprehensive go-live checklist for ClickHouse covering server tuning, schema design, backup configuration, monitoring, and operational readiness.
Prerequisites
Checklist
1. Schema & Engine Design
2. Server Configuration (Self-Hosted)
3. Backup Configuration
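The checklist details are not in the preview. For self-hosted servers, the native BACKUP command is the usual starting point; a sketch with placeholder bucket and credentials:
-- Back up one table to S3 (bucket URL and credentials are placeholders)
BACKUP TABLE analytics.events
TO S3('https://my-bucket.s3.amazonaws.com/clickhouse-backups/events', 'ACCESS_KEY_ID', 'SECRET_KEY');

-- Restore it under a different name
RESTORE TABLE analytics.events AS analytics.events_restored
FROM S3('https://my-bucket.s3.amazonaws.com/clickhouse-backups/events', 'ACCESS_KEY_ID', 'SECRET_KEY');

-- Check backup/restore status
SELECT id, name, status, error FROM system.backups ORDER BY start_time DESC LIMIT 5;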
ClickHouse Cloud: Backups are automatic. Configure retention and frequency in the Cloud console under Service Settings.
4. Monitoring & Ale'Configure ClickHouse query concurrency, memory quotas, and connection.
Tools: Read, Write, Edit
ClickHouse Rate Limits & ConcurrencyOverviewClickHouse does not have REST API rate limits like a SaaS product. Instead, it has server-side concurrency limits, memory quotas, and per-user settings that control resource usage. This skill covers how to configure and work within those limits. Prerequisites
Instructions
Step 1: Understand Server-Side Limits
ClickHouse Cloud API limit: The Cloud management API (not the query interface) is limited to 10 requests per 10 seconds.
Step 2: Configure Per-User Quotas
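The SQL is cut off in the preview; per-user quotas are created with CREATE QUOTA (the user and limit values below are illustrative):
-- Cap what a user can do per hour
CREATE QUOTA IF NOT EXISTS api_quota
FOR INTERVAL 1 HOUR
    MAX queries = 10000, read_rows = 1000000000, execution_time = 3600
TO app_backend;

-- Check quota consumption
SELECT quota_name, queries, max_queries, read_rows, max_read_rows
FROM system.quotas_usage;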
Step 3: Client-Side Connection Pooling
Step 4: Application-Level Concurrency Control
Production reference architecture for ClickHouse-backed applications.
Tools: Read, Grep
ClickHouse Reference Architecture
Overview
Production-grade architecture for ClickHouse analytics platforms covering project layout, data flow, multi-tenancy, and operational patterns.
Prerequisites
Instructions
Step 1: Project Structure
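The layout itself is not shown in the preview; a plausible shape for such a project (directory names are assumptions, not the pack's prescribed structure):
src/
  db/            # createClient factory, typed query helpers
  ingest/        # webhook + Kafka consumers, insert buffering
  queries/       # parameterized analytical queries
  api/           # HTTP handlers that call into queries/
migrations/      # versioned .sql files, applied in order
init-db/         # local dev schema + seed data
tests/
  integration/   # run against a real ClickHouse (Docker or CI service)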
Step 2: Data Flow Architecture"Production-ready patterns for @clickhouse/client \u2014 streaming inserts,\.
Tools: Read, Write, Edit
ClickHouse SDK Patterns
Overview
Production patterns for streaming inserts, error handling, and connection lifecycle management.
Prerequisites
Instructions
Pattern 1: Typed Query Helper
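The helper's code is not shown in the preview; a minimal sketch of the idea (the wrapper name and generic signature are assumptions):
// A small typed wrapper around client.query (sketch)
import { createClient } from '@clickhouse/client';

const client = createClient({ url: process.env.CLICKHOUSE_HOST ?? 'http://localhost:8123' });

export async function queryRows<T>(
  query: string,
  params: Record<string, unknown> = {},
): Promise<T[]> {
  const rs = await client.query({ query, query_params: params, format: 'JSONEachRow' });
  return rs.json<T>();
}

// Usage: the row type is checked at the call site
type EventCount = { event_type: string; cnt: string }; // count() (UInt64) comes back as a string
async function example() {
  const rows = await queryRows<EventCount>(
    'SELECT event_type, count() AS cnt FROM events GROUP BY event_type',
  );
  console.log(rows);
}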
Note on parameterized queries: ClickHouse uses {name:Type} placeholders, not positional ? parameters.
Pattern 2: Streaming Insert (Backpressure-Safe)
Pattern 3: Batch Insert with Retry
Secure ClickHouse with user management, network restrictions, TLS, and query audit logging.
Tools: Read, Write, Grep
ClickHouse Security Basics
Overview
Secure a ClickHouse deployment with SQL-based user management, network restrictions, TLS encryption, and query audit logging.
Prerequisites
Instructions
Step 1: Create Restricted Users (SQL-Based RBAC)
Step 2: Use Roles for Permission Groups
Step 3: Row-Level Security
Step 4: Network Security
Upgrade ClickHouse server versions and the @clickhouse/client SDK safely.
Tools: Read, Write, Edit, Bash(npm:*), Bash(git:*)
ClickHouse Upgrade & MigrationOverviewSafely upgrade ClickHouse server and the with rollback procedures and breaking change detection. Prerequisites
Instructions
Step 1: Check Current Versions
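The commands are omitted in the preview; checking both sides is straightforward (a sketch, assuming a local HTTP endpoint):
# Server version over the HTTP interface
curl -s 'http://localhost:8123/?query=SELECT%20version()'

# Installed @clickhouse/client version in the project
npm ls @clickhouse/client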
Step 2: Review Changelog
Key breaking changes to watch for:
Step 3: Upgrade the Node.js Client
Common migration patterns:
Step 4: Upgrade ClickHouse Server
ClickHouse Cloud: Upgrades happen automatically. Check release notes in the Cloud console.
Self-hosted upgrade procedure: ...
Ingest data into ClickHouse from webhooks, Kafka, and streaming sources.
Tools: Read, Write, Edit, Bash(curl:*)
ClickHouse Data Ingestion
Overview
Build data ingestion pipelines into ClickHouse from HTTP webhooks, Kafka, and streaming sources with proper batching, deduplication, and error handling.
Prerequisites
Instructions
Step 1: Webhook Receiver with Batched Inserts
Step 2: Kafka Table Engine (Server-Side Ingestion)
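The preview ends before the SQL; the server-side pattern is a Kafka engine table plus a materialized view that moves consumed rows into a MergeTree target (broker, topic, and column names below are placeholders):
-- 1. Kafka engine table: a streaming source over the topic
CREATE TABLE analytics.events_kafka (
    event_type String,
    user_id UInt64,
    payload String,
    created_at DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow';

-- 2. Materialized view: continuously copies consumed rows into the MergeTree table
CREATE MATERIALIZED VIEW analytics.events_kafka_mv TO analytics.events AS
SELECT event_type, user_id, payload, created_at
FROM analytics.events_kafka;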
Ready to use clickhouse-pack?
Related Plugins
- supabase-pack: Complete Supabase integration skill pack with 30 skills covering authentication, database, storage, realtime, edge functions, and production operations. Flagship+ tier vendor pack.
- vercel-pack: Complete Vercel integration skill pack with 30 skills covering deployments, edge functions, preview environments, performance optimization, and production operations. Flagship+ tier vendor pack.
- clay-pack: Complete Clay integration skill pack with 30 skills covering data enrichment, waterfall workflows, AI agents, and GTM automation. Flagship+ tier vendor pack.
- cursor-pack: Complete Cursor integration skill pack with 30 skills covering AI code editing, composer workflows, codebase indexing, and productivity features. Flagship+ tier vendor pack.
- exa-pack: Complete Exa integration skill pack with 30 skills covering neural search, semantic retrieval, web search API, and AI-powered discovery. Flagship+ tier vendor pack.
- firecrawl-pack: Complete Firecrawl integration skill pack with 30 skills covering web scraping, crawling, markdown conversion, and LLM-ready data extraction. Flagship+ tier vendor pack.
Tags
clickhouse, saas, sdk, integration