snowflake-prod-checklist

'Execute Snowflake production readiness checklist with monitoring and

v1.0.0

Jeremy Longshore

MIT

4 Tools

snowflake-pack Plugin

saas packs Category

Allowed Tools
        ReadBash(kubectl:*)Bash(curl:*)Grep
      

Provided by Plugin

snowflake-pack

Claude Code skill pack for Snowflake data platform — snowflake-sdk, SQL, Snowpark (30 skills)

saas packs v1.0.0

View Plugin

Installation

This skill is included in the snowflake-pack plugin:

/plugin install snowflake-pack@claude-code-plugins-plus

Click to copy

Instructions

Snowflake Production Checklist

Overview

Complete checklist for deploying Snowflake data pipelines and integrations to production.

Prerequisites

Staging environment validated
Production Snowflake account configured
Resource monitors in place
Monitoring infrastructure ready

Pre-Deployment Checklist

Authentication & Secrets

[ ] Service accounts use key pair auth (not password)
[ ] Private keys stored in secret manager (not files/env vars)
[ ] Key rotation procedure documented and tested
[ ] Network policy applied to production account
[ ] Connection parameters use production account identifier

Warehouse Configuration

[ ] Production warehouses created with appropriate sizing
[ ] Auto-suspend configured (60-300s based on workload)
[ ] Auto-resume enabled
[ ] Resource monitors with credit quotas and alerts
[ ] Separate warehouses for ETL, analytics, and dashboard workloads


-- Production warehouse setup
CREATE WAREHOUSE IF NOT EXISTS PROD_ETL_WH
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 120
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS PROD_ANALYTICS_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Resource monitor with alerts
CREATE OR REPLACE RESOURCE MONITOR prod_monitor
  WITH CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND
    ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE PROD_ETL_WH SET RESOURCE_MONITOR = prod_monitor;
ALTER WAREHOUSE PROD_ANALYTICS_WH SET RESOURCE_MONITOR = prod_monitor;

Data Pipeline Readiness

[ ] All tasks resumed and running on schedule
[ ] Streams not stale (check with SHOW STREAMS)
[ ] Snowpipe notifications configured and verified
[ ] COPY INTO error handling set (ONERROR = 'CONTINUE' or 'SKIPFILE')
[ ] Data retention set appropriately (DATARETENTIONTIMEINDAYS)

Query & Performance

[ ] Critical queries tested at production data volume
[ ] Clustering keys set on large tables (>1TB)
[ ] Statement timeout configured per warehouse
[ ] Result caching enabled (USECACHEDRESULT = TRUE)


-- Set statement timeout for production
ALTER WAREHOUSE PROD_ETL_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
ALTER WAREHOUSE PROD_ANALYTICS_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 600;

-- Enable query result caching (default is ON)
ALTER ACCOUNT SET USE_CACHED_RESULT = TRUE;

Access Control

[ ] RBAC hierarchy follows Snowflake best practices
[ ] No users have ACCOUNTADMIN as default role
[ ] Service accounts have minimal required privileges
[ ] Object ownership assigned to functional roles


-- Verify no one defaults to ACCOUNTADMIN
SELECT name, default_role
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE default_role = 'ACCOUNTADMIN' AND disabled = 'false';

Monitoring & Alerting

[ ] Query failure alerts configured
[ ] Warehouse credit consumption dashboards
[ ] Task failure notifications
[ ] Login failure monitoring


-- Create alert for task failures (Snowflake Alerts feature)
CREATE OR REPLACE ALERT task_failure_alert
  WAREHOUSE = PROD_ANALYTICS_WH
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT *
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
      SCHEDULED_TIME_RANGE_START => DATEADD(minutes, -10, CURRENT_TIMESTAMP())
    ))
    WHERE state = 'FAILED'
  ))
  THEN
    CALL SYSTEM$SEND_EMAIL(
      'prod_notifications',
      'oncall@company.com',
      'Snowflake Task Failure',
      'One or more tasks failed in the last 10 minutes. Check TASK_HISTORY.'
    );

ALTER ALERT task_failure_alert RESUME;

Disaster Recovery

[ ] Time Travel retention set (Enterprise: up to 90 days)
[ ] Database replication configured for critical databases
[ ] Failover tested to secondary account/region
[ ] Backup procedure documented


-- Enable 14-day Time Travel on production tables
ALTER TABLE prod_db.core.orders SET DATA_RETENTION_TIME_IN_DAYS = 14;

-- Enable database replication
ALTER DATABASE prod_db ENABLE REPLICATION TO ACCOUNTS myorg.secondary_account;

Health Check Query


-- Run this before and after deployment
SELECT 'Warehouses' AS check_type,
       COUNT(*) AS count,
       COUNT_IF(state = 'STARTED') AS active
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSES())
UNION ALL
SELECT 'Tasks', COUNT(*), COUNT_IF(state = 'started')
FROM TABLE(INFORMATION_SCHEMA.TASKS())
UNION ALL
SELECT 'Streams', COUNT(*), COUNT_IF(stale = FALSE)
FROM TABLE(INFORMATION_SCHEMA.STREAMS())
UNION ALL
SELECT 'Pipes', COUNT(*), COUNT_IF(is_autoingest_enabled = 'true')
FROM TABLE(INFORMATION_SCHEMA.PIPES());

Rollback Procedure


-- Use Time Travel to revert a table
CREATE OR REPLACE TABLE prod_db.core.orders
  CLONE prod_db.core.orders AT (TIMESTAMP => '2026-03-21 12:00:00'::TIMESTAMP_NTZ);

-- Suspend problematic tasks
ALTER TASK transform_orders SUSPEND;

-- Revert warehouse changes
ALTER WAREHOUSE PROD_ETL_WH SET WAREHOUSE_SIZE = 'MEDIUM';

Error Handling

Alert	Condition	Severity
Task failure	`state = 'FAILED'` in TASK_HISTORY	P1
Stream stale	`stale = TRUE` in SHOW STREAMS	P1
Credit quota >90%	Resource monitor trigger	P2
Query queue >5min	`avgqueuedload > 0` sustained	P2
Login failures spike	>10 failures/hour	P2

Resources

Next Steps

For version upgrades, see snowflake-upgrade-migration.

Allowed Tools