Claude Code skill pack for Apify (18 skills)
Installation
Open Claude Code and run this command:
/plugin install apify-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for current project only.
Skills (18)
Configure CI/CD pipelines for Apify Actor builds and deployments.
Apify CI Integration
Overview
Automate Apify Actor builds, tests, and deployments using GitHub Actions. Covers test-on-PR, deploy-on-merge, integration testing with live Apify API, and Actor build verification.
Prerequisites
- GitHub repository with Actions enabled
- Apify API token stored as GitHub secret
- Actor code in the repository
Instructions
Step 1: Configure GitHub Secrets
# Store Apify token for CI
gh secret set APIFY_TOKEN --body "apify_api_YOUR_CI_TOKEN"
# Optional: separate tokens for test vs production
gh secret set APIFY_TOKEN_TEST --body "apify_api_test_token"
gh secret set APIFY_TOKEN_PROD --body "apify_api_prod_token"
Step 2: Create Test Workflow
Create .github/workflows/apify-test.yml:
name: Apify Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - run: npm test -- --coverage

  integration-tests:
    runs-on: ubuntu-latest
    if: github.event_name == 'push' # Only on merge to main
    env:
      APIFY_TOKEN: ${{ secrets.APIFY_TOKEN_TEST }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Verify Apify connection
        run: |
          curl -sf -H "Authorization: Bearer $APIFY_TOKEN" \
            https://api.apify.com/v2/users/me | jq '.data.username'
      - name: Run integration tests
        run: npm run test:integration
        timeout-minutes: 10
Step 3: Create Deploy Workflow
Create .github/workflows/apify-deploy.yml:
name: Deploy Actor

on:
  push:
    branches: [main]
    paths:
      - 'src/**'
      - 'package.json'
      - 'package-lock.json'
      - '.actor/**'
  workflow_dispatch: # Manual trigger

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      APIFY_TOKEN: ${{ secrets.APIFY_TOKEN_PROD }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - run: npm test
      - name: Install Apify CLI
        run: npm install -g apify-cli
      - name: Login to Apify
        run: apify login --token $APIFY_TOKEN
      - name: Push Actor to Apify
        run: apify push
      - name: Verify deployment
        run: |
          # Get latest build status
          ACTOR_ID=$(jq -

Diagnose and fix common Apify Actor and API errors.
Apify Common Errors
Overview
Quick diagnostic reference for the most common Apify errors. Covers Actor run failures, API errors, proxy problems, anti-bot blocks, and platform-specific issues.
Prerequisites
- Apify token configured
- Access to Apify Console for log review
Error Reference
1. Actor Run Status: FAILED
Status: FAILED
StatusMessage: Process exited with code 1
Cause: Unhandled exception in Actor code.
Diagnosis:
// Check run status via API
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.run('RUN_ID').get();
console.log(run.statusMessage);

// Get the run log
const log = await client.run('RUN_ID').log().get();
console.log(log); // Full stdout/stderr output
Fix: Read the log, find the stack trace, fix the bug. Common causes:
- Missing input validation (Actor.getInput() returns null)
- Selector returns no results (page structure changed)
- Unhandled promise rejection
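A defensive input guard at the top of main() turns the null-input case into a clear error instead of a cryptic stack trace later. A minimal sketch (validateInput and the startUrls/maxItems field names are illustrative; adapt them to your own input schema):

```typescript
// Validate Actor input up front; throw a descriptive error on bad input.
interface ActorInput {
  startUrls: { url: string }[];
  maxItems?: number;
}

export function validateInput(input: unknown): ActorInput {
  if (input === null || typeof input !== 'object') {
    throw new Error('Actor input is missing. Did you provide INPUT.json?');
  }
  const { startUrls, maxItems } = input as Partial<ActorInput>;
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error('Input field "startUrls" must be a non-empty array');
  }
  if (maxItems !== undefined && (!Number.isInteger(maxItems) || maxItems < 1)) {
    throw new Error('Input field "maxItems" must be a positive integer');
  }
  return { startUrls, maxItems };
}
```

Typical usage inside an Actor would be `const input = validateInput(await Actor.getInput());`.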
2. Actor Run Status: TIMED-OUT
Status: TIMED-OUT
StatusMessage: Actor timed out after 3600 seconds
Cause: Actor exceeded its configured timeout.
Fix:
// Increase timeout when calling via client
const run = await client.actor('user/actor').call(input, {
  timeout: 7200, // 2 hours in seconds
});
// Or set in Actor configuration on platform
// Console > Actor > Settings > Timeout
Prevention: Reduce workload scope or increase maxConcurrency.
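Reducing workload scope often means splitting one huge start-URL list into several smaller runs. A generic batching helper (a sketch; chunk is not part of the Apify SDK):

```typescript
// Split a large list (e.g. start URLs) into fixed-size batches so each
// Actor run stays well under its timeout.
export function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be >= 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be passed as the startUrls input of a separate run.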
3. HTTP 429 — Rate Limited
ApifyApiError: Rate limit exceeded (429)
Cause: More than 60 requests/second to a single API resource.
Fix: The apify-client package retries 429s automatically (up to 8 retries with exponential backoff). If you still hit limits:
// Add delays between API calls
import { sleep } from 'crawlee';

for (const item of items) {
  await client.dataset(dsId).pushItems([item]);
  await sleep(100); // 100ms between calls
}

// Better: batch push items (one API call)
await client.dataset(dsId).pushItems(items); // Up to 9MB per call
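If you are making raw HTTP calls instead of going through apify-client, you can reproduce a similar retry policy yourself. A sketch of capped exponential backoff (the base delay, cap, and the statusCode error field are illustrative assumptions, not apify-client's exact internals):

```typescript
// Delay in ms before retry number `attempt` (0-based): base * 2^attempt, capped.
export function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a call on 429 errors with exponential backoff, up to maxRetries.
export async function withRetries<T>(fn: () => Promise<T>, maxRetries = 8): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.statusCode !== 429 || attempt >= maxRetries) throw err;
      await new Promise(resolve => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Usage: `await withRetries(() => fetchSomething())` retries only rate-limit errors and rethrows everything else immediately.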
4. HTTP 401 — Unauthorized
ApifyApiError: Authentication required (401)
Cause: Invalid, expired, or missing API token.
Diagnosis:
# Test your token
curl -s -H "Authorization: Bearer $APIFY_TOKEN" \
https://api.apify.com/v2/users/me | jq '.data.username'
Fix: Regenerate or copy a valid token from Apify Console (Settings > Integrations) and update the APIFY_TOKEN environment variable.

Build a complete web scraping Actor with Crawlee and deploy to Apify.
Apify Core Workflow A — Build & Deploy a Scraper
Overview
End-to-end workflow: define input schema, build a Crawlee-based Actor, extract structured data, store results in datasets, test locally, and deploy to Apify platform. This is the primary money-path workflow for Apify.
Prerequisites
- npm install apify crawlee in your project
- npm install -g apify-cli and apify login completed
- Familiarity with apify-sdk-patterns
Instructions
Step 1: Define Input Schema
Create .actor/INPUT_SCHEMA.json:
{
  "title": "E-Commerce Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "Product listing page URLs to scrape",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://example-store.com/products" }]
    },
    "maxItems": {
      "title": "Max items",
      "type": "integer",
      "description": "Maximum number of products to scrape",
      "default": 100,
      "minimum": 1,
      "maximum": 10000
    },
    "proxyConfig": {
      "title": "Proxy configuration",
      "type": "object",
      "description": "Select proxy to use",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    }
  },
  "required": ["startUrls"]
}
Step 2: Build the Actor with Router Pattern
// src/main.ts
import { Actor } from 'apify';
import { CheerioCrawler, createCheerioRouter, Dataset, log } from 'crawlee';

interface ProductInput {
  startUrls: { url: string }[];
  maxItems?: number;
  proxyConfig?: { useApifyProxy: boolean; groups?: string[] };
}

interface Product {
  url: string;
  name: string;
  price: number | null;
  currency: string;
  description: string;
  imageUrl: string | null;
  inStock: boolean;
  scrapedAt: string;
}

const router = createCheerioRouter();

// LISTING pages — extract product links
router.addDefaultHandler(async ({ request, $, enqueueLinks, log }) => {
  log.info(`Listing page: ${request.url}`);
  await enqueueLinks({
    selector: 'a.product-card',
    label: 'PRODUCT',
  });
  // Handle pagination
  await enqueueLinks({
    selector: 'a.next-page',
    label: 'LISTING',
  });
});

// PRODUCT detail pages — extract structured data
router.addHandler('PRODUCT', async ({ request, $, log }) => {
Manage Apify datasets, key-value stores, and request queues programmatically.
Apify Core Workflow B — Storage & Pipelines
Overview
Manage Apify's three storage types (datasets, key-value stores, request queues) and orchestrate multi-Actor pipelines. Covers CRUD operations, data export, pagination, and chaining Actors together.
Prerequisites
- apify-client installed and authenticated
- Familiarity with apify-core-workflow-a
Storage Types at a Glance
| Storage | Best For | Analogy | Retention |
|---|---|---|---|
| Dataset | Lists of similar items (products, pages) | Append-only table | 7 days (unnamed) |
| Key-Value Store | Config, screenshots, summaries, any file | S3 bucket | 7 days (unnamed) |
| Request Queue | URLs to crawl (managed by Crawlee) | Job queue | 7 days (unnamed) |
Named storages persist indefinitely. Unnamed (default run) storages expire after 7 days.
Instructions
Step 1: Dataset Operations
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Create a named dataset (persists indefinitely)
const dataset = await client.datasets().getOrCreate('product-catalog');
const dsClient = client.dataset(dataset.id);

// Push items (single or batch)
await dsClient.pushItems([
  { sku: 'ABC123', name: 'Widget', price: 9.99 },
  { sku: 'DEF456', name: 'Gadget', price: 19.99 },
]);

// List items with pagination
const page1 = await dsClient.listItems({ limit: 100, offset: 0 });
console.log(`Total items: ${page1.total}, this page: ${page1.items.length}`);

// Iterate all items page by page
let offset = 0;
const limit = 1000;
const allItems = [];
while (true) {
  const { items } = await dsClient.listItems({ limit, offset });
  if (items.length === 0) break;
  allItems.push(...items);
  offset += items.length;
}

// Download in various formats
const csvBuffer = await dsClient.downloadItems('csv');
const jsonBuffer = await dsClient.downloadItems('json');
const xlsxBuffer = await dsClient.downloadItems('xlsx');

// Download filtered/transformed
const filtered = await dsClient.downloadItems('json', {
  fields: ['sku', 'name', 'price'], // Only these fields
  unwind: 'variants', // Flatten nested arrays
  desc: true, // Reverse order
});

// Get dataset info (item count, size)
const info = await dsClient.get();
console.log(`${info.itemCount} items, ${info.actSize} bytes`);
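The manual pagination loop above can be wrapped in a reusable async generator. A sketch with the page-fetching call injected as a parameter (iterateAllItems is illustrative, not part of apify-client), so it works with client.dataset(id).listItems or any compatible function:

```typescript
// Any function with the listItems({ limit, offset }) shape.
type ListItemsFn<T> = (opts: { limit: number; offset: number }) => Promise<{ items: T[] }>;

// Yield every item across all pages, advancing the offset by the page size,
// until an empty page signals the end of the dataset.
export async function* iterateAllItems<T>(
  listItems: ListItemsFn<T>,
  limit = 1000,
): AsyncGenerator<T> {
  let offset = 0;
  while (true) {
    const { items } = await listItems({ limit, offset });
    if (items.length === 0) return;
    yield* items;
    offset += items.length;
  }
}
```

Usage: `for await (const item of iterateAllItems(opts => dsClient.listItems(opts))) { ... }` streams items without holding every page in memory at once.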
Step 2: Key-Value Store Operations
// Create a named store
const store = await client.keyValueStores().getOrCreate('scraper-config');
Optimize Apify platform costs through memory tuning, compute unit management, and proxy budgeting.
Apify Cost Tuning
Overview
Apify charges based on compute units (CU), proxy traffic (GB), and storage. One CU = 1 GB memory running for 1 hour. This skill covers how to analyze, reduce, and monitor costs across all three dimensions.
Pricing Model
Compute Units (CU)
CU = (Memory in GB) x (Duration in hours)
Example: 2048 MB (2 GB) running for 30 minutes = 2 x 0.5 = 1 CU
| Plan | CU Price | Included CUs |
|---|---|---|
| Free | N/A | Limited trial |
| Starter | $0.30/CU | Varies by plan |
| Scale | $0.25/CU | Volume discounts |
| Enterprise | Custom | Negotiated |
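For quick estimates, the CU formula translates directly into code. A sketch (computeUnits and estimatedCostUsd are illustrative helpers; plug in your plan's actual per-CU price from the table above):

```typescript
// Compute units = memory (GB) x duration (hours).
export function computeUnits(memoryMb: number, durationSecs: number): number {
  return (memoryMb / 1024) * (durationSecs / 3600);
}

// Estimated run cost for a given per-CU price (plan-dependent).
export function estimatedCostUsd(memoryMb: number, durationSecs: number, usdPerCu: number): number {
  return computeUnits(memoryMb, durationSecs) * usdPerCu;
}
```

For example, computeUnits(2048, 1800) reproduces the 1 CU worked example above.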
Proxy Costs
| Proxy Type | Cost | Use Case |
|---|---|---|
| Datacenter | Included in plan | Non-blocking sites |
| Residential | ~$12/GB | Sites that block datacenters |
| Google SERP | ~$3.50/1000 queries | Google search results |
Storage
Named datasets and KV stores persist indefinitely but count against storage quota. Unnamed (default run) storage expires after 7 days.
Instructions
Step 1: Analyze Current Costs
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function analyzeActorCosts(actorId: string, days = 30) {
  const { items: runs } = await client.actor(actorId).runs().list({
    limit: 1000,
    desc: true,
  });
  const cutoff = new Date(Date.now() - days * 86400_000);
  const recentRuns = runs.filter(r => new Date(r.startedAt) > cutoff);

  let totalCu = 0;
  let totalUsd = 0;
  let totalDurationSecs = 0;
  for (const run of recentRuns) {
    totalCu += run.usage?.ACTOR_COMPUTE_UNITS ?? 0;
    totalUsd += run.usageTotalUsd ?? 0;
    totalDurationSecs += run.stats?.runTimeSecs ?? 0;
  }

  const avgCuPerRun = recentRuns.length > 0 ? totalCu / recentRuns.length : 0;
  const avgCostPerRun = recentRuns.length > 0 ? totalUsd / recentRuns.length : 0;

  console.log(`=== Cost Analysis: ${actorId} (last ${days} days) ===`);
  console.log(`Runs: ${recentRuns.length}`);
  console.log(`Total CU: ${totalCu.toFixed(4)}`);
  console.log(`Total cost: $${totalUsd.toFixed(4)}`);
  console.log(`Avg CU/run: ${avgCuPerRun.toFixed(4)}`);
  console.log(`Avg cost/run: $${avgCostPerRun.toFixed(4)}`);
  console.log(`Total duration: ${(totalDurationSecs / 3600).toFixed(2)} hours`);

  // Find the most expensive run
  const mostExpensive = recentRuns.reduce(
    (max, r) => ((r.usageTotalUsd ?? 0) > (max.usageTotalUsd ?? 0) ? r : max),
    recentRuns[0],
  );
  if (mostExpensive) {
Collect Apify debug evidence for support tickets and troubleshooting.
Apify Debug Bundle
Overview
Collect all diagnostic information needed to troubleshoot failed Actor runs and prepare Apify support tickets. Pulls run metadata, logs, dataset samples, and environment info into a single bundle.
Prerequisites
- apify-client installed
- APIFY_TOKEN configured
- A failed or problematic run ID to investigate
Instructions
Step 1: Investigate a Failed Run
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function investigateRun(runId: string) {
  // Get run details
  const run = await client.run(runId).get();
  console.log('=== Run Summary ===');
  console.log(`Status: ${run.status}`);
  console.log(`Message: ${run.statusMessage}`);
  console.log(`Started: ${run.startedAt}`);
  console.log(`Finished: ${run.finishedAt}`);
  console.log(`Memory MB: ${run.options?.memoryMbytes}`);
  console.log(`Timeout sec: ${run.options?.timeoutSecs}`);
  console.log(`Build: ${run.buildNumber}`);
  console.log(`Origin: ${run.meta?.origin}`);
  console.log(`CU used: ${run.usage?.ACTOR_COMPUTE_UNITS?.toFixed(4)}`);
  console.log(`Cost USD: $${run.usageTotalUsd?.toFixed(4)}`);

  // Get dataset stats
  if (run.defaultDatasetId) {
    const ds = await client.dataset(run.defaultDatasetId).get();
    console.log(`\nDataset items: ${ds.itemCount}`);
  }

  // Get the full run log, then print the tail
  const log = await client.run(runId).log().get();
  console.log('\n=== Last 2000 chars of log ===');
  console.log(log?.slice(-2000));

  return { run, log };
}
Step 2: Create Debug Bundle Script
#!/bin/bash
# apify-debug-bundle.sh <RUN_ID>
RUN_ID="${1:?Usage: apify-debug-bundle.sh <RUN_ID>}"
BUNDLE_DIR="apify-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

echo "Collecting debug info for run $RUN_ID..."

# Environment info
{
  echo "=== Environment ==="
  echo "Date: $(date -u)"
  echo "Node: $(node --version 2>/dev/null || echo 'not found')"
  echo "npm: $(npm --version 2>/dev/null || echo 'not found')"
  echo ""
  echo "=== Apify Packages ==="
  npm list apify-client apify crawlee 2>/dev/null || echo "No packages found"
  echo ""
  echo "=== Apify CLI ==="
  apify --version 2>/dev/null || echo "CLI not installed"
} > "$BUNDLE_DIR/environment.txt"

# Run details via API
curl -sf -H "Authorization: Bearer $APIFY_TOKEN" \
  "https://api.apify.com/v2/actor-runs/$RUN_ID" | \
  jq '.data | {id, actId, status, statusMessage, startedAt, finishedAt,
    options: {memoryMbytes: .options.memoryMbytes, timeoutSecs: .options.timeoutSecs}}'

Deploy Apify Actors and integrate scraping into external applications.
Apify Deploy Integration
Overview
Deploy Actors to the Apify platform and integrate their results into external applications. Covers apify push deployment, API-triggered runs from web apps, scheduled scraping with data pipelines, and platform-specific integration patterns.
Prerequisites
- Actor tested locally (apify run)
- apify login completed
- Target application ready for integration
Instructions
Step 1: Deploy Actor to Platform
# Push Actor code to Apify
apify push
# Push to a specific Actor (creates if doesn't exist)
apify push username/my-scraper
# Pull an existing Actor to modify
apify pull username/existing-actor
Step 2: Integrate with a Web Application
The most common pattern: trigger an Actor from your app and consume results.
// src/services/apify.ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

interface ScrapeResult {
  url: string;
  title: string;
  price: number;
  inStock: boolean;
}

/**
 * Run a scraping Actor and return typed results.
 * Blocks until the Actor finishes (synchronous pattern).
 */
export async function scrapeProducts(urls: string[]): Promise<ScrapeResult[]> {
  const run = await client.actor('username/product-scraper').call({
    startUrls: urls.map(url => ({ url })),
    maxItems: 500,
  }, {
    memory: 2048,
    timeout: 600, // 10 minutes
  });

  if (run.status !== 'SUCCEEDED') {
    throw new Error(`Scrape failed: ${run.status} — ${run.statusMessage}`);
  }

  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  return items as ScrapeResult[];
}

/**
 * Start a scraping Actor without waiting (async pattern).
 * Returns run ID for later polling.
 */
export async function startScrape(urls: string[]): Promise<string> {
  const run = await client.actor('username/product-scraper').start({
    startUrls: urls.map(url => ({ url })),
  });
  return run.id;
}

/**
 * Check if a run has finished and get results.
 */
export async function getScrapeResults(runId: string): Promise<{
  status: string;
  items?: ScrapeResult[];
}> {
  const run = await client.run(runId).get();
  if (run.status === 'RUNNING' || run.status === 'READY') {
    return { status: run.status };
  }
  if (run.status === 'SUCCEEDED') {
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return { status: 'SUCCEEDED', items: items as ScrapeResult[] };
  }
  return { status: run.status };
}
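Callers of the async pattern above typically poll until the run reaches a terminal state. A generic sketch (pollUntilDone is illustrative; the interval and timeout defaults are arbitrary):

```typescript
// Poll a status-returning function until it leaves RUNNING/READY, or time out.
export async function pollUntilDone<T>(
  check: () => Promise<{ status: string; items?: T[] }>,
  { intervalMs = 5000, timeoutMs = 600_000 } = {},
): Promise<{ status: string; items?: T[] }> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result.status !== 'RUNNING' && result.status !== 'READY') return result;
    if (Date.now() > deadline) throw new Error('Timed out waiting for run to finish');
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Usage: `const { status, items } = await pollUntilDone(() => getScrapeResults(runId));`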
Step 3: Next.js API Route Integration
// app/api/scrape/route.ts (Next.js App Router)
import { NextResponse } from 'next/server';
import { ApifyClient } from 'apify-client';

Run your first Apify Actor and retrieve results via apify-client.
Apify Hello World
Overview
Run a public Actor from the Apify Store, wait for it to finish, and retrieve the scraped data. This demonstrates the fundamental call-wait-collect pattern used in every Apify integration.
Prerequisites
- npm install apify-client completed
- APIFY_TOKEN environment variable set
- See apify-install-auth if not ready
Core Pattern: Call Actor, Get Data
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// 1. Run an Actor and wait for it to finish
const run = await client.actor('apify/website-content-crawler').call({
  startUrls: [{ url: 'https://docs.apify.com/academy' }],
  maxCrawlPages: 5,
});

// 2. Retrieve results from the default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();

console.log(`Crawled ${items.length} pages:`);
items.forEach(item => {
  console.log(` - ${item.url}: ${item.text?.substring(0, 80)}...`);
});
Instructions
Step 1: Create the Script
Create hello-apify.ts (or .js) with the code above.
Step 2: Run It
# With tsx (recommended)
npx tsx hello-apify.ts
# Or with Node.js (plain JS)
node hello-apify.js
Step 3: Understand the Output
The Actor runs on Apify's cloud infrastructure. When it finishes:
- run.id — unique run identifier
- run.status — SUCCEEDED, FAILED, TIMED-OUT, or ABORTED
- run.defaultDatasetId — ID of the dataset containing results
- run.defaultKeyValueStoreId — ID of the KV store with metadata
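A small predicate keeps polling code readable. A sketch based on the terminal statuses listed above (isTerminal is illustrative; Apify also reports transitional states such as RUNNING and READY):

```typescript
// The run statuses that mean the Actor will not make further progress.
const TERMINAL_STATUSES = new Set(['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED']);

export function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}
```

A polling loop can then simply continue while `!isTerminal(run.status)`.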
Popular Starter Actors
| Actor ID | Purpose | Typical Input |
|---|---|---|
| apify/website-content-crawler | Crawl and extract text | { startUrls, maxCrawlPages } |
| apify/web-scraper | General-purpose scraper | { startUrls, pageFunction } |
| apify/cheerio-scraper | Fast HTML scraper | { startUrls, pageFunction } |
| apify/google-search-scraper | Google SERP results | { queries, maxPagesPerQuery } |
Synchronous vs Asynchronous Runs
// SYNCHRONOUS — .call() waits for the Actor to finish (simple, blocking)
const run = await client.actor('apify/web-scraper').call(input);
// ASYNCHRONOUS — .start() returns immediately, poll later
const run = await client.actor('apify/web-scraper').start(input);
// ... do other work, poll for completion later

Install and configure Apify SDK, CLI, and API client authentication.
Apify Install & Auth
Overview
Set up the Apify ecosystem: the apify-client JS library (for calling Actors remotely), the apify SDK (for building Actors), the Apify CLI (for deploying), and Crawlee (for crawling). Each package serves a different purpose.
Package Map
| Package | npm | Purpose |
|---|---|---|
| apify-client | npm i apify-client | Call Actors, manage datasets/KV stores from external apps |
| apify | npm i apify | Build Actors (includes Actor.init(), Actor.pushData()) |
| crawlee | npm i crawlee | Crawler framework (Cheerio, Playwright, Puppeteer crawlers) |
| apify-cli | npm i -g apify-cli | CLI for apify login, apify run, apify push |
Prerequisites
- Node.js 18+ (required by SDK v3+)
- Apify account at https://console.apify.com
- API token from Settings > Integrations in Apify Console
Instructions
Step 1: Install Packages
# For CALLING existing Actors from your app:
npm install apify-client
# For BUILDING your own Actors:
npm install apify crawlee
# For CLI deployment:
npm install -g apify-cli
Step 2: Configure Authentication
# Option A: Environment variable (recommended for apps)
export APIFY_TOKEN="apify_api_YOUR_TOKEN_HERE"
# Option B: .env file (add .env to .gitignore)
echo 'APIFY_TOKEN=apify_api_YOUR_TOKEN_HERE' >> .env
# Option C: CLI login (for interactive development)
apify login
# Paste your token when prompted
Step 3: Verify Connection
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({
  token: process.env.APIFY_TOKEN,
});
// List your Actors to confirm auth works
const { items } = await client.actors().list();
console.log(`Authenticated. You have ${items.length} Actors.`);
Step 4: Verify CLI (if installed)
apify login --token YOUR_TOKEN
apify info # Shows your account info
Auth Token Details
- Token format: apify_api_ prefix followed by an alphanumeric string
- Pass via Authorization: Bearer header (REST API)
- Pass via token constructor option (JS client)
- The APIFY_TOKEN env var is auto-detected by both apify-client and the apify SDK
Environment Variable Reference
| Variable | Purpose |
|---|---|