Complete Firecrawl integration skill pack with 30 skills covering web scraping, crawling, markdown conversion, and LLM-ready data extraction. Flagship+ tier vendor pack.
Installation
Open Claude Code and run this command:
/plugin install firecrawl-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for the current project only.
Skills (30)
Debug hard-to-diagnose Firecrawl issues with systematic isolation and evidence collection.
Firecrawl Advanced Troubleshooting
Overview
Deep debugging techniques for complex Firecrawl issues: empty scrapes on certain domains, crawl jobs that never complete, inconsistent extraction results, and webhook delivery failures. Uses systematic layer-by-layer isolation.
Instructions
Step 1: Minimal Reproduction
import FirecrawlApp from "@mendable/firecrawl-js";
// Strip everything down to the simplest failing case
async function minimalRepro() {
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Test 1: Can we scrape at all?
console.log("Test 1: Basic scrape");
const basic = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown"],
});
console.log(` Success: ${basic.success}, Length: ${basic.markdown?.length}`);
// Test 2: Does the target URL work?
console.log("Test 2: Target URL");
const target = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown"],
});
console.log(` Success: ${target.success}, Length: ${target.markdown?.length}`);
// Test 3: With waitFor for JS rendering
console.log("Test 3: With JS wait");
const withWait = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown"],
waitFor: 10000,
onlyMainContent: true,
});
console.log(` Success: ${withWait.success}, Length: ${withWait.markdown?.length}`);
// Test 4: With actions
console.log("Test 4: With actions");
const withActions = await firecrawl.scrapeUrl("https://YOUR-FAILING-URL.com", {
formats: ["markdown", "screenshot"],
actions: [
{ type: "wait", milliseconds: 3000 },
{ type: "scroll", direction: "down" },
{ type: "wait", milliseconds: 2000 },
],
});
console.log(` Success: ${withActions.success}, Length: ${withActions.markdown?.length}`);
// Screenshot will show what Firecrawl actually sees
}
Step 2: Layer-by-Layer Isolation
async function diagnose(url: string) {
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
const results: Array<{ test: string; pass: boolean; detail: string }> = [];
// Layer 1: API connectivity
try {
await firecrawl.scrapeUrl("https://example.com", { formats: ["markdown"] });
results.push({ test: "API connectivity", pass: true, detail: "OK" });
} catch (e: any) {
results.push({ test: "API connectivity", pass: false, detail: `${e.statusCode}: ${e.message}` });
return results; // can't continue
}
// Layer 2: Target URL accessibility
try {
const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
const chars = result.markdown?.length ?? 0;
results.push({ test: "Target URL", pass: chars > 0, detail: `${chars} chars` });
} catch (e: any) {
results.push({ test: "Target URL", pass: false, detail: `${e.statusCode}: ${e.message}` });
}
// (further layers — JS wait, actions, screenshot — follow the same pattern)
return results;
}
Choose and implement Firecrawl architecture patterns for different scales and use cases.
Firecrawl Architecture Variants
Overview
Three deployment architectures for Firecrawl at different scales: on-demand scraping for simple use cases, scheduled crawl pipelines for content monitoring, and real-time ingestion pipelines for AI/RAG applications. Choose based on volume, latency requirements, and cost budget.
Decision Matrix
| Factor | On-Demand | Scheduled Pipeline | Real-Time Pipeline |
|---|---|---|---|
| Volume | < 500/day | 500-10K/day | 10K+/day |
| Latency | Sync (2-10s) | Async (hours) | Async (minutes) |
| Use Case | Single page lookup | Site monitoring | Knowledge base, RAG |
| Credit Control | Per-request | Per-crawl budget | Credit pipeline |
| Complexity | Low | Medium | High |
Instructions
Architecture 1: On-Demand Scraping
User Request → Backend API → firecrawl.scrapeUrl → Clean Content → Response
Best for: chatbots, content preview, single-page extraction.
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Simple API endpoint
app.post("/api/scrape", async (req, res) => {
const { url } = req.body;
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
waitFor: 3000,
});
res.json({
title: result.metadata?.title,
content: result.markdown,
url: result.metadata?.sourceURL,
});
});
// With LLM extraction
app.post("/api/extract", async (req, res) => {
const { url, schema } = req.body;
const result = await firecrawl.scrapeUrl(url, {
formats: ["extract"],
extract: { schema },
});
res.json({ data: result.extract });
});
Architecture 2: Scheduled Crawl Pipeline
Scheduler (cron) → Crawl Queue → firecrawl.asyncCrawlUrl → Result Store
│
▼
Content Processor → Search Index
Best for: documentation monitoring, content indexing, competitive analysis.
import cron from "node-cron";
interface CrawlTarget {
id: string;
url: string;
maxPages: number;
paths?: string[];
schedule: string; // cron expression
}
const targets: CrawlTarget[] = [
{ id: "docs", url: "https://docs.example.com", maxPages: 100, paths: ["/docs/*"], schedule: "0 2 * * *" },
{ id: "blog", url: "https://blog.example.com", mConfigure Firecrawl CI/CD integration with GitHub Actions and automated scraping tests.
Firecrawl CI Integration
Overview
Set up CI/CD pipelines to test Firecrawl integrations automatically. Covers GitHub Actions workflow, API key secrets management, integration tests that validate real scraping, and mock-based unit tests for PRs.
Prerequisites
- GitHub repository with Actions enabled
- Firecrawl API key for testing (separate from production)
- @mendable/firecrawl-js installed
Instructions
Step 1: Configure Secrets
set -euo pipefail
# Store test API key in GitHub Actions secrets
gh secret set FIRECRAWL_API_KEY --body "fc-test-key-here"
Step 2: GitHub Actions Workflow
# .github/workflows/firecrawl-tests.yml
name: Firecrawl Integration Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
unit-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm test -- --coverage
# Unit tests use mocked SDK — no API key needed
integration-tests:
runs-on: ubuntu-latest
if: github.event_name == 'push' # Only on merge, not PRs (saves credits)
env:
FIRECRAWL_API_KEY: ${{ secrets.FIRECRAWL_API_KEY }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: "20"
cache: "npm"
- run: npm ci
- run: npm run test:integration
timeout-minutes: 5
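The unit-tests job above relies on a mocked SDK, so no API key is needed on PRs. A minimal vitest mock, mirroring the pattern in the Local Dev Loop skill (resolved shape is an assumption — adjust to your code):
// tests/scraper.test.ts
import { describe, it, expect, vi } from "vitest";
vi.mock("@mendable/firecrawl-js", () => ({
  default: vi.fn().mockImplementation(() => ({
    scrapeUrl: vi.fn().mockResolvedValue({
      success: true,
      markdown: "# Mocked page",
      metadata: { title: "Mocked", sourceURL: "https://example.com" },
    }),
  })),
}));
describe("scraper unit tests", () => {
  it("returns mocked markdown without hitting the API", async () => {
    const { default: FirecrawlApp } = await import("@mendable/firecrawl-js");
    const app = new FirecrawlApp({ apiKey: "fc-test" });
    const result = await app.scrapeUrl("https://example.com", { formats: ["markdown"] });
    expect(result.markdown).toContain("Mocked");
  });
});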
Step 3: Integration Tests
// tests/firecrawl.integration.test.ts
import { describe, it, expect } from "vitest";
import FirecrawlApp from "@mendable/firecrawl-js";
const SKIP = !process.env.FIRECRAWL_API_KEY;
describe.skipIf(SKIP)("Firecrawl Integration", () => {
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
it("scrapes a page to markdown", async () => {
const result = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown"],
});
expect(result.success).toBe(true);
expect(result.markdown).toBeDefined();
expect(result.markdown!.length).toBeGreaterThan(50);
expect(result.metadata?.title).toBeDefined();
}, 30000);
it("maps a site for URLs", async () => {
const result = await firecrawl.mapUrl("https://docs.firecrawl.dev");
expect(result.links).toBeDefined();
expect(result.links!.length).toBeGreaterThan(0);
}, 30000);
it("extracts structured data", async () => {
const result = await firecrawl.scrapeUrl("https://example.com", {
formats: ["extract"],
extract: {
schema: {
type: "object",
properties: { heading: { type: "string" } }, // minimal schema; source truncates here
},
},
});
expect(result.extract).toBeDefined();
}, 30000);
});
Diagnose and fix Firecrawl common errors and API response codes.
Firecrawl Common Errors
Overview
Quick-reference diagnostic guide for the most common Firecrawl API errors. Covers HTTP status codes, SDK exceptions, empty content, and crawl job failures with concrete fixes.
Prerequisites
- Firecrawl SDK installed (@mendable/firecrawl-js)
- FIRECRAWL_API_KEY environment variable set
- Access to error logs or console output
Error Reference
401 Unauthorized — Invalid API Key
Error: Unauthorized. Invalid API key.
Cause: API key is missing, malformed, or revoked.
set -euo pipefail
# Verify key is set and starts with fc-
echo "Key prefix: ${FIRECRAWL_API_KEY:0:3}"
# Test directly
curl -s https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' | jq .success
Fix: Regenerate key at firecrawl.dev/app. Ensure it starts with fc-.
402 Payment Required — Credits Exhausted
Error: Payment required. You have exceeded your credit limit.
Cause: Monthly or plan credits are used up.
set -euo pipefail
# Check remaining credits
curl -s https://api.firecrawl.dev/v1/team/credits \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" | jq .
Fix: Upgrade plan or wait for monthly credit reset. Failed requests do not consume credits.
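Where a large crawl is about to start, a pre-flight balance check fails fast instead of dying mid-run. A sketch using the same /v1/team/credits endpoint shown above (the response field name is an assumption — inspect the jq output for your account):
async function assertCredits(minRequired: number): Promise<void> {
  const res = await fetch("https://api.firecrawl.dev/v1/team/credits", {
    headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}` },
  });
  const body = await res.json(); // shape assumed: { data: { remaining_credits: number } }
  const remaining = body?.data?.remaining_credits ?? 0;
  if (remaining < minRequired) {
    throw new Error(`Insufficient credits: ${remaining} remaining, ${minRequired} required`);
  }
}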
429 Too Many Requests — Rate Limited
Error: Rate limit exceeded. Retry after X seconds.
Cause: Too many concurrent requests or requests per minute.
// Fix: implement exponential backoff
async function scrapeWithBackoff(url: string, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
return await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
} catch (err: any) {
if (err.statusCode !== 429 || i === retries - 1) throw err;
const delay = 1000 * Math.pow(2, i);
console.warn(`Rate limited, retrying in ${delay}ms...`);
await new Promise(r => setTimeout(r, delay));
}
}
}
Fix: Respect Retry-After header. Queue requests with p-queue.
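If your HTTP layer exposes response headers, honoring Retry-After beats blind exponential delay. A sketch using raw fetch against the v1 scrape endpoint (the header name follows the error message above; production code should also cap retries):
async function scrapeRespectingRetryAfter(url: string): Promise<any> {
  const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, formats: ["markdown"] }),
  });
  if (res.status === 429) {
    const waitSec = Number(res.headers.get("retry-after") ?? "5");
    await new Promise(r => setTimeout(r, waitSec * 1000)); // wait out the window
    return scrapeRespectingRetryAfter(url);
  }
  return res.json();
}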
Empty Markdown — JS Content Not Rendered
const result = await firecrawl.scrapeUrl("https://spa-app.com");
console.log(result.markdown); // "" or just nav text
Cause: Single-page app or JS-heavy site needs time to render.
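The fix is truncated in this snippet; per the Known Pitfalls skill, give the page time to render or wait for a specific element:
const rendered = await firecrawl.scrapeUrl("https://spa-app.com", {
  formats: ["markdown"],
  waitFor: 5000,          // give the SPA time to render
  onlyMainContent: true,
  // or wait for a selector instead of a fixed delay:
  // actions: [{ type: "wait", selector: ".main-content" }]
});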
Execute Firecrawl primary workflow: scrape and crawl websites into LLM-ready markdown.
Allowed tools: Read, Write, Edit, Bash(npm:*), Grep
Firecrawl Core Workflow A — Scrape & Crawl
Overview
Primary workflow for Firecrawl: convert websites into clean LLM-ready markdown. Covers single-page scraping with scrapeUrl, multi-page crawling with crawlUrl, async crawl jobs with polling, and content processing pipelines.
Prerequisites
- @mendable/firecrawl-js installed
- FIRECRAWL_API_KEY environment variable set
- Target URL(s) identified
Instructions
Step 1: Single-Page Scrape
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Scrape a single page to clean markdown
const result = await firecrawl.scrapeUrl("https://docs.example.com/api", {
formats: ["markdown"],
onlyMainContent: true, // strips nav, footer, sidebars
waitFor: 2000, // wait 2s for JS to render
});
if (result.success) {
console.log("Title:", result.metadata?.title);
console.log("Source:", result.metadata?.sourceURL);
console.log("Markdown:", result.markdown?.substring(0, 200));
}
Step 2: Multi-Page Synchronous Crawl
// Crawl a site — Firecrawl follows links, renders JS, returns all pages
const crawlResult = await firecrawl.crawlUrl("https://docs.example.com", {
limit: 50, // max pages to crawl
maxDepth: 3, // link depth from start URL
includePaths: ["/docs/*", "/api/*"], // only these paths
excludePaths: ["/blog/*", "/changelog/*"],
allowBackwardLinks: false, // only crawl child paths
scrapeOptions: {
formats: ["markdown"],
onlyMainContent: true,
},
});
console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
console.log(` ${page.metadata?.sourceURL}: ${page.markdown?.length} chars`);
}
Step 3: Async Crawl for Large Sites
// Start an async crawl job — returns immediately with job ID
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
limit: 500,
scrapeOptions: { formats: ["markdown"] },
});
console.log(`Crawl started: ${job.id}`);
// Poll for completion with backoff
let pollInterval = 2000;
let status = await firecrawl.checkCrawlStatus(job.id);
while (status.status === "scraping") {
console.log(`Progress: ${status.completed}/${status.total} pages`);
await new Promise(r => setTimeout(r, pollInterval));
pollInterval = Math.min(pollInterval * 1.5, 30000);
status = await firecrawl.checkCrawlStatus(job.id);
}
if (status.status === "completed") {
console.log(`Done: ${status.data?.length} pages scraped`);
} else {
console.error(&
Execute Firecrawl secondary workflow: LLM extraction, batch scraping, and site mapping.
Firecrawl Core Workflow B — Extract, Batch & Map
Overview
Secondary workflow complementing the scrape/crawl workflow. Covers LLM-powered structured data extraction with JSON schemas, batch scraping multiple known URLs, and rapid site map discovery. Use this when you need typed data rather than raw markdown.
Prerequisites
- @mendable/firecrawl-js installed
- FIRECRAWL_API_KEY environment variable set
- Understanding of JSON Schema (for extract)
Instructions
Step 1: LLM Extract — Structured Data from Pages
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Extract structured data using an LLM + JSON schema
const result = await firecrawl.scrapeUrl("https://firecrawl.dev/pricing", {
formats: ["extract"],
extract: {
schema: {
type: "object",
properties: {
plans: {
type: "array",
items: {
type: "object",
properties: {
name: { type: "string" },
price: { type: "string" },
credits_per_month: { type: "number" },
features: { type: "array", items: { type: "string" } },
},
required: ["name", "price"],
},
},
},
},
},
});
console.log("Extracted plans:", JSON.stringify(result.extract, null, 2));
Step 2: Extract with Prompt (No Schema)
// Use natural language prompt instead of rigid schema
const result = await firecrawl.scrapeUrl("https://news.ycombinator.com", {
formats: ["extract"],
extract: {
prompt: "Extract the top 5 stories with their title, URL, points, and comment count",
},
});
console.log(result.extract);
Step 3: Batch Scrape Known URLs
// Scrape multiple specific URLs at once — more efficient than individual calls
const batchResult = await firecrawl.batchScrapeUrls(
[
"https://docs.firecrawl.dev/features/scrape",
"https://docs.firecrawl.dev/features/crawl",
"https://docs.firecrawl.dev/features/extract",
"https://docs.firecrawl.dev/features/map",
],
{
formats: ["markdown"],
onlyMainContent: true,
}
);
for (const page of batchResult.data || []) {
console.log(`${page.metadata?.title}: ${page.markdown?.length} chars`);
}
Step 4: Async Batch Scrape (Large Sets)
// Start async batch scrape for many URLs — returns job ID
const job = await firecrawl.asyncBatchScrapeUrls(
urls, // array of 100+ URLs
{ formats: ["markdown"] }
);
console.log(`Batch job started: ${job.id}`);
Optimize Firecrawl costs through crawl limits, format selection, caching, and credit monitoring.
Firecrawl Cost Tuning
Overview
Firecrawl charges credits per operation: 1 credit per scrape, 1 per crawled page, 1 per map call, and variable credits for extract (LLM usage). An unbounded crawl on a large site can consume thousands of credits in minutes. This skill covers concrete techniques to reduce credit consumption by 50-80%.
Credit Cost Table
| Operation | Credits | Notes |
|---|---|---|
| scrapeUrl | 1 | Per page, any format |
| crawlUrl | 1 per page | Each discovered page costs 1 credit |
| mapUrl | 1 | Regardless of URLs returned |
| batchScrapeUrls | 1 per URL | Same as individual scrape |
| extract | 5+ | LLM processing adds cost |
Instructions
Step 1: Always Set Crawl Limits
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// BAD: no limit — could crawl 100K pages
await firecrawl.crawlUrl("https://docs.large-project.org");
// Cost: potentially 100,000+ credits
// GOOD: bounded crawl
await firecrawl.crawlUrl("https://docs.large-project.org", {
limit: 50, // max 50 pages
maxDepth: 2, // only 2 levels deep
includePaths: ["/api/*"], // only API docs
excludePaths: ["/blog/*", "/changelog/*"],
scrapeOptions: { formats: ["markdown"] },
});
// Cost: max 50 credits
Step 2: Use Scrape for Known URLs Instead of Crawl
// If you know which pages you need, don't crawl — scrape them directly
const targetUrls = [
"https://docs.example.com/api/auth",
"https://docs.example.com/api/users",
"https://docs.example.com/api/billing",
];
// Cost: 3 credits (one per page)
const results = await firecrawl.batchScrapeUrls(targetUrls, {
formats: ["markdown"],
});
// vs crawling the whole docs site: potentially 500+ credits
Step 3: Map First, Then Selective Scrape
// Map costs 1 credit and returns up to 30K URLs
const map = await firecrawl.mapUrl("https://docs.example.com");
// Cost: 1 credit
// Filter to only what you need
const apiDocs = (map.links || []).filter(url => url.includes("/api/"));
console.log(`${map.links?.length} total URLs, only ${apiDocs.length} are API docs`);
// Scrape only relevant pages
const results = await firecrawl.batchScrapeUrls(apiDocs.slice(0, 20), {
formats: ["markdown"],
});
// Cost: 1 (map) + 20 (scrape) = 21 credits
// vs blind crawl: could be 500+ credits
Step 4: Cache Scraped Content
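The body of this step is truncated in the source. A minimal sketch of scrape caching, consistent with the Performance Tuning skill's LRU approach (lru-cache package assumed):
import { LRUCache } from "lru-cache";
const cache = new LRUCache<string, any>({ max: 200, ttl: 86_400_000 }); // 24h TTL
async function scrapeCached(url: string) {
  const hit = cache.get(url);
  if (hit) return hit; // cache hit: 0 credits
  const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
  cache.set(url, result);
  return result; // cache miss: 1 credit
}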
Process, validate, and store Firecrawl scraped content with deduplication and chunking.
Firecrawl Data Handling
Overview
Process scraped web content from Firecrawl pipelines. Covers markdown cleaning, structured data extraction with Zod validation, content deduplication, chunking for LLM/RAG, and storage patterns for crawled content.
Instructions
Step 1: Content Cleaning
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Scrape with clean output settings
async function scrapeClean(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true, // strips nav, footer, sidebar
excludeTags: ["script", "style", "nav", "footer", "iframe"],
waitFor: 2000,
});
return {
url: result.metadata?.sourceURL || url,
title: result.metadata?.title || "",
markdown: cleanMarkdown(result.markdown || ""),
scrapedAt: new Date().toISOString(),
};
}
function cleanMarkdown(md: string): string {
return md
.replace(/\n{3,}/g, "\n\n") // collapse multiple newlines
.replace(/\[.*?\]\(javascript:.*?\)/g, "") // remove JS links
.replace(/!\[.*?\]\(data:.*?\)/g, "") // remove inline data URIs
.replace(/<!--[\s\S]*?-->/g, "") // remove HTML comments
.replace(/<script[\s\S]*?<\/script>/gi, "") // remove script tags
.trim();
}
Step 2: Structured Extraction with Validation
import { z } from "zod";
const ArticleSchema = z.object({
title: z.string().min(1),
author: z.string().optional(),
publishedDate: z.string().optional(),
content: z.string().min(50),
wordCount: z.number(),
});
async function extractArticle(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["extract"],
extract: {
schema: {
type: "object",
properties: {
title: { type: "string" },
author: { type: "string" },
publishedDate: { type: "string" },
content: { type: "string" },
},
required: ["title", "content"],
},
},
});
if (!result.extract) throw new Error(`Extraction failed for ${url}`);
return ArticleSchema.parse({
...result.extract,
wordCount: (result.extract.content || "").split(/\s+/).length,
});
}
Step 3: Content Deduplication
import { createHash } from "crypto";
function contentHash(text: string): string {
return createHash("sha256")
.update(text.trim().toLowerCase())
.digest("hex");
}
function deduplicatePages(pages: Array<{ url: string; markdown: string }>) {
const seen = new Set<string>();
return pages.filter(page => {
const hash = contentHash(page.markdown);
if (seen.has(hash)) return false;
seen.add(hash);
return true;
});
}
Collect Firecrawl debug evidence for support tickets and troubleshooting.
Firecrawl Debug Bundle
Current State
!node --version 2>/dev/null || echo 'N/A'
!npm list @mendable/firecrawl-js 2>/dev/null | grep firecrawl || echo 'SDK not installed'
Overview
Collect all diagnostic information needed for Firecrawl support tickets. Tests API connectivity, checks SDK version, verifies credentials, captures error context, and packages it all into a redacted bundle.
Prerequisites
- Firecrawl SDK installed
- FIRECRAWL_API_KEY environment variable set
- Access to application logs
Instructions
Step 1: Create Debug Bundle Script
#!/bin/bash
set -euo pipefail
# firecrawl-debug-bundle.sh
BUNDLE_DIR="firecrawl-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
echo "=== Firecrawl Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
# Environment
echo "--- Runtime ---" >> "$BUNDLE_DIR/summary.txt"
node --version >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Node: N/A" >> "$BUNDLE_DIR/summary.txt"
echo "OS: $(uname -a)" >> "$BUNDLE_DIR/summary.txt"
echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:+SET (${#FIRECRAWL_API_KEY} chars)}" >> "$BUNDLE_DIR/summary.txt"
echo "FIRECRAWL_API_URL: ${FIRECRAWL_API_URL:-https://api.firecrawl.dev (default)}" >> "$BUNDLE_DIR/summary.txt"
Step 2: Collect SDK and API Status
set -euo pipefail
# SDK version
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- SDK ---" >> "$BUNDLE_DIR/summary.txt"
npm list @mendable/firecrawl-js 2>/dev/null >> "$BUNDLE_DIR/summary.txt" || echo "Not found in npm" >> "$BUNDLE_DIR/summary.txt"
pip show firecrawl-py 2>/dev/null >> "$BUNDLE_DIR/summary.txt" || true
# API connectivity test
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--- API Connectivity ---" >> "$BUNDLE_DIR/summary.txt"
API_RESPONSE=$(curl -s -w "\n%{http_code}" https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer ${FIRECRAWL_API_KEY:-missing}" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' 2>&1)
HTTP_CODE=$(echo "$API_RESPONSE" | tail -1)
echo "API Status: HTTP $HTTP_CODE" >> "$BUNDLE_DIR/summary.txt"
# Credit balance
echo "" >> "$BUNDLE_DIR/summary.txt"
echo "--Deploy Firecrawl integrations to Vercel, Cloud Run, and Docker platforms.
Firecrawl Deploy Integration
Overview
Deploy applications using Firecrawl's web scraping API to production. Covers Vercel serverless, Cloud Run containers, self-hosted Firecrawl via Docker, and webhook endpoint deployment for async crawl results.
Prerequisites
- Firecrawl API key (FIRECRAWL_API_KEY)
- Application using @mendable/firecrawl-js
- Platform CLI (vercel, docker, or gcloud)
Instructions
Step 1: Configure Platform Secrets
set -euo pipefail
# Vercel
vercel env add FIRECRAWL_API_KEY production
# Cloud Run
echo -n "$FIRECRAWL_API_KEY" | gcloud secrets create firecrawl-api-key --data-file=-
# Docker
# Use --env-file or docker secrets
Step 2: Vercel Serverless API Route
// app/api/scrape/route.ts (Next.js App Router)
import FirecrawlApp from "@mendable/firecrawl-js";
import { NextRequest, NextResponse } from "next/server";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
export async function POST(req: NextRequest) {
const { url, formats = ["markdown"] } = await req.json();
if (!url) {
return NextResponse.json({ error: "URL required" }, { status: 400 });
}
try {
const result = await firecrawl.scrapeUrl(url, {
formats,
onlyMainContent: true,
waitFor: 3000,
});
return NextResponse.json({
success: result.success,
markdown: result.markdown,
title: result.metadata?.title,
sourceURL: result.metadata?.sourceURL,
});
} catch (error: any) {
return NextResponse.json(
{ error: error.message, status: error.statusCode },
{ status: error.statusCode || 500 }
);
}
}
Step 3: Self-Hosted Firecrawl (Docker Compose)
# docker-compose.yml
services:
firecrawl:
image: mendableai/firecrawl:latest
ports:
- "3002:3002"
environment:
- PORT=3002
- USE_DB_AUTHENTICATION=false
- REDIS_URL=redis://redis:6379
- REDIS_RATE_LIMIT_URL=redis://redis:6379
- NUM_WORKERS_PER_QUEUE=2
- BULL_AUTH_KEY=${BULL_AUTH_KEY:-changeme}
depends_on:
redis:
condition: service_healthy
redis:
image: redis:7-alpine
ports:
- "6379:6379"
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
app:
build: .
ports:
- "3000:3000"
environment:
- FIRECRAWL_API_KEY=fc-self-hosted
- FIRECRAWL_API_URL=http://firecrawl:3002
depends_on:
- firecrawl
// Point app to self-hosted Firecrawl
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
apiUrl: process.env.FIRECRAWL_API_URL, // e.g. http://firecrawl:3002 from the compose file
});
Configure Firecrawl team access control with per-key credit limits and domain restrictions.
Firecrawl Enterprise RBAC
Overview
Control access to Firecrawl scraping resources through API key management, domain allowlists, and credit budgets per team. Firecrawl's credit-based pricing means access control is primarily about limiting credit consumption and restricting scrape targets per consumer.
Prerequisites
- Firecrawl Team or Scale plan
- Dashboard access at firecrawl.dev/app
- Understanding of credit-per-page billing
Instructions
Step 1: Separate API Keys per Consumer
set -euo pipefail
# Create dedicated keys at firecrawl.dev/app for each team/service
# Content indexing pipeline — high volume
# Key: fc-content-indexer-prod (monthly credit limit: 50,000)
# Sales team prospect research — scrape only
# Key: fc-sales-research (monthly credit limit: 5,000)
# Dev/testing — minimal
# Key: fc-dev-testing (monthly credit limit: 500)
Step 2: Gateway Proxy with Domain Allowlists
import FirecrawlApp from "@mendable/firecrawl-js";
const TEAM_POLICIES: Record<string, {
apiKey: string;
allowedDomains: string[];
maxPagesPerCrawl: number;
dailyCreditLimit: number;
}> = {
"content-team": {
apiKey: process.env.FIRECRAWL_KEY_CONTENT!,
allowedDomains: ["docs.*", "*.readthedocs.io", "medium.com"],
maxPagesPerCrawl: 200,
dailyCreditLimit: 2000,
},
"sales-team": {
apiKey: process.env.FIRECRAWL_KEY_SALES!,
allowedDomains: ["linkedin.com", "crunchbase.com", "g2.com"],
maxPagesPerCrawl: 20,
dailyCreditLimit: 500,
},
"engineering": {
apiKey: process.env.FIRECRAWL_KEY_ENGINEERING!,
allowedDomains: ["*"], // unrestricted
maxPagesPerCrawl: 100,
dailyCreditLimit: 1000,
},
};
function isDomainAllowed(team: string, url: string): boolean {
const policy = TEAM_POLICIES[team];
if (!policy) return false;
const domain = new URL(url).hostname;
return policy.allowedDomains.some(pattern =>
pattern === "*" || domain.endsWith(pattern.replace("*.", "").replace("*", ""))
);
}
function getTeamClient(team: string): FirecrawlApp {
const policy = TEAM_POLICIES[team];
if (!policy) throw new Error(`Unknown team: ${team}`);
return new FirecrawlApp({ apiKey: policy.apiKey });
}
Step 3: Credit Budget Enforcement
class TeamBudget {
private usage = new Map<string, Map<string, number>>(); // team -> date -> credits
record(team: string, credits: number) {
const today = new Date().toISOString().split("T")[0];
if (!this.usage.has(team)) this.usage.set(team, new Map());
const teamUsage = this.usage.get(team)!;
teamUsage.set(today, (teamUsage.get(today) || 0) + credits);
}
}
Create a minimal working Firecrawl example that scrapes a page to markdown.
Firecrawl Hello World
Overview
Four minimal examples covering Firecrawl's core endpoints: scrape (single page), crawl (multi-page), map (URL discovery), and extract (LLM structured data). Each is a standalone snippet you can run immediately.
Prerequisites
- @mendable/firecrawl-js installed (npm install @mendable/firecrawl-js)
- FIRECRAWL_API_KEY environment variable set
Instructions
Step 1: Single-Page Scrape
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Scrape one page — returns markdown, HTML, metadata, links
const result = await firecrawl.scrapeUrl("https://docs.firecrawl.dev", {
formats: ["markdown"],
});
console.log("Title:", result.metadata?.title);
console.log("Markdown:", result.markdown?.substring(0, 500));
Step 2: Multi-Page Crawl
// Crawl a site recursively — follows links, respects robots.txt
const crawlResult = await firecrawl.crawlUrl("https://docs.firecrawl.dev", {
limit: 10, // max 10 pages (saves credits)
scrapeOptions: {
formats: ["markdown"],
},
});
console.log(`Crawled ${crawlResult.data?.length} pages`);
for (const page of crawlResult.data || []) {
console.log(` ${page.metadata?.title} — ${page.metadata?.sourceURL}`);
}
Step 3: Map a Site (URL Discovery)
// Discover all URLs on a site in ~2-3 seconds (uses sitemap + SERP)
const mapResult = await firecrawl.mapUrl("https://docs.firecrawl.dev");
console.log(`Found ${mapResult.links?.length} URLs`);
mapResult.links?.slice(0, 10).forEach(url => console.log(` ${url}`));
Step 4: LLM Extract (Structured Data)
// Extract structured data from a page using an LLM + JSON schema
const extracted = await firecrawl.scrapeUrl("https://firecrawl.dev/pricing", {
formats: ["extract"],
extract: {
schema: {
type: "object",
properties: {
plans: {
type: "array",
items: {
type: "object",
properties: {
name: { type: "string" },
price: { type: "string" },
credits: { type: "number" },
},
},
},
},
},
},
});
console.log("Pricing plans:", JSON.stringify(extracted.extract, null, 2));
Output
- Single-page markdown scraped from a live URL
- Multi-page crawl results with titles and source URLs
- Site map with all discovered URLs
- Structured pricing data extracted as JSON
Execute Firecrawl incident response procedures with triage, mitigation, and postmortem.
Firecrawl Incident Runbook
Overview
Rapid incident response procedures for Firecrawl integration failures. Covers API outage triage, credential issues, credit exhaustion, crawl job failures, and webhook delivery problems.
Severity Levels
| Level | Definition | Response Time | Examples |
|---|---|---|---|
| P1 | Complete failure | < 15 min | API returns 401/500 on all requests |
| P2 | Degraded service | < 1 hour | High latency, partial failures, 429s |
| P3 | Minor impact | < 4 hours | Webhook delays, some empty scrapes |
| P4 | No user impact | Next business day | Monitoring gaps, credit warnings |
Quick Triage (Run First)
set -euo pipefail
# 1. Test Firecrawl API directly
echo "=== API Health ==="
curl -s -w "\nHTTP %{http_code}\n" https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' | jq '{success, error}'
# 2. Check credit balance
echo "=== Credits ==="
curl -s https://api.firecrawl.dev/v1/team/credits \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" | jq .
# 3. Check our app health
echo "=== App Health ==="
curl -sf https://api.yourapp.com/health | jq '.services.firecrawl' || echo "App unhealthy"
Decision Tree
Firecrawl API returning errors?
├─ 401: API key invalid
│ → Verify key at firecrawl.dev/app, rotate if needed
├─ 402: Credits exhausted
│ → Upgrade plan or wait for monthly reset
├─ 429: Rate limited
│ → Reduce concurrency, enable backoff, check Retry-After
├─ 500/503: Firecrawl outage
│ → Enable fallback mode, monitor firecrawl.dev status
└─ API working fine
└─ Our integration issue
├─ Empty markdown → Increase waitFor, check target site
├─ Crawl stuck → Check job status, enforce timeout
└─ Webhook not firing → Verify endpoint, check signature
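For the 500/503 branch, "fallback mode" can be as simple as serving last-known-good content while the API is down. A hedged sketch (contentCache is a hypothetical store your pipeline populates on successful scrapes):
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
const contentCache = new Map<string, any>(); // hypothetical: last successful result per URL
async function scrapeWithFallback(url: string) {
  try {
    const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
    contentCache.set(url, result);
    return result;
  } catch (error: any) {
    if ((error.statusCode || 0) >= 500) {
      const cached = contentCache.get(url); // serve stale content during the outage
      if (cached) return cached;
    }
    throw error;
  }
}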
Immediate Actions by Error Type
401 — Authentication Failure
set -euo pipefail
# Verify current key
echo "Key prefix: ${FIRECRAWL_API_KEY:0:5}"
echo "Key length: ${#FIRECRAWL_API_KEY}"
# Test with explicit key
curl -s https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' | jq .success
# If fails: regenerate key at firecrawl.dev/app and update all environments
402 — Credits Exhausted
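This section is truncated in the source; a minimal 402 handler consistent with the Common Errors guidance — halt gracefully instead of crashing (pauseQueue and alertOnCall are hypothetical hooks into your own queue and alerting):
// hypothetical hooks — wire these to your own infrastructure
declare function pauseQueue(): Promise<void>;
declare function alertOnCall(message: string): Promise<void>;
async function scrapeOrPause(url: string) {
  try {
    return await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
  } catch (err: any) {
    if (err.statusCode === 402) {
      console.error("Firecrawl credits exhausted — pausing pipeline");
      await pauseQueue();
      await alertOnCall("Firecrawl credits exhausted");
      return null;
    }
    throw err;
  }
}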
Install and configure Firecrawl SDK authentication for web scraping.
Firecrawl Install & Auth
Overview
Install the Firecrawl SDK and configure API key authentication. Firecrawl turns any website into LLM-ready markdown or structured data. The SDK is published as @mendable/firecrawl-js on npm and firecrawl-py on PyPI.
Prerequisites
- Node.js 18+ or Python 3.10+
- Package manager (npm, pnpm, yarn, or pip)
- Firecrawl API key from firecrawl.dev/app (free tier available)
Instructions
Step 1: Install the SDK
set -euo pipefail
# Node.js (official npm package)
npm install @mendable/firecrawl-js
# Python
pip install firecrawl-py
Step 2: Configure Your API Key
# Set the environment variable (SDK reads FIRECRAWL_API_KEY automatically)
export FIRECRAWL_API_KEY="fc-YOUR_API_KEY"
# Or add to .env file (use dotenv in your app)
echo 'FIRECRAWL_API_KEY=fc-YOUR_API_KEY' >> .env
All Firecrawl API keys start with fc-. Get yours at firecrawl.dev/app.
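Since every key uses the fc- prefix, a startup assertion catches misconfigured environments before the first request; a small sketch:
function assertFirecrawlKey(): string {
  const key = process.env.FIRECRAWL_API_KEY;
  if (!key || !key.startsWith("fc-")) {
    throw new Error("FIRECRAWL_API_KEY missing or malformed (expected fc- prefix)");
  }
  return key;
}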
Step 3: Verify Connection — TypeScript
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Quick connection test: scrape a simple page
const result = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown"],
});
if (result.success) {
console.log("Firecrawl connected. Page title:", result.metadata?.title);
console.log("Content length:", result.markdown?.length, "chars");
} else {
console.error("Firecrawl error:", result.error);
}
Step 4: Verify Connection — Python
from firecrawl import FirecrawlApp
firecrawl = FirecrawlApp(api_key="fc-YOUR_API_KEY")
# Quick connection test
result = firecrawl.scrape_url("https://example.com", params={
"formats": ["markdown"]
})
print(f"Title: {result.get('metadata', {}).get('title')}")
print(f"Content: {len(result.get('markdown', ''))} chars")
Step 5: Self-Hosted Setup (Optional)
// Point to your own Firecrawl instance instead of api.firecrawl.dev
const firecrawl = new FirecrawlApp({
apiKey: "any-key", // required even for self-hosted
apiUrl: "http://localhost:3002", // self-hosted Firecrawl URL
});
Output
- @mendable/firecrawl-js installed in node_modules/
- FIRECRAWL_API_KEY environment variable configured
- Successful scrape confirming API connectivity
Error Handling
Identify and avoid Firecrawl anti-patterns and common integration mistakes.
Firecrawl Known Pitfalls
Overview
Real gotchas from production Firecrawl integrations. Each pitfall includes the bad pattern, why it fails, and the correct approach. Use this as a code review checklist.
Pitfall 1: Unbounded Crawl (Credit Bomb)
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// BAD: no limit — a docs site with 50K pages burns your entire credit balance
await firecrawl.crawlUrl("https://docs.large-project.org");
// GOOD: always set limit, maxDepth, and path filters
await firecrawl.crawlUrl("https://docs.large-project.org", {
limit: 100,
maxDepth: 3,
includePaths: ["/api/*", "/guides/*"],
excludePaths: ["/changelog/*", "/blog/*"],
scrapeOptions: { formats: ["markdown"] },
});
Pitfall 2: Not Specifying Output Format
// BAD: default format may not include markdown
const result = await firecrawl.scrapeUrl("https://example.com");
console.log(result.markdown); // might be undefined!
// GOOD: explicitly request the format you need
const result = await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown"],
onlyMainContent: true,
});
console.log(result.markdown); // guaranteed present
Pitfall 3: Not Waiting for JS-Heavy Pages
// BAD: SPAs show loading state, not content
const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard");
// result.markdown === "Loading..." or empty
// GOOD: wait for JS to render
const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard", {
formats: ["markdown"],
waitFor: 5000, // wait 5s for JS rendering
onlyMainContent: true,
});
// BETTER: wait for a specific element
const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard", {
formats: ["markdown"],
actions: [
{ type: "wait", selector: ".main-content" },
],
});
Pitfall 4: Wrong Package Name / Import
// BAD: these packages don't exist or are wrong
import FirecrawlApp from "firecrawl-js"; // wrong
import { FireCrawlClient } from "@firecrawl/sdk"; // wrong
// GOOD: the correct npm package
import FirecrawlApp from "@mendable/firecrawl-js"; // correct!
// Install: npm install @mendable/firecrawl-js
Pitfall 5: Polling Too Aggressively
// BAD: polling every 100ms wastes resources and may trigger rate limits
let status = await firecrawl.checkCrawlStatus(jobId);
while (status.status !== "completed") {
status = await firecrawl.checkCrawlStatus(jobId);
}
// GOOD: poll with an increasing interval (see the backoff polling in Core Workflow A)
Load test and scale Firecrawl scraping pipelines with concurrency control and batching.
Firecrawl Load & Scale
Overview
Load test and scale Firecrawl scraping pipelines. Firecrawl's rate limits are per-plan (RPM and concurrent connections), so scaling means maximizing throughput within those limits using batch scraping, async crawls, and queue-based request management.
Rate Limits by Plan
| Plan | Scrape RPM | Concurrent Crawls | Max Batch Size |
|---|---|---|---|
| Free | 10 | 2 | 10 |
| Hobby | 20 | 3 | 50 |
| Standard | 50 | 5 | 100 |
| Growth | 100 | 10 | 100 |
| Scale | 500+ | 50+ | 100 |
Instructions
Step 1: Measure Baseline Throughput
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
async function measureThroughput(urls: string[], concurrency: number) {
const start = Date.now();
const results: Array<{ url: string; durationMs: number; success: boolean; chars: number }> = [];
// Process in batches of `concurrency`
for (let i = 0; i < urls.length; i += concurrency) {
const batch = urls.slice(i, i + concurrency);
const batchResults = await Promise.all(
batch.map(async url => {
const t0 = Date.now();
try {
const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
return { url, durationMs: Date.now() - t0, success: true, chars: result.markdown?.length || 0 };
} catch {
return { url, durationMs: Date.now() - t0, success: false, chars: 0 };
}
})
);
results.push(...batchResults);
}
const totalMs = Date.now() - start;
const succeeded = results.filter(r => r.success).length;
console.log(`=== Throughput Report ===`);
console.log(`URLs: ${urls.length}, Concurrency: ${concurrency}`);
console.log(`Total time: ${totalMs}ms`);
console.log(`Success: ${succeeded}/${urls.length}`);
console.log(`Throughput: ${(urls.length / (totalMs / 1000)).toFixed(1)} pages/sec`);
console.log(`Avg latency: ${(results.reduce((s, r) => s + r.durationMs, 0) / results.length).toFixed(0)}ms`);
return results;
}
Step 2: Use Batch Scrape for Maximum Efficiency
// batchScrapeUrls is the most efficient way to scrape multiple known URLs
async function scaledBatchScrape(urls: string[], batchSize = 50) {
const allResults: any[] = [];
for (let i = 0; i < urls.length; i += batchSize) {
const batch = urls.slice(i, i + batchSize);
console.log(`Batch ${i / batchSize + 1}: scraping ${batch.length} URLs...`);
const result = await firecrawl.batchScrapeUrls(batch, {
formats: ["markdown"],
onlyMainContent: true,
});
allResults.push(...(result.data || []));
}
return allResults;
}
Configure Firecrawl local development with self-hosted Docker, mocking, and testing.
Firecrawl Local Dev Loop
Overview
Set up a fast development workflow for Firecrawl integrations. Use self-hosted Firecrawl via Docker to avoid burning API credits during development, mock the SDK for unit tests, and run integration tests against the local instance.
Prerequisites
- Node.js 18+ with npm/pnpm
- Docker + Docker Compose (for self-hosted Firecrawl)
- @mendable/firecrawl-js installed
Instructions
Step 1: Project Structure
my-firecrawl-project/
├── src/
│ ├── scraper.ts # Firecrawl business logic
│ └── config.ts # Environment-aware config
├── tests/
│ ├── scraper.test.ts # Unit tests (mocked SDK)
│ └── integration.test.ts # Integration tests (real API)
├── docker-compose.yml # Self-hosted Firecrawl
├── .env.local # Dev secrets (git-ignored)
├── .env.example # Template for team
└── package.json
Step 2: Self-Hosted Firecrawl for Zero-Credit Dev
# docker-compose.yml
services:
firecrawl:
image: mendableai/firecrawl:latest
ports:
- "3002:3002"
environment:
- PORT=3002
- USE_DB_AUTHENTICATION=false
- REDIS_URL=redis://redis:6379
- NUM_WORKERS_PER_QUEUE=1
- BULL_AUTH_KEY=devonly
depends_on:
redis:
condition: service_healthy
redis:
image: redis:7-alpine
ports:
- "6379:6379"
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
set -euo pipefail
# Start local Firecrawl
docker compose up -d
# Verify it's running
curl -s http://localhost:3002/health | jq .
Step 3: Environment-Aware Configuration
// src/config.ts
import FirecrawlApp from "@mendable/firecrawl-js";
export function getFirecrawl(): FirecrawlApp {
const isDev = process.env.NODE_ENV !== "production";
return new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY || "fc-dev",
// Point to local Docker instance in dev
...(isDev && process.env.FIRECRAWL_API_URL
? { apiUrl: process.env.FIRECRAWL_API_URL }
: {}),
});
}
# .env.local (for development — zero API credits used)
FIRECRAWL_API_KEY=fc-localdev
FIRECRAWL_API_URL=http://localhost:3002
NODE_ENV=development
Step 4: Unit Tests with Mocked SDK
// tests/scraper.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";
// Mock the SDK
vi.mock("@mendable/firecrawl-js", () => ({
default: vi.fn().mockImplementation(() => ({
scrapeUrl: vi.fn().mockResolvedValue({
success: true,
markdown: "# Hello World\n\nSMigrate to Firecrawl from Puppeteer, Playwright, Cheerio, or other scraping tools.
Firecrawl Migration Deep Dive
Current State
!npm list puppeteer playwright cheerio 2>/dev/null | grep -E "puppeteer|playwright|cheerio" || echo 'No scraping libs found'
Overview
Migrate from custom scraping (Puppeteer, Playwright, Cheerio) or competing APIs to Firecrawl. Firecrawl eliminates browser management, anti-bot handling, and JS rendering infrastructure. This skill shows equivalent code for common scraping patterns.
Migration Comparison
| Feature | Puppeteer/Playwright | Cheerio | Firecrawl |
|---|---|---|---|
| JS rendering | Manual browser | No | Automatic |
| Anti-bot bypass | DIY (stealth plugin) | No | Built-in |
| Output format | Raw HTML | Parsed HTML | Markdown/JSON/HTML |
| Infrastructure | Browser instances | None | API call |
| Concurrent scraping | Manage browser pool | Simple | Managed by Firecrawl |
| Cost model | Compute (CPU/RAM) | Free | Credits per page |
Instructions
Step 1: Replace Puppeteer Single-Page Scrape
// BEFORE: Puppeteer (20+ lines, browser management)
import puppeteer from "puppeteer";
async function scrapePuppeteer(url: string) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: "networkidle2" });
const html = await page.content();
const title = await page.title();
await browser.close();
return { html, title };
}
// AFTER: Firecrawl (5 lines, no browser needed)
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
async function scrapeFirecrawl(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
waitFor: 2000,
});
return { markdown: result.markdown, title: result.metadata?.title };
}
Step 2: Replace Cheerio HTML Parsing
// BEFORE: fetch + cheerio (manual parsing)
import * as cheerio from "cheerio";
async function scrapeCheerio(url: string) {
const html = await fetch(url).then(r => r.text());
const $ = cheerio.load(html);
return {
title: $("h1").first().text(),
content: $("main").text(),
links: $("a").map((_, el) => $(el).attr("href")).get(),
};
}
// AFTER: Firecrawl with extract (LLM-powered, no CSS selectors)
async function extractFirecrawl(url: string) {
const result = await firecrawl.scrapeUrl(url, {
formats: ["extract", "links"],
extract: {
schema: {
type: "Configure Firecrawl across development, staging, and production environments.
Firecrawl Multi-Environment Setup
Overview
Firecrawl's credit-based pricing makes environment separation critical. Development should use self-hosted Firecrawl or strict limits to avoid burning production credits during testing. This skill covers per-environment config, self-hosted Docker for dev, and credit budget enforcement.
Environment Strategy
| Environment | API Source | Crawl Limit | Concurrency | Credits |
|---|---|---|---|---|
| Development | Self-hosted Docker | 10 pages | 1 | Zero (local) |
| Staging | Cloud API (test key) | 50 pages | 2 | Limited |
| Production | Cloud API (prod key) | Per-task | Full plan | Monitored |
Instructions
Step 1: Environment-Aware Configuration
// config/firecrawl.ts
import FirecrawlApp from "@mendable/firecrawl-js";
type Env = "development" | "staging" | "production";
interface FirecrawlConfig {
apiKey: string;
apiUrl?: string;
maxPagesPerCrawl: number;
maxDepth: number;
concurrency: number;
waitFor: number;
}
const configs: Record<Env, FirecrawlConfig> = {
development: {
apiKey: process.env.FIRECRAWL_API_KEY_DEV || "fc-localdev",
apiUrl: process.env.FIRECRAWL_API_URL_DEV || "http://localhost:3002",
maxPagesPerCrawl: 10,
maxDepth: 2,
concurrency: 1,
waitFor: 2000,
},
staging: {
apiKey: process.env.FIRECRAWL_API_KEY_STAGING!,
maxPagesPerCrawl: 50,
maxDepth: 3,
concurrency: 2,
waitFor: 3000,
},
production: {
apiKey: process.env.FIRECRAWL_API_KEY_PROD!,
maxPagesPerCrawl: 500,
maxDepth: 5,
concurrency: 5,
waitFor: 3000,
},
};
export function getConfig(): FirecrawlConfig {
const env = (process.env.NODE_ENV || "development") as Env;
return configs[env] || configs.development;
}
export function getFirecrawl(): FirecrawlApp {
const cfg = getConfig();
return new FirecrawlApp({
apiKey: cfg.apiKey,
...(cfg.apiUrl ? { apiUrl: cfg.apiUrl } : {}),
});
}
Step 2: Self-Hosted Firecrawl for Development
# docker-compose.dev.yml
services:
firecrawl:
image: mendableai/firecrawl:latest
ports:
- "3002:3002"
environment:
- PORT=3002
- USE_DB_AUTHENTICATION=false
- REDIS_URL=redis://redis:6379
- NUM_WORKERS_PER_QUEUE=1
- BULL_AUTH_KEY=devonly
depends_on:
redis:
condition: service_healthy
redis:
image: redis:7-alpine
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
set -euo pipefail
docker compose -f docker-compose.dev.yml up -d
Monitor Firecrawl scraping pipelines with metrics, credit tracking, and quality alerts.
Firecrawl Observability
Overview
Monitor Firecrawl web scraping pipelines for success rates, credit consumption, content quality, and latency. Key signals: scrape success rate, crawl job completion, credit burn velocity, extraction quality (did markdown actually contain useful content vs error pages), and webhook delivery health.
Key Metrics
| Metric | Type | Why It Matters |
|---|---|---|
| firecrawl_scrapes_total | Counter | Track scrape volume and success rate |
| firecrawl_credits_used | Counter | Monitor credit consumption |
| firecrawl_scrape_duration_ms | Histogram | Detect latency issues |
| firecrawl_content_quality | Counter | Catch empty/error pages |
| firecrawl_crawl_jobs_total | Counter | Track crawl job outcomes |
Instructions
Step 1: Instrumented Firecrawl Wrapper
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Counters (use your metrics library: prom-client, statsd, datadog, etc.)
function emit(metric: string, value: number, tags?: Record<string, string>) {
console.log(JSON.stringify({ metric, value, tags, timestamp: Date.now() }));
}
export async function instrumentedScrape(url: string) {
const start = Date.now();
try {
const result = await firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
});
const duration = Date.now() - start;
const quality = evaluateQuality(result);
emit("firecrawl_scrapes_total", 1, { status: "success" });
emit("firecrawl_scrape_duration_ms", duration);
emit("firecrawl_credits_used", 1);
emit("firecrawl_content_quality", 1, { quality });
return result;
} catch (error: any) {
emit("firecrawl_scrapes_total", 1, {
status: "error",
error_code: String(error.statusCode || "unknown"),
});
emit("firecrawl_scrape_duration_ms", Date.now() - start);
throw error;
}
}
function evaluateQuality(result: any): string {
const md = result.markdown || "";
if (md.length < 100) return "empty";
if (/404|not found|access denied|captcha/i.test(md)) return "error_page";
if (!/^#{1,3}\s/m.test(md)) return "no_structure";
return "good";
}
Step 2: Credit Consumption Monitor
async function checkCreditHealth() {
const response = await fetch("https://api.firecrawl.dev/v1/team/credits", {
headers: { Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}` },
});
const credits = await response.json();
console.log("Credit status:", credits);
}
Optimize Firecrawl scraping performance with caching, batch scraping, and format selection.
Firecrawl Performance Tuning
Overview
Optimize Firecrawl API performance by choosing efficient scraping modes, caching results, using batch endpoints, and minimizing unnecessary rendering. Key levers: format selection (markdown vs HTML vs screenshot), waitFor tuning, onlyMainContent, and batch vs individual scraping.
Latency Benchmarks
| Operation | Typical | With JS Wait | With Screenshot |
|---|---|---|---|
| scrapeUrl (markdown) | 2-5s | 5-10s | 8-15s |
| scrapeUrl (extract) | 3-8s | 8-15s | N/A |
| crawlUrl (10 pages) | 20-40s | 40-80s | N/A |
| mapUrl | 1-3s | N/A | N/A |
| batchScrapeUrls (10) | 10-20s | 20-40s | N/A |
Instructions
Step 1: Minimize Formats (Biggest Win)
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// SLOW: requesting everything
const slow = await firecrawl.scrapeUrl(url, {
formats: ["markdown", "html", "links", "screenshot"],
// screenshot + full HTML = 3-5x slower
});
// FAST: request only what you need
const fast = await firecrawl.scrapeUrl(url, {
formats: ["markdown"], // markdown only = fastest
onlyMainContent: true, // skip nav/footer/sidebar
});
Step 2: Tune waitFor for JS-Heavy Pages
// Default: no JS wait (fastest, works for static sites)
const staticResult = await firecrawl.scrapeUrl("https://docs.example.com", {
formats: ["markdown"],
// No waitFor needed — content is in initial HTML
});
// SPA/dynamic pages: add minimal wait
const spaResult = await firecrawl.scrapeUrl("https://app.example.com", {
formats: ["markdown"],
waitFor: 3000, // 3s — enough for most SPAs
onlyMainContent: true,
});
// Heavy interactive page: use actions instead of long wait
const heavyResult = await firecrawl.scrapeUrl("https://dashboard.example.com", {
formats: ["markdown"],
actions: [
{ type: "wait", selector: ".data-table" }, // wait for specific element
{ type: "scroll", direction: "down" }, // trigger lazy loading
],
});
Step 3: Cache Scraped Content
import { LRUCache } from "lru-cache";
import { createHash } from "crypto";
const scrapeCache = new LRUCache<string, any>({
max: 500, // max 500 cached pages
ttl: 3600000, // 1 hour TTL
});
async function cachedScrape(url: string) {
const key = createHash("md5").update(url).digest("hex");
const cached = scrapeCache.get(key);
if (cached) return cached; // cache hit: zero credits, zero latency
const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"], onlyMainContent: true });
scrapeCache.set(key, result);
return result;
}
Implement Firecrawl scraping policy enforcement: domain blocklists, credit budgets, content filtering, and robots.txt compliance.
Firecrawl Policy Guardrails
Overview
Automated guardrails for Firecrawl scraping pipelines. Web scraping carries legal (robots.txt, ToS), ethical (rate limiting, attribution), and cost (credit burn) risks. This skill implements domain blocklists, credit budgets, content quality gates, and per-domain rate limits as enforceable policies.
Instructions
Step 1: Domain Policy Enforcement
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
class ScrapePolicy {
// Domains that explicitly prohibit scraping in their ToS
static BLOCKED_DOMAINS = [
"facebook.com", "instagram.com", // Meta ToS
"linkedin.com", // LinkedIn ToS
"twitter.com", "x.com", // X/Twitter ToS
];
// Domains with sensitive/regulated content
static SENSITIVE_DOMAINS = [
"*.gov", "*.mil", // Government
"*.edu", // Educational (FERPA)
];
static validateUrl(url: string): void {
const hostname = new URL(url).hostname;
for (const blocked of this.BLOCKED_DOMAINS) {
if (hostname === blocked || hostname.endsWith(`.${blocked}`)) {
throw new PolicyViolation(`Domain "${hostname}" is blocked: ToS prohibits scraping`);
}
}
for (const pattern of this.SENSITIVE_DOMAINS) {
const regex = new RegExp("^" + pattern.replace("*.", ".*\\.") + "$");
if (regex.test(hostname)) {
console.warn(`CAUTION: "${hostname}" matches sensitive domain pattern "${pattern}"`);
}
}
}
}
class PolicyViolation extends Error {
constructor(message: string) {
super(message);
this.name = "PolicyViolation";
}
}
Step 2: Credit Budget Enforcement
class CrawlBudget {
private usage = new Map<string, number>();
private dailyLimit: number;
constructor(dailyLimit = 5000) {
this.dailyLimit = dailyLimit;
}
authorize(estimatedPages: number): void {
const today = new Date().toISOString().split("T")[0];
const used = this.usage.get(today) || 0;
if (used + estimatedPages > this.dailyLimit) {
throw new PolicyViolation(
`Daily credit limit would be exceeded: ${used} used + ${estimatedPages} requested > ${this.dailyLimit} limit`
);
}
}
record(pagesScraped: number) {
const today = new Date().toISOString().split("T")[0];
this.usage.set(today, (this.usage.get(today) || 0) + pagesScraped);
}
}
const budget = new CrawlBudget(5000);
Step 3: Content Quality Gate
function validateScrapedContent(result: any): {
accepted: boolean;
reason?: string;
} {
const md = result.markdown || "";
if (md.length < 100) return { accepted: false, reason: "content too short" };
if (/404|not found|access denied|captcha/i.test(md)) return { accepted: false, reason: "error page" };
return { accepted: true };
}
Execute Firecrawl production deployment checklist and rollback procedures.
Firecrawl Production Checklist
Overview
Pre-deployment validation checklist for applications using Firecrawl's scrape, crawl, map, and extract APIs. Covers credential management, crawl safety limits, error handling, monitoring, and rollback.
Prerequisites
- Staging environment tested and passing
- Production API key from firecrawl.dev/app
- Monitoring infrastructure ready
Pre-Deployment Checklist
Credentials & Security
- [ ] Production FIRECRAWL_API_KEY in secure vault (not in code or .env)
- [ ] Key starts with fc- and is scoped to production
- [ ] Different API keys for dev/staging/production
- [ ] .env files in .gitignore
- [ ] Webhook secrets stored securely
- [ ] Git history scanned for leaked keys
Crawl Safety
- [ ] All crawlUrl calls have limit parameter set (see the guard sketch after this section)
- [ ] maxDepth configured to prevent unbounded crawling
- [ ] includePaths/excludePaths filters applied where appropriate
- [ ] Credit budget tracking implemented (daily limit alerts)
- [ ] No hardcoded URLs in production code
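A minimal guard that enforces the crawl-safety items above at every call site (the ceilings are illustrative, not Firecrawl limits):
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });
function safeCrawlOptions(opts: { limit?: number; maxDepth?: number } = {}) {
  const MAX_PAGES = 500; // illustrative org-wide ceiling
  const MAX_DEPTH = 5;
  return {
    limit: Math.min(opts.limit ?? 50, MAX_PAGES),
    maxDepth: Math.min(opts.maxDepth ?? 3, MAX_DEPTH),
    scrapeOptions: { formats: ["markdown" as const] },
  };
}
// every production crawl goes through the guard
await firecrawl.crawlUrl("https://docs.example.com", safeCrawlOptions({ limit: 100 }));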
Error Handling
- [ ] 429 rate limit handling with exponential backoff
- [ ] 402 credit exhaustion handled gracefully (no crash)
- [ ] 401 auth failure logged and alerted
- [ ] Async crawl jobs have timeout with deadline
- [ ] Fallback from crawl to individual scrape on failure (see the sketch after this list)
- [ ] Empty markdown detection (JS rendering issues)
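A sketch of the timeout-with-deadline and crawl-to-scrape fallback items above, following the async crawl pattern from Core Workflow A:
async function crawlWithDeadline(url: string, deadlineMs = 10 * 60 * 1000) {
  const job = await firecrawl.asyncCrawlUrl(url, {
    limit: 100,
    scrapeOptions: { formats: ["markdown"] },
  });
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    const status = await firecrawl.checkCrawlStatus(job.id);
    if (status.status === "completed") return status.data;
    if (status.status !== "scraping") break; // failed/cancelled — fall through to fallback
    await new Promise(r => setTimeout(r, 5000));
  }
  // deadline hit or job failed: fall back to scraping just the entry page
  console.warn("Crawl did not complete in time — falling back to single-page scrape");
  const page = await firecrawl.scrapeUrl(url, { formats: ["markdown"] });
  return [page];
}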
Monitoring & Alerting
- [ ] Scrape success/failure rate tracked
- [ ] Credit consumption monitored
- [ ] Crawl job completion rate tracked
- [ ] Alert on credit balance below threshold
- [ ] Alert on error rate > 5%
- [ ] Webhook delivery failures logged
Instructions
Step 1: Verify API Connectivity
set -euo pipefail
# Test production key
curl -s https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY_PROD" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com","formats":["markdown"]}' | jq '.success'
# Check credit balance
curl -s https://api.firecrawl.dev/v1/team/credits \
-H "Authorization: Bearer $FIRECRAWL_API_KEY_PROD" | jq .
Step 2: Health Check Endpoint
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
export async function healthCheck() {
const start = Date.now();
try {
const result = await firecrawl.scrapeUrl("Implement Firecrawl rate limiting, backoff, and request queuing patterns.
Firecrawl Rate Limits
Overview
Firecrawl enforces rate limits per API key measured in requests per minute and concurrent connections. When exceeded, the API returns 429 Too Many Requests with a Retry-After header. This skill covers backoff strategies, request queuing, and proactive throttling.
Rate Limit Tiers
| Plan | Scrape RPM | Crawl Concurrency | Credits/Month |
|---|---|---|---|
| Free | 10 | 2 | 500 |
| Hobby | 20 | 3 | 3,000 |
| Standard | 50 | 5 | 50,000 |
| Growth | 100 | 10 | 500,000 |
| Scale | 500+ | 50+ | Custom |
Concurrent crawl jobs count against concurrency limits. If the queue is full, new jobs are rejected with 429.
Instructions
Step 1: Exponential Backoff with Jitter
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
async function withBackoff<T>(
operation: () => Promise<T>,
config = { maxRetries: 5, baseDelayMs: 1000, maxDelayMs: 32000 }
): Promise<T> {
for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
try {
return await operation();
} catch (error: any) {
if (attempt === config.maxRetries) throw error;
const status = error.statusCode || error.status;
// Only retry on 429 (rate limit) and 5xx (server error)
if (status && status !== 429 && status < 500) throw error;
// Exponential delay with random jitter to prevent thundering herd
const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
const jitter = Math.random() * 500;
const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);
console.warn(`Rate limited (${status}). Retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`);
await new Promise(r => setTimeout(r, delay));
}
}
throw new Error("Unreachable");
}
// Usage
const result = await withBackoff(() =>
firecrawl.scrapeUrl("https://example.com", { formats: ["markdown"] })
);
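The overview notes that 429 responses carry a Retry-After header; when the error object exposes response headers, honoring the server's hint beats guessing. A hedged helper (how headers are surfaced on the error is an assumption; adjust to what your SDK version actually exposes):
function retryAfterMs(error: any): number | null {
  // Retry-After may be a number of seconds or an HTTP date; handle both
  const raw =
    error?.headers?.["retry-after"] ?? error?.response?.headers?.["retry-after"];
  if (!raw) return null;
  const seconds = Number(raw);
  if (!Number.isNaN(seconds)) return seconds * 1000;
  const asDate = Date.parse(raw);
  return Number.isNaN(asDate) ? null : Math.max(0, asDate - Date.now());
}
// Inside withBackoff's catch block, prefer the server's hint when available:
// const delay = retryAfterMs(error) ?? Math.min(exponentialDelay + jitter, config.maxDelayMs);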
Step 2: Queue-Based Rate Limiting with p-queue
import PQueue from "p-queue";
// Limit to 5 concurrent requests, max 10 per second
const scrapeQueue = new PQueue({
concurrency: 5,
interval: 1000,
intervalCap: 10,
});
async function queuedScrape(url: string) {
return scrapeQueue.add(() =>
withBackoff(() =>
firecrawl.scrapeUrl(url, { formats: ["markdown"] })
)
);
}
// Scrape many URLs respecting rate limits
const urls = ["https://a.com", "https://b.com"Implement Firecrawl reference architecture with scrape/crawl/map/extract pipelines.
Firecrawl Reference Architecture
Overview
Production architecture for web scraping and content ingestion with Firecrawl. Covers three tiers: on-demand scraping, scheduled crawl pipelines, and real-time RAG ingestion. Uses all four Firecrawl endpoints: scrape, crawl, map, and extract.
Architecture Diagram
┌─────────────────────────────────────────────────────────┐
│ Firecrawl Pipeline │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────┐ ┌───────────┐ │
│ │ scrapeUrl│ │ crawlUrl │ │mapUrl│ │ extract │ │
│ │ (1 page) │ │ (N pages)│ │(URLs)│ │ (LLM+JSON)│ │
│ └────┬─────┘ └────┬─────┘ └──┬───┘ └─────┬─────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Content Processing Layer │ │
│ │ Clean MD │ Validate │ Deduplicate │ Chunk │ │
│ └─────────────────────┬─────────────────────────────┘ │
│ │ │
│ ┌─────────────────────┴─────────────────────────────┐ │
│ │ Storage & Output │ │
│ │ Files │ Database │ Vector Store │ Search Index │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Instructions
Step 1: Firecrawl Service Layer
// src/firecrawl/service.ts
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Single page scrape
export async function scrapePage(url: string) {
return firecrawl.scrapeUrl(url, {
formats: ["markdown"],
onlyMainContent: true,
waitFor: 2000,
});
}
// Site-wide crawl with safety limits
export async function crawlSite(baseUrl: string, opts?: {
maxPages?: number;
paths?: string[];
excludePaths?: string[];
}) {
return firecrawl.crawlUrl(baseUrl, {
limit: opts?.maxPages || 50,
maxDepth: 3,
includePaths: opts?.paths,
excludePaths: opts?.excludePaths || ["/blog/*", "/news/*"],
scrapeOptions: { formats: ["markdown"], onlyMainContent: true },
});
}
// Fast URL discovery
export async function discoverUrls(baseUrl: string) {
const map = await firecrawl.mapUrl(baseUrl);
return map.links || [];
}
// Structured data extraction
export async function extractData(url: string, schema: object) {
return firecrawl.scrapeUrl(url, {
formats: ["extract"],
extract: { schema },
});
}
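These four calls compose into a simple ingestion flow. A sketch that discovers URLs with mapUrl, then scrapes a bounded sample (the slice size is arbitrary, chosen to cap credit spend):
export async function ingestSite(baseUrl: string) {
  const urls = await discoverUrls(baseUrl);
  const pages = [];
  // Bound the work up front: discovery is cheap, scraping costs credits per page
  for (const url of urls.slice(0, 25)) {
    pages.push(await scrapePage(url));
  }
  return pages;
}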
Step 2: Content Processing Pipeline
// src/pipeline/processor.ts
import { createHash } from "crypto";
interface ProcessedPage {
url: string;
markdown: string;
contentHash: string; // sha-256 of markdown, used for deduplication
}
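The crypto import supports the Deduplicate stage from the architecture diagram: hash each page's markdown and drop repeats. A minimal sketch (field names follow the interface above):
function toProcessed(url: string, markdown: string): ProcessedPage {
  return {
    url,
    markdown,
    contentHash: createHash("sha256").update(markdown).digest("hex"),
  };
}

function dedupe(pages: ProcessedPage[]): ProcessedPage[] {
  const seen = new Set<string>();
  return pages.filter((p) => {
    if (seen.has(p.contentHash)) return false;
    seen.add(p.contentHash);
    return true;
  });
}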
Implement Firecrawl reliability patterns: circuit breakers, crawl fallbacks, and content validation.
Firecrawl Reliability Patterns
Overview
Production reliability patterns for Firecrawl scraping pipelines. Firecrawl's async crawl model, JS rendering, and credit-based pricing create specific reliability challenges: crawl jobs may time out, scraped content may be empty (bot detection, JS failures), and credits can be burned by runaway crawls. This skill covers battle-tested patterns for each.
Instructions
Step 1: Robust Crawl with Timeout and Backoff
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
async function reliableCrawl(
url: string,
opts: { limit: number; paths?: string[] },
timeoutMs = 600000
) {
const job = await firecrawl.asyncCrawlUrl(url, {
limit: opts.limit,
includePaths: opts.paths,
scrapeOptions: { formats: ["markdown"], onlyMainContent: true },
});
const deadline = Date.now() + timeoutMs;
let pollInterval = 2000;
while (Date.now() < deadline) {
const status = await firecrawl.checkCrawlStatus(job.id);
if (status.status === "completed") return status;
if (status.status === "failed") {
throw new Error(`Crawl failed: ${status.error}`);
}
await new Promise(r => setTimeout(r, pollInterval));
pollInterval = Math.min(pollInterval * 1.5, 30000); // back off to 30s max
}
throw new Error(`Crawl timed out after ${timeoutMs}ms (job: ${job.id})`);
}
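A short usage example (the URL, limit, and paths are illustrative); because the timeout error carries the job ID, a stuck crawl can still be inspected afterwards:
const status = await reliableCrawl("https://docs.example.com", {
  limit: 100,
  paths: ["/docs/*"],
});
console.log(`Crawled ${status.data?.length ?? 0} pages`);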
Step 2: Content Quality Validation
interface ScrapedPage {
url: string;
markdown: string;
metadata: { title?: string; statusCode?: number };
}
function validateContent(page: ScrapedPage): {
valid: boolean;
reason?: string;
} {
if (!page.markdown || page.markdown.length < 100) {
return { valid: false, reason: "Content too short" };
}
if (page.metadata.statusCode && page.metadata.statusCode >= 400) {
return { valid: false, reason: `HTTP ${page.metadata.statusCode}` };
}
const errorPatterns = [
"access denied", "403 forbidden", "page not found",
"captcha", "please verify", "enable javascript",
];
const lower = page.markdown.toLowerCase();
for (const pattern of errorPatterns) {
if (lower.includes(pattern)) {
return { valid: false, reason: `Error page detected: "${pattern}"` };
}
}
return { valid: true };
}
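Wiring the validator into a crawl result keeps bad pages out of downstream storage; a sketch, assuming pages shaped like ScrapedPage:
function partitionPages(pages: ScrapedPage[]) {
  const valid: ScrapedPage[] = [];
  const rejected: Array<{ url: string; reason?: string }> = [];
  for (const page of pages) {
    const check = validateContent(page);
    if (check.valid) valid.push(page);
    else rejected.push({ url: page.url, reason: check.reason });
  }
  return { valid, rejected };
}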
Step 3: Crawl-to-Scrape Fallback
// If a full crawl fails, fall back to scraping critical pages individually
async function resilientFetch(urls: string[]): Promise<any[]> {
// Try batch scrape first (most efficient)
try {
const batch = await firecrawl.batchScrapeUrls(urls, {
formats: ["markdown"],
onlyMainContent: true,
});
return batch.data || [];
} catch {
// Fallback path (sketch): scrape each page individually so one failure can't sink the set
const results: any[] = [];
for (const url of urls) {
try {
results.push(await firecrawl.scrapeUrl(url, { formats: ["markdown"] }));
} catch {
console.warn(`Skipping ${url}: scrape failed even after fallback`);
}
}
return results;
}
}
Apply production-ready Firecrawl SDK patterns for TypeScript and Python.
Firecrawl SDK Patterns
Overview
Production-ready patterns for Firecrawl SDK (@mendable/firecrawl-js / firecrawl-py). Covers singleton client, typed wrappers, retry with backoff, response validation, and reusable scraping service patterns.
Prerequisites
- @mendable/firecrawl-js installed
- Understanding of async/await patterns
- TypeScript strict mode recommended
Instructions
Step 1: Singleton Client with Configuration
// src/firecrawl/client.ts
import FirecrawlApp from "@mendable/firecrawl-js";
let instance: FirecrawlApp | null = null;
export function getFirecrawl(): FirecrawlApp {
if (!instance) {
if (!process.env.FIRECRAWL_API_KEY) {
throw new Error("FIRECRAWL_API_KEY environment variable is required");
}
instance = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY,
...(process.env.FIRECRAWL_API_URL
? { apiUrl: process.env.FIRECRAWL_API_URL }
: {}),
});
}
return instance;
}
Step 2: Typed Scrape Wrapper
// src/firecrawl/scrape.ts
import { getFirecrawl } from "./client";
interface ScrapeResult {
url: string;
title: string;
markdown: string;
links: string[];
scrapedAt: string;
}
export async function scrapePage(
url: string,
options?: { waitFor?: number; includeLinks?: boolean }
): Promise<ScrapeResult> {
const firecrawl = getFirecrawl();
const formats: string[] = ["markdown"];
if (options?.includeLinks) formats.push("links");
const result = await firecrawl.scrapeUrl(url, {
formats,
onlyMainContent: true,
...(options?.waitFor ? { waitFor: options.waitFor } : {}),
});
if (!result.success) {
throw new Error(`Scrape failed for ${url}: ${result.error}`);
}
return {
url: result.metadata?.sourceURL || url,
title: result.metadata?.title || "",
markdown: result.markdown || "",
links: result.links || [],
scrapedAt: new Date().toISOString(),
};
}
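Example usage of the wrapper (the URL is illustrative):
const page = await scrapePage("https://docs.example.com/guide", {
  waitFor: 3000,
  includeLinks: true,
});
console.log(`${page.title}: ${page.markdown.length} chars, ${page.links.length} links`);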
Step 3: Retry with Exponential Backoff
// src/firecrawl/retry.ts
export async function withRetry<T>(
operation: () => Promise<T>,
config = { maxRetries: 3, baseDelayMs: 1000, maxDelayMs: 30000 }
): Promise<T> {
for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
try {
return await operation();
} catch (error: any) {
if (attempt === config.maxRetries) throw error;
const status = error.statusCode || error.status;
// Only retry on rate limits (429) and server errors (5xx)
if (status && status !== 429 && status < 500) throw error;
const delay = Math.min(
config.baseDelayMs * Math.pow(2, attempt) + Math.random() * 500,
config.maxDelayMs
);
console.warn(`Retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms (status: ${status ?? "network"})`);
await new Promise(r => setTimeout(r, delay));
}
}
throw new Error("Unreachable");
}
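The helper composes naturally with the typed wrapper from Step 2; for example:
import { scrapePage } from "./scrape";
import { withRetry } from "./retry";

const page = await withRetry(() => scrapePage("https://example.com"));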
Apply Firecrawl security best practices for API key management and webhook verification.
Firecrawl Security Basics
Overview
Security best practices for Firecrawl API keys, webhook signature verification, and scraped content handling. Firecrawl API keys start with fc- and grant full access to scrape, crawl, map, and extract endpoints — protecting them is critical.
Prerequisites
- Firecrawl API key
- Understanding of environment variables
- Webhook endpoint (if using async crawl callbacks)
Instructions
Step 1: Secure API Key Storage
# .env (NEVER commit to git)
FIRECRAWL_API_KEY=fc-your-api-key-here
# .gitignore — add these patterns
echo -e "\n.env\n.env.local\n.env.*.local" >> .gitignore
// Validate key exists before creating client
import FirecrawlApp from "@mendable/firecrawl-js";
if (!process.env.FIRECRAWL_API_KEY?.startsWith("fc-")) {
throw new Error("FIRECRAWL_API_KEY must be set and start with 'fc-'");
}
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY,
});
Step 2: Verify Webhook Signatures
Firecrawl signs webhook payloads with HMAC-SHA256 via the X-Firecrawl-Signature header.
import crypto from "crypto";
function verifyWebhookSignature(
payload: string,
signature: string,
secret: string
): boolean {
const expected = crypto
.createHmac("sha256", secret)
.update(payload)
.digest("hex");
// timingSafeEqual throws when buffer lengths differ, so reject mismatches first
if (!signature || signature.length !== expected.length) return false;
// Timing-safe comparison prevents timing attacks
return crypto.timingSafeEqual(
Buffer.from(signature),
Buffer.from(expected)
);
}
// Express webhook handler with verification. Use express.raw so the signature
// is checked against the exact bytes Firecrawl signed; re-serializing a parsed
// body with JSON.stringify may not byte-match the original payload.
app.post("/webhooks/firecrawl", express.raw({ type: "application/json" }), (req, res) => {
const signature = req.headers["x-firecrawl-signature"] as string;
const rawBody = req.body.toString();
if (!verifyWebhookSignature(rawBody, signature, process.env.FIRECRAWL_WEBHOOK_SECRET!)) {
console.error("Invalid webhook signature — rejecting");
return res.status(401).json({ error: "Invalid signature" });
}
// Process verified webhook
const { type, data } = JSON.parse(rawBody);
console.log(`Verified webhook: ${type}`);
res.status(200).json({ received: true });
});
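To exercise the handler locally, sign a sample payload with the same secret and POST it; a sketch using Node's built-in fetch (the port is illustrative):
import crypto from "crypto";

const payload = JSON.stringify({ type: "crawl.completed", data: [] });
const signature = crypto
  .createHmac("sha256", process.env.FIRECRAWL_WEBHOOK_SECRET!)
  .update(payload)
  .digest("hex");

await fetch("http://localhost:3000/webhooks/firecrawl", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-Firecrawl-Signature": signature,
  },
  body: payload,
});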
Step 3: Separate Keys per Environment
# GitHub Actions secrets
gh secret set FIRECRAWL_API_KEY_DEV --body "fc-dev-..."
gh secret set FIRECRAWL_API_KEY_STAGING --body "fc-staging-..."
gh secret set FIRECRAWL_API_KEY_PROD --body "fc-prod-..."
// Load correct key based on environment
const KEY_MAP: Record<string, string> = {
development: "FIRECRAWL_API_KEY_DEV",
staging: "FIRECRAWL_API_KEY_STAGING",
production: "FIRECRAWL_API_KEY_PROD",Upgrade Firecrawl SDK versions and migrate between API versions (v0 to v1/v2).
Firecrawl Upgrade & Migration
Current State
!npm list @mendable/firecrawl-js 2>/dev/null | grep firecrawl || echo 'Not installed'
Overview
Guide for upgrading @mendable/firecrawl-js SDK versions and migrating from Firecrawl API v0/v1 to v2. Covers breaking changes in import paths, method signatures, response formats, and the new extract v2 schema format.
Version History
| SDK Version | API Version | Key Changes |
|---|---|---|
| 1.x | v1 | asyncCrawlUrl, checkCrawlStatus, mapUrl added |
| 0.x | v0 | Legacy crawlUrl with waitUntilDone param |
Instructions
Step 1: Check Current Version
set -euo pipefail
# Check installed version
npm list @mendable/firecrawl-js
# Check latest available
npm view @mendable/firecrawl-js version
Step 2: Create Upgrade Branch
set -euo pipefail
git checkout -b upgrade/firecrawl-sdk
npm install @mendable/firecrawl-js@latest
npm test
Step 3: Migration — v0 to v1/v2
Import Changes
// No change needed — import has been stable
import FirecrawlApp from "@mendable/firecrawl-js";
Crawl Method Changes (v0 -> v1)
// BEFORE (v0): crawlUrl with waitUntilDone
const result = await firecrawl.crawlUrl("https://example.com", {
crawlerOptions: { limit: 50 },
pageOptions: { onlyMainContent: true },
waitUntilDone: true,
});
// AFTER (v1+): crawlUrl returns synchronously, or use asyncCrawlUrl
const result = await firecrawl.crawlUrl("https://example.com", {
limit: 50,
scrapeOptions: {
formats: ["markdown"],
onlyMainContent: true,
},
});
// For large crawls, use async with polling
const job = await firecrawl.asyncCrawlUrl("https://example.com", {
limit: 500,
scrapeOptions: { formats: ["markdown"] },
});
const status = await firecrawl.checkCrawlStatus(job.id);
Scrape Options Changes (v0 -> v1)
// BEFORE (v0)
await firecrawl.scrapeUrl("https://example.com", {
pageOptions: { onlyMainContent: true },
extractorOptions: { mode: "llm-extraction", schema: mySchema },
});
// AFTER (v1+)
await firecrawl.scrapeUrl("https://example.com", {
formats: ["markdown", "extract"],
onlyMainContent: true,
extract: { schema: mySchema },
});
Extract v2 Format (v1 -> v2)
// BEFORE (v1): extract as top-level option
await firecrawl.scrapeUrl(url, {
formats: ["extract"],
extract: { schema: { type: "object", ... Implement Firecrawl webhook event handling for async crawl and batch scrape jobs.
Firecrawl Webhooks & Events
Overview
Handle Firecrawl webhooks for real-time notifications on async crawl and batch scrape jobs. Instead of polling checkCrawlStatus, configure a webhook URL and Firecrawl will POST events as pages are scraped and jobs complete. Signed with HMAC-SHA256 via X-Firecrawl-Signature.
Webhook Event Types
| Event | Trigger | Payload |
|---|---|---|
| crawl.started | Crawl job begins | Job ID, config |
| crawl.page | Individual page scraped | Page markdown, metadata |
| crawl.completed | Full crawl finishes | All pages array |
| crawl.failed | Crawl job errors | Error message |
| batch_scrape.completed | Batch scrape finishes | All scraped pages |
Instructions
Step 1: Start Crawl with Webhook
import FirecrawlApp from "@mendable/firecrawl-js";
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY!,
});
// Webhook as string (simple)
const job = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
limit: 100,
scrapeOptions: { formats: ["markdown"] },
webhook: "https://api.yourapp.com/webhooks/firecrawl",
});
console.log(`Crawl started: ${job.id}`);
// Webhook as object (with metadata and event filtering)
const job2 = await firecrawl.asyncCrawlUrl("https://docs.example.com", {
limit: 100,
scrapeOptions: { formats: ["markdown"] },
webhook: {
url: "https://api.yourapp.com/webhooks/firecrawl",
events: ["completed", "page"], // only these events
metadata: {
projectId: "my-project",
triggeredBy: "cron",
},
},
});
Step 2: Webhook Handler with Signature Verification
import express from "express";
import crypto from "crypto";
const app = express();
// No global express.json() here: the webhook route needs the raw body for
// signature verification, and a global JSON parser would consume it first.
function verifySignature(body: string, signature: string): boolean {
if (!process.env.FIRECRAWL_WEBHOOK_SECRET) return true; // no secret configured: verification skipped; set FIRECRAWL_WEBHOOK_SECRET in production
const expected = crypto
.createHmac("sha256", process.env.FIRECRAWL_WEBHOOK_SECRET)
.update(body)
.digest("hex");
return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
}
app.post("/webhooks/firecrawl", express.raw({ type: "application/json" }), async (req, res) => {
const rawBody = req.body.toString();
const signature = req.headers["x-firecrawl-signature"] as string;
if (!verifySignature(rawBody, signature)) {
return res.status(401).json({ error: "Invalid signature" });
}