Claude Code skill pack for Bright Data (18 skills)
Installation
Open Claude Code and run this command:
/plugin install brightdata-pack@claude-code-plugins-plus
Use --global to install for all projects, or --project for current project only.
Skills (18)
Configure Bright Data CI/CD integration with GitHub Actions and testing.
Bright Data CI Integration
Overview
Set up CI/CD pipelines for Bright Data scraping projects with GitHub Actions. Includes mocked unit tests that run without proxy access and optional live integration tests that verify actual proxy connectivity.
Prerequisites
- GitHub repository with Actions enabled
- Bright Data test zone credentials
- npm/pnpm project configured
Instructions
Step 1: GitHub Actions Workflow
# .github/workflows/scraper-tests.yml
name: Scraper Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      # Unit tests use mocked proxy responses — no credentials needed
  integration-tests:
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    env:
      BRIGHTDATA_CUSTOMER_ID: ${{ secrets.BRIGHTDATA_CUSTOMER_ID }}
      BRIGHTDATA_ZONE: ${{ secrets.BRIGHTDATA_ZONE }}
      BRIGHTDATA_ZONE_PASSWORD: ${{ secrets.BRIGHTDATA_ZONE_PASSWORD }}
      BRIGHTDATA_API_TOKEN: ${{ secrets.BRIGHTDATA_API_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - name: Download Bright Data CA cert
        run: curl -sO https://brightdata.com/ssl/brd-ca.crt
      - name: Verify proxy connectivity
        run: |
          curl -x "http://brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}:${BRIGHTDATA_ZONE_PASSWORD}@brd.superproxy.io:33335" \
            -s https://lumtest.com/myip.json | python3 -m json.tool
      - run: npm run test:integration
Step 2: Configure GitHub Secrets
gh secret set BRIGHTDATA_CUSTOMER_ID --body "c_abc123"
gh secret set BRIGHTDATA_ZONE --body "web_unlocker_test"
gh secret set BRIGHTDATA_ZONE_PASSWORD --body "z_test_password"
gh secret set BRIGHTDATA_API_TOKEN --body "test_api_token"
Step 3: Write Mocked Unit Tests
// tests/unit/scraper.test.ts — runs without Bright Data credentials
import { describe, it, expect, vi, beforeEach } from 'vitest';
import axios from 'axios';

vi.mock('axios');

describe('Scraper', () => {
  beforeEach(() => vi.clearAllMocks());

  it('should configure proxy correctly', async () => {
    vi.mocked(axios.create).mockReturnValue({
      get: vi.fn().mockResolvedValue({ status: 200, data: '<html>OK</html>' }),
    } as any);
    // Completion sketch: assumes a client factory at src/brightdata/client
    // (hypothetical path; see the brightdata-sdk-patterns skill)
    const { getBrightDataClient } = await import('../../src/brightdata/client');
    const client = getBrightDataClient();
    const res = await client.get('https://example.com');
    expect(res.status).toBe(200);
    expect(axios.create).toHaveBeenCalled();
  });
});
Diagnose and fix Bright Data common errors and exceptions.
Bright Data Common Errors
Overview
Diagnostic reference for the most common Bright Data proxy and API errors with real solutions and fix commands.
Prerequisites
- Bright Data zone configured
- Proxy credentials available
- Access to error logs
Instructions
Step 1: Identify the Error
Check your proxy response status code or error message against the table below.
Step 2: Apply the Fix
Follow the specific solution for your error code.
Error Reference
407 Proxy Authentication Required
HTTP/1.1 407 Proxy Authentication Required
Cause: Username format is wrong or credentials are invalid.
Fix:
# Verify credential format — must be exactly:
# brd-customer-{CUSTOMER_ID}-zone-{ZONE_NAME}
echo "Username: brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}"
# Test with curl
curl -x "http://brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}:${BRIGHTDATA_ZONE_PASSWORD}@brd.superproxy.io:33335" \
https://lumtest.com/myip.json
502 Bad Gateway
HTTP/1.1 502 Bad Gateway
X-Luminati-Error: target_site_blocked
Cause: Target site blocked the request despite Web Unlocker retries.
Fix:
- Increase timeout to 120s (Web Unlocker needs time to solve CAPTCHAs)
- Switch to Scraping Browser zone for JS-heavy sites
- Add -country-us to the username for geo-specific content
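For the first two fixes, a minimal Node.js sketch reusing the axios proxy pattern from the hello-world skill (the 120s timeout and the -country-us suffix are illustrative values):
import axios from 'axios';
import https from 'https';
import 'dotenv/config';

const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;

// Geo-targeted username: append -country-us to exit through US nodes
const username = `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}-country-us`;

async function fetchWithLongTimeout(url: string) {
  return axios.get(url, {
    proxy: {
      host: 'brd.superproxy.io',
      port: 33335,
      auth: { username, password: BRIGHTDATA_ZONE_PASSWORD! },
    },
    httpsAgent: new https.Agent({ rejectUnauthorized: false }),
    timeout: 120_000, // give Web Unlocker time to solve CAPTCHAs
  });
}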
SSL Certificate Errors
Error: SSL: CERTIFICATE_VERIFY_FAILED
Cause: Missing Bright Data CA certificate for HTTPS proxying.
Fix:
# Download the Bright Data CA certificate
curl -sO https://brightdata.com/ssl/brd-ca.crt
# Node.js
export NODE_EXTRA_CA_CERTS=./brd-ca.crt
# Python requests
# requests.get(url, proxies=proxies, verify='./brd-ca.crt')
ETIMEDOUT / Connection Timeout
Error: connect ETIMEDOUT brd.superproxy.io:33335
Cause: Firewall blocking outbound connections to Bright Data.
Fix:
# Test connectivity
nc -zv brd.superproxy.io 33335
# If blocked, allow outbound TCP to brd.superproxy.io:33335
# For Scraping Browser, also allow port 9222
nc -zv brd.superproxy.io 9222
403 Forbidden (Zone Inactive)
Cause: Zone is not active or has been paused.
Fix: Go to https://brightdata.com/cp, navigate to the zone, and click "Activate".
429 Too Many Requests
Cause: Exceeded concurrent request limit for your zone.
Fix:
- Throttle with a concurrency limiter such as p-queue so in-flight requests stay under your zone's limit
- Retry with exponential backoff on 429 responses (see brightdata-rate-limits)
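A minimal throttling sketch, assuming p-queue is installed and a concurrency limit of 10 (tune this to your zone's actual limit):
import PQueue from 'p-queue';

// Keep at most 10 proxy requests in flight (illustrative limit)
const queue = new PQueue({ concurrency: 10 });

export async function scrapeAll(urls: string[], scrape: (url: string) => Promise<string>) {
  return Promise.all(urls.map((url) => queue.add(() => scrape(url))));
}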
Scrape structured data with Bright Data Scraping Browser using Playwright/Puppeteer.
Bright Data Scraping Browser
Overview
Use Bright Data's Scraping Browser to scrape JavaScript-rendered pages. The Scraping Browser works like a regular Playwright/Puppeteer browser but routes through Bright Data's proxy infrastructure with built-in CAPTCHA solving, fingerprint management, and automatic retries.
Prerequisites
- Completed brightdata-install-auth setup
- Scraping Browser zone active in Bright Data control panel
- Playwright or Puppeteer installed
Instructions
Step 1: Install Playwright
npm install playwright
npx playwright install chromium
Step 2: Connect to Scraping Browser with Playwright
// scraping-browser.ts
import { chromium } from 'playwright';
import 'dotenv/config';

const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;
const AUTH = `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}:${BRIGHTDATA_ZONE_PASSWORD}`;
const BROWSER_WS = `wss://${AUTH}@brd.superproxy.io:9222`;

async function scrapeWithBrowser(url: string) {
  console.log('Connecting to Scraping Browser...');
  const browser = await chromium.connectOverCDP(BROWSER_WS);
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    // Wait for dynamic content to load
    await page.waitForSelector('body', { timeout: 30000 });
    // Extract structured data
    const data = await page.evaluate(() => ({
      title: document.title,
      metaDescription: document.querySelector('meta[name="description"]')?.getAttribute('content') || '',
      h1: document.querySelector('h1')?.textContent?.trim() || '',
      links: Array.from(document.querySelectorAll('a[href]')).slice(0, 20).map(a => ({
        text: a.textContent?.trim(),
        href: a.getAttribute('href'),
      })),
    }));
    console.log('Scraped data:', JSON.stringify(data, null, 2));
    return data;
  } finally {
    await browser.close();
  }
}

scrapeWithBrowser('https://example.com').catch(console.error);
Step 3: Scrape Dynamic Product Listings
// scrape-products.ts — real-world example
import { chromium } from 'playwright';
import 'dotenv/config';

interface Product {
  name: string;
  price: string;
  rating: string;
  url: string;
}

const AUTH = `brd-customer-${process.env.BRIGHTDATA_CUSTOMER_ID}-zone-${process.env.BRIGHTDATA_ZONE}:${process.env.BRIGHTDATA_ZONE_PASSWORD}`;

async function scrapeProducts(searchUrl: string): Promise<Product[]> {
  const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
  const page = await browser.newPage();
  try {
    await page.goto(searchUrl, { waitUntil: 'domcontentloaded', timeout: 60000 });
    // Completion sketch from here down: the selectors are illustrative
    // placeholders, so adapt them to the target site's markup
    await page.waitForSelector('.product-card', { timeout: 30000 });
    return await page.$$eval('.product-card', (cards) =>
      cards.map((card) => ({
        name: card.querySelector('.product-name')?.textContent?.trim() || '',
        price: card.querySelector('.product-price')?.textContent?.trim() || '',
        rating: card.querySelector('.product-rating')?.textContent?.trim() || '',
        url: (card.querySelector('a') as HTMLAnchorElement | null)?.href || '',
      }))
    );
  } finally {
    await browser.close();
  }
}
Collect search engine results and trigger large-scale collections with Bright Data's SERP API and Web Scraper API.
Bright Data SERP API & Web Scraper API
Overview
Collect search engine results and trigger large-scale data collections using Bright Data's SERP API and Web Scraper API. SERP API returns structured JSON from Google, Bing, Yahoo, and other search engines. Web Scraper API triggers asynchronous collections with webhook delivery.
Prerequisites
- Completed brightdata-install-auth setup
- SERP API zone or Web Scraper API dataset configured
- API token from Settings > API tokens
Instructions
Step 1: SERP API — Synchronous Google Search
// serp-api.ts
import axios from 'axios';
import https from 'https';
import 'dotenv/config';

const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;

async function searchGoogle(query: string, country = 'us') {
  // SERP API uses the proxy protocol; brd_json=1 requests a JSON response.
  // Routed through the proxy with axios, matching the hello-world pattern.
  const username = `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}-country-${country}`;
  const response = await axios.get(
    `https://www.google.com/search?q=${encodeURIComponent(query)}&brd_json=1`,
    {
      proxy: {
        host: 'brd.superproxy.io',
        port: 33335,
        auth: { username, password: BRIGHTDATA_ZONE_PASSWORD! },
      },
      httpsAgent: new https.Agent({ rejectUnauthorized: false }),
      timeout: 60000,
    }
  );
  const results = response.data;
  console.log(`Query: "${query}"`);
  console.log(`Results: ${results.organic?.length || 0} organic`);
  for (const r of results.organic?.slice(0, 5) || []) {
    console.log(`  ${r.rank}. ${r.title} — ${r.link}`);
  }
  return results;
}

searchGoogle('bright data web scraping').catch(console.error);
Step 2: SERP API — Structured JSON Response
The SERP API returns structured data when you append &brd_json=1:
interface SERPResponse {
  organic: Array<{
    rank: number;
    title: string;
    link: string;
    description: string;
    displayed_link: string;
  }>;
  paid?: Array<{ title: string; link: string; description: string }>;
  knowledge_graph?: { title: string; description: string };
  related_searches?: string[];
  total_results?: number;
}
Step 3: Web Scraper API — Async Collection with Webhook
// web-scraper-api.ts — trigger large-scale collections
import 'dotenv/config';

const API_TOKEN = process.env.BRIGHTDATA_API_TOKEN!;

async function triggerCollection(
  datasetId: string,
  urls: string[],
  webhookUrl?: string
) {
  const params = new URLSearchParams({
    dataset_id: datasetId,
    format: 'json',
    uncompressed_webhook: 'true',
  });
  if (webhookUrl) params.set('endpoint', webhookUrl);

  const response = await fetch(
    `https://api.brightdata.com/datasets/v3/trigger?${params}`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      // Completion sketch: the trigger body is one input object per URL
      body: JSON.stringify(urls.map((url) => ({ url }))),
    }
  );
  const result = await response.json();
  console.log('Snapshot ID:', result.snapshot_id); // poll this ID or await webhook delivery
  return result;
}

// Usage (illustrative dataset ID and webhook URL):
// triggerCollection('gd_example123', ['https://example.com/p/1'], 'https://your-app.com/webhooks/brightdata');
Optimize Bright Data costs through tier selection, sampling, and usage monitoring.
Bright Data Cost Tuning
Overview
Optimize Bright Data costs through product selection, caching, and usage monitoring. Bright Data charges per request (Web Unlocker, SERP API), per GB (Residential Proxy), or per page (Datasets). Choosing the right product and avoiding redundant requests is the primary cost lever.
Prerequisites
- Access to Bright Data billing dashboard
- Understanding of current scraping volumes
- Usage monitoring configured (optional)
Pricing Model
| Product | Pricing | Typical Cost | Best For |
|---|---|---|---|
| Residential Proxy | Per GB transferred | $8-15/GB | High-volume, simple pages |
| Web Unlocker | Per successful request | $1-3/1000 req | Anti-bot protected sites |
| Scraping Browser | Per browser session | $5-10/1000 sessions | JS-heavy SPAs |
| SERP API | Per search | $2-5/1000 searches | Search engine results |
| Datasets (pre-built) | Per record | $0.001-0.01/record | Bulk data (Amazon, LinkedIn) |
| Web Scraper API | Per page | Varies by dataset | Custom async scraping |
Instructions
Step 1: Product Selection Cost Matrix
function estimateMonthlyCost(config: {
  product: 'residential' | 'web_unlocker' | 'scraping_browser' | 'serp_api';
  requestsPerMonth: number;
  avgPageSizeKB?: number;
}) {
  switch (config.product) {
    case 'residential': {
      const gbTransferred = (config.requestsPerMonth * (config.avgPageSizeKB || 200)) / 1_000_000;
      return { cost: gbTransferred * 10, unit: 'GB', quantity: gbTransferred };
    }
    case 'web_unlocker':
      return { cost: config.requestsPerMonth * 0.002, unit: 'requests', quantity: config.requestsPerMonth };
    case 'scraping_browser':
      return { cost: config.requestsPerMonth * 0.008, unit: 'sessions', quantity: config.requestsPerMonth };
    case 'serp_api':
      return { cost: config.requestsPerMonth * 0.003, unit: 'searches', quantity: config.requestsPerMonth };
  }
}
// Example: 50,000 product pages/month
console.log(estimateMonthlyCost({ product: 'web_unlocker', requestsPerMonth: 50000 }));
// { cost: 100, unit: 'requests', quantity: 50000 }
console.log(estimateMonthlyCost({ product: 'residential', requestsPerMonth: 50000, avgPageSizeKB: 300 }));
// { cost: 150, unit: 'GB', quantity: 15 }
Step 2: Reduce Costs with Caching
// Response caching is the single biggest cost saver
// Cache policy by data freshness requirements
const CACHE_TTLS = {
  product_price: 3600000,    // 1 hour
  // Entries below are illustrative additions; tune TTLs to how fresh each data type must be
  product_details: 86400000, // 24 hours
  search_results: 1800000,   // 30 minutes
};
Collect Bright Data debug evidence for support tickets and troubleshooting.
Bright Data Debug Bundle
Overview
Collect all diagnostic information needed for Bright Data support tickets: proxy connectivity, zone status, response headers, and error logs.
Prerequisites
- Bright Data zone credentials configured
- curl available
- Permission to collect environment info
Instructions
Step 1: Create Debug Bundle Script
#!/bin/bash
# brightdata-debug-bundle.sh
set -euo pipefail
BUNDLE_DIR="brightdata-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
echo "=== Bright Data Debug Bundle ===" | tee "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date -u)" | tee -a "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
Step 2: Collect Environment and Connectivity
# Runtime versions
echo "--- Runtime ---" >> "$BUNDLE_DIR/summary.txt"
node --version >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Node.js: not found" >> "$BUNDLE_DIR/summary.txt"
python3 --version >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Python: not found" >> "$BUNDLE_DIR/summary.txt"
# Credential check (presence only, never log values)
echo "--- Credentials ---" >> "$BUNDLE_DIR/summary.txt"
echo "BRIGHTDATA_CUSTOMER_ID: ${BRIGHTDATA_CUSTOMER_ID:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "BRIGHTDATA_ZONE: ${BRIGHTDATA_ZONE:-[NOT SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "BRIGHTDATA_ZONE_PASSWORD: ${BRIGHTDATA_ZONE_PASSWORD:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "BRIGHTDATA_API_TOKEN: ${BRIGHTDATA_API_TOKEN:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
# SSL cert check
echo "--- SSL Certificate ---" >> "$BUNDLE_DIR/summary.txt"
if [ -f "./brd-ca.crt" ]; then
openssl x509 -in ./brd-ca.crt -noout -subject -enddate >> "$BUNDLE_DIR/summary.txt" 2>&1
else
echo "brd-ca.crt: NOT FOUND" >> "$BUNDLE_DIR/summary.txt"
fi
Step 3: Test Proxy Connectivity with Verbose Headers
# Proxy connectivity test with full response headers
echo "--- Proxy Test ---" >> "$BUNDLE_DIR/summary.txt"
PROXY_USER="brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}"
curl -x "http://${PROXY_USER}:${BRIGHTDATA_ZONE_PASSWORD}@brd.superproxy.io:33335" \
-s -D "$BUNDLE_DIR/proxy-headers.txt" \
-o "$BUNDLE_DIR/proxy-response.txt" \
-w "HTTP %{http_code} in %{time_total}s\n" \
https://lumtest.com/myip.json 2>> "$BUNDLE_DIR/summary.txt" || echo "Proxy FAILED" >> "$BUNDLE_DIR/summary.txt"
Deploy Bright Data integrations to Vercel, Fly.io, and Google Cloud Run.
Bright Data Deploy Integration
Overview
Deploy Bright Data scraping applications to cloud platforms with proper secrets management. Key consideration: Bright Data proxy connections require outbound TCP to brd.superproxy.io on ports 33335 (proxy) and 9222 (Scraping Browser).
Prerequisites
- Bright Data production zone credentials
- Platform CLI installed (vercel, fly, or gcloud)
- Application tested in staging
Instructions
Step 1: Vercel Deployment (Serverless)
# Add secrets
vercel env add BRIGHTDATA_CUSTOMER_ID production
vercel env add BRIGHTDATA_ZONE production
vercel env add BRIGHTDATA_ZONE_PASSWORD production
vercel env add BRIGHTDATA_API_TOKEN production
// vercel.json
{
  "functions": {
    "api/scrape.ts": {
      "maxDuration": 60
    }
  }
}
// api/scrape.ts — Vercel serverless function
import type { VercelRequest, VercelResponse } from '@vercel/node';
import axios from 'axios';
import https from 'https';

export default async function handler(req: VercelRequest, res: VercelResponse) {
  const { url } = req.body;
  if (!url) return res.status(400).json({ error: 'url required' });

  const proxy = {
    host: 'brd.superproxy.io',
    port: 33335,
    auth: {
      username: `brd-customer-${process.env.BRIGHTDATA_CUSTOMER_ID}-zone-${process.env.BRIGHTDATA_ZONE}`,
      password: process.env.BRIGHTDATA_ZONE_PASSWORD!,
    },
  };

  try {
    const response = await axios.get(url, {
      proxy,
      httpsAgent: new https.Agent({ rejectUnauthorized: false }),
      timeout: 55000, // Leave 5s buffer for Vercel's 60s limit
    });
    res.json({ status: response.status, length: response.data.length });
  } catch (error: any) {
    res.status(502).json({ error: error.message });
  }
}
Step 2: Fly.io Deployment (Long-Running)
# fly.toml — better for Scraping Browser (needs WebSocket)
app = "my-scraper"
primary_region = "iad"
[env]
NODE_ENV = "production"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
# Set secrets
fly secrets set BRIGHTDATA_CUSTOMER_ID=c_abc123
fly secrets set BRIGHTDATA_ZONE=web_unlocker_prod
fly secrets set BRIGHTDATA_ZONE_PASSWORD=z_prod_pass
fly secrets set BRIGHTDATA_API_TOKEN=prod_token
# Deploy
fly deploy
Step 3: Google Cloud Run
# Store secrets in Secret Manager
echo -n "c_abc123" | gcloud secrets create brightdata-customer-id --data-file=-
echo -n "web_unlocker_prod" | gcloud secrets create brightdata-zone --data-file=-
echo -n "z_proCreate a minimal working Bright Data example.
Bright Data Hello World
Overview
Scrape a real webpage through Bright Data's Web Unlocker proxy. Web Unlocker handles CAPTCHAs, fingerprinting, and retries automatically — you send a normal HTTP request through the proxy endpoint at brd.superproxy.io:33335.
Prerequisites
- Completed brightdata-install-auth setup
- Web Unlocker zone active in Bright Data control panel
- brd-ca.crt SSL certificate downloaded
Instructions
Step 1: Scrape via Web Unlocker Proxy (Node.js)
// hello-brightdata.ts
import axios from 'axios';
import https from 'https';
import 'dotenv/config';

const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;

const proxy = {
  host: 'brd.superproxy.io',
  port: 33335,
  auth: {
    username: `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}`,
    password: BRIGHTDATA_ZONE_PASSWORD!,
  },
};

async function scrape(url: string) {
  const response = await axios.get(url, {
    proxy,
    httpsAgent: new https.Agent({ rejectUnauthorized: false }),
    timeout: 60000,
  });
  console.log(`Status: ${response.status}`);
  console.log(`Content length: ${response.data.length} chars`);
  console.log(response.data.substring(0, 500));
  return response.data;
}

scrape('https://example.com').catch(console.error);
Step 2: Scrape via REST API
// hello-brightdata-api.ts
import 'dotenv/config';

async function scrapeViaAPI(url: string) {
  const response = await fetch('https://api.brightdata.com/request', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.BRIGHTDATA_API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      zone: process.env.BRIGHTDATA_ZONE,
      url,
      format: 'raw',
    }),
  });
  const html = await response.text();
  console.log(`Status: ${response.status}, Length: ${html.length}`);
  return html;
}

scrapeViaAPI('https://example.com').catch(console.error);
Step 3: Python Version
# hello_brightdata.py
import os, requests
from dotenv import load_dotenv

load_dotenv()

proxy_url = (
    f"http://brd-customer-{os.environ['BRIGHTDATA_CUSTOMER_ID']}"
    f"-zone-{os.environ['BRIGHTDATA_ZONE']}"
    f":{os.environ['BRIGHTDATA_ZONE_PASSWORD']}"
    f"@brd.superproxy.io:33335"
)

response = requests.get(
    'https://example.com',
    proxies={'http': proxy_url, 'https': proxy_url},
    verify='./brd-ca.crt',
    timeout=60,
)
print(f"Status: {response.status_code}, Length: {len(response.text)}")
Geo-Targeting
Add a -country-{cc} or -city-{city} suffix to the proxy username to route requests through a specific location.
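A minimal sketch, reusing the Node.js proxy object from Step 1 with a US exit (the country code is illustrative):
// Same proxy config as above, but pinned to US exit nodes
const geoProxy = {
  host: 'brd.superproxy.io',
  port: 33335,
  auth: {
    username: `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}-country-us`,
    password: BRIGHTDATA_ZONE_PASSWORD!,
  },
};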
Install and configure Bright Data SDK/CLI authentication.
Bright Data Install & Auth
Overview
Configure Bright Data proxy credentials, API tokens, and SSL certificates for web scraping. Bright Data uses HTTP proxy protocols and REST APIs — you authenticate via zone credentials from the control panel, not a dedicated npm SDK.
Prerequisites
- Node.js 18+ or Python 3.10+
- Bright Data account at https://brightdata.com
- A configured zone (Web Unlocker, Scraping Browser, SERP API, or Residential)
- Zone credentials from the Bright Data control panel
Instructions
Step 1: Gather Credentials from Control Panel
Log into https://brightdata.com/cp and navigate to your zone's overview tab:
| Credential | Location | Example |
|---|---|---|
| Customer ID | Settings > Account | c_abc123 |
| Zone Name | Zone overview tab | web_unlocker1 |
| Zone Password | Zone overview tab | zpassxyz |
| API Token | Settings > API tokens | abc123def456 |
Step 2: Configure Environment Variables
# .env (NEVER commit to git)
BRIGHTDATA_CUSTOMER_ID=c_abc123
BRIGHTDATA_ZONE=web_unlocker1
BRIGHTDATA_ZONE_PASSWORD=z_pass_xyz
BRIGHTDATA_API_TOKEN=abc123def456
# .gitignore — add these
echo '.env' >> .gitignore
echo '.env.local' >> .gitignore
Step 3: Download Bright Data SSL Certificate
Required for HTTPS proxy connections through the super proxy:
curl -sO https://brightdata.com/ssl/brd-ca.crt
# Node.js — set environment variable
export NODE_EXTRA_CA_CERTS=./brd-ca.crt
Step 4: Install HTTP Libraries
# Node.js
npm install axios dotenv
# Python
pip install requests python-dotenv
Step 5: Verify Connection
// verify-brightdata.ts
import axios from 'axios';
import https from 'https';
import 'dotenv/config';

const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;

const proxy = {
  host: 'brd.superproxy.io',
  port: 33335,
  auth: {
    username: `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}`,
    password: BRIGHTDATA_ZONE_PASSWORD!,
  },
};

async function verify() {
  const res = await axios.get('https://lumtest.com/myip.json', {
    proxy,
    httpsAgent: new https.Agent({ rejectUnauthorized: false }),
  });
  console.log('Proxy IP:', res.data.ip);
  console.log('Country:', res.data.country);
  console.log('Connection verified.');
}

verify().catch(console.error);
# verify_brightdata.py
import os, requests
from dotenv import load_dotenv

load_dotenv()

# Completion sketch mirroring the Node.js check above
proxy_url = (
    f"http://brd-customer-{os.environ['BRIGHTDATA_CUSTOMER_ID']}"
    f"-zone-{os.environ['BRIGHTDATA_ZONE']}:{os.environ['BRIGHTDATA_ZONE_PASSWORD']}"
    f"@brd.superproxy.io:33335"
)
data = requests.get(
    'https://lumtest.com/myip.json',
    proxies={'http': proxy_url, 'https': proxy_url},
    verify='./brd-ca.crt',
    timeout=60,
).json()
print(f"Proxy IP: {data['ip']}, Country: {data['country']}")
Configure Bright Data local development with hot reload and testing.
Bright Data Local Dev Loop
Overview
Set up a fast, reproducible local development workflow for Bright Data scraping projects with mocked proxy responses, cached results, and vitest integration.
Prerequisites
- Completed brightdata-install-auth setup
- Node.js 18+ with npm/pnpm
- brd-ca.crt SSL certificate downloaded
Instructions
Step 1: Create Project Structure
my-scraper/
├── src/
│ ├── brightdata/
│ │ ├── proxy.ts # Proxy configuration helper
│ │ ├── scraper.ts # Scraping functions
│ │ └── cache.ts # Response caching for dev
│ └── index.ts
├── tests/
│ ├── fixtures/ # Cached HTML responses
│ │ └── example.html
│ └── scraper.test.ts
├── .env.local # Local credentials (git-ignored)
├── .env.example # Template for team
├── brd-ca.crt # Bright Data SSL cert (git-ignored)
└── package.json
Step 2: Build Proxy Configuration Module
// src/brightdata/proxy.ts
import 'dotenv/config';

export interface BrightDataProxy {
  host: string;
  port: number;
  auth: { username: string; password: string };
}

export function getProxy(options?: {
  country?: string;
  city?: string;
  session?: string;
}): BrightDataProxy {
  const { BRIGHTDATA_CUSTOMER_ID, BRIGHTDATA_ZONE, BRIGHTDATA_ZONE_PASSWORD } = process.env;
  if (!BRIGHTDATA_CUSTOMER_ID || !BRIGHTDATA_ZONE || !BRIGHTDATA_ZONE_PASSWORD) {
    throw new Error('Missing BRIGHTDATA_* environment variables');
  }
  let username = `brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}`;
  if (options?.country) username += `-country-${options.country}`;
  if (options?.city) username += `-city-${options.city}`;
  if (options?.session) username += `-session-${options.session}`;
  return {
    host: 'brd.superproxy.io',
    port: 33335,
    auth: { username, password: BRIGHTDATA_ZONE_PASSWORD },
  };
}
Step 3: Add Response Cache for Development
// src/brightdata/cache.ts — cache scraped pages to avoid burning proxy credits
import { createHash } from 'crypto';
import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';

const CACHE_DIR = join(process.cwd(), '.scrape-cache');

export function getCachedResponse(url: string): string | null {
  const key = createHash('md5').update(url).digest('hex');
  const path = join(CACHE_DIR, `${key}.html`);
  return existsSync(path) ? readFileSync(path, 'utf-8') : null;
}

export function setCachedResponse(url: string, html: string): void {
  mkdirSync(CACHE_DIR, { recursive: true });
  const key = createHash('md5').update(url).digest('hex');
  writeFileSync(join(CACHE_DIR, `${key}.html`), html);
}
Step 4: Write Tests Against Fixtures
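A minimal sketch of such a test with vitest, parsing the cached example.html from the structure above so the dev loop never spends proxy credits (the regex stands in for your real parser):
// tests/scraper.test.ts — parse a cached fixture instead of hitting the proxy
import { describe, it, expect } from 'vitest';
import { readFileSync } from 'fs';
import { join } from 'path';

describe('parser', () => {
  it('extracts a title from the cached page', () => {
    const html = readFileSync(
      join(process.cwd(), 'tests', 'fixtures', 'example.html'),
      'utf-8'
    );
    const title = html.match(/<title>(.*?)<\/title>/)?.[1];
    expect(title).toBeTruthy();
  });
});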
Optimize Bright Data API performance with caching, batching, and connection pooling.
Bright Data Performance Tuning
Overview
Optimize Bright Data scraping performance through connection pooling, response caching, concurrent request tuning, and smart product selection. Web Unlocker latency is typically 5-30s due to CAPTCHA solving; Scraping Browser sessions are 10-60s.
Prerequisites
- Bright Data zone configured
- Understanding of async patterns
- Redis or file cache available (optional)
Latency Benchmarks
| Product | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| Web Unlocker (simple) | 3s | 8s | 15s | No CAPTCHA |
| Web Unlocker (CAPTCHA) | 10s | 25s | 45s | With CAPTCHA solving |
| Scraping Browser | 8s | 20s | 40s | Full browser render |
| SERP API (sync) | 2s | 5s | 10s | Search results |
| Residential Proxy | 1s | 3s | 8s | Raw proxy, no unblocking |
Instructions
Step 1: Choose the Right Product
// Product selection matrix
function selectProduct(target: { js: boolean; captcha: boolean; structured: boolean }) {
  if (target.structured) return 'serp_api';                // Pre-parsed JSON
  if (!target.js && !target.captcha) return 'residential'; // Fastest
  if (target.js) return 'scraping_browser';                // Browser rendering
  return 'web_unlocker';                                   // Best default
}
Step 2: Connection Pooling with Keep-Alive
import { Agent } from 'https';
import axios from 'axios';

// Credentials assembled as in the other skills
const proxyUser = `brd-customer-${process.env.BRIGHTDATA_CUSTOMER_ID}-zone-${process.env.BRIGHTDATA_ZONE}`;
const proxyPass = process.env.BRIGHTDATA_ZONE_PASSWORD!;

// Reuse TCP connections to brd.superproxy.io
const httpsAgent = new Agent({
  keepAlive: true,
  maxSockets: 25, // Match your concurrency limit
  maxFreeSockets: 5,
  timeout: 120000,
  rejectUnauthorized: false,
});

const client = axios.create({
  proxy: { host: 'brd.superproxy.io', port: 33335, auth: { username: proxyUser, password: proxyPass } },
  httpsAgent,
  timeout: 60000,
});
Step 3: Response Caching Layer
// src/brightdata/cache.ts — avoid re-scraping identical URLs
import { createHash } from 'crypto';
import { LRUCache } from 'lru-cache';

const memoryCache = new LRUCache<string, string>({
  max: 500,             // Max cached pages
  maxSize: 100_000_000, // 100MB total
  sizeCalculation: (v) => Buffer.byteLength(v),
  ttl: 3600000,         // 1 hour
});

export async function cachedScrape(
  url: string,
  scraper: (url: string) => Promise<string>,
  ttlMs?: number
): Promise<string> {
  const key = createHash('sha256').update(url).digest('hex');
  const cached = memoryCache.get(key);
  if (cached) return cached;
  // Cache miss: scrape once and store (completion sketch)
  const html = await scraper(url);
  memoryCache.set(key, html, ttlMs ? { ttl: ttlMs } : undefined);
  return html;
}
Execute Bright Data production deployment checklist and rollback procedures.
Bright Data Production Checklist
Overview
Complete checklist for deploying Bright Data scraping integrations to production with zone verification, monitoring, and rollback procedures.
Prerequisites
- Staging environment tested
- Production zone credentials in secrets vault
- Monitoring and alerting configured
Instructions
Step 1: Zone and Credential Verification
- [ ] Production zone active in Bright Data CP
- [ ] Zone password stored in secrets vault (not .env)
- [ ] API token scoped to production zone only
- [ ] SSL certificate (brd-ca.crt) deployed
- [ ] Separate zone from development/staging
# Verify production zone is active
curl -s -H "Authorization: Bearer ${BRIGHTDATA_API_TOKEN}" \
https://api.brightdata.com/zone/get_active_zones \
| python3 -c "import sys,json; zones=json.load(sys.stdin); print([z['name'] for z in zones])"
# Test production proxy connectivity
curl -x "http://brd-customer-${BRIGHTDATA_CUSTOMER_ID}-zone-${BRIGHTDATA_ZONE}:${BRIGHTDATA_ZONE_PASSWORD}@brd.superproxy.io:33335" \
-s -w "HTTP %{http_code} in %{time_total}s\n" \
https://lumtest.com/myip.json
Step 2: Code Quality
- [ ] No hardcoded credentials (grep for passwords, tokens)
- [ ] Retry logic with exponential backoff (see brightdata-rate-limits)
- [ ] Request queuing with concurrency limits (p-queue)
- [ ] Response validation (check for CAPTCHA pages, empty responses)
- [ ] Timeout set to 60-120s for Web Unlocker
- [ ] Error logging includes X-Luminati-Error headers
Step 3: Infrastructure
- [ ] Health check endpoint tests proxy connectivity
- [ ] Monitoring tracks proxy response times, error rates
- [ ] Budget alerts configured in Bright Data CP
- [ ] Circuit breaker for proxy failures
// Health check endpoint (assumes the shared client from brightdata-sdk-patterns)
export async function healthCheck() {
  const start = Date.now();
  try {
    const client = getBrightDataClient();
    const res = await client.get('https://lumtest.com/myip.json');
    return {
      status: 'healthy',
      proxy_ip: res.data.ip,
      latency_ms: Date.now() - start,
    };
  } catch (error: any) {
    return {
      status: 'degraded',
      error: error.response?.headers?.['x-luminati-error'] || error.message,
      latency_ms: Date.now() - start,
    };
  }
}
Step 4: Monitoring and Alerts
| Alert | Condition | Severity |
|---|---|---|
| Proxy down | 5xx errors > 10/min | P1 |
| High latency | p99 > 30s | P2 |
| Budget spike | Daily cost > 2x average | P2 |
Implement Bright Data rate limiting, backoff, and idempotency patterns.
Bright Data Rate Limits
Overview
Handle Bright Data rate limits and concurrent request limits. Unlike traditional API rate limits, Bright Data limits are per-zone and based on concurrent connections and requests per second. The Web Scraper API trigger endpoint is limited to 20 requests/min and 60 requests/hour.
Prerequisites
Instructions
Step 1: Understand Bright Data Rate Limits
Step 2: Implement Concurrent Request Limiter
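A sketch of one way to implement the limiter without dependencies, keeping at most a fixed number of scrapes in flight (the limit value is yours to tune):
// Dependency-free concurrency limiter (sketch)
export function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];
  const release = () => {
    active--;
    waiting.shift()?.(); // wake the next queued task, if any
  };
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) await new Promise<void>((resolve) => waiting.push(resolve));
    active++;
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Usage: const limit10 = createLimiter(10); await limit10(() => scrape(url));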
Step 3: Exponential Backoff for Proxy Errors
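A sketch of the backoff pattern, retrying 429/502 proxy responses and timeouts with jittered exponential delays (retry count and base delay are illustrative):
// Retry retryable proxy errors with jittered exponential delays
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.response?.status;
      const retryable = status === 429 || status === 502 || err?.code === 'ETIMEDOUT';
      if (!retryable || attempt >= maxRetries) throw err;
      const delayMs = Math.min(30_000, 1000 * 2 ** attempt) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}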
Implement Bright Data reference architecture with best-practice project layout.
Bright Data Reference Architecture
Overview
Production-ready architecture for Bright Data scraping systems. Covers project layout, data pipeline design, and integration patterns for Web Unlocker, Scraping Browser, SERP API, and Datasets API.
Prerequisites
Project Structure
Architecture Diagram
Key Components
Step 1: Multi-Product Client
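A sketch of the multi-product client idea: one facade that routes each job to a per-product handler, with the handler implementations coming from the other skills in this pack (all names here are illustrative):
type Product = 'web_unlocker' | 'scraping_browser' | 'serp_api';
type Handler = (url: string) => Promise<string>;

class BrightDataClient {
  constructor(private handlers: Partial<Record<Product, Handler>>) {}

  scrape(url: string, product: Product): Promise<string> {
    const handler = this.handlers[product];
    if (!handler) throw new Error(`No handler registered for ${product}`);
    return handler(url);
  }
}

// Usage sketch: wire in implementations from the hello-world,
// scraping-browser, and serp-api skills
// const client = new BrightDataClient({ web_unlocker: scrape, scraping_browser: scrapeWithBrowser });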
Apply production-ready Bright Data SDK patterns for TypeScript and Python.
Bright Data SDK Patterns
Overview
Production-ready patterns for Bright Data proxy integrations. Since Bright Data uses HTTP proxy protocols (not a dedicated SDK), these patterns wrap proxy configuration, retry logic, session management, and response parsing into reusable modules.
Prerequisites
Instructions
Step 1: Proxy Client Singleton
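A minimal sketch of the singleton, reusing the axios proxy setup from the earlier skills (the module path and factory name are illustrative):
// src/brightdata/client.ts — one shared axios instance per process
import axios, { AxiosInstance } from 'axios';
import https from 'https';
import 'dotenv/config';

let instance: AxiosInstance | null = null;

export function getBrightDataClient(): AxiosInstance {
  if (instance) return instance;
  instance = axios.create({
    proxy: {
      host: 'brd.superproxy.io',
      port: 33335,
      auth: {
        username: `brd-customer-${process.env.BRIGHTDATA_CUSTOMER_ID}-zone-${process.env.BRIGHTDATA_ZONE}`,
        password: process.env.BRIGHTDATA_ZONE_PASSWORD!,
      },
    },
    httpsAgent: new https.Agent({ keepAlive: true, rejectUnauthorized: false }),
    timeout: 60000,
  });
  return instance;
}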
Step 2: Retry Wrapper with Proxy Error Handling
Step 3: Session Management for Sticky IPs
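A sketch of sticky sessions via the -session-{id} username suffix shown in the local-dev-loop helper (the ID format is illustrative):
// Pin successive requests to the same exit IP with a session suffix
import { randomBytes } from 'crypto';

export function sessionUsername(base: string): string {
  const sessionId = randomBytes(4).toString('hex'); // illustrative ID format
  return `${base}-session-${sessionId}`;
}

// Usage: all requests built with this username share one exit IP
// const username = sessionUsername(`brd-customer-${id}-zone-${zone}`);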
Apply Bright Data security best practices for secrets and access control.
Bright Data Security Basics
Overview
Security best practices for Bright Data zone credentials, API tokens, and webhook delivery. Bright Data credentials include Customer ID, zone passwords, and API tokens — all must be protected.
Prerequisites
Instructions
Step 1: Credential Inventory
Step 2: Environment Variable Security
Step 3: Zone Isolation by Environment
Create separate zones per environment so staging credentials cannot access production proxy bandwidth:
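A sketch of per-environment zone naming (the zone names are illustrative):
// One zone per environment; a leaked staging password cannot spend prod bandwidth
const ZONES: Record<string, string> = {
  development: 'web_unlocker_dev',
  staging: 'web_unlocker_staging',
  production: 'web_unlocker_prod',
};

export const zoneFor = (env = process.env.NODE_ENV || 'development') => ZONES[env];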
Step 4: Credential Rotation
Step 5: Git Secret Scanning
Step 6: Webhook Delivery
Analyze, plan, and execute Bright Data SDK upgrades with breaking change detection.
Bright Data Upgrade & MigrationOverviewGuide for migrating between Bright Data products, API versions, and zone configurations. Since Bright Data uses proxy protocols and REST APIs (not versioned SDKs), migrations typically involve changing zone types, proxy endpoints, or API payload formats. Prerequisites
Instructions
Step 1: Identify Migration Type
Step 2: Migrate from Direct Proxies to Web Unlocker
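Often the main code change is the zone segment of the proxy username; a sketch with illustrative zone names:
// Migration sketch: the username's zone segment selects the product
const customerId = process.env.BRIGHTDATA_CUSTOMER_ID;

// Before: residential proxy zone (illustrative name)
const residentialUser = `brd-customer-${customerId}-zone-residential_zone1`;

// After: Web Unlocker zone created in the control panel (illustrative name)
const unlockerUser = `brd-customer-${customerId}-zone-web_unlocker1`;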
Step 3: Migrate to Scraping Browser from Puppeteer
Step 4: Migrate Datasets API v2 to v3
Implement Bright Data webhook signature validation and event handling.
Bright Data Webhooks & EventsOverviewHandle Bright Data webhook deliveries from the Web Scraper API and Datasets API. When you trigger an async collection, Bright Data sends the results to your webhook URL with the collected data in JSON, NDJSON, or CSV format. Prerequisites
Instructions
Step 1: Configure Webhook URL When Triggering Collection
Step 2: Webhook Endpoint — Receive Data Delivery
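A minimal sketch of a receiving endpoint with Express, assuming JSON delivery as configured in the SERP/Web Scraper API skill (the shared-secret check is illustrative; match it to whatever auth you configure on the trigger):
// webhook-server.ts — receives Web Scraper API deliveries
import express from 'express';

const app = express();
app.use(express.json({ limit: '50mb' })); // deliveries can be large

app.post('/webhooks/brightdata', (req, res) => {
  // Illustrative shared-secret check; use whatever auth you configured
  if (req.headers['authorization'] !== `Bearer ${process.env.WEBHOOK_SECRET}`) {
    return res.status(401).end();
  }
  const records = Array.isArray(req.body) ? req.body : [req.body];
  console.log(`Received ${records.length} records`);
  // Acknowledge quickly; do heavy processing async
  res.status(200).end();
});

app.listen(3000);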
Step 3: Notification Endpoint
Tags
brightdata, bright data, saas, sdk, integration