Glean Cost Tuning
Optimize Glean costs by managing indexed content volume, datasource efficiency, and connector resource usage.
Overview
Glean pricing scales with indexed content volume and per-seat user count, making those two dimensions the primary cost drivers. Enterprise deployments typically connect dozens of datasources, each pushing thousands of documents into the index. Without active content governance, stale drafts, archived pages, and near-empty documents can inflate the index by 30-50%, driving up costs while adding zero search value. Pruning irrelevant content and using incremental indexing are the highest-leverage optimizations.
Cost Breakdown
| Component | Cost Driver | Optimization |
| --- | --- | --- |
| Document indexing | Volume of indexed content across all sources | Filter drafts, templates, and archived content pre-index |
| User seats | Per-seat licensing | Audit active users quarterly; deprovision inactive accounts |
| Search queries | Query volume across the organization | Cache frequent queries; use search analytics to identify redundant patterns |
| Datasource connectors | Number of active connectors to maintain | Consolidate overlapping sources; remove unused connectors |
| Content storage | Size of indexed documents | Truncate body to 50KB; skip attachments over 10MB |
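The quarterly seat audit can be mechanized. A minimal sketch, assuming a hypothetical `lastActiveAt` field (not a Glean API shape) sourced from your identity provider or Glean's usage analytics:

```typescript
interface SeatRecord {
  email: string;
  lastActiveAt: number; // epoch ms of last login or search (hypothetical field)
}

const QUARTER_MS = 90 * 24 * 60 * 60 * 1000;

// Accounts with no activity in the last quarter are deprovisioning candidates.
function findInactiveSeats(seats: SeatRecord[], now: number = Date.now()): SeatRecord[] {
  return seats.filter(s => now - s.lastActiveAt > QUARTER_MS);
}
```

Feed the result into your IdP's deprovisioning workflow rather than deleting accounts outright, so access can be restored if the audit is wrong.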
API Call Reduction
class GleanIndexFilter {
private staleThreshold = 365 * 24 * 60 * 60 * 1000; // 12 months
shouldIndex(doc: { status: string; updatedAt: number; title: string; content: string }): boolean {
if (doc.status === 'draft' || doc.status === 'archived') return false;
if (Date.now() - doc.updatedAt > this.staleThreshold) return false;
if (doc.title.startsWith('[Template]')) return false;
if (doc.content.length < 50) return false;
return true;
}
async incrementalIndex(docs: any[], lastSyncTimestamp: number): Promise<any[]> {
// Only process documents modified since last sync — reduces indexing calls by 80-90%
const modified = docs.filter(d => d.updatedAt > lastSyncTimestamp);
const eligible = modified.filter(d => this.shouldIndex(d));
return eligible.map(d => ({
...d,
content: d.content.slice(0, 50_000) // Truncate to 50KB
}));
}
}
Usage Monitoring
class GleanCostMonitor {
private indexedDocs = 0;
private queriesThisHour = 0;
private budgetDocs = 100_000;
recordIndexed(count: number): void {
this.indexedDocs += count;
const utilization = (this.indexedDocs / this.budgetDocs) * 100;
if (utilization > 80) {
console.warn(`Glean index at ${utilization.toFixed(0)}% capacity: ${this.indexedDocs}/${this.budgetDocs} docs`);
}
}
  recordQuery(): void {
    this.queriesThisHour++;
  }
  getUtilization(): string {
    return `${this.indexedDocs}/${this.budgetDocs} docs (${((this.indexedDocs / this.budgetDocs) * 100).toFixed(1)}%)`;
  }
}
Glean Data Handling
PII filtering: strip emails, phone numbers, SSNs from document body before indexing.
Overview
Glean enterprise search ingests documents from dozens of connectors (Google Drive, Confluence, Slack, Jira, Salesforce, etc.) and builds a unified search index with permission-aware access control. Data types include indexed document content, connector metadata, user permission maps, query logs, and search analytics. All document content must be PII-filtered before indexing, permission boundaries must be preserved to prevent data leakage across teams, and retention policies must be enforced to comply with corporate governance and GDPR/CCPA obligations.
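One way to keep permission boundaries intact is to fail closed before the indexing call: any document arriving without explicit permissions is rejected rather than indexed as public. A sketch; the guard is an assumption of this guide, not a Glean API feature:

```typescript
interface DocPermissions { allowedUsers?: string[]; allowAnonymousAccess?: boolean }

// Fail closed: documents without explicit permissions are never indexed.
function enforcePermissionBoundary<T extends { id: string; permissions?: DocPermissions }>(
  docs: T[],
): { safe: T[]; rejected: string[] } {
  const safe: T[] = [];
  const rejected: string[] = [];
  for (const doc of docs) {
    const p = doc.permissions;
    const explicit = !!p && (p.allowAnonymousAccess === true || (p.allowedUsers?.length ?? 0) > 0);
    if (explicit) safe.push(doc);
    else rejected.push(doc.id); // log and alert instead of silently indexing
  }
  return { safe, rejected };
}
```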
Data Classification
| Data Type | Sensitivity | Retention | Encryption |
| --- | --- | --- | --- |
| Indexed document content | High (may contain PII) | Per source retention policy | AES-256 at rest |
| User permission maps | High (access control) | Sync lifecycle | TLS + at rest |
| Connector metadata | Medium | Until connector removed | AES-256 at rest |
| Search query logs | Medium (reveals intent) | 90 days default | AES-256 at rest |
| Search analytics/aggregates | Low | 1 year | TLS in transit |
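The retention column can be enforced mechanically on exported logs. A minimal sketch for the 90-day query-log default; the `QueryLogEntry` shape is illustrative, not Glean's export format:

```typescript
interface QueryLogEntry { userId: string; query: string; timestamp: number } // epoch ms

const QUERY_LOG_RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // 90 days default

// Drop entries that have aged out of the retention window.
function applyQueryLogRetention(logs: QueryLogEntry[], now: number = Date.now()): QueryLogEntry[] {
  return logs.filter(e => now - e.timestamp <= QUERY_LOG_RETENTION_MS);
}
```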
Data Import
interface GleanDocument {
id: string; datasource: string; title: string;
body: string; permissions: { allowedUsers?: string[]; allowAnonymousAccess?: boolean };
updatedAt: string; url: string;
}
async function indexDocuments(docs: GleanDocument[], datasource: string) {
// PII strip before indexing
const sanitized = docs.map(doc => ({
...doc,
body: stripPII(doc.body),
}));
// Batch upload with pagination (max 100 per request)
for (let i = 0; i < sanitized.length; i += 100) {
const batch = sanitized.slice(i, i + 100);
await fetch(`https://customer-be.glean.com/api/index/v1/bulkindexdocuments`, {
method: 'POST',
headers: { Authorization: `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ datasource, documents: batch }),
});
}
}
function stripPII(text: string): string {
return text
.replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL_REDACTED]')
.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
}
Data Export
async function exportSearchAnalytics(startDate: string, endDate: string) {
const res = await fetch(`https://customer-be.glean.com/api/v1/analytics`, {
method: 'POST',
headers: { Authorization: `Bearer ${process.env.GLEAN_API_TOKEN}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ startDate, endDate }),
  });
  return res.json();
}
Glean Debug Bundle
Collect Glean diagnostic information for support including datasource config, indexing status, and search quality metrics.
Overview
This debug bundle collects diagnostic evidence from Glean enterprise search integrations
for troubleshooting datasource configuration, document indexing pipelines, and search
quality issues. It captures indexing token validation, datasource configuration state,
crawl status, search query test results, and permission model health. The resulting
tarball provides the evidence needed to diagnose connector failures, stale index problems,
missing document results, and permission-based search gaps without requiring admin console access.
Prerequisites
curl, jq, tar installed
GLEAN_DOMAIN set to your Glean instance (e.g., your-company-be.glean.com)
GLEAN_INDEXING_TOKEN for datasource/indexing endpoints
GLEAN_CLIENT_TOKEN for search API endpoints
Debug Collection Script
#!/bin/bash
set -euo pipefail
BUNDLE="debug-glean-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"
# Environment check
echo "=== Environment ===" > "$BUNDLE/environment.txt"
echo "Glean Domain: ${GLEAN_DOMAIN:-NOT SET}" >> "$BUNDLE/environment.txt"
echo "Indexing Token: ${GLEAN_INDEXING_TOKEN:+SET (redacted)}" >> "$BUNDLE/environment.txt"
echo "Client Token: ${GLEAN_CLIENT_TOKEN:+SET (redacted)}" >> "$BUNDLE/environment.txt"
echo "Node: $(node -v 2>/dev/null || echo 'not installed')" >> "$BUNDLE/environment.txt"
echo "Timestamp: $(date -u)" >> "$BUNDLE/environment.txt"
# Datasource configuration
curl -sf -X POST "https://${GLEAN_DOMAIN}/api/index/v1/getdatasourceconfig" \
  -H "Authorization: Bearer ${GLEAN_INDEXING_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"datasource\":\"${GLEAN_DATASOURCE:-custom}\"}" \
  > "$BUNDLE/datasource-config.json" || echo '{"error":"UNREACHABLE"}' > "$BUNDLE/datasource-config.json"
# Indexing status check
curl -sf -X POST "https://${GLEAN_DOMAIN}/api/index/v1/getstatus" \
  -H "Authorization: Bearer ${GLEAN_INDEXING_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"datasource\":\"${GLEAN_DATASOURCE:-custom}\"}" \
  > "$BUNDLE/indexing-status.json" || echo '{"error":"FAILED"}' > "$BUNDLE/indexing-status.json"
# Search quality test
curl -sf -X POST "https://${GLEAN_DOMAIN}/api/client/v1/search" \
  -H "Authorization: Bearer ${GLEAN_CLIENT_TOKEN}" \
  -H "X-Glean-Auth-Type: BEARER" \
  -H "Content-Type: application/json" \
  -d '{"query":"test","pageSize":1}' \
  > "$BUNDLE/search-test.json" || echo '{"error":"FAILED"}' > "$BUNDLE/search-test.json"
# Package the bundle
tar -czf "${BUNDLE}.tar.gz" "$BUNDLE"
echo "Wrote ${BUNDLE}.tar.gz"
Glean Deploy Integration
Deploy Glean custom connectors as scheduled jobs on Cloud Run, Lambda, or Fly.
Overview
Deploy a containerized Glean enterprise search integration service with Docker. This skill covers building a production image that connects to Glean's Indexing and Search APIs for managing document ingestion, custom datasource connectors, and search queries. Includes environment configuration for multi-datasource indexing, health checks that verify API connectivity and indexing status, and rolling update strategies that avoid interrupting active indexing jobs.
Docker Configuration
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
FROM node:20-slim
RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./
USER app
EXPOSE 3000
# node:20-slim ships without curl; use Node's built-in fetch for the health probe
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
CMD ["node", "dist/index.js"]
Environment Variables
export GLEAN_API_KEY="glean_xxxxxxxxxxxx"
export GLEAN_BASE_URL="https://company-be.glean.com/api/index/v1"
export GLEAN_DATASOURCE="custom-wiki"
export GLEAN_DOMAIN="company-be.glean.com"
export LOG_LEVEL="info"
export PORT="3000"
export NODE_ENV="production"
Health Check Endpoint
import express from 'express';
const app = express();
app.get('/health', async (req, res) => {
try {
const response = await fetch(`https://${process.env.GLEAN_DOMAIN}/api/index/v1/getdatasourceconfig`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GLEAN_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ datasource: process.env.GLEAN_DATASOURCE }),
});
if (!response.ok) throw new Error(`Glean API returned ${response.status}`);
res.json({ status: 'healthy', service: 'glean-integration', datasource: process.env.GLEAN_DATASOURCE, timestamp: new Date().toISOString() });
} catch (error) {
res.status(503).json({ status: 'unhealthy', error: (error as Error).message });
}
});
Deployment Steps
Step 1: Build
docker build -t glean-integration:latest .
Step 2: Run
docker run -d --name glean-integration \
-p 3000:3000 \
-e GLEAN_API_KEY -e GLEAN_BASE_URL -e GLEAN_DATASOURCE -e GLEAN_DOMAIN \
glean-integration:latest
Step 3: Verify
curl -s http://localhost:3000/health | jq .
Step 4: Rolling Update
Start the new image on a temporary port, verify health, then retire the old container once any active indexing job finishes (names and the temporary port are illustrative):
docker build -t glean-integration:next .
docker run -d --name glean-integration-next -p 3001:3000 \
  -e GLEAN_API_KEY -e GLEAN_BASE_URL -e GLEAN_DATASOURCE -e GLEAN_DOMAIN \
  glean-integration:next
curl -s http://localhost:3001/health | jq .
docker rm -f glean-integration
Glean Enterprise RBAC
Map AD/Okta groups to Glean document permissions using allowedGroups.
Overview
Glean's enterprise search aggregates content from dozens of connectors (Google Drive, Confluence, Slack, Salesforce). RBAC ensures users only see documents they are authorized to access. Permissions flow from source systems through connector-level ACLs into Glean's unified index. Misconfigured permissions mean search results leak sensitive data across teams. SOC 2 and GDPR compliance require document-level access control and full audit trails on who searched what.
Role Hierarchy
| Role | Permissions | Scope |
| --- | --- | --- |
| Super Admin | Create API tokens, manage all connectors, configure SSO | Organization-wide |
| Admin | Add/edit datasources, manage user groups, view analytics | Assigned datasources |
| Content Manager | Set document permissions, manage allowedGroups per datasource | Own datasources |
| User | Search and view permitted documents | Documents matching ACLs |
| Viewer | Search only, no document previews or snippets | Restricted document set |
Permission Check
async function checkDocumentAccess(userId: string, documentId: string): Promise<boolean> {
const response = await fetch(`${GLEAN_API}/permissions/check`, {
method: 'POST',
headers: { Authorization: `Bearer ${GLEAN_API_TOKEN}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ userId, documentId }),
});
const result = await response.json();
return result.hasAccess ?? false;
}
Role Assignment
async function assignDatasourceRole(email: string, datasource: string, role: 'admin' | 'viewer'): Promise<void> {
await fetch(`${GLEAN_API}/datasources/${datasource}/permissions`, {
method: 'PUT',
headers: { Authorization: `Bearer ${GLEAN_API_TOKEN}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ user: email, role, allowedGroups: [`${datasource}-${role}s`] }),
});
}
async function revokeDatasourceAccess(email: string, datasource: string): Promise<void> {
await fetch(`${GLEAN_API}/datasources/${datasource}/permissions/${email}`, {
method: 'DELETE',
headers: { Authorization: `Bearer ${GLEAN_API_TOKEN}` },
});
}
Audit Logging
interface GleanAuditEntry {
timestamp: string; userId: string; action: 'search' | 'view' | 'index' | 'permission_change';
datasource: string; query?: string; documentId?: string; result: 'allowed' | 'denied';
}
function logSearchAccess(entry: GleanAuditEntry): void {
  console.log(JSON.stringify({ ...entry, org: process.env.GLEAN_DOMAIN }));
}
Glean Hello World
Index documents into Glean and search them back using the Indexing and Client APIs.
Overview
Index documents into Glean and search them. Two steps: set up a custom datasource with the Indexing API, then query with the Client API.
Instructions
Step 1: Set Up Custom Datasource
const GLEAN = `https://${process.env.GLEAN_DOMAIN}/api`;
const idxHeaders = {
'Authorization': `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`,
'Content-Type': 'application/json',
};
// Create or configure a custom datasource
await fetch(`${GLEAN}/index/v1/adddatasource`, {
method: 'POST', headers: idxHeaders,
body: JSON.stringify({
name: 'my_wiki',
displayName: 'Internal Wiki',
datasourceCategory: 'PUBLISHED_CONTENT',
urlRegex: 'https://wiki.company.com/.*',
iconUrl: 'https://wiki.company.com/favicon.ico',
}),
});
Step 2: Index Documents
// Index individual documents
await fetch(`${GLEAN}/index/v1/indexdocuments`, {
method: 'POST', headers: idxHeaders,
body: JSON.stringify({
datasource: 'my_wiki',
documents: [{
id: 'doc-001',
title: 'Getting Started Guide',
url: 'https://wiki.company.com/getting-started',
body: { mimeType: 'text/plain', textContent: 'This guide covers onboarding steps...' },
author: { email: 'jane@company.com' },
updatedAt: new Date().toISOString(),
permissions: { allowAnonymousAccess: true },
}],
}),
});
console.log('Document indexed.');
Step 3: Search
const searchHeaders = {
'Authorization': `Bearer ${process.env.GLEAN_CLIENT_TOKEN}`,
'X-Glean-Auth-Type': 'BEARER',
'Content-Type': 'application/json',
};
const results = await fetch(`${GLEAN}/client/v1/search`, {
method: 'POST', headers: searchHeaders,
body: JSON.stringify({
query: 'onboarding getting started',
pageSize: 10,
requestOptions: { datasourceFilter: 'my_wiki' },
}),
}).then(r => r.json());
results.results?.forEach((r: any) => {
console.log(`${r.title} (${r.url}) — score: ${r.score}`);
});
Output
Document indexed.
Getting Started Guide (https://wiki.company.com/getting-started) — score: 0.95
Error Handling
| Error | Cause | Solution |
| --- | --- | --- |
| datasource not found | Datasource not created | Run adddatasource first |
| No search results | Indexing not yet complete | Wait 1-2 minutes for processing |
| invalid document | Missing required fields | Include id, title, and url |
Glean Incident Runbook
Triage: Is search returning results? Check Glean status page.
Overview
Incident response procedures for Glean enterprise search integration failures. Covers search degradation, connector sync failures, indexing backlogs, and permission sync drift. Glean aggregates knowledge across all company tools, so incidents impact employee productivity across the entire organization. When search breaks or returns stale results, teams lose access to critical institutional knowledge. Classify severity immediately and follow the matching playbook below.
Severity Levels
| Level | Definition | Response Time | Example |
| --- | --- | --- | --- |
| P1 - Critical | Search fully down or returning zero results | 15 min | All queries return empty, API 5xx errors |
| P2 - High | Connector sync failed, content going stale | 30 min | Google Drive connector last synced 24h ago |
| P3 - Medium | Indexing backlog or partial result degradation | 2 hours | New documents not appearing for 4+ hours |
| P4 - Low | Permission sync drift or single datasource issue | 8 hours | One user sees docs they shouldn't access |
Diagnostic Steps
# Test search API health
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
-H "Authorization: Bearer $GLEAN_API_TOKEN" \
-H "Content-Type: application/json" \
-X POST https://your-domain.glean.com/api/v1/search \
-d '{"query": "test", "pageSize": 1}'
# Check datasource connector status
curl -s -H "Authorization: Bearer $GLEAN_API_TOKEN" \
https://your-domain.glean.com/api/v1/getdatasourceconfig \
-d '{"datasource": "DATASOURCE_NAME"}' | jq '.status'
# Verify indexing queue depth
curl -s -H "Authorization: Bearer $GLEAN_API_TOKEN" \
https://your-domain.glean.com/api/index/v1/getstatus | jq '.statistics'
Incident Playbooks
API Outage
- Confirm outage with diagnostic curl above and check Glean status page
- Verify your Glean instance URL resolves and TLS cert is valid
- Test from multiple networks to rule out local DNS or firewall issues
- Notify users that search is temporarily unavailable
- Contact Glean support with instance name, timestamps, and error codes
Authentication Failure
- Verify API token is set:
echo $GLEAN_API_TOKEN | wc -c
- Check token expiry — Glean tokens may have a TTL configured by your admin
- Test with a minimal search request (see diagnostics above)
- If 401: regenerate token in Glean admin console under API settings
- If 403: verify token scopes include search and indexing permissions
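The 401/403 decision above can be encoded in on-call tooling so the first responder gets the right action immediately. A small sketch; the helper is hypothetical, not part of any Glean SDK:

```typescript
// Map an HTTP status from the minimal search request to the playbook action.
function authFailureAction(status: number): string {
  switch (status) {
    case 401: return 'Regenerate token in the Glean admin console under API settings';
    case 403: return 'Verify token scopes include search and indexing permissions';
    default:  return `Not an auth failure (HTTP ${status}); follow the API outage playbook`;
  }
}
```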
Data Sync Failure
- Check connector status with getdatasourceconfig and note the last successful sync time
- Check indexing queue depth with getstatus; a growing backlog suggests throttling or connector errors
- Re-authenticate the connector if the source system rotated credentials
- Trigger a manual crawl for the affected datasource and monitor until fresh documents appear
Glean Install & Auth
Install and configure Glean API authentication with indexing and client tokens.
Overview
Configure Glean API authentication for enterprise search and knowledge management. Glean has two APIs: the Indexing API (push content into search) and the Client API (search and retrieve). Each uses separate tokens. Base URL: https://<company>-be.glean.com/api.
Prerequisites
- Glean enterprise account with admin access
- API token from Glean Admin > API Tokens
- Your Glean deployment domain (e.g., company-be.glean.com)
Instructions
Step 1: Obtain API Tokens
Navigate to Glean Admin Console > Settings > API:
| Token Type | Purpose | Required Header |
| --- | --- | --- |
| Indexing API token | Push documents into search index | Authorization: Bearer |
| Client API token | Search, chat, user-scoped queries | Authorization: Bearer + X-Glean-Auth-Type: BEARER |
Step 2: Configure Environment Variables
# .env (NEVER commit)
GLEAN_DOMAIN=company-be.glean.com
GLEAN_INDEXING_TOKEN=glean_idx_...
GLEAN_CLIENT_TOKEN=glean_cli_...
GLEAN_DATASOURCE=custom_app # Your custom datasource name
Step 3: Verify API Access
# No SDK install needed; the snippets below call the REST API with Node's built-in fetch
const GLEAN_BASE = `https://${process.env.GLEAN_DOMAIN}/api`;
// Verify indexing API access
async function verifyIndexingAccess() {
const res = await fetch(`${GLEAN_BASE}/index/v1/getdatasourceconfig`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ datasource: process.env.GLEAN_DATASOURCE }),
});
const config = await res.json();
console.log(`Connected. Datasource: ${config.name}`);
}
// Verify client API access
async function verifySearchAccess() {
const res = await fetch(`${GLEAN_BASE}/client/v1/search`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GLEAN_CLIENT_TOKEN}`,
'X-Glean-Auth-Type': 'BEARER',
'Content-Type': 'application/json',
},
body: JSON.stringify({ query: 'test', pageSize: 1 }),
});
const results = await res.json();
console.log(`Search works. Found ${results.results?.length ?? 0} results.`);
}
Error Handling
| Error | Code | Cause | Solution |
| --- | --- | --- | --- |
| Unauthorized | 401 | Invalid token | Regenerate in Admin > API Tokens |
| Forbidden | 403 | Token lacks required scopes | Verify the token covers search and indexing permissions |
Glean Local Dev Loop
Configure Glean local development with mock search responses, test datasources, and connector development workflow.
Overview
Local development workflow for Glean enterprise search API integration. Provides a fast feedback loop with mock search results, connector testing, and document indexing simulation so you can build custom datasource connectors and search UIs without needing a live Glean deployment. Toggle between mock mode for rapid connector iteration and sandbox mode for validating against your Glean instance.
Environment Setup
cp .env.example .env
# Set your credentials:
# GLEAN_API_KEY=glean_xxxxxxxxxxxx
# GLEAN_INSTANCE=https://your-company.glean.com
# MOCK_MODE=true
npm install express axios dotenv tsx typescript @types/node
npm install -D vitest supertest @types/express
Dev Server
// src/dev/server.ts
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";
const app = express();
app.use(express.json());
const MOCK = process.env.MOCK_MODE === "true";
if (!MOCK) {
app.use("/api", createProxyMiddleware({
target: process.env.GLEAN_INSTANCE,
changeOrigin: true,
headers: { Authorization: `Bearer ${process.env.GLEAN_API_KEY}` },
}));
} else {
const { mountMockRoutes } = require("./mocks");
mountMockRoutes(app);
}
app.listen(3003, () => console.log(`Glean dev server on :3003 [mock=${MOCK}]`));
Mock Mode
// src/dev/mocks.ts — realistic enterprise search responses
export function mountMockRoutes(app: any) {
app.post("/api/search", (req: any, res: any) => res.json({
results: [
{ title: "Q4 Engineering Roadmap", url: "https://wiki.co/roadmap", score: 0.97, datasource: "confluence",
snippets: [{ snippet: "The <b>roadmap</b> includes migration to..." }] },
{ title: "Onboarding Guide", url: "https://wiki.co/onboard", score: 0.88, datasource: "notion",
snippets: [{ snippet: "New hire <b>onboarding</b> steps..." }] },
],
totalCount: 2,
}));
app.post("/api/index/documents", (req: any, res: any) => res.json({
status: "OK", documentsIndexed: req.body.documents?.length || 0,
}));
app.get("/api/datasources", (_req: any, res: any) => res.json([
{ name: "confluence", displayName: "Confluence", docCount: 1250 },
{ name: "notion", displayName: "Notion", docCount: 430 },
]));
}
Testing Workflow
npm run dev:mock & # Start mock server in background
npm run test # Unit tests with vitest
npm run test -- --watch # Watch mode for rapid iteration
MOCK_MODE=false npm run test:integration # Integration tests against your Glean sandbox instance
Glean Migration Deep Dive
Migrate from Elasticsearch/Algolia: 1) Export all documents from source, 2) Transform to Glean document schema (id, title, url, body, permissions), 3) Create datasource with adddatasource, 4) Bulk index with bulkindexdocuments, 5) Validate search quality with test queries, 6) Switch search UI to use Glean Client API.
Overview
Comprehensive guide for migrating enterprise search from Elasticsearch or Algolia to
Glean. Covers connector migration (replacing custom crawlers with Glean's push indexing
API), permission model changes (mapping ACLs to Glean's datasource-level permissions),
and full index rebuilds using bulkindexdocuments. Typical timeline is 2-4 weeks for
a mid-size deployment with 100K-1M documents across multiple datasources.
Migration Assessment
// Scan current integration for deprecated patterns and index health
const assessment = {
source: process.env.SEARCH_PROVIDER ?? 'elasticsearch',
indices: await sourceClient.cat.indices({ format: 'json' }),
totalDocs: 0, connectors: [] as string[], permissionModel: '',
};
for (const idx of assessment.indices) {
assessment.totalDocs += parseInt(idx['docs.count'] ?? '0', 10);
assessment.connectors.push(idx.index);
}
assessment.permissionModel = assessment.source === 'elasticsearch' ? 'index-level' : 'api-key';
console.log(`Source: ${assessment.source}`);
console.log(`Indices: ${assessment.connectors.length} | Total docs: ${assessment.totalDocs}`);
console.log(`Permission model: ${assessment.permissionModel} → Glean datasource ACLs`);
Step-by-Step Migration
Phase 1: Prepare
Export all documents from the current search provider and map them to Glean's document
schema. Each document needs id, title, url, body, and permissions.
interface GleanDocument {
id: string;
datasource: string;
title: string;
url: string;
body: { mimeType: 'text/plain' | 'text/html'; content: string };
permissions: { allowedUsers?: string[]; allowedGroups?: string[] };
updatedAt: string;
}
async function exportAndTransform(sourceIndex: string): Promise<GleanDocument[]> {
const docs: GleanDocument[] = [];
  let scrollId: string | undefined;
  let batchSize = 0;
  do {
    const batch = scrollId
      ? await sourceClient.scroll({ scroll: '2m', scroll_id: scrollId })
      : await sourceClient.search({ index: sourceIndex, scroll: '2m', size: 500 });
    scrollId = batch._scroll_id;
    batchSize = batch.hits.hits.length; // stop once the scroll is exhausted, not only at the doc cap
    for (const hit of batch.hits.hits) {
      docs.push({
        id: hit._id, datasource: 'custom_' + sourceIndex,
        title: hit._source.title, url: hit._source.url,
        body: { mimeType: 'text/plain', content: hit._source.body },
        permissions: { allowedGroups: hit._source.acl_groups ?? ['everyone'] },
        updatedAt: hit._source.updated_at ?? new Date().toISOString(),
      });
    }
  } while (scrollId && batchSize > 0 && docs.length < 1_000_000);
  return docs;
}
Phase 2: Migrate
Create the Glean datasource with adddatasource, then bulk index the transformed documents through bulkindexdocuments.
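Phase 2 can be sketched as chunked POSTs to bulkindexdocuments, mirroring the 100-document batch size used elsewhere in this guide. The request body shape follows the earlier import example and should be checked against your instance's API version:

```typescript
// Split an array into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Push transformed documents to Glean, 100 per request.
async function bulkIndex(domain: string, token: string, datasource: string, docs: object[]): Promise<void> {
  for (const batch of chunk(docs, 100)) {
    const res = await fetch(`https://${domain}/api/index/v1/bulkindexdocuments`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ datasource, documents: batch }),
    });
    if (!res.ok) throw new Error(`Bulk index batch failed: HTTP ${res.status}`);
  }
}
```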
Glean Multi-Environment Setup
Use separate datasource names per environment (wiki_staging vs wiki_prod).
Overview
Glean enterprise search requires environment isolation to prevent test data from polluting production search results. Each environment uses its own datasource names, API tokens, and connector configurations. Sandbox indexes synthetic test documents, staging indexes a curated subset of real documents for search quality validation, and production indexes the full corpus. Connector changes must be tested in staging before promotion to avoid breaking search relevance for end users.
Environment Configuration
const gleanConfig = (env: string) => ({
development: {
apiToken: process.env.GLEAN_API_TOKEN_DEV!, baseUrl: "https://sandbox.glean.com/api/v1",
datasourceSuffix: "_sandbox", indexingEnabled: true, searchQualityChecks: false,
},
staging: {
apiToken: process.env.GLEAN_API_TOKEN_STG!, baseUrl: "https://staging.glean.com/api/v1",
datasourceSuffix: "_staging", indexingEnabled: true, searchQualityChecks: true,
},
production: {
apiToken: process.env.GLEAN_API_TOKEN_PROD!, baseUrl: "https://app.glean.com/api/v1",
datasourceSuffix: "_prod", indexingEnabled: true, searchQualityChecks: false,
},
}[env]);
Environment Files
# Per-env files: .env.development, .env.staging, .env.production
GLEAN_API_TOKEN_{DEV|STG|PROD}=<token>
GLEAN_BASE_URL=https://{sandbox|staging|app}.glean.com/api/v1
GLEAN_DATASOURCE_SUFFIX={_sandbox|_staging|_prod}
GLEAN_INSTANCE={sandbox|staging|production}
Environment Validation
function validateGleanEnv(env: string): void {
const suffix = { development: "_DEV", staging: "_STG", production: "_PROD" }[env];
const required = [`GLEAN_API_TOKEN${suffix}`, "GLEAN_BASE_URL", "GLEAN_INSTANCE"];
const missing = required.filter((k) => !process.env[k]);
if (missing.length) throw new Error(`Missing Glean env vars for ${env}: ${missing.join(", ")}`);
}
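Datasource names can be derived from one helper so application code never hardcodes an environment-specific name like wiki_prod. A sketch consistent with the suffixes above:

```typescript
const DATASOURCE_SUFFIXES: Record<string, string> = {
  development: '_sandbox',
  staging: '_staging',
  production: '_prod',
};

// 'wiki' + 'staging' -> 'wiki_staging'; unknown environments throw rather than defaulting to prod.
function datasourceName(base: string, env: string): string {
  const suffix = DATASOURCE_SUFFIXES[env];
  if (!suffix) throw new Error(`Unknown Glean environment: ${env}`);
  return `${base}${suffix}`;
}
```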
Promotion Workflow
# 1. Index test documents in sandbox
curl -X POST "$GLEAN_BASE_URL/indexing/datasources/wiki_sandbox/documents" \
-H "Authorization: Bearer $GLEAN_API_TOKEN_DEV" -d @test-docs.json
# 2. Validate search quality in staging
curl "$GLEAN_BASE_URL/search" -H "Authorization: Bearer $GLEAN_API_TOKEN_STG" \
  -d '{"query": "onboarding guide"}' | jq '.results[:3][].title'
# 3. Compare relevance scores against baseline
node scripts/compare-search-quality.js --env staging --baseline baseline.json
# 4. Promote connector config to production
cp connectors/staging/*.json connectors/production/
curl -X POST "$GLEAN_BASE_URL/indexing/datasources/wiki_prod/crawl" \
  -H "Authorization: Bearer $GLEAN_API_TOKEN_PROD"
Glean Observability
Track: documents indexed per run (total + new + updated + deleted), indexing errors and retries, search API latency, zero-result query rate, stale content age distribution.
Overview
Glean aggregates enterprise knowledge across dozens of connectors, making indexing health and search quality the two pillars of observability. Monitor connector sync status to catch stale content before users notice, track search latency to maintain sub-second responses, and measure zero-result rates to identify coverage gaps. Degraded indexing silently erodes search relevance, so proactive alerting is essential.
Key Metrics
| Metric | Type | Target | Alert Threshold |
| --- | --- | --- | --- |
| Search latency p95 | Histogram | < 400ms | > 1s |
| Zero-result query rate | Gauge | < 5% | > 10% |
| Documents indexed per run | Counter | Stable +/-5% | Drop > 20% |
| Connector sync errors | Counter | 0 | > 3 per hour |
| Stale content ratio | Gauge | < 10% | > 25% (>30 days old) |
| Indexing throughput | Gauge | > 1000 docs/min | < 500 docs/min |
Instrumentation
async function trackGleanSearch(query: string, client: GleanClient) {
const start = Date.now();
try {
const results = await client.search({ query });
const latency = Date.now() - start;
metrics.histogram('glean.search.latency', latency);
metrics.increment('glean.search.total');
if (results.totalCount === 0) metrics.increment('glean.search.zero_results');
return results;
  } catch (err: any) {
metrics.increment('glean.search.errors', { error: err.code });
throw err;
}
}
Health Check Dashboard
async function gleanHealth(): Promise<Record<string, string>> {
const connectors = await gleanAdmin.getConnectorStatus();
const staleRatio = await gleanAdmin.getStaleContentRatio(30);
const searchP95 = await metrics.query('glean.search.latency', 'p95', '5m');
return {
connectors: connectors.every(c => c.status === 'ok') ? 'healthy' : 'degraded',
content_freshness: staleRatio < 0.1 ? 'healthy' : 'stale',
search_latency: searchP95 < 400 ? 'healthy' : 'slow',
};
}
Alerting Rules
const alerts = [
{ metric: 'glean.search.latency_p95', condition: '> 1000ms', window: '10m', severity: 'warning' },
{ metric: 'glean.search.zero_result_rate', condition: '> 0.10', window: '1h', severity: 'warning' },
{ metric: 'glean.indexing.sync_errors', condition: '> 3', window: '1h', severity: 'critical' },
  { metric: 'glean.indexing.doc_count', condition: 'drop > 20%', window: '24h', severity: 'warning' },
];
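A minimal evaluator for these rules might parse only the '> N' condition form shown above; anything richer (percentage drops, windows) would live in your metrics backend. The parsing logic is an illustrative assumption:

```typescript
interface AlertRule { metric: string; condition: string; window: string; severity: string }

// True when a numeric reading breaches a '> N' / '> Nms' style threshold.
function breaches(rule: AlertRule, value: number): boolean {
  const m = rule.condition.match(/^>\s*([\d.]+)/);
  if (!m) return false; // unsupported condition form (e.g. 'drop > 20%')
  return value > parseFloat(m[1]);
}
```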
Glean Performance Tuning
Optimize Glean search relevance and indexing throughput with batch sizing, datasource configuration, and content quality improvements.
Overview
Glean's enterprise search API handles search queries across multiple connectors, bulk document indexing, and connector sync throughput. Search latency compounds when querying across dozens of datasources simultaneously. Large indexing jobs (10K+ documents) require careful batching to avoid rate limits and maintain connector sync schedules. Optimizing batch sizes, caching frequent search results, and tuning connector configurations reduces search P95 latency and keeps indexing pipelines within SLA windows.
Caching Strategy
const cache = new Map<string, { data: any; expiry: number }>();
const TTL = { search: 60_000, suggestions: 30_000, datasources: 600_000 };
async function cached(key: string, ttlKey: keyof typeof TTL, fn: () => Promise<any>) {
const entry = cache.get(key);
if (entry && entry.expiry > Date.now()) return entry.data;
const data = await fn();
cache.set(key, { data, expiry: Date.now() + TTL[ttlKey] });
return data;
}
// Search results expire fast (1 min). Datasource metadata is stable (10 min).
Batch Operations
import PQueue from 'p-queue';
const BATCH_SIZE = 100;
async function indexDocsBatched(glean: any, dsName: string, docs: any[]) {
const batches = [];
for (let i = 0; i < docs.length; i += BATCH_SIZE) batches.push(docs.slice(i, i + BATCH_SIZE));
const queue = new PQueue({ concurrency: 3, interval: 500, intervalCap: 3 }); // p-queue ignores interval without intervalCap
await Promise.all(batches.map(batch =>
queue.add(() => glean.indexDocuments(dsName, batch))
));
}
Connection Pooling
import { Agent } from 'https';
const agent = new Agent({ keepAlive: true, maxSockets: 15, maxFreeSockets: 5, timeout: 30_000 });
// High socket count for parallel indexing across multiple datasources
Rate Limit Management
async function withGleanRateLimit(fn: () => Promise<any>): Promise<any> {
try { return await fn(); }
catch (err: any) {
if (err.status === 429) {
const retryMs = parseInt(err.headers?.['retry-after'] || '5') * 1000;
await new Promise(r => setTimeout(r, retryMs));
return fn();
}
throw err;
}
}
Monitoring
const metrics = { searches: 0, indexOps: 0, cacheHits: 0, p95LatencyMs: 0, errors: 0 };
const latencies: number[] = [];
function trackSearch(startMs: number, cached: boolean) {
const lat = Date.now() - startMs; latencies.push(lat); metrics.searches++;
if (cached) metrics.cacheHits++;
latencies.sort((a, b) => a - b); // re-sorting per call is O(n log n); fine at modest query volumes
metrics.p95LatencyMs = latencies[Math.floor(latencies.length * 0.95)] || 0;
}
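A quick standalone check of the percentile arithmetic above: with 100 sorted latencies of 1–100 ms, index floor(100 × 0.95) = 95 selects the 96th value.

```typescript
// Verify the P95 index math used by the tracker above
const demoLatencies = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
demoLatencies.sort((a, b) => a - b);
const p95 = demoLatencies[Math.floor(demoLatencies.length * 0.95)];
// p95 === 96
```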
Performance Checklist
- [ ] Batch indexing calls at 100 docs per request with 3 concurrent workers
Pre-launch: All datasources indexed and searchable.
Glean Production Checklist
Overview
Glean provides enterprise search across all company data sources with AI-powered ranking and document understanding. A production integration requires all connectors indexed, document permissions correctly mapped, and search quality validated. Failures mean employees find stale documents, see content they lack permission for, or get zero results when data exists.
Authentication & Secrets
- [ ] GLEAN_API_KEY stored in secrets manager (not config files)
- [ ] Indexing API token separated from Search API token
- [ ] Key rotation schedule documented (quarterly cycle)
- [ ] Separate credentials for staging/prod environments
- [ ] Service account permissions scoped per data source connector
API Integration
- [ ] Production base URL configured (https://api.glean.com/v1)
- [ ] Rate limit handling with exponential backoff
- [ ] All data source connectors configured and initial crawl complete
- [ ] Document permission mapping tested with different user roles
- [ ] Connector sync scheduled (daily cron or event-driven webhooks)
- [ ] Bulk indexing supports incremental updates (not full re-index)
- [ ] Search query tested across all indexed data sources
Error Handling & Resilience
- [ ] Circuit breaker configured for Glean API outages
- [ ] Retry with backoff for 429/5xx responses
- [ ] Connector sync failure detection within 1 hour
- [ ] Stale index detection (alert if last update > 24 hours)
- [ ] Document permission errors logged (access denied on indexed docs)
- [ ] Fallback search plan if Glean is unavailable
Monitoring & Alerting
- [ ] API latency tracked per endpoint (search, indexing)
- [ ] Error rate alerts set (threshold: >3% over 5 minutes)
- [ ] Connector health dashboard showing sync status per source
- [ ] Search quality metrics tracked (click-through rate, zero-result rate)
- [ ] Index document count monitored for unexpected drops
Validation Script
async function checkGleanReadiness(): Promise<void> {
const checks: { name: string; pass: boolean; detail: string }[] = [];
// Search API connectivity
try {
const res = await fetch('https://api.glean.com/v1/search', {
method: 'POST',
headers: { Authorization: `Bearer ${process.env.GLEAN_API_KEY}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ query: 'test', pageSize: 1 }),
});
checks.push({ name: 'Search API', pass: res.ok, detail: res.ok ? 'Connected' : `HTTP ${res.status}` });
} catch (e: any) { checks.push({ name: 'Search API', pass: false, detail: e.message }); }
// Credentials present
checks.push({
name: 'Credentials',
pass: Boolean(process.env.GLEAN_API_KEY),
detail: process.env.GLEAN_API_KEY ? 'GLEAN_API_KEY set' : 'GLEAN_API_KEY missing',
});
for (const c of checks) console.log(`${c.pass ? 'PASS' : 'FAIL'} ${c.name}: ${c.detail}`);
}
Glean Indexing API: ~100 requests/min per token.
Glean Rate Limits
Overview
Glean's APIs split into two tiers: the Indexing API for pushing documents into the search corpus, and the Client API for executing searches. The Indexing API handles bulk document ingestion at approximately 100 requests per minute, while search queries are capped at 60 per minute per token. Organizations indexing large knowledge bases (100K+ documents from Confluence, Notion, or internal wikis) must implement careful batching to avoid 429 responses that can stall multi-hour ingestion pipelines.
Rate Limit Reference
| Endpoint | Limit | Window | Scope |
| --- | --- | --- | --- |
| Indexing - single document | 100 req | 1 minute | Per API token |
| Indexing - bulk (100 docs/req) | 20 req | 1 minute | Per API token |
| Search queries | 60 req | 1 minute | Per API token |
| People search | 30 req | 1 minute | Per API token |
| Entity extraction | 40 req | 1 minute | Per API token |
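These limits put a floor on ingestion time. Assuming the bulk endpoint at 20 requests/minute with 100 documents per request (the figures from the table above), a back-of-envelope estimate:

```typescript
// Estimate wall-clock minutes to ingest a corpus through the bulk endpoint,
// using the per-token limits from the table above
function estimateIngestMinutes(totalDocs: number, docsPerRequest = 100, bulkReqPerMin = 20): number {
  const requests = Math.ceil(totalDocs / docsPerRequest);
  return Math.ceil(requests / bulkReqPerMin);
}
// A 100K-document knowledge base needs 1,000 bulk requests: about 50 minutes
const minutesFor100k = estimateIngestMinutes(100_000);
```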
Rate Limiter Implementation
class GleanRateLimiter {
private tokens: number;
private lastRefill: number;
private readonly max: number;
private readonly refillRate: number;
private queue: Array<{ resolve: () => void }> = [];
constructor(maxPerMinute: number) {
this.max = maxPerMinute;
this.tokens = maxPerMinute;
this.lastRefill = Date.now();
this.refillRate = maxPerMinute / 60_000;
}
async acquire(): Promise<void> {
this.refill();
if (this.tokens >= 1) { this.tokens -= 1; return; }
return new Promise(resolve => {
this.queue.push({ resolve });
// refill() only runs from acquire(), so schedule one for when the next token
// accrues; otherwise queued waiters would hang if no further calls arrived
setTimeout(() => this.refill(), Math.ceil(1 / this.refillRate));
});
}
private refill() {
const now = Date.now();
this.tokens = Math.min(this.max, this.tokens + (now - this.lastRefill) * this.refillRate);
this.lastRefill = now;
while (this.tokens >= 1 && this.queue.length) {
this.tokens -= 1;
this.queue.shift()!.resolve();
}
}
}
const indexLimiter = new GleanRateLimiter(18); // buffer under 20 bulk/min
const searchLimiter = new GleanRateLimiter(50);
Retry Strategy
async function gleanRetry<T>(
limiter: GleanRateLimiter, fn: () => Promise<Response>, maxRetries = 4
): Promise<T> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
await limiter.acquire();
const res = await fn();
if (res.ok) return res.json();
if (res.status === 429) {
const retryAfter = parseInt(res.headers.get("Retry-After") || "30", 10);
const jitter = Math.random() * 5000;
await new Promise(r => setTimeout(r, retryAfter * 1000 + jitter));
continue;
}
if (res.status >= 500 && attempt < maxRetries) {
await new Promise(r => setTimeout(r, 2 ** attempt * 1000)); // exponential backoff
continue;
}
throw new Error(`Glean API error: HTTP ${res.status}`);
}
throw new Error("Glean request failed after max retries");
}
Enterprise architecture: Source Systems -> Connectors (Cloud Run/Lambda, event-driven or scheduled) -> Glean Indexing API -> Glean Search Index -> Client API (Search + Chat) -> Your Apps (Slack bot, portal, internal tools).
Glean Reference Architecture
Overview
Enterprise search integration architecture for connecting internal knowledge systems to Glean's indexing and search platform. Designed for organizations needing unified search across Confluence, Google Drive, Notion, Slack, Jira, and custom internal tools. Key design drivers: connector reliability for continuous indexing, permission synchronization to enforce source-system ACLs, incremental vs bulk indexing tradeoffs, and low-latency search aggregation across heterogeneous document types.
Architecture Diagram
Source Systems ──→ Connector Framework ──→ Queue (SQS) ──→ Glean Indexing API
(Confluence, Drive,     (Cloud Run)            ↓             /indexing/documents
 Notion, Slack, Jira)        ↓           Permission Sync     /indexing/permissions
Schedule (cron) ──→ Bulk Reindexer                           /indexing/datasources
                         ↓
              Glean Search Index ──→ Client API ──→ Your Apps
                                     /search        (Slack bot, portal)
                                     /chat          (internal tools)
Service Layer
class ConnectorService {
constructor(private glean: GleanIndexingClient, private cache: CacheLayer) {}
async indexDocument(doc: SourceDocument): Promise<void> {
const gleanDoc = this.transformToGleanFormat(doc);
await this.glean.indexDocument(doc.datasource, gleanDoc);
await this.syncPermissions(doc.id, doc.acl);
}
async bulkReindex(datasource: string, since?: string): Promise<IndexReport> {
const docs = await this.fetchAllDocuments(datasource, since);
const batches = this.chunk(docs, 100); // Glean recommends batches of 100
let indexed = 0;
for (const batch of batches) {
await this.glean.bulkIndex(datasource, batch);
indexed += batch.length;
}
return { datasource, totalIndexed: indexed, timestamp: new Date().toISOString() };
}
}
Caching Strategy
const CACHE_CONFIG = {
searchResults: { ttl: 30, prefix: 'search' }, // 30s — freshness critical for search
permissions: { ttl: 300, prefix: 'perm' }, // 5 min — ACL changes are infrequent
datasources: { ttl: 3600, prefix: 'ds' }, // 1 hr — datasource config rarely changes
connectorState: { ttl: 60, prefix: 'conn' }, // 1 min — sync cursor freshness
documentMeta: { ttl: 120, prefix: 'docmeta' }, // 2 min — title/author for search previews
};
// Webhook-driven invalidation: source system change events flush document cache immediately
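One way to sketch that webhook-driven invalidation. The `docmeta:<datasource>:<id>` key scheme is an assumption of this example, not a Glean convention; substitute whatever prefixing your CacheLayer uses.

```typescript
// On a source-system change event, flush every cache entry for that document
const docCache = new Map<string, unknown>([
  ['docmeta:confluence:123', { title: 'Old title' }],
  ['docmeta:confluence:456', { title: 'Other doc' }],
  ['search:roadmap', { results: [] }], // search entries age out via their short TTL
]);
function invalidateDocument(datasource: string, docId: string): number {
  const prefix = `docmeta:${datasource}:${docId}`;
  let flushed = 0;
  for (const key of [...docCache.keys()]) {
    if (key.startsWith(prefix)) { docCache.delete(key); flushed++; }
  }
  return flushed;
}
```

Search-result entries are deliberately left to expire on their 30s TTL rather than being flushed, since mapping a document change back to every affected query is impractical.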
Event Pipeline
class IndexingPipeline {
private queue = new Bull('glean-indexing', { redis: process.env.REDIS_URL });
async onSourceChange(event: SourceChangeEvent): Promise<void> {
// Enqueue for async indexing; the jobId dedupes rapid edits to one document
// (event.documentId is an assumed field on SourceChangeEvent)
await this.queue.add(event, { jobId: event.documentId });
}
}
Apply production-ready Glean API patterns with typed clients, batch indexing, pagination, and error handling.
Glean SDK Patterns
Overview
Production-ready patterns for the Glean enterprise search platform. Glean uses POST-based REST endpoints for both search and indexing. Search queries go to the Client API while document ingestion uses the Indexing API. A structured client centralizes token management, enforces batch pagination for bulk indexing, and provides typed responses for search results.
Singleton Client
let _client: GleanClient | null = null;
export function getClient(): GleanClient {
if (!_client) {
const domain = process.env.GLEAN_DOMAIN, key = process.env.GLEAN_API_KEY;
if (!domain || !key) throw new Error('GLEAN_DOMAIN and GLEAN_API_KEY must be set');
_client = new GleanClient(domain, key);
}
return _client;
}
class GleanClient {
private base: string; private h: Record<string, string>;
constructor(domain: string, key: string) {
this.base = `https://${domain}/api`;
this.h = { 'Authorization': `Bearer ${key}`, 'Content-Type': 'application/json' };
}
async search(query: string, opts: { pageSize?: number; datasource?: string } = {}) {
const r = await fetch(`${this.base}/client/v1/search`, {
method: 'POST',
headers: { ...this.h, 'X-Glean-Auth-Type': 'BEARER' },
body: JSON.stringify({
query,
pageSize: opts.pageSize ?? 20,
requestOptions: opts.datasource ? { datasourceFilter: opts.datasource } : undefined,
}),
});
if (!r.ok) throw new GleanError(r.status, await r.text());
return r.json() as Promise<GleanSearchResponse>;
}
async indexDocuments(datasource: string, docs: GleanDocument[]): Promise<void> {
const r = await fetch(`${this.base}/index/v1/indexdocuments`, {
method: 'POST', headers: this.h, body: JSON.stringify({ datasource, documents: docs }) });
if (!r.ok) throw new GleanError(r.status, await r.text());
}
async bulkIndex(ds: string, docs: GleanDocument[], batch = 100): Promise<void> {
for (let i = 0; i < docs.length; i += batch) await this.indexDocuments(ds, docs.slice(i, i + batch));
}
}
Error Wrapper
export class GleanError extends Error {
constructor(public status: number, message: string) { super(message); this.name = 'GleanError'; }
}
export async function safeCall<T>(operation: string, fn: () => Promise<T>): Promise<T> {
try { return await fn(); }
catch (err: any) {
if (err instanceof GleanError && err.status === 429) { await new Promise(r => setTimeout(r, 3000)); return fn(); }
if (err instanceof GleanError && err.status === 401) throw new GleanError(401, 'Invalid GLEAN_API_KEY');
throw new GleanError(err.status ?? 0, `${operation} failed: ${err.message}`);
}
}
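A self-contained sketch of the wrapper's 401 path (the class and wrapper are redefined here so the snippet runs on its own): an upstream 401 surfaces as an actionable configuration error rather than a raw HTTP failure.

```typescript
// Standalone demo: safeCall translates a 401 into a config-level error message
class GleanDemoError extends Error {
  constructor(public status: number, message: string) { super(message); this.name = 'GleanError'; }
}
async function safeCallDemo<T>(operation: string, fn: () => Promise<T>): Promise<T> {
  try { return await fn(); }
  catch (err: any) {
    if (err instanceof GleanDemoError && err.status === 401) {
      throw new GleanDemoError(401, 'Invalid GLEAN_API_KEY');
    }
    throw new GleanDemoError(err.status ?? 0, `${operation} failed: ${err.message}`);
  }
}
const demo401 = safeCallDemo('search', async () => {
  throw new GleanDemoError(401, 'unauthorized');
}).catch((e: Error) => e.message);
```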
Request Builder
class GleanSearchBuilder {
// Illustrative fluent builder for search request bodies (not a Glean SDK class)
private body: Record<string, unknown> = {};
query(q: string): this { this.body.query = q; return this; }
pageSize(n: number): this { this.body.pageSize = n; return this; }
datasource(ds: string): this { this.body.requestOptions = { datasourceFilter: ds }; return this; }
build(): Record<string, unknown> { return this.body; }
}
Token security: Indexing tokens have write access -- never expose in frontend.
Glean Security Basics
Overview
Glean indexes and searches across an enterprise's entire knowledge base — Confluence, Google Drive, Slack, GitHub, and dozens more connectors. Security concerns center on indexing token management (write-access tokens that can push content into the search index), client token scoping (user-level search permissions), and document-level access controls. A leaked indexing token allows injecting arbitrary content into enterprise search results.
API Key Management
function createGleanClient(tokenType: "indexing" | "client"): { token: string; baseUrl: string } {
const token = tokenType === "indexing"
? process.env.GLEAN_INDEXING_TOKEN
: process.env.GLEAN_CLIENT_TOKEN;
if (!token) {
throw new Error(`Missing GLEAN_${tokenType.toUpperCase()}_TOKEN — store in secrets manager`);
}
// Indexing tokens have WRITE access — never expose in frontend code
if (tokenType === "indexing") {
console.log("WARNING: Indexing token loaded — backend use only");
}
return { token, baseUrl: `https://${process.env.GLEAN_INSTANCE}.glean.com/api` };
}
Webhook Signature Verification
import crypto from "crypto";
import { Request, Response, NextFunction } from "express";
function verifyGleanWebhook(req: Request, res: Response, next: NextFunction): void {
const signature = req.headers["x-glean-signature"] as string;
const secret = process.env.GLEAN_WEBHOOK_SECRET!;
// req.body must be the raw Buffer (mount express.raw), not parsed JSON
const expected = crypto.createHmac("sha256", secret).update(req.body).digest("hex");
// Check length first: timingSafeEqual throws when buffer lengths differ
if (!signature || signature.length !== expected.length ||
!crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected))) {
res.status(401).send("Invalid signature");
return;
}
next();
}
Input Validation
import { z } from "zod";
const IndexDocumentSchema = z.object({
datasource: z.string().min(1).max(100),
document_id: z.string().min(1).max(500),
title: z.string().min(1).max(500),
body: z.string().max(1_000_000),
allowed_users: z.array(z.string().email()).optional(),
allowed_groups: z.array(z.string()).optional(),
permissions_type: z.enum(["public", "restricted", "private"]).default("restricted"),
});
function validateIndexDocument(data: unknown) {
return IndexDocumentSchema.parse(data);
}
Data Protection
const GLEAN_SENSITIVE_FIELDS = ["indexing_token", "client_token", "document_body", "user_query", "search_results"];
function redactGleanLog(record: Record<string, unknown>): Record<string, unknown> {
const redacted = { ...record };
for (const field of GLEAN_SENSITIVE_FIELDS) {
if (field in redacted) redacted[field] = "[REDACTED]";
}
return redacted;
}
Check Glean developer changelog for API changes.
Glean Upgrade & Migration
Overview
Glean is an enterprise search platform that indexes documents across SaaS tools via connectors and exposes Search and Indexing APIs. Migrations involve connector schema changes, search API response format updates, and document permission model upgrades. Tracking API versions is critical because Glean's Indexing API enforces document schema validation — adding required fields or changing permission structures in a new version will cause bulk indexing failures and stale search results if connectors are not updated in lockstep.
Version Detection
const GLEAN_BASE = "https://your-domain-be.glean.com/api";
async function detectGleanApiVersion(apiToken: string): Promise<void> {
// Check indexing API health and version
const indexRes = await fetch(`${GLEAN_BASE}/index/v1/status`, {
headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
});
const indexStatus = await indexRes.json();
console.log(`Indexing API version: ${indexRes.headers.get("x-glean-api-version") ?? "v1"}`);
console.log(`Connector status: ${JSON.stringify(indexStatus.connectors)}`);
// Check search API for deprecated query parameters
const searchRes = await fetch(`${GLEAN_BASE}/client/v1/search`, {
method: "POST",
headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
body: JSON.stringify({ query: "test", pageSize: 1 }),
});
const deprecationHeader = searchRes.headers.get("x-glean-deprecated-params");
if (deprecationHeader) console.warn(`Deprecated parameters: ${deprecationHeader}`);
}
Migration Checklist
- [ ] Review Glean developer changelog for Indexing API schema changes
- [ ] Audit custom connectors for deprecated document fields
- [ ] Verify objectType definitions match current Glean schema requirements
- [ ] Check if new required fields were added to document permission model
- [ ] Test search API response parsing — results[].snippets format may change
- [ ] Update datasource configuration if connector authentication method changed
- [ ] Validate bulk indexing with a small document batch before full re-index
- [ ] Check people API for identity resolution field changes
- [ ] Update search query syntax if faceted search operators were modified
- [ ] Monitor indexing error dashboard for 48 hours post-migration
Schema Migration
// Glean document schema evolved: flat permissions → structured ACL model
interface OldGleanDocument {
id: string;
datasource: string;
title: string;
body: { mimeType: string; textContent: string };
permissions: { allowedUsers: string[] };
updatedAt: string;
}
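A hedged sketch of what the move to a structured ACL might look like. The target shape (allowedUsers as {email} objects plus group and anonymous-access fields) is an assumption inferred from the "flat permissions → structured ACL model" note above, not a verbatim Glean schema; check the current Indexing API reference before relying on it.

```typescript
// Assumed structured-ACL target shape (illustrative, not the official schema)
interface NewPermissions {
  allowedUsers: { email: string }[];
  allowedGroups: string[];
  allowAnonymousAccess: boolean;
}
// Migrate the old flat allowedUsers list, defaulting new fields conservatively
function migratePermissions(old: { allowedUsers: string[] }): NewPermissions {
  return {
    allowedUsers: old.allowedUsers.map(email => ({ email })),
    allowedGroups: [],
    allowAnonymousAccess: false,
  };
}
```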
Implement event-driven Glean indexing triggered by source system webhooks from GitHub, Confluence, Notion, and other content platforms.
Glean Webhooks & Events
Overview
Glean uses an event-driven indexing model where source system webhooks trigger incremental updates to the Glean Indexing API. Instead of emitting its own webhooks, Glean receives document changes from platforms like GitHub, Confluence, and Notion. You can also monitor internal Glean events such as document indexing completion, permission changes, connector sync status, and search anomalies through the admin API.
Webhook Registration
// Register a source system webhook that pushes to Glean Indexing API
const response = await fetch("https://yourapp.com/admin/webhooks", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
url: "https://yourapp.com/webhooks/glean-indexer",
events: ["document.indexed", "permission.changed", "connector.synced", "search.anomaly"],
secret: process.env.GLEAN_WEBHOOK_SECRET,
}),
});
Signature Verification
import crypto from "crypto";
import { Request, Response, NextFunction } from "express";
function verifyGleanSignature(req: Request, res: Response, next: NextFunction) {
const signature = req.headers["x-glean-signature"] as string;
const expected = crypto.createHmac("sha256", process.env.GLEAN_WEBHOOK_SECRET!)
.update(req.body).digest("hex");
// Guard missing headers and length mismatches: timingSafeEqual throws on both
if (!signature || signature.length !== expected.length ||
!crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected))) {
return res.status(401).json({ error: "Invalid signature" });
}
next();
}
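A standalone check of the HMAC scheme above: recomputing the SHA-256 digest over the same raw body with the same secret reproduces the signature header exactly, and timingSafeEqual accepts it.

```typescript
import crypto from "crypto";

// Simulate both sides of the handshake: the sender signs the raw body,
// the receiver recomputes the digest and compares in constant time
const secret = "test-webhook-secret";
const rawBody = Buffer.from(JSON.stringify({ type: "document.indexed", data: { doc_id: "d1" } }));
const sent = crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
const expected = crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
const valid = sent.length === expected.length &&
  crypto.timingSafeEqual(Buffer.from(sent), Buffer.from(expected));
```

Note the comparison only works if both sides hash the identical raw bytes; any middleware that re-serializes the JSON body before verification will break the signature.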
Event Handler
import express from "express";
const app = express();
app.post("/webhooks/glean-indexer", express.raw({ type: "application/json" }), verifyGleanSignature, (req, res) => {
const event = JSON.parse(req.body.toString());
res.status(200).json({ received: true });
switch (event.type) {
case "document.indexed":
confirmIndexStatus(event.data.datasource, event.data.doc_id); break;
case "permission.changed":
reindexPermissions(event.data.datasource, event.data.object_id); break;
case "connector.synced":
logSyncMetrics(event.data.connector_name, event.data.docs_processed); break;
case "search.anomaly":
alertOps(event.data.query_pattern, event.data.anomaly_type); break;
}
});
Event Types
| Event | Payload Fields | Use Case |
| --- | --- | --- |
| document.indexed | datasource, doc_id, index_time_ms | Confirm content is searchable |
| permission.changed | datasource, object_id | Trigger ACL re-sync for the object |
| connector.synced | connector_name, docs_processed | Log sync metrics |
| search.anomaly | query_pattern, anomaly_type | Alert on-call for investigation |