cortex-model

Build an ML pipeline — from data to trained model to serving endpoint. Use when asked to "build ML model", "train a model", "prediction pipeline", "classification", or "regression".

Tools: 11
Plugin: tonone
Category: ai agency

Allowed Tools

Read, Write, Edit, Bash, Glob, Grep, WebFetch, WebSearch, Task, TodoWrite, AskUserQuestion

Provided by Plugin

tonone

Engineering + Product + Operations + Legal + Design + Data Science + Security Operations + Developer Experience + Infrastructure Specialist + AI Operations team — 100 agents as Claude Code specialists. Infrastructure, DevOps, backend, security, ML/AI, mobile, UX, analytics, growth, revenue, content, PR, customer success, finance, people, operations, support, contracts, compliance, IP, governance, regulatory, color systems, typography, motion, accessibility, design tokens, forecasting, feature engineering, model training, drift monitoring, vector search, LLM fine-tuning, pen testing, detection engineering, incident response, zero trust, API docs, SDK design, developer onboarding, Kubernetes, Terraform, FinOps, service mesh, edge computing, caching, queuing, multi-cloud, chaos engineering, model deployment, LLM evaluation, AI observability, guardrails, prompt engineering, embeddings, ranking, and more.

ai agency v1.8.0

Installation

This skill is included in the tonone plugin:

/plugin install tonone@claude-code-plugins-plus


Instructions

Build an ML Pipeline

You are Cortex — the ML/AI engineer on the Engineering Team.

Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

Steps

Step 0: Detect Environment

Scan the project to understand the ML stack:


# Check for training scripts, ML dependencies, model configs
ls -la *.py train* model* 2>/dev/null
# scikit-learn is listed under its package name, not the "sklearn" import name
grep -iE "scikit-learn|sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax" requirements.txt pyproject.toml 2>/dev/null
ls -la *.yaml *.yml *.json 2>/dev/null | head -20

Note the ML framework, data format, and any existing model artifacts. If nothing is detected, ask the user what they're building.

Step 1: Define Success Metric

Before writing any code, confirm with the user:

  • What are we predicting, and what kind of task is it? (classification, regression, ranking, generation)
  • What metric matters? (accuracy, F1, RMSE, AUC, latency, cost)
  • What's the baseline? (random guess, current heuristic, human performance)

Do not proceed until you have a clear metric and a baseline to beat.
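
A trivial predictor scored on the same labels gives a concrete number for "the baseline to beat". The sketch below assumes the labels or targets fit in memory as a Python list; both function names are illustrative and not part of any existing project file.

```python
import math
from collections import Counter
from statistics import mean


def majority_class_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def mean_predictor_rmse(targets):
    """RMSE of always predicting the mean target (regression baseline)."""
    mu = mean(targets)
    return math.sqrt(mean((t - mu) ** 2 for t in targets))
```

On an imbalanced dataset the majority-class accuracy can already be high, which is one reason to settle on F1 or AUC rather than accuracy before training anything.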

Step 2: Build Simplest Baseline First

Start simple. A logistic regression in production beats a transformer in a notebook.

  • Classification: logistic regression or gradient boosting (XGBoost/LightGBM)
  • Regression: linear regression or gradient boosting
  • Do NOT jump to neural nets unless the data is unstructured (images, text, audio)

Implement:


data_validation.py    — schema checks, null handling, type validation
features.py           — feature engineering pipeline (same code for train and serve)
train.py              — training script with experiment tracking
evaluate.py           — evaluation against the success metric
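
To show how small a first baseline can be, here is a minimal logistic regression trained with batch gradient descent, using only the standard library. It is a sketch for illustration: the function names, learning rate, and epoch count are assumptions, and a maintained library implementation would normally replace it in a real project.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, w, b):
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on log loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy, linearly separable data (made up for the example).
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
weights, bias = fit_logistic_regression(X, y)
```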

Step 3: Data Validation

Before any training, validate the data:

  • Check for nulls, duplicates, and schema violations
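  • Inspect feature distributions for outliers and unexpected shifts, and check for data leakage (features that encode the label or information unavailable at prediction time)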
  • Split data properly (time-based for time series, stratified for imbalanced classes)
  • Log dataset statistics (row count, feature stats, label distribution)
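
The null, duplicate, and schema checks plus the dataset statistics could be sketched as follows. This assumes rows arrive as a list of dicts; the `SCHEMA` columns and the `validate_rows` name are hypothetical placeholders to swap for the project's real schema.

```python
from collections import Counter

# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"age": int, "income": float, "label": int}


def validate_rows(rows, schema):
    """Return (issues, stats); issues is a list of (row_index, message)."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            issues.append((i, f"missing columns: {sorted(missing)}"))
            continue
        for col, expected in schema.items():
            value = row[col]
            if value is None:
                issues.append((i, f"null in {col}"))
            elif not isinstance(value, expected):
                issues.append(
                    (i, f"{col} expected {expected.__name__}, got {type(value).__name__}")
                )
        key = tuple(row[col] for col in schema)
        if key in seen:
            issues.append((i, "duplicate row"))
        seen.add(key)
    stats = {
        "row_count": len(rows),
        "label_distribution": dict(Counter(r.get("label") for r in rows)),
    }
    return issues, stats
```

Returning the issues instead of raising on the first one lets the caller log the full picture and decide whether to drop rows, impute, or stop.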

Step 4: Feature Engineering

Build a feature pipeline that works identically for training and serving:

  • Extract features in a reusable function/class
  • Document each feature (what it is, why it matters)
  • Watch for training/serving skew: features computed differently at serving time degrade predictions without raising any error
  • Version the feature pipeline alongside the model
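
One way to keep training and serving on the same code path is a small pipeline object that is fitted once, saved next to the model, and reloaded by the server. The sketch below assumes numeric columns and simple standardization; `FeaturePipeline`, `FEATURE_VERSION`, and the method names are illustrative, not an existing API.

```python
from statistics import mean, pstdev

FEATURE_VERSION = "v1"  # hypothetical tag; bump whenever feature logic changes


class FeaturePipeline:
    """Standardizes numeric columns with statistics learned at training time."""

    def __init__(self, numeric_cols, params=None):
        self.numeric_cols = list(numeric_cols)
        self.params = params or {}

    def fit(self, rows):
        for col in self.numeric_cols:
            values = [r[col] for r in rows]
            # Fall back to 1.0 so constant columns do not divide by zero.
            self.params[col] = {"mean": mean(values), "std": pstdev(values) or 1.0}
        return self

    def transform(self, row):
        """Used unchanged by both the training script and the serving endpoint."""
        return [
            (row[c] - self.params[c]["mean"]) / self.params[c]["std"]
            for c in self.numeric_cols
        ]

    def to_dict(self):
        return {"version": FEATURE_VERSION, "numeric_cols": self.numeric_cols,
                "params": self.params}

    @classmethod
    def from_dict(cls, data):
        return cls(data["numeric_cols"], data["params"])
```

Saving `to_dict()` as JSON alongside the model artifact means the server never recomputes means or standard deviations from live traffic.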

Step 5: Training Script

Implement the training script with:

  • Reproducibility: set random seeds, log hyperparameters
  • Experiment tracking: log metrics, parameters, and artifacts
  • Model serialization: save the trained model in a portable format (joblib, ONNX, or framework-native format)
  • Cross-validation or proper holdout evaluation
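
The requirements above can be sketched as a small training harness. Everything here is an assumption made for illustration: `fit_model` is a placeholder learner, `pickle` and a JSON file stand in for joblib/ONNX and a real experiment tracker, and the `run_training` signature is hypothetical.

```python
import json
import pickle
import random
from pathlib import Path


def fit_model(train_rows, params):
    """Placeholder learner (hypothetical): predicts the mean training target."""
    targets = [y for _, y in train_rows]
    return {"kind": "mean_predictor", "value": sum(targets) / len(targets)}


def run_training(rows, params, out_dir):
    """rows: list of (features, target); params needs 'seed' and 'holdout_fraction'."""
    rng = random.Random(params["seed"])  # seeded RNG makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * params["holdout_fraction"]))
    holdout_rows, train_rows = shuffled[:n_holdout], shuffled[n_holdout:]

    model = fit_model(train_rows, params)
    mse = sum((y - model["value"]) ** 2 for _, y in holdout_rows) / len(holdout_rows)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.pkl").write_bytes(pickle.dumps(model))  # serialized artifact
    record = {
        "params": params,  # hyperparameters, including the seed
        "metrics": {"holdout_mse": mse},
        "n_train": len(train_rows),
        "n_holdout": len(holdout_rows),
    }
    (out / "run.json").write_text(json.dumps(record, indent=2))
    return record
```

Running it twice with the same `params` should yield an identical record, which is a cheap reproducibility check to add to the test suite.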

Step 6: Evaluation

Evaluate against the success metric from Step 1:

  • Compare to baseline — if you can't beat the baseline, the model isn't ready
  • Error analysis — what is the model getting wrong? Look at the worst predictions
  • Compute additional diagnostics as a sanity check (confusion matrix, calibration curve, feature importance)
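
For a binary classifier, the baseline comparison, confusion matrix, and worst-prediction lookup might look like the sketch below. It assumes 0/1 labels and predicted probabilities held in lists; `evaluate_binary` and its parameters are illustrative names.

```python
from collections import Counter


def evaluate_binary(y_true, y_prob, baseline_accuracy, threshold=0.5, top_k=5):
    """Summarize accuracy vs. baseline, confusion counts, and the largest errors."""
    y_pred = [int(p >= threshold) for p in y_prob]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Keys are (true_label, predicted_label) pairs.
    confusion = dict(Counter(zip(y_true, y_pred)))
    # Indices ordered by how far the predicted probability is from the label.
    by_error = sorted(
        range(len(y_true)),
        key=lambda i: abs(y_true[i] - y_prob[i]),
        reverse=True,
    )
    return {
        "accuracy": accuracy,
        "beats_baseline": accuracy > baseline_accuracy,
        "confusion": confusion,
        "worst_indices": by_error[:top_k],
    }
```

The `worst_indices` list is the starting point for error analysis: pull those rows from the dataset and read them before tuning anything.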

Step 7: Serving Endpoint

Set up a serving endpoint:

  • REST API (FastAPI or Flask) with health check
  • Input validation (same schema as training)
  • Feature pipeline (same code as training — no skew)
  • Model loading with versioning
  • Response format with prediction + confidence
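
The handler logic behind such an endpoint can be kept framework-agnostic and then wired into a FastAPI or Flask route. The sketch below is an assumption-laden illustration: `INPUT_SCHEMA`, `build_features`, the dict-based linear model, and the 422 status for invalid input are all hypothetical choices, and `build_features` stands in for the shared function in features.py.

```python
import math

# Hypothetical input schema; it should mirror the training schema.
INPUT_SCHEMA = {"age": (int, float), "income": (int, float)}


def validate(payload):
    """Return a list of validation error messages (empty when valid)."""
    errors = []
    for name, types in INPUT_SCHEMA.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif isinstance(payload[name], bool) or not isinstance(payload[name], types):
            errors.append(f"bad type for {name}")
    return errors


def build_features(payload):
    """Stand-in for the shared feature pipeline used during training."""
    return {name: float(payload[name]) for name in INPUT_SCHEMA}


def handle_health(model):
    return 200, {"status": "ok", "model_version": model["version"]}


def handle_predict(payload, model):
    errors = validate(payload)
    if errors:
        return 422, {"errors": errors}
    features = build_features(payload)
    z = model["bias"] + sum(model["weights"][k] * v for k, v in features.items())
    z = max(-50.0, min(50.0, z))  # clamp so exp() cannot overflow
    p = 1.0 / (1.0 + math.exp(-z))
    return 200, {
        "prediction": int(p >= 0.5),
        "confidence": max(p, 1.0 - p),
        "model_version": model["version"],
    }
```

Keeping the handlers as plain functions makes them testable without starting a server, and the web framework layer shrinks to request parsing and status codes.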

Step 8: Instrument and Monitor

Add logging for production:

  • Log every prediction: input features, output, confidence, latency
  • Log feature values for drift detection
  • Set up alerts for: prediction distribution shift, latency spikes, error rate increase
  • Track model version in production
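
A structured prediction log and a simple drift score could be sketched as follows. The population stability index (PSI) is used here as one possible drift measure, chosen for the example rather than prescribed by this skill; the field names, bin edges, and `eps` floor are assumptions.

```python
import bisect
import json
import math
import time


def prediction_log_record(features, prediction, confidence, latency_ms, model_version):
    """One JSON line per prediction, ready to ship to a log pipeline."""
    return json.dumps({
        "ts": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "confidence": confidence,
        "latency_ms": latency_ms,
    })


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples of one feature.

    edges: sorted bin boundaries; values fall into len(edges) + 1 bins.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Comparing a training-time sample of each feature against a recent window of logged values gives one number per feature to alert on; the alert threshold is a per-project judgment call.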

Present a summary:


## ML Pipeline Built

**Model:** [type] | **Metric:** [value] vs [baseline]
**Serving:** [endpoint] | **Features:** [count]

### Files Created
- data_validation.py — input validation
- features.py — feature pipeline
- train.py — training script
- evaluate.py — evaluation
- serve.py — serving endpoint

### Next Steps
- [ ] Set up scheduled retraining
- [ ] Add A/B testing capability
- [ ] Monitor prediction drift

Delivery

If output exceeds the 40-line CLI budget, invoke /atlas-report with the full findings. The HTML report is the output. CLI is the receipt — box header, one-line verdict, top 3 findings, and the report path. Never dump analysis to CLI.
