together-hello-world

Run inference with Together AI -- chat completions, streaming, and model


Allowed Tools

Read, Write, Edit, Bash(pip:*), Bash(python3:*)

Provided by Plugin

together-pack

Claude Code skill pack for Together AI (18 skills)

saas packs v1.0.0

Installation

This skill is included in the together-pack plugin:

/plugin install together-pack@claude-code-plugins-plus


Instructions

Together AI Hello World

Overview

Run chat completions with open-source models via Together AI's OpenAI-compatible API. Supports Llama, Mixtral, Qwen, and 100+ models. Key endpoints: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/images/generations.
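Because the API is OpenAI-compatible, any HTTP client can talk to these endpoints directly. A minimal sketch of the raw request shape for `/v1/chat/completions` (the model name and prompt here are illustrative; the actual POST is shown as a comment):

```python
import json
import os

# OpenAI-compatible chat request body for Together's /v1/chat/completions
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
headers = {
    "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)
print(body)
# Send with any HTTP client, e.g.:
# requests.post("https://api.together.xyz/v1/chat/completions",
#               data=body, headers=headers)
```

The same base URL and bearer-token header work for the other endpoints listed above; only the path and payload fields change.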

Instructions

Step 1: Chat Completions


from together import Together

# The client reads TOGETHER_API_KEY from the environment by default
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"},
    ],
    max_tokens=500,
    temperature=0.7,
    top_p=0.9,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

Step 2: Streaming


stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
    max_tokens=200,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Step 3: Image Generation


response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A sunset over mountains, digital art style",
    width=1024, height=768,
    n=1,
)
print(f"Image URL: {response.data[0].url}")

Step 4: Embeddings


response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=["Hello world", "Together AI is great"],
)
print(f"Embedding dim: {len(response.data[0].embedding)}")
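A common next step with the two embeddings above is comparing them. A small cosine-similarity helper, demonstrated here on toy vectors rather than real API output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the real response above:
# cosine_similarity(response.data[0].embedding, response.data[1].embedding)
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```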

Step 5: Node.js with OpenAI Client


import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

const chat = await together.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(chat.choices[0].message.content);

Output


def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Tokens: 28 in, 45 out

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| Model not found | Wrong model ID | Check docs.together.ai/docs/inference-models |
| Empty response | `max_tokens` too low | Increase `max_tokens` |
| 429 rate limit | Too many requests | Implement backoff |
| Slow response | Large model | Try a Turbo variant or a smaller model |
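For 429s, a minimal exponential-backoff wrapper. The retry parameters are illustrative, and the caught exception is generic (the Together SDK raises its own exception types in practice):

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: re-raise the last error
            # delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Usage (hypothetical call):
# with_backoff(lambda: client.chat.completions.create(model=..., messages=...))
```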

Next Steps

Proceed to together-local-dev-loop for development workflow.
