LLM models

Not Diamond supports the following LLMs with the specified names:

| Provider   | Model name                                    | Function calling | Structured outputs | Alias                         |
|------------|-----------------------------------------------|------------------|--------------------|-------------------------------|
| OpenAI     | openai/gpt-4o-2024-08-06                      | ✔️               | ✔️                 | openai/gpt-4o                 |
| OpenAI     | openai/gpt-4o-2024-05-13                      | ✔️               | ✔️                 |                               |
| OpenAI     | openai/gpt-4-turbo-2024-04-09                 | ✔️               | ✔️                 | openai/gpt-4-turbo            |
| OpenAI     | openai/gpt-4-0125-preview                     | ✔️               | ✔️                 | openai/gpt-4-turbo-2024-04-09 |
| OpenAI     | openai/gpt-4-1106-preview                     | ✔️               | ✔️                 |                               |
| OpenAI     | openai/gpt-4-0613                             | ✔️               | ✔️                 | openai/gpt-4                  |
| OpenAI     | openai/gpt-3.5-turbo-0125                     | ✔️               | ✔️                 | openai/gpt-3.5-turbo          |
| OpenAI     | openai/gpt-4o-mini-2024-07-18                 | ✔️               | ✔️                 | openai/gpt-4o-mini            |
| Anthropic  | anthropic/claude-3-5-sonnet-20240620          | ✔️               |                    |                               |
| Anthropic  | anthropic/claude-3-opus-20240229              | ✔️               | ✔️                 |                               |
| Anthropic  | anthropic/claude-3-sonnet-20240229            | ✔️               | ✔️                 |                               |
| Anthropic  | anthropic/claude-3-haiku-20240307             | ✔️               |                    |                               |
| Anthropic  | anthropic/claude-2.1                          |                  |                    |                               |
| Google     | google/gemini-1.5-pro-latest                  | ✔️               | ✔️                 |                               |
| Google     | google/gemini-1.5-flash-latest                | ✔️               | ✔️                 |                               |
| Google     | google/gemini-1.0-pro-latest                  | ✔️               | ✔️                 | google/gemini-pro             |
| Mistral    | mistral/open-mixtral-8x22b                    | ✔️               |                    |                               |
| Mistral    | mistral/codestral-latest                      |                  |                    |                               |
| Mistral    | mistral/open-mixtral-8x7b                     | ✔️               |                    |                               |
| Mistral    | mistral/mistral-large-2407                    | ✔️               | ✔️                 | mistral/mistral-large-latest  |
| Mistral    | mistral/mistral-large-2402                    | ✔️               | ✔️                 |                               |
| Mistral    | mistral/mistral-medium-latest                 | ✔️               |                    |                               |
| Mistral    | mistral/mistral-small-latest                  | ✔️               | ✔️                 |                               |
| Mistral    | mistral/open-mistral-7b                       | ✔️               |                    |                               |
| Replicate  | replicate/meta-llama-3-70b-instruct           |                  |                    |                               |
| Replicate  | replicate/meta-llama-3-8b-instruct            |                  |                    |                               |
| Replicate  | replicate/mixtral-8x7b-instruct-v0.1          |                  |                    |                               |
| Replicate  | replicate/mistral-7b-instruct-v0.2            |                  |                    |                               |
| Replicate  | replicate/meta-llama-3.1-405b-instruct        |                  |                    |                               |
| TogetherAI | togetherai/Llama-3-70b-chat-hf                |                  |                    |                               |
| TogetherAI | togetherai/Llama-3-8b-chat-hf                 |                  |                    |                               |
| TogetherAI | togetherai/Meta-Llama-3.1-8B-Instruct-Turbo   |                  |                    |                               |
| TogetherAI | togetherai/Meta-Llama-3.1-70B-Instruct-Turbo  |                  |                    |                               |
| TogetherAI | togetherai/Meta-Llama-3.1-405B-Instruct-Turbo |                  |                    |                               |
| TogetherAI | togetherai/Qwen2-72B-Instruct                 |                  |                    |                               |
| TogetherAI | togetherai/Mixtral-8x22B-Instruct-v0.1        |                  |                    |                               |
| TogetherAI | togetherai/Mixtral-8x7B-Instruct-v0.1         |                  |                    |                               |
| TogetherAI | togetherai/Mistral-7B-Instruct-v0.2           |                  |                    |                               |
| Perplexity | perplexity/llama-3.1-sonar-large-128k-online  |                  |                    |                               |
| Cohere     | cohere/command-r-plus                         | ✔️               | ✔️                 |                               |
| Cohere     | cohere/command-r                              | ✔️               | ✔️                 |                               |

We are continuously expanding our list of supported models. Send us a note if you have a specific model requirement and we will onboard it for you.
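
As a quick illustration of how these names are used, the sketch below routes a single prompt between two of the models from the table above. It assumes your NOTDIAMOND_API_KEY and provider API keys are set as environment variables, and it mirrors the create() call shown in the next section.

from notdiamond import NotDiamond

client = NotDiamond()  # reads NOTDIAMOND_API_KEY and provider keys from the environment

# Route between supported models by passing their "provider/model" names
result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=["openai/gpt-4o", "anthropic/claude-3-5-sonnet-20240620"],
)

print("LLM called: ", provider.model)   # which model the router selected
print("LLM output: ", result.content)   # the selected model's response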

Defining additional configurations

If you'd like more control over each LLM you're routing between, you can use the LLMConfig class. This is especially useful when you want to set API keys explicitly or define additional LLM parameters such as temperature. You can also define custom cost and latency attributes to inform the router's cost and latency tradeoffs:

from notdiamond.llms.config import LLMConfig
from notdiamond import NotDiamond

client = NotDiamond()

llms = [
    LLMConfig(
        provider="openai",
        model="gpt-3.5-turbo",
        api_key="YOUR_OPENAI_API_KEY",
        temperature=0.5,
        max_tokens=256,
        input_price=1,  # USD cost per million input tokens
        output_price=0.5,  # USD cost per million output tokens
        latency=0.86,  # Time to first token, in seconds
    ),
    LLMConfig(
        provider="anthropic",
        model="claude-3-opus-20240229",
        api_key="YOUR_ANTHROPIC_API_KEY",
        temperature=0.8,
        max_tokens=256,
        input_price=3,  # USD cost per million input tokens
        output_price=2,  # USD cost per million output tokens
        latency=1.24,  # Time to first token, in seconds
    ),
]

result, session_id, provider = client.chat.completions.create(
    messages=[ 
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}  # Adjust as desired
    ],
    model=llms,
)

print("Not Diamond session ID: ", session_id)
print("LLM called: ", provider.model)
print("LLM output: ", result.content)

You can also configure the URL endpoint for all client requests, if necessary:

from notdiamond import NotDiamond

client = NotDiamond(nd_api_url="https://my-api-endpoint.org")

Custom models

You can route to your own custom models—whether a fine-tuned model, an agentic workflow, or any other custom inference endpoint—by training your own custom router and including your custom model in the evaluation dataset.
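
As a rough sketch, a trained custom router is referenced by a preference ID when you route. The example below uses placeholder values throughout and assumes the create() call accepts a preference_id parameter; the comment marking where a custom model identifier would go is hypothetical.

from notdiamond import NotDiamond

client = NotDiamond()

result, session_id, provider = client.chat.completions.create(
    messages=[{"role": "user", "content": "Concisely explain merge sort."}],
    model=[
        "openai/gpt-4o",                         # supported models from the table above
        "anthropic/claude-3-5-sonnet-20240620",
        # a custom model would also be listed here, using the identifier
        # defined when training the custom router (hypothetical)
    ],
    preference_id="YOUR_CUSTOM_ROUTER_PREFERENCE_ID",  # placeholder; assumes a preference_id parameter
)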