The Art of Getting Only JSON Back from LLM APIs
Using start and stop sequences
When working with Large Language Models (LLMs) through APIs, one common requirement is to receive responses in a specific format, such as JSON, which makes the data easy to parse and use in applications.
However, the extra text that LLMs tend to wrap around their answers makes this harder than it sounds, and LLM providers have been working on the problem for a while now. ref ( Getting JSON Object Response Back In OpenAI GPT4o ).
But there is another way, at least with Anthropic's models, and that is what we will explore here.
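For context, a common workaround when you can't control the output is to fish the JSON out of the fenced response with a regular expression. A minimal sketch (the response string here is made up for illustration):

```python
import json
import re


def extract_json(response_text):
    """Pull the first fenced JSON block out of a model response.

    Falls back to the raw text if no fence is found.
    """
    match = re.search(r"```(?:json)?\s*(.*?)```", response_text, re.DOTALL)
    payload = match.group(1) if match else response_text
    return json.loads(payload)


# A typical response, complete with the extra text LLMs like to add:
raw = 'Here is your data:\n```json\n{"name": "John Doe", "age": 35}\n```\nLet me know if you need more!'
print(extract_json(raw))  # {'name': 'John Doe', 'age': 35}
```

This works, but it is brittle: it depends on the model actually using fences, and on there being only one block. Start and stop sequences remove the guesswork entirely.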
Understanding Start and Stop Sequences
Start and stop sequences are strings you can use to control the output of an LLM. A start sequence tells the model how its response should begin (with Anthropic's API, you do this by pre-filling the assistant's turn), and a stop sequence tells the API where generation should end; the stop sequence itself is not included in the returned text. This is particularly useful for ensuring the output is formatted correctly, such as valid JSON.
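The effect of a stop sequence can be illustrated client-side with plain Python. This is only a rough sketch of the idea; the API applies stop sequences during generation, so the tokens after the stop are never produced at all:

```python
def apply_stop_sequences(text, stop_sequences):
    """Truncate text at the first occurrence of any stop sequence.

    Mimics (client-side) what the API does server-side: the stop
    sequence and everything after it are dropped from the result.
    """
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            text = text[:index]
    return text


# A made-up completion: JSON, then a closing fence, then chatter.
generated = '{"name": "John Doe"}\n```\nThis JSON includes a name field.'
result = apply_stop_sequences(generated, ["```"])
print(repr(result))  # '{"name": "John Doe"}\n'
```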
Example: Using Claude 3.5 Haiku
In the following code I am not using any start or stop sequences; below the code you can see the output, complete with extra text we don't need.
```python
import os

import anthropic
from dotenv import load_dotenv

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))

messages = []


def add_user_message(message):
    messages.append({"role": "user", "content": message})


def add_system_message(message):
    # Despite the name, this appends an *assistant* turn,
    # pre-filling the start of Claude's response.
    messages.append({"role": "assistant", "content": message})


def chat(stop_sequences=[]):
    message = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1000,
        temperature=1,
        system="You are a helpful assistant",
        messages=messages,
        stop_sequences=stop_sequences,
    )
    print(message.content[0].text)


add_user_message("return a json file with test user and their name, age")
# no start sequence this time (the API rejects an empty assistant
# message, so we simply don't add one)
chat()  # not sending any stop sequences
```
Output Without Start and Stop Sequences
```json
{
"users": [
{
"id": 1,
"name": "John Doe",
"age": 35,
"email": "john.doe@example.com"
},
{
"id": 2,
"name": "Jane Smith",
"age": 28,
"email": "jane.smith@example.com"
},
{
"id": 3,
"name": "Mike Johnson",
"age": 42,
"email": "mike.johnson@example.com"
},
{
"id": 4,
"name": "Emily Brown",
"age": 25,
"email": "emily.brown@example.com"
},
{
"id": 5,
"name": "David Wilson",
"age": 50,
"email": "david.wilson@example.com"
}
]
}
```
This JSON file includes:
- An array of users
- Each user has an ID, name, age, and email
- Varied ages and names for testing purposes
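This is exactly why the extra text is a problem: you can't hand the response straight to `json.loads` until the fences and commentary are stripped. A quick illustration (the response string here is an abbreviated stand-in for the output above):

```python
import json

# Abbreviated version of the raw response: fences plus trailing chatter.
raw_response = '```json\n{"users": [{"id": 1, "name": "John Doe"}]}\n```\nThis JSON file includes an array of users.'

try:
    json.loads(raw_response)
except json.JSONDecodeError as err:
    print(f"Cannot parse as-is: {err}")
```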
Using Start and Stop Sequences
Now, let's modify the code so that we get only the JSON we want. Notice that I give the assistant a head start by pre-filling its response with a start sequence in `add_system_message`, and I pass a stop sequence to the `chat` function.
```python
import os

import anthropic
from dotenv import load_dotenv

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"))

messages = []


def add_user_message(message):
    messages.append({"role": "user", "content": message})


def add_system_message(message):
    # Despite the name, this appends an *assistant* turn,
    # pre-filling the start of Claude's response.
    messages.append({"role": "assistant", "content": message})


def chat(stop_sequences=[]):
    message = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1000,
        temperature=1,
        system="You are a helpful assistant",
        messages=messages,
        stop_sequences=stop_sequences,
    )
    print(message.content[0].text)


add_user_message("write a json file with test user and their name, age")
add_system_message("```json")  # the assistant responds as if it had already written ```json
chat(["```"])  # generation stops as soon as the model emits ```
```
Output with Start and Stop Sequences
```json
{
  "users": [
    {
      "id": 1,
      "name": "John Doe",
      "age": 35,
      "email": "john.doe@example.com"
    },
    {
      "id": 2,
      "name": "Jane Smith",
      "age": 28,
      "email": "jane.smith@example.com"
    },
    {
      "id": 3,
      "name": "Mike Johnson",
      "age": 42,
      "email": "mike.johnson@example.com"
    },
    {
      "id": 4,
      "name": "Emily Brown",
      "age": 25,
      "email": "emily.brown@example.com"
    },
    {
      "id": 5,
      "name": "David Wilson",
      "age": 50,
      "email": "david.wilson@example.com"
    }
  ]
}
```
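With the fences and commentary gone, the response can go straight into `json.loads`. For example, with an abbreviated version of the output above:

```python
import json

clean_response = '''{
  "users": [
    {"id": 1, "name": "John Doe", "age": 35, "email": "john.doe@example.com"},
    {"id": 2, "name": "Jane Smith", "age": 28, "email": "jane.smith@example.com"}
  ]
}'''

data = json.loads(clean_response)
print(data["users"][0]["name"])  # John Doe
```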
And that's it! By using start and stop sequences, you can effectively control the output of LLMs to ensure you receive only the JSON data you need, without any additional text or formatting.
You can find the complete code on GitHub.