Fixing "Access to model meta-llama/Meta-Llama-3.1-8B-Instruct is restricted"

How To Fix: "Access to model meta-llama/Meta-Llama-3.1-8B-Instruct is restricted. You must be authenticated to access it."

In this post I will cover everything that went wrong while running the bare-minimum sample code from the model page, and how to fix it. Meta-Llama-3.1-8B-Instruct is a gated model, so it is not accessible directly via the Hugging Face model hub. If you try to access the model without approval, this is the first error you will see:

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url:
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/resolve/main/config.json
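For reference, here is a minimal sketch that reproduces that 401 using huggingface_hub directly; hf_hub_download fetches a single file the same way transformers does under the hood, and the Hub rejects the request when it is unauthenticated (assuming you have not logged in previously on that machine):

from huggingface_hub import hf_hub_download

# No token is passed, so the Hub rejects this request
# for a gated repo with a 401 error.
hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    filename="config.json",
)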

Fill out the form on the model page, and once you are approved (you will get an email) you should be able to access it. But at this point you might forget that you also need a Hugging Face access token to verify that it is in fact you who is trying to access the model.

Well, that is what I forgot once I was approved, and I kept getting the error in the title; it took me a minute to remember that I still needed to authenticate to access the model. Consider this a reminder to myself and to others who might forget the same thing.

Let's Fix It

The following are the steps to create your own access token, assuming you already have an account on Hugging Face:

  • Go to your profile on Hugging Face and click on the Settings tab. You will see a section called Access Tokens.
  • Click on the Create New Token button. It will ask you to enter a name for the token.
  • There are a lot of permissions you can grant to the token, but "Read access to contents of all public gated repos you can access" is enough for now.
  • Once you click on the Create button, you will see a token generated.
  • Copy the token and use it in the code to access the model; a quick sanity check and the full sample follow below.
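Before running the full sample, you can sanity-check the token with huggingface_hub's whoami helper. This is a minimal sketch; the token value is a placeholder you replace with your own:

from huggingface_hub import whoami

# Placeholder token; paste the token you just created.
token = "hf_my_token"

# whoami() returns your account info when the token is valid
# and raises an HTTP error when it is not.
print(whoami(token=token)["name"])

If that prints your username, the token works, and the sample from the model page runs as-is: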
import transformers
import torch

# Replace this placeholder with the access token you just created.
token = "hf_my_token"

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Passing token= lets the pipeline download the gated model files.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    token=token,
)

messages = [
    {
        "role": "system",
        "content": "You are a pirate chatbot who always responds in pirate speak!",
    },
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The last entry in generated_text is the assistant's reply.
print(outputs[0]["generated_text"][-1])
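Hardcoding the token works, but you can also log in once and drop the token argument entirely. A minimal sketch, assuming you keep the token in an HF_TOKEN environment variable (a convention huggingface_hub also picks up automatically):

import os
from huggingface_hub import login

# Stores the token locally so that later calls from transformers and
# huggingface_hub authenticate without an explicit token= argument.
login(token=os.environ["HF_TOKEN"])

After that, the token=token line in the pipeline above is no longer needed; running huggingface-cli login in a terminal achieves the same thing.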

Side Note

You might get the following error while running the above code:

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

A simple pip install accelerate should fix this issue.

And that is the long story of running a simple code sample from a model page.