Llama2 - Huggingface Tutorial
Huggingface is an open-source platform for deploying machine learning models.
Call Llama2 with Huggingface Inference Endpoints
LiteLLM makes it easy to call your public, private or the default huggingface endpoints.
In this tutorial, we'll call 3 models:
| Model | Type of Endpoint | 
|---|---|
| deepset/deberta-v3-large-squad2 | Default Huggingface Endpoint | 
| meta-llama/Llama-2-7b-hf | Public Endpoint | 
| meta-llama/Llama-2-7b-chat-hf | Private Endpoint | 
Case 1: Call default huggingface endpoint
Here's the complete example:
from litellm import completion 
model = "deepset/deberta-v3-large-squad2"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface")
What's happening?
- model: This is the name of the deployed model on huggingface
- messages: This is the input. We accept the OpenAI chat format. For huggingface, by default we iterate through the list and add each message["content"] to the prompt.
- custom_llm_provider: Optional param, needed only for Azure, Replicate, Huggingface and Together-ai (platforms where you deploy your own models). It lets LiteLLM route the request to the right provider for your model.
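To make the messages handling concrete, here is a minimal sketch of flattening an OpenAI-style message list into a single prompt string. The helper name `build_prompt` is hypothetical, and joining the contents with no separator is an assumption; LiteLLM's own implementation may differ.

```python
def build_prompt(messages):
    """Concatenate the "content" of each OpenAI-style message into one prompt.

    Hypothetical helper: it only illustrates the idea described above and is
    not LiteLLM's own function.
    """
    prompt = ""
    for message in messages:
        # Roles are ignored in this sketch; only the text content is kept.
        prompt += message["content"]
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant. "},
    {"role": "user", "content": "Hey, how's it going?"},
]
print(build_prompt(messages))
```

Because roles are dropped in this sketch, a base model such as Llama-2-7b-hf would see one block of text rather than a structured conversation.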
Case 2: Call Llama2 public Huggingface endpoint
We've deployed meta-llama/Llama-2-7b-hf behind a public endpoint - https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud.
Let's try it out:
from litellm import completion 
model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"
### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
What's happening?
- api_base: Optional param. Since this uses a deployed endpoint (not the default huggingface inference endpoint), we pass its URL to LiteLLM.
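The effect of api_base can be sketched as a simple fallback: use the deployed endpoint's URL when one is given, otherwise build a URL on the default inference endpoint from the model name. The helper `resolve_endpoint` and the constant `DEFAULT_HF_BASE` are assumptions for illustration, not LiteLLM's actual routing code.

```python
# Assumed base URL of the default Hugging Face inference endpoint.
DEFAULT_HF_BASE = "https://api-inference.huggingface.co/models"


def resolve_endpoint(model, api_base=None):
    """Return the URL a request would be sent to (hypothetical helper)."""
    if api_base:
        # A deployed endpoint already serves one model, so use its URL as-is.
        return api_base.rstrip("/")
    # No api_base given: fall back to the default endpoint for this model.
    return f"{DEFAULT_HF_BASE}/{model}"


print(resolve_endpoint("deepset/deberta-v3-large-squad2"))
print(
    resolve_endpoint(
        "meta-llama/Llama-2-7b-hf",
        api_base="https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud",
    )
)
```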
Case 3: Call Llama2 private Huggingface endpoint
The only difference from the public endpoint is that you need an api_key.
LiteLLM gives you 3 ways to pass in an api_key: via environment variables, by setting it as a package variable, or when calling completion().
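As a rough mental model, the three options can be pictured as a lookup that returns the first key it finds. The helper `resolve_api_key` is hypothetical, and the precedence shown (call argument, then package variable, then the HF_TOKEN environment variable) is an assumption for illustration; this tutorial does not state the order LiteLLM actually uses.

```python
import os


def resolve_api_key(api_key=None, package_key=None, env=None):
    """Pick the first key that is set (hypothetical helper).

    Assumed precedence: explicit argument, then package variable, then the
    HF_TOKEN environment variable.
    """
    env = os.environ if env is None else env
    return api_key or package_key or env.get("HF_TOKEN")


# The explicit argument wins over the package-level value in this sketch.
print(resolve_api_key(api_key="from-call", package_key="from-package"))
```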
Setting via environment variables
Here's the 1 line of code you need to add 
os.environ["HF_TOKEN"] = "..."
Here's the full code:
import os
from litellm import completion
os.environ["HF_TOKEN"] = "..."
model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"
### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
Setting it as package variable
Here's the 1 line of code you need to add 
litellm.huggingface_key = "..."
Here's the full code:
import litellm
from litellm import completion 
litellm.huggingface_key = "..."
model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"
### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base)
Passing it in during the completion call
completion(..., api_key="...")
Here's the full code:
from litellm import completion 
model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format 
api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"
### CALLING ENDPOINT
completion(model=model, messages=messages, custom_llm_provider="huggingface", api_base=api_base, api_key="...")