OpenAI compatible API
The GenAI4Science portal has built-in compatibility with the OpenAI Chat Completions API, making it possible to use a wide range of existing tooling.
API URLs
GenAI4Science provides two OpenAI-compatible API endpoints.
Direct access endpoint
This endpoint will be disabled soon. Please use the managed endpoint instead.
Managed endpoint
Since the OpenAI API does not offer a way to set the context length, this endpoint provides an extended context window of 8192 tokens for all models, with the exception of the mistral-large:123b model, which supports a smaller context window of 4096 tokens.
The managed OpenAI API base URL is:
https://genai.science-cloud.hu/api/
Performance endpoint
Available soon
The GenAI4Science portal is currently optimized to let users try out multiple models. Since not every model fits in the memory of the GPU cards at the same time, there may be a slight delay when switching between models. Under intensive API calls, model switching noticeably slows down processing.
This endpoint will be separated from the user interface and will provide models with dedicated GPU memory.
Please contact us if you would like to make bulk API calls for an extended period of time or use the API for a production service.
Authentication
To ensure secure access to the API, authentication is required. You can authenticate your API requests using the Bearer Token mechanism. Obtain your API key from Settings > Account in the Portal.
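As a sketch, the Bearer token header can be built like this and sent with any HTTP client (the key shown is a placeholder; copy your real key from Settings > Account):

```python
API_KEY = "sk-..."  # placeholder; use your key from Settings > Account

def auth_headers(api_key: str) -> dict:
    """Build the headers required to authenticate portal API requests."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

print(auth_headers(API_KEY))
```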
Limitations
Each backend instance can handle a limited number of concurrent requests per model (usually 4). When this limit is reached, the request waits in the backend queue. When the queue is also full, the following error message is returned:
Ollama: 503, message='Service Unavailable'
It is advisable to retry after a short wait, or to choose a smaller model.
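A minimal retry sketch for handling the queue-full error (the helper name and backoff strategy are illustrative, not part of the portal API):

```python
import time

def call_with_retry(call, attempts=3, delay=2.0):
    """Retry a callable when the backend queue is full (HTTP 503)."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            # Retry only on the 503 "Service Unavailable" error shown above
            if "503" in str(exc) and attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))  # simple linear backoff
            else:
                raise
```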
Usage
To invoke the OpenAI-compatible API endpoint, use the standard OpenAI request format.
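For reference, a Chat Completions request body in that format can be built like this (the model name is taken from the examples below; the helper is illustrative):

```python
import json

def chat_body(model: str, user_msg: str,
              system_msg: str = "You are a helpful assistant.") -> str:
    """Serialize a Chat Completions request body in the OpenAI format."""
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
    })

print(chat_body("llama3.1:8b", "Hello!"))
```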
Developer tools, extensions
Set the OpenAI API base URL and the API key. Unless specifically requested, do not append the "chat/completions" path; set only the base URL.
Continue
Continue is a leading open-source AI code assistant for VS Code and JetBrains. To install it, see its documentation.
Configuration
Edit config.json
Use the following model configurations to take advantage of the larger context window.
"models": [
  {
    "title": "codellama:13b",
    "provider": "openai",
    "model": "codellama:13b",
    "apiBase": "https://genai.science-cloud.hu/api/",
    "apiKey": "sk-...",
    "useLegacyCompletionsEndpoint": false
  },
  {
    "title": "codestral:22b",
    "provider": "openai",
    "model": "codestral:22b",
    "apiBase": "https://genai.science-cloud.hu/api/",
    "apiKey": "sk-...",
    "useLegacyCompletionsEndpoint": false
  }
],
"tabAutocompleteModel": {
  "title": "codestral:22b",
  "provider": "openai",
  "model": "codestral:22b",
  "apiBase": "https://genai.science-cloud.hu/api/",
  "apiKey": "sk-...",
  "useLegacyCompletionsEndpoint": false
},
cURL
Set your API key in the "Authorization" header after "Bearer".
curl https://genai.science-cloud.hu/api/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-9a..." \
-d '{
"model": "llama3.1:8b",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
Python
Set your API key
from openai import OpenAI

client = OpenAI(base_url="https://genai.science-cloud.hu/api/", api_key="sk-9a...")

# List the models available to your account
print(client.models.list())

# Send a chat completion request
response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a LLM?"}
    ]
)
print(response)
More details: https://ollama.com/blog/openai-compatibility