Architecture

Software components

Frontend: Open WebUI

Backend: Ollama

Hardware

Four instances of Ollama were installed to serve the UI.

The smaller models are distributed across three Nvidia Tesla V100 32GB GPUs and kept resident in memory, so there is no loading delay when a request arrives. The default model (llama3.1:8b) runs on multiple instances for better throughput.
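One way to realize this layout is to run each Ollama instance as its own container pinned to a single GPU. A sketch for two of the backends, assuming a docker-compose deployment (service names, ports, and GPU indices are illustrative, not taken from the actual setup):

```yaml
services:
  ollama-small-1:
    image: ollama/ollama
    environment:
      - OLLAMA_KEEP_ALIVE=-1        # keep loaded models resident in VRAM
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]     # first Tesla V100
              capabilities: [gpu]
    ports:
      - "11434:11434"
  ollama-small-2:
    image: ollama/ollama
    environment:
      - OLLAMA_KEEP_ALIVE=-1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]     # second Tesla V100
              capabilities: [gpu]
    ports:
      - "11435:11434"
```

Setting `OLLAMA_KEEP_ALIVE=-1` prevents Ollama from unloading idle models, which is what avoids the load-time wait described above.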

The Ollama instance serving the large models (70B, 90B, and 123B) runs on an Nvidia H100 94GB GPU.
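Open WebUI can load-balance across several Ollama backends via its `OLLAMA_BASE_URLS` environment variable (a semicolon-separated list). A minimal sketch of the frontend service, assuming the hypothetical backend hostnames and ports from a compose network:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Semicolon-separated list of the four Ollama backends;
      # hostnames and ports here are assumptions, not the real topology.
      - OLLAMA_BASE_URLS=http://ollama-small-1:11434;http://ollama-small-2:11434;http://ollama-small-3:11434;http://ollama-large:11434
    ports:
      - "3000:8080"
```

With this in place, requests for a model available on several instances are spread across them, matching the multi-instance setup for the default model.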