Architecture
Software components
Frontend: Open WebUI
Backend: Ollama
Hardware
Four instances of Ollama were installed to serve the UI.
The smaller models are distributed across three Nvidia Tesla V100 32GB GPUs and kept resident in memory, so there is no wait for model loading. The default model (llama3.1:8B) runs on multiple instances for better throughput.
The Ollama instance serving the large models (70B, 90B, 123B) runs on an Nvidia H100 94GB GPU.
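The layout above can be sketched as a set of Ollama instances, each pinned to a GPU via CUDA_VISIBLE_DEVICES and listening on its own port via OLLAMA_HOST. This is a deployment sketch, not the actual configuration: the ports, GPU indices, and instance-to-model mapping are illustrative assumptions.

```shell
# Pin each Ollama instance to one GPU and give it its own port.
# Ports and GPU indices are assumptions for illustration.

# Keep models loaded indefinitely so users never wait for a cold load
export OLLAMA_KEEP_ALIVE=-1

# Instances 1 and 2: two copies of the default model (llama3.1:8B)
# on two of the V100 32GB GPUs, for better throughput
CUDA_VISIBLE_DEVICES=0 OLLAMA_HOST=127.0.0.1:11434 ollama serve &
CUDA_VISIBLE_DEVICES=1 OLLAMA_HOST=127.0.0.1:11435 ollama serve &

# Instance 3: remaining small models on the third V100 32GB
CUDA_VISIBLE_DEVICES=2 OLLAMA_HOST=127.0.0.1:11436 ollama serve &

# Instance 4: large models (70B, 90B, 123B) on the H100 94GB
CUDA_VISIBLE_DEVICES=3 OLLAMA_HOST=127.0.0.1:11437 ollama serve &
```

Open WebUI can then be pointed at all four backends (for example via its `OLLAMA_BASE_URLS` setting, which accepts a semicolon-separated list) and will distribute requests among them.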