Architecture

Software components

Frontend: Open WebUI; API proxy layer: LiteLLM

Backend: Ollama, vLLM
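
Both backends expose OpenAI-compatible endpoints through the LiteLLM proxy, so clients can use a standard OpenAI SDK. The sketch below shows how a request could flow through the stack; the base URL, API key, and model name are placeholders, not the actual GenAI4Science values.

```python
# Minimal sketch: querying the stack through LiteLLM's OpenAI-compatible
# endpoint. LiteLLM routes the request to an Ollama or vLLM backend.
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.example.org/v1",  # hypothetical proxy URL
    api_key="sk-...",                           # per-user key issued by the proxy
)

response = client.chat.completions.create(
    model="llama3.1:70b",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize the HUN-REN Cloud."}],
)
print(response.choices[0].message.content)
```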

Hardware

To support GenAI4Science, multiple instances of Ollama and vLLM have been deployed.

Smaller models are distributed across four Nvidia Tesla V100 32 GB GPUs. Models exceeding 70 billion parameters run on a single Nvidia H100 94 GB GPU or on a cluster of four Nvidia A100 40 GB GPUs.
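
Serving a model of this size on the four-A100 cluster requires sharding its weights across the GPUs via tensor parallelism. The sketch below shows how this could be done with vLLM; the model name and tuning parameters are illustrative assumptions, not the deployment's actual configuration.

```python
# Minimal sketch: serving a >70B-parameter model with vLLM, sharded across
# four GPUs with tensor parallelism. Values here are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=4,        # shard weights across the four A100 40 GB GPUs
    gpu_memory_utilization=0.90,   # assumed headroom setting
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```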

The infrastructure is hosted on the HUN-REN Cloud platform. Specifically, we utilize:

  • The frontend and the V100 and H100 GPUs, hosted at SZTAKI
  • The A100 GPUs, hosted at the Wigner Datacenter