Architecture
Software components
- Frontend: Open WebUI
- API gateway: LiteLLM
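In this kind of setup, LiteLLM typically acts as a proxy that exposes a single OpenAI-compatible endpoint in front of the Ollama and vLLM backends. The following is an illustrative sketch of such a proxy configuration; the model names and host URLs are placeholders, not the actual GenAI4Science endpoints.

```yaml
# Illustrative LiteLLM proxy config (hostnames and model names are placeholders)
model_list:
  - model_name: small-model
    litellm_params:
      model: ollama/llama3            # routed to an Ollama instance
      api_base: http://ollama-host:11434
  - model_name: large-model
    litellm_params:
      model: hosted_vllm/meta-llama/Meta-Llama-3-70B-Instruct
      api_base: http://vllm-host:8000/v1   # routed to a vLLM server
```

Open WebUI can then be pointed at the LiteLLM endpoint as a single OpenAI-compatible API, so users see one model list regardless of which backend serves a given model.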
Hardware
To support GenAI4Science, multiple instances of Ollama and vLLM have been deployed.
Smaller models are distributed across four NVIDIA Tesla V100 32GB GPUs, while models exceeding 70 billion parameters run on a single NVIDIA H100 94GB GPU or on a cluster of four NVIDIA A100 40GB GPUs.
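Serving a 70B-class model on four 40GB GPUs is commonly done with vLLM's tensor parallelism, which splits the model weights across the GPUs. A minimal launch sketch, assuming a recent vLLM release (the model name is a placeholder, not necessarily one deployed here):

```shell
# Illustrative: shard a 70B-class model across four A100 40GB GPUs
# using tensor parallelism (one shard per GPU).
vllm serve meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 4
```

Note that a 70B model in 16-bit precision needs roughly 140 GB for weights alone, so on 4x40GB a quantized variant or reduced context length may be required to leave memory for the KV cache.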
The infrastructure is hosted on the HUN-REN Cloud platform. Specifically, we utilize:
- The frontend and the V100 and H100 GPUs, located at SZTAKI
- The A100 GPUs, located at the Wigner Datacenter