Sergio Rabellino

Università di Torino

https://www.unito.it/

12 November 2021 - SESSION 9: CLOUD INFRASTRUCTURE

Universal GPU support in an OpenStack/Juju cloud: configuration and performance

Sergio Rabellino is the head of the ICT technical staff office of the Department of Computer Science, University of Turin, where he coordinates research support activities. He cooperates with the research groups in Security, Eidomatics, High Performance Computing, Artificial Intelligence and E-learning. He is a Moodle developer and a hardware/software architect of e-learning and HPC platforms. As technical head and architect of the Moodle-based projects Start@Unito, Orient@mente and iLearn at the University, he has authored more than 40 publications on e-learning tools and methods.

ABSTRACT

The presentation outlines the steps needed to configure GPUs on the compute nodes of a cloud system based on OpenStack/Juju, with particular attention to maintaining the solution over time. It draws on two years of experience at the Department of Computer Science of the University of Turin in supporting the successive generations of devices that normally rotate through the datacenter. In addition, it presents performance evaluations of Deep Learning workloads, relating them to how well each of the storage models set up in the HPC4AI service can keep the GPUs' computational capacity fully engaged.
