Hardware
The Data Platform runs on the HPC system “MarghERita”, which comprises 75 physical compute nodes, each equipped with two 24-core Intel Xeon Gold 6336Y processors at 2.4 GHz, 512 GB of RAM and an NVIDIA Tesla T4 GPU card, together with a Data Lake of 230 TB of flash storage and 700 TB of further storage (Dell Isilon PowerScale).
The compute nodes and the Data Lake storage are split equally between the Emilia-Romagna Region Datacenter at Viale Aldo Moro 52, Bologna, and the Lepida Datacenter in Ferrara (inside the cage dedicated to the Emilia-Romagna Region).
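Multiplying out the per-node figures gives the aggregate capacity of the machine. The following is a minimal sketch in plain Python; it assumes one Tesla T4 per node (the description names the card in the singular) and treats the 230 TB and 700 TB Data Lake figures as platform-wide totals that can simply be added.

```python
"""Aggregate MarghERita capacity from the per-node figures in the text."""

NODES = 75
CPUS_PER_NODE = 2        # Intel Xeon Gold 6336Y
CORES_PER_CPU = 24
RAM_GB_PER_NODE = 512
GPUS_PER_NODE = 1        # assumption: one NVIDIA Tesla T4 per node
FLASH_TB = 230           # Data Lake, flash tier
OTHER_STORAGE_TB = 700   # Data Lake, remaining storage


def aggregate_capacity() -> dict:
    """Return platform-wide totals: physical cores, RAM (GB), GPUs, Data Lake (TB)."""
    return {
        "cores": NODES * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "gpus": NODES * GPUS_PER_NODE,
        "storage_tb": FLASH_TB + OTHER_STORAGE_TB,
    }


if __name__ == "__main__":
    print(aggregate_capacity())
    # → {'cores': 3600, 'ram_gb': 38400, 'gpus': 75, 'storage_tb': 930}
```

In other words, the machine offers 3,600 physical cores and about 38.4 TB of RAM in total. Since 75 is odd, an equal split of the nodes between the two datacenters can only be approximate (38 and 37 nodes).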

Software
The Data Platform is built on OpenNebula, which supports a multi-tenant approach where needed, and integrates customized, optimized modules from various open-source projects.
The core compute layer is a Kubernetes cluster managed with Rancher. Both the HDFS file system and access to the Data Lake are managed with Apache Ambari.
The components currently supported by the Data Platform and available for developing Big Data Analytics and Artificial Intelligence projects are:

Compute:

  • Apache Spark
  • Trino SQL engine


Storage:

  • Apache Hive
  • OneFS HDFS


Data Engineering:

  • Apache Kafka
  • Apache NiFi
  • Apache Airflow


Data Governance:

  • Apache Ranger
  • Apache Atlas


Development:

  • JupyterHub


Data Visualization:

  • Apache Superset


Code versioning:

  • GitLab

API manager:

  • WSO2


Access to the development applications listed above is supervised by the MarghERita platform administrators and takes place through a VPN managed by Lepida.

The share of computing resources (CPU, GPU, RAM) and the storage capacity required by the project, as declared when the application is submitted, will be configured and allocated to the requesting entity.

The development applications are deployed in dedicated workspaces. Resources are logically isolated (multi-tenancy), which protects data privacy and prevents conflicts between computational processes.
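Because the compute layer is Kubernetes, one common way to express a per-project allocation of CPU, GPU, RAM and storage is a ResourceQuota on a dedicated namespace. The sketch below builds such a manifest as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). It is hypothetical: the description does not say that MarghERita uses ResourceQuota objects, and the one-namespace-per-project scheme, the `build_resource_quota` helper and the example values are all illustrative. The `nvidia.com/gpu` resource name further assumes the NVIDIA device plugin is installed.

```python
import json


def build_resource_quota(tenant: str, cpu: int, memory_gi: int,
                         gpus: int, storage_gi: int) -> dict:
    """Return a Kubernetes ResourceQuota manifest capping one tenant namespace.

    Hypothetical helper: naming scheme and values are illustrative only.
    """
    if min(cpu, memory_gi, gpus, storage_gi) < 0:
        raise ValueError("resource amounts must be non-negative")
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # assumption: each project gets its own namespace named after it
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "limits.cpu": str(cpu),
                "requests.memory": f"{memory_gi}Gi",
                "limits.memory": f"{memory_gi}Gi",
                # GPUs are an extended resource; quotas on them use the requests. prefix
                "requests.nvidia.com/gpu": str(gpus),
                # total storage requested by all PersistentVolumeClaims in the namespace
                "requests.storage": f"{storage_gi}Gi",
            }
        },
    }


if __name__ == "__main__":
    quota = build_resource_quota("project-alpha", cpu=48, memory_gi=256,
                                 gpus=1, storage_gi=2048)
    print(json.dumps(quota, indent=2))
```

Keeping each project in its own namespace with a hard quota is what makes the isolation described above enforceable at the scheduler level: a workload that would exceed its declared share is rejected rather than competing with other tenants.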

Further information on the platform characteristics can be requested at margherita@regione.emilia-romagna.it