This WP deals with the creation of open, labeled, and shareable datasets. It builds on existing solutions and infrastructures already at the disposal of the RUs.
Task 1.1: Data Collection [M1-M12]
- Mobile app traffic collection [UNINA]
For this task, the UNINA RU will leverage the existing MIRAGE architecture, which captures the traffic generated by handheld devices operated by human users. The system for collecting mobile-app traffic will be refined and enhanced. POLITO will install a second traffic-capture architecture on their premises. The collected data will be used to demonstrate XInternet approaches in the context of mobile-app traffic classification and prediction.
- Smart Honeypot Infrastructure [POLITO]
In this task, the POLITO RU will improve the flexible infrastructure already present on their premises. It consists of four /24 networks that can be seamlessly configured as darknets (no services connected) or as vertical and smart honeypots. It will be packaged as a virtual appliance to facilitate installation on other premises. UNINA will install a second sensor on their premises. These data will be used to demonstrate and test the XInternet approach in the context of cybersecurity.
Task 1.2: Labeling and Sharing [M6-M18]
To foster reproducibility and collaboration with external partners, we will make all the datasets usable according to the FAIR principles. First, we will annotate the datasets with the labels needed to verify the performance of AI algorithms (class labels for traffic and attacks). Second, we will make the datasets available on the project website and link them to other initiatives (e.g., IEEE DataPort and the CAIDA STARDUST project).
DELIVERABLES:
- D1.1 [M12 - POLITO] - Description of the software tools used for the data collection.
- D1.2 [M18 - UNINA] - Description of the open data and benchmarks.