The ARCADIAN-IoT framework includes a dependable and privacy-preserving Federated AI component that distributes the training of machine learning (ML) models across multiple devices in a decentralized manner. Different entities can collaborate on training ML models without pooling their data in a central repository, which avoids the privacy and security concerns that centralization raises. In Federated AI, data remains under the control of the individual entities: models are trained locally, and the resulting models are then aggregated to form a global model.
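The local-training-then-aggregation idea can be illustrated with a minimal sketch of sample-weighted model averaging (in the style of the well-known FedAvg rule). This is an assumption made for illustration only and not the aggregation rule used in ARCADIAN-IoT; the function name `federated_average` and the example values are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-weighted average of client parameter vectors (FedAvg-style)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all models must have the same number of parameters")

    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # clients with more data contribute more
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Three hypothetical clients: only model parameters are exchanged,
# the raw training data never leaves the device.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
samples_per_client = [10, 10, 20]
global_model = federated_average(local_models, samples_per_client)
```

In this toy setup the third client holds half of all samples, so its parameters account for half of the global model.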
In ARCADIAN-IoT, RISE (Research Institutes of Sweden) aims to build a dependable and privacy-preserving Federated Learning (FL) system to be deployed in ML components such as the Cyber Threat Intelligence (CTI) and Behaviour Monitoring components. The goal is to provide source integrity and data integrity, and to handle the statistical and systematic heterogeneity of data in IoT environments, which can otherwise make the training process inaccurate and vulnerable.
The Federated AI component of the ARCADIAN-IoT framework will be developed as an integrated module within the ML components and will include two subcomponents: a data rebalancer and a model resizing and sharing subcomponent. The data rebalancer will rebalance non-Independent and Identically Distributed (non-IID) data, and the model resizing and sharing subcomponent will provide a communication-efficient and robust framework for model aggregation while preserving source integrity.
The main objective of the Federated AI component is to provide a dependable, privacy-preserving classifier based on FL. During the development phase, RISE has focused on defining and developing methods for rebalancing non-IID and imbalanced data. The team studied the statistical heterogeneity issues in Federated AI and IoT networks, analyzed three state-of-the-art data rebalancing methods, and designed and implemented a new candidate solution for data rebalancing. The new technique, based on the K-SMOTE method, generates more complex synthetic data points, some of which are shared with other participating clients.
The data rebalancer can be deployed on edge devices in the IoT network. The entire process consists of three phases: synthetic data generation, model optimization, and data and model sharing. In the synthetic data generation phase, the training data available at a client is rebalanced with an over-sampling technique that generates synthetic data points, which are merged with the genuine data points to obtain a balanced dataset that is fed into the local ML model. In the model optimization phase, the local ML model is trained. In the data and model sharing phase, the processed model is shared with other clients.
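The synthetic data generation phase can be sketched as follows. The sketch uses plain SMOTE-style interpolation between nearby minority-class points as a stand-in; it is not the K-SMOTE-based technique developed by RISE, whose details are not given here. The names `smote_like_oversample` and `rebalance`, the neighbour count `k`, and the toy dataset are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = List[float]


def smote_like_oversample(
    minority: Sequence[Point], n_new: int, k: int = 3, seed: int = 0
) -> List[Point]:
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority points, ordered by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def rebalance(
    data: Sequence[Tuple[Point, int]], seed: int = 0
) -> List[Tuple[Point, int]]:
    """Merge genuine points with synthetic ones so that every class reaches
    the size of the largest class."""
    by_class: Dict[int, List[Point]] = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = list(data)  # genuine points are kept unchanged
    for label, xs in by_class.items():
        missing = target - len(xs)
        if missing > 0:
            new_points = smote_like_oversample(xs, missing, seed=seed)
            balanced += [(p, label) for p in new_points]
    return balanced


# Hypothetical local dataset at one client: four samples of class 0, two of class 1.
local_data = [
    ([0.0, 0.0], 0), ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
    ([1.0, 1.0], 1), ([1.2, 0.8], 1),
]
balanced = rebalance(local_data)
```

The balanced list is what would be fed to the local model in the model optimization phase. Note that this sketch raises an error for a class with a single sample, since interpolation needs at least two points.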
Additionally, RISE has focused on improving communication efficiency and robustness in the Federated AI component of the ARCADIAN-IoT framework. Specifically, the team has developed a novel aggregation rule called SparSFA that ensures smooth training processes in peer-to-peer FL and for non-IID data.
After surveying the state-of-the-art techniques, RISE identified the need for an aggregation rule that considers both communication efficiency and utility, while also being robust enough to defend against data or model poisoning attacks. SparSFA was designed to meet these requirements.
The proposed model resizing and sharing subcomponent uses SparSFA to efficiently and robustly aggregate the locally trained models from individual clients while preserving source integrity. Figure 10 provides a visual representation of how this subcomponent works.
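The internals of SparSFA are not described here, so the following is only a generic sketch of the two properties it targets: communication efficiency (illustrated with top-k sparsification of model updates) and robustness to poisoned updates (illustrated with a coordinate-wise median). Both techniques are assumptions chosen for illustration and should not be read as the SparSFA algorithm; all names and values are hypothetical.

```python
import statistics
from typing import List, Sequence


def top_k_sparsify(update: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of an update and zero the rest,
    so that only k values (plus their indices) need to be transmitted."""
    if k <= 0:
        return [0.0] * len(update)
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(update)]


def coordinate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise median, which limits the pull of a few outlier updates."""
    if not updates:
        raise ValueError("need at least one update")
    return [statistics.median(column) for column in zip(*updates)]


# Three honest peers and one peer sending a poisoned update (toy values).
honest = [
    [0.9, 0.1, -0.5, 0.0],
    [1.1, -0.1, -0.4, 0.05],
    [1.0, 0.05, -0.6, 0.0],
]
poisoned = [[-50.0, 40.0, 30.0, 0.0]]

sparse_updates = [top_k_sparsify(u, k=2) for u in honest + poisoned]
aggregate = coordinate_median(sparse_updates)
```

With these toy values, the aggregate stays close to the honest updates in every coordinate, whereas a plain mean would be dominated by the poisoned peer. In a peer-to-peer setting, each client could apply such a rule to the updates it receives from its neighbours.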
Overall, the Federated AI component of the ARCADIAN-IoT framework combines data rebalancing with model resizing and sharing to enable dependable and privacy-preserving FL in IoT environments. SparSFA adds communication efficiency and robustness to the framework, helping to keep the training process smooth and resistant to poisoning attacks.
In conclusion, the Federated AI component of the ARCADIAN-IoT framework aims to provide dependable and privacy-preserving FL capabilities that handle the challenges of non-IID and imbalanced data in IoT environments. It will be essential to the overall success of the ARCADIAN-IoT project.