Decentralized machine learning to respond to the health crisis
This article is also available in French.
The current health crisis has shown how essential data is for making political decisions, but also how sensitive some of that data is and how carefully it must be handled. This is evidenced by the current debates in France around the relevance of the 6 p.m. curfew, the tracing of Covid contact cases and the creation of a medical database. In all three cases, for the TousAntiCovid application as for the Health Data Hub, the choice of solutions that centralize data raises questions and concerns. This centralizing model, where all data is brought together in a single database, is fortunately not the only one possible. There are alternatives that make it possible to analyze the data and extract indicators to guide research and public policies while guaranteeing the security and confidentiality of the data: solutions that ensure that citizens retain control over their personal information and that it cannot be used without their consent.
We present here one of these solutions that allows training an AI with the data of many individuals without ever disclosing it to third parties, thanks to a decentralized protocol.
Against COVID: how long to keep sport halls and other public places closed?
Today, at a time when the whole of France is adopting new restrictive measures (general curfew at 6 p.m.), is it still legitimate to ask this question? The answer is yes, and the question will arise again as the government lifts these restrictions:
- Will we have to open restaurants, bars, sports halls, swimming pools, theaters and other cultural venues? In which order? Under what conditions?
- Will our children still have to wear a mask from 6 years old?
- Should the most vulnerable people remain confined?
- Will we have to close school canteens and nurseries? Reopen universities?
One thing is certain: an epidemic plays out on very large scales and behaves differently depending on many factors, in particular temporal and spatial ones, which makes it hard to isolate the relevant variables.
At Cozy Cloud, we bring an innovative approach and will try to answer this first question as an application case: does closing sports halls help curb the virus?
Today, answering such a question, like the other questions raised above, requires centralizing the data of an entire population.
Indeed, it amounts to studying correlations across a large amount of personal and shared data from a population:
- how often each person attends the gym
- medical background
- other sports activities practiced
To analyze this data, the state of the art - that is, current knowledge - has traditionally focused on methods requiring calculations to be made on a server that centralizes all the necessary and relevant data from a population.
Two major obstacles emerge:
- how to recover all this scattered and diverse data?
The data of a population is scattered, and current centralized approaches cannot merge all the silos of existing services. Doing so would require us to set up partnerships with Google Maps, Withings, the Assistance Publique - Hôpitaux de Paris (APHP) and all the other regional hospitals, and the Health Data Hub (HDH). So many French and American players, each subject to different laws, would make the legal framework nebulous.
- how can we prevent misuse of this data?
Today, anonymizing such diverse data is mathematically impossible without degrading the useful value of the data.
The decentralized approach of the personal cloud
Cozy Cloud had anticipated how hard it is to pool data for a study like this well before the current health context. One step ahead, in 2018 we launched Cozy, a French open source personal cloud: a personal data platform in which individuals bring together all their data to gain both more control and more uses. A surprising paradigm then emerged, in which individuals can benefit from services that draw on all their data, their entire digital privacy, without any data leaving their home.
This ambition, to enable decentralization at the level of the individual by giving each person a personal cloud, is a different and innovative approach that can answer the question asked.
“Who is legitimate to collect and control all your personal data? What conditions must be met for digital technology to be structurally at the service of democracy?”
Personal cloud: an innovative approach that involves the user, unlike the centralized approach of GAFA
The idea of Cozy Cloud is to empower individuals with a personal cloud, a digital home that brings together all their personal data, because the individual is the only person who can legitimately access all of their own data.
Applied to the example of the closure of sports halls, with a personal Cozy cloud the individual can:
- automatically retrieve into their personal cloud the geolocation data from their smartphone, which shows exactly when they attended their gym, their medical history and vital signs from their Digital Health Space, the readings from their blood pressure monitor connected to Withings, the data from their diabetes remote monitoring service offered by the APHP, etc.
- contribute to a decentralized learning program while being assured that their data does not leave their personal cloud.
Cozy is thus paving the way for decentralized artificial intelligence, where AI can learn from everyone's data without anyone sharing their data.
DISPERS: a decentralized protocol
As mentioned earlier, the privacy-friendly personal cloud provides its users with a dedicated digital space under their sole control. However, this platform remains single-user and, without an adequate framework, does not allow the creation of multi-user applications offering the same guarantees. The objective of the DISPERS protocol, published with INRIA, was therefore to define this framework and, more specifically, to allow privacy-respecting distributed queries to be executed on a set of personal clouds. In our context, we consider that each individual of the studied population has their own personal cloud.
This protocol is the result of a CIFRE thesis between Cozy Cloud and the PETRUS team at INRIA Saclay. More than an anonymization method, this protocol provides strong guarantees that a calculation relating to personal data will not reveal any useful information throughout its execution, even in the presence of malicious actors. This work has given rise to articles published in major international academic conferences:
- Julien Loudet, Luc Bouganim, Iulian Sandu Popa: Privacy-Preserving Queries on Highly Distributed Personal Data Management Systems
- Julien Loudet, Iulian Sandu Popa, Luc Bouganim: SEP2P: Secure and Efficient P2P Personal Data Processing
The DISPERS protocol is also integrated into the ANR PerSoCloud project, a cooperation between Cozy Cloud, Orange, INRIA Saclay and University of Versailles Saint-Quentin-en-Yvelines, which aims to facilitate the connection of each person's personal digital home.
DISPERS thus offers a way to distribute tasks and information between several Cozy personal clouds. The protocol incorporates processes to hide the holders of the information or make the data incomprehensible to the actor who will be responsible for a sub-task. Thus, no single actor is able to compromise the entire calculation, nor to access useful information!
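To make the idea that no single actor sees useful information more concrete, here is a minimal sketch of additive secret sharing, a classic building block for private aggregation. It is not the DISPERS protocol itself, whose mechanisms are described in the papers cited above; the function names, the modulus and the sample values are all assumptions chosen for illustration.

```python
import secrets

# Illustrative modulus (an assumption); all arithmetic is done modulo this prime.
PRIME = 2**61 - 1


def split_into_shares(value, n_shares):
    """Split an integer into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def aggregate(all_shares):
    """Helper h sums the h-th share of every user; partial sums are then combined."""
    n_helpers = len(all_shares[0])
    partial_sums = [
        sum(user_shares[h] for user_shares in all_shares) % PRIME
        for h in range(n_helpers)
    ]
    return sum(partial_sums) % PRIME


# Hypothetical private inputs: 1 if the user tested positive, 0 otherwise.
private_values = [1, 0, 0, 1, 1]
shared = [split_into_shares(v, n_shares=3) for v in private_values]
total_positive = aggregate(shared)
print(total_positive)
```

Each helper only ever receives one share per user, which on its own is a uniformly random number; only the combined partial sums reveal the total count.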
A technological enabler developed with INRIA
As the data is already present in users' personal clouds, the DISPERS protocol can be run without having to convince Withings, APHP and Google Maps to share their users' data, or to find a legal framework in which to contract such an arrangement. Legal constraints nevertheless remain an unknown that deserves special attention in the context of a decentralized protocol: the legal constraints of each actor participating in the protocol should be listed.
Our protocol lets an epidemiological study draw on data that the individual, in our case a Cozy personal cloud user, gathers without difficulty, whereas collecting all that personal information and the matching consents would be a nightmare for a laboratory. This makes it possible to compute correlations and assess the prevalence of the virus within the population according to its behavior, while guaranteeing to users that it is technically impossible for the data used to leave their personal cloud. Data that is impossible to bring together today can thus be pooled to grow a common good.
A new thesis building on DISPERS with a focus on AI
Following the previous thesis by Julien Loudet, we are opening a new path with a new thesis with INRIA, conducted by Julien Mirval and Cozy Cloud, which aims to enable distributed learning in an AI context.
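As an illustration of what distributed learning can look like, here is a minimal sketch of federated averaging, one common approach in which each participant trains locally and only model parameters are exchanged. The article does not say which method the thesis uses, so this technique, the one-parameter model, the function names and the data are all assumptions.

```python
def local_update(weight, local_data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one user's data for the model y = weight * x."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in local_data) / len(local_data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, clients):
    """One round: every client trains locally, then the updated weights are averaged."""
    updates = [local_update(global_weight, data) for data in clients]
    sizes = [len(data) for data in clients]
    # Average weighted by the amount of data each client holds.
    return sum(w * n for w, n in zip(updates, sizes)) / sum(sizes)


# Hypothetical per-user datasets of (x, y) pairs, all roughly following y = 3x.
clients = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.5, 4.4), (3.0, 9.2), (0.5, 1.6)],
    [(2.5, 7.4)],
]

weight = 0.0
for _ in range(50):
    weight = federated_round(weight, clients)
print(round(weight, 2))
```

Only the single number `weight` travels between the clients and the coordinator; the raw (x, y) pairs stay where they are. In practice, model updates can still leak information, which is why a privacy-preserving protocol around the exchange matters.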
Use case: relevance of closing sport halls during the COVID-19 epidemic?
Conditions of application
We assume that each individual in our sample has a personal cloud to which they have connected their Withings account, and in which they have retrieved their GPS data (including their gym attendance) as well as their health data from their Shared Medical File (DMP), including whether or not they tested positive for COVID.
Course of the study
From this data, and by applying the DISPERS protocol:
- we can look at the percentage of contamination among these individuals before and after the closing of the sports halls
- we can compare it with the percentage of contamination among comparable people who do not go to the gym (comparable on relevant variables such as location, age, etc.)
Results of the study
If we see a drop in the percentage of contamination between the periods before and after closure among individuals who went to the gym, AND the percentage of contamination among individuals who did not go there does not follow the same trend, then we can assume that the closure had an impact.
Note that the experiment does not prove causality, but it does provide evidence which, combined with other studies and/or protocols, would make it possible to better characterize the behavior of the epidemic.
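The comparison described above resembles what statisticians call a difference-in-differences estimate. The article does not name this technique, so treat the following as a minimal sketch under that assumption; the function names and all counts are made up for illustration and are not study results.

```python
def contamination_rate(positives, population):
    """Percentage of a group that tested positive."""
    return 100.0 * positives / population


def difference_in_differences(gym_before, gym_after, control_before, control_after):
    """Change among gym-goers minus change in the comparable non-gym group."""
    gym_change = gym_after - gym_before
    control_change = control_after - control_before
    return gym_change - control_change


# Hypothetical aggregated counts (positives, group size), not real data.
gym_before = contamination_rate(40, 1000)
gym_after = contamination_rate(20, 1000)
control_before = contamination_rate(30, 1000)
control_after = contamination_rate(28, 1000)

effect = difference_in_differences(gym_before, gym_after, control_before, control_after)
print(f"{effect:+.1f} percentage points")
```

A clearly negative value means contamination fell more among gym-goers than in the comparison group, which is consistent with an impact of the closure but, as noted above, does not prove causality.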
Decentralized machine learning as a new way to pool data paves the way for democratic AI. Such AI needs to learn from a large, cross-cutting dataset from multiple sources, while respecting user privacy, something that current centralized approaches that operate in silos cannot allow. In a democratic society, individuals themselves control their data.
By adopting a personal cloud to reclaim your personal data, you give your Personal AI access to your “complete digital privacy” without ever disclosing it. This is a necessary paradigm shift, as the risks of privacy breaches have never seemed greater than in these uncertain times.
Resources and articles
- Ethique de l'Intelligence Artificielle - French study by Cap Gemini - July 2019
- Privacy-Preserving Queries on Highly Distributed Personal Data Management Systems - Julien Loudet, Luc Bouganim, Iulian Sandu Popa
- SEP2P: Secure and Efficient P2P Personal Data Processing - Julien Loudet, Iulian Sandu Popa, Luc Bouganim
- Entrainer une IA sans posséder la donnée est possible (article in French) - Martin Masson