AI and the Risk of Data Leakage, and Mitigation with FOSS

The Dutch Data Protection Authority (AP) has recently received several reports of data leaks caused by employees sharing personal data of, for example, patients or customers with a chatbot that uses artificial intelligence (AI). By entering personal data into AI chatbots, the companies that offer those chatbots can gain unauthorized access to that personal data. The AP sees that many people in the workplace use [digital assistants](https://autoriteitpersoonsgegevens.nl/themas/algorithms-ai/algorithms-uitgelegd/generatieve-ai), such as ChatGPT and Copilot, for example to answer questions from customers or to summarize large files. This can save time and spare employees less enjoyable work, but it also involves [major risks](https://autoriteitpersoonsgegevens.nl/themas/algorithms-ai/risicos-algorithms-ai-ontwikkelingen-in-nederland/risicos-generatieve-ai). If they were to use the #foss (Free and Open Source) tool GPT4All with an LLM such as Mistral instead, these risks could be mitigated, because GPT4All runs locally and by default does not exchange any data with external servers.

A data leak involves access to personal data without permission or without intention. Employees often use chatbots on their own initiative and against the agreements made with their employer: if personal data has been entered, this constitutes a data breach. Sometimes the use of AI chatbots is part of an organization's policy: then it is not a data breach, but it is often still not legally permitted. Organizations must prevent both situations.

Most companies behind chatbots store all entered data. As a result, this data ends up on the servers of these tech companies, often without the person who entered the data realizing it, and without knowing exactly what the company is going to do with that data. The person to whom the data belongs will usually not know this either. Needless to say, this risk can also be mitigated by using Free and Open Source AI tools that run locally.

Medical data and customer addresses

In one of the data breaches reported to the AP, an employee of a GP practice had entered medical data of patients into an AI chatbot, against the agreements made. [Medical data is very sensitive data](https://autoriteitpersoonsgegevens.nl/themas/basis-avg/privacy-en-persoonsgegevens/wat-zijn-persoonsgegevens#bijzondere-persoonsgegevens) and is not given extra protection in the law for nothing. Simply sharing this data with a tech company is a major violation of the privacy of the people involved. The AP also received a report from a telecom company, where an employee had entered a file containing, among other things, customer addresses into an AI chatbot.

Make agreements

It is important that organizations make clear agreements with their employees about the use of AI chatbots. Are employees allowed to use chatbots, or not? And if organizations do allow it, they must make clear to employees which data they may and may not enter. Organizations could also arrange with the provider of a chatbot that it does not store the entered data.

Report data leaks

What if something does go wrong and an employee leaks personal data by using a chatbot against the agreements made? In that case, a report to the AP and to the victims is [in many cases mandatory](https://autoriteitpersoonsgegevens.nl/themas/beveiliging/datalekken/datalek-wel-of-niet-melden).

In this example, we focused on the specific situation in the Netherlands. The Dutch rules are derived from the EU GDPR, so much of what is discussed in this article also applies to organizations in other European countries. Generally speaking, if you use a FOSS tool such as GPT4All, you can work with LLMs at a greatly reduced risk of data leakage, because prompts and data never leave your own machine; a minimal sketch of what that looks like in practice follows below.
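As a rough illustration, the sketch below uses GPT4All's Python bindings to run a locally stored Mistral model. The model file name and directory are assumptions: use whichever GGUF build of Mistral you have downloaded through the GPT4All desktop app or its model catalogue.

```python
from gpt4all import GPT4All

# Load a locally stored Mistral model. allow_download=False ensures the
# library will not fetch anything from the internet; the GGUF file must
# already be present. File name and directory below are assumptions.
model = GPT4All(
    model_name="mistral-7b-instruct-v0.1.Q4_0.gguf",
    model_path="/path/to/local/models",  # hypothetical local model directory
    allow_download=False,
)

# All prompts and generated text stay on this machine.
with model.chat_session():
    summary = model.generate(
        "Summarize the following customer complaint in three sentences: ...",
        max_tokens=200,
    )
    print(summary)
```

Because nothing in this flow calls out to an external API, entering customer or patient text here does not, by itself, put that data on a third party's servers; the organization's own access and retention rules of course still apply.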


Free and Open Source Large Language Models (LLMs) should be the basis for the future of AI