Since it took effect in 2018, the European GDPR legislation has made companies more aware than ever of their responsibilities for storing and managing personal information. In many cases, businesses need clear customer consent before they can store privacy-sensitive personal data. This requirement still poses challenges for businesses today. In this blog, we explain how automatic document anonymization can help your company become GDPR compliant.
The challenges of GDPR compliance
For many companies, the GDPR legislation brought new legal challenges. What should you do with personal information that was already stored in your database? It often sits in a physical or digital archive, spread across all kinds of databases and document types. Customer consent for all or most of this data is missing. In most cases, this leaves companies with three choices:
- Delete all documents that contain personal information from the database.
- Try to acquire retroactive consent from customers to keep the stored information.
- Anonymize all documents containing personal info in your database.
The first option is rarely an easy decision. Other information in your stored documents can still be relevant, or even vital, for accounting and legal purposes, for example. The second option is hard to achieve: how do you reach every customer to ask for consent? This is especially difficult when your database is very large. And what if you don’t get a response or don’t get consent? The chances of reaching everyone and actually getting everyone’s consent are minimal.
That leaves us with the third option: anonymizing or pseudonymizing all documents. This can be done on files such as contracts, passports, IDs, invoices and other document types. The problem is that doing this manually is a huge amount of work. It is very time-consuming, you still need some kind of software interface to do it, and the risk of making mistakes is high. But what if documents could be anonymized automatically with accurate anonymization software? This is where Klippa comes to the rescue!
What is Klippa?
Klippa is a specialist in data extraction for contracts, invoices, receipts, passports, IDs and other documents. Our smart software is used to extract information such as names, dates and amounts from documents. Example use cases are invoice processing, expense management and KYC process automation. Proper data extraction starts with text recognition software and the localization of information in documents. Because this technology is already at the core of our software, Klippa has decided to extend its services to document anonymization and pseudonymization (also known as data masking) software.
So how does it work?
Klippa sets up a secure server in your country of choice. You can send your documents to this server via email, API or a manual upload. As soon as the documents are received, the Klippa software kicks in. The first step in the process is called document segmentation. In this step, the Klippa software uses a machine learning model to determine the type of each incoming document and sorts the documents into virtual boxes, for example separating contracts from invoices.
In the next step, a unique set of rules is applied to each classified document. These rules determine what changes are made to each type of document and make sure that all the data you want to anonymize is anonymized. Other information is left untouched, so all relevant data remains available for your business to use at any time.
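To make the idea of "a set of rules per document type" concrete, here is a minimal sketch in Python. It is not Klippa's implementation: the `RULES` table, the field labels and the keyword-based `classify` function are all hypothetical, and the keyword check is only a toy stand-in for the machine learning model described above.

```python
# Hypothetical rule table: document type -> kinds of data to anonymize.
# Anything not listed for a type is left untouched.
RULES = {
    "invoice": {"NAME", "IBAN"},
    "contract": {"NAME", "PHONE NUMBER", "EMAIL"},
}


def classify(text: str) -> str:
    """Toy keyword classifier standing in for a real ML model (assumption)."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "contract" in lowered or "agreement" in lowered:
        return "contract"
    return "unknown"


def fields_to_anonymize(text: str) -> set:
    """Look up the rule set for the document's type.

    Unknown document types get an empty rule set in this sketch; a real
    pipeline would more likely route them to a manual check.
    """
    return RULES.get(classify(text), set())
```

Keeping the rules in a plain lookup table means a new document type can be supported by adding one entry, without touching the classification or anonymization code.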
Anonymizing personal information in documents
Now that it is clear what the document types are and what information is relevant for specific use cases, it’s time for smart recognition software to do its work. Step by step, the Klippa software will take a document, convert it into a readable format, and use pattern recognition to find predefined types of information. Identified patterns (personal information) are then extracted from the document and replaced by a predefined identifier. This can be something without meaning like an empty space, a string of ‘–’, or a variable such as ‘NAME’ or ‘PHONE NUMBER’. Depending on your preferences, the original values are permanently removed from the document and are not retrievable by anyone in the future. As a last step, all the documents are stored and sent back to the database of the customer. The result? A fully digital and completely anonymized database with GDPR-compliant documents!
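The replacement step can be illustrated with a small Python sketch. It assumes the document has already been converted to plain text and uses two deliberately simplified regular expressions; the `PATTERNS` table and the `anonymize` function are hypothetical names for illustration, not part of the Klippa software, and real pattern recognition would need to be far more robust.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text, placeholder=None):
    """Replace every match of a known pattern with an identifier.

    By default the identifier is the pattern's label (e.g. 'EMAIL').
    Pass `placeholder` to use one fixed string, such as '----', instead.
    """
    for label, pattern in PATTERNS.items():
        replacement = placeholder if placeholder is not None else label
        # A function is used as the replacement so the string is inserted
        # literally, without backslash or group-reference interpretation.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text


print(anonymize("Call +31 6 1234 5678 or mail jan@example.com"))
# prints: Call PHONE NUMBER or mail EMAIL
```

Because only the rewritten text is returned and no mapping from identifier to original value is kept, the original values cannot be recovered from the output of this sketch.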
When required, a manual review ("eyeball check") interface can be set up before documents are definitively marked as ‘anonymized’. In this interface, employees from within your organisation can check all documents, or check a sample of them, as a final stamp of approval.
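Checking on a sample basis could, for example, mean drawing a random subset of the processed documents for human review. The sketch below is one simple way to do that in Python; the function name `pick_review_sample` and the default of 10% are assumptions for illustration only.

```python
import random


def pick_review_sample(document_ids, fraction=0.1, seed=None):
    """Return a random subset of document IDs for manual review.

    `fraction` is the share of documents to review (0 < fraction <= 1).
    At least one document is selected whenever the input is non-empty.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    ids = list(document_ids)
    if not ids:
        return []
    size = max(1, round(len(ids) * fraction))
    # A seeded generator makes the selection reproducible for audits.
    return random.Random(seed).sample(ids, size)
```

Setting `fraction=1` selects every document, which corresponds to the "check all documents" option.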
Does your organisation face challenges in the areas of data extraction, OCR or anonymization? Contact Klippa and we will help you solve these challenges in a cost-efficient manner.