Automatic anonymization and pseudonymisation with smart software

The GDPR recently came into force in Europe. Companies are therefore more aware than ever of their responsibilities regarding the storage and management of personal information. In many cases, clear consent from the customer is required before a company can store personal information. In this post, we explain how automatic document anonymization can help your company become GDPR compliant.

For many companies, the GDPR brought new legal challenges. What should be done with personal information that is already stored, sometimes in physical and sometimes in digital archives, spread across all kinds of databases and document types? For many companies, consent is missing for some or all of their data. In most cases, this leaves companies with three choices:

  1. Delete all documents in the database that contain personal information.
  2. Try to obtain retroactive consent from customers to keep storing the information.
  3. Anonymize all the documents in the database.

The first option is usually a hard choice to make. Other information in the stored documents can still be relevant or even vital, e.g. for accounting and legal purposes. The second option is far from easy to achieve: how do you reach out to someone for consent? And what if you don’t get a response, or consent is refused? The chances of reaching everyone and actually getting everyone’s consent are minimal. That leaves us with the third option: anonymising or pseudonymising all documents. This can be done on files such as contracts, passports, IDs, invoices and more. The problem is that doing this manually can be a huge task. It is very time consuming, you still need some kind of software interface, and the risk of making mistakes is high. But what if documents could be anonymised automatically with accurate anonymisation software? This is where Klippa comes to the rescue!

What is Klippa?

Klippa is a specialist in data extraction for contracts, invoices, receipts, passports, IDs and more. Our smart software is used in many different applications to extract information such as names, dates and amounts from documents. Example use cases are invoice processing, expense management and KYC process automation. Proper data extraction starts with text recognition software and the localisation of information in documents. Because this technology is already at the core of our software, Klippa has decided to extend its services with document anonymization and pseudonymisation software.

So how does that work?

Klippa sets up a secure server in a country of your choice. Your organisation can send documents to this server via email, API or manual upload. As soon as the documents are received, the Klippa software kicks in. The first step in the process is called document segmentation. In this step, the Klippa software uses a machine learning model to determine the type of each document that is sent in and sorts the documents into virtual boxes, separating e.g. contracts from invoices. In the next step, a unique set of rules is applied to each document type that has been classified. These rules determine what changes are made to each type of document and make sure that all the data you want to anonymize is anonymized, while other information is left untouched.
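To make the segmentation step more concrete, here is a minimal sketch in Python. It is not Klippa's implementation: the keyword matching is only a stand-in for the machine learning model, and the names RULES, KEYWORDS, classify, segment and rules_for are hypothetical, chosen for illustration.

```python
# Hypothetical rule sets: which kinds of personal data to anonymize
# for each document type. The labels are illustrative assumptions.
RULES = {
    "contract": {"NAME", "ADDRESS", "PHONE NUMBER"},
    "invoice": {"NAME", "IBAN"},
}

# Stand-in for the machine learning classifier: plain keyword matching.
KEYWORDS = {
    "contract": ("agreement", "hereby", "parties"),
    "invoice": ("invoice", "amount due", "payment terms"),
}


def classify(text: str) -> str:
    """Return the document type whose keywords occur most often in the text."""
    lowered = text.lower()
    scores = {
        doc_type: sum(word in lowered for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def segment(documents: list[str]) -> dict[str, list[str]]:
    """Sort documents into 'virtual boxes', one box per document type."""
    boxes: dict[str, list[str]] = {}
    for doc in documents:
        boxes.setdefault(classify(doc), []).append(doc)
    return boxes


def rules_for(doc_type: str) -> set[str]:
    """Look up the rule set for a document type (empty if the type is unknown)."""
    return RULES.get(doc_type, set())
```

Each box can then be processed with its own rule set, so an invoice and a contract can be treated differently even though they arrive through the same upload.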

Anonymizing personal information in documents

Now that it is clear which document types are present and what information is relevant for the specific use cases, it’s time for our smart recognition software. Step by step, the Klippa software takes a document, converts it to a readable format and uses pattern recognition to find predefined types of information. Identified patterns (personal information) are then extracted from the document and replaced by a predefined identifier. This can be something without meaning, like an empty space or a string of ‘–’, or a variable such as ‘NAME’ or ‘PHONE NUMBER’. If you wish, the original values are permanently removed from the document and cannot be retrieved by anyone in the future. As a last step, all the documents are stored and sent back to the database of the customer. The result? A fully digital and completely anonymized database with GDPR compliant documents!
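The replacement step can be illustrated with a small Python sketch. It assumes the document has already been converted to plain text, and it uses two simplified regular expressions as a stand-in for the pattern recognition described above. The PATTERNS table and the anonymize function are hypothetical names, and real documents would need far more robust patterns.

```python
import re

# Simplified, illustrative patterns. Each key is the predefined identifier
# that replaces a match. These are assumptions for the example only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> str:
    """Replace every match of each pattern with its identifier.

    The original values are not kept anywhere, so they cannot be
    recovered from the returned text.
    """
    for label, pattern in PATTERNS.items():
        text = pattern.sub(label, text)
    return text


print(anonymize("Contact jan@example.com or +31 6 1234 5678."))
# prints: Contact EMAIL or PHONE NUMBER.
```

Because the function returns only the replaced text and stores no mapping, this corresponds to anonymization. A pseudonymisation variant would instead keep a separately secured mapping from identifiers back to the original values.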

When required, a manual review (‘eyeball check’) interface can be set up before documents are definitively marked as ‘anonymized’. In this interface, employees from within your organisation can check all documents, or perform checks on a sample basis, as a final stamp of approval.

Does your organisation face challenges in the areas of data extraction, OCR or anonymization? Contact us and Klippa will help you solve these challenges in a cost-efficient manner.