Passport parsing API as a service with OCR

A passport is a document that almost everyone has at some point in their lives. It is issued by a country’s government to its citizens and is mainly used for travel. It also serves as proof of nationality, name, surname, gender, etc. Companies have long accepted passports as identification documents from their customers. In most cases, they write down the details and make a scanned copy. That seems like a satisfactory solution if you need to prepare a contract for a single client. However, things get complicated quickly if you have hundreds of contracts to prepare, especially if your clients differ in nationality. Soon you will find yourself drowning in physical copies of passports in languages you cannot even read, not to mention the potential legal problems you can face with passport copies lying around the office.

Is there a solution to automatically process passports? 

The short answer is: yes, there is! In this blog, we will introduce the Klippa Passport parsing API, an API that can convert any given picture or PDF of a passport into structured data using OCR. The API can be implemented in a matter of hours and can parse the data from a passport within just a few seconds!

How does our passport processing API work? 

Parsing passports to data is done in a few important steps. First, a user takes a photo of a passport or submits a PDF file to our API. The first check concerns the document quality. If the quality meets our criteria, pictures are converted into text using OCR. For PDF documents, we extract the readable text directly. The extracted text can be compared to a notepad file on your computer. Just text, nothing more! Next, we determine the language and country of origin with algorithms trained on years of machine learning data. Once we know where the document is from, a language-specific machine learning model finds the relevant data fields. After that, we can extract data such as name, surname, date of birth, gender, etc. When all the important information has been identified, we convert the result into JSON and send a response back to the user.
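
The steps above can be sketched roughly as follows. This is an illustrative outline only, not Klippa's implementation: the function names `detect_file_type` and `parse_passport` are hypothetical, and the quality check, OCR, PDF text extraction, origin detection and field extraction stages are passed in as plain functions so the sketch runs without assuming any particular library.

```python
import json
from typing import Callable, Dict, Tuple

# File signatures (magic bytes) for the accepted input formats.
SIGNATURES = {
    b"%PDF": "pdf",
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def detect_file_type(data: bytes) -> str:
    """Return 'pdf', 'jpg' or 'png' based on the leading bytes."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    raise ValueError("unsupported file type")


def parse_passport(
    data: bytes,
    quality_ok: Callable[[bytes], bool],
    ocr: Callable[[bytes], str],
    pdf_text: Callable[[bytes], str],
    detect_origin: Callable[[str], Tuple[str, str]],
    extract_fields: Callable[[str, str], Dict[str, str]],
) -> str:
    """Run the stages in the order described above and return JSON."""
    kind = detect_file_type(data)
    # Step 1: reject documents that fail the quality check.
    if not quality_ok(data):
        raise ValueError("document quality too low")
    # Step 2: pictures go through OCR, PDFs through text extraction.
    text = pdf_text(data) if kind == "pdf" else ocr(data)
    # Step 3: determine country and language from the plain text.
    country, language = detect_origin(text)
    # Step 4: a language-specific extractor finds the data fields.
    fields = extract_fields(text, language)
    # Step 5: convert the result to JSON for the response.
    return json.dumps({"country": country, "language": language, **fields})
```

Injecting the stages as callables keeps the control flow visible and makes each stage easy to swap or test in isolation.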

Images speak louder than words

Describing a technical process in text alone does not always give everyone a clear mental picture. Luckily, we can show an example of the steps we take to process a passport picture into data. Because we are a Dutch company, we use a Dutch passport as an example, but the API is not limited to Dutch passports:

[Image: Passport parsing and OCR API]

What are the fields that are easily extracted by the Klippa OCR?

Our parsing engine is highly flexible. This means that there are out-of-the-box fields that we process, but we can also add custom fields or remove fields for specific API keys. We can even customize the output structure or anonymize certain fields and pictures. Every customer at Klippa has their own API key, so your customizations will never affect other customers. At Klippa, every customer gets the ideal solution for their situation. Below we have listed the out-of-the-box fields. Input can be JPG, PNG, or PDF, and the default output is a JSON file.

Default Fields:

  • Country
  • Language
  • Name
  • Middle Name
  • Surname
  • Initials
  • Date of Birth
  • Place of Birth
  • Gender
  • Date of Issue
  • Date of Expiry
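
To make the default output more concrete, here is a sketch of what a JSON response carrying these fields could look like, and how a per-key customization might remove or anonymize fields. The key names, the sample values and the `apply_customization` helper are illustrative assumptions, not Klippa's actual response schema or configuration mechanism.

```python
import json
from typing import Collection, Dict

# Hypothetical response using the default fields listed above.
# Key names and values are made up for illustration.
SAMPLE = {
    "country": "NLD",
    "language": "nl",
    "name": "Anna",
    "middle_name": "Maria",
    "surname": "Jansen",
    "initials": "A.M.",
    "date_of_birth": "1985-04-12",
    "place_of_birth": "Groningen",
    "gender": "F",
    "date_of_issue": "2019-06-01",
    "date_of_expiry": "2029-06-01",
}


def apply_customization(
    fields: Dict[str, str],
    remove: Collection[str] = (),
    anonymize: Collection[str] = (),
) -> Dict[str, str]:
    """Drop the fields in `remove` and mask the fields in `anonymize`."""
    result = {}
    for key, value in fields.items():
        if key in remove:
            continue
        result[key] = "***" if key in anonymize else value
    return result


if __name__ == "__main__":
    custom = apply_customization(
        SAMPLE, remove={"initials"}, anonymize={"date_of_birth"}
    )
    print(json.dumps(custom, indent=2))
```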

What about reading the MRZ with OCR?

From the 1980s onwards, countries started to issue passports containing an MRZ. MRZ stands for machine-readable zone. Passports that contain an MRZ are referred to as MRPs, machine-readable passports. The structure of the MRZ is standardized in ICAO Document 9303, which has been endorsed by the International Organization for Standardization and the International Electrotechnical Commission as ISO/IEC 7501-1. The MRZ is an area on the document that can easily be read by a machine using OCR (optical character recognition). Most modern passports have an MRZ, which consists of two lines of characters at the bottom of the identity page. Below we have added an example of an MRZ. It’s not important for you to understand how it works, but if you look at it carefully, you will see that it contains most of the relevant information on the document, combined with filler characters and check digits. Klippa can automatically read the passport MRZ with OCR. This is actually part of the process. We compare the MRZ with the data we find on the document itself. This gives us assurance that the information we found is correct, and it can also help to detect possibly fraudulent documents.
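
For readers who do want to see how the check digits work, the sketch below computes them as specified in ICAO Doc 9303: each character is mapped to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The helper names (`check_digit`, `parse_td3_line2`, `mismatches`) and the returned field names are our own choices for illustration, not Klippa's implementation. Only the second line of a passport-sized (TD3) MRZ is handled, and only three of its check digits are verified.

```python
from typing import Dict, List

WEIGHTS = (7, 3, 1)


def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    total = sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field))
    return total % 10


def parse_td3_line2(line: str) -> Dict[str, str]:
    """Split line 2 of a TD3 MRZ and verify three of its check digits."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line must be 44 characters long")
    checked = {
        "document_number": (line[0:9], line[9]),
        "date_of_birth": (line[13:19], line[19]),
        "date_of_expiry": (line[21:27], line[27]),
    }
    for name, (value, digit) in checked.items():
        if str(check_digit(value)) != digit:
            raise ValueError(f"check digit mismatch in {name}")
    return {
        "document_number": line[0:9].rstrip("<"),
        "nationality": line[10:13],
        "date_of_birth": line[13:19],  # YYMMDD
        "gender": line[20],
        "date_of_expiry": line[21:27],  # YYMMDD
    }


def mismatches(mrz: Dict[str, str], visual: Dict[str, str]) -> List[str]:
    """Names of fields present in both sources whose values differ."""
    return [key for key in mrz if key in visual and mrz[key] != visual[key]]


# A widely used specimen line (fictional country code UTO).
SPECIMEN = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
```

Calling `parse_td3_line2(SPECIMEN)` yields the document number, nationality, dates and gender, and `mismatches` can then flag any field where the MRZ disagrees with the values read from the visual part of the document.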

[Image: Passport OCR]

What makes parsing of passports so relevant?

The era we live in is more digitized than ever. Repetitive tasks are gradually being taken over by computers and robots. In many cases, they can perform these tasks faster, with fewer mistakes, and more cost-effectively. At Klippa we focus on building software to replace manual repetitive labor in administrative business processes. Processing and checking passports can be very time-consuming. Using OCR to automate your passport processing will enable you to save costs, onboard customers faster, and reduce errors in administrative processes.

About Klippa

In 2014, Klippa started with a receipt scanning app including OCR. Soon we decided that we should not limit our technology to an OCR API for receipts. Nowadays we have a lot of OCR products ranging from invoice and receipt parsing, to passports, ID cards and even contracts. 

Are you interested in learning more? Below you can find a timetable where you can book a 30-minute demonstration with one of our OCR experts. During the demo, we can guide you through the possibilities of our engine and provide you with tailor-made answers to all your questions concerning passports. Would you rather start testing the passport OCR API by yourself? Contact us to request an API key!
