Developers are constantly challenged by their organisations to add features to existing software or build completely new solutions. Luckily, we live in the era of open source and APIs, so developers don’t have to build every feature from scratch.
Five years ago, Klippa started developing an expense management and invoice processing solution. Receipt capturing was a key feature of our application, because we feel that manually entering data is a thing of the past. We believe in the API economy, so we started looking for the best receipt capturing API to integrate into our software. Why build something from the ground up when there are great solutions out there, right? We spent months testing more than 10 different solutions on more than 10,000 receipts. Sadly, every solution we tested disappointed us. Many did not even go beyond basic OCR, which only gives you a raw text file as output. Our requirements were that the solution had to be real-time (results within 5 seconds), accurate (over 75% of the data extracted), scalable and fully automated (no people involved), preferably return JSON, and support receipts in common European languages such as Dutch, German, French, Spanish, Italian and English.
Some receipt capturing APIs we tested failed on speed, many failed on accuracy and almost all failed on the multilingual aspect. That left us with a problem: we wanted to use a good receipt capturing API, but there was no good option to choose from. So what do you do? Well, we decided to build one ourselves. With solid experience in natural language processing, REGEX and machine learning in our team, we set out on a quest to build what we needed: the best receipt capturing API in Europe. In 2018 we launched the first version of our API, which we continue to improve every day. The API is available for third parties to integrate into accounting, ERP, banking, loyalty and many other solutions.
So how does our receipt capturing engine work?
The process can be divided into three basic steps.
1. The first step is sending an image or PDF of a receipt to our API in a request. We then turn it into a plain text file; this part is just basic OCR. The image below shows an example of this step for a Dutch receipt:
But a plain text file does not get you very far in most solutions; what you actually want is structured data such as JSON, right? Luckily, we’ve got you covered.
2. After the OCR step, our smart parsing engine comes into play. It analyses the text and interprets what each part actually is: a date, an amount, an address and many other types of data. Using machine learning and REGEX parsing, it identifies possible candidates for every data field, and then selects the best candidate for each field, e.g. the total amount or the purchase date.
3. Now that we know which data points are correct, we convert the receipt into JSON, giving you a structured response to process in your application. Below you can find a simplified JSON response for the example receipt:
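To make steps 2 and 3 more concrete, here is a heavily simplified Python sketch. It is not our production engine: instead of machine learning it uses two illustrative regular expressions and a keyword heuristic, it assumes dates are written as DD-MM-YYYY, and the output field names (`purchase_date`, `total_amount_cents`) are hypothetical rather than the actual fields of our API response. It collects amount and date candidates from the OCR text, prefers amounts on a line containing a "total" keyword (falling back to the largest amount), and serialises the selected values as JSON.

```python
import json
import re
from datetime import date

# Illustrative patterns only; a production parser needs far more (and ML scoring).
AMOUNT_RE = re.compile(r"(\d+)[.,](\d{2})\b")               # e.g. "3,78" or "3.78"
DATE_RE = re.compile(r"\b(\d{2})[-/](\d{2})[-/](\d{4})\b")  # assumes DD-MM-YYYY
TOTAL_KEYWORDS = ("totaal", "total", "summe", "gesamt")     # Dutch, English, German


def amount_candidates(text):
    """Step 2a: yield (amount_in_cents, on_total_line) for every amount-like token."""
    for line in text.splitlines():
        on_total_line = any(kw in line.lower() for kw in TOTAL_KEYWORDS)
        for m in AMOUNT_RE.finditer(line):
            yield int(m.group(1)) * 100 + int(m.group(2)), on_total_line


def select_total(text):
    """Step 2b: prefer amounts on a 'total' line, else fall back to the largest amount."""
    candidates = list(amount_candidates(text))
    if not candidates:
        return None
    on_total = [cents for cents, flag in candidates if flag]
    return max(on_total or [cents for cents, _ in candidates])


def select_date(text):
    """Step 2b: return the first candidate that is a valid calendar date (ISO 8601)."""
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            continue  # e.g. "31-02-2018" is not a real date; try the next candidate
    return None


def receipt_to_json(text):
    """Step 3: turn the selected candidates into a structured JSON response."""
    return json.dumps({
        "purchase_date": select_date(text),
        "total_amount_cents": select_total(text),
    })


ocr_text = "Melk 1,29\nBrood 2,49\nSubtotaal 3,78\nTotaal 3,78\nDatum 14-03-2018 12:31"
print(receipt_to_json(ocr_text))
# prints {"purchase_date": "2018-03-14", "total_amount_cents": 378}
```

Amounts are kept in cents to avoid floating-point rounding. Because "Subtotaal" also matches the keyword list, the sketch takes the largest amount among keyword lines; a real engine scores candidates far more carefully.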
How was the API built?
At Klippa we build all our backend services in Go, which ensures fast processing and responses. The parser that performs the data extraction is written in Python, because it has very good machine learning and REGEX capabilities.
What fields does the Klippa Receipt Capturing API extract?
There are many different things on receipts that can be of interest for a specific use case. We have tried to make our API very flexible by extracting many different data fields: more than 50 in total. These include basics such as the total amount, dates, VAT information and address, but also more complex data such as line items. You can find all the fields in our API documentation.
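Line items are harder than single fields because every product line has to be recognised and split into parts. The Python sketch below only illustrates the idea under strong assumptions: it assumes a hypothetical layout of an optional "2 x" quantity, a description and a price at the end of the line, and it skips lines containing whole words from an illustrative list of summary terms (totals, VAT, payment). The `LineItem` class and its fields are made up for this example and are not our API's line item format.

```python
import re
from dataclasses import dataclass

# Hypothetical layout: optional "2 x" quantity, a description, then a price at the end.
LINE_ITEM_RE = re.compile(r"^(?:(\d+)\s*[xX]\s+)?(.+?)\s+(\d+)[.,](\d{2})$")
# Whole words that mark summary or payment lines rather than products (illustrative).
SUMMARY_WORDS = {"totaal", "subtotaal", "total", "subtotal", "btw", "vat", "pin", "cash"}


@dataclass
class LineItem:
    description: str
    quantity: int
    amount_cents: int  # price as printed on the line


def extract_line_items(text):
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        words = set(re.findall(r"\w+", line.lower()))
        if words & SUMMARY_WORDS:
            continue  # skip totals, VAT and payment lines
        m = LINE_ITEM_RE.match(line)
        if m:
            quantity = int(m.group(1)) if m.group(1) else 1
            cents = int(m.group(3)) * 100 + int(m.group(4))
            items.append(LineItem(m.group(2), quantity, cents))
    return items


for item in extract_line_items("2 x Melk 2,58\nBrood 2,49\nTotaal 5,07\nPin 5,07"):
    print(item)
```

Matching whole words rather than substrings matters here: a substring check for "pin" would wrongly drop a product such as "Spinazie".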
What are the use cases?
Great question. There are many use cases where receipt OCR, and specifically the data extraction part, can be valuable. Most of our customers use the receipt capturing API for accounting and ERP solutions, but insurance companies, banks and loyalty providers use it as well. In some cases we even implement additional features to help our customers. For loyalty providers, for example, we added hash-based duplicate detection so they can easily prevent fraud in receipt-based loyalty campaigns.
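As an illustration, here is a minimal Python sketch of hash-based duplicate detection. Which fields go into the hash is an assumption: the sketch normalises a hypothetical combination of merchant name, purchase date and total, hashes it with SHA-256 from the standard `hashlib` module, and remembers the fingerprints it has already seen. Hashing extracted fields rather than the raw image bytes means a receipt photographed a second time still yields the same fingerprint.

```python
import hashlib


def receipt_fingerprint(merchant, purchase_date, total_cents):
    """SHA-256 over normalised key fields (an assumed choice of fields)."""
    key = f"{merchant.strip().lower()}|{purchase_date}|{total_cents}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Keeps fingerprints in memory; a real system would persist them."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, merchant, purchase_date, total_cents):
        fp = receipt_fingerprint(merchant, purchase_date, total_cents)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False


detector = DuplicateDetector()
print(detector.is_duplicate("Albert Heijn", "2018-03-14", 378))    # False: first upload
print(detector.is_duplicate(" albert heijn ", "2018-03-14", 378))  # True: same receipt again
```

In practice you would store fingerprints in a database and include more fields (such as the transaction time or a receipt number), since two genuine purchases at the same shop on the same day can have the same total.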
What languages does the receipt API support?
By default, our API is language agnostic. That means it was not built to serve a specific language or set of languages; it was built to extract certain types of information, such as amounts, dates, times, VAT values, VAT percentages, Chamber of Commerce numbers, payment information and line items. That does not mean accuracy is the same for all languages, because there is a learning curve: the more documents we see in a certain language, the better we perform. Out of the box, the API works best for European languages like English, French, German, Spanish, Italian, Portuguese and Dutch. Other languages are supported but might need more optimisation for specific use cases.
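One concrete thing language-agnostic extraction has to cope with is number formatting: Dutch and German receipts write `1.234,56` where English ones write `1,234.56`. The minimal Python sketch below uses a hypothetical helper, `parse_amount`, that is not part of our API. It normalises both styles into a `Decimal` with a simple heuristic: a `.` or `,` followed by exactly two final digits is the decimal separator, and any other `.` or `,` is a thousands separator. Ambiguous input such as `1.234` is rejected rather than guessed.

```python
import re
from decimal import Decimal


def parse_amount(raw):
    """Normalise '12,50', '12.50', '1.234,56', '1,234.56' or '€ 3,78' to a Decimal."""
    s = re.sub(r"[\s€$£]", "", raw)  # drop whitespace and common currency symbols
    # A separator followed by exactly two final digits is taken as the decimal point.
    m = re.fullmatch(r"([\d.,]*?)[.,](\d{2})", s)
    if m:
        integer_part = re.sub(r"[.,]", "", m.group(1)) or "0"  # strip thousands separators
        return Decimal(f"{integer_part}.{m.group(2)}")
    if re.fullmatch(r"\d+", s):
        return Decimal(s)  # whole amount without decimals
    raise ValueError(f"ambiguous or unrecognised amount: {raw!r}")


for raw in ("12,50", "1.234,56", "1,234.56", "€ 3,78"):
    print(raw, "->", parse_amount(raw))
```

Using `Decimal` instead of `float` keeps amounts exact, which matters when totals and VAT values are compared or summed later.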
What about API documentation and support?
As we explained at the beginning of this blog, more than 50% of our team are developers. Our developers love to use APIs, especially when they are well documented, and good support is a nice bonus. We believe in practising what we preach, so we have documented our API carefully and provide both business and technical implementation support when needed. Here you can find the Klippa OCR documentation, and you can contact us here with any questions.
Get a demo to see how it works
Of course, with technical solutions, seeing is believing. For us, the next step is usually an online demonstration of what our API can do for you. After that, we dive into a proof of concept (POC) together with our clients. For the POC you get an API key and testing credits, so you can do all the testing you like, with both business and technical support from our side.
So, if you are ready to see the best receipt capturing API in action, contact us or schedule an online demonstration!