Invoice recognition at line item level with OCR

Invoice recognition, also known as scan and recognise or OCR, has been a hot topic in accounting for some years now. Slowly but surely, accounting is moving towards robotic accounting, and invoice recognition is an important part of this. For a long time, invoice recognition was mainly available for header information such as creditor, debtor, date, invoice number and total amount. However, advances in the technology have made increasingly smart machine learning systems possible. At Klippa, for example, we work with deep learning. This technique makes it possible to extract more, and more specific, information from documents with increasing precision. That is why automatic invoice recognition at line level is also possible with the Klippa OCR API and invoice processing!

Why is invoice line item recognition useful?

The automatic extraction of the key information on an invoice is of course already very useful for converting many invoices into automated booking proposals. OCR has already proven its value in this area. Especially with standardised ledgers, automating booking proposals is becoming easier and easier. However, the header information alone is not always enough to make a complete booking proposal. After all, invoices (and receipts) often contain multiple invoice lines, and not every line has to be booked to the same general ledger account, cost centre or project. By recognising invoices at line item level, more context is available to our self-learning software, and almost all booking proposals can be made accurately.
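To make the point about different ledger accounts concrete, here is a minimal Python sketch. It is not Klippa's software: the `InvoiceLine` type, the keyword rules and the account names are all hypothetical, and a self-learning system would learn such a mapping from earlier bookings rather than from a hard-coded table. It only illustrates why a single invoice can need several booking lines.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class InvoiceLine:
    description: str
    net_amount: Decimal


# Hypothetical keyword-to-account rules, for illustration only.
ACCOUNT_RULES = {
    "laptop": "0200 Hardware",
    "licence": "4300 Software",
    "shipping": "4500 Freight",
}
DEFAULT_ACCOUNT = "4999 Unassigned"


def propose_account(line: InvoiceLine) -> str:
    """Pick the first account whose keyword occurs in the description."""
    text = line.description.lower()
    for keyword, account in ACCOUNT_RULES.items():
        if keyword in text:
            return account
    return DEFAULT_ACCOUNT


def booking_proposal(lines: list[InvoiceLine]) -> dict[str, Decimal]:
    """Sum the net amounts per proposed ledger account."""
    totals: dict[str, Decimal] = {}
    for line in lines:
        account = propose_account(line)
        totals[account] = totals.get(account, Decimal("0")) + line.net_amount
    return totals


invoice = [
    InvoiceLine("Laptop 14 inch", Decimal("900.00")),
    InvoiceLine("Office licence 1 year", Decimal("120.00")),
    InvoiceLine("Shipping", Decimal("15.00")),
    InvoiceLine("Laptop sleeve", Decimal("25.00")),
]
proposal = booking_proposal(invoice)
```

With only the invoice total, all four lines would end up on one account. With the individual lines, the proposal can split the amount over hardware, software and freight.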

How does line recognition work on invoices?

We use a machine learning model to recognise lines on invoices. It may be a somewhat technical story, but we use deep learning for this: a form of machine learning in which the software itself can derive meaning from a set of labelled data. At Klippa we have built a large dataset in which the invoice lines are clearly marked for each invoice and receipt. The self-learning software then does its magic on this dataset in order to recognise patterns in it. The result of this training is a so-called model. Every time a document arrives at Klippa for processing, it is compared with our model. On the basis of a statistical analysis, we determine which structure the document most resembles. As soon as this is clear, the software marks the location of the invoice lines. You can think of this as highlighting a passage with a marker, as you would when writing a summary.

As soon as the region containing the invoice lines has been marked, another computer program comes into operation. We call this a parser. The parser looks at all the information in the highlighted area and assigns meaning to each piece of information. For example, the description, amounts, numbers and VAT values on the invoice are kept apart and stored separately in the database for each invoice line. This information, together with the header information, is eventually used to create the booking proposal at line level.
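The parsing step can be sketched in a few lines of Python. This is a simplified illustration, not Klippa's parser: the `ParsedLine` type and the regular expressions are assumptions, and it presumes that the text of one invoice row is already available as a string, that a leading whole number is the quantity, that amounts have two decimals without thousands separators, and that the last amount on the row is the line total.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ParsedLine:
    description: str
    quantity: Optional[Decimal]
    amount: Optional[Decimal]
    vat_rate: Optional[Decimal]


# Assumed token shapes: "12,50" or "12.50", "21%" and "2".
AMOUNT = re.compile(r"^\d+[.,]\d{2}$")
VAT = re.compile(r"^(\d{1,2}(?:[.,]\d+)?)%$")
QUANTITY = re.compile(r"^\d+$")


def to_decimal(text: str) -> Decimal:
    """Accept both a decimal comma and a decimal point."""
    return Decimal(text.replace(",", "."))


def parse_line(row_text: str) -> ParsedLine:
    """Assign a meaning to every whitespace-separated token on one row."""
    description_parts = []
    quantity = amount = vat_rate = None
    for index, token in enumerate(row_text.split()):
        vat_match = VAT.match(token)
        if vat_match:
            vat_rate = to_decimal(vat_match.group(1))
        elif AMOUNT.match(token):
            amount = to_decimal(token)  # a later amount overwrites an earlier one
        elif index == 0 and QUANTITY.match(token):
            quantity = to_decimal(token)
        else:
            description_parts.append(token)
    return ParsedLine(" ".join(description_parts), quantity, amount, vat_rate)


line = parse_line("2 Laptop sleeve 12,50 21% 25,00")
```

Here `line` holds the description "Laptop sleeve", the quantity 2, the line total 25.00 and a VAT rate of 21, each stored in its own field so that it can be saved separately per invoice line.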

What does the OCR of invoice lines look like?

It is sometimes difficult to visualise how a computer program operates. After all, the software does most of its work in the background, and only the output is shown in an interface. To give you an idea of how the software works, you can look at the visualisation below. Here you can see that the software, in addition to extracting the key data, has also drawn a green box on the invoice. This is where the invoice lines can be found. Next, black boxes are drawn around the individual values that are relevant, and these are connected to each other with black lines. In this way, the data is extracted and linked together without using templates.
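Linking the boxed values that belong to the same invoice line can be approximated with simple geometry. The sketch below is a hand-written heuristic for illustration, not the deep learning approach described in this article: the `WordBox` type, its coordinate convention (y grows downwards) and the `tolerance` parameter are assumptions. It groups boxes whose vertical centres are close together and then reads each group from left to right.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    text: str
    x: float       # left edge
    y: float       # top edge
    height: float


def group_into_rows(boxes: list[WordBox], tolerance: float = 0.5) -> list[list[str]]:
    """Group boxes with similar vertical centres, then sort each row by x."""
    rows: list[dict] = []
    for box in sorted(boxes, key=lambda b: b.y + b.height / 2):
        centre = box.y + box.height / 2
        # Same row if the centre is within a fraction of the box height
        # of the centre of the first box in the current row.
        if rows and abs(centre - rows[-1]["centre"]) <= tolerance * box.height:
            rows[-1]["boxes"].append(box)
        else:
            rows.append({"centre": centre, "boxes": [box]})
    return [
        [b.text for b in sorted(row["boxes"], key=lambda b: b.x)]
        for row in rows
    ]


boxes = [
    WordBox("900,00", x=200, y=101, height=10),
    WordBox("Laptop", x=10, y=100, height=10),
    WordBox("15,00", x=200, y=121, height=10),
    WordBox("21%", x=150, y=99, height=10),
    WordBox("Shipping", x=10, y=120, height=10),
]
rows = group_into_rows(boxes)
```

Each inner list in `rows` then contains the values of one invoice line in reading order, ready to be handed to a parser.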

[Image: invoice line item OCR]

For what languages does it work?

Good question! Luckily, the answer is a good one too. Our software can work with pretty much any language. It performs best on European languages such as Dutch, German, French, Italian, Spanish and English, among many others. It can also be trained to perform better on specific languages if the out-of-the-box results do not meet your requirements. Perhaps interesting to know: in addition to invoices, Klippa can also extract line items from receipts and other documents.

Let’s talk!

At Klippa we implement our smart OCR solutions in our own software, for example in the purchase invoice module, but also in third-party software. We have user-friendly APIs available for this purpose. Do you have a challenge around scanning and recognising invoice lines? Please let us know by contacting us! We are happy to think along with you. Want to learn more? Read more on our invoice OCR page.
