2022 guide to invoice capturing

Invoice capturing and processing are crucial tasks for accounts payable teams. Unfortunately, they can consume hours of every working day. According to a study carried out by Zapier, for instance, 76% of workers spend 1–3 hours a day moving data from one place to another and less than 3 hours of their day on impactful work!

It is inevitable, however, that an organization needs to capture information from invoices. The capturing process includes extracting relevant invoice data such as the supplier name, invoice number and total amount. Furthermore, the captured information needs to be validated and entered into the bookkeeping system. 

So does that mean that the time of qualified employees needs to be continuously spent on these repetitive and tedious tasks? Luckily not! 

Intelligent software has been developed that offers organizations the ability to digitize and extract data from invoices quickly. This results not only in time savings, but also in cost and resource savings.

In this blog we will explain what invoice capturing is, how software can automate the process and which benefits the implementation of an invoice capturing solution holds for an organization. 

What is invoice capturing?

Invoice capturing is executed by the accounts payable (AP) department and describes the process of extracting data from invoices, validating the extracted information and entering it into the bookkeeping system.

For a small organization this system might be literally just a paper book stating expenses, vendors that receive payments and further payment details. But even for many AP teams in bigger organizations, the invoice capturing process is still performed manually, requiring employees to enter data one by one into the bookkeeping system.

The same holds for the approval of invoices. It is often still done manually, involving a paper copy of the invoice moving from desk to desk. Imagine the chaos that a manual process like the one just described causes in a larger organization.

Luckily, this doesn’t have to be the case anymore as Intelligent Document Processing (IDP) solutions, such as Klippa DocHorizon, have been developed for accounts payable teams to capture data from invoices automatically.

In the following paragraphs, we will discuss why invoice data capturing is important, followed by the different types of invoice capturing.

Why is invoice capturing important?

Invoice capturing is important to avoid mismanagement and inaccuracy of an organization’s expenses. Accurate data and timely processed invoices are absolutely crucial as they impact the ability of an organization to operate.

When an invoice is captured, the focus should be on making as few errors as possible, as the organization's finances and its relationships with suppliers and clients rely heavily on this. Paying the correct amount on time should be the priority, to avoid a backlog of invoices and frustrated clients and employees.

Furthermore, a fast and accurate invoice capturing process prevents organizations from paying late payment penalties and from running into miscommunication between stakeholders. It also helps to foster good supplier management, as payments are made on time, mistakes are prevented and communication is improved.

Invoice capturing enables organizations to scale their business, increase productivity of skilled employees and reduce costs significantly.

So in conclusion, a properly functioning invoice capturing process has a significant impact on an organization. Read on to find out which options an organization has to capture data from invoices.

Types of invoice capturing

In the following paragraphs, we want to give a clear overview of the three different types of invoice capturing and present the advantages and disadvantages of each of them.

  • Manual data capture
  • Automated data capture
  • Human-in-the-loop

Manual data capture

Manual invoice data capturing has been done for many years, but it becomes more challenging as organizations grow. Accounts payable teams often receive invoices in various formats such as hard copy, email attachment or fax, which makes extracting the data by hand increasingly difficult.

Let’s discuss the advantages and disadvantages first to help you form your own opinion.


Advantages

  • For small organizations, manual data capture might be the smartest option, as no monthly software license fee needs to be paid
  • Manual data capture is the easiest one to implement
  • Organizations with low volume of invoices can keep an overview of expenses


Disadvantages

  • The process is time-consuming, which leads to delayed payments or, even worse, late payment penalties → 45% of invoices take a week or longer to process
  • Manual data capture is error-prone, which leads to unreliable information and costly mistakes
  • High costs → Research firms like the Everest Group have found that manually processing a single invoice can cost between $12 and $30 because the process is time-consuming and involves a lot of resources
  • Low productivity since skilled employees have to spend their time on tedious, manual operations instead of using their skills for value-adding work
  • Poor scalability as the manual management of invoices becomes almost impossible when an organization grows
  • Poor supplier management because of delayed payments, poor communication or mistakes
  • Poor visibility due to paper-based invoices, making it difficult to share an invoice across departments
  • Medium to large organizations experience a backlog because of the volume of invoices, causing a delay in payment

Once organizations grow and experience an increased volume of invoices, manual invoice processing becomes an unattractive option.

Automated data capture

Fully automated data capture makes use of a software solution that is able to extract structured data from invoices without any human intervention. The software is based on artificial intelligence and machine learning, making it possible to improve accuracy, work more efficiently and save valuable resources.

With the implementation of automated data capture, a lot of the challenges of manual data capture can be overcome. Let’s have a look at the advantages and disadvantages of this solution. 


Advantages

  • Enhanced data accuracy because the algorithm extracts the exact data that is presented
  • 40–70% faster invoice processing, as the process is not limited by employees' working hours and capacity
  • Cost-effective because invoices can be processed quicker, which saves employees' working hours and prevents organizations from paying late payment penalties
  • Scalable as processes are not dependent on humans
  • Algorithms don't get tired or bored from repetitive tasks, no break or sleep needed
  • Increased productivity as accounts payable clerks can focus their time on value-generating tasks instead of repetitive, time-consuming tasks
  • Predictable invoice capturing process, which allows organizations to ensure payment of clients and suppliers at a predetermined time and date


Disadvantages

  • Algorithms might still make errors when confronted with more complicated invoice formats
  • A fundamental change in management and overall processes would be required
  • Technical know-how to maintain the software might be required

Human-in-the-loop (HITL)

Capturing data from invoices with a Human-in-the-loop setup combines the best of a fully automated invoice capturing solution (AI) with the best of human intelligence.

This data capturing process makes use of software that is powered by artificial intelligence and machine learning, but also involves a human that validates and approves the extracted data. That way the highest possible accuracy can be achieved.  


Advantages

  • Errors made by the software can be corrected by the human before data is stored in the database
  • The best of human intelligence is combined with the best of artificial intelligence
  • Highest possible level of accuracy because the extracted data is double-checked by a human before it is saved in the database
  • Makes use of the speed of automated invoice data capturing
  • Even data from difficult invoice formats can be extracted, as human and AI work together


Disadvantages

  • It still involves the resources of an employee
  • The data capturing process is limited by the working hours and capacity of the human-in-the-loop
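
To make the human-in-the-loop idea more concrete, here is a minimal Python sketch of how extracted fields could be routed to a reviewer. It assumes that the extraction software attaches a confidence score to every field and that only low-confidence fields are shown to the human; the `ExtractedField` class, the score values and the 0.90 threshold are all hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical score between 0.0 and 1.0


REVIEW_THRESHOLD = 0.90  # assumed cut-off, to be tuned per organization


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Return the fields a human should double-check before saving."""
    return [f for f in fields if f.confidence < threshold]


# Made-up example data for a single invoice.
invoice = [
    ExtractedField("supplier_name", "Acme B.V.", 0.98),
    ExtractedField("invoice_number", "INV-2022-0042", 0.95),
    ExtractedField("total_amount", "1210.00", 0.71),
]

for field in fields_needing_review(invoice):
    print(f"Please verify {field.name}: {field.value}")
```

With a setup like this, the reviewer only spends time on the fields the software is unsure about, which keeps most of the speed of the automated process.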

Comparison of the three options

The table below shows a comparison between the three options: manual, automated and human-in-the-loop data capture.

With this overview, it quickly becomes clear which option might be best for your organization.


For many organizations, manual data capture is not an option due to the high volume of invoices that need to be processed. In these cases, an automated invoice capturing solution becomes a necessity. 

In the last sections of this blog, we will discuss the functionalities of an invoice capturing solution, how it works and how it can be integrated.

What can an invoice capturing solution do?

Now that we have discussed the three different types of invoice data capturing, we would like to dive deeper into the details of an invoice capturing solution. We will discuss the following functionalities of the solution:

  • Scanning
  • Optical Character Recognition (OCR)
  • Data extraction
  • Conversion
  • Classification
  • Data anonymization
  • Fraud detection


Scanning

An invoice capturing solution offers scanning functionality. Here, scanning refers to the process of reading an invoice with OCR, which is explained further in the next paragraph.

Once the scanning process is complete, all relevant information is converted into structured data. Large parts of the scanning process can be automated by smart solutions like the Klippa OCR API.

Optical Character Recognition (OCR)

OCR turns an image into text and then into a machine-readable format such as CSV, JSON, or XML. With this technology, manual data capturing from invoices is history!

OCR is powered by AI and has proven itself to be very helpful with invoice data capturing, as it helps employees to work faster and with a minimal error rate.  

Data extraction

An invoice capturing solution also offers the functionality of data extraction. The organization’s operations and processes can be further improved by extracting data from an invoice, and then processing, storing and analyzing this data accordingly. 


Conversion

Pictures are sometimes difficult to work with. That is why the invoice capturing solution has the functionality to convert a picture into a digital text file.

That way, pictures and PDFs can be converted into formats such as JSON, CSV, XML, PDF/A, and XLSX. 
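
To illustrate the last part of such a conversion, the sketch below turns already extracted invoice data from JSON into CSV using only the Python standard library. The JSON layout and field names are made up for this example and do not represent the actual output format of any particular product.

```python
import csv
import io
import json

# Made-up JSON for one invoice with two line items.
json_text = """
{
  "invoice_number": "INV-2022-0042",
  "lines": [
    {"description": "Paper A4", "quantity": 10, "price": 4.50},
    {"description": "Toner", "quantity": 2, "price": 59.90}
  ]
}
"""


def lines_to_csv(invoice_json: str) -> str:
    """Flatten the line items of one invoice into CSV text."""
    invoice = json.loads(invoice_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["invoice_number", "description", "quantity", "price"],
        lineterminator="\n",
    )
    writer.writeheader()
    for line in invoice["lines"]:
        # Repeat the invoice number on every row so each row stands alone.
        writer.writerow({"invoice_number": invoice["invoice_number"], **line})
    return buffer.getvalue()


csv_text = lines_to_csv(json_text)
print(csv_text)
```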


Classification

An additional functionality of an invoice capturing solution is the sorting and classification of unknown documents. This process is also known as document classification and works with the help of smart algorithms.

Characteristics of a document are extracted and sent to an algorithm, which will then determine how to classify the document. This can be useful for a number of use cases. One example could be the classification of a document with sensitive data, such as the name and address of a client on an invoice, resulting in the action of data anonymization.
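
The idea of extracting characteristics and passing them to an algorithm can be sketched in a few lines of Python. Note that this is a deliberately simple keyword-based stand-in, not the machine learning approach a production solution would use; the labels and keyword sets are assumptions chosen for the example.

```python
import re

# Assumed keyword sets per document type, purely for illustration.
KEYWORDS = {
    "invoice": {"invoice", "vat", "due", "payment"},
    "receipt": {"receipt", "cash", "change", "card"},
}


def extract_features(text):
    """Characteristics of the document: here simply its lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text):
    """Pick the label whose keywords overlap most with the document."""
    words = extract_features(text)
    scores = {label: len(words & keywords) for label, keywords in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(classify("Invoice INV-42, VAT 21%, payment due 2022-03-01"))
print(classify("Receipt: paid by card, change 0.50"))
```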


Data anonymization

Data anonymization is also known as data masking. With this security technique, sensitive data on documents can be masked and abuse of information prevented.
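
As a rough illustration of masking, the Python sketch below replaces e-mail addresses and hides most of an IBAN in a piece of text. The two regular expressions are simplified assumptions for this example and will not catch every real-world format.

```python
import re

# Simplified patterns, assumed for this example only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def mask(text):
    """Replace e-mail addresses and hide all but the first 4 IBAN characters."""
    text = EMAIL.sub("[EMAIL]", text)
    text = IBAN.sub(lambda m: m.group()[:4] + "*" * (len(m.group()) - 4), text)
    return text


masked = mask("Contact: jane.doe@example.com, IBAN NL91ABNA0417164300")
print(masked)
```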

Our ultimate guide to data masking offers a more in depth explanation of the data anonymization process and explains the different techniques that can be used.   

Fraud detection

Sadly, fraudsters often use tools such as Photoshop to create fake invoices or manipulate the total amount of a purchase.

An invoice capturing solution can be used to detect these fraud attempts. The software automatically flags these invoices, which can save an organization a lot of money.

How does invoice capturing work?

We all know how the manual process of invoice capturing works, but how does an automated solution like Klippa DocHorizon execute the task? In the coming paragraphs, we will discuss the four steps of the automated invoice capturing process:

  1. Capturing and uploading the invoice to the API
  2. Converting the image into a text file
  3. Parsing TXT into JSON
  4. Verifying the extracted data

1. Capturing and uploading the invoice to the API

In the first step, an image or PDF file of an invoice has to be uploaded to the API. This can be done via a mobile or web application.

The invoice can be uploaded either with or without the background. If the image is sent uncropped, the API will automatically cut out the background.

It is important that the image contains the entire invoice, without any noise and with decent quality to ensure an accurate result. If this is hard to achieve with an existing solution, the image quality can be enhanced by using our mobile scanning SDK.


2. Converting the image into a text file

In the next step, our invoice capturing solution will automatically convert the image into text (TXT). The software recognizes what each part of the invoice actually is. It will determine which part is, e.g., the invoice number, total amount, the address or the purchasing date.

The data from the invoice is extracted, but not yet structured.


3. Parsing TXT into JSON

In the third step, the Klippa parser converts the text file into JSON with the help of machine learning.

JSON is commonly used for transmitting data in web applications, as it is a standard text-based format for the representation of structured data. 

From here, it will be very easy to process the captured invoice data in your database. 
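
To give an impression of what parsing text into JSON means, here is a small Python sketch. The parser described above relies on machine learning; this example swaps in plain regular expressions purely for illustration, and the sample text, labels and field names are assumptions.

```python
import json
import re

# Made-up OCR output for one invoice.
ocr_text = """ACME B.V.
Invoice number: INV-2022-0042
Invoice date: 2022-02-14
Total: EUR 1,210.00
"""

# Assumed label patterns; real invoices vary far more than this.
PATTERNS = {
    "invoice_number": r"Invoice number:\s*(\S+)",
    "invoice_date": r"Invoice date:\s*(\d{4}-\d{2}-\d{2})",
    "total_amount": r"Total:\s*[A-Z]{3}\s*([\d,]+\.\d{2})",
}


def parse_invoice_text(text):
    """Turn unstructured text into a dictionary of named fields."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    if result["total_amount"] is not None:
        # Strip the thousands separator and store the amount as a number.
        result["total_amount"] = float(result["total_amount"].replace(",", ""))
    return result


parsed = parse_invoice_text(ocr_text)
print(json.dumps(parsed, indent=2))
```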


4. Verifying the extracted data

This step is optional and used to verify that the data is captured with high quality, accuracy and consistency.

Usually, the verification process is executed with a third-party source, such as the Chamber of Commerce database.
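
A verification step of this kind could look roughly like the Python sketch below. The `KNOWN_REGISTRY` dictionary is a stand-in for a third-party source (a real setup would query an external database), and the additional arithmetic check on the amounts is our own assumption about what a useful consistency check might be. All field names are hypothetical.

```python
from decimal import Decimal

# Stand-in for a third-party registry; real data would come from an external source.
KNOWN_REGISTRY = {"12345678": "Acme B.V."}


def verify(invoice):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    registered_name = KNOWN_REGISTRY.get(invoice.get("coc_number"))
    if registered_name is None:
        problems.append("unknown chamber of commerce number")
    elif registered_name != invoice.get("merchant_name"):
        problems.append("merchant name does not match registry")

    # Decimal avoids floating-point rounding issues with money amounts.
    net, vat, total = (
        Decimal(invoice[key]) for key in ("net_amount", "vat_amount", "total_amount")
    )
    if net + vat != total:
        problems.append("net + VAT does not equal total")
    return problems


good = {
    "coc_number": "12345678",
    "merchant_name": "Acme B.V.",
    "net_amount": "1000.00",
    "vat_amount": "210.00",
    "total_amount": "1210.00",
}
bad = {**good, "merchant_name": "Acme Ltd", "total_amount": "1310.00"}

print(verify(good))
print(verify(bad))
```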

Which fields from the invoice can be captured?

Before an organization commits to an invoice capturing solution, it should ensure that the solution can capture the relevant fields.

Klippa DocHorizon can extract the following fields from an invoice: 

  • Type of the document
  • Invoice number
  • Language on the invoice
  • Country of origin
  • Name of the merchant
  • Address details of the merchant
  • Contact details of the merchant
  • Website of the merchant
  • Details of the client
  • Method of payment
  • VAT amounts and the total amount
  • VAT number
  • Amount of change
  • Card number
  • The currency and the total amount
  • Purchasing date
  • Purchase order number
  • Due date
  • Delivery date
  • Chamber of commerce number
  • Line item prices, quantity, description and category

Besides extracting the above-mentioned fields, the invoice capturing solution can perform automated checks. Those consist of cross-checks to identify fraudulent invoices and image hashing to find duplicates. 
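
The duplicate check can be illustrated with a short Python sketch. As a simplifying assumption, it uses a SHA-256 hash of the file's bytes, which only finds byte-identical uploads; recognizing a rescanned or re-photographed invoice would require a perceptual image hash instead. The class and file names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hash the raw file content; identical files give identical digests."""
    return hashlib.sha256(data).hexdigest()


class DuplicateDetector:
    def __init__(self):
        self._seen = {}  # digest -> name of the first file with that content

    def check(self, name, data):
        """Return the name of an earlier identical file, or None if new."""
        digest = fingerprint(data)
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = name
        return earlier


detector = DuplicateDetector()
first = detector.check("invoice_a.pdf", b"%PDF-1.7 sample bytes")
second = detector.check("invoice_b.pdf", b"%PDF-1.7 sample bytes")
third = detector.check("invoice_c.pdf", b"%PDF-1.7 other bytes")

print(first, second, third)
```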

How can our solution be integrated – the difference between the Klippa API and SDK?

If you are thinking of using Klippa as your solution, you should know the difference between the Klippa API and SDK. 

Our Document Scanner SDK can help to improve the quality of an image by optimizing the brightness and identifying edges of the image. It can be implemented in iOS and Android apps to help organizations capture data from invoices. Features such as perspective correction, real-time user feedback and light adjustment ensure that users can take pictures in the best possible quality. 

This makes further processing of these invoices easier and more accurate. Our SDK basically allows an organization to turn any device into an accurate and fast invoice scanner. 

Our API allows you to send a document through a mobile app, e-mail or web application to our software, which returns the data in a structured format within seconds. If you already have existing software or an application that you want to integrate OCR technology into, then our OCR API is often the best choice.

Furthermore, the Klippa SDK can be linked to our cloud-based OCR API. This means that a scanned document is processed through the API, resulting in a JSON response that is returned to the application.

One of the main advantages is, of course, that by using our solutions, the developers of your organization don’t have to develop all the components themselves. This saves costs, time and other resources.   

Klippa as the solution for invoice capturing

Throughout the blog, we have already mentioned a couple of features of the Klippa DocHorizon software and how it can help an organization to optimize their invoice data capturing process.

Our invoice capturing solution can be integrated via the Klippa API or SDK. As it is created in a developer-friendly way, our components can easily be integrated into the existing software of an organization.

As our solution is well documented, developers will find all information they need to successfully implement the API. That way, the Klippa solution can be integrated within a day.

If you are looking for an invoice capturing solution that relieves accounts payable teams from daunting and repetitive work, then Klippa is here to help you. Get in touch with us by filling in the demo form below or contact one of our experts. We are excited to work with you!  

 Schedule a free online demonstration

A clear overview of Klippa in only 30 minutes.
