Imagine that you have one hundred purchase orders, and each one contains a large amount of information including but not limited to the requested materials and delivery requirements. Without OCR (optical character recognition) technology, you must manually parse and input the important information into the system in a time-consuming process that is prone to human error. With the use of an OCR engine, such as the one we have implemented in OMS+, orders are automatically parsed, and the information mapped into the system.
The popular OCR engines that exist today can handle the most common file types, such as pdf, jpg, jpeg, png, etc. Some can handle multi-page documents. Amazon’s Textract supports multi-language processing: currently English, Spanish, Italian, Portuguese, French, German; as well as handwriting for English. In a follow up post, we will make a detailed comparison of the most popular OCR engines. OCR, just like all machine learning technology, is not perfect, and to achieve the best results, one should take the following into consideration:
- Provide higher definition images, ideally at least 150 DPI.
- If your document is already in one of the file formats that the OCR engine supports, don’t convert or downsample it before uploading it to the OCR engine.
- Table feature works best when the tables in your document are visually separated from surrounding elements on the page (e.g. not overlaid on an image or complex pattern), and the text within the table is upright (e.g. not rotated relative to other text on the page)
Other than the data preparation requirements to make OCR engines go, those who are looking for OCR technology as part of their solution should align their business needs and expectations with what exists on the market. Most customers are exploring applications of OCR in their businesses to solve one main problem: Automate the manual process of data entry from documents received into business systems.
This means that customers are looking for a comprehensive solution that goes beyond OCR:
- Read typed and handwritten text from documents and images (OCR)
- Map recognized text to critical business attributes (labeling)
Most applications of OCR that exist on the market can perform text recognition (point 1 above) with good accuracy. However, caution must be taken when evaluating how the OCR engine labels the recognized text. Readily available OCR engines on the market today are trained with general data and might not be built for orders or invoices. So, most texts will not be labelled in a way that we specifically want.
The reason that we must keep the labels consistent is because SAP requires a very specific way of labelling data in order to form an order. However, the labelling from OCR engines usually just find the relationship between the boxes and title as figures below. As you can see, similar data in different forms are labelled in very different ways. However, in order to allow SAP to directly create an order using the OCR data, we will need to make the label consistent. (i.e. Label “customer_name” will not be “customer name” or “CustomerName”). It is impossible for the OCR engine to achieve this consistency. Therefore, it is essential to build an additional layer that could label the texts in a consistent and accurate way. This additional layer will prepare the text data specifically for SAP creation.
In one application of OCR in OMS+, the order creation process is optimized from the moment a purchase order comes in from the customer to a sales order being created in SAP. OMS+’s easy to use user interface organizes the recognized text along with the predicted labels into an order that’s ready for processing in SAP. As with any type of machine learning application, error handling is an important part of the equation as predictions get closer to 100% accuracy over time. Therefore, any application that leverages OCR to predict labels for the text on a document must enable the efficient handling of recognition and prediction errors.
Three points are of particular importance when choosing software that enables OCR for any type business process:
- The application, no matter if the OCR technology used is homegrown or integrated from a third party like Google Document AI, should be fit to the particular use case so that recognized text that is not useful to the scenario should be filtered out.
- The application should enable the user to review and revise the predictions efficiently to prepare for the next step in the process.
- The application should gather data from user feedback such as revisions for use in an iterative machine learning cycle.
OMS+ OCR is part of the OMS+ automated order processing function. The entire function is a “three-layer cake,” a term coined by our senior software engineer while building the solution. The OCR engine plays the role of the first layer. The second and third layers are what distinguish OMS+ OCR from standard OCR engine.
- Text Recognition: Files containing customer purchase orders are picked up in OMS+, and OMS+ sends the documents to the OCR engine via an API call. The OCR engine parses these order documents and returns the recognized text to OMS+.
- Label Prediction: However, because the OCR engines are managed by the third party as services, they are like a black box for us. We input data and get output without knowing how the output is generated. Output format and data are fixed, which does not give us much flexibility to modify those data directly from OCR engine. Plus, most of the OCR engines offer analyze functionality to the text data, which the text data will be automatically labelled by the OCR engines. But the labels by the OCR engine are lack of accuracy. Thus, we need to build additional layers to parsing the result that would label most of the text correctly (John Smith as name, 800-555-7777 as phone number, etc.). We use SAP Intelligence to parse the output from OCR engine. The parse process (called bulk) is labelling text by ourselves. Bulk has two steps. The first step, we will use a machine learning model to determine if this text is valuable for us and filter out the useless text. The second step, we will use another machine learning model to label the rest of the text.
- OMS+ will extract the material data, send to our material search model, which is also hosted by the SAP Data Intelligence. The material search model will go through each input material. Transform the descriptions and find the matching items in the client database. The items will be extracted and automatically put into an SAP order table. After all the material are put in this SAP order table, the table will send back to OMS+ and display to users nicely. Of course, you can edit any data on that OMS+ page. Then an SAP order with the required materials is generated.
As you can see, the second and third layers are built by DataXstream. Those two machine learning models will become smarter and smarter each day as the business uses it to parse and create orders. These machine learning models can learn to automatically categorize the input data. So more data they are fed, the better accuracy of categorizing and labelling they will achieve.