Receipt Transaction Recorder

Record and categorize a transaction history based on receipt images


Project maintained by kburgon

Abstract

It is difficult to keep track of purchases and to build a budget from a transaction history. Common applications do not track purchases by the individual items bought, which makes it hard to create a well-categorized budget. The purpose of this project is to use Optical Character Recognition (OCR) to keep a categorizable transaction history based on the text extracted from receipt images.

An example receipt with purchased items that could fit in different categories

Problem Description

One of the more difficult parts of keeping a budget is tracking expenses and splitting them into categories. The task becomes harder when a single transaction includes items that belong in different budget categories. Applications exist that track and categorize bank-account transactions, but they treat each transaction as a whole and do not account for transactions whose items belong to different categories.

Proposed Solution

The solution I propose is to build an application that reads text from receipt images and tracks expenses per item purchased rather than by the total spent on each purchase. Such an application is a natural fit for mobile devices and could potentially be written for both Android and iOS.
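To make per-item tracking concrete, here is a minimal Python sketch of the bookkeeping side, under stated assumptions: items have already been extracted from a receipt as (name, price) pairs, and each item is assigned a budget category through a hypothetical keyword table (`CATEGORY_KEYWORDS`). Neither the table nor the `categorize` and `totals_by_category` helpers come from the project; they only illustrate how one transaction can be split across several budget categories.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical keyword-to-category table; a real user would supply or edit this.
CATEGORY_KEYWORDS = {
    "groceries": ("milk", "bread", "eggs", "banana"),
    "household": ("detergent", "paper towel", "light bulb"),
}


def categorize(item_name: str) -> str:
    """Return the first category whose keywords appear in the item name."""
    name = item_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return "uncategorized"


def totals_by_category(items: list[tuple[str, Decimal]]) -> dict[str, Decimal]:
    """Sum item prices per budget category instead of per transaction."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for name, price in items:
        totals[categorize(name)] += price
    return dict(totals)


# Made-up items from a single receipt.
receipt = [
    ("2% MILK", Decimal("3.49")),
    ("WHEAT BREAD", Decimal("2.99")),
    ("LAUNDRY DETERGENT", Decimal("11.97")),
    ("PHONE CHARGER", Decimal("14.99")),
]
print(totals_by_category(receipt))
```

With this made-up receipt, the single transaction is split into groceries, household, and uncategorized totals rather than recorded as one lump sum. `Decimal` is used instead of `float` so that currency amounts add up exactly.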

Method of Completion

An experimental application was written in Python that reads receipt images, collects all of the text recognized in each image, and generates a list of the purchased items and their prices. An OCR engine was used to extract text from the receipt images, and regular expressions were used to find purchased items and their prices. The application's processing flow was designed as follows:

Flow of the Receipt Transaction Recorder application
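The regular-expression step of this flow can be sketched in Python as follows. This is a minimal sketch, not the project's code: the assumed line format (an item description followed by a price with two decimal places, an optional "$", and an optional one-letter tax flag), both regular expressions, and the list of summary keywords are all assumptions. The OCR step itself is left out; the hypothetical `parse_items` function takes text that an OCR engine has already returned.

```python
import re
from decimal import Decimal

# Assumed item-line format: a description containing at least one letter, then a
# price such as "3.49" (optionally prefixed by "$"), then an optional tax flag like "T".
ITEM_LINE = re.compile(
    r"^(?P<name>.*?[A-Za-z].*?)\s+\$?(?P<price>\d+\.\d{2})(?:\s+[A-Z])?$"
)

# Hypothetical summary keywords: these lines also end in a price but are not items.
NON_ITEM_WORDS = re.compile(
    r"\b(SUBTOTAL|TOTAL|TAX|CHANGE|CASH|BALANCE)\b", re.IGNORECASE
)


def parse_items(ocr_text: str) -> list[tuple[str, Decimal]]:
    """Return (item name, price) pairs for every line that looks like a charge."""
    items = []
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line or NON_ITEM_WORDS.search(line):
            continue  # skip blank lines and subtotal/tax/total lines
        match = ITEM_LINE.match(line)
        if match:
            items.append((match.group("name").strip(), Decimal(match.group("price"))))
    return items


# Made-up OCR output for illustration.
sample = """GROCERY STORE
BANANAS 1.29
2% MILK $3.49 T
SUBTOTAL 4.78
TAX 0.21
TOTAL 4.99"""
print(parse_items(sample))
```

A pattern this simple accepts any line that ends in a price, so a payment line such as `VISA 4.99` would still be recorded as a purchased item. This is the kind of false positive described under Item Recognition.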

Results

The original plan for the project was to read receipt images with Tesseract, an open-source OCR engine [2]. It quickly became clear that Tesseract was not very accurate on these images and that a substantial amount of image preprocessing would be needed to get even partially accurate results. Google Cloud Vision [1] was tried as an alternative OCR engine; it gave much better results and is easier to use across multiple platforms.

Comparison of results in text extraction using Google Cloud Vision vs. Tesseract

Challenges

Reading Text Under Different Lighting Conditions

While Google Cloud Vision is accurate under good lighting conditions, shadows or generally dark lighting make it harder to single out text in an image. Threshold filtering was applied to the images as a preprocessing step, but because Google Cloud Vision already applies its own image filtering, this made no difference to the results.
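The write-up does not say which thresholding method was tried. As an illustration only, here is a minimal pure-Python sketch of one common global method, Otsu's method, which picks the gray level that best separates dark pixels (text) from light pixels (paper) by maximizing the between-class variance. The function names and the flat list-of-gray-levels input are assumptions made to keep the example self-contained.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t (0-255) that maximizes between-class variance,
    treating pixels <= t as one class and pixels > t as the other."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low = 0  # number of pixels at or below the candidate threshold
    sum_low = 0     # sum of their gray levels
    for t in range(256):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so this split separates nothing
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if value > threshold else 0 for value in pixels]


# Made-up gray levels: four dark "ink" pixels and four light "paper" pixels.
sample = [12, 15, 20, 18, 230, 225, 240, 235]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

A single global threshold like this one cannot compensate for a shadow that darkens only part of a receipt; local (adaptive) thresholding is the usual choice for uneven lighting. With OpenCV, the same two ideas are available as `cv2.threshold` with the `cv2.THRESH_OTSU` flag and `cv2.adaptiveThreshold`.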

Item Recognition

Receipts from eight different businesses were tested. The results suggested that pulling individual charges out of the full set of receipt text is generally unreliable and never produced completely correct results. A variety of issues were encountered, and many of them cannot be resolved by regular expressions or simple string-matching algorithms alone.

Erroneous text read from a wrinkled receipt.

Example of text that appears out of order, which makes it difficult to match charges to the items purchased.

Example of a set of text that was incorrectly recognized as a purchased item.

Conclusion

By using the Google Cloud Vision API and regular expressions, an application was made that could read some purchases and prices from receipt images. Future improvements will include algorithms to learn items purchased and their prices from text locations and user input.
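As one possible direction for using text locations, here is a hedged sketch. Google Cloud Vision reports a bounding polygon for each piece of detected text, so words could be regrouped into visual rows before item matching. That would re-attach a price to its item name even when the OCR output lists them out of order. The `Word` class (a simplified stand-in for the API's annotations, holding only a word and the center of its box), the `group_into_rows` helper, and the 10-pixel tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical simplified OCR result: a word and the center of its bounding box."""
    text: str
    x: float
    y: float


def group_into_rows(words: list[Word], tolerance: float = 10.0) -> list[str]:
    """Group words whose vertical centers lie within `tolerance` pixels of the
    first word in the current row, then read each row from left to right."""
    rows: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        if rows and abs(word.y - rows[-1][0].y) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [" ".join(w.text for w in sorted(row, key=lambda w: w.x)) for row in rows]


# Made-up words in the scrambled order an OCR engine might return them.
words = [
    Word("3.49", 300, 52),
    Word("BANANAS", 20, 20),
    Word("MILK", 20, 50),
    Word("1.29", 300, 18),
]
print(group_into_rows(words))
```

Comparing each word against only the first word of a row keeps the rule simple. The drawback is that a fixed tolerance can merge or split rows on tilted or wrinkled receipts, which is one reason combining location with user input, as the conclusion suggests, may be needed.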

References

  1. Google Cloud Vision API, https://cloud.google.com/vision/, Last Accessed 12/03/2016
  2. Tesseract, https://github.com/tesseract-ocr/tesseract/blob/master/README.md, Last Accessed 12/10/2016