OCR will approach human levels

Optical Character Recognition (OCR) is the tool that converts an image of a document into the text it contains. For a long time, OCR was such a bottleneck for document understanding that many end-to-end solutions were simply bundled into the OCR itself.

But after a decade of deep learning advances in image processing, OCR is improving so quickly that the best commercial models are nearing perfection for printed business documents, and open source models outperform industry-standard proprietary models from just a few years ago.

If this trend continues, OCR for scanned business documents will soon be completely commoditized, and the underlying algorithms will be effectively free. Customers will pay for cloud operations and support, not for proprietary algorithms.

OCR companies will shift to differentiating along three new axes:

  • Latency. High-speed OCR to support interactive client applications, video, and AR/VR
  • Handwriting. The holy grail of OCR, still mostly out of reach today except for extremely constrained scenarios
  • Templates 2.0. OCR used to be the software that did most document processing, and there is a good argument that housing some forms of extraction inside OCR has a long future