Every model in 60 seconds

Are you ready? Let's power through the different types of models you might encounter in engineering and sales meetings. For each model, you'll find a layman's explanation along with the high-level arguments for and against that model type.

Templates

Associated with
OCR software, form processing
What it often means
Hard-coded regions of a page that correspond to output fields
Advocates would say
Don't overcomplicate things. If you have a known, standard form like a tax filing, this is all you need to get fast and accurate straight-through automation. Plus, you don't need to be a programmer: anyone can point and click to define what they want extracted.
Detractors would say
Even standard IRS forms change from year to year, sometimes many times in a year! These small changes require you to create new templates. Before you know it, you're drowning under a mountain of templates to maintain. Plus, you'll still need another more advanced extraction system to handle the more nuanced documents.
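
As a rough illustration of what "hard-coded regions" amounts to, here is a minimal Python sketch. The field names, coordinates, and the Word structure are all hypothetical; a real system would take its word positions from an OCR engine.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word, in page units
    y: float  # top edge of the word, in page units


# Hypothetical template: field name -> (x_min, y_min, x_max, y_max)
TEMPLATE = {
    "name": (100, 50, 400, 80),
    "tax_year": (450, 50, 550, 80),
}


def extract_with_template(words, template):
    """Collect the words whose top-left corner falls inside each region."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order within the box
        result[field] = " ".join(w.text for w in inside)
    return result


words = [
    Word("Jane", 110, 60),
    Word("Doe", 160, 60),
    Word("2021", 460, 60),
    Word("Signature", 100, 700),  # outside every region, so ignored
]
fields = extract_with_template(words, TEMPLATE)
```

If next year's form shifts the name box, these coordinates stop matching and a new template is needed, which is the maintenance burden detractors point to.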

Rules

Associated with
explainability, expert systems, business process modeling, systems with firm behavior requirements, highly regulated industries
What it often means
Rule engines and domain-specific languages, but also often ordinary computer code, regular expressions, and classic NLP libraries.
Advocates would say
Rules are fast, tested, and inevitable. Everyone uses them internally at least somewhere, even big tech companies famous for their deep learning. Rules are great at capturing expert guidance ("Look for the digits to the right of the text 'ID Number'"), and result in exquisitely explainable models. As MIT Professor Max Tegmark says, Newton may have discovered gravity using the neural net inside his head, but he encoded and distributed that knowledge as a rule: F = ma.
Detractors would say
Rules may work for small use cases, but they don't scale to large ones. Scaling, for rule systems, means more programming: lots of long-term coordination between domain experts, operations, and programmers. Past a certain point of scale, human programmers struggle to capture the complexity of the real-world system they are modeling. We just can't hold that many possibilities in our heads at once. That results in hard-to-maintain codebases of "spaghetti code" that, taken collectively, lose the elegance of F = ma. Rules also result in "all or nothing" systems in which results are claimed to be either right or wrong, with little confidence scoring in between.
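
The expert guidance quoted above ("Look for the digits to the right of the text 'ID Number'") might be written as a rule like the following. This is only a sketch in Python; the function name and the exact pattern are assumptions made for illustration.

```python
import re

# Digits to the right of the text "ID Number", allowing an optional colon
ID_RULE = re.compile(r"ID Number\s*:?\s*(\d+)")


def extract_id(text):
    """Return the digits following 'ID Number', or None if the rule doesn't fire."""
    match = ID_RULE.search(text)
    return match.group(1) if match else None


found = extract_id("Name: Jane Doe  ID Number: 48213  Date: 2021")
missed = extract_id("Name: Jane Doe  ID No. 48213")
```

The second call returns None because the document says "ID No." instead of "ID Number". The rule either fires or it doesn't, with no confidence score in between, and covering the variant means writing more rules.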

Learned Rules

Associated with
Domain-specific languages, decision trees, program synthesis
What it really means
Learning a human-interpretable computer program to perform extraction.
Advocates would say
This is a holy grail of machine learning that still hasn't transitioned fully from research to industry. It blends the flexibility and nuance of learned models with a representation humans can audit, modify, and constrain if necessary.
Detractors would say
Outcomes vary wildly based upon the input representation and rule expressiveness. It's therefore hard to speak with generality about this approach; case-by-case discussions are necessary.
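
To make the idea concrete, here is a toy Python sketch that "learns" a rule by trying every program in a tiny rule language and keeping the one that reproduces the most labeled examples. The rule language, anchors, and patterns are all hypothetical; real program synthesis systems search far larger spaces with far smarter strategies.

```python
import re
from itertools import product

# Hypothetical mini-language: a rule reads "take <value kind> after <anchor text>"
ANCHORS = ["ID Number", "Invoice", "Total"]
VALUE_PATTERNS = {"digits": r"\d+", "word": r"[A-Za-z]+", "amount": r"\d+\.\d{2}"}


def apply_rule(rule, text):
    """Run one (anchor, kind) rule against a document's text."""
    anchor, kind = rule
    pattern = re.escape(anchor) + r"\W*(" + VALUE_PATTERNS[kind] + ")"
    match = re.search(pattern, text)
    return match.group(1) if match else None


def learn_rule(examples):
    """Return the candidate rule that reproduces the most labeled examples."""
    candidates = product(ANCHORS, VALUE_PATTERNS)
    return max(
        candidates,
        key=lambda rule: sum(apply_rule(rule, text) == label for text, label in examples),
    )


# Labeled examples: (document text, value a human marked as correct)
examples = [
    ("ID Number: 48213 Total: 19.99", "19.99"),
    ("Total: 250.00 ID Number: 7", "250.00"),
]
rule = learn_rule(examples)
prediction = apply_rule(rule, "Invoice 88 Total: 5.00")
```

The learned rule is a readable pair, ("Total", "amount"), that a person can audit or edit. It also shows the detractors' point: the learner can only ever find rules that this small language is able to express.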

Classic NLP

Associated with
hybrid approaches: one part rules, one part Python, and one part classic machine learning
Words you might hear
StanfordNLP, spaCy, Parsing, Named Entity Recognition
What it really means
Using "classic NLP" to do document extraction often means using the results of part-of-speech tagging, dependency parsing, and named entity recognition, along with Python and regular expressions, to build a hybrid system that's neither fully learned nor fully heuristic.
Advocates would say
This approach lets you have your cake and eat it too. Fantastic standard libraries exist that anyone with basic training can invoke from Python to write rules that generalize better. The result combines human rigidity with statistical flexibility.
Detractors would say
This ends up being equivalent to just writing better rules. It gets you more mileage, but ultimately doesn't change the scaling constraints rules face. Errors present in the lower-level models propagate up, uncorrected, to the higher-level rules and glue code that bind them. And past a certain point of complexity, systems built in this style struggle to perform.

Deep Learning

Associated with
GPUs, enormous datasets, the new wave of NLU approaches
Words you might hear
Transformers, FastText, GPT, BERT and its variants
What it really means
Using layered neural nets to learn a representation of documents at the same time you learn to extract data from that representation.
Advocates would say
Deep learning is where NLU is headed, so you might as well invest in switching now. It lets you solve problems with far more nuance than other approaches, using only data annotations, and benefiting from general linguistic knowledge via pre-training. It's the closest thing to a one-size-fits-all approach the NLU world has seen yet, and there's no sign we're nearing the limits of its potential.
Detractors would say
Deep learning can produce amazing results, but also amazingly strange results. Because it's largely a black box from the perspective of computer code, it can't be extended as directly as rules that capture expert knowledge. While it may be capable of great nuance, sometimes simplicity works just as well.