Every model in 60 seconds

Are you ready? Let's power through the different types of models you might encounter in engineering and sales meetings. For each model, you'll find a layman's explanation along with the high-level arguments for and against that model type.

Templates

  • Associated with
    OCR software, form processing
  • What it often means
    Hard-coded regions of a page that correspond to output fields
  • Advocates would say
    Don't overcomplicate things. If you have a known, standard form like a tax filing, this is all you need to get fast and accurate straight-through automation. Plus, you don't need to be a programmer: anyone can point and click to define what they want extracted.
  • Detractors would say
    Even standard IRS forms change from year to year, sometimes many times in a year! These small changes require you to create new templates. Before you know it, you're drowning under a mountain of templates to maintain. Plus, you'll still need another more advanced extraction system to handle the more nuanced documents.
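
To make "hard-coded regions of a page" concrete, here is a minimal Python sketch. It assumes an OCR step has already produced words with page coordinates. The `Word` type, the `TEMPLATE` field names, and the box coordinates are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


# Hypothetical template: field name -> (x_min, y_min, x_max, y_max).
TEMPLATE = {
    "name": (100, 50, 400, 80),
    "tax_year": (450, 50, 550, 80),
}


def extract(words, template):
    """Return the text found inside each field's hard-coded region."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x < x1 and y0 <= w.y < y1]
        inside.sort(key=lambda w: (w.y, w.x))  # rough reading order
        result[field] = " ".join(w.text for w in inside)
    return result


# Made-up OCR output for a single line of a form.
words = [
    Word("Name:", 20, 60),
    Word("Jane", 110, 60),
    Word("Doe", 160, 60),
    Word("2023", 460, 60),
]
fields = extract(words, TEMPLATE)
print(fields)
```

If a new version of the form moves these fields further down the page, the same template returns empty strings for both fields, which is the maintenance burden the detractors describe.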

Rules

  • Associated with
    explainability, expert systems, business process modeling, systems with firm behavior requirements, highly regulated industries
  • What it often means
    Rule engines and domain-specific languages. But often also ordinary computer code, regular expressions, and classic NLP libraries.
  • Advocates would say
    Rules are fast, tested, and inevitable. Everyone uses them internally at least somewhere, even big tech companies famous for their deep learning. Rules are great at capturing expert guidance ("Look for the digits to the right of the text 'ID Number'"), and result in exquisitely explainable models. As MIT Professor Max Tegmark says, Newton may have discovered gravity using the neural net inside his head, but he encoded and distributed that knowledge as a rule: F = ma.
  • Detractors would say
    Rules may work for small use cases, but they don't scale to large ones. Scaling, for rule systems, means more programming: lots of long-term coordination between domain experts, operations, and programmers. Past a certain point of scale, human programmers struggle to capture the complexity of the real-world system they are modeling. We just can't fit that many possibilities into our heads at once. That results in hard-to-maintain codebases of "spaghetti code" that, taken collectively, lose the elegance of F = ma. Rules also result in "all or nothing" systems in which results are either claimed to be right or wrong with little confidence scoring in between.
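
The expert guidance quoted above ("Look for the digits to the right of the text 'ID Number'") could be written as a single regular expression. This is only a sketch: the `ID_RULE` pattern, the `extract_id` helper, and the sample strings are illustrative assumptions, and a production rule would need to handle many more layouts.

```python
import re

# Rule: take the run of digits immediately to the right of the label
# "ID Number", allowing an optional colon and whitespace in between.
ID_RULE = re.compile(r"ID Number\s*:?\s*(\d+)")


def extract_id(text):
    """Return the digits after 'ID Number', or None if the rule does not fire."""
    match = ID_RULE.search(text)
    return match.group(1) if match else None


print(extract_id("Employee ID Number: 48213  Dept: 7"))
print(extract_id("Employee ID No. 48213"))
```

The second call returns `None` because the label is abbreviated. The rule either fires or it doesn't, with no confidence score in between, and covering "ID No." means writing and maintaining another rule.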

Learned Rules

  • Associated with
    Domain-specific languages, decision trees, program synthesis
  • What it really means
    Learning a human-interpretable computer program to perform extraction.
  • Advocates would say
    This is a holy grail of machine learning that still hasn't transitioned fully from research to industry. It blends the flexibility and nuance of learned models with a representation humans can audit, modify, and constrain if necessary.
  • Detractors would say
    Outcomes vary wildly based upon the input representation and rule expressiveness. It's therefore hard to speak with generality about this approach; case-by-case discussions are necessary.
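
As a toy illustration of learning a human-interpretable program, the sketch below searches a tiny rule language for a rule that reproduces a few labeled examples. It uses brute-force enumeration, one simple form of program synthesis, rather than decision trees. The rule language, `LABELS`, `VALUE_KINDS`, and the example documents are all hypothetical.

```python
import re
from itertools import product

# Hypothetical DSL: a rule is (label, value_kind), read as
# "find `label`, then return the first `value_kind` to its right".
LABELS = ["Total", "Invoice", "ID Number"]
VALUE_KINDS = {
    "digits": r"\d+",
    "money": r"\$\d+\.\d{2}",
    "word": r"[A-Za-z]+",
}


def run_rule(rule, text):
    """Apply one DSL rule to text; return the extracted string or None."""
    label, kind = rule
    start = text.find(label)
    if start < 0:
        return None
    match = re.search(VALUE_KINDS[kind], text[start + len(label):])
    return match.group(0) if match else None


def learn_rule(examples):
    """Return the first rule in the DSL that reproduces every labeled example."""
    for rule in product(LABELS, VALUE_KINDS):
        if all(run_rule(rule, text) == answer for text, answer in examples):
            return rule
    return None


# Labeled examples: (document text, value a human marked as correct).
examples = [
    ("Invoice 1001  Total due: $250.00", "$250.00"),
    ("Invoice 1002  Total due: $75.50", "$75.50"),
]
learned = learn_rule(examples)
print(learned)
```

The learned rule is a plain `(label, value_kind)` pair that a person can read, edit, or reject. Note that `("Invoice", "money")` also fits these two examples; only the search order picks `"Total"` first, a small instance of how the outcome depends on the rule language and on the data supplied.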

Classic NLP

  • Associated with
    hybrid approaches: one part rules, one part Python, and one part classic machine learning
  • Words you might hear
    StanfordNLP, spaCy, Parsing, Named Entity Recognition
  • What it really means
    Using "classic NLP" to do document extraction often means using the results of part-of-speech tagging, dependency parsing, and named entity recognition, along with Python and regular expressions, to build a hybrid system that's neither fully learned nor fully heuristic.
  • Advocates would say
    This approach lets you have your cake and eat it too. Fantastic standard libraries exist that anyone with basic training can invoke from Python to write rules that generalize better. The result combines human rigidity with statistical flexibility.
  • Detractors would say
    This ends up being equivalent to just writing better rules. It gets you more mileage, but ultimately doesn't change the scaling constraints rules face. Errors present in the lower-level models propagate up, uncorrected, to the higher-level rules and glue code that bind them. And past a certain point of complexity, systems built in this style struggle to perform.

Deep Learning

  • Associated with
    GPUs, enormous datasets, the new wave of NLU approaches
  • Words you might hear
    Transformers, FastText, GPT, BERT and its variants
  • What it really means
    Using layered neural nets to learn a representation of documents at the same time you learn to extract data from that representation.
  • Advocates would say
    Deep learning is where NLU is headed, so you might as well invest in switching now. It lets you solve problems with far more nuance than other approaches, using only data annotations while benefiting from general linguistic knowledge via pre-training. It's the closest thing to a one-size-fits-all approach the NLU world has seen yet, and there's no sign we're nearing the limits of its potential.
  • Detractors would say
    Deep learning can produce amazing results, but also amazingly strange results. Because it's largely a black box from the perspective of computer code, it can't be extended as directly as computer rules that capture expert knowledge. While it may be capable of great nuance, sometimes simplicity works just as well.