Some companies need a pipeline that can route each input to the right model, or to a human. There are a few common reasons for this:

  1. Model Selection.
    Many business processes involve extracting data from several different input documents, and each document type might be handled by a different model. This routing decision is a classification problem, and solutions range from hard-coded Python to rule systems to classic machine learning and deep learning.
  2. Reactive Exception Handling.
    You may be able to detect certain kinds of errors or exceptional situations before your model runs. Perhaps the OCR confidence was too low, too much private information was detected, or a special case indicating additional risk was flagged.
  3. Proactive Exception Handling.
    You can take an extraction pipeline to production faster, and with greater robustness, if you embrace the idea of humans working with computers rather than straight-through automation. In this case, your pipeline can detect in advance which categories of input are likely to be problematic for automated extraction and route these to humans proactively.
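
The three routing reasons above can be combined in a single routing function. The following is a minimal sketch, not a reference implementation: the `Document` fields, the thresholds, the queue and model names, and the set of "human first" document types are all hypothetical placeholders, and it assumes an upstream step has already assigned a document type.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_type: str            # assumed to come from an upstream classifier
    ocr_confidence: float    # 0.0 to 1.0, assumed to be reported by the OCR step
    pii_count: int           # number of private-information hits detected
    risk_flagged: bool = False


# All of the values below are illustrative assumptions.
HUMAN_QUEUE = "human_review"
MODEL_BY_TYPE = {"invoice": "invoice_model", "receipt": "receipt_model"}
HUMAN_FIRST_TYPES = {"handwritten_form"}   # known to be hard to automate
MIN_OCR_CONFIDENCE = 0.80
MAX_PII_COUNT = 5


def route(doc: Document) -> str:
    """Return the name of the model or queue that should handle doc."""
    # Proactive exception handling: categories we already expect to be
    # problematic go straight to humans.
    if doc.doc_type in HUMAN_FIRST_TYPES:
        return HUMAN_QUEUE

    # Reactive exception handling: problems detected on this specific
    # input before any extraction model runs.
    if (
        doc.ocr_confidence < MIN_OCR_CONFIDENCE
        or doc.pii_count > MAX_PII_COUNT
        or doc.risk_flagged
    ):
        return HUMAN_QUEUE

    # Model selection: pick the model for this document type, and fall
    # back to humans for types no model covers.
    return MODEL_BY_TYPE.get(doc.doc_type, HUMAN_QUEUE)
```

Sending unknown document types to the human queue by default is a design choice in this sketch: it keeps uncovered inputs from silently reaching a model that was never built for them.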

The first case, model selection, can cause a great deal of complexity. It is essentially the document extraction problem repeated, fractal-like, at a larger scale. To manage this complexity:

  1. Be extremely clear with your model team about how inputs are divided among models.
    Your model team may attempt to build a single one-size-fits-all model, a handful of one-size-fits-many models, or a large number of single-purpose models. Whichever it is, make sure the choice is made explicitly and coordinated with the team planning your pipeline.
  2. Focus on the high-value wins first.
    Exercises like Step 2 of the vetting process can help you identify whether a certain subset of your input space accounts for a large portion of the potential value. If so, consider automating that subset first and routing everything else to humans. This avoids the risk of an automation attempt that never seems to reach the finish line because the long tail proves too long to complete. Early wins, even if partial, add needed momentum to a project.
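
One way to make "high-value wins first" concrete is to rank document types by estimated value and automate only the smallest set that covers a target share. This is a rough sketch under assumptions: the per-type value estimates, the 80% target, and the function name are hypothetical, and how you estimate value is up to your own vetting exercise.

```python
from __future__ import annotations


def pick_types_to_automate(
    value_by_type: dict[str, float],
    target_share: float = 0.8,
) -> list[str]:
    """Return the highest-value document types, in descending order of value,
    until their combined value reaches target_share of the total."""
    total = sum(value_by_type.values())
    chosen: list[str] = []
    covered = 0.0
    ranked = sorted(value_by_type.items(), key=lambda kv: kv[1], reverse=True)
    for doc_type, value in ranked:
        if covered >= target_share * total:
            break  # target reached; the rest is the long tail
        chosen.append(doc_type)
        covered += value
    return chosen


# Hypothetical value estimates per document type (arbitrary units).
estimates = {"invoice": 60.0, "receipt": 25.0, "contract": 10.0, "other": 5.0}
automate_first = pick_types_to_automate(estimates)
```

Everything not in `automate_first` would be routed to humans at first, and types can be moved across as the models for them mature.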