Real-time production models - How do they differ from benchmark tests?

What are Real-Time Production Models and Benchmark Tests?

Real-time production models enable users to take data collected during production and use it both to analyze current production capabilities and to predict future production outputs. They are meant to optimize production and assess performance “pre-release”; in other words, they are predictive performance tools. While production models take many forms, one method that is rising in popularity is the machine learning algorithm. Machine learning algorithms create production models by learning from past data and then assessing and predicting the current production status in light of the lessons learned from that data.


For the purposes of this article, an example production model will be explored: machine learning for text analysis. This type of machine learning production model takes the form of:

  • The production process: analysis of text data, i.e. an article.
  • The production product / output: a concise summary of the most
    important facts in the article.
  • The production model: the machine learning algorithm applied to the
    text data.

This production model learns relevant information from past articles, and then applies this learned information to summarize new articles.

Unlike real-time production models, benchmark tests are used to retroactively assess the final output of production. Data is collected on both the production process and the final product, and a standard set of tests is run on this data to determine product quality and performance. Benchmark tests are competition-based, with the goal of either “beating” other companies’ similar products or surpassing previous performance benchmarks, and they measure performance “post-release.”

Benchmark testing involves:

  • Collecting data at pre-determined times throughout production.
  • Collecting data that is repeatable – the same data is collected for every production and product.
  • Performing a pre-defined, standardized set of tests on the data.
  • Scoring the final product and comparing this score to other products.

The overall key difference between benchmark tests and production models is the difference between asking “how well did my product perform versus other products?” and “how can I optimize my current production to produce the best possible product?”
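The benchmark-testing steps above can be sketched as a repeatable scoring function. The word-overlap metric below is a hypothetical stand-in for a real benchmark metric (such as ROUGE for text summaries); the article strings are made up for illustration:

```python
def benchmark_score(summary, reference):
    """Score a produced summary against a reference summary by word overlap.

    A stand-in for a real, pre-defined benchmark metric: the same test is
    run on every product, post-release, so scores are directly comparable.
    """
    summary_words = set(summary.lower().split())
    reference_words = set(reference.lower().split())
    if not reference_words:
        return 0.0
    return len(summary_words & reference_words) / len(reference_words)

# The same reference and the same test are applied to every product.
reference = "model predicts production output from past data"
ours = "the model predicts production output using past data"
theirs = "a dashboard shows production charts"

our_score = benchmark_score(ours, reference)
their_score = benchmark_score(theirs, reference)
print(our_score > their_score)  # True: our product "beats" the competitor's
```

Because the metric and the reference data are fixed in advance, the resulting scores support exactly the competition-style comparison described above.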

1. Data Needs - What Data Is Required for Real-Time Production Models?

A well-developed and trained production model provides a wealth of benefits; however, a poorly developed model can be just as detrimental, producing misleading, biased, or even nonsensical results. The deciding factor in the quality of a production model is the quality of the data used to train it. When building any machine learning algorithm, the driving question is always: what data, and how much of it, does the model need to train adequately?

Data needs for text analysis can be broken down as:

  • What articles are needed for training based on the application, i.e. scholarly articles versus newspapers versus blogs?
  • What context is needed about the text, i.e. what words, word combinations, and word definitions within the article are the most relevant?
  • How many articles does the algorithm need to use for training?

In general, more training data is better, and the data needs as much context as possible. Additionally, training data should match the current use case. That is, if the text to be analyzed is a scientific blog post, the training data for the production model should include both scientific articles and related blog posts. The closer the distribution of training data matches the subject matter of the text being analyzed, the better the summary information will be.
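The idea that training data should match the distribution of the target use case can be sketched as a simple distance check between topic mixes. The topic labels and the total-variation distance used here are illustrative assumptions, not part of any particular library:

```python
from collections import Counter

def topic_distribution(labels):
    """Normalize a list of topic labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

def distribution_distance(train_labels, target_labels):
    """Total-variation distance between training and target topic mixes.

    0.0 means the training mix matches the target perfectly;
    1.0 means the two share no topics at all.
    """
    p = topic_distribution(train_labels)
    q = topic_distribution(target_labels)
    topics = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)

# Target use case: a scientific blog post -> a mix of science and blog text.
target = ["science", "science", "blog"]
candidate_a = ["science", "science", "blog", "science", "blog", "blog"]
candidate_b = ["news"] * 6  # plentiful, but unrelated to the use case

print(distribution_distance(candidate_a, target))  # closer to 0 -> better match
print(distribution_distance(candidate_b, target))
```

Under this measure, the mixed science-and-blog candidate is the better training set even though both candidates are the same size, which is the point of matching distributions rather than just maximizing volume.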

2. Data Tuning - How is Training Data Collected for Production Models?

Data tuning concerns what data is fed into the benchmark test or production model, and how. This is straightforward for benchmark testing: determine what data to collect during production and how often to collect it. The data needs of a benchmark test are derived from the accuracy of previous benchmark tests.

Contrast this with production modeling, where machine learning algorithms are used to predict outputs during production. In this case, data tuning is finding the right data to collect to train the production model. For text analysis, this involves:
  • Selecting a large enough set of relevant articles.
  • Providing a lexicon, or context, for the articles – the words, groups of words, and word definitions that convey the most relevant information.
  • Learning from the articles – iterate over the data set to discover what subset of the lexicon captures the best summary of information.
  • Applying this lexicon to new articles: running the production model.
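The tuning steps above can be sketched end-to-end. The inverse-document-frequency lexicon below is a minimal, illustrative choice of context (rarer words are weighted as more informative); a real production model would learn a far richer lexicon:

```python
import math
from collections import Counter

def learn_lexicon(training_articles):
    """Learn a lexicon from past articles: weight each word by inverse
    document frequency, so words appearing in every article score zero."""
    n = len(training_articles)
    doc_freq = Counter()
    for article in training_articles:
        doc_freq.update(set(article.lower().split()))
    return {word: math.log(n / df) for word, df in doc_freq.items()}

def summarize(lexicon, article, n_sentences=1):
    """Run the production model: keep the sentences whose words score
    highest under the learned lexicon."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]

    def score(sentence):
        words = sentence.lower().split()
        return sum(lexicon.get(w, 0.0) for w in words) / max(len(words), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

# Step 1-3: select articles, build the lexicon, learn from the data set.
training = [
    "the plant produced many units today",
    "the plant halted for maintenance today",
    "defect rates fell after the sensor upgrade",
]
lexicon = learn_lexicon(training)

# Step 4: apply the lexicon to a new article.
article = "the plant ran today. the new sensor upgrade cut defect rates sharply."
print(summarize(lexicon, article))
```

Common words like “the” appear in every training article and score zero, so the sentence carrying the rarer, more informative words is selected as the summary.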

3. Data Imbalance - How Can the Right Training Data be Selected for Real-Time Production Models?

Tuning the training data set for real-time production models is not a trivial task. Not all of the data collected for training will be useful, and down-selection is often required. The data needs to be relevant to the text being summarized: not so specific that it surfaces only a limited subset of the relevant information, yet not so vague that it surfaces too much. Furthermore, there will always be some imbalance in the training data. Finding a large enough training data set targeted to one specific use case is unlikely, so training data must be balanced across topics to best match the distribution of the topic being studied.

Several pitfalls can be encountered during training data selection, such as:

  • Choosing too broad a set of input articles, leading to summaries that are too long or too vague.
  • Choosing too narrow a set of input articles, leading to summaries that miss key information.
  • Choosing poor-quality articles, i.e. opinion-based sources, leading to biased summaries.
  • Choosing the wrong lexicon to apply to the input articles, leading to nonsense summaries.

Selecting the right training data set therefore requires trade-offs among the amount of training data, its relevance, and the optimal context.

4. New Vocabularies - How is Training Data Applied Across Different Production Models?

Finding the right training set and tuning this training set to the given use case can be an expensive and time-consuming task. The cost associated with developing training sets gives rise to the desire to extend training data across applications. Ideally, a production model trained on one set of articles could be extended to other applications. The goal is to collect, organize, and give context to the training data such that it can apply across multiple production model use cases.

However, the new production model cannot understand the context of the old one. Each word in the new lexicon that the old model was not trained on induces a loss of accuracy. Text analysis production models therefore need to be re-tuned, that is, given a new vocabulary on which to train. That is not to say, however, that old production models are completely inapplicable to new domains. Several strategies exist to mitigate the loss of accuracy across use cases, including:

  • Breaking the training data lexicon down into sub-groups, such as specific letter combinations or high-frequency words.
  • Co-training: creating the training data set with two different contexts for each article.
  • Trimmed Loss Minimization: determining which subset of articles to train the new model on by estimating which articles minimize the overall loss of accuracy.
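The first strategy, breaking the lexicon into specific letter combinations, can be sketched as follows. The boundary padding characters and the three-letter grouping are illustrative assumptions:

```python
def char_ngrams(word, n=3):
    """Break a word into overlapping letter combinations (character n-grams),
    with < and > marking the word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_overlap(old_vocab, new_word, n=3):
    """Fraction of a new word's letter combinations already seen in the
    old model's lexicon."""
    seen = set()
    for word in old_vocab:
        seen |= char_ngrams(word, n)
    grams = char_ngrams(new_word, n)
    return len(grams & seen) / len(grams)

old_vocab = ["production", "producer", "reduce"]

# "product" never appeared in the old lexicon, so a word-level model
# carries no signal for it at all...
print("product" in old_vocab)  # False
# ...but most of its letter combinations were seen during training.
print(ngram_overlap(old_vocab, "product"))
```

Because the sub-word groups are shared across vocabularies, an unseen word in the new domain is no longer a total loss: the old model still recognizes most of its pieces.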

5. Time Latency - How Long Do Production Models Take to Run?

While real-time production models carry the name “real-time” because they leverage the most up-to-date production data available, they can in fact run at many time scales. In practice, the data needs define the run time; for example, a production model designed to analyze trends in information may need days’ worth of training data, yet once trained, it can run in a matter of minutes to analyze new data.

Time latency for text analysis relates to what the expectations are for the model:

  • How long does the production model take to train / how much training data must be collected?
  • How often does the model need to predict performance – hourly, daily, weekly, etc.?
  • How much data will be modeled, a short blog, a journal article, a book chapter, etc.?
  • How much human interaction is required – how often are the model outputs checked for accuracy and interpreted by a human operator?
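These latency questions can be made concrete by timing a toy model. The word-frequency “training” below is purely illustrative; the point is only that training sweeps the whole data set while a single prediction touches one article:

```python
import time

def train(articles):
    # Toy "training": count word frequencies across the whole training set.
    freq = {}
    for text in articles:
        for word in text.split():
            freq[word] = freq.get(word, 0) + 1
    return freq

def predict(model, text, k=3):
    # Toy "inference": keep the k words the model saw most often in training.
    words = set(text.split())
    return sorted(words, key=lambda w: model.get(w, 0), reverse=True)[:k]

articles = ["the model learns word frequencies from past articles"] * 10000

t0 = time.perf_counter()
model = train(articles)
train_seconds = time.perf_counter() - t0

t0 = time.perf_counter()
summary = predict(model, "a new article about the model")
infer_seconds = time.perf_counter() - t0

# Training sweeps the full data set; prediction touches one article,
# so inference is far cheaper than training.
print(train_seconds > infer_seconds)  # True
```

The same asymmetry holds for real production models: collecting and training on days’ worth of data is the slow step, while each prediction against new data can run in minutes or less.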

Production models provide proactive, or predictive, measures of performance. They assess performance “ahead of the curve” to determine how to create a better final product. In the text analysis case presented here, production models predict what information within a text article is most relevant to a given application. Once a production model has been executed and a product produced, benchmark tests can then be run to assess the value of the final product. Key benefits of production models include:

  • Reduced production cost as production methods are optimized during production.
  • Reduced bias in outputs as human operator interaction with the data is reduced.
  • Improved accuracy over time as more training data is compiled during production.
  • Increased agility as changes to production can be made in real-time.

As machine learning algorithms become increasingly refined, production modeling will become not just a beneficial but a vital tool for production. Early adoption of production modeling is therefore low risk with potentially very high reward, and production models will play a critical role in shaping how production is done in the future.
