
Election results: Lessons from a real-world NLG system

source link: https://ehudreiter.com/2019/12/23/election-results-lessons-from-a-real-world-nlg-system/

The BBC used Arria NLG to report on the December 2019 UK general election, and in particular to produce summaries of election results in individual constituencies. This is described in a news article and their blog. I show an example output (an election result summary) from the BBC system in an appendix to this blog post.

The BBC system is fairly simple (eg, no serious data analytics), but it nicely illustrates some of the ways in which real-world commercial NLG use differs from a lot of academic NLG work, especially in neural NLG. In particular, in the BBC system:

  • Texts are supposed to communicate a meaning and achieve a purpose
  • No corpus is available for training
  • Texts must be accurate
  • Domain experts (journalists) must be in control

These are generic constraints which apply to most of the real-world NLG projects which I have been involved in.

Texts are Meaningful and Have a Purpose

The BBC texts communicate information (election results) for a purpose (informing people who live in a constituency).    I think this is fundamental to NLG; the core challenge is creating a text which communicates meaning and achieves a purpose.

Recently, however, there has been a lot of academic work on systems which produce meaningless and purposeless texts. The best known such system is perhaps GPT2, which takes initial words and expands them into a narrative by using a language model to find the most likely completions. GPT2 (at least when I try it out) produces really bad texts, but even when it produces something which reads well, it is not in any sense trying to communicate a meaning or achieve a purpose; it is just trying to find a plausible expansion/continuation of an initial set of words. This is not NLG as I understand NLG. I also, in all honesty, struggle to understand why this kind of thing is useful. It's a great “party trick” when it works, but I don't see how it can be used in real-world applications.
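For readers who have not played with this, the snippet below is a minimal sketch of the kind of language-model continuation I am describing. It uses the publicly available Hugging Face transformers library and GPT2 checkpoint; the library, prompt, and settings are purely illustrative and have nothing to do with the BBC system.

```python
# Minimal sketch of language-model text continuation.
# The transformers library and the prompt are illustrative assumptions,
# not part of the BBC/Arria system discussed in this post.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The results of the general election were announced"
continuations = generator(prompt, max_length=60, num_return_sequences=1)

# The model simply extends the prompt with plausible next words; nothing ties
# the output to real election data, so it communicates no verified meaning.
print(continuations[0]["generated_text"])
```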

No Corpus

The BBC did not have an existing corpus of texts giving election results in constituencies when they built their system, because they had not previously published such texts. It would also have been difficult to manually write 689 texts within the short time frame available (a few hours), especially on the busiest news night of the year, when BBC journalists are working flat-out on all sorts of things.

This is very common in applied NLG. Once in a while we work in domains such as weather forecasting or sports reporting where corpora of human-written texts are available, but this is the exception, not the rule. Usually people want to use NLG to create new types of texts (like the BBC did), or at least variants of existing texts (eg, very specialised weather forecasts). In such cases, we do not have corpora to use to train “end-to-end” NLG systems. We can still use generic corpora such as Wikipedia, or general domain corpora (such as newswire texts for automatic-journalism applications like election reporting), to train components or modules within the NLG system (such as lexical choice), but we cannot train an end-to-end system without a substantial corpus for the specific application we are trying to build.
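As a toy illustration of what training a single component from a generic corpus might look like (the corpus and candidate verbs below are my own assumptions, not anything the BBC did), we could use word frequencies from a general corpus to make one lexical-choice decision inside an otherwise hand-built pipeline:

```python
# Toy illustration: use a generic corpus to inform a single lexical-choice
# decision inside an otherwise hand-authored NLG pipeline. The Brown corpus
# and the candidate verbs are assumptions for illustration only.
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

# Build word frequencies once from the generic corpus.
counts = nltk.FreqDist(word.lower() for word in brown.words())

def pick_verb(candidates):
    """Return the candidate verb that occurs most often in the generic corpus."""
    return max(candidates, key=lambda verb: counts[verb])

# eg, decide how to verbalise "candidate A received more votes than candidate B"
print(pick_verb(["beat", "defeated", "outpolled"]))
```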

Accuracy Matters

Like most reputable journalistic organisations, the BBC insists on accuracy: every article it publishes must be as accurate as possible. The BBC in fact currently insists that all computer-generated articles be checked and edited by a person before being published. Although this checking process in theory means that a small number of mistakes in computer-generated texts can be tolerated (a person will catch them), my experience in such contexts (I have not talked specifically to the BBC about this) is that the tolerance for accuracy errors is still very limited: the computer system will not be used if it makes substantially more mistakes than a human writer.

As I have discussed in a previous blog, a lot of recent academic work in neural NLG seems to treat accuracy as unimportant, in part because it is difficult and expensive to measure (ie, BLEU scores tell you nothing about accuracy, and asking Turkers to measure accuracy can also be pretty useless).    This perspective is completely at odds with every applied NLG project I have ever worked on.  Texts must be accurate in order to be useful!
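To make the accuracy requirement concrete, here is a minimal sketch (my own illustration, not the BBC's or Arria's actual checking process) of a numeric fact-check which verifies that every number quoted in a generated summary is backed by the structured result data it was generated from:

```python
# Minimal sketch of a numeric fact-check: every number quoted in the generated
# text must appear in the structured source data. This is an illustration of
# the accuracy requirement, not the BBC's or Arria's actual checking process.
import re

def numbers_in_text(text):
    """Extract integers and decimals from the text, ignoring thousands separators."""
    return {float(n.replace(",", "")) for n in re.findall(r"\d[\d,]*\.?\d*", text)}

def check_numbers(generated_text, source_data):
    """Return any number in the text that does not match a value in the data."""
    allowed = {float(v) for v in source_data.values()}
    return [n for n in numbers_in_text(generated_text) if n not in allowed]

data = {"majority": 19612, "turnout_pct": 63.5}
text = "The new MP beat the runner-up by 19,612 votes, on a turnout of 63.5%."

print(check_numbers(text, data))  # -> []  every quoted number is backed by the data
```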

Journalist in Control

One final point is that the BBC wants journalists to be in control of the news-generation system.  The BBC does not want an NLG system created by “machine learning magic”, it wants an NLG system which is designed by its journalists to produce news articles that follow best journalism practice as far as possible.   In other words, the goal is to encode journalistic expertise into a computer system which applies this expertise to a large number of articles, not to replace journalists with “ML magic”.

There have been some very interesting attempts by media organisations to use ML to generate news stories, such as Kondadadi et al 2013 (from Thomson Reuters). As far as I can tell, though (and my knowledge is imperfect!), such systems have not been as successful as rules/template systems (like the BBC's election reporter), in part because journalists want complete and explicit control over the system.
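To give a flavour of what a rules/template approach looks like in this setting, here is a minimal sketch of a hand-authored constituency-summary template, loosely modelled on the example in the appendix. The field names, wording, and hold/gain rule are my own illustrative assumptions, not the BBC's or Arria's actual templates.

```python
# Minimal sketch of a hand-authored template for a constituency summary,
# loosely modelled on the appendix example. Field names, wording and the
# hold/gain rule are illustrative assumptions, not the BBC's actual templates.

def summarise(result):
    """Render one constituency result using explicit, journalist-editable rules."""
    outcome = "holds" if result["winner_party"] == result["previous_party"] else "gains"
    change = "an increased" if result["majority"] > result["previous_majority"] else "a decreased"
    return (
        f"{result['winner']} has been elected MP for {result['constituency']}, "
        f"meaning that the {result['winner_party']} {outcome} the seat with "
        f"{change} majority. The new MP beat {result['runner_up']} by "
        f"{result['majority']:,} votes."
    )

example = {
    "constituency": "Vauxhall",
    "winner": "Florence Eshalomi",
    "winner_party": "Labour Party",
    "previous_party": "Labour Party",
    "runner_up": "Liberal Democrat Sarah Lewis",
    "majority": 19612,
    "previous_majority": 20250,
}

print(summarise(example))
```

The point of writing the rules this explicitly is that a journalist can read, question, and change every wording decision, which is much harder to do with an end-to-end learned model.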

Final Thoughts

Academic work, especially in neural NLG, often ignores key factors in real-world applied NLG, including the need to communicate meaning, lack of corpora, paramount importance of accuracy, and desire of human domain experts to control the NLG system.  Of course there is no need for academic work to be 100% aligned with commercial work!  However, I think it is important that academic researchers understand the “real-world NLG” perspective (even if they take a different approach in their research), and I hope the above discussion of the BBC election reporter is useful in illustrating this perspective.

Appendix: Example Text from BBC System

An example election summary produced by the BBC system (taken from the news article) is:

Florence Eshalomi has been elected MP for Vauxhall, meaning that the Labour Party holds the seat with a decreased majority.

The new MP beat Liberal Democrat Sarah Lewis by 19,612 votes. This was fewer than Kate Hoey’s 20,250-vote majority in the 2017 general election.

Sarah Bool of the Conservative Party came third and the Green Party’s Jacqueline Bond came fourth.

Voter turnout was down by 3.5 percentage points since the last general election.

More than 56,000 people, 63.5% of those eligible to vote, went to polling stations across the area on Thursday, in the first December general election since 1923.

Three of the six candidates, Jacqueline Bond (Green), Andrew McGuinness (The Brexit Party) and Salah Faissal (independent) lost their £500 deposits after failing to win 5% of the vote.

This story about Vauxhall was created using some automation.

