## Why computers don’t yet read better than us

Artificial Intelligence is on a roll these days. It feels like the media report a new breakthrough every day. In 2017, computer and board games were at the center of public attention, but this year things look different. In the early days of 2018, both Microsoft and Alibaba claimed to have developed software that can read as well as humans do. Sensational headlines followed suit. CNN wrote that “Computers are getting better than humans at reading”, while Newsweek feared “Computers can now read better than humans, putting millions of jobs at risk”. Reality is much less spectacular, however.

## One-trick ponies

Most existing AI systems are one-trick ponies. If they have been trained on one particular task and one particular text type, they usually cannot handle other domains or problems. QA software that has been trained on Wikipedia will mostly fail to answer questions about other texts, such as legal documents or scientific articles. To do so, it would need to see tens of thousands of questions and answers from that particular domain. Collecting such training data is expensive at best, and often an insurmountable task.
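As a toy illustration of this domain gap, the sketch below (a hypothetical example, not a real QA system) reduces a model to the shallowest signal such systems lean on: word overlap between the question and the text. Everyday phrasing gives it plenty to match; legal jargon leaves it almost nothing.

```python
import re

def overlap(question, sentence):
    """Count the words a sentence shares with the question (a crude stand-in
    for the surface cues a model trained on Wikipedia learns to rely on)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words)

question = "Who must pay the rent?"
wiki_style = "The tenant must pay the rent on the first of each month."
legal_style = "The lessee shall remit all amounts due to the lessor monthly."

print(overlap(question, wiki_style))   # 4 shared words: plenty to latch onto
print(overlap(question, legal_style))  # 1 shared word: the jargon hides the answer
```

Without domain-specific training data that teaches it, say, that a "lessee" is a tenant, the system has no way to bridge that vocabulary gap.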

The tunnel vision of current QA systems is even worse than that. Last summer, two researchers from Stanford University demonstrated how easy it is to fool QA software trained on SQuAD: by adding just a few misleading sentences to the Wikipedia texts, they caused the accuracy of the best systems to plummet.

The oil price example already demonstrated the main weakness of current QA software: if the Wikipedia paragraph contains several possible answers, even the best QA systems begin to guess. The Super Bowl paragraph above illustrates the same problem: if you simply add an extra quarterback to the text, the systems can no longer tell which player was exactly 38 years old during the 33rd Super Bowl. Worse still, if you add an ungrammatical sequence of words that are vaguely related to the answer, the accuracy of even the best systems drops below 10%. Humans handle such misleading situations much better.
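To make the distractor trick concrete, here is a minimal sketch (a hypothetical toy, not a real SQuAD model) of an extractive system at its shallowest: answer with the sentence that shares the most words with the question. A single fabricated sentence that echoes the question's wording is enough to flip its answer.

```python
import re

def answer(question, passage):
    """Return the passage sentence with the highest word overlap with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=\.)\s+", passage.strip())
    return max(sentences, key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))

question = "Which quarterback was 38 years old in Super Bowl XXXIII?"
passage = "John Elway was 38 years old when he won Super Bowl XXXIII."

print(answer(question, passage))  # the sentence with the correct answer

# A fabricated distractor that copies the question's wording but describes a
# fictional game; a human would ignore it, the overlap heuristic prefers it.
distractor = "Quarterback Jeff Dean was 38 years old in Super Bowl XXXIV."
print(answer(question, passage + " " + distractor))  # now returns the distractor
```

Real neural QA models are far more sophisticated than this, but the Stanford experiments suggest they share the same Achilles' heel: they lean heavily on surface similarity between the question and the text.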

## Conclusion

One thing is clear: we’re still a long way from computers that read as well as humans. Still, the recent evolution of QA systems is quite promising — or frightening, if you fear job losses in the long term. After all, we are constantly confronted with unstructured collections of text: legal documents, scientific literature, even Hillary Clinton’s emails. These are hundreds or thousands of pages of interesting content that no one can read from A to Z. Wouldn’t it be great if QA software could answer all our questions about them?

However, the real breakthrough will only come when QA systems lose their dependence on expensive training data: when they can answer questions about a new domain without first seeing tens of thousands of similar examples. The success of such so-called “unsupervised” methods will undoubtedly herald a new revolution in Artificial Intelligence. But we’re not quite there yet.

#### Yves Peirsman

Yves discovered Natural Language Processing 13 years ago as an MSc student at the University of Edinburgh, and has never looked back. With a background as a researcher and developer in academia (University of Leuven, Stanford University) and industry (Textkernel, Wolters Kluwer), he founded NLP Town to further indulge and spread his love for NLP.