In the fall of 2017, Sam Bowman, a computational linguist at New York University, concluded that computers still weren't very good at understanding the written word. They had grown competent in certain narrow domains, like automatic translation and sentiment analysis (judging whether a sentence sounds "nice" or "mean"), but Bowman wanted measurable evidence of the genuine article: real, human-style reading comprehension in English. So he devised a test.
In an April 2018 paper coauthored with collaborators from the University of Washington and DeepMind, the Google-owned artificial intelligence company, Bowman introduced a battery of nine reading-comprehension tasks for computers called GLUE (General Language Understanding Evaluation). The test was designed as “a fairly representative sample of what the research community thought were interesting challenges,” said Bowman, but also “pretty straightforward for humans.” For example, one task asks whether a sentence is true based on information offered in a preceding sentence. If you can tell that “President Trump landed in Iraq for the start of a seven-day visit” implies that “President Trump is on an overseas visit,” you’ve just passed.
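The entailment task above boils down to a premise, a hypothesis, and a yes/no label. As a purely illustrative sketch (not any real GLUE system; the function name and threshold here are invented), a naive word-overlap heuristic shows the shape of the task, and also why it's hard: the heuristic misjudges this very example, since the key word "overseas" never appears in the premise.

```python
# GLUE-style textual entailment: does the premise imply the hypothesis?
premise = "President Trump landed in Iraq for the start of a seven-day visit"
hypothesis = "President Trump is on an overseas visit"

def overlap_score(premise, hypothesis):
    """Fraction of hypothesis words that also appear in the premise.
    A crude lexical-overlap heuristic -- illustrative only, not a real model."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(h & p) / len(h)

# Guess "entailment" when most hypothesis words appear in the premise
# (the 0.5 cutoff is arbitrary, chosen just for this sketch).
label = "entailment" if overlap_score(premise, hypothesis) > 0.5 else "not entailment"
print(label)  # prints "not entailment" -- the shallow heuristic gets it wrong
```

That failure is the point: shallow pattern matching isn't comprehension, which is exactly the gap GLUE was built to expose.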
The machines bombed. Even state-of-the-art neural networks scored no higher than 69 out of 100 across all nine tasks: a D-plus, in letter grade terms. Bowman and his coauthors weren’t surprised. Neural networks — layers of computational connections built in a crude approximation of how neurons communicate within mammalian brains — had shown promise in the field of “natural language processing” (NLP), but the researchers weren’t convinced that these systems were learning anything substantial about language itself. And GLUE seemed to prove it. “These early results indicate that solving GLUE is beyond the capabilities of current models and methods,” Bowman and his coauthors wrote.
But that was not the end of it. In October 2018, Google introduced a new method called BERT (Bidirectional Encoder Representations from Transformers), which scored 80.5 on GLUE. In just six months, the machines had jumped from a D-plus to a B-minus.
Still, the question lingers: do these machines actually understand language, or are they just getting better at gaming our tests?
More about this over at Quanta Magazine.
(Image Credit: Pixabay)