Much was made of a recent video of Duplex – Google’s talking AI – calling up a hair salon to make a reservation. The AI’s way of speaking was uncannily human, even pausing at moments to say “um”.

Some suggested Duplex had managed to pass the Turing test, a standard for machine intelligence that was developed by Alan Turing in the middle of the 20th century. But what exactly is the story behind this test, and why are people still using it to judge the success of cutting-edge algorithms?

Mechanical brains and emotional humans

In the late 1940s, when the first digital computers had just been built, a debate took place about whether these new “universal machines” could think. While pioneering computer scientists like Alan Turing and John von Neumann believed that their machines were “mechanical brains”, others felt that there was an essential difference between human thought and computer calculation.

Sir Geoffrey Jefferson, a prominent brain surgeon of the time, argued that while a computer could simulate intelligence, it would always be lacking:

“No mechanism could feel … pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or miserable when it cannot get what it wants.”

In a radio interview a few weeks later, Turing responded to Jefferson’s claim by arguing that as computers become more intelligent, people like him would take a “grain of comfort, in the form of a statement that some particularly human characteristic could never be imitated by a machine.”

The following year, Turing wrote a paper called ‘Computing Machinery and Intelligence’ in which he devised a simple method by which to test whether machines can think.

The test proposed a situation in which a human judge talks to both a computer and a human through a screen. The judge cannot see either of them but can ask them typed questions. Based on the answers alone, the judge has to determine which is which. If the computer is able to fool 30 percent of judges into believing it is human, then the computer is said to have passed the test.
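The pass criterion described here can be written down as a small calculation. The following is only a minimal sketch in Python, assuming each judge's verdict is recorded as a simple true/false value. The names `Verdict`, `fooled_fraction` and `passes_test` are made up for illustration, and treating "exactly 30 percent" as a pass is an assumption rather than something the test itself specifies.

```python
from dataclasses import dataclass

# The 30 percent figure quoted in the text; treating it as an
# inclusive threshold is an assumption made for this sketch.
FOOL_THRESHOLD = 0.30


@dataclass
class Verdict:
    """One judge's decision after a conversation (hypothetical record)."""
    judge: str
    took_machine_for_human: bool


def fooled_fraction(verdicts):
    """Return the share of judges who mistook the machine for the human."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    fooled = sum(1 for v in verdicts if v.took_machine_for_human)
    return fooled / len(verdicts)


def passes_test(verdicts, threshold=FOOL_THRESHOLD):
    """Return True if the fooled share reaches the threshold."""
    return fooled_fraction(verdicts) >= threshold


if __name__ == "__main__":
    # Ten imaginary judges, four of whom were fooled.
    panel = [Verdict(f"judge-{i}", i < 4) for i in range(10)]
    print(fooled_fraction(panel), passes_test(panel))
```

Note that the calculation only counts how many judges were fooled. It says nothing about how long the conversations lasted or which topics were covered, and both affect how much a "pass" really means.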

Turing claimed that he intended the test to be a conversation stopper, a way of preventing endless metaphysical speculation about the essence of our humanity by positing that intelligence is just a type of behaviour, not an internal quality. In other words, intelligence is as intelligence does, regardless of whether it is done by a machine or a human.

Does Google Duplex pass?

Well, yes and no. In Google’s video, it is obvious that the person taking the call believes they are talking to a human. So Duplex does satisfy this criterion. But an important part of Turing’s original test was that, to pass, the computer had to be able to speak convincingly about all topics, not just one.

In fact, in his paper, Turing plays out an imaginary conversation between an advanced future computer and a human judge, with the judge asking questions and the computer providing answers:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34957 to 70764.

A: (Pause about 30 seconds and then give as answer) 105621.

Q: Do you play chess?

A: Yes.

Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?

A: (After a pause of 15 seconds) R-R8 mate.

The point Turing is making here is that a truly smart machine has to have general intelligence in a number of different areas of human interest. As it stands, Google’s Duplex is good within the limited domain of making a reservation but would probably not be able to do much beyond this unless reprogrammed.

The boundaries around the human

While Turing intended for his test to be a conversation stopper for questions of machine intelligence, it has had the opposite effect, fuelling half a century of debate about what the test means, whether it is a good measure of intelligence, or if it should still be used as a standard.

Most experts have come to agree, over time, that the Turing test is not a good way to prove machine intelligence, because the constraints of the test can easily be gamed. This was the case with the bot Eugene Goostman, which allegedly passed the test a few years ago.

But the Turing test is nevertheless still considered a powerful philosophical tool for re-evaluating the boundaries around what we consider normal and human. In his time, Turing used his test to demonstrate how people like Jefferson would never be willing to accept a machine as intelligent, not because it couldn’t act intelligently, but because it wasn’t “like us”.

Turing’s desire to test the boundaries around what was considered “normal” in his time perhaps sprang from his own experience as a gay man. Despite being a war hero, he was persecuted for his homosexuality, and convicted in 1952 for sleeping with another man. He was punished with chemical castration and eventually took his own life.

During these final years, machine intelligence and his own sexuality became interconnected in Turing’s mind. He was concerned that the same bigotry and fear that hounded his life would ruin future relationships between humans and intelligent computers. A year before he took his life, he wrote the following letter to a friend:

“I’m afraid that the following syllogism may be used by some in the future.

Turing believes machines think

Turing lies with men

Therefore machines do not think

– Yours in distress,

Alan”