OpenAI, a non-profit artificial intelligence research group, set out to train new text-generation software to predict the next word of a phrase. The result exceeded their expectations: it imitated human writing so well that the researchers decided to pause the work while they explore the damage it could do.
Elon Musk has made it clear that he believes artificial intelligence "poses a fundamental risk to the existence of human civilization." Musk is one of the main backers of OpenAI and, although he has taken a back seat in the organization, its researchers seem to share his concerns about opening Pandora's box. This week, OpenAI published a report on its latest work, deviating from its standard practice of releasing the full research to the public. Instead of releasing the fully trained model, it has released a smaller one for fear that the full version could be abused by malicious users.
The researchers used 40 GB of data extracted from 8 million web pages to train the GPT-2 software. That's ten times the amount of data used for the first iteration of GPT. The data set was collected by selecting Reddit links to articles that had received more than three upvotes. When the training process was completed, the researchers found that the software could be fed a small amount of text and would convincingly continue writing a long passage based on it. It struggles with "highly technical or esoteric types of content," but when it comes to more conversational writing it generates "reasonable samples" 50% of the time.
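The collection heuristic described above can be sketched in a few lines. This is an illustration only, not OpenAI's actual pipeline: the `posts` list is a stand-in for real Reddit submission data, and the real WebText corpus involved scraping and deduplicating the linked pages.

```python
# Sketch of the WebText-style filtering heuristic: keep an outbound
# link only if the Reddit post that shared it earned more than three
# upvotes. `posts` is hypothetical stand-in data.
posts = [
    {"url": "https://example.com/article-a", "score": 12},
    {"url": "https://example.com/article-b", "score": 2},
    {"url": "https://example.com/article-c", "score": 4},
]

def select_links(posts, min_score=3):
    """Return URLs whose submissions earned more than `min_score` upvotes."""
    return [p["url"] for p in posts if p["score"] > min_score]

print(select_links(posts))
# keeps article-a and article-c, drops article-b
```

The upvote threshold acts as a cheap human-curated quality filter: pages people found worth sharing tend to contain coherent prose.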
In one example, the software was fed this paragraph:
In a shocking find, a scientist discovered a herd of unicorns living in a remote valley, previously unexplored, in the mountains of the Andes. Even more surprising to researchers was the fact that unicorns spoke perfect English.
From those two sentences, the model went on to write a whimsical nine-paragraph piece that could credibly have been written by a human. Here are the next few paragraphs the machine produced:
In homage to their distinctive horn, the scientist named the population Ovid's Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
Now, after almost two centuries, the mystery of what caused this strange phenomenon is finally resolved.
Dr. Jorge Pérez, an evolutionary biologist at the University of La Paz, and several colleagues were exploring the mountains of the Andes when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silvery snow.
GPT-2 is very good at adapting to the style and content of the prompts it is given. The Guardian had access to the software and tested it with the first line of George Orwell's 1984: "It was a bright cold day in April, and the clocks were striking thirteen." The program picked up the tone of the extract and continued with a bit of dystopian science fiction of its own:
I was in my car on my way to a new job in Seattle. I put the gas in, put the key in, and then I let it run. I just imagined what the day would be like. A hundred years from now. In 2045, I was a teacher in some school in a poor part of rural China. I started with Chinese history and the history of science.
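GPT-2 itself is a large Transformer and is not reproduced here, but the core mechanism the examples above rely on — predict the next word given the words so far, then feed the prediction back in to keep writing — can be illustrated with a toy bigram model. Everything in this sketch (the tiny corpus included) is illustrative; it shares only the sampling loop with GPT-2, not its architecture or scale.

```python
import random
from collections import defaultdict

# Toy next-word predictor: a bigram model that, like GPT-2 at a
# vastly smaller scale, continues a prompt by repeatedly sampling
# the next word given the current one.
corpus = (
    "the clocks were striking thirteen and the wind was cold "
    "and the streets were empty and the clocks were striking"
).split()

# Count which words were observed to follow which.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def continue_text(prompt_word, length=8, seed=0):
    """Extend a one-word prompt by sampling observed continuations."""
    random.seed(seed)  # fixed seed so the sketch is reproducible
    words = [prompt_word]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:  # dead end: no observed continuation
            break
        words.append(random.choice(options))
    return " ".join(words)

print(continue_text("the"))
```

Where this toy model looks back one word, GPT-2 conditions on up to 1,024 tokens of context, which is what lets it sustain style and topic over whole paragraphs.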
The OpenAI researchers found that GPT-2 performed very well when given tasks it was not necessarily designed for, such as translation and summarization. In their report, the researchers wrote that simply prompting the trained model in the right way was enough for it to perform these tasks at a level comparable to specialized models. After reading a short story about an Olympic race, the software was able to correctly answer basic questions such as "What was the length of the race?" and "Where did the race begin?"
These excellent results have scared the researchers. One of their concerns is that the technology could be used to generate misleading news articles.
Other potential abuses the researchers mentioned are the automation of phishing emails, impersonation of others online, and automated harassment. But they also believe there are many beneficial applications to be discovered. For example, it could be a powerful tool for developing better speech recognition systems or dialogue agents.
OpenAI plans to involve the artificial intelligence community in a debate about its release strategy and hopes to explore possible ethical guidelines to steer this type of research in the future. The researchers say they will have more to discuss publicly in six months.