The artificial intelligence research lab OpenAI on Tuesday launched the newest version of its stunning language software, GPT-4, an advanced tool for analyzing images and mimicking human speech, pushing the technical and ethical boundaries of a rapidly proliferating wave of AI.
Its predecessor ChatGPT captivated and unsettled the public with its uncanny ability to generate elegant writing, unleashing a viral wave of college essays, screenplays and conversations — though it could only generate text, and it relied on an older generation of technology that hasn’t been cutting edge for more than a year.
GPT-4, in contrast, is a state-of-the-art system capable of not just creating words but describing images in response to a person’s simple written commands. When shown a photo of a boxing glove hanging over a wooden seesaw with a ball on one side, for instance, a person can ask what will happen if the glove drops, and GPT-4 will respond that it would hit the seesaw and cause the ball to fly up.
The buzzy launch capped months of hype and anticipation over an AI program, known as a large language model, that early testers had claimed was remarkably advanced in its ability to reason and learn new things.
The developers pledged in a Tuesday blog post that the technology could further revolutionize work and life. But those promises have also fueled anxiety over how people will be able to compete for jobs outsourced to eerily refined machines or trust the accuracy of what they see online.
Officials with the San Francisco lab said GPT-4’s “multimodal” training across text and images would allow it to escape the chat box and more fully emulate a world of color and imagery, surpassing ChatGPT in its “advanced reasoning capabilities.” A person can submit an image to GPT-4 and it will caption it for them.
Microsoft has invested billions of dollars into OpenAI in hopes its technology will become a secret weapon for its workplace software, search engine and other online ambitions. But AI boosters say those may only skim the surface of what such AI can do, and that it could lead to business models and creative ventures no one can yet predict.
Rapid AI advances, coupled with the wild popularity of ChatGPT, have fueled a multibillion-dollar arms race over the future of AI dominance and transformed new-software releases into major spectacles.
OpenAI and Microsoft, which last month released a GPT-powered chatbot in its Bing search tool, have moved aggressively to counter Google and other AI trailblazers on the belief that these tools could prove crucial to future industries.
But the frenzy has also sparked criticism that the companies are rushing to exploit an untested, unregulated and unpredictable technology that could deceive people, undermine artists’ work and lead to real-world harm.
AI language models often confidently offer wrong answers because they are designed to spit out cogent phrases, not actual facts. And because they have been trained on internet text and imagery, they have also learned to emulate human biases of race, gender, religion and class.
Such systems have inspired boundless optimism around this technology’s potential, with some seeing in their responses a sense of intelligence or sentience almost on par with humans. The systems, though, as critics and AI researchers are quick to point out, are merely repeating patterns and associations found in their training data without a clear understanding of what they’re saying or when they’re wrong.
Despite its unreliability, Silicon Valley sees massive economic potential in this type of AI because of how easy these models are to use. Anyone can write what’s known as a “prompt” in plain English into a chat box, giving people who don’t know how to write code the ability to communicate with machines much as computer programmers have for decades.
GPT-4, the fourth “generative pre-trained transformer” since OpenAI’s first release in 2018, relies on the transformer, a breakthrough neural-network technique introduced in 2017 that rapidly advanced how AI systems can analyze patterns in human speech and imagery.
The systems are “pre-trained” by analyzing trillions of words and images taken from across the internet: news articles, restaurant reviews and message-board arguments; memes, family photos and works of art. Giant supercomputer clusters of graphics processing chips map out their statistical patterns, learning which words tend to follow each other in phrases, for instance, so that the AI can mimic those patterns, automatically crafting long passages of text or detailed images, one word or pixel at a time.
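The idea of learning which words tend to follow each other, then writing one word at a time, can be illustrated with a toy sketch. This is not how GPT-4 works internally: it swaps the transformer for a simple word-pair counter, and the tiny corpus and the function names `train_bigram_counts` and `generate` are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1
    return follows


def generate(follows, start_word, max_words=10, seed=0):
    """Build a passage one word at a time by sampling likely next words."""
    rng = random.Random(seed)
    output = [start_word]
    for _ in range(max_words - 1):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        next_words = list(candidates.keys())
        weights = list(candidates.values())
        output.append(rng.choices(next_words, weights=weights, k=1)[0])
    return " ".join(output)


# Hypothetical toy corpus standing in for "trillions of words".
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram_counts(corpus)
print(model["the"].most_common(1))  # [('cat', 2)]
print(generate(model, "the"))
```

In this toy corpus, “cat” follows “the” twice and every other follower appears once, so the sketch is most likely to continue “the” with “cat.” Real language models track far richer patterns than adjacent word pairs, but the generate-one-word-then-repeat loop is the same basic shape.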
OpenAI did not disclose how many “parameters” GPT-4 has, the variables representing pieces of information it acquired in training, like wrinkles in a brain. GPT-3, launched in 2020, had 175 billion parameters.
OpenAI launched in 2015 as a nonprofit but has quickly become one of the AI industry’s most formidable private juggernauts, applying language-model breakthroughs to high-profile AI tools that can talk with people (ChatGPT), write programming code (GitHub Copilot) and create photorealistic images (DALL-E 2).
Over the years, it has also radically shifted its approach to the potential societal risks of releasing AI tools to the masses. In 2019, the company famously refused to publicly release GPT-2, saying the model was so good that the company was concerned about the “malicious applications” of its use, from automated spam avalanches to mass impersonation and disinformation campaigns.
The pause was temporary. In November, ChatGPT, which used a fine-tuned version of GPT-3 that originally launched in 2020, saw more than a million users within a few days of its public release.
Public experiments with ChatGPT and the Bing chatbot have shown how far the technology is from perfect performance without human intervention. After a flurry of strange conversations and bizarrely wrong answers, Microsoft executives acknowledged that the technology was still not trustworthy in terms of providing correct answers but said the company was developing “confidence metrics” to address the issue.
GPT-4 is expected to improve on some shortcomings, and AI evangelists such as the tech blogger Robert Scoble have argued that “GPT-4 is better than anyone expects.” But critics worry that could lead to its own consequences, such as helping create fake photos of nonexistent events or people doing things they never did.
OpenAI’s chief executive, Sam Altman, has tried to temper expectations around GPT-4, saying in January that speculation about its capabilities had reached impossible heights. “The GPT-4 rumor mill is a ridiculous thing,” he said at an event held by the newsletter StrictlyVC. “People are begging to be disappointed, and they will be.”
But Altman has also marketed OpenAI’s vision with the aura of science fiction come to life. In a blog post last month, he said the company was planning for ways to ensure that “all of humanity” benefits from “artificial general intelligence,” or AGI, an industry term for the still-fantastical idea of an AI superintelligence that is generally as smart as, or smarter than, humans themselves.
Microsoft, an OpenAI investor, is working to package GPT-4 into a sellable product and has marketed the technology as a super-efficient companion that can handle mindless work and free people for more creative pursuits. The tool could, for instance, help one software developer to do the work of an entire team or allow a mom-and-pop shop to plan and design a professional advertising campaign without outside help.
A Microsoft executive told the German news site Heise that a developer had used the AI to create a prototype for summarizing and responding to call-center conversations with customers in a way that could save one company roughly 500 hours a day — or more than 60 people working 8-hour shifts — across tens of thousands of daily calls.
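The staffing comparison follows from simple division, sketched here using only the two figures quoted above (roughly 500 hours saved a day and 8-hour shifts); the variable names are illustrative.

```python
# Figures quoted in the article: about 500 staff hours saved per day,
# compared against people working 8-hour shifts.
hours_saved_per_day = 500
shift_length_hours = 8

# Number of full 8-hour shifts that the saved hours correspond to.
full_shifts_equivalent = hours_saved_per_day / shift_length_hours
print(full_shifts_equivalent)  # 62.5
```

That works out to 62.5 shifts, which is where the “more than 60 people” figure comes from.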
The company is making GPT-4 available to ChatGPT Plus subscribers, but with a cap on usage. OpenAI said it plans to adjust the usage limit depending on demand and system performance as it scales in the coming months. Developers will also be able to build apps with GPT-4 through the company’s API, an interface that allows different software to connect.
OpenAI has already allowed some collaborating companies access to GPT-4. Duolingo, the language learning app, has used GPT-4 to introduce new features, including an AI conversation partner and a tool that tells users why an answer was incorrect.