When Aidan Ragan creates artificially generated images, he expects the people in his pictures to have knotty, veiny hands with more or fewer than five fingers. But this month, as he sat in his University of Florida class about AI in the arts, he was stunned to see a popular image maker churn out realistic hands.
“It was amazing,” Ragan, 19, said in an interview with The Washington Post. “That was the one thing that was holding it back, and now it’s perfected. It was a little scary … and exciting.”
Artificial intelligence image-generators, which create pictures based on written instructions, have skyrocketed in popularity and performance. People enter prompts ranging from the mundane (draw Santa Claus) to the nonsensical (dachshund in space in the style of stained glass), and the software spits out an image resembling a professional painting or realistic photo.
However, the technology has a major failing: creating lifelike human hands. Data sets that train AI often capture only pieces of hands. That often results in photos of bulbous hands with too many fingers or stretched-out wrists — a telltale sign the AI-generated image is fake.
But in mid-March, Midjourney, a popular image maker, released a software update that seemed to fix the problem, with artists reporting that the tool created images with flawless hands. The improvement comes with a big problem: The company’s enhanced software was used this week to churn out fake images of former president Donald Trump being arrested. The images looked real and went viral, showing the disruptive power of this technology.
The seemingly innocuous update is a boon for graphic designers who rely on AI image makers for realistic art. But it sparks a larger debate about the danger of generated content that is indistinguishable from authentic images. Some say this hyper-realistic AI will put artists out of work. Others say flawless images will make deepfake campaigns more plausible, absent glaring clues that an image is fabricated.
“Before nailing all these details, the average person would … be like: ‘Okay, there are seven fingers here or three fingers there — that’s probably fake,’” said Hany Farid, a professor of digital forensics at the University of California at Berkeley. “But as it starts to get all these details right … these visual clues become less reliable.”
Over the past year, there has been an explosion of text-to-image generators amid the greater rise in generative artificial intelligence, which powers software that creates text, images or sounds based on the data it is fed.
The popular Dall-E 2, created by OpenAI and named after painter Salvador Dalí and Disney Pixar’s WALL-E, shook the internet when it launched last July. In August, the start-up Stability AI released its own version, Stable Diffusion, essentially an anti-Dall-E with fewer restrictions on how it could be used. Research lab Midjourney debuted its own version during the summer; that tool created the picture that sparked a controversy in August when it won an art competition at the Colorado State Fair.
These image makers work by ingesting billions of images scraped from the internet and recognizing patterns between the photos and the words that accompany them. For example, the software learns that the phrase “bunny rabbit” is associated with pictures of the furry animal, so when someone types it, the software spits out such a picture.
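The idea of linking caption words to image patterns can be illustrated with a toy sketch. It is not how any real generator is built: actual systems rely on large neural networks, while this example simply averages made-up three-number "image features" for each caption word. The training pairs, the feature values and the function names `learn_associations` and `generate` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical training pairs: a caption plus a tiny three-number
# stand-in for an image. Real systems learn from billions of pictures.
pairs = [
    ("bunny rabbit", [0.9, 0.8, 0.1]),
    ("white bunny", [1.0, 0.9, 0.2]),
    ("santa claus", [0.1, 0.2, 0.9]),
]

def learn_associations(pairs):
    """Average the image features seen alongside each caption word."""
    sums = {}
    counts = defaultdict(int)
    for caption, features in pairs:
        for word in caption.lower().split():
            if word not in sums:
                sums[word] = [0.0] * len(features)
            sums[word] = [s + f for s, f in zip(sums[word], features)]
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}

def generate(prompt, associations):
    """Blend the learned features of every known word in the prompt."""
    vectors = [associations[w] for w in prompt.lower().split() if w in associations]
    if not vectors:
        return None  # the toy model has never seen any of these words
    return [sum(column) / len(vectors) for column in zip(*vectors)]

associations = learn_associations(pairs)
print(generate("bunny rabbit", associations))
```

A prompt made only of words the toy model has never seen, such as "unicorn", returns `None`, a crude reminder that the software can only reproduce patterns present in its training data.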
But recreating hands remained a thorny problem for the software, said Amelia Winger-Bearskin, an associate professor of AI and the arts at the University of Florida.
Why AI image-generators are bad at drawing hands
AI image-generating software has not been able to fully understand what the word “hand” means, she said, making the body part difficult to render. Hands come in many shapes, sizes and forms, and the pictures in training data sets are often more focused on faces, she said. If hands are depicted, they are often folded or gesturing, offering only a distorted glimpse of the body part.
“If every single image of a person was always like this,” she said, spreading her hands out fully during a Zoom video interview, “we’d probably be able to reproduce hands pretty well.”
Midjourney’s software update this month seems to have made a dent in the problem, Winger-Bearskin said, though she noted that it’s not perfect. “We’ve still had some really odd ones,” she said. Midjourney did not respond to a request for comment about its software update.
Winger-Bearskin said it’s possible Midjourney refined its image data set, marking photos where hands aren’t obscured as higher priority for the algorithm to learn from and flagging images where hands are blocked as lower priority.
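Midjourney has not said what it changed, so the following is only a sketch of the kind of prioritizing Winger-Bearskin speculates about. It assumes each training image carries a hypothetical `hands_visible` flag and that higher-priority images are simply drawn more often during training. The file names, the 3-to-1 weighting and the function names are invented for illustration.

```python
import random

# Hypothetical metadata: each record notes whether hands are fully visible.
dataset = [
    {"file": "portrait_001.jpg", "hands_visible": True},
    {"file": "portrait_002.jpg", "hands_visible": False},
    {"file": "portrait_003.jpg", "hands_visible": True},
    {"file": "portrait_004.jpg", "hands_visible": False},
]

def sampling_weight(record, high=3.0, low=1.0):
    """Give unobscured hands a higher priority (assumed 3-to-1 ratio)."""
    return high if record["hands_visible"] else low

def sample_batch(records, batch_size, rng):
    """Draw a training batch in which high-priority images appear more often."""
    weights = [sampling_weight(r) for r in records]
    return rng.choices(records, weights=weights, k=batch_size)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = sample_batch(dataset, 1000, rng)
share = sum(r["hands_visible"] for r in batch) / len(batch)
print(f"share of batch with visible hands: {share:.2f}")
```

Half the images in this tiny data set show clear hands, but with the assumed weights they should make up roughly three-quarters of what the algorithm sees.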
Julie Wieland, a 31-year-old graphic designer in Germany, said she benefits from Midjourney’s ability to create more realistic hands. Wieland uses the software to create mood boards and mock-ups for visual marketing campaigns. Often, the most time-consuming part of her job is fixing human hands in postproduction, she said.
But the update is bittersweet, she said. Wieland often relished touching up an AI-generated image’s hands, or making the image match the creative aesthetic she prefers, which is heavily inspired by the lighting, glare and through-the-window shots made famous in Wong Kar-wai’s film “My Blueberry Nights.”
“I do miss the not-so-perfect looks,” she said. “As much as I love having beautiful images straight out of Midjourney, my favorite part of it is actually the postproduction of it.”
Ragan, who plans to pursue a career in artificial intelligence, also said these perfect images reduce the fun and creativity associated with AI image-making. “I really liked the interpretive art aspect,” he said. “Now, it just feels more rigid. It feels more robotic … more of a tool.”
UC Berkeley’s Farid said Midjourney’s ability to make better images creates political risk because it could generate images that seem more plausible and could spark societal anger. He pointed to images created on Midjourney this past week that seemed to plausibly show Trump being arrested, even though he hadn’t been. Farid noted that tiny details, such as the length of Trump’s tie and his hands, were getting better, making the images more believable.
“It’s easy to get people to believe this stuff,” he said. “And then when there’s no visual [errors], now it’s even easier.”
As recently as a few weeks ago, Farid said, spotting poorly created hands was a reliable way to tell if an image was deep-faked. That is becoming harder to do, he said, given the improvement in quality. But there are still clues, he said, often in a photo’s background, such as a disfigured tree branch.
Farid said AI companies should think more broadly about the harms they may contribute to by improving their technology. He said they can incorporate guardrails, such as making some words off-limits in prompts (which Dall-E 2 does, he said), adding image watermarks and preventing anonymous accounts from creating photos.
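The simplest of those guardrails, an off-limits word list, might look something like the sketch below. It does not describe how Dall-E 2 or any other product actually filters prompts; the blocked terms and the function name `is_prompt_allowed` are hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical list of off-limits words; a real service would maintain its own.
BLOCKED_TERMS = {"arrested", "mugshot"}

def is_prompt_allowed(prompt, blocked=BLOCKED_TERMS):
    """Reject a prompt if any of its words appears on the blocked list."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return words.isdisjoint(blocked)

print(is_prompt_allowed("dachshund in space in the style of stained glass"))
print(is_prompt_allowed("politician being ARRESTED outside a courthouse"))
```

A filter this simple is easy to sidestep with synonyms or misspellings, which is one reason a word list would be only one of several guardrails.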
But, Farid said, it’s unlikely AI companies will slow down the improvement of their image makers.
“There’s an arms race in the field of generative AI,” he said. “Everybody wants to figure out how to monetize and they are moving fast, and safety slows you down.”