We’ve filed a lawsuit challenging Stable Diffusion, a 21st-century collage tool that violates the rights of artists.
Because AI needs to be fair & ethical for everyone.
January 13, 2023
Hello. This is Matthew Butterick. I’m a writer, designer, programmer, and lawyer. In November 2022, I teamed up with the amazingly excellent class-action litigators Joseph Saveri, Cadio Zirpoli, and Travis Manfredi at the Joseph Saveri Law Firm to file a lawsuit against GitHub Copilot for its “unprecedented open-source software piracy”. (That lawsuit is still in progress.)
Since then, we’ve heard from people all over the world—especially writers, artists, programmers, and other creators—who are concerned about AI systems being trained on vast amounts of copyrighted work with no consent, no credit, and no compensation.
Today, we’re taking another step toward making AI fair & ethical for everyone. On behalf of three wonderful artist plaintiffs—Sarah Andersen, Kelly McKernan, and Karla Ortiz—we’ve filed a class-action lawsuit against Stability AI, DeviantArt, and Midjourney for their use of Stable Diffusion, a 21st-century collage tool that remixes the copyrighted works of millions of artists whose work was used as training data.
Joining as co-counsel are the terrific litigators Brian Clark and Laura Matson of Lockridge Grindal Nauen P.L.L.P.
As a lawyer who is also a longtime member of the visual-arts community, it’s an honor to stand up on behalf of fellow artists and continue this vital conversation about how AI will coexist with human culture and creativity.
The image-generator companies have made their views clear.
Now they can hear from artists.
A 21st-century collage tool
Stable Diffusion is an artificial intelligence (AI) software product, released in August 2022 by a company called Stability AI.
Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images. These copies were made without the knowledge or consent of the artists.
Even assuming nominal damages of $1 per image, the value of this misappropriation would be roughly $5 billion. (For comparison, the largest art heist ever was the 1990 theft of 13 artworks from the Isabella Stewart Gardner Museum, with a current estimated value of $500 million.)
Stable Diffusion belongs to a category of AI systems called generative AI. These systems are trained on a certain kind of creative work—for instance text, software code, or images—and then remix these works to derive (or “generate”) more works of the same kind.
Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.
These resulting images may or may not outwardly resemble the training images. Nevertheless, they are derived from copies of the training images, and compete with them in the marketplace. At minimum, Stable Diffusion’s ability to flood the market with an essentially unlimited number of infringing images will inflict permanent damage on the market for art and artists.
Even Stability AI CEO Emad Mostaque has forecast that “[f]uture [AI] models will be fully licensed”. But Stable Diffusion is not. It is a parasite that, if allowed to proliferate, will cause irreparable harm to artists, now and in the future.
The problem with diffusion
The diffusion technique was invented in 2015 by AI researchers at Stanford University. The diagram below, taken from the Stanford team’s research, illustrates the two phases of the diffusion process using training data in the shape of a spiral.
The first phase in diffusion is to take an image (or other data) and progressively add more visual noise to it in a series of steps. (This process is depicted in the top row of the diagram.) At each step, the AI records how the addition of noise changes the image. By the last step, the image has been “diffused” into essentially random noise.
The second phase is like the first, but in reverse. (This process is depicted in the bottom row of the diagram, which reads right to left.) Having recorded the steps that turn a certain image into noise, the AI can run those steps backwards. Starting with some random noise, the AI applies the steps in reverse. By removing noise from (or “denoising”) the data, the AI produces a copy of the original image.
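The forward phase can be sketched numerically. This is a toy illustration on a single number, not Stable Diffusion’s actual code; the step count and noise schedule here are arbitrary assumptions:

```python
import math
import random

# Toy sketch of the forward ("noising") phase of diffusion.
# STEPS and the beta schedule are made-up values for illustration.
STEPS = 10
betas = [0.05 * (t + 1) for t in range(STEPS)]  # noise added per step

def forward_diffuse(x0, rng):
    """Progressively mix a data point with Gaussian noise, recording each step."""
    x, states = x0, [x0]
    for beta in betas:
        # Each step keeps sqrt(1 - beta) of the signal and adds fresh noise.
        x = math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0, 1)
        states.append(x)
    return states

# The cumulative signal fraction shrinks toward zero: by the final step
# the original data has been "diffused" into essentially random noise.
signal_fraction = []
alpha_bar = 1.0
for beta in betas:
    alpha_bar *= (1 - beta)
    signal_fraction.append(alpha_bar)
```

The reverse phase is the learned part: a model trained to predict the noise added at each step can run this schedule backwards, starting from pure noise.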
In the diagram, the reconstructed spiral (in red) has some fuzzy parts in the lower half that the original spiral (in blue) does not. Though the red spiral is plainly a copy of the blue spiral, in computer terms it would be called a lossy copy, meaning some details are lost in translation. This is true of numerous digital data formats, including MP3 and JPEG, that also make highly compressed copies of digital data by omitting small details.
In short, diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different than an MP3 or JPEG—a way of storing a compressed copy of certain digital data.
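What “lossy” means can be shown in miniature with plain rounding: the stored copy drops small details but remains recognizably a copy. This is a toy illustration of lossy compression in general, not of how JPEG, MP3, or diffusion works internally:

```python
# "Compress" by storing each value at reduced precision, then compare
# the reconstruction to the original: close, but small details are lost,
# like the fuzzy parts of the reconstructed red spiral.
original = [0.12, 0.98, 0.57, 0.303]
compressed = [round(v, 1) for v in original]
errors = [abs(a - b) for a, b in zip(original, compressed)]
```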
For more about how a diffusion model is able to reconstruct its training images through denoising, see Diffusion Models: A Practical Guide, Diffusion Models—More Than Adding Noise, Introduction to Diffusion Models for Machine Learning, and What are Diffusion Models? (Links ordered by increasing level of math.)
Interpolating with latent images
In 2020, the diffusion technique was improved by researchers at UC Berkeley in two ways:
They showed how a diffusion model could store its training images in a more compressed format without impacting its ability to reconstruct high-fidelity copies. These compressed copies of training images are known as latent images.
They found that these latent images could be interpolated—meaning, blended mathematically—to produce new derivative images.
The diagram below, taken from the Berkeley team’s research, shows how this process works.
The image in the red frame has been interpolated from the two “Source” images pixel by pixel. It looks like two translucent face images stacked on top of each other, not a single convincing face.
The image in the green frame has been generated differently. There, the two source images were first compressed into latent images. These latent images were then interpolated, and the interpolated latent image was reconstructed into pixels using the denoising process. Compared to the pixel-by-pixel interpolation, the advantage is apparent: the interpolation based on latent images looks like a single convincing human face, not an overlay of two faces.
Despite the difference in results, in copyright terms, these two modes of interpolation are equivalent: they both generate derivative works by interpolating two source images.
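Interpolation itself is ordinary arithmetic over lists of numbers, whether those numbers are raw pixels or latent coordinates. In this minimal sketch, the `encode` and `decode` functions are hypothetical stand-ins for Stable Diffusion’s autoencoder, not its real behavior:

```python
def lerp(a, b, t):
    """Blend two equal-length vectors: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Treat each "image" as a flat list of pixel values.
face_a = [0.0, 0.2, 0.9, 0.4]
face_b = [1.0, 0.8, 0.1, 0.6]

# Pixel-space blend: averages raw pixels, overlaying the two images
# (the double-exposure effect in the red frame).
pixel_blend = lerp(face_a, face_b, 0.5)

# Latent-space blend: compress both images, interpolate the compressed
# coordinates, then decode back to pixels. encode/decode here are
# hypothetical stand-ins for the autoencoder.
def encode(pixels):
    return pixels[::2]                            # keep every other value

def decode(latent):
    return [v for v in latent for _ in range(2)]  # expand back to full size

latent_blend = decode(lerp(encode(face_a), encode(face_b), 0.5))
```

Either way, the result is computed entirely from the two source images, which is the point of the copyright equivalence above.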
Stable Diffusion converts training images to latent images using a variational autoencoder (or VAE). For a general overview, see Intuitively Understanding Variational Autoencoders (which notes that these systems are “specifically useful for reconstructing [their] own input.”). For examples of Stable Diffusion’s variational autoencoder in use, see Stable Diffusion using Hugging Face—Looking under the hood (“the encoder acts like a compressor that squishes the image … and the decoder recreates the original image”). For examples of its lossy-compression characteristics, see Stable Diffusion Based Image Compression.
Mathematically, the latent images produced by an autoencoder are modeled as points in a multidimensional geometric space called latent space. For a general overview of how latent images are stored and interpolated in latent space, see Understanding Latent Space in Machine Learning and Latent Space Visualization (both of which have excellent diagrams and examples). For examples of latent-image interpolation using Stable Diffusion, see A walk through latent space with Stable Diffusion.
The UC Berkeley researchers also introduced the idea of handling diffusion with an AI model called a U-Net. The U-Net was invented by biomedical researchers in Freiburg, Germany in 2015 for the purpose of image segmentation, or detecting groups of visually related pixels. To do this, the U-Net applies a set of compressions and decompressions to the image. For more about how Stable Diffusion relies on a U-Net, see Stable Diffusion with Diffusers, Stable Diffusion using Hugging Face—Looking under the hood, and You Can’t Spell Diffusion without U.
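The compress-then-decompress pattern the U-Net applies can be sketched in one dimension: repeatedly downsample a signal, then upsample it back. This is a toy sketch only; a real U-Net also learns convolution filters and uses skip connections:

```python
def downsample(xs):
    """Halve resolution by averaging adjacent pairs (a crude compression)."""
    return [(xs[i] + xs[i + 1]) / 2 for i in range(0, len(xs), 2)]

def upsample(xs):
    """Double resolution by repeating each value (a crude decompression)."""
    return [v for v in xs for _ in range(2)]

signal = [1.0, 1.0, 4.0, 4.0, 2.0, 2.0, 8.0, 8.0]
coarse = downsample(downsample(signal))   # 8 values squeezed to 2
restored = upsample(upsample(coarse))     # back to 8, fine detail smoothed away
```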
The parallel to JPEG and MP3 is not just rhetorical but mathematical, because both formats compress data using Fourier transforms. The U-Net is a convolutional neural network, which is a conceptual descendant of the Fourier transform. For more about this kinship, see Understanding Deep Convolutional Networks. Geometric-data researcher Jean Feydy offers some nice insights here and here, and cautions “don’t be fooled by the hand-waving, pseudo-biological intuitions that are often bundled with these methods … Convolutional architectures are miles away from being ‘biologically inspired’.”
Conditioning with text prompts
In 2022, the diffusion technique was further improved by researchers in Munich. These researchers figured out how to shape the denoising process with extra information. This process is called conditioning. (One of these researchers, Robin Rombach, is now employed by Stability AI as a developer of Stable Diffusion.)
The most common tool for conditioning is short text descriptions, also known as text prompts, that describe elements of the image, e.g.—“a dog wearing a baseball cap while eating ice cream”. (Result shown at right.) This gave rise to the dominant interface of Stable Diffusion and other AI image generators: converting a text prompt into an image.
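In practice, a text prompt steers each denoising step through a technique called classifier-free guidance: the model makes one noise prediction with the prompt and one without, and the difference between them is amplified. A scalar sketch with made-up numbers (real models predict entire noise images, not single values, and 7.5 is mentioned here only as a commonly cited guidance setting):

```python
def guided_prediction(eps_uncond, eps_cond, scale):
    """Push the denoising prediction toward the text-conditioned one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# scale = 0 ignores the prompt, scale = 1 follows it exactly,
# and scale > 1 exaggerates the prompt's influence.
ignore_prompt = guided_prediction(0.0, 1.0, 0.0)   # 0.0
follow_prompt = guided_prediction(0.0, 1.0, 1.0)   # 1.0
exaggerated   = guided_prediction(0.0, 1.0, 7.5)   # 7.5
```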
The text-prompt interface serves another purpose, however. It creates a layer of magical misdirection that makes it harder for users to coax out obvious copies of the training images (though not impossible). Nevertheless, because all the visual information in the system is derived from the copyrighted training images, the images produced—regardless of outward appearance—are necessarily works derived from those training images.
How Stable Diffusion combines these pieces
Within Stable Diffusion, the pieces described above are implemented as three separate AI models that cooperate. For details of how these three models work together, see Stable Diffusion using Hugging Face—Looking under the hood and The Illustrated Stable Diffusion.
Stability AI, founded by Emad Mostaque, is based in London.
Stability AI funded LAION, a German organization that is creating ever-larger image datasets—without consent, credit, or compensation to the original artists—for use by AI companies.
Stability AI is the developer of Stable Diffusion. Stability AI trained Stable Diffusion using the LAION dataset.
Stability AI also released DreamStudio, a paid app that packages Stable Diffusion in a web interface.
DeviantArt was founded in 2000 and has long been one of the largest artist communities on the web.
As shown by Simon Willison and Andy Baio, thousands—and probably closer to millions—of images in LAION were copied from DeviantArt and used to train Stable Diffusion.
Rather than stand up for its community of artists by protecting them against AI training, DeviantArt instead chose to release DreamUp, a paid app built around Stable Diffusion. In turn, a flood of AI-generated art has inundated DeviantArt, crowding out human artists.
Midjourney was founded in 2021 by David Holz in San Francisco. Midjourney offers a text-to-image generator through Discord and a web app.
Though holding itself out as a “research lab”, Midjourney has cultivated a large audience of paying customers who use Midjourney’s image generator professionally. Holz has said he wants Midjourney to be “focused toward making everything beautiful and artistic looking.”
To that end, Holz has admitted that Midjourney is trained on “a big scrape of the internet”. Though when asked about the ethics of massive copying of training images, he said—
There are no laws specifically about that.
And when Holz was further asked about allowing artists to opt out of training, he said—
We’re looking at that. The challenge now is finding out what the rules are.
We look forward to helping Mr. Holz find out about the many state and federal laws that protect artists and their work.
Our plaintiffs are wonderful, accomplished artists who have stepped forward to represent a class of thousands—possibly millions—of fellow artists affected by generative AI.
Sarah Andersen is a cartoonist and illustrator. She graduated from the Maryland Institute College of Art in 2014. She currently lives in Portland, Oregon. Her semi-autobiographical comic strip, Sarah’s Scribbles, finds the humor in living as an introvert. Her graphic novel FANGS was nominated for an Eisner Award.
Sarah also wrote The Alt-Right Manipulated My Comic. Then A.I. Claimed It for the New York Times.
Kelly McKernan is an independent artist based in Nashville. They graduated from Kennesaw State University in 2009 and have been a full-time artist since 2012. Kelly creates original watercolor and acryla gouache paintings for galleries, private commissions, and their online store. In addition to maintaining a large social-media following, Kelly shares tutorials and teaches workshops, travels across the US for events and comic-cons, and also creates illustrations for books, comics, games, and more.
Karla Ortiz is a Puerto Rican, internationally recognized, award-winning artist. With her exceptional design sense, realistic renders, and character-driven narratives, Karla has contributed to many big-budget projects in the film, television and video-game industries. Karla is also a regular illustrator for major publishing and role-playing game companies.
Karla’s figurative and mysterious art has been showcased in notable galleries such as Spoke Art and Hashimoto Contemporary in San Francisco; Nucleus Gallery, Thinkspace, and Maxwell Alexander Gallery in Los Angeles; and Galerie Arludik in Paris. She currently lives in San Francisco with her cat Bady.
If you’re a member of the press or the public with other questions about this case or related topics, contact firstname.lastname@example.org. (Though please don’t send confidential or privileged information.)
If you’d like to receive occasional email updates on the progress of the case, click here to sign up.