DALL·E 2: The First "AI Art" Shock
· Jerwin Arnado
Archive note: this is a backdated post, written years later while rebuilding this site. It’s dated to the moment it covers, but the hindsight is real.
On April 6, OpenAI announced DALL·E 2, and for the first time in a while I sat at my desk genuinely unsure how to file what I was seeing. You type “an astronaut riding a horse in a photorealistic style” — and it draws one. Not collages something. Not searches one. Draws one. A teddy bear doing chemistry on the moon. A bowl of soup that is a portal to another dimension. Whatever sentence you can write, it can attempt — at 1024px, often disturbingly well.
It’s a waitlist-only research preview for now, so like most people I’m reacting to curated demos and early-access threads, which is worth flagging. Cherry-picked outputs are marketing. But even with a heavy discount applied, the floor of what this demonstrably does did not exist a year ago.
What it actually does
Beyond text-to-image, two features hint at where this goes:
- Inpainting: select a region of an existing image, describe what should be there instead, and it edits coherently — shadows, reflections, lighting respected. That’s not a toy; that’s a Photoshop workflow.
- Variations: hand it an image, get alternates in the same style. Composition as a parameter.
Under the hood it’s a diffusion model: start from noise, then iteratively denoise toward an image matching the text’s meaning, with the text understanding inherited from CLIP-style training on an enormous set of image-caption pairs. Knowing this does not make the outputs less uncanny. I understand how the trick works and my brain still files the results under “drawn by someone.”
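For the curious, the "start from noise, iteratively denoise" loop can be sketched in a few lines. To be clear about what this is: a toy, not DALL·E 2's code. It follows the generic DDPM-style sampling recipe, and in place of a trained neural network it uses a hypothetical "oracle" that already knows the one target it should produce. The real system learns that noise predictor from data and steers it with the text. Every name here (`make_schedule`, `oracle_noise`, `sample`) and the linear noise schedule are my own assumptions for illustration.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    # alpha_bar[t] is the running product of (1 - beta) up to step t.
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def oracle_noise(x_t, alpha_bar, target):
    """Stand-in for the trained network: the exact noise in x_t,
    assuming the only possible clean "image" is `target`."""
    scale = math.sqrt(alpha_bar)
    denom = math.sqrt(1.0 - alpha_bar)
    return [(x - scale * t) / denom for x, t in zip(x_t, target)]


def sample(target, num_steps=50, seed=0):
    """Run the reverse (denoising) process from pure Gaussian noise."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps = oracle_noise(x, alpha_bar, target)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        # Remove the predicted noise and rescale: the DDPM posterior mean.
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta) for xi, ei in zip(x, eps)]
        if t > 0:
            # Every step except the last re-injects a little fresh noise.
            sigma = math.sqrt(beta)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    # A three-number "image" stands in for real pixels.
    print(sample([0.2, -0.5, 0.9]))
```

Because the oracle is exact, the final step lands back on the target up to floating-point error. A learned predictor only approximates the noise, and that gap is where both the variety and the occasional weirdness of real outputs come from.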
The reactions worth recording
I want to timestamp my honest reactions, because I suspect this is one of those moments we’ll retroactively flatten:
- The “this changes things” feeling is different this time. Tech demos impress and evaporate. This one immediately rearranged my sense of which tasks are safe from automation. “Creative work is what machines can’t do” was a load-bearing assumption for a lot of people’s careers, and it just visibly cracked.
- Artists are right to be alarmed, and it’s complicated. The model learned from millions of images made by people who weren’t asked. Whether that’s “how all artists learn, accelerated” or “industrial-scale style extraction” is going to be fought over for years. Both framings contain truth, which is what makes it hard.
- Prompting is a skill, apparently. Early users are already discovering that how you phrase the sentence dramatically changes output quality. There’s something hilarious and profound about the newest interface to the most advanced AI being… writing carefully.
- The gatekeeping is temporary by nature. OpenAI controls access, filters content, watermarks output. But the paper is public and the technique is reproducible. Whatever safety posture exists today, assume open-weight versions of this capability are coming — the only question is the delay. (Remember this paragraph.)
The note to future me
My pattern with these posts is to bank predictions, so: I think image generation is the opening act. The same underlying move — learn the structure of human output, generate plausible new instances from a description — has an obvious next target, and it’s the one I type for a living. The demos of code-writing models are already circulating. Filed under “things I’m watching with professional interest and personal vertigo.”
For now: somewhere out there, a horse is being ridden by an astronaut, and nobody drew it. What a strange month.