Using “underdrawings” for accurate text and numbers

I discovered a technique for generating reliable text and numbers in AI-generated images.

For example, the following image is considered impossible with state-of-the-art image models. But I made it with Gemini 3.0 Pro (plus one extra step I’m going to explain below).

ChatGPT-Images-2, which was released earlier this week, does a great job with accurate text and numbers. So I had assumed this technique was now moot and had already been absorbed by the models.

But no: this method still works better than either Gemini 3.0 Pro or ChatGPT-Images-2 on its own.

That suggests they’re not using this technique, which is surprising, but I suspect it won’t be long until they all do.


The Underdrawing Method

I’m totally naming it like it’s a thing, but it does seem to be a thing.

Example

It’s easiest if we do a baby A/B test so we can see the effect with and without.

Let’s pick a simple prompt that Gemini and ChatGPT will get the numbers wrong on. (Finding one is an increasingly difficult challenge, btw: they can do the periodic table nearly perfectly now.) But I find they still fail easily on custom layouts of numbers.

Make an image of a game board with 50 stepping stones arranged in a spiral, winding counter-clockwise inward from start at the outside (1) to finish at the centre (50). Each stone is clearly numbered consecutively from 1 to 50. Style: claymation diorama, studio-lit, candy-bright, soft bokeh background.

❌ Gemini 3.0 Pro

As expected. Impressive at first glance but falls apart once you start reading. Also not a spiral.

❌ ChatGPT-Images-2

I was so impressed with the ChatGPT-Images-2 release that I expected it to get this. Very surprising to see it fail similarly to Gemini. Also not a spiral.

✅ Gemini 3.0 Pro with underdrawing method

Bingo

How it works

There will be far more intelligent and elaborate ways to do this. This was a quick method I came up with one day while trying to generate an image of a 100-step adventure board for my kid.

Use deterministic and generative machines for what they’re good at

One day I had a problem to solve and I noticed that my best tools only offered two extremes:

  1. SVG/HTML will make dry visuals but with excellent math and precision
  2. Image Gen models will create stunning visuals but with famously unreliable math/text

Those are such different computing processes. It seemed clear we need the genius analyst to do the precision and the genius artist to do the creative bit, and my job was to figure out how they could work together.

So I spent an afternoon figuring out how to do that. Well, obviously Claude did all the work (thank you Claude), but I had some ideas and helped with reading. Anyway, since Gemini is multimodal, you can pass in images AND text in the same prompt.

Give it an underdrawing and ask it to paint on top

  1. Layer 1: The “underdrawing” (deterministic): Lay out the numbers and text in the correct positions and orientations in whatever language/format you prefer (SVG, Python, Mermaid). You just need to export an image of it that contains the pixels of the numbers/text.

  2. Layer 2: The “painting” (generative): Make an image generation request to Gemini 3.0 Pro or greater (you just need image+text input support) where you’ll include your underdrawing and the prompt for the visual style you want.

Example

Step 1 of 2: generate the numbers/text outline with SVG

Ask Claude Code to generate it for you, and iterate until you’re happy with the wireframe version.

Make an SVG of 50 stepping stones arranged in a spiral, winding counter-clockwise inward from start at the outside (1) to finish at the centre (50), each stone numbered consecutively from 1 to 50. Each stone is a different shape: circle, square, triangle, hexagon.
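For a sense of what such an underdrawing might look like when written by hand, here is a minimal Python sketch. It is not the SVG Claude produced for this post: the function names (spiral_stones, stone_shape, underdrawing_svg), the canvas size, the spacing and the square-root spiral layout are all assumptions chosen for illustration.

```python
import math


def spiral_stones(n=50, size=800, spacing=90.0):
    """Return [(number, x, y), ...] for n stones on an inward spiral.

    Stone 1 is outermost and stone n sits at the centre of the canvas.
    All parameters are illustrative assumptions.
    """
    cx = cy = size / 2
    a = spacing / (2 * math.pi)  # Archimedean spiral: r = a * theta
    stones = []
    for number in range(1, n + 1):
        k = n - number  # how many steps out from the centre this stone is
        # theta grows with sqrt(k), which keeps stones roughly evenly
        # spaced along the path instead of bunching up on the outer turns.
        theta = math.sqrt(4 * math.pi * k)
        r = a * theta
        # SVG's y axis points down, so theta shrinking as the numbers rise
        # reads as counter-clockwise travel on screen.
        stones.append((number, cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return stones


def stone_shape(x, y, radius, sides):
    """Return an SVG circle (sides == 0) or a regular polygon element."""
    if sides == 0:
        return f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius}"/>'
    points = " ".join(
        f"{x + radius * math.cos(2 * math.pi * i / sides - math.pi / 2):.1f},"
        f"{y + radius * math.sin(2 * math.pi * i / sides - math.pi / 2):.1f}"
        for i in range(sides)
    )
    return f'<polygon points="{points}"/>'


def underdrawing_svg(n=50, size=800, radius=24):
    """Build the whole underdrawing as an SVG string."""
    shapes = [0, 4, 3, 6]  # circle, square, triangle, hexagon, repeating
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">',
        f'<rect width="{size}" height="{size}" fill="white"/>',
    ]
    for number, x, y in spiral_stones(n, size):
        sides = shapes[(number - 1) % len(shapes)]
        parts.append(
            '<g fill="white" stroke="black" stroke-width="2">'
            f"{stone_shape(x, y, radius, sides)}</g>"
        )
        parts.append(
            f'<text x="{x:.1f}" y="{y:.1f}" font-family="sans-serif" '
            'font-size="16" text-anchor="middle" '
            f'dominant-baseline="central">{number}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the string returned by underdrawing_svg() to a .svg file gives you something to open in a browser and iterate on. The point is that every number is placed by arithmetic, so it cannot come out wrong or out of order.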

Step 2 of 2: Pass the underdrawing + prompt to image generation

Ask Claude to pass the SVG you made in the prior step to Gemini Pro and have it transform the image visually without changing the numbers, e.g.

Transform this image into a photographed claymation diorama of assorted artisan chocolates and candies, arranged in a spiral path winding counter-clockwise inward from start (1) at the outside to finish (50) at the centre, viewed from a low-angle tilted perspective. 
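If you want to wire this step up yourself, the request boils down to one message holding both the underdrawing and the style prompt. The sketch below only builds that request body and sends nothing. It assumes the underdrawing has already been exported to PNG, and that the body follows the Gemini API's generateContent JSON shape (a "contents" list of "parts" with "text" and "inline_data" entries) as I understand it, so check the current API docs for the endpoint and model name. The helper name build_paint_request and the wording of the guard sentence are my own.

```python
import base64
import json


def build_paint_request(png_bytes, style_prompt):
    """Pair the underdrawing image with a style prompt in one request body.

    Hypothetical helper: the field names below are assumed to match the
    Gemini generateContent REST body; verify them before relying on this.
    """
    # Assumed wording; the aim is to tell the model to leave the numbers alone.
    guard = "Keep every number exactly as drawn, in the same position and order. "
    return {
        "contents": [
            {
                "parts": [
                    {"text": guard + style_prompt},
                    {
                        "inline_data": {
                            "mime_type": "image/png",
                            # Binary image data has to travel as base64 text in JSON.
                            "data": base64.b64encode(png_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


def to_json(body):
    """Serialise the request body, ready to send with any HTTP client."""
    return json.dumps(body)
```

Keeping the image and the text in the same message is the whole trick: the model sees where each number already sits and only has to repaint around it.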

That’s it

It isn’t hard. By now Claude Code and Codex can just do every step of that. So I’m genuinely surprised the frontier labs weren’t already doing something like this.

And remember, no matter how amazing the technique might be, it’s not 100%.

Thank you for the reality check, 71.