Wading through latent space to get what you want

I’m sitting here tinkering on a few different projects: chatting with an artist for a small puzzle game, trying to get Replit Agent to build something, and even playing around with Genmoji on my iPhone. Even though I’m using them differently, they’ve all made me realize how important it is to clearly communicate what you want. And how frickin’ hard it is to do so. But if we can figure out how to do that well, I believe it will be the key to solving any problem we have.

I’ve been facing two big problems. When working with the artist on my game, deciding what I’m looking for in an art style and then articulating it to her is so challenging. Every time I review the updated work, I consider giving up. It’s not that the work is bad; it just isn’t what I’m looking for, and I’ve been struggling to communicate that to her. I don’t actually know what I’m looking for. This is a me-problem.

With Replit, I had a clearer idea of what I wanted. I wanted to build a simple service to track Amazon book prices. I was able to explain everything in detail and we iterated together for a long time before I gave up. Every time it thought it was making something work…it just wasn’t. Like it wouldn’t actually function as expected. This was a model problem.

And with Genmoji…Well. I made a dabbing panda emoji first and haven’t found a solid use case for it since. It’s awesome to be able to make any emoji you want, but then you realize how challenging it is to figure out which emojis you actually want to create. Since then, I haven’t really found a situation where I wanted a custom emoji that wasn’t just for the sake of it.

Generative AI tools are incredibly powerful if you know what you want to get out of them. But do you know what you want? And do you know what is even possible? 

To articulate what you want, you first need to know what you want. Knowing what you want is a mix of the current situation and your prior experience. If we don’t have the right vocabulary, we’re forced to make do with insufficient words and ideas in an attempt to convey complex concepts. New combinations are less legible to us, and to the tools we use.

So why is this so damn hard?

Well, to start, you have to know what you want. That relies on the murkier concept of taste. The second, and harder, step is finding the words and metaphors to explain it. It results in a lot of “I’ll know it when I see it,” which isn’t helpful at all. If you’re always waiting to judge each result, how will a collaborator or AI know it’s on the right track?

Can looking at biology give us a bit of a clue?

Evolution’s Creative Path

When creating something, each iteration takes a lot of time. Progress is made by slowly nudging things toward where you want them to go, through feedback and experimentation. It’s never a direct path, but a constant process of course correction. This is where biology comes in. In many ways, creativity is evolutionary.

In The Blind Watchmaker, Dawkins describes natural selection as random mutation plus feedback from the environment, constantly nudging evolution in the “right” direction. To explore this, he created Biomorphs, a computer program to simulate a simplified version of evolution.

Each generation of biomorphs evolved based on a random change in one of their “genes”.

With a few simple rules, the program generated a handful of different shapes known as biomorphs. Using his own discretion, Dawkins manually selected which biomorph survived into the next generation, at which point a random mutation happened. And so it repeated, continuing down a path and randomly mutating each time. Over multiple generations he got some pretty crazy results, things he never expected to see.

How it started (left). How it ended up (right)

In his biomorphs, he was able to achieve all sorts of “emergent designs” with only 9 different genes (variables)! Imagine what things could have looked like with more diversity. 
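If you want a feel for the loop he’s describing, here’s a rough Python sketch of the idea. It is not Dawkins’ actual program, which rendered each genome as a branching figure; the nine-gene genome is the only detail borrowed from the book, and everything else (batch size, number of generations, printing instead of drawing) is invented for illustration.

```python
import random

# A stripped-down sketch of the biomorph loop: a genome of 9 integer "genes",
# one random mutation per offspring, and a human picking the survivor.
def mutate(genes):
    child = list(genes)
    i = random.randrange(len(child))
    child[i] += random.choice([-1, 1])   # nudge one gene up or down
    return child

genes = [0] * 9                          # the starting genome
for generation in range(5):              # a handful of generations for the demo
    offspring = [mutate(genes) for _ in range(6)]
    for i, child in enumerate(offspring):
        print(f"  {i}: {child}")
    choice = int(input(f"Generation {generation}: which one survives? "))
    genes = offspring[choice]            # the survivor breeds the next batch
```

The interesting part is how little the code knows about where it’s going: all of the direction comes from whoever is doing the picking.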

In theory, over a long enough time period, you can slowly nudge one creative idea into what you’re looking for without having a clear destination or articulation at the start. Or to put it more neatly, you either know what you want up front or iterate your way there.

But theory is not always reality, and when working with others, infinite iteration is neither practical nor possible. People get annoyed when you constantly change your mind and shift direction. It’s frustrating! You don’t have the luxury of iterating forever.

BUT. AI could change this.

I’ve asked ChatGPT to iterate on an idea hundreds of times and it never got annoyed at me. Of course you might also still get annoyed because of how slow AI can be to get you to the right spot. It still might take you near-infinite iterations anyway 🤣. 

This is a solvable problem though. Wrong output = wrong input. 

And here is where I’m going to make a leap.

The latent space theory of creativity

When an AI model is trained, it creates a map of its knowledge (training data) in what is called latent space – a multidimensional representation of patterns and relationships in the data it has learned. This sounds crazy, but just imagine sorting all of the data into a bunch of different piles based on what makes the most sense to the model. It’s not always organized in a way that would be immediately obvious to humans (e.g. size, color, texture), but concepts are clustered in the space according to their meaning as the model sees it. When a model generates an output, it navigates this latent space based on the inputs you give it to produce something close to what you’re expecting.

A visual example of how data might get organized within a model (source)
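To make the “piles of related concepts” idea a bit more concrete, here’s a toy sketch. The items and coordinates are hand-made stand-ins for what a real model would learn from data; the only point it demonstrates is that “nearby in the space” means “related”.

```python
import math

# Hand-made "embeddings": each concept is a point in a shared 3D space.
# A real model learns these coordinates from data; these are invented.
latent = {
    "mug":     (0.9, 0.1, 0.2),
    "teacup":  (0.85, 0.15, 0.25),
    "vase":    (0.7, 0.2, 0.6),
    "bicycle": (0.1, 0.9, 0.1),
}

def cosine(a, b):
    # Similarity between two points: 1.0 means pointing the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = latent["mug"]
ranked = sorted(latent, key=lambda name: cosine(latent[name], query), reverse=True)
print(ranked)  # ['mug', 'teacup', 'vase', 'bicycle']: related concepts cluster together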

But this latent space is limited – it’s only made up of what is in the training data. Within its bounds, the model can infer almost anything. But try to go beyond it and it doesn’t know how.

However when it comes to creativity, we can imagine a much larger latent space—one that isn’t limited by past data. And in this conceptual space, every possible idea or design exists, even if we haven’t articulated it yet. The challenge is knowing how to navigate through this space: defining the dimensions that matter and iterating until we reach the ideal outcome. While AI gives us a powerful tool to explore its own latent space, the real work is articulating our desired outcome.

I believe this idea represents the secret to not only creativity, but potentially all of humanity’s future progress. OK, that’s a little bold. We went from making a dabbing Genmoji to solving humanity’s biggest problems. This bigger conceptual space isn’t exactly the same as an AI model’s latent space, but it’s close. To play this out, let’s start with a basic creative problem: designing a mug.

Designing the perfect mug

Imagine you are trying to design the perfect mug. Your first sense of taste pushes you towards designing something funky. So you ask AI to generate a funky mug design.

It’s not what you want, but we can iterate our way to something better. So you ask it to be more funky. You repeat this a few times and realize you aren’t getting closer. As you jump around, the model tries to interpret what you’re asking for, exploring the space related to “funkiness”.

There is a spectrum that you’re iterating along as you give more feedback.

So we add a new dimension: colourful. We update our prompt.

I don’t think we are getting any closer.

Let’s go back to the idea of latent space. Think of the space of all possible mugs as a spreadsheet (a 2D grid) of funky × colourful combinations.

The leftmost column is 0 funk and the rightmost column is maximum funk. And the bottom row is 0 colourful and the top row is mind-blowing colour. This grid represents the latent space we are operating in. And every single cell in this grid is a potential mug design. If we have only 100 options for funky and 100 options for colourful (100×100), we have 10,000 different mugs to choose from. It’s a fairly small space to fully explore (relatively speaking).

Each square of the spreadsheet is a potential mug.
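If it helps, the spreadsheet view is easy to sketch in code: two nested ranges, one prompt per cell. The prompt wording and the 0-99 scales below are made up purely for illustration.

```python
# Every (funky, colourful) pair is one cell of the "spreadsheet", and every
# cell corresponds to a prompt for one candidate mug.
def prompt_for(funky, colourful):
    return (f"A mug design that is {funky}/99 on a funkiness scale "
            f"and {colourful}/99 on a colourfulness scale")

grid = [[prompt_for(f, c) for c in range(100)] for f in range(100)]

print(len(grid) * len(grid[0]))   # 10000 candidate mugs
print(grid[80][20])               # one specific cell: very funky, fairly muted
```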

If we filled in every single cell (imagine, to keep this example manageable, a smaller 10×10 version of the grid with 100 cells), we would be able to see the full range of funky and colourful mugs. So we give it a whirl. But looking through all 100 iterations, it’s still not right. The funky-colourful spectrum doesn’t represent what you’re looking for.

I wasn’t about to draw 100 mugs.

Something must be missing.

If what we are looking for isn’t there, it means we are still missing a dimension (property). So now we add a 3rd dimension. 

A 3-dimensional “spreadsheet” where each cell (I didn’t draw them) is a potential mug

This is where we run into the limits of both my drawing ability and what can be feasibly visualized in a 2D image. With a scale of 0-99 for each dimension, you would have a million options (100 × 100 × 100). Generating every one of them would take forever. Remember, we can’t just generate all of them at once – we have to give feedback to move through the possibility space.

Navigating the latent space from initial prompt to the ideal mug with feedback.

Each time, we give feedback or add a new dimension to get slightly closer to the ideal, until we have this n-dimensional matrix or space that covers all the potential things we might want. Once we are past 3 dimensions, it becomes very hard to visualize but hopefully you can imagine adding more and more until we get the perfect mug. This means that the perfect thing is already out there; you just need to figure out how to describe it.

This is probably a bit of a mindbender, so another way to visualize it is to make every dimension a slider. Instead of typing words, you control the funkiness and florality on a scale of 0-100 and regenerate the image, visually fine-tuning what you’re looking for. In the background, we would convert the slider values into the corresponding prompt. This is way more tedious in practice, but it’s a useful model for understanding the idea. What it does is turn every problem into an optimization problem.

Simple UI I made with v0
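For the curious, here’s a rough sketch of what “convert slider values into the corresponding prompt” could look like behind a UI like that. The dimension names, buckets, and phrasing are all invented for illustration.

```python
# Each slider is one dimension; its 0-100 value gets translated into wording
# the model can act on.
BUCKETS = ["barely", "slightly", "moderately", "very", "extremely"]

def slider_to_phrase(name, value):
    # Map a 0-100 slider value onto a qualitative intensity word.
    bucket = BUCKETS[value * len(BUCKETS) // 101]
    return f"{bucket} {name}"

def build_prompt(sliders):
    traits = ", ".join(slider_to_phrase(n, v) for n, v in sliders.items())
    return f"A mug design that is {traits}."

print(build_prompt({"funky": 85, "colourful": 30, "floral": 60}))
# -> A mug design that is extremely funky, slightly colourful, moderately floral.
```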

This is actually one way I think we can make creativity and problem solving more legible, and it could serve as a potential UI for interacting with GenAI tools. My friend Sasha was exploring this previously. It definitely downplays the difficulty of finding new parameters (sliders) to add, but it helps make the concept more intuitive.1

What’s crazy to me about imagining all options and combinations this way is that if we have all the right variables, then we KNOW the perfect option exists within the matrix. It has to be in there, otherwise we are missing a factor. We won’t know the values though (how funky on a scale of 0 – 99) so it still won’t be easy to find. But it’s there.

Going back to Dawkins and his biomorphs, what if, instead of slowly iterating over time, he just had a spreadsheet with all 9 variables and 10,000 different combinations of each of the values up front? Could he get to the “ideal” biomorph immediately by scanning through all options?

On the one hand, this would really speed up seeing what’s possible. You could just skip straight to the end. On the other hand, you would need to know what you want up front, and have the time and patience to review all the outcomes. Instead of iterating over outcomes as you go, you would need a new system to comb through the billions of options. I originally said 10,000, but if you have 9 variables with 10 options each (1-10), that’s 10^9 (1,000,000,000) options.

It turns out feedback plus iteration is a faster way to get to your ideal outcome than pure brute force. We would need to both generate all 1,000,000,000 options up front (something that is still time consuming) and then spend time reviewing each of them to find the ideal. By going through successive generations of prompting instead, we could get there in a fraction of the time. In the book, Dawkins gives the canonical example of monkeys pounding on keyboards trying to type the Shakespearean sentence “METHINKS IT IS LIKE A WEASEL”.

If we assume each of the sentence’s 28 characters can be any of 27 options (the alphabet plus a space) and only one combination is correct, the odds of landing on it through pure chance are “about 1 in 10,000 million million million million million million”. Compare this to fewer than 100 generations of cumulative evolution, where each generation is selected based on how close it is to the ideal. In our scenario, our feedback is what makes the process cumulative.
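Dawkins’ weasel example is small enough to sketch in a few lines of Python. This isn’t his exact program, just the cumulative-selection idea: start from random characters, breed a batch of mutated copies each generation, and keep whichever copy is closest to the target. The mutation rate and batch size here are arbitrary choices.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 27 possible characters per position

def score(phrase):
    # How many positions already match the target.
    return sum(a == b for a, b in zip(phrase, TARGET))

def mutate(phrase, rate=0.05):
    # Copy the phrase, occasionally swapping in a random character.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in phrase)

phrase = "".join(random.choice(ALPHABET) for _ in TARGET)
generations = 0
while phrase != TARGET:
    # Keep the parent in the pool so the best-so-far is never lost,
    # then select whichever copy is closest to the target (the "feedback").
    pool = [phrase] + [mutate(phrase) for _ in range(100)]
    phrase = max(pool, key=score)
    generations += 1

print(f"Reached '{TARGET}' in {generations} generations")
```

Run it a few times and it should land on the phrase in a few dozen generations, rather than anything close to the 10^40 attempts pure chance would need.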

Optimism through the model

Visualizing this space as a “spreadsheet” lets us approach creativity and problem solving in a far more systematic way, rather than a mystical one. You could start to discover which dimensions might be missing, or at the very least recognize that something is missing.

Maybe looking at it this way is demoralizing if you’re a creative person, but if you believe that everything is a remix, this can help frame your own search for originality. It represents SO MUCH unexplored space. And that excites me. By organizing knowledge and information in new ways, we can start to map out untouched ideaspace.

You can see in the bottom-right how new whitespace emerges once data is organized within latent space. There’s no guarantee that what’s in here is anything better than nonsense, but it represents a bunch of new opportunities for exploration.

This makes me incredibly optimistic and excited about solving challenging problems. If we view latent space more broadly (beyond just AI training data) as a framework for systematically exploring a complex set of variables, we can apply that mindset to virtually any problem. The perfect thing (solution, design, idea, paragraph) exists out there; you just need to find the right variables and words to guide yourself or a model along the path to get there.

So why can’t we just explore latent space this way?

To start, we can only access a subset of the model’s knowledge unless we invest in growing our own. Our own knowledge is just a starting point; we can explore alongside models, using what they know to discover things we don’t. We are the ones guiding them through latent space.

Ironically, when you have a model with all the potential knowledge in the world, the limiting factor becomes your own knowledge. Your ability to problem-solve and nudge the model into the space you want is a function of what you know, not of the model’s training data. We are already assuming the model can create anything and everything, but there’s little value in that if you don’t have the skills and tools to make the most of it. Your experience is the alpha.

It’s hard to nudge the model’s output in the right direction unless you can already articulate what you want clearly, and even then there’s some randomness in the output. It’s even harder to know which dimensions matter for what you’re looking for. If you don’t have the vocabulary to notice something, how can you express it?

The first step is to know what you’re actually looking for. You’ll need a bunch of personal inputs to discover what you like and develop your taste. Spend time noticing things you like. In a bit of a roundabout way, exploring iterations alongside AI can help you discover your own taste. You can also ask AI (or friends) how to get better at discovering your taste, or what their read on your taste is.

When I was working with Replit Agent, it was annoying to have to iterate so often, but it helped that I can usually communicate what I’m looking for to developers and occasionally understand why something isn’t working. It would have been much harder without that experience, armed only with the ability to say “it’s not working” or “that’s not what I want” over and over.

Next comes articulation. Words (or other forms of communication) are the tools we use to navigate the latent space. The more precise and nuanced your words, the more effectively you can guide someone (or a model) towards the outcome you’re imagining. But when your language fails to capture your vision, you’re essentially wandering blind through latent space. As prompting becomes more important and iteration more expensive, clarity in prompts will be key.

Some people (great creators, designers, and leaders) excel at this because they know how to articulate what they’re looking for. They’ve developed an intuitive understanding of the latent space in their field (e.g., color theory for artists, storytelling for writers, emotional tone for filmmakers) so they can deftly navigate the space with precise language. They’ve practiced translating desires and a grand vision into specific tweaks to variables.

And now for the final secret: you don’t need an AI model to do this. You have your brain and the brains of others. The universe contains all the knowledge you need. Once you understand how you want to iterate on and nudge the things you’re working on towards what you’re looking for, you can go do it.

Whether dealing with a collaborator or an LLM, you start with a prompt, give them feedback, and then repeat that over and over, slowly exploring the latent space to find what you’re looking for.

  1. Today, each change requires us to generate new images. Imagine what’s going to happen when the outputs change in real time and you can just adjust a slider and watch the output respond. ↩︎

If you want to chat about this post or give any feedback, send me an email!