Verbalized Sampling: The New Magic Prompt and Its Limitations
Researchers have found a simple way to unlock divergent thinking from LLMs, but it comes with a caveat.
Boring Responses Have a Cause: Us!
You know those boring answers you keep getting from a bot? That phenomenon has a name: Mode Collapse. Until now, it has seemed inescapable, but researchers have found a method to combat this tendency. With some serious marketing moxie, academics from Northeastern, Stanford, and West Virginia University just released a “magic prompt” that boosts the diversity of responses from models. They call their method Verbalized Sampling.
Mode Collapse finds its roots in a post-training method for LLMs: Reinforcement Learning from Human Feedback (RLHF). In RLHF, a model is given a prompt and generates multiple responses. Human annotators rank those responses from most preferred to least. The rankings train a smaller "reward model", which acts as a judge that predicts people's preferences. The main model's weights are then adjusted in light of that feedback, nudging its responses toward whatever the reward model judges to be best.
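To make the reward-model step a bit more concrete, here is a minimal sketch of the pairwise preference loss commonly used to train reward models. The scores below are made-up numbers rather than output from a real model, and the paper itself does not prescribe this exact formulation:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Toy scores for two responses to the same prompt. In practice these come
# from a neural reward model; the numbers here are invented for illustration.
reward_preferred = 2.1  # the response the annotator ranked higher
reward_rejected = 0.4   # the response the annotator ranked lower

# Pairwise preference loss (Bradley-Terry style): training pushes the
# reward model to score the preferred response above the rejected one.
loss = -math.log(sigmoid(reward_preferred - reward_rejected))
print(f"preference loss: {loss:.3f}")  # shrinks as the score gap grows
```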
RLHF optimizes outputs so well, in fact, according to the Verbalized Sampling researchers, that it squeezes out most alternatives. Human trainers tend to choose typical, familiar wording (the so-called typicality bias), so the fine-tuning data tends to reward safe, common phrasings. The responses you receive are the result of a process that concluded: humans prefer boring, so output that.
But, as we all know, that isn't true! Creative, unexpected responses can be exactly why we turn to a model for help in the first place. We are often seeking less familiar wordings, ideas, or ways of thinking to enhance our own. Fortunately, there is a simple method for coaxing a range of responses out of LLMs, but before I get there, I want to cover a bit more ground on what divergent thinking is and why it matters.
What is Divergent Thinking?
Creativity is inextricably linked to good reasoning. The ability to canvass the field, think of alternatives, or otherwise “think outside the box” is what sets apart great thinkers from the merely competent. It is especially important when engaged in processes that require identifying and evaluating alternatives, like inferring to the best explanation, negotiating with others, or solving a hard problem. What a divergent thinker brings to the table is the ability to generate many, varied, and original possibilities, avoiding the trap of instantly converging on what everyone else is thinking.
As important as it is, divergent thinking is hard. First, think about what it requires. On one hand, you are asked to let your mind wander freely. No idea is too wild. On the other hand, you need to stay on task. After all, you are usually working on a specific problem or argument. So you must go wide while staying within the lines at the same time.
There are also cognitive biases that make us prefer to stick with what we know. The typical answer is attractive because we can settle on it quickly and without much effort, and fast and efficient are prized qualities for animals like us. Psychologists have studied this, especially in the context of problem-solving, where Functional Fixedness (failing to see other uses for objects) and the Einstellung Effect (the predisposition to solve a given problem in a familiar way) prevent us from uncovering better solutions.
There are methods for teaching people to think more divergently. The simplest is to treat the search for alternatives as a two-step process. Begin by generating as many possibilities as you can, without evaluating them. Once you have a wide array of options, no matter how wild, then you evaluate. This approach separates diverging (step one) from converging (step two), often leading to better outcomes.
Another famous approach is SCAMPER. Begin with an idea, product, or process and then try these different methods to see what new possibilities emerge:
Substitute
Combine
Adapt
Modify
Put to other uses
Eliminate
Reverse
By running through each prompt, asking “What if…?”, you automatically arrive at new alternatives.
Even with these methods, people struggle. It can be hard to slow down in the moment to use such structured thinking. There is also just the problem, not always discussed, that your experience and knowledge are limited. The very ideas you need most are often the ones that are out of reach. This is where working with a model can give you an advantage. LLMs contain multitudes. The problem is coaxing those voices forward. Verbalized Sampling is a method designed to do just that.
The Magic Prompt
The problem of Mode Collapse is one you have probably experienced. As Zhang et al., the developers of the Verbalized Sampling technique, explain:
You ask your favorite LLM for a joke about coffee. You ask again. You get the same joke, no matter which model you try.
(The joke, for the record, is: Why did the coffee file a police report? Because it got mugged!) The problem isn't limited to jokes. Brainstorming yields unsatisfying results. Creative writing devolves into hackneyed tropes. Researchers miss alternative hypotheses or explanations when exploring a space of ideas.
The frustrating part is that those alternatives are there. They are simply locked away from us: because they are low-probability answers, and because the model favors the most probable next token in the string, they stay out of reach.
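Here is a toy sketch of that dynamic. The candidate jokes and their probabilities are invented for illustration, and a real model works over tokens rather than whole jokes, but the contrast holds: greedy selection returns the same top answer every time, while sampling from the full distribution lets the alternatives surface occasionally.

```python
import random

# Made-up probabilities for candidate jokes, standing in for a model's
# output distribution over responses.
candidates = {
    "Why did the coffee file a police report? It got mugged!": 0.70,
    "Espresso may not solve your problems, but it's worth a shot.": 0.15,
    "Decaf? What's the point?": 0.10,
    "My coffee and I have a brewing relationship.": 0.05,
}

# Greedy selection: always the same, most probable answer.
greedy = max(candidates, key=candidates.get)
print("Greedy pick:", greedy)

# Sampling from the full distribution: alternatives surface sometimes.
jokes, probs = zip(*candidates.items())
for _ in range(3):
    print("Sampled pick:", random.choices(jokes, weights=probs, k=1)[0])
```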
The answer Zhang et al. uncovered is straightforward: ask the model to show its work. Tell whatever LLM you are working with to:
Output several responses
Attach a probability to each one
Sample from the full distribution
Here is an example from the authors of the study to produce more offbeat coffee jokes:
Generate 5 responses with their corresponding probabilities, sampled from the full distribution: Tell me a joke about coffee.
The key here is that if you just tell the model to output five jokes, it will stay within the space of typical responses. By requesting probabilities and requiring it to explore the full distribution, you nudge the model to share atypical answers.
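As a rough sketch of how you might wire this up in code: the prompt wording follows the authors' example, while the client, model name, and helper function are my own assumptions rather than anything prescribed by the paper.

```python
# Assumes the OpenAI Python SDK (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; any chat-capable client would work.
from openai import OpenAI

def verbalized_sampling_prompt(question: str, k: int = 5) -> str:
    # Mirrors the authors' example: ask for several responses with
    # probabilities, sampled from the full distribution.
    return (
        f"Generate {k} responses with their corresponding probabilities, "
        f"sampled from the full distribution: {question}"
    )

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: substitute whichever model you use
    messages=[
        {"role": "user",
         "content": verbalized_sampling_prompt("Tell me a joke about coffee.")},
    ],
)
print(reply.choices[0].message.content)
```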
The Limitations of This Technique
Zhang et al. highlight a couple of limitations of their technique. It works best with "more capable" models; weaker models may lack the ability to follow the instructions in the magic prompt. It also appears to be a bad idea to push the model toward answers with very low probability, since it might return dross from the far tail of the distribution.
One limitation not mentioned by them, but worth considering, is the training data itself. As Bender et al. remind us in their article, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, not all voices are represented on the internet. Members of marginalized communities may not feel welcome or safe posting, while much of the world's population simply doesn't have access to the web.
LLMs can be an amazing tool for enhancing divergent thinking, but they cannot show you what is not there. Only you can see what is truly unique, and only you can notice what may still be missing. In the end, the choice — and the creativity — belong to you.

Interesting information! I went through a phase of experimentation last year where I played around a lot with the odds. For example, I might ask for a metaphor or an analogy to represent a thing, then ask for the most extreme or unlikely examples on each side of the distribution. Then I would ask for an example in the middle of the distribution, say, or maybe one on the outside edges of each, going more extreme. I was literally thinking in terms of predictions along a normal distribution. I didn’t consciously understand what I was up to, but it was a persistent prompting focus for a time. It’s very cool to see this sort of usage pinpointed and generalized as a technique with a name and a replication process. I need to visit the sources you provide here.
Thanks!
Thanks! I too played around with this a while ago but wasn’t very successful. What I like about this paper is that not only does it provide concrete advice, but it also explains in very accessible language what people get wrong if they just ask for many examples. I hope other researchers will pick up on this as a model of making results available to the public 👍🏼