
Beyond Basic Prompting: Patterns That Actually Change How AI Thinks

Move past simple prompts with techniques like self-consistency, tree of thought, and meta-prompting. Practical methods that produce better AI outputs.

Robert Soares

Most prompting advice is obvious. Be specific. Give examples. Add context.

That gets you maybe 60% of the way to useful output. The remaining 40% is where things actually get interesting, because that’s where language models start failing in predictable ways that require different thinking entirely.

The techniques here aren’t secret. They’re well documented in research papers and discussed constantly on forums like Hacker News and Reddit. But understanding when to apply each one, and more importantly when not to, separates people who get consistently good results from people who blame the model when things go wrong.

Why Language Models Fail in Predictable Ways

Here’s the core problem. LLMs generate text left to right, one token at a time. Each token constrains what comes next. Once the model commits to a reasoning path, it rarely backtracks.

This works fine for simple questions. It fails for anything requiring exploration.

A Hacker News commenter, cube2222, illustrated the compounding error problem: “if each step has a 97% chance of being completed correctly, if your task requires 10 steps one after the other, the chance of success falls to [0.97^10 ≈] 74%.” Ten steps at a 3% error rate per step drop you to 74% overall success. Twenty steps? Around 54%.
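You can check the compounding yourself in a couple of lines:

```python
# Success on a sequential task compounds: p_total = p_step ** n_steps
for n_steps in (1, 10, 20):
    print(f"{n_steps} steps at 97% per step -> {0.97 ** n_steps:.0%} overall")
# 1 steps at 97% per step -> 97% overall
# 10 steps at 97% per step -> 74% overall
# 20 steps at 97% per step -> 54% overall
```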

The patterns that follow all address this fundamental limitation. They add exploration where there was only commitment. Verification where there was only generation. Branching where there was only linearity.

Self-Consistency: Ask Multiple Times, Trust the Majority

The simplest advanced technique. Run the same prompt several times with higher temperature. Extract the final answer from each response. Take the most common one.
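In code, the loop is almost trivial. Here is a minimal sketch using the OpenAI Python client; the model name is a placeholder, and `extract_answer` is a stub you would adapt to whatever answer format your prompt requests.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_answer(text: str) -> str:
    # Stub: adapt to your prompt format, e.g. an "Answer: ..." last line
    return text.strip().splitlines()[-1]

def self_consistency(prompt: str, runs: int = 5) -> str:
    answers = []
    for _ in range(runs):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder: any chat model works
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,  # above zero so each run takes a different path
        )
        answers.append(extract_answer(resp.choices[0].message.content))
    return Counter(answers).most_common(1)[0][0]  # majority vote
```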

This works because language models are probabilistic. The same question produces different reasoning paths each run. Sometimes these paths contain errors that cascade through the rest of the work. But different runs make different errors. When you aggregate, correct reasoning reinforces itself while errors cancel out.

The math is straightforward, assuming runs fail independently. If your model gets the right answer 60% of the time on a single run, five runs with majority voting lift accuracy to roughly 68%, and fifteen runs push it near 80%. The technique was proposed by Wang et al. and shown to significantly boost performance on arithmetic and commonsense reasoning.
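The voting math is a straight binomial calculation, easy to verify:

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    # Probability that a strict majority of n independent runs is correct
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_accuracy(0.6, 5))   # ~0.68
print(majority_accuracy(0.6, 15))  # ~0.79
```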

Self-consistency shines on problems with verifiable single answers. Logic puzzles. Factual questions. Anything where you can check if the response is correct. It struggles with creative tasks where there’s no “right” answer, or with problems where the model makes the same systematic error regardless of path.

The cost is obvious. You’re paying for 5 to 10 times the tokens. For a production system handling millions of queries, the economics don’t work. For high-stakes individual queries where accuracy matters more than cost, it delivers.

Tree of Thought: When Linear Reasoning Isn’t Enough

Chain-of-thought prompting, where you ask the model to show its work, helps with many problems. But once a model starts down a reasoning path, it commits.

Tree of Thought changes this. Instead of generating one path, you generate multiple potential next steps at each decision point. You evaluate them. You only pursue promising branches. You can backtrack when something leads nowhere.
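Conceptually, it’s beam search over partial solutions. A skeleton of the idea, with a placeholder `llm` function standing in for real model calls and prompts you would tune per problem:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def propose(state: str, k: int = 3) -> list[str]:
    # One LLM call proposes k candidate next steps from the current state
    out = llm(f"Progress so far:\n{state}\n\nPropose {k} distinct next steps, one per line.")
    return out.splitlines()[:k]

def score(state: str) -> float:
    # A second LLM call rates how promising a partial solution looks
    out = llm("Rate 0-10 how likely this partial solution leads to a correct answer. "
              f"Reply with a number only.\n\n{state}")
    return float(out.strip())

def tree_of_thought(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [problem]
    for _ in range(depth):
        candidates = [f"{s}\n{step}" for s in frontier for step in propose(s)]
        # Pruning is the backtracking: weak branches drop out of the frontier
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]
```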

The gains on certain problems are dramatic. On the “Game of 24” puzzle, where you use four numbers and basic operations to reach exactly 24, Princeton researchers (Yao et al.) found GPT-4 with standard chain-of-thought solved only 4% of problems. With Tree of Thought? 74%.

That’s not marginal improvement. That’s the difference between useless and useful.

But the technique has real costs beyond just tokens. On Hacker News, user startupsfail identified practical challenges: “it is: costly, slow, there is node collapse, it impacts context length, it injects biases.” The overhead of multiple generations per step, evaluation of each branch, and tracking the entire tree structure adds up fast.

Tree of Thought earns its cost for planning problems, puzzles with multiple valid approaches, and creative tasks where your first idea rarely turns out to be the best. For simple factual questions, it’s overkill that burns tokens without improving results.

Prompt Chaining: Breaking Complex Work Into Stages

Some tasks are too complex for a single prompt. Not because the model can’t handle complexity, but because the problem has genuinely distinct phases that benefit from different approaches.

Prompt chaining splits work into stages where each prompt’s output feeds the next prompt’s input. Extract relevant quotes from a document in prompt one. Use only those quotes to answer a question in prompt two. The first focuses entirely on finding. The second focuses entirely on reasoning.

This separation does several things. It keeps each prompt focused on one job, which models handle better than multi-part instructions. It lets you inspect intermediate results, catching errors before they cascade. And it allows different prompts to use different configurations, perhaps different temperatures or even different models playing to their respective strengths.
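As a sketch, the quote-extraction example above becomes two focused calls. Again the `llm` function is a placeholder; note that the stages can use different settings, or different models entirely.

```python
def llm(prompt: str, temperature: float = 0.0) -> str:
    raise NotImplementedError("wire up your model client here")

def answer_from_document(document: str, question: str) -> str:
    # Stage 1: finding only. Extract evidence, nothing else.
    quotes = llm(
        "Extract the quotes from this document that are relevant to the "
        "question. Output only the quotes, one per line.\n\n"
        f"Question: {question}\n\nDocument:\n{document}"
    )
    # Inspect or validate `quotes` here, before errors cascade into stage 2
    # Stage 2: reasoning only. Answer using just the extracted evidence.
    return llm(
        f"Using only these quotes, answer the question.\n\n"
        f"Quotes:\n{quotes}\n\nQuestion: {question}",
        temperature=0.0,
    )
```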

One Hacker News user, coolKid721, described the workflow: “Breaking it down into parts and having multiple prompts with smaller context that all have structured output you feed into each other.”

The technique breaks down when steps have tight dependencies that don’t cleanly separate, or when intermediate output loses context needed later. You can solve this by passing more information through the chain, but that increases tokens and creates new failure points.

Start with two stages. Get those working well. Only add more stages when you have clear evidence the split helps.

Reflection: Making the Model Check Its Own Work

If ChatGPT can think, it can only think out loud.

Everything the model considers has to appear in its output. There’s no hidden internal deliberation. Reflection prompts exploit this by making self-checking explicit. You ask the model to solve a problem, then ask it to review its solution and identify errors.

On Hacker News, user nate shared a common observation: “I constantly ask chatGPT: ‘are you sure?’ to its replies, and it almost always corrects a mistake.” Simple, and it often works.

Why does this work at all? The model that made the error and the model checking for errors are the same weights, the same training. Part of the answer is attention allocation. When generating an answer, the model juggles understanding the problem, planning an approach, and producing coherent output simultaneously. When reviewing, it only needs to check whether existing work is correct. That’s a simpler task.

But reflection has a catch. The same Hacker News thread included a warning from dr_kiszonka: “it also corrects ‘mistakes’ if there aren’t any.” When you ask “are you sure?”, you’re implying doubt, and models are trained to address concerns. Sometimes that means changing a correct answer to an incorrect one just to seem helpful.

More sophisticated reflection prompts reduce this problem. Instead of vague doubt, try “review your solution step by step and verify each logical move” or “identify any assumptions you made that might not hold.” Give specific evaluation criteria rather than an open invitation to second-guess everything.
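In practice that means a second turn with explicit criteria rather than a bare “are you sure?”. A sketch, with a placeholder `llm` that takes a message list the way chat APIs do:

```python
def llm(messages: list[dict]) -> str:
    raise NotImplementedError("wire up your model client here")

def solve_then_review(problem: str) -> str:
    draft = llm([{"role": "user", "content": problem}])
    # Specific criteria, not vague doubt, so a correct answer
    # is less likely to get "fixed" into a wrong one
    return llm([
        {"role": "user", "content": problem},
        {"role": "assistant", "content": draft},
        {"role": "user", "content":
            "Review your solution step by step and verify each logical move. "
            "If everything holds, restate the answer unchanged; "
            "otherwise give the corrected answer."},
    ])
```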

The Reflexion framework formalizes this into a loop: attempt, evaluate, reflect on what went wrong, attempt again with that reflection as context. The model generates a short explanation of why it likely failed, and that explanation becomes part of the context for the next attempt.
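A sketch of that loop, assuming you have some external check such as a test suite (the `llm` and `check` functions here are placeholders):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def reflexion(task: str, check, max_attempts: int = 3) -> str:
    # check(attempt) -> (ok, feedback), e.g. from running unit tests
    context = ""
    attempt = ""
    for _ in range(max_attempts):
        attempt = llm(f"{task}\n{context}")
        ok, feedback = check(attempt)
        if ok:
            break
        # The model explains its own failure; that explanation
        # becomes context for the next attempt
        reflection = llm(
            f"Task: {task}\n\nAttempt:\n{attempt}\n\n"
            f"Evaluator feedback: {feedback}\n\n"
            "In two sentences, explain what went wrong and what to do differently."
        )
        context = f"A previous attempt failed. Reflection:\n{reflection}"
    return attempt
```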

Meta-Prompting: Using AI to Write Your Prompts

Why write prompts yourself when the model can write them?

Meta-prompting asks the model to generate or improve prompts for a given task. You describe what you want to accomplish, and the model produces a prompt designed to accomplish it. Then you can ask it to critique and refine that prompt before you ever use it.
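The loop is short: draft a prompt, then critique the draft before first use. A sketch, again with a placeholder `llm` function:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")

def meta_prompt(task_description: str) -> str:
    # Step 1: the model drafts a prompt for the task
    draft = llm(
        "Write a prompt that would get a language model to perform this task "
        f"well. Output only the prompt.\n\nTask: {task_description}"
    )
    # Step 2: the model critiques and refines its own draft
    return llm(
        "Critique this prompt for ambiguity, missing constraints, and an "
        f"unspecified output format, then output an improved version.\n\nPrompt:\n{draft}"
    )
```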

The technique emerged from an observation: models often know what makes a good prompt even when the user doesn’t. They’ve been trained on countless examples of effective instructions. Asking them to apply that knowledge to prompt design just makes that expertise accessible.

Stanford researchers published work on “Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding” that formalized these ideas. The technique provides advantages in token efficiency and allows fair comparison of different problem-solving approaches.

Not everyone is convinced. One Hacker News commenter, lexandstuff, was blunt, favoring simpler approaches: “Role prompting is totally useless imo…Be clear with your requirements. Add examples, if necessary.” The skepticism has merit. Meta-prompting works best when you’re uncertain about prompt structure but clear about your goal. It’s less useful when your challenge is actually understanding what you want, or when domain-specific knowledge matters more than format.

Where meta-prompting shines: generating prompt variations to test, improving prompts that mostly work but feel clunky, learning what elements make prompts effective by examining the model’s suggestions.

Reasoning Models: These Patterns, But Built In

OpenAI’s o1 model and similar “reasoning models” from other labs are essentially baking these patterns into the model itself. Tree of thought. Self-consistency. Reflection. Chain-of-thought that actually backtracks.

A Hacker News discussion revealed the tradeoff. User arthurcolle noted that “they aren’t letting you see the useful chain of thought reasoning that is crucial to train a good model.” OpenAI hides the reasoning traces, showing only summaries. You get the benefits without understanding how the model arrived at its answer.

Reasoning models cost more and run slower than base models. For many tasks, they’re overkill. The prompting patterns in this article let you add reasoning capabilities selectively, only where they matter, at the cost level appropriate for each query.

Knowing When to Apply What

These techniques solve different problems. Mixing them up wastes tokens and time.

Self-consistency gives you confidence when you can afford multiple runs. Use it for math, logic, factual questions. Anything with a verifiable right answer benefits from the voting mechanism.

Tree of Thought earns its cost when problems have multiple valid approaches. Planning problems. Creative tasks where your first idea isn’t necessarily best. Puzzles that reward exploration.

Prompt chaining fits tasks with distinct phases. Complex workflows. Tasks mixing retrieval and reasoning. The key question is whether you’d naturally break this into steps if doing it manually.

Reflection adds verification when accuracy matters. Code generation. Logical arguments. Any output you’d naturally want to double-check. The technique is cheap, just one additional prompt, and often catches real errors.

Meta-prompting helps when you’re not sure how to prompt for a new type of task, or when you want to rapidly generate variations to test.

The real skill comes from combination. A production system might use prompt chaining to break down work, tree of thought for the planning stage, self-consistency for the final answer, and reflection to catch errors before output. Each technique addresses a different failure mode.

What This All Points Toward

Every technique here works around the same limitation: language models generate linearly and don’t naturally explore, verify, or backtrack.

Self-consistency adds exploration through multiple runs. Tree of Thought adds branching and pruning. Reflection adds verification. Prompt chaining adds decomposition.

The person who understands when to apply each isn’t collecting trivia. They’re learning to architect systems that think in different ways depending on what the problem requires. A Hacker News commenter, idopmstuff, reframed the skill well: “prompting is basically the same thing as writing requirements as a PM. You need to describe what you want with precision and the appropriate level of detail.”

The models will keep improving. The reasoning will move further inside the weights. But the core insight stays constant: different problems require different thinking structures. Knowing which structure fits which problem is the actual skill.

