A Practical Guide to Google AI Studio Tools: Code Execution & Thinking Budget Explained
Move past default settings. Learn how to control model reasoning and computational power for superior outputs.

Many users of generative AI models hit a wall. They receive answers that are plausible but lack factual accuracy or analytical depth. This often happens because the model is limited to its pre-trained knowledge and linguistic capabilities. Two Google AI Studio tools, Code Execution and the Set Thinking Budget option, offer a direct way to move beyond these limitations, giving you greater control over the model’s process and the quality of its output.
This guide explains what these two options do and how to use them to handle complex prompts and generate better results.
Key Takeaways
- Code Execution: Use this when your prompt needs to perform math, analyze data, or check facts. It allows the AI to run Python code for accurate, computed answers.
- Thinking Budget: This controls the AI’s planning and reasoning time. A higher budget leads to deeper analysis and better quality for complex prompts.
- For Complex Prompts: Always use a single, high thinking budget (16k or 32k tokens) for multi-step tasks. Do not lower it between steps.
- 32k vs. 16k Budget: A 32k budget provides a deeper strategic synthesis and better adherence to rules. Use it when quality is the absolute priority.
The Code Execution Option: Giving AI a Calculator and a Compiler
The Code Execution option gives Gemini models the ability to write and run Python code in a secure, isolated environment. This tool fundamentally changes the model from a pure language processor into a problem-solver that can perform computations.
Code Execution = The model can run its own Python code to verify facts and perform calculations.
When you enable this tool, the model can write code to handle tasks that language alone cannot solve. This includes:
- Performing accurate mathematical calculations.
- Analyzing data from uploaded files (like CSVs).
- Solving complex scientific or engineering problems.
- Generating data visualizations.
“But isn’t it risky to let an AI run code on its own?”
This is a common question. The process is managed within a sandboxed environment, meaning the code execution is isolated from your local machine and other systems. It cannot access files or network resources outside of its designated workspace.
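In AI Studio itself, Code Execution is a simple toggle in the run settings. If you call Gemini programmatically, the same capability is enabled by passing a code-execution tool in the request config. Here is a minimal sketch using the google-genai Python SDK; the model name and prompt are illustrative, not prescriptive:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model choice
    contents=(
        "What is the sum of the first 50 prime numbers? "
        "Write and run Python code to compute it."
    ),
    config=types.GenerateContentConfig(
        # This single line is the API equivalent of the AI Studio toggle.
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)

# The response interleaves explanatory text, the generated code,
# and the output of running that code in the sandbox.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    if part.executable_code is not None:
        print(part.executable_code.code)
    if part.code_execution_result is not None:
        print(part.code_execution_result.output)
```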
Consider asking a model to analyze sales figures from a spreadsheet. Without code execution, the model can only comment on the data you paste into the prompt.
With code execution enabled, it can write and run Python scripts using libraries like Pandas to calculate growth percentages, identify the best-performing quarter, and generate the underlying data for a chart.
The result is an answer based on computation, not just linguistic pattern matching.
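For illustration, the script the model writes behind the scenes might look something like the following. The file name and column names here are hypothetical; the model adapts them to whatever data you actually upload:

```python
import pandas as pd

# Load the uploaded spreadsheet (file and column names are hypothetical).
df = pd.read_csv("sales.csv")  # assumed columns: quarter, revenue

# Quarter-over-quarter growth, as a percentage.
df["growth_pct"] = df["revenue"].pct_change() * 100

# Identify the best-performing quarter by revenue.
best = df.loc[df["revenue"].idxmax()]
print(f"Best quarter: {best['quarter']} with revenue {best['revenue']:,}")

# Underlying data for a chart.
print(df[["quarter", "revenue", "growth_pct"]].to_string(index=False))
```

Because the numbers come from an actual `pct_change()` call rather than the model "eyeballing" the data, the growth figures in the answer are computed, not guessed.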
Quick Guide: Which Tool Do You Need?
Step 1: Does your prompt require computation (math, data analysis, working with uploaded files)? If yes, enable Code Execution.
Step 2: Is your prompt simple or complex? Simple prompts run well at the default settings; for complex, multi-step prompts, raise the thinking budget.
The Set Thinking Budget: Controlling the AI’s Internal Reasoning
Before a Gemini model writes a single word of its final answer, it can go through an internal reasoning process. It breaks down the prompt, formulates a plan, and considers different angles.
The “Set Thinking Budget” option allows you to control how much effort, measured in tokens, the model dedicates to this planning phase.
Think of it as a control knob for the model’s reasoning depth. A higher budget gives the model a larger “mental workspace” to analyze complex instructions before generating a response.
What happens when the budget is too low for a complicated prompt?
The model might misunderstand the nuances of the request, skip a step in a multi-part instruction, or provide a superficial answer because it didn’t have enough “cognitive” space to fully process the problem.
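In AI Studio, this is the thinking budget slider in the run settings. Through the API, the same knob is exposed as a thinking configuration. A minimal sketch with the google-genai Python SDK, where the budget value and prompt are illustrative:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model choice
    contents="Compare these three pricing strategies and recommend one: ...",
    config=types.GenerateContentConfig(
        # More tokens here = a larger "mental workspace" before answering.
        thinking_config=types.ThinkingConfig(thinking_budget=16384),
    ),
)
print(response.text)
```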
How to Budget for Complex, Multi-Step Prompts
A common question arises when dealing with prompts that have multiple parts, like an analysis step followed by an outline step: “Is it better to use a high budget for the research part and a lower one for the writing part?”
The answer is no. It is better to use one consistently high thinking budget for the entire prompt.
The model processes the whole set of instructions holistically to form a single, coherent plan. The quality of the final outline is directly dependent on how well it reasoned through the initial analysis.
Lowering the budget midway would be like telling an architect to do a deep analysis of the building site but then rush through the actual blueprint design.
The final output would become disconnected from the foundational research. For prompts where each step builds on the last, a single, large budget is required to maintain context and ensure a high-quality, integrated result.
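In practice, that means sending the whole multi-step prompt in one request under one high budget, rather than splitting it into a high-budget call for analysis and a low-budget call for writing. A sketch of the pattern, with the prompt text and budget value as illustrative placeholders:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Anti-pattern: one call for the analysis at 16k, a second call for the
# outline at 4k. The second call never sees the reasoning behind the first.
# Instead, keep both steps in a single prompt under one large budget:
prompt = """
Step 1: Analyze the three competitor articles below and list content gaps.
Step 2: Using that analysis, produce a detailed outline for our article.
[articles pasted here]
"""

response = client.models.generate_content(
    model="gemini-2.5-pro",  # illustrative model choice
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=32768),
    ),
)
print(response.text)
```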
The Practical Difference: 16k vs. 32k Tokens
For a truly complex project-in-a-prompt, what is the actual difference between a high budget (e.g., 16,384 tokens) and a maximum one (e.g., 32,768 tokens)?
The distinction is not about the length of the output, but the depth of the reasoning. (Ed. note: This is one of the most direct controls a user has over the quality of the model’s strategic thinking.)
Imagine giving a strategist a large whiteboard (a 16k budget) versus a full “war room” (a 32k budget). Both will produce a good plan, but the war room allows for a deeper level of work.
| Feature | 16,384 Token Budget (High) | 32,768 Token Budget (Maximum) |
|---|---|---|
| Best For | Moderately complex, multi-step tasks. | Highly complex, strategic projects where quality is the top priority. |
| Reasoning Depth | Good. Connects prompt steps logically. | Excellent. Finds nuanced, multi-layered connections between steps. |
| Output Detail | Creates a solid, well-structured output. | Justifies its own recommendations and adds strategic notes. |
| Rule Adherence | Follows most negative constraints (e.g., banned words). | Offers superior adherence to all constraints, even in complex parts. |
| Analogy | A strategist with a large whiteboard. | A strategist with an entire “war room.” |
Here is what a 32k budget does better:
- More Sophisticated Synthesis. With more cognitive space, the model can draw more nuanced connections between the steps of your prompt. Instead of just stating, “Competitors use an informal tone, so our tone should be informal,” it can reason that a blend of informal and authoritative tones is better because of specific user questions it identified in its analysis.
- More Granular, Justified Actions. An outline point might change from “Nutritional Comparison” to “Detailed Nutritional Showdown (Table Format).” The higher-budget model can add an internal justification: “Top competitors bury this info; placing a clear table upfront will address direct user intent for scannable data.”
- Stricter Adherence to Constraints. If your prompt includes many negative constraints (like a long list of banned words), a larger budget provides more resources for the model to “self-correct” and check its own work against your rules, resulting in a cleaner output.
- Reduced Instruction “Drift.” On very long prompts, models can sometimes lose track of earlier instructions. A maximum budget acts as a better “working memory,” ensuring the first analytical point is still strongly connected to the very last point in the outline.
In short, choose the 32,768-token budget when maximum strategic depth is the goal. For complex, multi-step tasks, it empowers the model to move beyond merely executing instructions to acting like a true strategist.
Mastering the Code Execution and Set Thinking Budget tools moves you from being a passive prompter to an active director of the AI’s workflow. By giving the model the ability to compute and granting it the resources to think deeply, you can push beyond generic content and start producing more accurate, well-reasoned, and valuable outputs.