The video begins by introducing prompt engineering as a set of strategies for maximizing AI performance. Matthew Berman explains that when interacting with models like ChatGPT, Gemini, or Claude, users provide natural language inputs and receive outputs, with the model predicting what the output should be based on the prompt.
The effectiveness of these models depends heavily on how prompts are structured, what words are used, and what examples are provided. Matthew explains the fundamentals of LLMs as prediction engines: they take sequential text as input and predict subsequent tokens based on their training data. Prompt engineering, then, is the process of designing high-quality prompts that guide these models toward accurate outputs; a minimal API sketch follows the list below.
- Prompt engineering involves strategies to optimize AI outputs through effective prompt construction.
- LLMs work by predicting the next token (roughly equivalent to a word) in a sequence based on training data.
- The quality of prompts directly affects the quality of AI-generated outputs.
- Even simple prompt modifications constitute prompt engineering when they improve results.
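To make this input-output loop concrete, here is a minimal sketch using the OpenAI Python client (an assumption for illustration; the video's demos use Google AI Studio, and any chat API behaves the same way): the prompt goes in as plain text, and the model returns its predicted continuation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The prompt is plain natural language; the model predicts the tokens
# most likely to follow it, based on its training data.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice; any chat model works
    messages=[{"role": "user", "content": "Explain what a token is in one sentence."}],
)
print(response.choices[0].message.content)
```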
This section covers the fundamental parameters that affect how LLMs process and generate outputs. Output length caps the number of tokens a model may generate, though importantly, reducing this setting doesn't make responses more succinct; it simply causes the model to stop generating once the limit is reached.
Sampling controls are explained in detail, particularly temperature, which affects output randomness and creativity. With examples from Google AI Studio, Matthew demonstrates how higher temperature settings (near 1.0) produce more varied, creative responses, while lower settings (near 0) generate more consistent, predictable outputs. He also briefly covers top-K and top-P settings, which similarly affect output variety and creativity; the sketch after the list below shows where these parameters sit in an API call.
- Output length limits how many tokens a model generates but doesn't make the model write more succinctly.
- Temperature controls creativity and randomness in outputs—higher values (0.7-1.0) increase creativity while lower values (0-0.3) increase consistency.
- Top-K and top-P are additional sampling controls that affect output variety, though temperature is typically the most commonly adjusted parameter.
- Optimal settings depend on use case—creative writing benefits from higher temperatures while factual responses need lower temperatures.
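Building on the earlier sketch, the parameters map onto API arguments roughly as follows (names vary by provider; OpenAI's chat API exposes temperature, top_p, and max_tokens, while top-K appears in others such as Google's Gemini generation config):

```python
from openai import OpenAI

client = OpenAI()

def generate(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",      # illustrative choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # near 0: consistent; near 1: varied and creative
        top_p=1.0,                # nucleus sampling; lower it to shrink the token pool
        max_tokens=150,           # hard cap: truncates output, doesn't make it succinct
    )
    return response.choices[0].message.content

factual = generate("In what year was the transistor invented?", temperature=0.1)
creative = generate("Invent a name for a coffee shop on Mars.", temperature=0.9)
```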
This section explores foundational prompting approaches, starting with zero-shot prompting—the simplest technique where users provide only a description of the task without examples. Matthew explains that the complexity of the task typically determines how many examples are needed, with simpler tasks often working well with zero-shot prompting.
The video then progresses to one-shot and few-shot prompting, where one or more examples are provided to guide the model. These techniques help the model understand the desired output format and pattern. Using a pizza order JSON parsing example, Matthew demonstrates how providing examples helps ensure consistent output structures across multiple interactions, making few-shot particularly valuable when specific formatting is required (see the sketch after this list).
- Zero-shot prompting provides only task descriptions without examples, suitable for simpler tasks.
- One-shot and few-shot prompting include examples that help models understand desired output patterns.
- Few-shot prompting is especially valuable when consistent output formatting is required.
- Generally, 3-5 examples are recommended for few-shot prompting, though this varies with task complexity.
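A sketch of the pizza-order pattern (the field names are illustrative): two worked examples teach the model the exact JSON shape, and the final order is left for it to complete. The doubled braces escape literal JSON braces for Python's str.format.

```python
FEW_SHOT_PROMPT = """Parse a customer's pizza order into valid JSON.

EXAMPLE:
Order: I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON: {{"size": "small", "ingredients": ["cheese", "tomato sauce", "pepperoni"]}}

EXAMPLE:
Order: Can I get a large pizza with tomato sauce, basil and mozzarella?
JSON: {{"size": "large", "ingredients": ["tomato sauce", "basil", "mozzarella"]}}

Order: {order}
JSON:"""

prompt = FEW_SHOT_PROMPT.format(
    order="Give me a medium pizza with mushrooms and extra cheese."
)
```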
This portion of the video covers techniques that establish specific roles or contexts for the AI model. System prompting sets the overall context and purpose, defining the big picture of what the model should accomplish. Contextual prompting provides specific background information relevant to the current task, helping the model understand nuances and tailor its responses accordingly.
Role prompting assigns specific characters or identities for the model to adopt, which helps generate responses consistent with that role's knowledge and behavior. Matthew highlights how frameworks like Crew AI effectively implement role prompting by defining agent functions, expertise, goals, and even backstories to enrich interactions. Through examples like having the model act as a travel guide, he demonstrates how these techniques can generate more targeted and appropriate responses; a message-structure sketch follows the list below.
- System prompting establishes overall context and purpose for the model's task.
- Contextual prompting provides background information to help the model understand task nuances.
- Role prompting assigns specific identities to the model, generating responses consistent with that role.
- Defining goals, expertise, and backstories enhances role-based interactions.
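These three techniques map naturally onto the message roles most chat APIs expose. A minimal sketch, with the travel-guide wording paraphrased from the video's example:

```python
messages = [
    # System prompt: the big-picture context and purpose of the task.
    {
        "role": "system",
        "content": "You are a travel guide. Given a traveler's location, "
                   "suggest three places to visit nearby.",
    },
    # Contextual prompt: background specific to this particular request.
    {
        "role": "user",
        "content": "Context: I'm in Amsterdam and I only want to visit museums. "
                   "Where should I go?",
    },
]
# Pass `messages` to any chat-completion API, as in the earlier sketches.
```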
Matthew introduces step-back prompting, a technique where the model first considers a general question related to the specific task before addressing the actual problem. This approach activates relevant background knowledge and reasoning processes, helping generate more accurate and insightful responses.
Through a practical example of creating a video game storyline, he demonstrates how step-back prompting produces more creative and detailed results compared to direct prompting. By first asking the model to identify key settings that contribute to engaging first-person shooter game levels, then using that output as context for the specific task, the resulting storyline becomes more nuanced and well-developed; the two-stage flow is sketched after the list below.
- Step-back prompting involves asking general questions before addressing specific tasks.
- This technique activates broader knowledge and reasoning within the model's parameters.
- It's particularly effective for creative tasks that benefit from deeper context.
- The approach helps overcome generic responses common in creative writing prompts.
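A sketch of the two-stage flow, assuming a hypothetical ask() helper that wraps whichever chat API you use:

```python
def ask(prompt: str) -> str:
    ...  # hypothetical helper: send the prompt to your chat model, return its text

# Step 1: the general, step-back question activates relevant background knowledge.
settings = ask(
    "Based on popular first-person shooter games, what are five fictional "
    "settings that contribute to a challenging and engaging level storyline?"
)

# Step 2: feed that output back in as context for the specific task.
storyline = ask(
    f"Context: {settings}\n\n"
    "Pick one of these settings and write a one-paragraph storyline for a "
    "new level of a first-person shooter game."
)
```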
This section covers chain-of-thought prompting, a technique that dramatically improved LLM performance by encouraging models to break down their reasoning process. Matthew explains that while many modern models have this capability built in (often called "thinking mode" or "test time compute"), explicitly adding phrases like "think step by step" to prompts can significantly improve outputs, especially for smaller or older models.
Through several mathematical examples, he demonstrates how this technique helps models arrive at correct answers by working through problems methodically. He also shows how chain-of-thought can be combined with few-shot prompting by providing examples of step-by-step reasoning, which the model then mimics in its own response. This approach is particularly powerful for STEM topics, logic problems, and reasoning tasks; a combined few-shot chain-of-thought sketch follows the list below.
- Chain-of-thought prompting encourages models to break down reasoning into step-by-step processes.
- Many newer models have this capability built in, but explicitly requesting it still benefits older or smaller models.
- This technique dramatically improves accuracy for math, science, logic, and reasoning tasks.
- Chain-of-thought can be combined with few-shot prompting for even better results.
- Adding simple phrases like "think step by step" can significantly enhance model outputs.
This section explores more advanced prompting techniques that build upon chain-of-thought. Self-consistency involves running the same prompt against a model multiple times and then using majority voting to determine the best answer. Matthew explains that this improves accuracy by generating diverse reasoning paths and selecting the most consistent result, though it comes with higher costs and latency.
Tree of thoughts takes this concept further by exploring multiple reasoning paths simultaneously rather than following a single linear chain. This technique involves testing different outputs at each step and selecting the most promising ones to continue the reasoning process. Matthew notes that implementing tree of thoughts typically requires code or frameworks rather than manual prompting, making it best suited for complex tasks requiring extensive exploration. A self-consistency sketch follows the list below.
- Self-consistency improves accuracy by running the same prompt multiple times and selecting the most common answer.
- This technique generates diverse reasoning paths but increases costs and latency.
- Tree of thoughts explores multiple reasoning branches simultaneously, selecting optimal paths at each step.
- These advanced techniques are best implemented programmatically rather than through manual prompting.
- They're particularly valuable for complex tasks requiring sophisticated reasoning.
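A minimal self-consistency sketch; ask() and extract_answer() are hypothetical stand-ins for a model call and an answer parser:

```python
from collections import Counter

def ask(prompt: str, temperature: float = 0.7) -> str:
    ...  # hypothetical: call your chat model at the given temperature

def extract_answer(reasoning: str) -> str:
    ...  # hypothetical: pull the final answer out of a reasoning trace

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    # Sample n independent reasoning paths; nonzero temperature keeps them diverse.
    answers = [extract_answer(ask(prompt)) for _ in range(n)]
    # Majority vote: the most consistent answer wins, at n times the cost and latency.
    return Counter(answers).most_common(1)[0][0]
```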
Matthew introduces ReAct (Reason and Act), a paradigm that combines natural language reasoning with external tools to solve complex tasks. This approach effectively functions as an agentic framework, giving the model the ability to reason about problems, generate plans, execute actions, and observe results in a continuous loop.
He explains how modern frontier models often have this capability built in through tool integration (like Google search, code execution, and function calling), but notes that these premium models come with higher costs and latency. Through a Python code example using LangChain, he demonstrates how ReAct can be implemented on smaller models, allowing them to break down complex tasks (like counting Metallica band members' children) by planning searches, executing them sequentially, and compiling results; the loop itself is sketched after the list below.
- ReAct combines reasoning with external tools in a thought-action loop.
- This approach enables models to plan actions, execute them, and observe results.
- Modern premium models often have ReAct capabilities built in through tool integration.
- Frameworks like LangChain and Crew AI make implementing ReAct more accessible.
- This technique significantly expands what models can accomplish by connecting them to external resources.
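LangChain automates the mechanics, but the underlying loop is simple enough to sketch by hand. Here llm() and web_search() are hypothetical stand-ins for a model call and a search tool, and the Thought/Action/Observation format is illustrative rather than LangChain's actual prompt:

```python
def llm(prompt: str) -> str:
    ...  # hypothetical: returns the model's next Thought/Action/Final Answer text

def web_search(query: str) -> str:
    ...  # hypothetical: returns search-result snippets for the query

INSTRUCTIONS = (
    "Answer the question. Use this format:\n"
    "Thought: <your reasoning>\n"
    "Action: search[<query>]\n"
    "or, when you know the answer:\n"
    "Final Answer: <answer>\n"
)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"{INSTRUCTIONS}\nQuestion: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                # reason: the model proposes a thought/action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action: search[" in step:         # act: run the tool, feed back the observation
            query = step.split("Action: search[")[-1].split("]")[0]
            transcript += f"Observation: {web_search(query)}\n"
    return "No answer within the step budget."
```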
This section covers two advanced approaches: having AI generate prompts and using code execution for precise tasks. Automatic prompt engineering involves asking models to create detailed prompts based on simple descriptions. Matthew shares his workflow of asking AI to write product requirements documents (PRDs) from brief descriptions, then using those detailed PRDs to generate code, effectively delegating the detailed prompt writing to AI itself.
He also introduces a technique he uses for accuracy-critical tasks: prompting using code. Through examples like counting letters in words, he demonstrates how having models write and execute code often produces more accurate results than natural language reasoning alone. This approach leverages the model's superior code-writing capabilities for tasks that can be programmatically verified (a sketch follows the list below).
- Automatic prompt engineering delegates prompt creation to AI, saving time on detailed requirement writing.
- This technique can apply any prompting strategy by asking AI to implement it based on simple instructions.
- For precision tasks, having models write and execute code often yields more accurate results than text-only responses.
- Code-based solutions are particularly valuable for tasks with verifiable correct answers.
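A sketch of the letter-counting case: rather than asking the model to count in prose (where tokenization often trips it up), ask it to write code and run that code. ask() is again a hypothetical helper, and exec() is for illustration only; sandbox model-written code in anything real.

```python
def ask(prompt: str) -> str:
    ...  # hypothetical: call your chat model and return its text

code = ask(
    "Write Python code that counts how many times the letter 'r' appears in "
    "the word 'strawberry' and prints the count. Return only the code."
)
exec(code)  # illustration only; sandbox model-generated code in practice

# The model typically returns something deterministic like:
print("strawberry".count("r"))  # prints 3
```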
The final section outlines best practices for effective prompt engineering. Matthew emphasizes the importance of providing examples when consistent outputs are needed, designing with simplicity by starting with basic prompts and only adding complexity when necessary, and being specific about expected outputs to avoid ambiguity.
Additional recommendations include using instructions over constraints (stating what to do rather than what not to do), controlling token length for production use cases, using variables in prompts for programmatic insertion (sketched after the list below), and staying updated on model capabilities and limitations. Matthew concludes by acknowledging Google's prompt engineering guide and encouraging viewers to subscribe to his channel and newsletter for ongoing AI updates.
- Provide examples when consistent output formats are needed.
- Start with simple prompts and only add complexity when necessary.
- Be specific about desired output formats to avoid ambiguity.
- Use instructions (what to do) rather than constraints (what not to do).
- For production use cases, optimize token length and use variables for programmatic prompting.
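A minimal sketch of the variables recommendation: keep the prompt as a template and insert values programmatically, so one tested prompt serves many inputs.

```python
PROMPT_TEMPLATE = (
    "You are a travel guide. Suggest three places to visit in {city} for a "
    "traveler interested in {interest}. Respond as a JSON list of place names."
)

# One tested template, many programmatic inputs.
prompt = PROMPT_TEMPLATE.format(city="Amsterdam", interest="museums")
```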
Prompt engineering represents the essential bridge between human intent and AI capability, evolving from simple tricks to a sophisticated discipline that dramatically enhances what we can accomplish with large language models. As this video demonstrates, mastering various prompting techniques—from basic zero-shot approaches to advanced methods like chain-of-thought, self-consistency, and ReAct—enables users to extract significantly better performance even from smaller or older models.
Perhaps most importantly, effective prompt engineering isn't about complexity but rather about strategic design. Starting with simple approaches, providing clear examples, being specific about desired outputs, and understanding each model's unique capabilities allows users to achieve optimal results without unnecessary complexity. As AI models continue to evolve, these techniques will remain valuable even as some become integrated directly into model capabilities.
So what? The practical impact is that prompt engineering democratizes AI capability, allowing anyone—not just technical experts—to dramatically improve their AI interactions. Whether you're using frontier models or more accessible alternatives, these techniques enable you to overcome limitations, reduce costs, and achieve more accurate, creative, and useful outputs across virtually any use case. By implementing these strategies and staying current with evolving capabilities, you can position yourself at the forefront of effective AI utilization.