Stop paying surgeon rates for a nail file.
How to get exactly what you need out of AI without burning through your plan. Plain language. No jargon. For business owners and everyday users alike.
A field guide from BAMPT | This Week in AI | Build AI for your business, including your own.
Start Here
The whole guide in one idea.
Every AI tool charges you in tokens. A token is just a chunk of a word. Roughly speaking, every word you type in and every word it sends back nudges a meter. You usually do not see the meter on a monthly plan, but it is running. Two things quietly drain it faster than people expect: using a more powerful model than the task needs, and dragging a long conversation behind you. Fix those two habits and you get most of the savings with none of the fuss.
Bigger is not better. It is just bigger.
The flagship models cost more per word, and some count double against your plan. Most everyday work does not need them. Reaching for the most powerful model by default is the single most common way people overspend.
Long chats get expensive and sloppy.
Every message in a long thread carries the entire conversation with it. That costs more each turn, and the model also gets less sharp as the pile grows. A fresh start is often the smarter move.
You would never hire a surgeon to file your nails. So stop assigning your most powerful, most expensive model to your simplest tasks.
Match the tool to the job, not the job to the shiniest new tool. That one habit does most of the work.
Three words, translated
- Token. A small piece of a word. AI tools count tokens going in and coming out, and that count is what you are spending.
- Context window. How much the model can hold in its head at once. A long chat fills it up, which costs more and muddies the answers.
- Model tier. The fast-and-cheap, the everyday, and the heavy-reasoning versions of a tool. Same family, different jobs.
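If you want a feel for the numbers, here is a minimal Python sketch of a token estimate. It leans on a common rule of thumb of roughly four characters per token for English text, which is an assumption and not how any particular tool actually counts. The function name is made up for illustration.

```python
import math

# Assumption: about 4 characters per token for English text.
# Real tools use their own tokenizers, so treat this as a ballpark only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Give me three bullet points"
print(estimate_tokens(prompt))
```

The point is the habit, not the precision: longer text in or out means more tokens counted.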
The Core Move
Match the model to the task.
Claude, ChatGPT, and Gemini all stack their models the same way: a fast lightweight one, a balanced everyday one, and stronger reasoning models above them. The names below are Claude's, but the logic carries across every major tool. Start in the middle, drop down when the task is simple, and climb only as high as the work actually demands.
Haiku (fast and lightweight). Quick questions, basic summaries, simple rewrites, fixing tone or grammar, short translations, first-draft brainstorms. The most efficient on your plan, and plenty for anything obvious.
Sonnet (balanced everyday). Your default for almost everything. Writing and editing, research, analysis, building documents and spreadsheets, working through a problem, most coding. Most tasks never outgrow it. When unsure, start here.
Opus (stronger reasoning). Step up here only when the everyday model struggles: dense analysis, harder strategy, tricky problems where the stakes are higher. Available on the paid plans. Think "Sonnet was not quite enough," not "always use the strongest."
Fable (newest flagship). Reserved for the biggest, hardest jobs only. The newest flagship is the most powerful and most expensive model sold, and it counts as double usage. A deliberate choice for rare heavy lifting, never a habit. Most people will rarely need it.
Heads up on Fable's access. On the paid plans, Fable is included only through June 22, then it moves behind paid usage credits, and the free plan does not get it at all. The practical read: do not build a workflow that depends on it until access settles. Treat it as a specialist you call in, not a default you lean on.
A quick test before you climb a tier. Run the task on the everyday model first. If the answer is good enough, you are done. Only step up when you can point to something specific it got wrong. Upgrade on evidence, not on instinct.
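For anyone who thinks in code, the quick test can be sketched like this. It is a minimal illustration, not a real integration: the "models" are stand-in functions, and `climb_on_evidence` and `is_good_enough` are hypothetical names. In real life the "good enough" check is usually you reading the answer.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]  # hypothetical: takes a task, returns an answer


def climb_on_evidence(
    task: str,
    tiers: Sequence[Tuple[str, Model]],
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try tiers in order, starting with the everyday one.

    Stops at the first answer that passes the check. If none pass,
    returns the answer from the last tier tried.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    name, answer = "", ""
    for name, ask in tiers:
        answer = ask(task)
        if is_good_enough(answer):
            break  # good enough, so do not climb further
    return name, answer


# Stand-in "models" so the sketch runs without any AI service.
def everyday(task: str) -> str:
    return "short"


def stronger(task: str) -> str:
    return "a longer, more careful answer"


tiers = [("everyday", everyday), ("stronger", stronger)]
used, answer = climb_on_evidence("Summarize this memo", tiers, lambda a: len(a) > 10)
print(used)
```

Here the everyday stand-in fails the check, so the sketch climbs exactly one tier and stops.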
Everyday Habits
Seven habits that save tokens automatically.
None of these require technical skill. They are just better defaults. Adopt three and you will feel the difference in how far your plan stretches.
Default to the everyday model.
Make the balanced tier your starting point and only reach higher when a task earns it. This one habit out-saves all the others combined.
Start a fresh chat for a new topic.
A long thread re-sends its whole history with every message. When you switch subjects, open a clean conversation instead of piling on.
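To see how quickly that adds up, here is a rough back-of-the-envelope sketch in Python. It assumes every message and every reply is the same size, which real chats never are, and the function name and numbers are invented for illustration.

```python
def tokens_sent_over_thread(turns: int, tokens_per_message: int) -> int:
    """Total tokens sent to the model when each turn re-sends the full history.

    Assumption: every message and every reply is tokens_per_message long.
    """
    total_sent = 0
    history = 0  # tokens already sitting in the conversation
    for _ in range(turns):
        history += tokens_per_message  # your new message joins the thread
        total_sent += history          # the whole thread goes in with it
        history += tokens_per_message  # the reply joins the thread too
    return total_sent


one_long_thread = tokens_sent_over_thread(turns=20, tokens_per_message=100)
twenty_fresh_chats = 20 * tokens_sent_over_thread(turns=1, tokens_per_message=100)
print(one_long_thread, twenty_fresh_chats)
```

Under these assumptions, one 20-turn thread sends 40,000 tokens, while 20 separate one-question chats send 2,000. A fresh chat only makes sense when the topic really is new, but the gap shows why the habit pays.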
Say what you want up front.
Vague prompts cause back-and-forth, and every round trip costs tokens. State the goal, the format, and the length in your first message.
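If you like templates, the same habit can be sketched as a tiny Python helper. The function and field names are hypothetical. The only point is that goal, format, and length all land in the first message.

```python
def build_prompt(goal: str, output_format: str, length: str) -> str:
    """Put the goal, the format, and the length into one opening message."""
    return (
        f"Goal: {goal}\n"
        f"Format: {output_format}\n"
        f"Length: {length}"
    )


print(build_prompt(
    goal="Announce our new Saturday hours to existing clients",
    output_format="A friendly email",
    length="Under 120 words",
))
```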
Ask for the length you actually need.
"Give me three bullet points" beats a five-paragraph essay you have to trim. You pay for every word it generates, used or not.
Do not paste more than the question needs.
Dropping a forty-page document in to ask one small question makes you carry that whole document every turn. Paste the relevant page, not the binder.
Reuse prompts that work.
Keep a simple notes file of prompts that got great results. Reusing a proven prompt beats re-explaining yourself from scratch each time.
Switch off heavy modes for simple asks.
Deep research and extended thinking modes are wonderful for hard problems and wasteful for easy ones. Turn them on deliberately, not by default.
Level Up
Smarter moves for heavier users.
If AI is part of how your business runs, these go a step further. The first one matters most, because it is the habit almost nobody builds until their answers start getting worse.
Build a "tell me when you are getting full" habit.
As a chat grows, the model holds more and thinks less clearly. Ask it directly: "If this conversation gets long enough that you might lose track, tell me and summarize where we are." It will not catch every case, but it can flag the moment to reset before quality quietly drops.
Use the summarize-and-restart pattern.
When a long session starts drifting or repeating, ask for a tight summary of what you have decided so far. Paste that into a fresh chat. You keep the thread and drop the dead weight.
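For those building their own tools, the pattern can be sketched in a few lines of Python. This is an illustration under assumptions: the 20-message threshold is arbitrary, `summarize` stands in for asking the model to write the summary, and every name here is hypothetical.

```python
from typing import Callable, List


def restart_if_long(
    history: List[str],
    summarize: Callable[[List[str]], str],
    max_messages: int = 20,  # assumption: an arbitrary cutoff for the example
) -> List[str]:
    """Keep a short chat as-is; swap a long one for a single summary message."""
    if len(history) <= max_messages:
        return history
    summary = summarize(history)
    return ["Summary of where we are so far: " + summary]


# Stand-in summarizer so the sketch runs without any AI service.
def fake_summarize(messages: List[str]) -> str:
    return f"{len(messages)} messages about the launch plan"


long_chat = [f"message {i}" for i in range(30)]
fresh_chat = restart_if_long(long_chat, fake_summarize)
print(len(long_chat), len(fresh_chat))
```

The fresh chat starts from one message instead of thirty, so each later turn carries far less with it.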
Let projects and saved instructions carry the context.
Most tools let you store standing instructions or a project space so you are not re-pasting the same background every time. Set it once, stop paying to repeat yourself.
Batch related questions into one prompt.
Ten little follow-ups each re-send the whole conversation. One well-structured message asking for all ten things is dramatically cheaper and usually clearer.
If you build automations, route by task.
For anyone working with the API or no-code tools: send the easy steps like sorting and tagging to a cheap model, and reserve the strong model for the one hard step. Keep the fixed part of your instructions identical across calls too. Many providers offer prompt caching, which bills that repeated part at a reduced rate and can cut costs sharply.
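Routing by task can be as simple as a lookup table. The Python sketch below is a minimal illustration. The step names and tier labels are invented for the example, and a real automation would map each label to whichever model your tool offers at that tier.

```python
# Hypothetical step names mapped to hypothetical tier labels.
ROUTES = {
    "sort": "light",
    "tag": "light",
    "summarize": "everyday",
    "draft_strategy": "strong",
}
DEFAULT_TIER = "everyday"  # when unsure, start in the middle


def pick_tier(step: str) -> str:
    """Return the tier for a step, falling back to the everyday tier."""
    return ROUTES.get(step, DEFAULT_TIER)


pipeline = ["sort", "tag", "draft_strategy", "send_report"]
plan = [(step, pick_tier(step)) for step in pipeline]
print(plan)
```

Only one step in this pipeline reaches the strong tier. The unknown step falls back to the everyday tier instead of defaulting to the most expensive one.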
Know what is better left analog.
Not everything should be automated. Some judgment calls, sensitive client conversations, and quick gut-check tasks cost more time to delegate than to just do. Spending zero tokens is always the cheapest option.
Before You Hit Send
The 60-second token audit.
Run this on your next AI session.
Five questions. If you answer yes to any, you have an easy save sitting right there.
Am I using the flagship model for something a lighter one could do? Drop down a tier.
Is this chat really long? Summarize what matters and start fresh.
Did I paste a huge document to ask one small question? Trim it to what counts.
Did I skip telling it the format and length I want? State both up front and save yourself a round trip.
Is a heavy mode running (deep research, extended thinking) that this task does not need? Switch it off.
The People Behind It
Who made this.
This guide comes from BAMPT, where we build AI automation systems for service businesses, including our own. We are practitioners first. Everything here is what we actually do, not theory.
It is written by Chantal Emmanuel, co-founder of BAMPT and CTO of Gatheron, who breaks down what actually matters in AI each week for business owners who are curious but tired of the hype. No breathless launches. Just the practical read on what it means for your business.
Want the weekly version? Find This Week in AI wherever you already follow along, and the long-form breakdowns on Substack.