How to handle AI usage limits
Pros share what to do when you’re running out of tokens.
• 4 min read
When we talked with CJ Combs, he had 13 hours and 54 minutes before he could use his AI tool of choice, Claude Design, and start working again.
The senior AI business consultant, who focuses on AI and data for Columbus Global, employs Claude Design—a product currently in research preview—to support digital aspects of his business, like his website and marketing materials. He also uses Claude Cowork to run tasks like pulling together reports and organizing files at scheduled times. That means he sometimes hits his session and token limits.
AI users and IT pros like Combs face session limits and weekly caps, and must decide if they want to pay extra to extend a limit, wait out the time, or find an alternative AI tool. The decision isn’t easy when your AI-heavy workflow is on hold.
Token for granted. Tokens are tiny units of data processed by large language models. That processing isn’t free, even if it seems zero-cost to the user (like in the free tiers offered by many AI services).
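To make the cost concrete, here's a back-of-the-envelope sketch of how per-token metering adds up over an agentic session. The per-million-token rates and the call counts below are hypothetical placeholders for illustration, not any vendor's actual pricing.

```python
# Back-of-the-envelope token cost estimate. The rates below are
# assumed placeholders, not a real provider's price list.
INPUT_RATE_PER_M = 3.00    # dollars per 1M input tokens (assumed)
OUTPUT_RATE_PER_M = 15.00  # dollars per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single LLM call at the rates above."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# A hypothetical agentic coding session: 50 calls, each re-sending
# ~20k tokens of context and generating ~2k tokens of code.
session = 50 * request_cost(20_000, 2_000)
print(f"${session:.2f}")  # → $4.50
```

A single cheap-looking call re-sent dozens of times with heavy context is how "free-feeling" usage quietly eats an allocation.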
Today’s agentic coding tools, which can spin out an entire software program from a single prompt, use thousands of tokens. A New York Times report in March found that some companies are listing their “tokenmaxxers,” meaning those who use the most AI tokens, on internal leaderboards and rewarding the “winners”; other tech leaders, conversely, worry about the “gorging” of resources and ballooning usage bills.
“The concern that is happening in 2026 pretty quickly is that, in many organizations, they’re finding that their budget they had allocated for token usage…has gone through the roof, so much so that they are eating up their entire 2026 allocation within the first two-to-three months of use,” Rodrigo Madanes, global next frontier technology and AI Leader at professional-services firm EY, told us.
There are choices to consider, as tools like ChatGPT offer both pay-as-you-go and subscription options. Whatever they choose, organizations will likely have to figure out two sides of a token strategy: maximizing current usage and finding budget for more tokens.
To maximize usage…Shanti Greene, head of data science and AI innovation for enterprise AI solutions company AnswerRocket, offered efficiency-minded recommendations:
- Avoid sending unnecessary information in an input—say, an entire codebase—when only a small slice of it is relevant context.
- Limit the output to decrease cost and output tokens. (For text summaries, he said, you can even ask the LLM to be less verbose.)
- Use models’ prompt caching capabilities to prevent context from being sent repeatedly and configure how long prompt contexts are cached.
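The first two tips above can be sketched client-side: drop stale conversation turns until the prompt fits a token budget, and cap the reply length. This is a minimal illustration, assuming a rough heuristic of ~4 characters per token; a real client should count with the provider's own tokenizer, and caching (the third tip) is configured on the provider's API rather than here.

```python
# Minimal sketch of the trimming advice: keep only the most recent
# messages that fit a token budget. Token counts are approximated as
# len(text) // 4 (~4 chars/token in English text) — an assumption,
# not a real tokenizer.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(history: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within `budget` tokens."""
    kept, used = [], 0
    for msg in reversed(history):          # newest first
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))            # restore chronological order

history = [
    "old, token-heavy debugging transcript " * 50,  # stale context
    "relevant question about one function",
    "assistant's short answer",
]
trimmed = trim_history(history, budget=100)
# The stale transcript is dropped; the two recent turns survive.
```

Capping output is the same idea on the other side of the call: pass a small `max_tokens`-style limit so a summary can't run long.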
Top insights for IT pros
From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.
By subscribing, you accept our Terms & Privacy Policy.
Madanes offered similar tips, including starting a new chat so new queries don’t incorporate previous, token-heavy details. He also recommended making sure you’re using the right model for the job. Perhaps a code-building tool requires the most sophisticated LLM, Madanes said, but other tasks—“like simple editing of a single page of code”—might require a less powerful, cheaper one. (LLM routers today aim to automate the matching of models to tasks.)
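The routing idea can be sketched in a few lines: send heavyweight jobs to a frontier model and simple edits to a cheaper one. The model names and the keyword heuristic here are illustrative assumptions; production LLM routers typically classify tasks with a small model rather than string matching.

```python
# Hypothetical model router: heavyweight tasks go to an expensive
# frontier model, light edits to a cheap one. Names and the keyword
# heuristic are assumptions for illustration only.
CHEAP_MODEL = "small-model"        # assumed name
FRONTIER_MODEL = "frontier-model"  # assumed name

def route(task: str) -> str:
    heavy_markers = ("build", "architect", "refactor", "generate app")
    if any(marker in task.lower() for marker in heavy_markers):
        return FRONTIER_MODEL
    return CHEAP_MODEL

print(route("Build a billing microservice from this spec"))  # frontier-model
print(route("Fix the typo on this single page of code"))     # small-model
```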
“Most of the time users haven’t even realized how many tokens they’re consuming with some of the tasks they’re doing,” Madanes said.
Before finding a budget for more tokens…Madanes recommends companies use model dashboards to map where tokens are going.
Greene, too, keeps an eye on his teams’ Anthropic and OpenAI usage; he said he occasionally has to watch for overages and then decide if an overage means a plan increase is called for, or if the instance is a one-time occurrence and there’s no need for further action.
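The dashboard-style tracking Madanes and Greene describe reduces to simple aggregation: sum token usage per team from call logs and flag anyone over plan. The log field names and the one-million-token monthly allotment below are assumptions for illustration.

```python
# Sketch of per-team usage tracking and overage flagging. Field names
# and the monthly plan size are illustrative assumptions.
from collections import defaultdict

PLAN_TOKENS = 1_000_000  # assumed monthly allotment per team

calls = [
    {"team": "data-science", "tokens": 600_000},
    {"team": "data-science", "tokens": 550_000},
    {"team": "platform", "tokens": 200_000},
]

usage = defaultdict(int)
for call in calls:
    usage[call["team"]] += call["tokens"]

# Teams over plan, and by how many tokens — the input to the
# "plan increase or one-off?" decision.
overages = {team: total - PLAN_TOKENS
            for team, total in usage.items() if total > PLAN_TOKENS}
print(overages)  # {'data-science': 150000}
```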
A recent Gallup poll found that half of US adults are using AI at work for tasks like consolidating data, generating ideas, and automation. According to our own IT Brew survey, 63% of respondents said their IT teams are using AI for code generation, documentation, or other functions. With a wide range of tasks and models, cost calculations can become difficult.
Waiting game. Sometimes Combs will wait the half day until a fresh allotment of tokens and sessions arrives; sometimes he needs to make an emergency-level update to his website that requires him to dip into his bank of “extra usage” funds. At other moments, he moves the work to a different platform entirely, like OpenAI’s Codex or Google’s AI Studio, which can feel a bit like starting from scratch.
For Combs and likely others, AI has flipped pricing models away from straightforward licenses to unpredictable meters.
“This is the pain point that’s in our faces right now. And I have to make a financial decision,” Combs said.
About the author
Billy Hurley
Billy Hurley has been a reporter with IT Brew since 2022. He writes stories about cybersecurity threats, AI developments, and IT strategies.