AI optimization issue

No max_tokens limit in n8n

Why missing token limits on AI nodes can lead to runaway costs

What is this issue?

When AI nodes don't specify a max_tokens parameter, the model can generate responses up to its maximum output length, bounded only by its context window. This means a simple request could result in 4K, 8K, or even more output tokens, with costs to match.

Scenarios without limits:

  • GPT-4 generating 10,000 tokens for a summary that needed 100
  • Claude producing full essays when you wanted bullet points
  • Repeated calls in loops generating enormous outputs
  • No limit + high temperature = verbose, wandering responses
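In workflow JSON, the gap is easy to spot: the AI node's options block simply contains no token cap. A minimal sketch of such a node (node type and field names vary by n8n version and provider; the names below are illustrative, not a guaranteed schema):

```
{
  "name": "Summarize ticket",
  "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
  "parameters": {
    "model": "gpt-4",
    "options": {
      "temperature": 0.9
    }
  }
}
```

Nothing here limits output length, so the model alone decides how many tokens to generate.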

Why is this dangerous?

Unpredictable costs

A workflow that costs $0.10 normally could cost $10 if the model decides to be verbose.

Rate limit exhaustion

Large responses consume rate limits faster, potentially blocking other requests.

Processing delays

Generating 10K tokens takes much longer than 100 tokens, slowing your workflow.

Budget overruns

Loops or high-volume workflows can quickly exceed your AI budget.

How to fix it

  1. Set explicit max_tokens

     Configure max_tokens in your AI node options based on expected output length.

  2. Match limit to use case

     Classification: 10-50 tokens. Summaries: 100-500. Longer content: 500-2000.

  3. Monitor token usage

     Track actual token usage to refine limits and catch anomalies.

  4. Add cost alerts

     Set up billing alerts with your AI provider to catch unexpected spikes.
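Putting steps 1 and 2 together, a summary node with an explicit cap might look like the sketch below (field names vary by node and n8n version; maxTokens and the node type string are illustrative assumptions, so check your node's actual options):

```
{
  "name": "Summarize ticket",
  "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
  "parameters": {
    "model": "gpt-4",
    "options": {
      "maxTokens": 300,
      "temperature": 0.3
    }
  }
}
```

A cap of 300 tokens suits a short summary; a classification node would use a much smaller value, such as 20.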

Scan your workflow now

Upload your n8n workflow JSON and detect AI nodes missing max_tokens configuration.
