OpenAI’s API Pricing: Cost Breakdown for GPT-3.5, GPT-4 and GPT-4o


dida


As OpenAI advances in developing new GPT models, there is growing interest among companies in exploring their quality and cost, especially for applications like ChatGPT and chatbots. This exploration empowers businesses to make informed decisions based on price/performance metrics. By analyzing the nuances of each large language model, companies gain insight into the unique features and improvements of each iteration, particularly in content creation, summarization, handling unstructured text, and AI-generated content.

The following sections provide a deeper look at the various AI models offered through the OpenAI API.


What is an API?


An API, or Application Programming Interface, is a set of rules and protocols that allows one piece of software to interact with another. It defines the methods and data structures that developers use to communicate with an external system or service.

The OpenAI GPT API allows developers to integrate a GPT model's natural language processing capabilities into their applications, websites, or services.


What is the API pricing based on for GPT models?


The pricing of OpenAI's API is determined by the number of tokens you use. This naturally raises the question: “What exactly is a token?” According to the OpenAI pricing documentation, it is a "piece of a word used for natural language processing." Typically, one token is approximately 4 characters or 0.75 words.

To provide a clearer understanding, let's consider an example. The phrase “Hello, welcome to our website!” consists of 7 tokens. This token-based pricing model helps in precisely measuring and billing for the API usage.
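For exact counts, OpenAI's open-source tokenizer library tiktoken should be used. As a minimal sketch, the rule of thumb above (roughly 4 characters per token) can be turned into a quick estimator for back-of-the-envelope cost calculations:

```python
# Rough token estimate using the ~4-characters-per-token rule of thumb.
# This is only an approximation; exact counts require OpenAI's tokenizer
# (tiktoken), since real tokens vary in length.

def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string (about 4 characters per token)."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, welcome to our website!"))  # 7, matching the example above
```

Here the estimate happens to match the exact count of 7 tokens; for longer or non-English text the approximation can drift noticeably.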

However, there are a few more points to consider about pricing: the difference between input and output pricing, and the context window of the GPT model.


What are input/output tokens?


Input tokens are the text you provide to the model, while output tokens are the text the model generates in response. The cost is calculated by summing both token counts. For example, if you send a prompt containing 10 tokens and receive a response containing 15 tokens, the total chargeable amount would be 25 tokens. Note also that output tokens are usually a few times more expensive than input tokens.


What does context window mean?


The context window of a Generative Pre-trained Transformer refers to the number of preceding tokens that the model considers when generating or predicting the next token in a sequence. In simpler terms, it's the span of text or tokens that the model "looks at" to understand the context of the current token being processed. 

GPT-3.5 Turbo, an enhanced version of GPT-3.5, supports a context window of 4,096 tokens (16,385 tokens in the 16k variant), allowing it to capture longer-range dependencies in text for more coherent and contextually rich outputs.

GPT-4 supports a context window of 8,192 tokens (32,768 tokens in the 32k variant), significantly expanding its span compared to GPT-3.5 Turbo. This enables GPT-4 to process longer texts and dependencies, enhancing its capability for complex natural language processing tasks and generating more nuanced outputs.

GPT-4o, optimized for efficiency and improved performance, supports a context window of 128,000 tokens, matching GPT-4 Turbo. It balances computational efficiency with performance, ensuring effective handling of substantial text sequences while optimizing resource usage for various applications.


How does the context window affect quality and speed?


A larger window size typically results in higher quality output because the model can take more context into account when generating text. As a result, the generated text tends to be more contextually relevant and coherent.

It also has an effect on processing speed. Models with larger context windows may require more computational resources and time to process each token, resulting in slower inference speeds compared to models with smaller context windows. Efficiency-optimized models, such as GPT-4o, balance context window size with computational efficiency. They aim to provide competitive processing speeds while maintaining sufficient context to produce high-quality output.
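A practical consequence of a finite context window is that long chat histories must be trimmed before each request. A minimal sketch, assuming per-message token counts are already known (in practice they would come from a tokenizer; the messages and counts below are illustrative):

```python
# Keep only the most recent messages that fit into the model's context
# window, always preserving the system message at the front.
# Token counts here are illustrative; in practice they come from a tokenizer.

def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    system, rest = messages[0], messages[1:]
    budget = max_tokens - system["tokens"]
    kept = []
    for msg in reversed(rest):          # walk from newest to oldest
        if msg["tokens"] > budget:
            break
        kept.append(msg)
        budget -= msg["tokens"]
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are helpful.", "tokens": 10},
    {"role": "user", "content": "Old question ...", "tokens": 3000},
    {"role": "assistant", "content": "Old answer ...", "tokens": 1200},
    {"role": "user", "content": "New question ...", "tokens": 50},
]
# With a 4,096-token window, the oldest message no longer fits:
print([m["role"] for m in trim_history(history, max_tokens=4096)])
```

A larger window (say 128,000 tokens) would keep the entire history, which is exactly why bigger context windows produce more coherent long conversations, at the price of more tokens billed per request.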


OpenAI GPT API pricing: difference between input / output tokens


The pricing structure for OpenAI's GPT API versions — GPT-3.5, GPT-4, and GPT-4o — operates on a token-based model, with charges based on the number of tokens processed. As of the latest update, the pricing per 1 million input tokens is as follows:

  • GPT-3.5: 0.47 EUR

  • GPT-4: 27.9 EUR

  • GPT-4o: 4.65 EUR

For output tokens, the pricing per 1 million tokens is:

  • GPT-3.5: 1.40 EUR

  • GPT-4: 55.80 EUR

  • GPT-4o: 13.95 EUR

GPT pricing comparison by input and output tokens: GPT-3.5 vs. GPT-4 vs. GPT-4o

This pricing approach ensures that users pay according to their actual usage, making it suitable for a wide range of applications from chatbots to content generation.


Using generative AI in document processing automation - an insightful example


To provide a deeper understanding of what 1 million tokens represent, let's consider a real-life example:

Suppose a company wants to automate its document processing. OCR (Optical Character Recognition) technology first converts scanned documents into raw text; a GPT model, built on a large language model (LLM) trained on extensive data and optionally adapted via fine-tuning and embeddings, then turns that text into structured, meaningful data. This integration saves time and effectively reduces manual errors.

Let's consider an example based on a hypothetical company, but with numbers we regularly deal with: a medium-sized company of 200-300 employees with an active sales department processes around 4,500 documents per month, or approximately 55,000 documents annually, with each document averaging 2.6 pages. Based on OpenAI's current pricing, the cost per page for the various GPT model options is as follows:

  • GPT-3.5: 0.0023 EUR per page

  • GPT-4: 0.13 EUR per page

  • GPT-4o: 0.023 EUR per page

Comparing these prices, we can conclude that GPT-4 offers the most advanced capabilities at a higher cost, while GPT-4o provides a balance between performance and affordability. GPT-3.5 remains the most economical option, suitable for applications with budget constraints.

Considering this information, we can estimate how much a company would pay for GPT usage in a year. The following graph illustrates these costs based on data from June 2024 (approximate values):
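Multiplying the per-page prices above by the hypothetical company's annual page volume reproduces the order of magnitude behind these figures:

```python
# Annual cost estimate for the hypothetical company:
# 55,000 documents/year * 2.6 pages/document, at the per-page prices above.

PAGES_PER_YEAR = 55_000 * 2.6          # 143,000 pages

PRICE_PER_PAGE_EUR = {
    "GPT-3.5": 0.0023,
    "GPT-4":   0.13,
    "GPT-4o":  0.023,
}

for model, price in PRICE_PER_PAGE_EUR.items():
    print(f"{model}: {PAGES_PER_YEAR * price:,.0f} EUR per year")
```

This puts GPT-3.5 at a few hundred euros per year, GPT-4o in the low thousands, and GPT-4 close to twenty thousand, which is the spread the recommendation below is based on.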


We therefore recommend choosing the GPT model based on the use case and individual requirements. For some tasks, GPT-3.5 is the most economical option while still providing sufficient accuracy; for others, cost is not the priority and maximum performance matters most, making GPT-4 the best choice.

In addition to the costs associated with GPT, implementing OCR technology and cloud storage is necessary. However, these additional expenses are minimal compared to the powerful benefits and capabilities offered by GPT. 


Further resources


Are you currently interested in LLM-based solutions?