LLM strategies part 1: Possibilities of implementing Large Language Models in your organization
David Berscheid
Large Language Models (LLMs) are a highly discussed topic in current strategy meetings of organizations across all industries. This article is the first part of two, providing some guidelines for organizations to determine their LLM strategy. It will help you identify the strategy with the most benefits while finding ways of solving associated complexities.
For more content on LLMs, see our LLM hub.
Overview of LLM strategies
For the sake of simplicity, we will break down the possibilities of LLM strategies into four major ones:
Strategy A: “Closed source, pre-trained LLM API - no customization”
Strategy B: “Closed source, pre-trained LLM API - with customization”
Strategy C: “Open source, pre-trained LLM - with customization”
Strategy D: “LLM from scratch - full custom development”
(Note that we did not mention UI-based tools like ChatGPT, as our focus for this strategy break-down is a layer below the UI layer)
Each of these strategies is characterized by the following aspects:
Licensing and Accessibility: Closed source vs. open source
Closed source LLMs are proprietary models, where users get access through an API interface. Every technical specification underneath the API remains out of control.
Open-source models can typically be downloaded from public sources and can have a license, which allows organizations to use them commercially (i.e. MIT license). Model layers and model weights are accessible.
Pretraining: pre-trained LLMs vs. training from scratch
Large Language Models gain their impressive capabilities through an immensely high number of training iterations, simply trying to predict the next token given the previous tokens, using terabytes of data (web data, documents, books, etc.).
Pre-trained LLMs have already been trained to predict billions of tokens, and have optimized their model weights accordingly, whereas for an LLM from scratch, you would start from randomly initialized model weights. The latter option is - as you will see - very rare and most likely not relevant for your organization.
Customization: yes vs. no
One main element for customizing an LLM is called finetuning (Retrieval augmented Generation - RAG - might also be considered an element of customization, but shall not be discussed in this article. Read our blog article about extending the knowledge of your LLM with RAG instead).
Finetuning allows to customize generic LLMs to a more specific and more relevant vocabulary, knowledge, or capability. Organizations in certain industries will want to make use of this feature.
The image below visually places the strategies in a matrix with “Control” on the x-axis - meaning the level of control an organization has over the data and the model, in terms of influencing its outputs - and “Complexity” on the y-axis - meaning the technical difficulties of the setup, maintenance, and usage of respective models, but also staff and financial aspects.
This simplified graphic describes how the aspects of control and complexity seem highly correlated and that organizations will need to determine which level of (data and model) control they need while finding a way to handle the corresponding (technical, financial, labor) complexities of Large Language Models.
Corner strategies
As in an organizational context, the optimal choice for most organizations will lie somewhere in the center. We will introduce both corner strategies A and D only briefly and spend more time on the intermediary and more balanced strategies B and C, which will be relevant to almost every organization of a large enough size.
Strategy A: “Closed source, pre-trained LLM API - no customization”
Examples: GPT4 API | Gemini API
Organizations that choose to use API-based LLM services and are satisfied not making use of possible customization features, enjoy fast and solid results for generic requests. For some use cases which show similarity to publicly available web data, this strategy can suffice. Note that it technically is possible to customize outputs in this strategy, but that the organization simply does not want to or need it.
Furthermore and to be precise: API-accessible models that are not finetuned on an organization’s dataset or usecase (thus the “no customization” framing) will still have been “chat-finetuned”, to generate high quality chat conversations.
Strategy D: “LLM from scratch - full custom development”
Example: BloombergGPT
Organizations like Bloomberg that possess a vast amount of valuable data, possibly even some sort of “data monopoly” can consider this strategy. In Bloomberg's case, the company collected financial data since the 1980s and controls approx. 1/3rd of the financial data market (source).
Combine this type of proprietary data access with access to excellent machine learning scientists and access to large amounts of compute resources, it can be an attractive investment to follow this strategy.
Central strategies: Customization of LLMs
As the vast majority of readers and their organizations will be aiming at a more balanced strategy, we will take a look at strategies B and C in more detail now.
Strategy B: “Closed source, pre-trained LLM API - with customization”
Example: finetuned GPT3.5 / 4 API or finetuned Gemini API
Utilizing a closed-source, pre-trained LLM such as a finetuned version of GPT 3.5 / 4 primarily involves interacting with an API. In this scenario, the LLM provider is responsible for customizing (finetuning) the model for the organization, allowing them to receive tailored model responses and capabilities to suit specific domains or applications.
Finetuning is done by training the model on a custom dataset provided by the user, enabling it to adapt to particular language styles, terminologies, or content.
The organization has no direct access to the model's architecture, weights, or finetuning procedure, as these elements are proprietary. This black-box approach means that the training procedure is not known or accessible for modification.
The following is a list of arguments for and against this strategy.
Pro arguments:
Ease of use: Closed-source LLMs are typically user-friendly, requiring minimal technical expertise to integrate and use. The provider handles all aspects of running and updating the model, ensuring that it remains state-of-the-art with only minimal effort from the user (i.e. initiating a retraining of the model by providing new data).
Consistent performance and reliability: As the provider centrally manages the model, users can expect consistent performance and reliability. Any updates or improvements are automatically integrated, ensuring a stable and continuously evolving service.
Scalability and support: These models are designed to handle large volumes of requests, offering scalability for growing business needs. Users also benefit from professional support and service guarantees.
Con arguments:
Data privacy: Companies might not be allowed to share sensitive data with a LLM provider. Depending on the secrecy requirements of different departments, certain LLM providers might provide enough privacy regulations, while e.g. a pharmaceutical R&D department might not be able to send data outside their servers, let alone use in environments, which are not considered air-gapped.
Limited transparency and control: Users have limited understanding and control over the model's functioning and training, which can be a drawback for those who require deep customization or have specific ethical considerations.
Dependency on provider: Users are dependent on the provider for all aspects of the service, including pricing, availability, and any changes in policy or service terms.
Cost: Closed-source solutions can be more expensive than open-source alternatives, especially for large-scale or high-frequency use cases, as users typically pay by usage.
Strategy C: “Open source, pre-trained LLM - with customization”
Example: self-hosted Llama2
Strategy C involves using an open-source, pre-trained LLM like Llama2, which users can host and manage themselves. Open-source LLMs offer transparency and flexibility, as users have access to the model's architecture and weights. This level of access allows for a deep understanding of the model's inner workings and the ability to modify or extend the model as needed.
In this strategy, while the model comes pre-trained, users can further customize it by training it on their datasets. This additional training can tailor the model's outputs to specific domains, styles, or formats. Open-source models like Llama2 are particularly appealing to those who have the technical expertise and resources to manage and maintain their models while benefiting from the flexibility of customization.
The following is a list of arguments for and against this strategy.
Pro arguments:
Transparency and Control: Users have complete access to the model's architecture and knowledge about finetuning procedures, allowing for greater understanding and control over its functionality and outputs.
Customizability: There is a high degree of flexibility in customizing the model. Users can retrain or finetune it on specific datasets to suit their unique requirements.
Cost-Effectiveness: Open source models can be more cost-effective, especially for organizations with the capability to host and manage the models themselves.
Community Support: Open source projects often have active communities, providing support, tools, and shared knowledge that can be valuable in optimizing and maintaining the model. While closed-source models currently have a head-start in terms of performance, open-source models are catching up quickly.
Con arguments:
Technical Expertise Required: Implementing and hosting an open-source LLM requires significant technical expertise and resources, which might not be feasible for all organizations.
Maintenance and Scalability Challenges: Users are responsible for maintaining the model, including updates, scaling, and troubleshooting, which is resource-intensive.
Variable Performance: Since users customize the model, its performance can vary greatly depending on the quality and relevance of the training data and the skill level of those managing it.
Security and Compliance Responsibility: Users must ensure that their implementation complies with relevant data privacy and security regulations, which can be a complex and ongoing task.
Remark: External expertise
If an organization does not feel comfortable implementing an LLM strategy themselves, it makes sense to get support from external providers. Finding a suitable provider is worth an article in its own right, but here are two quick examples:
At dida, we are a highly specialized provider that focuses entirely on customized machine learning solutions for medium to large enterprises - so we would consider ourselves a good choice for organizations that want to pursue strategy C ("open source, pre-trained LLM - with customization").
If you’re thinking about pursuing strategy A or B (“closed source, pre-trained LLM API - with or without customization”) then we would point towards a provider like Startup Creator.
The AI agency Startup Creator is an excellent example of integrating closed-source LLMs into business operations. With a more general software focus but also on custom AI solutions for startups and SMEs, they've shown how LLMs can enhance communication processes, improve customer interactions, and provide data-driven insights for such clients. Their projects illustrate the practical application and benefits of AI in various industries, positioning them as a great choice for LLM implementations for startups and SMEs.
Conclusion
In this first part of our two-part LLM strategies series, we introduced a simplified concept for 4 possible LLM strategies an organization can pursue.
In this opinionated presentation, we focused on two main strategies that we believe to be the only relevant ones for the majority of readers: customizing a closed-source LLM, trading transparency and flexibility for comfort (strategy B), and finetuning plus hosting an open-source LLM yourself, enjoying more power, potentially higher cost-efficiencies, while also having to manage a higher level of technical complexity (strategy C).
Now that we have introduced the different strategies, in the second part we will present a way how to decide, which strategies will be the right one to bet on for your organization.