Artificial intelligence (AI) practitioners can think of AI prompting techniques as similar to how people learn a new task. In a post on X, computer scientist Andrej Karpathy explains that zero-shot prompting is like learning a task from a description alone, while few-shot prompting is like learning from worked examples of how to complete it. A scenario where the learner can actually practice the task, Karpathy argues, is analogous to fine-tuning.
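To make Karpathy's analogy concrete, here is a minimal sketch, using an invented sentiment-classification task, of how a zero-shot prompt differs from a few-shot prompt:

```python
# Illustrative only: hypothetical prompts for a sentiment-classification task.

# Zero-shot: the task is described, but no examples are given.
zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: a handful of worked examples precede the new input.
few_shot_prompt = (
    "Review: I love this phone, the camera is fantastic.\n"
    "Sentiment: positive\n"
    "Review: Shipping took forever and the box was damaged.\n"
    "Sentiment: negative\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
```

Fine-tuning, by contrast, bakes such examples into the model's weights through additional training rather than supplying them at inference time.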
For enterprise AI, a technique like fine-tuning is often required to take a generic large language model (LLM) and enhance its performance on niche problems. Off-the-shelf LLMs developed by companies like OpenAI or Google have a foundational knowledge base that generalizes well across many applications. However, because their training data offers limited depth on specialized topics, these models typically underperform when prompted on them.
LLM fine-tuning is a common strategy AI practitioners use to improve output accuracy and reliability for tasks such as providing technical product support, facilitating healthcare diagnoses, or generating content with a distinct writing style. In these types of applications, fine-tuning can optimize performance and help enterprises maximize the return on their AI investment.
Fine-tuning is the process of training an existing AI model on a smaller dataset in a target domain. The goal is to build on the foundational knowledge a model acquired during its initial round of training and optimize it for a particular task.
By the end of its original training period, a model can typically generalize effectively to a wide range of tasks but can’t deliver the level of context or accuracy needed for more complex or niche subjects. Fine-tuning addresses this gap, enabling the model to perform in areas not covered in its foundational training.
During fine-tuning, developers update the model's parameters, its weights, to optimize learning and performance on the new dataset, using techniques such as transfer learning, low-rank adaptation (LoRA), or reinforcement learning from human feedback (RLHF).
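As a concrete illustration, the sketch below sets up a LoRA fine-tune with the Hugging Face transformers and peft libraries; the base model, hyperparameters, and training details are placeholder assumptions rather than recommendations:

```python
# A minimal LoRA fine-tuning sketch using Hugging Face transformers and peft.
# Model name and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder (gated; swap in any causal LM)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the pretrained weights and trains small low-rank adapter
# matrices injected into the attention projections.
config = LoraConfig(
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # Llama-style attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here, the wrapped model trains like any other causal LM, e.g. with a
# standard transformers Trainer over your tokenized domain-specific dataset.
```

Because only the small adapter matrices receive gradient updates, this kind of setup is far cheaper than full-weight fine-tuning while leaving the base model intact.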
AI practitioners generally consider fine-tuning suitable for refining a model's behavior or output style rather than injecting entirely new functionality. For example, fine-tuning would be ideal for training a customer service chatbot to adopt a brand-specific conversational tone or to classify customer sentiment.
At the same time, some research suggests fine-tuning can teach a model new abilities. For instance, although LLMs like GPT-4 are notoriously weak at solving mathematical problems, researchers from the National University of Singapore alleviated this issue by fine-tuning a Meta Llama model into "Goat," which could solve tasks like large-number addition and subtraction.
While the primary goal of LLM fine-tuning is to improve performance on domain-intensive tasks, the technique also offers enterprises other benefits, including scalability, efficiency, and cost savings.
Fine-tuning boosts output reliability for use cases beyond the capability of an off-the-shelf LLM. In a study on Facebook AI’s RoBERTa, fine-tuning significantly improved model performance in biomedical, computer science, and customer service niches, particularly when the target information was contextually more distant from the model’s original domain.
With fine-tuning, organizations can tailor generic AI models like GPT-4 or Llama to any target task as long as they have domain-specific datasets to support training. This empowers enterprises to build innovative solutions and fill market gaps for AI-based products and services. Fine-tuning is particularly powerful for customizing creative and natural language processing (NLP)-dependent AI behaviors, such as the style or tone of business writing.
Developers can fine-tune models with as much or as little new training data as they can access. This is valuable since specialized domains, such as biomedical applications, may offer limited datasets. As an enterprise’s demands, resources, and data availability evolve, developers can scale fine-tuning strategies to match.
Fine-tuning is more cost-effective and efficient than training a new model from scratch because the process typically involves less data and computational power. The method builds on pre-trained LLMs, taking advantage of the resources already invested in these foundational models. Compared to from-scratch model development, and even some prompt engineering techniques, fine-tuning can be up to three times more cost-effective in the long term.
While fine-tuning is often necessary to deepen the knowledge of generic LLMs in key areas, the technique doesn’t come without challenges. Fine-tuning is more time and resource-efficient than training a new model, but it still has significant computing requirements. The graphics processing units (GPUs) needed for fine-tuning are expensive, scarce, and in high demand, which can limit accessibility for smaller players. What’s more, enterprises may need to invest in training or hiring talent experienced in fine-tuning.
Speed-to-market is another common challenge for enterprises. In-house GPU infrastructure can quickly become outdated, and upgrading hardware is costly, which could put you at a disadvantage compared to peers or competitors with greater financial resources. Alternatively, in-demand cloud providers may not have GPU availability when you need it, delaying fine-tuning projects by weeks or months.
There are also performance and longevity barriers. Models still tend to hallucinate, even when fine-tuned, if any information gaps remain. This issue can escalate as models age, since the knowledge gained during fine-tuning remains static, even as real-world domain expertise evolves. As a result, in fields like research or healthcare, which change quickly, models may require frequent fine-tuning to remain reliable, which is computationally and financially demanding.
Organizations must consider fine-tuning optimization techniques and proactive resourcing strategies to address these issues.
Fine-tuning outcomes are only as reliable as the training data behind them. Before you begin, assess the performance of your foundation model, since it determines the core behavior of the fine-tuned model. Then ensure the new datasets used for fine-tuning are clean, relevant, and representative of a diversity of perspectives within the target domain.
Enterprises can address hallucinations by combining fine-tuning with retrieval-augmented generation (RAG), which uses an external knowledge base and a retriever to keep model outputs up-to-date and contextually relevant. The two approaches are complementary, with research indicating that they perform better together than independently: RAG grounds responses in current, verifiable sources, offsetting the static knowledge a model locks in at fine-tuning time.
Pairing RAG with fine-tuning is ideal for use cases where you want to adjust a model’s creative style and keep it up to date with current events and expertise. For example, if you’re building an AI model to respond to technical support queries, you can use fine-tuning to customize the model’s conversational tone. On the other hand, RAG is an effective way to keep outputs informed with the latest product documentation.
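For intuition, the sketch below shows the retrieval half of that setup, assuming the sentence-transformers library; the documentation snippets, product name, and query are invented for illustration:

```python
# A minimal sketch of the retrieval step in RAG, using sentence-transformers
# for embeddings and cosine similarity for lookup. The snippets and query are
# invented; production systems typically use a vector database instead.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Model X-200 supports firmware updates over USB-C only.",
    "The X-200 battery is rated for 12 hours of continuous use.",
    "Reset the X-200 by holding the power button for 10 seconds.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "How do I update the firmware on my X-200?"
query_vec = embedder.encode([query], normalize_embeddings=True)[0]

# With normalized embeddings, the dot product is cosine similarity; retrieve
# the most relevant snippet and inject it into the model's context.
best = docs[int(np.argmax(doc_vecs @ query_vec))]
prompt = f"Answer using this documentation:\n{best}\n\nQuestion: {query}"
print(prompt)
```

The fine-tuned model then generates its answer from the grounded prompt, so the conversational tone comes from fine-tuning while the facts come from retrieval.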
Fine-tuning requires proactive planning and resource optimization to bring AI products and services to market efficiently. If you use in-house compute infrastructure to fine-tune models, plan resources to ensure GPU availability aligns with your release schedule. Additionally, consider that your GPU hardware may need regular upgrades to help you maintain a competitive edge. Otherwise, if you partner with cloud providers to access GPU instances, work closely with these companies to plan fine-tuning sessions well in advance based on provider availability.
Researchers are making strides in streamlining the resource-intensive processes of training and fine-tuning LLMs. Techniques like quantization or model pruning allow developers to reduce a model’s size and, consequently, the compute power needed for training and inference.
With parameter-efficient fine-tuning (PEFT), developers can minimize the number of parameters adjusted during fine-tuning, lowering computational demand without compromising performance. While these optimization techniques save the time, resources, and costs associated with fine-tuning, they're also useful for speeding up inference or downsizing models for resource-constrained hardware like mobile devices.
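As one illustration of how quantization and PEFT combine, the sketch below loads a base model in 4-bit precision, the first step of a QLoRA-style recipe in which a LoRA adapter is then trained on top; the model name is a placeholder, and the snippet assumes the transformers and bitsandbytes libraries on a CUDA GPU:

```python
# A minimal sketch of loading a base model in 4-bit precision before
# attaching a PEFT adapter, using transformers with bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
# Weight memory drops roughly 4x versus fp16, so fine-tuning a small LoRA
# adapter on top can fit on a single consumer-grade GPU.
```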
For some enterprise tasks, such as basic language translation or creative inspiration for marketing writing, foundational LLMs can be sufficient. But for more niche or domain-intensive tasks, enterprises typically need to invest in fine-tuning to achieve optimal performance. While fine-tuning is costly and resource-intensive, when done effectively, the technique can greatly enhance the performance of enterprise applications and deliver a significant long-term ROI.
As AI advancements accelerate, fine-tuning is evolving. More efficient methodologies are being developed and some enterprises are using a hybrid approach that combines fine-tuning with techniques like RAG. Organizations must continuously audit model reliability and adopt cutting-edge strategies to build AI solutions that are both trustworthy and competitive.
Not sure if fine-tuning is a good fit for your use case? Learn more about retrieval-augmented generation (RAG).