BLACKBEAR
_BLOG
Open Source LLMs - History & 2023 Report
12/09/2023 - Stijn Bruggeman
The landscape of Open Large Language Models (LLMs)
A report by McKinsey & Company suggests that generative AI has the potential to contribute between $2.6 trillion and $4.4 trillion annually to the global economy. This amount is higher than the gross domestic product (GDP) of most countries. The field of AI has been advancing rapidly for several years, and the point of consumer adoption seems to have been reached in late November 2022, when OpenAI made its ChatGPT model available to the general public.
OpenAI has established a commanding lead in natural language processing with the tremendous success of its GPT series. However, alternatives exist, and open-source LLMs provide some unique advantages. Open-source LLMs are becoming faster, cheaper, and more widely distributed with each passing day. These open LLMs may or may not allow commercial use, depending on their license; we have added a cheat sheet for these licenses at the end.
On May 4th, 2023, a leaked internal document from Google titled ‘We Have No Moat and Neither Does OpenAI’ highlighted the rapid pace of LLM development since the beginning of 2023 and made a case for why open source is in direct competition with Google's research, specifically Google DeepMind.
History of Large Language Models
One can read more about these projects at the following links:
Eliza: https://en.wikipedia.org/wiki/ELIZA
LSTM: https://en.wikipedia.org/wiki/Long_short-term_memory
Stanford core NLP: https://stanfordnlp.github.io/CoreNLP/
Google Brain: https://research.google/teams/brain/
Transformers: https://arxiv.org/abs/1706.03762
GPT3: https://arxiv.org/abs/2005.14165
ChatGPT: https://openai.com/blog/chatgpt
LLaMA: https://ai.meta.com/blog/large-language-model-llama-meta-ai/
Most Prominent Open Source LLMs
Since the launch of ChatGPT, there have been rapid advancements in the field of Open Source LLMs. We have curated the prominent ones below:
LLaMA(Alpaca, Vicuna, Koala) — A foundational model by Meta
*Image Source: https://github.com/RUCAIBox/LLMSurvey*
Meta released its foundational model, Large Language Model Meta AI (LLaMA), on February 24th, 2023. The code was publicly accessible, but the weights were not: one had to request them separately from the Meta team, and they were provided only for academic research, not for commercial use. Then, on March 2nd, 2023, less than a week after LLaMA’s release, a file containing the LLaMA weights was leaked to GitHub and Hugging Face, two popular platforms for software and AI, and people began experimenting with the model in all sorts of ways.
Stanford fine-tuned the LLaMA model, calling the result Alpaca, and released it on March 13th, reporting strong instruction-following results at a cost of under $600. The team was technically bound by Meta’s license, but follow-up work using low-rank adaptation (LoRA) let anyone replicate the result on consumer-grade hardware in a short time.
Another team, at LMSYS, released its own fine-tuned LLaMA model called Vicuna. It was trained on user-shared conversations collected from ShareGPT. Its authors reported response quality comparable to other top-notch models like Bard and ChatGPT. However, since it was built on LLaMA weights, it was restricted by the same license and could only be used for non-commercial purposes.
T5(FLAN-T5) — An open-source model by Google
**T5**, or Text-to-Text Transfer Transformer, is a Transformer-based architecture from Google AI that uses a text-to-text approach. Every task, including translation, question answering, and classification, is cast as feeding the model text as input and training it to generate some target text.
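To make the text-to-text idea concrete, here is a minimal sketch of how different tasks can be turned into plain (input text, target text) pairs. The task prefixes follow the style used in the T5 paper; the class and helper names are hypothetical and exist only for this illustration, and no model is loaded or trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    source: str  # text fed to the model
    target: str  # text the model is trained to generate


# Hypothetical helpers: each task becomes a prefixed input string
# plus a target string, so one model can handle all of them.
def translation_example(english: str, german: str) -> TextToTextExample:
    return TextToTextExample(f"translate English to German: {english}", german)


def classification_example(sentence: str, acceptable: bool) -> TextToTextExample:
    # Even a class label is expressed as text.
    label = "acceptable" if acceptable else "unacceptable"
    return TextToTextExample(f"cola sentence: {sentence}", label)


def summarization_example(article: str, summary: str) -> TextToTextExample:
    return TextToTextExample(f"summarize: {article}", summary)


if __name__ == "__main__":
    examples = [
        translation_example("That is good.", "Das ist gut."),
        classification_example("The course is jumping well.", False),
        summarization_example("A long article about open LLMs.", "Open LLMs are advancing."),
    ]
    for ex in examples:
        print(f"{ex.source!r} -> {ex.target!r}")
```

Because every task shares the same string-in, string-out shape, the same model, loss, and decoding procedure can be reused across tasks.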
FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models. It is an enhanced version of T5 that has been fine-tuned on a mixture of tasks, so one can use the FLAN-T5 weights directly without further fine-tuning. The researchers reported that instruction fine-tuning improved the model’s prompting and multi-step reasoning capabilities.
Pythia(Dolly 2.0) — A suite of LLMs by EleutherAI
The open-source Pythia suite, released by researchers at EleutherAI, is an alternative to other decoder-style (GPT-like) models. Pythia comprises 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. The architecture is similar to GPT-3 but includes some improvements, for example, Flash Attention (like LLaMA) and Rotary Positional Embeddings (like PaLM).
Dolly 2.0, created by Databricks, is trained to exhibit ChatGPT-like instruction-following behavior. Previously, the Databricks team released Dolly 1.0, which cost only $30 to fine-tune using the Stanford Alpaca team's dataset, which was under a restrictive license as stated above. Dolly 2.0 resolves this issue by fine-tuning Pythia 12B on a high-quality, human-generated instruction dataset written by Databricks employees. Both the model and the dataset are available for commercial use.
StableLM — Open LLM by Stability.ai
StableLM was released by Stability.ai, which set three goals for the language model: transparency, accessibility, and support for users. The open release is meant to build trust with researchers, who can inspect the model. Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0), while the fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0).
Open Assistant — Crowdsourced Open Source Training Data
Open Assistant, released by LAION, is an open-source chat assistant intended to compete with OpenAI’s ChatGPT. The project brings people together to collaborate on an LLM and to create and promote AI-generated language applications. Through crowdsourcing, it aims to address the competitive advantage that ChatGPT has built in the market and to allow newer AI chatbots to be created on a free-for-all platform.
RedPajama-INCITE — Off-brand LLaMA with a permissive license
RedPajama-INCITE is the first family of models trained on the RedPajama base dataset. The RedPajama-INCITE models aim to replicate the LLaMA recipe but make the model fully open-source under the Apache license. At the initial release, the 3B-parameter model was described as best-in-class, while the 7B-parameter model was still in progress. The 7B-parameter model was made available on June 6, 2023, and was reported to outperform other models of a similar size.
Falcon — The UAE’s Commercially usable open LLM
Most recently, the Falcon 40B and 7B models were released by the Technology Innovation Institute, marking a significant advancement in the realm of open-source LLMs. They are released under the Apache 2.0 license, which permits commercial usage. Both the 40B and 7B variants are available as raw models suitable for fine-tuning and as instruction-tuned models that can be used directly. All of them are available via the Hugging Face Hub.
LLaMA 2 — The next generation of LLaMA
The Llama 2 release introduces a family of pre-trained and fine-tuned LLMs, ranging in scale from 7B to 70B parameters. The pre-trained models come with significant improvements over the Llama 1 models, including being trained on 40% more tokens and having a longer context length of 4k tokens, double that of Llama 1. The model is released under a community license that permits commercial use, subject to some conditions. One can try the model on Hugging Face.
The fine-tuned models (Llama 2-Chat) have been optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF). Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve performance comparable to ChatGPT according to human evaluations. Details are given in the Llama 2 paper.
Tooling for LLMs
GPT4All
Nomic AI released GPT4All, an ecosystem for training and deploying powerful, customized large language models that run locally on consumer-grade CPUs. The goal is simple: to be the best instruction-tuned, assistant-style language model that any person or enterprise can freely use, distribute, and build on.
llama.cpp — plugging LLMs into your project code
llama.cpp enables users to run LLMs based on the LLaMA architecture on personal CPUs, without the need for external infrastructure. This gives users complete control over their LLM usage, making it a good choice for those who prioritize privacy and flexibility.
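Whether a model can run on personal hardware depends largely on whether its weights fit in memory. The following is a rough back-of-the-envelope sketch, not part of llama.cpp itself: it estimates the weights-only footprint from the parameter count and the number of bits stored per parameter. The function name and the chosen bit widths are assumptions for illustration, and real usage is higher because of activations, the context cache, and runtime overhead.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative model sizes taken from the sizes discussed in this post.
    for name, n_params in [("7B", 7_000_000_000), ("13B", 13_000_000_000)]:
        for bits in (16, 8, 4):  # assumed storage precisions
            gib = weight_memory_gib(n_params, bits)
            print(f"{name} at {bits}-bit: about {gib:.1f} GiB for weights alone")
```

Under these assumptions, a 7B model needs roughly 13 GiB at 16 bits per weight but only a little over 3 GiB at 4 bits, which is why reduced-precision weights matter so much for running models on laptops and desktops.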
LLMOps — Deploying and managing LLMs
It is difficult to deploy and manage LLMs in actual use, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.
There are various providers, like MLflow, W&B, and CometML, that offer tools for tracking experiments, packaging code, and deploying models in production. A centralized model registry simplifies the management of model versions and makes it easy to share models with team members, which makes these tools a popular choice for data scientists and machine learning engineers who want to streamline their workflow and improve productivity.
The power of open-source
Initially, much of the development work was based on leaked weights of the LLaMA model, but this served as a foundation for innovation, resulting in the creation of models such as RedPajama-INCITE and Falcon for commercial use. Due to the community-driven nature of open source, development progresses rapidly. For instance, integrating a new model into llama.cpp typically takes approximately two weeks to a month.
Over time, it has become clear that the quality of data has a greater impact on the model’s performance than the size of the data. Open models with 7–13 billion parameters are showing results competitive with much larger models and can even be run on high-end consumer-grade GPUs. These models also take advantage of cutting-edge techniques like low-rank adaptation (LoRA), which can reduce the cost of fine-tuning by up to a thousand times.
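The saving from LoRA comes from training two small low-rank matrices instead of a full weight matrix. As a rough sketch of the arithmetic: for a weight matrix of shape d_out × d_in, full fine-tuning updates d_out · d_in values, while a rank-r adapter trains only r · (d_out + d_in). The layer size and rank below are illustrative assumptions, not figures from any specific model, and the function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for a rank-`rank` adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d = 4096   # assumed hidden size of one square projection matrix
    rank = 8   # assumed LoRA rank
    full = full_finetune_params(d, d)
    lora = lora_params(d, d, rank)
    print(f"full fine-tuning: {full:,} trainable values")  # 16,777,216
    print(f"LoRA (r={rank}):   {lora:,} trainable values")  # 65,536
    print(f"reduction factor: {full // lora}x")             # 256x
```

This counts trainable parameters for a single matrix only. The end-to-end cost reduction also depends on which layers receive adapters, on optimizer state, and on hardware, so the ratio here should be read as an illustration of the mechanism rather than a measured saving.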
Using these models, companies have access to affordable solutions that can be easily deployed on their servers and updated with their own data on a regular basis, all without breaking the bank. The future of LLMs is bright, and the open-source community has a significant role to play in shaping it. As we move forward, it will be fascinating to see how these models continue to evolve and how they will impact the field of artificial intelligence as a whole.
Why should businesses use open-source LLMs?
In recent times, there have been news reports regarding the restriction of ChatGPT usage by several prominent companies such as Apple, Amazon, and Spotify for their employees. Furthermore, some companies, including Samsung, have suffered from the leaking of sensitive information to entities such as OpenAI. Additionally, users leveraging proprietary models have reported exorbitant bills from vendors, while others have experienced hallucinations that could have been resolved through fine-tuning these models. To circumvent such challenges in a cost-effective manner, employing open-source Large Language Models (LLMs) can provide numerous benefits, some of which are listed below.
Cost Effective
Open LLMs can be cost-efficient because they can be self-hosted. At sufficient volume, this can be cheaper than the token-based pricing of OpenAI, Anthropic, or any other provider, but it does come with the overhead of managing your own infrastructure.
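The trade-off between a fixed hosting cost and per-token pricing can be sketched as a simple break-even calculation. All prices below are hypothetical placeholders, not quotes from any provider, and the function names are made up for this illustration; plug in your own numbers.

```python
def api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Monthly bill under token-based pricing."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(fixed_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which self-hosting costs the same as the API."""
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


if __name__ == "__main__":
    price = 2.0        # hypothetical API price per million tokens
    hosting = 1500.0   # hypothetical monthly cost of a self-hosted GPU server

    threshold = break_even_tokens(hosting, price)
    print(f"break-even volume: {threshold:,.0f} tokens per month")

    for volume in (100_000_000, 1_000_000_000):
        bill = api_cost(volume, price)
        cheaper = "self-hosting" if bill > hosting else "API"
        print(f"{volume:,} tokens: API bill {bill:,.2f} vs hosting {hosting:,.2f} -> {cheaper}")
```

Below the break-even volume the pay-per-token option is cheaper; above it, the fixed cost of self-hosting wins. The sketch ignores engineering time and maintenance, which belong on the self-hosting side of the ledger.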
Transparency
Open LLMs provide transparency by making their model weights and source code publicly available for anyone to access and examine. In many cases the training dataset is also made public. This enables researchers and developers to verify and reproduce results, identify and correct errors, and promote accountability and trust.
Fine Tuning
Most proprietary models currently do not allow fine-tuning, while it is much easier to fine-tune open-source LLMs. This allows users to adapt the model to their specific use case, resulting in better accuracy and efficiency.
Data Security
Open LLMs provide the option for enterprises to deploy the models on their own infrastructure, whether it’s on-premises or in a private cloud environment. This allows organizations to have full control over their data, ensuring that sensitive information remains within their network.
In Summary
The evolution of generative artificial intelligence owes much to the pioneering work of industry leaders such as OpenAI, Google, and Meta, along with other closed-source companies. In today’s rapidly evolving landscape, businesses are increasingly realizing the benefits of adopting smaller, domain-specific models that offer greater control, privacy, and cost efficiency. These custom models allow organizations to tailor their AI solutions to their specific needs while avoiding the drawbacks of larger, general-purpose models such as GPT-3.5. As a result, domain-specific models are becoming the preferred choice for many businesses seeking to optimize their AI capabilities.
Open Source License Cheatsheet
An open-source license is a legal agreement that governs how people can use, modify, and share open-source software. It’s important to comply with the terms of the license when using open-source software. The sections below cover some common licenses. More information on open-source licenses can be found on opensource.org.
* Indicates fine-tunes of LLaMA. In addition to the listed license, these models must comply with the license of LLaMA, which makes them usable only for research purposes.
MIT License — Permissive, minimal restrictions
The MIT license is a permissive license that allows users to use, modify, and distribute the software, both commercially and non-commercially. It is known for its simplicity and flexibility, making it popular among developers.
Models: GPT4All, Guanaco*
Apache 2.0 — Permissive, patent protection, attribution
Apache 2.0 is a permissive open-source license that allows free use, modification, and distribution of software, provided that copyright and license notices are preserved. It also includes an explicit patent grant from contributors, which offers more comprehensive legal protection and makes it suitable for larger projects.
Models: LLaMA*, Alpaca*, Vicuna*, Falcon, Pythia, FLAN
GPL-3.0 license — Copyleft, promotes software freedom
This license ensures that software remains free and open-source. It allows users to use, modify, and distribute the software, but any derivative work must also be licensed under GPL-3.0.
Models: Baize Chatbot*, SAIL 7B*
CC BY-SA-4.0 — Share and adapt with attribution
This license allows users to use, share, and adapt the work as long as they give credit to the original creator and share any derivative work under the same license.
Model: StableLM
BLACKBEAR
_BLOG
Open Source LLMs - History & 2023 Report
12/09/2023 - Stijn Bruggeman
The landscape of Open Large Language Models(LLMs)
A report by McKinsey & Company suggests that generative AI has the potential to contribute between $2.6 trillion to $4.4 trillion annually to the economy. This amount is higher than the gross domestic product (GDP) of most countries. The field of AI has been advancing rapidly for several years, and the point of consumer adoption seems to have been reached in early to mid-December 2022 when OpenAI made its ChatGPT model available to the general public.
OpenAI has established a commanding lead in the field of natural language processing with the tremendous success of its GPT series. However, other alternatives exist and open-source LLMs provide some unique advantages. Open-source LLMs are becoming faster, cheaper, and more widely distributed with each passing day. These open LLMs may or may not allow commercial use depending on their license, we have added a cheat sheet to these licenses at the end.
On May 4th, a leaked internal document from Google titled ‘We Have No Moat and Neither Does OpenAI’ highlighted the rapid pace of LLM development since the beginning of 2023 and made a case for why open-source is in direct competition with Google research, specifically Google Deepmind.
History of Large Language Models
One can read more about these projects at the following link:
Eliza: https://en.wikipedia.org/wiki/ELIZA
LSTM: https://en.wikipedia.org/wiki/Long_short-term_memory
Stanford core NLP: https://stanfordnlp.github.io/CoreNLP/
Google Brain: https://research.google/teams/brain/
Transformers: https://arxiv.org/abs/1706.03762
GPT3: https://arxiv.org/abs/2005.14165
ChatGPT: https://openai.com/blog/chatgpt
LLaMA: https://ai.meta.com/blog/large-language-model-llama-meta-ai/
Most Prominent Open Source LLMs
Since the launch of ChatGPT, there have been rapid advancements in the field of Open Source LLMs. We have curated the prominent ones below:
LLaMA(Alpaca, Vicuna, Koala) — A foundational model by Meta
*Image Source: https://github.com/RUCAIBox/LLMSurvey*
Meta released its foundational model Large Language Model Meta AI (LLaMA) on February 24th, 2023. The model was publicly accessible but its weights were not. One had to request the weights separately from the Meta team, and they were only provided for academic research purposes, not for commercial use. Then, less than a week after LLaMA’s release on March 2nd, 2023, a file containing the LLaMA weights was leaked to GitHub and HuggingFace, two popular platforms for software and AI, and people began experimenting with the model in all sorts of ways.
Stanford fine-tuned the LLaMA model, calling it Alpaca, and released it on March 13th, achieving state-of-the-art results in instruction following at a cost of only $600. They were technically bound by Meta’s copyright, but the new weights with the use of low-rank fine-tuning(LoRA) allowed freedom to anyone who wanted to replicate this on consumer-grade hardware in a short period.
Another team of experts at Lmsys released their own fine-tuned LLaMA model called Vicuna. It was based on ShareGPT, a dataset of user-shared conversations. It showed comparable performance to other top-notch models like Bard and ChatGPT in terms of response quality. However, since it was built on LLaMA weights, it was restricted by the same license and could only be used for non-commercial purposes.
T5(FLAN-T5) — An open-source model by Google
**T5**, or Text-to-Text Transfer Transformer, is a Transformer based architecture launched by Google AI that uses a text-to-text approach. Every task — including translation, question answering, and classification — is cast as feeding the model text as input and training it to generate some target text.
FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models — it is an enhanced version of T5 that has been finetuned in a mixture of tasks. One can directly use FLAN-T5 weights without finetuning the model. Researchers claimed that the Flan-T5 model’s advanced prompting and multi-step reasoning capabilities could lead to significant improvements.
Pythia(Dolly 2.0) — A suite of LLMs by EleutherAI
The open-source Pythia suite of LLM models released by researchers at EleutherAI is an alternative to other decoder-style (aka GPT-like) models. Pythia is a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. The architecture is similar to GPT-3 but includes some improvements, for example, Flash Attention (like LLaMA) and Rotary Positional Embeddings (like PaLM).
Dolly 2.0, created by Databricks is trained to exhibit ChatGPT-like human interactivity(aka instruction-following). It was fine-tuned on a human-generated dataset. Previously, the Databricks team released Dolly 1.0, which costed only $30 to fine-tune using the Stanford Alpaca team dataset, which was under a restricted license as stated above. Dolly 2.0 has resolved this issue by fine-tuning the Pythia 12B on high-quality human-generated instruction in the following dataset, which was labeled by Datbricks employees. Both model and dataset are available for commercial use.
StableLM — Open LLM by Stability.ai
StableLM was released by Stablity.ai, which has three goals for the language model: transparency, accessibility, and support to users. It could lead to trust and transparency for the tool with researchers. Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0) while the fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0).
Open Assistant — Crowdsourced Open Source Training Data
Open Assistant, released by LAION is an open-source ChatGPT model which will compete with OpenAI’s ChatGPT. Open Assistant is meant to bring people to collaborate on LLM to create and promote AI-generated language applications. Through crowdsourcing, the platform wishes to address the competitive advantage that ChatGPT has created in the market and allow the creation of newer AI chatbots in a free-for-all platform.
RedPajama-INCITE — Off-brand LLaMA with a permissive license
RedPajama-INCITE is the first family of models trained on the RedPajama base dataset. The RedPajama-INCITE models aim to replicate the LLaMA recipe but make the model fully open-source under the Apache license. As of the initial release, the 3B parameter model is best-in-class, with the 7B parameter model in progress. The 7B parameter model was made available on June 6, 2023, outperforming other models of a similar size.
Falcon — The UAE’s Commercially usable open LLM
Most recently, Falcon 40B and 6B models have just been released by Technology Innovation Institute, marking a significant advancement in the realm of open-source LLMs. They are released under Apache 2.0 license, which permits commercial usage. Both 40B and 6B variants are available as raw models suitable for fine-tuning, and as already instruction-tuned models that can be used directly. All of them are made available via Huggingface Hub.
LLaMA 2 — The next generation of LLaMA
Llama 2 release introduces a family of trained and fine-tuned LLMs, ranging in scale from 7B to 70B parameters. The pre-trained models come with significant improvements over the Llama 1 models, including being trained on 40% more tokens and having a much longer context length of 4k tokens. The model is released with a very permissive community license and is available for commercial use. One can try the model on Huggingface.
The fine-tuned models (Llama 2-Chat), have been optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF). Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve comparable performance to ChatGPT according to human evaluations. The paper can be found here.
Tooling for LLMs
GPT4All
Nomic AI released GPT4All. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade CPUs. The goal is simple — be the best instruction-tuned assistant-style language model that any person or enterprise can freely use, distribute and build on.
llama.cpp — plugging LLMs into your project code
llama.cpp is a valuable tool that enables users to run LLM models based on the LLaMA architecture on personal CPUs, without the need for external infrastructure. This allows users to have complete control over their LLM usage, making it a great choice for those who prioritize privacy and flexibility. By providing the freedom to deploy models independently, llama.cpp offers a reliable solution for running LLaMA-based models on personal hardware.
LLMOps — Deploying and managing LLMs
It is difficult to deploy and manage LLMs in actual use, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.
There are various providers like MLflow, W&B, and CometML that provide a set of tools for tracking experiments, packaging code, and deploying models in production. The centralized model registry simplifies the management of model versions and allows for easy sharing and collaborative access with the team members making it a popular choice for data scientists and Machine Learning engineers to streamline their workflow and improve productivity.
The power of open-source
Initially, much of the development work was based on leaked weights of the LLaMA model, but this served as a foundation for innovation, resulting in the creation of models such as Redpajama-INCITE and Falcon for commercial use. Due to the community-driven nature of Open Source, development progresses rapidly. For instance, integrating a new model into llama.cpp typically takes approximately two weeks to a month.
Over time, it has become clear that the quality of data has a greater impact on the model’s performance than the size of the data. Open models with 7–13 billion parameters are showing better results than larger models and can even be run on high-end consumer-grade GPUs. These models also take advantage of cutting-edge techniques like low-rank adaptation (LoRA), which can reduce the cost of training by up to a thousand times.
Using these models, companies have access to affordable solutions that can be easily deployed on their servers and updated with their own data on a regular basis, all without breaking the bank. The future of LLMs is bright, and the open-source community has a significant role to play in shaping it. As we move forward, it will be fascinating to see how these models continue to evolve and how they will impact the field of artificial intelligence as a whole.
Why should businesses use open-source LLMs?
In recent times, there have been news reports regarding the restriction of ChatGPT usage by several prominent companies such as Apple, Amazon, and Spotify for their employees. Furthermore, some companies, including Samsung, have suffered from the leaking of sensitive information to entities such as OpenAI. Additionally, users leveraging proprietary models have reported exorbitant bills from vendors, while others have experienced hallucinations that could have been resolved through fine-tuning these models. To circumvent such challenges in a cost-effective manner, employing open-source Large Language Models (LLMs) can provide numerous benefits, some of which are listed below.
Cost Effective
Open LLMs are cost-efficient because they can be self-hosted. This is cheaper than the token-based pricing of OpenAI, Anthropic, or any other provider but does come with the overhead of managing your own infrastructure.
Transparency
Open LLMs provide transparency by making their source code publicly available for anyone to access and examine. Many times the dataset is also made public. This enables researchers and developers to verify and reproduce results, identify and correct errors, and promote accountability and trust.
Fine Tuning
Most of the proprietary models currently do now allow fine-tuning, while it is much easier to fine-tune the open-source LLMs. This allows users to adapt the model to their specific use case, resulting in better accuracy and efficiency.
Data Security
Open LLMs provide the option for enterprises to deploy the models on their own infrastructure, whether it’s on-premises or in a private cloud environment. This allows organizations to have full control over their data, ensuring that sensitive information remains within their network.
In Summary
The evolution of generative artificial intelligence owes much to the pioneering work of industry leaders such as OpenAI, Google, and Meta, along with other closed-source companies. In today’s rapidly-evolving landscape, businesses are increasingly realizing the benefits of adopting smaller, domain-specific models that offer greater control, privacy, and cost efficiency. These custom models allow organizations to tailor their AI solutions to their specific needs while avoiding the drawbacks of larger, general-purpose models such as GPT-3.5. As a result, domain-specific models are becoming the preferred choice for many businesses seeking to optimize their AI capabilities.
Open Source License Cheatsheet
An open-source license is a legal agreement that governs how people can use, modify, and share open-source software. It’s important to comply with the terms of the license when using open-source software. The table below contains some common licenses. More information on open-source licenses can be found on opensource.org.
Indicates fine tunes of LLaMA, in addition to the mentioned license it must comply with the license of LLaMA, which makes them usable only for research purposes
MIT License — Permissive, minimal restrictions
The MIT license is a permissive license that allows users to use, modify, and distribute the software, both commercially and non-commercially. It is known for its simplicity and flexibility, making it popular among developers.
Models: GPT4All, Guanaco*
Apache 2.0 — Permissive, patent protection, attribution
It is a permissive open-source license that allows free use, modification, and distribution of software while retaining copyright and patent rights. It offers more comprehensive legal protections and is suitable for larger projects.
Models: LLaMA*, Alpaca*, Vicuna*, Falcon, Pythia, FLAN
GPL-3.0 license — Copyleft, promotes software freedom
This license ensures that software remains free and open-source. It allows users to use, modify, and distribute the software, but any derivative work must also be licensed under GPL-3.0.
Models: Baize Chatbot*,SAIL 7B*
CC BY-SA-4.0 — Share and adapt with attribution
CC BY-SA-4.0: This license allows users to use, share, and adapt the work as long as they give credit to the original creator and share any derivative work under the same license.
Model: StableLM
Ready to launch your AI project?
BLACKBEAR _BLOG
Open Source LLMs - History & 2023 Report
12/09/2023 - Stijn Bruggeman
LLaMA(Alpaca, Vicuna, Koala) — A foundational model by Meta
*Image Source: https://github.com/RUCAIBox/LLMSurvey*
Meta released its foundational model, Large Language Model Meta AI (LLaMA), on February 24th, 2023. The model code was publicly available but the weights were not: they had to be requested from the Meta team and were provided only for academic research, not for commercial use. Then, on March 2nd, 2023, less than a week after the release, a file containing the LLaMA weights was leaked and spread through GitHub and Hugging Face, two popular platforms for software and AI, and people began experimenting with the model in all sorts of ways.
Stanford fine-tuned the LLaMA model, called the result Alpaca, and released it on March 13th, 2023, reporting instruction-following quality comparable to OpenAI's text-davinci-003 at a cost of under $600. Alpaca was still bound by Meta's license, but follow-up work using low-rank adaptation (LoRA) made it possible for anyone to replicate the fine-tuning on consumer-grade hardware in a short time.
Another team, at LMSYS, released its own fine-tuned LLaMA model called Vicuna. It was fine-tuned on user-shared conversations collected from ShareGPT. In the team's own evaluation, its response quality was comparable to that of top models such as Bard and ChatGPT. However, since it was built on the LLaMA weights, it was restricted by the same license and could only be used for non-commercial purposes.
T5(FLAN-T5) — An open-source model by Google
**T5**, or Text-to-Text Transfer Transformer, is a Transformer-based architecture from Google AI that uses a text-to-text approach. Every task, including translation, question answering, and classification, is cast as feeding the model text as input and training it to generate some target text.
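To make the text-to-text framing concrete, here is a minimal sketch in plain Python. It does not load any model. The helper `to_model_input` and the example records are hypothetical, and the prefixes only follow the style of the examples shown in the T5 paper. The point is that every task ends up as the same kind of (input string, target string) pair.

```python
# Illustrative only: the helper and the example records are hypothetical.
# Prefix wording follows the style of examples in the T5 paper.

def to_model_input(task: str, **fields: str) -> str:
    """Render one task instance as the single input string a text-to-text model reads."""
    if task == "translation":
        return (f"translate {fields['source']} to {fields['target']}: "
                f"{fields['text']}")
    if task == "question_answering":
        return f"question: {fields['question']} context: {fields['context']}"
    if task == "classification":
        # The class label is produced as text too, e.g. "not acceptable".
        return f"cola sentence: {fields['text']}"
    raise ValueError(f"unsupported task: {task}")


# (input text, target text) pairs: every task shares the same string-to-string shape.
training_pairs = [
    (to_model_input("translation", source="English", target="German",
                    text="That is good."), "Das ist gut."),
    (to_model_input("question_answering",
                    question="Who released T5?",
                    context="T5 was released by Google AI."), "Google AI"),
    (to_model_input("classification",
                    text="The course is jumping well."), "not acceptable"),
]

for source_text, target_text in training_pairs:
    print(f"{source_text!r} -> {target_text!r}")
```

Because inputs and outputs are both plain strings, one model, one loss, and one decoding procedure can serve all three tasks.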
FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models. It is an enhanced version of T5 that has been fine-tuned on a mixture of tasks, so the FLAN-T5 weights can be used directly without further fine-tuning. The researchers reported that instruction fine-tuning significantly improves the model's prompting and multi-step reasoning capabilities.
Pythia(Dolly 2.0) — A suite of LLMs by EleutherAI
The open-source Pythia suite, released by researchers at EleutherAI, is an alternative to other decoder-style (GPT-like) models. Pythia consists of 16 LLMs, all trained on public data seen in exactly the same order and ranging in size from 70M to 12B parameters. The architecture is similar to GPT-3 but includes some improvements, for example, Flash Attention (like LLaMA) and Rotary Positional Embeddings (like PaLM).
Dolly 2.0, created by Databricks, is trained to exhibit ChatGPT-like interactivity (instruction following). Previously, the Databricks team had released Dolly 1.0, which cost only $30 to fine-tune but relied on the Stanford Alpaca team's dataset, which is under a restricted license as stated above. Dolly 2.0 resolves this issue by fine-tuning Pythia 12B on databricks-dolly-15k, a high-quality instruction dataset written by Databricks employees. Both the model and the dataset are available for commercial use.
StableLM — Open LLM by Stability.ai
StableLM was released by Stability AI, which has three goals for the language model: transparency, accessibility, and support for users. This openness is meant to build trust in the model among researchers. Base model checkpoints (StableLM-Base-Alpha) are licensed under the Creative Commons license (CC BY-SA-4.0), while the fine-tuned checkpoints (StableLM-Tuned-Alpha) are licensed under the Non-Commercial Creative Commons license (CC BY-NC-SA-4.0).
Open Assistant — Crowdsourced Open Source Training Data
Open Assistant, released by LAION, is an open-source chat assistant intended to compete with OpenAI's ChatGPT. The project brings people together to collaborate on LLMs and to create and promote AI language applications. Through crowdsourcing, it aims to counter the competitive advantage that ChatGPT has built in the market and to let anyone create new AI chatbots on an open platform.
RedPajama-INCITE — Off-brand LLaMA with a permissive license
RedPajama-INCITE is the first family of models trained on the RedPajama base dataset. The models aim to replicate the LLaMA recipe while being fully open source under the Apache 2.0 license. At the initial release, the 3B parameter model was described as best-in-class for its size, with the 7B parameter model still in progress. The 7B model was made available on June 6, 2023, and was reported to outperform other models of a similar size.
Falcon — The UAE’s Commercially usable open LLM
The Falcon 40B and 7B models were released by the Technology Innovation Institute, marking a significant advancement in the realm of open-source LLMs. They are released under the Apache 2.0 license, which permits commercial usage. Both the 40B and 7B variants are available as raw models suitable for fine-tuning and as instruction-tuned models that can be used directly. All of them are available via the Hugging Face Hub.
LLaMA 2 — The next generation of LLaMA
The Llama 2 release introduces a family of pretrained and fine-tuned LLMs, ranging in scale from 7B to 70B parameters. The pretrained models come with significant improvements over the Llama 1 models, including training on 40% more tokens and a longer context length of 4k tokens, double that of Llama 1. The models are released under a community license that permits commercial use, subject to some conditions. One can try the model on Hugging Face.
The fine-tuned models (Llama 2-Chat) have been optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF). Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve performance comparable to ChatGPT according to human evaluations. Details are given in the accompanying Llama 2 paper.
Tooling for LLMs
GPT4All
Nomic AI released GPT4All, an ecosystem for training and deploying powerful, customized large language models that run locally on consumer-grade CPUs. The goal is simple: to be the best instruction-tuned, assistant-style language model that any person or enterprise can freely use, distribute, and build on.
llama.cpp — plugging LLMs into your project code
llama.cpp enables users to run models based on the LLaMA architecture on personal CPUs, without the need for external infrastructure. This gives users complete control over their LLM usage, making it a good choice for those who prioritize privacy and flexibility.
LLMOps — Deploying and managing LLMs
It is difficult to deploy and manage LLMs in actual use, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.
Providers such as MLflow, W&B, and Comet ML offer tools for tracking experiments, packaging code, and deploying models to production. A centralized model registry simplifies the management of model versions and makes it easy to share models and collaborate with team members. This makes these tools popular with data scientists and machine learning engineers who want to streamline their workflow and improve productivity.
The power of open-source
Initially, much of the development work was based on the leaked weights of the LLaMA model, but this served as a foundation for innovation and led to commercially usable models such as RedPajama-INCITE and Falcon. Due to the community-driven nature of open source, development progresses rapidly. For instance, integrating a new model into llama.cpp typically takes about two weeks to a month.
Over time, it has become clear that the quality of the training data has a greater impact on a model's performance than its sheer quantity. Open models with 7–13 billion parameters can rival considerably larger models on some tasks and can even be run on high-end consumer-grade GPUs. These models also take advantage of cutting-edge techniques like low-rank adaptation (LoRA), which can reduce the cost of fine-tuning by up to a thousand times.
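The saving from LoRA comes from training two small low-rank matrices instead of a full weight matrix. The following back-of-the-envelope sketch compares the number of trainable parameters for a single layer. The hidden size of 4096 and the rank of 8 are assumed values chosen for illustration, not figures from any specific model card, and the count covers one weight matrix only, not total training cost.

```python
# Rough arithmetic only; hidden_size and rank are assumed illustrative values.

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole (d_out x d_in) weight matrix W is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when W stays frozen and only the update B @ A is learned.

    B has shape (d_out x rank) and A has shape (rank x d_in).
    """
    return rank * (d_out + d_in)


hidden_size = 4096  # assumed width of one square projection matrix
rank = 8            # assumed LoRA rank

full = full_finetune_params(hidden_size, hidden_size)
lora = lora_params(hidden_size, hidden_size, rank)
reduction = full / lora

print(f"full fine-tuning: {full:,} trainable parameters per matrix")
print(f"LoRA (rank {rank}):  {lora:,} trainable parameters per matrix")
print(f"reduction factor: {reduction:.0f}x")
```

Under these assumptions the trainable parameter count per matrix drops from 16,777,216 to 65,536, a factor of 256. Smaller ranks, or adapting only some of the matrices, push the factor higher.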
Using these models, companies have access to affordable solutions that can be easily deployed on their servers and updated with their own data on a regular basis, all without breaking the bank. The future of LLMs is bright, and the open-source community has a significant role to play in shaping it. As we move forward, it will be fascinating to see how these models continue to evolve and how they will impact the field of artificial intelligence as a whole.
Why should businesses use open-source LLMs?
In recent months, several prominent companies, including Apple, Amazon, and Spotify, have reportedly restricted their employees' use of ChatGPT. Some companies, including Samsung, have had sensitive information leak to external providers such as OpenAI. Users of proprietary models have also reported exorbitant bills from vendors, while others have run into hallucinations that fine-tuning might have reduced. Open-source Large Language Models (LLMs) offer a cost-effective way around these challenges and provide numerous benefits, some of which are listed below.
Cost Effective
Open LLMs can be cost-efficient because they can be self-hosted. At sufficient volume this can be cheaper than the token-based pricing of OpenAI, Anthropic, or any other provider, but it does come with the overhead of managing your own infrastructure.
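Whether self-hosting actually wins depends on volume. The sketch below estimates the break-even point between a flat monthly hosting bill and per-token API pricing. Both prices are hypothetical placeholders, not quotes from any provider, and the model ignores engineering time and other operating overhead.

```python
# Hypothetical prices for illustration only; substitute your own numbers.

def api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Monthly bill under token-based pricing."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(self_hosted_monthly_cost: float,
                      price_per_million_tokens: float) -> float:
    """Monthly token volume above which a flat self-hosting bill is cheaper."""
    return self_hosted_monthly_cost / price_per_million_tokens * 1_000_000


PRICE_PER_MILLION = 2.0        # assumed API price in dollars per million tokens
SELF_HOSTED_MONTHLY = 1500.0   # assumed monthly cost of a GPU server in dollars

threshold = break_even_tokens(SELF_HOSTED_MONTHLY, PRICE_PER_MILLION)
print(f"break-even volume: {threshold:,.0f} tokens per month")

for volume in (100_000_000, 1_000_000_000):
    api = api_cost(volume, PRICE_PER_MILLION)
    cheaper = "self-hosting" if api > SELF_HOSTED_MONTHLY else "API"
    print(f"{volume:>13,} tokens: API ${api:,.2f} vs self-hosted "
          f"${SELF_HOSTED_MONTHLY:,.2f} -> {cheaper} is cheaper")
```

With these placeholder numbers the break-even sits at 750 million tokens per month: below that the API is cheaper, above it the self-hosted server is.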
Transparency
Open LLMs provide transparency by making their source code publicly available for anyone to access and examine. Often the training dataset is made public as well. This enables researchers and developers to verify and reproduce results, identify and correct errors, and promote accountability and trust.
Fine Tuning
Most proprietary models currently do not allow fine-tuning, while open-source LLMs are much easier to fine-tune. This allows users to adapt the model to their specific use case, resulting in better accuracy and efficiency.
Data Security
Open LLMs provide the option for enterprises to deploy the models on their own infrastructure, whether it’s on-premises or in a private cloud environment. This allows organizations to have full control over their data, ensuring that sensitive information remains within their network.
In Summary
The evolution of generative artificial intelligence owes much to the pioneering work of industry leaders such as OpenAI, Google, and Meta, along with other companies building closed-source models. In today's rapidly evolving landscape, businesses are increasingly realizing the benefits of adopting smaller, domain-specific models that offer greater control, privacy, and cost efficiency. These custom models allow organizations to tailor their AI solutions to their specific needs while avoiding the drawbacks of larger, general-purpose models such as GPT-3.5. As a result, domain-specific models are becoming the preferred choice for many businesses seeking to optimize their AI capabilities.
Open Source License Cheatsheet
An open-source license is a legal agreement that governs how people can use, modify, and share open-source software. It’s important to comply with the terms of the license when using open-source software. The table below contains some common licenses. More information on open-source licenses can be found on opensource.org.
* Indicates a fine-tune of LLaMA: in addition to the listed license, the model must comply with the LLaMA license, which makes it usable only for research purposes.
MIT License — Permissive, minimal restrictions
The MIT license is a permissive license that allows users to use, modify, and distribute the software, both commercially and non-commercially. It is known for its simplicity and flexibility, making it popular among developers.
Models: GPT4All, Guanaco*
Apache 2.0 — Permissive, patent protection, attribution
Apache 2.0 is a permissive open-source license that allows free use, modification, and distribution of software. It requires that copyright and attribution notices be preserved and includes an explicit patent grant from contributors, so it offers more comprehensive legal protections and is suitable for larger projects.
Models: LLaMA*, Alpaca*, Vicuna*, Falcon, Pythia, FLAN
GPL-3.0 license — Copyleft, promotes software freedom
This license ensures that software remains free and open-source. It allows users to use, modify, and distribute the software, but any derivative work must also be licensed under GPL-3.0.
Models: Baize Chatbot*, SAIL 7B*
CC BY-SA-4.0 — Share and adapt with attribution
CC BY-SA-4.0: This license allows users to use, share, and adapt the work as long as they give credit to the original creator and share any derivative work under the same license.
Model: StableLM