Labelbox • September 6, 2023
Large language models (LLMs) have revolutionized the field of AI and natural language processing (NLP). LLMs are trained on massive datasets of text, containing millions or even billions of data points, and have become increasingly important in various applications. These models have shattered previous expectations of what is possible in natural language understanding and generation. Unlock the full potential of large language models with Labelbox’s end-to-end platform and a new suite of LLM tools to optimize LLMs for your most valuable AI use cases.
As a follow-up to 6 cutting edge foundation models for computer vision, this blog post will dive into six popular LLMs and explore their intended use cases, limitations, and possible real-world use cases. With Labelbox’s upcoming Model Foundry, you’ll be able to explore, experiment, and leverage the LLMs listed below to pre-label data and accelerate your downstream ML workflow. You can sign up to join the Model Foundry waitlist here.
GPT-4 is a large multimodal model that accepts both image and text inputs and produces text outputs, making it more versatile than its text-only predecessors. OpenAI describes it as its most advanced language model to date. GPT-4 is optimized for conversational interfaces and can be used to generate text summaries, reports, and responses.
OpenAI also claims that GPT-4 is less likely to hallucinate and is less biased than GPT-3, and it has been described as “10 times more advanced than its predecessor, GPT-3.5”. GPT-4 has been reported to exhibit human-level performance on various tasks, such as passing a simulated bar exam with a score around the top 10% of test takers.
Like previous GPT models, GPT-4 generally lacks knowledge of events that occurred after the vast majority of its training data was collected. The model is also sensitive to input phrasing: a slight rephrasing of a question or prompt can lead to different answers or interpretations.
Sample use case: Detecting personally identifiable information (PII) is important in many industries to protect sensitive data and prevent identity theft. This is especially true in healthcare, where PII can include medical records, SSNs, and insurance information. Large language models such as GPT-4 can serve as powerful tools for PII detection and extraction. You can learn more about how to leverage GPT-4 and Model Foundry for PII extraction in this blog post.
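As an illustration of how this could be wired up, the sketch below frames PII extraction as a zero-shot chat request to GPT-4 via OpenAI's pre-1.0 Python SDK. The prompt wording and the `build_messages`/`extract_pii` helpers are assumptions for illustration, not a prescribed workflow.

```python
# Hedged sketch: PII extraction with GPT-4 through the OpenAI chat API
# (pre-1.0 openai SDK). Prompt wording is an illustrative assumption.
import os

PII_SYSTEM_PROMPT = (
    "You are a PII detector. List every piece of personally identifiable "
    "information (names, SSNs, medical record numbers, insurance IDs) found "
    "in the user's text, one item per line. Reply 'NONE' if there is none."
)

def build_messages(document: str) -> list:
    """Assemble the chat messages for a single PII-extraction request."""
    return [
        {"role": "system", "content": PII_SYSTEM_PROMPT},
        {"role": "user", "content": document},
    ]

def extract_pii(document: str) -> str:
    import openai  # pip install "openai<1.0"; requires an API key
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=build_messages(document),
        temperature=0,  # low temperature keeps extraction output repeatable
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(extract_pii("Patient John Doe, SSN 000-00-0000, was admitted..."))
```

Setting `temperature=0` is a common choice for extraction tasks, where you want the most likely completion rather than creative variation.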
PaLM 2 is Google’s state-of-the-art language model that has multilingual (support for more than 100 languages), reasoning, and coding capabilities. PaLM 2 excels at tasks such as advanced reasoning, translation, and code generation, making it suitable for a variety of use cases across industries. It is also being used in other models like Med-PaLM 2 and Sec-PaLM in addition to powering over 25 Google products and features.
With strong reasoning capabilities, PaLM 2 excels at understanding idioms, poems, nuanced texts, and riddles. The PaLM 2 chat-bison model is optimized for multi-turn chat: the model keeps track of previous messages in the conversation and uses them as context when generating new responses. It is well suited to natural language tasks such as chatbots and can generate text in a conversational format.
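A minimal sketch of that multi-turn behavior, using the `ChatModel` class from the Vertex AI Python SDK (`google-cloud-aiplatform` package); the project settings, context string, and example questions are illustrative assumptions:

```python
# Hedged sketch: multi-turn chat with PaLM 2's chat-bison model on Vertex AI.
import os

def chat_turns() -> list:
    """Turns for a multi-turn exchange. The second question deliberately
    uses a pronoun ("it"), so a correct answer depends on the chat history
    that chat-bison keeps from the first turn."""
    return ["What is PaLM 2?", "How many languages does it support?"]

def run_chat() -> list:
    import vertexai
    from vertexai.language_models import ChatModel  # pip install google-cloud-aiplatform
    vertexai.init(project=os.environ["GOOGLE_CLOUD_PROJECT"], location="us-central1")
    chat = ChatModel.from_pretrained("chat-bison").start_chat(
        context="You are a helpful assistant. Keep answers brief.",
    )
    # send_message appends each turn to the chat's history, so later
    # responses are generated with the earlier messages as context.
    return [chat.send_message(turn).text for turn in chat_turns()]

if __name__ == "__main__" and os.environ.get("GOOGLE_CLOUD_PROJECT"):
    for reply in run_chat():
        print(reply)
```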
The limitations of PaLM 2 include the potential for toxicity and bias – researchers found that of all the times PaLM 2 responded to prompts incorrectly, 38.2% of the time it “reinforced a harmful social bias”.
Sample use case: In the media and entertainment industry, PaLM 2 can be used to generate technical writing such as user manuals, tutorials, or other documentation. Given its unique ability to generate code in various programming languages, it can be leveraged in various domains requiring software development.
BLOOM is the world's largest open multilingual model with 176 billion parameters and the ability to generate text in 46 natural languages and 13 programming languages.
The model is capable of following human instructions in dozens of languages and is designed to continue text from a given prompt using vast amounts of text data and industrial computational resources. BLOOM can be used for text generation, multilingual tasks, or for language understanding models and applications. For optimal performance, we recommend providing the model with as much context as possible.
Compared to ChatGPT, BLOOM requires greater computational resources, which can lead to high running costs. Like other large language models, BLOOM is trained on real-world datasets and therefore has the potential to produce biased content, factually incorrect statements, hallucinations, or repetitive text.
Sample use case: With BLOOM's impressive multilingual capabilities, it can be used across a variety of real-world use cases such as language translation, question answering, and content creation. For example, BLOOM can be used to build a chatbot to handle customer inquiries across different languages. This can be particularly useful for retailers that have global customers or businesses that operate in multiple countries and need to communicate with people across different languages.
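One way the multilingual-support idea might look in practice, sketched with the Hugging Face `transformers` pipeline. The small `bigscience/bloom-560m` checkpoint stands in for the full 176B-parameter model here, and the prompt template and `RUN_BLOOM_DEMO` flag are illustrative assumptions:

```python
# Hedged sketch: a multilingual support reply generated with BLOOM.
import os

def support_prompt(language: str, question: str) -> str:
    """BLOOM continues text from a prompt, so the reply is framed as the
    continuation of a support transcript in the customer's language."""
    return (
        f"Customer question ({language}): {question}\n"
        f"Helpful support answer ({language}):"
    )

def answer(language: str, question: str) -> str:
    from transformers import pipeline  # pip install transformers torch
    generator = pipeline("text-generation", model="bigscience/bloom-560m")
    out = generator(support_prompt(language, question), max_new_tokens=60)
    return out[0]["generated_text"]

if __name__ == "__main__" and os.environ.get("RUN_BLOOM_DEMO"):
    print(answer("French", "Où est ma commande ?"))
```

Giving the model as much context as possible — here, the language and the transcript framing — follows the guidance above and tends to improve the quality of the continuation.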
Anthropic’s Claude Instant is a low-latency, high-throughput LLM trained to perform various conversational and text processing tasks. It is the less expensive and faster version of the original, larger Claude model. Claude Instant can answer questions, give recommendations, and have casual conversations using natural language.
The model is accessed through Anthropic’s API, making it easy for developers and businesses to adopt. Early Claude users report that Claude is less likely to produce harmful outputs and is easier to steer, so you can get your desired output with less effort.
Similar to other large language models, Claude only has the knowledge and abilities that have been explicitly built into its model training. It does not possess the general capability for open-domain problem solving.
Sample use case: Claude Instant can be used in a variety of industries, especially where conversational and text processing tasks are essential. For example, in the financial services or legal industry, where documents are abundant, you can leverage Claude Instant for summarizing and analyzing documents such as annual reports, research papers, or contracts. By using Claude Instant's text processing capabilities, professionals can enhance their document processing workflows, save valuable time, and improve their decision-making processes.
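A rough sketch of contract summarization with Claude Instant through Anthropic's Python SDK, using its text-completions interface, which brackets prompts with the SDK's `HUMAN_PROMPT`/`AI_PROMPT` markers; the instruction wording and parameter values are assumptions:

```python
# Hedged sketch: summarizing a legal document with Claude Instant.
import os

def summary_instruction(document: str) -> str:
    """Plain-text instruction asking for the facts professionals care about."""
    return (
        "Summarize the key parties, obligations, and dates in the "
        f"following contract:\n\n{document}"
    )

def summarize(document: str) -> str:
    import anthropic  # pip install anthropic; requires an API key
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = client.completions.create(
        model="claude-instant-1",
        max_tokens_to_sample=300,
        prompt=(
            f"{anthropic.HUMAN_PROMPT} {summary_instruction(document)}"
            f"{anthropic.AI_PROMPT}"
        ),
    )
    return response.completion

if __name__ == "__main__" and os.environ.get("ANTHROPIC_API_KEY"):
    print(summarize("This Agreement is entered into on ..."))
```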
GPT-3.5 is a large language model developed by OpenAI that builds upon the GPT-3 architecture. It is the predecessor of GPT-4 and is designed to generate human-like text and respond coherently to prompts for a wide range of natural language processing tasks. GPT-3.5 aims to understand and generate coherent responses that align with user input.
The large language model is designed to engage in interactive conversations, answer questions, provide explanations or suggestions, and facilitate information retrieval. Given its wide range of use cases, it can be employed in systems that require natural language understanding and generation. The model has been trained on a vast amount of internet text, enabling it to draw on a wide range of knowledge and information.
While GPT-3.5 is strong at generating text, it can produce responses that are plausible but factually incorrect or nonsensical. Without access to real-time information, its responses may be outdated or inaccurate when asked about time-sensitive topics. Although GPT-3.5 can maintain short-term context within a conversation, it lacks long-term memory and may give inconsistent or contradictory answers within the same conversation.
Sample use case: As a versatile model, GPT-3.5 can be used across a variety of industries. One example is agriculture: GPT-3.5 can assist in generating insightful agricultural or crop reports. By analyzing data inputs from sensors or other sources, such as weather forecasts, crop or soil conditions, or market trends, it can help generate reports guiding farmers on optimal planting strategies, pest management, and yield predictions.
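To make that concrete, here is a hedged sketch of turning raw field readings into a report request against the `gpt-3.5-turbo` chat endpoint (pre-1.0 OpenAI SDK); the reading names and prompt wording are made up for illustration:

```python
# Hedged sketch: generating a crop report from sensor readings with GPT-3.5.
import os

def report_messages(readings: dict) -> list:
    """Format raw readings into chat messages for a crop-report request."""
    lines = "\n".join(f"- {name}: {value}" for name, value in readings.items())
    return [
        {"role": "system",
         "content": "You are an agronomy assistant that writes concise crop reports."},
        {"role": "user",
         "content": f"Write a short crop report from these field readings:\n{lines}"},
    ]

def generate_report(readings: dict) -> str:
    import openai  # pip install "openai<1.0"; requires an API key
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=report_messages(readings),
    )
    return response["choices"][0]["message"]["content"]

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(generate_report({"soil moisture": "31%", "7-day rainfall": "12 mm"}))
```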
T5 (Text-to-Text Transfer Transformer) is Google AI’s versatile language model. Unlike traditional language models that handle text generation or classification as distinct tasks, T5 has been trained on a combination of tasks and converts every language task into a unified text-to-text format, where all NLP tasks are reframed as input-output text strings. You can use the model for different tasks without any additional fine-tuning; you simply add a prefix to the input text that corresponds to the specific task.
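The prefix mechanism can be sketched in a few lines with the Hugging Face `transformers` pipeline and the small `t5-small` checkpoint. The helper names and `RUN_T5_DEMO` guard are illustrative assumptions, while the task prefixes themselves (`translate English to German`, `summarize`) are the standard ones the public T5 checkpoints were trained with:

```python
# Hedged sketch: selecting T5 tasks purely through input prefixes.
import os

def t5_input(task_prefix: str, text: str) -> str:
    """T5 selects the task through a textual prefix on the input string --
    no task-specific head or additional fine-tuning is needed."""
    return f"{task_prefix}: {text}"

def run_t5(task_prefix: str, text: str) -> str:
    from transformers import pipeline  # pip install transformers torch
    t5 = pipeline("text2text-generation", model="t5-small")
    return t5(t5_input(task_prefix, text))[0]["generated_text"]

if __name__ == "__main__" and os.environ.get("RUN_T5_DEMO"):
    # Standard prefixes from the original T5 training mixture:
    print(run_t5("translate English to German", "The house is wonderful."))
    print(run_t5("summarize", "Long article text ..."))
```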
T5 has shown strong performance across various language tasks. Its approach of treating every task as a text-to-text problem has led to impressive results in text generation, summarization, question answering, and more.
Some limitations of the T5 model include an input limit of 512 tokens and variable performance depending on the specific task. Additionally, T5 might generate plausible-sounding but incorrect answers if not provided with proper context.
Sample use case: Like the large language models above, the T5 model can be used in a variety of contexts. For example, in retail, T5 can be employed for sentiment analysis of customer reviews. It can help businesses translate customer feedback into sentiment labels, allowing retailers to understand overall customer sentiment and satisfaction and quickly identify areas for improvement.
Large language models have changed how AI builders and business users develop more powerful models. You can now leverage cutting-edge LLMs to accelerate the development of common enterprise AI applications at scale across a variety of industries.
Learn more about how you can leverage the foundation models above for AI development quickly and easily with Model Foundry — and be sure to sign up and get access via the waitlist while you’re there.
Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer. (2020, February 24). Blog.research.google. https://blog.research.google/2020/02/exploring-transfer-learning-with-t5.html
Google. (n.d.). PaLM 2 Technical Report. https://ai.google/static/documents/palm2techreport.pdf
Google AI PaLM 2. (n.d.). Google AI. https://ai.google/discover/palm2/
Introducing Claude. (n.d.). Anthropic. https://www.anthropic.com/index/introducing-claude
OpenAI. (2023, March 14). GPT-4. Openai.com. https://openai.com/research/gpt-4
OpenAI API. (n.d.). Platform.openai.com. https://platform.openai.com/docs/models/gpt-3-5
Overview of language models | Vertex AI. (n.d.). Google Cloud. Retrieved August 30, 2023, from https://cloud.google.com/vertex-ai/docs/generative-ai/language-model-overview
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21, 1–67. https://jmlr.org/papers/volume21/20-074/20-074.pdf