OpenAI GPT-4
ChatGPT is an advanced conversational artificial intelligence language model developed by OpenAI. It is based on the GPT-4 architecture and has been trained on a diverse range of internet text to generate human-like responses in natural language conversations. GPT-4 is the latest version of this model family.
Intended Use
GPT stands for Generative Pre-trained Transformer, a type of language model that uses deep learning to generate human-like, conversational text. As a multimodal model, GPT-4 is able to accept both text and image inputs.
However, OpenAI has not yet made the GPT-4 model's visual input capability available through any platform; currently, the only way to access the text-input capability through OpenAI is with a subscription to ChatGPT Plus.
The GPT-4 model is optimized for conversational interfaces and can be used to generate text summaries, reports, and conversational responses. For now, only the text modality is supported.
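Where API access is available, GPT-4 is used through OpenAI's chat-style interface, which takes a list of messages rather than a single text prompt. The snippet below is a minimal sketch of generating a summary with the Chat Completions API; it assumes the `openai` Python SDK (v1 or later) and an `OPENAI_API_KEY` environment variable, and the document text and parameters are only placeholders.

```python
# Minimal sketch: summarizing text with GPT-4 via the OpenAI Chat Completions API.
# Assumes the `openai` Python SDK (v1+) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document = "GPT-4 is a large multimodal model that accepts text and image inputs ..."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # A system message sets the assistant's role; chat-style prompting is
        # the intended way to interact with GPT-4.
        {"role": "system", "content": "You are an assistant that writes concise summaries."},
        {"role": "user", "content": f"Summarize the following text in two sentences:\n\n{document}"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```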
Performance
GPT-4 is a highly advanced model that can accept both image and text inputs, making it more versatile than its predecessor, GPT-3. However, because the model behaves differently from older GPT models, it is important to use appropriate prompting techniques to get the best results.
OpenAI published results for the GPT-4 model comparing it to other state-of-the-art (SOTA) models, including its previous GPT-3.5 model.
| Benchmark | GPT-4 (evaluated few-shot) | GPT-3.5 (evaluated few-shot) | LM SOTA (best external LM, evaluated few-shot) | SOTA (best external model, includes benchmark-specific training) |
|---|---|---|---|---|
| MMLU (multiple-choice questions in 57 subjects, professional & academic) | 86.4% (5-shot) | 70.0% (5-shot) | 70.7% (5-shot, U-PaLM) | 75.2% (5-shot, Flan-PaLM) |
| HellaSwag (commonsense reasoning around everyday events) | 95.3% (10-shot) | 85.5% (10-shot) | 84.2% (LLaMA, validation set) | 85.6% (ALUM) |
| AI2 Reasoning Challenge (ARC) (grade-school multiple-choice science questions, challenge set) | 96.3% (25-shot) | 85.2% (25-shot) | 84.2% (8-shot, PaLM) | 85.6% (ST-MoE) |
| WinoGrande (commonsense reasoning around pronoun resolution) | 87.5% (5-shot) | 81.6% (5-shot) | 84.2% (5-shot, PaLM) | 85.6% (5-shot, PaLM) |
| HumanEval (Python coding tasks) | 67.0% (0-shot) | 48.1% (0-shot) | 26.2% (0-shot, PaLM) | 65.8% (CodeT + GPT-3.5) |
| DROP (reading comprehension & arithmetic, F1 score) | 80.9 (3-shot) | 64.1 (3-shot) | 70.8 (1-shot, PaLM) | 88.4 (QDGAT) |
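For context, the "few-shot" settings above mean the model is shown a handful of solved examples in its prompt before the test question. The sketch below is a generic illustration of how such a multiple-choice few-shot prompt can be assembled; it is not OpenAI's evaluation harness, and the helper name and example questions are placeholders.

```python
# Generic illustration of assembling an n-shot multiple-choice prompt
# (not OpenAI's actual evaluation harness; questions are placeholders).

def build_few_shot_prompt(examples, question, choices):
    """Concatenate solved examples, then append the unsolved test question."""
    parts = []
    for ex in examples:  # each example: {"question": ..., "choices": [...], "answer": "A"-"D"}
        parts.append(f"Question: {ex['question']}")
        for letter, choice in zip("ABCD", ex["choices"]):
            parts.append(f"{letter}. {choice}")
        parts.append(f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}")
    for letter, choice in zip("ABCD", choices):
        parts.append(f"{letter}. {choice}")
    parts.append("Answer:")  # the model is expected to continue with a letter
    return "\n".join(parts)


demo = [{"question": "What planet is known as the Red Planet?",
         "choices": ["Venus", "Mars", "Jupiter", "Mercury"],
         "answer": "B"}]
print(build_few_shot_prompt(demo, "Which gas do plants absorb from the air?",
                            ["Oxygen", "Nitrogen", "Carbon dioxide", "Helium"]))
```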
Limitations
The GPT-4 model has similar limitations to previous GPT models: it remains prone to hallucination and reasoning errors, although OpenAI reports that GPT-4 hallucinates less often than earlier models. In addition, the model's underlying format is likely to change over time, and interacting with it in the same way as older models may produce less useful responses.
https://openai.com/research/gpt-4