OpenAI GPT-4
ChatGPT is an advanced conversational artificial intelligence language model developed by OpenAI. It is based on the GPT-4 architecture and has been trained on a diverse range of internet text to generate human-like responses in natural language conversations. GPT-4 is the latest version of this model family.
Intended Use
GPT stands for Generative Pre-trained Transformer, a type of language model that uses deep learning to generate human-like, conversational text. As a multimodal model, GPT-4 is able to accept both text and image inputs.
However, OpenAI has not yet made the GPT-4 model's visual input capability available through any platform; currently, the only way to access the text-input capability through OpenAI is with a subscription to ChatGPT Plus.
The GPT-4 model is optimized for conversational interfaces and can be used to generate text summaries, reports, and conversational responses. For now, only the text modality is supported.
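Where API access is available, GPT-4 is used through OpenAI's chat-style interface, which takes a list of messages rather than a single text prompt. The snippet below is a minimal sketch of generating a summary with the Chat Completions API; it assumes the `openai` Python SDK (v1 or later) and an `OPENAI_API_KEY` environment variable, and the document text and parameters are only placeholders.

```python
# Minimal sketch: summarizing text with GPT-4 via the OpenAI Chat Completions API.
# Assumes the `openai` Python SDK (v1+) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document = "GPT-4 is a large multimodal model that accepts text and image inputs ..."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # A system message sets the assistant's role; chat-style prompting is
        # the intended way to interact with GPT-4.
        {"role": "system", "content": "You are an assistant that writes concise summaries."},
        {"role": "user", "content": f"Summarize the following text in two sentences:\n\n{document}"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```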
Performance
GPT-4 is a highly advanced model that can accept both image and text inputs, making it more versatile than its predecessor, GPT-3. However, because the model behaves differently from older GPT models, it is important to use appropriate prompting techniques to get the best results.
OpenAI published results for the GPT-4 model comparing it to other state-of-the-art (SOTA) models, including its previous GPT-3.5 model.
| Benchmark | GPT-4 (evaluated few-shot) | GPT-3.5 (evaluated few-shot) | LM SOTA (best external LM, evaluated few-shot) | SOTA (best external model, includes benchmark-specific training) |
|---|---|---|---|---|
| MMLU (multiple-choice questions in 57 subjects, professional & academic) | 86.4% (5-shot) | 70.0% (5-shot) | 70.7% (5-shot, U-PaLM) | 75.2% (5-shot, Flan-PaLM) |
| HellaSwag (commonsense reasoning around everyday events) | 95.3% (10-shot) | 85.5% (10-shot) | 84.2% (LLaMA, validation set) | 85.6% (ALUM) |
| AI2 Reasoning Challenge (ARC) (grade-school multiple-choice science questions, challenge set) | 96.3% (25-shot) | 85.2% (25-shot) | 84.2% (8-shot, PaLM) | 85.6% (ST-MoE) |
| WinoGrande (commonsense reasoning around pronoun resolution) | 87.5% (5-shot) | 81.6% (5-shot) | 84.2% (5-shot, PaLM) | 85.6% (5-shot, PaLM) |
| HumanEval (Python coding tasks) | 67.0% (0-shot) | 48.1% (0-shot) | 26.2% (0-shot, PaLM) | 65.8% (CodeT + GPT-3.5) |
| DROP (reading comprehension & arithmetic, F1 score) | 80.9 (3-shot) | 64.1 (3-shot) | 70.8 (1-shot, PaLM) | 88.4 (QDGAT) |
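For context, the "few-shot" settings above mean the model is shown a handful of solved examples in its prompt before the test question. The sketch below is a generic illustration of how such a multiple-choice few-shot prompt can be assembled; it is not OpenAI's evaluation harness, and the helper name and example questions are placeholders.

```python
# Generic illustration of assembling an n-shot multiple-choice prompt
# (not OpenAI's actual evaluation harness; questions are placeholders).

def build_few_shot_prompt(examples, question, choices):
    """Concatenate solved examples, then append the unsolved test question."""
    parts = []
    for ex in examples:  # each example: {"question": ..., "choices": [...], "answer": "A"-"D"}
        parts.append(f"Question: {ex['question']}")
        for letter, choice in zip("ABCD", ex["choices"]):
            parts.append(f"{letter}. {choice}")
        parts.append(f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}")
    for letter, choice in zip("ABCD", choices):
        parts.append(f"{letter}. {choice}")
    parts.append("Answer:")  # the model is expected to continue with a letter
    return "\n".join(parts)


demo = [{"question": "What planet is known as the Red Planet?",
         "choices": ["Venus", "Mars", "Jupiter", "Mercury"],
         "answer": "B"}]
print(build_few_shot_prompt(demo, "Which gas do plants absorb from the air?",
                            ["Oxygen", "Nitrogen", "Carbon dioxide", "Helium"]))
```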
Limitations
The GPT-4 model has similar limitations to previous GPT models: it remains prone to hallucination and reasoning errors, although OpenAI reports that GPT-4 hallucinates less often than earlier models. In addition, the model's underlying format is likely to change over time, and interacting with it in the same way as older models may produce less useful responses.
https://openai.com/research/gpt-4