
Google Gemini Pro

Question answering
Text generation
Zero-shot classification
Summarization
Conversational
Text classification
Named entity recognition
Visual question answering

Google Gemini is a large-scale language model trained jointly across image, audio, video, and text data, with the goal of combining strong generalist capabilities across modalities with cutting-edge understanding and reasoning performance. This model uses Gemini Pro on Vertex AI, which offers enhanced performance, scalability, and deployability.


Intended Use

Gemini is designed to process and reason across different inputs such as text, images, video, and code. On the Labelbox platform, Gemini supports a wide range of image and language tasks, such as text generation, question answering, classification, visual understanding, and answering math questions.
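
As a rough illustration of one of the tasks listed above, zero-shot classification, the sketch below builds a prompt, sends it to Gemini Pro through the Vertex AI Python SDK, and maps the reply back to a label. It is a minimal sketch under stated assumptions, not a description of how Labelbox invokes the model: the helper names, the prompt wording, the default region, and the "gemini-pro" model identifier are all illustrative, and the identifier available in a given project may differ. The Vertex AI call requires the google-cloud-aiplatform package and valid Google Cloud credentials.

```python
from typing import Optional, Sequence


def build_classification_prompt(text: str, labels: Sequence[str]) -> str:
    """Build a zero-shot classification prompt (hypothetical wording)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label_list = ", ".join(labels)
    return (
        f"Classify the text into exactly one of these labels: {label_list}.\n"
        f"Text: {text}\n"
        "Answer with the label only."
    )


def parse_label(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Map a model reply back to one of the allowed labels, or None."""
    cleaned = reply.strip().strip(".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


def classify_with_gemini(
    text: str,
    labels: Sequence[str],
    project: str,
    location: str = "us-central1",  # assumed region; use your own
) -> Optional[str]:
    """Send the prompt to Gemini Pro on Vertex AI and parse the reply."""
    # Imported lazily so the helpers above work without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-pro")  # assumed model identifier
    response = model.generate_content(build_classification_prompt(text, labels))
    return parse_label(response.text, labels)
```

A call such as classify_with_gemini("Great product", ["positive", "negative"], project="my-project") would return one of the two labels, or None if the reply does not match either. Restricting the reply to the label alone keeps parsing simple, at the cost of discarding any explanation the model might otherwise give.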


Performance

Gemini is Google’s largest and most capable model family to date. According to Google’s technical report, its largest variant, Gemini Ultra, is the first model to surpass human-expert performance on the Massive Multitask Language Understanding (MMLU) benchmark, and the family reports state-of-the-art (SOTA) results on multimodal tasks.


Limitations

There is a continued need for research and development on reducing “hallucinations” generated by LLMs. LLMs also struggle with tasks requiring high-level reasoning abilities such as causal understanding, logical deduction, and counterfactual reasoning.


Citation

Technical report on Gemini: A Family of Highly Capable Multimodal Models

Privacy policy

Google