Google has announced the launch of Gemini, its largest AI model, the result of large-scale collaboration by teams across Google, including Google Research. According to Google, Gemini is built to generalize and to seamlessly understand, operate across and combine different types of information, including text, code, audio, image and video.
Gemini has been designed to run efficiently on everything from data centers to mobile devices. Google says that Gemini's state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.
Gemini comes in three sizes: Gemini Ultra, the largest and most capable model, for highly complex tasks; Gemini Pro, the best model for scaling across a wide range of tasks; and Gemini Nano, the most efficient model, for on-device tasks.
With a score of 90.0%, Gemini Ultra is, according to Google, the first model to outperform human experts on MMLU (massive multitask language understanding), a benchmark that combines 57 subjects such as math, physics, history, law, medicine and ethics to test both world knowledge and problem-solving abilities. Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains that require deliberate reasoning.
Sundar Pichai, CEO of Google and Alphabet, said:
“Now, we’re taking the next step on our journey with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks. Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.”