
Google Unveils Gemini 2.0: Advanced AI agents and new features

RJ Steve

Posted on December 11, 2024

Google has announced Gemini 2.0, marking a significant step toward creating AI-powered agents capable of reasoning, planning, and executing tasks on users’ behalf. This update introduces new features across multiple projects, such as Project Astra, Project Mariner, and tools for developers and gamers. Here’s a breakdown of the latest advancements:

Project Astra: Your Everyday AI Assistant

Project Astra aims to be a universal AI assistant for daily tasks. The updates under Gemini 2.0 include:

Improved Conversations: Astra now supports multiple languages, mixed-language dialogues, and better recognition of accents and uncommon words.
Enhanced Tools: Astra integrates with Google Search, Lens, and Maps to provide richer responses.
Smarter Memory: Astra can remember up to 10 minutes of in-session interactions and recall past conversations, making it more personalized.
Faster Responses: With native audio processing and streaming capabilities, Astra responds with latency close to that of human conversation.
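As a rough illustration of the in-session memory described above, here is a minimal Python sketch of a rolling 10-minute window. It is not Astra's implementation, which Google has not published. The SessionMemory class, its methods, and the use of plain timestamps in seconds are all assumptions made for this example.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60  # the 10-minute in-session window mentioned in the article


class SessionMemory:
    """Hypothetical rolling memory: keeps only events from the last `window` seconds."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()  # (timestamp, text) pairs, oldest first

    def add(self, timestamp, text):
        # Store the new event, then drop anything that has aged out.
        self.events.append((timestamp, text))
        self._evict(timestamp)

    def _evict(self, now):
        # Remove events older than the window, starting from the oldest.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def recall(self, now):
        # Return the texts still inside the window at time `now`.
        self._evict(now)
        return [text for _, text in self.events]
```

A caller would add each interaction with its timestamp and call recall with the current time to get what is still remembered.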

A demo shows Astra functioning as a chathead, analyzing on-screen content, and even working with prototype smart glasses. The consumer rollout will occur via the Gemini app.

Project Mariner: A Web-Browsing AI Agent

Project Mariner introduces a Chrome extension designed to navigate the web and perform complex tasks like reading and interacting with webpages. Its key features:

Advanced Reasoning: Mariner understands webpage elements such as text, images, and forms, enabling it to complete tasks like research or online shopping.
Safety Measures: Mariner operates within the active browser tab and requires user confirmation for sensitive actions, such as purchases.
Impressive Results: On the WebVoyager benchmark, which tests agents on real-world web tasks, Mariner scored 83.5%, a result Google describes as state of the art.
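The confirmation step described under Safety Measures can be pictured as a simple gate in front of each action. The Python sketch below is only an illustration of that idea, not Mariner's actual code. The action names, the SENSITIVE_ACTIONS set, and the run_action function are hypothetical.

```python
SENSITIVE_ACTIONS = {"purchase", "submit_payment"}  # hypothetical action names


def run_action(action, confirm):
    """Run an agent action, asking the user first if the action is sensitive.

    `confirm` is a callable that takes the action name and returns True
    when the user approves it.
    """
    if action in SENSITIVE_ACTIONS and not confirm(action):
        return f"skipped {action}"
    return f"executed {action}"
```

In a real extension, confirm would open a prompt in the browser; here it can be any function that returns True or False.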

Trusted testers are already experimenting with Mariner, while Google collaborates with the web ecosystem to refine its capabilities.

Deep Research: AI-Powered Reports for Complex Questions

Gemini Advanced users now have access to Deep Research, the first agentic feature for generating multi-step research plans. Here’s how it works:

Question Analysis: Gemini creates a detailed plan for addressing a user’s query.
Web Research: It searches the web, finds relevant information, and refines its findings through multiple iterations.
Comprehensive Report: The AI compiles results into sections with insights, sources, and related content, all exportable to Google Docs.
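The three steps above amount to a plan, search, and refine loop. The following Python sketch shows one way such a loop could be structured. It is an assumption-laden illustration, not Google's implementation: make_plan, research, the Report class, and the stand-in fake_search function are all invented for this example, and a real system would use a model and a live search backend instead.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    question: str
    sections: dict = field(default_factory=dict)  # plan step -> list of findings


def make_plan(question):
    # Hypothetical planner: a real system would ask a model to draft the plan.
    return [f"background: {question}", f"recent developments: {question}"]


def research(question, search, max_rounds=3):
    """Plan, search over several rounds, then compile findings into a report."""
    report = Report(question)
    for step in make_plan(question):
        findings, query = [], step
        for _ in range(max_rounds):
            new = [r for r in search(query) if r not in findings]
            if not new:  # nothing new turned up, so stop refining this step
                break
            findings.extend(new)
            query = f"{step} {new[-1]}"  # refine the query using the latest finding
        report.sections[step] = findings
    return report


def fake_search(query):
    # Stand-in for a web search; always returns the same two sources.
    return ["source A", "source B"]
```

Calling research("solid-state batteries", fake_search) returns a Report with one section per plan step, each listing the sources found for it.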

This feature is available for Gemini Advanced users in English on desktop and mobile web, with a broader release planned for 2025.

AI for Developers: Jules and Coding Assistance

Google has introduced Jules, an experimental AI coding agent integrated with GitHub. It assists developers by:

Planning and addressing coding tasks.
Following directions to execute code-based solutions under supervision.

This aligns with Google’s vision of creating AI agents for various domains, including software development.

Gemini 2.0 in Gaming: A Virtual Gaming Companion

For gamers, Gemini 2.0 includes a feature that acts as a gaming assistant:

Real-Time Assistance: The AI observes the game screen and provides suggestions during gameplay.
Compatible Games: Currently being tested with titles like Clash of Clans.

Gemini 2.0 Flash: Speed, Accuracy, and Multimodal Features

The first model from Gemini 2.0, Gemini 2.0 Flash, significantly outperforms its predecessor in areas like factual accuracy, coding, and reasoning. Highlights include:

Multimodal Outputs: Combines text, images, and audio for richer responses.
Tool Integration: Natively calls tools such as Google Search, which helps with factual accuracy, and code execution.
Faster Processing: Twice as fast as Gemini 1.5 Pro.

Developers can access this experimental version through AI Studio and Vertex AI, with general availability set for January 2025.

Gemini 2.0 in Search and Google Products

Gemini 2.0’s enhanced capabilities are being tested in Google Search for answering complex queries, such as advanced math problems or multimodal questions. Broader integration into other Google products is expected in early 2025.

CEO Sundar Pichai’s Vision

Sundar Pichai describes Gemini 2.0 as shifting from organizing information to executing complex tasks. Google envisions agents that can assist in everyday life, work, and entertainment, making technology more interactive and efficient.

In Summary: Google’s Gemini 2.0 represents a leap forward in AI, combining advanced reasoning, multimodal abilities, and practical tools to create a new era of intelligent, task-oriented agents. These features are rolling out gradually, with trusted testers providing feedback to refine the experience.

Source – 9To5

RPRNA