Yesterday OpenAI made a major advance with ChatGPT, turning it into an assistant that expresses emotion and responds in real time without delay. Today it was Sundar Pichai's turn: he opened the Google I/O keynote to show all the advances of what Google calls the Age of Gemini.
With 6 billion photos uploaded every day, Gemini will make the Google Photos experience easier. Before, you could only search with keywords; now, with Gemini, you can ask Photos directly and the captures you want appear immediately in the app.
You can ask for memories with a specific person and Photos will show them directly, or for activities such as swimming by any of the people who appear among the thousands of photos a user may have. “Ask Photos” will roll out this summer with Gemini.
Gemini, as a multimodal artificial intelligence, is capable of understanding text, images, and audio in its 1.5 Pro version. Google showed an example of its AI working as a programming aid and performing precise searches across entire documents.
Another of the best examples: a user uploads a photo of a bookshelf with all its books in a row, and Gemini 1.5 lists every one of them in seconds. This is personal, everyday use covering a wide range of common experiences for thousands of people at home and at work.
Gemini 1.5 Pro
Gemini 1.5 Pro is available today for all developers worldwide. Gemini Advanced, available in 35 languages, goes from a one-million-token context window to two million starting today.
In Workspace, Gmail becomes more capable thanks to Gemini: it can summarize all the emails from a sender, or build a list of the highlights of a Google Meet meeting. Available today through Gemini Labs.
With NotebookLM’s multimodal model, audio overviews answer the user’s questions in a kind of interactive classroom aimed at kids or students, in the demo presented at Google I/O 2024. Google put the emphasis on a science class: when the user asks questions about the subject through a topic as different as basketball, Gemini 1.5 is able to adapt and respond appropriately.
AI Agents are another new development in artificial intelligence from Google, focused on reasoning, planning, and memory so they can operate across different systems and software. In the travel example, AI agents handle the different needs the user has, suggesting places and points of interest to visit.
Google DeepMind also came into play in the keynote to show other Google AI advances, like the model announced a few days ago for predicting molecular structures. Gemini 1.5 Flash is a lightweight model compared to the Pro, focused on low latency and fast responses; it will be available in Google AI Studio and Vertex AI for developers.
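Although the keynote stayed at demo level, developers can already reach Gemini models programmatically. As a minimal sketch (assuming the public Generative Language REST API and a hypothetical `GEMINI_API_KEY` environment variable — neither detail comes from the keynote itself), a request to Gemini 1.5 Flash could be built like this in Python:

```python
import json
import os
import urllib.request

# Public REST endpoint for Gemini models (outside Google AI Studio's UI).
MODEL = "gemini-1.5-flash"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a generateContent request for the Gemini REST API."""
    body = json.dumps(
        {"contents": [{"parts": [{"text": prompt}]}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Only make a network call if a real key is configured (assumption:
    # the key lives in the GEMINI_API_KEY environment variable).
    key = os.environ.get("GEMINI_API_KEY")
    req = build_request("Summarize Google I/O 2024 in one sentence.", key or "demo")
    if key:
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read())
            print(data["candidates"][0]["content"]["parts"][0]["text"])
    else:
        print(req.full_url.split("?")[0])  # no key: just show the endpoint
```

Swapping `MODEL` for `gemini-1.5-pro` targets the heavier model instead; Vertex AI offers the same models through a separate, Google Cloud-authenticated API.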
Google ran another demonstration, similar to the one OpenAI launched yesterday, in which Gemini describes everything that appears in the camera app’s viewfinder. Responses to every query are instantaneous, demonstrating the progress made and positioning the experience alongside what OpenAI showed yesterday with ChatGPT.
Yes, it was a pre-recorded video shown during the presentation rather than a live demo, but the best part came when the model was used with glasses, so you can use your voice at any time. An experience identical to Meta’s Ray-Bans, and arguably one of the most surprising moments of the Google keynote.
Imagen 3 is the new update to image generation with Google’s artificial intelligence. It accepts longer descriptions, and the more detail given in the prompt, the more realistic the photos Imagen 3 can produce. You can sign up for the beta now to try the new Imagen 3 experience.
Google has also improved audio generation with Music AI Sandbox, a set of AI-centric tools. In the demo, a sampler is created and the AI adds beats, remixing a song built from the sampler.
For generative video comes Veo: it creates 1080p videos from text, with aerial shots and time-lapses, and with the VideoFX tool you can create extended compositions. Veo focuses on the consistency of the generated clip, as Google showed during I/O with a car driving through different environments. It is Google’s answer to OpenAI’s Sora.
Search in the Age of Gemini
Another important moment was search with Gemini. The keys are real-time information, the quality of its systems, their efficiency, and the power of its artificial intelligence.
AI Overviews respond directly with the information sought; they roll out today and will reach other countries in the coming months. Multi-step reasoning is introduced to provide the right answer when searching for Pilates or yoga studios in a city like Boston.
Google will offer all the information summarized in an interface that shows the Pilates-focused studios, each with its own card. AI agents are responsible for classifying the information, drawn from the more than 250 million business listings available across the planet, to present it to the user in the best way.
Planning trips will be much easier with the new Gemini-based search. You can create a three-day meal plan, and the results include photos, recipes, and tips for a balanced diet in which no food group is missing.
Making a video query is another of Google’s big innovations showing the advances of Gemini and search with generative artificial intelligence: point your phone camera at the problem you are having with your turntable and needle, record a video, submit it as the query, and Google provides the solution directly within seconds.
This new search experience will roll out in the coming weeks, bringing a big change to the daily experience of the millions of people who use Google Search.