
Google’s AI just got ears


AI chatbots can already “see” the world through images and video. Now Google has announced audio-understanding capabilities as part of its latest update to Gemini Pro. Gemini 1.5 Pro can “hear” audio files uploaded to it and extract text information from them.

The company has made this version of the LLM available as a public preview on its Vertex AI development platform. That opens the feature to a wider base of enterprise-focused users, who can now experiment with it. The model was first announced in February, when it was offered only to a limited group of developers and enterprise customers.


1. Breaking down + understanding a long video

I uploaded the entire NBA dunk contest from last night and asked which dunk had the highest score.

Gemini 1.5 was incredibly able to find the specific perfect 50 dunk and details from just its long context video understanding! pic.twitter.com/01iUfqfiAO

— Rowan Cheung (@rowancheung) February 18, 2024

Google shared details about the update at its Cloud Next conference, which is currently taking place in Las Vegas. Google previously called Gemini Ultra, the LLM that powers its Gemini Advanced chatbot, the most powerful model in the Gemini family. Now it is calling Gemini 1.5 Pro its most capable generative model. The company added that this version is better at learning new tasks without additional tweaking of the model.

Gemini 1.5 Pro can interpret many types of audio and turn them into text, including TV shows, movies, radio broadcasts, and conference call recordings. It is also multilingual, processing audio in several different languages. The LLM may also be able to create transcripts from videos, although TechCrunch noted that their quality may be unreliable.

When the model was first announced, Google explained that Gemini 1.5 Pro processes raw data as tokens and can handle up to a million of them at once. A million tokens equate to approximately 700,000 words or 30,000 lines of code. In media terms, that is about an hour of video or around 11 hours of audio.
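To make those ratios concrete, here is a rough budgeting sketch. It is an illustration under stated assumptions, not Google's tokenizer or any Vertex AI API. It assumes each of the approximate figures above scales linearly and that mixed inputs simply add up. Real token counts depend on the actual content, so the results are ballpark estimates only. The names `estimate_tokens` and `fits_in_context` are hypothetical.

```python
# Rough context-window budgeting from the approximate figures quoted above.
# Assumption: each ratio scales linearly and mixed inputs add up; this is
# NOT how Gemini actually tokenizes content, just a ballpark estimate.

CONTEXT_TOKENS = 1_000_000

# Approximate amount of each media type that fits in one million tokens.
CAPACITY_PER_MILLION = {
    "words": 700_000,
    "lines_of_code": 30_000,
    "video_hours": 1,
    "audio_hours": 11,
}


def estimate_tokens(kind: str, amount: float) -> int:
    """Estimate the tokens needed for `amount` units of `kind` (hypothetical helper)."""
    tokens_per_unit = CONTEXT_TOKENS / CAPACITY_PER_MILLION[kind]
    return round(amount * tokens_per_unit)


def fits_in_context(items: dict[str, float]) -> bool:
    """Return True if the summed estimates stay within the one-million-token window."""
    total = sum(estimate_tokens(kind, amount) for kind, amount in items.items())
    return total <= CONTEXT_TOKENS


if __name__ == "__main__":
    # Two hours of audio is roughly 2/11 of the window.
    print(estimate_tokens("audio_hours", 2))
    # Half an hour of video plus five hours of audio should just fit.
    print(fits_in_context({"video_hours": 0.5, "audio_hours": 5}))
```

Under these assumptions, two hours of audio comes to about 181,818 tokens, and a full hour of video on its own already uses the whole window.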

Private preview demos of Gemini 1.5 Pro have shown the LLM finding specific moments in a video. For example, AI enthusiast Rowan Cheung got early access and described how his demo found an exact action shot in a sports contest and summarized the event, as seen in the tweet embedded above.

Google noted that other early adopters, including United Wholesale Mortgage, TBS, and Replit, are focusing on enterprise use cases such as mortgage underwriting, automated metadata tagging, and generating, explaining, and updating code.

Fionna Agomuoh
Fionna Agomuoh is a Computing Writer at Digital Trends. She covers a range of topics in the computing space, including…