Google unveils Gemini 1.5 Pro: advanced AI chatbot with audio-to-text functionality
Credits: MICHAEL M. SANTIAGO / AFP

AI chatbots can already perceive the world through images and videos. Now Google has announced audio-to-text functionality as part of its latest update to Gemini Pro. With Gemini 1.5 Pro, the chatbot can process uploaded audio files and extract text information from them.

Google has made this version of the LLM available in public preview on Vertex AI, its development platform. Enterprise-focused users can now experiment with the feature, following a more limited rollout in February, when the model was first introduced exclusively to a select group of developers and enterprise clients. Google disclosed details of the update at its Cloud Next conference, currently underway in Las Vegas.

Previously lauded as the most powerful model in the Gemini family, Gemini 1.5 Pro is now positioned by Google as its most capable generative model. Notably, this version requires less manual tweaking during learning. It is multimodal, capable of transcribing audio sources such as TV shows, movies, radio broadcasts, and conference calls, and multilingual, able to process audio in different languages.

Although the model can potentially generate transcripts from videos, the reliability of this feature may vary, as TechCrunch notes. Google initially described Gemini 1.5 Pro as using a token system to process raw data. Roughly one million tokens correspond to 700,000 words or 30,000 lines of code, or to one hour of video or about 11 hours of audio, according to Digital Trends.
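To put those equivalences in perspective, the short sketch below derives rough per-unit token rates from the figures quoted above. It is only back-of-the-envelope arithmetic: the function names are hypothetical, the ratios are simple averages of the reported numbers, and none of this reflects how Google's tokenizer actually counts any given file.

```python
# Back-of-the-envelope token rates implied by the reported figures.
# Assumption: one million tokens is treated as equivalent to each quantity
# below; these are averages, not official tokenizer numbers.
CONTEXT_TOKENS = 1_000_000

# How much of each kind of content is reported to fit in one million tokens.
EQUIVALENTS = {
    "word": 700_000,             # 700,000 words
    "line_of_code": 30_000,      # 30,000 lines of code
    "second_of_video": 1 * 3600,   # one hour of video, in seconds
    "second_of_audio": 11 * 3600,  # about 11 hours of audio, in seconds
}


def tokens_per_unit(unit: str) -> float:
    """Average number of tokens per single unit (hypothetical helper)."""
    return CONTEXT_TOKENS / EQUIVALENTS[unit]


def estimate_tokens(unit: str, amount: float) -> int:
    """Rough token estimate for a given amount of content."""
    return round(tokens_per_unit(unit) * amount)


if __name__ == "__main__":
    for unit in EQUIVALENTS:
        print(f"{unit}: ~{tokens_per_unit(unit):.1f} tokens")
    # Example: a 90-minute conference call recording.
    print("90 min of audio:", estimate_tokens("second_of_audio", 90 * 60))
```

By this estimate, a second of video consumes roughly eleven times as many tokens as a second of audio, which is why far more audio than video fits in the same one million tokens.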

Several private preview demos of Gemini 1.5 Pro have showcased its ability to pinpoint specific moments in video transcripts. For instance, early access user Rowan Cheung demonstrated how the LLM accurately located action sequences in sports contests and summarized the events.

Moreover, early adopters such as United Wholesale Mortgage, TBS, and Replit are exploring enterprise-focused use cases, including mortgage underwriting, automated metadata tagging, and generating, explaining, and updating code.

* Stories are edited and translated by Info3 *