The metaverse, the virtual world proposed by Meta (Facebook’s parent company), is still under construction. Its main building block is artificial intelligence (AI), the technology on which everything in it will depend. On Thursday, Meta CEO Mark Zuckerberg gave a presentation showing off some of the projects his team of AI researchers has been working on, which he considers fundamental to the success of the “immersive internet.” All of them have one element in common: voice.
One of the challenges the company faces, if users are to navigate this new augmented reality, is creating a whole new generation of digital assistants. In the metaverse, which we will access with virtual reality eyewear, we will receive a lot of visual stimuli. To keep users from being overwhelmed by so much information, it will be critical to improve interactions between the machine (the metaverse) and users (the avatars).
The easiest way to achieve fluid communication is to hold a conversation with the system, that is to say, with a conversational assistant that can learn from us. Although the assistant still doesn’t have a name, the neural model behind it is being referred to as Project CAIRaoke. A video showed the potential of this tool when combined with mixed reality glasses (which superimpose digital images on physical reality). In it, a man is cooking at home while a voice tells him what to do, step by step, and the ingredients he needs light up when it is time to use them.
During his presentation, Zuckerberg also showed off BuilderBot, a bot that allows users to change their virtual surroundings with voice commands. “Put a sea there,” said Zuckerberg’s avatar, and suddenly a digital ocean appeared. “Now let’s add an island over there, and some cumulus clouds.”
But there is another project that is more ambitious than these assistants: a real-time universal speech translator. Until now, translation services have taken text or audio in the source language, translated it into English, and from there into the target language. This two-step method increases the probability of making mistakes and missing nuances.