Multi-modal

An AI that handles text, voice, images, and video together.

A multi-modal model can take input, produce output, or both, in more than one medium: for example, it can read an image and answer in text, listen to audio and reply in voice, or watch a short video and summarise it. GIGI is multi-modal, so a single conversation can mix typed questions, photos of a product, and voice replies.