

        Beijing Academy of AI unveils next-gen multimodal model Emu3

        By DU JUAN | chinadaily.com.cn | Updated: 2024-10-24 15:37

        This week, the Beijing Academy of Artificial Intelligence unveiled Emu3, a self-developed multimodal world model that handles both the understanding and the generation of video, images and text within a single unified model.

        Emu3 demonstrates that next-token prediction can serve as a powerful paradigm for multimodal models, extending beyond language models and delivering state-of-the-art performance across multimodal tasks. In simple terms, it shows that predicting the next word or element in a sequence, the technique behind text-only language models, also works for models that handle both text and images.

        Emu3 is built entirely around predicting the next element of a sequence, removing the need for more complex methods such as diffusion or compositional approaches. It converts images, text and videos into a common format and trains a single transformer model from scratch on a mixture of multimodal sequences containing both text and images.
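        To make the idea of a "common format" concrete, the minimal sketch below shows one generic way text and image content can share a single token sequence and how next-token training pairs are formed from it. It is an illustration under stated assumptions, not Emu3's implementation: the vocabulary sizes (TEXT_VOCAB, IMAGE_CODEBOOK), the BOI/EOI boundary markers and all function names are hypothetical, and the image codes are assumed to come from some discrete visual tokenizer that is not shown.

```python
# Hypothetical shared vocabulary: text IDs occupy [0, TEXT_VOCAB),
# image codebook IDs are shifted into [TEXT_VOCAB, TEXT_VOCAB + IMAGE_CODEBOOK).
TEXT_VOCAB = 1000
IMAGE_CODEBOOK = 512
BOI = TEXT_VOCAB + IMAGE_CODEBOOK  # hypothetical "begin image" marker
EOI = BOI + 1                      # hypothetical "end image" marker


def image_to_tokens(codes):
    """Shift discrete image codes (assumed output of a visual tokenizer)
    into the shared vocabulary and wrap them in boundary markers."""
    for c in codes:
        if not 0 <= c < IMAGE_CODEBOOK:
            raise ValueError(f"image code {c} out of range")
    return [BOI] + [TEXT_VOCAB + c for c in codes] + [EOI]


def build_sequence(segments):
    """Flatten ('text', ids) / ('image', codes) segments into one token list."""
    seq = []
    for kind, ids in segments:
        if kind == "text":
            for t in ids:
                if not 0 <= t < TEXT_VOCAB:
                    raise ValueError(f"text id {t} out of range")
            seq.extend(ids)
        elif kind == "image":
            seq.extend(image_to_tokens(ids))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return seq


def next_token_pairs(seq):
    """Next-token prediction: given seq[:i], the model is trained to predict seq[i]."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


if __name__ == "__main__":
    # A short caption followed by a tiny "image" of four codebook entries.
    seq = build_sequence([("text", [12, 7, 99]), ("image", [3, 3, 0, 511])])
    print(seq)
    for context, target in next_token_pairs(seq)[:3]:
        print(context, "->", target)
```

        Because every modality ends up as integers in one vocabulary, a single transformer can be trained with one objective over all of them; generating an image then amounts to predicting the tokens between the boundary markers.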

        According to the academy, it has open-sourced Emu3's key technologies and models to the international tech community. Industry experts say that for researchers, Emu3 offers a new opportunity to explore multimodality through a unified architecture, without having to combine complex diffusion models with large language models.

        Wang Zhongyuan, director of the academy, said Emu3 has demonstrated high performance in multimodal tasks through next-token prediction, paving the way for the development of multimodal AGI.

        "Emu3 has the potential to converge infrastructure development onto a single technical path, laying the foundation for large-scale multimodal training and inference," he said. "This simple architectural design will facilitate industrialization. In the future, multimodal world models will drive applications in scenarios such as robotic cognition, autonomous driving, multimodal conversations and reasoning."

        Copyright 1995 - . All rights reserved. The content (including but not limited to text, photo, multimedia information, etc) published in this site belongs to China Daily Information Co (CDIC). Without written authorization from CDIC, such content shall not be republished or used in any form.
        License for publishing multimedia online 0108263

        Registration Number: 130349