Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on ...
The original code that was used for the experiments in the paper was built on top of a Meta internal codebase and therefore is not publicly available. This codebase is not an official release from ...