Vision Large Language Model - Video Search

Vision Language models: towards multi-modal deep learning | AI Summer

theaisummer.com

A review of state-of-the-art vision-language models such as CLIP, DALLE, ALIGN and SimVLM

March 3, 2022

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks VisionLLM Demo

Tackling multiple tasks with a single visual language model

deepmind.google

April 28, 2022

SITS-DECO: A GENERATIVE DECODER IS ALL YOU NEED FOR MULTITASK SATELLITE IMAGE TIME SERIES MODELLING

YouTube · Galsen AI

12 views · 2 months ago

CodeOCR: Vision Language Models for Efficient Visual Code Understanding with Multimodal LLMs

5 views · 2 weeks ago

Popular videos

Keynote: Phi-3-Vision: A highly capable and "small" language vision model - Microsoft Research

How do LLMs work with Vision AI? | OCR, Image & Video Analysis

Microsoft Blogs · Zachary-Cavanell

June 2, 2023

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

Microsoft · Presented by the Microsoft Health Futures tea…

July 4, 2022

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks VisionLLM Applications

Latent Implicit Visual Reasoning (Dec 2025)

YouTube · AI Papers Slop

38 views · 2 months ago

V-Thinker: Interactive Thinking with Images

What’s AI by Louis-François Bouchard on Instagram: "Meet DeepSeek-OCR, the new kid rewriting how we handle long-context vision. Instead of forcing LLMs to digest endless text, it compresses text into vision tokens—turning documents into a compact optical language. The result? 97% accuracy at a 10× compression ratio and 60% even at 20×. That’s wild. This model runs a Mixture-of-Experts decoder that beats 7B+ vision models with just 570M active params, thanks to smart token efficiency—not brute fo

Instagram · whats_ai

1,496 views · 4 months ago

PaliGemma Vision Language Model for Form and Table Understanding

859 views · May 18, 2024

Vision Language Models: Leaderboards, Evaluation Benchmarks, and Learning

3,833 views · April 13, 2024

YouTube · AI Anytime

Molmo: Open-Source Vision Language Models are a GAME CHANGER

6,387 views · October 3, 2024

YouTube · Mervin Praison

CogVLM: The best open source Vision Language Model

9,248 views · November 25, 2023

YouTube · Aladdin Persson

PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained …

June 22, 2024

Vision Language Models | Multi Modality, Image Captioning, Text-t…

16K views · October 9, 2024

YouTube · Ultralytics

MiniGPT-4: Enhancing Vision-language Understanding with Adv…

793 views · April 17, 2023

YouTube · Deep Learning Explainer

Vision Language Models | Advantages of VLM's 🎉

5,401 views · October 21, 2024

YouTube · Ultralytics

Coding a Multimodal (Vision) Language Model from scratch in P…

124K views · August 7, 2024

YouTube · Umar Jamil

Large Vision Language Models Tutorial for BRAILS ++

1,011 views · September 12, 2024

YouTube · NHERI DesignSafe

How to Fine-Tune LLama-3.2 Vision language Model on Custom Dataset.

4,764 views · October 20, 2024

YouTube · NextGen AI Guy

BenchSci Unveils Multimodal Large Language Models' Power to Revol…

33K views · September 10, 2024

YouTube · Edge AI and Vision Alliance

A Beginner's Guide to Language Models | Built In

11 months ago

What are vision language models (#vlm)? A cutting-edge researche…

1,754 views · June 12, 2024

YouTube · Snorkel AI

Florence-2: Foundation Model for Vision and Vision-Language Tasks

1,367 views · November 21, 2023

YouTube · Data Science Gems

OpenVLA - An Open-Source Vision-Language-Action Model for Robots

5,917 views · June 14, 2024

YouTube · Fahd Mirza

Run Vision Models Locally in LM Studio: Image-to-Text with Multim…

11K views · August 28, 2024

YouTube · The Local Lab

Visual Language Intelligence and Edge AI 2.0 with NVIDIA Cosmos …

May 3, 2024

What Is a Large Language Model (LLM)? | Built In

July 16, 2024

simpleshow explains: Generative AI, Large Language Models and Chat…

12K views · June 8, 2023

YouTube · simpleshow

Self-Hosting your own Vision-Language Models with PaliGemm…

314 views · June 9, 2024

YouTube · المطورون في العالم العربي - DevMENA

LLaVA: A large multi-modal language model

9,432 views · December 10, 2023

YouTube · Learn Data with Mark

Vision language action models for autonomous driving at Wayve

12K views · July 3, 2024

YouTube · Weights & Biases

Demystifying Language Models: A Beginner's Guide

1,968 views · September 12, 2023

100% Local Tiny AI Vision Language Model (1.6B) - Very Impressive!!

73K views · January 28, 2024

YouTube · All About AI

10 minutes paper (episode 26): Multi-Grained Vision Language Pre-Trai…

694 views · July 6, 2023

YouTube · CanConTech