Vision-language model

I removed the warning about potential AI generation. I wrote the article; I am an expert (EE PhD) and am fully responsible for every word and every link. This is my first article; I prepared it in my sandbox and later moved it here. I did consult ChatGPT while writing it and iterated with it to improve the quality of the writing, but everything here has been thoroughly checked. You are welcome to contact me.

Revision as of 21:17, 22 April 2026
{{Short description|Type of artificial intelligence system}}
{{AI-generated|date=April 2026}}
{{Machine learning bar}}
A '''vision–language model (VLM)''' is a type of artificial intelligence system that can jointly interpret and generate information from both images and text, extending the capabilities of [[large language model]]s (LLMs), which are limited to text. It is an example of [[multimodal learning]].