Vision-language model
I removed the warning about potential AI generation. I wrote the article, I am an expert (EE PhD), and I am fully responsible for every word and every link. This is my first article; I prepared it in my sandbox and later moved it here. I consulted ChatGPT while writing it and iterated with ChatGPT to improve the quality of the writing, but everything here is hyper-checked. You are welcome to contact me.
| ← Previous revision | Revision as of 21:17, 22 April 2026 | ||
{{Short description|Type of artificial intelligence system}} |
||
{{AI-generated|date=April 2026}} |
|||
{{Machine learning bar}} |
||
A '''vision–language model (VLM)''' is a type of artificial intelligence system that can jointly interpret and generate information from both images and text, extending the capabilities of [[large language model]]s (LLMs), which are limited to text. It is an example of [[multimodal learning]]. |
||