Visual Question Answering Model

This model is a fine-tuned version of microsoft/Florence-2-base-ft for Visual Question Answering (VQA): it interprets an image and answers questions about its visual content.


Model Details

  • Finetuned by: prithivMLmods
  • Model type: Visual Question Answering (VQA)
  • Language(s): English (NLP component)
  • License: None specified
  • Finetuned from model: microsoft/Florence-2-base-ft

Usage

This model can be used to perform VQA tasks, where it takes an image and a question about the image as input, and returns an answer based on the visual content.
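
The card does not include inference code, so the following is only a minimal sketch of how the image-plus-question flow described above might look with the Hugging Face transformers library. It assumes the checkpoint loads like the base Florence-2 model (via AutoProcessor and AutoModelForCausalLM with trust_remote_code=True) and that questions are prefixed with a "<DocVQA>" task token, a convention from common Florence-2 fine-tuning recipes that this card does not confirm. The helper names build_prompt and answer_question are hypothetical.

```python
MODEL_ID = "prithivMLmods/Florence-2-VLM-Doc-VQA"

# Assumption: "<DocVQA>" is the task prefix used during fine-tuning.
# The model card does not state the prompt format; adjust if results look wrong.
TASK_PREFIX = "<DocVQA>"


def build_prompt(question: str, task_prefix: str = TASK_PREFIX) -> str:
    """Prepend the (assumed) task prefix to the whitespace-trimmed question."""
    return f"{task_prefix}{question.strip()}"


def answer_question(image_path: str, question: str, max_new_tokens: int = 64) -> str:
    """Load the model, run one image/question pair, and return the decoded answer."""
    # Heavy dependencies are imported here so the prompt helper above
    # can be used without torch or transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Assumption: like the base Florence-2 checkpoints, this one needs
    # trust_remote_code=True to load its custom model and processor classes.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
            num_beams=3,
        )

    # Decode the generated token ids back to text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

With a local image, a call such as answer_question("document.png", "What is the invoice date?") would return the model's answer as a string; the file name and question here are placeholders. Loading the model inside the function keeps the sketch short, but for repeated queries it would be better to load the processor and model once and reuse them.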

Model size: 0.3B params (Safetensors, tensor type F32)

Model repository: prithivMLmods/Florence-2-VLM-Doc-VQA