Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Single Image
Video
Image IN
Instruction
Submit
Response
Segmentation
Video IN
Frame interval (1 to 12)
Instruction
Submit
Response
Segmentation
Masked video