Grounding DINO + SAM
Grounding DINO + SAM, or Grounding SAM, pairs Grounding DINO, an open-set object detector, with the Segment Anything Model (SAM). The combination can detect and segment regions described by arbitrary text prompts, which makes it possible to create segmentation masks quickly and to connect the output with other vision models.
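The sketch below illustrates one common arrangement of this pipeline using the Hugging Face transformers ports of both models: Grounding DINO produces boxes and phrase labels from a text prompt, and SAM is prompted with those boxes to produce masks. The specific checkpoints (IDEA-Research/grounding-dino-tiny, facebook/sam-vit-base), the sample image URL, and the prompt text are illustrative assumptions, not fixed parts of this model, and keyword names in the post-processing helpers can vary slightly across transformers versions.

```python
# Sketch: text-prompted detection with Grounding DINO, then box-prompted
# segmentation with SAM, via the Hugging Face `transformers` ports.
import requests
import torch
from PIL import Image
from transformers import (
    AutoModelForZeroShotObjectDetection,
    AutoProcessor,
    SamModel,
    SamProcessor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative checkpoints and image; swap in your own.
dino_id = "IDEA-Research/grounding-dino-tiny"
sam_id = "facebook/sam-vit-base"
image = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
    ).raw
)
# Grounding DINO expects lower-case phrases separated by periods.
text = "a cat. a remote control."

# 1) Open-set detection: text prompt -> boxes + phrase labels.
dino_processor = AutoProcessor.from_pretrained(dino_id)
dino = AutoModelForZeroShotObjectDetection.from_pretrained(dino_id).to(device)
dino_inputs = dino_processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    dino_outputs = dino(**dino_inputs)
detections = dino_processor.post_process_grounded_object_detection(
    dino_outputs,
    dino_inputs.input_ids,
    target_sizes=[image.size[::-1]],  # (height, width)
)[0]
boxes = detections["boxes"]    # (x0, y0, x1, y1) in pixels
labels = detections["labels"]  # matched text phrases

# 2) Promptable segmentation: detected boxes -> masks via SAM.
sam_processor = SamProcessor.from_pretrained(sam_id)
sam = SamModel.from_pretrained(sam_id).to(device)
sam_inputs = sam_processor(
    image, input_boxes=[boxes.tolist()], return_tensors="pt"
).to(device)
with torch.no_grad():
    sam_outputs = sam(**sam_inputs, multimask_output=False)
masks = sam_processor.image_processor.post_process_masks(
    sam_outputs.pred_masks.cpu(),
    sam_inputs["original_sizes"].cpu(),
    sam_inputs["reshaped_input_sizes"].cpu(),
)[0]  # one boolean mask per detected box, aligned with the original image size

for label, mask in zip(labels, masks):
    print(label, mask.shape)
```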
Intended Use
Create segmentation masks using SAM and classify the masks using Grounding DINO. The masks are intended to be used as pre-labels.
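The exact pre-label format depends on the labeling platform, so the following is only a hedged sketch of one generic option: packaging each mask and its Grounding DINO class into a COCO-style RLE record with pycocotools. The function name and record fields are illustrative assumptions.

```python
# Sketch: package masks + labels as generic COCO-style RLE pre-label records.
import numpy as np
from pycocotools import mask as mask_utils


def masks_to_prelabels(masks, labels, scores):
    """masks: iterable of (H, W) boolean arrays; labels/scores: per-mask metadata."""
    prelabels = []
    for mask, label, score in zip(masks, labels, scores):
        binary = np.asfortranarray(mask.astype(np.uint8))  # RLE encoder needs Fortran order
        rle = mask_utils.encode(binary)
        rle["counts"] = rle["counts"].decode("utf-8")  # make the RLE JSON-serializable
        x, y, w, h = mask_utils.toBbox(rle).tolist()
        prelabels.append(
            {
                "name": label,               # class name from Grounding DINO
                "confidence": float(score),  # detection score, useful for reviewer triage
                "segmentation": rle,         # COCO RLE of the SAM mask
                "bbox": [x, y, w, h],        # COCO-style (x, y, width, height)
            }
        )
    return prelabels
```

If the masks come from the pipeline sketch above, they are boolean torch tensors with a leading dimension of one, so something like mask.squeeze(0).numpy() converts each to the (H, W) array this helper expects.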
Limitations
Classification can be inaccurate, especially for aerial imagery with classes such as roofs and solar panels.
Mask accuracy is suboptimal for complex shapes, low-contrast regions, and small objects.
Citation
Liu, Shilong, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, et al. (2023). Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. arXiv preprint arXiv:2303.05499
Chen, Jiaqi, Zeyu Yang, and Li Zhang. (2023). Semantic Segment Anything. https://github.com/fudan-zvg/Semantic-Segment-Anything