Grounding DINO is an open-set object detector that combines the Transformer-based detector DINO with grounded pre-training. It can detect arbitrary objects from human inputs such as category names or referring expressions.
Useful for zero-shot object detection tasks.
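As a rough illustration of what "category names" as input can look like, the sketch below joins a list of class names into a single text prompt. The lowercase, period-separated format is an assumption based on how publicly available Grounding DINO implementations are commonly prompted; it is not specified in this description, and `build_prompt` is a hypothetical helper, not part of any library.

```python
def build_prompt(categories):
    """Join category names into one lowercase, period-separated text prompt.

    Hypothetical helper: the "name . name ." format is an assumed convention,
    not something this description specifies.
    """
    cleaned = []
    for name in categories:
        # Normalize case and drop surrounding whitespace or a trailing period.
        name = name.strip().lower().rstrip(".").strip()
        # Skip empty entries and duplicates while preserving order.
        if name and name not in cleaned:
            cleaned.append(name)
    # Separate phrases with " . " and close the prompt with a final period.
    return " . ".join(cleaned) + " ." if cleaned else ""


prompt = build_prompt(["Cat", "remote control", "Dog."])
print(prompt)  # cat . remote control . dog .
```

The resulting string would then be passed to the detector together with an image. A referring expression such as "the dog on the left" could be supplied the same way, as free text.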
Grounding DINO performs well in all three evaluation settings (closed-set detection, open-set detection, and referring object detection), with benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. It achieves 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO, and sets a new record on the ODinW zero-shot benchmark with a mean AP of 26.1.
Labelbox