Labelbox•July 12, 2022
This year’s CVPR conference featured over two thousand papers. We parsed these papers to find some of the emerging themes in the computer vision field, and discovered three standout trends.
When transformers first entered the AI space in 2017, they were used primarily for language translation, but they were soon adapted for many other NLP tasks. In 2020, the paper An Image is Worth 16x16 Words introduced vision transformers, and in 2021, a vision transformer was shown to outperform CNNs at image classification. CVPR 2022 featured further work on vision transformers.
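The core idea behind the "16x16 words" title can be sketched in a few lines: an image is split into fixed-size patches, and each flattened patch is linearly projected into a token embedding that a standard transformer can consume. The following is an illustrative sketch with toy dimensions, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

patch = 16                                   # patch side length, as in ViT
d_model = 64                                 # toy embedding size for illustration
image = rng.standard_normal((224, 224, 3))   # one RGB image

# Split the image into non-overlapping 16x16 patches and flatten each one.
h = image.shape[0] // patch                  # 14 patches per side
patches = image.reshape(h, patch, h, patch, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(h * h, patch * patch * 3)   # (196, 768)

# A learned linear projection turns each flattened patch into a token;
# the resulting sequence is what the transformer encoder actually sees.
W = rng.standard_normal((patch * patch * 3, d_model))
tokens = patches @ W                         # (196, 64)
```

In the real model, `W` is trained end to end, and a class token plus position embeddings are prepended before the transformer layers.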
Scaling Vision Transformers is a study from the same group at Google Brain that introduced vision transformers in 2020. This work trains a transformer with two billion parameters, achieving 90.45% top-1 accuracy on ImageNet and performing exceptionally well on few-shot transfer learning. Other papers, like Deformable Video Transformer and Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds, bring transformers to new modalities in computer vision such as video and 3D data.
Neural Radiance Fields, or NeRFs, were introduced in 2020. At CVPR 2022, there were over 50 papers on radiance fields, and NeRFs now represent a valuable technique for anyone interested in volume rendering, view synthesis, and the state of the art in 3D rendering in general.
NeRF in the Wild shows how radiance fields can be used to create 3D representations using only unstructured datasets of in-the-wild photographs, and demonstrates that the technique can even capture varied lighting and transient occlusions. D-NeRF: Neural Radiance Fields for Dynamic Scenes introduces a variant of the NeRF technique that creates 3D representations of scenes from photographs that show movement, rather than just static objects.
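At the heart of every NeRF variant is the same volume rendering step: a network is queried for a density and a color at samples along a camera ray, and those samples are alpha-composited into a pixel. The sketch below uses random toy values in place of network outputs to show that compositing step; it is an illustration of the standard quadrature, not code from any of the papers above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 64

# Stand-ins for network outputs at samples along one ray.
sigma = rng.uniform(0.0, 2.0, n_samples)          # volume density per sample
color = rng.uniform(0.0, 1.0, (n_samples, 3))     # RGB per sample
delta = np.full(n_samples, 4.0 / n_samples)       # spacing between samples

# Opacity contributed by each ray segment.
alpha = 1.0 - np.exp(-sigma * delta)

# Transmittance T_i: how much light survives to reach sample i unoccluded.
T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))

weights = T * alpha                               # per-sample contribution
pixel = (weights[:, None] * color).sum(axis=0)    # final RGB for this ray
```

Training a NeRF amounts to optimizing the network so that pixels rendered this way match the input photographs.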
Several papers at CVPR 2022 show that transfer learning (taking a model trained on a broader or related task and fine-tuning it for one's requirements) is a successful technique for computer vision use cases. A paper evaluating both CNNs and transformers trained via transfer learning shows that transformers carried high performance from ImageNet classification into downstream tasks better than comparable CNNs. Robust Fine-Tuning of Zero-Shot Models introduces a simple and effective method for improving robustness during the fine-tuning process: ensembling the weights of the zero-shot and fine-tuned models.
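The weight-space ensembling in Robust Fine-Tuning of Zero-Shot Models is strikingly simple: the final weights are a linear interpolation between the zero-shot checkpoint and the fine-tuned checkpoint. A minimal sketch, using small random tensors as stand-ins for real checkpoints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two checkpoints of the same architecture.
zero_shot = {"w": rng.standard_normal((4, 4)), "b": rng.standard_normal(4)}
fine_tuned = {"w": rng.standard_normal((4, 4)), "b": rng.standard_normal(4)}

def ensemble_weights(theta_zs, theta_ft, alpha=0.5):
    """Interpolate two checkpoints parameter-by-parameter.

    alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one.
    """
    return {k: (1 - alpha) * theta_zs[k] + alpha * theta_ft[k] for k in theta_zs}

ensembled = ensemble_weights(zero_shot, fine_tuned, alpha=0.5)
```

Because the result is a single set of weights rather than an ensemble of models, inference cost is unchanged, which is a large part of the method's appeal.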
Other papers worth noting from CVPR 2022 include:
We enjoyed meeting with customers, partners, and AI practitioners at CVPR 2022, and look forward to supporting their endeavors to build better, faster computer vision AI.