Given the pace of innovation in AI, teams are continually looking to integrate various data types like text, images, and video as a way to unlock new functionality for delivering next-gen applications and experiences. The development of multimodal models, which can process and understand diverse data inputs, is one of the most promising advancements.
Notably, combining video processing with the capabilities of large language models (LLMs) is a breakthrough feature for teams who want to highlight specific objects, scenes, and actions from large volumes of video content.
However, many multimodal models, such as Gemini 1.5, require teams to convert videos to 1 frame per second (FPS) for analysis. Converting the frame rate, while tedious, aligns the video data with the model’s optimal processing capabilities, ensuring that no critical information is lost while maintaining compatibility with the model’s precision.
In this blog post, we’ll show how to convert videos to 1 FPS and upload them to Labelbox Catalog for generating predictions in Model Foundry.
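There are two main approaches for ensuring videos meet the 1 FPS requirement of multimodal models like Gemini 1.5: converting the video to 1 FPS and uploading the converted video to Catalog, or extracting the video's frames at 1 FPS and uploading the frame images to Catalog.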
With either approach, you can use various multimodal models like GPT-4V, Claude 3 Opus, and Amazon Rekognition (as well as additional models natively supported by Labelbox). Note that 1 FPS is not required to use Model Foundry on video data rows, but this approach may be helpful when using certain multimodal models.
The first approach is converting a video to 1 frame per second (FPS) and uploading the converted video to Catalog.
You can follow along in this Google Colab Notebook.
Step 1: Download Video From Google Cloud Storage (GCS)
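As a minimal sketch of this step, assuming the `google-cloud-storage` Python client is installed and application default credentials are configured, the snippet below splits a `gs://` URI into its bucket and object name and downloads the object to a local folder. The function names and the `downloads` folder are illustrative choices, not taken from the notebook.

```python
from pathlib import Path
from urllib.parse import urlparse


def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/file.mp4' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a valid gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_video(uri: str, dest_dir: str = "downloads") -> Path:
    """Download one object from GCS and return the local file path."""
    # Imported here so parse_gcs_uri works without the package installed.
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    bucket_name, object_name = parse_gcs_uri(uri)
    local_path = Path(dest_dir) / Path(object_name).name
    local_path.parent.mkdir(parents=True, exist_ok=True)

    client = storage.Client()  # uses application default credentials
    blob = client.bucket(bucket_name).blob(object_name)
    blob.download_to_filename(str(local_path))
    return local_path
```

Calling `download_video("gs://my-bucket/videos/clip.mp4")` with a hypothetical bucket and object would write the file to `downloads/clip.mp4`.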
Step 2: Convert the Video to 1 FPS
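One way to do the conversion is to call the `ffmpeg` command-line tool from Python and apply its `fps` video filter. This is a sketch under the assumption that `ffmpeg` is installed and on your `PATH`; the notebook may use a different tool, and the function names here are illustrative.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, fps: int = 1) -> list[str]:
    """Build an ffmpeg command that resamples `src` to `fps` frames per second."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", src,            # input video
        "-vf", f"fps={fps}",  # fps filter: drop frames to reach the target rate
        dst,
    ]


def convert_to_1fps(src: str, dst: str) -> Path:
    """Run ffmpeg and return the path of the converted video."""
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst, fps=1), check=True)
    return Path(dst)
```

The `fps` filter keeps the video's duration and drops frames to reach the target rate, so a 60-second clip comes out with roughly 60 frames.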
Step 3: Upload the Converted Video Back to GCS
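The upload can be sketched with the same `google-cloud-storage` client used for downloads. The `_1fps` suffix, the function names, and the `video/mp4` content type are assumptions made for illustration; adjust them to your own naming scheme and file format.

```python
from pathlib import PurePosixPath


def converted_object_name(original_name: str, suffix: str = "_1fps") -> str:
    """Derive a name for the converted video, e.g. videos/clip.mp4 -> videos/clip_1fps.mp4."""
    path = PurePosixPath(original_name)
    return str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))


def upload_video(local_path: str, bucket_name: str, object_name: str) -> str:
    """Upload a local file to GCS and return its gs:// URI."""
    # Imported here so converted_object_name works without the package installed.
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    client = storage.Client()  # uses application default credentials
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path, content_type="video/mp4")
    return f"gs://{bucket_name}/{object_name}"
```

Writing the converted file under a new object name keeps the original video untouched in the bucket.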
Step 4: Upload to Catalog
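Here is a hedged sketch of the Catalog upload using the Labelbox Python SDK (`pip install labelbox`). It assumes each URL is reachable by Labelbox, for example through a signed HTTPS URL or a configured cloud storage integration. Using the file name as the global key and setting `media_type` explicitly are illustrative choices rather than requirements stated in this post.

```python
def build_video_rows(video_urls: list[str]) -> list[dict]:
    """Build one data row payload per converted video."""
    rows = []
    for url in video_urls:
        filename = url.rsplit("/", 1)[-1]
        rows.append({
            "row_data": url,         # where Labelbox fetches the video from
            "global_key": filename,  # assumption: file names are unique in your workspace
            "media_type": "VIDEO",
        })
    return rows


def upload_videos_to_catalog(api_key: str, dataset_name: str, video_urls: list[str]) -> str:
    """Create a dataset, add the videos as data rows, and return the dataset ID."""
    # Imported here so build_video_rows works without the SDK installed.
    import labelbox as lb

    client = lb.Client(api_key=api_key)
    dataset = client.create_dataset(name=dataset_name)
    task = dataset.create_data_rows(build_video_rows(video_urls))
    task.wait_till_done()  # the import runs asynchronously
    if task.errors:
        raise RuntimeError(f"Data row import failed: {task.errors}")
    return dataset.uid
```

Once the task completes, the videos appear in Catalog under the new dataset.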
The second approach is converting a video to 1 frame per second (FPS), extracting video frames, and uploading the extracted video frame images to Catalog.
You can follow along in this Google Colab Notebook.
Step 1: Download Video From Google Cloud Storage (GCS)
Step 2: Extract Video Frames at 1 FPS
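A possible implementation with OpenCV (`pip install opencv-python`) is sketched below: it reads the video's native frame rate, works out which frame falls closest to each whole second, and saves only those frames as JPEGs. The function names and the `frame_<index>.jpg` naming are assumptions for illustration.

```python
from pathlib import Path


def frame_indices_at_1fps(total_frames: int, native_fps: float) -> list[int]:
    """Return the index of the frame closest to each whole second of video."""
    if native_fps <= 0:
        raise ValueError("native_fps must be positive")
    indices = []
    second = 0
    while True:
        index = round(second * native_fps)
        if index >= total_frames:
            break
        indices.append(index)
        second += 1
    return indices


def extract_frames_1fps(video_path: str, out_dir: str) -> list[Path]:
    """Save one JPEG per second of video and return the saved paths."""
    # Imported here so frame_indices_at_1fps works without OpenCV installed.
    import cv2

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")
    saved = []
    try:
        native_fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        wanted = set(frame_indices_at_1fps(total, native_fps))
        index = 0
        while True:
            ok, frame = cap.read()  # read sequentially; seeking can be imprecise
            if not ok:
                break
            if index in wanted:
                path = out / f"frame_{index}.jpg"
                cv2.imwrite(str(path), frame)
                saved.append(path)
            index += 1
    finally:
        cap.release()
    return saved
```

For a 3-second clip recorded at 30 FPS (90 frames), this keeps frames 0, 30, and 60.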
Step 3: Rename and Organize Frames
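As a sketch of this step, the function below moves extracted frames into one folder per video and renames them with a zero-padded sequence number so they sort in temporal order. It assumes the extracted files are JPEGs whose names end in a frame number (for example `frame_30.jpg`); the folder layout and naming pattern are illustrative.

```python
import re
import shutil
from pathlib import Path


def organize_frames(frames_dir, video_name: str, out_root="frames") -> list[Path]:
    """Move frames to <out_root>/<video_name>/ as <video_name>_00000.jpg, _00001.jpg, ..."""
    dest = Path(out_root) / video_name
    dest.mkdir(parents=True, exist_ok=True)

    def frame_number(path: Path) -> int:
        # Sort numerically so frame_120 comes after frame_30, not before it.
        match = re.search(r"(\d+)$", path.stem)
        return int(match.group(1)) if match else -1

    frames = sorted(Path(frames_dir).glob("*.jpg"), key=frame_number)
    renamed = []
    for second, frame in enumerate(frames):
        target = dest / f"{video_name}_{second:05d}.jpg"
        shutil.move(str(frame), str(target))
        renamed.append(target)
    return renamed
```

Because the frames were sampled at 1 FPS, the sequence number in each file name also matches the second of video the frame came from.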
Step 4: Upload Frames to GCS
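Uploading a folder of frames can be sketched as a loop over the JPEG files, again assuming the `google-cloud-storage` client and application default credentials. The object prefix and function names are hypothetical.

```python
from pathlib import Path


def plan_frame_uploads(frames_dir, prefix: str) -> list[tuple[Path, str]]:
    """Pair each local JPEG with the GCS object name it will be uploaded to."""
    clean_prefix = prefix.strip("/")
    frames = sorted(Path(frames_dir).glob("*.jpg"))
    return [(frame, f"{clean_prefix}/{frame.name}") for frame in frames]


def upload_frames(frames_dir, bucket_name: str, prefix: str) -> list[str]:
    """Upload every frame in `frames_dir` and return the resulting gs:// URIs."""
    # Imported here so plan_frame_uploads works without the package installed.
    # Requires: pip install google-cloud-storage
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    uris = []
    for local_path, object_name in plan_frame_uploads(frames_dir, prefix):
        blob = bucket.blob(object_name)
        blob.upload_from_filename(str(local_path), content_type="image/jpeg")
        uris.append(f"gs://{bucket_name}/{object_name}")
    return uris
```

Keeping each video's frames under their own prefix makes it easy to trace a frame back to its source video later.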
Step 5: Upload to Catalog
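For frames, each image becomes its own data row. The sketch below uses the Labelbox Python SDK (`pip install labelbox`) and assumes every frame URL is reachable by Labelbox. Because global keys must be unique, the helper rejects duplicate file names before anything is sent; deriving keys from file names and setting `media_type` explicitly are illustrative choices.

```python
def build_frame_rows(frame_urls: list[str]) -> list[dict]:
    """Build one image data row payload per frame, rejecting duplicate keys."""
    rows = []
    seen = set()
    for url in frame_urls:
        key = url.rsplit("/", 1)[-1]
        if key in seen:
            raise ValueError(f"Duplicate global key: {key}")
        seen.add(key)
        rows.append({"row_data": url, "global_key": key, "media_type": "IMAGE"})
    return rows


def upload_frames_to_catalog(api_key: str, dataset_name: str, frame_urls: list[str]) -> str:
    """Create a dataset, add the frames as data rows, and return the dataset ID."""
    # Imported here so build_frame_rows works without the SDK installed.
    import labelbox as lb

    rows = build_frame_rows(frame_urls)  # validate before creating anything
    client = lb.Client(api_key=api_key)
    dataset = client.create_dataset(name=dataset_name)
    task = dataset.create_data_rows(rows)
    task.wait_till_done()  # the import runs asynchronously
    if task.errors:
        raise RuntimeError(f"Data row import failed: {task.errors}")
    return dataset.uid
```

Prefixing frame file names with the video name, as in Step 3, helps keep keys unique when frames from several videos share one dataset.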
When deciding between uploading 1 FPS videos or extracted video frames to Catalog, there are some important features to consider:
For 1 FPS Videos
For Extracted Video Frames
Once a video dataset has been converted to 1 FPS using either of the two approaches above, you can use Gemini 1.5 and other multimodal models to label video efficiently and classify frames accurately, improving data insights and model training.
In this blog post, we explored the importance of preparing video data for multimodal models like Gemini 1.5, which analyze video data at 1 frame per second (FPS). Converting videos to that rate ensures compatibility with the model's processing capabilities for accurate and efficient analysis.
Choosing between uploading 1 FPS videos and extracted video frames depends on your project's specific needs. As a rule of thumb, uploading videos preserves temporal context and simplifies file management, while extracting frames allows for detailed analysis and greater control, but with more file handling.
By understanding these considerations, you can effectively leverage multimodal models like Gemini 1.5, optimizing your workflow for enhanced performance and accuracy in video classification tasks.
If you are not already using Labelbox, you can get started for free or contact us to learn more about using multimodal models for better video classification.