# Hustvl

## Models by this creator


### yolos-tiny

**Creator:** hustvl · **Total Score:** 199

The yolos-tiny model is a lightweight object detection model based on the YOLOS architecture, fine-tuned on the COCO 2017 object detection dataset of 118k annotated images. YOLOS is a Vision Transformer (ViT) trained using the DETR loss, a simple yet effective approach to object detection. Despite this simplicity, the base-sized YOLOS model achieves 42 AP on the COCO validation set, on par with more complex frameworks like Faster R-CNN.

YOLOS trains its object detection heads with a "bipartite matching loss": the predicted class and bounding box of each of the 100 object queries are compared to the ground-truth annotations, with the Hungarian matching algorithm producing an optimal one-to-one mapping between predictions and targets. The model parameters are then optimized using standard cross-entropy loss for the classes and a combination of L1 and generalized IoU loss for the bounding boxes. Compared to similar models like DETR and YOLO-World, yolos-tiny stands out for its small size and strong performance on the COCO dataset.

#### Model inputs and outputs

Inputs

- **Images**: The model takes individual images as input, which are expected to be preprocessed and resized to a fixed size.

Outputs

- **Object logits**: Class logits for each of the 100 object queries.
- **Bounding boxes**: Bounding box coordinates for each of the 100 object queries.

#### Capabilities

The yolos-tiny model can be used for real-time object detection in images. It detects a wide variety of objects from the COCO dataset, including common household items, animals, and vehicles. Its compact size makes it suitable for deployment on edge devices and in mobile applications.

#### What can I use it for?

You can use the yolos-tiny model for a variety of object detection tasks, such as:

- **Surveillance and security**: Detect and track objects of interest in real-time video feeds.
- **Autonomous vehicles**: Identify and localize objects like pedestrians, cars, and traffic signals to enable safe navigation.
- **Robotics and automation**: Integrate the model into robotic systems to enable interaction with and manipulation of objects in the environment.
- **Retail and inventory management**: Monitor product stock and detect misplaced items in stores and warehouses.

See the model hub to explore other available YOLOS models that may fit your specific use case.

#### Things to try

One interesting aspect of the YOLOS architecture is its use of object queries to detect objects in an image. This differs from traditional object detection frameworks that rely on predefined anchor boxes or region proposals; by directly predicting a class and bounding box for each object query, YOLOS can handle a variable number of objects more flexibly and potentially more efficiently.

You could experiment with the model's performance on different kinds of images, such as scenes with many objects or images with significant occlusion or clutter. Evaluating the model's robustness across diverse real-world scenarios helps clarify its strengths and limitations. You could also investigate ways to further optimize yolos-tiny for deployment on resource-constrained devices, for example through model quantization or distillation.
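To make the input/output description above concrete, here is a minimal inference sketch using the `YolosImageProcessor` and `YolosForObjectDetection` classes from the Hugging Face `transformers` library; the COCO test image URL and the 0.9 confidence threshold are illustrative choices, not part of the model card.

```python
import requests
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

# Load the model and its paired image processor from the Hugging Face Hub
processor = YolosImageProcessor.from_pretrained("hustvl/yolos-tiny")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")

# Any RGB image works; this COCO validation image is just an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # outputs.logits and outputs.pred_boxes, one per object query

# Convert the 100 per-query predictions into thresholded detections
# rescaled to the original image size
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(v, 2) for v in box.tolist()]
    print(f"{model.config.id2label[label.item()]}: {score:.3f} at {box}")
```

The `post_process_object_detection` step is what turns the raw object-query logits and normalized boxes described above into a usable list of detections.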


Updated 5/28/2024


### yolos-small

**Creator:** hustvl · **Total Score:** 51

The yolos-small model is a Vision Transformer (ViT) trained using the DETR loss. Despite this simplicity, the base-sized YOLOS model achieves 42 AP on COCO validation 2017, similar to DETR and to more complex frameworks like Faster R-CNN. The model is trained with a "bipartite matching loss" that compares the predicted classes and bounding boxes of 100 object queries to the ground-truth annotations, which allows it to detect objects in an image effectively.

yolos-small is part of a family of YOLOS models that also includes the smaller yolos-tiny. The yolos-fashionpedia model is a version of YOLOS fine-tuned for fashion object detection on the Fashionpedia dataset, and another related model is DETR with a ResNet-101 backbone, which achieves a higher 43.5 AP on COCO validation.

#### Model inputs and outputs

Inputs

- **Images**: The model takes RGB images as input.

Outputs

- **Object detection**: Predicted bounding boxes and class labels for the objects detected in the input image.
- **Logits**: Class logits for the detected objects.

#### Capabilities

The yolos-small model detects a wide range of common objects in images, including household items, animals, and people. It localizes these objects with bounding boxes and classifies them into the 80 COCO categories, making it a versatile model for computer vision tasks such as object detection and image analysis.

#### What can I use it for?

You can use the yolos-small model for object detection in your computer vision applications. For example, you could build an app that automatically identifies and localizes objects in images, which could be useful for tasks like inventory management, security monitoring, or even self-driving car development.

#### Things to try

One interesting thing to try with the yolos-small model is to explore its performance on images beyond the standard COCO dataset. You could fine-tune the model on a more specialized dataset, such as the Fashionpedia dataset used for the yolos-fashionpedia model, to see whether that improves its detection accuracy for fashion-related objects.

You could also experiment with different inference settings, such as adjusting the confidence threshold or applying non-maximum suppression, to see how they affect the model's precision and recall; a small experiment along these lines is sketched below. This can help you tune the model's behavior for your specific use case.
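As a quick way to probe that precision/recall trade-off, you can post-process the same raw model outputs at several confidence thresholds and compare the number of surviving detections. A minimal sketch using the `transformers` API, where the threshold values and the test image are arbitrary choices:

```python
import requests
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

processor = YolosImageProcessor.from_pretrained("hustvl/yolos-small")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-small")

# Any test image works; this COCO validation image is just an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Re-threshold the same predictions: lower thresholds favor recall,
# higher thresholds favor precision
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
for threshold in (0.5, 0.7, 0.9):
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]
    print(f"threshold={threshold}: {len(results['scores'])} detections")
```

Because the forward pass runs only once, sweeping the threshold this way is cheap and makes it easy to pick an operating point for a given application.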


Updated 6/5/2024