Image Annotation Types in Computer Vision

The most important decision a computer vision expert makes is which image annotation type to use to build the most accurate model(s) for object detection.

You may need to run the same collected data through different image annotation tools, or re-annotate it entirely with a different approach, to validate which yields higher performance.

At Playment, we take the human-in-the-loop approach to Artificial Intelligence.

While talking to clients, we are often asked which sort of tool we recommend for a particular project. We’ve seen plenty of image annotation tasks that solve for a wide variety of use cases.

When you’re building your own labeled training data sets at large scale, it’s helpful to familiarize yourself with the right image annotation tool and its usage: how other companies gather training data, how they choose image annotation types for specific use cases, and so on.

But which image annotation type is right for your project? It all depends on your use case.

Image Annotation Tool Comparison Chart

Bounding Box Annotation

As the name suggests, the labeler is asked to draw a box around the objects of interest based on the requirements of the data scientist. Object classification and localization models can be trained using bounding boxes.

When to use:

  1. Sports analytics – box the football players and then classify them by team.
  2. Tagging construction-site tools to analyze construction site safety.
  3. Tagging damaged car parts to enable insurance claims.
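A bounding box is typically stored as corner coordinates, and box quality is commonly scored with Intersection-over-Union (IoU). A minimal sketch in Python; the annotation fields shown are hypothetical, not a fixed format:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # clamp to zero so non-overlapping boxes contribute no intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A hypothetical annotation record for one boxed football player:
annotation = {"label": "player", "team": "home", "box": (120, 40, 180, 160)}
```

An IoU threshold (often around 0.5) is a common acceptance criterion when reviewing labeler output against a gold-standard box.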

Polygonal Segmentation

Polygonal segmentation masks are mainly used to annotate objects with irregular shapes. Unlike boxes, which can capture a lot of unnecessary background around the target and confuse your computer vision models during training, polygons are more precise when it comes to localization.
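The precision gain is easy to quantify: for an irregular shape, the polygon’s area is much closer to the true object than its bounding box. A small illustration using the shoelace formula; the L-shaped example is made up for demonstration:

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula; points are ordered (x, y) vertices."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bbox_area(points):
    """Area of the axis-aligned bounding box enclosing the same vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# An L-shaped object: the polygon covers 3 units squared, its bounding box 4.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

Here a quarter of the bounding box is background the model would wrongly learn as “object”; the polygon carries none of that slack.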

Line Annotation

Line annotation (a.k.a. lane annotation), as the name suggests, is used to draw lanes to train vehicle perception models for lane detection. Unlike bounding boxes, lines avoid capturing large amounts of empty space and additional noise.
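A lane is usually annotated as an ordered polyline of points. A minimal sketch; the `lane` record below is a hypothetical format, not a standard:

```python
import math

def polyline_length(points):
    """Total length of a lane polyline given as ordered (x, y) points."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))

# A hypothetical lane annotation: an ordered point list in image coordinates,
# running from the bottom of the frame toward the horizon.
lane = {"label": "ego_lane_left", "points": [(100, 720), (140, 540), (170, 360)]}
```

Storing points in a consistent order (e.g., near to far) matters: many lane-detection pipelines fit a curve through the points and assume a fixed direction.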

Landmark Annotation

Dot annotation (a.k.a. landmark annotation) is used to detect shape variations and to count minute objects.

When to use:

  1. Detecting and recognizing facial features
  2. Tracking human body parts in motion
  3. Emotion and gesture recognition
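Landmark annotations are usually stored as named 2D points. A small sketch for the facial-feature case; the point names and coordinates are illustrative, and inter-ocular distance is one common way to normalize landmark errors:

```python
import math

# Hypothetical facial landmark annotation: named points in pixel coordinates.
face_landmarks = {
    "left_eye": (110, 95),
    "right_eye": (170, 95),
    "nose_tip": (140, 130),
    "mouth_left": (118, 160),
    "mouth_right": (162, 160),
}

def interocular_distance(landmarks):
    """Distance between the eyes, commonly used to scale-normalize landmark errors."""
    return math.dist(landmarks["left_eye"], landmarks["right_eye"])
```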

3D Cuboids

3D cuboids are used to estimate the depth/distance of a vehicle.
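A 3D cuboid is often parameterized by a center, dimensions, and a yaw angle, from which the eight corners can be derived. A sketch under that assumption (the parameter names here are ours, not a fixed schema):

```python
import math

def cuboid_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a 3D cuboid centered at (cx, cy, cz),
    rotated by `yaw` radians around the vertical (z) axis."""
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                x, y, z = sx * length, sy * width, sz * height
                # rotate in the ground plane, then translate to the center
                xr = x * math.cos(yaw) - y * math.sin(yaw)
                yr = x * math.sin(yaw) + y * math.cos(yaw)
                corners.append((cx + xr, cy + yr, cz + z))
    return corners
```

The distance to the annotated vehicle then falls out directly as the norm of the center (cx, cy, cz) in the sensor’s coordinate frame.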

Semantic Segmentation

Semantic segmentation (or pixel-level labeling) is used to label each and every pixel in the image. Unlike polygonal segmentation, which is devised specifically to detect defined object(s) of interest, full semantic segmentation provides a complete understanding of every pixel in the scene.

When to use:

    1. For detection and localization of specific objects at the pixel level.
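A semantic segmentation label is simply a map assigning one class id to every pixel. A tiny sketch with made-up class ids, including a sanity check that every pixel is covered:

```python
from collections import Counter

# A tiny 4x4 label map: every pixel carries a class id.
# The ids (0 = road, 1 = car, 2 = sky) are illustrative, not a fixed standard.
label_map = [
    [2, 2, 2, 2],
    [2, 2, 2, 2],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def class_pixel_counts(label_map):
    """Count how many pixels each class covers -- a quick QA check that
    every pixel received exactly one label and no class is missing."""
    counts = Counter()
    for row in label_map:
        counts.update(row)
    return dict(counts)
```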

Best Practices to follow

  1. Use your own, proprietary data. Open datasets can only take you to the 80th percentile; to solve real-life problems you need your own data.
  2. Polygons should be as tight as possible, as this will improve your model’s speed and accuracy.
  3. Specify the minimum height & width of the bounding boxes.
  4. Choose classes that are internationally accepted and easy to understand (e.g., instead of “vehicle”, label as Sedan/Hatchback).
  5. For moving objects (e.g., for autonomous driving), use bounding boxes or 3D cuboids.
  6. For static objects, use polygons and lines.
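Practice 3 above can be enforced automatically during quality assurance, for example by flagging boxes below a minimum size. A sketch with hypothetical thresholds:

```python
MIN_BOX_W, MIN_BOX_H = 8, 8  # example thresholds in pixels; tune per project

def undersized_boxes(annotations):
    """Return annotations whose boxes fall below the minimum width or height.
    Boxes are (x_min, y_min, x_max, y_max)."""
    flagged = []
    for ann in annotations:
        x_min, y_min, x_max, y_max = ann["box"]
        if (x_max - x_min) < MIN_BOX_W or (y_max - y_min) < MIN_BOX_H:
            flagged.append(ann)
    return flagged
```

Running such a check on every labeling batch catches accidental slivers before they poison the training set.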