Visual Recognition Project

This project builds a dress recognition system that identifies the type of dress a person is wearing in an image.

Given images of people walking through a doorway, the project's aim was to recognize, in real time, the dresses worn by the people entering. Two solutions were explored. The first was to use a person detector to find a bounding box for each person and then run a classification model on the contents of that box. This idea was scrapped in favour of building a one-shot detector, which directly detects all the dresses present in the image. In particular, this project achieves the following:

  • An exploration of two deep-learning-based models, ResNet and YOLO (You Only Look Once)
  • The application of both models to the given problem
  • An average prediction time of 8 ms per image on an NVIDIA GTX 1050 Ti, which is sufficient for real-time prediction
  • The documentation of data, training steps and results, as presented in the report
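
The difference between the two explored solutions can be sketched in a few lines of Python. This is only an illustration of the control flow, not the project's code: `detect_people`, `classify_dress` and `detect_dresses` are hypothetical callables standing in for the trained models, and the image is a plain nested list so that the sketch runs without any deep learning library.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Image = Sequence[Sequence[int]]   # toy image: rows of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the part of the image inside box, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return [list(row[x0:x1]) for row in image[y0:y1]]


def two_stage(
    image: Image,
    detect_people: Callable[[Image], Iterable[Box]],      # hypothetical model
    classify_dress: Callable[[List[List[int]]], str],     # hypothetical model
) -> List[Tuple[Box, str]]:
    """Solution 1: find each person, then classify the crop of that person."""
    return [(box, classify_dress(crop(image, box))) for box in detect_people(image)]


def one_shot(
    image: Image,
    detect_dresses: Callable[[Image], Iterable[Tuple[Box, str]]],  # hypothetical
) -> List[Tuple[Box, str]]:
    """Solution 2: a single detector returns boxes and dress labels directly."""
    return list(detect_dresses(image))
```

Both functions return the same kind of result, a list of (box, label) pairs. The two-stage version runs the classifier once per detected person, while the one-shot version makes a single pass over the image.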

An important point to note is that the model works best on images of a certain orientation and type (looking straight at the dresses). The report that details the entire project can be found here.
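
As a rough guide to how an average prediction time such as the 8 ms figure above can be measured and turned into a throughput figure, here is a minimal sketch. It assumes a hypothetical `predict` callable that wraps the model, and it is not the benchmarking code used in the project. When timing on a GPU, the framework usually has to be synchronized before the clock is read, which this sketch leaves to `predict`.

```python
import time
from statistics import mean
from typing import Any, Callable, Sequence


def average_latency_ms(
    predict: Callable[[Any], Any],   # hypothetical wrapper around the model
    images: Sequence[Any],
    warmup: int = 2,
) -> float:
    """Average wall-clock time of predict(image) in milliseconds."""
    if not images:
        raise ValueError("at least one image is required")
    # Warm-up calls are not timed, since first calls often include setup cost.
    for image in images[:warmup]:
        predict(image)
    timings = []
    for image in images:
        start = time.perf_counter()
        predict(image)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def frames_per_second(latency_ms: float) -> float:
    """Throughput when images are processed one at a time."""
    return 1000.0 / latency_ms
```

For example, `frames_per_second(8.0)` gives 125.0, which is well above the frame rate of a typical camera stream.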