# Dog Breed Detector
In this tutorial, you will build a detection model able to recognize dog breeds in video frames.
You will learn how to:
- add images and annotations separately,
- train your model using the Darknet framework,
- test your detector in real-time.
To create the model, use the publicly available Stanford Dogs Dataset, which contains 20,580 images of 120 dog breeds. The dataset consists of images and annotations collected in separate folders.
# Unzipping the dataset
The Stanford Dogs Dataset is distributed as two separate .tar archives, Dogs_Images and Dogs_Annotation. Although our portal supports this file extension, the two archives cannot be uploaded at the same time, which means the photos in the dataset would end up without annotations. You need to extract both archives and upload them as a data directory. To do this, you can use a program such as 7-Zip or WinRAR.
If you are using 7-Zip, right-click the archive to expand the context menu and select the Extract Here option under the 7-Zip submenu.
Keep in mind: In this tutorial, we show only one of the options for dealing with this problem. Depending on the system and the program used, the options may differ.
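If you prefer to script this step, here is a minimal sketch using Python's standard-library tarfile module; the archive and target directory names are assumptions matching this tutorial:

```python
import tarfile
from pathlib import Path

# Extract both archives into a common data directory.
# Archive and directory names are assumptions for this tutorial.
target = Path("stanford_dogs")
for archive in ("Dogs_Images.tar", "Dogs_Annotation.tar"):
    with tarfile.open(archive) as tar:
        tar.extractall(path=target)
```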
# Adding the dataset
In the Dataset section of the One Step AI menu, click Add new dataset and create a new collection called my_dogs.
Then load the directory containing the images and annotations.
You can add object annotations for detection in Pascal VOC format (files with the .xml extension).
An image and its annotation file must have the same name; only then can the annotation information be linked to the image itself.
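To get a feel for the format, here is a small sketch that reads one Pascal VOC annotation with Python's standard library; the file name used is hypothetical:

```python
import xml.etree.ElementTree as ET

# Parse a single Pascal VOC annotation (the file name is hypothetical).
root = ET.parse("n02085620_7.xml").getroot()

print(root.findtext("filename"))       # name of the annotated image
for obj in root.iter("object"):
    label = obj.findtext("name")       # object class, e.g. the breed
    box = obj.find("bndbox")           # bounding box in pixel coordinates
    print(label,
          box.findtext("xmin"), box.findtext("ymin"),
          box.findtext("xmax"), box.findtext("ymax"))
```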
You can merge the two directories for the selected dog breed into one directory yourself, or add the image and annotation directories in two steps during the upload phase. The upload order does not matter.
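Before uploading, you can check the naming rule with a quick sketch like the one below; the directory paths are assumptions based on how the archives extract:

```python
from pathlib import Path

# Directory paths are assumptions; adjust them to your extracted layout.
images = {p.stem for p in Path("Images/n02085620-Chihuahua").glob("*.jpg")}
annots = {p.stem for p in Path("Annotations/n02085620-Chihuahua").glob("*.xml")}

print("images without annotations:", sorted(images - annots))
print("annotations without images:", sorted(annots - images))
```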
Click Upload images for the newly created dataset, or Add more images for datasets added earlier.
Select Add directory, then locate the folder for your chosen dog breed inside the Images subdirectory.
To add annotations, use Add directory again and select the same dog breed folder inside the Annotations subdirectory.
Wait for the site to validate the uploaded annotations and pair them with the previously uploaded images. The thumbnails will then show the images with their annotations.
# Choosing the model
To create the model, go to the Trainings > Models menu and click Add a new model. Select Detection.
In the first step, select the dataset you want to use to train the model.
Now you can move on to the next step: parametrize the model by assigning a name, a framework (Darknet), and one of the available pretrained models.
Accept the default settings in the Basic tab, or change them in the Basic and/or Advanced tabs as you wish.
Start your training without selecting any categories.
When the training is done, convert the model so that it is compatible with the NVIDIA Jetson Nano. To do this, go to the model details and, in the CONVERSION section, select the NVIDIA Maxwell architecture.
This is the first conversion step, which produces the universal ONNX format. The second step happens on the device during the first use of the model: a conversion from ONNX to a TensorRT engine, which optimizes the model and maximizes neural network performance on the Jetson Nano.
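The on-device step runs automatically, but for reference, a rough sketch of an equivalent ONNX-to-TensorRT build with the TensorRT 8.x Python API (file names are hypothetical) might look like this:

```python
import tensorrt as trt

# A sketch of ONNX -> TensorRT conversion; the platform does this for you.
# Assumes TensorRT 8.x; file names are hypothetical.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 helps on the Jetson Nano
engine = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:
    f.write(engine)
```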
# Testing your idea
Once the conversion is complete, go to the Live Testing tab and select Login to access Nano. This will open a new browser tab with a testing web app that runs on your device.
Your model will be available for download to the device:
Click on the model tile to download and run the model. Now you can upload the input data you will use to test your model. To do this, click Upload file and select the file.
Since the input data is a video, the web app will open a built-in video player. There you will see notifications about the progress of the second stage of model conversion; this happens only on the first run.
In the video now playing in the player, you can see the breed labels assigned to the dogs that the model managed to find.
This is what a frame of the model's output looks like in the test application's media player.
The results for photos are similar: