Duplicate annotations in test samples
Problem description
In the test split of the CUHK-SYSU dataset, some queries also appear in their own gallery. These galleries contain only 100 images each, so removing the duplicate leaves a gallery with only 99 frames.
The person ids concerned are 484 and 10354.
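The diagnosis can be sketched as follows. This is a minimal illustration, not the code used in this repo: the `TestSamples` layout (person id mapped to a query frame and a list of gallery frames), the function name, and the frame names are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: person id -> (query frame name, gallery frame names).
TestSamples = Dict[int, Tuple[str, List[str]]]


def find_query_in_gallery(samples: TestSamples) -> List[int]:
    """Return the person ids whose query frame also appears in their gallery."""
    return sorted(
        person_id
        for person_id, (query_frame, gallery_frames) in samples.items()
        if query_frame in gallery_frames
    )


# Toy data with made-up ids and frame names, only to illustrate the check.
toy_samples: TestSamples = {
    1: ("s1.jpg", ["s2.jpg", "s3.jpg"]),
    2: ("s4.jpg", ["s4.jpg", "s5.jpg"]),  # query frame duplicated in the gallery
}
duplicated_ids = find_query_in_gallery(toy_samples)
print(duplicated_ids)
```

Run on the real test split, a check of this kind should flag exactly the two person ids listed above.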
Solutions
We have 2 possible solutions in mind: remove these annotations from the test set, or replace the removed duplicate with a distractor taken from a bigger gallery version of the dataset. We discuss both below.
Remove the test samples
This solution is already used in repos that rely on the SYSU annotations (PSTR and AlignPS skip these annotations without size). Since we do not use the .mat files, we cannot be sure; their code may instead keep the galleries with the query inside them.
In both cases, this concerns 2 samples out of 2900 test samples (that is, 2900 person ids), which makes it marginal. It could be overlooked. The authors of these papers may not even be aware of this dataset error.
However, the current repo is not a model repo; it is a merged dataset. So if the dataset can be fixed, it should be. The fix is not hard, so it is manageable to do it here. Thankfully, CUHK-SYSU is of high quality, and these are the only 2 exceptions.
Replace the duplicate gallery frame by a distractor
CUHK-SYSU comes in different versions, so that researchers can vary the size of the gallery. As a result, some available frames are not part of any existing gallery. We can add one of them as a distractor, that is, a gallery frame that does not contain the query person.
For transparency, we can show in a notebook how we diagnose the problem and how we pick the distractor. Then we can hard-code the added distractor for these 2 specific person ids. This does not seem hard to do.
However, we have to keep a few checks in mind for the selected distractor:
- We should have the frame of the added distractor.
- The frame should not appear in any other gallery, as is the case for every test/train sample. We do not want galleries to overlap.
Since these conditions can be easily checked and the fix of the test dataset is straightforward, we take this option.
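The checks on the distractor can be sketched as follows. This is a hedged illustration under assumed data structures: `pick_distractor`, its parameters, and the frame names are hypothetical and do not come from the CUHK-SYSU tooling.

```python
from typing import Dict, Iterable, List, Optional, Set


def pick_distractor(
    candidate_frames: Iterable[str],
    available_frames: Set[str],
    galleries: Dict[int, List[str]],
    frames_showing_person: Set[str],
) -> Optional[str]:
    """Return the first candidate frame that passes every check, or None.

    A candidate is accepted only if we actually have the frame, it is not
    used by any existing gallery, and it does not show the query person.
    """
    used_frames = {frame for frames in galleries.values() for frame in frames}
    # Sorting makes the choice deterministic, so it can be hard-coded later.
    for frame in sorted(candidate_frames):
        if frame not in available_frames:
            continue  # we do not have this frame
        if frame in used_frames:
            continue  # galleries must not overlap
        if frame in frames_showing_person:
            continue  # a distractor must not contain the query person
        return frame
    return None


# Toy data with made-up ids and frame names, only to illustrate the checks.
galleries = {1: ["a.jpg", "b.jpg"], 2: ["c.jpg"]}
available = {"a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"}
candidates = ["b.jpg", "d.jpg", "e.jpg", "f.jpg"]
distractor = pick_distractor(candidates, available, galleries, {"d.jpg"})
print(distractor)
```

Because the candidates are sorted before selection, running the sketch twice on the same inputs picks the same frame, which suits a fix that is hard-coded for 2 person ids.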