Check 1000s of Images for duplicates quickly
November 28
I came up against a problem recently: I had over 4000 images and needed to find and remove any duplicates. The images were all different sizes/resolutions, and two duplicates could show the same picture at different sizes/resolutions.
I should also add that I’m not exactly ‘Dr Maths’, nor do I have any clue about image transformations - but I needed to get rid of the dupes in a hurry.
The method I came up with in the end was pretty simple, and while not foolproof, I haven’t seen it make a mistake yet. It was also quick enough for my needs, removing all the duplicates in around 20 minutes.
The Screenshot
If you want to compare 2 directories, then fill out both directory boxes. Any duplicates you choose to delete will be removed from directory 2. If all the files are in one directory, then leave the directory 2 textbox empty.
The Downloads
CompareImages.exe - You can run this
CompareImages.zip - Full code
The Solution:
1. Build 50×50 copies of all images
Under the root directory of the images I add another directory called ‘small’. I then make 50×50 copies of all the images and put them in this directory. I do this for two reasons:
- It normalises the size of all the images - I couldn’t compare 200×200 to 300×300 images before, but now they are both 50×50.
- Using pixel-by-pixel comparison, 50×50 needs far fewer comparisons than 300×300 (2,500 vs 90,000) and is still accurate enough to detect duplicates.
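Step 1 could be sketched roughly like this. It is a minimal sketch rather than the code from the download: the module and function names (ThumbnailSketch, BuildSmallCopies) are made up for illustration, and it assumes every file directly under the root directory is an image that System.Drawing can open.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

Module ThumbnailSketch
    'Hypothetical helper: writes a 50x50 copy of every file in sRootDir
    'into a 'small' subdirectory and returns the number of copies written.
    'Assumes every file in sRootDir is a readable image.
    Function BuildSmallCopies(ByVal sRootDir As String) As Integer
        Dim sSmallDir As String = Path.Combine(sRootDir, "small")
        Directory.CreateDirectory(sSmallDir)

        Dim iCount As Integer = 0
        'GetFiles only lists the top level, so 'small' itself is not rescanned
        For Each sFile As String In Directory.GetFiles(sRootDir)
            Using bmOrig As New Bitmap(sFile)
                'This constructor scales the source to the given size and
                'ignores the aspect ratio, so every copy is exactly 50x50
                Using bmSmall As New Bitmap(bmOrig, New Size(50, 50))
                    'Keep the original extension in the name so a.jpg and a.png do not collide
                    Dim sName As String = Path.GetFileName(sFile) & ".png"
                    bmSmall.Save(Path.Combine(sSmallDir, sName), ImageFormat.Png)
                End Using
            End Using
            iCount += 1
        Next
        Return iCount
    End Function
End Module
```

Saving the copies as PNG is my choice for the sketch; it avoids adding a second round of JPEG artefacts to the thumbnails before they are compared.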
2. Compare the 50×50 images with each other
Here is where the people that know something about images will most likely shake their heads (if they haven’t started already). For the rest:
For both images, I compare every pixel to see if they have ‘almost the same’ RGB value, meaning the red, green and blue values each differ by no more than 20. If the RGB values are not similar, I increment a difference count. Once the number of differences exceeds 50, these images are not the same and I stop comparing (which saves a lot of time).
'Largest per-channel difference at which two pixels still count as similar
iDiff = 20
For iY = 0 To iHeight - 1
    For iX = 0 To iWidth - 1
        oOrigPixel = bmOrig.GetPixel(iX, iY)
        oNewPixel = bmNew.GetPixel(iX, iY)
        'Convert the Byte channels to Integer before subtracting so the result can go negative
        If Math.Abs(CInt(oOrigPixel.R) - CInt(oNewPixel.R)) > iDiff OrElse _
           Math.Abs(CInt(oOrigPixel.G) - CInt(oNewPixel.G)) > iDiff OrElse _
           Math.Abs(CInt(oOrigPixel.B) - CInt(oNewPixel.B)) > iDiff Then
            iDiffCount = iDiffCount + 1
            If iDiffCount > 50 Then
                'Images are not duplicates
                Exit Function
            End If
        End If
    Next
Next
'Images ARE duplicates
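
To show how the loop above might fit into the whole of step 2, here is a rough sketch that wraps it in a function and runs it over every pair of thumbnails. The names AreDuplicates and FindDuplicatePairs are hypothetical, not taken from the download, and the sketch assumes the ‘small’ directory contains only the 50×50 copies.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.IO

Module CompareSketch
    'Hypothetical wrapper around the pixel loop: True when no more than
    'iMaxDiffs pixels differ by more than iDiff in any colour channel.
    'Assumes both bitmaps have the same dimensions (50x50 here).
    Function AreDuplicates(ByVal bmOrig As Bitmap, ByVal bmNew As Bitmap, _
                           Optional ByVal iDiff As Integer = 20, _
                           Optional ByVal iMaxDiffs As Integer = 50) As Boolean
        Dim iDiffCount As Integer = 0
        For iY As Integer = 0 To bmOrig.Height - 1
            For iX As Integer = 0 To bmOrig.Width - 1
                Dim oOrigPixel As Color = bmOrig.GetPixel(iX, iY)
                Dim oNewPixel As Color = bmNew.GetPixel(iX, iY)
                If Math.Abs(CInt(oOrigPixel.R) - CInt(oNewPixel.R)) > iDiff OrElse _
                   Math.Abs(CInt(oOrigPixel.G) - CInt(oNewPixel.G)) > iDiff OrElse _
                   Math.Abs(CInt(oOrigPixel.B) - CInt(oNewPixel.B)) > iDiff Then
                    iDiffCount += 1
                    'Stop early: too many differing pixels already
                    If iDiffCount > iMaxDiffs Then Return False
                End If
            Next
        Next
        Return True
    End Function

    'Hypothetical driver: compares each thumbnail with every later one,
    'so each pair is checked exactly once. Returns the matching file pairs.
    Function FindDuplicatePairs(ByVal sSmallDir As String) As List(Of KeyValuePair(Of String, String))
        Dim oPairs As New List(Of KeyValuePair(Of String, String))
        Dim asFiles As String() = Directory.GetFiles(sSmallDir)
        For i As Integer = 0 To asFiles.Length - 2
            Using bmOrig As New Bitmap(asFiles(i))
                For j As Integer = i + 1 To asFiles.Length - 1
                    Using bmNew As New Bitmap(asFiles(j))
                        If AreDuplicates(bmOrig, bmNew) Then
                            oPairs.Add(New KeyValuePair(Of String, String)(asFiles(i), asFiles(j)))
                        End If
                    End Using
                Next
            End Using
        Next
        Return oPairs
    End Function
End Module
```

Checking each pair once means 4,000 images need 4000 × 3999 / 2 = 7,998,000 comparisons, which is why bailing out early matters so much. This sketch reopens each thumbnail file for every pair to keep the code short; holding the small bitmaps in memory instead would probably be faster.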
