python + opencv – how to properly compare images (via histograms)?

craig asked 5 months ago

I have a bunch of images (from the M.C. Escher collection) that I want to organize. The first step I had in mind is to group them by comparing them (some have different resolutions, shapes, etc.).

I wrote a very crude script to:
* read the files
* compute their histograms
* compare them

But the quality of the comparison is really low: files that are completely different come out as matches.

Take a look at what I wrote so far:

Preparing the histograms

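import itertools

import cv2

# files: list of image paths, built earlier in the script
files_hist = {}

for i, f in enumerate(files):
    try:
        # load the image and convert it to grayscale
        frame = cv2.imread(f)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([frame],[0],None,[4096],[0,4096])
        # scale the histogram in place to the range [0, 1]
        cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

        files_hist[f] = hist
    except Exception as e:
        print('ERROR:', f, e)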

Comparing the histograms

pairs = list(itertools.combinations(files_hist.keys(), 2))

for i, (f1, f2) in enumerate(pairs):
    correl = cv2.compareHist(files_hist[f1], files_hist[f2], cv2.HISTCMP_CORREL)

    if correl >= 0.999:
        print('MATCH:', correl, f1, f2)

Now, for example, I get a match for these 2 files:




and their correlation, using the code above, is 0.9996699595530539 (so they're practically the same 🙁)

What am I doing wrong? How can I improve the code to avoid these false matches?
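
A side note on the snippet above, offered as a sketch rather than a full diagnosis: an image loaded with `cv2.imread(f)` and converted with `cv2.COLOR_BGR2GRAY` holds 8-bit values in 0..255, so with 4096 bins over [0, 4096) every bin from 256 upward is empty in both histograms. `cv2.HISTCMP_CORREL` is a Pearson correlation across bins, and bins that are zero in both histograms tend to pull that score upward. The pure-Python example below illustrates the effect with made-up 4-bin histograms; `correlation`, `h1`, `h2` and the padding length are illustrative assumptions, not part of the original script.

```python
import math


def correlation(h1, h2):
    """Pearson correlation between two equal-length histograms."""
    n = len(h1)
    mean1 = sum(h1) / n
    mean2 = sum(h2) / n
    num = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    den = math.sqrt(
        sum((a - mean1) ** 2 for a in h1) * sum((b - mean2) ** 2 for b in h2)
    )
    return num / den


# Made-up histograms with roughly opposite shapes (illustrative values only).
h1 = [1, 5, 2, 8]
h2 = [6, 1, 7, 2]

# The same data followed by 60 bins that can never be hit, analogous to
# bins 256..4095 for an 8-bit grayscale image.
padding = [0] * 60

plain = correlation(h1, h2)
padded = correlation(h1 + padding, h2 + padding)

print(f"without empty bins: {plain:.3f}")
print(f"with empty bins:    {padded:.3f}")
```

In this toy case the score is negative without the empty bins and positive with them, even though the underlying data is identical. If the same thing is happening in the script, using 256 bins over [0, 256) in `cv2.calcHist` would be a natural first thing to try.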


Mikhail answered 5 months ago
