Remove Dupe Images by MD5 hash

I have an automated script that makes new, smaller images from larger ones (to save space) and then makes a thumbnail as a preview.
What I want to do is remove images whose hashes are in a list of MD5 hashes as they are being created. They are all exactly the same size and width and are all JPG files.

I have a list of hashes that need to be removed as duplicates, which I have put into an array. I change the computed hashes to upper case, as my list is upper case.

But when I try to use in_array it does not find the hashes, which is strange, so I have come here to find someone who can help me either figure it out or maybe loop through the list instead.

My brain just does not seem to be getting to the right way of thinking about this small script.
Thanks for any help.

My code
[php]

<?php
function stripper($element)
{
    // this will remove the whitespace from the beginning and the end of the element
    return trim($element);
}

$filename = 'md5p.txt';
$fp = fopen('./md5p.txt', 'r') or die('cannot open $filename');
if ($fp) {
    $line = explode("\n", fread($fp, filesize($filename)));
    $line = array_map("stripper", $line);
}
echo print_r($line) . "\n";

$files = scandir("./P");
$dir = ("./P/");

// remove . and ..
array_shift($files);
array_shift($files);

foreach ($files as $file) {
    $file = trim($file);
    $imagemd5 = md5($file);
    $imagemd5 = strtoupper($imagemd5);
    # echo ''.$imagemd5."\n";
    if (in_array($imagemd5, $line)) {
        echo ' found ' . $imagemd5 . ' - ' . $file . "\n";
    } else {
        echo '! found ' . $imagemd5 . ' - ' . $file . "\n";
    }
}
?>

[/php]

I think the MD5 hashes are not the same, but I can't think why.

My result:

Array ( [0] => FF53188F8DD1032971980925CCA0628F [1] => FC784650629A43B9B801DD23E00209D1 [2] => FB694F606B66288D0133BFBF057521A5 [3] => FB0072192BDBA046879FC8508992F93A [4] => F7DCB514E0D513CF136314DA8848AB03 [5] => F1F701BE74453A33207B279CC47FBB00 [6] => F15CE197F8B58A42F93567013118621E [7] => ECF95079BFE119C551888F7DCBEFCF51 [8] => E80FC9668E6328880268862757FF1490 [9] => E74EE882FA77CA70ACC496EC9E263386 [10] => E596DB731E02C01E916126D5F545539E [11] => DB6EC2070E41FC980221C657E3736C37 [12] => DB1E02FAD26FE85DF0093D1D397D8D51 [13] => D848AC606284D1FEE30069813FFDAD77 [14] => D6841744C0A34FF9D531AE5E24B0016F [15] => CFCE07FC5BECC11C9008C2D50C2B026C [16] => C779E31D068DAA116BA267155E62394E [17] => C2B39FB5F1179A6CC62BFCE495AFC94B [18] => C01B7E18EC185C123C3DEE3B35C8C0BA [19] => B76B7EA49C4069674BA543CFD2F06B7A [20] => B2AB183DAC9682EFCD3512D6CDA5E6BE [21] => AE4EE6C7771B0900396E5FA209AB4179 [22] => A6295297F8A194D956CE2941956EFD28 [23] => A00DDB837093870AB8DF0D4A6D5EDFBD [24] => 9E7934B91AB6A97BA0FDED41FAFF1099 [25] => 9C08E0A1C0076C27E67CB254E86AEB5F [26] => 99CAD24982739BC2743B05268481D7B5 [27] => 97C62273074399F03AD75A451DB7DA66 [28] => 960E4CCB2ED9538FA5EBB6D8523764FF [29] => 922C76C9B728B5661DD3CA519A49EAE8 [30] => 90AA64C60F2195982923CFA7043F8D84 [31] => 856D16016AD58096625C67FC1ECA4ED0 [32] => 845D18DCE6D6A768EFC0E3E584D0F61B [33] => 80D7E0A52FAC23F56A4B160D09886CAE [34] => 7A0C0ACA36F91DA1F32E5CA197DFFB05 [35] => 74BBA2F163B67056BA78B77368CE4511 [36] => 712A71BB025FAB1C3312DB88E3415986 [37] => 6D46069018B9AE5213468F91D6075513 [38] => 6964670221F9E521208F9505375B2E4C [39] => 66DE212CA48953028D82E3E3DBD172CB [40] => 5FACFAB5B55085379971618CCC0CEB5D [41] => 5DA1CB25F4C536FD9EC4D85011AF6623 [42] => 51BE47FD63C2DB58660EEA57377DC3F6 [43] => 4DD3BBA8A8157D3AC0A9E9F3A001BFC6 [44] => 4D50B7D943B34CF8A1EE2F1D8DE2EF59 [45] => 486882F3BE257B147E25AD37E9A3344C [46] => 4641E20E69E15A7047A3E81009B036EF [47] => 45A60436066356857296DD64DDA1C334 [48] => 
3B977B34C2509D4B29E0DE865D6E92D0 [49] => 390D3A656D7FC08FB9CCB4203D24B786 [50] => 37D399760254FF5FC1EDF4CCEBE847BD [51] => 1F418B11B7A1031C6A5FF10559CC036F [52] => 1CB5269DF7A24DCEB4D9AD88986DBA55 [53] => 1C253F8010C9CAD08AEAFB7312933BB3 [54] => 161A661AB5CAE395D391992470ED65E6 [55] => 090AA56EF6B878E72C12B82995586DB9 [56] => 0424679BC3655CF79319B64316BDC582 [57] => 00E23F82999D734D637F9BD7E2F2DABF ) 1
! found CE0A8D654EDDE772AC814CE3B33BA418 - p1011226.jpg - Array
! found 70D8EA3E070AFDCDD36F62F45DC3384D - p1011515.jpg - Array
! found 1D262C54CF1F5979CBFE0B9BB6DDA7DE - p1014117.jpg - Array
! found 1B55367360934CBB2FA5E8A3A7E41AA0 - p1017931.jpg - Array
! found F930971C7D525B0872C0ECB3103C425C - p1021665.jpg - Array
! found 0FD26B8538E50B9AE6599397CA331892 - p1026256.jpg - Array
! found D46DC9425C169814578982DCA4F16E19 - p1045134.jpg - Array
! found 3D5D95C15D03C995E44391B9C44B38B9 - p1057067.jpg - Array
! found DA3C6B908DCD252148A86399328D4BC2 - p1064431.jpg - Array
! found 3B71C95584D7B054E6FE8857DD0E0978 - p1070863.jpg - Array
! found 09479321930FFEEEBDF201FA6E5FE44C - p1073924.jpg - Array
! found 998BB90DD0E294527D17BCB2A29CD810 - p1082656.jpg - Array
! found 7B2381DA9D1FF95D92D31E0F1AD3BE87 - p1089739.jpg - Array
! found 16A2787E4E2B6CF5C1A5F86A3594B99F - p1090852.jpg - Array
! found B74A6855552A8AC9A7B3EAB479EDB018 - p1103676.jpg - Array
! found 19AF4FAA3174E63BA7E4154C3C7A5C37 - p1112097.jpg - Array
! found E4E1864C6F6EE5B15F47A1503924CB89 - p1113944.jpg - Array
! found C6A5E60598CC7676F35CCAD501189B12 - p1113947.jpg - Array
! found F9EE7E7C67A18F970FDE2CD7E120CC39 - p1143341.jpg - Array
! found 0D8FF550673973054FF0A8C34AE5622F - p1147165.jpg - Array
! found D9C06DE0B80E2E0511B38BCCFC93631C - p1161656.jpg - Array
! found 888DDDEDB7099F690919DA68CC7B9D7D - p1163603.jpg - Array
! found 5BFFD49D58AFF63F21E0249AAC38CFD4 - p1177195.jpg - Array
! found 953CBDD63DB9AE92E840045315AABD41 - p1184866.jpg - Array
! found 9C46E14E5D766EAF7D5C03CEC757DEC5 - p1201277.jpg - Array
! found BD3AE4CECCD8B5C90933C2ED3EDAB78B - p1203766.jpg - Array
! found 12BF6CC68472000F9492AD48A65E7838 - p1220802.jpg - Array
! found C0A481C177A7B8BC71E8216D3FCD30D1 - p1227288.jpg - Array
! found 9BDE1863DD75EB43FEEE0DE679346661 - p1246275.jpg - Array
! found A4923B73ADF316BD2A20BDE1776A9DF2 - p1250922.jpg - Array
! found 3DB07E965172D4AD206456835F105C64 - p1255866.jpg - Array
! found 14C938A397074C4378479CE9DE406C7B - p1258103.jpg - Array
! found 6A46A43E990A5F5CD9C57FA4B7395B24 - p1266412.jpg - Array
! found E6FC73F43D2EAA11A9E7C83D7BAE801E - p1273500.jpg - Array
! found B3BF23187F7AC06CADAF19FD72BCAF8F - p1290852.jpg - Array
! found D07BECEF42ADB142EBBE49A4B8F0EF97 - p1295837.jpg - Array
! found 503BD1709B10248C1C06B5EF90661DE6 - p1316998.jpg - Array
! found E528EADB2CF67149C8BFD306CAF5A2CB - p1347494.jpg - Array
! found 0E5A79DC001CD97B7C4FFB9C7ECC02AA - p1348103.jpg - Array
! found 55D69D11EC6E0768B343027052A3AEA5 - p1348800.jpg - Array
! found 295CD1FAF17877D41EE4DF924B98D370 - p1348802.jpg - Array
! found 22DFDD08C663A598CDEB131C31343A7A - p1352414.jpg - Array
! found F1200B8503691EDA02C5F362672D9CF2 - p1363870.jpg - Array
! found 5CFE9345BE695527779F9B3086D3D86A - p1363871.jpg - Array
! found E03EB3026B1FE07F16BA4870A7B0DD60 - p1364091.jpg - Array
! found E68A4BC3C49FFCD77CA7414439E28234 - p1370855.jpg - Array
! found F1FAB89933E9AEC57F8D5572A4A83018 - p1372379.jpg - Array
! found F81E349436C1D2C2B0BBC463DA663E1C - p1372522.jpg - Array
! found 34897B1254DB9142873B8DBB49AF0269 - p1383566.jpg - Array
! found BFC283B22551C67C0A325D0CB8AEE7CF - p1385161.jpg - Array
! found C53EA5519DB36D2C83218E775147736A - p1390195.jpg - Array
! found 3A894352F5E679F2AC0B162542BBB926 - p1411296.jpg - Array
! found DCB49DE4EEE539A5D1D43AC0944667C4 - p1422243.jpg - Array
! found 6641D4E2936CFF3522805A54ADAC53E4 - p1426799.jpg - Array
! found DCC3DDE75EFCA13CD68405FED10FC8ED - p1433134.jpg - Array
! found 1E51D9DF8A6E1D9B73EE6CA16A360715 - p1433612.jpg - Array
! found C7251166560269C9BB1B1FC75BF3E3FC - p1433613.jpg - Array
! found DCD4AB7DED1087364B19A7192770CC46 - p1433636.jpg - Array
! found 6914C845015F90C7E33DB5D6C7514B68 - p1433637.jpg - Array
! found C65B6DEFE7E13C5C923576D3EB594648 - p1453350.jpg - Array
! found ADAA9AED8290E9DFDD4344F2D2E444EC - p1453358.jpg - Array
! found A89E1852A49ECAEB2833CBCE8B580826 - p1456343.jpg - Array
! found 9C7E0BA9420FBE242CD4BF77C6CA894B - p1457641.jpg - Array
! found B3B4F18FA18F9722826EF5B848D91DFC - p1460612.jpg - Array
! found 018E493849D74A7F39FEC494B8236F97 - p1462351.jpg - Array
! found 26C82DC11A6A3374206FB5344ABF570B - p1462379.jpg - Array
! found 0D75C0126A447AB0E436D49A5776DF81 - p1490912.jpg - Array
! found C2A8202A71D22B046E7A1ED769DA3B54 - p1491405.jpg - Array
! found 7E25BC68BD63D68739C6721317478D4B - p1491468.jpg - Array
! found AE19F29CA0B960707106FF9E1B4A1CA9 - p1494390.jpg - Array
! found B6DA62E591687DEFF17983E2A7B49161 - p1494391.jpg - Array
! found 1CEAE68D83361B799486FA847613327D - p1494392.jpg - Array
! found 553BBAFCB3771BE3511D7B8290DB5DDF - p1494393.jpg - Array
! found C4841A90F6A7D079D0FFABC813940CCF - p1507910.jpg - Array
! found 4706F759FAB3818F8B85C8D3A528C4F6 - p1514909.jpg - Array
! found 64180F080BE7D3EC66C93A02FA390A12 - p1518573.jpg - Array
! found 0F061BBB2AE7C1B830D52580D820BFD0 - p1518747.jpg - Array
! found BAB7F577D761EA1E1A40306A72F579DC - p1518749.jpg - Array
! found 39CA8CF932F890D8D260D642FCFFE5BB - p1519115.jpg - Array
! found 54BB7E2C7B241ED3AEBB97F03205ABFC - p1522635.jpg - Array
! found 3A761E0775BFCB75BD169E0F6B6FE14F - p1522639.jpg - Array
! found 6EB4F8C8DE2563343745C5F027CAC2F7 - p1529425.jpg - Array
! found FC83F3285E1D60EB72139902276DADEC - p1532260.jpg - Array
! found 79D468CF00BE90F1D155E9853876D1BE - p1532262.jpg - Array

OK, I had another think and found I was MD5-hashing the file name, not the file contents.

I then put the array in the same PHP file to do away with opening the text file.
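
For anyone hitting the same problem, here is a minimal sketch of the difference between the two calls. It assumes a throwaway temp file created only for the illustration (the `md5demo` prefix and the `hello` contents are made up): `md5()` hashes the string it is given, so passing a file name hashes the name, while `md5_file()` hashes the bytes stored in the file.

[php]
<?php
// Hypothetical temp file, created only for this illustration.
$path = tempnam(sys_get_temp_dir(), 'md5demo');
file_put_contents($path, 'hello');

$nameHash    = md5($path);      // hashes the characters of the path string
$contentHash = md5_file($path); // hashes the bytes stored in the file

echo "md5(path):      $nameHash\n";
echo "md5_file(path): $contentHash\n";

// md5_file() agrees with md5() applied to the file's contents.
var_dump($contentHash === md5('hello')); // bool(true)

unlink($path);
[/php]

This is why none of the hashes in the first run matched the list: they were hashes of strings like `p1011226.jpg`, not of the image data.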

[php]

<?php
$a = array(
    'FF53188F8DD1032971980925CCA0628F',
    'FC784650629A43B9B801DD23E00209D1',
    'FB694F606B66288D0133BFBF057521A5',
    'FB0072192BDBA046879FC8508992F93A'
);

$files = scandir("./P/");
$dir = "./P/";

// remove . and ..
array_shift($files);
array_shift($files);

foreach ($files as $file) {
    $file = trim($file);
    $imagemd5 = md5_file($dir . $file);
    $imagemd5 = strtoupper($imagemd5);
    if (in_array($imagemd5, $a, true)) {
        echo ' found ' . $imagemd5 . ' - ' . $file . "\n";
    } else {
        echo '! found ' . $imagemd5 . ' - ' . $file . "\n";
    }
}

[/php]

So the result:

[code]
found B76B7EA49C4069674BA543CFD2F06B7A - p1011226.jpg
found B76B7EA49C4069674BA543CFD2F06B7A - p1011515.jpg
found CFCE07FC5BECC11C9008C2D50C2B026C - p1014117.jpg
found 856D16016AD58096625C67FC1ECA4ED0 - p1017931.jpg
found B2AB183DAC9682EFCD3512D6CDA5E6BE - p1021665.jpg
found B76B7EA49C4069674BA543CFD2F06B7A - p1026256.jpg
found 5FACFAB5B55085379971618CCC0CEB5D - p1045134.jpg
[/code]

:)
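
The script above only reports matches. One possible way to go on and actually remove them is sketched below, under a few assumptions that are not in the original post: the function name `remove_known_duplicates` is hypothetical, the files are matched with `glob()` on a lowercase `.jpg` extension (which also avoids having to shift `.` and `..` off the `scandir()` result), and deletion is off by default so the first run is a dry run. The hash list is flipped into array keys so each lookup is an `isset()` check rather than an `in_array()` scan.

[php]
<?php
// Returns the paths in $dir whose MD5 appears in $knownHashes.
// Files are deleted only when $delete is true. The name is hypothetical.
function remove_known_duplicates(string $dir, array $knownHashes, bool $delete = false): array
{
    // Normalise the list, then flip it so hashes become array keys.
    $lookup = array_flip(array_map('strtoupper', array_map('trim', $knownHashes)));
    $matched = [];

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.jpg') ?: [] as $path) {
        $hash = md5_file($path);
        if ($hash === false) {
            continue; // unreadable file, skip it
        }
        if (isset($lookup[strtoupper($hash)])) {
            $matched[] = $path;
            if ($delete) {
                unlink($path);
            }
        }
    }
    return $matched;
}

// Dry run: list what would be removed from ./P without deleting anything.
$hashes = ['FF53188F8DD1032971980925CCA0628F', 'FC784650629A43B9B801DD23E00209D1'];
foreach (remove_known_duplicates('./P', $hashes) as $path) {
    echo "would remove $path\n";
}
[/php]

Passing `true` as the third argument would delete the matched files, so it seems safer to check the dry-run output first.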

Have you considered using SQLite?
Using a db is a nice solution as well, especially when ya get into them big numbers :)

I already have a database of over 200 MB; this is just a quick script to get rid of images that are not really images, just blanks.
It only needs to run once a week to save me the time of doing it myself, and it is working now, so job done.
Thanks.
