Finding corrupted images
As zoe kindly pointed out, two Wednesdays have passed since I last posted. I blame the Olympics (I was in Vancouver for the first week, and I was still recovering from it the second week Go Canada!). You can see my photos on flickr.com as I post them here. Apparently you can only post 100 mb/month, so it might take a while.
Anyway, onto the post:
This month at work, one of our servers decided to act up and corrupt quite a few things. We had about 10000 images on that server, many of them were corrupted, many of them were not. Instead of going through all of the images one by one, opening them and checking if they were corrupted, I was tasked with writing a php script to find them. My first instinct was to do something like this:
<?php if ($handle = opendir('/path/to/files')) { echo "Directory handle: $handle\n"; echo "Files:\n"; /* loop over the directory. */ while (false !== ($file = readdir($handle))) { if(@getimagesize($file) === false){ echo "$file\n"; } } } ?>
Unfortunately, getimagesize only looks at the start of a png file to get the information, meaning that if the file was corrupted after those first few bits of information it would not show up on the list.
The second idea was to use this method:
<?php if ($handle = opendir('/path/to/files')) { echo "Directory handle: $handle\n"; echo "Files:\n"; /* Since they are all png files, this will work: */ while (false !== ($file = readdir($handle))) { if(@imagecreatefrompng($file) === false){ echo "$file\n"; } } }
Which, in theory, should have worked because it reads the whole file in and then returns false if the image could not be created. Unfortunately there was (is) something wrong with the libpng library so I kept getting errors like this locally:
libpng warning: Ignoring bad adaptive filter type
libpng warning: Extra compressed data.
libpng warning: Extra compression dataWhich would be fine if they were php errors. I could have ignored them because the script worked properly and listed all of the files that had problems. Unfortunately for what ever reason, the server was dying on these errors instead of continuing like it was doing locally.
In comes imagemagick to the rescue. Using the identify command (at the command line) it reads in the whole image file and then tells you what it is. Since you might have 10000 images like we did, it is probably also a good idea to send the output to a file instead:
identify "./path/to/files/*" >badImages.txt 2>&1
You will also want to make sure that the path has ” ” around it, because otherwise you will end up with an “Argument list too long” error. In the badImages.txt file you will have a list of images that are in the folder you specified. Any of the lines that start with identify are no good.
Related posts:

Recent Comments