Finding corrupted images
1 Comment · Posted by SeanJA in Data Recovery, Linux, PHP, Programming
As zoe kindly pointed out, two Wednesdays have passed since I last posted. I blame the Olympics (I was in Vancouver for the first week, and I was still recovering from it the second week. Go Canada!). You can see my photos on flickr.com as I post them here. Apparently you can only post 100 MB/month, so it might take a while.
Anyway, on to the post:
This month at work, one of our servers decided to act up and corrupt quite a few things. We had about 10000 images on that server; many of them were corrupted, many of them were not. Instead of going through all of the images one by one, opening them and checking if they were corrupted, I was tasked with writing a PHP script to find them. My first instinct was to do something like this:
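<?php
$dir = '/path/to/files';
if ($handle = opendir($dir)) {
    echo "Directory handle: $handle\n";
    echo "Files:\n";
    /* loop over the directory. */
    while (false !== ($file = readdir($handle))) {
        // readdir() returns bare names, so skip "." and ".." and prepend the directory
        if ($file === '.' || $file === '..') {
            continue;
        }
        if (@getimagesize("$dir/$file") === false) {
            echo "$file\n";
        }
    }
    closedir($handle);
}
?>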
Unfortunately, getimagesize only looks at the start of a PNG file to get the information, meaning that if the file was corrupted after those first few bytes of information it would not show up on the list.
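To see why this is a problem, here is a minimal sketch that writes a file containing nothing but a PNG signature and an IHDR header chunk, then asks getimagesize about it. The 640x480 dimensions and the temporary file are made up for illustration, and I am assuming getimagesize behaves the way I described above; the script prints whichever result it actually gets rather than assuming one.

```php
<?php
// Build only the start of a PNG: the 8-byte signature plus an IHDR chunk.
// No image data (IDAT) and no IEND chunk follow, so this is not a usable image.
$signature = "\x89PNG\r\n\x1a\n";
// IHDR payload: width, height, bit depth, colour type, compression, filter, interlace.
// The 640x480 size is an arbitrary value chosen for this example.
$ihdrData = pack('NNCCCCC', 640, 480, 8, 2, 0, 0, 0);
$ihdr = pack('N', strlen($ihdrData)) . 'IHDR' . $ihdrData . pack('N', crc32('IHDR' . $ihdrData));

// Write the header-only file to a temporary location.
$path = tempnam(sys_get_temp_dir(), 'png');
file_put_contents($path, $signature . $ihdr);

$info = @getimagesize($path);
unlink($path);

if ($info === false) {
    echo "getimagesize() rejected the header-only file\n";
} else {
    echo "getimagesize() accepted the header-only file as {$info[0]}x{$info[1]}\n";
}
```

If the second message is printed, the check never got as far as the (missing) image data, which is exactly the kind of corruption we were trying to find.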
The second idea was to use this method:
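<?php
$dir = '/path/to/files';
if ($handle = opendir($dir)) {
    echo "Directory handle: $handle\n";
    echo "Files:\n";
    /* Since they are all png files, this will work: */
    while (false !== ($file = readdir($handle))) {
        // readdir() returns bare names, so skip "." and ".." and prepend the directory
        if ($file === '.' || $file === '..') {
            continue;
        }
        if (@imagecreatefrompng("$dir/$file") === false) {
            echo "$file\n";
        }
    }
    closedir($handle);
}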
In theory, this should have worked, because it reads the whole file in and then returns false if the image could not be created. Unfortunately there was (is) something wrong with the libpng library, so I kept getting errors like this locally:
libpng warning: Ignoring bad adaptive filter type
libpng warning: Extra compressed data.
libpng warning: Extra compression data
Which would be fine if they were PHP errors. I could have ignored them because the script worked properly and listed all of the files that had problems. Unfortunately, for whatever reason, the server was dying on these errors instead of continuing like it was doing locally.
In comes ImageMagick to the rescue. Its identify command (at the command line) reads in the whole image file and then tells you what it is. Since you might have 10000 images like we did, it is probably also a good idea to send the output to a file instead:
identify "./path/to/files/*" >badImages.txt 2>&1
You will also want to make sure that the path has quotation marks (" ") around it, because otherwise the shell expands the * itself and you will end up with an “Argument list too long” error. In the badImages.txt file you will have a list of the images that are in the folder you specified. Any of the lines that start with identify are no good.
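If you would rather not scan that file by eye, a few lines of PHP can pull out the bad entries. This is only a sketch: findBadImageLines is a name I made up, the sample lines are invented for illustration, and the only thing it relies on is the rule above that problem lines start with identify.

```php
<?php
/**
 * Return the lines of identify's output that report a problem.
 * Assumption: problem lines begin with "identify", as described above.
 */
function findBadImageLines(array $lines)
{
    $bad = array();
    foreach ($lines as $line) {
        if (strpos(ltrim($line), 'identify') === 0) {
            $bad[] = rtrim($line);
        }
    }
    return $bad;
}

// Made-up sample lines for illustration; real output will differ in detail.
$sample = array(
    "./path/to/files/good.png PNG 100x100 100x100+0+0 8-bit",
    "identify: (some error message about ./path/to/files/bad.png)",
);

foreach (findBadImageLines($sample) as $line) {
    echo $line, "\n";
}
```

With the real output you would pass in file('badImages.txt') instead of the sample array.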
We all agree that testing code is better than not testing it, right? So why do we tend to avoid writing unit tests to make sure that we are writing code that works? I’m looking at you, PHP guys. Good thing there is PHPUnit, and it is as easy as can be to get started with it.
First you have to discover the channels:
pear channel-discover pear.phpunit.de
pear channel-discover pear.symfony-project.com
Sidenote: phpunit.de has many useful packages in it including phpcpd, phpdcd, and phploc.
Then you install it (I usually install all of the dependencies with PHPUnit, but you don’t really have to).
pear install phpunit/PHPUnit
After that you can start writing tests.
Say you have a simple class like this:
<?php
class needsTesting{
    public function divide($a, $b){
        return $a/$b;
    }
}
Your unit test file would be this (prior to adding any test cases to it):
<?php
require_once 'PHPUnit/Framework.php';
require_once 'needsTesting.php';

class needsTestingTest extends PHPUnit_Framework_TestCase {
    /**
     * @var needsTesting
     */
    protected $object;

    /**
     * This method is called before a test is executed.
     */
    protected function setUp() {
        $this->object = new needsTesting;
    }

    /**
     * This method is called after a test is executed.
     */
    protected function tearDown() {
    }

    public function testDivide() {
    }
}
After you have added the obvious test cases, your testDivide() function should look something like this:
<?php
//[...]
    public function testDivide() {
        $array_a = array( 10, 20, 30, 40, 50, );
        $array_b = array( 2, 4, 6, 8, 0, );
        $array_expected = array( 5, 5, 5, 5, 0, );
        foreach($array_a as $k=>$a){
            $b = $array_b[$k];
            $expected = $array_expected[$k];
            $result = $this->object->divide($a, $b);
            $this->assertEquals($expected, $result);
        }
    }
//[...]
Running the code is quite easy. If your IDE has a PHPUnit extension (like NetBeans does) you can run it from there, or you can run it from the command line (in the folder you are writing the tests in):
phpunit needsTestingTest
Since one of the tests will give you an error, you should get this:
PHPUnit 3.4.9 by Sebastian Bergmann.
E
Time: 0 seconds, Memory: 5.25Mb
There was 1 error:
1) needsTesting::testDivide
Division by zero
[...]/needsTesting.php:4
[...]/needsTestingTest.php:56
FAILURES!
Tests: 1, Assertions: 4, Errors: 1.
Now it is up to you to decide how your application handles the divide by 0 error that you got (or any other errors for that matter). While testing first is a good way to define exactly how your app should work, testing after is good too (as long as you are testing, it is a good thing).
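As one possible answer, here is a sketch of a divide() that refuses a zero divisor by throwing an exception. The class name needsTestingGuarded and the choice of InvalidArgumentException are my own assumptions for the example; returning a special value or letting the error bubble up are equally valid decisions for your app.

```php
<?php
// Hypothetical variant of the needsTesting class; the guard is one choice among several.
class needsTestingGuarded
{
    public function divide($a, $b)
    {
        // Reject a zero divisor explicitly instead of letting PHP raise the error.
        if ($b == 0) {
            throw new InvalidArgumentException('Cannot divide by zero');
        }
        return $a / $b;
    }
}

$object = new needsTestingGuarded();
echo $object->divide(10, 2), "\n";

try {
    $object->divide(50, 0);
} catch (InvalidArgumentException $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

Once the behaviour is decided, the 50 / 0 case belongs in its own test that expects the exception, rather than in the loop with an expected result of 0.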
Check it out:
http://www.phpunit.de/manual/current/en/
So, it turns out that having a blog is not of much use if you don’t actually post anything on it. Unfortunately I have been neglecting mine as of late, so I am going to try to remedy that by forcing myself to post something every Wednesday. Why Wednesday? Why not?
I saw a post that talked about how to make a simple tooltip with jQuery, but that post did not make a plugin of it. It just added the function to the global scope, meaning that it was not chainable and it would only be applied to the a element. So I took the code for the first example as a starting point and turned it into a plugin. So here we have yet another tooltip plugin. It uses the title attribute, so it is perfectly accessible to people without JavaScript enabled. It can be applied to any element, it is chainable, and it will also be applied to new elements on the page, so you do not have to reattach it to every item that you add to the DOM.
Here is the JavaScript:
// starting the script on page load
$().ready(function(){
    $('a').simple_tip();
});
The HTML:
<a href="#" title="This is a tooltip">I have a tool tip!</a>
<a href="#">I don't have a tool tip!</a>
The demo.
The code is available here:
http://code.google.com/p/jsimpletip/downloads/list
The code is licensed under:
MIT and GPL v2
Let’s look at a common PHP ‘optimization’ tip (note that I am only running this test on my computer, and I am not shutting things down to help it go faster).
Use single quotes ( ' ) instead of double quotes ( " ).
The reasoning behind this is that PHP parses double quoted strings for variables, so it takes way longer to process your evil double quoted strings than it does to parse your single quoted strings.
$test = 'something';
$repeats = 1000000;
$start1 = microtime(true);
$first = '';
for($i = 0; $i < $repeats; $i ++){
    $first .= 'string '. $test;
}
$end1 = microtime(true);
$time1 = ($end1 - $start1);
$start2 = microtime(true);
$second = "";
for($i = 0; $i < $repeats; $i ++){
    $second .= "string $test";
}
$end2 = microtime(true);
$time2 = ($end2 - $start2);
echo 'A difference of: '.($time2 - $time1) . ' seconds';
#=> A difference of approx: 0.244801044464 seconds;
A whopping 0.24 seconds over 1000000 string concatenations (a difference of about 0.00000024 seconds per concatenation). I wouldn’t bother with this one. If the code is full of double quotes, leave it.
Now, I am not suggesting that you rethink the micro-optimizations that you make. Go ahead, make them if you must. But before you put too much thought into it, make those macro-optimizations that you are avoiding and clean up your code. Future you will thank you, especially if current you avoids dumb “micro-optimizations” like bit shifting rather than multiplying and dividing (dumb because for the most part, modern compilers make this optimization already, or your processor does a table lookup rather than doing the actual calculation).
#this:
9 * 256;
#is way more readable than:
9 << 8;
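In case the shift version is not obvious (which is rather the point), here is a quick check that the two expressions really are the same number. The variable names are just for this example.

```php
<?php
// Both expressions evaluate to the same integer, so the choice is about readability.
$multiplied = 9 * 256;
$shifted = 9 << 8; // shifting left by 8 bits multiplies by 2^8 = 256

var_dump($multiplied === $shifted); // bool(true)
echo $multiplied, "\n";             // 2304
```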