Frigging Huge PDFs

Turns out this is how you fix up a frigging huge PDF that probably came out of Photoshop or some such beast, with all of the layers and history that are mostly useless to you. It seems to work quite well: 230 MB down to 6 MB.

Of course you can do some tweaking to reduce the size even more if you want to (take a look at the Ghostscript manual).

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
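As one example of that tweaking, Ghostscript's `-dPDFSETTINGS` presets bundle a pile of image-downsampling options into a single flag (the file names here are placeholders, and how much each preset saves depends entirely on your PDF):

```shell
# Same conversion, with one of Ghostscript's built-in quality presets.
# /screen (72 dpi) is the smallest; /ebook (150 dpi), /printer (300 dpi)
# and /prepress are progressively larger and higher quality.
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH \
   -sOutputFile=output.pdf input.pdf
```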


Finding corrupted images

As zoe kindly pointed out, two Wednesdays have passed since I last posted. I blame the Olympics (I was in Vancouver for the first week, and I was still recovering from it the second week. Go Canada!). You can see my photos on flickr.com as I post them here. Apparently you can only upload 100 MB/month, so it might take a while.


Anyway, onto the post:

This month at work, one of our servers decided to act up and corrupt quite a few things. We had about 10,000 images on that server; many of them were corrupted, many of them were not. Instead of going through all of the images one by one, opening them and checking whether they were corrupted, I was tasked with writing a PHP script to find them. My first instinct was to do something like this:

 
<?php

if ($handle = opendir('/path/to/files')) {
    echo "Directory handle: $handle\n";
    echo "Files:\n";

    /* Loop over the directory, prepending the path so
       getimagesize() can actually find each file. */
    while (false !== ($file = readdir($handle))) {
        if ($file === '.' || $file === '..') {
            continue;
        }
        if (@getimagesize('/path/to/files/' . $file) === false) {
            echo "$file\n";
        }
    }
    closedir($handle);
}

?>

Unfortunately, getimagesize only looks at the header at the start of a PNG file to get its information, meaning that if the file was corrupted after those first few bytes it would not show up on the list.
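A related quick-and-dirty check from the command line (my addition, not part of the original script) exploits the fact that every well-formed PNG ends with an IEND chunk, so a file whose trailing bytes lack "IEND" was almost certainly truncated. This is only a heuristic: it catches cut-short files, not corruption in the middle.

```shell
# check_png_tail FILE: print FILE when its trailing bytes are missing
# the "IEND" marker that closes every well-formed PNG. Catches
# truncated files only; mid-file corruption still slips through.
check_png_tail() {
  tail -c 12 "$1" | grep -q IEND || echo "$1"
}

# e.g.: for f in /path/to/files/*.png; do check_png_tail "$f"; done
```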

The second idea was to use this method:

<?php

if ($handle = opendir('/path/to/files')) {
    echo "Directory handle: $handle\n";
    echo "Files:\n";

    /* Since they are all PNG files, this will work: */
    while (false !== ($file = readdir($handle))) {
        if ($file === '.' || $file === '..') {
            continue;
        }
        if (@imagecreatefrompng('/path/to/files/' . $file) === false) {
            echo "$file\n";
        }
    }
    closedir($handle);
}

Which, in theory, should have worked, because imagecreatefrompng reads in the whole file and returns false if the image cannot be created. Unfortunately there was (is) something wrong with the libpng library, so I kept getting errors like this locally:

libpng warning: Ignoring bad adaptive filter type
libpng warning: Extra compressed data.
libpng warning: Extra compression data

Which would have been fine if they were PHP errors: I could have ignored them, because the script worked properly and listed all of the files that had problems. Unfortunately, for whatever reason, the server was dying on these errors instead of continuing like it did locally.

In comes ImageMagick to the rescue. Its identify command (run at the command line) reads in the whole image file and then tells you what it is. Since you might have 10,000 images like we did, it is probably also a good idea to send the output to a file:

    identify "./path/to/files/*" >badImages.txt 2>&1

You will also want to make sure that the path has " " around it: otherwise the shell expands the wildcard into one argument per file and you will end up with an "Argument list too long" error, whereas ImageMagick happily expands the wildcard itself. In badImages.txt you will have one line for each image in the folder you specified. Any of the lines that start with identify: are errors, and those are your bad images.
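To pull just the failures out of that log afterwards (a small addition on my part, with hypothetical file names), you can lean on that prefix: identify marks its error lines with "identify:", while lines for readable images start with the file path, so a grep separates them.

```shell
# bad_lines LOG: print only ImageMagick's failure lines from LOG.
# identify prefixes errors with "identify:"; lines for images it
# could read start with the file path instead.
bad_lines() {
  grep '^identify' "$1"
}

# e.g.: bad_lines badImages.txt > onlyBad.txt
```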
