Say you have a bunch of PDFs and you want to know how many pages each of them has. You could of course use some graphical software to display every single PDF and check the page count (xpdf, evince, whatever), but that gets tedious very fast.
So here's (mostly as a reminder for myself) one way to count the pages of many PDFs (in the current directory) using pdfinfo from the xpdf-utils package:
$ cat countpdfpages
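#!/bin/sh
# Print "<file>: <pages>" for every PDF in the current directory.
for f in *.pdf; do
    [ -e "$f" ] || continue  # no PDFs here: the glob stays literal "*.pdf"
    # awk prints only the number, without the padding pdfinfo adds after "Pages:"
    pages=$(pdfinfo "$f" 2>/dev/null | awk '/^Pages:/ {print $2}')
    printf '%s: %s\n' "$f" "${pages:-unknown}"
done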
A sample run:
$ ./countpdfpages
S71147.pdf: 25
S71226.pdf: 38
S71242-01.pdf: 25
S71258.pdf: 26
S71315.pdf: 35
S72045.pdf: 2
I'm sure there are many other ways to do this, e.g. using pdftk foo.pdf dump_data (from the pdftk package). Please leave a comment if you know a simpler way, especially one that doesn't need an extra script (i.e. a PDF command line tool which can already count the pages of multiple PDFs). Thanks!
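For comparison, here is a minimal sketch of the pdftk route mentioned above. It assumes that pdftk FILE dump_data prints a line of the form NumberOfPages: 25 (worth double-checking against your pdftk version), and pdf_pages_pdftk is just a made-up helper name. It still needs a loop, so it doesn't fulfill the "no extra script" wish either.

```shell
#!/bin/sh
# Hypothetical helper: print the page count of a single PDF via pdftk.
# Assumes "pdftk FILE dump_data" prints a line like "NumberOfPages: 25".
pdf_pages_pdftk() {
    pdftk "$1" dump_data 2>/dev/null | awk '/^NumberOfPages:/ {print $2}'
}

# Same loop as countpdfpages, with pdftk instead of pdfinfo.
for f in *.pdf; do
    [ -e "$f" ] || continue  # no PDFs here: the glob stays literal "*.pdf"
    pages=$(pdf_pages_pdftk "$f")
    printf '%s: %s\n' "$f" "${pages:-unknown}"
done
```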
Stuff I didn't expect I'd have to type today:
$ dpkg-repack dpkg-repack
Seriously.
jhead is a very nice and very powerful command line utility to mess with JPEG headers (esp. EXIF fields).
$ apt-get install jhead
It can display/extract a large number of metadata fields from JPEG files, and it can also extract the thumbnail embedded in a JPEG file (if there is one). The following will list all known metadata fields of a sample photo:
$ wget http://farm4.static.flickr.com/3173/3061542361_60acb0904b_o.jpg
$ jhead *.jpg
File name : 3061542361_60acb0904b_o.jpg
File size : 1074172 bytes
File date : 2008:11:26 23:38:04
Camera make : Panasonic
Camera model : DMC-FZ18
Date/Time : 2008:03:05 15:45:52
Resolution : 3264 x 2448
Flash used : No
Focal length : 4.6mm (35mm equivalent: 28mm)
Exposure time: 0.0100 s (1/100)
Aperture : f/3.6
ISO equiv. : 100
Whitebalance : Auto
Metering Mode: matrix
Exposure : program (auto)
GPS Latitude : N %:.7fd %;.8fm %;.8fs
GPS Longitude: E %;.8fd %:.7fm %;.8fs
GPS Altitude : 174.00m
Comment : Aufgenommen auf dem <a href="http://www.froutes.de/TT00000014_Ars_Natura">Kunstweg Ars Natura</a>.
======= IPTC data: =======
Record vers. : 4
Headline : Felsburg auf dem Felsberg
(C)Notice : www.froutes.de
Caption : Aufgenommen auf dem <a href="http://www.froutes.de/TT00000014_Ars_Natura">Kunstweg Ars Natura</a>.
As you can see, there's a huge amount of potentially privacy-sensitive metadata in a typical JPEG as generated by your camera (camera type, settings, date/time, maybe even the GPS coordinates of your location, etc.).
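If you just want to find out which of your photos give away your location, a rough sketch like the following might do. It only greps jhead's default output for a line starting with GPS Latitude (as in the listing above), assuming jhead prints that line only when GPS data is present; has_gps is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if jhead reports GPS coordinates for a JPEG.
# Assumes jhead prints a "GPS Latitude" line only when GPS data exists.
has_gps() {
    jhead "$1" 2>/dev/null | grep -q '^GPS Latitude'
}

# Report every JPEG in the current directory that carries GPS data.
for f in *.jpg; do
    [ -e "$f" ] || continue  # no JPEGs here: the glob stays literal "*.jpg"
    if has_gps "$f"; then
        echo "$f contains GPS coordinates"
    fi
done
```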
You can extract the thumbnail stored in all JPEGs in the current directory with:
$ jhead -st "&i_t.jpg" *.jpg
Created: '3061542361_60acb0904b_o.jpg_t.jpg'
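When comparing thumbnails with the full images, it may be handier to collect them in a separate directory. This sketch assumes jhead also accepts a directory prefix in the -st name (the &i is replaced by the image file name, as the example above shows); extract_thumbs and the default thumbs directory are made-up names.

```shell
#!/bin/sh
# Hypothetical helper: save the EXIF thumbnail of every JPEG in the current
# directory into OUTDIR (default "thumbs"), keeping the original file names.
# jhead replaces "&i" with the image file name; the quotes keep the shell
# from interpreting the "&".
extract_thumbs() {
    outdir=${1:-thumbs}
    mkdir -p "$outdir" || return 1
    set -- *.jpg
    [ -e "$1" ] || { echo "no JPEGs found" >&2; return 1; }
    jhead -st "$outdir/&i" "$@"
}
```

Running extract_thumbs and then viewing thumbs/ next to the originals makes mismatching thumbnails easy to spot.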
Note that the JPEG thumbnail does not necessarily show the same picture as the JPEG itself. Depending on the image manipulation software that was used to create the edited/fixed/cropped JPEG, the thumbnail may still reflect the original JPEG contents (see sample image on the right-hand side). This is a huge potential privacy issue. There were a number of articles about this some years ago, in case you missed them.
Thus, an important jhead command line to know is the following, which removes all metadata (including any thumbnails) from all JPEG images in the current directory:
$ jhead -purejpg *.jpg
Modified: 3061542361_60acb0904b_o.jpg
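Since -purejpg modifies the files in place, you may prefer to keep your originals (EXIF data and all) and strip only the copies you intend to publish. A minimal sketch, with publish_copies as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: copy JPEGs into OUTDIR and strip the metadata there,
# leaving the original files (with their EXIF data) untouched.
# Usage: publish_copies OUTDIR FILE...
publish_copies() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        cp "$f" "$outdir/" || return 1
        # -purejpg drops every section not needed to display the image
        # (EXIF incl. the thumbnail, comments, IPTC, ...)
        jhead -purejpg "$outdir/$(basename "$f")" || return 1
    done
}
```

You'd then call e.g. publish_copies public *.jpg and upload the files from public/.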
Afterwards, only very basic information can be read from the file, and the thumbnail is gone:
$ jhead *.jpg
File name : 3061542361_60acb0904b_o.jpg
File size : 1052506 bytes
File date : 2008:11:26 23:38:04
Resolution : 3264 x 2448
$ jhead -st "&i_t.jpg" *.jpg
Image contains no thumbnail
I recommend doing this for most photos you make publicly available on sites like Flickr (unless you have a good reason not to). Finally, see the jhead(1) manpage for the many other options the tool supports.
One of the most useful packages for working with PDFs on Linux is pdfjam.
From the website:
Installation is as easy as always:
$ apt-get install pdfjam
PDF is not exactly the most easily editable format out there, but these tools can save you lots of time and trouble. Just recently I needed to merge two PDFs into one (and I didn't have the files in any source format). A simple pdfjoin foo1.pdf foo2.pdf --outfile bar.pdf does the job in a few seconds.
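pdfjoin isn't limited to two input files, so merging a whole directory is a small step further. A minimal sketch, assuming the shell's sorted glob order is the page order you want and that OUTFILE is given as a plain file name in the current directory; join_all_pdfs is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: merge all PDFs in the current directory (in the
# shell's glob order, i.e. sorted by name) into OUTFILE.
# Usage: join_all_pdfs OUTFILE
join_all_pdfs() {
    out=$1
    set --                             # reuse "$@" as the list of inputs
    for f in *.pdf; do
        [ -e "$f" ] || continue        # no PDFs here: the glob stays literal
        [ "$f" = "$out" ] && continue  # don't feed a previous result back in
        set -- "$@" "$f"
    done
    [ $# -gt 0 ] || { echo "no PDFs found" >&2; return 1; }
    pdfjoin "$@" --outfile "$out"
}
```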
Equally useful when you need to print huge documents is pdfnup --nup 2x2 foo.pdf, which puts four PDF pages onto one (drastically reducing the number of pages you have to print).
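To prepare a whole directory of documents for printing, the same pdfnup call can be run in a loop. This sketch assumes pdfnup accepts pdfjam's --outfile option just like pdfjoin above; nup_all and the default print directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a 2x2 (four pages per sheet) version of every
# PDF in the current directory into OUTDIR, keeping the file names.
# Usage: nup_all [OUTDIR]   (OUTDIR defaults to "print")
nup_all() {
    outdir=${1:-print}
    mkdir -p "$outdir" || return 1
    for f in *.pdf; do
        [ -e "$f" ] || continue  # no PDFs here: the glob stays literal
        pdfnup --nup 2x2 "$f" --outfile "$outdir/$f" || return 1
    done
}
```

Writing into a separate directory keeps the n-up versions from being picked up by *.pdf on the next run.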
Update 2006-09-20: As several people noted, pdftk is very useful, too. It can also split PDFs, encrypt/decrypt them, manipulate metadata and more.
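As an example of the splitting part, here is a sketch that cuts a PDF into two halves. It reuses pdfinfo (as in countpdfpages) for the page count and pdftk's cat operation, where end stands for the last page; split_in_half and the -part1/-part2 naming are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a PDF into two halves with pdftk,
# using pdfinfo to find the page count first.
# Usage: split_in_half IN.pdf   (writes IN-part1.pdf and IN-part2.pdf)
split_in_half() {
    in=$1
    base=${in%.pdf}
    pages=$(pdfinfo "$in" 2>/dev/null | awk '/^Pages:/ {print $2}')
    [ -n "$pages" ] && [ "$pages" -ge 2 ] || { echo "cannot split $in" >&2; return 1; }
    half=$((pages / 2))
    pdftk "$in" cat 1-"$half" output "$base-part1.pdf" || return 1
    pdftk "$in" cat "$((half + 1))"-end output "$base-part2.pdf"
}
```

For a 25-page file this writes pages 1-12 to the first part and 13-25 to the second.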
I noticed that there have been quite a few Python-related posts on Planet Debian lately. Here's mine.
I'm having a really hard time finding a good way to render an OpenGL scene (e.g. from a VRML file) from the command line using PyOpenGL and/or OpenGLContext. All the solutions I've tried so far pop up an X11 window, which is not what I'm looking for. Instead, I want to render the scene and dump it to PNG/JPG without requiring an X11 server.
Can this be done with PyOpenGL/OpenGLContext? I could probably try xvfb, but that's really an ugly hack. Besides, the images I get using xvfb are somewhat broken, and I'm not sure why (does xvfb support OpenGL?).
Any hints are appreciated.