Converting Mailman "Gzip'd Text" archive files to proper mbox files

Mailman archives are often only available in the pretty useless "Gzip'd Text" format, which you cannot easily download and view locally (and threaded) in a MUA such as mutt. But that is exactly what I want to do from time to time (e.g. because I want to read the discussions of the past weeks on mailing lists where I'm newly subscribed).

After some searching I found one way to do it, which I stripped down to my needs:

 $ cat mailman2mbox
 #!/usr/bin/perl
 while (<STDIN>) {
   # Undo the address obfuscation ("user at host" -> "user@host").
   s/^(From:? .*) (at|en) /$1\@/;
   s/^Date: ([A-Z][a-z][a-z]) +([A-Z][a-z][a-z]) +([0-9]+) +([0-9:]+) +([0-9]+)/Date: $1, $3 $2 $5 $4 +0000/;
   print;
 }

Example run on some random mail archive:

 $ wget
 $ gunzip 2009-August.txt.gz
 $ ./mailman2mbox < 2009-August.txt > 2009-August.mbox

You can then view the mbox as usual in mutt:

 $ mutt -f 2009-August.mbox
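
For comparison, the same two substitutions can also be written with sed alone. This is only a sketch under a few assumptions, not the script above: the function name mailman2mbox_sed and the sample header lines are made up for illustration, and it needs a sed that understands -E (GNU and BSD sed both do).

```shell
# Sketch: sed version of the two substitutions from the Perl script.
# mailman2mbox_sed is a hypothetical name; it reads stdin and writes stdout.
mailman2mbox_sed() {
  sed -E \
    -e 's/^(From:? .*) (at|en) /\1@/' \
    -e 's/^Date: ([A-Z][a-z][a-z]) +([A-Z][a-z][a-z]) +([0-9]+) +([0-9:]+) +([0-9]+)/Date: \1, \3 \2 \5 \4 +0000/'
}

# Made-up sample headers in the obfuscated "Gzip'd Text" style.
printf '%s\n' \
  'From jdoe at example.org  Sat Aug  1 10:00:00 2009' \
  'From: jdoe at example.org (John Doe)' \
  'Date: Sat Aug  1 10:00:00 2009' | mailman2mbox_sed
```

On the sample lines the " at " becomes "@" and the Date header is reordered into the "Sat, 1 Aug 2009 10:00:00 +0000" form. As in the Perl version, the time zone is simply hardcoded to +0000.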

Suggestions for a simpler method to do this are highly welcome. Maybe some mbox related Debian package already ships with a script to do this?

Migrating bdb svn repositories from one version to another and to fsfs

Today I had to work with a really old svn repository again, which was still in the old bdb format (not in the newer and recommended fsfs one). This caused quite some problems, like, um... you cannot checkout, update, or commit anything.

$ svn co file:///path/to/myrepo
svn: Unable to open an ra_local session to URL
svn: Unable to open repository 'file:///path/to/myrepo'
svn: Berkeley DB error for filesystem '/path/to/myrepo/db' while opening environment:
svn: DB_VERSION_MISMATCH: Database environment version mismatch
svn: bdb: Program version 4.6 doesn't match environment version 4.4

A quick search revealed that this is bug #342508. A solution is/was supposedly mentioned in /usr/share/doc/subversion/README.db4.3 (which no longer exists in the Debian unstable package). Luckily, this blogpost has some details.

So, the short HOWTO for upgrading an svn repository of one Berkeley DB version to another one is:

$ cd /path/to/myrepo/db
$ db4.4_checkpoint -1
$ db4.4_recover
$ db4.4_archive
$ svnlook youngest ..
$ db4.6_archive -d

In this case I upgraded from 4.4 to 4.6 (do "apt-get install db4.4-util db4.6-util" if necessary).
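
The two versions involved can be read straight off the svn error message. The following is a minimal sketch of that step; bdb_versions is a hypothetical helper, and the db<version>-util package naming is an assumption extrapolated from the two package names mentioned above.

```shell
# Sketch: pull the two Berkeley DB versions out of svn's mismatch message.
# bdb_versions is a hypothetical helper; it prints "<environment> <program>".
bdb_versions() {
  sed -n -E "s/.*Program version ([0-9]+\.[0-9]+) doesn't match environment version ([0-9]+\.[0-9]+).*/\2 \1/p"
}

# The error line from the failed checkout shown earlier.
msg="svn: bdb: Program version 4.6 doesn't match environment version 4.4"

# Assumption: the matching tools live in packages named db<version>-util.
printf '%s\n' "$msg" | bdb_versions | while read -r env_ver prog_ver; do
  echo "apt-get install db${env_ver}-util db${prog_ver}-util"
done
```

The first number is the version the repository was last written with (use its tools for checkpoint and recover), the second is the version the installed svn is linked against.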

While I was at it, I also switched the repository to the fsfs format:

$ svnadmin dump /path/to/myrepo > myrepo.dump
$ mv /path/to/myrepo /path/to/myrepo.bak
$ svnadmin create --fs-type fsfs /path/to/myrepo
$ svnadmin load /path/to/myrepo < myrepo.dump
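
A quick way to check which backend a repository uses, before and after the conversion, is to look at the db/fs-type file inside it. This is a small sketch, assuming the repository has that file (very old repositories may not); repo_fs_type is a hypothetical helper name.

```shell
# Sketch: report the backend of a repository by reading db/fs-type.
# repo_fs_type is a hypothetical helper; $1 is the repository path.
repo_fs_type() {
  if [ -r "$1/db/fs-type" ]; then
    cat "$1/db/fs-type"
  else
    # Assumption: no readable fs-type file means we cannot tell.
    echo "unknown"
  fi
}

# Usage, with the paths from the commands above:
#   repo_fs_type /path/to/myrepo.bak
#   repo_fs_type /path/to/myrepo
```

It also makes sense to compare the output of "svnlook youngest" for the old and the new repository before deleting the backup.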

Maybe this is helpful for some other people out there.

Rebuilding the whole Debian archive using the Open64 compiler

I got bored recently, so I rebuilt the whole Debian archive on one of my machines. To make this not a completely useless exercise, I used the Open64 compiler instead of gcc and created build logs for your perusal.

So what is Open64?

From the Wikipedia page:

Open64 is an open source, state-of-art, optimizing compiler for the Intel IA-64 (Itanium), AMD Opteron and Intel IA-32e architecture. It derives from the SGI compilers for the MIPS R10000 processor. It was released under the GPL in 2000, and now mostly serves as a research platform for compiler and computer architecture research groups. Open64 is licensed under the GPL. Open64 supports Fortran 77/95 and C/C++, as well as the shared memory programming model OpenMP. It can conduct high-quality interprocedural analysis, data flow analysis, data dependence analysis and array region analysis.

Open64 installation

The installation is pretty easy fortunately:

$ wget
$ tar xfvj open64-4.0-src.tar.bz2
$ cd open64-4.0
$ export TOOLROOT=/opt/open64
$ make
$ make install (as root)

I think you need gcc-3.4 (gcc 4.x is not yet supported), and for some odd reason you also need csh as one of the install scripts seems to use it.
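
Since a missing tool only shows up somewhere in the middle of the build, it can help to check for the prerequisites first. The following is a rough sketch; missing_tools is a hypothetical helper, and the binary names gcc-3.4 and csh are assumptions about how the tools are installed on your system.

```shell
# Sketch: print every command from the argument list that is not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumption: the prerequisites are installed under these binary names.
missing=$(missing_tools gcc-3.4 csh make)
if [ -n "$missing" ]; then
  echo "missing before building Open64:" $missing
fi
```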

It would be nice if someone could package Open64 for Debian. I definitely don't have the time to maintain such a huge package (a whole maintainer team would probably be good here).

Rebuilding the Debian archive

There are several possible ways (and tools) to rebuild the Debian archive; I've used pbuilder/cowbuilder with the rebuild scripts from Bastian Venthur, which are now included in pbuilder.

First we need to install the required packages, set up a cowbuilder base chroot, and get the list of packages:

$ apt-get install cowdancer grep-dctrl wget devscripts sudo
$ cowbuilder --create --distribution lenny --basepath /var/cache/pbuilder/testing-base.cow
$ cp -r /usr/share/doc/pbuilder/examples/rebuild .
$ cd rebuild
$ ./getlist lenny

Now we add Open64 into the cowbuilder chroot and fix up the chroot by pointing the gcc/g++ symlinks to Open64:

$ cp -a /opt/open64 /var/cache/pbuilder/testing-base.cow/opt
$ chroot /var/cache/pbuilder/testing-base.cow
$ cd /usr/bin
$ mv gcc gcc.orig
$ ln -s /opt/open64/bin/opencc gcc
$ mv g++ g++.orig
$ ln -s /opt/open64/bin/openCC g++
$ exit
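
The rename-and-symlink step can be wrapped in a small function so that running it twice does not overwrite the saved original with the symlink. This is only a sketch of the commands above, not part of the rebuild scripts; swap_compiler is a hypothetical helper name.

```shell
# Sketch: keep the original binary as NAME.orig and symlink NAME to TARGET.
# swap_compiler is a hypothetical helper: swap_compiler DIR NAME TARGET
swap_compiler() {
  dir=$1 name=$2 target=$3
  # Refuse to run twice, otherwise the backup would be replaced by the symlink.
  if [ -e "$dir/$name.orig" ] || [ -L "$dir/$name.orig" ]; then
    echo "$dir/$name.orig already exists, not touching $name" >&2
    return 1
  fi
  mv "$dir/$name" "$dir/$name.orig" && ln -s "$target" "$dir/$name"
}

# Inside the chroot this would be called as:
#   swap_compiler /usr/bin gcc /opt/open64/bin/opencc
#   swap_compiler /usr/bin g++ /opt/open64/bin/openCC
```

Undoing the change is then just a matter of moving the .orig file back over the symlink.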

In addition, we set the CC and CXX environment variables to Open64, which will make 90% of all (autoconf-using) packages automatically use Open64. We need a small script for that:

$ cat c.cfg
export CC="/opt/open64/bin/opencc -m32"
export CXX="/opt/open64/bin/openCC -m32"

Now edit the buildall script. Change the Debian mirror used there (optional) and make it use our c.cfg script by adding the --configfile /path/to/rebuild/c.cfg option in the "pdebuild" line.

We can now finally start building the archive:

$ ./buildall list.lenny.i386 lenny

You can also run multiple buildall instances at once to speed up the archive rebuild on SMP/multicore machines, and you can even abort the command and simply restart it later. The script will continue where it left off.


The whole rebuild (with 2 instances of buildall running at the same time) took ca. 9 days on an AMD64 Athlon64 X2 (dual core, 1.8 GHz each) machine with 1 GB of RAM.

I really should have used something like apt-proxy to speed up the rebuild and save some bandwidth, but I read about apt-proxy too late...

All log files from my rebuild are available for detailed analysis if anybody is interested (you can browse the logfiles online or download all of them as a tarball). I didn't perform any detailed analysis; here are just some rough numbers:

  • Succeeded package builds: 8425
  • Failed package builds: 2509
  • Total number of packages rebuilt: 10934
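
As a quick sanity check, the counts above can be turned into rates. This is a small sketch; build_rates is a hypothetical helper name, and the only inputs are the two counts from the list.

```shell
# Sketch: total and success/failure rates from the two build counts.
# build_rates is a hypothetical helper: build_rates SUCCEEDED FAILED
build_rates() {
  awk -v ok="$1" -v failed="$2" 'BEGIN {
    total = ok + failed
    printf "total: %d\n", total
    printf "succeeded: %.1f%%\n", 100 * ok / total
    printf "failed: %.1f%%\n", 100 * failed / total
  }'
}

build_rates 8425 2509
```

So roughly 77.1% of the packages built with Open64 and 22.9% failed.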

If anybody does some more elaborate analysis, please let me know.

Tagesschau Video Podcast Archive?

Does anybody know of an archive of the Tagesschau Video Podcast (German TV news)? I'm collecting the videos, but I missed the shows from November 8th and November 13th. They only provide 7 days of "backlog"; after a week they seem to remove the videos (which sucks!)...

Thanks in advance!

