Gutenberg formatting

Palimpsest’s Book Group is reading two H.G. Wells books at the moment. Being a skinflint, I thought I would download them from Project Gutenberg, a library of free books available in text format, and sometimes HTML.

The two novels are:

The trouble is that often the HTML option isn’t there, and the text files are formatted with hard line breaks, which means that each line breaks at a fixed point whether it needs to or not. So if you load them into a word processor and change the font and text size to get the page count down for printing, the results look terrible.

Surely, I thought, it must be possible to automatically remove these line breaks, somehow? I asked in various places:

All to no avail!

Until Carfilhiot suggested GutenMark, a command line tool for Linux or Windows which takes the text file and reformats it nicely as HTML. It is released under the GPL, so it should be possible to have a look at the source and see if it can be persuaded to produce just text files, though it may be possible to cut and paste from the browser to a text editor to see what results from that.

Carfilhiot has hosted the reformatted versions of the Wells texts:

Excellent – and the copy-and-paste to text file seems to work too!
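For the curious, here is roughly what I was hoping to find in the first place: a minimal Python sketch that removes the hard line breaks. This is not how GutenMark works, which does a good deal more. It simply assumes that paragraphs are separated by blank lines and that every other line break is a hard wrap which can be replaced by a space. The function name `unwrap` is my own invention.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by one or more blank lines,
    and any other line break is just a hard wrap.
    """
    # Normalise Windows and old Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs on blank (or whitespace-only) lines
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to one space
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = "This line breaks\nfar too early.\n\nSo does\nthis one."
print(unwrap(sample))
```

Be warned that something this crude will also flatten poetry, tables, contents pages and anything else where the line breaks are there on purpose, so the result would still want a quick look over before printing.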

In the News

Been listening to Radio 4 and the Today programme on the way into work recently. This morning’s news was full of interesting stuff:

  • Galloway accused of Senate ‘lies’ – I am no fan of Galloway’s, but I do find it surprising that the Senate Committee has come out with these statements without further interviews with ‘Gorgeous’ George.
  • Africa Aids orphans ‘may top 18m’ – and one report stated that the life expectancy for a male in (IIRC) Zambia is 30. 30!
  • EU mulls wild bird import freeze – Apparently this was suggested and agreed by most EU states in March this year. The only country to explicitly say ‘no’? The UK. Brilliant.
  • Overweight job hunters ‘lose out’ – no wonder I can’t get a job nearer home! Reminds me of one time I was being abused for my rotundity, and I accused my adversary of being fattist. “No, Dave,” he replied. “You’re the fattest.” Bastard.

Can you trust Wikipedia?

The Guardian asks whether the content in Wikipedia is worth all that much, and gets some experts to judge some entries.

The founder of the online encyclopedia written and edited by its users has admitted some of its entries are ‘a horrific embarrassment’.

To be honest, I would never dream of using Wikipedia as a serious research tool. If I want a very quick rundown on something, though, it’s fine. Would be interested to find where Jimmy Wales mentioned this ‘horrific embarrassment’!

edit: Aha! The article that began all this was by Nicholas Carr, titled The amorality of Web 2.0. Wales then responds:

I don’t agree with much of this critique, and I certainly do not share
the attitude that Wikipedia is better than Britannica merely because it
is free. It is my intention that we aim at Britannica-or-better
quality, period, free or non-free. We should strive to be the best.

But the two examples he puts forward are, quite frankly, a horrific
embarrassment. [[Bill Gates]] and [[Jane Fonda]] are nearly unreadable crap.

Why? What can we do about it?

So there we have it…unless we let Andrew Orlowski have his usual rant against ‘Wiki-fiddlers’, in the Register:

Encouraging signs from the Wikipedia project, where co-founder and überpedian Jimmy Wales has acknowledged there are real quality problems with the online work.

Criticism of the project from within the inner sanctum has been very rare so far, although fellow co-founder Larry Sanger, who is no longer associated with the project, pleaded with the management to improve its content by befriending, and not alienating, established sources of expertise. (i.e., people who know what they’re talking about.)

Meanwhile, criticism from outside the Wikipedia camp has been rebuffed with a ferocious blend of irrationality and vigor that’s almost unprecedented in our experience: if you thought Apple, Amiga, Mozilla or OS/2 fans were er, … passionate, you haven’t met a wiki-fiddler. For them, it’s a religious crusade.

Screen Select

I recently joined Screen Select, an online DVD rental store. I had previously been a member of Amazon’s effort, but I cancelled as I was bereft of a DVD player for a few months. I pay £12.99 a month for an unlimited number of DVDs, and I can have 2 at home at any one time. To be honest, I rarely have two at home; more often one is here while the other is either being returned or being sent out to me. Means we manage to get through quite a few films, which is good given that TV is so rubbish these days.

Here’s what we have had so far:

  • Alien – Though the DVD was apparently the Director’s Cut, a studio release that Ridley Scott hated, it also featured the original version, which I watched. It’s visually stunning, claustrophobic and tense, and gripping from the very start. But it isn’t scary, is it?
  • Supersize Me – Entertaining bit of corporate America bashing. The science is all a bit dubious though, isn’t it, and the message obvious, at least for sane people.
  • The Big Lebowski – Loved it. I am now tempted to change my Palimpname from Wavid to ‘The Dude’.
  • The End of the Affair – Found this strangely unlikeable.
  • Spiderman 2 – Even more stupid than the first one. Enjoyed it, but felt thoroughly ashamed of doing so.
  • Hitchhiker’s Guide… – My other half liked this more than I did – she thought this might be because she hasn’t read the book for 10 years, while I re-read it all the time. It was ok, I guess.

This has been ripped and edited slightly from my Film List on Palimpsest.

New Style

Have uploaded and slightly modified a new style. I wasn’t unhappy with the last one, but was conscious that it was completely non-standards compliant and very graphics-heavy.

The other advantage of this one is that it goes against the annoying blogging grain of only using three-quarters of the screen. Font size might be a bit big, mind.

So, I am going to give this a go for a while. Let me know what you think!