Prettify Project Gutenberg Books

I’m dead tired, but before I go to sleep I wanted to tell you about a neat tool you might dig. The month before last, at the Houston Palm Users Group (which I lead), we did a bit on creating ebooks. I covered the Palm Markup Language, which is a sorry excuse for a markup language. To balance out how bad PML was, I wrote a little tool that converts Project Gutenberg texts to a format more suitable for ebook reading. See, the problem with the Gutenberg texts is that they are plain ASCII files that wrap every 76 characters or so. On a PalmOS device this means you get a line and a half of text, then a break, which makes reading anything a major pain.

So what HPUG DocIt! does is combine the lines of each paragraph into one line, so you don’t get funny line breaks everywhere except where you want them: between paragraphs. It’s pretty basic, but I’ve found it incredibly useful, and so have a number of members of the group. Even though it has only been mentioned at the meeting and on the HPUG website, it converts about a dozen documents every day. Anyway, I’d just like to put this out there as a service. The interface should look pretty familiar to Texturize users, with the added option of uploading text files. If there is interest, there is definitely a lot of room for improvement in this tool. For example, it could convert the ASCII faux text styling (stars for bold, underscores for underline, etc.) to HTML equivalents, and it could detect when the end of a line is a hyphenated word and handle the break accordingly. But for now, it does one thing and it does it well. Enjoy.
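The basic paragraph-joining idea can be sketched in a few lines of Python. This is not DocIt!’s actual source; it is a minimal sketch that assumes paragraphs are separated by one or more blank lines, as in typical Gutenberg plain-text files. The function name `join_paragraphs` is hypothetical.

```python
import re


def join_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so each paragraph becomes a single line.

    Assumes paragraphs are separated by blank (or whitespace-only) lines.
    """
    # Normalize Windows (CRLF) line endings to plain newlines.
    text = text.replace("\r\n", "\n")
    # Split on runs of blank or whitespace-only lines between paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Replace the hard line breaks inside a paragraph with single spaces.
        lines = [line.strip() for line in para.split("\n")]
        joined.append(" ".join(line for line in lines if line))
    # Keep exactly one blank line between paragraphs.
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "It was the best of times,\nit was the worst of times.\n\nChapter 2\n"
    print(join_paragraphs(sample))
```

Note that this naive version also merges deliberately short lines such as poetry, tables of contents, or indented letters, so a real converter would need extra rules for those cases.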

One thought on “Prettify Project Gutenberg Books”

  1. I found this page while doing research for a new piece of software I am trying to spec out. As a project to truly learn Python and to help my high school English students, I am designing a Project Gutenberg reader. I am working on a proposed feature set now and have a few brain dumps about how it might all work, but I am still gathering info on what tools are already out there. That’s how I arrived here.

    It seems, however, that DocIt! has moved on the HPUG site. I would love to pick your brain about what you did and how it all works.