RFC eBook Conversion

This text describes the conversion process used to create this ebook.

Conversion process for rfc.mobi/rfc.epub

The conversion process is as follows:

  1. Update the RFC index from www.ietf.org
  2. Create the cover jpg from the postscript file and scale it down
  3. Create the list of files to be included in the book
  4. Create the ncx file based on the list created in the previous step
  5. Go through the RFCs and convert them from text to html
  6. Create the opf file for the book
  7. Convert rfc-index.txt to index.html
  8. Create the .mobi file using kindlegen
  9. Create the .epub file from the same sources as the .mobi by removing some mobipocket specific html tags from the html.

Steps 2 - 8 happen inside the make-rfc-mobibook.sh script.

Conversion process for working group internet-drafts

The conversion process is as follows:

  1. Update the rfc and internet-draft repositories from www.ietf.org
  2. Create the directory structure, with one directory for each area and, inside it, one directory for each working group in that area. Also create the .htaccess file containing the full names of the working groups.
  3. Create the ebooks by looping through all working groups in all areas and doing the following:
    1. Fetch the list of working group drafts, RFCs and related documents from http://datatracker.ietf.org/wg/wgname/documents/txt.
    2. Create the cover jpg from the postscript file and scale it down
    3. Create the ncx file based on the list fetched in the first step
    4. Go through the documents and convert them from text to html
    5. Create the opf file for the book
    6. Create the index.html file based on the files and titles fetched from datatracker in the first step.
    7. Create the .mobi file using kindlegen
    8. Create the .epub file from the same sources as the .mobi by removing some mobipocket specific html tags from the html.
  4. Copy the .epub and .mobi files to the correct place in the directory structure.
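
The directory layout and the datatracker URL from the steps above can be sketched in a few lines of shell. This is only an illustration: the function names, the "area wg full name" input format and the use of Apache's AddDescription directive for the full names are my assumptions, not taken from the actual scripts.

```shell
# Hypothetical helper: read "area wg full name" lines from stdin and create
# root/area/wg directories, recording the full name in root/area/.htaccess.
# Using AddDescription for the full names is an assumption.
make_wg_tree() {
    root=$1
    while read -r area wg fullname; do
        [ -n "$wg" ] || continue                 # skip blank or short lines
        mkdir -p "$root/$area/$wg"
        printf 'AddDescription "%s" %s\n' "$fullname" "$wg" \
            >> "$root/$area/.htaccess"
    done
}

# Build the document list URL for one working group, following the
# pattern given in the list above.
wg_documents_url() {
    printf 'http://datatracker.ietf.org/wg/%s/documents/txt\n' "$1"
}

# Example run in a scratch directory, with one sample working group.
tmp=$(mktemp -d)
printf '%s\n' 'sec ipsecme IP Security Maintenance and Extensions' |
    make_wg_tree "$tmp"
ls "$tmp/sec"
wg_documents_url ipsecme
rm -rf "$tmp"
```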

Creating Cover page

make-cover.sh "\nRFC Index\n$date" "$time" \
    "ietf-logo.eps" > rfc.jpg

This program takes the title, the time and the logo postscript file, and creates a postscript file, which it then runs through ghostscript to convert it to a file suitable for the Kindle 3. The title can have three lines separated with "\n". Normally the top two lines contain the actual title, and the third line contains the date of conversion. The time is added to the bottom of the page in a small font, so it can be used during development to see which version of the ebook this is (during development I had multiple versions loaded on my Kindle, and before this was added it was painful to find out which one was the newest). The logo is ietf-logo.eps, taken directly from the IETF web page.
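
As a small illustration of the title format, the sketch below expands the "\n" separators into real line breaks using printf's %b conversion. The function name title_lines and the date are made up for this example, and make-cover.sh may well split the title in some other way.

```shell
# Hypothetical helper, not part of make-cover.sh: print each title line on
# its own line by letting printf expand the literal "\n" separators.
title_lines() {
    printf '%b\n' "$1"
}

# Same shape as the title argument in the command above, with a made-up
# date: an empty first line, then the title, then the date.
title_lines '\nRFC Index\n2011-01-01'
```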

The page is initially created at a resolution of 2400x3200 pixels and then scaled down to 25%, so the final page is 600x800 pixels.
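
The arithmetic can be checked with a couple of lines of shell. The scale function is only an illustration; the real scaling is presumably done by the image tools that make-cover.sh calls.

```shell
# Hypothetical helper: scale one pixel dimension by a percentage
# using integer arithmetic.
scale() {
    echo $(( $1 * $2 / 100 ))
}

# 2400x3200 scaled to 25% prints 600x800.
echo "$(scale 2400 25)x$(scale 3200 25)"
```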

Creating NCX file

For RFC ebook:

make-ncx.pl --title "RFC Index" \
    --author "IETF" \
    --output $ncx \
    "toc:toc:index.html:Table of Contents" \
    --in \
    --class entry \
    --input-file $ncxtocentries \
    --out \
    --class book \
    --include-regexp '^rfc[0-9][0-9][0-9]1' \
    --split-regexp '^rfc[0-9][0-9]01' \
    --input-file $ncxrfcentries

For the Internet-Draft ebooks:

make-ncx.pl --title "$wg Index" \
    --author "IETF" \
    --output $ncx \
    "toc:toc:index.html:Table of Contents" \
    --class book \
    --input-file $ncxentries

The NCX file contains a list of all files and the navigation information. It is used when you press the left or right arrow on the Kindle to decide where to move next. See the make-ncx manual page for information about the options.
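
To give an idea of what ends up in the NCX file, the sketch below turns one entry such as "toc:toc:index.html:Table of Contents" into an NCX navPoint element. Reading the four fields as id, class, file and title is my guess from the example above, and ncx_navpoint is a made-up name, so treat this as an illustration of the file format rather than of what make-ncx.pl really does.

```shell
# Hypothetical helper: print an NCX navPoint for one "id:class:file:title"
# entry. The field order is an assumption. The title is not XML-escaped.
ncx_navpoint() {
    entry=$1
    order=$2
    # Split on the first three colons; the title keeps any later colons.
    IFS=: read -r id class file title <<EOF
$entry
EOF
    printf '<navPoint id="%s" class="%s" playOrder="%s">\n' \
        "$id" "$class" "$order"
    printf '  <navLabel><text>%s</text></navLabel>\n' "$title"
    printf '  <content src="%s"/>\n' "$file"
    printf '</navPoint>\n'
}

ncx_navpoint 'toc:toc:index.html:Table of Contents' 1
```

The playOrder value is what gives the reading device the sequence of navigation points.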

Creating OPF file

For RFC ebook:

files=`ls -1 "$dir"/rfc*.html | sed 's/.*\///g'`
make-opf.pl --title "RFC Index $date" \
    --language en \
    --cover rfc.jpg \
    --subject Reference \
    --beginning intro.html \
    --id "$id" \
    --role clb \
    --creator "Tero Kivinen" \
    --publisher "IETF" \
    --description "All RFCs as mobibook" \
    --date "$date" \
    --index index.html \
    --stylesheet rfc.css \
    --toc rfc.ncx \
    --output rfc.opf \
    intro.html \
    $files \
    conversion.html \
    $manpages

For the Internet-Draft ebooks:

make-opf.pl --title "$wg ID and RFC Docs $date" \
    --language en \
    --cover wg.jpg \
    --subject Reference \
    --beginning intro.html \
    --id "$id" \
    --role clb \
    --creator "Tero Kivinen" \
    --publisher "IETF" \
    --description "$wg RFCs and Internet-Drafts" \
    --date "$date" \
    --index index.html \
    --stylesheet rfc.css \
    --toc wg-"$wg".ncx \
    --output "$opf" \
    $files \
    conversion.html \
    $manpages

The Open Packaging Format (OPF) file describes which files are in the ebook. It also says where reading should start and in which order the entries appear in the book. See the make-opf manual page for information about the options.
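
The two central parts of an OPF file are the manifest (which files are in the book) and the spine (in which order they are read). The sketch below prints both for a list of html files. The function name, the generated item ids and the choice of media type are assumptions made for the example; the output of make-opf.pl may look different.

```shell
# Hypothetical helper: print a minimal OPF manifest and spine for the html
# files given as arguments. The item ids (item1, item2, ...) are made up.
opf_manifest_and_spine() {
    echo '<manifest>'
    n=0
    for f in "$@"; do
        n=$((n + 1))
        printf '  <item id="item%d" href="%s" media-type="application/xhtml+xml"/>\n' \
            "$n" "$f"
    done
    echo '</manifest>'
    echo '<spine>'
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # The spine refers back to the manifest ids, in reading order.
        printf '  <itemref idref="item%d"/>\n' "$n"
    done
    echo '</spine>'
}

opf_manifest_and_spine intro.html rfc5996.html conversion.html
```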

Converting text RFC to html

For RFCs the conversion command line is:

rfc2html.pl \
    --navigation \
    "index.html:Index;-5:Back 5;-1:Prev;+1:Next;+5:Forward 5" \
    -f $filelist \
    -r $rfcnum \
    -o rfc$rfcnum.html \
    $rfctxtfile

For Internet-Drafts the conversion command line is:

rfc2html.pl \
    --navigation \
    "index.html:Index;-5:Back 5;-1:Prev;+1:Next;+5:Forward 5" \
    -f $filelist \
    -t $draft-name \
    -o $draft-name.html \
    $draft-name.txt

This program takes the text formatted RFC or Internet-Draft and formats it to html suitable for ebooks. The first phase is to remove page formatting (page breaks, page numbers, page headers and footers). In that phase it also tries to see whether a textual paragraph continues from one page to the next, and if so it glues the two parts together. The second phase is to go through all paragraphs and try to find out what type each paragraph is (text, picture, header, table of contents, authors' addresses section, terminology definition, bulleted or numbered list, references section). After this it goes through the actual text paragraphs and converts them to html suitable for their type. See the rfc2html manual page for information about the options.
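
The first phase can be approximated with a few lines of awk. This is a rough sketch under the assumption of the usual RFC text layout (a footer line ending in "[Page N]", then a form feed, then a header line); strip_page_furniture is a made-up name, the sample text is invented, and this is not how rfc2html.pl itself is written. It does not try to glue paragraphs that continue across a page break.

```shell
# Hypothetical helper: remove page footers, form feeds and page headers
# from RFC-style text read on stdin.
strip_page_furniture() {
    awk '
        /\[Page [0-9]+\][ \t]*$/ { next }                # page footer
        /^\f/                    { after_ff = 1; next }  # form feed (page break)
        after_ff && NF           { after_ff = 0; next }  # first non-blank line after it: header
        { print }
    '
}

# Invented sample: end of one page, footer, form feed, header, next page.
printf 'end of page one\n\nAuthor  Standards Track  [Page 1]\n\f\nRFC 9999  Example  May 2020\n\nstart of page two\n' |
    strip_page_furniture
```

The blank lines around the page break are left in place, which is exactly where the real converter has to decide whether the paragraph continues.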

Converting rfc-index.txt to index.html

TBF

Creating .mobi file

kindlegen rfc.opf -c1 -verbose

TBF

Converting files to .epub format

makeepub.sh current

TBF

Kindle 3 issues

Issues I have found when converting this to the Kindle 3

NCX file size

It seems there is a maximum number of items the ncx file can have, or some other limitation in the ncx file parsing. When I included all the RFCs in the ncx file, the next and previous arrows on the Kindle 3 stopped working. If the number of items is reduced, they start working again.
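
The RFC command line shown earlier passes --include-regexp '^rfc[0-9][0-9][0-9]1' to make-ncx.pl. Assuming that pattern is matched against names such as rfc5991 (a guess, see the make-ncx manual page for the real semantics), what it would select can be tried out with grep. The helper name and the sample names are only for illustration.

```shell
# Hypothetical helper: keep only the names on stdin that match the given
# extended regular expression.
select_ncx_entries() {
    grep -E "$1"
}

# Of these five sample names, only the ones with three digits followed by
# a 1 after "rfc" are kept.
printf '%s\n' rfc5990 rfc5991 rfc5996 rfc6001 rfc6020 |
    select_ncx_entries '^rfc[0-9][0-9][0-9]1'
```

For four-digit RFC numbers a pattern like this keeps one name in ten, which is one way to get the item count down.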

Kindlegen -c2 compression

When I tried to use the best compression of kindlegen, the program did create an eBook file, but all the links inside the file pointed to the wrong place, i.e. when you followed a link to rfc5996 you ended up in the middle of rfc6020 or so.

No support for multiple indexes

The mobipocket format supports multiple indexes, and the eBook originally included title word and full title text indexes, but those were removed as the Kindle 3 does not support them.

Last item might be missing in index

The automatic index (using the menu and selecting index) sometimes misses its last item. That's why I added this conversion description to the end, so if something is missing it will be this text.

Kindle 3 and pictures

The Kindle 3 does support a monospace font, and the screen is wide enough for 67 characters if the screen is rotated. This allows the normal 32-bit packet frame description pictures to be shown properly using the normal pre tag. The Kindle 3 will still wrap words to the next line, and this was problematic when combined with the hyphens used in pictures. To fix this, all the hyphens in the text are converted to non-breaking hyphens.
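
The hyphen replacement itself is a one-line substitution. The sketch below writes the non-breaking hyphen (U+2011) as the numeric character reference &#8209;. Whether rfc2html.pl emits the reference or the raw character is not stated here, so that part, like the function name, is an assumption.

```shell
# Hypothetical helper: replace every ASCII hyphen on stdin with the html
# character reference for the non-breaking hyphen (U+2011).
nb_hyphens() {
    sed 's/-/\&#8209;/g'     # the backslash keeps sed from treating & specially
}

# A short piece of a packet diagram border.
printf '%s\n' '+-+-+-+' | nb_hyphens
```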

Non-breaking hyphen not shown properly on Kindle for PC

Because of the previous issue with word wrap we needed to use non-breaking hyphens, but unfortunately they do not show properly on Kindle for PC: an unknown character box is shown instead.

Searching does not work

For some reason, searching within the RFC eBook does not work on the Kindle 3.