
Thursday, 10 July 2014

How to understand toc.ncx in an e-book

The logical table of contents in an e-book is a file called toc.ncx. To say that the syntax of this file is complicated would be an understatement. However, it is not impossible to understand, and this post goes through toc.ncx line by line and explains how it is constructed.

Whilst Sigil can generate toc.ncx for you, it may be useful to have an understanding of the way it is put together in case for some reason you need to edit it manually at a late stage in preparing your e-book. And, if that were not reason enough, I want to understand what I am producing, so I have made it my business to understand the syntax of this file.

This will probably be a scary post for non-programmers, but I hope I have made it accessible, without skimping on detail.

To view toc.ncx, open the e-pub in Sigil and double-click on toc.ncx in the book browser. (It will be the last item in the list.)

In this example, the file looked like this in code view. (I have left some repetitive bits out, replaced by an ellipsis.) All will be explained below:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
     <head>
          <meta content="urn:uuid:4ef905ed-782e-440e-84b7-49b7f88099f4" name="dtb:uid"/>
          <meta content="2" name="dtb:depth"/>
          <meta content="0" name="dtb:totalPageCount"/>
          <meta content="0" name="dtb:maxPageNumber"/>
     </head>
     <docTitle>
          <text>The Bexhill Missile Crisis</text>
     </docTitle>
     <navMap>
          <navPoint id="navPoint-1" playOrder="1">
               <navLabel>
                    <text>David Gee</text>
               </navLabel>
               <content src="Text/DavidGee.xhtml"/>
          </navPoint>
          <navPoint id="navPoint-2" playOrder="2">
               <navLabel>
                    <text>Dedication</text>
               </navLabel>
               <content src="Text/Dedication.xhtml"/>
          </navPoint>
          ...
          <navPoint id="navPoint-7" playOrder="7">
               <navLabel>
                    <text>Tuesday</text>
               </navLabel>
               <content src="Text/Tuesday.xhtml"/>
               <navPoint id="navPoint-8" playOrder="8">
                    <navLabel>
                         <text>Morning</text>
                    </navLabel>
                    <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
               </navPoint>
               <navPoint id="navPoint-9" playOrder="9">
                    <navLabel>
                         <text>Afternoon</text>
                    </navLabel>
                    <content src="Text/Tuesday.xhtml#tuesdayAfternoon"/>
               </navPoint>
               <navPoint id="navPoint-10" playOrder="10">
                    <navLabel>
                         <text>Evening</text>
                    </navLabel>
                    <content src="Text/Tuesday.xhtml#tuesdayEvening"/>
               </navPoint>
               
          </navPoint>
     </navMap>
</ncx>

The explanation follows. NB the indentation of the lines is added by Sigil when it ‘tidies up’ the code. It has no significance, other than to make the nesting of the tags easier to read.

The file begins with this line:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

This just specifies that it is an XML 1.0 document and uses UTF-8 character encoding. ‘Standalone’ is set to ‘no’ because the file relies on an external definition: the DTD named in the doctype declaration on the next line.

The next line is a ‘doctype’ declaration:

<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">

This tells the e-book software that it is an ‘ncx’ file and gives the location of the document type definition (DTD), which sets out the tags the file is allowed to contain. ‘NCX’ stands for ‘Navigation Control file for XML’. The specification for this type of file was originally designed to provide navigation for audio books. It has been adopted as the standard for the logical table of contents in the e-pub specification.

That ends the declarations section. You should not need to do anything to this. Sigil will have got it right. Leave well alone. The foregoing is purely for the sake of interest.

The next line contains an opening <ncx > tag. Inside it there is an xml namespace (xmlns) declaration:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">

The ‘namespace’ specifies which particular XML vocabulary the file is using. This is necessary in order to disambiguate the meaning of tag names, which can stand for different things in different XML vocabularies. Sigil will have got this right. It should not be changed.
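For readers who are comfortable with a little Python, the effect of the namespace can be seen with the standard library's XML parser. This is only a sketch: the cut-down SAMPLE text is an assumption made for illustration and is not the full file from the example above.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Cut-down stand-in for toc.ncx (an assumption, for illustration only).
SAMPLE = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap/>
</ncx>"""

root = ET.fromstring(SAMPLE)

# The parser stores every tag name together with its namespace.
print(root.tag)

# A search for the bare name finds nothing ...
plain = root.find("navMap")

# ... whereas a search that maps a prefix to the namespace succeeds.
qualified = root.find("ncx:navMap", {"ncx": NCX_NS})

print(plain is None, qualified is not None)
```

The point is simply that <navMap> only means ‘the navMap of an ncx file’ because of the xmlns declaration in the opening <ncx> tag.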

The next thing in toc.ncx is an opening <head> tag, which begins the header of the file.

Following that the file links with the metadata for the e-book:

<meta content="urn:uuid:4ef905ed-782e-440e-84b7-49b7f88099f4" name="dtb:uid"/>

This tag specifies the unique identifier for the e-book, which is declared in the metadata section of content.opf. The value given here should match the one in content.opf. See a forthcoming post on editing and understanding the metadata for further information. In this case it is the ‘universally unique identifier’ (UUID) which was generated by Sigil when you created your e-book. It could just as well be the ISBN* or ASIN*: it need only be some sequence of characters unique to your e-book so a computer can distinguish the file as different from all the others. It is long and complicated because computers like long and complicated! (And it is all the more likely to be unique because of this.)

(*If you are intending to use the ISBN or ASIN instead of the UUID, some changes to the names etc. within the tag will need to be made.)
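If you do edit the identifier by hand, a short Python check along the following lines may help confirm that toc.ncx and content.opf still agree. It is a sketch under assumptions: the CONTENT_OPF fragment, the id "BookId" and the helper names ncx_uid and opf_uid are all made up for illustration, so compare them with your own files before relying on it.

```python
import xml.etree.ElementTree as ET

NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}
OPF = {"opf": "http://www.idpf.org/2007/opf",
       "dc": "http://purl.org/dc/elements/1.1/"}

TOC_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta content="urn:uuid:4ef905ed-782e-440e-84b7-49b7f88099f4" name="dtb:uid"/>
  </head>
</ncx>"""

# Assumed, cut-down fragment of content.opf (not taken from the example e-book).
CONTENT_OPF = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="BookId">urn:uuid:4ef905ed-782e-440e-84b7-49b7f88099f4</dc:identifier>
  </metadata>
</package>"""


def ncx_uid(ncx_text):
    """Return the content of the dtb:uid meta tag, or None if it is absent."""
    root = ET.fromstring(ncx_text)
    for meta in root.findall("ncx:head/ncx:meta", NCX):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    return None


def opf_uid(opf_text):
    """Return the identifier that the package tag names as the unique one."""
    root = ET.fromstring(opf_text)
    wanted = root.get("unique-identifier")
    for ident in root.findall("opf:metadata/dc:identifier", OPF):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    return None


uids_match = ncx_uid(TOC_NCX) == opf_uid(CONTENT_OPF)
print(uids_match)
```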

The next items give information about the file so that the e-reader software can interpret it properly:

The first line:

<meta content="2" name="dtb:depth"/>

details how many levels of headings are in the table of contents. In this example there are main headings and sub-headings, so a ‘depth’ of TWO is specified.
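The depth can also be worked out from the nesting of the navPoints themselves, which is handy for checking the figure after a manual edit. The following Python is a sketch only: the NAV_MAP text is a shortened stand-in for the example (navLabels omitted for brevity), and nav_depth is a name invented here.

```python
import xml.etree.ElementTree as ET

NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Shortened stand-in for the navMap in the example (labels left out for brevity).
NAV_MAP = """<navMap xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <navPoint id="navPoint-2" playOrder="2">
    <content src="Text/Dedication.xhtml"/>
  </navPoint>
  <navPoint id="navPoint-7" playOrder="7">
    <content src="Text/Tuesday.xhtml"/>
    <navPoint id="navPoint-8" playOrder="8">
      <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
    </navPoint>
  </navPoint>
</navMap>"""


def nav_depth(element):
    """Count how many levels of navPoint are nested inside element."""
    children = element.findall("ncx:navPoint", NS)
    if not children:
        return 0
    return 1 + max(nav_depth(child) for child in children)


depth = nav_depth(ET.fromstring(NAV_MAP))
print(depth)
```

With main headings and one level of sub-headings, the result should agree with the ‘2’ in the dtb:depth tag.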

The standard for toc.ncx has been adapted for use in e-books from the original specification for audio books, and so the next two items are largely redundant. There cannot be a page count or maximum page number in an e-book, as it has no actual pages as such. Sigil has entered ZERO here, only because the standard requires some value to be given:

<meta content="0" name="dtb:totalPageCount"/>
<meta content="0" name="dtb:maxPageNumber"/>


There then follows a closing </head> tag to end the header.

The next item is the title of the e-book:

<docTitle>
     <text>The Bexhill Missile Crisis</text>
</docTitle>


It consists of an opening <docTitle> tag and a closing </docTitle> tag enclosing the <text> </text> tags which in turn enclose the actual title. This is what will be displayed by the Kindle or other e-reader when it shows the title of the e-book in the list of e-books stored on it.

And then we get to the ‘guts’ of toc.ncx: the ‘navMap’, or ‘navigation map’.

This begins with an opening <navMap> tag.

I am using as an example an e-book which begins with a linear sequence of single chapters. In this case they were:

DavidGee.xhtml, 
Dedication.xhtml,
AuthorsNote.xhtml,
etc. etc.

The navMap consists of a number of ‘navPoints’, or ‘navigation points’. Essentially, each navPoint defines a place within the e-book to which the e-reader will jump when a link in the logical table of contents is clicked. In an e-book with chapters in a linear sequence, each chapter has a single, closed navPoint to itself in toc.ncx, which begins with an opening <navPoint> tag and ends with a closing </navPoint> tag. For example the navPoint for my first chapter was:

<navPoint id="navPoint-1" playOrder="1">
     <navLabel>
          <text>David Gee</text>
     </navLabel>
     <content src="Text/DavidGee.xhtml"/>
</navPoint>


Within the opening <navPoint> tag the navPoint is given an id (in this case "navPoint-1") and a playOrder (in this case "1"). The playOrder is a hangover from the original audio book specification. Sigil names the navPoints "navPoint-1", "navPoint-2", etc. by default and lists the play order in numerical sequence.

I suppose the play order and ids could be changed, but I cannot see what this would achieve. Just accept what Sigil has placed here. If you want the chapters in a different sequence, re-arrange them in Sigil and then re-create toc.ncx. Let Sigil do all the hard work!!!
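If you have added or removed a navPoint by hand and cannot easily go back to Sigil, the play order can be put back into sequence with a few lines of Python. This is a sketch, not Sigil's own method: the sample text and the renumber helper are assumptions for illustration, and only the playOrder values are touched.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Made-up navMap whose play order has a gap after a manual edit.
NAV_MAP = """<navMap xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <navPoint id="navPoint-1" playOrder="1">
    <content src="Text/DavidGee.xhtml"/>
  </navPoint>
  <navPoint id="navPoint-7" playOrder="7">
    <content src="Text/Tuesday.xhtml"/>
    <navPoint id="navPoint-8" playOrder="8">
      <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
    </navPoint>
  </navPoint>
</navMap>"""


def renumber(root):
    """Number every navPoint 1, 2, 3 ... in the order it appears in the file."""
    tag = "{%s}navPoint" % NCX_NS
    for number, point in enumerate(root.iter(tag), start=1):
        point.set("playOrder", str(number))
    return [point.get("playOrder") for point in root.iter(tag)]


new_orders = renumber(ET.fromstring(NAV_MAP))
print(new_orders)
```

Nested navPoints are numbered straight after their parent, which is the same pattern Sigil produced in the example.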

The next item within the navPoint is a <navLabel> tag containing the text which will be used in the logical table of contents when it is displayed by the e-reader:

     <navLabel>
          <text>David Gee</text>
     </navLabel>


The text is itself enclosed by an opening <text> tag and a closing </text> tag. Sigil has entered just what it found in the <h1> tag in the file when it created toc.ncx. (See my previous post on how to generate the logical table of contents for more information.) This item is then closed by a closing </navLabel> tag.

Following the navLabel, the final item in the navPoint is a <content /> tag, containing a reference to the place in the e-book which the e-reader will jump to when the entry is clicked by the user:

     <content src="Text/DavidGee.xhtml"/>

In the case of a simple chapter (as shown above), this will be the path and filename of the chapter. Clicking the link just jumps to the start of the chapter in question. Because toc.ncx and the Text folder are both in the same place (inside the OEBPS folder), and because the chapters are inside the Text folder, it is necessary to specify ‘Text/’ in front of the filename to identify the path from toc.ncx to the chapter. Notice that the <content /> tag is a SINGLE tag closed by the ‘/’ at the end. Because of this, there is no need for a separate closing </content> tag.
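The way the path is worked out can be sketched in Python. The location "OEBPS/toc.ncx" follows the folder layout described above; the helper name resolve_src is invented for illustration.

```python
import posixpath


def resolve_src(ncx_path, src):
    """Turn a src value into a path counted from the top of the e-pub."""
    folder = posixpath.dirname(ncx_path)  # the folder that holds toc.ncx
    return posixpath.normpath(posixpath.join(folder, src))


chapter_path = resolve_src("OEBPS/toc.ncx", "Text/DavidGee.xhtml")
print(chapter_path)
```

In other words, the src is always read starting from the folder in which toc.ncx itself lives, which is why ‘Text/’ is needed but ‘OEBPS/’ is not.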

The navPoint ends with a closing </navPoint> tag.

EACH navPoint in the e-pub will be specified in the same way in toc.ncx.

So in this example the next navPoint is:

<navPoint id="navPoint-2" playOrder="2">
     <navLabel>
          <text>Dedication</text>
     </navLabel>
     <content src="Text/Dedication.xhtml"/>
</navPoint>


This follows exactly the same syntax as the previous example, and the same pattern is used for all the following chapters, so long as they form a simple linear sequence, as will probably be the case in the majority of e-books, and certainly in most novels.

BUT IF one of the chapters is split up into subsections, AND if you want these subsections included in toc.ncx, it gets a bit more complicated. You can skip the next bit if your e-book has no subheadings.

In this example, the chapter ‘Tuesday’ is split into sub-sections called ‘Morning’, ‘Afternoon’ and ‘Evening’. The tags in the chapter containing the titles of these sub-sections have each been given a label (or id) to identify where in the file the software should jump to when the link is clicked. Sigil will enter a default label when it builds toc.ncx, but I have edited in my own more meaningful labels, and recommend that you do the same. (See my previous post on how to generate the logical table of contents for more information.) To make these labels unique in the example, I called them ‘tuesdayMorning’, ‘tuesdayAfternoon’ etc. Strictly speaking, provided you do not duplicate a label within the same chapter you should be all right. However it *IS* easier when editing the tables of contents if the labels are ALL different.

So in the chapter I had:

<p id="tuesdayMorning" class="chapterSubHeading">Morning</p>

<p id="tuesdayAfternoon" class="chapterSubHeading">Afternoon</p>

etc.

In the corresponding <content /> tags in the navPoints, the src will now need to be: ‘Text/Tuesday.xhtml#tuesdayMorning’, ‘Text/Tuesday.xhtml#tuesdayAfternoon’, etc., as explained below.

Looking at my example in toc.ncx, we see that it contains the following structure:

After the navPoint for Monday.xhtml (which I have already explained), the next thing is the navPoint for Tuesday.xhtml:

<navPoint id="navPoint-7" playOrder="7">
     <navLabel>
          <text>Tuesday</text>
     </navLabel>
     <content src="Text/Tuesday.xhtml"/>


which just follows the syntax I have already explained above. HOWEVER in this case it is NOT closed with a closing </navPoint> tag.

Instead there is a new <navPoint > for the first subsection:

     <navPoint id="navPoint-8" playOrder="8">
          <navLabel>
               <text>Morning</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
     </navPoint>


Notice that the label: ‘#tuesdayMorning’ has been added to the filename in the <content /> tag. This makes the e-reader software jump to the label: ‘tuesdayMorning’ in the chapter: ‘Tuesday.xhtml’ when the link in the logical table of contents is clicked by the user.
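A mistyped label is an easy mistake to make when editing by hand, so here is a hedged Python sketch that checks whether the label after the ‘#’ really exists in the chapter. The CHAPTER text is a cut-down stand-in for Tuesday.xhtml, and label_exists is an invented helper. A real chapter containing named entities such as &nbsp; might need a more forgiving parser than the one used here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Cut-down stand-in for Tuesday.xhtml (an assumption, for illustration only).
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p id="tuesdayMorning" class="chapterSubHeading">Morning</p>
<p id="tuesdayAfternoon" class="chapterSubHeading">Afternoon</p>
</body>
</html>"""


def label_exists(chapter_text, src):
    """Return True if the label after '#' in src is an id in the chapter."""
    _, label = urldefrag(src)  # splits 'file#label' into its two parts
    if not label:
        return True  # no label: the link goes to the start of the chapter
    root = ET.fromstring(chapter_text)
    return any(element.get("id") == label for element in root.iter())


found = label_exists(CHAPTER, "Text/Tuesday.xhtml#tuesdayMorning")
missing = label_exists(CHAPTER, "Text/Tuesday.xhtml#tuesdayEvening")
print(found, missing)
```

The second check fails only because the stand-in chapter leaves out the ‘Evening’ sub-heading.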

As another subheading follows, the navPoint for ‘tuesdayMorning’ is closed by a closing </navPoint> tag and then the navPoints for the next two subheadings follow:

     <navPoint id="navPoint-9" playOrder="9">
          <navLabel>
               <text>Afternoon</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayAfternoon"/>
     </navPoint>

     <navPoint id="navPoint-10" playOrder="10">
          <navLabel>
               <text>Evening</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayEvening"/>
     </navPoint>


Again, these navPoints are also closed.

Finally, there is the missing closing </navPoint> tag for Tuesday.xhtml.

Putting all of that together, it looks like this:

<navPoint id="navPoint-7" playOrder="7">
     <navLabel>
          <text>Tuesday</text>
     </navLabel>
     <content src="Text/Tuesday.xhtml"/>

     <navPoint id="navPoint-8" playOrder="8">
          <navLabel>
               <text>Morning</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
     </navPoint>

     <navPoint id="navPoint-9" playOrder="9">
          <navLabel>
               <text>Afternoon</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayAfternoon"/>
     </navPoint>

     <navPoint id="navPoint-10" playOrder="10">
          <navLabel>
               <text>Evening</text>
          </navLabel>
          <content src="Text/Tuesday.xhtml#tuesdayEvening"/>
     </navPoint>

</navPoint>

Collapsing the navPoints makes the structure easier to see:

<navPoint id="navPoint-7" playOrder="7">
     ...

     <navPoint id="navPoint-8" playOrder="8">
          ...
     </navPoint>

     <navPoint id="navPoint-9" playOrder="9">
          ...
     </navPoint>

     <navPoint id="navPoint-10" playOrder="10">
          ...
     </navPoint>

</navPoint>

This shows the hierarchy of the navPoints: the higher-level navPoint-7 contains the lower-level navPoints 8 to 10.

When the e-reader sees this structure, it will recognise the hierarchy of the navPoints and distinguish them in some way from each other, most probably by indenting the lower-level ones in on the left, although the exact way this is done will depend on the software installed on the e-reader in question.
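To get a feel for what the e-reader does with the hierarchy, the nested navPoints can be turned into an indented list with a short recursive function. This Python is a sketch of the idea only and makes no claim about how any particular e-reader works; the outline helper and the four-space indent are choices made here.

```python
import xml.etree.ElementTree as ET

NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

NAV_MAP = """<navMap xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <navPoint id="navPoint-7" playOrder="7">
    <navLabel><text>Tuesday</text></navLabel>
    <content src="Text/Tuesday.xhtml"/>
    <navPoint id="navPoint-8" playOrder="8">
      <navLabel><text>Morning</text></navLabel>
      <content src="Text/Tuesday.xhtml#tuesdayMorning"/>
    </navPoint>
    <navPoint id="navPoint-9" playOrder="9">
      <navLabel><text>Afternoon</text></navLabel>
      <content src="Text/Tuesday.xhtml#tuesdayAfternoon"/>
    </navPoint>
    <navPoint id="navPoint-10" playOrder="10">
      <navLabel><text>Evening</text></navLabel>
      <content src="Text/Tuesday.xhtml#tuesdayEvening"/>
    </navPoint>
  </navPoint>
</navMap>"""


def outline(element, level=0):
    """List the navLabel texts, indenting each nested level by four spaces."""
    lines = []
    for point in element.findall("ncx:navPoint", NS):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NS)
        lines.append("    " * level + label)
        lines.extend(outline(point, level + 1))  # sub-headings, one level in
    return lines


toc_lines = outline(ET.fromstring(NAV_MAP))
print("\n".join(toc_lines))
```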

Finally, after the final closing </navPoint> tag in toc.ncx, there is a closing </navMap> tag, followed by a closing </ncx> tag.

And that covers the syntax of toc.ncx.

Whilst it is interesting to know how this file is put together, it may also be useful if you need to make a last-minute change to the sequence or structure of your chapters and don’t for some reason want to go back to the original e-pub and re-build toc.ncx from scratch. Armed with a knowledge of the structure of the file you ought to be able to make the necessary minor changes to toc.ncx yourself.
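After any manual change it is worth confirming that every tag is still properly opened and closed. A minimal Python sketch of such a check follows; the GOOD and BAD samples and the check_well_formed helper are made up for illustration, and the check says nothing about whether the file obeys the ncx rules beyond being well-formed XML.

```python
import xml.etree.ElementTree as ET

GOOD = """<navMap xmlns="http://www.daisy.org/z3986/2005/ncx/">
  <navPoint id="navPoint-1" playOrder="1">
    <navLabel><text>David Gee</text></navLabel>
    <content src="Text/DavidGee.xhtml"/>
  </navPoint>
</navMap>"""

# The same text with the '>' missing from the closing navPoint tag.
BAD = GOOD.replace("</navPoint>", "</navPoint")


def check_well_formed(ncx_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(ncx_text)
    except ET.ParseError as error:
        return str(error)
    return None


good_result = check_well_formed(GOOD)
bad_result = check_well_formed(BAD)
print(good_result)
print(bad_result)
```

The error message includes a line and column number, which is usually enough to find the offending tag.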

My next post will go into how to generate the html table of contents, which is quite separate from the ncx table of contents.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before your get started
How to markup an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index
How to alphabetise an index or bibliography

TinyURL for this post: http://tinyurl.com/q63q4hx
