
Friday, 7 March 2014

How to understand what is inside an epub

You can open an epub using Sigil and explore the contents. When it opens, if the ‘book browser’ isn’t visible, select ‘book browser’ from the ‘view’ menu:
If the book browser opens in a separate window, just click and drag it to ‘dock’ it on the left:

This ‘docks’ it conveniently next to the main Sigil window:
As you can see, the e-book is, in reality, a collection of folders and files. To begin exploring the contents, start by clicking on the triangle beside ‘Text’ to expand/collapse that folder:
You can now see a list of the contents of the ‘Text’ folder. Note that the files are actually .xhtml files. This is a requirement of the epub standard. XHTML is a stricter, XML-based version of HTML. I am using the term ‘html’ very loosely in these posts. Sigil makes correctly formatted XHTML files for you.

My ebook here begins with a file called ‘Cover.xhtml’. This is an html file containing just the cover image. The other files are the chapters of the ebook. I always end the ebook with a file called something like ‘backCover.xhtml’, which is an html file containing just the back cover of the book as an image. I also put the ISBN of the ebook and a barcode on the back cover.

It is worth drawing attention at this point to the file called ‘Index.xhtml’ in my example. This holds the html table of contents, which contains hyperlinks to each chapter, or section of a chapter, within the ebook. There has to be an html table of contents in an ebook or else Kindlegen will not convert it.

Collapsing the ‘Text’ folder and expanding the ‘Styles’ folder reveals this:
The only file in this folder in my ebook is the CSS stylesheet for the ebook. (CSS stands for Cascading Style Sheets. More about CSS in future posts. Essentially, the CSS file contains the formats used within the e-book.) You could have more than one stylesheet if you wanted, but that seems like an unnecessary complication to me. The stylesheet must be correctly linked in the ‘head’ section of each chapter. (See my post on how to create and link the CSS stylesheet.)
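As a rough illustration of what ‘correctly linked’ means, here is a minimal Python sketch that reads one chapter and reports which stylesheets its head section links to. The sample chapter, the stylesheet path ‘../Styles/stylesheet.css’ and the function name are all assumptions made up for this example; they are not taken from my ebook.

```python
import xml.etree.ElementTree as ET

XHTML = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical chapter file; the stylesheet path is an assumption.
SAMPLE_CHAPTER = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Chapter 1</title>
  <link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
<body><p>Text</p></body>
</html>"""


def linked_stylesheets(xhtml_text):
    """Return the href of every stylesheet <link> found in <head>."""
    root = ET.fromstring(xhtml_text.encode("utf-8"))
    links = root.findall("x:head/x:link", XHTML)
    return [link.get("href") for link in links if link.get("rel") == "stylesheet"]


print(linked_stylesheets(SAMPLE_CHAPTER))
```

A chapter that returns an empty list here has no stylesheet linked and would display without your formats.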

The ‘Images’ folder holds the images in the ebook. In this case they are all jpegs (recommended):
Then the ‘Fonts’ folder contains the fonts I have embedded in the ebook:
The rationale for embedding fonts is somewhat tenuous. Arguably there is no point in even trying to force the ebook reader to display a font of your choice, as one of the features users like most about ebook readers is precisely their ability to select the font used. IF you DO embed the fonts, they must be correctly linked in the stylesheet. (See a future post about how to do this.) The Kindle will completely ignore any embedded fonts, so perhaps embedding fonts is more trouble than it’s worth. I’m including embedded fonts in this example for the sake of completeness. You might want to ignore this bit.

Notice that the filenames in the ebook are all single text strings with NO SPACES. This goes for ALL files, chapters, images, fonts, etc. This is important as epubcheck will chuck out a load of warnings if you have spaces in the filenames. Sigil will take care of putting the files into the proper folders when you create the epub.
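If you want to check a list of filenames before running epubcheck, a few lines of Python will do it. This is only a sketch: the function name and the sample filenames are invented for illustration.

```python
def names_with_spaces(filenames):
    """Return the filenames that contain any whitespace character."""
    return [name for name in filenames if any(ch.isspace() for ch in name)]


# Hypothetical file list; the second entry is the kind of name to avoid.
sample = ["Text/Chapter01.xhtml", "Images/back cover.jpg", "Fonts/MyFont.ttf"]
print(names_with_spaces(sample))
```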

Now we get to the interesting parts of the ebook. First is a special file called ‘toc.ncx’. ‘NCX’ stands for Navigation Control file for XML applications, which is a standard developed for audio books that epub has adopted for ebook indexes. This file has been generated by Sigil. It is used by the ebook reader to create a table of contents. Each ebook reader gives the user access to this table of contents in its own way. Often there will be an ‘Index’ button on the ebook reader giving the user direct access to the table of contents at the touch of a button from anywhere in the ebook. Double-clicking ‘toc.ncx’ in the book browser displays the file’s contents in the main Sigil window:
It all looks very scary for non-programmers! Luckily, there is no need to know anything about this file, but I do intend to demystify it in a forthcoming post. The ncx table of contents is sometimes called a logical table of contents and is in addition to the html table of contents, ‘Index.xhtml’ in my example. BOTH are required in a Kindle e-book. (I will link to a post on how the logical table of contents should be linked in the manifest here.) (Click here to find out how to link the html table of contents for Kindle.) Sigil automatically generates and links the ncx table of contents when you make the ebook. Again, building the logical table of contents using Sigil will be covered in a future post.
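To make toc.ncx a little less scary, here is a minimal Python sketch that pulls the chapter titles and their target files out of an NCX file. The sample is a cut-down, hypothetical toc.ncx (a real one generated by Sigil also has a head section with metadata), and the function name is my own invention.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX = {"ncx": NCX_URI}

# Hypothetical, trimmed-down toc.ncx with two entries.
SAMPLE_NCX = """<?xml version="1.0" encoding="utf-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <docTitle><text>Sample Book</text></docTitle>
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="Text/Chapter01.xhtml"/>
    </navPoint>
    <navPoint id="navPoint-2" playOrder="2">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="Text/Chapter02.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def ncx_entries(ncx_text):
    """Return (label, target file) pairs for every navPoint, in document order."""
    root = ET.fromstring(ncx_text.encode("utf-8"))
    entries = []
    for point in root.iter("{%s}navPoint" % NCX_URI):
        label = point.find("ncx:navLabel/ncx:text", NCX).text
        src = point.find("ncx:content", NCX).get("src")
        entries.append((label, src))
    return entries


for label, src in ncx_entries(SAMPLE_NCX):
    print(label, "->", src)
```

Each navPoint is simply a label the reader sees plus the file (and optionally the position within it) to jump to.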

Finally, there is a file called content.opf. This is the most important file in the ebook. When opened, it looks like this:
You can see from the first ‘header’ line that this file is in a format called ‘XML’. This stands for ‘eXtensible Markup Language’. The syntax is a bit complicated, but not impossible to understand. After the header comes an opening <package> ‘tag’. After that, the first section is enclosed by an opening <metadata> tag and a closing </metadata> tag. (I’ve replaced unnecessary detail with an ellipsis (…).) The sections you will find within content.opf are, in order: <metadata>, <manifest>, <spine> and <guide>.

The file finally ends with a closing </package> tag.

Again, I intend to demystify the content.opf file in more detail later. In this post, suffice it to say that the <metadata> section details the metadata for the ebook, such as the title, author, ISBN, date of publication etc. Sigil creates this section for you. The <manifest> is just a list of the items within the epub package and where to find them. It also gives each of them a label (or ‘id’) and specifies what kind of file it is. EVERY item inside the ebook MUST be listed in the manifest. Sigil will do this for you.

Then comes the <spine>. This lists the chapters of the book in the order in which they are to be displayed. It is perhaps worth noting here that the <spine> uses the label/id given to the item in the <manifest> rather than the actual location of the file. Sigil places the chapters in the <spine> in the order in which they are listed in the book browser. Click and drag the files in the book browser in Sigil to re-sequence them.

Finally, the last section in the content.opf is called the <guide>. This points to key items in the ebook. Notably, this is used by Kindle to point to the table of contents, that is to say the html table of contents (Index.xhtml) rather than the logical table of contents (toc.ncx). In an ebook for Kindle the only item which should be in the guide is the html table of contents. I explain how to link this here. And there is important information about how to code the links in the html table of contents for Kindle here.
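The way the <spine> refers to the <manifest> by id is easier to see in a small example. The Python sketch below parses a hypothetical, heavily trimmed content.opf (the title, identifier, ids and filenames are all invented, not copied from my ebook) and turns the spine into a list of actual file locations. It also lists what the guide points to.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical, trimmed-down content.opf showing the four sections.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="Cover.xhtml" href="Text/Cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="Index.xhtml" href="Text/Index.xhtml" media-type="application/xhtml+xml"/>
    <item id="Chapter01.xhtml" href="Text/Chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="Cover.xhtml"/>
    <itemref idref="Index.xhtml"/>
    <itemref idref="Chapter01.xhtml"/>
  </spine>
  <guide>
    <reference type="toc" title="Table of Contents" href="Text/Index.xhtml"/>
  </guide>
</package>"""


def reading_order(opf_text):
    """Resolve each spine idref to the href recorded in the manifest."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF)}
    return [hrefs[ref.get("idref")]
            for ref in root.findall("opf:spine/opf:itemref", OPF)]


def guide_entries(opf_text):
    """Return (type, href) for each reference in the guide."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    return [(ref.get("type"), ref.get("href"))
            for ref in root.findall("opf:guide/opf:reference", OPF)]


print(reading_order(SAMPLE_OPF))
print(guide_entries(SAMPLE_OPF))
```

Note that toc.ncx appears in the manifest but not in the spine: it is an item in the package, not a chapter to be displayed.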

That is all of the ebook which you can access using Sigil. However, there is a bit more to it than that. An epub ebook is really a zip archive in disguise. See my post ‘How to ‘unpack’ an epub file to edit the contents and see what’s inside’ to find out how to extract the archive. Here’s what another ebook of mine looked like after I had extracted it:
As you can see, there are two folders and one file inside the ebook. First, let’s take the folder called OEBPS. This is what is inside:
As you can see, these are just the same as the items which Sigil displayed in the book browser, albeit in a different order.
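Because an epub is a zip archive, you can also peek inside one without extracting it. Here is a minimal Python sketch using the standard zipfile module. To keep the example self-contained it builds a tiny stand-in archive in memory; the stand-in is NOT a valid ebook and its contents are placeholders. With a real book you would pass the path to your .epub file instead.

```python
import io
import zipfile


def top_level_entries(epub_file):
    """Return the sorted top-level names (files and folders) inside an epub."""
    with zipfile.ZipFile(epub_file) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


def make_sample_epub():
    """Build a tiny stand-in archive in memory (placeholder contents only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/Text/Chapter01.xhtml", "<html/>")
    buffer.seek(0)
    return buffer


print(top_level_entries(make_sample_epub()))
```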

Inside the META-INF folder you will find just one file:
This is the ‘container.xml’ file. It is another XML file and is an important part of the epub specification. Opening it using an html editor such as Komodo Edit (or you could use a text editor) you will see:
I have left off the extreme right hand side of some long lines, but we don’t need to know anything about this file. All it does is tell the ebook reader where to find content.opf. Sigil creates the file ‘container.xml’ for you.
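For the curious, this is roughly how an ebook reader might use container.xml to locate content.opf, sketched in Python. The sample below is a typical container.xml written out by hand for illustration, not a copy of the one in my ebook, and the function name is made up.

```python
import xml.etree.ElementTree as ET

CONTAINER = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Typical container.xml, written out here for illustration.
SAMPLE_CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def opf_path(container_text):
    """Return the location of the package file named in container.xml."""
    root = ET.fromstring(container_text.encode("utf-8"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER)
    return rootfile.get("full-path")


print(opf_path(SAMPLE_CONTAINER))
```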

Finally there is a file called ‘mimetype’. Open this using Komodo Edit and you will find it contains just one line: application/epub+zip
This just tells the ebook reader that it is an epub ebook. Perhaps it is worth pointing out that the items in the epub should be added to the archive in the following order: FIRST the mimetype, THEN the META-INF and FINALLY the OEBPS. And that’s what lies within an epub file.
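The ordering can be sketched in Python with the standard zipfile module. This is an illustration under assumptions rather than a recipe: the function names and the placeholder file contents are invented. It also stores the mimetype entry uncompressed, which the epub container specification asks for in addition to it coming first.

```python
import io
import zipfile


def write_epub(target, files):
    """Write mimetype first (uncompressed), then META-INF, then everything else.

    files maps archive names such as 'OEBPS/content.opf' to their contents.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # META-INF entries sort ahead of the rest; names break ties.
        ordered = sorted(files, key=lambda n: (not n.startswith("META-INF/"), n))
        for name in ordered:
            archive.writestr(name, files[name])


def mimetype_is_first(epub_file):
    """Check that the first entry is an uncompressed, correct mimetype file."""
    with zipfile.ZipFile(epub_file) as archive:
        first = archive.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and archive.read("mimetype") == b"application/epub+zip")


# Placeholder contents only; this does not produce a readable ebook.
demo = io.BytesIO()
write_epub(demo, {"OEBPS/content.opf": "<package/>",
                  "META-INF/container.xml": "<container/>"})
print(mimetype_is_first(demo))
```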

You WILL need to be able to access, understand and edit the content.opf file in order to create an e-book which will convert successfully from epub to kindle format. Each step of the procedure will be covered in separate posts to come.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before you get started
How to markup an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index
How to alphabetise an index or bibliography
How to adapt the print index in your MS Word file for an e-book using markup
How to adapt cross-references in your print index for e-book and how to use markup to make the links
How to understand content.opf
How to understand and edit the Metadata of an ebook using Sigil
How to understand the manifest in content.opf
How to understand the spine and guide in content.opf
How to test your e-pub using flightCrew in Sigil
How to test your e-pub using epubcheck
How to convert an e-pub to Kindle using kindlegen
