Saturday, 25 October 2014

How to understand and edit the Metadata of an ebook using Sigil

The easiest way to deal with the metadata in your ebook is to use Sigil to create it. Metadata is information about the publication, such as the author, title, ISBN etc. To open the Metadata Editor in Sigil, select ‘Metadata Editor …’ from the Tools menu:


In a blank ebook, the metadata editor will open in Sigil with no information in it:


The Metadata Editor: Introductory:

The top of the metadata editor dialog has spaces for three key pieces of metadata: the Title, Author and Language:


In essence, all you need do is type in the title and author and select a language from the drop-down menu. The author name should be typed in the normal order: ‘Rod Shelton’. Next to the author field is another one labelled: ‘File As:’.


Type in the author’s name as it would appear in an index: ‘Shelton, Rod’, which is the order … well … in which the e-book should be filed. (See my post on alphabetising an index for detailed guidance on the conventions for alphabetising names: it’s far from as obvious as it might at first seem!)

Other items of metadata, such as the ISBN, date of publication, etc, can be added by clicking the ‘Add Basic’ button at the top right of the dialog. Add other authors, or contributors, editors, illustrators etc. by clicking the ‘Add Role’ button, which is immediately below:


Each piece of metadata you add creates an entry in the <metadata> section of content.opf: ONE entry per item of metadata.

Syntax of the <metadata>:

The metadata section in content.opf begins with an opening <metadata …> tag and ends with a closing </metadata> tag.

Inside the opening tag there are declarations of the XML namespaces for the Dublin Core and opf specifications:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">

In between this and the closing tag there are a number of tags like this: <dc:></dc:> or like this: <opf:></opf:>, each one of which defines an item of metadata associated with your e-book.

In the case of the title, Sigil creates the following entry:

<dc:title>My Ebook Example</dc:title>

The actual title is contained within the two <dc:title> and </dc:title> tags. This is what you entered in the metadata editor dialog and should exactly match how you want the ebook title to be displayed on the e-reader. Remember that the filename can be anything you want. It is what the e-book reader finds in the metadata which gets displayed.
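If you would like to check what an e-reader will find in the title entry without opening Sigil, something along these lines should work. It is only a minimal sketch in Python, using the standard library: the SAMPLE_METADATA text is a cut-down, made-up metadata section, and read_title is a name invented for this illustration, not part of Sigil.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the opening <metadata> tag shown earlier.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "opf": "http://www.idpf.org/2007/opf",
}

# A cut-down, hypothetical <metadata> section used only for this illustration.
SAMPLE_METADATA = """\
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:title>My Ebook Example</dc:title>
  <dc:language>en-GB</dc:language>
</metadata>
"""


def read_title(metadata_xml):
    """Return the text of the first <dc:title> entry, or None if there is none."""
    root = ET.fromstring(metadata_xml)
    title = root.find("dc:title", NS)
    return title.text if title is not None else None


print(read_title(SAMPLE_METADATA))
```

The NS dictionary is what lets the script understand the ‘dc:’ prefix in the same way the e-book software does.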

As described elsewhere, ‘dc:’ is an XML namespace prefix which explains to the e-book software which particular meaning it should understand title to have. ‘dc:’ stands for ‘Dublin Core’ and the specification for this convention can be found here: http://dublincore.org/documents/dces/. And, yes, I know this is NOT the URL given in the opening <metadata …> tag. It was, however, the best link I could find to explain the Dublin Core protocol.

The way the e-pub specification builds on and extends the Dublin Core protocol is complicated, and I will explain this as it arises after describing how to use the metadata editor in Sigil for each piece of the metadata in turn.

In the case of the language, e-books use a special code, rather than the name of the language. Sigil looks up this code based on whatever you select from the dropdown menu in the metadata editor dialog. In my example, I entered UK English from the dropdown menu:


Sigil created this entry in the metadata:

<dc:language>en-GB</dc:language>

which has the correct code: en-GB for UK English sandwiched between the opening and closing <dc:language> tags.

The code is in two parts. The first part (en) identifies the language as English, in accordance with International Organization for Standardization (ISO) standard number 639, the official website for which is here: http://www.loc.gov/standards/iso639-2/php/English_list.php. The second part is a qualifier for British English (-GB) following ISO 3166, which can be found here: https://www.iso.org/obp/ui/#search. The way these two standards are combined is in accordance with the guidance obtainable from here: http://www.ietf.org/rfc/rfc3066.txt. Sigil inserts the correct code for you.
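To make the two-part structure concrete, here is a small Python sketch which splits a code into its language and region parts. It is an illustration only: split_language_code is a made-up name, and it deals only with simple codes such as en-GB, not with everything the RFC allows.

```python
def split_language_code(code):
    """Split a code such as 'en-GB' into (language, region).

    The region is None when the code has no qualifier. Only simple
    one-part or two-part codes are handled in this sketch.
    """
    language, _, region = code.partition("-")
    return language.lower(), (region.upper() if region else None)


print(split_language_code("en-GB"))
print(split_language_code("tlh"))
```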

A quick check showed that a very small number of obscure languages assigned a code in ISO 639 are NOT in the drop-down menu in Sigil. Luckily, Sigil will in most cases be able to enter the correct information in the metadata part of content.opf for you (and also see further comments below). In the unlikely event that your ebook is in one of the missing languages, I would expect you to have to edit the <dc:language> entry in content.opf manually and substitute the ISO 639 code yourself. It might be necessary to do this using an html editor such as Komodo Edit, rather than in Sigil, which might possibly delete an entry if it doesn’t recognise the code. See my post on how to ‘unpack’ an e-book to edit the contents for more details of how to edit content.opf using an html editor.

So that covers the basics.

Adding straightforward metadata using ‘Add Basic’:

Most additional metadata can be added by clicking the ‘Add Basic’ button at the top right of the metadata editor dialog. Another dialog loads, from which you can select what kind of metadata you want to enter. I clicked ‘publisher’ from this list:


This adds ‘Publisher’ to the table in the bottom part of the metadata editor dialog:


Each row of the table has four columns: ‘Name’, ‘Value’, ‘File As’ and ‘Role’. The first column contains the name of the item of metadata you are adding (or editing). The second holds the value of that metadata. The last two columns are only relevant for certain items of metadata, and I will go into their role below as it becomes necessary.

Publisher:

Once you have added ‘Publisher’ to the metadata editor dialog, double-clicking in the value column opens it for editing:


Just type in the publisher name. I entered ‘Paradise Press’. When you close the dialog, the following entry will be created in the metadata:

<dc:publisher>Paradise Press</dc:publisher>

As you can see, the entry follows the same syntax as all the others described above. For this item of metadata, the other two columns in the metadata editor dialog cannot be opened.

Other straightforward metadata:

In the items described below, enter the data in the ‘Value’ column in the Metadata Editor dialog.

Coverage: Use this to describe the ‘extent or scope’ of your ebook. The guidance in the opf specification says this should be made using a ‘controlled vocabulary’ and directs you to the Dublin Core for suggestions. The links in the document I accessed didn’t work. Add something like ‘general readers’ in the ‘Value’ column and Sigil creates the following in the metadata:

<dc:coverage>general readers</dc:coverage>

Description: The next item is for a description of the content. This might be a way to embed a blurb in the metadata. The metadata Editor window has limited editing tools, but I was able to enter a lengthy description here. Line breaks were not possible. The entry created in the metadata looked like this:

<dc:description>Can Everton Jones find out how his father stole Emperor Bokassa’s diamonds?</dc:description>

You might want to create a lengthy entry here in MS Word and then save it as unformatted text and paste it into the Metadata Editor. Particularly so if, like me, you are picky about insisting on curly quotes which you will have difficulty entering by typing directly into the Metadata Editor window.

Format: this is for the format of the publication. The guidance suggests using the MIME type here. For an e-pub this is application/epub+zip, the same string that is stored in the mimetype file inside every e-pub. Kindle (mobi) files are commonly labelled application/x-mobipocket-ebook; unlike an e-pub, a mobi file is not a .zip archive. As the MIME types for all the component parts of the e-book are already embedded within it, I would have thought ‘epub’ or ‘mobi’ (for kindles) were more useful to a human reader, and these are the only two possibilities of relevance in these posts. The entry in the metadata created by Sigil will look like this for a kindle:

<dc:format>mobi</dc:format>

Additional Language: In addition to the language you selected from the drop-down menu in the top part of the metadata editor dialog, you can specify a further language, if relevant. Simply select ‘language’ from the ‘add basic’ dialog and a row is added to the metadata editor with ‘language’ in the ‘Name’ column. Just type the name of the language in the ‘Value’ column and Sigil will look up the correct language code for you. An additional entry is made in the metadata for the second language as outlined above. HOWEVER, if you type a language which is NOT supported, such as ‘Polari’, ‘Kalaallisut’ or even ‘Vulcan*’, Sigil will create a blank tag instead: <dc:language></dc:language>. If you save the file or perform another operation, this blank entry will disappear completely.

(*On checking, to my utmost astonishment, Klingon IS a supported language, with its own ISO 639 code!)

Relation: is how to specify a related publication, apparently by simply entering a suitable code to identify it, such as an ISBN. The entry which Sigil will make in the metadata will look like this:

<dc:relation>9781904585488</dc:relation>

I would have thought entering the ISBN of a sequel to a novel would make it easier for sites like Amazon to suggest a reader might want to buy the sequel if they have already bought the first book.

Rights: provides a place to enter the rights information. The entry might look like this:

<dc:rights>worldwide exclusive</dc:rights>

HOWEVER do make sure this is in addition to putting the rights information on the title-page verso (otherwise known as the copyright page) to make sure you have properly protected your rights. (See this place in my post on how to phrase the copyright declarations for more information.)

Source: is to identify any source material on which your own book is based. Most likely this would not be needed. The entry in the metadata might possibly look like this:

<dc:source>short stories submitted to the 2012 Paradise Press Short Story Competition</dc:source>

Subject: can be defined if you want to put the subject matter in the metadata. The guidance says ‘any keyword or arbitrary short phrase’ would be suitable for this purpose and multiple entries are supported. The code in the metadata might look like this:

<dc:subject>queer ghost stories</dc:subject>

Type: is where you enter ‘general categories, functions or genres’. An entry could look like:

<dc:type>fiction</dc:type>
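All of the straightforward items above follow the same pattern: one tag per item, with the value sandwiched between the opening and closing tags. The Python sketch below shows that pattern being generated from a list of name/value pairs. It is an illustration of the syntax only, not a description of how Sigil works internally, and build_metadata is a name made up for the purpose.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Ask ElementTree to write the Dublin Core namespace with the usual 'dc' prefix.
ET.register_namespace("dc", DC_NS)


def build_metadata(entries):
    """Build a <metadata> element holding one <dc:...> child per (name, value) pair."""
    metadata = ET.Element("metadata")
    for name, value in entries:
        child = ET.SubElement(metadata, "{%s}%s" % (DC_NS, name))
        child.text = value
    return metadata


# Values taken from the examples above; a list is used so that an item
# such as 'subject' could appear more than once.
entries = [
    ("publisher", "Paradise Press"),
    ("coverage", "general readers"),
    ("subject", "queer ghost stories"),
    ("type", "fiction"),
]
xml_text = ET.tostring(build_metadata(entries), encoding="unicode")
print(xml_text)
```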

More Complicated metadata:

The metadata above is all entered as a single value. Other metadata requires a bit more discussion and understanding of the conventions. Perhaps the next least troublesome is:

Date:

There are four possible options to choose from the add (basic) metadata dialog: ‘Date (custom)’, ‘Date: Creation’, ‘Date: Modification’ and ‘Date: Publication’. Let’s say I select ‘Date: Publication’:


When I return to the metadata editor dialog, the new line in the table will look like this:


(I have highlighted the relevant row by clicking on it.)

Notice that the current date has been pre-loaded in the ‘Value’ column, and ‘publication’ has been pre-loaded in the ‘File As’ column. Closing the dialog will add the following entry in the metadata in content.opf:

<dc:date opf:event="publication">2014-09-12</dc:date>

The first thing to point out is that the value, 2014-09-12, is the date written in reversed order: year, then month, then day. Then note the bit which says opf:event="publication" in the opening tag. This is how the epub consortium (the IDPF) decided in their wisdom that it should be communicated to the e-reader that the value represents a publication date.

To change the publication date in an e-pub, just open the metadata editor and alter the entry in the value column. The date will be updated in the metadata automatically.

Of the other three possibilities, creation and modification dates work in exactly the same way. Sigil enters ‘creation’ or ‘modification’ in the ‘File As’ column and opf:event="creation" or opf:event="modification" respectively in the metadata.

Selecting ‘Date (custom)’ enters ‘custom’ in the ‘File As’ column and, if you leave it as it is, will enter opf:event="custom" in the metadata. However you can type anything you want in the ‘File As’ column. So, for example, let’s say you want to enter a date for a major revision. Just change the entry in the ‘File As’ column from ‘custom’ to ‘revised’ and the metadata will contain:

<dc:date opf:event="revised">2014-09-12</dc:date>

HOWEVER, according to the e-pub standards, you MUST enter ‘revised’ as a single lower case text string, with NO CAPITALS or SPACES. The same goes for any other ‘custom’ date you enter.

As I have said, the date is in reverse order (year, then month, then day). At the very least, a year must be entered as four digits. A month can be added by itself, or a month and a day, if you want to. If a month or day is included, each must have two digits: include leading zeroes for values below ten, as indicated in the example above.
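The rules for dates and custom event names described above are easy to get wrong when typing by hand, so here is a minimal Python sketch which checks them. The function names are invented for this illustration, and the date check looks only at the shape of the value (it would not notice a 31st of February).

```python
import re

# Four-digit year, optionally followed by a two-digit month,
# optionally followed by a two-digit day.
DATE_PATTERN = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def is_valid_date(value):
    """Return True if value looks like YYYY, YYYY-MM or YYYY-MM-DD."""
    return DATE_PATTERN.fullmatch(value) is not None


def is_valid_event(value):
    """Return True if value is non-empty with no capitals and no spaces."""
    return bool(value) and value == value.lower() and not any(ch.isspace() for ch in value)


print(is_valid_date("2014-09-12"), is_valid_date("2014-9-12"))
print(is_valid_event("revised"), is_valid_event("Major Revision"))
```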

Coming back to that opf:event= statement, this is an XML attribute defined in the opf specification. It was devised to extend the Dublin Core, to allow different kinds of dates to be associated with the e-book. On the face of it, it’s a rather untidy way of doing this but, as it builds on existing conventions, it is the most logical way of achieving the flexibility required for creating rich metadata in e-books.

I should add that Sigil has a habit of editing/adding to the date metadata as you go along, so sometimes I find it has added or updated a modification date in the metadata after I have been editing the file. I would advise checking the metadata just before commencing the conversion to kindle and also just before publishing the e-pub, to correct anything Sigil did behind your back.

Identifier:

The way Sigil deals with identifiers is similar to how it deals with dates. To enter an ISBN as an identifier for your e-book, just select ‘Identifier: ISBN’ from the ‘add basic’ dialog. In the metadata editor, ‘ISBN’ is pre-entered in the ‘File As’ column and all you need do is to type the ISBN in the ‘Value’ column.

Entering an ISSN (International Standard Serial Number, used for periodicals) works in exactly the same way: just select ‘Identifier: ISSN’.

Another identifier in use in the computing world is a ‘digital object identifier’ or DOI (the website for which is here: http://www.doi.org/). If you want to associate a DOI with your e-book, enter it in the same way, by selecting ‘Identifier: DOI’ from the ‘add basic’ dialog.

Finally, you can enter a custom identifier. Just select ‘identifier (custom)’ from the ‘add basic’ dialog and proceed in the same way as for custom dates. Sigil enters ‘customidentifier’ in the ‘File As’ column. You should replace this with the kind of identifier you want to include and enter the value for it in the ‘Value’ column. See immediately below for an example.

ASIN:

One custom identifier you might consider using is the ASIN, or ‘Amazon Standard Identification Number’, which is assigned to your e-book by Amazon after you have uploaded it to the Kindle Store. This could be an alternative to the ISBN if you have decided to publish kindle-only without buying an ISBN. You would have to upload your e-book to the Kindle Store, find out what ASIN has been assigned to it by Amazon and then edit the original file to include the ASIN as an identifier and re-upload the e-book. The ASIN should remain the same after you have re-uploaded your e-book.

The code for adding an ASIN as a custom identifier would be:

<dc:identifier opf:scheme="ASIN">B008EDNN1S</dc:identifier>

UUID:

When Sigil makes a blank e-book, it creates an identifier for it using a scheme called a ‘Universally Unique Identifier’ or UUID. This is a long string of characters which is generated by your computer according to an algorithm which is intended to produce a unique string of characters. Of course, it isn’t entirely foolproof, but the high likelihood is that the UUID will be just that … erm … unique.

For the curious, there is an online UUID generator here: https://www.uuidgenerator.net/, which has a brief description of how a UUID is generated. References to the actual specification can be found by consulting Wikipedia.

If you think about it, Sigil would have to have done something of the sort when it created the file. It is compulsory that there be at least one identifier for the e-book, and Sigil cannot know in advance what the ISBN or other identifier will be. By entering a UUID in the meantime, Sigil ensures that the blank e-book which it creates complies with the e-pub standard.

In the metadata, Sigil enters this item:

<dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:b25e5002-0004-4435-ba01-52ec365266ea</dc:identifier>

Within the opening <dc:identifier > tag, opf:scheme is an extension defined in the opf specification to specify which identification scheme (ISBN, ASIN, UUID, DOI, etc) the entry describes. In this case Sigil has entered opf:scheme="UUID". Other identifiers would follow the same syntax: opf:scheme="ISBN" etc.

In the example, b25e5002-0004-4435-ba01-52ec365266ea is the UUID generated by Sigil when I created the example file. It is prefixed by urn:uuid: which is telling the e-reader software that what follows is a ‘uniform resource name’ using a universally unique identifier.
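For the curious, a UUID of this kind can also be produced with a couple of lines of Python. This is only a sketch: I am assuming a random (‘version 4’) UUID here, which may or may not be the method Sigil uses, and new_uuid_identifier is a made-up name.

```python
import uuid


def new_uuid_identifier():
    """Return a freshly generated random UUID prefixed with 'urn:uuid:'."""
    return "urn:uuid:" + str(uuid.uuid4())


value = new_uuid_identifier()
# Build an identifier entry in the same shape as the one shown above.
entry = '<dc:identifier id="BookId" opf:scheme="UUID">%s</dc:identifier>' % value
print(entry)
```

The value is different every time the script is run, which is rather the point.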

Unique Identifier:

Apart from opf:scheme="UUID", the other thing inside the opening <dc:identifier > tag above is id="BookId". This labels the identifier. At least one identifier needs to be listed in the metadata and one of these has to have a label. The label is used in the opening <package > tag (in its unique-identifier attribute) to mark that identifier as the unique identifier to be used for the e-book (see here in my previous post for details). Sigil has used "BookId" as the label but, logically, it could be anything you wanted, PROVIDED it matches in the <metadata> and the <package>.
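Because the label has to match in two places, it is worth checking after any manual edit. The Python sketch below looks up the label named in the <package> tag and returns the identifier which carries it. SAMPLE_OPF is a cut-down, made-up content.opf (the version="2.0" attribute is my assumption) and find_unique_identifier is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A cut-down, hypothetical content.opf used only for this illustration.
SAMPLE_OPF = """\
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:b25e5002-0004-4435-ba01-52ec365266ea</dc:identifier>
    <dc:identifier opf:scheme="ISBN">9781904585441</dc:identifier>
  </metadata>
</package>
"""


def find_unique_identifier(opf_xml):
    """Return the value of the identifier whose id matches the label in
    the <package> tag, or None if no identifier carries that label."""
    root = ET.fromstring(opf_xml)
    label = root.get("unique-identifier")
    if label is None:
        return None
    for identifier in root.iter("{%s}identifier" % DC_NS):
        if identifier.get("id") == label:
            return identifier.text
    return None


print(find_unique_identifier(SAMPLE_OPF))
```

If the function returns None, the label in the <package> tag and the labels in the <metadata> have drifted apart.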

Other identifiers entered by you (for example for an ISBN) will follow the same syntax, only minus the label:

<dc:identifier opf:scheme="ISBN">9781904585441</dc:identifier>

Using an ISBN as the unique identifier:

IF you want to use the ISBN as the unique identifier instead of the UUID, then just copy the label id="BookId" into the identifier tag for the ISBN:

<dc:identifier id="BookId" opf:scheme="ISBN">9781904585441</dc:identifier>

and delete it from the identifier tag for the UUID, leaving:

<dc:identifier opf:scheme="UUID">urn:uuid:b25e5002-0004-4435-ba01-52ec365266ea</dc:identifier>

(OR even delete the entire line containing the UUID from content.opf altogether!)

If you do this, be aware that the line in the metadata containing the unique identifier cannot be edited by the metadata editor. Making the ISBN the unique identifier removes it from the table in the metadata editor dialog. You will have to make such changes by editing content.opf directly using Sigil.

I ought to add that the identifiers are only seen by the software; all the reader sees is what you have entered on the title page verso (copyright page). So I can’t see any reason for changing the unique identifier. But, if you wanted to, that’s how you would do it.

Adding a Contributor or Creator (including a second author) using ‘Add Role’:

The epub specification allows very rich metadata to be added for the various people who have contributed in some way to the creation of an e-book. But this comes with a cost: the system they have chosen to employ is just a bit complicated!

Somewhat counterintuitively, Sigil requires you to click the ‘add role’ button to add another author or other contributor. The reason becomes readily apparent when you do click ‘add role’. A dialog loads with an extremely long and nearly exhaustive list of the various roles other contributors might have:


To add an author, select ‘Author’ from this list and click OK. Sigil adds a row to the table in the metadata editor dialog with ‘author’ in the ‘Name’ column and ‘contributor’ in the last column, which is now headed ‘Role Type’. Type the new author name in the ‘Value’ column and then retype it as it should be alphabetised in the ‘File As’ column.

Now, if you click on ‘Contributor’ in the fourth column, you get a drop-down menu with two choices: ‘Contributor’ or ‘Creator’:

Select ‘Creator’ for a main author or select ‘Contributor’ for a secondary author.

Proceed in exactly the same way for any of the other contributor (or creator) roles from the ‘add role’ dialog.

For a main author, Sigil will add this to the metadata:

<dc:creator opf:file-as="Beckham, David" opf:role="aut">David Beckham</dc:creator>

OR, for a secondary author, it will add this to the metadata:

<dc:contributor opf:file-as="Beckham, David" opf:role="aut">David Beckham</dc:contributor>

The opening tag contains these TWO opf fields: opf:file-as, which we have already covered, and opf:role. In this case a code is specified: opf:role="aut". EACH role from the add role dialog has an associated code. There is a complete list of these here: http://www.loc.gov/marc/relators.
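To see how the two opf fields sit alongside the name, here is a short Python sketch which lists every creator and contributor in a metadata section, together with the ‘file as’ value and role code. The sample metadata is made up from the names used in this post, and list_people is an invented name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# A cut-down, hypothetical <metadata> section used only for this illustration.
SAMPLE_PEOPLE = """\
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:creator opf:file-as="Shelton, Rod" opf:role="aut">Rod Shelton</dc:creator>
  <dc:contributor opf:file-as="Beckham, David" opf:role="aut">David Beckham</dc:contributor>
</metadata>
"""


def list_people(metadata_xml):
    """Return creators and contributors, in document order, as dictionaries."""
    wanted = ("{%s}creator" % DC_NS, "{%s}contributor" % DC_NS)
    people = []
    for element in ET.fromstring(metadata_xml):
        if element.tag in wanted:
            people.append({
                "kind": element.tag.split("}")[1],  # 'creator' or 'contributor'
                "name": element.text,
                "file_as": element.get("{%s}file-as" % OPF_NS),
                "role": element.get("{%s}role" % OPF_NS),
            })
    return people


for person in list_people(SAMPLE_PEOPLE):
    print(person)
```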

If a code for a role does not exist in that list, you can make one up, beginning it with ‘oth’ followed by a full point and then the text for your custom role (as a single lowercase text string). Casting around a bit for an example, I came up with ‘oth.fluffer’ for a very crucial role in the making of a certain kind of video!

Unfortunately, as Sigil enters ‘author’ and not the corresponding code (aut) in the metadata editor, you cannot enter a custom role this way. You will have to edit it in manually using Sigil. Alternatively you could use an html editor such as Komodo Edit. See my post on how to unpack an e-pub for more information on how to go about this. The code to add to the metadata would look like this:

<dc:contributor opf:file-as="Langan, Brendan" opf:role="oth.fluffer">Brendan Langan</dc:contributor>
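The rule for making up a custom role code (‘oth’, a full point, then a single lowercase text string) can be sketched in Python like this. The function name custom_role_code is invented for the illustration, and stripping out spaces and punctuation is my own assumption about how to tidy up what was typed.

```python
def custom_role_code(role_name):
    """Build an 'oth.' role code from a free-text role name.

    Letters are lowercased; spaces and punctuation are dropped so that
    the result is a single lowercase text string.
    """
    cleaned = "".join(ch for ch in role_name.lower() if ch.isalnum())
    if not cleaned:
        raise ValueError("role name must contain at least one letter or digit")
    return "oth." + cleaned


print(custom_role_code("Fluffer"))
```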

Altering an existing entry in the metadata:

All you need do to edit an entry you have already created in the metadata is to open the metadata editor and alter the relevant row of the table. All necessary changes to the metadata in content.opf will be made for you. At a late stage of preparation, you should be able to use this post to help you edit content.opf manually using an html editor, if that becomes necessary.

As a final note, if you select a row in the table in the Sigil Metadata Window and then click the up or down arrow buttons, the item is moved up or down in the list. The guidance says that if there are multiple contributors/creators in the metadata the e-book reader software should presume that the names are in the order in which they are intended to be displayed in the ebook. As far as I can determine, all other items in the metadata section can be in any order. To delete an item of metadata, select it in the metadata editor window and click the ‘Remove’ button.

Cover image for Kindle:

If that all were not bad enough, Amazon has thrown a further spanner in the works by requiring the cover IMAGE to be linked in a rather peculiar way. I stress that the above relates to e-pub e-books. This section is only relevant to KINDLE. Amazon want you to add an entry to the metadata to identify the cover image to the kindle software. The syntax is:

<meta name="cover" content="my-cover-image" />

This is the example given in the Kindle Publishing Guidelines. You will have to edit it in using an html editor, such as Komodo Edit. See my post on how to ‘unpack’ an e-pub for information on how to do this. The name="cover" item is MANDATORY and should be entered EXACTLY as typed above. The other item in the tag ("my-cover-image" in the example) should be the id (or label) given to the cover IMAGE in the <manifest>. See my posts on how to delete the html cover and how to link the cover IMAGE for kindle (in that order) for detailed instructions on how to adapt the cover in your e-pub for kindle. The <manifest> is explained in my next post. I have repeated this information here because it is of relevance to the <metadata>. You should ignore this discussion if you are creating an e-pub; it is only relevant to Kindle.

I have gone into the guts of how the metadata is specified here. In actual fact, using Sigil to do all the hard work for you is by far the best approach. In fact it makes sense to create an epub using Sigil BUT ENTER THE METADATA for your KINDLE e-book. Then convert the epub into a mobi file using kindlegen and finally edit the original epub to change the metadata to their proper epub values. That way Sigil can be relied upon to get things right for you.

A Word of Warning!!!!

Your e-book will inevitably be listed on many different websites. For an e-book with an ISBN these websites will most likely use the data supplied to the booktrade databases, such as Nielsen in the UK and I guess R. R. Bowker in the USA. The metadata embedded within the e-book may well also get picked up by the websites listing it and may not necessarily be identical with that held by Nielsen/Bowker. There could be a conflict between the two. I would advise carefully monitoring the situation and being ready to make changes to the metadata if a problem arises, and/or contacting Bowker/Nielsen to rectify any problems at their end.

Perhaps it would be best to end with a note that you will have to manually edit the metadata for your ebook when you link the cover IMAGE and html table of contents for kindle. Click the links for details.

Next Steps: Now the metadata is correct, you are ready to test your e-pub e-book and proceed to convert it to kindle. I will continue explaining the remainder of content.opf and then go through how to test your e-pub using the built-in tools in Sigil and also using epubcheck.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before you get started
How to markup an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index
How to alphabetise an index or bibliography
How to adapt the print index in your MS Word file for an e-book using markup
How to adapt cross-references in your print index for e-book and how to use markup to make the links
How to understand content.opf
How to understand and edit the Metadata of an ebook using Sigil
How to understand the manifest in content.opf
How to understand the spine and guide in content.opf
How to test your e-pub using FlightCrew in Sigil
How to test your e-pub using epubcheck
How to convert an e-pub to Kindle using kindlegen
