
Wednesday, 16 April 2014

How to mark up an MS Word file to identify the formats before importing it into an epub

Your MS Word file will most likely be carefully formatted to look great. In particular it will have italic text for foreign words, book titles, emphasis, perhaps thoughts, etc.; there will be indented paragraphs in the main text, unindented paragraphs after a ‘scene break’ (preceded by a space) and quite likely displayed quotations. Perhaps you will also have text in Small Capitals. You will also have special formats for chapter headings, text in the preliminary matter and so on.

Before you import your MS Word file into your e-book you will need to save it as unformatted text, and all of these formats (styles) will disappear. So it is necessary to ‘mark up’ the file to identify all of the formats you are using. You will then use the markup to restore the formatting using CSS in the e-book.

First, consider the styles for individual paragraphs:

Each paragraph of your MS Word document will have a particular style, probably defined as a named style in MS Word. To mark it up, just add some sort of code at the start of each paragraph. For example, I insert something like ‘/ch/’ at the start of any paragraph which is formatted (styled) as a chapter heading. You can use Find and Replace in Sigil to replace this with your desired CSS styling later. Obviously, the markup ‘/ch/’ should be a text string which does not occur anywhere else in the original text of your book. Use any suitable alternative, such as ‘¦ch¦’ or ‘\ch\’, if there is a possible conflict, but be sure you can easily access the characters used for the markup from your keyboard! (Mac users might find that ‘§’ is easy to find on the keyboard and will probably be unique enough to define a markup code like ‘§ch§’.)
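If you would rather not trust your eye, the uniqueness check can be automated before you start. The following is only a minimal sketch in Python, not part of the Sigil workflow described in this post; the function name find_conflicts and the sample sentence are made up for illustration.

```python
def find_conflicts(original_text, codes):
    """Return the markup codes that already occur in the unmarked text.

    Any code returned here would be ambiguous later, so pick a different
    delimiter (for example the broken bar or the section sign) for it.
    """
    return [code for code in codes if code in original_text]


# Hypothetical sample: the text happens to contain "/s/" already.
sample = "Set the dial to a/s/d and wait."
print(find_conflicts(sample, ["/ch/", "/u/", "/s/", "/i/", "/ii/"]))
# prints ['/s/']
```

Run it against the plain text of your book before you insert any markup; an empty list means none of the chosen codes clash with the text.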

You will need to go through the file adding markup codes at the beginning of each paragraph which needs a special format. Examples would be entries in the table of contents, chapter numbers (if any), chapter headings (as outlined above), subheadings, unindented paragraphs, displayed quotations, etc.

Each markup code will match a CSS style you will define later and apply to the paragraph using Sigil.

Here are a number of examples of markup I have used:

To make your life easier, don’t do anything with paragraphs which are indented. These will be the majority of the paragraphs in any book. Instead, just identify those which are unindented using something like ‘/u/’. Use an ‘s’ if the paragraph is to be followed by a space: i.e. put ‘/s/’ at the start of an indented paragraph which is followed by a space and ‘/us/’ at the start of an unindented paragraph which is followed by a space. Then define a CSS style so that all other paragraphs are indented by default.

Displayed quotations (often called block quotations) require several markup codes. First, let the basic, indented, paragraphs of a displayed quotation begin with something like ‘/bq/’. You will need different markup such as: ‘/bqs/’ at the start of an indented paragraph of a displayed quotation with a space after it, ‘/bqu/’ at the start of an unindented paragraph of a displayed quotation (with no space after it) and ‘/bqus/’ at the start of an unindented paragraph of a displayed quotation with a space after it.

Each of these markup codes will match a style in your CSS stylesheet.
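To illustrate the one-code-to-one-style idea, here is a minimal sketch in Python of how a paragraph code could be turned into a CSS class. It is not the Find and Replace method used in Sigil, only an illustration of the mapping. The class names for /ch/, /u/, /s/ and /bqus/ are the ones shown in the code-view example later in this post; the other class names are hypothetical and you would substitute your own.

```python
# Paragraph codes mapped to CSS class names.
PARAGRAPH_CLASSES = {
    "/ch/": "chapterHeading",
    "/u/": "noindent",
    "/us/": "noindentSpace",          # hypothetical class name
    "/s/": "space",
    "/bq/": "blockQuote",             # hypothetical class name
    "/bqs/": "blockQuoteSpace",       # hypothetical class name
    "/bqu/": "blockQuoteUnindent",    # hypothetical class name
    "/bqus/": "blockQuoteUnindentSpace",
}


def wrap_paragraph(paragraph):
    """Wrap one paragraph in <p> tags, using its leading code if it has one.

    Paragraphs without a code are indented by default, so they get a plain
    <p>. Note: this sketch does not escape characters such as & or <.
    """
    for code, css_class in PARAGRAPH_CLASSES.items():
        if paragraph.startswith(code):
            body = paragraph[len(code):]
            return f'<p class="{css_class}">{body}</p>'
    return f"<p>{paragraph}</p>"


print(wrap_paragraph("/ch/Chapter Heading"))
# prints <p class="chapterHeading">Chapter Heading</p>
```

Because every code is closed by its own slash, ‘/bq/’ can never be mistaken for the start of ‘/bqus/’, so the order of the entries does not matter.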

You might like to define markup codes to identify any other formats in your own ebook. I for instance use ‘/p/’ for paragraphs in the preliminary matter and – following the pattern above – add an ‘s’ if there is a space after it. As my preferred styling is to have centred text in the preliminary matter there is in my case no need to distinguish between indented and unindented paragraphs in the preliminary matter.

Up to now the markup has all gone at the beginning of each paragraph. To identify italic text falling within a paragraph, you need to have markup before and after the text you want in italic. So for example in:

   A sentence with these words in italic.

precede the italic text with ‘/i/’ and follow it with ‘/ii/’ like this:

   A sentence with /i/these/ii/ words in /i/italic/ii/.
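The reason for marking the start and the end differently becomes clear when the codes are swapped for tags. As a minimal sketch in Python (an illustration only, assuming the class name italicText used in the code-view example later in this post), each code is simply replaced by the tag it stands for:

```python
# Opening and closing codes map to different tags, which is why they
# must be different strings.
INLINE_TAGS = {
    "/i/": '<span class="italicText">',
    "/ii/": "</span>",
}


def convert_inline(text):
    """Replace the inline italic codes with opening and closing span tags."""
    for code, tag in INLINE_TAGS.items():
        text = text.replace(code, tag)
    return text


print(convert_inline("A sentence with /i/these/ii/ words in /i/italic/ii/."))
# prints A sentence with <span class="italicText">these</span> words in <span class="italicText">italic</span>.
```

If the same code were used at both ends, a simple replacement like this could not tell an opening tag from a closing one.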

You will need to do the same before and after any text you want in Small Caps or even in bold or underlined, although bold and particularly underline are best avoided in any book, print or otherwise. I have devoted an entire post to styling Small Caps, because it is a particularly difficult case. In the example below the markup for small capitals has been simplified for clarity.

If you are using markup codes for custom styles, it is best to keep a list on a piece of paper as you go along, so you can remember what they mean when you come to apply the styling.

When the marked-up file is saved as unformatted text, the original styling will disappear, leaving just text. BUT your markup will still be there and will be used later on to restore the formatting. (See subsequent posts.)

Example:

The original MS Word file:

Chapter Heading
The First Three words of an unindented paragraph can be set in Small Caps to draw attention to them.
Subsequent paragraphs should be indented. Don’t use a markup code for indented paragraphs, set this as the default style using CSS instead. This will reduce the amount of markup needed.
Displayed (or Block Quotations) are separated from the main text by a space:
A single paragraph displayed quotation is unindented and followed by a space. The entire quotation is set in a smaller font. It would normally be set in at the left and the right, but this isn’t possible on a Kindle, so I don’t bother.
The paragraph following a displayed quotation is normally unindented.

The MS Word file with markup:

/ch/Chapter Heading
/u//sc/The First Three/ssc/ words of an unindented paragraph /i/can/ii/ be set in /sc/Small Caps/ssc/ to draw attention to them.
Subsequent paragraphs should be /b/indented/bb/. Don’t use a markup code for indented paragraphs, set this as the default style using CSS instead. This will reduce the amount of markup needed.
/s/Displayed (or /i/Block/ii/ Quotations) are separated from the main text by a space:
/bqus/A single paragraph displayed quotation is unindented and followed by a space. The entire quotation is set in a smaller font. It would normally be set in at the left /i/and/ii/ the right, but this isn’t possible on a Kindle, so I don’t bother.
/u//sc/The paragraph following/ssc/ a displayed quotation is normally unindented.

Notes: The markup codes in the MS Word file will have the same format as the text they have been inserted into. Don’t worry that it looks inconsistent and don’t bother trying to make it match if it doesn’t. When it is converted to unformatted text everything will be the same (see the example below). I have simplified the markup for small capitals. Please see my post on how to do small capitals for a full explanation.

Tip: You can use Find and Replace in MS Word to locate text styled as italic, which will save a lot of time searching for anything to mark up with /i/ and /ii/.

One further thing: In the example above, the chapter heading is in italic. You could mark it up as follows: ‘/ch//i/Chapter Heading/ii/’, BUT that is not necessary. Just include the italic in the CSS style for chapter headings instead. You DO, however, need to use the two codes ‘/i/’ and ‘/ii/’ if the italic comes within a paragraph. This is because the CSS needs to be applied differently in each case: a paragraph style applies to the whole paragraph, whereas italic within a paragraph applies only to the enclosed text.

The above example shows what the MS Word file will look like after it has been marked up. Logically this is the end of the scope of this post, but I am including the results of the next stages for information.

Once the file has been converted to unformatted text it will look like this:

The marked up MS Word file as unformatted text ready to import into your ebook:

/ch/Chapter Heading

/u//sc/The First Three/ssc/ words of an unindented paragraph /i/can/ii/ be set in /sc/Small Caps/ssc/ to draw attention to them.

Subsequent paragraphs should be /b/indented/bb/. Don’t use a markup code for indented paragraphs, set this as the default style using CSS instead. This will reduce the amount of markup needed.

/s/Displayed (or /i/Block/ii/ Quotations) are separated from the main text by a space:

/bqus/A single paragraph displayed quotation is unindented and followed by a space. The entire quotation is set in a smaller font. It would normally be set in at the left /i/and/ii/ the right, but this isn’t possible on a Kindle, so I don’t bother.

/u//sc/The paragraph following/ssc/ a displayed quotation is normally unindented.


As you can see, the styling in the original MS Word file has completely disappeared. The markup codes now identify the original styling and will be used to re-apply it using CSS in the e-book. Once this has been done the file will look like this:

The ebook in code view after the CSS styling has been applied:


<p class="chapterHeading">Chapter Heading</p>

<p class="noindent"><span class="smallCaps">The First Three</span> words of an unindented paragraph <span class="italicText">can</span> be set in <span class="smallCaps">Small Caps</span> to draw attention to them.</p>

<p>Subsequent paragraphs should be <b>indented</b>. Don’t use a markup code for indented paragraphs, set this as the default style using CSS instead. This will reduce the amount of markup needed.</p>

<p class="space">Displayed (or <span class="italicText">Block</span> Quotations) are separated from the main text by a space:</p>

<p class="blockQuoteUnindentSpace">A single paragraph displayed quotation is unindented and followed by a space. The entire quotation is set in a smaller font. It would normally be set in at the left <span class="italicText">and</span> the right, but this isn’t possible on a Kindle, so I don’t bother.</p>

<p class="noindent"><span class="smallCaps">The paragraph following</span> a displayed quotation is normally unindented.</p>


How to convert the markup codes into the CSS styles is covered in another post. The above example is only for illustration purposes and to indicate where you are headed with all of this.

Here is a list of the markup codes I use and the formats which they represent:

markup        format
none          default: indented paragraph, no space after
/u/           unindented paragraph, no space after
/us/          unindented paragraph, space after
/s/           indented paragraph, space after
/ch/          chapter heading
/bq/          block quote, indented, no space after
/bqu/         block quote, no indent, no space after
/bqus/        block quote, no indent, space after
/bqs/         block quote, indented, space after
/i/…/ii/      markup enclosing italic text
/b/…/bb/      markup enclosing bold text
/sc/…/ssc/    markup enclosing text in small caps
/ie/          entry in the table of contents
/p/           preliminary matter, no space after
/ps/          preliminary matter, space after
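A list like this can also be used to catch mistyped codes before you import the file. The following is a minimal sketch in Python, assuming codes are lower-case letters between two slashes as in the list above; the function name unknown_codes and the four-letter limit are my own choices for illustration.

```python
import re

# The codes from the list above, without their slashes.
KNOWN_CODES = {
    "u", "us", "s", "ch", "bq", "bqu", "bqus", "bqs",
    "i", "ii", "b", "bb", "sc", "ssc", "ie", "p", "ps",
}

# Anything that looks like a code: 1 to 4 lower-case letters between slashes.
CODE_PATTERN = re.compile(r"/([a-z]{1,4})/")


def unknown_codes(text):
    """Return code-like strings in the text that are not in the known list."""
    found = CODE_PATTERN.findall(text)
    return sorted({name for name in found if name not in KNOWN_CODES})


print(unknown_codes("/ch/Heading\n/uu/Oops, a typo, but /i/this/ii/ is fine."))
# prints ['uu']
```

This is a rough check: ordinary text such as ‘and/or/both’ would also be reported, so treat the result as a list of places to look at rather than a list of definite errors.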

Obviously you can use any codes you wish, provided you know which style they represent. However, the markup codes used to enclose italic, bold, etc. falling within a paragraph must be different at the start (/i/) and end (/ii/) of the text with that format (see the example above).
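Because every opening code needs its closing partner, it is worth checking that none has been forgotten. Here is a minimal sketch in Python that simply counts the codes in each paragraph; the function name unbalanced_inline is hypothetical, and counting alone will not notice codes that are present but in the wrong order.

```python
# Opening code mapped to its closing code, as in the list in this post.
INLINE_PAIRS = {"/i/": "/ii/", "/b/": "/bb/", "/sc/": "/ssc/"}


def unbalanced_inline(paragraph):
    """Return the (opening, closing) pairs whose counts differ in a paragraph."""
    problems = []
    for opener, closer in INLINE_PAIRS.items():
        if paragraph.count(opener) != paragraph.count(closer):
            problems.append((opener, closer))
    return problems


print(unbalanced_inline("A sentence with /i/these/ii/ words in /i/italic."))
# prints [('/i/', '/ii/')]
```

An empty list means each opening code in the paragraph has a matching closing code.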

As I explain in another post, Kindle e-readers cannot cope with styles defined with space BEFORE a paragraph. Work with styles defined with space AFTER paragraphs instead and your e-book will display properly on a Kindle when you are finished.

Next steps: you will need to create a new, blank epub file, then import your marked-up MS Word file into it as unformatted text, and then replace the markup with CSS styling. See also the post on how to create the CSS stylesheet and another on which CSS styling works on the Kindle and which doesn’t.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before you get started
How to mark up an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index

TinyURL for this post: http://tinyurl.com/nuqnaqo
