
Sunday, 13 April 2014

How to clean up your MS Word file before you get started converting it to an e-book

So far I have posted on the most critical issues involved in converting an e-pub e-book to a Kindle e-book. These primarily revolve around the html table of contents and the cover image. I now want to go right back to the beginning and deal with the entire e-book creation process from scratch.

The starting point in making an e-book is obviously a complete, edited manuscript. Most likely the final draft of the print book. I will assume this is written in MS Word, although Open Office is a perfectly satisfactory (and free) alternative, and works in broadly the same way.

The first thing you will need to do is to clean up your manuscript. And I’m not talking about spelling or punctuation errors here; I mean dealing with more technical formatting issues, some of which are peculiar to e-books.

Optional Hyphens:

The first among these is hyphenation. And I’m sitting on my hands here to stop myself having a go about the correct places to put the ‘normal’ or ‘hard’ hyphens. Get yourself a copy of New Hart’s Rules or the Chicago Manual of Style and a good up-to-date dictionary such as the New Oxford Spelling Dictionary and sort those out yourself. What I am talking about here are the extra hyphens you should have added in your print book file to break any long words which fall at the ends of lines so as to even up the white space between words. These are sometimes called ‘optional’ hyphens.

An e-book is a reflowable format, which is to say the user can change the size of the font and cause the text to flow to fill the space available on the screen. As a consequence there is no such thing as the end of a line in an e-book: the number of words on a line – and so where the line-breaks fall – depends on the point size of the text. Any optional hyphen will move about in the text as the user changes the point size. If the line break moves, the hyphen will end up not where you want it, at the end of the line, but within a line of text where it has no business being. Consequently the first thing you should do to your MS Word file is to go through it and delete all optional hyphens, if there are any. They have no meaning or place in an e-book.

update 26 May 2014:

In an MS Word file you can insert optional hyphens using Insert Symbol. Some investigation revealed that what this does is to put a character called a soft hyphen (Unicode U+00AD) into your file. MS Word will either NOT display this character if it falls within a line of text OR ELSE it will replace it by a hyphen if it falls at the end of a line and show the rest of the word at the beginning of the line below.

I created an ebook using soft hyphen characters where I wanted optional hyphens and found they displayed in book view in Sigil AND on the Kindle as something like a ‘not’ sign: ‘¬’, which certainly won’t do. All were visible irrespective of where they fell in a line. So these definitely don’t do what it says on the tin. Optional hyphens cannot be done using the soft hyphen character in e-books, so far as I can determine.

So best practice would be to do optional hyphens by using Insert Symbol in your MS Word file and then delete all the optional hyphens by deleting all the soft hyphen characters from the file using Find and Replace before importing it into your e-book project.

update ends
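
If you work from a plain-text export of the manuscript rather than in MS Word itself, the same deletion can be sketched in a few lines of Python. This is only an illustration, not part of my own workflow: the function name is hypothetical, and it assumes the text has already been read into a string.

```python
# U+00AD is the soft hyphen that MS Word inserts as an optional hyphen.
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed.

    Ordinary ('hard') hyphens are a different character and are left alone.
    """
    return text.replace(SOFT_HYPHEN, "")


sample = "extra\u00adordinary and well-known"
print(strip_soft_hyphens(sample))  # prints: extraordinary and well-known
```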

Curly versus Straight quotes:

I strongly feel that curly quotes: ( ‘ ’ ) look best in a document. Straight, upright, or typewriter-style quotes ( ' ' ) are in my humble opinion just plain ugly but, worse still, they make no distinction between an opening ( ‘ ) and a closing ( ’ ) quote. Punctuation is there to help the reader and it is no help whatever if the reader is left guessing from the context what kind of quote is intended. In MS Word it is a simple matter to turn on the curly quotes feature and then Word will take care of putting in the curly quotes for you. BUT with the best will in the world upright quotes seem always somehow to sneak their way in. Best to go through your MS Word file with Find and Replace and check all quote marks. (And see also the next two sections below.)
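
As a rough sketch of that check outside MS Word, the following Python lists every straight quote it finds so that each one can be looked at by hand. It assumes the manuscript has been exported to plain text and split into a list of paragraphs; the function name is made up for this example.

```python
import re

# Matches the straight single quote (') and the straight double quote (").
STRAIGHT_QUOTES = re.compile(r"['\"]")


def find_straight_quotes(paragraphs):
    """Yield (paragraph number, column, character) for each straight quote.

    Paragraph numbers and columns both start at 1.
    """
    for number, paragraph in enumerate(paragraphs, start=1):
        for match in STRAIGHT_QUOTES.finditer(paragraph):
            yield number, match.start() + 1, match.group()


for hit in find_straight_quotes(["\u2018Fine\u2019", "It's \"bad\""]):
    print(hit)
```

The sketch only reports; it deliberately does not guess whether an opening or a closing curly quote is wanted, since that is exactly the judgement which goes wrong when left to software.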

Turned Quotes:

An essential – if annoying – feature of the way MS Word does curly quotes is that it decides whether to use an opening or a closing quote from the context. If you type a quote after a space, Word thinks an opening quote is appropriate, as most likely a quoted word will follow. 99% of the time this will be the case. BUT if a word begins with a closing quote to represent elision of a missing letter, then Word will have entered something like this: Rock ‘n’ Roll instead of this: Rock ’n’ Roll, which is correct. The first quote mark entered by Word (the one before the n) is the wrong way around: a turned quote. You will need to correct any of these which you find. In code view in Sigil, type &lsquo; for a single opening quote mark and &rsquo; for a single closing quote mark. In an MS Word file, type two quote marks in succession, to get: ‘’ and then delete the initial opening quote. Enlarge the point size of the text and use Times New Roman to make the punctuation more distinct if, like me, your eyesight isn’t what it used to be!
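
Turned quotes are hard to spot by eye, so a crude search can help to draw up a list of suspects. The Python below is a heuristic sketch only: the list of elided words is a hypothetical starting point, certainly incomplete, and every hit still needs checking by a human.

```python
import re

# Hypothetical, incomplete list of words normally written with a leading
# apostrophe (closing quote): 'n', 'tis, 'twas, 'em, 'til, 'cause.
ELISIONS = ("n", "tis", "twas", "em", "til", "cause")

# An OPENING single quote (U+2018) directly before one of those words,
# or before a digit (as in the '90s), is probably a turned quote.
TURNED = re.compile(
    r"\u2018(?=(?:%s)\b|\d)" % "|".join(ELISIONS),
    re.IGNORECASE,
)


def find_turned_quotes(text):
    """Return the index of each opening quote that looks turned."""
    return [match.start() for match in TURNED.finditer(text)]


print(find_turned_quotes("Rock \u2018n\u2019 Roll"))  # prints: [5]
```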

Double versus Single Quotes:

Be consistent about how you style quotes. In Oxford (UK) Style, all quote marks should be single quotes (irrespective of their function: see New Hart’s Rules). Any quotations nested within single quotes should have double quotes and anything nested within the doubles should swap back to singles and so on. In Chicago (US) Style (and also for some reason in many UK newspapers), it is the other way around: begin with doubles, nest singles within the doubles and nest doubles within singles. To enter double quotes in your ebook when in code view in Sigil, type &ldquo; for double opening quotes and &rdquo; for double closing quotes. See also my comments about displayed quotations in my forthcoming post on CSS styling for further information about conventions governing the use of quote marks at the start and end of displayed quotations.


Spaces at the Start or End of a Paragraph:

 Be careful about paragraphs which begin with a space or end with a space. You might not notice the spaces as you type in the MS Word document. But a space at the beginning of a paragraph will set it a tiny amount further over to the right than other paragraphs on the page and looks amateurish. (I added a rogue space at the start of this paragraph as an example.) A space at the end might affect the way the paragraph breaks. Search and Replace in the MS Word file to change ‘^p+(space)’ or ‘(space)+^p’ to ‘^p’.
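
For a plain-text export the equivalent of that Search and Replace can be sketched in Python as below. The sketch assumes one paragraph per line, which is how MS Word saves unformatted text; the function name is invented for the example.

```python
def trim_paragraphs(text: str) -> str:
    """Remove ordinary spaces from the start and end of every paragraph.

    Assumes each paragraph sits on its own line. Only the normal space is
    stripped; non-breaking spaces are left in place on purpose.
    """
    return "\n".join(line.strip(" ") for line in text.split("\n"))


print(repr(trim_paragraphs(" First paragraph. \nSecond paragraph.")))
```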

More than one Space:

Sometimes you can type an extra space by mistake. And the extra space can be hard to spot. In a print book, the extra space between words will lead to uneven wordspacing and this is particularly noticeable with justified text. (I have  inserted an  extra space  between every  other word  in this  bracket so  you can  see how  awful this  looks, even  in unjustified  text.*) As a general rule two or more spaces ought to be ignored by an e-book reader. However there is a possibility that two spaces might be converted by the e-reader into a normal space and a nonbreaking space and that combination won’t be ignored. (*actually, because this post is written in html, I put in extra nonbreaking spaces in the bracket above.) Best to eliminate every instance of double spaces, and replace them with single spaces. Use Find and Replace repeatedly until no more matches are found. Do this before eliminating spaces at the beginning or end of paragraphs (see above).
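
Outside MS Word, a regular expression does in one pass what repeated use of Find and Replace does in several. This is a minimal sketch with an invented function name, and it touches only ordinary spaces, not non-breaking ones.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more ordinary spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("I have  inserted an   extra space"))
# prints: I have inserted an extra space
```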

Double Space after Full-Point:

An outdated convention harking back to the days of typewriters and monospaced fonts used to be to type two spaces after the full-point at the end of a sentence.  (As I’ve done in this paragraph, so you can see how awful it looks.  ) Nowadays, with the advent of proportional fonts this is largely unnecessary.  Worse, with justified text it can lead to very large spaces in the middle of a line and I STRONGLY recommend against it.  BUT if you REALLY want to do this in your e-book, use a nonbreaking space followed by a normal space instead (NOT the other way around, in case the nonbreaking space ends up at the beginning of a line).  You might think you could use one of the various other space characters available, such as an ‘em space’, BUT these are NOT supported on the Kindle.  Kindle only supports the normal interword space, the non-breaking space and the rather exotic-sounding ‘zero-width non-joiner’ (which you won’t need in English text, so just ignore it.  It will be ignored by the e-reader!).

Other Space Characters:

I should add that I sometimes use a ‘hairspace’ character in print books to space punctuation or characters if they overlap. (There is no hard-and-fast rule as to where this is necessary because each font is different.) In any event this should not be done in an e-book firstly because the hairspace character isn’t supported on the Kindle and secondly because it can lead to the line breaking in the middle of a word. If you have ANY non-standard space characters in your document, find and delete them before converting to e-book.
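
Non-standard spaces are invisible, so finding them by eye is hopeless. One possible way to list them, sketched in Python, is to ask the Unicode database which characters count as space separators and report any that are not the normal or non-breaking space. The function name and the choice of allowed characters are assumptions made for this example.

```python
import unicodedata

# The two spaces this sketch treats as acceptable in an e-book.
ALLOWED_SPACES = {" ", "\u00a0"}  # normal space, non-breaking space


def find_unusual_spaces(text):
    """Return (index, Unicode name) for each space separator not allowed.

    'Zs' is the Unicode category for space separators; it covers the hair
    space, em space, thin space and similar characters.
    """
    found = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Zs" and char not in ALLOWED_SPACES:
            found.append((index, unicodedata.name(char)))
    return found


print(find_unusual_spaces("a\u200ab\u2003c d"))
# prints: [(1, 'HAIR SPACE'), (3, 'EM SPACE')]
```

Zero-width characters are not in the ‘Zs’ category, so this sketch will not report them.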


The Ellipsis:

The ellipsis (…) is particularly tricky. And there are quite significant differences between Oxford and Chicago Style to consider as well. Increasingly the ellipsis character: (…) is being used in modern typography. This is a SINGLE character composed of three dots. An advantage of using this is that it won’t get split over a line break. MS Word is usually configured by default to replace three full points with the ellipsis character as you type but doesn’t always do this consistently. You need to replace all instances of three full points with the ellipsis character. The first time you need to do this, insert an ellipsis character manually using ‘Insert Symbol’ in MS Word. Then copy and paste the ellipsis into the ‘Replace’ field in the Find and Replace dialog and use Find and Replace to change all remaining instances. Chicago style prefers full-points (full-stops, periods), spaced, set closed up between two words . . . like this. (Look closely and the difference between this and the ellipsis character used above will become apparent.) BUT you will need to use nonbreaking spaces instead of normal spaces to prevent the ellipsis being split over a line break. Construct the ellipsis using ‘Insert Symbol’ and then proceed as above. When editing your ebook in code view in Sigil type &hellip; for the ellipsis character and &nbsp; for a non-breaking space.

In my reading of Oxford Style the ellipsis character would be spaced on both sides unless followed by punctuation (such as a closing quote). Typographically you might want to precede the ellipsis by a non-breaking space to ‘tie’ it to the word which goes before. That way you avoid the possibility of the reader encountering it for the first time at the beginning of a new line. IF you have set the ellipsis closed-up to the word it follows this won’t be an issue, but as far as I can see a space is needed in Oxford Style. I would, however, recommend against using a non-breaking space after the ellipsis. This ties the two words and the ellipsis into a single long block of text and almost certainly will lead to a large white gap in the line on the screen of the e-reader. Equally, the Chicago Style version of the ellipsis should use a normal space in the FINAL position (after the final full-point and before the next word) to allow the line to break there if need be. All other spaces in the Chicago-Style ellipsis should be non-breaking spaces.
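
The two constructions can be sketched in Python as follows. Both function names are invented for the example, and the sketch deals only with building the characters, not with the harder question of which style your book should follow.

```python
import re

ELLIPSIS = "\u2026"  # the single ellipsis character
NBSP = "\u00a0"      # non-breaking space


def to_ellipsis_character(text: str) -> str:
    """Replace each run of exactly three full points with the ellipsis character.

    Runs of four or more points are left alone for a human to look at.
    """
    return re.sub(r"(?<!\.)\.{3}(?!\.)", ELLIPSIS, text)


def chicago_ellipsis() -> str:
    """Build a spaced ellipsis in the Chicago manner.

    Every space is non-breaking except the FINAL one, which is a normal
    space so that the line can break after the last full point.
    """
    return (NBSP + ".") * 3 + " "


print(to_ellipsis_character("Wait... what"))       # prints: Wait… what
print(repr("two words" + chicago_ellipsis() + "like this"))
```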


Dashes:

There are significant differences between Oxford and Chicago Style with regard to dashes. Decide which style you are using and then follow it. I suppose the main difference would be that when a dash represents a parenthesis – to introduce an aside (like this) – then UK publishers would often use a spaced en-dash, whereas US publishers would always use a closed-up em-dash—like this. MS Word will usually convert a spaced hyphen to a spaced en-dash as you type, but doesn’t always do so. Best to check the dashes in your document and make sure they are correct. An en-dash is the width of the letter ‘n’ and in code view in Sigil can be entered by typing &ndash;. An em-dash is the width of the letter ‘m’ and can be entered in code view in Sigil by typing &mdash;. Both are wider than a normal hyphen. Where to insert the various dashes and hyphens in a document and which ones to use is a particularly difficult aspect of copy-editing and is outside the scope of these posts. Refer to your style guide.
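
A quick way to find dashes which MS Word failed to convert is to search for the two usual leftovers: a hyphen with a space on each side, and a double hyphen. The Python below is a sketch under that assumption, with an invented function name; it reports suspects and changes nothing, because the right replacement depends on the style you are following.

```python
import re

# A spaced hyphen or a double hyphen usually stands in for a dash.
SUSPECT_DASH = re.compile(r" - |--")


def find_suspect_dashes(text):
    """Return (index, matched text) for each spaced or doubled hyphen."""
    return [(match.start(), match.group()) for match in SUSPECT_DASH.finditer(text)]


print(find_suspect_dashes("an aside - like this -- or well-known"))
# prints: [(8, ' - '), (21, '--')]
```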

When entering dashes – or indeed other special characters – using ‘Insert Symbol’ in MS Word, be careful not to use a more exotic character which at a cursory glance looks similar. The NAME of the character will appear at the bottom left of the insert symbol dialog. Check it if you are not sure. To check an existing symbol, select it and then open the insert symbol dialog. The name of the selected character will be displayed in the dialog.
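
Each of the special characters discussed so far (curly quotes, the ellipsis, the non-breaking space and the two dashes) has a named HTML entity which can be typed in code view. As an illustrative sketch, the Python below converts those eight characters in a string to their entity names using the table in the standard library; the function name and the choice of characters are my own assumptions for the example.

```python
from html.entities import codepoint2name

# Curly single and double quotes, ellipsis, non-breaking space, en-dash, em-dash.
SPECIAL = "\u2018\u2019\u201c\u201d\u2026\u00a0\u2013\u2014"


def to_named_entities(text: str) -> str:
    """Replace each character in SPECIAL with its named HTML entity."""
    return "".join(
        f"&{codepoint2name[ord(char)]};" if char in SPECIAL else char
        for char in text
    )


print(to_named_entities("\u2018a\u2019 \u2013 b\u2026"))
# prints: &lsquo;a&rsquo; &ndash; b&hellip;
```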

Other Symbols:

You might have inserted symbols of various kinds in your MS Word document. Fine. But remember that the user can change the font used on their e-reader. Not all characters are included in every font. A missing character will be replaced with ‘?’ or something similar, and you don’t want that! So use as limited a range of special characters as possible. If you only want a Kindle e-book you are fortunate because the fonts (both of them!) installed on the Kindle reader have a wide variety of special characters. See the Amazon Kindle Publishing Guidelines for a comprehensive list. But don’t count on all of these being available in an e-pub reader. And be particularly careful when inserting symbols in MS Word NOT to use the FONT named ‘symbol’ which will not survive conversion to unformatted text. (For example: the Greek letter ‘Lambda’ in symbol will become the Latin letter ‘L’ when it converts to unformatted text.) In the insert symbol dialog in Word make sure you have selected ‘(normal text)’ as the font in the drop-down menu, or else select from the list in the special characters tab.

Tabs and Indents:

Tabs will simply not work in an e-book. You need to delete all tab characters by using Find and Replace in MS Word by replacing all instances of ‘^t’ with nothing. Indents should NOT be done using tabs or multiple spaces. Use an indented style instead. It only takes a minute to type a single line of code into your stylesheet and then ALL paragraphs with no other styling will be indented in the e-book by default. How to do indents using CSS is covered in detail in a separate post.
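
For completeness, the tab deletion looks like this when done on a plain-text export in Python. It is a bare sketch with an invented function name, and it does exactly what the Find and Replace does: tabs are replaced with nothing, so a tab which was standing in for a space between two words will leave those words run together.

```python
def remove_tabs(text: str) -> str:
    """Delete every tab character, replacing it with nothing."""
    return text.replace("\t", "")


print(repr(remove_tabs("\tAn indented paragraph.")))
# prints: 'An indented paragraph.'
```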


Kerning:

Despite this being tedious to do using MS Word, it is remotely possible that your document – or a document supplied to you – has used the ‘kerning’ facility. That is to say it might have used special formats to introduce an extra point or so of space between characters judged to be too close together. However all kerning information will be lost when you convert your document to unformatted text. I know of no way to get around this; you just need to be aware of it. E-book reader software is much less sophisticated than MS Word and you just have to live within the limitations imposed by this.


Alright, so my hands haven’t quite lost circulation. Some of the above IS me crying into the wind about punctuation. But much of the above also relates to differences between print and e-book formatting and you DO need to be aware of this. And even when I have already edited a print book many times over, when I come to convert it to e-book I still find several instances of the difficulties listed above, so I ALWAYS run the checks I have just outlined, and I strongly recommend you do the same.

Next Steps: Once you have cleaned up your MS Word file, the next step is to ‘mark up’ the file to identify the various formats you have used. This is because your file has to be converted to unformatted text for conversion to e-book and without the mark-up you will lose all formatting information.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before you get started
How to markup an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index
