Thursday, 1 May 2014

How to import your marked-up MS Word file into your ebook using Sigil

Sigil is set up to import an MS Word file in its entirety, including all the styles and formatting. This might sound like a good idea until you actually do it and look at the file in code view. You will see literally hundreds of lines of code: the entire MS Word stylesheet, in fact, re-coded in html and CSS. In an epub this might well render properly, but there is so much that could go wrong that something probably will. And if even one tiny bit of it does go wrong, mending it would involve unpicking the entire MS Word stylesheet: a task well beyond my limited skills, and one which would also depend on how logically the original MS Word document styles were constructed in the first place. As for Kindle, it supports so few html tags and even fewer CSS styles that the chances of this working on a Kindle are, in my opinion, negligible.

So my preferred strategy is to keep the e-book as simple as possible, and that way eliminate any possible problems from the beginning and ensure I know exactly what is happening. This means importing your document as plain text (marked up to indicate where your formats go) and then re-applying the formats using CSS and html which you know will work on a Kindle and an e-pub. (See my post about which CSS works with Kindle and which does not.) Persuading Sigil NOT to import the MS Word stylesheet is not as straightforward as it might at first sound. Even then a little bit of tidying up of the file is necessary, but this can be done in a matter of minutes using find and replace. This post outlines how I go about it.

Once you have marked up your book text in MS Word, remove any comments you might have in it. And then, when it is ready, save it as unformatted text in a .txt file. To do this, click the ‘Office’ button (top left) and select ‘Save As…/Other Formats’:



In the ‘Save As…’ dialog, select ‘Plain Text (*.txt)’ from the ‘Save as Type:’ pop-up menu:

You will get a dialog asking for details of how the file is to be converted to plain text. It should default to these settings: ‘Windows(default)’ and ‘CR/LF’ at the end of lines. Use these defaults and click ‘OK’:


Alternatively, you could create a blank .txt file in Word and then use ‘Paste Special …’ to paste your book text into it. Select the whole document and copy the contents to the clipboard. Then click in your new, blank .txt document. Select ‘Paste Special …’ from the ‘Clipboard’ group on the ‘Home’ tab of the Ribbon:


Then select ‘unformatted text’ from the dialog and click ‘OK’:


You will want to save the file in the folder you are using for your ebook project.
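
If you ever want to check or process the saved file with a script rather than by eye, the two conversion settings mentioned above (the encoding and the CR/LF line endings) are what matter. The following is only a minimal sketch in Python, not part of my own workflow. It assumes that ‘Windows (default)’ means the Windows-1252 code page, which is typical of Western European installations but may differ on yours, and the function name is made up for illustration:

```python
from typing import List


def read_word_plain_text(raw: bytes, encoding: str = "cp1252") -> List[str]:
    """Decode bytes saved by Word as plain text; return one string per paragraph.

    ASSUMPTION: the file was saved with the Windows-1252 encoding and
    CR/LF line endings. Pass a different encoding if yours differs.
    """
    text = raw.decode(encoding)
    # Normalise CR/LF (and any stray CR) to a single "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Each line of the .txt file is one paragraph; skip the empty ones.
    return [line for line in text.split("\n") if line.strip()]


# A tiny stand-in for the bytes of a saved .txt file.
sample = "Chapter One\r\n\r\n\u2018Hello,\u2019 she said.\r\n".encode("cp1252")
paragraphs = read_word_plain_text(sample)
```

For a real file you would read the bytes with open(filename, "rb").read() and pass them to the function.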

It is important to distance yourself from MS Word at this point, so CLOSE the .txt file you have created. When you do, Word may throw up some dialogs asking you if you really want to save it as a .txt file and warning you that doing so will mean the loss of some formatting information. Select the options to lose AS MUCH formatting information as possible! Now RE-OPEN it using something like Notepad (Windows). Perhaps the easiest way to find a suitable program is to right-click (Windows) or Control-click (Mac) on the filename and then choose a program to open the file with from the pop-up menu:


In Notepad, the file now looks like this:


Once the file is open, select the entire text and copy it to the clipboard. Now create a blank e-pub file or open one you made earlier using Sigil. (See my post on how to get started by making a blank e-pub with Sigil.) Your blank epub should contain a single empty chapter. Open this in the main Sigil window in code view. Find and select the blank paragraph which Sigil has placed in the <body> of the page. NB ‘&#160;’ is an alternative html code for a non-breaking space:


DELETE this blank paragraph and ensure the blinking insertion point is positioned on a blank line between the opening <body> tag and the closing </body> tag:


SWITCH back to ‘Book View’ and paste the text into the chapter. (If you don’t switch to book view before pasting you will get the whole thing as one v-e-r-y l-o-n-g paragraph.) When you are finished, switch back to code view again and you should see each paragraph of the original document enclosed by opening <div> and closing </div> tags:


Now to the most important bit. You will need to use find and replace to change all the closing </div> tags to </p> tags and then change all the opening <div> tags to <p> tags. What you want to achieve is this:
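
To illustrate the change (I do this with Sigil's Find and Replace, not with a script), here is a rough Python sketch of the same two replacements. The function name is hypothetical, and it assumes every <div> in the text is one of the plain paragraph wrappers produced by the paste; if your file has other <div> elements, they would be changed too:

```python
def divs_to_paragraphs(xhtml: str) -> str:
    """Turn plain <div>...</div> wrappers into <p>...</p>.

    ASSUMPTION: the text contains only attribute-free <div> tags
    that each wrap one paragraph.
    """
    # Same order as in the post: closing tags first, then opening tags.
    xhtml = xhtml.replace("</div>", "</p>")
    xhtml = xhtml.replace("<div>", "<p>")
    return xhtml


before = "<div>It was a dark night.</div>\n<div>Nothing stirred.</div>"
after = divs_to_paragraphs(before)
print(after)
```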


Next, SAVE the file. Sigil should ‘clean up’ the code, placing the <p> tags on the same line as the text they enclose:


If the file doesn’t automatically clean itself up, turn clean up ON by selecting ‘Preferences …’ from the Edit menu to bring up the Preferences dialog:


And in the preferences dialog, click ‘Clean Source’ and make sure the ‘Open’ and ‘Save’ check boxes are ticked:


You will need to do a bit of cleaning up of the file using Find and Replace to make sure there are no spaces before the opening <p> tags or after the closing </p> tags. There might also be some blank <p> tags and maybe some non-breaking spaces (&#160; or &nbsp;) left in the file which need to be deleted.
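
For anyone who prefers to see the tidying-up spelled out, the searches described above can be sketched as regular expressions in Python. This is an illustration under my own assumptions, not something Sigil runs for you: the function name is invented, and it deletes every non-breaking space just as the Find and Replace does, so check first that none of them are wanted between words:

```python
import re


def tidy_paragraphs(xhtml: str) -> str:
    """Apply the clean-up searches described in the post to a block of <p> lines."""
    # Delete non-breaking spaces in either spelling (replace with nothing).
    xhtml = re.sub(r"&#160;|&nbsp;", "", xhtml)
    # Remove spaces or tabs before an opening <p> tag...
    xhtml = re.sub(r"[ \t]+<p>", "<p>", xhtml)
    # ...and after a closing </p> tag.
    xhtml = re.sub(r"</p>[ \t]+", "</p>", xhtml)
    # Drop paragraphs that are now empty, together with their line break.
    xhtml = re.sub(r"<p>\s*</p>\n?", "", xhtml)
    return xhtml


messy = "  <p>One</p>  \n<p>&#160;</p>\n<p>Two&nbsp;</p>\n"
clean = tidy_paragraphs(messy)
```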

Find and Replace in Sigil is helpfully in a panel below the main window and is self-explanatory:


In the screenshot above, ‘&#160;’ is being replaced with nothing (i.e. it is being deleted).

As I said at the beginning of this post, unless you save the original MS Word file as unformatted text, when it is imported into the ebook, Sigil will copy the entire MS Word stylesheet along with the text and convert the styles into embedded CSS, which it will put into the <head> section of the document. This will usually amount to several HUNDRED lines of code. You want to be sure that your ebook displays exactly the way you want it to, and so importing the MS Word stylesheet is NOT a good idea. It is WAY too complicated, and finding and fixing any problems will be well nigh impossible. Keep your ebook coding simple and basic and there will be less to go wrong!
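
If you are not sure whether a chapter has picked up a stylesheet in its <head>, a quick check is to count the lines sitting inside any <style> element. A minimal Python sketch follows, with a hypothetical function name and made-up sample markup (the class name p.MsoNormal is only there as an example of the kind of thing Word generates):

```python
import re


def embedded_style_line_count(xhtml: str) -> int:
    """Count the lines of CSS found inside <style> elements of a page."""
    blocks = re.findall(
        r"<style\b[^>]*>(.*?)</style>", xhtml, flags=re.DOTALL | re.IGNORECASE
    )
    return sum(len(block.strip().splitlines()) for block in blocks)


# Illustrative samples only, not taken from a real export.
clean_page = "<html><head><title>Ch 1</title></head><body><p>Hi</p></body></html>"
wordy_page = (
    '<head><style type="text/css">\n'
    "p.MsoNormal {margin: 0;}\n"
    "span.x {color: red;}\n"
    "</style></head>"
)
clean_count = embedded_style_line_count(clean_page)
wordy_count = embedded_style_line_count(wordy_page)
```

A count of zero means the page carries no embedded stylesheet; anything in the hundreds suggests the Word styles came along for the ride.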

Next Steps: You need to create and link a CSS stylesheet in your epub. Then you are ready to replace the markup with the CSS styling you want and to split the one long chapter you have just created up into the individual chapters.

Index to ‘how to …’ posts:

How to ‘unpack’ an epub file to edit the contents and see what’s inside.
How to understand what is inside an epub
How to link the html table of Contents in a Kindle e-book
How to restructure the html table of contents for a Kindle
How to delete the html cover for a Kindle ebook
How to link the cover IMAGE in a Kindle e-book
How to clean up your MS Word file before you get started
How to markup an MS Word file to identify the formats before importing it into an epub
How to create a new blank e-pub using Sigil
How to import your marked-up MS Word file into your ebook using Sigil
How to create and link a CSS stylesheet in an e-book using Sigil
How to replace the markup with CSS styles in your ebook using Sigil
How to style an e-book so it works with the limited CSS styling available to Kindle e-readers
How to understand the syntax of CSS
How to style Small Caps in an e-book
How to split your ebook up into chapters using Sigil
How to sequence your e-book
How to phrase the copyright declarations etc. in an e-book
How to generate the logical table of contents using Sigil
How to understand toc.ncx in an e-book
How to generate the html table of contents in an e-pub
How to style the html table of contents using CSS
How to create an html cover for your epub using Sigil
How to present references and notes in a book
How to use Mark Up to link notes in your e-book
How to present a bibliography in a book
How to use markup to link entries in a bibliography with the notes section
How to index an e-book
How to use the tools in MS Word to create an index
How to alphabetise an index or bibliography
How to adapt the print index in your MS Word file for an e-book using markup
How to adapt cross-references in your print index for e-book and how to use markup to make the links
How to understand content.opf
How to understand and edit the Metadata of an ebook using Sigil
How to understand the manifest in content.opf
How to understand the spine and guide in content.opf
How to test your e-pub using flightCrew in Sigil
How to test your e-pub using epubcheck
How to convert an e-pub to Kindle using kindlegen

TinyURL for this post: http://tinyurl.com/l5py6dr

2 comments:

  1. Wow, your method of importing Word into Sigil is quite complicated and must take you hours and hours!!

    I have a much easier method that only takes a couple of minutes to import a Word doc into Sigil.

    Here's how I do it:

    * Save your Word doc as Web Page html, Filtered.

    * Open Sigil and set Edit > Preferences > General Settings > Mend XHTML Source On to Open.

    * Now load your Word html doc into Sigil.

    * Now run a Sigil plugin (that I wrote) called CustomCleanerPlus on your html file. This will quickly and safely clear out all the dross code from your Word html and will give you a good start point to finish off your epub in Sigil.

    Like I said, doing it this way will only take you a couple of minutes.

  2. I think that may depend on how well-formed the original MS Word file was to start with. Most Word files I have seen that other people have made have not been created from a logically constructed and thought-out style sheet. Mostly people format their files on the fly and don't nest styles properly. This would probably still leave a lot of gunk in the file. It sounds like what you have produced is a really useful tool, however. I just like to make sure everything is spot on and begin from the simplest text-only file. If there is no 'dross' from MS Word to begin with, then from that point on there is nothing to delete! Does your plugin, for example, correct turned quotes, eliminate leading spaces and trailing spaces at the starts and ends of paragraphs, or correctly detect superscript NOT formatted as 'superscript' but instead as something else, like set up 2pt? What about a displayed quotation which the creator of the file has made by simply dragging the indent markers in the ruler and then selecting the text and reducing the point size? How can this intention be conveyed to the software? Logically only by the user having set up a specific style for a displayed quotation AND with a name which the plugin recognises. This is a massive difficulty and your software sounds like a major achievement, well done! I just work on the basis that any automated system which a user doesn't understand is always fraught with the potential to fail to implement the user's intention. This is my experience of most 'converters' I have tried. Software, unfortunately, isn't clairvoyant. Sigil's 'correct html' option is a case in point. It cleans the code, but doesn't know what the user intended. My approach is to mark up the Word file completely and THEN strip out the formatting and reduce it to a text file. Then the markup correctly reflects the user's intentions and the conversion process delivers exactly what the user intended.
For the same reason, I only code using native html editors, rather than something like dreamweaver. I have forgotten the name right now, but there is an ebook conversion service which uses an MS Word file as the starting point, but it comes with a lengthy e-book on how to format the Word file properly. Obviously, if this manual isn't followed scrupulously, the file won't be properly formed and will produce unreliable output. Thanks for drawing my attention to your work, I'm well impressed!


 