Thursday, 18 September 2014

How to alphabetise an index or bibliography

The conventions around alphabetising an index or bibliography are complex and I have devoted this entire post to the subject. It completes my earlier posts on how to set out an index and a bibliography in a print book by explaining how to put the entries into the correct sequence. I will go on in my next post to explain how to adapt the print index for an e-book including how to make the links using markup.

Before I get started, given the complexity of the rules which follow, it might legitimately be asked whether a reader would be sufficiently familiar with them to be able to use them to find an item in an index. If you were of that opinion, then you might elect to use whatever sequence MS Word chose to place your index or bibliography in. However, the general principles are important, and you would do well to read what follows if you have a bibliography or index in your book. My own feeling is that you should bite the bullet, learn the rules and then use them, rather than publish something which isn’t ‘right’.

There are TWO systems of alphabetisation in general use: ‘word-by-word’ and ‘letter-by-letter’. The word-by-word system is described in British Standards (BS ISO 999: 1996) and is the predominant method in the UK. According to Hart, the exceptions are British encyclopaedias, atlases, gazetteers and dictionaries, which generally follow letter-by-letter indexing; he also says this method is more common in the US. And indeed The Chicago Manual of Style strongly prefers letter-by-letter, although it too says either can be used.

Word-by-word indexing alphabetises the entries according to the first word, and then re-starts the ordering according to the second word, and so on. The two parts of hyphenated words are treated completely separately, as if the hyphen were a word space, UNLESS the first element of the hyphenated term is not a word which can stand in its own right (e.g. de-materialise).

Letter-by-letter alphabetisation proceeds ignoring the word spaces and hyphens from the beginning to the end of each entry. This is also what a word-processor program will do.

In both methods, punctuation marks such as apostrophes are ignored, as are accents. The alphabetisation runs up to any comma indicating an inverted word-order. Items which are identical up to this comma are listed in order of whatever follows. Any parenthetical expression (in brackets) is ignored in Oxford style. In Chicago style, the expression in brackets is treated as though it were preceded by a comma.
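The two systems, together with the punctuation rules above, can be sketched as a pair of Python sort keys. This is only a rough illustration and not something taken from Hart or Chicago: the function names are made up for the purpose, parenthetical expressions are ignored as in Oxford style, and the exception for hyphenated prefixes which cannot stand alone (de-materialise) is not handled.

```python
import re
import unicodedata


def _fold(text):
    """Lowercase the text, strip accents and drop apostrophes."""
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return plain.lower().replace("'", "").replace("\u2019", "")


def _split_entry(entry):
    """Drop any parenthetical expression (Oxford style), then split at the
    first comma, which marks an inverted word order."""
    entry = re.sub(r"\s*\([^)]*\)", "", entry)
    head, _, rest = entry.partition(",")
    return _fold(head).strip(), _fold(rest).strip()


def word_by_word_key(entry):
    head, rest = _split_entry(entry)
    # Spaces and hyphens both separate words; compare one word at a time.
    return (re.split(r"[\s-]+", head), rest)


def letter_by_letter_key(entry):
    head, rest = _split_entry(entry)
    # Ignore spaces, hyphens and any other punctuation altogether.
    return (re.sub(r"[^a-z0-9]", "", head), rest)


entries = ["high water", "Highclere Castle", "high chair",
           "Highsmith, A.", "high heels", "High-Smith, P."]
word_order = sorted(entries, key=word_by_word_key)
letter_order = sorted(entries, key=letter_by_letter_key)
```

Here `word_order` puts ‘Highclere Castle’ after all the entries beginning with the separate word ‘high’, whereas `letter_order` places it straight after ‘high chair’. Entries whose headings are identical still need a further tie-break, which these keys do not attempt.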

New Hart’s Rules also says to ‘ignore leading prepositions, conjunctions and articles’ in index sub-entries. Chicago says to do the same, and also advises inverting such sub-entries if the index is set-out. I will come back to this in my next post and link from here directly to that part of the discussion when it is published.

If the alphabetised term is the same, then in Oxford style it is usual to list first people, then places, then subjects, concepts and objects, and then finally work titles:

New York (mayor)
New York (place)
New York (population)
New York Times, the (work title)

Some scientific works and dictionaries, however, list lowercase words (things) before capitalised words (people).
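As a minimal sketch of this tie-break in Python, each entry can carry a label saying what kind of thing it is. The labels and the `OXFORD_RANK` table are assumptions of mine for illustration; the indexer would have to assign the labels by hand, since nothing in the entry itself says whether it is a person or a place.

```python
# Oxford-style rank for entries whose alphabetised term is identical.
OXFORD_RANK = {"person": 0, "place": 1, "subject": 2, "title": 3}


def same_term_key(entry, rank=OXFORD_RANK):
    term, kind = entry
    # Sort on the term first; the kind only matters when terms are equal.
    return (term.lower(), rank[kind])


entries = [
    ("New York Times, the", "title"),
    ("New York", "subject"),   # New York (population)
    ("New York", "person"),    # New York (mayor)
    ("New York", "place"),
]
ordered = sorted(entries, key=same_term_key)
```

A work which prefers things before people could pass a different `rank` table rather than change the function.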

So, for instance, here are two indexes, one following word-by-word order and the other using letter-by-letter:

Word by Word:           Letter by Letter:
High, J.                High, J.
high (light-headed)     high (light-headed)
high chair              high chair
high heels              Highclere Castle
High-Smith, P.          high heels
high water              Highsmith, A.
High Water (play)       High-Smith, P.
Highclere Castle        high water
Highsmith, A.           High Water (play)

In both systems, abbreviations pronounced as they are spelled (AIDS, OPEC) are treated as words. In the word-by-word system, other groups of letters are listed before any full word:

Word by Word:           Letter by Letter:
IP address              IP address

Indefinite and definite articles in titles (a, an, the, etc.) are always* transposed, BUT (according to Chicago) prepositions (for, with, to, etc.) are NOT transposed:

Winter’s Tale, A


For Whom the Bell Tolls

(*Although Chicago suggests NOT inverting the article in run-in sub-entries only, where it says the inverted order is both unnecessary and distracting.)
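The transposition rule is simple enough to sketch in a few lines of Python. The function name and the short list of English articles are my own assumptions, and the sketch makes no attempt at foreign-language articles or at the run-in exception Chicago mentions.

```python
ARTICLES = ("a", "an", "the")


def transpose_article(title):
    """Move a leading article to the end of the title, after a comma."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    # Prepositions and everything else stay where they are.
    return title


print(transpose_article("A Winter’s Tale"))          # Winter’s Tale, A
print(transpose_article("For Whom the Bell Tolls"))  # For Whom the Bell Tolls
```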

In English works, foreign words are generally alphabetised as though any accents were not there. So for example ‘é’ and ‘è’ are both treated as if they were just ‘e’. Proper names in foreign languages, however, need special treatment.


Names of people are usually inverted, so the more significant item (the surname, or family name) is listed first:

Shelton, Rod

Initials are generally listed before full first names and titles are usually listed after initials or first names:

Shelton, R.
Shelton, Dr R.
Shelton, Rod
Shelton, Dr Rod

Mc, Mac, or any of the other variations are alphabetised as if they were all spelled ‘Mac’.

Even if only a surname is referenced in the main text, every attempt should be made to include a full citation in the index or bibliography, including at least the person’s initials wherever possible.

Treat ‘St’ as if it were spelled out as: ‘Saint’ in both personal and place names. If the ‘St’ is part of the name of an historical figure (such as St Joan) then invert it in the index (Joan, St, of Orleans).

NB no full point in Oxford style after ‘St’ to distinguish it from ‘St.’ for ‘street’. (I can find no mention of this convention in the Chicago Manual of Style, which seems always to use a full point for both ‘street’ and ‘saint’.)
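One way to apply the ‘Mac’ and ‘Saint’ rules is to convert each name to a sorting form before comparing, while leaving the printed form alone. The following is a sketch under my own assumptions: the function name is invented, only English ‘St’ is handled (not the foreign forms discussed below), and the sample names in the list are made up.

```python
import re


def name_sort_form(name):
    """Return the form of a name used for sorting, not for display."""
    # Mc and M' (followed by a capital) sort as if spelled "Mac".
    name = re.sub(r"^(?:Mc|M['\u2019])(?=[A-Z])", "Mac", name)
    # "St", with or without a full point, sorts as if spelled "Saint".
    name = re.sub(r"^St\.?\s+", "Saint ", name)
    return name.lower()


names = ["Madden, J.", "McMahon, M.", "MacArthur, D."]
print(sorted(names, key=name_sort_form))
# ['MacArthur, D.', 'McMahon, M.', 'Madden, J.']
```

Names which merely begin with the letters ‘St’, such as ‘Stafford’, are left untouched because the pattern requires a space after the abbreviation.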

If a saint’s name is in a foreign language, alphabetise the abbreviated form as though spelled out in the original language: so alphabetise ‘Ste-Etienne’ as if it were ‘Sainte-Etienne’.

More about spelling ‘Saint’:

‘Saint’ in French:

Use a capital ‘S’ and a hyphen for saints’ names if they refer to a place name, an institution or a saint’s day. (Abbreviate as: ‘S.-’ (masculine) and ‘Ste-’ (feminine).) But use lowercase and no hyphen if the name refers to the saints themselves. (Abbreviate as: ‘s.’ (masculine) and ‘ste’ (feminine). There is NO full point after ‘Ste-’ in UK style because it is a contraction, not an abbreviation. Chicago says contractions should take full points, and so has ‘Ste.-’.)

‘Saint’ in German:

Use ‘Sankt’ (abbreviation: St.) and for the saints themselves: ‘heilig’ (hl.)

Note that although ‘St.’ is a contraction of ‘Sankt’, in German only* it takes a full point. (German has its own distinctively different set of rules for punctuating abbreviations and contractions; consult your style guide if you need guidance.)

*US style makes no distinction between contractions and abbreviations, and so there would be a full point in US style as well.

‘Saint’ in Italian:

before a consonant in the masculine: San
before an s followed by another consonant: Santo
before a consonant in the feminine: Santa
in both genders, before a vowel: Sant’ (set closed up: Sant’Agnese)

‘Saint’ in Portuguese:

masculine: São (sometimes Santo before a vowel)
feminine: Santa

‘Saint’ in Spanish:

masculine: San (but Santo before Do- or To-)
feminine: Santa

Remember that in all cases the abbreviation for ‘saint’ is alphabetised as though spelled out in the original language.

Foreign language names:

The rules for alphabetising and spelling foreign names are complicated. I’m covering a few of them below in detail:

Welsh names:

Before the seventeenth century, surnames were not common in Welsh. Instead they used a form similar to ‘son of’:

masculine: ap (preceding a consonant) and ab (preceding a vowel)
feminine: ferch

Alphabetise on the prefix, which should be treated as the first part of the surname: ‘ap Glendower, David’.

Irish and Scottish names:

The Irish prefix ‘O’ means ‘grandson of’. In English, set this closed up with an apostrophe as if representing elision: ‘O’Neill’. In Irish, the ‘O’ is sometimes set spaced without an apostrophe: ‘O Neill’ or, more authentically, with an accented ‘O’: ‘Ó Flannagáin’, which is the Gaelic spelling. Follow the preference of the person, if known. (To get Ó, type &Oacute; in Sigil in code view. A complete reference list will eventually be published in this sequence, but you can always get the character codes from

Mac means ‘son of’. There are various ways of presenting this. In general, follow the person’s preferred spelling, if known. However the name is spelled, alphabetise it as though it were spelled ‘Mac’.

I can find no specific guidance in either Hart or the Chicago Manual of Style, and so I would have thought you could safely index the name on the surname, including any prefix, such as Mac in the surname: ‘MacMahon, Michael’, for example.

Names with prefixes:

In general, follow the person’s preference as to spelling out prefixes to surnames such as de, du, van, den, von etc.


In French practice, ‘de’ should be lowercase, unless the name has been anglicised (De Quincey). Elide the ‘e’ before a vowel: ‘d’Alembert’. Alphabetise on the surname if the ‘de’ is lowercase, but include the ‘De’ in the alphabetisation if it is anglicised: ‘Alembert, Jean le Rond d’ ’ AND ‘Mairan, Jean-Jacques de’ BUT ‘De Quincey, Thomas’.

In Dutch, the ‘de’ is not capitalised and is not used for alphabetisation: ‘Groot, Deert de’
BUT in Flemish the reverse is true, capitalise and alphabetise the ‘De’: ‘De Bruyne, Jan’

In Italian names, prefixes are capitalised and alphabetised: ‘De Sanctis, Gaetano’. BUT aristocratic names beginning with ‘de’ ’, ‘degli’ or ‘di’ are lowercased and alphabetised according to the main name: ‘Medici, Lorenzo de’ ’.

In Spanish names, ‘de’ is lowercase and is not included in references which use just the surname. The ‘de’ is not alphabetised: ‘Vega Carpio, Lope de’, ‘Barahona de Soto, Luis’, ‘Hurtado de Mendoza, Diego de’. The prefix ‘del’ is a contraction of ‘de el’.

In modern French, ‘de La’ has only one capital: ‘de La Fontaine’. The ‘de’ is dropped if no forename is included: ‘La Fontaine says …’ But when anglicised, this may vary: ‘De La Warr’. Follow the conventional usage if in doubt. When alphabetising, include the ‘La’ in the surname: ‘La Fontaine, Jean de’.

In Spanish the ‘de la’ is all lowercase and is not alphabetised: ‘Torre, Claudio de la’.


‘Du’ normally has an initial capital and names are alphabetised according to ‘D’: ‘Du Deffand, Marie, marquise’. However, there are variations where the person prefers a lowercase ‘du’, in which case alphabetise according to the surname only: ‘Maine, Louise de Bourbon, duchesse du’.

le and la:

The definite article is capitalised if it comes at the beginning of a French surname: ‘Le Pen, Jean-Marie’. BUT it is lowercase in place names unless it comes at the beginning of a sentence. In both cases alphabetise the name on the article.

van, van den, van der:

Generally in Dutch, lowercase the prefix to a proper name and alphabetise on the main name: ‘Keulen, Johannes van’. In Flemish, the reverse is true: ‘Van Rompuy, Eric’. However well-known names of Dutch origin are capitalised and alphabetised on the prefix: ‘Van Gogh, Vincent’.

von, von dem, von den, von der, vom:

German follows Dutch practice (see above) and these prefixes are lowercase. In some Swiss names, ‘Von’ is capitalised.

When a name is referenced as just the surname ‘von’ is usually omitted and the name is not indexed on the prefix BUT the others, ‘von dem’, ‘von den’, ‘von der’ and ‘vom’ are usually retained and are included in the alphabetisation.
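Most of the prefix examples so far follow one pattern: a capitalised prefix stays with the surname and is alphabetised, while a lowercase prefix moves behind the forenames. That pattern can be sketched in Python as below. It is a heuristic inferred from the examples in this post, not a rule stated by Hart or Chicago, and the function and parameter names are invented. It does not cope with elided prefixes (d’Alembert) or with the German forms such as ‘von der’ which are usually retained.

```python
def index_form(forenames, surname, keep_in_front=("ap", "ab", "ferch")):
    """Build an inverted index entry from forenames and a surname.

    Lowercase prefixes move behind the forenames; capitalised ones stay
    with the surname. Prefixes listed in keep_in_front (the Welsh forms
    by default) stay in front even though they are lowercase.
    """
    tokens = surname.split()
    moved = []
    while (len(tokens) > 1 and tokens[0].islower()
           and tokens[0] not in keep_in_front):
        moved.append(tokens.pop(0))
    return f"{' '.join(tokens)}, {' '.join([forenames] + moved)}"


print(index_form("Johannes", "van Keulen"))   # Keulen, Johannes van
print(index_form("Eric", "Van Rompuy"))       # Van Rompuy, Eric
print(index_form("Jean", "de La Fontaine"))   # La Fontaine, Jean de
print(index_form("David", "ap Glendower"))    # ap Glendower, David
```

The forenames and surname are passed separately because nothing in a name such as ‘Eric Van Rompuy’ shows where the surname begins. Where the person’s own preference or conventional usage differs from the pattern, that should of course win.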

French Names:

In addition to the foregoing, be careful in French where an article is used to refer to a person and is not itself part of the surname.

When a first name is abbreviated in French, if the second letter is an ‘h’ it is retained: ‘Th. Gaultier’.

Hebrew and Jewish names:

These are often spelled in English in a variety of different ways. Follow the preference of the person if known or conventional usage.

‘ben’ means ‘son of’ and ‘bat’ means ‘daughter of’. Use lowercase and following the example of Welsh names alphabetise on the ‘ben’ or ‘bat’. Modern Israeli names use a capitalised ‘Ben-’, hyphenated and set closed up: ‘David Ben-Gurion’. Include the ‘Ben-’ in the alphabetisation: ‘Ben-Gurion, David’. Omitting the hyphen is wrong, as it suggests that ‘Ben’ is a middle name.

Portuguese and Brazilian Names:

Like Spanish, surnames are formed from the father’s surname and the mother’s maiden name. Unlike Spanish, the mother’s maiden name comes first. Alphabetise on the father’s surname:
Alphabetise ‘Manuel Braga da Cruz’ as: ‘Cruz, Manuel Braga da’.

Spanish Names:

See above. The order is reversed and the name is alphabetised on the first element. Conventional usage is complicated, but it would seem that the first part of the surname should be used for alphabetisation, appending any ‘de’ to the end of the first name: ‘Cervantes Saavedra, Miguel de’.

Icelandic Names:

Surnames in Iceland still follow the ancient Scandinavian tradition of appending ‘-sson’ or ‘-dottir’ to mean ‘son of’ or ‘daughter of’. Alphabetise the name according to the first name: ‘Vigdis Finnbogadottir’.

I have left out information about alphabetising a number of other languages. Refer to your style guide for further information.

Scientific Names:

If the first part of the name of a chemical compound is a prefix (like ‘cis-’), then ignore it for the purposes of alphabetisation. Also remember to alphabetise abbreviations as though they were spelled out.

Ignore subscripts in chemical notation (for example ‘Vitamin B12’), unless the terms would otherwise be the same (i.e. list Vitamin B1 before Vitamin B2).

Greek letters prefixing the names of stars are treated as though spelled out and any hyphen is ignored. Hart, with his customary conciseness, says to do the same with Greek letters prefixing chemical terms (such as ‘α-iron’ or ‘α chain’, which should be alphabetised as though beginning with ‘alpha’) BUT to ignore the Greek letter if it prefixes the name of a chemical compound (such as ‘γ-aminobutyric acid’, which should be alphabetised under ‘A’).

Symbols and Numerals:

Numbers and symbols in the British Standard are listed first, before the alphabetical sequence. However, a common alternative is to treat these as though they were spelled out, so that ‘1st’ is alphabetised as if it were ‘first’, for example. If an ampersand occurs within an entry, then it is best treated as if it were spelled out as ‘and’.

Before the alphabetical sequence:     As though spelled out:
1st Cavalry                           1st Cavalry
2/4 time                              42nd Street
3i plc                                3i plc
42nd Street                           2/4 time
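Both orderings in the example above can be reproduced with two small Python sort keys. This is only a sketch: the `SPELLED_OUT` table is written by hand for these four entries (a real index would need a proper number-to-words routine), and the function names are my own.

```python
import re

# Hand-written spellings for the numerals in the example entries only.
SPELLED_OUT = {"1st": "first", "2/4": "two four",
               "3i": "three i", "42nd": "forty-second"}


def numbers_first_key(entry):
    """Numerals in numerical order, ahead of the alphabetical sequence."""
    match = re.match(r"\d+", entry)
    if match:
        return (0, int(match.group()), entry.lower())
    return (1, 0, entry.lower())


def spelled_out_key(entry):
    """Numerals and ampersands sorted as though spelled out."""
    first, _, rest = entry.partition(" ")
    spoken = SPELLED_OUT.get(first, first)
    return f"{spoken} {rest}".lower().replace("&", "and")


entries = ["42nd Street", "3i plc", "1st Cavalry", "2/4 time"]
print(sorted(entries, key=numbers_first_key))
# ['1st Cavalry', '2/4 time', '3i plc', '42nd Street']
print(sorted(entries, key=spelled_out_key))
# ['1st Cavalry', '42nd Street', '3i plc', '2/4 time']
```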

You might also want to organise a certain part of a bibliography or an index with entries either in chronological or some other sequence, if that would be logical or helpful to the reader. Make sure only entries within the same block use a variant ordering. The following example from the Chicago Manual of Style is an excellent illustration of this (run-in style):

Daley, Richard J. (mayor): third term,
   205; fourth term, 206–7
flora, alpine: at 1,000-meter level, 46,
   130–35; at 1,500-meter level, 146–
   54; at 2,000-meter level, 49, 164–74

So as you can see, alphabetising an index or bibliography is far from simple. But these are the rules and if you want to produce your book to a professional standard, you will unfortunately have to follow them.

Next Steps: My next post describes how to adapt the print index for use in your e-book and how to use markup to link from the index to the relevant places in the main text.

