N3027 ("Proposal to add medievalist characters to the UCS") proposes to encode a wide range of abbreviation letters used in medieval manuscripts and early printed books. At present many early texts cannot be transcribed into Unicode, because special abbreviation letters are so common in them, so I am very pleased to see that these letters are finally being encoded. However, there is one proposed letter that I have a little quibble with. According to N3027 :
LATIN LETTER THORN WITH STROKE is used for Old Norse þat, þess, þor-, þæt (Figures 29, 32, 33, 40, 73, 79).
Which is true enough as far as it goes, but I suspect that most of my readers will be more familiar with the letter thorn with a stroke through the ascender in the context of Old English, where it is the ubiquitous abbreviation for þæt (and unlike Old Norse, only þæt).
It is an odd thing about the proposal that Latin, Old Norse, Irish, Welsh and even Cornish are frequently cited as languages using a particular proposed character, whereas Old English is cited for only a single character (COMBINING DOUBLE CIRCUMFLEX ABOVE, an editorial mark used in some editions of Old English poetry), and there are only two other mentions of Old English in the entire 51 pages of the document. Yet quite a few of the proposed characters are applicable to Old English (three of them are primarily used for OE), and six of the examples provided are actually of Old English text (figs. 29, 30, 31, 37, 39, 40). In fact two of the six examples cited for LATIN LETTER THORN WITH STROKE are Old English, contrary to what the casual reader might assume.
Not only does the proposal not mention Old English in relation to the proposed LATIN LETTER THORN WITH STROKE, but it omits the crucial piece of information that the glyph forms of Old Norse and Old English letter thorn with stroke are quite different from each other. The Old Norse form has a short horizontal stroke through the ascender, whereas the Old English form has a longer diagonal stroke through the ascender. This difference can be seen in the examples given in N3027, where the Old Norse examples (figs. 32, 33, 42, 73 and 79) all use the former letterform and the Old English examples (figs. 29 and 40) both use the latter letterform. Although the examples show that Old Norse and Old English use distinct glyph forms, the text of the proposal never mentions that the character occurs in two forms, which I think is an important detail that should have been made explicit.
The following is an example of an early 12th century Old Norse manuscript :
Elucidarius (AM 674a folio 17r)
Thorn with stroke on lines 2 and 7
And this is an example of an Old English manuscript dated to about the year 1000 :
The Cædmon Manuscript [part of the Old English verse rendition of Genesis] (Bodleian Junius MS 11 folio 14)
Thorn with stroke on lines 3 and 11
These manuscripts exemplify the differences between the Old Norse and Old English forms of thorn with stroke. This difference is preserved in most modern typeset editions, with editions of Old Norse texts normally using a short horizontal stroke, and editions of Old English texts normally using a longer diagonal stroke. The following are a few examples that show the Old English form of the letter (see N3027 figs. 32, 33, 73 and 79 for some ON examples) :
Plummer and Earle, Two of the Saxon Chronicles Parallel (Oxford: Oxford University Press, 1889) p.69
A. Campbell, An Old English Grammar (Oxford: Oxford University Press, 1959) p.12
C.L. Wrenn (ed.), Beowulf (London: Harrap, 1973) p.210
The question then arises, should the Old Norse and Old English forms be encoded separately (LATIN LETTER THORN WITH STROKE and LATIN LETTER THORN WITH DIAGONAL STROKE) or should they be considered to be glyph variants of the same abstract character ? According to N3027 it would seem that they should be encoded as a single character, although the latest version of the MUFI character recommendation treats the two glyph forms as separate characters :
MUFI Character Recommendation Version 2.0 f (12 January 2006)
My inclination is to agree with MUFI on this one, although I suspect that I am wrong. According to Unicode encoding principles, language-specific glyph variations should be dealt with at the font level (i.e. in a font designed for Old Norse the glyph for LATIN LETTER THORN WITH STROKE would have a horizontal stroke, whereas a font designed for Old English would have a glyph with a diagonal stroke). However, there are plenty of precedents for encoding language-specific letterforms as separate characters.
My feeling is that in N3027, the proposed LATIN LETTER VEND is really only an Old Norse glyph variant of the already encoded LATIN LETTER WYNN used in Old English, and so if Vend and Wynn should be distinguished at the character level, why not Thorn with a horizontal stroke and Thorn with a diagonal stroke ?
An example from further afield that has been discussed recently is the proposed MYANMAR LETTER MON JHA (N3044), which is acknowledged to be a glyph variant of the already encoded MYANMAR LETTER JHA used for writing Mon, but which Michael Everson is proposing to encode as a distinct character because there is a requirement for a single "plain-text monofont" that covers all of the Myanmar-script languages of the Union of Myanmar, and so language-specific glyph variants must be dealt with at the character level rather than the font level.
This has a bearing on the encoding of Thorn with a stroke, as fonts that are intended for use by medievalists (Alphabetum, Andron Scriptor, Cardo, Junicode, Leeds Uni) are general fonts that cover the characters required for all languages. Thus, users will generally not be using fonts specifically designed for Old Norse or Old English, but will be using a single "medievalist" font with a single glyph for LATIN LETTER THORN WITH STROKE, which will cater either for Old Norse or for Old English, but not for both. I think that this is a pretty good argument for saying that as Old English Thorn with a diagonal stroke "is a language-specific variant which differs significantly from the 'default' letter" (ME's justification for MYANMAR LETTER MON JHA), it should be encoded separately from LATIN LETTER THORN WITH STROKE.
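To make the difference between the two encoding models concrete, here is a minimal sketch in Python. It is only a toy model of glyph selection, not how any real font or rendering engine works. The character labels, the glyph descriptions and both functions are illustrative assumptions of mine, and I have assumed, purely for the sake of the example, a general font whose default glyph follows the Old Norse form.

```python
# Illustrative labels only: none of these are real code points or glyph names.
THORN_WITH_STROKE = "LATIN LETTER THORN WITH STROKE"
THORN_WITH_DIAGONAL_STROKE = "LATIN LETTER THORN WITH DIAGONAL STROKE"

HORIZONTAL = "thorn with short horizontal stroke"  # Old Norse form
DIAGONAL = "thorn with longer diagonal stroke"     # Old English form


def glyph_unified(font_default, language=None, language_variants=None):
    """Model 1: one character; the font decides which glyph to show.

    The language-specific form is only reachable if the text carries a
    language tag AND the font has a variant for that language.
    """
    if language is not None and language_variants and language in language_variants:
        return language_variants[language]
    return font_default


def glyph_disunified(character):
    """Model 2: two characters, so the distinction survives in plain text."""
    glyphs = {
        THORN_WITH_STROKE: HORIZONTAL,
        THORN_WITH_DIAGONAL_STROKE: DIAGONAL,
    }
    return glyphs[character]


# Old English plain text in a general font (assumed default: Old Norse form).
old_english_plain = glyph_unified(font_default=HORIZONTAL)

# The same font with a hypothetical Old English variant, and tagged text;
# "ang" is the ISO 639 code for Old English.
old_english_tagged = glyph_unified(HORIZONTAL, "ang", {"ang": DIAGONAL})

# Separately encoded character: no tagging or special font needed.
old_english_separate = glyph_disunified(THORN_WITH_DIAGONAL_STROKE)
```

In this sketch the unified model only produces the diagonal form when both a language tag and a language-aware font are present, whereas plain text falls back to whatever the font designer chose as the default. The disunified model gives the expected form in every case, at the cost of a second character.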
Anyhow, those are just my thoughts. It would be interesting to hear what other people think on this issue.