The rendering of uncommon characters in the etexts on this site
(or: waiting for Unicode support)

These etexts were made in the mid-nineties, when widespread support for the ISO 8859-1 character set (aka Latin 1) was seen as a great leap forward from the mess of incompatible eight-bit encodings of the preceding decade. However, this small addition to the 26 letters covered by ASCII only just covers the majority of the modern West European languages, and is adequate for neither normalised Old Norse nor the typographical niceties of printed text editions.

At that time, Unicode was under development, promising universal support for any known character, so I chose to make ad hoc solutions along the way that keep all distinctions visible and are easily replaceable with the real thing when available. The deployment of Unicode has proceeded at a much slower pace than could be hoped, and the Unicode support in most browsers is still severely lacking in quality and coverage. The bulk of these pages are therefore not yet updated, but I occasionally rework some of the material to the state of the art.

For the letter Œ/œ and the typographical signs † … – ‹ › ‘ and ’ I used undefined code points that worked on Windows computers and were approximated on others; these can now be corrected to their Unicode counterparts without any problems, but this has not yet been done.
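The correction described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that the "undefined code points" are the bytes Windows-1252 assigns to these characters in the 0x80–0x9F range, which ISO 8859-1 leaves to control codes. The function name `fix_windows_code_points` is made up for this example.

```python
# Assumption: the nine characters were stored as the bytes Windows-1252 uses
# for them (0x80-0x9F), in pages otherwise treated as ISO 8859-1.
CP1252_BYTES = [0x8C, 0x9C, 0x86, 0x85, 0x96, 0x8B, 0x9B, 0x91, 0x92]

# Map each control-range code point (as seen after Latin-1 decoding)
# to the character Windows-1252 defines for the same byte value.
FIXES = {b: bytes([b]).decode("cp1252") for b in CP1252_BYTES}


def fix_windows_code_points(text: str) -> str:
    """Replace the nine Windows-only code points with their Unicode counterparts."""
    return text.translate(FIXES)
```

Only the nine characters named above are touched; everything else, including any other code point in the 0x80–0x9F range, passes through unchanged.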

Other letters are either not covered by Unicode or are currently not very well supported by browsers and fonts. For the commonly needed «hooked o», the letter ö is used instead, as in Modern Icelandic and the normalised texts in Bugge's Edda edition (to make it possible to differentiate between this usage and the actual letter ö sometimes used, the latter is expressed as a «named entity» – &ouml;). Other missing characters are mostly expressed using the most similar available letter in boldface:

Unicode	«ad hoc»	description
ę	e	«hooked e» aka «e caudata» aka «e with ogonek»
ę́	é	«hooked e» with acute accent
ǫ	o	«hooked o»
ǫ́	ó	«hooked o» with acute accent
ø̨	ø	«hooked ø»
v́	v	v with acute accent
ſ	s	«tall s», like f without the crossbar
ꜹ	a/	av-ligature
ꜳ	a	aa-ligature
ꝺ	d	«d rotunda», like ð without the crossbar
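Replacing the ad hoc forms with the real characters could look roughly like the sketch below. It assumes that the boldface is marked up as a single letter inside `<b>…</b>`, which is a guess about the markup and not something stated here, and it covers only the single-letter rows of the table (the av-ligature row is left out). The name `replace_ad_hoc` is made up for this example.

```python
import re

# Ad hoc letter (set in boldface) -> Unicode counterpart, following the table.
AD_HOC = {
    "e": "\u0119",             # hooked e
    "\u00e9": "\u0119\u0301",  # hooked e + combining acute accent
    "o": "\u01eb",             # hooked o
    "\u00f3": "\u01eb\u0301",  # hooked o + combining acute accent
    "\u00f8": "\u00f8\u0328",  # o with stroke + combining ogonek
    "v": "v\u0301",            # v + combining acute accent
    "s": "\u017f",             # tall s
    "a": "\ua733",             # aa-ligature
    "d": "\ua77a",             # d rotunda (insular d)
}

# Assumed markup: exactly one character inside <b>...</b>.
BOLD_LETTER = re.compile(r"<b>(.)</b>", re.IGNORECASE)


def replace_ad_hoc(html: str) -> str:
    """Swap bold single letters listed in AD_HOC for their Unicode forms."""
    def swap(match):
        # Bold letters not in the table are left exactly as they were.
        return AD_HOC.get(match.group(1), match.group(0))
    return BOLD_LETTER.sub(swap, html)
```

A bold single letter used for ordinary emphasis would also be converted, so the result would still need to be checked by eye.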

Also to work around shortcomings of the browsers of the period, the HTML markup does not fully conform to the standards; but it is at least fairly consistently nonconformant…

The future and how to get there...

Fortunately, great improvements are being made through MUFI, the Medieval Unicode Font Initiative. They coordinate the needs of scholars, and have adopted a two-part approach to pave the way for interoperable support for the resulting list of characters. Each of these characters is given a permanent entity name and a temporary but fixed code point in the private use area of Unicode, while proposals for inclusion in Unicode proper are also submitted.

This means that original material can be precisely transcribed in a permanently interoperable way. As characters move from private use area code points to official Unicode, only the style sheets transforming from entity names to code points need to be updated; and even this is done centrally and not by each user individually. Font providers can make MUFI-conformant fonts covering the needed characters regardless of their status in Unicode – and as this changes, only the internal code tables need updating.
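The separation between entity names and code points can be illustrated with a small sketch. The entity name `exlig` and both code points below are placeholders invented for this example, not MUFI's actual assignments; the point is only that the transcription stays the same while the table changes.

```python
import re

# Hypothetical tables: the name "exlig" and its code points are placeholders.
TABLE_BEFORE = {"exlig": "\ue000"}  # illustrative private use area code point
TABLE_AFTER = {"exlig": "\ua733"}   # illustrative official code point

ENTITY = re.compile(r"&([A-Za-z0-9]+);")


def expand_entities(text: str, table: dict) -> str:
    """Replace entity names found in the table; leave all other entities alone."""
    return ENTITY.sub(lambda m: table.get(m.group(1), m.group(0)), text)


transcription = "h&exlig;r &amp; there"
before = expand_entities(transcription, TABLE_BEFORE)
after = expand_entities(transcription, TABLE_AFTER)
```

The transcription itself is never edited: swapping `TABLE_BEFORE` for `TABLE_AFTER` is the only change needed when a character gets its official code point.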

Above the plaintext level, Menota, the Medieval Nordic Text Archive, provides detailed guidelines for how to encode source texts at various levels on the scale between facsimile and normalised text.

Tor Gjerde <i@old.no>
