There has been considerable discussion recently regarding an appropriate representation of text in VRML. While it is quite obvious that text can be represented as the polygons making up the glyph images, it is equally obvious that some way of representing text directly as strings will result in significant performance benefits.
This document discusses the representation of text within VRML and presents some tentative designs for font specification and formatting specification nodes.
There are two basic approaches to including text in VRML: one is to allow each text node to have its own character set and encoding; the other is to require every node to use the same character set and encoding. Each of these is discussed below.
Requiring a single character set and encoding is, in some ways, the simplest case: all text in a "document" (for want of a better word) uses the same character set and encoding. This simplifies text node design, but it has a number of problems; in fact, the problems facing this design are almost exactly those facing HTML. Three major problems are:
These three problems combined lead one to look for a simplification, and the perfect solution seems to be to use ISO 10646/Unicode, particularly in some of its ASCII-compatible encodings (UTF-8 being the best).
In fact, this is a very elegant solution, and is basically what I proposed for HTML in December of last year. However, this solution has some major problems:
For VRML to use Unicode, designing some format for encoding such information would be necessary.
The combination of these leads to basically requiring that any multilingual text representation be able to support multiple character sets and encodings, and especially for Internet data types. A sad fact of life.
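The two points above (UTF-8 being ASCII-compatible, and the need to know which encoding a given run of octets uses) can be illustrated with a short Python sketch. The sample strings are assumptions chosen for illustration; nothing here is part of the proposal itself.

```python
# Illustration only: the sample strings are assumptions, not from the proposal.

ascii_text = "VRML"
# Pure ASCII text produces identical octets under ASCII and UTF-8.
same_octets = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

accented = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
utf8_octets = accented.encode("utf-8")      # two octets
latin1_octets = accented.encode("latin-1")  # one octet

# Reading UTF-8 octets under the wrong encoding raises no error here;
# it simply yields different characters.
misread = utf8_octets.decode("latin-1")

print(same_octets, utf8_octets, latin1_octets, repr(misread))
```

Because the misreading is silent, a format that admits several encodings has to carry the encoding name alongside the octets.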
It is the authors' opinion that eventually, Unicode will become widely adopted. VRML seems ideally suited to solving one major issue: that of fonts. The number of characters in a working set for multilingual text tends to be small. Given a shared repository of glyph images represented as polygons, it seems perfectly feasible for a viewer to use the local fonts for most of the text nodes encountered, and to fetch (and cache) glyph images as necessary. The same concept has been proposed for Unicode use in HTML by Larry Masinter.
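A minimal sketch of the viewer behaviour described above might look as follows. The class name `GlyphCache`, the `fetch` callback, and the string stand-ins for glyph images are all hypothetical; a real viewer would return polygon data and fetch it from the shared repository over the network.

```python
from typing import Callable, Dict, Set


class GlyphCache:
    """Hypothetical glyph lookup: local fonts first, then a cached remote fetch."""

    def __init__(self, local_chars: Set[str], fetch: Callable[[str], str]) -> None:
        self.local_chars = local_chars      # characters the local fonts cover
        self.fetch = fetch                  # stand-in for a repository request
        self.cache: Dict[str, str] = {}     # glyphs already fetched
        self.fetch_count = 0                # number of repository requests made

    def glyph_for(self, ch: str) -> str:
        if ch in self.local_chars:
            return "local:" + ch
        if ch not in self.cache:
            self.cache[ch] = self.fetch(ch)
            self.fetch_count += 1
        return self.cache[ch]


# Usage: the local font covers only "a", "b", "c"; omega must be fetched once.
glyph_cache = GlyphCache(set("abc"), lambda ch: "remote:U+%04X" % ord(ch))
glyphs = [glyph_cache.glyph_for(ch) for ch in "ab\u03a9\u03a9"]
```

The second occurrence of the same character is served from the cache, which is what keeps the working set of fetched glyphs small.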
Below is a proposal for a basic representation of text within VRML. The key features of the proposal are that it is extensible, and that it should allow all text data to be processed on equal terms, regardless of language, character set, or encoding.
Note: The following proposal will require that the MIME type for VRML allow 8-bit data. This requirement can be removed if some encoding for 8-bit data is defined.
The most fundamental node type is the Text node type. Its basic role is to serve as a container for a string of characters. It is defined as:
Text {
    SFString  coded_character_set
    SFString  encoding
    SFString  language
    SFInteger string_length
    MFOctet   data
}
The definition contains a new data type MFOctet, which should probably have a corresponding SFOctet type. These would be defined as:
foo "A"
foo [ 4 "abAB" ]
Such a node has the advantage of allowing any representation of text at all, but still remaining parseable by systems unable to handle the character set and encoding used.
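The following Python sketch illustrates that property: a reader that does not recognise the declared encoding can still hold the node and its octets intact. The class mirrors the field names of the Text node, but the `decoded` method, the use of Python codec names as encoding names, and the sample values are assumptions made for illustration. The interpretation of `string_length` is not specified here, so the sketch only carries it along.

```python
import codecs
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextNode:
    """Hypothetical in-memory form of the proposed Text node."""

    coded_character_set: str
    encoding: str
    language: str
    string_length: int   # carried as-is; its exact meaning is left open
    data: bytes          # the raw octets of the string

    def decoded(self) -> Optional[str]:
        """Return the text if this reader knows the encoding, else None."""
        try:
            codecs.lookup(self.encoding)
        except LookupError:
            return None  # unknown encoding: the octets remain available in data
        return self.data.decode(self.encoding, errors="replace")


known = TextNode("ISO 10646", "utf-8", "fr", 2, b"\xc3\xa9")
unknown = TextNode("X", "x-unknown-encoding", "ja", 2, b"\x82\xa0")

print(repr(known.decoded()), unknown.decoded(), unknown.data)
```

A viewer in the second situation could skip rendering the node, or pass the octets through unchanged, without failing to parse the rest of the file.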
It is important to note that this node does nothing more than hold a string of text. All formatting information should be stored elsewhere. Using Transform and whatnot should allow interesting effects, even with text.
The following are a few notes and tentative proposals for a few text-related node types. At some point in the future, all of the following will need to be specified for VRML, as will a Font node type (for fonts represented as polygons).
The following node is for specifying font type information. It should be representable using various font technologies. The fields should be self-explanatory.
FontSpecification {
    SFString  family
    SFEnum    weight       # VERYLIGHT, LIGHT, MEDIUM, DEMIBOLD, BOLD
    SFEnum    slant        # ROMAN, ITALICS, OBLIQUE
    SFInteger point_size
}
The following node represents a basic formatting specification node. Basically, it allows the width, height, justification, and colors of the text node to be set.
FormatSpec {
    SFInteger width
    SFInteger height
    SFColor   background
    SFColor   foreground
    SFEnum    justification   # FILL_LEADING, FILL_BOTH, FILL_TRAILING
}
The justification field should be independent of language.
It is envisaged that the nodes be used in the following manner:
TextNode {
    FormatSpec { .... }
    Text { .... }
    FontSpecification { .... }
}
Technical contents of ISO 639:1988 (E/F) "Code for the representation of names of languages".
Typed by [email protected] 1990-11-30
Two-letter lower-case symbols are used.
The Registration Authority for ISO 639 is Infoterm, Österreichisches Normungsinstitut (ON), Postfach 130, A-1021 Vienna, Austria.

aa Afar
ab Abkhazian
af Afrikaans
am Amharic
ar Arabic
as Assamese
ay Aymara
az Azerbaijani
ba Bashkir
be Byelorussian
bg Bulgarian
bh Bihari
bi Bislama
bn Bengali; Bangla
bo Tibetan
br Breton
ca Catalan
co Corsican
cs Czech
cy Welsh
da Danish
de German
dz Bhutani
el Greek
en English
eo Esperanto
es Spanish
et Estonian
eu Basque
fa Persian
fi Finnish
fj Fiji
fo Faeroese
fr French
fy Frisian
ga Irish
gd Scots Gaelic
gl Galician
gn Guarani
gu Gujarati
ha Hausa
hi Hindi
hr Croatian
hu Hungarian
hy Armenian
ia Interlingua
ie Interlingue
ik Inupiak
in Indonesian
is Icelandic
it Italian
iw Hebrew
ja Japanese
ji Yiddish
jw Javanese
ka Georgian
kk Kazakh
kl Greenlandic
km Cambodian
kn Kannada
ko Korean
ks Kashmiri
ku Kurdish
ky Kirghiz
la Latin
ln Lingala
lo Laothian
lt Lithuanian
lv Latvian, Lettish
mg Malagasy
mi Maori
mk Macedonian
ml Malayalam
mn Mongolian
mo Moldavian
mr Marathi
ms Malay
mt Maltese
my Burmese
na Nauru
ne Nepali
nl Dutch
no Norwegian
oc Occitan
om (Afan) Oromo
or Oriya
pa Punjabi
pl Polish
ps Pashto, Pushto
pt Portuguese
qu Quechua
rm Rhaeto-Romance
rn Kirundi
ro Romanian
ru Russian
rw Kinyarwanda
sa Sanskrit
sd Sindhi
sg Sangro
sh Serbo-Croatian
si Singhalese
sk Slovak
sl Slovenian
sm Samoan
sn Shona
so Somali
sq Albanian
sr Serbian
ss Siswati
st Sesotho
su Sudanese
sv Swedish
sw Swahili
ta Tamil
te Telugu
tg Tajik
th Thai
ti Tigrinya
tk Turkmen
tl Tagalog
tn Setswana
to Tonga
tr Turkish
ts Tsonga
tt Tatar
tw Twi
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
vo Volapuk
wo Wolof
xh Xhosa
yo Yoruba
zh Chinese
zu Zulu
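As a rough illustration of how a reader might check the `language` field of a Text node against this list, the Python sketch below tests that a value is a two-letter lower-case symbol and looks it up in a table. The helper names are hypothetical, and `KNOWN_CODES` holds only a handful of entries copied from the list above; a real table would carry all of them.

```python
import re
from typing import Optional

# A small subset of the codes listed above, for illustration only.
KNOWN_CODES = {
    "da": "Danish",
    "de": "German",
    "en": "English",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
}


def is_well_formed(code: str) -> bool:
    """True if code is exactly two lower-case ASCII letters."""
    return re.fullmatch(r"[a-z]{2}", code) is not None


def language_name(code: str) -> Optional[str]:
    """Return the language name for a code in the table, else None."""
    if not is_well_formed(code):
        return None
    return KNOWN_CODES.get(code)


print(language_name("ja"), language_name("EN"), language_name("qq"))
```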