PAN Localization Project

Science Technology and Environment Agency of Lao PDR

National University of Computer and Emerging Sciences, Pakistan

International Development Research Center

 

 

 

Lao Fonts

 

Phonpasit PHISSAMAY, National Project Director

Valaxay DALALOY, National Project Coordinator

(Mr. Thonglor DUANGSAVANH);

(Mr. Vethsouvanh PHENGCHANH);

(Mr. Khamkeo KOMMADAM)

 

 

I.       Definition

 

A font is a collection of glyphs used for the visual depiction of character data. A font is often associated with a set of parameters (for example, size, posture, weight, and serifness) which, when set to particular values, generate a collection of imageable glyphs.

 

A font has three components: a coded font, a font character set, and a code page.

 

Coded Font

A coded font translates your request for type (for example, text you previously entered at a computer terminal) into characters for printing. A coded font, which associates a specific code page with a specific font character set, consists of two parts:

  • References to specific font character sets
  • References to specific code pages

 

A character must be included in the specified font character set and listed on the specified code page before it can be printed.

 

 

Font Character Set

A font character set contains the characters of a single type family, typeface, and type size. In addition, a font character set specifies character properties and printing attributes.

 

Characters are the letters, numerals, punctuation marks, or other symbols of a font.

 

Character properties describe how a character is positioned relative to the characters around it. Some character properties include the following:

  • The baseline of a character, showing its alignment on the writing line
  • The dimensions of space in which the character is printed
  • The position of the character within that space
  • The identifier of the character (the character ID or graphic character ID)

Each character is assigned a character ID; for example, the character A (uppercase A) is assigned the character ID LA020000. The purpose of a character ID is to distinguish the character from other, similar characters. For example, the following characters look similar; however, they are different and are assigned different character IDs:

        Minus sign (-) Character ID SA000000;

        Hyphen (-) Character ID SP100000;

        Em dash (--) Character ID SM900000

 

The printing attributes define how the font character set will be printed. Some printing attributes include rotation of characters, maximum ascender, and point size.

Code Page

A code page maps each character of text to the characters in a font character set. The following picture shows how a code page maps text to the characters in a font character set. As you enter your text at a computer terminal, each keyboard character is translated into a code point. When the text is printed, each code point is matched to a character ID on the code page you specified. The character ID is then matched to the image of the character in the font character set you specified. The image in the character set is the image that is printed.

 

 

A character ID is an 8-byte character data string. A code point is an 8-bit binary number representing one of 256 potential characters (the maximum number of characters available on a code page). Code points are usually shown as hexadecimal representations of their binary values.

Binary: 11000001; Decimal: 193; Hexadecimal: C1
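The print path described above (code point, then character ID via the code page, then glyph image via the font character set) can be sketched in Python. The tables CODE_PAGE and FONT_CHARACTER_SET and the helper names are hypothetical; pairing code point C1 with character ID LA020000 (uppercase A) is an assumption for illustration, matching EBCDIC-style code pages.

```python
# Minimal sketch of the print path:
#   code point --(code page)--> character ID --(font character set)--> glyph image
# Both tables are hypothetical and hold a single entry for illustration.

CODE_PAGE = {0xC1: "LA020000"}                   # assumed entry: 0xC1 -> uppercase A
FONT_CHARACTER_SET = {"LA020000": "image-of-A"}  # placeholder for the glyph image


def code_point_forms(cp: int) -> tuple[str, int, str]:
    """Return the binary, decimal and hexadecimal forms of an 8-bit code point."""
    if not 0 <= cp <= 0xFF:
        raise ValueError("a code point on a code page is one byte (0-255)")
    return format(cp, "08b"), cp, format(cp, "02X")


def glyph_for(cp: int) -> str:
    """Look up the code point on the code page, then the character ID in the font character set."""
    char_id = CODE_PAGE[cp]
    assert len(char_id) == 8, "character IDs are 8-character strings"
    return FONT_CHARACTER_SET[char_id]


print(code_point_forms(0xC1))  # ('11000001', 193, 'C1')
print(glyph_for(0xC1))         # image-of-A
```

A character that is missing from either table raises a KeyError here, mirroring the rule that a character must be on both the code page and the font character set before it can be printed.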

 

II.    Lao Words

 

Structure of the Lao syllable:

-        Level 1: Characters at level 1 are diacritics. There are five diacritics, namely:

 

-        Level 2: Level 2 is occupied by superscript vowels only. The seven vowels of level 2 are:          

           

-        Level 3: This is the main level of a Lao word: every position in a Lao word has a character at level 3. All thirty-three consonants, the twelve vowels written before or after the consonant, and two special symbols are at level 3. However, some consonants and vowels also extend into levels 2 and 4, such as:

 

 

-        Level 4: The characters at level 4 are the subscript (lowered) vowels and one combining consonant, namely:
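To make the four-level structure concrete, the Python sketch below tags each character of a string with its level. It rests on an assumption not stated in this section: that the levels line up with the mark classes in the Invalid Combining Marks table later in this document (the five ABOVE2 marks at level 1, the seven ABOVE1 marks at level 2, BELOW1 and BELOW2 at level 4). Everything else, including consonants, spacing vowels and AM, is treated as level 3. The names LEVEL_OF and levels are hypothetical.

```python
# Hypothetical level table. Assumption: levels follow the mark classes of the
# Invalid Combining Marks table (ABOVE2 -> 1, ABOVE1 -> 2, BELOW1/BELOW2 -> 4).
LEVEL_1 = "\u0ec8\u0ec9\u0eca\u0ecb\u0ecc"              # ABOVE2
LEVEL_2 = "\u0eb1\u0eb4\u0eb5\u0eb6\u0eb7\u0ebb\u0ecd"  # ABOVE1
LEVEL_4 = "\u0ebc\u0eb8\u0eb9"                          # BELOW1 + BELOW2

LEVEL_OF = {ch: lvl for chars, lvl in ((LEVEL_1, 1), (LEVEL_2, 2), (LEVEL_4, 4)) for ch in chars}


def levels(text: str) -> list[tuple[str, int]]:
    """Tag each character with its level; anything not listed above is level 3."""
    return [(f"U+{ord(ch):04X}", LEVEL_OF.get(ch, 3)) for ch in text]


# KO + vowel sign II + tone mark MAI EK
print(levels("\u0e81\u0eb5\u0ec8"))  # [('U+0E81', 3), ('U+0EB5', 2), ('U+0EC8', 1)]
```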

 

Because of this four-level structure, characters at different levels differ in height and width. Taking a level-3 character as the reference, characters at levels 2 and 4 are 50% of its size, and characters at level 1 are 50% of the size of a level-2 character, that is, 25% of a level-3 character.
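The size ratios above reduce to simple arithmetic. This Python sketch computes the nominal size at each level from a level-3 size; the function name nominal_sizes and the 600-unit example value are arbitrary choices for illustration, and real designs will deviate from these ratios where a glyph needs it.

```python
def nominal_sizes(level3: float) -> dict[int, float]:
    """Relative glyph size per level, using the stated ratios:
    levels 2 and 4 are 50% of level 3, and level 1 is 50% of level 2."""
    level2 = 0.5 * level3
    return {1: 0.5 * level2, 2: level2, 3: level3, 4: 0.5 * level3}


# Assumed level-3 glyph height of 600 font units (an arbitrary example value)
print(nominal_sizes(600))  # {1: 150.0, 2: 300.0, 3: 600, 4: 300.0}
```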

 

 

Types of Lao typefaces:

 

The development of Lao typefaces has been shaped by the country's development, such as changes of regime and the available equipment. Lao typefaces can be classified into three groups:

 

1.       Traditional (old typewriter) style: Based on the MAHASILA grammar book (Old Lao Grammar), this style was developed under the royal regime (before 1975). It is characterized by rounded glyphs with thin, uniform-width strokes. Example:

 

 


2.       New typewriter (present-day schoolbook) style: Based on the PHOUMY VONGVICHITH grammar book (New Lao Grammar), this style was developed after the establishment of the Lao PDR (after 1975). It is characterized by glyphs with straight strokes where possible and somewhat heavier uniform-width strokes. Example:

 

 


 

3.       Ornamental style: Newer designs intended to make Lao characters look more attractive. Most of these modern typefaces were developed in the last five years, as computers have had a big impact on printed materials. They are mostly used in brochures, advertising letters, and magazines. They are characterized by calligraphic strokes and handwriting styles. Example:


 

 

 

III. Lao Fonts

 

1.            Factors to consider:

 

When considering which font to use, apart from appearance, there are four main factors to consider:

-          Is word wrapping important? For a large amount of text, it is much more convenient if the text can be entered without having to break each line by hand. This becomes important whenever text must be edited or revised, so that a minor change does not force every subsequent line to be adjusted.

-          Do you need both Lao and Roman characters in a single font? In word processing this is rarely a problem, but in many other applications (such as spreadsheets and databases) it is often important to mix languages in a single entry, which requires a common font.

-          Does the application interpret numeric characters and symbols correctly? Many Lao fonts reuse the standard codes for digits and arithmetic symbols to encode other characters, which leads to program errors, especially in spreadsheet and database applications. The hyphen code in particular is often interpreted as a minus sign and must be used with care.

-          Do you need a wide range of styles? For headings or brochures, the above factors are usually less important than being able to choose from a wide range of font styles.

 

The style of the font must be decided before even the first character is drawn, so that all characters are balanced in shape and style. It is also important to decide on a basic character width relative to the display position, especially for tone marks and superscript vowels, which can be placed in many different positions within the syllable.

 

 

2.      Methodologies:

 

The Lao shaping engine processes text in three stages:

1.       Analyze characters for valid diacritic combinations

2.       Shape (substitute) glyphs with OTLS (OpenType Library Services)

3.       Position glyphs with OTLS

 

        Analyze Characters

The unit that the shaping engine receives for the purpose of shaping is a string of Unicode characters, in a sequence. The contextual analysis engine verifies valid diacritic combinations. For additional information, see Invalid Combining Marks.

The AM requires special handling in the analysis phase. If the preceding base consonant carries no above mark, the 'ccmp' feature is used to decompose the AM into the NIGGAHITA and AA glyphs, which allows the NIGGAHITA glyph to be positioned correctly above the base consonant. If the base consonant already carries a tone mark, the analysis engine decomposes the AM and moves the NIGGAHITA between the base consonant and the tone mark, so that the NIGGAHITA is positioned correctly above the base consonant and the tone mark is positioned correctly above the NIGGAHITA. This behavior cannot be tested in VOLT, because VOLT does not include this logic.
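The effect of the AM handling can be modeled on a string of Unicode characters in Python. This is a simplification: the real engine works on glyphs inside Uniscribe, with 'ccmp' performing the decomposition. The function name decompose_am is hypothetical, and treating U+0EC8 to U+0ECB as the tone marks is an assumption.

```python
AM, NIGGAHITA, AA = "\u0eb3", "\u0ecd", "\u0eb2"
TONE_MARKS = set("\u0ec8\u0ec9\u0eca\u0ecb")  # assumed tone-mark set (MAI EK .. MAI CATAWA)


def decompose_am(text: str) -> str:
    """Character-level model of the AM handling: AM -> NIGGAHITA + AA, and if a
    tone mark directly precedes the AM, move NIGGAHITA in front of the tone mark."""
    out: list[str] = []
    for ch in text:
        if ch != AM:
            out.append(ch)
        elif out and out[-1] in TONE_MARKS:
            tone = out.pop()
            out += [NIGGAHITA, tone, AA]  # base, NIGGAHITA, tone, AA
        else:
            out += [NIGGAHITA, AA]        # base, NIGGAHITA, AA
    return "".join(out)


print([f"U+{ord(c):04X}" for c in decompose_am("\u0e81\u0ec8\u0eb3")])
# ['U+0E81', 'U+0ECD', 'U+0EC8', 'U+0EB2']
```

With the NIGGAHITA now sitting directly after the consonant, the later mark-to-base and mark-to-mark steps can stack the tone mark on top of it.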

        Shape Glyphs with OTLS

The first step Uniscribe takes in shaping the character string is to map all characters to their nominal glyphs.

 

Next, Uniscribe calls OTLS to apply the features. All OTL processing is divided into a set of predefined features. Each feature is applied, one by one, to the appropriate glyphs in the syllable and OTLS processes them. Uniscribe makes as many calls to the OTL Services as there are features. This ensures that the features are executed in the desired order.
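The two steps above (map characters to nominal glyphs, then apply one feature per call, in order) can be sketched in Python. This is not the Uniscribe or OTLS API: the glyph names, the toy CMAP table and the ccmp function are hypothetical and only illustrate why applying features one at a time fixes their order.

```python
from typing import Callable

Glyphs = list[str]
Feature = Callable[[Glyphs], Glyphs]

# Hypothetical character-to-nominal-glyph table (a toy stand-in for a font's cmap).
CMAP = {"\u0e81": "ko", "\u0eb3": "am"}


def ccmp(glyphs: Glyphs) -> Glyphs:
    """Toy 'ccmp' feature: split the AM glyph into NIGGAHITA + AA."""
    out: Glyphs = []
    for g in glyphs:
        out.extend(["niggahita", "aa"] if g == "am" else [g])
    return out


def shape(text: str, features: list[tuple[str, Feature]]) -> Glyphs:
    # Step 1: map every character to its nominal glyph.
    glyphs = [CMAP.get(ch, ".notdef") for ch in text]
    # Step 2: one call per feature, in list order, so each feature sees the
    # output of the previous one.
    for _tag, apply_feature in features:
        glyphs = apply_feature(glyphs)
    return glyphs


print(shape("\u0e81\u0eb3", [("ccmp", ccmp)]))  # ['ko', 'niggahita', 'aa']
```

Any feature appended after 'ccmp' in the list would receive the decomposed glyphs, which is the ordering guarantee the one-call-per-feature design provides.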

 

        Position Glyphs with OTLS

Uniscribe next applies features concerned with positioning, calling functions of OTLS to position glyphs.

Positioning features:

  • Kerning: Apply feature 'kern' to provide pair kerning between base glyphs requiring adjustment for better typographical quality
  • Mark to base: Apply feature 'mark' to position diacritic glyphs to the base glyph
  • Mark to Mark: Apply feature 'mkmk' to position diacritic glyphs to other diacritic glyphs

 

        Invalid Combining Marks

Combining marks and signs that appear in text not in conjunction with a valid consonant base are considered invalid. Uniscribe displays these marks using the fallback rendering mechanism defined in the Unicode Standard (section 5.12, 'Rendering Non-Spacing Marks' of the Unicode Standard 3.0), i.e. positioned on a dotted circle. For the fallback mechanism to work properly, a Lao OTL font should contain a glyph for the dotted circle (U+25CC). In case this glyph is missing from the font, the invalid signs will be displayed on the missing glyph shape (white box).

In addition to the dotted circle, another code point recommended for inclusion in any Lao font is ZWSP (zero width space, U+200B). Lao words are not separated by spaces, so ZWSP can mark word boundaries, which allows word wrapping at the end of a line. Some applications use a lexical lookup to wrap words without needing ZWSP characters.
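How ZWSP enables word wrapping can be sketched in Python as greedy line breaking that only breaks at ZWSP. The function names are hypothetical, and the width measure (counting spacing characters, with nonspacing marks and format characters such as ZWSP counting as zero) is a crude stand-in for real glyph advance widths. The lexical-lookup approach mentioned above is not modeled.

```python
import unicodedata

ZWSP = "\u200b"


def display_width(s: str) -> int:
    """Crude width: count only spacing characters. Nonspacing marks (category Mn)
    and format characters such as ZWSP (category Cf) add no horizontal space."""
    return sum(1 for ch in s if unicodedata.category(ch) not in ("Mn", "Cf"))


def wrap_at_zwsp(text: str, width: int) -> list[str]:
    """Greedy line breaking that only breaks where a ZWSP marks a word boundary.
    A word wider than `width` is kept whole on its own line."""
    lines: list[str] = []
    current = ""
    for word in text.split(ZWSP):
        candidate = current + ZWSP + word if current else word
        if current and display_width(candidate) > width:
            lines.append(current)  # break at this ZWSP
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


print(display_width("\u0e81\u0eb5\u0ec8"))             # 1 (KO with two nonspacing marks)
print(wrap_at_zwsp("aaa\u200bbb\u200bcccc", width=5))  # ['aaa\u200bbb', 'cccc']
```

ZWSP characters that do not become line breaks stay in the text, so the string can be rewrapped later at a different width.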

If an invalid combination is found, the diacritic that causes the invalid state is placed on a dotted circle to show the user the invalid combination. With non-OpenType fonts, the shaping engine lets invalid mark combinations overstrike one another; inserting a dotted circle as the missing base solves this problem. Note that the dotted circle is not inserted into the application's backing store; it is inserted at run time into the glyph array returned by the ScriptShape function. The invalid diacritic logic for Lao is based on the classes listed below, and a check ensures that no more than one mark of each class is placed on the same base.

Class        Description                   Code points
ABOVE1       Above mark closest to base    U+0EB1, U+0EB4, U+0EB5, U+0EB6, U+0EB7, U+0EBB, U+0ECD
ABOVE2       Second-level above mark       U+0EC8, U+0EC9, U+0ECA, U+0ECB, U+0ECC
BELOW1       Below mark closest to base    U+0EBC
BELOW2       Second-level below mark       U+0EB8, U+0EB9
Vowel: AM    The AM character              U+0EB3
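The class check and the dotted-circle fallback can be sketched in Python using the four mark classes from the table. Assumptions: only Lao consonants (U+0E81 to U+0EAE and U+0EDC to U+0EDF) or an existing dotted circle count as a base, any other character ends the current base, mark order within a cluster is not checked, and the AM row is not modeled. The function names are hypothetical. Like the real engine, the sketch leaves the stored text alone and returns a separate display string.

```python
DOTTED_CIRCLE = "\u25cc"

MARK_CLASSES = {
    "ABOVE1": "\u0eb1\u0eb4\u0eb5\u0eb6\u0eb7\u0ebb\u0ecd",
    "ABOVE2": "\u0ec8\u0ec9\u0eca\u0ecb\u0ecc",
    "BELOW1": "\u0ebc",
    "BELOW2": "\u0eb8\u0eb9",
}
CLASS_OF = {ch: cls for cls, chars in MARK_CLASSES.items() for ch in chars}


def is_base(ch: str) -> bool:
    # Assumed base set: Lao consonants, or a dotted circle already in the text.
    return "\u0e81" <= ch <= "\u0eae" or "\u0edc" <= ch <= "\u0edf" or ch == DOTTED_CIRCLE


def insert_dotted_circles(text: str) -> str:
    """Return a display copy in which each invalid mark sits on a dotted circle."""
    out = []
    used = None  # mark classes already on the current base; None means "no base"
    for ch in text:
        cls = CLASS_OF.get(ch)
        if cls is None:
            used = set() if is_base(ch) else None
        elif used is None or cls in used:
            out.append(DOTTED_CIRCLE)  # invalid: no base, or class already used
            used = {cls}
        else:
            used.add(cls)
        out.append(ch)
    return "".join(out)


# KO + MAI EK + MAI EK: the second tone mark repeats class ABOVE2
print([f"U+{ord(c):04X}" for c in insert_dotted_circles("\u0e81\u0ec8\u0ec8")])
# ['U+0E81', 'U+0EC8', 'U+25CC', 'U+0EC8']
```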

3.      Lao font features:

 

        Shape characteristics of Lao characters

 

The shapes of Lao characters can be classified into six groups:

 

Lao Character Glyph at Syllable Structure

 

 

 

        Kerning

The 'kern' feature is used to adjust the amount of space between glyphs, generally to provide optically consistent spacing between glyphs. Although a well-designed typeface has consistent inter-glyph spacing overall, some glyph combinations require adjustment for improved legibility. Besides standard adjustment in either the horizontal or vertical direction, this feature can supply size-dependent kerning data via device tables, "cross-stream" kerning in the Y text direction, and adjustment of glyph placement independent of the advance adjustment. Note that this feature is not used in monospaced fonts.

The font stores a set of adjustments for pairs of glyphs. These may be stored as one or more tables matching left and right classes, and/or as individual pairs. If both forms are used, the classes should be listed last, so as to provide a means to replace any non-ideal values that may result from the class tables. Additional adjustments may be provided for larger sets of glyphs (e.g., triplets, quadruplets, etc.) to overwrite the results of pair kerns in particular combinations. These should precede the pairs.
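The ordering rule above (individual pairs first, class pairs last, first match wins) can be sketched in Python. The glyph names, classes and kerning values are invented for illustration, and larger contexts such as triplets are not modeled.

```python
# Hypothetical kerning data: glyph names, classes and values are invented.
PAIR_KERNS = {("ko", "aa"): -40}        # individual pairs, consulted first
GLYPH_CLASS = {"ko": "round", "ngo": "round", "aa": "tall"}
CLASS_KERNS = {("round", "tall"): -20}  # class pairs, consulted last


def kern(left: str, right: str) -> int:
    """Adjustment (font units) between two glyphs; the first matching entry wins."""
    if (left, right) in PAIR_KERNS:
        return PAIR_KERNS[(left, right)]
    return CLASS_KERNS.get((GLYPH_CLASS.get(left), GLYPH_CLASS.get(right)), 0)


def kerning_adjustments(glyphs: list[str]) -> list[int]:
    """Adjustment added after each glyph in a run (the last glyph gets 0)."""
    pairs = [kern(a, b) for a, b in zip(glyphs, glyphs[1:])]
    return pairs + [0] if glyphs else []


print(kern("ko", "aa"), kern("ngo", "aa"))       # -40 -20
print(kerning_adjustments(["ngo", "aa", "ko"]))  # [-20, 0, 0]
```

Both "ko" and "ngo" belong to the same class, but the explicit ko/aa pair replaces the class value, which is how a pair entry corrects a non-ideal value from the class table.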

Example:

 

 

 

 

 

 

Kerning by pair adjustment using Microsoft VOLT

Before Kerning                        

 


            After Kerning

              Mark to base positioning

The 'mark' feature positions mark glyphs in relation to a base glyph, or a ligature glyph. This feature may be implemented as a MarkToBase or a MarkToLigature.
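The anchor arithmetic behind MarkToBase can be sketched in Python: the mark is moved so that its attachment anchor lands on the base glyph's anchor of the same class. The glyph names, anchor classes ("top", "bottom") and coordinates are hypothetical. The sketch works in the base glyph's coordinate space; a real engine also accounts for the base's advance width, since the mark follows it in the glyph stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


# Hypothetical anchors in font units; names and values are invented.
BASE_ANCHORS = {("ko", "top"): Point(300, 600), ("ko", "bottom"): Point(300, 0)}
MARK_ANCHORS = {("ii", "top"): Point(100, 600), ("u", "bottom"): Point(120, 0)}


def attach_mark_to_base(base: str, mark: str, anchor_class: str) -> Point:
    """Offset for the mark's origin, in the base glyph's coordinate space, that
    makes the mark's anchor coincide with the base's anchor of the same class."""
    b = BASE_ANCHORS[(base, anchor_class)]
    m = MARK_ANCHORS[(mark, anchor_class)]
    return Point(b.x - m.x, b.y - m.y)


print(attach_mark_to_base("ko", "ii", "top"))    # Point(x=200, y=0)
print(attach_mark_to_base("ko", "u", "bottom"))  # Point(x=180, y=0)
```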

Example:

 

 

 

 

 

 

Positioning mark to base using Microsoft VOLT

Before:

 

After:

 

        Mark to mark positioning

The 'mkmk' feature positions mark glyphs in relation to another mark glyph. This feature may be implemented as a MarkToMark.
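Mark-to-mark positioning chains the same anchor arithmetic: the second mark attaches to an anchor on the first mark rather than on the base. The Python sketch below stacks a tone mark on the vowel II above KO. All anchor names and coordinates are invented, and the sketch works in the base glyph's coordinate space, ignoring advance widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def __add__(self, other: "Point") -> "Point":
        return Point(self.x + other.x, self.y + other.y)

    def __sub__(self, other: "Point") -> "Point":
        return Point(self.x - other.x, self.y - other.y)


# Hypothetical anchors in font units (values invented for illustration).
KO_TOP = Point(300, 600)      # base anchor on the consonant KO
II_ATTACH = Point(100, 600)   # anchor by which the vowel II attaches to its base
II_TOP = Point(120, 820)      # anchor on the vowel II that a further mark can attach to
TONE_ATTACH = Point(80, 650)  # anchor by which the tone mark attaches


def place(target_origin: Point, target_anchor: Point, attach_anchor: Point) -> Point:
    """Origin for an attaching glyph so that its attach_anchor lands on target_anchor."""
    return target_origin + target_anchor - attach_anchor


vowel_origin = place(Point(0, 0), KO_TOP, II_ATTACH)    # mark to base
tone_origin = place(vowel_origin, II_TOP, TONE_ATTACH)  # mark to mark
print(vowel_origin, tone_origin)  # Point(x=200, y=0) Point(x=240, y=170)
```

Because the tone mark is placed relative to the vowel's final position, moving the vowel (for example, with a different base anchor) automatically carries the tone mark with it.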

            Example:

Positioning mark to mark using Microsoft VOLT

 


            Before:

 

            After:

 

 

 

References:

1.       Developing OpenType Fonts for Lao Script, Microsoft Corporation

2.       Lao Script Typography Primer, Dr. John DURDIN

Glyph characteristics of each Lao character: