NAME

block, case, compose, decompose, fold, surrogate, type, uconv, unfold, unsurrogate – rune transformations

SYNOPSIS

rune/block rune ...
rune/case [ ltu ] [ –f file ]
rune/compose [ file ... ]
rune/decompose [ –u ] [ file ... ]
rune/fold [ –i ] [ file ... ]
rune/surrogate [ file ... ]
rune/type [ –x ] [ file ... ]
rune/uconv [ –f ] [ –n defsize ] [ file ... ]
rune/unfold [ re ... ]
rune/unsurrogate [ file ... ]

DESCRIPTION

These programs provide transformations on runes.
Block converts from rune to the containing Unicode block name.
If there exists a case conversion from the given rune, case converts to the specified case. The ltu flags convert to lower, title and upper case, respectively. The default is lower case.
If there exists an equivalent precombined codepoint, compose combines base codepoints with any following combining codepoints. Decompose is its inverse. The –u flag emits the combining characters with \u or \U escapes, suitable for conversion with uconv.
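The compose/decompose relationship resembles Unicode canonical composition and decomposition. The Python sketch below uses the standard unicodedata module as a stand-in. It is an assumption that compose and decompose behave like NFC and NFD normalization; the function names, including escape_combining, are hypothetical and only illustrate the description above.

```python
import unicodedata

def compose(s):
    # Assumption: approximated by NFC, which combines a base codepoint with
    # following combining codepoints when a precombined codepoint exists.
    return unicodedata.normalize("NFC", s)

def decompose(s):
    # Assumption: approximated by NFD, the inverse split into base plus
    # combining codepoints.
    return unicodedata.normalize("NFD", s)

def escape_combining(s):
    # Rough analogue of the -u flag as described: write each combining
    # character as a \u (4 hex digits) or \U (6 hex digits) escape.
    out = []
    for ch in s:
        if unicodedata.combining(ch):
            cp = ord(ch)
            out.append("\\u%04x" % cp if cp <= 0xFFFF else "\\U%06x" % cp)
        else:
            out.append(ch)
    return "".join(out)
```

A sequence with no precombined equivalent, such as q followed by a combining acute accent, passes through compose unchanged.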
Fold converts codepoints to their base codepoint, essentially stripping combining characters, while unfold transforms a regular expression into one that matches any string that would match the original expression if folded first. Both accept –i, which makes the conversion case insensitive.
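The idea behind fold and unfold can be sketched in Python. This is an illustration under stated assumptions, not the programs' actual algorithm: folding is approximated by canonical decomposition followed by removal of combining marks, case insensitivity is approximated by lowercasing, and the hypothetical unfold below treats its argument as literal text rather than a full regular expression and only knows variants in the range U+0020 to U+024F.

```python
import re
import unicodedata

def fold(s, ignore_case=False):
    # Decompose, then drop combining marks so only base codepoints remain.
    base = "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))
    return base.lower() if ignore_case else base

def unfold(word, ignore_case=False):
    # Group every character in a small range by its folded form
    # (the range is an arbitrary choice for this sketch).
    variants = {}
    for cp in range(0x20, 0x250):
        ch = chr(cp)
        variants.setdefault(fold(ch, ignore_case), set()).add(ch)
    # Replace each character with a class of all characters that fold to it.
    parts = []
    for ch in word:
        group = variants.get(fold(ch, ignore_case), set()) | {ch}
        parts.append("[" + "".join(re.escape(c) for c in sorted(group)) + "]")
    return "".join(parts)
```

With these definitions, re.fullmatch(unfold("naive"), "naïve") succeeds, because the class generated for i contains ï.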
Type prints the codepoint followed by the type classes of each given rune, which may be any of alpha, title, space, lower, upper, and digit. The upper and lower classes are followed by a colon, the corresponding lowercase or uppercase rune, respectively, and its codepoint in parentheses. The digit class is followed by a colon and the corresponding digit value.
Uconv converts \u0000 (4 hex digits) and \U000000 (6 hex digits) escapes to the corresponding runes. With the –f flag, escapes are assumed to be terminated by non–numbers, so the number of digits is not checked. The –n flag sets the default width.
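The default escape conversion can be sketched in Python as follows. This is a minimal illustration of the two fixed-width forms described above; it does not model the –f or –n flags. The function name and the choice to leave out-of-range values untouched are assumptions of the sketch.

```python
import re

# \U followed by exactly 6 hex digits, or \u followed by exactly 4.
ESCAPE = re.compile(r"\\U([0-9a-fA-F]{6})|\\u([0-9a-fA-F]{4})")

def uconv(text):
    def repl(m):
        cp = int(m.group(1) or m.group(2), 16)
        if cp > 0x10FFFF:
            # Not a valid codepoint; this sketch leaves the escape as is.
            return m.group(0)
        return chr(cp)
    return ESCAPE.sub(repl, text)
```

For example, uconv(r"\u03b1\u03b2") yields the string αβ, and text without escapes is returned unchanged.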
Surrogate converts runes outside the Basic Plane to surrogate pairs while runes in the Basic Plane are unchanged; unsurrogate is its inverse. Surrogate pairs are not used within the system.
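The surrogate pair arithmetic is the standard UTF-16 mapping and can be sketched in Python. The functions below work on lists of integer code values; how the programs read and write those values is not specified here, so that representation is an assumption of the sketch.

```python
def surrogate(cp):
    # Codepoints in the Basic Plane pass through unchanged.
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

def unsurrogate(units):
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u < 0xDC00 and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] < 0xE000):
            # A high surrogate followed by a low surrogate forms one codepoint.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Anything else, including an unpaired surrogate, is copied as is.
            out.append(u)
            i += 1
    return out
```

For instance, surrogate(0x1F600) gives [0xD83D, 0xDE00], and unsurrogate reverses it.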

EXAMPLES

Generate the first 10 Greek letters
awk 'BEGIN{for(i=945; i<955; i++)printf "\\u%.4x", i}' |
rune/uconv
Find alternate spellings of “naïve” in the dictionary
grep `{rune/unfold naïve} /lib/words
Show the type of α
03b1 alpha lower:Α(0391)

FILES

/lib/unicode
/sys/src/libc/port/*.h

SOURCE

/sys/src/cmd/runetype

SEE ALSO

rune(2)

BUGS

Still a bit raw. Type has weird output. It's not clear that uconv does the most useful conversions.