runeclass, runecompose, runedecompose – Unicode character equivalence


#include <u.h>
#include <libc.h>
Rune *runeclass(Rune r)
int runecompose(Rune base, Rune combiner)
int runedecompose(Rune base, Rune *decomp)


These routines use codepoint properties from the Unicode standard to combine, decompose and determine sets of characters with the same base character. The set of codepoints sharing a base codepoint generalizes the equivalence between cases such as title, lower and upper case; runeclass returns this set. Similarly, runecompose takes a base codepoint and a combining codepoint (e.g. u+0308, combining diaeresis) and returns the combined form, if it exists. For example:
echo e\u0308 | rune/uconv | rune/compose
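As a rough illustration of the composition rule described above, the stand-alone sketch below looks up a (base, combiner) pair in a small table. It is not the library's implementation. The table entries are a hypothetical excerpt of Unicode canonical compositions, sketch_compose is an illustrative name, Rune is assumed to be a 32-bit unsigned integer, and returning -1 when no composed form exists is an assumption, because this page does not say what runecompose returns in that case.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune; /* assumption: Rune holds a 32-bit codepoint */

/* Hypothetical excerpt of canonical compositions: base + combiner -> composed. */
static const struct {
	Rune base, combiner, composed;
} comptab[] = {
	{0x0065, 0x0308, 0x00EB}, /* e + combining diaeresis -> ë */
	{0x0073, 0x0301, 0x015B}, /* s + combining acute -> ś */
	{0x015B, 0x0307, 0x1E65}, /* ś + combining dot above -> ṥ */
};

/* Return the composed codepoint, or -1 if the pair is not in the table
 * (the -1 convention is an assumption of this sketch). */
int
sketch_compose(Rune base, Rune combiner)
{
	size_t i;

	for(i = 0; i < sizeof comptab / sizeof comptab[0]; i++)
		if(comptab[i].base == base && comptab[i].combiner == combiner)
			return (int)comptab[i].composed;
	return -1;
}
```

A real implementation would presumably derive its table from the Unicode Character Database rather than hard-coding a few entries.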
Conversely, runedecompose splits base into a base codepoint and its first combining codepoint, stores them in decomp, and returns 0. If there is no further decomposition, -1 is returned. Multiple calls may be necessary for a full decomposition. For example, codepoint u+1e65 is “latin small letter s with acute and dot above”:
; unicode 0x1e65 | rune/decompose -u
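The repeated-call pattern for runedecompose can be sketched as a loop that keeps decomposing the base until -1 is returned. This stand-alone C sketch follows the convention described above (0 on success, -1 when nothing further decomposes). The two-element layout of decomp (base in decomp[0], combiner in decomp[1]), the names sketch_decompose and full_decompose, and the table contents are assumptions for illustration, not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune; /* assumption: Rune holds a 32-bit codepoint */

/* Hypothetical excerpt of one-level canonical decompositions. */
static const struct {
	Rune r, base, combiner;
} decomptab[] = {
	{0x1E65, 0x015B, 0x0307}, /* ṥ -> ś + combining dot above */
	{0x015B, 0x0073, 0x0301}, /* ś -> s + combining acute */
	{0x00EB, 0x0065, 0x0308}, /* ë -> e + combining diaeresis */
};

/* One decomposition step: return 0 with decomp[0..1] filled, or -1 if none. */
int
sketch_decompose(Rune r, Rune *decomp)
{
	size_t i;

	for(i = 0; i < sizeof decomptab / sizeof decomptab[0]; i++)
		if(decomptab[i].r == r){
			decomp[0] = decomptab[i].base;
			decomp[1] = decomptab[i].combiner;
			return 0;
		}
	return -1;
}

/* Decompose r completely into out: the final base first, then the
 * combining marks in the order they were applied. Returns the count. */
int
full_decompose(Rune r, Rune *out, int max)
{
	Rune marks[16], d[2];
	int nmarks = 0, n = 0;

	/* Each step peels off the most recently applied combining mark. */
	while(nmarks < 16 && sketch_decompose(r, d) == 0){
		marks[nmarks++] = d[1];
		r = d[0];
	}
	if(n < max)
		out[n++] = r;
	/* Marks were collected outermost-first, so emit them in reverse. */
	while(nmarks > 0 && n < max)
		out[n++] = marks[--nmarks];
	return n;
}
```

For u+1e65 the loop makes two successful steps (to u+015b, then to s) before the third call returns -1, yielding s, u+0301, u+0307.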


grep(1), rune(1),
The Unicode Consortium. The Unicode Standard, Version 8.0.0 (Mountain View, CA: The Unicode Consortium, 2015). ISBN 978-1-936213-10-8.