xmlattr, xmlcalloc, xmlelem, xmlfind, xmlfree, xmllook, xmlmalloc, xmlnew, xmlparse, xmlprint, xmlstrdup, xmlvalue – DOM model XML library


#include <u.h>
#include <libc.h>
#include <xml.h>
enum {
	Fcrushwhite = 1,
	Fstripnamespace = 2,
};
struct Xml {
	Elem *root;          /* root of tree */
	char *doctype;       /* DOCTYPE structured comment, or nil */
};
struct Elem {
	Elem *next;          /* next element at this hierarchy level */
	Elem *child;         /* first child of this node */
	Elem *parent;        /* parent of this node */
	Attr *attrs;         /* linked list of attributes */
	char *name;          /* element name */
	char *pcdata;        /* pcdata following this element */
	int line;            /* line number (for errors) */
};
struct Attr {
	Attr *next;          /* next attribute */
	Elem *parent;        /* parent element */
	char *name;          /* attribute name */
	char *value;         /* attribute value */
};
Attr* xmlattr(Xml *xp, Attr **root, Elem *parent,
char *name, char *value)
Elem* xmlelem(Xml *xp, Elem **root, Elem *parent, char *name)
Elem* xmlfind(Xml *xp, Elem *ep, char *path)
Elem* xmllook(Elem *ep, char *path, char *attr, char *value)
Xml*    xmlnew(int blksize)
Xml*    xmlparse(int fd, int blksize, int flags)
char* xmlvalue(Elem *ep, char *name)
void* xmlmalloc(Xml *xp, usize size)
void* xmlcalloc(Xml *xp, usize nelem, usize elemsz)
void* xmlstrdup(Xml *xp, char *s)
void    xmlfree(Xml *xp)
void    xmlprint(Xml *xp, int fd)


Libxml is a library for manipulating an XML document in memory (known as the DOM model). Each element may have a number of children, each of which may have a number of attributes, and each attribute has a single value. All elements contain a pointer to their parent element; the root element has a nil parent pointer. Pcdata (free-form text) found between elements is attached to the element which follows it. The line number at which each element was found is stored to allow unambiguous error messages during later analysis.
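The pointer layout described above can be walked with two short loops. The following is a minimal sketch, not part of the library: the Elem type is a local stand-in that copies a few fields from the synopsis so the code compiles without <xml.h>, the helper names elemdepth and elemcount are hypothetical, and standard C headers and NULL are used in place of <u.h>, <libc.h> and nil.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring fields from the synopsis (assumption:
 * only the linkage fields matter for this sketch). */
typedef struct Elem Elem;
struct Elem {
	Elem *next;    /* next element at this hierarchy level */
	Elem *child;   /* first child of this node */
	Elem *parent;  /* parent of this node; NULL for the root */
	char *name;    /* element name */
};

/* Hypothetical helper: number of ancestors between ep and the root. */
int
elemdepth(Elem *ep)
{
	int n;

	assert(ep != NULL);
	for(n = 0; ep->parent != NULL; ep = ep->parent)
		n++;
	return n;
}

/* Hypothetical helper: count ep and every element below it. */
int
elemcount(Elem *ep)
{
	int n;
	Elem *c;

	if(ep == NULL)
		return 0;
	n = 1;
	/* children form a singly linked list through next */
	for(c = ep->child; c != NULL; c = c->next)
		n += elemcount(c);
	return n;
}
```

The same child/next traversal would apply to a tree returned by xmlparse, starting from the root field of the Xml structure.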
Strings are stored in one of two data structures. Common names, such as element and attribute names, are kept in a binary tree. Uncommon strings, such as attribute values and pcdata, are stored in a simple, unmanaged heap. Together these measures considerably reduce the memory footprint of the parsed file and the time needed to free the XML data.
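To illustrate why an unmanaged heap makes freeing cheap, here is a minimal sketch of a block allocator in standard C. It is not the library's implementation: the Heap and Block types and the functions heapalloc, heapstrdup and heapfree are hypothetical names invented for this example. Strings are carved out of large blocks and are never freed individually, so releasing everything costs one free per block rather than one per string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Block Block;
struct Block {
	Block *next;   /* previously filled block */
	size_t used;   /* bytes handed out so far */
	size_t size;   /* capacity of data[] */
	char data[];   /* C99 flexible array member */
};

typedef struct Heap Heap;
struct Heap {
	Block *blocks;   /* newest block first */
	size_t blksize;  /* granularity of allocation */
};

/* Hand out n bytes from the newest block, adding a block when full. */
void *
heapalloc(Heap *h, size_t n)
{
	Block *b;
	size_t sz;
	void *p;

	assert(h != NULL);
	/* round up so returned addresses stay pointer aligned */
	n = (n + sizeof(void*) - 1) & ~(sizeof(void*) - 1);
	b = h->blocks;
	if(b == NULL || b->size - b->used < n){
		sz = n > h->blksize ? n : h->blksize;
		b = malloc(sizeof *b + sz);
		if(b == NULL)
			return NULL;
		b->next = h->blocks;
		b->used = 0;
		b->size = sz;
		h->blocks = b;
	}
	p = b->data + b->used;
	b->used += n;
	return p;
}

/* Copy s, including its terminating NUL, into the heap. */
char *
heapstrdup(Heap *h, const char *s)
{
	size_t n;
	char *p;

	n = strlen(s) + 1;
	p = heapalloc(h, n);
	if(p != NULL)
		memcpy(p, s, n);
	return p;
}

/* Release every block; all strings become invalid at once. */
void
heapfree(Heap *h)
{
	Block *b, *next;

	for(b = h->blocks; b != NULL; b = next){
		next = b->next;
		free(b);
	}
	h->blocks = NULL;
}
```

The trade-off of this design is that space used by a single string cannot be reclaimed until the whole heap is released.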
Xmlparse reads the given file and builds an in-memory tree. Blksize controls the granularity of allocation of the string heap described above; 8192 is typically used. A value of zero disables the string heap and uses traditional malloc(2) calls instead. The flags argument gives some control over the parser; it is the bitwise OR of the following values:
Fcrushwhite
	Each run of whitespace in pcdata is replaced by a single space, and leading and trailing whitespace is removed.
Fstripnamespace
	Leading namespace strings are removed from all element and attribute names. This effectively ignores namespaces, which can lead to parsing ambiguities, though in practice this has not yet been a problem.
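The effect of the two flags on individual strings can be sketched as follows. This is an illustration in standard C, not the parser's own code: crushwhite and stripnamespace are hypothetical helper names, and the sketch assumes a namespace prefix is everything up to and including the first colon in a name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Collapse each run of whitespace in s to one space and trim both
 * ends. Works in place and returns s. */
char *
crushwhite(char *s)
{
	char *r, *w;
	int pending;

	assert(s != NULL);
	pending = 0;
	w = s;
	for(r = s; *r != '\0'; r++){
		if(isspace((unsigned char)*r)){
			/* remember a gap, unless nothing has been written yet */
			pending = (w != s);
			continue;
		}
		if(pending){
			*w++ = ' ';
			pending = 0;
		}
		*w++ = *r;
	}
	*w = '\0';  /* a gap still pending here was trailing, so it is dropped */
	return s;
}

/* Return the part of name after its namespace prefix, if any.
 * The result points into name; nothing is copied. */
const char *
stripnamespace(const char *name)
{
	const char *p;

	assert(name != NULL);
	p = strchr(name, ':');
	return p != NULL ? p + 1 : name;
}
```

With these helpers, the text "  a \n\t b  " becomes "a b" and the name "svg:rect" becomes "rect". A caller wanting both behaviours from the parser would pass Fcrushwhite|Fstripnamespace as the flags argument of xmlparse.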
Xml trees may also be built by calling xmlnew to create the XML tree, followed by xmlelem and xmlattr to create individual elements and attributes respectively. Xmlelem takes the address of the root of an element list to which the new element should be appended, the address of the parent node the new element should reference, and the name of the node to create; it returns the address of the created element.
Xmlattr attaches an attribute to an existing element. It takes a list pointer and parent pointer like xmlelem, but requires both an attribute name and value, and returns the address of the new attribute.
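The list-pointer convention used when building a tree can be illustrated with a small stand-alone sketch. The functions appendelem and appendattr below are hypothetical stand-ins written in standard C; they take the same list and parent arguments that the text describes for xmlelem and xmlattr, but they allocate with calloc and are not the library's implementation. The Elem and Attr types are local copies of fields from the synopsis.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Attr Attr;
typedef struct Elem Elem;

struct Attr {
	Attr *next;
	Elem *parent;
	char *name;
	char *value;
};
struct Elem {
	Elem *next;
	Elem *child;
	Elem *parent;
	Attr *attrs;
	char *name;
};

/* Append a new element to the list whose head pointer is *root. */
Elem *
appendelem(Elem **root, Elem *parent, char *name)
{
	Elem *ep;

	assert(root != NULL);
	ep = calloc(1, sizeof *ep);
	if(ep == NULL)
		return NULL;
	ep->name = name;      /* borrowed, not copied, in this sketch */
	ep->parent = parent;
	while(*root != NULL)  /* walk to the end of the sibling list */
		root = &(*root)->next;
	*root = ep;
	return ep;
}

/* Append a new attribute to the list whose head pointer is *root. */
Attr *
appendattr(Attr **root, Elem *parent, char *name, char *value)
{
	Attr *ap;

	assert(root != NULL);
	ap = calloc(1, sizeof *ap);
	if(ap == NULL)
		return NULL;
	ap->name = name;
	ap->value = value;
	ap->parent = parent;
	while(*root != NULL)
		root = &(*root)->next;
	*root = ap;
	return ap;
}

/* Build <doc><para id="intro"/><para/></doc>.
 * Returns NULL on allocation failure (and leaks; fine for a sketch). */
Elem *
builddemo(void)
{
	Elem *top, *doc, *p;

	top = NULL;
	doc = appendelem(&top, NULL, "doc");
	if(doc == NULL)
		return NULL;
	p = appendelem(&doc->child, doc, "para");
	if(p == NULL || appendattr(&p->attrs, p, "id", "intro") == NULL)
		return NULL;
	if(appendelem(&doc->child, doc, "para") == NULL)
		return NULL;
	return top;
}
```

Passing the address of the list head lets the same call create either the first entry of an empty list or a later sibling, which is why a child is added by passing the address of the parent's child field together with the parent itself.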
Xmllook descends through the tree rooted at ep following the path given in path. If attr and value are not nil, the search continues for an element that contains this attribute and value pair; otherwise the element found at path is returned.
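A path search with an optional attribute filter might look like the sketch below. Several things here are assumptions rather than documented behaviour: the page does not specify the path syntax, so the sketch assumes element names separated by '/'; the function name lookpath is hypothetical; and the Elem and Attr types are local copies of fields from the synopsis, written in standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Attr Attr;
typedef struct Elem Elem;

struct Attr {
	Attr *next;
	Elem *parent;
	char *name;
	char *value;
};
struct Elem {
	Elem *next;
	Elem *child;
	Elem *parent;
	Attr *attrs;
	char *name;
};

/* Does ep carry an attribute attr whose value equals value? */
static int
hasattr(Elem *ep, const char *attr, const char *value)
{
	Attr *ap;

	for(ap = ep->attrs; ap != NULL; ap = ap->next)
		if(strcmp(ap->name, attr) == 0 && strcmp(ap->value, value) == 0)
			return 1;
	return 0;
}

/* Search the sibling list starting at ep for the element named by a
 * '/'-separated path (assumed syntax). If attr and value are both
 * non-NULL, only an element with that attribute pair matches. */
Elem *
lookpath(Elem *ep, const char *path, const char *attr, const char *value)
{
	size_t n;
	const char *rest;
	Elem *found;

	assert(path != NULL);
	while(*path == '/')
		path++;
	n = strcspn(path, "/");   /* length of the first path component */
	rest = path + n;
	while(*rest == '/')
		rest++;
	for(; ep != NULL; ep = ep->next){
		if(strlen(ep->name) != n || strncmp(ep->name, path, n) != 0)
			continue;
		if(*rest != '\0'){
			/* more components remain: descend into the children */
			found = lookpath(ep->child, rest, attr, value);
			if(found != NULL)
				return found;
			continue;
		}
		if(attr == NULL || value == NULL || hasattr(ep, attr, value))
			return ep;
	}
	return NULL;
}
```

Because the loop keeps trying later siblings with the same name, a search for "/doc/para" with an attribute pair can return the second para element when the first does not carry that pair.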
Xmlvalue searches the given element's attribute list for the attribute called name and returns its value, or nil if that attribute is not found.
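The attribute lookup amounts to a linear scan of the element's attribute list. The sketch below shows the idea in standard C; attrvalue is a hypothetical name, the types are local copies of fields from the synopsis, and NULL plays the role of nil.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Attr Attr;
typedef struct Elem Elem;

struct Attr {
	Attr *next;
	Elem *parent;
	char *name;
	char *value;
};
struct Elem {
	Elem *next;
	Elem *child;
	Elem *parent;
	Attr *attrs;
	char *name;
};

/* Return the value of the first attribute of ep called name,
 * or NULL if ep has no such attribute. */
char *
attrvalue(Elem *ep, const char *name)
{
	Attr *ap;

	assert(ep != NULL && name != NULL);
	for(ap = ep->attrs; ap != NULL; ap = ap->next)
		if(strcmp(ap->name, name) == 0)
			return ap->value;
	return NULL;
}
```

Since a missing attribute yields a nil result, callers should test the return value before using it as a string.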
Xmlprint writes the XML hierarchy held in xp as text to the given file descriptor.
Xmlmalloc, xmlcalloc, and xmlstrdup allocate memory within the Xml tree. Xmlfree frees all memory used by the given Xml tree.






Namespaces should be handled properly.
A SAX model parser will probably be needed sometime (e.g. for ebooks).
UTF-16 headers should be respected, but UTF-16 files seem rare.