📃 XML parser and stringifier
✨ Features
- Based on quick-xml rust package (compiled to WASM)
- Support XML.parse and XML.stringify
- Support <!-- --> comments
- Support XML entities (&lt;, &amp;, &quot;, ...)
- Support mixed content (text and nodes)
- Large output transformation options
- Auto-flattening of nodes with a single child, text or attributes
- Auto-revival of boolean, number, etc.
- Auto-group same-named nodes into arrays
- Formatting options (indentation, break lines, etc.)
- Custom reviver and replacer functions
- Metadata stored into non-enumerable properties for advanced usage
🕊️ Migrating from 4.x.x to 5.x.x
Starting from version 5.0.0, this library uses a WASM-compiled binding of the quick-xml Rust package. It provides better performance while supporting more features.
Internal API changes
The $XML internal symbol has been replaced by a set of non-enumerable properties:
- Parent node can now be accessed through the "~parent" property (it'll be null for the XML document node)
- Tag name can now be accessed through the "~name" property
- Children nodes can now be accessed through the "~children" property
  - CDATA can now be tested by checking whether a node has "~name": "~cdata" (if flattened, you'll need to check from the parent node using the ~children property)
<root>
  <node><![CDATA[hello <world>]]></node>
</root>

  <ref *1> {
-   [$XML]: { cdata: [ "root", "node" ] },
+   "~parent": null,
+   "~name": "~xml",
    root: {
      node: "hello <world>",
-     [$XML]: { name: "root", parent: null },
+     "~parent": [Circular *1],
+     "~name": "root",
+     "~children": [ { "~name": "~cdata", "#text": "hello <world>" } ],
    }
  }
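As a rough illustration of the CDATA check described above, here is a minimal sketch that does not call the library. The `withMeta` and `hasCdataChild` helpers and the `XmlNode` type are hypothetical, and the `root` object is built by hand to mimic the output shown in the diff.

```typescript
// Hypothetical node shape, modeled on the documented "~"-prefixed metadata.
interface XmlNode {
  [key: string]: unknown
}

// Attach metadata as non-enumerable properties so it stays out of Object.keys().
function withMeta(node: XmlNode, meta: Record<string, unknown>): XmlNode {
  for (const [key, value] of Object.entries(meta)) {
    Object.defineProperty(node, key, { value, enumerable: false, writable: true, configurable: true })
  }
  return node
}

// Hand-built stand-in for the parsed <root> node (assumption: not produced by the library).
const root = withMeta({ node: "hello <world>" }, {
  "~name": "root",
  "~children": [{ "~name": "~cdata", "#text": "hello <world>" }],
})

// A flattened node is a plain string, so the check goes through the parent's "~children".
function hasCdataChild(parent: XmlNode): boolean {
  const children = (parent["~children"] ?? []) as XmlNode[]
  return children.some((child) => child["~name"] === "~cdata")
}

console.log(Object.keys(root)) // [ "node" ]
console.log(hasCdataChild(root)) // true
```

Because the metadata is non-enumerable, it does not show up when iterating over or serializing the node, which is why it has to be read by name.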
XML document changes
XML document properties have been moved directly to the top level rather than being stored in the xml property.
Doctype is now stored in the "#doctype" property, and attribute values are set to "" rather than true.
Processing instructions (like XML stylesheets) are now parsed the same way as regular nodes but have been moved into the "#instructions" property.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="styles.xsl" type="text/xsl"?>
<!DOCTYPE attribute>
<root/>

  {
-   xml: {
-     "@version": "1.0",
-     "@encoding": "UTF-8",
-   },
+   "@version": "1.0",
+   "@encoding": "UTF-8",
-   "$stylesheets": [ { "@href": "styles.xsl", "@type": "text/xsl" } ]
+   "#instructions": {
+     "xml-stylesheet": { "@href": "styles.xsl", "@type": "text/xsl" }
+   },
-   doctype: { "@attribute": true },
+   "#doctype": { "@attribute": "" },
    root: null
  }
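Code that has to read documents in both layouts during a migration could use small accessors like the ones sketched below. `declarationAttribute`, `hasDoctypeAttribute` and the `XmlDocument` type are hypothetical helpers written for this example, not part of the library, and the two sample objects are copied by hand from the diff above.

```typescript
// Minimal structural type covering both layouts (illustrative, not the library's own types).
type XmlDocument = Record<string, unknown>

// Read a declaration attribute such as "@version" from the top level (5.x),
// falling back to the "xml" property (4.x).
function declarationAttribute(doc: XmlDocument, attribute: string): string | undefined {
  const legacy = doc["xml"] as Record<string, unknown> | undefined
  const value = doc[attribute] ?? legacy?.[attribute]
  return typeof value === "string" ? value : undefined
}

// Doctype attributes were `true` in 4.x and are "" in 5.x,
// so test for presence rather than truthiness.
function hasDoctypeAttribute(doc: XmlDocument, attribute: string): boolean {
  const doctype = doc["#doctype"] ?? doc["doctype"]
  return typeof doctype === "object" && doctype !== null && attribute in doctype
}

const v4: XmlDocument = {
  xml: { "@version": "1.0", "@encoding": "UTF-8" },
  doctype: { "@attribute": true },
  root: null,
}
const v5: XmlDocument = {
  "@version": "1.0",
  "@encoding": "UTF-8",
  "#doctype": { "@attribute": "" },
  root: null,
}

console.log(declarationAttribute(v4, "@version"), declarationAttribute(v5, "@version")) // 1.0 1.0
console.log(hasDoctypeAttribute(v4, "@attribute"), hasDoctypeAttribute(v5, "@attribute")) // true true
```

The presence check matters because an empty string is falsy: a condition such as `if (doctype["@attribute"])` that worked in 4.x would silently stop matching in 5.x.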
Mixed content support
This breaks any existing code that was expecting mixed content to always be a string. Now mixed content nodes will be parsed as usual, and the #text property will contain the "inner text" of the node.
Note that #text is actually a getter that recursively gets the #text of children nodes (ignoring comment nodes), so it'll also handle nested mixed content correctly.
<root>some <b>bold</b> text</root>
  {
-   root: "some <b>bold</b> text",
+   root: {
+     "#text": "some bold text",
+     b: "bold",
+   }
  }
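To make the recursive getter behavior more concrete, here is a minimal model of it. This is a sketch of the idea only, not the library's implementation: the `XmlNode` type, the `element`, `text` and `comment` factories, and the "~text" node name are all assumptions made for the example.

```typescript
// Illustrative node model (assumption: the real internal types may differ).
interface XmlNode {
  "~name": string
  "~children"?: XmlNode[]
  "#text": string
}

// Leaf nodes simply carry their own text.
const text = (value: string): XmlNode => ({ "~name": "~text", "#text": value })
const comment = (value: string): XmlNode => ({ "~name": "~comment", "#text": value })

// Element nodes expose "#text" as a getter that concatenates the inner text of
// their children, skipping comments. Reading a child's "#text" triggers that
// child's own getter, which is what makes the lookup recursive.
function element(name: string, children: XmlNode[]): XmlNode {
  return {
    "~name": name,
    "~children": children,
    get "#text"(): string {
      return children
        .filter((child) => child["~name"] !== "~comment")
        .map((child) => child["#text"])
        .join("")
    },
  }
}

// <root>some <b>bold</b><!--ignored--> text</root>
const root = element("root", [
  text("some "),
  element("b", [text("bold")]),
  comment("ignored"),
  text(" text"),
])

console.log(root["#text"]) // "some bold text"
```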
Comments
Comments have been moved into the "#comments" property. Note that this property is now always an array, even if there is only one comment.
Additionally, you can find comments in the ~children property by searching for nodes with "~name": "~comment". If you call the #text getter on a parent node containing comments, it will return the inner text without comments.
<root><!--some comment--></root>
  {
    root: {
-     "#comment": "some comment",
+     "#comments": [ "some comment" ],
    }
  }
Parsing
Options
Parse options are categorized into 4 groups:
- clean, which can remove attributes, comments, xml doctype and instructions from the output
- flatten, which can flatten nodes with only a text node, empty ones or transform attributes only nodes into objects without the @ prefix
- revive, which can trim content (unless xml:space="preserve"), unescape xml entities, revive booleans and numbers
  - You can also provide a custom reviver function (applied after other revivals) that will be called on each attribute and node
    - Note that signature of the reviver function has changed
- mode, which can be either xml or html. Choosing the latter will be more permissive than the former.
  const options = {
-   reviveBooleans: true,
-   reviveNumbers: true,
-   reviver:() => {},
+   revive: { booleans: true, numbers: true, custom: () => {} },
-   emptyToNull: true,
-   flatten: true,
+   flatten: { text: true, empty: true },
-   debug: false,
-   progress: () => null,
  }
Please refer to the documentation for more information.
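If you have many call sites to update, the renaming shown in the diff can be captured in a small adapter. The sketch below is an assumption-laden illustration: `migrateParseOptions` and both interfaces are hypothetical, only the fields visible in the diff are covered, and the pairing of emptyToNull with flatten.empty is inferred from that diff rather than confirmed by the documentation.

```typescript
// 4.x option names as they appear in the diff above.
interface ParseOptionsV4 {
  reviveBooleans?: boolean
  reviveNumbers?: boolean
  reviver?: (...args: unknown[]) => unknown
  emptyToNull?: boolean
  flatten?: boolean
  debug?: boolean
  progress?: (...args: unknown[]) => unknown
}

// 5.x shape, limited to the fields shown in the diff above.
interface ParseOptionsV5 {
  revive?: { booleans?: boolean; numbers?: boolean; custom?: (...args: unknown[]) => unknown }
  flatten?: { text?: boolean; empty?: boolean }
}

function migrateParseOptions(old: ParseOptionsV4): ParseOptionsV5 {
  const revive: NonNullable<ParseOptionsV5["revive"]> = {}
  if (old.reviveBooleans !== undefined) revive.booleans = old.reviveBooleans
  if (old.reviveNumbers !== undefined) revive.numbers = old.reviveNumbers
  // old.reviver is deliberately NOT forwarded: its signature changed in 5.x,
  // so the function has to be rewritten by hand and passed as revive.custom.

  const flatten: NonNullable<ParseOptionsV5["flatten"]> = {}
  if (old.flatten !== undefined) flatten.text = old.flatten
  if (old.emptyToNull !== undefined) flatten.empty = old.emptyToNull // inferred mapping
  // debug and progress have no counterpart in the diff and are dropped.

  return { revive, flatten }
}

console.log(migrateParseOptions({ reviveBooleans: true, reviveNumbers: true, emptyToNull: true, flatten: true }))
// { revive: { booleans: true, numbers: true }, flatten: { text: true, empty: true } }
```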
Parsing streams
The parse() function supports any ReaderSync, which means you can pass a file reader directly, for example.
import { parse } from "./parse.ts"
const file = await Deno.open("example.xml") // Deno.FsFile implements readSync()
parse(file)
file.close()
Async parsing is not supported yet, but might be added in the future.
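For readers unfamiliar with the ReaderSync contract, the sketch below shows what such an object looks like. The interface mirrors the shape Deno uses (`readSync` fills the given buffer and returns the number of bytes written, or null at end of input); `StringReader` and `readAll` are hypothetical names used only for this example, and the snippet does not call parse() itself.

```typescript
// Same shape as Deno's ReaderSync: fill `p`, return the byte count, or null at end of input.
interface ReaderSync {
  readSync(p: Uint8Array): number | null
}

// Hypothetical in-memory reader that serves a string as UTF-8 bytes.
class StringReader implements ReaderSync {
  private bytes: Uint8Array
  private offset = 0

  constructor(text: string) {
    this.bytes = new TextEncoder().encode(text)
  }

  readSync(p: Uint8Array): number | null {
    if (this.offset >= this.bytes.length) return null
    // Copy at most p.length bytes into the caller's buffer.
    const chunk = this.bytes.subarray(this.offset, this.offset + p.length)
    p.set(chunk)
    this.offset += chunk.length
    return chunk.length
  }
}

// Drain a reader in small chunks, the way a streaming consumer would.
function readAll(reader: ReaderSync, chunkSize = 8): string {
  const decoder = new TextDecoder()
  const buffer = new Uint8Array(chunkSize)
  let text = ""
  for (let n = reader.readSync(buffer); n !== null; n = reader.readSync(buffer)) {
    // { stream: true } keeps multi-byte characters intact across chunk boundaries.
    text += decoder.decode(buffer.subarray(0, n), { stream: true })
  }
  return text + decoder.decode()
}

console.log(readAll(new StringReader("<root><node>hello</node></root>")))
// <root><node>hello</node></root>
```

Any object with a compatible readSync() method should be usable the same way, which is what allows a file handle to be passed instead of a fully loaded string.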
Stringifying
Options
Stringify options are now categorized into 2 groups:
- format, which can configure the indent string and automatically breakline when a text node is too long
  - Since you pass a string rather than a number for indent, it means that you can also use tabs instead of spaces
- replace, which can forcefully escape xml entities
  - You can also provide a custom replacer function that will be called on each attribute and node
    - Note that signature of the replacer function has changed
  const options = {
-   indentSize: 2,
+   format: { indent: "  " },
-   escapeAllEntities: true,
-   replacer: () => {},
+   replace: { entities: true, custom: () => {} },
-   nullToEmpty: false,
-   debug: false,
-   progress: () => null,
  }
Please refer to the documentation for more information.
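The main practical change is that a numeric indent size becomes an indent string. A minimal sketch of that conversion follows; `migrateStringifyOptions` and both interfaces are hypothetical and cover only the fields visible in the diff above.

```typescript
// 4.x option names as they appear in the diff above.
interface StringifyOptionsV4 {
  indentSize?: number
  escapeAllEntities?: boolean
  replacer?: (...args: unknown[]) => unknown
}

// 5.x shape, limited to the fields shown in the diff above.
interface StringifyOptionsV5 {
  format?: { indent?: string }
  replace?: { entities?: boolean; custom?: (...args: unknown[]) => unknown }
}

function migrateStringifyOptions(old: StringifyOptionsV4): StringifyOptionsV5 {
  const format: NonNullable<StringifyOptionsV5["format"]> = {}
  // A size of N spaces becomes a string made of N spaces.
  if (old.indentSize !== undefined) format.indent = " ".repeat(old.indentSize)

  const replace: NonNullable<StringifyOptionsV5["replace"]> = {}
  if (old.escapeAllEntities !== undefined) replace.entities = old.escapeAllEntities
  // old.replacer is not forwarded because its signature changed in 5.x.

  return { format, replace }
}

console.log(JSON.stringify(migrateStringifyOptions({ indentSize: 2, escapeAllEntities: true })))
// {"format":{"indent":"  "},"replace":{"entities":true}}

// Tabs have no 4.x equivalent and are simply written as a string in 5.x.
const tabbed: StringifyOptionsV5 = { format: { indent: "\t" } }
```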
Stringifying content
Please refer to the above section about API changes. If you were handling XML document properties, using the $XML symbol or #comment property, or dealing with mixed nodes content, you'll most likely need to update your code.
This library now provides comment() and cdata() helpers to respectively create comment and CDATA nodes.
import { cdata, comment, stringify } from "./stringify.ts"
stringify({
  "@version": "1.0",
  "@encoding": "UTF-8",
  root: {
    comment: comment("hello world"),
    cdata: cdata("bonjour <le monde>"),
    text: "hello world",
    node: {
      foo: true,
      bar: 42,
      baz: {
        "@attribute": "value",
      },
    },
  },
})
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <comment><!--hello world--></comment>
  <cdata><![CDATA[bonjour <le monde>]]></cdata>
  <text>hello world</text>
  <node>
    <foo>true</foo>
    <bar>42</bar>
    <baz attribute="value"/>
  </node>
</root>
Note that while you can theoretically use internal API properties, we strongly advise against it for now. Support for ~children might be added in the future (#57) for mixed content, but its behavior is not well defined yet.
Setting ~name manually might lead to unexpected behavior, especially if it differs from the parent key.
📜 License and credits
Copyright (c) Lecoq Simon <@lowlighter>. (MIT License)
https://github.com/lowlighter/libs/blob/main/LICENSE
This library used to be published at deno.land/x/xml and jsr.io/@lowlighter/xml. It was moved to jsr.io/@libs/xml starting with version 5.0.0.
Versions prior to 5.0.0 were written entirely in TypeScript, but the library was rewritten in Rust to improve performance and support more features.