---
title: "module lang::xml::IO"
id: IO
slug: /Library/lang/xml/IO
---

<div class="theme-doc-version-badge badge badge--secondary">rascal-Not specified</div>

Basic IO from XML to Rascal and back
#### Usage

```rascal
import lang::xml::IO;
```

#### Dependencies
```rascal
import util::Maybe;
```

#### Description


The XML binding implemented by this module is _untyped_. The readers and streamers produce
values of type `node` for every (nested) tag. 

To bind the resulting values to more strictly typed ADTs, use [Validator](../../..//Library/util/Validator.md).


## function readXML {#lang-xml-IO-readXML}

```rascal
value readXML(loc file, bool fullyQualify=false, bool trackOrigins = false, bool includeEndTags=false, bool ignoreComments=true, bool ignoreWhitespace=true, str charset="UTF-8", bool inferCharset=!(charset?))
```

## function streamXML {#lang-xml-IO-streamXML}

Stream all the tags in a file, one-by-one, without ever having the entire XML file in memory.

```rascal
Maybe[value]() streamXML(loc file, str elementName, bool fullyQualify=false, bool trackOrigins = false, bool includeEndTags=false, bool ignoreComments=true, bool ignoreWhitespace=true, str charset="UTF-8", bool inferCharset=!(charset?))
```


[`streamXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-streamXML) returns a closure. Each call to that closure produces `just(...)` containing the next
occurrence of an `elementName` tag in the input. Once all occurrences have been produced, the closure returns `nothing()`, so you know when to stop.

`IO` exceptions can still be thrown after streaming has started, for example when the file disappears
or its permissions are revoked while the stream is being consumed. Only a `nothing()` result indicates
that there are no further `elementName` tags in the file.

#### Examples

```rascal-shell
rascal>import IO;
ok
```
A prefix of an example XML file from the web:
```rascal-shell
rascal>readFile(|https://www.w3schools.com/xml/cd_catalog.xml|(0,500))
str: "\<?xml version=\"1.0\" encoding=\"UTF-8\"?\>\n\<CATALOG\>\n  \<CD\>\n    \<TITLE\>Empire Burlesque\</TITLE\>\n    \<ARTIST\>Bob Dylan\</ARTIST\>\n    \<COUNTRY\>USA\</COUNTRY\>\n    \<COMPANY\>Columbia\</COMPANY\>\n    \<PRICE\>10.90\</PRICE\>\n    \<YEAR\>1985\</YEAR\>\n  \</CD\>\n  \<CD\>\n    \<TITLE\>Hide your heart\</TITLE\>\n    \<ARTIST\>Bonnie Tyler\</ARTIST\>\n    \<COUNTRY\>UK\</COUNTRY\>\n    \<COMPANY\>CBS Records\</COMPANY\>\n    \<PRICE\>9.90\</PRICE\>\n    \<YEAR\>1988\</YEAR\>\n  \</CD\>\n  \<CD\>\n    \<TITLE\>Greatest Hits\</TITLE\>\n    \<ARTIST\>Dolly Parton\</ARTIST"
───
<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST
───
rascal>import lang::xml::IO;
ok
```
Let's read the CDs one by one:
```rascal-shell
rascal>nextCD = streamXML(|https://www.w3schools.com/xml/cd_catalog.xml|, "CD");
Maybe[value] (): function(|std:///lang/xml/IO.rsc|(3050,8,<54,231>,<54,239>))
```
Every call to `nextCD` returns the next CD, until there are none left:
```rascal-shell
rascal>nextCD()
Maybe[value]: just("cd"(
    "title"("Empire Burlesque"),
    "artist"("Bob Dylan"),
    "country"("USA"),
    "company"("Columbia"),
    "price"("10.90"),
    "year"("1985")))
rascal>nextCD()
Maybe[value]: just("cd"(
    "title"("Hide your heart"),
    "artist"("Bonnie Tyler"),
    "country"("UK"),
    "company"("CBS Records"),
    "price"("9.90"),
    "year"("1988")))
```
Or we request the next 500 and filter out the `nothing()` results that follow the last CD:
```rascal-shell
rascal>[ cd | _ <- [0..500], just(cd) := nextCD()]
list[node]: [
  "cd"(
    "title"("Greatest Hits"),
    "artist"("Dolly Parton"),
    "country"("USA"),
    "company"("RCA"),
    "price"("9.90"),
    "year"("1982")),
  "cd"(
    "title"("Still got the blues"),
    "artist"("Gary Moore"),
    "country"("UK"),
    "company"("Virgin records"),
    "price"("10.20"),
    "year"("1990")),
  "cd"(
    "title"("Eros"),
    "artist"("Eros Ramazzotti"),
    "country"("EU"),
    "company"("BMG"),
    "price"("9.90"),
    "year"("1997")),
  "cd"(
    "title"("One night only"),
    "artist"("Bee Gees"),
    "country"("UK"),
    "company"("Polydor"),
    "price"("10.90"),
    "year"("1998")),
  "cd"(
    "title"("Sylvias Mother"),
    "artist"("Dr.Hook"),
    "country"("UK"),
    "company"("CBS"),
    "price"("8.10"),
    "year"("1973")),
  "cd"(
    "title"("Maggie May"),
    "artist"("Rod Stewart"),
    "country"("UK"),
    "company"("Pickwick"),
    "price"("8.50"),
    "year"("1990")),
  "cd"(
    "title"("Romanza"),
    "artist"("Andrea Bocelli"),
    "country"("EU"),
    "company"("Polydor"),
    "price"("10.80"),
    "year"("1996")),
  "cd"(
    "title"("When a man loves a woman"),
    "artist"("Percy Sledge"),
    "country"("USA"),
    "company"("Atlantic"),
    "price"("8.70"),
    "year"("1987")),
  "cd"(
    "title"("Black angel"),
    "artist"("Savage Rose"),
    "country"("EU"),
    "company"("Mega"),
    "price"("10.90"),
    "year"("1995")),
  "cd"(
    "title"("1999 Grammy Nominees"),
    "artist"("Many"),
    "country"("USA"),
    "company"("Grammy"),
    "price"("10.20"),
    "year"("1999")),
  "cd"(
    "title"("For the good times"),
    "artist"("Kenny Rogers"),
    "country"("UK"),
    "company"("Mucik Master"),
    "price"("8.70"),
    "year"("1995")),
  "cd"(
    "title"("Big Willie style"),
    "artist"("Will Smith"),
    "country"("USA"),
    "company"("Columbia"),
    "price"("9.90"),
    "year"("1997")),
  "cd"(
    "title"("Tupelo Honey"),
    "artist"("Van Morrison"),
    "country"("UK"),
    "company"("Polydor"),
    "price"("8.20"),
    "year"("1971")),
  "cd"(
    "title"("Soulsville"),
    "artist"("Jorn Hoel"),
    "country"("Norway"),
    "company"("WEA"),
    "price"("7.90"),
    "year"("1996")),
  "cd"(
    "title"("The very best of"),
    "artist"("Cat Stevens"),
    "country"("UK"),
    "company"("Island"),
    "price"("8.90"),
    "year"("1990")),
  "cd"(
    "title"("Stop"),
    "artist"("Sam Brown"),
    "country"("UK"),
    "company"("A and M"),
    "price"("8.90"),
    "year"("1988")),
  "cd"(
    "title"("Bridge of Spies"),
    "artist"("T\'Pau"),
    "country"("UK"),
    "company"("Siren"),
    "price"("7.90"),
    "year"("1987")),
  "cd"(
    "title"("Private Dancer"),
    "artist"("Tina Turner"),
    "country"("UK"),
    "company"("Capitol"),
    "price"("8.90"),
    "year"("1983")),
  "cd"(
    "title"("Midt om natten"),
    "artist"("Kim Larsen"),
    "country"("EU"),
    "company"("Medley"),
    "price"("7.80"),
    "year"("1983")),
  "cd"(
    "title"("Pavarotti Gala Concert"),
    "artist"("Luciano Pavarotti"),
    "country"("UK"),
    "company"("DECCA"),
    "price"("9.90"),
    "year"("1991")),
  "cd"(
    "title"("The dock of the bay"),
    "artist"("Otis Redding"),
    "country"("USA"),
    "company"("Stax Records"),
    "price"("7.90"),
    "year"("1968")),
  "cd"(
    "title"("Picture book"),
    "artist"("Simply Red"),
    "country"("EU"),
    "company"("Elektra"),
    "price"("7.20"),
    "year"("1985")),
  "cd"(
    "title"("Red"),
    "artist"("The Communards"),
    "country"("UK"),
    "company"("London"),
    "price"("7.80"),
    "year"("1987")),
  "cd"(
    "title"("Unchain my heart"),
    "artist"("Joe Cocker"),
    "country"("USA"),
    "company"("EMI"),
    "price"("8.20"),
    "year"("1987"))
]
```
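The comprehension above always calls `nextCD` exactly 500 times. When the number of elements is not known in advance, a `while` loop over a `just(...)` pattern stops at the first `nothing()` instead. The following is a minimal sketch to be entered at the Rascal REPL; `countElements` is a hypothetical helper, not part of this module. It only counts the elements, so no element is retained after the next one has been requested.

```rascal
import lang::xml::IO;
import util::Maybe;

// Hypothetical helper (not part of lang::xml::IO): counts the `elementName`
// elements in `file` without keeping any of them in memory.
int countElements(loc file, str elementName) {
  nextElement = streamXML(file, elementName);
  int count = 0;
  // The match `just(_) := nextElement()` fails on the first `nothing()`,
  // which ends the loop.
  while (just(_) := nextElement()) {
    count += 1;
  }
  return count;
}
```

Calling `countElements(|https://www.w3schools.com/xml/cd_catalog.xml|, "CD")` would drain the same catalog as the session above.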

#### Benefits


* Low latency: the first element of a long stream is available quickly, and so is each next one.
* Low (constant) memory usage, because only one selected element is on the heap at a time. This works particularly
well for XML documents with huge numbers of sibling elements, such as database table dumps.

#### Pitfalls


* The choice of `elementName` greatly influences memory usage. If you select a child of a repeated structure,
only the child is cleaned up while the parent structure remains. Memory then grows linearly with the number
of parent structures, defeating the point of calling [`streamXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-streamXML).
* Lower throughput for processing enormous documents. Compared to [`readXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-readXML), and _only if_ enough memory is available
to store both the internal DOM _and_ the Rascal `node` structure, [`streamXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-streamXML) reaches a lower throughput because
of the function call overhead for each next element. If you do run out of memory with [`readXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-readXML), however,
[`streamXML`](../../..//Library/lang/xml/IO.md#lang-xml-IO-streamXML) reaches a far higher throughput.

## function readXML {#lang-xml-IO-readXML}

```rascal
value readXML(str contents, loc src = |unknown:///|, bool fullyQualify=false, bool trackOrigins = false, bool includeEndTags=false, bool ignoreComments=true, bool ignoreWhitespace=true)
```
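Because this overload returns plain `node` values, ordinary Rascal pattern matching is enough to query the result. A minimal sketch for the Rascal REPL, assuming a small inline document; `titles` is a hypothetical helper name. The descendant pattern `/` finds `title` elements at any depth, and the set comprehension collects their text.

```rascal
import lang::xml::IO;

// Hypothetical helper: the text of every <title> element, at any depth.
set[str] titles(value xml) = { t | /"title"(str t) := xml };

// `<` must be escaped inside Rascal string literals.
value doc = readXML("\<catalog\>\<cd\>\<title\>Empire Burlesque\</title\>\</cd\>\<cd\>\<title\>Hide your heart\</title\>\</cd\>\</catalog\>");
```

Evaluating `titles(doc)` should then yield the set `{"Empire Burlesque", "Hide your heart"}`.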

## function writeXMLString {#lang-xml-IO-writeXMLString}

Pretty-print any value as an XML string

```rascal
str writeXMLString(value val, str charset="UTF-8", bool outline=false, bool prettyPrint=true, int indentAmount=4, int maxPaddingWidth=30, bool dropOrigins=true)
```


This function uses [JSoup's](http://www.jsoup.org) DOM functionality to 
yield a syntactically correct XML string.
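A minimal usage sketch for the Rascal REPL, assuming a small hand-built node in the same shape that `readXML` produces. The exact layout of the output (indentation, line breaks, any XML declaration) is left to JSoup's printer and the keyword parameters, so it is not reproduced here.

```rascal
import lang::xml::IO;
import IO;

// An untyped document in the shape produced by readXML.
node catalog = "catalog"(
  "cd"("title"("Empire Burlesque")),
  "cd"("title"("Hide your heart")));

// Default settings: prettyPrint=true, indentAmount=4.
str pretty = writeXMLString(catalog);

// The same document with the prettyPrint keyword parameter switched off.
str compact = writeXMLString(catalog, prettyPrint=false);

println(pretty);
println(compact);
```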

## function writeXMLFile {#lang-xml-IO-writeXMLFile}

Pretty-print any value to an XML file

```rascal
void writeXMLFile(loc file, value val, str charset="UTF-8", bool outline=false, bool prettyPrint=true, int indentAmount=4, int maxPaddingWidth=30, bool dropOrigins=true)
```


This function uses [JSoup's](http://www.jsoup.org) DOM functionality to
yield a syntactically correct XML file.
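A minimal sketch for the Rascal REPL that writes a document to disk and reads it back with the file-based `readXML`. It assumes the `tmp:///` scheme, which resolves to a temporary directory; the file name `cds.xml` is hypothetical.

```rascal
import lang::xml::IO;

// Hypothetical target file in the temporary directory.
loc target = |tmp:///cds.xml|;

// Serialize a small untyped node to the file (charset defaults to UTF-8).
writeXMLFile(target, "catalog"("cd"("title"("Empire Burlesque"))));

// Read it back; the result is again an untyped node value.
value reread = readXML(target);
```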

# Tests
## test nestedElementTest {#lang-xml-IO-nestedElementTest}

```rascal
test bool nestedElementTest() {
  example = "\<aap\>\<noot\>mies\</noot\>\</aap\>";
  
  val = readXML(example);
  
  return val == "aap"("noot"("mies"));
}
```

## test attributeTest {#lang-xml-IO-attributeTest}

```rascal
test bool attributeTest() {
  example = "\<aap age=\"1\"\>\</aap\>";
  
  val = readXML(example);
  
  return val == "aap"(age="1");
}
```

## test namespaceTest {#lang-xml-IO-namespaceTest}

```rascal
test bool namespaceTest() {
  example = "\<aap xmlns:ns=\"http://trivial\" ns:age=\"1\" age=\"2\"\>\</aap\>";
  
  val = readXML(example);
  
  return "aap"(\ns-age="1", age="2") := val;
}
```

