Encoding#
Custom type encoding#
pydantic-xml uses the pydantic default encoder to encode field data during XML serialization. To alter the default
behaviour, pydantic provides a mechanism to customize the default JSON encoding format for a particular type.
pydantic-xml allows you to do the same for XML serialization.
The API is similar to the JSON one:
class Model(BaseXmlModel):
    class Config:
        xml_encoders = {
            bytes: base64.b64encode,
        }
    ...
The following example illustrates how to encode bytes-typed fields as Base64 strings during XML serialization:
model.py:
import base64
import pathlib
from typing import List, Optional, Union
from xml.etree.ElementTree import canonicalize

import pydantic
from pydantic_xml import BaseXmlModel, attr, element


class File(BaseXmlModel):
    name: str = attr()
    content: bytes = element()

    @pydantic.validator('content', pre=True)
    def decode_content(cls, value: Optional[Union[str, bytes]]) -> Optional[bytes]:
        if isinstance(value, str):
            return base64.b64decode(value)

        return value


class Files(BaseXmlModel, tag='files'):
    class Config:
        xml_encoders = {
            bytes: lambda value: base64.b64encode(value).decode(),
        }

    __root__: List[File] = element(tag='file', default=[])


files = Files()
for filename in ['./file1.txt', './file2.txt']:
    with open(filename, 'rb') as f:
        content = f.read()

    files.__root__.append(File(name=filename, content=content))

expected_xml_doc = pathlib.Path('./doc.xml').read_bytes()

assert canonicalize(files.to_xml(), strip_text=True) == canonicalize(expected_xml_doc, strip_text=True)
file1.txt:
hello world!!!
file2.txt:
¡Hola Mundo!
doc.xml:
<files>
    <file name="./file1.txt">
        <content>aGVsbG8gd29ybGQhISEK</content>
    </file>
    <file name="./file2.txt">
        <content>wqFIb2xhIE11bmRvIQo=</content>
    </file>
</files>
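The Base64 values in doc.xml can be checked with the standard library alone. The short sketch below is not part of the original example, and its variable names are illustrative. It also shows why the encoder in Files calls .decode(): base64.b64encode returns bytes, while XML text content is a string.

```python
import base64

# File contents from the example above, including the trailing newline.
file1 = b'hello world!!!\n'
file2 = '¡Hola Mundo!\n'.encode('utf-8')

# b64encode returns bytes; .decode() turns the result into text for the XML element.
encoded1 = base64.b64encode(file1).decode()
encoded2 = base64.b64encode(file2).decode()

print(encoded1)  # aGVsbG8gd29ybGQhISEK
print(encoded2)  # wqFIb2xhIE11bmRvIQo=

# Decoding, as the validator does when parsing, restores the original bytes.
assert base64.b64decode(encoded1) == file1
assert base64.b64decode(encoded2) == file2
```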
None type encoding#
Since the XML format doesn't support a null type natively, it is not obvious how to encode None fields
(ignore them, encode them as an empty string, or mark them as xsi:nil), so the library doesn't implement
None type encoding by default.
You can define your own encoding format for the model:
from typing import Optional
from pydantic_xml import BaseXmlModel, element


class Company(BaseXmlModel):
    class Config:
        xml_encoders = {
            type(None): lambda o: '',  # encodes None field as an empty string
        }

    title: Optional[str] = element()


company = Company()
assert company.to_xml() == b'<Company><title /></Company>'
or drop None fields altogether:
from typing import Optional
from pydantic_xml import BaseXmlModel, element


class Company(BaseXmlModel):
    title: Optional[str] = element()


company = Company()
assert company.to_xml(skip_empty=True) == b'<Company />'
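To make the three conventions mentioned above concrete, here is a minimal sketch that builds each form with the standard library's xml.etree.ElementTree rather than with pydantic-xml. The helper company_xml and its mode names are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

XSI = 'http://www.w3.org/2001/XMLSchema-instance'
# Serialize the XML Schema instance namespace with the conventional "xsi" prefix.
ET.register_namespace('xsi', XSI)


def company_xml(mode: str) -> bytes:
    """Hypothetical helper: render a Company whose title is None."""
    company = ET.Element('Company')
    if mode == 'skip':
        pass  # the element is omitted entirely
    elif mode == 'empty':
        ET.SubElement(company, 'title').text = ''  # empty string
    elif mode == 'nil':
        ET.SubElement(company, 'title', {f'{{{XSI}}}nil': 'true'})  # xsi:nil marker
    else:
        raise ValueError(f'unknown mode: {mode}')
    return ET.tostring(company)


skipped = company_xml('skip')
empty = company_xml('empty')
nil = company_xml('nil')

for doc in (skipped, empty, nil):
    print(doc.decode())
```

Which convention is right depends on what the consumer of the document expects, which is why the choice is left to the model author.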
Default namespace#
An XML default namespace is a namespace that applies to an element and all its sub-elements without an explicit prefix.
In the following example the element company has no explicit namespace, but the default namespace for that
element and all its sub-elements is http://www.company.com/co. The contacts element has no explicit
namespace either, but it doesn't inherit the namespace from company because it declares its own default namespace.
The same goes for the socials element, except that its sub-elements inherit the namespace from their parent:
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
            <social>https://twitter.com/spacex</social>
            <social>https://www.youtube.com/spacex</social>
        </socials>
    </contacts>
</company>
A model for that document can be described like this:
class Socials(
    BaseXmlModel,
    tag='socials',
    nsmap={'': 'http://www.company.com/soc'},
):
    urls: List[str] = element(tag='social')


class Contacts(
    BaseXmlModel,
    tag='contacts',
    nsmap={'': 'http://www.company.com/cnt'},
):
    socials: Socials = element()


class Company(
    BaseXmlModel,
    tag='company',
    nsmap={'': 'http://www.company.com/co'},
):
    contacts: Contacts = element()
Look at the model's nsmap parameter. To set a default namespace for a model and its sub-fields,
pass that namespace under an empty key.
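The inheritance rules described in this section can be checked independently of pydantic-xml. The sketch below parses a shortened copy of the company document with the standard library and prints the fully qualified tag of each element; it only illustrates how default namespaces resolve and does not use the models above.

```python
import xml.etree.ElementTree as ET

doc = '''
<company xmlns="http://www.company.com/co">
    <contacts xmlns="http://www.company.com/cnt">
        <socials xmlns="http://www.company.com/soc">
            <social>https://www.linkedin.com/company/spacex</social>
        </socials>
    </contacts>
</company>
'''

root = ET.fromstring(doc)

# ElementTree stores each tag as "{namespace}local-name".
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

Each of company, contacts and socials ends up in the namespace it declares, while social, which declares nothing, takes the namespace of its parent socials.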
Default namespace serialization
The standard library XML serializer has a problem with default namespaces: it doesn't preserve
default namespace declarations. Instead it moves the declarations to the root element and replaces them with
generated ns{0..} prefixes:
<ns0:company xmlns:ns0="http://www.company.com/co"
             xmlns:ns1="http://www.company.com/cnt"
             xmlns:ns2="http://www.company.com/soc">
    <ns1:contacts>
        <ns2:socials>
            <ns2:social>https://www.linkedin.com/company/spacex</ns2:social>
            <ns2:social>https://twitter.com/spacex</ns2:social>
            <ns2:social>https://www.youtube.com/spacex</ns2:social>
        </ns2:socials>
    </ns1:contacts>
</ns0:company>
That document is still correct, but some parsers require namespace declarations to be kept untouched. To avoid
the problem, use lxml as the serializer backend, since it doesn't have this issue.
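The behaviour is easy to reproduce with xml.etree.ElementTree directly. The following sketch round-trips a reduced document (two elements only, chosen for brevity) through the standard library parser and serializer:

```python
import xml.etree.ElementTree as ET

doc = (
    '<company xmlns="http://www.company.com/co">'
    '<contacts xmlns="http://www.company.com/cnt" />'
    '</company>'
)

root = ET.fromstring(doc)
out = ET.tostring(root)

# The default namespace declarations are gone: both namespaces are now declared
# on the root element under generated ns0/ns1 prefixes.
print(out.decode())
```

The output describes the same elements in the same namespaces, so it is equivalent for a namespace-aware parser; only the declarations have moved.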
XML parser#
pydantic-xml tries to use the fastest XML parser available on your system. It uses lxml if it is installed
in your environment; otherwise it falls back to the standard library XML parser.
To force pydantic-xml to use the standard xml.etree.ElementTree parser, set the FORCE_STD_XML
environment variable.
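The selection rule can be pictured roughly as follows. This is only a sketch of the behaviour described above, not the library's actual code: the helper pick_backend is hypothetical, and treating any non-empty FORCE_STD_XML value as "set" is an assumption, so check the library for the values it accepts.

```python
import importlib.util
import os
from typing import Mapping


def pick_backend(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: name the backend the rule above would select."""
    # Assumption: any non-empty value of FORCE_STD_XML counts as "set".
    if env.get('FORCE_STD_XML'):
        return 'xml.etree.ElementTree'
    # Prefer lxml when it is importable in the current environment.
    if importlib.util.find_spec('lxml') is not None:
        return 'lxml.etree'
    return 'xml.etree.ElementTree'


print(pick_backend({'FORCE_STD_XML': '1'}))
print(pick_backend({}))
```

Because such a choice is typically made when the package is imported, the safest way to use the variable is to set it in the environment before the Python process starts.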
XML serialization#
The XML serialization process can be customized, depending on which backend you use.
For example, lxml can pretty-print the output document or serialize it using a particular encoding
(for more information see lxml.etree.tostring()).
To set these options, pass them to pydantic_xml.BaseXmlModel.to_xml():
xml = obj.to_xml(
    pretty_print=True,
    encoding='UTF-8',
    standalone=True
)

print(xml)
The standard library serializer also supports customization.
For more information see xml.etree.ElementTree.tostring().
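As an illustration of what the standard library backend can do, the sketch below calls xml.etree.ElementTree.tostring() directly with a few of its keyword arguments. It assumes, as described above, that to_xml() passes such options through to the backend; the element names are only an example.

```python
import xml.etree.ElementTree as ET

company = ET.Element('Company')
ET.SubElement(company, 'title')

xml = ET.tostring(
    company,
    encoding='utf-8',
    xml_declaration=True,        # emit the <?xml ...?> header (Python 3.8+)
    short_empty_elements=False,  # write <title></title> instead of <title />
)

print(xml.decode())
```

Note that these keyword names differ from the lxml ones: pretty_print and standalone, for example, are lxml options and are not accepted by the standard library function.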