19 Aug 2020 - tsp
Last update 19 Aug 2020
TL;DR Use public classes wrapping public properties, instantiate XmlSerializer and use the Serialize and Deserialize methods as shown in the serializeFile and deserializeFile code blocks in this article.
Serializing and deserializing data is one of the most basic operations in application development. As anyone who learned programming languages back in the 80s and 90s knows, there are countless ad hoc ways to store data. In science especially, there was a time when huge amounts of data were stored in comma separated value (CSV) style file formats. These formats changed frequently or were extended in different ways across different branches of a program. Some data was even stored in ASCII files with some assumed structure. Over time more structured and standardized file formats emerged, such as SGML (its most famous application is HTML). For more sophisticated data there are databases like SQLite, HDF files or XML.
Databases such as SQLite, more advanced relational or NoSQL databases and HDF files might be considered overkill for many applications. They are also only partially portable and accessible to applications outside of their own ecosystem. The same applies to file formats such as CERN's ROOT file format, commonly used in high energy physics. All of these methods have major advantages over text based file storage: they provide indices, versioning and metadata for all objects, allow fast access, and can split data across multiple storage backends. They should definitely be used when applicable. On the other hand, they might be too heavyweight and bring too many dependencies for a given application, and they might also be poorly suited for long term data storage.
The XML markup language, on the other hand, allows easy and human readable structured storage, either according to a specified schema or without any schema. It looks easy at first, but parsing it without a library is really complicated. The basic file format alone is specified by more than 80 EBNF rules, not counting namespaces, schema definitions, etc. Nevertheless it's a popular file format, and fortunately a huge number of libraries is available to process XML files. Anyone who wants to process XML should rely on such libraries instead of writing their own routines, even though that's a nice exercise. I did, and even wrote my own EBNF compiler so I could parse XML with a recursive descent parser, which took quite some time. One should be even more reluctant to process XML files with regular expressions or similar methods. That approach is doomed to fail right from the start, even if it seems to work in the beginning.
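The following is a small illustration of letting a library do the parsing instead of string matching. It is a minimal sketch using LINQ to XML (System.Xml.Linq), which is a different .NET API from the XmlSerializer used in the rest of this article. The class name XmlLibraryDemo and the element names are made up for illustration. Entity references and CDATA sections are handled by the parser, and both would trip up a naive regular expression.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of the XmlSerializer samples.
public static class XmlLibraryDemo
{
    // Returns the text content of every <title> element, in document order.
    // The parser resolves entities such as &amp; and unwraps CDATA sections,
    // both of which a hand-written regular expression would likely get wrong.
    public static string[] ReadTitles(string xml)
    {
        // Throws System.Xml.XmlException if the input is not well-formed XML.
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("title")
                  .Select(element => element.Value)
                  .ToArray();
    }
}
```

A malformed document, for example one with mismatched tags, is rejected with an XmlException instead of silently producing partial matches.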
A student of mine wanted to store some data in a structured way for a really simple experimental C# application, in which he was experimenting with LINQ on in-memory lists. The idea of mapping his data hierarchically into XML files came up, but even after a few hours of web search he could not find a simple example of how to do this. Don't get me wrong: all of this information exists on the Internet, and it's easy to locate if you know what you're searching for. What he was missing was a single working base from which he could start working and reading the documentation. So I decided to write this short blog post containing the really basic samples that I supplied to him during some lectures.
Note that these samples:
XmlSerializer
First one needs some objects that store the data. This is familiar to anyone doing object oriented programming, for example from the bean style objects in Java. These are classes that resemble plain data structures but allow more fine grained control over property access through getter and setter methods.
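One should define a class for each kind of data. These wrapper classes have to be public, and every property that should be stored in the XML file also has to be public, since XmlSerializer only processes public fields and properties. The classes also need a parameterless constructor so the serializer can create instances while reading. The C# compiler supplies one automatically if no other constructor is declared. Access can still be controlled through getters and setters. In the following example all getters and setters are publicly accessible.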
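The class below shows what controlling access through a setter can look like while the property stays visible to XmlSerializer. It is a minimal sketch that uses a hypothetical class, ValidatedPerson, which is not part of the film model. The plausibility rule for the year is likewise an assumption chosen for illustration.

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical example class (not part of the article's film model): the setter
// guards the stored value, while XmlSerializer still sees a public read/write property.
public class ValidatedPerson
{
    // Private backing field: ignored by XmlSerializer, which only processes public members.
    private int yearOfBirthValue;

    public string name { get; set; } = "";

    // The setter also runs during deserialization, so implausible values in a file
    // are rejected; XmlSerializer reports this as an InvalidOperationException.
    public int yearOfBirth
    {
        get { return yearOfBirthValue; }
        set
        {
            // Assumed plausibility rule, for illustration only.
            if (value < 1800 || value > DateTime.Now.Year)
            {
                throw new ArgumentOutOfRangeException(nameof(value), "Implausible year of birth");
            }
            yearOfBirthValue = value;
        }
    }

    // Derived value that should not end up in the file. Read-only properties are
    // skipped anyway; [XmlIgnore] states the intent and would also exclude a
    // read/write property.
    [XmlIgnore]
    public int ageThisYear
    {
        get { return DateTime.Now.Year - yearOfBirthValue; }
    }
}
```

Because validation lives in the setter, the same check applies whether the value is assigned in code or read back from an XML file.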
The following example models a simple collection of films that contain some (incomplete) metadata, a reference to a director, and the roles played by actors. It's rather simple and incomplete, but it only serves as a starting point and simple example:
```csharp
using System.Collections.Generic; // required for List<T>

namespace XmlSample {
    public class Person {
        public string name { get; set; }
        public int yearOfBirth { get; set; }
    }
}

namespace XmlSample {
    public class Actor : Person {
    }
}

namespace XmlSample {
    public class Director : Person {
    }
}

namespace XmlSample {
    public class ActingRole : Actor {
        public string roleName { get; set; }
    }
}

namespace XmlSample {
    public class Film {
        public string title { get; set; }
        public int releaseYear { get; set; }
        public Director director { get; set; }
        public List<ActingRole> actors { get; set; }

        // Initialize the list so that actors.Add() works on a freshly created Film
        public Film() {
            actors = new List<ActingRole>();
        }
    }
}

namespace XmlSample {
    public class FilmCollection {
        public List<Film> filmCollection { get; set; }

        // Initialize the list so that films can be added right away
        public FilmCollection() {
            filmCollection = new List<Film>();
        }
    }
}
```
To serialize a FilmCollection object into an XML file, one can simply use the XmlSerializer provided by the .NET library. One instantiates the serializer and passes it the type that should be serialized. The serializer uses this type information to derive the XML structure automatically from the class definitions. Then one calls Serialize, passing a StreamWriter and the object that should be serialized:
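```csharp
// Requires: using System.IO; using System.Xml.Serialization;
public static void serializeFile(string filename, FilmCollection col) {
    // The serializer inspects the FilmCollection type to build its XML mapping
    XmlSerializer xSerializer = new XmlSerializer(typeof(FilmCollection));
    // The using block closes the file even if Serialize throws
    using (StreamWriter fileWriter = new StreamWriter(filename)) {
        xSerializer.Serialize(fileWriter, col);
    }
}
```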
That's it. There is no error handling, of course, since this only serves as a starting point.
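During development it is often handy to look at the generated XML without writing a file. The helper below is a minimal sketch that serializes into a string using a StringWriter instead of a StreamWriter. The class and method names (XmlStringHelper, ToXmlString) are hypothetical and not part of the original samples.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper, not part of the original samples: serialize into a string
// instead of a file, e.g. to inspect the generated XML or to use it in unit tests.
public static class XmlStringHelper
{
    public static string ToXmlString<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        // StringWriter reports UTF-16 as its encoding, so the XML declaration
        // in the returned string will say encoding="utf-16".
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

For example, `Console.WriteLine(XmlStringHelper.ToXmlString(col));` prints the XML for a FilmCollection. Keep the UTF-16 declaration in mind if such a string is later written to a file in a different encoding.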
The deserialization process is equally simple. Create an XmlSerializer, specify the data type, provide a stream and call Deserialize. Since Deserialize returns a plain object, one has to cast the result to the expected type:
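```csharp
// Requires: using System.IO; using System.Xml.Serialization;
public static FilmCollection deserializeFile(string filename) {
    XmlSerializer xSerializer = new XmlSerializer(typeof(FilmCollection));
    // The using block closes the file even if Deserialize throws
    using (FileStream fs = new FileStream(filename, FileMode.Open)) {
        // Deserialize returns object, so cast to the expected type
        FilmCollection readCollection = (FilmCollection)xSerializer.Deserialize(fs);
        return readCollection;
    }
}
```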
Just as easy as serialization.
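As a possible next step beyond the bare deserializeFile, here is a minimal sketch of a generic variant with basic error handling. It relies on the fact that XmlSerializer reports malformed documents as an InvalidOperationException whose InnerException carries the actual cause. The names SafeXmlLoader and TryDeserializeFile, as well as the Try pattern with an error message, are hypothetical choices for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper, not part of the original samples: a variant of
// deserializeFile with basic error handling that works for any type T.
public static class SafeXmlLoader
{
    // Returns true and the loaded object on success; on failure returns false
    // and a human readable message instead of throwing.
    public static bool TryDeserializeFile<T>(string filename, out T result, out string error)
    {
        result = default(T);
        error = string.Empty;
        try
        {
            var serializer = new XmlSerializer(typeof(T));
            using (var fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
            {
                result = (T)serializer.Deserialize(fs);
            }
            return true;
        }
        catch (IOException ex) // also covers FileNotFoundException and DirectoryNotFoundException
        {
            error = "Cannot read file: " + ex.Message;
        }
        catch (UnauthorizedAccessException ex)
        {
            error = "Access denied: " + ex.Message;
        }
        catch (InvalidOperationException ex)
        {
            // XmlSerializer reports malformed documents (and types it cannot handle)
            // as InvalidOperationException; the inner exception holds the actual cause.
            error = "Cannot deserialize: " + (ex.InnerException?.Message ?? ex.Message);
        }
        return false;
    }
}
```

With the article's model this would be called as `SafeXmlLoader.TryDeserializeFile<FilmCollection>(filename, out var films, out var error)`. Whether to report errors like this or let exceptions propagate depends on the application.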
Of course that's not the end of the story. One can control the serialization and deserialization process using a number of attributes:

- XmlRootAttribute specifies the namespace used, a name for the root element other than the one derived from the class name, and further settings such as IsNullable.
- XmlArrayAttribute changes the name of the element that wraps a collection such as a list. XmlArrayItemAttribute does the same for the individual items.
- XmlAttributeAttribute, written as [XmlAttribute], serializes a property into an XML attribute instead of a child element.

Since there is a huge number of attributes, one should really read the excellent documentation for XmlSerializer.
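The next example is a compact sketch of the attributes mentioned above. They are applied to hypothetical, simplified copies of the film classes (AnnotatedFilmCollection and AnnotatedFilm) so the original model stays untouched. The namespace urn:example:films and the chosen element names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical, simplified variants of FilmCollection/Film, used only to show the attributes.
[XmlRoot("films", Namespace = "urn:example:films")] // root element name and namespace
public class AnnotatedFilmCollection
{
    [XmlArray("list")]      // name of the element wrapping the list
    [XmlArrayItem("film")]  // name of each entry inside that wrapper
    public List<AnnotatedFilm> filmCollection { get; set; } = new List<AnnotatedFilm>();
}

public class AnnotatedFilm
{
    [XmlAttribute("year")]  // written as an attribute of <film> instead of a child element
    public int releaseYear { get; set; }

    [XmlElement("name")]    // child element named <name> instead of <title>
    public string title { get; set; } = "";
}
```

Serialized with an ordinary XmlSerializer, the root element becomes films in the namespace urn:example:films. The list is wrapped in a list element with one film element per entry, and each film carries its release year as a year attribute.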
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)