A simple alternative to XML, JSON, and co.

source link: https://devm.io/php/xml-sml-json

Introduction to the SML data format

20. Sep 2022


Have you ever wondered why comments in XML are written in such a complicated way or why JSON doesn’t offer comments at all? Have you ever needed to fix an indentation error in your YAML file? If so, then you already know some of the hurdles in common text-based data formats. This article will take a look at an alternative: the Simple Markup Language.

Whether they're used for configuration files, data exchange between client and server, or object serialization, text-based formats like XML, JSON, YAML, and co. are ubiquitous for developers. But it’s not just developers who are confronted with these formats. Software users, support specialists, administrators, and consultants also work with files and data streams in these formats for configuration or error analysis. Although the basic concepts are relatively easy to understand, not everyone can easily find their way around the sometimes extensive rules.

Even advanced developers don’t know all the lengthy specification details by heart. The 84-page-long YAML specification is an example of this, along with the XML specification. If a format also offers several alternative ways to express the same thing, even experts start to hesitate. For example, YAML offers nine different ways to write a multiline string [1]. And anyone who has defined an XML format is guaranteed to have had a discussion or two about whether a value should be written as an element or as an attribute.

Writing documents in these formats isn’t easy for everyone, either. Even though programming languages train developers to type many special characters, touch typists in particular know that special characters can significantly reduce typing speed. That’s why some German developers switch their keyboard to the US layout, where special characters are easier to reach.

Another aspect is readability. A JSON document that has been minified down to a single line is only meaningfully readable if a developer has tools for formatting or better rendering. How often are documents with sensitive data pasted into pretty-printer web pages without knowing where the data is sent? And most people have likely come across an XML document displayed in Internet Explorer at some point.

The question is, can we find an alternative data format to XML, JSON, and co. that:

  • has reduced its set of rules to a minimum, yet remains functionally equal,
  • is easy and fast to write,
  • is readable even without special tools, and
  • is easy to understand and intuitive, even for non-experts?

The Simple Markup Language, or SML for short, targets exactly these requirements. In the following sections, we’ll take a look at SML’s basic concepts and notation, and at the end, we’ll highlight some potential application areas.

The first example

To get started, let’s consider the following example of an SML geodata format describing a prominent geographic point. In this case, it’s a Seattle city landmark, the Space Needle observation tower.

PointOfInterest
  City		Seattle
  Name		"Space Needle"
  GpsCoords	47.6205 -122.3493
  # Opening hours should go here
End

You can see that SML is a line-based format. The first line starts the document and indicates its content. The second line is an attribute that defines the city where the point of interest is located. The attribute name and the attribute value are separated only by whitespace; there is no special character such as a colon or an equals sign between them. Line three contains the landmark’s name. In contrast to line two, the attribute’s value is written in double quotes, which are needed because the name itself contains a space. The next line holds the point’s GPS coordinates. Attributes can contain several values; as in this case, they are written one after another and separated by spaces or other whitespace characters. The second-to-last line contains nothing but a comment, which starts with a hash and runs to the end of the line. The last line contains the word End and closes the document. You’ll notice that SML gets by with relatively few special characters. Even someone who isn’t an expert can type this text easily and quickly.
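To make the line-based structure concrete, here is a minimal Python sketch of a reader for the subset of SML used in the example above. It is not the reference implementation and does not follow the full SML specification; it only assumes what the walkthrough describes: whitespace-separated values, double quotes around values that contain spaces, hash comments, and a closing End line. The names `tokenize` and `parse_sml` and the dictionary layout are hypothetical, chosen for illustration.

```python
import re

# One token per match: a double-quoted value, a comment, or a bare value.
TOKEN = re.compile(r'"([^"]*)"|(#.*)|(\S+)')


def tokenize(line):
    """Split one line into values, dropping a trailing '#' comment."""
    values = []
    for match in TOKEN.finditer(line):
        if match.lastindex == 2:  # comment: ignore the rest of the line
            break
        values.append(match.group(match.lastindex))
    return values


def parse_sml(text):
    """Parse the small SML subset from the example into nested dicts."""
    root = None
    stack = []  # elements that are currently open
    for line in text.splitlines():
        values = tokenize(line)
        if not values:
            continue  # blank or comment-only line
        if values == ["End"]:
            element = stack.pop()  # close the innermost open element
            if not stack:
                root = element
        elif len(values) == 1:
            # A line with a single value opens a new element.
            element = {"name": values[0], "attributes": {}, "children": []}
            if stack:
                stack[-1]["children"].append(element)
            stack.append(element)
        else:
            # Name followed by one or more values: an attribute.
            stack[-1]["attributes"][values[0]] = values[1:]
    return root


document = """PointOfInterest
  City       Seattle
  Name       "Space Needle"
  GpsCoords  47.6205 -122.3493
  # Opening hours should go here
End
"""

poi = parse_sml(document)
print(poi["name"])
print(poi["attributes"])
```

Every attribute is stored as a list of strings, because an attribute may carry several values; converting the coordinates to numbers is left to the caller in this sketch.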

XML, JSON, and YAML

Now, let's compare the SML document with an XML document containing the same information (Listing 1).

Listing 1

<?xml version="1.0" encoding="UTF-8"?>
<PointOfInterest>
  <City>Seattle</City>
  <Name>Space Needle</Name>
  <GpsCoords lat="47.6205" long="-122.3493"/>
  <!-- Opening hours should go here -->
</PointOfInterest>

The first thing to note is that this is just one way to map the information from the SML example to XML. For example, the GPS coordinates could be represented as sub-elements instead of attributes, or as inner text separated by a special character and split into two components later. Which option is chosen depends on the developer’s preference. XML is a powerful markup language that can represent data in a structured way and, in the true sense of a markup language, format text. However, this example of a structured dataset shows that many more special characters are needed. Attribute values must be written in double quotes, and every closing tag must repeat the name of its opening tag. If we held a small typing competition without any special tools, the person typing SML would probably finish first. Besides the XML declaration in the first line and the comment syntax, XML’s main hurdles are topics such as namespaces and specification details. For example, are line breaks allowed in attributes? Can attributes be commented out? And can you write the syntax of a CDATA block from memory?
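The effect of the element-versus-attribute choice on the reading code can be sketched with Python's standard `xml.etree.ElementTree` module. The first document is Listing 1; the second is a hypothetical variant that stores the coordinates as inner text, and its semicolon separator is an assumption made for this illustration only. The function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 1: coordinates stored as attributes.
ATTRIBUTE_VARIANT = b"""<?xml version="1.0" encoding="UTF-8"?>
<PointOfInterest>
  <City>Seattle</City>
  <Name>Space Needle</Name>
  <GpsCoords lat="47.6205" long="-122.3493"/>
  <!-- Opening hours should go here -->
</PointOfInterest>"""

# Hypothetical variant: coordinates as inner text, separated by ';'.
INNER_TEXT_VARIANT = b"""<?xml version="1.0" encoding="UTF-8"?>
<PointOfInterest>
  <City>Seattle</City>
  <Name>Space Needle</Name>
  <GpsCoords>47.6205;-122.3493</GpsCoords>
</PointOfInterest>"""


def coords_from_attributes(root):
    """Read latitude and longitude from the lat/long attributes."""
    node = root.find("GpsCoords")
    return float(node.get("lat")), float(node.get("long"))


def coords_from_inner_text(root, separator=";"):
    """Read latitude and longitude by splitting the element's text."""
    lat, lon = root.findtext("GpsCoords").split(separator)
    return float(lat), float(lon)


coords_a = coords_from_attributes(ET.fromstring(ATTRIBUTE_VARIANT))
coords_b = coords_from_inner_text(ET.fromstring(INNER_TEXT_VARIANT))
print(coords_a, coords_b)
```

Both documents carry the same information, but each needs its own reading code, so the producer and the consumer have to agree on the mapping in advance.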

For comparison, let's consider another widely used standard: the JavaScript Object Notation. In JSON, our geodata example would look like this:

{ "City":	"Seattle",
  "Name":	"Space Needle",
  "GpsCoords":	[47.6205, -122.3493],
  "_comment": "Opening hours should go here" }

JSON is a simple data format that’s very popular, especially because of how well it works with JavaScript in the browser. It’s used for client-server communication, for custom data formats as an alternative to XML, as a configuration format, and more. It relies on double quotes, colons, commas, square brackets, curly brackets, and a few keywords to describe data structures and types. String values must be written in double quotes, and C-style escape sequences allow a JSON document to be written entirely on one line. A missing comma or a superfluous trailing comma leads to a syntax error. Comments, which can be written in JavaScript with // and /* */, aren’t allowed in the JSON standard. For serialization formats, this might not be a big deal. But for formats where you comment parts in and out, such as configuration files, this can be impractical and can lead to workarounds or alternative formats. These range from...
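The comma and comment rules described above can be checked with a strict parser. The following sketch uses Python's standard `json` module; the helper `is_valid_json` and the three sample strings are hypothetical and exist only for this illustration.

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


valid = '{ "City": "Seattle", "GpsCoords": [47.6205, -122.3493] }'
trailing_comma = '{ "City": "Seattle", "GpsCoords": [47.6205, -122.3493], }'
with_comment = '{ "City": "Seattle" // a JavaScript-style comment\n}'

for label, text in [
    ("valid", valid),
    ("trailing comma", trailing_comma),
    ("with comment", with_comment),
]:
    print(label, is_valid_json(text))
```

Only the first string is accepted. This is why the JSON example stores its comment as a regular "_comment" member instead.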

