Encoders

This section gives an overview of encoders, details on the encoders that ship with libxo, and documentation for developers of future encoders.

Overview

The libxo library contains software to generate four “built-in” formats: text, XML, JSON, and HTML. These formats are common and useful, but there are other common and useful formats that users will want, and including them all in the libxo software would be difficult and cumbersome.

To allow support for additional encodings, libxo includes a “pluggable” extension mechanism for dynamically loading new encoders. libxo-based applications can automatically use any installed encoder.

Use the “encoder=XXX” option to access encoders. The following example uses the “cbor” encoder, saving the output into a file:

df --libxo encoder=cbor > df-output.cbor

Encoders can support specific options that can be accessed by following the encoder name with a colon (“:”) and one or more options, separated by a plus sign (“+”):

df --libxo encoder=csv:path=filesystem+leaf=name+no-header

This example instructs libxo to load the “csv” encoder and pass the following options:

path=filesystem
leaf=name
no-header

Each of these options is interpreted by the encoder, and all such option names and semantics are specific to the particular encoder. Refer to the intended encoder for documentation on its options.
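
To make the syntax concrete, the following sketch splits such a string into the encoder name and its individual options. It is an illustration only, not libxo's own parser; the names demo_encoder_spec_t and demo_parse_encoder_spec, and the fixed size limits, are hypothetical.

```c
#include <string.h>

#define DEMO_MAX_OPTIONS 16

/* Hypothetical container for a parsed "name:opt+opt+..." string */
typedef struct demo_encoder_spec_s {
    char des_buf[256];          /* Private, modifiable copy of the input */
    const char *des_name;       /* Encoder name, e.g. "csv" */
    const char *des_options[DEMO_MAX_OPTIONS]; /* e.g. "no-header" */
    int des_count;              /* Number of options found */
} demo_encoder_spec_t;

/*
 * Split "csv:path=filesystem+leaf=name+no-header" into the encoder
 * name and its '+'-separated options.  Returns 0 on success, or -1 if
 * the input is too long or carries too many options.
 */
static int
demo_parse_encoder_spec (demo_encoder_spec_t *desp, const char *input)
{
    size_t len = strlen(input);

    if (len >= sizeof(desp->des_buf))
        return -1;

    memcpy(desp->des_buf, input, len + 1);
    desp->des_name = desp->des_buf;
    desp->des_count = 0;

    char *cp = strchr(desp->des_buf, ':');
    if (cp == NULL)
        return 0;               /* Name only; no options */

    *cp++ = '\0';               /* Terminate the name */

    while (cp != NULL && *cp != '\0') {
        char *next = strchr(cp, '+');
        if (next != NULL)
            *next++ = '\0';     /* Terminate this option */

        if (desp->des_count >= DEMO_MAX_OPTIONS)
            return -1;

        desp->des_options[desp->des_count++] = cp;
        cp = next;
    }

    return 0;
}
```

Each entry in des_options is still in “name=value” or bare “name” form; in this sketch, interpreting it further is left to the encoder, matching the division of labor described above.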

CSV - Comma Separated Values

libxo ships with a custom encoder for “CSV” files, a common format for comma separated values. The output of the CSV encoder can be loaded directly into spreadsheets or similar applications.

A standard for CSV files is provided in RFC 4180, but since the format predates that standard by decades, there are many minor differences in CSV file consumers and their expectations. The CSV encoder has a number of options to tailor output to those expectations.

Consider the following XML:

% list-items --libxo xml,pretty
<top>
  <data test="value">
    <item test2="value2">
      <sku test3="value3" key="key">GRO-000-415</sku>
      <name key="key">gum</name>
      <sold>1412</sold>
      <in-stock>54</in-stock>
      <on-order>10</on-order>
    </item>
    <item>
      <sku test3="value3" key="key">HRD-000-212</sku>
      <name key="key">rope</name>
      <sold>85</sold>
      <in-stock>4</in-stock>
      <on-order>2</on-order>
    </item>
    <item>
      <sku test3="value3" key="key">HRD-000-517</sku>
      <name key="key">ladder</name>
      <sold>0</sold>
      <in-stock>2</in-stock>
      <on-order>1</on-order>
    </item>
  </data>
</top>

This output is a list of instances (named “item”), each containing a set of leafs (“sku”, “name”, etc.).

The CSV encoder will emit the leaf values in this output as fields inside a CSV record, which is a line containing a set of comma-separated values:

% list-items --libxo encoder=csv
sku,name,sold,in-stock,on-order
GRO-000-415,gum,1412,54,10
HRD-000-212,rope,85,4,2
HRD-000-517,ladder,0,2,1

Because the CSV encoder looks for data instances, the --instance option is needed when the encoder is used with the “xo” utility:

% xo --libxo encoder=csv --instance foo 'The {:product} is {:status}\n' stereo "in route"
product,status
stereo,in route

The path Option

By default, the CSV encoder will attempt to emit any list instance generated by the application. In some cases, this may be unacceptable, and a specific list may be desired.

Use the “path” option to limit the processing of output to a specific hierarchy. The path should be one or more names of containers or lists.

For example, if the “list-items” application generates other lists, the user can give “path=top/data/item” as a path:

% list-items --libxo encoder=csv:path=top/data/item
sku,name,sold,in-stock,on-order
GRO-000-415,gum,1412,54,10
HRD-000-212,rope,85,4,2
HRD-000-517,ladder,0,2,1

Paths are “relative”, meaning they need not be a complete set of names to the list. This means that “path=item” may be sufficient for the above example.
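
One way to picture a relative path is as a match against the trailing names of the full hierarchy. The sketch below implements that interpretation; it is an assumption made for illustration, the encoder's actual matching rule may differ, and demo_path_matches is a hypothetical name.

```c
#include <string.h>

/*
 * Return 1 if "path" names the trailing components of "full", where
 * both are '/'-separated names, e.g. "data/item" matches
 * "top/data/item".  The match must begin at a component boundary, so
 * "ata/item" does not match.
 */
static int
demo_path_matches (const char *full, const char *path)
{
    size_t flen = strlen(full);
    size_t plen = strlen(path);

    if (plen == 0 || plen > flen)
        return 0;

    const char *tail = full + (flen - plen);
    if (strcmp(tail, path) != 0)
        return 0;

    /* Either the whole hierarchy matched or the tail follows a '/' */
    return tail == full || tail[-1] == '/';
}
```

Under this reading, “item”, “data/item”, and “top/data/item” all select the same list in the example output.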

The leafs Option

The CSV encoding requires that all lines of output have the same number of fields in the same order. In contrast, XML and JSON allow any order (though libxo forces key leafs to appear before other leafs).

To maintain a consistent set of fields inside the CSV file, the same set of leafs must be selected from each list item. By default, the CSV encoder records the set of leafs that appear in the first list instance it processes, and extracts only those leafs from future instances. If the first instance is missing a leaf that is desired by the consumer, the “leafs” option can be used to ensure that an empty value is recorded for instances that lack a particular leaf.

The “leafs” option can also be used to exclude leafs, limiting the output to only those leafs provided.

In addition, the order of the output fields follows the order in which the leafs are listed. “leafs=one.two” and “leafs=two.one” give distinct output.

So the “leafs” option can be used to expand, limit, and order the set of leafs.

The value of the leafs option should be one or more leaf names, separated by a period (“.”):

% list-items --libxo encoder=csv:leafs=sku.on-order
sku,on-order
GRO-000-415,10
HRD-000-212,2
HRD-000-517,1
% list-items --libxo encoder=csv:leafs=on-order.sku
on-order,sku
10,GRO-000-415
2,HRD-000-212
1,HRD-000-517
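
The selection and ordering behavior can be sketched as follows. This is an illustrative model, not the encoder's implementation: demo_leaf_t, demo_leaf_value, and demo_build_record are hypothetical names, and quoting of field values is omitted for brevity.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name/value pair standing in for one leaf of an instance */
typedef struct demo_leaf_s {
    const char *dl_name;
    const char *dl_value;
} demo_leaf_t;

/* Look up a leaf by name; a missing leaf yields an empty field */
static const char *
demo_leaf_value (const demo_leaf_t *leafs, int count,
                 const char *name, size_t namelen)
{
    for (int i = 0; i < count; i++) {
        if (strlen(leafs[i].dl_name) == namelen
            && memcmp(leafs[i].dl_name, name, namelen) == 0)
            return leafs[i].dl_value;
    }
    return "";
}

/*
 * Build one CSV record in "buf" from the leafs named in "spec"
 * (period-separated, e.g. "sku.on-order"), in the order listed.
 * Returns 0 on success, or -1 if the record does not fit.
 */
static int
demo_build_record (char *buf, size_t bufsiz, const char *spec,
                   const demo_leaf_t *leafs, int count)
{
    size_t used = 0;
    int first = 1;

    if (bufsiz == 0)
        return -1;
    buf[0] = '\0';

    for (const char *cp = spec; *cp != '\0'; ) {
        size_t namelen = strcspn(cp, ".");
        const char *value = demo_leaf_value(leafs, count, cp, namelen);

        int rc = snprintf(buf + used, bufsiz - used, "%s%s",
                          first ? "" : ",", value);
        if (rc < 0 || (size_t) rc >= bufsiz - used)
            return -1;
        used += (size_t) rc;
        first = 0;

        cp += namelen;
        if (*cp == '.')
            cp += 1;            /* Skip the separator */
    }

    return 0;
}
```

Because the record is driven by the option string rather than by the instance, every record has the same fields in the same order, and a leaf absent from an instance becomes an empty field.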

Note that libxo uses terminology from YANG (RFC 7950), the data modeling language for NETCONF (RFC 6241), which uses “leafs” as the plural form of “leaf”. libxo follows that convention.

The no-header Option

CSV files typically begin with a line that defines the fields included in that file, in an attempt to make the contents self-defining:

sku,name,sold,in-stock,on-order
GRO-000-415,gum,1412,54,10
HRD-000-212,rope,85,4,2
HRD-000-517,ladder,0,2,1

There is no reliable mechanism for determining whether this header line is included, so the consumer must make an assumption.

The CSV encoder defaults to producing the header line, but the “no-header” option can be included to suppress it.

The no-quotes Option

RFC 4180 specifies that fields containing commas, double quotes, or line breaks should be enclosed in double quotes, but many CSV consumers do not handle quotes. The “no-quotes” option instructs the CSV encoder to avoid the use of quotes.
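
For reference, the quoting rule in RFC 4180 can be sketched as below: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and each embedded double quote is doubled. The function name demo_quote_field is hypothetical, and the exact conditions under which the CSV encoder itself quotes a field are not shown here.

```c
#include <string.h>

/*
 * Copy "value" into "buf" as one CSV field.  Following RFC 4180, a
 * field containing a comma, double quote, CR, or LF is wrapped in
 * double quotes, and each embedded double quote is doubled.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the field does not fit.
 */
static int
demo_quote_field (char *buf, size_t bufsiz, const char *value)
{
    int need_quotes = (strpbrk(value, ",\"\r\n") != NULL);
    size_t used = 0;

    if (bufsiz == 0)
        return -1;

    if (need_quotes) {
        if (used + 1 >= bufsiz)
            return -1;
        buf[used++] = '"';      /* Opening quote */
    }

    for (const char *cp = value; *cp != '\0'; cp++) {
        size_t need = (*cp == '"') ? 2 : 1;

        if (used + need >= bufsiz)
            return -1;
        if (need == 2)
            buf[used++] = '"';  /* Double each embedded quote */
        buf[used++] = *cp;
    }

    if (need_quotes) {
        if (used + 1 >= bufsiz)
            return -1;
        buf[used++] = '"';      /* Closing quote */
    }

    buf[used] = '\0';
    return (int) used;
}
```

A consumer that cannot parse quotes would see the quote characters as data, which is the situation the “no-quotes” option is meant to avoid.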

The dos Option

RFC 4180 defines the end-of-line marker as a carriage return followed by a newline. This CRLF convention dates from the distant past, but its use was anchored in the 1980s by the DOS operating system.

The CSV encoder defaults to using the standard Unix end-of-line marker, a simple newline. Use the “dos” option to use the CRLF convention.
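
The difference between the two conventions is only the bytes that terminate each record, as this small sketch shows. DEMO_FLAG_DOS, demo_eol, and demo_end_record are hypothetical names used for illustration.

```c
#include <stdio.h>

/* Hypothetical flag mirroring the "dos" option */
#define DEMO_FLAG_DOS (1 << 0)

/* Select the end-of-line marker: CRLF for "dos", else a bare newline */
static const char *
demo_eol (unsigned flags)
{
    return (flags & DEMO_FLAG_DOS) ? "\r\n" : "\n";
}

/*
 * Append the end-of-line marker to "record", writing the result into
 * "buf".  Returns the number of bytes written, or -1 if it does not fit.
 */
static int
demo_end_record (char *buf, size_t bufsiz, const char *record,
                 unsigned flags)
{
    int rc = snprintf(buf, bufsiz, "%s%s", record, demo_eol(flags));

    return (rc < 0 || (size_t) rc >= bufsiz) ? -1 : rc;
}
```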

The Encoder API

The encoder API consists of three distinct phases:

  • loading the encoder
  • initializing the encoder
  • feeding operations to the encoder

To load the encoder, libxo will open a shared library named:

${prefix}/lib/libxo/encoder/${name}.enc

This file is typically a symbolic link to a dynamic library, suitable for dlopen(). libxo looks for a symbol called xo_encoder_library_init inside that library and calls it with the arguments defined in the header file “xo_encoder.h”. This function should look as follows:

int
xo_encoder_library_init (XO_ENCODER_INIT_ARGS)
{
    arg->xei_version = XO_ENCODER_VERSION;
    arg->xei_handler = test_handler;

    return 0;
}

Several features here allow for future compatibility: the macro XO_ENCODER_INIT_ARGS allows the arguments to this function to change over time, and XO_ENCODER_VERSION allows the library to tell libxo which version of the API it was compiled with.

The function placed in xei_handler should have the signature:

static int
test_handler (XO_ENCODER_HANDLER_ARGS)
{
     ...
}

This function will be called with the “op” codes defined in “xo_encoder.h”. Each op code represents a distinct event in the libxo processing model. For example, XO_OP_OPEN_CONTAINER tells the encoder that a new container has been opened, and the encoder can behave in an appropriate manner.
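
The general shape of such a handler is a switch on the op code. The sketch below models that shape with stand-in definitions so that it compiles without libxo: the demo_op_t values, the demo_state_t structure, and the argument list of demo_handler are all hypothetical, and a real encoder would instead use the op codes and the XO_ENCODER_HANDLER_ARGS macro from “xo_encoder.h”.

```c
#include <stdio.h>

/*
 * Stand-ins for illustration only: the real op codes, argument list,
 * and types come from "xo_encoder.h" and differ from these.
 */
typedef enum demo_op_e {
    DEMO_OP_OPEN_CONTAINER,
    DEMO_OP_CLOSE_CONTAINER,
    DEMO_OP_STRING,
} demo_op_t;

/* Hypothetical per-encoder state carried between calls */
typedef struct demo_state_s {
    int ds_depth;               /* Current nesting depth */
    int ds_leafs;               /* Number of leaf values seen */
} demo_state_t;

/* Handle one event; returns 0 on success, -1 on an unexpected event */
static int
demo_handler (demo_state_t *dsp, demo_op_t op,
              const char *name, const char *value)
{
    switch (op) {
    case DEMO_OP_OPEN_CONTAINER:
        printf("%*sopen %s\n", dsp->ds_depth * 2, "", name);
        dsp->ds_depth += 1;
        break;

    case DEMO_OP_CLOSE_CONTAINER:
        if (dsp->ds_depth == 0)
            return -1;          /* Close without a matching open */
        dsp->ds_depth -= 1;
        printf("%*sclose %s\n", dsp->ds_depth * 2, "", name);
        break;

    case DEMO_OP_STRING:
        printf("%*s%s = %s\n", dsp->ds_depth * 2, "", name, value);
        dsp->ds_leafs += 1;
        break;

    default:
        return -1;              /* Unknown op code */
    }

    return 0;
}
```

Keeping all state in a structure passed to each call, rather than in globals, lets one handler serve several output streams at once; this is a design choice of the sketch, not a statement about how libxo manages encoder state.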