The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

tab2xml: XML Generation from Tables

Introduction

XML files are commonly used to configure various tools and applications. While there are dedicated tools for generating XML files through structured input forms, in many cases, it is beneficial to organize data in tables linked by primary and foreign keys, especially when dealing with large datasets or multiple instances of the same entity.

The tab2xml package provides a solution by enabling the generation of XML files based on predefined templates and structured tabular data stored in Excel (.xlsx) or OpenDocument Spreadsheet (.ods) files. By leveraging relational tables, users can efficiently handle and manage large-scale data entry, ensuring consistency and ease of modification.

This approach simplifies XML creation, making it more accessible and scalable when working with complex data structures.

Content

In addition to this introduction, the document includes:

Example: Mondrian Schema

The starting point for this example is the Mondrian schema, based on a schema example from the Mondrian Documentation, shown below.

<Schema>
    <Cube name="Sales">
        <Table name="sales_fact_1997" />
        <Dimension name="Gender" foreignKey="customer_id">
            <Hierarchy name="Gender" hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
                <Table name="customer" />
                <Level name="Gender" column="gender" type="String" uniqueMembers="true" />
            </Hierarchy>
        </Dimension>
        <Dimension name="Time" foreignKey="time_id">
            <Hierarchy name="Time" hasAll="false"  primaryKey="time_id">
                <Table name="time_by_day" />
                <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true" />
                <Level name="Quarter" column="quarter" type="Numeric" uniqueMembers="false" />
                <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false" />
            </Hierarchy>
        </Dimension>
        <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###" />
        <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##" />
        <Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00" />
        <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
            <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00" />
        </CalculatedMember>
    </Cube>
</Schema>

To define this schema, we can either edit the XML file directly or use the Pentaho Schema Workbench tool. Another approach is to break it down into tables and templates, work with the tables, and generate the required versions directly, as shown below.

Scheme decomposition

Next, we will decompose the content of this schema into tables, defined as sheets in the spreadsheets files (.xlsx or .ods) included in the package, and templates.

Root of the schema template: schema_template.xml

Below is the root of the schema template.

<Schema>
    {Cube}
</Schema>

The root can contain only one token (in this case, the {Cube} token). If multiple tokens or none are found, an error is raised.

In the example, a schema consists of one or more cubes.

Starting from the root, each element must have an associated sheet and a template file, where the sheet’s columns or other components are included.

Cube

Only one cube is defined in the schema; therefore, the table contains only one row.

In this case, a cube has an associated table. However, since other elements can also have associated tables, we add the foreign key to the cube (as well as to each element with an associated table).

cube_pk name table_fk
1 Sales 1

Regarding the template, we can include two types of tokens:

    <Cube name="{name}">
        {Table}
        {Dimension}
        {Measure}
        {CalculatedMember}
    </Cube>

When including tokens that reference other sheets, each token must be placed on a separate line, as they will be replaced with templates.

Table

Since the tables are referenced by other elements, we define a primary key with values to allow referencing (as we did from the cube).

table_pk name
1 sales_fact_1997
2 customer
3 time_by_day

The template only needs to include the table name using the specified syntax.

                <Table name="{name}" />

Dimension

Each dimension is included in a cube and can also contain other elements that reference it.

dimension_pk name foreignKey cube_fk
1 Gender customer_id 1
2 Time time_id 1

In the template, the sheet’s columns and the containing element (which includes one or more hierarchies) are referenced.

        <Dimension name="{name}" foreignKey="{foreignKey}">
            {Hierarchy}
        </Dimension>

Hierarchy

Hierarchies reference both the dimensions that contain them and their associated tables.

hierarchy_pk name hasAll allMemberName primaryKey dimension_fk table_fk
1 Gender true allMemberName=“All Genders” customer_id 1 2
2 Time false NA time_id 2 3

In the template, we can see that it includes the table and the levels that make up the hierarchy.

            <Hierarchy name="{name}" hasAll="{hasAll}" {allMemberName} primaryKey="{primaryKey}">
                {Table}
                {Level}
            </Hierarchy>

Level

Levels do not need to be referenced, so they do not include a primary key. Instead, they reference the hierarchies they belong to.

name column type uniqueMembers hierarchy_fk
Gender gender String true 1
Year the_year Numeric true 2
Quarter quarter Numeric false 2
Month month_of_year Numeric false 2

The template references the columns of the associated sheet.

                <Level name="{name}" column="{column}" type="{type}" uniqueMembers="{uniqueMembers}" />

Measure

Measures are included in cubes and must reference them. Since they do not need to be referenced, they do not require a primary key.

name column aggregator formatString cube_fk
Unit Sales unit_sales sum #,### 1
Store Sales store_sales sum #,###.## 1
Store Cost store_cost sum #,###.00 1

The template includes the columns from the sheet.

        <Measure name="{name}" column="{column}" aggregator="{aggregator}" formatString="{formatString}" />

CalculatedMember

Calculated members may have components that must be referenceable (they have a primary key) and are included in cubes.

calculatedmember_pk name dimension formula cube_fk
1 Profit Measures [Measures].[Store Sales] - [Measures].[Store Cost] 1

The template shows the elements they contain.

        <CalculatedMember name="{name}" dimension="{dimension}" formula="{formula}">
            {CalculatedMemberProperty}
        </CalculatedMember>

CalculatedMemberProperty

Finally, we have the properties of the calculated members they reference.

name value calculatedmember_fk
FORMAT_STRING $#,##0.00 1

The template uses the columns from the corresponding sheet.

            <CalculatedMemberProperty name="{name}" value="{value}" />
            

Generate the Schema

To generate the schema, tokens in each template are replaced with corresponding values from a data source.

It handles both direct replacements from the associated sheet and foreign key relationships.

library(tab2xml)

# Define file paths
source_xml <- system.file("extdata", "schema_template.xml", package = "tab2xml")
source_xlsx <- system.file("extdata", "schema.xlsx", package = "tab2xml")
temp_file <- tempfile(fileext = ".xml")

# Convert spreadsheet to XML
file <- sheet2xml(source_xlsx, source_xml, temp_file)

As shown in the example, we only need to specify:

The resulting output is shown below.

library(xml2)

xml_content <- readLines(file, warn = FALSE)

cat("```xml\n", paste(xml_content, collapse = "\n"), "\n```", sep = "")
<Schema>
    <Cube name="Sales">
                <Table name="sales_fact_1997" />
        <Dimension name="Gender" foreignKey="customer_id">
            <Hierarchy name="Gender" hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
                <Table name="customer" />
                <Level name="Gender" column="gender" type="String" uniqueMembers="true" />
            </Hierarchy>
        </Dimension>
        <Dimension name="Time" foreignKey="time_id">
            <Hierarchy name="Time" hasAll="false"  primaryKey="time_id">
                <Table name="time_by_day" />
                <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true" />
                <Level name="Quarter" column="quarter" type="Numeric" uniqueMembers="false" />
                <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false" />
            </Hierarchy>
        </Dimension>
        <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###" />
        <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##" />
        <Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00" />
        <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
            <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00" />
            
        </CalculatedMember>
    </Cube>
</Schema>

Conclusions

The tab2xml package offers an efficient and scalable method for generating XML files from tabular data. By using structured templates and relational tables, the package facilitates the management of complex datasets while maintaining consistency and accuracy.

This approach is particularly advantageous for scenarios requiring the generation of large XML files or handling multiple entities with defined relationships. It streamlines the transformation process, reduces manual errors, and enhances maintainability by allowing users to modify the underlying data without altering the XML structure.

The provided example, based on the Mondrian schema, demonstrates how to apply this methodology to real-world use cases. This framework can be extended to other domains requiring XML generation from relational data, offering a flexible and robust solution for data-driven XML creation.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.