Structural Metadata

The purpose of this section is to make the reader familiar with the major constructs in the SDMX Information Model and how these are used to support the processes required for a data dissemination system.

The objective of this section is to enable both managers and technicians to understand how the Fusion Matrix uses a metadata-driven approach to data dissemination. For technicians it is essential to understand the basic structures in the SDMX Information Model and how they are used, since a basic understanding of the SDMX Information Model is assumed in the other parts of the user guide.

Data Dissemination Processes

The processes covered in this guide are shown in the table below, together with the fundamental Information Model constructs that support them. The role that these constructs play may differ from process to process, and this is explained in more detail later.

Process | Constructs | Role
------- | ---------- | ----
Create database tables | Data Structure Definition and related Concepts and Code lists; Dataflow | Create tables and columns based on the data structure for the Dataflow.
Load database | Data Structure Definition and related Concepts and Code lists; Dataflow and related (valid) Content Constraint | Validate data. Load data.
Data discovery | Category Scheme; Concepts; Data Providers; Dataflow | Enable high-level data discovery, allowing the user to drill down to the broad topic of data (Dataflow) of interest.
Query data sources | Data Structure Definition and related Concepts and Code lists; Data Source Index | Enable the building of search criteria that will bring back the data required.
Single user interface to multiple data sources | Dataflow; Provision Agreement; Registration; Data Source Index; Data Provider | Management of a data portal.
Attach metadata to data points | Metadata Structure Definition; Metadata Set | Unite the metadata in the Metadata Set with the data points or structural metadata points to which the metadata relate.

Metadata Driven Dissemination

Scope

There is much talk of metadata-driven processes in the context of the "industrialisation of statistics". The industrialisation of the processes concerned with data dissemination lends itself to being driven by metadata. However, a prerequisite of a metadata-driven system is a metadata model. The Fusion Matrix uses the SDMX Information Model which, for data validation and data dissemination, is functionally a superset of the equivalent part of GSIM (Generic Statistical Information Model).

There are four types of metadata recognised by the SDMX Information Model:

Structural metadata

Metadata describing and supporting data collection, data dissemination, and data discovery. In other words, the metadata content of the constructs in the SDMX Information Model such as a specific DSD, Dataflow, Concept Scheme, Code List, Category scheme, Provision Agreements, Constraint, etc.

(Data) Attributes

These metadata are contained in a data set and the Attribute is defined as part of the DSD. The metadata describe important aspects of the data, such as the measurement unit used, but do not play any role in identifying the observation value (i.e. they do not play a “classificatory” role and are therefore not Dimensions). Attributes can also relate to an observation value (e.g. estimate, not disseminated due to confidentiality).

Process metadata

Metadata that describe processes, the process flow, and the interaction with the structural metadata they use. This type of metadata is intended to drive processes, but it is at an early stage of development in the SDMX-IM and there is no known use of it for driving processes. Process metadata is not used by the Fusion Matrix.

Reference metadata

This covers all other metadata that cannot be described as structural or process metadata. Examples of this type of metadata are quality frameworks and “footnotes”. Typical use of this metadata is to explain and describe in more detail some aspect of the data provenance (e.g. surveys used), a particular Dimension value (e.g. the effect on the data resulting from the re-unification of Germany), or an observation value.

Reference metadata often relate to data either directly (e.g. a Dimension value, an Observation value) or indirectly (e.g. metadata relating to a Concept, a Code, a Dataflow). The important thing to understand about Reference metadata is that they are often collected using different processes and at different times and frequencies from data collection. These metadata are often stored and maintained independently of the data.

The SDMX Information Model (SDMX-IM) for Structural Metadata

The key parts of the SDMX-IM used by the Fusion Matrix and the broad role they play are shown in the diagram below.

Core metadata supporting data dissemination processes

The role that each of these metadata constructs plays in a data dissemination system is described below in this guide. As can be seen, most of the metadata play more than one role; e.g. a Dataflow and Provision Agreement can be used both to load the dataset into a dissemination database and to aid data discovery and query. In reality, as you read the sections below, you will discover that most of the constructs shown play a role in data discovery and query.

Loading Data

In order to load data into the database the table structures need to be specified. The Fusion Matrix automatically creates a table structure for each Dataflow and uses the Data Structure Definition (DSD) to do this. So there is no need for the database administrator to create the database tables: simply load a data set and these will be created automatically.

Note that it is possible to map the data set to be loaded to a structure that is different from the structure of the input data set. In this case it will be the mapped structure that is used to create the database tables.

As all of the Codelists and Concepts are held separately from the data in the database (copied from, and kept in synchrony with, the Fusion Registry), the data table structures are very simple, which makes loading and querying extremely performant.
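
As a rough sketch of what such automatic table creation can look like (the actual schema used by the Fusion Matrix is internal to the product, and the code below is illustrative only), a CREATE TABLE statement can be derived directly from the Dimension ids of the DSD:

    import java.util.List;

    public class TableBuilder {

        // Builds a CREATE TABLE statement for a Dataflow from the Dimension
        // ids of its DSD. A sketch only: the real Fusion Matrix schema is
        // internal to the product.
        public static String createTableSql(String dataflowId, List<String> dimensionIds) {
            StringBuilder sql = new StringBuilder("CREATE TABLE " + dataflowId + " (");
            for (String dim : dimensionIds) {
                // Coded components store the short code id, not the label
                sql.append(dim).append(" VARCHAR(50) NOT NULL, ");
            }
            sql.append("TIME_PERIOD VARCHAR(20) NOT NULL, ");
            sql.append("OBS_VALUE VARCHAR(50), ");
            sql.append("PRIMARY KEY (").append(String.join(", ", dimensionIds));
            sql.append(", TIME_PERIOD))");
            return sql.toString();
        }

        public static void main(String[] args) {
            // Hypothetical Dataflow with four Dimensions plus time
            System.out.println(createTableSql("POPULATION",
                    List.of("FREQ", "REF_AREA", "SEX", "AGE")));
        }
    }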

Mapping Data

A useful and popular feature of the Fusion Matrix is the ability to map a data set to a structure that is different from that used by the data set. The map may be to fewer or more Dimensions, and/or to different coding schemes. For example, it is possible to map a structure that has, effectively, a single “series” Dimension to a structure that breaks this down into multiple Dimensions. This is quite a common map to create, as a multi-dimensional data set is often easier to query in order to obtain the data required: a single multi-dimensional query would equate to querying for multiple series.

Data Validation

The Fusion Matrix validates the data as part of the load process and the validation process undertaken is driven by the metadata constructs shown below.

Core metadata required for data validation

The Data Structure Definition (DSD) is the fundamental structure in the SDMX-IM concerning data: it defines the valid content of a data set in terms of its dimensionality, variables, concepts, and the valid content for the variables (e.g. a code list or other data type). Note that whilst the SDMX-IM supports variables, there is no explicit construct called a variable in the model, so this term is not used further in this guide unless required in a specific context: in the DSD the Dimension, Data Attribute (called Attribute in this guide), and Measure play the role of the variable, as these constructs join a Concept to its representation.

A more granular view of the structure of DSD is shown below.

Schematic showing the components of the Data Structure Definition

There are three types of construct in the DSD: Dimension, Attribute, and Measure. Each of these combines a Concept with its representation (this can be either a reference to a Codelist or a non-coded data type such as “integer”, “string”, or date/time).

The role of the three types of construct (Dimension, Attribute, and Measure) is as follows:

A Dimension is an Identifying Component, often referred to as a Classificatory Variable. When a value is given to each of the Dimensions in a data set (this is often called a “key” or a “series”), the resulting key, when combined with a time value, uniquely identifies an observation. The DSD construct that specifies the Concept and expected representation of an observation is called a Measure. Additional metadata that are useful for understanding or processing the observed value are called Attributes in the DSD. Examples of an Attribute are a note on the observation, a confidentiality status, or the unit of measure used.

As can be seen from the above image, the Dataflow references a DSD and this DSD contains nearly all of the metadata required to validate the contents of a data set.

Shown below is an example of Census data and how the values in the data set can be validated against the allowed values specified in the DSD.

Metadata used to validate a Data Set

This diagram shows the relationship between the Dimension concepts identified in a single series in a data set and the contents of the DSD that defines the Dimensions and allowable values. Whilst the data set snippet shows only the Dimensions, the Attributes and Measure have a similar relationship. Each series in a data set must define a unique combination of dimension values, and this identifying combination is referred to as the “series key”.

The DSD enables the following validation to be performed on the data set:

  • The reported concepts used are valid, and all the expected concepts are present
  • The role of the concept is valid (i.e. Dimension, Attribute, Measure)
  • The value reported for the Concept is valid (e.g. AGE has a value of Y_15-29 which is contained in the code list CL_AGE)

Note that for a coded Dimension/Attribute/Measure the data set contains the code id, and not the code label.
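
A minimal sketch of these checks, assuming the DSD content has been reduced to simple lookup maps (the types and method here are hypothetical, not the Fusion Matrix API):

    import java.util.Map;
    import java.util.Set;

    public class DsdValidator {

        // Validates one reported component value against the DSD content,
        // reduced here to two lookup maps: component id -> role, and
        // component id -> allowed code ids (absent for non-coded components)
        public static void validate(String conceptId, String reportedValue,
                                    Map<String, String> componentRoles,
                                    Map<String, Set<String>> codelists) {
            // 1. The reported concept must be defined in the DSD
            if (!componentRoles.containsKey(conceptId)) {
                throw new IllegalArgumentException("Unknown concept: " + conceptId);
            }
            // 2. The role (Dimension, Attribute, or Measure) comes from the DSD
            String role = componentRoles.get(conceptId);
            // 3. For a coded component the reported code id must be in the code list
            Set<String> allowedCodes = codelists.get(conceptId);
            if (allowedCodes != null && !allowedCodes.contains(reportedValue)) {
                throw new IllegalArgumentException(role + " " + conceptId
                        + " has value '" + reportedValue + "' which is not in its code list");
            }
        }

        public static void main(String[] args) {
            // AGE reported as Y_15-29, checked against the CL_AGE codes
            validate("AGE", "Y_15-29",
                    Map.of("AGE", "DIMENSION"),
                    Map.of("AGE", Set.of("Y_0-14", "Y_15-29", "Y_30-44")));
        }
    }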

The Dataflow is a pivotal construct in the SDMX-IM: it is the construct against which data are both reported and disseminated. It makes use of the structural information defined by the DSD, but enables further restrictions to be specified on the allowable content.

  • It links to a DSD that defines the valid structure and content for a data set.
  • A Content Constraint can be attached to a Dataflow to further constrain the permissible values for the Dimensions/Attributes/Measures of the DSD.
  • It can partition the data that are described by a single DSD into manageable data sets that aid data discovery. This is achieved by restricting the allowable content of any data set (therefore creating a “Cube” definition) and giving the Dataflow a meaningful identity, such as: Labor Force employment by sex, age, and occupation; Labor Force employment by education, sex, age.
  • In data discovery it can link to a Category Scheme containing a list of topics such as Education, Labor, Population, Migration, Economy, Health, Development.

The Dataflow links to the Provision Agreement.

The Provision Agreement contains information about the supply of data by one Data Provider for one Dataflow. In a data dissemination environment it contains a link in the form of a Data Registration to the data source of the Data Provider, how the source is accessed, and the content of the Data Source in the form of a Data Source Index that lists the series keys present in the data source.

Provision Agreement and related metadata

The data are loaded in the Fusion Matrix database at the level of the Provision Agreement. The load process uses this part of the model to:

  • Identify the Dataflow and thus the DSD. The data are stored at the level of the Dataflow. If there is not a set of tables for the dataflow then the Fusion Matrix will automatically create the tables based on the Dimensionality specified in the DSD.
  • Create a Data Source Index of the series keys in the data set loaded. The index is held at the level of the Dataflow. Note that the index is not a construct in the SDMX Information Model and so this metadata is not held in the Registry: it is held only in the Fusion Matrix.

Disseminating Data

Data dissemination processes

Data Discovery

Data discovery can be achieved in three ways: filtering the search by Data Provider, by Category, or by Concept. Note that for each option the search is for one or more Dataflows that match the filter criteria. The filter options selected in each filter type will be combined; e.g. the filter may have been by Data Provider and then by Category, in which case the Dataflows returned will reflect the selection in both filter types.

The examples below are taken from the demonstration dissemination service of the Fusion Matrix (http://demo.metadatatechnology.com/FusionMatrix/).

Filter by Data Provider

Data discovery filtering by Data Provider

This uses the following constructs in the Information Model.

Information Model support for filtering by Data Provider

Each Provision Agreement represents one Dataflow linked to one Data Provider.

Filter by Category

Data discovery filtering by Category

Here two Categories have been chosen, which has resulted in three Dataflows that meet the selection criteria.

This uses the following constructs in the Information Model.

Information Model support for filtering by Category

Note that each Dataflow can be connected to one or more Categories, and any one Category can be connected to zero or more Dataflows. Only the Categories that are connected to Dataflows are shown in the Fusion Matrix GUI, which ensures that data discovery will always result in at least one Dataflow with data.

Filter by Concepts

Data discovery filtering by Concept

Here the Dataflows matching any one of the filter Concepts are shown. Each Dataflow displays all of its relevant Dimensions, and any Dimension matching the filter is highlighted.

This uses the following constructs in the Information Model.

Information Model support for filtering by Concept

Querying Data Sources

Making a query selection

This uses the following constructs in the Information Model.

Information Model support for data selection

The DSD and associated Code Lists (e.g. sex code list) and Concepts (e.g. Sex) supply the information for the Filters picker on the side bar.

The Dataflow supplies the information for the topic, e.g. Population.

The Data Source Index for the Dataflow contains all of the series keys. It is used both to calculate the number of series selected based on the current selection in the Filters picker, and to grey out codes for which there is no data based on the current selection in all of the Dimensions. In the example data set, a selection of “Native-born” and “Unknown place of birth” for the Place of birth Dimension will not result in any additional data being returned. This is important as it prevents the user querying for data that does not exist.
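
The following sketch illustrates the idea, assuming the index is held as a simple collection of series keys (the physical storage in the Fusion Matrix is more compact). Codes absent from the returned set are greyed out, and the size of the matching set of keys gives the series count for the current selection:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class SeriesKeyIndex {

        // The index as a simple collection of series keys, each key being a
        // map of dimension id to code id (one value per dimension for brevity)
        private final List<Map<String, String>> seriesKeys = new ArrayList<>();

        public void addSeries(Map<String, String> key) {
            seriesKeys.add(key);
        }

        // Returns the codes of the given dimension that still match the
        // current partial selection; any code not returned is greyed out
        public Set<String> validCodes(String dimensionId, Map<String, String> selection) {
            Set<String> valid = new HashSet<>();
            for (Map<String, String> key : seriesKeys) {
                boolean matches = true;
                for (Map.Entry<String, String> selected : selection.entrySet()) {
                    if (!selected.getValue().equals(key.get(selected.getKey()))) {
                        matches = false;
                        break;
                    }
                }
                if (matches) {
                    valid.add(key.get(dimensionId));
                }
            }
            return valid;
        }
    }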

Note that whilst the (non-SDMX) Data Source Index and the (SDMX) Content Constraint both contain the list of series keys and/or the list of Dimension codes that are in the data source, the two are constructed differently. The Content Constraint is quite verbose, whilst the index is terse and stored in a form that directly supports the processes that need to use it. Consequently the index is not held in the Registry but only in the Fusion Matrix. It is not an artefact in the SDMX Information Model, but it contains the same information and does the same job as the Content Constraint.

For the data table the same metadata are used to build and pivot the table.

Using a Hierarchical Code List

A simple hierarchy of codes can be created in a Code List and this hierarchy can be visualised to assist in creating a data selection.

SDMX Code List showing a simple hierarchy from the NUTS (geography) classification for the UK

The principal restrictions on the hierarchy in a Code List are:

  • there can be only one hierarchy, though there is no limit to the number of levels in the hierarchy
  • a code can have only one parent
  • the levels cannot be named

The SDMX Information Model also has a construct called the Hierarchical Code List (HCL), which removes these restrictions. The principal features of the HCL are:

  • multiple hierarchies to be built
  • a code can be used in more than one hierarchy
  • the hierarchies can be built from multiple Code Lists thus allowing codes to be introduced such as grouping codes (e.g. continents or economic communities that group countries taken from a geography code list)

The Information Model for the HCL is shown schematically below.

Hierarchical Code List

The HCL can have one or more Hierarchies, each comprising Levels (optional) and Hierarchical Codes. The Hierarchical Code can be linked to a Level in the Hierarchy. Note that the Level is used to give semantic information about the level and does not control the physical hierarchy, which is specified by the nesting of the Hierarchical Codes.

The Hierarchical Code references a Code in a Code List. This Code List can contain a flat list of Codes or a simple hierarchy of Codes. The Hierarchy in the HCL need not reflect the hierarchy in the Code List, as the Hierarchical Code references the Code in the context of its position in the hierarchy of Hierarchical Codes.

The HCL can be an extremely useful construct in a data dissemination system as it can introduce grouping codes and hierarchies that aid data discovery and data query. Importantly, an HCL can be built by organisations that do not maintain the Code Lists used by the DSDs which specify the structure of the datasets. In other words, HCLs can be added to the dissemination system without interfering with or changing existing structural metadata.

However, whilst the HCL is a part of the SDMX Information Model there is no standard way of relating the HCL to a construct such as a Dimension or Dataflow and it is left to individual systems to make and use these links.

The Fusion Matrix uses an Annotation to create the link. It supports the Annotation in two places:

Dataflow
Linking an HCL to a Dataflow by means of an Annotation

Here the Annotation Title is used to specify the SDMX URN of the HCL to be used for the Dataflow, and the Annotation Type identifies the Dimension for which the HCL is to be used.
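
A sketch of how a consuming system might resolve such a link, assuming a simple annotation holder with title and type accessors (illustrative names, not the actual API):

    import java.util.List;

    public class HclLinkResolver {

        // Illustrative annotation holder; a real system would use the
        // annotation type of its SDMX library
        record Annotation(String title, String type) {}

        // Returns the URN of the HCL linked to the given Dimension on the
        // Dataflow, or null when no Annotation makes such a link
        public static String hclUrnForDimension(List<Annotation> dataflowAnnotations,
                                                String dimensionId) {
            for (Annotation annotation : dataflowAnnotations) {
                // The Annotation Type names the Dimension; the Annotation
                // Title carries the SDMX URN of the HCL
                if (dimensionId.equals(annotation.type())) {
                    return annotation.title();
                }
            }
            return null;
        }
    }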

The use of the SDMX constructs is shown below.

Showing part of an HCL where codes are taken from two code lists (Fusion Registry maintenance view)

An example of displaying this HCL in a dissemination system is shown below.

Dimension selection in the Fusion Matrix GUI using the HCL

Note that the codes taken from the Topic codelist are not selectable for query as they do not exist in the DSD and consequently there are no data series. The Hierarchical Code is used solely for grouping the codes in the Series Code List.

Connecting to Multiple Data Sources

In a data dissemination environment the ability to link to many sources enables the development of a data portal. In the data portal the user will have access to data from many sources: accessed via the same GUI, visualised, concatenated (if the data are structured in the same way), and downloaded, all from a single interface.

This is achieved by using the Registration feature of the Registry. The artefacts that support the building of a data portal are:

Information Model support for data portal

The key constructs in the SDMX Information Model that support the building of a data portal are the registered data source (called a Registration in the SDMX model) and the Data Source Index.

The choreography of using the constructs in the diagram above is as follows.

  1. The user makes queries starting with the Dataflow. In a portal environment the data source for the Dataflow can be local (in the local Fusion Matrix) or external (either within the organisation or outside it).
  2. The Dataflow can link to one or more Provision Agreements. Each Provision Agreement can have zero to many Registrations, each one containing a URL from which the data can be retrieved (this can be a file at a web-accessible location or an SDMX web service).
  3. A Data Source Index is built for each registered data source by querying the registered data source for all the available series at that source.
  4. The Data Source Indexes for each Registration are rolled up to form a single global index per Dataflow. The global index is used to assist in data query creation by greying out invalid values that will not return data based on the current selections made, and by reporting on the number of series per query.
  5. On data query, the Data Source Index for each registered data source for the Dataflow is used to determine which data sources contain the data selected. This prevents data sources being queried if they do not contain data pertinent to the query. If multiple data sources contain data for the query, then each source is queried in parallel, and the resulting data sets are concatenated into a single data set in the format requested by the client (a sketch of this fan-out follows the list).
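
The following sketch illustrates the fan-out in step 5, assuming a hypothetical DataSource interface; it is not the Fusion Matrix implementation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PortalQuery {

        // Hypothetical view of a registered data source
        interface DataSource {
            boolean containsKeysFor(String query);   // answered from the Data Source Index
            List<String> query(String query);        // returns the matching series
        }

        public static List<String> queryAll(List<DataSource> sources, String query)
                throws InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<List<String>>> pending = new ArrayList<>();
            for (DataSource source : sources) {
                // Skip sources whose index shows no data for this query
                if (source.containsKeysFor(query)) {
                    pending.add(pool.submit(() -> source.query(query)));
                }
            }
            // Concatenate the parallel results into a single data set
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                combined.addAll(future.get());
            }
            pool.shutdown();
            return combined;
        }
    }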

Reference Metadata

Scope of this Section

This section is concerned with reference metadata: its scope, structure, and uses, and relates these to the data dissemination facilities of the Fusion Matrix.

From a systems perspective the data set does not reference the metadata but the metadata identifies the data to which it relates. Therefore, one of the jobs of the data dissemination system is to unite the data and the metadata so that the user has access to both.

Reference Metadata Structure

The Information Model for Reference metadata is shown below:

Information Model support for Reference metadata

Reference metadata can cover many topics, and so a (metadata) Report Structure identifies the Metadata Attributes that will contain the metadata when reported in a Metadata Set. This is similar to the Attribute construct in the DSD, but in the Metadata Structure Definition (MSD) the Metadata Attributes can be built into a hierarchy. Like the Attribute, the Metadata Attribute can be coded, in which case a Code List is referenced to specify the valid content, or it can be text, HTML markup, or a number of other types of representation. Again, like the (data) Attribute, the Metadata Attribute must reference a Concept, as this gives it semantic meaning.

When the metadata are authored, reported, or exchanged they can only convey their true meaning if they relate to the construct or constructs to which they pertain. Therefore, the MSD also specifies the type of construct to which metadata can relate (the Metadata Target). This can be an Identifiable construct such as a Concept, Code, Code List, or Dataflow, or a data key or partial key. For example, if it is required to author or report metadata for the value of a specific Dimension (e.g. where the Country is Germany) or for an observation, then both the DSD (or Dataflow) and the key will need to be specified in the Metadata Set (e.g. the Dataflow for Population and the value “Germany” for the Country Dimension). In this example the Metadata Target will contain two constructs: the Identifiable construct (Dataflow) and the Key Descriptor. The specification of the Key Descriptor as a target simply states that the target is a key or partial key, which will be specified in the Metadata Set when the metadata are authored.

If the type of construct is an Identifiable construct then it is possible to specify the valid content of the construct when the metadata are authored by referencing any Item Scheme (Code List, Concept Scheme, Category Scheme, Organisation Scheme) that contains the valid content such as a Concept Scheme that contains the list of Concepts to which metadata can be attached.

The MSD is a flexible mechanism for supporting any type of metadata and many Report Structures and many Metadata Targets can be specified in an MSD. Each Report Structure must link to at least one, and can be linked to many, Metadata Targets in the same MSD.

Metadata Set

The metadata are contained in a Metadata Set. This is analogous to a Dataset with important differences:

  • The Metadata Set does not report data values and so there is no observation value. The metadata report is comprised solely of (metadata) Attributes
  • Whilst the metadata can be for a specific observation, they can equally be for a Code, a Dimension value, and many other types of "target". Therefore, the terms "Target" (of the metadata) and Attribute Set are used in the Metadata Set

Metadata Processes

Metadata Authoring

Unlike data, which is usually generated directly from a statistical processing system, reference metadata is often authored by hand. The MSD contains all of the information required to support a user wishing to author metadata. The MSD can be used by a system to prompt the user to enter metadata for valid Metadata Attributes and can validate and prompt for the entered information (e.g. codes, Concepts, full or partial keys).

Metadata Validation

Equally, a Metadata Set created by an application can be validated against the MSD in a similar way to a data set being validated against a DSD.

Data Dissemination

It is important that reference metadata related to data values, data sets, specific Dimension values etc. are made available to the user when the data are requested. It is rare for the database holding the data to contain links to metadata, so the responsibility falls on the metadata repository to add the metadata after the data set is retrieved from the database.

The SDMX Information Model allows for ad hoc information to be made available by means of one or more Annotations.

Content allowed for an Annotation in SDMX

Annotations are supported in both the SDMX-JSON and SDMX-ML specifications, and this is the mechanism used by the Fusion Matrix to post-attach metadata to data. The Annotation takes the form of a REST query that will return the metadata pertaining to the data at the point in the data set at which it is relevant.

Example Use of Annotation in Fusion Matrix

The GUI supplied with the Fusion Matrix places a small 'i' icon at the place to which the metadata relate, and clicking on the icon will reveal the metadata (retrieved from the URL in the relevant Annotation).

Showing a metadata link for the Australia code

Importing Data

The Fusion Matrix offers a number of entry points for data import. Data can be imported via the web using the Matrix Manager, via the command line using the Matrix Toolkit, or simply by placing data in a specific folder on the file system if using the Matrix Toolkit's sweeper engine. Irrespective of which entry point is used, the data will take the same import path. This means the information in this section about supported data formats, actions, embargo, etc. is relevant to all data imports regardless of entry point.

Required Structures

In order to import data into the Fusion Matrix a minimum set of structural metadata must first be made available to the Fusion Matrix, either by directly loading a structural metadata file, or by linking the Fusion Matrix to the Fusion Registry (see the Fusion Matrix Manager section). The required structural metadata are described in the following section. For information on how the structural metadata support each process in the Fusion Matrix, see the Structural Metadata section.

Data Structure Definition, Codelists and Concepts

The Data Structure Definition is required as it describes a dataset in terms of its Dimensions and coding schemes. It is not possible to process a data file without being able to link it to a Data Structure Definition which is used to interpret the file contents. Each Dimension in a Data Structure definition links to a Concept, which gives the Dimension semantic meaning, and optionally a Codelist, which provides an enumeration of allowed values.

The Fusion Matrix only supports time series data, so the Data Structure Definition must contain a Time Dimension. This is recommended to be the last Dimension in the Data Structure Definition.

On data import the Fusion Matrix will validate the data file to ensure it conforms to the Dimensions as described by the Data Structure Definition. If a Dimension is coded, the Fusion Matrix will ensure the reported value for the Dimension exists in the referenced Codelist. If the Dimension is not coded, then it may be given a Text Format to restrict the allowable content.

The supported Text Formats are presented in the table below:

Text Format | Validation
----------- | ----------
Decimals | Validates that the reported value is numerical and does not exceed the number of decimal places specified.
Min Length | Validates that the length of the reported value is not shorter than the length specified.
Max Length | Validates that the length of the reported value does not exceed the length specified.
Start Value | Validates that the reported value is numerical and is not lower than the value specified.
End Value | Validates that the reported value is numerical and is not higher than the value specified.
Interval | Validates that the reported value is numerical and that the reported value minus the start value, divided by the interval, leaves no remainder: (reported value - start value) mod interval == 0. The start value must be set for the interval to be validated.
Min Value | Validates that the reported value is numerical and is not lower than the value specified.
Max Value | Validates that the reported value is numerical and is not higher than the value specified.
Start Time | Validates that the reported value is a valid DateTime and does not predate the value specified.
End Time | Validates that the reported value is a valid DateTime and does not postdate the value specified.
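
The Interval rule translates directly into code. A minimal sketch (a real implementation should allow a small tolerance for floating point rounding):

    public class IntervalCheck {

        // True when the reported value lies on the grid defined by the
        // start value and interval: (reported - start) mod interval == 0
        public static boolean isValidInterval(double reported, double start, double interval) {
            return (reported - start) % interval == 0;
        }

        public static void main(String[] args) {
            System.out.println(isValidInterval(15, 5, 5));   // true: 5, 10, 15, ...
            System.out.println(isValidInterval(12, 5, 5));   // false
        }
    }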

A Text Format may also restrict the allowed Text Types. Not all SDMX Text Types are validated by the Fusion Matrix. The validation rules for each of the supported Text Types are listed below.

Text Type | Validation
--------- | ----------
Double | A double-precision 64-bit IEEE 754 floating point.
Big Integer | An arbitrary-precision integer.
Integer | A 32-bit signed two's complement integer, with a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive).
Long | A 64-bit signed two's complement integer, with a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive).
Short | A 16-bit signed two's complement integer, with a minimum value of -32,768 and a maximum value of 32,767 (inclusive).
Decimal | An arbitrary-precision integer unscaled value with a 32-bit integer scale. If zero or positive, the scale is the number of digits to the right of the decimal point; if negative, the unscaled value is multiplied by ten to the power of the negation of the scale. The value represented is therefore unscaledValue × 10^(-scale). Note: NaN is not a valid Decimal; to support both decimal places and NaN use Float or Double.
Float | A single-precision 32-bit IEEE 754 floating point.
Boolean | True or False (case insensitive).
Date | A valid Date.
Date Time | A valid Date Time.
Year | A 4-digit Integer.
Time | A valid Time.
Identifiable Reference | Validates that the reported value is a valid SDMX URN.
URI | A Uniform Resource Identifier.

Dataflow, Data Provider, and Provision Agreement

Whilst the Data Structure Definition provides information about how the data are structured, the Dataflow, Data Provider, and Provision Agreement provide contextual information and can be used to enforce restrictions on the data that are imported.

A Dataflow simply references a Data Structure Definition, and multiple Dataflows may reference the same Data Structure Definition. It is possible to enforce further restrictions at the level of the Dataflow, for example by further restricting the allowable values in a Codelist referenced by a Dimension in the Data Structure Definition. Because of this capability, a single Data Structure Definition can be designed to represent datasets in multiple data domains, with each Dataflow defining a subset of allowable data.

A Data Provider is an Organisation that can provide data. A Provision Agreement links a Data Provider to a Dataflow, and can be thought of as a contractual arrangement for the Data Provider to be able to submit data for a Dataflow. In the Fusion Matrix all data is imported against a Provision Agreement.

The structure relationships between the Provision Agreement and its referenced structures are shown in the image below:

Showing the hierarchical relationship of a Provision Agreement and its descendants

Constraints

A Constraint can be applied to a Data Structure Definition, Dataflow, Data Provider, or Provision Agreement to further restrict allowable content. Unlike the other structures, it is not mandatory to have Constraints in the system before data can be imported, but they can be useful if further reporting restrictions are required.

Showing the Universe of allowable content becoming further restricted moving up the pyramid

As can be seen in the above image, the Data Structure Definition defines a universe of allowable content for any Dataflow which makes use of its definitions. Each Dataflow can have its own restrictions defined on it to further constrain the allowable content of a dataset. Each Data Provider or Provision Agreement may have additional restrictions imposed on its datasets. Constraints provide a powerful mechanism which allows structures to be reused whilst still controlling the allowable content.

Data Format

The Fusion Matrix will accept the following SDMX data formats.

SDMX Version | SDMX Format
------------ | -----------
1.0 | Compact, Generic
2.0 | Compact, Generic, Cross Sectional
2.1 | Structure Specific, Structure Specific Time Series, Generic, Generic Time Series
N/A | SDMX-EDI

It is important to note that, with the exception of SDMX-EDI, multiple datasets per message are not supported. Only time series data are supported, which means the data structure must contain a Time Dimension, and the dataset must have time at the observation level.

It is possible to load SDMX files into the Fusion Matrix in zip format. Zipping an SDMX file significantly reduces the file size, which can save considerable transfer time, so overall processing is faster.

More information about SDMX formats can be found at http://sdmx.org. Sample files for each format are also provided in the Samples directory of Section 3B of the Standards.

In addition to SDMX datasets, the Fusion Matrix also supports CSV and XLSX (Excel) formats, as described below.

CSV File Format

To import a CSV file, both the CSV file and an additional properties file which is used to describe the CSV, must be zipped into a single zip file. The properties file name must end with a .properties extension. The properties file must contain the following properties.

Property Name | Property Description
------------- | --------------------
delimiter | The delimiter to use. Allowable values are: ',' (comma), '\t' (tab), ';' (semicolon), ' ' (white space).
startrow | The start row of the data, starting from row 0.
flowurn | The URN of the Dataflow to read the data for.
action | The dataset action to use. Allowable values are: APPEND, REPLACE, DELETE.
[component id] | The id of the component followed by its column position in the CSV, for example: FREQ=1 (frequency is column 1). To fix a value for a component use $, for example: COUNTRY=$UK (fix COUNTRY to UK).
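
For illustration, a complete properties file for a comma-delimited CSV with a single header row might look as follows (the Dataflow URN is a hypothetical example):

    # Describes the accompanying CSV file in the same zip file
    delimiter=,
    startrow=1
    # Hypothetical Dataflow URN for illustration
    flowurn=urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=ACME:POPULATION(1.0)
    action=APPEND
    # Column positions of the components in the CSV
    FREQ=1
    AGE=2
    # Fixed value applied to every row
    COUNTRY=$UK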

XLSX File Format

An XLSX import supports the use case of grouping data by time, and rolling up values which are the same for every observation to the header of the file.

The importer will process high-level information by reading each row; if it encounters a value in column 'A' which is the same as the ID of a Dimension or Attribute as defined by the Data Structure Definition, then it will read the corresponding value in column 'B'. This will be the fixed value for the dataset.

The following image shows header information for Eurostat's NA Main Aggregates Data Structure, whose definition can be found at https://registry.sdmxcloud.org. The Dataset Action, Data Structure Definition Name, and URN are for information only and will be ignored by the importer.

Showing dataset header information, alongside rolled up dimension and attribute values

The importer then expects to find a row where the cells are the IDs of each of the remaining Data Structure Dimensions and Attributes, followed by each of the time periods for which the data are being reported. This can be seen in the image below.

Showing a time series dataset in Excel

The column sequence for the XLSX is not important, as each column carries the ID of the component as the header for the column. Furthermore, the start row of the header cells and the dataset is not important, as the import process will scan each row looking for the information it requires.

The importer will ignore columns that do not correspond to an Attribute, Dimension, or Time Period; in this way it is possible to add additional columns which are provided for information only. However, it is not possible to add a blank row once the first series has been defined. This is shown below, where column 'C' has been added as an information-only column which will be ignored by the import process. Row 37 is disallowed, as it does not contain values for the mandatory Dimension cells; the presence of this row in the spreadsheet will cause the import to fail.

Showing a dataset with an invalid row in Excel

It is not possible to define multiple columns for the same Dimension, Attribute, or Time Period. The following image shows an illegal dataset because the ACTIVITY dimension ID has been repeated in columns B and C.

Showing a dataset with an invalid column in Excel

Time periods must be formatted as an SDMX Date Format (ISO 8601) as shown in the following table.

Format | Description | Example
------ | ----------- | -------
yyyy-MM-dd'T'HH:mm:ss.SSSz | Date Time | 2001-12-23T15:37:23.231z
yyyy-MM-dd'T'HH | Hourly | 2001-12-23T15
yyyy-MM-dd | Daily | 2001-12-23
yyyy-W[x] | Weekly | 2001-W42
yyyy-MM | Monthly | 2001-03
yyyy-Q[x] | Quarterly | 2001-Q1
yyyy-B[x] | Bi-Annual | 2001-B1
yyyy | Yearly | 2001
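
A sketch of how these period formats can be checked with regular expressions is shown below; it validates the shape of the period only, not full calendar validity:

    import java.util.List;
    import java.util.regex.Pattern;

    public class TimePeriodCheck {

        // One pattern per supported SDMX period format
        private static final List<Pattern> FORMATS = List.of(
                Pattern.compile("\\d{4}"),                                 // yyyy
                Pattern.compile("\\d{4}-\\d{2}"),                          // yyyy-MM
                Pattern.compile("\\d{4}-Q[1-4]"),                          // yyyy-Q[x]
                Pattern.compile("\\d{4}-B[12]"),                           // yyyy-B[x]
                Pattern.compile("\\d{4}-W\\d{1,2}"),                       // yyyy-W[x]
                Pattern.compile("\\d{4}-\\d{2}-\\d{2}"),                   // yyyy-MM-dd
                Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}"),            // hourly
                Pattern.compile(
                        "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}z")); // date time

        public static boolean isValidPeriod(String value) {
            return FORMATS.stream().anyMatch(format -> format.matcher(value).matches());
        }

        public static void main(String[] args) {
            System.out.println(isValidPeriod("2001-Q1"));  // true
            System.out.println(isValidPeriod("2001-Q5"));  // false
        }
    }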

Observation Attributes

It is possible to add observation Attributes by any of the following techniques:

  • defining a fixed value for the Attribute as described in the previous section
  • defining the Attribute values for each observation by adding a column for the Attribute
  • by adding the attribute value to the observation as a cell level comment. The comment should consist of the id of the attribute, an equals sign, and the reported attribute value (e.g. OBS_STATUS=M). Multiple comments must be separated by a carriage return.

An example of an observation overriding the fixed OBS_STATUS value is shown in the image below.

Showing an observation attribute as a cell level comment

Note: It is also possible to set a fixed value for an observation attribute at the level of the dataset and then override the fixed value for specific observations by using the cell comment.

Import Action

Every data import must specify an Import Action. The Action states which action the Fusion Matrix should perform when importing the dataset. The available actions are: APPEND, REPLACE, DELETE.

Append

A dataset that is loaded with action ‘Append’ may only add new data to the database; it may not overwrite any existing data. For example, it is possible to add new observations to a series, but it is not possible to alter a series attribute. It is possible to add a new attribute value where previously there was not one.

Replace

A dataset that is loaded with action ‘Replace’ may add new data to the database, and may also replace existing values with new ones.

Delete

A dataset that is loaded with action ‘Delete’ will delete values from the database. A dataset imported with delete will be processed in the following way:

  • If a Series is encountered that has no Attributes and no Observations, the entire Series will be deleted.
  • If a Series is encountered that has Attributes, then the Series Attributes will be deleted.
  • If an Observation is encountered that has no Attributes, then the Observation will be deleted.
  • If an Observation is encountered that has Attributes, then the Observation Attributes will be deleted.

Bulk Load

If no data exist for a Dataflow, then the bulk loader will be used to load the first dataset. The bulk loader is roughly 20 times quicker than the incremental loader, which is used when data already exist. All subsequent loads will use the incremental loader. To take advantage of the bulk loader for a subsequent load, all existing data would have to be deleted and a full replacement dataset loaded. This can be achieved by using the Fusion Matrix Toolkit to delete all data for a Dataflow.

Import Lifecycle

The data import is split into five distinct stages: structural metadata collection, pre-process, import, post-process, data registration.

Showing the import lifecycle

Structural Metadata Collection

The Metadata Collection stage is the process whereby the Fusion Matrix obtains all structural artefacts required to process the Import request. This includes the Provision Agreement and all referenced structures (Dataflow, Data Structure, Codelists, Concepts).

Pre-Process Validation

After the relevant metadata has been collected, the data is pre-processed. This is a data validation stage that occurs before the import is placed on the queue, and therefore before the database transaction is created. Pre-Process validation uses the relevant structural metadata to ensure the data is valid. This process includes validating that the expected Dimensions are present and that the reported values are of the expected type and have an allowed value.

Import

After the pre-process stage, the import request is placed onto an Import Queue. The Importer will process one import request at a time. If multiple data files are loaded, they will be stored in the Import Queue for sequential processing. The Import phase ensures the relevant database tables exist for the import; if they do not exist, the database tables will be created. If the tables do exist, they may be modified depending on whether the supporting structural metadata has been modified since they were last updated.

A new database transaction is then created: all database writes that occur within the transaction are isolated to the transaction. Database modifications performed in a transaction are not visible to any other process accessing the database. Any failure within the transaction results in a full rollback, leaving the database unaltered from its original state.

Post-Process Validation

The post-process stage occurs after all the data have been written to the database, but before they have been committed. The data are in their final location, but still isolated in the transaction layer and therefore not exposed to other running processes. Database checks are performed at this stage to verify the integrity of the data with regard to the database as a whole.

Registration

If the Fusion Matrix Manager has been used to connect to a Fusion Registry, and if “auto register” has been set to true, then on completion of a successful data import, the Fusion Matrix will automatically register the Fusion Matrix-UI web service with the Fusion Registry. If a Registration already exists in the Fusion Registry for this data source, then it will be updated, otherwise a new registration will be created.

Validation Rules

Data validation is performed throughout the data import lifecycle. The following diagram shows the validation performed at each stage of the lifecycle.

Showing the validation lifecycle

Revisions to Observations

If a dataset is submitted with action REPLACE or DELETE and has the outcome of modifying an observation’s value or a reported Attribute value, then the original value will be preserved as a revision and will be made available via the Fusion Matrix web service. The Matrix SDMX Web Services Section provides information on how to ask for Revisions to be included in the dataset.

When querying for data, it is possible to request that revisions be included in the response. A data response which includes revisions will contain multiple data sets, each data set defining an action and a 'valid from' or 'valid to' date. If the dataset has action 'Replace' then the 'valid from' date defines the date the observations in the dataset were loaded into the Fusion Matrix. If the action is 'Delete' the dataset's 'valid to' date defines when the delete dataset was submitted, and therefore the date until which the observations in the dataset were valid.

Revisions Example

A dataset is submitted with two series, S1 and S2, as shown in the table below.

Action | Series | Observation | Value
------ | ------ | ----------- | -----
Replace | S1 | A | 12.1
Replace | S1 | B | 12.9
Replace | S1 | C | 13.2
Replace | S2 | D | 222
Replace | S2 | E | 22.3

A dataset is submitted which replaces an existing observation value for Series S1, and adds a new observation.

Action | Series | Observation | Value
------ | ------ | ----------- | -----
Replace | S1 | A | 12.5
Replace | S1 | X | 14.2

A dataset is submitted which deletes observation E for Series S2.

Action | Series | Observation
------ | ------ | -----------
Delete | S2 | E

If a user queries for all data with revisions included, the response would contain three datasets:

Dataset | Action | Valid From | Valid To | Series | Observation | Value
------- | ------ | ---------- | -------- | ------ | ----------- | -----
1 | Replace | T | | S1 | A | 12.1
1 | Replace | T | | S1 | B | 12.9
1 | Replace | T | | S1 | C | 13.2
1 | Replace | T | | S2 | D | 222
1 | Replace | T | | S2 | E | 22.3
2 | Replace | T+2 | | S1 | A | 12.5
2 | Replace | T+2 | | S1 | X | 14.2
3 | Delete | | T+3 | S2 | E |

If the user were to make the same query but exclude revisions then the result would show a single dataset, as shown below.

Series | Observation | Value
------ | ----------- | -----
S1 | A | 12.5
S1 | B | 12.9
S1 | C | 13.2
S2 | D | 222
S1 | X | 14.2

Embargo

A dataset loaded with an embargo date will not be published before the embargo date has passed. As soon as the embargo time passes the dataset will be made available via the Matrix web services, and very shortly after this it will be made available via the Matrix GUI.

Data which has been embargoed will be stored in the database, but will not be made available externally until the embargo time has passed. If an observation is imported with an embargo which replaces an existing observation value, then the existing observation will remain active until the embargo time passes, at which time the embargo observation will be active and the previous observation will become a revision. Note that an observation for any particular series key and time period can be revised and both revised non-embargoed and embargoed observation values will be processed appropriately and in sequence.

Embargo is only supported if the Matrix Manager has been used to set up a link between the Fusion Matrix and Fusion Security. If this link has not been configured, then embargo will not be supported. This is described in the Matrix Manager Security Connection section.

A dataset may include an embargo time in the header of the message: if an embargo time is found in the header then this will be applied to the dataset. Additionally, in both the Matrix Manager and Matrix Toolkit it is possible to explicitly specify an embargo date/time. If an explicit embargo date/time is provided then this will take precedence over any embargo information found on the header of the dataset.

It is important to note that Fusion Matrix will not support an embargo dataset if it has action 'Delete'.

How Secure is Embargo?

Embargo observations are held very securely in the Fusion Matrix. Embargo observation values are encrypted in the Matrix database using a strong asymmetric cryptography algorithm known as public key cryptography. The same cryptographic technique is used to verify digital signatures and underpins the TLS protocol on the Internet. The principal idea in public key encryption is that a pair of encryption keys is generated which, although very different, are mathematically linked. One key in the pair is known as the public key and is used to encrypt the information. The second key in the pair is known as the private key and is used to decrypt the information. Although it is possible to encrypt data with a public key, it is not possible to decrypt the encrypted data with the public key: encrypted data can only be decrypted with the public key's corresponding private key. Public key cryptography ensures that even if a third party gains access to the public key they will not be able to use it to decrypt the information.
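
As a generic illustration of this principle (the algorithm, key size, and API used by Fusion Security are not documented here, so this is not the product's implementation), the following Java snippet encrypts a small value with a public key and decrypts it with the matching private key:

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import javax.crypto.Cipher;

    public class PublicKeyDemo {
        public static void main(String[] args) throws Exception {
            // Generate a mathematically linked public/private key pair
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();

            // Encrypt a small observation value with the PUBLIC key
            Cipher encrypt = Cipher.getInstance("RSA");
            encrypt.init(Cipher.ENCRYPT_MODE, pair.getPublic());
            byte[] cipherText = encrypt.doFinal("12.5".getBytes("UTF-8"));

            // Only the matching PRIVATE key can recover the value; the
            // public key alone cannot decrypt what it encrypted
            Cipher decrypt = Cipher.getInstance("RSA");
            decrypt.init(Cipher.DECRYPT_MODE, pair.getPrivate());
            System.out.println(new String(decrypt.doFinal(cipherText), "UTF-8")); // 12.5
        }
    }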

The Fusion Matrix uses public key cryptography to encrypt the data, but the Fusion Matrix does not have access to the private key at any stage of the encryption process. The application responsible for creating the public/private key pair is Fusion Security.

When importing a dataset with an embargo, the Fusion Matrix will request that Fusion Security creates a public/private key pair and does not release the private key before the specified embargo time. Fusion Security will generate the public/private key pair, but will only issue the public key to the Fusion Matrix. The Fusion Matrix uses the public key to encrypt the observation values before storing them in the database. Embargoed observation values will not be queried by the Fusion Matrix until after the embargo time has passed.

In the unlikely event that the Fusion Matrix queries for the observations early, for example if an attacker manages to modify the server time, it will not be able to decrypt the values, as it does not have access to the private key. If the Fusion Matrix encounters an encrypted observation which it cannot decrypt (because Fusion Security has not released the private key), the observation will be discarded and will not be included in the response.

Note: Fusion Security plays a pivotal role in ensuring the confidentiality of embargo data, and an embargo time cannot be set on a loaded dataset unless a connection to Fusion Security is configured. To configure Fusion Security please see Fusion Security Connection.

How timely is the Release of Embargo Data?

Embargo data is made available via the web API the moment the embargo time passes, and the User Interface is updated shortly afterwards. This is made possible by the Fusion Matrix on-the-fly decryption process. On-the-fly decryption is used on encrypted values which are due to be decrypted on the database, but where the process has not yet completed. In this scenario (the embargo time has passed but the data value is still encrypted) the Fusion Matrix will request the private key from Fusion Security to decrypt the data. If Fusion Security agrees that the embargo time has passed it will release the private key, and the Fusion Matrix will use this to decrypt the data.

Although the data will be available via the Fusion Matrix User Interface when the embargo time passes, the indexes will not be updated until the data has been fully decrypted on the database. This means if the embargoed data set contained data for a new dataflow, or a new series for an existing dataflow, the Fusion Matrix UI will not update to reflect this until the database decryption process has completed.

The Fusion Matrix Manager is responsible for running the decrypt process, and as such will need to be running for database decryption to take place. The database decryption time depends on the size of the dataset to be decrypted; it should not differ much from a standard import time for a dataset of the same size.

Embargo Data and Automatic Registration

If the Fusion Matrix has been configured to automatically register a new dataset with the Fusion Registry, then the registration event on an embargo dataset will occur as soon as the database decryption process has completed.

Matrix User Interface

The Fusion Matrix UI supports five main use-cases:

  1. It provides a web frontend to allow users to explore and view the Fusion Matrix datasets.
  2. It provides SDMX-compliant web services for structures and data (SDMX-WS). These allow clients to obtain data and structures via a Web Service API using an ISO recognised standard (ISO 17369: SDMX). If required, existing open source software such as the Java libraries downloadable from http://sdmxsource.org can be used to build a client (a sample data query is sketched after this list).
  3. It provides bespoke web services (Matrix-WS) which consolidate information obtained from the SDMX structures, data, and metadata to provide simple packets of information in JSON format. This information is used to assist in the development of data discovery user interfaces. Both the Matrix-UI HTML data discovery Interface and the Fusion Browser Excel plugin are built from the information obtained from the Matrix-WS.
  4. It provides metadata authoring capabilities allowing admin users to create, edit, and delete reference metadata which can attach to various parts of the SDMX information Model (the internal model used by the Matrix) including datasets, dimensions, and observations.
  5. It provides the means to publish pre-defined data queries to the Matrix. These queries are then made available to the Matrix HTML Interface via the Matrix-WS.
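
As an illustration of use-case 2, the sketch below issues a standard SDMX RESTful data query (the server path, dataflow reference, and series key are hypothetical examples):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SdmxDataQuery {
        public static void main(String[] args) throws Exception {
            // Standard SDMX RESTful data query: /data/{flowRef}/{key};
            // the host, path, and dataflow reference are illustrative only
            URL url = new URL("http://demo.metadatatechnology.com/FusionMatrix"
                    + "/ws/rest/data/ACME,POPULATION,1.0/A.UK?startPeriod=2001");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            // Request an SDMX-ML 2.1 structure-specific dataset
            connection.setRequestProperty("Accept",
                    "application/vnd.sdmx.structurespecificdata+xml;version=2.1");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }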

It is important to note that the HTML User Interface (UI) provided in the Matrix-UI is just one way to offer data discovery and data display. The UI was built using information obtained from both the Matrix-WS and the SDMX-WS. One use-case the Matrix-UI supports is the ability to rapidly assemble a metadata-driven dissemination interface, built using an internationally recognised ISO standard.

The Fusion Matrix UI provides an up-to-date view of the structures, metadata, and data in the system. This means that if changes are made in any supporting application, these are reflected in the Matrix UI with immediate effect. For example:

  • If the Fusion Matrix Manager is connected to a Fusion Registry, and a Dataflow name is changed in the Fusion Registry, this change will propagate through to the Matrix-UI as soon as the Matrix Manager completes its update.
  • If any new data are loaded into the Matrix via the Toolkit, Sweeper, or Matrix Manager, the new data will be available in the Matrix-UI immediately.
  • If any data are deleted, this will also be reflected immediately.
  • If a user registers a new data location in the Fusion Registry, the Matrix-UI will make this data available as soon as the Matrix Manager has completed indexing the dataset.

The Matrix-UI HTML Interface provides a view of all known datasets regardless of which data source the datasets reside in. In some circumstances the data for a single dataset may reside in a number of data sources. Filters are provided to support filtering by Dataset, by Category (topic), and by Concept. Any published pre-defined queries will also be displayed on the home page of the Fusion Matrix UI.


Showing the front page of the Matrix-UI

It should be noted that the login button, shown on the top right of the page, is only available if login has been enabled in the set-up of the Fusion Matrix.

Viewing Data

To query for and view data, click on the 'Browse Dataset' button of the dataset you wish to view. The default set-up for the Fusion Matrix is to restrict the number of viewable series to 300. This restriction is only imposed on the web UI; neither the export feature nor the Matrix web services enforce a maximum download limit. The observation limit is 100 times the series limit, so the default display limit is 300,000 observations. This restriction is in place to prevent an undue delay in the web browser rendering the data table.

By selecting Codes in the Dimension filters, it is possible to query for a subset of the full dataset. Each time a Code is selected, the Fusion Matrix-UI will perform a query to the Matrix-WS to check for two things:

  1. Which Codes remain valid selections in each of the other Dimensions, based on the current selection state.
  2. How many series and potential observations would result from the query, based on the current selection state.

After the Fusion Matrix receives the response to the query, it will check whether the response size is less than the imposed threshold. If it is, the Fusion Matrix-UI will query for the dataset and show the response as a data table.

Showing the response to a data query. Move control (for Series) also shown

The resulting dataset will be shown in a data table with the Code selections being shown in the filters panel. Although the Fusion Matrix will try to prevent invalid selections, it is still possible to create a query which does not bring back data for some of the Code choices. This is generally due to the selected time interval. If there are selected Codes with no data in the corresponding dataset, this will be indicated to the user by highlighting the code selection in red in the filters bar.

The Matrix-UI will try to organise the rows and columns of the data table in the most readable way; however, it is possible to change the header and row order, swap a header element to the row or vice-versa, or even push a dimension to a cross section control. To change the header or row columns in the data table, click on the relevant header cell and a move control (as shown in the above image) will be displayed. Use the arrow controls to move the header or row position, and the central button to push the Dimension into a cross section control, as shown below.

Showing a data table with the series dimension in the cross section

Clicking any observation cell will display the observation details in the side bar, as shown below.

showing the details of a single observation

Pre-defined Queries

The 'Save Query' button allows the current query state to be saved locally in the browser as a cookie. The query state includes the layout of the data table. This functionality is available to all users. In addition, an admin user can use the same mechanism to publish a query to the server. A published query will be made available to all users of the Matrix-UI.

To publish a data query as an admin user, log into the Matrix-UI by clicking the login button located at the top right of the page. Once logged in, browse a dataset and construct a data query. Use the pivot controls on the data table to lay out the table header and columns as required, and then click 'Save Query'. The following dialogue will be displayed:

showing the save query dialogue

The non-admin user will see just the Save and Cancel options. The admin user will additionally be given the option to Publish the query to the server where it will appear in the list of stored queries for this dataset.

Showing a pre-defined query for the dataset

As an admin user, it is also possible to delete the query at this point. Non-admin users will not see the delete query button.

Showing the pre-defined query in the home page's filter by dataset section

The pre-defined query will also appear in the filter by dataset section, as shown above. In addition, the query will be available to view on the dataflow panel on the home page (by clicking the down arrow next to the dataflow name), as shown in the image below.

Showing the pre-defined query on the dataflow panel on the home page

The pre-defined query will also be made available via both the Category and Topic function of the Matrix-WS, as described in the Web Services section.

Revisions to Observations

If a dataset is loaded which modifies the value of an observation, or of an attribute attached to an observation, the previous observation value will be made available as a revision. This is indicated in the data table by an icon displayed next to the observation value, as shown in the image below.

Showing an observation with a revised value

The latest revision will always be displayed in the data table. To view previous values for the observation, click on the observation cell to display the Observation Details in the left hand sidebar. A select list will be displayed containing all the dates that the observation was revised. On changing the selected revision, the observation details will be updated to reflect the chosen revision of the observation.

If an observation value, or observation attribute, is to be modified, then the dataset must take the action REPLACE, otherwise the import will fail. The Fusion Matrix will discard any updates to an observation if there were no modifications to it; if the same dataset is submitted twice with no changes between submissions, no revisions will be created.
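As a sketch, such a revision could be loaded using the publishData script of the Matrix Toolkit (described in the Matrix Toolkit section); the file path and Provision Agreement reference below are illustrative:

publishData.sh -uri /home/ubuntu/data/uis_edu_revised.xml -ref UIS,UIS_EDUC_MDG,1.0 -action Replace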

If an observation is deleted, this will be made apparent in the data table with the cell text -deleted-. Clicking on a deleted cell will display the observation details, together with the ability to view all previous revisions.

To delete data from the Matrix permanently, the delete command on the Matrix Toolkit must be used.
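For example, using the deleteData script described in the Matrix Toolkit section (the Dataflow reference is illustrative):

deleteData.sh -ref UIS_EDUC_MDG,1.0 -structure flow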

Reference Metadata

The metadata facility of the Fusion Matrix described here is concerned with what SDMX calls “reference metadata”. Reference metadata comprises quality or footnote metadata, such as that used in the data quality frameworks of Eurostat, the IMF, the ILO, the OECD, and many others, as well as other metadata that is relevant to the actual data disseminated in a “data set” but is not an integral part of the data as stored in the statistical database.

Reference metadata has many forms and it is everywhere. It is authored in many ways using different tools and is consequently stored in many forms. It is often not held in a centralised, accessible resource, and consequently linking the metadata, at a granular level, to the construct or data slice to which it relates can be difficult.

The metadata in the repository needs to be authored, maintained, and, possibly, imported from external sources. Metadata does change and maintaining the historicity of metadata can be an important feature. In a dissemination system metadata needs to be available in a repository that is accessible to the dissemination service.

The Matrix-UI provides the ability to author reference metadata which can be attached to a Dimension, Concept, Code, Dimension Code (the code in the context of the dimension), Dataflow, Data Structure, or Observation. The reference metadata is stored as SDMX Metadata Sets and made available to external applications through the SDMX-WS. It is possible to attach reference metadata to datasets and observations even if the underlying dataset is not held in the Fusion Matrix. This makes it possible to reference external data sources, and attach metadata to data points in the data source. On exporting the dataset, the Fusion Matrix will enrich the dataset by adding annotations into the dataset, which link to the relevant reference metadata.

In order to create Reference Metadata, the Matrix or linked Fusion Registry must contain a Metadata Structure Definition (MSD). The MSD contains one or more Metadata Reports which define the template for a report along with the allowed attachment points for the Metadata Report. Further details about MSDs can be found in the Reference Metadata section above.

In order to create or edit metadata, the Matrix-UI must be set up to allow metadata authoring. This is enabled simply by permitting users to log in to the Matrix UI, and is discussed in the Matrix UI Installation section. If metadata authoring is enabled, it will be possible to log in to the Fusion Matrix-UI by clicking the login button at the top right of the page.

Reference Metadata Attachment

Reference Metadata can be attached to:

  • Structural metadata artefacts such as Dimensions, Codes, and Concepts.
  • Data artefacts such as an Observation, or a full or partial series.

The Fusion Matrix-UI offers metadata authoring capabilities from within a dataset, so in order to create metadata, first browse a dataset. On viewing a dataset, the side bar, which contains a Details and a Filters tab, will contain edit metadata icons. These icons are placed next to the relevant artefacts to which metadata can be attached. It should be noted that an icon is only placed next to an artefact if one or more Metadata Reports (in the MSD) specify it as an allowable construct to which metadata can be attached (called the "target" in the MSD).

Even though the create metadata links are placed in the context of a dataset that is being browsed, it is also possible to attach the metadata at a higher level. For example it is possible to click on the edit metadata icon next to the code 'United Kingdom' in the context of the 'Reporting Country' dimension for the 'Exchange Rates' dataflow. On selecting this attachment, it is possible to specify that the metadata should in fact attach to all occurrences of 'United Kingdom' regardless of Dimension and Dataflow.

Defining the Metadata Structure Definition

Before Reference Metadata can be authored, the Fusion Matrix must have been loaded with at least one Metadata Structure Definition, which defines the available Metadata Reports and report Targets (where the target defines what the Reference Metadata can be attached to). We would recommend using the Fusion Registry to author and maintain a Metadata Structure Definition (MSD).

To construct a Metadata Report template which can attach to a Dataflow or DSD, follow the steps below:

Create a Metadata Structure Definition

The first step is to create a Metadata Structure Definition (MSD). To do this in the Fusion Registry, open the Maintenance UI, navigate to the Metadata Structure Definitions, and click the new icon.

Give the MSD an Id, Name, and version, and click Save.

The MSD is made up of two parts: the Metadata Target and the Report Structure. The Metadata Target section defines a list of allowable targets, or attachment points, for the Reference Metadata; an example of a Metadata Target is a Code or an Observation. The Report Structure section defines one or more Report Structures, which define the template for the Report. A Report Structure must link to one or more Metadata Targets.

Define Metadata Target(s)

The Metadata Target section is used to define an allowable target to attach a Reference Metadata Set to. Note that the Metadata Target does not reference a specific Code, Observation etc., but instead identifies the type of structure it can attach to. The specific instance of the artefact to attach to is chosen when the user authors the actual metadata set.

The image below depicts five different targets, any of which can be used by a Report Structure.

A Metadata Target can be used to define a link to a specific structure type in the SDMX Information Model, for example a Code, a Dimension, Dataflow etc. A Metadata Target may also be used to define a link to a specific Observation Value or Series in a Dataset.

To create a Metadata Target to link to a specific structure type, first create a new Metadata Target, giving it an Id, as shown below.

After clicking Save, ensure the Metadata Target is selected, and then create a new Identifiable Target. The Target Structure should be set to the type of structure that you wish to allow the Reference Metadata to be attached to. In the image below, the selected structure is the Dataflow.

It is possible to define multiple allowable Identifiable Targets for a single Metadata Target.

To create a Metadata Target for a Series, Observation, or a Code in the context of a Dimension, the Metadata Target must be created to allow a Key Descriptor Target.

In addition, the Metadata Target must also define an Identifiable Target to a Dataflow, as shown below:

Define Report Structure(s)

A Report Structure is made up of three parts: the Id of the Report Structure, the allowed Targets, and the Report Structure itself (the Attributes). The image below shows a Report Structure for QUALITY_METADATA_1.

The Targets and Attributes section of this page shows the information for the currently selected Report Structure. In the above image there is only one Report Structure defined, and it links to all five of the Metadata Targets, which means metadata being authored for this report can attach to any of the points defined by any of the referenced Metadata Targets.

Authoring Metadata

Once the Fusion Matrix has access to a Metadata Structure Definition, it will be possible to author metadata. Only authenticated users can author metadata, so the first step is to log in to the Fusion Matrix-UI by clicking the login button at the top right of the page.

Once logged in, navigate to the appropriate dataset to author metadata for. The following sections describe metadata authoring for each supported Metadata Target.

Dataset Metadata Attachment

If an MSD exists which can attach to a Dataflow or DSD, then a small edit icon will be placed to the right of the Dataset Details heading as shown below.

Showing the create metadata link for the dataset

Dimension or Concept Metadata Attachment

In order to attach metadata to a Dimension or Concept there must be a Metadata Report defined (in the MSD) that can attach metadata to either a Dimension or a Concept (called the "target" in the MSD). If attaching metadata to a Dimension, then the metadata will only be associated to the Dataflow to which the Dimension belongs. If attaching metadata to a Concept, then the metadata will be associated to any Dimension that references the Concept.

Showing the create metadata link for the Dimension or Concept

Dimension Code or Code Metadata Attachment

Metadata can attach to both a Code, and a Code in the context of a particular Dimension (Dimension Code). In order to attach metadata to a Dimension Code there must be a Report Structure that specifies the ability to attach metadata to a key or partial key (called a Key Descriptor target in the Information Model). In order to attach metadata to a Code there must be a Report Structure that specifies a Code target. If attaching metadata to a Dimension Code, then the metadata will only be disseminated with the Dimension for which the Code is a member and importantly the annotation link will only be present in an SDMX Dataset, not the SDMX Structure Message. If attaching metadata to a Code, then the metadata will be disseminated with any Dimension that contains the Code, and an annotation will be present in an SDMX structure message, not the Dataset.

Showing the create metadata link for the Dimension Code or Code

Observation Metadata Attachment

In order to attach metadata to a specific Observation, there must be a Report Structure that specifies a Key Descriptor target. To create Observation metadata, the Matrix-UI should be used to display the Observation in the data table; on clicking the Observation cell, the Observation details will be displayed in the left hand Details tab. The create metadata button will be displayed next to the Observation Details header, as shown in the image below.

Showing the create metadata link for an Observation

Creating a Metadata Report

Once a metadata attachment has been selected, and the create metadata button has been clicked, a Metadata Report form will be displayed on the page as shown below.

Showing a metadata report being authored which is attached to a dataflow (Dataset)

The Report Details section provides three fields which can be modified:

  1. Report Template: This defines the Report Template to which the Metadata Report will conform. The template is defined in a Metadata Structure Definition (MSD). Only those Report Templates that can be used to attach metadata to the selected target are shown.
  2. Report Name: This is a free text input field providing the means to give the report a human readable name.
  3. Report Target: This is a drop down list offering the ability to select the artefact to which the Metadata Report will be attached. This will offer an alternative target if one exists. Alternative targets are based on the current target; for example, if the current target is a Dimension, an alternative target would be the Concept which the Dimension references. An alternative is only offered if the selected Report Template is able to be used to attach metadata to that target.

On selecting the Report Template, the Report Attributes section will be updated to reflect the allowable Metadata Attributes for the template. A Metadata Attribute is a facet of the Report Template, and a single Metadata Report can contain metadata for one or more of the Report Attributes. To change the selected Report Attribute, simply click on the attribute name. To author metadata for the selected Report Attribute, type into the editor console. The editor will create HTML mark-up, and this is what will be stored in the metadata repository. The default view of the editor shows the rendered HTML; however, it is possible to click the 'Source' button to view, and directly edit, the underlying HTML. The editor console can be maximised by clicking on the icon.

To save the metadata report, click on the 'Save' button; this will save all content authored for each metadata attribute.

    Viewing Metadata

    When reference metadata is available for a particular artefact, a small 'i' icon will be placed next to the artefact. If metadata is attached to an observation the icon will be placed next to the observation value in the data table.

    Showing a metadata link for the Australia code

    By clicking on the 'i' icon the Details tab will be updated to display the available metadata. It is possible to have multiple Metadata Reports attached to a single artefact. Each Metadata Report is added as an additional accordion graphical element, with the title of the Metadata Report as the accordion title.

    Showing Population Metadata displayed in the Details tab

It is important to note that the Fusion Matrix-UI uses JavaScript plugins which may alter how some HTML elements are displayed; for example, a bullet point list may not display the bullet points. This is just the rendering chosen for the supplied Matrix-UI: other clients which obtain the metadata from the Matrix web services are free to choose how the information is rendered.

    Editing Metadata

To edit a metadata report, log into the Matrix-UI as an admin user and view the metadata as explained in the previous section. Since you are logged in as an admin user, edit and delete controls are placed after the metadata text, as shown in the image below.

    Showing edit and delete metadata controls after the metadata text

    After editing or deleting metadata, refresh the browser page to display the changes.

Please note that the "Add Metadata" control is also available here, since you may add multiple metadata reports to a single artefact.

The Matrix-UI will expose the availability of metadata in the data and structure documents obtained from the SDMX-WS. The Matrix will enrich both SDMX Structure Messages and SDMX Datasets with Annotations which link to the additional Reference Metadata. The Annotations annotate the element in the document to which the metadata is attached. The Annotation Type is 'METADATA'; the Annotation Title is the name of the Metadata Report; the Annotation URL resolves to the full Metadata Set. Annotations are supported in all versions of SDMX-ML, as well as in the SDMX-JSON data message.

    Showing a dataset level annotation referencing the Population Metadata report
    Showing a structure message with metadata attached to the Algeria code in the World Bank country codelist

If the metadata are attached to a partial key in a dataset, then they will be placed as a dataset level Annotation containing an additional Annotation Id which describes the partial key. The partial key is built using the same mechanism as the SDMX REST data query, where each Dimension is separated by the period '.' character and each Dimension value is put in the relevant place; if there is no Dimension value then '..' is used to skip to the next Dimension. For example, if the full key is 'A.UK.DE.12' and metadata is attached only to the dimension code UK for the Reference Area Dimension, then the Id would be '.UK..', indicating a wildcard for the first, third, and fourth dimensions.

    The web services section contains further information on obtaining metadata via the Matrix web services.

    Data Portal

    The Fusion Matrix can be used as a portal, combining one or more data sources and exposing the data in a standard format.

    Data sources include databases such as Oracle, SQL Server, or MySql, and web services which implement the SDMX REST specification.

    Database Data Sources

    If using a database as a data source, then the database connection details must be configured using the Matrix Manager. The Matrix Manager must also be used to define which dataflow maps to which data source.

    How does the Matrix know how to query the database?

    The Fusion Matrix has a naming convention for database table names and database column names which are based on the dataflow id and the components of a data structure definition (DSD). A full description and example is given in the Matrix Manager Section.

    showing two dataflows mapping to two separate data sources

    If the database columns names do not correspond to the required DSD output, then it is possible to define an ‘internal DSD’ to reflect the database structure, and map this to the external DSD using Structure Maps.

    showing exchange rates data flow using the ExRates Internal dataflow as its data source. The ExRates Internal dataflow in turn uses the Oracle DB. Queries against Exchange Rates will be mapped before querying the database. Responses from the database will be mapped to conform to the Exchange Rates dataflow.

    Structure Maps are described in the Fusion Registry Documentation.

    Web Service Data Sources

    Connecting to a web service, as opposed to directly connecting to the database, offers a number of advantages, such as:

    1. The underlying data source is never exposed
      • There is no coupling between the data source type and the Fusion Matrix. If the underlying database were to change, for example, it will not affect the Fusion Matrix.
      • There is no need to add security credentials to connect to the data source.
    2. As the connection is via HTTP, the data source can be external to the organisation.
    3. The Fusion Matrix obtains available web service information from data registrations available in the Fusion Registry. The Fusion Matrix will expose any data sources as they are registered in the Fusion Registry.
    4. All the external data sources can connect to the same Fusion Registry to obtain structure information in order to process data queries.
    5. Unlike a database data source, which only maps to a single dataflow, there may be multiple web service data sources for a single dataflow.
    6. Data query processing is shared amongst each web service which contains results for a data query.

As noted above, the Fusion Matrix obtains web service data source information from the Fusion Registry, and as such a Fusion Registry connection must be defined. Connecting to a Fusion Registry is described in the Matrix Manager – Registry Connection section.

The web service data source must implement the SDMX web service specification. The response format from a web service can be any version of SDMX (generic, compact, structure specific, or EDI). The Fusion Matrix will convert the response message to conform to the data format that was requested by the client.

    SDMX Web Services

    The Fusion Matrix makes all of the structural metadata, data, and reference metadata available via RESTful web services. In addition to the SDMX compliant web services (SDMX-WS), the Matrix-UI also provides web services specific to the Fusion Matrix (Matrix-WS). The SDMX-WS supports the RESTful specification as described in the SDMX Web Services Guidelines.

    The Matrix-UI SDMX web service entry point is:
    http://[server]:[port]/[application]/ws/rest

    Example REST Entry Point:
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest

It is possible to append the web service query, as specified in the SDMX Web Services Guidelines, to this URL; examples are shown in the following sections.

    Structure Query

    In addition to the parameters in the SDMX specification, the SDMX-WS also supports the following parameters:

Parameter     Values            Default      Description
version       edi/1.0/2.0/2.1   2.1          For SDMX dataset outputs only. The version of the SDMX message.
forceSaveAs   [file name]       No Default   If set, then the output will be saved to a file of the given name.

It should be noted that, in addition to supporting the output version via the version parameter, the Content Negotiation header may also be used for this purpose, as described in the SDMX Web Services Guidelines.

    Example structure query
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/dataflow?forceSaveAs=myFile.xml
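As a sketch of the Content Negotiation alternative mentioned above, and assuming the curl command-line client is available, the SDMX 2.1 structure media type defined in the SDMX Web Services Guidelines can be supplied in the Accept header:

curl -H "Accept: application/vnd.sdmx.structure+xml;version=2.1" http://demo.metadatatechnology.com/FusionMatrix/ws/rest/dataflow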

    Data Query

    In addition to the parameters in the SDMX specification, the SDMX-WS also supports the following parameters:

Parameter        Values                                                      Default                 Description
format           structurespecificdata/genericdata/sdmxjson/edi/excel/csv   structurespecificdata   The format to output the dataset in. The first 4 are SDMX datasets.
version          1.0/2.0/2.1                                                 2.1                     For SDMX dataset outputs only. The version of the SDMX message.
resolve          true/false                                                  false                   For Excel/CSV output only. If true, code labels will be resolved.
delimiter        comma/tab/semicolon/space                                   comma                   For CSV outputs only. The delimiter to use between fields.
time             xaxis/yaxis                                                 xaxis                   For CSV outputs only. Whether to output time on the x-axis or y-axis.
sep              [any string]                                                :                       For CSV outputs only. The delimiter to use between series key identifiers.
forceSaveAs      [file name]                                                 No Default              If set, then the output will be saved to a file of the given name.
includeHistory   true/false                                                  false                   If true, then all previous revisions to the observations will be included in the response.

In addition to supporting the output format and version in the URL parameters, the Accept header may also be used, as described in the SDMX Web Services Guidelines. Beyond the specified headers, the following Accept header values are also supported.

Accept Header                                                Format
application/vnd.csv;delimiter=comma;decode=true;time=xaxis   CSV format, where the optional parameters are delimiter, decode, and time. The parameters take the same values as described in the table above.
application/vnd.xls;resolve=true                             XLS format, where the optional parameter is resolve and takes the same values as described in the table above.

Example Data Query:

http://demo.metadatatechnology.com/FusionMatrix/ws/rest/data/OECD,IMMIGRATION,1.0?includeHistory=true&forceSaveAs=myFile.xml
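The same query could also be issued from the command line; as a sketch, assuming the curl client, the CSV Accept header described above requests the dataset as comma-separated values (the output file name is illustrative):

curl -H "Accept: application/vnd.csv;delimiter=comma;decode=true;time=xaxis" "http://demo.metadatatechnology.com/FusionMatrix/ws/rest/data/OECD,IMMIGRATION,1.0" -o immigration.csv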

    Example UI

The Pivot Table in the Fusion Matrix UI (shown below) uses the data query web services, retrieving the response in JSON format.

    Showing the Pivot Table built from a JSON data query response

    Metadata Query

The SDMX-WS supports queries for Reference Metadata. The query can either request all metadata which relates to the results of a particular data query, or it can request metadata for a specific structure via a structure query. The web service also supports a query for a specific metadata set.

    The metadata web service entry point is:
    http://[server]:[port]/[application]/ws/rest/metadata

An SDMX structure query or data query can be appended to the entry point. For example:
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/metadata/dataflow/OECD/POPULATION/1.0

    The above query would return any Metadata Sets which are attached to the OECD 'POPULATION' Dataflow.

The following query would return any metadata which attaches to any of the observations resulting from the given data query.
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/metadata/data/OECD,POPULATION,1.0/AUS.YP9901L1.FE.

    To query for a specific metadata set, with a given identifier, the following URL can be used
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/metadata/set/[metadata_set_id]

    The result will be a SDMX version 2.1 Metadata document. To retrieve the same information as a JSON document, the format parameter can be used with the value set to 'json'. For example:
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/metadata/set/b53a14bb-4abb-450a-a715-93987056ead2?format=json

The Fusion Matrix will also embed annotations which link to further Reference Metadata in Structure and Data documents, as shown by the following URL, which returns a SDMX dataset linking to both Dataset and Observation level Metadata.

    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/data/OECD,POPULATION,1.0/AUS.YP9901L1.FE.

    Matrix-WS

    The Matrix-WS are provided in addition to the SDMX Web Services (SDMX-WS). They provide developer-friendly packets of information which support data discovery, and the rapid development of custom user interfaces. The information exposed by the Matrix-WS is derived from combining information from the SDMX Structures, Metadata, and Data known to the Fusion Matrix. Both the Matrix-UI and the Fusion Browser Excel plugin are built using information from the Matrix-WS to assist in data discovery.

The Matrix-WS is a REST web service which returns JSON messages specialising in packets of information that assist in data discovery.

    The Matrix-WS entry point is:
    http://[server]:[port]/[application]/ws/rest/databrowser

The web service entry point is appended with one of three path parameters, each one providing a particular function, as described below.

    Categories

The categories function returns high level information about the available Dataflows in the Fusion Matrix. Each Dataflow is categorised as specified by the structural metadata, and defines a name, description, dimensionality, series count, data start and end dates, and any pre-packaged queries associated with it.

The categories URL does not require any additional parameters.

    Example Categories Query

    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/categories

    Example UI

    Showing a HTML web page built from the categories web service response

    Example JSON

    The Category JSON consists of a hierarchical array of categories, each containing zero to many topics. Each topic defines high level details and contains zero to many pre-packaged queries.

    Showing the JSON response for Categories

    The above JSON can be obtained from the following URL:
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/categories

    Topic

    The topic function returns information about a particular Dataflow, which is identified in the URL path. The returned information contains the start and end dates for the data available for the Dataflow, the dimensionality together with a hierarchical view of the codes for each Dimension. The code hierarchy is built from either a simple hierarchy as defined in the Codelist, or it can be a complex hierarchy defined in a linked Hierarchical Codelist. Hierarchical Codelists are discussed in the 'Matrix Structural Metadata User Guide'. The Codes in the response will only be present if there is data for that Code, or if the Code is used in a hierarchy and has child codes which contain data. If a Code is present purely to form part of a hierarchy, and has no data associated with it, then it will indicate this by having the 'displayOnly' JSON element set to true.

    The topic will also contain any pre-packaged queries that have been published for the dataset.

    Additionally the topic, its Dimensions, or Codes may contain links to additional Reference Metadata.

    Example Topic Query

The following query will bring back information about the WDI_EDUCATION dataflow maintained by the World Bank (WB). The dataflow version is 1.0.
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/topic/WB:WDI_EDUCATION(1.0)

    Example UI

The Topic function is used in the Matrix-UI to build the dimension filters, as shown below.

    Showing the dimension filters built from the JSON returned from the Topic web service function

    Example JSON

    A topic contains high level metadata, pre-packaged queries, and dimensions with a hierarchical view of codes.

    Showing the JSON response for a particular 'Topic'

    Additionally the last section of the topic message contains information about each dimension, as shown below.

    Showing the JSON Dimension Codes in the Topic response

    The above JSON can be obtained from the following URL:
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/topic/WB:WDI_FINANCIAL_SECTOR(1.0)

    Constraint

The constraint function returns information about which Codes remain valid in each Dimension for the content of a data query (i.e. there is data in at least one data source that will be queried). The format of the URL after the web service entry point is the keyword 'constraint' followed by the data query, which must comply with the SDMX Web Services Guidelines for a RESTful data query.

In addition, the response from the web service call contains information about the number of series keys that would result from the given data query. An estimate is also given of the number of observations that would be returned. Note that the estimate is based on the known start and end dates for the data set, along with the observation frequencies that would be in the response dataset. In order to provide an observation count, the Data Structure Definition must contain a Frequency Dimension with the Id 'FREQ'. The Frequency Dimension must be coded using the same code ids as the SDMX Frequency Codelist.

    Example Constraint Query

The following query will return the available code selections based on the data query which starts after the keyword 'constraint':
    http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/constraint/data/UIS,EDUCATION_MDG,1.0/.ILLPOP_AG15T99.....AFG+DZA+AGO.?startPeriod=2001-01-01T00:00:00.000Z&endPeriod=2010-12-31T23:59:59.000Z
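When issuing this query from a command line, the URL must be quoted, otherwise the shell will interpret the '&' separating the two query parameters. A sketch using the curl client:

curl "http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/constraint/data/UIS,EDUCATION_MDG,1.0/.ILLPOP_AG15T99.....AFG+DZA+AGO.?startPeriod=2001-01-01T00:00:00.000Z&endPeriod=2010-12-31T23:59:59.000Z"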

    Example UI

The constraint function is used to grey out codes in the filters tab based on the current code selection. The constraint function takes into account all code selections made in all dimensions; based on this, certain code choices are disabled to prevent the user creating a query which would result in no data being returned.

Showing the constraint web service function being used to grey out codes whose selection, based on the current query, would bring back no data

Note: It is possible that selecting a code in one dimension results in the exclusion of an already selected code choice in another dimension. This can be seen in the above image with the Afghanistan code being constrained.

Additionally, the constraint information is used to prevent the query shown from being executed, as it would bring back over 300 series, which exceeds the threshold set in the Matrix-UI properties file.

    Example JSON

    The constraint message contains information about the number of series and a potential number of observations that could result from a data query. Additionally it contains the valid codes for each dimension based on the current query.

Showing the JSON Dimension Codes in the Constraint response

    The above JSON can be obtained from the following URL: http://demo.metadatatechnology.com/FusionMatrix/ws/rest/databrowser/constraint/data/UIS,EDUCATION_MDG,1.0/.ILLPOP_AG15T99.....AFG+DZA+AGO.

    Matrix Toolkit

    The Matrix ToolKit consists of an executable Java jar file, and a set of helper script files to run operations on the jar file. The supported operations are shown in the table below.

ToolKit Script File   Operation
publishStructures     Publish structural metadata to the Matrix (Data Structures, Codelists, Concepts, Dataflows, Provision Agreements etc.)
publishData           Publish data sets to the Matrix
publishMetadata       Publish metadata sets to the Matrix
deleteMetadata        Delete metadata from the Matrix
deleteData            Delete data from the Matrix
startSweeper          Starts the file sweeper service

Each script is provided in two formats: a Microsoft Windows Batch format (files ending in .bat) and a Unix BASH format (files ending in .sh).

Operations such as configuring the database connection settings, messaging and notification settings, sweeper settings, and registration settings are specified in a properties file. Details of how to configure the properties file are not explained in this document but can be found in the document "Fusion Matrix Setup Guide".

    Publishing Structures

To publish structures, execute the publishStructures.sh (UNIX) or publishStructures.bat (Windows) file with the following arguments:

uri (String): The URI which references the structures to publish. A URI starts with http:// (if importing from a URL) or file:/// (if importing from the file system).
  Example 1: -uri file:///home/ubuntu/mystructures.xml
  Example 2: -uri http://mydomain.org/mystructures.xml

prop (String): A URI of a properties file to use, if importing delimited data.

delete (no argument): If present, then the submitted structures will be deleted.

Example 1 – Import from a SDMX URL:

publishStructures.sh -uri http://demo.metadatatechnology.com/FusionRegistry/ws/rest/provisionagreement/UIS?references=descendants

Example 2 – Import from a SDMX File:

publishStructures.sh -uri /home/ubuntu/data/uis_metadata.xml

Example 3 – Import from a CSV File:

publishStructures.sh -uri /home/ubuntu/data/uis_metadata.csv -prop /home/ubuntu/data/uis_edu.properties

    Publishing Data

To publish data, execute the publishData.sh (UNIX) or publishData.bat (Windows) file with the following arguments:

uri (Mandatory): The URI which references the dataset to import. This can be a URI to a specific file, or a URL which resolves to a data file. It may also be a URI to a directory; if the URI resolves to a directory, then all the files in the directory will be imported against the target Provision Agreement, with the specified action/embargo.
  Example 1: -uri file:///home/ubuntu/mydata.xml
  Example 2: -uri http://mydomain.org/mydatafile.xml
  Example 3: -uri http://mydomain.org/ws/data/ACY,DF_1/ALL/?detail=full

prop (Optional): For CSV data imports. References the properties file (using URI syntax) which describes how the columns map to the Data Structure.
  Example: -prop file:///home/ubuntu/mydata.properties

ref (Mandatory): Defines the Provision Agreement that the import is for.
  Format: Agency Id, Id, Version
  Example: -ref ACY1,MY_PROV,1.0

action (Optional): Defines the action to apply to the import (Append, Replace, Delete). The default value, if unspecified in both the command line and the dataset, is Append.
  Example: -action Replace

embargo (Optional): Specifies the embargo date in ISO-8601 format. The imported data will not be queryable until the specified embargo time has passed. Note that the embargo date may be part of the dataset header; passing in an argument will override whatever is specified in the header. Embargo is only supported if a link to Fusion Security has been configured via the Fusion Matrix Admin application.
  Format: ISO-8601
  Example: -embargo 2013-08-20T12:00

Example 1 – Import from a SDMX URL:

publishData.sh -uri http://217.174.255.91/FusionMatrix/ws/rest/data/UIS,EDUCATION_MDG,1.0/.NERA_1....../ALL -ref UIS,UIS_EDUC_MDG,1.0

Example 2 – Import from a SDMX File:

publishData.sh -uri /home/ubuntu/data/uis_edu.xml -ref UIS,UIS_EDUC_MDG,1.0

Example 3 – Import all files in a directory:

publishData.sh -uri /home/ubuntu/data/ -ref UIS,UIS_EDUC_MDG,1.0

Example 4 – Import from a CSV File:

publishData.sh -uri /home/ubuntu/data/uis_edu.csv -ref UIS,UIS_EDUC_MDG,1.0 -prop /home/ubuntu/data/uis_edu.properties

    Publishing Metadata

To publish metadata, execute the publishMetadata.sh (UNIX) or publishMetadata.bat (Windows) file with the following arguments:

uri (String): The URI which references the metadata to publish. A URI starts with http:// (if importing from a URL) or file:/// (if importing from the file system).
  Example 1: -uri file:///home/ubuntu/mymetadata.xml
  Example 2: -uri http://mydomain.org/mymetadata.xml

delete (no argument): If present, then the submitted metadata set(s) will be deleted. Note that metadata can also be deleted using the deleteMetadata command (see the Deleting Metadata section below).

Example 1 – Import from a SDMX URL:

publishMetadata.sh -uri http://demo.metadatatechnology.com/FusionMatrix/ws/rest/metadata/dataflow/ALL/ALL/ALL

Example 2 – Import from a SDMX File:

publishMetadata.sh -uri /home/ubuntu/metadata/uis_metadata.xml

Example 3 – Delete metadata:

publishMetadata.sh -uri /home/ubuntu/metadata/uis_metadata.xml -delete

    Deleting Metadata

To delete a metadata report, the deleteMetadata.sh (UNIX) or deleteMetadata.bat (Windows) command may be used. Metadata may be deleted by metadata set, or by metadata set and report period. The following arguments are supported:

setId (Mandatory): Defines the metadata set id.

period (Optional): Defines the metadata set report period.

Example 1 – Delete metadata set:

deleteMetadata.sh -setId CollectionProcess

Example 2 – Delete metadata set for period:

deleteMetadata.sh -setId CollectionProcess -period 2009

    Deleting Data

To delete data, an SDMX Dataset may be published with the action Delete. Alternatively, the deleteData.sh (UNIX) or deleteData.bat (Windows) command may be used. Data may be deleted at the structure level: Provision Agreement (provision), Dataflow (flow), or Data Structure (dsd); in this case all data that relates to the structure will be deleted. Alternatively, data may be deleted for a specific data import, as each import receives a unique id. The following arguments are supported:

ref (Mandatory): Defines the dsd/flow/provision for which data will be deleted.
  Format: Agency Id, Id, Version

structure (Mandatory): Defines the structure type referenced by the -ref argument: dsd, flow, or provision.

id (Optional): Specifies a dataset id to delete. This expects the structure argument to be either flow or provision.

Example 1 – Delete all data for a Dataflow:

deleteData.sh -ref UIS_EDUC_MDG,1.0 -structure flow

Example 2 – Delete a dataset for a Provision Agreement:

deleteData.sh -ref PA_EDUC_MDG,1.0 -structure provision -id ab44350c6b1e1a4f33490f1ad64c388b

    Start Sweeper

To start the sweeper service, the startSweeper.sh (UNIX) or startSweeper.bat (Windows) command may be used. The sweeper service will launch and run in the background, periodically checking the file system for new data files. The sweeper will continue to run until the service is terminated. No arguments are required to start the sweeper service; however, the matrix.properties file must have a value for the property "sweeper.dir".

    The "sweeper.dir" property determines which directory will be swept by the sweeper as it looks for files to import. If this property does not exist in the properties file or it does not have a value, then the sweeper service will not start. If the value is set but the directory does not exist, then the sweeper service will attempt to create it. The Sweeper Service will create subdirectories for each of the data providers registered in the Matrix. There will be a sub-directory for the Agency containing a sub-directory for the DataFlow and a final sub-directory for the Data Provider.

    On start-up the sweeper engine will automatically create a directory structure based on the known Provision Agreements. The directory structure created conforms to the following tree structure:

    • Level 1 Folders – Agency Id
    • Level 2 Folders – Dataflow Id
    • Level 3 Folders – Data Provider Id

    An example of this structure is shown below

    Showing the directory structure created by the sweeper engine

In this example, the International Monetary Fund (IMF) Agency is collecting Balance of Payments (BOP) data from the United Kingdom (UK), France (FR) and Germany (DE).

    When the United Kingdom Data Provider wishes to provide data to the Matrix, the file should be placed in the "UK" directory, as shown in the example above for the file 'UK-BOP-2012.xml'. The sweeper engine detects the new file, and imports it against the Provision Agreement that is in place for the United Kingdom to provide Balance of Payments data to the IMF.
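As a sketch, and assuming the illustrative sweeper.dir value of /home/ubuntu/sweep used earlier, the directory tree for this example would be:

/home/ubuntu/sweep/
    IMF/
        BOP/
            UK/
                UK-BOP-2012.xml
            FR/
            DE/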

Once a file has been processed and the data loaded into the Matrix, the file is renamed with the suffix '.processed'; for example, 'UK-BOP-2012.xml' will be renamed to 'UK-BOP-2012.xml.processed'.

If processing the file resulted in an error, the file will be renamed with the suffix '.errored', and a corresponding file is created which contains the error text; this has the same name as the failed data file suffixed with '.error.txt'. So 'UK-BOP-2012.xml' will be renamed to 'UK-BOP-2012.xml.errored' and the error text will be in the file 'UK-BOP-2012.xml.error.txt'.

    Matrix Manager

    The Fusion Matrix Manager has four main roles:

    1. to provide a UI for data, structure, and metadata loading
    2. to provide the ability to connect to external Fusion services, to monitor the services, and react to changes
    3. to connect external data sources and link dataflows to data sources
    4. to run the decryption process once an embargo time has passed for a previously loaded dataset

    The Matrix Manager provides an email notification service to raise alerts if any parts of the environment are not running as expected.

    To log into the Matrix Manager for the first time, enter the username and password as specified in the properties file. The values for these on installation are ‘root’ and ‘password’. After a successful login, the following page will be displayed.

    The overview page of the Matrix Manager

    This is the overview page and it shows the connections to Fusion Services and data sources. By default the Matrix Manager will not be connected to any external Fusion Services.

    The main tiles on this page display:

    • The configuration details of a Fusion Matrix instance – used to link the Manager to the Matrix
    • The configuration details of a Fusion Registry instance – used to provide Metadata Structures
    • The configuration details of a Fusion Security instance – used to provide access credentials for the Matrix Suite
    • The configuration details of a Fusion Audit instance – used for storing audit information
• DataSources - The Fusion Matrix data source is the MySQL database which is specified in the properties file; this is the location of datasets that have been imported into the Fusion Matrix. On installation there are no datasets stored.

    The Matrix Manager provides the ability to import data, structures, and metadata. This can be achieved by clicking the “Import” button on the top right of the page.

    The current data import activity can be seen on the bottom of the page (in the screenshot this is the blue “No Activity”). On clicking the current activity button, a dialog window will display information about all the data imports for the last week.

    The icons on the left-hand side of the page are for navigation to other pages.

There is also an indicator at the bottom-right which shows the user currently logged in and provides logout functionality.

    Matrix Connection

    The Fusion Matrix configuration specifies the URL of the Fusion Matrix, and the sender Id, which is the identifier located in the header of any SDMX message sent from the Fusion Matrix. The purpose of defining the Fusion Matrix URL is twofold:

1. The Fusion Matrix Manager will poll this URL every 30 seconds to ensure the Fusion Matrix is running. The Fusion Matrix Manager will send an email notification if the connection is lost.
2. The Fusion Matrix URL must be set if the Matrix Manager has been configured to automatically submit a data registration to the linked Fusion Registry on data import. The reason for this is that a data registration must contain the URL from which the data can be retrieved.

    If monitoring a connection and automatic data registrations are not required, then the Fusion Matrix location does not need to be configured.

    To configure the Fusion Matrix connection and sender details click on the cog icon on the top right of the Fusion Matrix panel. The following dialog box will appear allowing for both pieces of information to be set.

    The Fusion Matrix location and sender id being configured

    On clicking 'Save' the Fusion Matrix Manager will query the Fusion Matrix to ensure it is both contactable and responds with the expected information. To unset a URL or Sender ID, simply remove the information from the entry field and click 'Save'.

    Registry Connection

The second panel provides the ability to connect to a Fusion Registry, allowing the Fusion Matrix to synchronize with all the structures in the Fusion Registry. This synchronization process may take a few minutes, depending on how many structures are in the Fusion Registry. Once synchronization is complete, the Fusion Matrix will periodically poll the Fusion Registry’s RSS feed to check for changes. Any structures created, modified, or deleted in the Fusion Registry will be detected and pulled into the Fusion Matrix. The Fusion Registry connection passes control of the maintenance of the Structural Metadata to the Fusion Registry and allows the Fusion Registry metadata content to drive the Fusion Matrix.

    The Matrix Manager’s Fusion Registry panel will show the Fusion Registry connection details together with the last time that an update to the RSS feed was detected and processed.

It is important to note that the structures in the Fusion Registry are copied to the Fusion Matrix and held locally. This design means that if the Fusion Registry is unlinked, or if the Fusion Registry server becomes unavailable, the Fusion Matrix will still be able to access the structures. Additionally, if the Fusion Registry is updated in a way that is incompatible with the contents of the Fusion Matrix, then the Fusion Matrix will refuse the update. In this scenario the Fusion Matrix will send an email notification (providing the Fusion Matrix Manager has been configured to use an email server). An example of an illegal update is the removal of a Dimension from a data structure for which data are already available.

    Data Portal with Fusion Registry

The Matrix Manager will process any registered data sources in the Fusion Registry, indexing each data source to make the data available via the Fusion Matrix UI. As the Fusion Registry may contain multiple data registrations per dataflow, the Matrix Manager will index each data source separately and broker queries to one or more of the data sources depending on which ones contain data that match the query parameters. If there are multiple data sources containing data that match the query parameters, each data source will be queried and the Fusion Matrix will consolidate the resulting datasets into a single dataset before passing the result to the calling client.

    Any registrations added to the Fusion Registry will be detected and indexed by the Matrix Manager and the data will be made available in real-time. Additionally if data registrations are deleted from the Fusion Registry the Matrix Manager will remove the index, and this may result in fewer available series or, if all registrations are removed, the total removal of the Dataflow.

    It is possible to view which Dataflows in the Fusion Matrix are linked to registered data sources via the data source page in the Matrix Manager.

    Auto Registration

On creating the connection to the Fusion Registry it is possible to set auto Register to ‘Enabled’. Once enabled, the Fusion Matrix will automatically submit a Data Registration to the Fusion Registry each time a new dataset is successfully loaded into the Fusion Matrix. The data registration will reference the Fusion Matrix web service as the data source for the registered dataset. The Matrix URL must be configured as discussed in the Matrix Connection section. A username and password are also required, as the Fusion Matrix will use these to authenticate with the Fusion Registry. If security has not been configured in the Fusion Registry, then the default Fusion Registry username and password should be used. The username and password must link to a user account with the correct privileges to submit data registrations. As data registrations may be for multiple data providers, it is recommended that the credentials are associated with an admin account.

    Showing a Fusion Registry connection being configured

    Security Connection

The Fusion Matrix Manager provides the ability to connect to Fusion Security; the purpose for this is twofold.

    1. On configuring a Fusion Security connection, Fusion Security will become the authentication service for the Fusion Matrix Manager and Fusion Matrix. This will allow multiple users to be set up with access to these applications. If auditing is configured this can be useful to track which users are loading data or metadata.
2. Fusion Security enables datasets to be loaded with an embargo time. This is because Fusion Security is instrumental in ensuring the confidentiality of the data, as discussed below.

To connect to Fusion Security, a valid username and password must be supplied, which must belong to an existing user account in Fusion Security.

    Showing a Fusion Security connection being configured

    Audit Connection

In order to take advantage of Fusion Audit’s auditing capabilities, it is possible to define a connection via the Matrix Manager. The connection must be to the RabbitMQ server used as the message broker by Fusion Audit. Information about RabbitMQ is provided in the Fusion Audit Setup Guide.

    Once a successful connection has been made, Fusion Matrix will send information to Fusion Audit.

    Important Note: the Fusion Matrix-UI web server must be restarted before this change will take effect.

    Adding Database Data Sources

    The last panel in the Matrix Manager overview page is the Data Sources panel. It is possible to use this to add a connection to an existing MySql, Oracle, or SQL Server database. To do this click on the cog icon on the top right of the data sources panel and fill in the connection details in the form.

    showing a connection to an external database being created

    Once a new data source has been added to the Matrix Manager, it will appear in the list of available data sources, as shown below.

    Showing the additional Oracle data source being listed in the available data sources

    The Matrix Manager will ensure it can connect to the data source, but at this stage it does not have any knowledge of what data are available in the data source. This can be configured in the data sources page.

    To edit the data source connection, click on the same cog icon at the top right of the Data Sources panel and select the data source to edit in the drop down list. On selecting the data source to edit, the form will be prefilled with the current connection details, except for the password which will need to be redefined. To save any changes to a data source click on the 'Save' button.

    Data Source Manager

    The Data Source Manager page shows a list of all the Dataflows known to the Fusion Matrix and from which data source the data should be retrieved.

    A Dataflow can link to one of the following types of data source:

    1. Matrix Data Source: this is the Fusion Matrix MySQL database. An imported data file will automatically be loaded into this data source, and the Dataflow will automatically be linked to this data source after data import.
    2. Registered Data Source: these data sources result from data registrations in the Fusion Registry. A single Dataflow may link to multiple registered data URLs.
    3. Plugin Data Source: these data sources are defined in the Matrix Manager, and the Dataflow must be manually linked to this data source in the Data Source Manager.
4. Mapped Data Source: this is when the data for a Dataflow comes from the data source of a linked dataflow. The two Dataflows must have a mapping defined between them that describes the relationship between the Dimensions and coding schemes of the underlying data structures. A mapped data source obtains this mapping information from an SDMX Structure Set, which can be defined and maintained in the Fusion Registry.
    Showing the Data Source Manager page with 4 different data source types

    The above image shows the Data Source Manager page with the Health Single Series Dataflow selected. At the top right of the page is an ‘Index’ button, which can be used to re-index the data source. Indexing a data source results in a query for all series, together with the start and end date of the data. The indexed information is used by the Fusion Matrix-UI when displaying information about available datasets.

    Linking a Dataflow to a Plugin Data Source

    To link a Dataflow to a plugin data source, click on the edit icon next to the Dataflow name. This changes the view so that the data source cell becomes a drop-down selection box which, when expanded, shows all the available data sources known to the Matrix Manager.

    Showing the data source options for the Health Single Series dataflow

    On connection, the Fusion Matrix will expect the database to contain a table with a specific name, and the columns in the table must also have specific names, which are all derived from the data structure that the Dataflow references.

    The naming convention for the database table/column is based on the dataflow, as shown below:

    Dataflow

    Agency:  ESTAT
    Id:      NATACC
    Version: 1.0

    Expected Table/View Name: ESTAT_NATACC_1_0

    Note: the dataflow id will be truncated if the resulting string is greater than 30 characters.
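    The following sketch illustrates how the expected table/view name can be derived from a Dataflow's agency, id, and version, following the convention above. The exact truncation behaviour (trimming the id so the full name fits in 30 characters) is an assumption based on the note above.

    # Derive the expected table/view name from a Dataflow reference.
    def expected_table_name(agency, flow_id, version):
        name = f"{agency}_{flow_id}_{version.replace('.', '_')}"
        if len(name) > 30:
            # assumed rule: truncate the dataflow id so the result fits
            excess = len(name) - 30
            flow_id = flow_id[:-excess]
            name = f"{agency}_{flow_id}_{version.replace('.', '_')}"
        return name

    print(expected_table_name("ESTAT", "NATACC", "1.0"))  # ESTAT_NATACC_1_0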

    The database columns are expected to reflect the underlying data structure definition, as shown below:

    Data Structure Definition

    FREQ
    SERIES
    REF_AREA
    TIME_PERIOD
    OBS_STATUS

    Note that the TIME_PERIOD column should be of type Date. The Frequency dimension is used to determine how to format the date in SDMX outputs. If the DSD does not have a Frequency dimension, the FREQ column is still required and is expected to define the frequency.
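    For illustration, an expected table for the ESTAT_NATACC_1_0 example might be created as shown below. Only the requirement that TIME_PERIOD be of type Date comes from this guide; the other column types, and the presence of an observation value column (here OBS_VALUE), are assumptions for the sake of the sketch.

    # Illustrative DDL for the expected table; column types are assumed.
    ddl = """
    CREATE TABLE ESTAT_NATACC_1_0 (
        FREQ        VARCHAR(20)  NOT NULL,  -- frequency of the series
        SERIES      VARCHAR(20)  NOT NULL,  -- series dimension
        REF_AREA    VARCHAR(20)  NOT NULL,  -- reference area dimension
        TIME_PERIOD DATE         NOT NULL,  -- must be of type Date
        OBS_VALUE   FLOAT,                  -- observation value (assumed column)
        OBS_STATUS  VARCHAR(20)             -- observation status attribute
    )
    """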

    The allowable frequencies are based on the SDMX Frequency Codelist and are:

    Frequency   Description             Example Date
    A           Annual                  2014
    S           Half-yearly, semester   2014-B1
    Q           Quarterly               2014-Q1
    M           Monthly                 2014-01
    W           Weekly                  2014-W2
    D           Daily                   2014-01-01
    H           Hourly                  2014-01-01T01:00:00
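    The sketch below shows how a Date value might be rendered as a time period string according to the FREQ column, following the examples in the table above. The rendering rules are inferred from those examples (the semester branch mirrors the 2014-B1 example, and the weekly branch uses the ISO week number as a simplification), so treat this as illustrative rather than a definition of the Fusion Matrix's own formatting.

    # Render a date as a time period string, per the frequency table above.
    from datetime import datetime

    def format_period(freq, dt):
        if freq == "A":
            return f"{dt.year}"
        if freq == "S":
            return f"{dt.year}-B{1 if dt.month <= 6 else 2}"
        if freq == "Q":
            return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"
        if freq == "M":
            return f"{dt.year}-{dt.month:02d}"
        if freq == "W":
            # simplification: ISO week number (may differ near year boundaries)
            return f"{dt.year}-W{dt.isocalendar()[1]}"
        if freq == "D":
            return dt.strftime("%Y-%m-%d")
        if freq == "H":
            return dt.strftime("%Y-%m-%dT%H:00:00")
        raise ValueError(f"Unsupported frequency code: {freq}")

    print(format_period("Q", datetime(2014, 1, 1)))  # 2014-Q1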

    Note: It is possible to press [alt+q] in order to show the agency, id, and version of each Dataflow on the page instead of the name of the Dataflow.

    Note that the link to the database could be to a database view.

    On clicking the tick icon, the Matrix will confirm that the database table exists and that it contains the expected columns. If the table name is incorrect, or if any columns are missing from the table, this will be reported. On successfully linking the Dataflow to the data source, the Matrix Manager will index the data source, and the number of series at the linked data source will be shown in the Data Source Manager.

    Any time new data is added to or removed from the data source for a particular Dataflow, it is important to re-index the data source. This can be achieved by selecting the Dataflow in the Data Source Manager and clicking on the Index button at the top right of the page.

    Linking a Dataflow to a Mapped Data Source

    The data source of a Dataflow can be another Dataflow. This can be seen with the ‘Health’ Dataflow, whose data source is the 'Health Single Series' Dataflow. In practice this means that any query for the Health Dataflow will be mapped to an equivalent query for the Health Single Series Dataflow and executed against the Health Single Series data source. The response of the query will be mapped to conform to the Health Dataflow before being passed to the calling client.

    Showing the choreography of a mapped query

    In the above example, the Health dataflow describes a dataset by breaking it down into 4 dimensions: Frequency, Series, Reference Area, and Time. The Health Single Series dataflow describes the same dataset with only 2 dimensions, Series and Time. The data is held on an external Oracle database according to Health Single Series, but the intention is to disseminate the data along 4 dimensions as described by the Health dataflow.

    In order to achieve this, a mapping must be defined which describes the relationship between the 4-dimensional Health Dataflow and the 2-dimensional Health Single Series Dataflow. This mapping is defined in an SDMX Structure Map, and more details can be found in the Fusion Registry Documentation section.
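    Conceptually, the mapping says that each single-series code implies fixed values for the Frequency, Series, and Reference Area dimensions. The sketch below illustrates this idea with hypothetical codes; the real mapping is defined in an SDMX Structure Set maintained in the Fusion Registry, not in code like this.

    # Conceptual illustration of a 2-dimensional to 4-dimensional mapping.
    # All codes are hypothetical.
    series_map = {
        "LIFE_EXP_UK": {"FREQ": "A", "SERIES": "LIFE_EXP", "REF_AREA": "UK"},
        "LIFE_EXP_FR": {"FREQ": "A", "SERIES": "LIFE_EXP", "REF_AREA": "FR"},
    }

    def map_series_key(single_series_code, time_period):
        # translate a (series, time) key into the 4-dimensional key
        dims = dict(series_map[single_series_code])
        dims["TIME_PERIOD"] = time_period
        return dims

    print(map_series_key("LIFE_EXP_UK", "2014"))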

    If a mapping relationship exists between 2 Dataflows then the mapped Dataflow(s) will appear as additional options when selecting a data source for a Dataflow, as shown below.

    Showing the data source options for the Health dataflow, which includes the Health Single Series dataflow

    It is also possible to take advantage of a mapping relationship when importing data: data imported for one Dataflow can be mapped and stored against another Dataflow. This allows different Dataflows to be defined for import (e.g. 2-dimensional) and dissemination (e.g. 4-dimensional), with the data stored in only one location.

    Linking a Dataflow to SDMX Web Services/SDMX Files

    It is possible to define both SDMX compliant web services and SDMX files as available data sources for a Dataflow. This is achieved through the creation of a data Registration in the Fusion Registry. The Registration is at the level of the Provision Agreement (there can be multiple Provision Agreements for a Dataflow) and for any one Registration the data source must be either an SDMX compliant web service or an SDMX file. Upon the creation of a data Registration, the Dataflow’s data source will be updated automatically to reflect the registration URL.

    If there are multiple Registrations for a single Dataflow, then the Fusion Matrix will index each one individually and use this information to broker data queries to one or more of the data sources. When processing a query, the Fusion Matrix will only query a registered data source if it knows there is data in that data source for the particular data query. If multiple data sources contain data pertinent to the query, then the Fusion Matrix will query each source simultaneously and consolidate the responses into a single dataset. The Fusion Matrix will also transform the data into the format requested by the client, even if each data source replied to the query in a different SDMX format.
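    From the client's point of view this brokering is invisible: a single query returns a single consolidated dataset. The sketch below shows what such a query might look like using the standard SDMX 2.1 RESTful pattern (data/{flowRef}/{key}); the base URL, dataflow reference, and series key are hypothetical.

    # Sketch of a client-side SDMX 2.1 REST data query.
    # Base URL, flow reference, and key are hypothetical examples.
    import requests

    base_url = "http://matrix.example.org/ws/rest"   # hypothetical service root
    flow_ref = "ESTAT,NATACC,1.0"                    # agency,id,version
    key = "A.B1GQ.UK"                                # one value per dimension

    response = requests.get(
        f"{base_url}/data/{flow_ref}/{key}",
        headers={"Accept": "application/vnd.sdmx.genericdata+xml;version=2.1"},
        params={"startPeriod": "2010", "endPeriod": "2014"},
    )
    response.raise_for_status()
    print(response.text)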

    It should be noted that it is not possible to re-index a registered data source from within the Data Source Manager. If the data at a registered data source changes, the data source should be re-registered with the Fusion Registry, which will prompt a re-index by the Fusion Matrix.

    Linking a Dataflow to the Matrix Data Source

    The Fusion Matrix provides its own data source to store data which is loaded through the Matrix Manager and Matrix Toolkit applications. The name of this data source is ‘Fusion Matrix’ and it will appear in the list of available data sources for each Dataflow. After loading data into the Fusion Matrix, the data source for the Dataflow will automatically be linked to the Fusion Matrix, so it is not necessary to do this manually. The data source will also automatically be re-indexed each time data is loaded or deleted, or an embargo time passes, so it should not be necessary to re-index a Dataflow manually when it is linked to the Fusion Matrix data source. The Matrix data source can be re-indexed if the database tables have been deleted manually, or if the data in the tables has been modified manually. Note that manual modification is not recommended, as this could result in database integrity problems.

    Structural Metadata

    The third item in the navigation sidebar is the Structure Metadata navigator. Structural metadata is the metadata which underpins and drives the Fusion Matrix: it describes how datasets are structured and codified, who can report data, which subsets of data can be reported, how datasets are mapped, and so on.

    Although the Fusion Matrix is driven by structural metadata, it is not a structural metadata authoring application; it is recommended to use the Fusion Registry for this task. However, the Fusion Matrix Manager does provide a simple list view of the available structures, together with an export facility which allows the structural metadata to be downloaded in SDMX version 2.1 format.

    To view the available structural metadata, select the structure type from the drop-down list at the top right, and the page will update to show all the structures of the given type.

    Showing a list of dataflows known to the Fusion Matrix

    Each structure can be downloaded by clicking on the 'xml' icon. The chosen structure will be exported together with all referenced structures, so exporting a data structure will result in a file which includes all codelists and concepts used by the exported data structure. The “Export All” button (located at the top left of the page) will export all the structural metadata in a single file. Note that this does not include Registrations, which must be exported separately.

    Import Data

    There is a minimum set of structural metadata required by the Fusion Matrix before data can be imported. Information regarding these requirements, together with additional information about each aspect of data imports, can be found in the Data Import section.

    With the required structural metadata in place, it is possible to import data into the Fusion Matrix through the Matrix Manager application. To import data, click on the 'Import' button, which can be found in the header of every page. The following dialog will be displayed:

    The data import dialog

    Data can be imported by uploading a file, or alternatively by entering a URL which resolves to a dataset in one of the formats supported by the Fusion Matrix.

    The Data Provider and Dataflow must be selected for the imported dataset. The list of available Dataflows contains only those that are valid for the currently selected Data Provider; this is determined by the Provision Agreements that relate to the Data Provider.

    A dataset action must be selected (Select Action); this defaults to APPEND. If Fusion Security has been configured, an Embargo time can be set on the imported dataset. The embargo time is held in GMT, and for information purposes the current server time in GMT is shown at the bottom of the page.

    More information on data imports including provision agreements, data formats, data set action, and embargo, can be found in the ’Fusion Matrix Data Import and Export Guide’.

    After selecting all the required information, click on the Import button. The Fusion Matrix Manager will analyse the data to ensure it is supported. If successful, the data import will be placed on the import queue and run in the background, and the import dialog will close automatically. The progress of the import is displayed at the bottom of the page, as shown in the image below (the blue indicator at the bottom of the page). Note that the progress indicator will pause from time to time as the Matrix flushes information to the database. Depending on the size of the data import, there may be a long delay at 100% before the progress indicator turns green to indicate success, as the transactions are committed to the database.

    Showing a progress indicator for the WDI_EDUCATION data import at 79%

    Import Errors

    A data import may fail for a number of reasons: for example, the dataset may attempt to replace an observation value when the dataset action was set to APPEND, or it may contain codes which are valid in the codelist but disallowed due to Constraints on the Data Provider. If a data import results in an error, the progress indicator will turn red, and at this stage the import will be discarded.

    Showing the progress indicator reporting a failed import

    If an email server has been configured, an email will be sent to the email distribution list. To view more information about the error from within the Matrix Manager application, click on the progress indicator, which will show the last week’s activity. To view the errors for an import, click on the 'ERROR' status, which will expand to show all error messages for the import.

    Showing error details for a data import

    Import Warnings

    In some cases a dataset may have been imported successfully even though the import generated warnings. In this case a warnings indicator will be displayed in the header of the Fusion Matrix Manager, next to the Import button.

    Clicking on this warnings indicator will display a list of warnings in the system. A typical warning occurs when the data structure is linked to a Hierarchical Codelist and the imported dataset contains codes that, whilst present in the Codelist used by the Dimension, are not contained in the Hierarchical Codelist. Such codes will appear in the selection picker in the GUI but will not be shown in the context of the hierarchy.

    Importing Structures

    Whilst the use of a Fusion Registry is recommended for authoring structural metadata, it is not a requirement to link the Fusion Matrix to a Fusion Registry. It is equally valid to update the structural metadata in the Fusion Matrix by uploading a valid SDMX file containing the required structures.

    Note: the Fusion Registry can be used to author the structural metadata, which can then be exported and manually loaded into the Fusion Matrix Manager (or via the Fusion Matrix Toolkit).

    To upload structures manually into the Fusion Matrix Manager, click on the Import button, which is located at the top right of most pages in the Matrix Manager. Within the dialog that is displayed, click on the ‘Structure’ button. The structures can either be imported as an SDMX file, which can be a zip file if required, or via a URL which resolves to a structure document.

    Import Metadata

    The Fusion Matrix Manager provides the ability to import SDMX Metadata Sets. A Metadata Set contains additional metadata about some part of a dataset, such as a Dimension value, or a specific observation. This type of metadata is sometimes referred to as Quality Metadata or Reference Metadata.

    The Fusion Matrix-UI offers a metadata authoring facility, which provides an online editor for the creation and maintenance of Reference Metadata. This Metadata is expressed in the Fusion Matrix as SDMX Metadata Sets, and can also be exported in this format. Both the Fusion Matrix Manager and Fusion Matrix Toolkit allow Metadata Sets to be imported from an SDMX file.

    Fusion Browser For Excel

    The Fusion Browser has been set-up to communicate with a default service provided by Metadata Technology. This service provides real, public data from multiple data providers.

    It is possible to add up to five Fusion-enabled data sources to the Browser. The data sources may be public services provided on the web, secure services available on the web, or services running privately within your organisation.

    The Browser provides the user with a categorised tree of data topics, along with the ability to select sub-cubes within a topic. On running a data query, the Browser will offer both a flat view of the data and the ability to view the information in a pivot table.

    Showing the Fusion Browser on the Excel Ribbon

    Setup

    The Setup button provides the ability to customise the plugin by adding new Fusion Matrix data sources, changing the query limits, and using code identifiers rather than labels for dimension value selection.

    Showing the Fusion Matrix Connection Setup

    Adding New Fusion Matrix URL

    Fusion Matrix data sources can be added by typing the URL of a Fusion Matrix location in the IP Address column.

    Each Fusion Matrix connection requires an Alias, and if security is enabled, a username and password can be supplied. The connection will only become active when the Connect checkbox is selected.

    After clicking ‘OK’ the data source will appear as a topic level selection in the Browse Topics drop-down.

    Showing topics from two Fusion Matrix servers

    Modifying Observation Limits

    The Fusion Browser will, by default, bring back the last 50 observations per series, for up to 2000 series. To modify these limits, open the Setup window and alter the values before clicking 'OK'.

    Using Short Codes

    Every dimension value represented by a human-readable label has a short code associated with it. The short code is defined by the data owner and may be fairly cryptic, but in some cases it will conform to a standard classification (such as a country classification scheme). By checking the 'Use short codes' checkbox, the dimension values will show the code identifier, as opposed to the code label. This also allows an increased number of Code columns to be used (the default view is 3). An example is shown below where 10 code columns are used with short codes enabled.

    Showing short codes used instead of full names

    Build Query

    The Fusion Browser's 'Browse Topics' function provides a categorised list of Topics which contain data. On clicking the menu, the list will be displayed as shown below.

    Showing Browse Topics

    On Topic selection, new buttons will be added to Excel's Ribbon, enabling the query to be further refined. Note that the Run Query button may at this stage be disabled. This button will remain disabled until the query no longer exceeds the maximum query limit, which defaults to 2000 observations.

    Refine Query

    The Dimension Values button provides the list of dimensions used to describe the data in the Topic. Each dimension comprises one or more values that can be used to filter the data: for example, the Country dimension may consist of France and Germany. Some of the dimensions shown in the list may be disabled; in this case the data reports only one value for that dimension, so no further refinement is possible. An example of this could be Frequency (fixed, Annual).

    Showing available dimensions to pick values for

    On dimension selection, the list of valid choices will be displayed as shown below.

    Showing dimension values for the Series dimension

    Each time a dimension value is selected, the data query will update, and a calculation of the number of series and observations will be performed. At this stage the ‘Run Query’ button may become enabled or disabled depending on the estimated number of observations.

    The server will also reply with a list of valid selections based on the current state of the query, so it is possible at this stage that some code choices are removed from one or more of the dimension value lists.

    Note that the dimension values list is restricted by the size of the Ribbon; the Fusion Browser therefore provides a pop-up view of all the possible dimension values. To show the pop-up view, click on the small icon at the bottom right of the dimension value picker.

    Showing the values picker in a pop up dialog window

    Note that the pop-up view of the dimension values shows the hierarchical relationship of the values by means of indentation. Some dimension values may be marked as Display Only, indicating that their purpose is only to serve as a node in the hierarchy, not as a valid data selection. Others may be marked as Constrained, indicating that, based on the current query, the inclusion of that dimension value would result in either no data for that selection or no data for the entire query.

    To select a dimension value from this view, ensure the value is selected, and click ‘Select’.

    View Data

    Once the query selections have been made, clicking on the 'Run Query' button will execute the query on the server and import the data into the Excel worksheet. The following worksheet shows the results of a query for the imports of 'Wheat, Grain, and Rice for Developed and Developing countries'.

    Showing a series view of the dataset with a graph per series

    Notice that the default view is a flat table. Each series is represented with a miniature graph (known as a Sparkline) above a coded label which uniquely identifies the series. Hovering the mouse cursor over the small red triangle icon will bring up the decoded dimension values for that series.

    Showing the dimension values for the series

    After the Fusion Browser has run a query for data, it will extend the options in the Ribbon to include a Pivot View button. Clicking on this button regroups the data in the worksheet into an Excel Pivot table, as shown below.

    Showing the dataset viewed in an Excel Pivot table

    In pivot view, all of the controls and functionality provided by Excel to manipulate the pivot table are accessible. To change back to the tabular view, click the Table View button in the Ribbon.

    Charting

    In the tabular view it is possible to create a chart by selecting the columns to chart and clicking the Show Chart button in the Ribbon. The whole column does not have to be selected; a single cell in the column is enough to inform the Fusion Browser which column(s) to chart. The chart will be generated automatically with series labels and a title.

    Showing 3 series in a time series graph

    To remove the chart from view, click the Show Chart button again, or simply select the chart and press 'delete' or 'backspace'.

    Save Queries

    It is possible to save queries in the Fusion Browser for later use. To do this, ensure the query has been run, and click the Save Query button.

    The query requires a Name and has an optional description. After saving the query, it will be available in the Browse Topics list of topics.


    Queries can be modified and deleted by opening the Query Manager accessible from the Stored Queries option in the Ribbon.