Data Collection and Integration Solutions for Official Statistics
Statistical Data Collection and Integration using Fusion Registry 9
Metadata Technology’s Data Collection solutions are designed for Central Banks, International Organisations and National Statistics Offices who need to collect and integrate aggregated data as part of the official statistics data lifecycle.
Integrating Data from Multiple Sources into a Single Virtual Data Repository
Fusion Registry 9 will integrate data from any number of stores and sources into a single Virtual Data Repository - sometimes termed a Data Portal. When queried, the data needed to satisfy the request is retrieved from the relevant sources and integrated in real time.
The data remains at source, meaning that new, potentially large datasets can be quickly and easily made available to data consumers without the overhead of physically extracting and transferring them.
Any number and combination of data sources can be registered. The registration process prompts Fusion Registry's virtualisation engine to index the available data by building a list of the series keys - the index being subsequently used to build the execution plan for each query.
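The idea of indexing series keys and using that index to plan a query can be illustrated with a minimal Python sketch. It is not Fusion Registry's actual implementation: the source names, the series keys, the function names and the dot-separated wildcard pattern are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(sources):
    """Map each series key to the names of the sources that hold it."""
    index = defaultdict(set)
    for name, series_keys in sources.items():
        for key in series_keys:
            index[key].add(name)
    return index


def matches(series_key, pattern):
    """True if a dot-separated key matches the pattern.

    An empty position in the pattern acts as a wildcard (an assumption
    borrowed from the way SDMX data queries express keys).
    """
    key_parts = series_key.split(".")
    pattern_parts = pattern.split(".")
    if len(key_parts) != len(pattern_parts):
        return False
    return all(p == "" or p == k for k, p in zip(key_parts, pattern_parts))


def plan_query(index, pattern):
    """Return {source: [series keys]} for every indexed key matching the pattern."""
    plan = defaultdict(list)
    for key in sorted(index):
        if matches(key, pattern):
            for source in sorted(index[key]):
                plan[source].append(key)
    return dict(plan)


# Hypothetical registered sources and the series keys each one exposes
sources = {
    "managed_repo": ["A.UK.GDP", "A.FR.GDP"],
    "sql_db": ["A.DE.GDP", "M.DE.CPI"],
}
index = build_index(sources)
plan = plan_query(index, "A..GDP")  # annual GDP for any reference area
```

In this sketch the plan tells the caller which sources need to be contacted and for which series, so sources that hold no matching series are never queried.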
Fusion Registry supports four types of data source:
- Fusion Registry Managed Repositories
- SDMX data web services
- SDMX data files accessible through a URL
- SQL databases
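For the SDMX data web service source type, data queries follow the URL pattern defined by the SDMX REST specification: /data/{flow}/{key}, with optional parameters such as startPeriod and endPeriod. The sketch below builds such a URL in Python. The base URL is a placeholder and the helper name is hypothetical; the agency, Dataflow and key values are illustrative only.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url, agency, flow_id, version="latest", key="all", **params):
    """Build an SDMX REST data query URL (helper name is hypothetical)."""
    flow_ref = f"{agency},{flow_id},{version}"
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    if params:
        # Query parameters such as startPeriod and endPeriod are optional
        url += "?" + urlencode(params)
    return url


url = sdmx_data_url(
    "https://example.org/sdmx/rest",  # placeholder base URL
    "ECB", "EXR", "1.0",
    key="M.USD.EUR.SP00.A",
    startPeriod="2020-01",
    endPeriod="2020-12",
)
```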
Fusion Registry's plugin architecture allows connectors to be added for any additional data sources. If you need to integrate data from other specific sources, such as Spark, Hadoop or Amazon Redshift, let us know.
For SQL databases, connectors are currently available for Oracle, MySQL and SQL Server. To maximise performance, the data must be presented in a separate table or view for each registered SDMX Dataflow. There's more detail in the Fusion Registry Data Collection Management Guide.
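As a rough illustration of the "one table or view per Dataflow" requirement, the Python sketch below creates two views over a shared observations table. SQLite is used only so that the example is self-contained; it is not one of the supported connectors. The table, view and column names are assumptions, and the layout Fusion Registry actually expects is described in the guide mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical shared table holding observations for several Dataflows
    CREATE TABLE observations (
        dataflow TEXT, freq TEXT, ref_area TEXT, indicator TEXT,
        time_period TEXT, obs_value REAL
    );
    INSERT INTO observations VALUES
        ('NATIONAL_ACCOUNTS', 'A', 'UK', 'GDP', '2020', 100.0),
        ('NATIONAL_ACCOUNTS', 'A', 'UK', 'GDP', '2021', 104.5),
        ('PRICES', 'M', 'UK', 'CPI', '2021-01', 1.2);

    -- One view per Dataflow, each exposing only that Dataflow's rows
    CREATE VIEW v_national_accounts AS
        SELECT freq, ref_area, indicator, time_period, obs_value
        FROM observations WHERE dataflow = 'NATIONAL_ACCOUNTS';
    CREATE VIEW v_prices AS
        SELECT freq, ref_area, indicator, time_period, obs_value
        FROM observations WHERE dataflow = 'PRICES';
""")

national_accounts_rows = conn.execute(
    "SELECT time_period, obs_value FROM v_national_accounts ORDER BY time_period"
).fetchall()
prices_count = conn.execute("SELECT COUNT(*) FROM v_prices").fetchone()[0]
```

Each view could then be registered against its own Dataflow, so a query for one Dataflow never has to scan rows belonging to another.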
Loading and Storing Data
Aggregated observation data can be loaded and stored in local repositories managed by Fusion Registry.
Fusion Registry offers several different types of databases and data stores depending on the use case. For large datasets (greater than 100 million observations) a MySQL or SQL Server database is the best choice. For smaller datasets, Fusion Registry’s Fusion Data Store persistent in-memory database is simple to use and provides good performance.
Data can be loaded from files or fetched from a URL, which allows integration with other systems that provide a REST GET interface. Import data is accepted in a range of formats including SDMX (XML and JSON) versions 1.0, 2.0 and 2.1, EDI, CSV and Excel. There's more detail in the Fusion Registry Data Formats Guide.
There's no restriction on the number of local repositories that Fusion Registry can manage. A Fusion Data Store could be used for active datasets, with a MySQL database repository for archive data that's required less frequently.
Data can be loaded interactively by an authorised user through Fusion Registry's web user interface, or programmatically using the REST API. A third option is Fusion Registry's Command Line Interface, which can be used in shell scripts to automate the data publication process.
You can try it out on the Enterprise Edition live demo server.
Automatically Collecting and Integrating Data from Multiple Providers
Fusion Reporting Node is an optional component for Fusion Registry 9 Enterprise Edition that automates the process of collecting data from multiple providers. Typical use cases include national statistics offices who need to collect data from multiple ministries and national agencies, and regional or international organisations who similarly need to collect data from multiple member countries.
How does it work?
Each data provider is supplied with a Fusion Reporting Node - either installed on-premises or as a cloud service.
Data providers upload their data to the Node using the web user interface, where it is validated and automatically registered with the data collector's central Fusion Registry. The data remains on the Reporting Nodes and is dynamically integrated by the Fusion Registry, which avoids the need for the data collector to physically store and manage the data.
Fusion Reporting Node accepts data either as SDMX or as Excel spreadsheets. To further simplify the process, Fusion Reporting Node can also generate Excel forms for the data provider to complete and submit.
Data providers who run Fusion Reporting Node on-premises also have the option of taking pre-prepared data from Oracle, MySQL or SQL Server databases. As with Fusion Registry, the data must be presented in a separate table or view for each Dataflow.
Collecting Data Using Excel
Excel Data Reporting Templates is a new feature for Fusion Registry 9.2 that simplifies manual data reporting processes by supplying data providers with personalised Excel data reporting forms to complete.
The layout of each form is based on a template defined by the data collecting organisation which is created using Fusion Registry's template design tool. The designer can choose from a range of configuration options including setting specific cell colours to identify important attributes such as observation confidentiality. They can also define validation rules.
A unique form is generated for each Dataflow that a Data Provider is required to report data for. If Reporting Constraints have been defined, Excel's cell protection functions are used to restrict the observations that can be entered. This not only simplifies the process for the data provider, but also reduces the risk of data being reported against the wrong codes.
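The effect of a Reporting Constraint can be illustrated with a short Python sketch that checks reported observations against a set of permitted codes. In the generated forms the restriction is applied through Excel's cell protection; the code below only mirrors the same rule outside Excel. The dimension names, codes and function name are assumptions for illustration.

```python
def check_observation(obs, constraint):
    """Return the dimensions whose reported code is not permitted by the constraint."""
    return [dim for dim, allowed in constraint.items() if obs.get(dim) not in allowed]


# Hypothetical constraint: this provider may only report UK data for GDP and CPI
constraint = {"REF_AREA": {"UK"}, "INDICATOR": {"GDP", "CPI"}}

reported = [
    {"REF_AREA": "UK", "INDICATOR": "GDP", "TIME_PERIOD": "2021", "OBS_VALUE": 104.5},
    {"REF_AREA": "FR", "INDICATOR": "GDP", "TIME_PERIOD": "2021", "OBS_VALUE": 98.1},
]

# Map the index of each rejected observation to the dimensions that failed
rejected = {}
for i, obs in enumerate(reported):
    problems = check_observation(obs, constraint)
    if problems:
        rejected[i] = problems
```

Here the second observation is rejected because it is reported against a reference area the provider is not permitted to report for.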
The generated spreadsheets are simple Excel workbooks which use only standard Excel features and formulas. They contain no macros, VBA or add-ins, any of which could violate information security policies.
There's more in the Excel Reporting Templates White Paper.
Note that Fusion Registry Community Edition can be used to design templates, generate blank spreadsheet forms and validate completed forms, but cannot load and store the reported data.