Universal Architecture

Web-services technology has brought into the mainstream the idea of applications working together as a single distributed system across the Internet. Many statistical standards are beginning to use this type of architecture: in DDI, the idea of having a centralized registry to function as a “question bank” has emerged; in SDMX, a set of standard interfaces has been provided for interacting with a registry that gives visibility into sets of aggregate data, repositories of related structural and descriptive metadata, and the process-related information about the use and exchange of this data and metadata. These ideas are very similar, despite being applied to different parts of the statistical lifecycle.

The application of registries and service-oriented architectures to the statistical realm is an idea that is coming of age, but there is a risk that different standards and specialties within the statistical realm will apply this technology in non-interoperable ways. This is also, however, an opportunity for the ODaF to help design and promote a single aligned architecture, and to avoid the problems inherent in having multiple competing standards. Below are some of the aspects of such a “universal” statistical architecture:

Data and Metadata Models: There are today many standard modelling techniques which promote interoperability across standards. These include the SDMX Information Model (a meta-model for aggregated statistical data and related metadata); ISO/IEC-11179 and the Common Metadata Repository (CMR) extensions (for defining semantics and the collection and management of data and metadata); ISO-15000 Core Components Technical Specification (for modelling transactional data); and the Neuchatel model for classification schemes. There are many other useful models as well, particularly the model emerging from DDI in their next release for survey instruments, raw data, microdata, aggregate data, and related metadata. Many of these standards are already drawing on one another, and a full alignment is possible. Such models provide a basis for the mapping of other models into a coherent whole, using a standard semantic metamodel such as ISO/IEC-11179 as a central “pivot.”
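The “pivot” idea can be illustrated with a minimal sketch. It assumes each standard's element names are mapped once to a shared concept identifier, so that translating between any two standards goes through the pivot instead of requiring a separate pairwise mapping. The class name, method names, and concept identifiers below are hypothetical and are not taken from ISO/IEC-11179, SDMX, or DDI; the element names in the usage lines are illustrative labels only, not an authoritative mapping.

```python
class PivotMapper:
    """Hypothetical pivot-based mapper: every standard maps to shared concepts."""

    def __init__(self):
        # (standard, local element name) -> shared concept id
        self._to_pivot = {}
        # (standard, shared concept id) -> local element name
        self._from_pivot = {}

    def register(self, standard, local_name, concept_id):
        """Record that `local_name` in `standard` denotes `concept_id`."""
        self._to_pivot[(standard, local_name)] = concept_id
        self._from_pivot[(standard, concept_id)] = local_name

    def translate(self, source, local_name, target):
        """Translate an element name from `source` to `target` via the pivot.

        Returns None when either side has no mapping to the shared concept.
        """
        concept_id = self._to_pivot.get((source, local_name))
        if concept_id is None:
            return None
        return self._from_pivot.get((target, concept_id))


# Illustrative usage with made-up concept ids and example element labels.
mapper = PivotMapper()
mapper.register("SDMX", "REF_AREA", "concept:reference-area")
mapper.register("DDI", "geogCover", "concept:reference-area")
print(mapper.translate("SDMX", "REF_AREA", "DDI"))
```

With N standards, this design needs N mappings to the pivot instead of one mapping for every pair of standards, which is the practical argument for a central semantic metamodel.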

Data and Metadata Formats: Many standards provide formats for statistical data and metadata. Many of these – but not all – are described using the XML standards; some are in EDIFACT syntax (GESMES) or in proprietary formats of various types. Many of these formats are based on the various meta-models described above, and are also to some extent aligned. By formally mapping them, it is possible that a set of interoperable applications can be created, facilitating the exchange and use of statistical data and metadata.

Statistical Registries: With the advent of service-oriented architectures, the web-services concept of a “registry” becomes important. There are two horizontal technology standards dealing with registries: ISO-15000, which describes the ebXML Registry/Repository, and OASIS’ UDDI registry specification. These generic registry standards have been refined in various ways for use within statistical applications, most notably in version 2.0 of the SDMX Technical Specifications. Having a standard set of registry interfaces based on a standard registry model is the linchpin of a universal statistical architecture. As the use of standard registries develops, the ODaF is well-positioned to recommend how they can be employed to support the various standards within the statistical lifecycle, and to facilitate the development of tools to make the registry-based vision a reality.
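As a rough illustration of what a common registry interface might look like, the sketch below models a registry as an index of entries that point to resources held elsewhere, which can be submitted and then queried by kind or by originating standard. It is an in-memory toy under stated assumptions: the names RegistryEntry, InMemoryRegistry, submit, and query, and the example.org URLs, are hypothetical and do not reproduce the SDMX, ebXML, or UDDI interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record: describes a resource, does not contain it."""
    identifier: str
    kind: str       # e.g. "dataset", "structure", "question"
    standard: str   # e.g. "SDMX", "DDI"
    location: str   # where the resource itself can be retrieved


class InMemoryRegistry:
    """Toy registry offering one submit/query interface for all standards."""

    def __init__(self):
        self._entries = {}

    def submit(self, entry):
        """Register an entry; duplicate identifiers are rejected."""
        if entry.identifier in self._entries:
            raise ValueError(f"duplicate identifier: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def query(self, kind=None, standard=None):
        """Return entries matching every filter that was supplied."""
        return [
            e for e in self._entries.values()
            if (kind is None or e.kind == kind)
            and (standard is None or e.standard == standard)
        ]


# Illustrative usage with placeholder identifiers and URLs.
registry = InMemoryRegistry()
registry.submit(RegistryEntry("ds-1", "dataset", "SDMX", "https://example.org/data/ds-1"))
registry.submit(RegistryEntry("q-1", "question", "DDI", "https://example.org/questions/q-1"))
for entry in registry.query(kind="dataset"):
    print(entry.identifier, entry.location)
```

The point of the sketch is that clients working with different standards use the same two operations, while the data and metadata themselves stay in their own repositories.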

Semantic Registries: ISO/IEC-11179 provides us with the concept of semantic registries, where the meaning of data and metadata elements can be effectively managed. Alignment with this formal definition of meaning, in an accessible fashion, is critical to the ODaF’s vision of interoperability, as an aligned part of an overall statistical architecture.

Metadata Repositories: Today, there is a growing emphasis on the use of public metadata repositories to support data quality initiatives. This has been a major driver in the development of the SDMX standards, which themselves drew on many existing initiatives. Other standard views of metadata repositories also exist, notably the CMR extensions to ISO/IEC-11179 and metadata repositories based on DDI, made available by many data archives. In the larger world of digital libraries, many useful standards such as Dublin Core (among others) can be employed to describe collections of digital artefacts. These repositories form a critical part of a universal statistical architecture, containing much of the valuable metadata which is today difficult to discover and access. By facilitating the alignment of metadata standards, and providing tools to work with them, metadata repositories will more easily become a part of an overall statistical network.

The Open Data Foundation intends to provide a forum where the technical aspects of such an aligned architecture can be discussed, and where alignment projects can take place. Most standards bodies have their own charters and goals, and very often they cannot spare the resources to focus on the development of software or complete alignment with the developing work of other standards bodies. The ODaF hopes to provide a common ground, where standards bodies can gain an understanding of how they interact with other statistical standards. In this way, sufficient coordination may be possible to create a flexible and useful architecture which is in line with the overall direction of Internet technology developments.