This article covers the implementation of two metadata standards published by the International Organization for Standardization: ISO 19115 and ISO 19139. The contents of the standards are discussed only briefly; the reader is assumed to know the standards or at least to have a copy at hand. ISO 19115: metadata concepts explains how the ISO 19115 UML diagrams were converted into LuciadLightspeed interfaces and classes. ISO 19139: an XML implementation explains how ISO 19115 metadata can be encoded and decoded based on ISO 19139.

See format.metadata.model and format.metadata.xml in the API documentation.

What is metadata?

Metadata is extra information about data that is not part of the data itself. An ancient form of such metadata is the legend on a paper map. While it is not part of the data (the map itself), it can tell you more about the data, for example the scale of the map, or who was responsible for the creation of the map. Nowadays, geographical information is stored in digital form, and metadata can be stored next to the actual data, so that you can consult it without accessing the data sets themselves.

As the capabilities of both producers and consumers of geographic information expand, there is an increasing need for geographic data exchange. Consumers are not interested in the bulk of the available data, however. They require data pertaining to a certain domain, covering their area of interest, in a limited set of formats they can handle. Metadata can help them:

  • Locate data they are interested in

  • Manage large data sets by archiving the data not solely based on its geographic extent

  • Evaluate available data

  • Publish data to interested parties

As the number of producers grows, the need for a standard on metadata becomes apparent, since both producers and consumers need to be able to communicate about the data they wish to exchange. To this end, the ISO 19115 and ISO 19139 standards were conceived. While the ISO 19115 standard provides a conceptual schema for metadata presented as UML diagrams, the ISO 19139 standard describes an XML implementation of that schema.

ISO 19115: metadata concepts

Standard specifications

Geographic data can be archived using different characteristics, of which the most obvious are the location and extent of the data. The ISO 19115 standard covers a comprehensive set of more than 300 characteristics, gathered from a large number of disciplines. These characteristics are divided into the following categories:

  • Extent: Where is the data located? This could be a geographical location or a temporal location as some data may only be valid for a certain period in time.

  • Constraint: What restrictions apply to the data? When can it be used, and what licenses are required? Is the data sensitive and can it only be used by persons who have a security clearance?

  • Data quality: What is the quality of the data? Does it comply with a given standard? What process was used to create the data?

  • Maintenance: Where and how to get updates of the data? When will the next update be available?

  • Spatial representation and reference system: What locations on earth does the data represent? What is the geographic reference of the data?

  • Content: What data is linked to the dataset? Does it contain features, for example a DBF file for a SHP file?

  • Distribution: Where can I get the data or information on the data? Who should I contact and how? In what formats is it available?

  • Portrayal catalog: What portrayal catalog is used to display the data?

  • Application schema information: What application schema was used to build the data set?

  • Identification: What topic is the data about? Where does it come from? Where can you get it? Who is responsible for it? What should it be used for? Is it part of a larger data set? What language is it expressed in? This information should enable you to uniquely identify the data.

Not all of this information is available for every dataset. Moreover, it might not be necessary to keep or maintain all of this information for every dataset. The standard does not require you to provide all of these categories for a dataset. Instead, it marks every characteristic as mandatory, optional or conditional. Mandatory characteristics are required, optional characteristics are not, and conditional characteristics are only required when certain conditions hold. Of the above categories, only the identification is mandatory when providing metadata.

While the standard does not require you to provide all characteristics of the data, you may want to provide some that are not available in the standard. To that end, the standard specifies rules to extend the set of characteristics. The result of such an extension is called a profile. The standard contains one category, extension, which enables the description of the profile inside the metadata. Theoretically, this enables reading metadata from every profile. Note that every profile should contain a number of core elements defined in the standard, so that at least this core can be exchanged by everyone supporting the standard.

Domain model

The domain model for ISO 19115 can be found in format.metadata.model and its subpackages. It is generated from the ISO 19139 schema documents using a small set of rules.

  • Domain classes are put into a package based on the UML diagram in which they are defined in the specification. Each of the diagrams corresponds to one Java package, a subpackage of format.metadata.model, see Table 1, “LuciadLightspeed subpackages for metadata categories”.

    Table 1. LuciadLightspeed subpackages for metadata categories

    Category                                 Subpackage
    Identification information               identification
    Constraint information                   constraint
    Data quality information                 quality
    Lineage information                      lineage
    Maintenance information                  maintenance
    Spatial representation information       spatial
    Reference system information             reference
    Content information                      content
    Portrayal catalog information            portrayal
    Distribution information                 distribution
    Metadata extension information           extension
    Application schema information           applicationschema
    Metadata types, extent information       extent
    Metadata types, citation information     citation

  • Each schema type is converted to a Java domain class. For example, the CI_Citation type becomes TLcdISO19115Citation. These Java classes have properties that correspond to the attributes and elements listed in the schema type.

  • For each property, a public getter and setter method is defined.

  • Some properties can have multiple values defined. For these properties, a getter method is defined that returns a List. A setter method is not defined; modifications are done by modifying the list returned by calling the getter method.
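
The accessor rules above can be illustrated with a minimal, self-contained sketch. The class below is hypothetical and not part of the LuciadLightspeed API: CitationSketch and its title and alternateTitles properties are names chosen for illustration only. It shows a single-valued property with a getter and a setter, and a multi-valued property that is modified through the list returned by its getter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated domain class; not a LuciadLightspeed class.
final class CitationSketch {
  // Single-valued property: exposed through a getter and a setter.
  private String title;
  // Multi-valued property: exposed through a getter only.
  private final List<String> alternateTitles = new ArrayList<>();

  String getTitle() {
    return title;
  }

  void setTitle(String title) {
    this.title = title;
  }

  // Returns the live list; callers add or remove values on it directly.
  List<String> getAlternateTitles() {
    return alternateTitles;
  }
}
```

With such a class, a value is added to the multi-valued property by calling getAlternateTitles().add(...) rather than a setter.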

The starting class from which all attributes can be reached is the class TLcdISO19115Metadata located in format.metadata.

Codelists

Code lists are used throughout the standard to define a set of values that can be extended at runtime. Unlike an enumeration, which does not allow you to create new instances, a code list does. Use a code list when a reasonable set of values for an attribute is known, but not every possible value. Instead of listing all possible values, including the exotic, once-in-a-lifetime ones, it suffices to list a set of reasonable values that covers almost every occurrence.

Each code list is represented by a domain class that extends from TLcdISO19115Code. The domain class defines public constants for all the code values listed in the standard.
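
The difference between an enumeration and a code list can be sketched in a few lines of plain Java. The class below is a hypothetical illustration of the pattern, not the TLcdISO19115Code implementation: RoleCodeSketch, valueOf and values are assumed names. It defines constants for two role values ("owner" and "author") and lets callers register further values at runtime.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extensible code list; not the TLcdISO19115Code implementation.
final class RoleCodeSketch {
  // Registry of all known codes. Declared first so that it exists before the constants below.
  private static final Map<String, RoleCodeSketch> VALUES = new LinkedHashMap<>();

  // Predefined values, comparable to the public constants of a code list class.
  static final RoleCodeSketch OWNER = valueOf("owner");
  static final RoleCodeSketch AUTHOR = valueOf("author");

  private final String value;

  private RoleCodeSketch(String value) {
    this.value = value;
  }

  // Returns the existing code for this value, or registers a new one at runtime.
  static synchronized RoleCodeSketch valueOf(String value) {
    return VALUES.computeIfAbsent(value, RoleCodeSketch::new);
  }

  // Returns a snapshot of all codes known so far, predefined and custom.
  static synchronized List<RoleCodeSketch> values() {
    return new ArrayList<>(VALUES.values());
  }

  String getValue() {
    return value;
  }
}
```

An enum could not accept a value such as "dataSteward" without recompilation; here, RoleCodeSketch.valueOf("dataSteward") simply creates and registers it.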

Generic access

All domain classes implement ILcdDataObject. This enables generic access to the content of every class regardless of the location in the metadata hierarchy.

See Unified access to domain objects for a detailed discussion of generic access and data models.

The TLcdISO19115DataTypes class provides access to the ISO 19115 data model. Note that this data model is an anonymous model that groups the data models provided by TLcdGCODataTypes, TLcdGMDDataTypes, TLcdGSRDataTypes, TLcdGSSDataTypes and TLcdGTSDataTypes. These latter data models represent the types defined in the different XML schema components (gco.xsd, gmd.xsd, and so on) that make up ISO 19115.

Dealing with property types

The ISO 19139 standard specifies that properties of a class have to be encoded using XML class property types. In simple terms, this means that when a class A refers to a class B, a new association class called B_PropertyType is introduced to implement the link between A and B. As these association classes carry no useful information in most cases, they have been suppressed from the domain object API. This makes the model far easier to use. Take for example the TLcdISO19115Metadata class. This class has an association to the TLcdISO19115Distribution class, which is exposed through the getDistributionInfo and setDistributionInfo methods.

Because association classes can carry useful information, such as a nil reason, they cannot be removed completely. They are merely suppressed from the public accessors provided by the domain classes, and remain accessible using the ILcdDataObject API. Program: Accessing nil reason shows, for example, how the nil reason of a distribution info can be set. First, the value of the distribution info property is retrieved. If this value is null, a new property is created and assigned to the TLcdISO19115Metadata object. Finally, the nil reason of the property is set.

Program: Accessing nil reason
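// DISTRIBUTION_INFO_PROPERTY is assumed to be statically imported from TLcdISO19115Metadata.
public void setDistributionNilReason(TLcdISO19115Metadata object, String nilReason) {
  // Retrieve the association object (property type), not the TLcdISO19115Distribution itself.
  TLcdISO19118Property<?> property = (TLcdISO19118Property<?>) object.getValue(DISTRIBUTION_INFO_PROPERTY);
  if (property == null) {
    // No association object yet: create one and attach it to the metadata object.
    property = (TLcdISO19118Property<?>) DISTRIBUTION_INFO_PROPERTY.getType().newInstance();
    object.setValue(DISTRIBUTION_INFO_PROPERTY, property);
  }
  // Set the nil reason on the association object.
  property.setNilReason(nilReason);
}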

Validation

The implementing classes do not contain validation code. The standard defines which attributes are mandatory, optional and conditional. For some attributes, specific restrictions apply: a latitude, for example, should be a value between -90.0 and 90.0. You can recognize mandatory elements because they are passed in the constructor. However, the constructors accept null values. When an element is multivalued, an array of objects of the correct type needs to be passed. There is no check whether the array contains null values. Since the implementation is Vector based, null values are allowed, though it is not advisable to pass them. In some cases where the conditions are simple and exclusive, additional constructors are provided with conditional parameters.

Because there is no validation, conformance to the standard is not enforced when you create metadata objects. This makes it easier to create new metadata objects.
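
If you need such checks, you can perform them in your own code before or after building the metadata objects. The following is a minimal sketch of what such a check might look like for the latitude restriction mentioned above. It is a hypothetical helper, not part of the LuciadLightspeed API: LatitudeCheckSketch and validate are assumed names, and the values are passed as plain Double objects rather than read from a domain class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation helper; the LuciadLightspeed domain classes perform no such checks.
final class LatitudeCheckSketch {

  // Returns a list of human-readable problems; an empty list means both values passed.
  static List<String> validate(Double southBoundLatitude, Double northBoundLatitude) {
    List<String> problems = new ArrayList<>();
    check("southBoundLatitude", southBoundLatitude, problems);
    check("northBoundLatitude", northBoundLatitude, problems);
    return problems;
  }

  // Treats the value as mandatory and restricts it to the range [-90.0, 90.0].
  private static void check(String name, Double value, List<String> problems) {
    if (value == null) {
      problems.add(name + " is mandatory but missing");
    } else if (value.isNaN() || value < -90.0 || value > 90.0) {
      problems.add(name + " must be between -90.0 and 90.0, but is " + value);
    }
  }
}
```

Collecting problems in a list, instead of throwing on the first one, lets a caller report all issues of a metadata object at once.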

Visualization

The class MetadataTree in the samples package samples.metadata.util can be used to visualize the contents of a TLcdISO19115Metadata. Figure 1, “Metadata visualized using a MetadataTree” displays what the contents of a metadata object might look like.

Figure 1. Metadata visualized using a MetadataTree

ISO 19139: an XML implementation

Introduction

The ISO 19139 standard provides a universal implementation of ISO 19115 through an XML Schema encoding. This encoding conforms to the ISO 19118 standard, which defines a set of encoding rules for transforming a UML conceptual schema from the ISO 19100 series of documents into an XML schema.

Decoding metadata

LuciadLightspeed provides a decoder and encoder to decode and encode metadata in the ISO 19139 format in the package format.metadata.xml. Both the decoder and the encoder are implemented using the LuciadLightspeed XML framework.

See Integrating XML data into your application for a more detailed explanation.

The primary class to be used for decoding ISO 19139 data is TLcdISO19139MetadataDecoder. The following code sample illustrates how to decode an ISO 19139 data source.

Program: Decoding metadata
TLcdISO19139MetadataDecoder decoder = new TLcdISO19139MetadataDecoder();
TLcdISO19115Metadata metadata = decoder.decodeMetadata("Data/metadata/iso19139/wash_spot_small.xml");

Note that this decoder does not implement ILcdModelDecoder, as the decoded objects do not implement ILcdModel. You can use decodeMetadata(String) to decode a data source containing a <MD_Metadata> element as root element; it will return a TLcdISO19115Metadata object. For any other data source, use decodeObject(String), which returns an instance of the metadata domain model class that corresponds to the XML root element of the data source.
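
When you do not know in advance whether a data source has an <MD_Metadata> root element, you could inspect the root element first and then pick decodeMetadata or decodeObject accordingly. The following sketch does this with the standard Java XML parser only; it does not use the LuciadLightspeed decoder. RootElementSketch and hasMetadataRoot are hypothetical names, and the sketch assumes that the root element lives in the gmd namespace http://www.isotc211.org/2005/gmd.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper based on the standard JDK XML parser; not part of LuciadLightspeed.
final class RootElementSketch {
  // Assumption: the gmd namespace of the ISO 19139 schemas.
  static final String GMD_NAMESPACE = "http://www.isotc211.org/2005/gmd";

  // Returns true if the XML text has MD_Metadata in the gmd namespace as its root element.
  static boolean hasMetadataRoot(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Namespace awareness is required for getNamespaceURI and getLocalName to work.
      factory.setNamespaceAware(true);
      Element root = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      return GMD_NAMESPACE.equals(root.getNamespaceURI())
          && "MD_Metadata".equals(root.getLocalName());
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }
}
```

This parses the whole document into memory, which is acceptable for a sketch; for large files, a streaming parser that stops after the root element would be a better fit.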

Because the decoded metadata does not implement ILcdModel, it cannot be displayed directly on a view. It is, however, possible to extract information that can be viewed from the metadata. The sample samples.metadata.gazetteer decodes metadata files containing the bounds and source name of geographic data files. The bounds are displayed on a map, enabling the user to see for which locations data is available. For each set of bounds, the user can request the metadata, load the actual data, or both.

Custom metadata extensions

The ISO 19139 specification allows you to include custom metadata in ISO metadata files by extending ISO 19139 XML types to custom XML types. The TLcdISO19139MetadataDecoder and TLcdISO19139MetadataEncoder also support these extensions.

See the Create your own ISO metadata extension article for more information about creating such an extension.

XML documents that properly refer to extension schemas in their xsi:schemaLocation attribute are automatically decoded by the metadata decoder: the decoder automatically detects the extension schemas and configures itself accordingly.
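
An xsi:schemaLocation attribute value is a whitespace-separated list of pairs, each consisting of a namespace URI followed by the location of the schema for that namespace. To check which extension schemas a document refers to, you could split the attribute value yourself, as in the following sketch. SchemaLocationSketch and parse are hypothetical names; the sketch is independent of the way the LuciadLightspeed decoder detects extension schemas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; independent of the LuciadLightspeed metadata decoder.
final class SchemaLocationSketch {

  // Splits an xsi:schemaLocation value into a map from namespace URI to schema location.
  static Map<String, String> parse(String schemaLocation) {
    Map<String, String> result = new LinkedHashMap<>();
    String trimmed = schemaLocation.trim();
    if (trimmed.isEmpty()) {
      return result;
    }
    String[] tokens = trimmed.split("\\s+");
    if (tokens.length % 2 != 0) {
      throw new IllegalArgumentException("schemaLocation must contain namespace/location pairs");
    }
    // Tokens alternate: namespace URI, then the location of its schema document.
    for (int i = 0; i < tokens.length; i += 2) {
      result.put(tokens[i], tokens[i + 1]);
    }
    return result;
  }
}
```

A namespace without a matching location is one example of an incorrect schema reference; the sketch reports it with an exception.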

If no schema references are present in the XML data, or if the schema references are incorrect, you can explicitly preconfigure the decoder and encoder with an extension data model, as shown in Program: Adding support for a custom XML schema to the metadata decoder.

Program: Adding support for a custom XML schema to the metadata decoder
TLcdDataModelBuilder dataModelBuilder = new TLcdDataModelBuilder("http://my-extension-namespace.com");
dataModelBuilder.addDependency(TLcdISO19115DataTypes.getDataModel());
// Add any other relevant dependency here

TLcdXMLDataModelBuilder extensionDataModelBuilder = new TLcdXMLDataModelBuilder(TLcdISO19115DataTypes.getDataModel());
extensionDataModelBuilder.setEntityResolver(new TLcdXMLEntityResolver());
extensionDataModelBuilder.buildDataModel(dataModelBuilder, "http://my-extension-namespace.com", "path-to-my-extension-xsd");

TLcdDataModel extensionDataModel = dataModelBuilder.createDataModel();

TLcdISO19139MetadataDecoder metadataDecoder = new TLcdISO19139MetadataDecoder(extensionDataModel);