This article explains how you can make LuciadLightspeed decode data from Amazon S3. It helps you understand:

  • Accessing data using the Amazon Web Services (AWS) SDK

  • Updating the AWS SDK

  • The expected dataset structure on S3

Accessing data using the AWS SDK

As explained in Working with models, LuciadLightspeed accesses data by decoding the data into a model. The decoding is the responsibility of an ILcdModelDecoder. Different model decoders support different transfer protocols for the data they decode. However, many model decoders use a common abstraction to access the data: an ILcdInputStreamFactory. The model decoder usually indicates this by implementing the ILcdInputStreamFactoryCapable interface.

The default implementation of ILcdInputStreamFactory, TLcdInputStreamFactory, supports file system and HTTP(S) file transfers. It knows nothing about connecting to AWS, though: in particular, it doesn’t know how to authenticate or how to perform partial object transfers.

LuciadLightspeed also provides other implementations of ILcdInputStreamFactory. One such implementation supports accessing an Amazon S3 bucket through s3:// URLs; let’s call it S3InputStreamFactory. It’s implemented using the S3Client API from the AWS SDK for Java 2.x. To use S3InputStreamFactory, you must configure the environment in which you’re using LuciadLightspeed with the appropriate credentials and region settings for S3Client. ILcdInputStreamFactoryCapable model decoders can then decode data when you give them an s3:// URL as the source name.
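For example, assuming your deployment relies on the SDK’s default credentials and region provider chains, you can configure S3Client through environment variables. The values below are placeholders:

```shell
# Picked up by the AWS SDK for Java 2.x default provider chains.
export AWS_ACCESS_KEY_ID=...       # placeholder: your access key ID
export AWS_SECRET_ACCESS_KEY=...   # placeholder: your secret access key
export AWS_REGION=eu-west-1        # placeholder: the region of your bucket
```

The default chains also read the aws.region system property and the shared ~/.aws/credentials and ~/.aws/config files, so any of those mechanisms works as well.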

All LuciadLightspeed model decoders that use an ILcdInputStreamFactory to access the data are initially configured with a composite input stream factory, backed by all the implementations that the Java service loader makes available. By default, the service loader makes both the default implementation and S3InputStreamFactory available.
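The delegation mechanism can be sketched as follows. InputStreamFactory is a simplified, hypothetical stand-in for ILcdInputStreamFactory, and the composite simply delegates to the first registered factory that accepts the source name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical, simplified stand-in for ILcdInputStreamFactory.
interface InputStreamFactory {
  boolean canCreateInputStream(String aSourceName);
  InputStream createInputStream(String aSourceName) throws IOException;
}

// Sketch of a composite input stream factory: it tries each registered
// factory in turn and uses the first one that accepts the source name.
class CompositeInputStreamFactory implements InputStreamFactory {

  private final List<InputStreamFactory> fDelegates;

  // In LuciadLightspeed, the delegates come from the Java service loader,
  // conceptually: ServiceLoader.load(ILcdInputStreamFactory.class).
  CompositeInputStreamFactory(List<InputStreamFactory> aDelegates) {
    fDelegates = aDelegates;
  }

  public boolean canCreateInputStream(String aSourceName) {
    return fDelegates.stream().anyMatch(f -> f.canCreateInputStream(aSourceName));
  }

  public InputStream createInputStream(String aSourceName) throws IOException {
    for (InputStreamFactory factory : fDelegates) {
      if (factory.canCreateInputStream(aSourceName)) {
        return factory.createInputStream(aSourceName);
      }
    }
    throw new IOException("No input stream factory can handle " + aSourceName);
  }
}
```

A factory that recognizes s3:// URLs is consulted only for matching source names, so file and HTTP(S) access keep working through the default implementation.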

Implementing your own S3InputStreamFactory

If the internal S3InputStreamFactory doesn’t satisfy your needs, you can replace it with your own implementation. To make this easier, its entire implementation is also available as sample code, samples.common.io.aws.S3InputStreamFactory. If you modify the sample code and add it to your application’s classpath, it will be picked up by the composite input stream factory used by the model decoders.
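For reference, the service loader discovers implementations through a provider-configuration file on the classpath. Check the fully qualified interface name against your API documentation; assuming ILcdInputStreamFactory resides in the com.luciad.io package, the file would look like this:

```
# File: META-INF/services/com.luciad.io.ILcdInputStreamFactory
# One fully qualified implementation class per line.
samples.common.io.aws.S3InputStreamFactory
```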

When you implement your own version of S3InputStreamFactory, we recommend that you remove lcd_aws.jar from the classpath. Otherwise, you end up with two input stream factories that both state they can handle S3 URLs. For LuciadFusion, you also need to remove platform/lcd_fusionplatform_resources_aws.jar from the classpath, and include a customized version of the S3ResourceConnector, which is also available as sample code. Make sure that this customized version knows how to retrieve the S3Client from the input stream factory, and include its package in the Spring scan packages, for example by specifying it in the fusion.config.additionalScanPackages VM parameter.
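For example, the scan package can be passed as a VM parameter when starting LuciadFusion. The package name below is a placeholder; replace it with the package of your customized S3ResourceConnector:

```
-Dfusion.config.additionalScanPackages=com.mycompany.fusion.aws
```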

Updating the AWS SDK

We provide a version of the AWS SDK for Java 2.x library with LuciadLightspeed, and we have tested that version to confirm that it works. Even so, Amazon steadily improves the library and regularly publishes updated binaries, both to improve and expand its functionality and to mitigate security concerns.

To prevent compatibility issues, we generally discourage updating LuciadLightspeed dependencies on your own. In this case, though, LuciadLightspeed uses only basic functionality from the AWS SDK library. In addition, all the code that interacts with it is available in the samples, so you can fix any incompatibilities arising from an update yourself.

For those reasons, we encourage you to use the latest compatible version of the AWS SDK for Java 2.x.

Expected dataset structure on S3

In general, model decoders assume that they can find files related to the entry point file using filename substitution. When you store the files for datasets that consist of multiple files on S3, you should preserve the directory layout of the dataset in the object keys.
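Filename substitution can be sketched as follows. This is a hypothetical helper, not LuciadLightspeed API: it derives a companion file location (say, the .epsg file) from the entry point location by swapping the extension, which only resolves to an existing S3 object if the directory layout is preserved in the keys:

```java
/**
 * Hypothetical sketch of the filename substitution that model decoders
 * apply: a companion file shares the entry point's location up to the
 * file extension.
 */
public class S3KeyUtil {

  /** Replaces the extension of an entry point location, e.g. ".dat" -> ".epsg". */
  public static String siblingLocation(String aEntryPoint, String aNewExtension) {
    int dot = aEntryPoint.lastIndexOf('.');
    int slash = aEntryPoint.lastIndexOf('/');
    // Only substitute when the last '.' is part of the file name,
    // not part of a directory or the scheme.
    String base = (dot > slash) ? aEntryPoint.substring(0, dot) : aEntryPoint;
    return base + aNewExtension;
  }
}
```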

Table 1, “Deriving S3 object keys from the dataset layout” shows an example of preserving the directory layout. Note that it shows S3 objects for the files only, not for the directories. You can replace <some prefix> with any string ending in /, or the empty string. To refer to this dataset, you use the entry point file location s3://<bucket>/<some prefix>entry_point_file.dat.

Table 1. Deriving S3 object keys from the dataset layout
  Dataset layout                  S3 Object Key
  -------------------------------------------------------------------------
  dataset_root/
  ├ entry_point_file.dat          <some prefix>entry_point_file.dat
  ├ entry_point_file.epsg         <some prefix>entry_point_file.epsg
  └ subdirectory/
    └ file_in_subdir.dat          <some prefix>subdirectory/file_in_subdir.dat

When recursively copying directories, the AWS CLI tool follows this convention.
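For example, the following AWS CLI command uploads the dataset while preserving its layout in the object keys. The bucket name and prefix are placeholders:

```
# Uploads every file under dataset_root/, producing keys such as
# some-prefix/entry_point_file.dat and
# some-prefix/subdirectory/file_in_subdir.dat.
aws s3 cp dataset_root/ s3://my-bucket/some-prefix/ --recursive
```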