I analyzed a large volume of AIS data. AIS is the Automatic Identification System used to track and identify ships. Ships equipped with AIS transponders broadcast their location and identity to other nearby ships and to AIS receivers. The data I have contains about 6 million observations per day, each with the time and location of the observation, the ship name and MMSI, and some fields describing navigational status. That’s about 2.1 billion records a year and, from what I understand, a relatively small set of AIS data. Some satellite-collected AIS data sets are an order of magnitude larger.
Ingesting the data into GeoMesa involves developing a mapping from raw AIS records to OGC SimpleFeatureTypes. Since AIS comes in a binary form, I pre-processed the data into CSVs using off-the-shelf tools such as libais and gpsdecode, the latter available on Ubuntu as a pre-built binary package. Once the data is in CSV form, I can use GeoMesa’s configurable delimited text converter to map each record into a SimpleFeature. Several ready-made examples of data model mappings can be found in the GeoMesa GitHub repository.
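To make the idea of a mapping concrete, here is a minimal Python sketch of what such a conversion does conceptually: it takes one delimited row and produces a typed record with an identifier, a timestamp, and a point geometry. This is not GeoMesa’s converter or its configuration syntax. The column order, the field names, and the helper names `row_to_feature` and `parse_csv` are assumptions made for illustration, and real decoder output will have a different layout.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column layout for the decoded CSV (an assumption for this sketch).
COLUMNS = ["timestamp", "mmsi", "name", "lon", "lat", "nav_status"]


def row_to_feature(row):
    """Map one decoded CSV row to a feature-like dict with typed attributes."""
    rec = dict(zip(COLUMNS, row))
    lon, lat = float(rec["lon"]), float(rec["lat"])
    # AIS encodes "position not available" as out-of-range values
    # (longitude 181, latitude 91), so such rows are dropped here.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None
    return {
        "id": f'{rec["mmsi"]}-{rec["timestamp"]}',
        "mmsi": int(rec["mmsi"]),
        "name": rec["name"].strip(),
        "dtg": datetime.fromtimestamp(int(rec["timestamp"]), tz=timezone.utc),
        "nav_status": int(rec["nav_status"]),
        "geom": f"POINT ({lon} {lat})",  # WKT point, longitude first
    }


def parse_csv(text):
    """Parse CSV text and keep only the rows that map to a valid feature."""
    reader = csv.reader(io.StringIO(text))
    return [f for f in (row_to_feature(r) for r in reader) if f is not None]


# Two made-up rows; the second has an unavailable position and is skipped.
sample = (
    "1420070400,366123456,EXAMPLE ONE,14.25,35.9,0\n"
    "1420070460,366123457,EXAMPLE TWO,181.0,91.0,15\n"
)
features = parse_csv(sample)
```

In GeoMesa itself this mapping is declared in converter configuration rather than written by hand, which is what makes the ready-made examples in the repository reusable.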
At this point, we have more than 2B records in our database ready to be sliced and diced. In the following screenshot, you can see the actual volume of the data when we render each AIS observation independently.
That’s not terribly useful. An aggregate heatmap is more useful at this scale of data.
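The aggregation behind a heatmap can be sketched as binning observations into a regular grid and counting how many fall in each cell. The Python below is a simplified illustration and not GeoMesa’s density computation. The function name `grid_counts`, the one-degree cell size, and the sample points are assumptions.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=1.0):
    """Count (lon, lat) points per grid cell of size cell_deg degrees."""
    counts = Counter()
    for lon, lat in points:
        # Shift to non-negative coordinates, then floor to a cell index.
        col = math.floor((lon + 180.0) / cell_deg)
        row = math.floor((lat + 90.0) / cell_deg)
        counts[(col, row)] += 1
    return counts


# Made-up positions: three south of Sicily, one near the Strait of Gibraltar.
points = [(14.2, 35.9), (14.7, 35.1), (15.3, 35.5), (-5.6, 36.0)]
heat = grid_counts(points, cell_deg=1.0)
```

Each cell count then maps to a color intensity, so the rendered output scales with the number of grid cells instead of the number of observations.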
Now we can start to see transit lanes and other larger patterns in the data. There’s a lot of ship traffic in the Mediterranean. We can also use GeoMesa’s Web Processing Service (WPS) tools to visualize histograms of various attributes. In Stealth, we have wired this into a visual control that allows a user to drag a box and compare attribute counts in different spatial regions.
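The region comparison amounts to filtering observations by bounding box and counting attribute values in each box. The following Python sketch shows that logic on in-memory records. It does not call GeoMesa’s WPS processes, and the record fields, status labels, bounding boxes, and function names are assumptions chosen for illustration.

```python
from collections import Counter


def in_bbox(lon, lat, bbox):
    """Return True if the point lies inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def attribute_histogram(records, bbox, attribute):
    """Count values of one attribute for the records that fall inside bbox."""
    return Counter(
        r[attribute] for r in records if in_bbox(r["lon"], r["lat"], bbox)
    )


# Made-up observations with a navigational status label.
records = [
    {"lon": 14.5, "lat": 35.9, "nav_status": "under way"},
    {"lon": 15.0, "lat": 36.2, "nav_status": "at anchor"},
    {"lon": 14.9, "lat": 35.5, "nav_status": "under way"},
    {"lon": -5.5, "lat": 36.0, "nav_status": "under way"},
    {"lon": -5.3, "lat": 35.9, "nav_status": "moored"},
]

# Two hypothetical user-drawn boxes.
central_med = (10.0, 34.0, 20.0, 38.0)
gibraltar = (-6.5, 35.5, -4.5, 36.5)

hist_a = attribute_histogram(records, central_med, "nav_status")
hist_b = attribute_histogram(records, gibraltar, "nav_status")
```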
While this is getting better, the best way to get a sense of this dynamic data set is to animate it. This is where GeoServer comes in. GeoServer is a very capable OGC service provider in its own right, but its real power comes from its extensibility: you can plug in different data stores. GeoMesa has several custom data stores, such as the AccumuloDataStore for high-volume historical data and the KafkaDataStore and StreamDataStore for streaming real-time visualization. I dropped the AccumuloDataStore into my GeoServer deployment for this analysis.
The following screen capture demonstrates the power of this visualization. Several million records are packed into an efficient binary format by GeoMesa in response to a WFS request with the appropriate MIME type. The time slider shows a histogram of counts over the query range. The window of data shown on the map is controlled by adjusting the window within the time slider. The time slider histogram is dynamic: when you zoom in, it adjusts to reflect only what is shown in the viewport. This is possible because all rendering, both the trajectories on the map and the histogram, is done client-side. The animation can either be played back using the control on the far right or dragged by grabbing the window. You can really begin to see patterns when you watch the trajectories evolve over time. Merchant vessels travel along the transit routes, while other vessels follow more ad hoc patterns outside the transit lanes.
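The time slider behavior described above can be approximated in a few lines: bucket the timestamps of the observations inside the current viewport, and select the observations whose timestamps fall inside the slider window. This Python sketch is only an illustration of that logic, not the client code. The record fields, the one-hour bucket size, and the function names are assumptions.

```python
from collections import Counter


def time_histogram(records, viewport, bucket_seconds=3600):
    """Count observations inside the viewport, grouped by time bucket start."""
    min_lon, min_lat, max_lon, max_lat = viewport
    hist = Counter()
    for r in records:
        if min_lon <= r["lon"] <= max_lon and min_lat <= r["lat"] <= max_lat:
            bucket = (r["t"] // bucket_seconds) * bucket_seconds
            hist[bucket] += 1
    return hist


def window(records, start, end):
    """Select the observations with start <= t < end (the slider window)."""
    return [r for r in records if start <= r["t"] < end]


# Made-up observations; "t" is seconds since the start of the query range.
records = [
    {"t": 0, "lon": 14.5, "lat": 35.9},
    {"t": 1800, "lon": 14.6, "lat": 35.8},
    {"t": 3600, "lon": 14.7, "lat": 35.8},
    {"t": 7300, "lon": -5.5, "lat": 36.0},  # outside the viewport below
    {"t": 7400, "lon": 14.8, "lat": 35.7},
]

viewport = (10.0, 34.0, 20.0, 38.0)
hist = time_histogram(records, viewport)
visible = window(records, 3000, 7350)
```

Because both functions run over data already held by the client, zooming or dragging the window only requires recomputing them locally, with no further request to the server.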
To keep up-to-date with the latest work going on at CCRi, follow us on Twitter at @ccr_inc.