Big data science gets Kool

With a name like Dr Kool, it seems Johnathan Kool was destined to end up as Manager of the Australian Antarctic Data Centre.

Of course his enthusiasm for working with large amounts of data helped too.

By the time he completed his PhD at the University of Miami, simulating the movement of fish populations and genes in coral reefs, Dr Kool’s ability to distil large amounts of data into meaningful information for decision-makers had already set him on the ‘big data’ path.

“I feel like I was a big data scientist before big data was even a thing. I had to work with trillions of larval fish, trying to understand where they came from and where they went, to inform marine habitat conservation,” he said.

His expertise eventually saw him working with Geoscience Australia, where he managed large collections of marine data and even contributed to the search for missing Malaysia Airlines flight MH370.

His skills are now sought after to manage the Australian Antarctic Division’s large collection of marine and terrestrial information.

“The Antarctic Division is at the cusp of huge changes in the amount of data it will be able to collect, with a new ship, proposed aerodrome at Davis station, new traverse capability and modernising stations all coming online.

“When the new icebreaker RSV Nuyina arrives, there will be a scale change in the amount of marine data available,” Dr Kool said.

“There will be an incredible number of sensors on the ship and potentially hundreds of terabytes of information coming in, which will be repeated voyage after voyage, year after year.

“Similarly, the proposed aerodrome at Davis, and inland stations set up by traverse, could act as hubs for drones or autonomous vehicles that can collect large amounts of information.

“So we have to think about how to scale things up, while maintaining the same level of accessibility and transparency.”

To do this, Dr Kool has a few guiding principles. The first is homing in on the message and tuning out the white noise.

“My role is to help curate and deliver information in a focused way, ensuring that the work of researchers is available to policy-makers. Better information leads to better decisions, which leads to better outcomes,” he said.

Better information also means having ‘metadata’ – or data about the data.

“It’s not just about numbers but also the science of where, what, who and how. Where the data was collected, what is it, who collected it and how it was collected?

“Metadata is incredibly important because it helps to make data FAIR – ‘findable’, ‘accessible’, ‘interoperable’ and ‘reusable’ by others.”

As Chair of the Standing Committee on Antarctic Data Management, which is part of the Scientific Committee on Antarctic Research, Dr Kool is working with representatives from other national Antarctic programs to enhance the interoperable and reusable aspects of FAIR data.

The committee aims to develop common standards and approaches to data collection, so that the data can be used by anyone in any country. Representatives then work within the relevant organisations in their countries to ensure the approach is adopted.

To improve data ‘FAIRness’, Dr Kool is also working towards delivering data as services. Rather than individuals downloading files that reside on a single computer and that might end up out of date, the data sits on a master computer or in the cloud and is delivered through an interface on request, similar to how technology companies deliver satellite and map collections online.

“This has been a recent transformation for us, to move from data collections being locked away in our file systems to opening them up so they’re available online and for processing at scale,” Dr Kool said.

“Our biodiversity database, our underway data and our weather system service, for example, are now being made available through online applications and services.

“The point is to make sure all the data is accessible in a consistent way, and this promotes interoperability and reusability as well.

“My vision is that we can open up our Australian Antarctic data collection for big data research, so people can incorporate our data into weather models, global biodiversity models or whole of Antarctica studies.”

The service-oriented model will also allow the Australian Antarctic Data Centre to get a better understanding of how the data is being used, which will help the team to deliver products that people want.

With this in mind, Dr Kool and his team have a range of collaborations afoot, including with AusSeabed – a coordinated effort to map Australia’s undersea environments. This will feed into the global Nippon Foundation-GEBCO Seabed 2030 project, which aims to map the world’s oceans by 2030.

“The seabed information we collect on the Nuyina will be made available to the Australian Hydrographic Office, Geoscience Australia, and to programs like the Seabed 2030 Project, which will be able to generate their own products based on it,” he said.

Dr Kool is excited by the opportunities new technologies and automation will bring to exploring and mapping Antarctica and the Southern Ocean, including remote sensing from ships, aircraft and satellites, autonomous under-ice rovers and other submersibles, remotely operated cameras, and drones.

But he is quick to highlight that there will always be a role for people in Antarctica: to maintain, fix and fact-check machines and, most importantly, to bring context and method to data collection.

“I want people to be excited by the possibilities of automation, without being overwhelmed,” he said.

“To do this we always need to bring it back to the story we want to tell, because there are so many stories we are able to tell.”

Wendy Pyper
Australian Antarctic Division