NASA scientists have launched multiple probes and orbiters to study the surface of Mars, and each of these instruments generates large quantities of data, from images of different types and resolutions to thermal measurements. Data scientists and engineers at the Jet Propulsion Lab are working to use this data to build a frost map of the planet’s surface, identifying both water and CO2 frost and various frost formations. To power this effort, they first needed to construct a training dataset that their model can consume, one that includes annotations from domain experts and combines multiple data types.
Using Labelbox Annotate, the team came up with a strategic, iterative approach to producing all of their training data. They broke down high-resolution images into smaller parts that were easier to label and easier for the model to process. They created a living labeling guide to instruct non-expert labelers on their task, adding edge cases as they came up. Finally, the team organized the thermal data by month and location to match those data points with corresponding images, further enhancing the dataset.
The frost map dataset was created over many iterations via Labelbox, and will soon be published to the broader scientific community, where other scientists and experts can ask questions, add context, and refine the data even more.
Note: This story is a recap based on the session, “How to build ML-ready data sets: Recent research from NASA/JPL,” at Labelbox Accelerate 2022.
Researchers have known about Martian frost for more than two hundred years, as the planet's polar ice caps are visible from Earth through telescopes. Since the Space Age began in the 1960s, images collected by spacecraft have also shown geological features that appear to have been formed by ancient glaciers that have since vanished. In the last ten years, data from ground-penetrating radars and other orbiting detectors have shown scientists that water ice and carbon dioxide ice lie under the surface across much of Mars.
To further our understanding of how ice exists under a layer of dust and rocks, which protects it from melting despite high temperatures on the surface, NASA’s Jet Propulsion Lab is generating a holistic map of frost on the Martian surface, released on a monthly basis to track the planet's freeze-thaw cycle. The project aims to use the latest advances in data science and machine learning to help us better understand where, how, and why water and carbon dioxide frost appear on Mars's surface.
“Just like on Earth, freeze-thaw cycles can cause erosion, which is why we want to look at how the Martian frost cycle is shaping the surface of Mars,” says Mark Wronkiewicz, data scientist at JPL.
To create this map, researchers collected, organized, and labeled high-resolution image data from multiple sources, including the Mars Reconnaissance Orbiter. The dataset also draws on lower-resolution image data from various other cameras and on thermal data collected in columns from the surface to the outer atmosphere of the planet by the Mars Climate Sounder.
After collecting the data, the team tackled the task of organizing, enhancing and labeling their high-resolution image data. Because these images were originally so large, they were broken down into smaller squares and then randomized and labeled.
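The tiling and randomizing step described above can be sketched in a few lines. This is a minimal illustration, not the team's actual pipeline: the function name `tile_boxes`, the choice to skip partial tiles at the image edges, and the fixed random seed are all assumptions made for the example.

```python
import random


def tile_boxes(width, height, tile_size, seed=0):
    """Return shuffled (left, top, right, bottom) crop boxes covering an image.

    Partial tiles at the right and bottom edges are skipped, which is a
    simplifying assumption rather than something the source specifies.
    """
    boxes = [
        (left, top, left + tile_size, top + tile_size)
        for top in range(0, height - tile_size + 1, tile_size)
        for left in range(0, width - tile_size + 1, tile_size)
    ]
    # Shuffle so that labelers do not see neighboring tiles back to back.
    random.Random(seed).shuffle(boxes)
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(width=10, height=7, tile_size=3):
        print(box)
```

Each box uses the (left, upper, right, lower) convention, so it could be handed to a crop routine such as Pillow's `Image.crop` to produce the actual image squares.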
Labeling the image data was a challenging task, as visible indicators of frost vary widely and in subtle ways, with certain formations affected by cracks, albedo, and other factors. To ensure that frost was identified accurately in the images, the team had each image labeled by three labelers via Labelbox. Each labeler had to draw a polygon identifying frost, describe why they believed frost was present in the image, and rate their own confidence in the accuracy of their label. The team then reviewed the images with low consensus scores as a group to improve consistency and identify specific causes of low confidence.
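To make the idea of a consensus score concrete, here is a small sketch that measures agreement among several labelers on one image. It is not Labelbox's own consensus calculation: it assumes each frost polygon has already been rasterized into a set of (row, column) pixels, and it uses mean pairwise intersection-over-union as a stand-in metric. The function names and the 0.5 review threshold are hypothetical.

```python
from itertools import combinations


def iou(mask_a, mask_b):
    """Intersection-over-union of two pixel sets; two empty masks agree fully."""
    if not mask_a and not mask_b:
        return 1.0
    return len(mask_a & mask_b) / len(mask_a | mask_b)


def consensus_score(masks):
    """Mean pairwise IoU across all labelers' masks for one image."""
    if len(masks) < 2:
        raise ValueError("need at least two labels to measure agreement")
    pairs = list(combinations(masks, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def needs_group_review(masks, threshold=0.5):
    """Flag an image for group review when labelers disagree too much."""
    return consensus_score(masks) < threshold


if __name__ == "__main__":
    labeler_1 = {(0, 0), (0, 1)}
    labeler_2 = {(0, 0), (0, 1)}
    labeler_3 = {(0, 1), (0, 2)}
    masks = [labeler_1, labeler_2, labeler_3]
    print(round(consensus_score(masks), 3), needs_group_review(masks))
```

In this sketch, images that fall below the threshold would be the ones brought to the group discussion described above.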
(Pictured above) Cracks, sublimation spots, and albedo captured in image data of Mars's surface can make it difficult to accurately identify frost patterns. These images will need to be labeled and reviewed by planetary scientists with years of expertise in recognizing various terrain formations.
Another key strategy adopted early in the project was the continuous creation of a labeling guide. As the team iterated on their labels, this guide was also updated with newly discovered edge cases and examples, so that the next set of labelers had reliable and accurate reference material for their task.
Once their image data was labeled, the team set out to train their model. They further broke the images down into 300 × 300 pixel tiles, which were then used to train the initial model. They then tested the model’s performance on a separate dataset and measured areas of low confidence and performance. When specific terrains or formations that caused poor performance were identified, the team curated a new training dataset that represented the model's problem areas.
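One way to find the problem areas mentioned above is to group test results by terrain type and look for groups with low accuracy. The sketch below assumes each test tile carries a terrain tag and a flag saying whether the model's prediction was correct. The terrain names, the `weak_terrains` function, and the 0.8 accuracy cutoff are illustrative assumptions, not details from the project.

```python
from collections import defaultdict


def weak_terrains(results, min_accuracy=0.8):
    """Return terrain tags whose test accuracy falls below min_accuracy.

    `results` is an iterable of (terrain_tag, prediction_was_correct) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for terrain, is_correct in results:
        totals[terrain] += 1
        correct[terrain] += int(is_correct)
    accuracy = {terrain: correct[terrain] / totals[terrain] for terrain in totals}
    return sorted(t for t, acc in accuracy.items() if acc < min_accuracy)


if __name__ == "__main__":
    # Hypothetical test results for three made-up terrain categories.
    test_results = [
        ("dunes", True), ("dunes", True),
        ("gullies", True), ("gullies", False),
        ("polar", False),
    ]
    print(weak_terrains(test_results))
```

The tags returned here would point to the kinds of tiles worth collecting and labeling for the next training round.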
The next challenge was combining their labeled image data with thermal data collected by the Mars Climate Sounder. To do this, the team paired data points using metadata such as the time and location at which each was collected.
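The pairing step can be sketched as a join on a shared key built from time and location. This example assumes each record is a dictionary with `month`, `lat`, and `lon` fields and that locations are grouped into a fixed latitude/longitude grid. The field names, the 5-degree cell size, and the function names are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def bin_key(month, lat, lon, cell_deg=5.0):
    """Map a month and a lat/lon position to a coarse (month, row, col) bin."""
    return (month, math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def pair_by_month_and_location(image_tiles, thermal_obs, cell_deg=5.0):
    """Attach to each image tile the thermal observations that share its bin."""
    thermal_by_bin = defaultdict(list)
    for obs in thermal_obs:
        key = bin_key(obs["month"], obs["lat"], obs["lon"], cell_deg)
        thermal_by_bin[key].append(obs)

    pairs = []
    for tile in image_tiles:
        key = bin_key(tile["month"], tile["lat"], tile["lon"], cell_deg)
        # Tiles with no thermal coverage get an empty list instead of being dropped.
        pairs.append((tile, thermal_by_bin.get(key, [])))
    return pairs


if __name__ == "__main__":
    tiles = [{"id": "t1", "month": 3, "lat": -62.0, "lon": 10.0}]
    thermal = [{"month": 3, "lat": -63.5, "lon": 12.0, "temp_k": 148.0}]
    for tile, matches in pair_by_month_and_location(tiles, thermal):
        print(tile["id"], [m["temp_k"] for m in matches])
```

Keeping unmatched tiles in the output makes gaps in thermal coverage visible rather than silently shrinking the dataset.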
“We're basically doing a form of model adjustment. So given the ML prediction on one axis, you can take that model confidence and then cross it with the confidence that you have that frost exists based on a second dataset,” said Wronkiewicz.
Viewing the confidence scores on both datasets together helped the team better understand where frost actually existed. Where confidence scores matched, the judgment was likely to be accurate. The areas where confidence scores differed could be pulled aside for further study by domain experts.
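The cross-check between the two confidence scores can be expressed as a simple triage rule. This is a sketch under stated assumptions: both confidences are taken to be numbers between 0 and 1, a single shared threshold of 0.5 decides whether each source indicates frost, and the `triage` function and its output labels are invented for this example.

```python
def triage(model_conf, thermal_conf, threshold=0.5):
    """Compare the model's frost confidence with the thermal-based confidence.

    Returns "frost" or "no_frost" when both sources agree, and
    "needs_expert_review" when they disagree.
    """
    model_says_frost = model_conf >= threshold
    thermal_says_frost = thermal_conf >= threshold
    if model_says_frost and thermal_says_frost:
        return "frost"
    if not model_says_frost and not thermal_says_frost:
        return "no_frost"
    return "needs_expert_review"


if __name__ == "__main__":
    # Hypothetical (model confidence, thermal confidence) pairs.
    for scores in [(0.9, 0.8), (0.1, 0.2), (0.9, 0.1)]:
        print(scores, triage(*scores))
```

A real workflow would likely tune separate thresholds for each data source, but the shape of the decision is the same: agreement is accepted, disagreement is routed to domain experts.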
As the project continues over the next two to three years, the team will add more data types, collected by other orbiters and tools, to build out a more reliable frost map of Mars. This will enable NASA researchers to test further hypotheses and launch more studies on the planet. To learn more about this project, watch the session recording from Labelbox Accelerate 2022.