As SpaceML continues to grow, it will help bridge the gap between data storage, code sharing and server-side (cloud) analysis. Credit: FDL/SETI Institute

The SETI Institute and Frontier Development Lab (FDL) are announcing the launch of SpaceML, a resource that makes AI-ready datasets available to researchers working in space science and exploration, enabling rapid experimentation and reproducibility.

The SpaceML Repo is a machine learning toolbox and community-managed resource that enables researchers to engage more effectively in AI for space science and exploration. It is designed to help bridge the gap between data storage, code sharing and server-side (cloud) analysis. It includes analysis-ready datasets, projects and MLOps tools designed to fast-track existing AI workflows to new use cases. The datasets and projects build on five years of cutting-edge AI applications completed by FDL teams of early-career Ph.D.s in AI/ML and multidisciplinary science domains, in partnership with NASA, ESA and FDL's commercial partners. Challenge areas include Earth science, lunar exploration, astrobiology, planetary defense, exploration medicine, disaster response, heliophysics and space weather.

"The most impactful and useful applications of AI and machine learning techniques require datasets that have been properly prepared, organized and structured for such approaches," said Bill Diamond, CEO of the SETI Institute. "Five years of FDL research across a wide range of science domains has enabled the establishment of a number of analysis-ready datasets that we are delighted to now make available to the broader research community."

FDL applies AI and machine learning (ML) technologies to science to push the frontiers of research and develop new tools to help solve some of humanity's biggest challenges, both here on Earth and in space.

Projects hosted on SpaceML for the research community include:

  • A project tackling the problem of how to use ML to auto-calibrate space-based instruments used to observe the Sun. After years of exposure to our star, these instruments degrade over time—a bit like cataracts. Recalibration requires expensive sounding rockets. Using ML, the team has been able to augment the data, in effect "removing" the cataracts.

    "The hurdle for many researchers to start using the SDOML dataset, and to begin developing ML solutions, is the friction they experience when first starting," said Mark Cheung, Sr. Staff Physicist at Lockheed Martin and Principal Investigator for NASA Solar Dynamics Observatory/Atmospheric Imaging Assembly . "SpaceML gives them a jumpstart by reducing the effort needed for exploratory data analysis and model deployment. It also demonstrates reproducibility in action."

  • Another project demonstrates how the data reduction of a meteor surveillance network known as CAMS (Cameras for Allsky Meteor Surveillance) could be automated to identify new meteor shower clusters, potentially the trails of ancient Earth-crossing comets. Since the AI pipeline was put in place, a total of nine new meteor showers have been discovered via CAMS.

    "SpaceML helped accelerate impact by bringing in a team of citizen scientists who deployed an interpretable Active Learning and AI-powered meteor classifier to automate insights, allowing the astronomers focused research for the SETI CAMS project," said Siddha Ganju, Self Driving and Medical Instruments AI Architect, Nvidia (founding member of SpaceML's CAMS and Worldview Search Initiatives). "During SpaceML we (1) standardized the processing pipeline to process the decade long meteor dataset collected by CAMS, and, established the state of the art meteor classifier with a unique augmentation strategy; (2) enabled active learning in the CAMS pipeline to automate insights; and, (3) updated the NASA CAMS Meteor Shower Portal which now includes celestial reference points and a scientific communication tool. And the best thing is that future citizen scientists can partake in the CAMS project by building on the publicly accessible trained models, scripts, and web tools."

    SpaceML also hosts INARA (Intelligent ExoplaNET Atmospheric RetrievAI), a pipeline for atmospheric retrieval based on a synthesized dataset of three million planetary spectra, to detect evidence of possible biological activity in exoplanet atmospheres. In other words, it asks "Are We Alone?"

    SpaceML seeks to curate a central repository of project notebooks and datasets generated from projects similar to those listed above. These repositories contain a Google Colab notebook that walks users through the dataset and includes a small data snippet for a quick test drive before committing to the entire dataset (the full datasets are invariably very large).

    The projects also house the complete datasets used for the challenges, which can be made available upon request. Additionally, SpaceML seeks to facilitate the management of new datasets that result from ongoing research and, in due course, to run tournaments that invite improvements on ML models (and data) against known benchmarks.

    "We were concerned on how to make our AI research more reproducible," said James Parr, FDL Director and CEO, Trillium Technologies. "We realized that the best way to do this was to make the data easily accessible, but also that we needed to simplify both the on-boarding process, initial experimentation and workflow adaptation process."

    "The problem with AI reproducibility isn't necessarily, 'not invented here' - it's more, 'not enough time to even try." We figured if we could share analysis ready data, enable rapid server-side experimentation and good version control, it would be the best thing to help make these tools get picked up by the community for the benefit of all."

    FDL launches its 2021 program on June 16, 2021, with researchers in the US addressing seven challenges in the areas of Heliophysics, Astronaut Health, Planetary Science and Earth Science. The program will culminate in mid-August, with teams showcasing their work in a virtual event.

    Provided by SETI Institute