A Harmonious New Dataset
January 25, 2021 • What do you get when you combine data from sensors aboard two groundbreaking Earth observing missions into a single dataset? When it comes to the provisional release of the Harmonized Landsat Sentinel-2 (HLS) dataset, the result is global land surface data products with high temporal and spatial resolution that are uniquely processed to facilitate a wide range of terrestrial Earth science research.
Available for download through NASA’s Land Processes Distributed Active Archive Center (LP DAAC) and Earthdata Search, the provisional public release of HLS data products is the result of a collaborative NASA-led effort that provides data long desired by researchers and land managers. HLS also is the first major Earth science dataset in NASA’s Earth Observing System Data and Information System (EOSDIS) collection that is hosted fully in the commercial cloud. The development, processing, and distribution of HLS data products offer a glimpse into the future of high-volume NASA Earth science data collections.
HLS is produced from data acquired by the Operational Land Imager (OLI) aboard the joint NASA/USGS Landsat 8 satellite (launched in 2013) and the Multi-Spectral Instrument (MSI) aboard the European Space Agency (ESA) Sentinel-2A and Sentinel-2B satellites (launched in 2015 and 2017, respectively). Two provisional HLS products currently are available publicly: the Landsat 30-meter (L30) product (doi:10.5067/HLS/HLSL30.015) and the Sentinel 30-meter (S30) product (doi:10.5067/HLS/HLSS30.015). Both are atmospherically corrected surface reflectance products.
“Provisional release” means that the scientific validation of the data products is still ongoing and there could be minor issues with the data output. The HLS science team expects to finish their validation of the HLS atmospheric correction code over the next few months, at which time a new version of the data will be released (along with updated documentation and DOIs for both products).
More Frequent Data
The importance of HLS, though, is not that data products from instruments aboard two different satellite platforms exist, but that they can be used together as if they came from a single instrument aboard one satellite. “Our definition of harmonized is that observations should be interchangeable for common [spectral] bands,” says Dr. Jeff Masek, the HLS principal investigator (PI) and Landsat 9 project scientist. “By harmonizing the datasets and making the corrections so that it appears to the user that the data are coming from a single platform, it makes it easier for a user to put these two datasets together and get that high temporal frequency they need for land monitoring.”
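To make the idea of interchangeable common bands concrete, here is a minimal Python sketch of a lookup between the two products. The band codes are an assumption drawn from the band table in the HLS user guide rather than from this article, and the names `COMMON_BANDS` and `band_code` are hypothetical; check the current product documentation before relying on them.

```python
# Assumed band codes for the spectral bands shared by L30 and S30.
# Each entry maps a descriptive name to (L30 code, S30 code).
COMMON_BANDS = {
    "coastal_aerosol": ("B01", "B01"),
    "blue": ("B02", "B02"),
    "green": ("B03", "B03"),
    "red": ("B04", "B04"),
    "nir_narrow": ("B05", "B8A"),
    "swir1": ("B06", "B11"),
    "swir2": ("B07", "B12"),
    "cirrus": ("B09", "B10"),
}


def band_code(product, band_name):
    """Return the band code for a common band in the given product ("L30" or "S30")."""
    if product not in ("L30", "S30"):
        raise ValueError(f"unknown product: {product}")
    l30_code, s30_code = COMMON_BANDS[band_name]
    return l30_code if product == "L30" else s30_code


# The same physical band carries a different code in each product.
print(band_code("L30", "nir_narrow"), band_code("S30", "nir_narrow"))
```

With a lookup like this, an analysis that asks for "red" and "nir_narrow" can treat L30 and S30 granules the same way, which is the practical meaning of harmonization for a data user.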
Along with being able to use data from the OLI and MSI instruments seamlessly, HLS provides the temporal frequency and spatial resolution long desired by the terrestrial observation community.
Landsat 8’s OLI acquires full global imagery every 16 days, and the MSI instruments aboard the two Sentinel-2 platforms together acquire full global imagery every five days. By comparison, the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard NASA’s Terra and Aqua satellites (launched in 1999 and 2002, respectively) and the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard the joint NASA/NOAA Suomi National Polar-orbiting Partnership (Suomi NPP) and NOAA-20 satellites (launched in 2011 and 2017, respectively) acquire full global imagery every one to two days. Harmonizing OLI and MSI observations yields full HLS global coverage roughly every two to three days, a frequency that will greatly aid studies of change over time.
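As a rough sanity check on the combined revisit frequency, the short Python sketch below adds the two observation rates quoted above and inverts the sum. It assumes the two systems observe a location independently and exactly once per cycle, which is a simplification; `combined_mean_interval` is a hypothetical helper name.

```python
from fractions import Fraction

LANDSAT8_REVISIT_DAYS = 16  # OLI global coverage interval, from the text above
SENTINEL2_REVISIT_DAYS = 5  # MSI on both Sentinel-2 platforms, from the text above


def combined_mean_interval(*revisit_days):
    """Mean days between observations when the individual revisit rates are summed."""
    rate = sum(Fraction(1, days) for days in revisit_days)
    return 1 / rate


interval = combined_mean_interval(LANDSAT8_REVISIT_DAYS, SENTINEL2_REVISIT_DAYS)
print(f"{float(interval):.1f} days")  # prints "3.8 days"
```

The result, 80/21 or roughly 3.8 days, is a conservative average. It ignores the overlap between adjacent swaths, which grows toward the poles and shortens the interval, so it is best read as a back-of-envelope check on the two-to-three-day figure rather than a derivation of it.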
Improved spatial resolution is another benefit achieved through Landsat/Sentinel-2 harmonization. The MODIS spatial resolution ranges from 250 to 1,000 meters, depending on the spectral band. MSI imagery, on the other hand, has a spatial resolution ranging from 10 to 60 meters, depending on the spectral band. OLI has a 30-meter multi-spectral spatial resolution.
“What people have been looking for is a global dataset of land reflectance just like MODIS provides every day, but at much finer spatial resolution so you can actually see land management activities, fields, individual forests, urban areas, and so forth,” says Dr. Masek. “[HLS] provides much better temporal resolution than Landsat has ever provided along with much better spatial resolution than MODIS [can provide].”
HLS products are produced on a common 30-meter grid. As Dr. Masek explains, having 30-meter resolution for both HLS products is the easiest way to get the highest quality imagery. “You can’t really take a Landsat 30-meter resolution image down to 10-meters since you just don’t have that information from Landsat captured originally,” he says. “Instead, we average Sentinel data up to 30-meter resolution.”
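The averaging Dr. Masek describes can be illustrated with a minimal Python sketch that takes the mean of each non-overlapping 3 x 3 block of 10-meter pixels to produce one 30-meter pixel. This is an illustration of the general technique under simple assumptions (a grid whose dimensions are exact multiples of three, no missing data), not the HLS team's production code; `block_average` and the sample values are hypothetical.

```python
def block_average(grid, factor=3):
    """Average non-overlapping factor x factor blocks of a 2-D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            # Collect the fine pixels that fall inside one coarse pixel.
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# A 6 x 6 patch of made-up 10 m values becomes a 2 x 2 patch at 30 m.
fine = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
coarse = block_average(fine)
print(coarse)  # [[7.0, 10.0], [25.0, 28.0]]
```

Going the other way, from one 30-meter value to nine 10-meter values, would require inventing detail that was never measured, which is why the common grid is set at the coarser resolution.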
HLS also includes data from the Landsat 8 Thermal Infrared Sensor (TIRS) instrument, which records data on Earth’s surface temperature. Although there is no comparable thermal infrared data from Sentinel-2, the HLS science team felt that providing the full Landsat spectral complement georegistered with Sentinel-2 was important.
Steps to Create an HLS Product
Four steps are involved in HLS processing, with both Landsat and Sentinel-2 data going through the first three steps and Sentinel-2 having a fourth step (see processing illustration). This processing has evolved significantly since the effort began and involves several groups. Prior to the provisional public release, data were processed at NASA’s Ames Research Center in Silicon Valley, California, using the NASA Earth Exchange (NEX) computing environment. This earlier version of HLS (version 1.4) mapped approximately 28% of Earth’s land surface and was archived at NASA’s Goddard Space Flight Center in Greenbelt, Maryland.
Even though HLS was still in the prototype stage and covered less than a third of Earth’s land surface, the potential value of this dataset was clear to the scientific community, including members of LP DAAC's User Working Group (UWG). LP DAAC is responsible for archiving and distributing data in the EOSDIS collection related to land cover and land use. The DAAC is a partnership between NASA and the USGS, and is located at the USGS Earth Resources Observation and Science (EROS) Center, which is where Landsat data are processed.
Making an Impact
“In 2017, our User Working Group made a recommendation for us to engage with the HLS team and start investigating whether there were ways to ultimately bring the HLS collection into LP DAAC,” says Tom Maiersperger, the LP DAAC project scientist. “I started having conversations with Jeff [Masek] around 2017.”
NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT), located at NASA’s Marshall Space Flight Center in Huntsville, Alabama, took HLS land coverage from 28% of Earth’s land surface to nearly 100% (Antarctica is excluded) and set up a production stream to get data to LP DAAC for distribution. As Maiersperger observes, “IMPACT came in and made a big impact.”
IMPACT is part of NASA’s Earth Science Data Systems (ESDS) Program and works to maximize the scientific return of NASA’s missions and experiments with a focus on interagency collaboration, assessment and evaluation, and advanced concepts. The current provisional release of HLS (version 1.5) is a global dataset, optimized for use in the Amazon Web Services (AWS) commercial cloud environment, that maps nearly 100% of Earth’s land surface outside of Antarctica.
“[The IMPACT team] has taken our [research and development-level] code and refactored it, sped it up, and made it more suitable for routine processing on Amazon Web Services,” says Dr. Masek. “They’ve done a lot of work to make sure that the data formats, the metadata, and the file names are all compliant. All of that has been done in collaboration with LP DAAC and with the EROS Center.”
As IMPACT HLS Project Manager Dr. Brian Freitag explains, the IMPACT team produces the global HLS dataset, comprising the L30 and S30 products, and ensures data quality. “If there are granule failures, processing failures, or if there are inconsistencies with the archive at LP DAAC or the files we’ve generated on the IMPACT side, we’ll be responsible for restaging the data for them,” he says. “If there are reprocessing efforts that need to take place, we’ll be the ones responsible for reprocessing the data.”
The public release of HLS is for the forward-processing stream, meaning the initial data archive starts with the day the data first became publicly available (for S30 this is September 29, 2020; for L30 this is January 20, 2021). Upon the full release of HLS data in early to mid-2021, IMPACT will begin back-processing to the beginning of the Landsat 8 and Sentinel-2 data records (2013 and 2015, respectively). The IMPACT team expects to have this back-processing completed by early 2022, according to Dr. Freitag.
Dr. Freitag and the IMPACT team also are preparing to add new satellites into the HLS data production stream. “Landsat 9 [scheduled for launch in 2021] and Sentinel-2C [scheduled for launch in 2023] will be integrated into the HLS product,” he notes. “IMPACT will handle any additional processing that needs to be done to integrate these new missions into the stream while generating L30 and S30 products for LP DAAC.”
Along with the S30 and L30 HLS data products available through LP DAAC and Earthdata Search, HLS imagery also is available through the EOSDIS Global Imagery Browse Services (GIBS) for interactive exploration using the NASA Worldview data visualization application. A Worldview HLS Tour Story provides information about the imagery and how to work with it in Worldview.
A unique aspect of HLS is that it will be processed, archived, and distributed in the AWS commercial cloud. Data users, though, may not even notice this. “The initial access to HLS will look very traditional,” says Tom Maiersperger at LP DAAC. “Earthdata Search will be the primary search and discovery portal for HLS and accessibility will be through Earthdata Search.”
As Maiersperger notes, the LP DAAC goal is to integrate HLS into its Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) tool. AppEEARS allows users to work with long time-series, transform data in various ways, and reduce data volumes. The tool currently runs on-premises at LP DAAC, and the DAAC is working to refactor the AppEEARS code so that it can run in the commercial cloud. The DAAC aims to accomplish this as soon as possible after the completion of the HLS historical record back-processing. “We’ve got a ways to go to complete not only the data record, but also for us to be able to provide the best level of service for these data that we can, but we’ll get there,” he says.
Next Level Data Analysis
Hosting HLS in the commercial cloud has significant benefits for data users, Dr. Freitag at IMPACT observes.
“We’re really trying to take data analysis to the next level where we’re able to provide this large-scale processing without large-scale compute requirements – either downloading a lot of data requiring large amounts of storage or needing to have a lot of memory so you can run through all the files at once,” he says. “For example, if you want to look at all the HLS data for a particular plot of land at the 30-meter resolution provided by HLS, you can do this using your laptop. Everything will be in cloud-optimized GeoTIFF format.”
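One reason a laptop can suffice is that a cloud-optimized GeoTIFF is tiled internally, so a client can request only the tiles that cover its area of interest instead of downloading the whole file. The Python sketch below counts those tiles for a small pixel window. The 3,660-pixel image size and 256-pixel internal tile size are assumptions for illustration, and `tiles_for_window` is a hypothetical helper, not part of any HLS tool.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=256):
    """Return (tile_row, tile_col) indices of internal tiles that overlap a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


IMAGE_SIZE = 3660  # assumed pixels per side of one HLS image
TILE_SIZE = 256    # assumed internal tile size of the GeoTIFF

tiles_per_side = -(-IMAGE_SIZE // TILE_SIZE)  # ceiling division
total_tiles = tiles_per_side ** 2

# A 100 x 100 pixel window (3 km x 3 km at 30 m) starting at column 1200, row 900.
needed = tiles_for_window(1200, 900, 100, 100, TILE_SIZE)
print(len(needed), "of", total_tiles, "tiles")  # 2 of 225 tiles
```

Under these assumptions, a 3-kilometer plot touches 2 of 225 tiles per band, so a time series over that plot moves only a small fraction of the bytes that full-file downloads would.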
Of course, the purpose of NASA data is to enable research, and HLS is expected to contribute significantly to explorations into terrestrial processes. “HLS is really a big deal,” says Tom Maiersperger. “For this dataset to have matured and for it to be global, which is unprecedented in earlier versions, just increases the significance of this product.”
A principal HLS application area will be agriculture, including studies into vegetation health; crop development, management, and identification; and drought impacts. HLS data already have been used in the development of a new vegetation seasonal cycle dataset available through LP DAAC: the Multi-Source Land Imaging (MuSLI) Land Surface Phenology (LSP) Yearly North America product (doi:10.5067/Community/MuSLI/MSLSP30NA.001).
Another important benefit of HLS is that Landsat 8 and both Sentinel-2 satellites have equator crossing times of 10 am and 10:30 am local time, respectively. Currently, NASA’s Terra satellite is the only MODIS or VIIRS platform with a morning crossing time (10:30 am local time). With Terra currently drifting in crossing time, HLS will become a primary source for global morning observations at a consistent equator crossing time.
A Collaborative Effort
The provisional release of HLS data products represents only the latest achievement of an ongoing seven-year effort. Dr. Masek is quick to point out the challenges the team had to overcome to get to this point, including improving Sentinel-2 data processing and geolocation algorithms, developing better ways to correct for differences in view angles and surface reflectance from Landsat and Sentinel-2, and organizing the processing of such a high-volume dataset.
Dr. Masek also goes out of his way to highlight the particular contributions of Dr. Junchang Ju of NASA’s Biospheric Sciences Laboratory and the University of Maryland to the HLS success. “He’s really been the technical lead for this in terms of programming, validating the algorithm performance, catching mistakes, and working with [IMPACT] on implementing the code,” he says. “I want to make sure he gets a lot of credit for the work that’s gone on.”
Dr. Freitag and Tom Maiersperger also stress the collaboration between federal agencies that helped make HLS possible, particularly with the USGS and their work preparing Landsat data and the atmospheric correction code for HLS.
“I think that a lot of the success we’ve seen with the HLS effort can be tied back to the collaborative effort within the team and also the external collaboration between the federal partners,” observes Dr. Freitag. “The applications that we’ll see come out of HLS will be incredible.”
“We know this will be really big for the community,” adds Maiersperger. “There’s a lot to be excited about.”
What do you get when you give scientists and researchers an openly available, harmonized global dataset with high temporal and spatial resolution for Earth’s land surface that’s optimized for the cloud? As the leaders of the HLS effort point out, you get the potential for some amazing opportunities for discovery.