intro = '''Remote sensing images from NASA's fleet of Earth-observing satellites are pivotal for applications as varied as land cover mapping,
disaster monitoring, urban planning, and environmental analysis. The potential of AI-based geospatial foundation models for performing
visual analysis tasks on these remote sensing images has garnered significant attention. To realize that potential, the crucial first
step is to develop foundation models – computer models that acquire competence in a broad range of tasks, which can then be specialized
with further training for specific applications. In this case, the foundation model is based on a large-scale vision transformer
trained on satellite imagery.

Vision transformers are deep learning models that can be fine-tuned to answer specific science questions. Through training
on extensive remote sensing datasets, vision transformers can learn general relationships among the spectral bands given as inputs,
as well as capture high-level visual patterns, semantics, and spatial relationships that can be leveraged for a wide range of analysis tasks.
Trained vision transformers can handle large-scale, high-resolution data; learn global representations; extract robust features; and support
multi-modal data fusion – all with improved performance.

The Data Science Group at NASA Goddard Space Flight Center's Computational and Information Sciences and Technology Office (CISTO)
has implemented an end-to-end workflow to generate a pre-trained vision transformer which could evolve into a foundation model.
A training dataset of over 2 million 128x128 pixel “chips” has been created from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS)
surface reflectance products (MOD09). These data were used to train a SwinV2 vision transformer that we call SatVision.
'''
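
# The chipping step described above, as a minimal sketch. It assumes a MOD09
# surface reflectance granule has already been read into a NumPy array of
# shape (bands, height, width); file reading, reprojection, and quality
# masking are omitted, and the fill-value check is illustrative.
import numpy as np

CHIP = 128  # chip edge length in pixels, matching the training dataset


def make_chips(scene, chip=CHIP):
    """Cut a (bands, H, W) scene into non-overlapping (bands, chip, chip) tiles."""
    bands, height, width = scene.shape
    chips = []
    for row in range(0, height - chip + 1, chip):
        for col in range(0, width - chip + 1, chip):
            tile = scene[:, row:row + chip, col:col + chip]
            # Skip tiles dominated by fill values (e.g., at swath edges);
            # -28672 is the MOD09 band fill value.
            if np.mean(tile == -28672) < 0.5:
                chips.append(tile)
    return np.stack(chips)


# Example with a synthetic 7-band scene standing in for a real MOD09 granule.
scene = np.random.randint(0, 10000, size=(7, 2400, 2400), dtype=np.int16)
print(make_chips(scene).shape)  # (324, 7, 128, 128)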
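
# The text does not state the pre-training objective, so masked image
# modeling (as in SimMIM, which is commonly paired with SwinV2) is assumed
# here purely for illustration: random patches of each chip are hidden and
# the transformer learns to reconstruct the hidden pixels.
import torch


def random_patch_mask(batch, grid=4, ratio=0.6):
    """Boolean (batch, grid, grid) mask over a patch grid; True = masked."""
    scores = torch.rand(batch, grid * grid)
    idx = scores.topk(int(ratio * grid * grid), dim=1).indices
    mask = torch.zeros(batch, grid * grid)
    mask.scatter_(1, idx, 1.0)
    return mask.bool().view(batch, grid, grid)


chips_batch = torch.randn(8, 7, 128, 128)   # a batch of 7-band chips
mask = random_patch_mask(8)                  # (8, 4, 4), i.e. 32x32-pixel patches
side = 128 // 4
pixel_mask = mask.repeat_interleave(side, 1).repeat_interleave(side, 2)
masked_input = chips_batch * (~pixel_mask).unsqueeze(1)  # zero out masked pixels
# A reconstruction loss on the masked pixels (e.g., L1 between the model's
# output and chips_batch at pixel_mask locations) would drive pre-training.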
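
# Fine-tuning a pre-trained vision transformer to answer a specific science
# question, as sketched in the second paragraph above. The timm model name,
# the 3-channel input, and the 17-class land cover task are illustrative
# placeholders; the actual SatVision checkpoint and band count differ.
import timm
import torch
import torch.nn as nn

# Pre-trained backbone; num_classes=0 makes it return pooled features only.
backbone = timm.create_model("swinv2_base_window8_256", pretrained=True, num_classes=0)
head = nn.Linear(backbone.num_features, 17)  # new task-specific head

# Freeze the backbone so only the head is trained at first; unfreezing it
# later for full fine-tuning is a common second stage.
for param in backbone.parameters():
    param.requires_grad = False

x = torch.randn(4, 3, 256, 256)   # a batch of image chips
logits = head(backbone(x))        # (4, 17) class scores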