09 APIs and STAC Catalogs demo

Contents

09 APIs and STAC Catalogs demo#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean, Eric Gagliano, Quinn Brencher

Overview#

In this notebook, we’ll explore:

  1. Web APIs and why they matter for geospatial data

  2. Accessing catalogs with pystac_client and the STAC API

  3. Loading and visualizing nDarray satellite image data using odc-stac

1. APIs and Python#

Read about APIs on the prep page! There we introduced the API we’ll be querying from Open-Meteo.

See how an API request changes depending on what you’re asking for…#

Check out the open-meteo documentation. Scroll down to right underneath the chart and take note of the API url and its parameters. Now scroll back up, change the selections, and scroll back down to see how the url has changed.

Here’s an example of a basic API request using the requests library: We’ll send an API request to Open-Meteo’s API, an open source weather data provider#

# very common to use requests library for making HTTP requests
import requests
# notice that the response says 200 -- what does this mean again?
response = requests.get("https://api.open-meteo.com/v1/forecast?latitude=47.6&longitude=-122.3&current_weather=true")
response
<Response [200]>
# what does the reponse look like? follow the url!
response.request.url
'https://api.open-meteo.com/v1/forecast?latitude=47.6&longitude=-122.3&current_weather=true'
# now we'll want to use the .json() method to get the data
data = response.json()
data
{'latitude': 47.60265,
 'longitude': -122.28637,
 'generationtime_ms': 0.0692605972290039,
 'utc_offset_seconds': 0,
 'timezone': 'GMT',
 'timezone_abbreviation': 'GMT',
 'elevation': 93.0,
 'current_weather_units': {'time': 'iso8601',
  'interval': 'seconds',
  'temperature': '°C',
  'windspeed': 'km/h',
  'winddirection': '°',
  'is_day': '',
  'weathercode': 'wmo code'},
 'current_weather': {'time': '2025-03-04T03:00',
  'interval': 900,
  'temperature': 8.5,
  'windspeed': 8.2,
  'winddirection': 229,
  'is_day': 0,
  'weathercode': 3}}
# Display the results
print(f"Temperature in Seattle: {data['current_weather']['temperature']}°C")
print(f"Wind speed: {data['current_weather']['windspeed']} km/h")
print(f"Time of observation: {data['current_weather']['time']}")
Temperature in Seattle: 8.5°C
Wind speed: 8.2 km/h
Time of observation: 2025-03-04T03:00

What if we tried with an invalid request?#

try:
    # Bad request with invalid parameters
    bad_response = requests.get("https://api.open-meteo.com/v1/forecast?latitude=999&longitude=-999")
    
    # Check if the request was successful
    bad_response.raise_for_status()  
    
except requests.exceptions.HTTPError as err:
    print(f"HTTP Error occurred: {err}")
    print(f"Response body: {bad_response.text}")
HTTP Error occurred: 400 Client Error: Bad Request for url: https://api.open-meteo.com/v1/forecast?latitude=999&longitude=-999
Response body: {"error":true,"reason":"Latitude must be in range of -90 to 90°. Given: 999.0."}

Now let’s try an API focused on Geospatial data! We’ll try out the Overpass API that will allow us to access OpenStreetMap data#

def query_overpass(query):
    overpass_url = "https://overpass-api.de/api/interpreter"
    response = requests.get(overpass_url, params={'data': query})
    return response
query = """
[out:json];
// Define our search area
area["name"="University of Washington"]->.searchArea;
// Find all "restaurants in this area
node["amenity"="restaurant"](area.searchArea);
// Output all data for these nodes
out body;
"""

response = query_overpass(query)
response
<Response [200]>
result = response.json()
result
{'version': 0.6,
 'generator': 'Overpass API 0.7.62.5 1bd436f1',
 'osm3s': {'timestamp_osm_base': '2025-03-04T03:02:45Z',
  'timestamp_areas_base': '2024-12-31T00:49:33Z',
  'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},
 'elements': [{'type': 'node',
   'id': 2671015705,
   'lat': 47.6517371,
   'lon': -122.3132664,
   'tags': {'amenity': 'restaurant',
    'cuisine': 'sandwich',
    'name': 'Vista Café',
    'opening_hours': 'Mo-Fr 07:30-15:00',
    'website': 'https://www.hfs.washington.edu/dining/Default.aspx?id=324'}},
  {'type': 'node',
   'id': 2995199840,
   'lat': 47.649378,
   'lon': -122.3077192,
   'tags': {'addr:city': 'Seattle',
    'addr:housenumber': '1959',
    'addr:postcode': '98195',
    'addr:street': 'Northeast Pacific Street',
    'amenity': 'restaurant',
    'name': 'Plaza Cafe',
    'source:addr:id': '14021'}},
  {'type': 'node',
   'id': 3355097453,
   'lat': 47.6566177,
   'lon': -122.3148405,
   'tags': {'addr:city': 'Seattle',
    'addr:housenumber': '1218',
    'addr:postcode': '98105',
    'addr:state': 'WA',
    'addr:street': 'Northeast Campus Parkway',
    'amenity': 'restaurant',
    'cuisine': 'american',
    'name': 'Cultivate'}},
  {'type': 'node',
   'id': 6437270427,
   'lat': 47.6596297,
   'lon': -122.304489,
   'tags': {'addr:city': 'Seattle',
    'addr:housenumber': '4294',
    'addr:postcode': '98195',
    'addr:state': 'WA',
    'addr:street': 'Little Canoe Channel Northeast',
    'addr:unit': '202',
    'amenity': 'restaurant',
    'branch': 'Willow Hall',
    'brand': 'Pagliacci Pizza',
    'brand:wikidata': 'Q7124370',
    'cuisine': 'pizza',
    'diet:vegan': 'yes',
    'diet:vegetarian': 'yes',
    'level': '1',
    'name': 'Pagliacci Pizza',
    'opening_hours': 'Mo-Th 15:00-21:00; Fr 15:00-20:00',
    'takeaway': 'yes',
    'website': 'https://www.pagliacci.com/location/willow-hall'}}]}
import shapely
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
# Process results into a GeoDataFrame
restaurant = []
for element in result['elements']:
    if element['type'] == 'node':
        cafe = {
            'id': element['id'],
            'name': element.get('tags', {}).get('name', 'Unnamed'),
            'geometry': shapely.geometry.Point(element['lon'], element['lat'])
        }
        restaurant.append(cafe)
restaurant_gdf = gpd.GeoDataFrame(restaurant, crs="EPSG:4326")
restaurant_gdf
id name geometry
0 2671015705 Vista Café POINT (-122.31327 47.65174)
1 2995199840 Plaza Cafe POINT (-122.30772 47.64938)
2 3355097453 Cultivate POINT (-122.31484 47.65662)
3 6437270427 Pagliacci Pizza POINT (-122.30449 47.65963)
f,ax=plt.subplots(figsize=(10,10))
restaurant_gdf.plot(ax=ax, column='name', categorical=True, legend=True, legend_kwds={'loc': 'upper left'}, edgecolor='black', markersize=200)
ctx.add_basemap(ax, crs=restaurant_gdf.crs, source=ctx.providers.OpenStreetMap.Mapnik)
../../_images/c1fb765f753c0c25213165247af204940aa176cc91b8e39c2987ca41b87d271a.png

Or we could have used osmnx, a python package that makes API requests to OpenStreetMap easier#

import osmnx

uw_gdf = osmnx.geocode_to_gdf("University of Washington, Seattle, WA")
uw_gdf
geometry bbox_west bbox_south bbox_east bbox_north place_id osm_type osm_id lat lon class type place_rank importance addresstype name display_name
0 MULTIPOLYGON (((-122.32181 47.65497, -122.3217... -122.321808 47.647782 -122.287588 47.666072 301152014 relation 5268488 47.65543 -122.300169 amenity university 30 0.65117 amenity University of Washington University of Washington, 9th Avenue Northeast...
wilcox_hall_gdf = osmnx.geocode_to_gdf("2117 Mason Rd, Seattle, WA 98195")
walk_network = osmnx.graph_from_place("University of Washington, Seattle, WA", network_type="walk")
restaurants_gdf = osmnx.features.features_from_place("University of Washington, Seattle, WA", tags={'amenity': 'restaurant'})
restaurants_gdf 
geometry addr:city addr:housenumber addr:postcode addr:state addr:street amenity cuisine diet:vegan diet:vegetarian name opening_hours takeaway website level addr:unit branch brand brand:wikidata source:addr:id
element id
node 2671015705 POINT (-122.31327 47.65174) NaN NaN NaN NaN NaN restaurant sandwich NaN NaN Vista Café Mo-Fr 07:30-15:00 NaN https://www.hfs.washington.edu/dining/Default.... NaN NaN NaN NaN NaN NaN
2995199840 POINT (-122.30772 47.64938) Seattle 1959 98195 NaN Northeast Pacific Street restaurant NaN NaN NaN Plaza Cafe NaN NaN NaN NaN NaN NaN NaN NaN 14021
3355097453 POINT (-122.31484 47.65662) Seattle 1218 98105 WA Northeast Campus Parkway restaurant american NaN NaN Cultivate NaN NaN NaN NaN NaN NaN NaN NaN NaN
6437270427 POINT (-122.30449 47.65963) Seattle 4294 98195 WA Little Canoe Channel Northeast restaurant pizza yes yes Pagliacci Pizza Mo-Th 15:00-21:00; Fr 15:00-20:00 yes https://www.pagliacci.com/location/willow-hall 1 202 Willow Hall Pagliacci Pizza Q7124370 NaN
f,ax=plt.subplots(figsize=(10,10))

uw_gdf.plot(ax=ax, facecolor='none', edgecolor='purple', linewidth=5)
osmnx.convert.graph_to_gdfs(walk_network, nodes=False, edges=True).plot(ax=ax, linewidth=0.5, edgecolor='black')

wilcox_hall_gdf.plot(ax=ax, color='red', zorder=4)

restaurant_gdf.plot(ax=ax, column='name', categorical=True, legend=True, legend_kwds={'loc': 'upper left'}, edgecolor='black', markersize=200, zorder=5)


ctx.add_basemap(ax, crs=uw_gdf.crs, source=ctx.providers.OpenStreetMap.Mapnik)
../../_images/41bb51227d47aa821c9687045d48a6ce99358caa871dd14da4d4f72e946827d2.png

We could even use the walk network to find which place Eric can grab lunch fastest from! See graph network, routing, and travel time examples here.

2. Querying STAC catalogs#

Microsoft’s Planetary Computer is one of many platforms that host a variety of Earth observation datasets accompanied by a STAC catalog to access them. Check out Reading Data from the STAC API for more info. We’ll be looking at their Sentinel-2 dataset for the rest of this demo.

Let’s try to access Microsoft Planetary Computer’s Sentinel-2 STAC catalog#

We’ll use pystac_client to open the STAC catalog#

import pystac_client
import planetary_computer
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

#catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1/") #another catalog option for Sentinel-2
catalog

Now we’ll search the catalog with a date range and region of interest…#

start_date = "2024-07-15"
end_date = "2024-07-15"
states_gdf = gpd.read_file('http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_5m.json')
wa_gdf = states_gdf[states_gdf['NAME'] == 'Washington']
bbox = wa_gdf.total_bounds
bbox
array([-124.733174,   45.543541, -116.915989,   49.002494])
search = catalog.search(
    collections=["sentinel-2-l2a"], # sentinel-2-c1-l2a if using earthsearch
    bbox=bbox,
    datetime=(start_date, end_date), 
    #query={"eo:cloud_cover": {"lt": 30}},
)

Now lets turn these items into a geodataframe for more familiar visualization#

stac_json = search.item_collection_as_dict()
metadata_gdf = gpd.GeoDataFrame.from_features(stac_json, "epsg:4326")
metadata_gdf
geometry datetime platform proj:epsg instruments s2:mgrs_tile constellation s2:granule_id eo:cloud_cover s2:datatake_id ... s2:cloud_shadow_percentage s2:nodata_pixel_percentage s2:unclassified_percentage s2:dark_features_percentage s2:not_vegetated_percentage s2:degraded_msi_data_percentage s2:high_proba_clouds_percentage s2:reflectance_conversion_factor s2:medium_proba_clouds_percentage s2:saturated_defective_pixel_percentage
0 POLYGON ((-117.87227 49.6472, -117.8839 49.621... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11UMQ Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032504_A0473... 3.178876 GS2A_20240715T185921_047343_N05.10 ... 2.043696 82.493359 0.283331 2.020537 6.179187 0.0002 1.948861 0.967614 1.171530 0.0
1 POLYGON ((-118.29888 48.74531, -118.3575 48.61... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11UMP Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031349_A0473... 0.000000 GS2A_20240715T185921_047343_N05.10 ... 0.000000 99.744135 0.040198 1.613112 5.027361 0.0000 0.000000 0.967614 0.000000 0.0
2 POLYGON ((-118.2306 48.8888, -118.2949 48.7539... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11ULQ Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032531_A0473... 1.359492 GS2A_20240715T185921_047343_N05.10 ... 0.745373 0.920103 0.218757 0.526810 20.684105 0.0223 0.774191 0.967614 0.562098 0.0
3 POLYGON ((-118.29891 48.74525, -118.36187 48.6... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11ULP Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032146_A0473... 0.902603 GS2A_20240715T185921_047343_N05.10 ... 0.001136 21.167192 0.121392 0.525987 57.613897 0.0282 0.000960 0.967614 0.009066 0.0
4 POLYGON ((-118.71336 47.83879, -118.75771 47.7... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11TLN Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032831_A0473... 0.001157 GS2A_20240715T185921_047343_N05.10 ... 0.000000 50.961173 0.411384 0.167276 78.942764 0.0172 0.000041 0.967614 0.000852 0.0
5 POLYGON ((-119.11501 46.93209, -119.14232 46.8... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11TLM Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031743_A0473... 0.005342 GS2A_20240715T185921_047343_N05.10 ... 0.000434 80.869365 0.277073 0.043462 78.576916 0.0005 0.000572 0.967614 0.004769 0.0
6 POLYGON ((-119.50484 46.02565, -119.51643 45.9... 2024-07-15T18:59:21.024000Z Sentinel-2A 32611 [msi] 11TLL Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032212_A0473... 0.599712 GS2A_20240715T185921_047343_N05.10 ... 0.086370 99.523664 1.069172 0.005572 49.781638 0.0002 0.080797 0.967614 0.518914 0.0
7 POLYGON ((-120.23149 49.61961, -118.71495 49.5... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UGV Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033930_A0473... 0.271821 GS2A_20240715T185921_047343_N05.10 ... 0.198596 0.000000 0.154555 0.748641 27.691483 0.0201 0.116254 0.967614 0.154615 0.0
8 POLYGON ((-120.28118 48.72093, -118.79172 48.6... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UGU Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033602_A0473... 0.922678 GS2A_20240715T185921_047343_N05.10 ... 0.027017 0.000000 0.138891 0.323131 64.844561 0.0217 0.009847 0.967614 0.028119 0.0
9 POLYGON ((-121.61483 49.64444, -120.09543 49.6... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UFV Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033914_A0473... 0.000319 GS2A_20240715T185921_047343_N05.10 ... 0.000000 0.000007 0.060484 1.580960 14.125817 0.0000 0.000007 0.967614 0.000229 0.0
10 POLYGON ((-121.63972 48.74498, -120.14756 48.7... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UFU Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033556_A0473... 0.463157 GS2A_20240715T185921_047343_N05.10 ... 0.001015 0.000000 0.135149 2.662901 19.188905 0.0009 0.223868 0.967614 0.238553 0.0
11 POLYGON ((-122.49497 48.66176, -122.46544 48.7... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UEV Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031812_A0473... 0.017248 GS2A_20240715T185921_047343_N05.10 ... 0.009542 46.140018 0.134100 0.935389 5.247966 0.0207 0.006628 0.967614 0.010355 0.0
12 POLYGON ((-122.82841 47.76407, -122.79238 47.8... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10UEU Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031942_A0473... 24.769497 GS2A_20240715T185921_047343_N05.10 ... 0.061116 23.954310 1.387328 0.309063 4.712663 0.0149 16.921999 0.967614 7.847162 0.0
13 POLYGON ((-118.89078 47.44256, -118.95204 47.3... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TGT Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032132_A0473... 0.000816 GS2A_20240715T185921_047343_N05.10 ... 0.000137 5.294677 0.289403 0.315408 78.317058 0.0230 0.000046 0.967614 0.000575 0.0
14 POLYGON ((-119.13453 46.88732, -119.14232 46.8... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TGS Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031846_A0473... 0.002766 GS2A_20240715T185921_047343_N05.10 ... 0.001273 26.234004 0.144550 0.105554 80.342644 0.0297 0.000315 0.967614 0.002226 0.0
15 POLYGON ((-119.51628 45.99841, -119.51643 45.9... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TGR Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033223_A0473... 0.002209 GS2A_20240715T185921_047343_N05.10 ... 0.014988 48.932171 0.223663 0.764961 88.134038 0.0215 0.000052 0.967614 0.000325 0.0
16 POLYGON ((-121.6634 47.84592, -120.19715 47.81... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TFT Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032623_A0473... 0.326187 GS2A_20240715T185921_047343_N05.10 ... 0.001788 0.000000 0.162491 0.917847 24.870823 0.0000 0.159601 0.967614 0.166556 0.0
17 POLYGON ((-121.68596 46.94617, -120.24442 46.9... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TFS Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033241_A0473... 0.000368 GS2A_20240715T185921_047343_N05.10 ... 0.000000 0.000000 0.097233 0.432288 38.390678 0.0011 0.000030 0.967614 0.000007 0.0
18 POLYGON ((-121.70747 46.04627, -120.28946 46.0... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TFR Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032640_A0473... 0.225212 GS2A_20240715T185921_047343_N05.10 ... 0.050076 0.000000 0.145577 0.361157 59.701610 0.0193 0.000849 0.967614 0.031901 0.0
19 POLYGON ((-123.00026 47.29126, -122.9524 47.42... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TET Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033934_A0473... 15.516947 GS2A_20240715T185921_047343_N05.10 ... 0.079762 3.986725 1.689407 0.404987 11.341443 0.0109 9.585163 0.967614 5.931718 0.0
20 POLYGON ((-123.00026 46.95371, -121.55749 46.9... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TES Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033241_A0473... 2.049094 GS2A_20240715T185921_047343_N05.10 ... 0.011032 0.000003 0.513568 0.290062 5.379120 0.0001 0.695436 0.967614 1.353201 0.0
21 POLYGON ((-123.00026 46.05357, -121.5811 46.04... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TER Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T033626_A0473... 3.746428 GS2A_20240715T185921_047343_N05.10 ... 0.103384 0.000027 0.204694 0.168357 12.431969 0.0000 0.355307 0.967614 1.251828 0.0
22 POLYGON ((-123.1538 46.86417, -123.11032 46.98... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TDT Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031627_A0473... 9.474343 GS2A_20240715T185921_047343_N05.10 ... 0.291224 92.288250 3.916441 0.199283 6.989885 0.0646 2.646397 0.967614 6.827946 0.0
23 POLYGON ((-123.47109 45.96244, -123.42164 46.1... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TDS Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T032114_A0473... 12.754121 GS2A_20240715T185921_047343_N05.10 ... 0.293659 70.333040 1.541471 0.035631 8.277968 0.0378 6.001901 0.967614 6.751259 0.0
24 POLYGON ((-123.78121 45.06065, -123.77546 45.0... 2024-07-15T18:59:21.024000Z Sentinel-2A 32610 [msi] 10TDR Sentinel 2 S2A_OPER_MSI_L2A_TL_MSFT_20240716T031831_A0473... 0.640411 GS2A_20240715T185921_047343_N05.10 ... 0.015383 47.716999 0.153482 0.025650 14.854492 0.0213 0.329347 0.967614 0.310017 0.0

25 rows × 34 columns

f, ax = plt.subplots(figsize=(10,10))

metadata_gdf.plot(
    ax=ax, 
    column="s2:mgrs_tile", # "s2:tile_id" for earthsearch
    categorical=True, 
    legend=True, 
    alpha=0.5, 
    edgecolor="k",
)

wa_gdf.boundary.plot(ax=ax, color="black", linewidth=2)

ctx.add_basemap(ax, crs=metadata_gdf.crs, source=ctx.providers.Esri.WorldImagery)

ax.set_title("Sentinel-2 Tiles in Washington State")

f.tight_layout()
../../_images/0211794bc0d3ae57f895fb4562c959e007d48d91e5d807c1576ce4ffa2ae2977.png
# metadata_gdf.explore("s2:granule_id",
#                      categorical=True,
# )

3. Loading Sentinel-2 Imagery with odc-stac#

Let’s search the STAC catalog for a smaller area with a longer time range–we’ll focus again on Mt. Rainier throughout July 2024#

import odc.stac
import rioxarray
start_date = "2024-07-01"
end_date = "2024-07-31"

rainier_bbox_gdf = gpd.read_file("https://github.com/egagli/easysnowdata/raw/main/docs/examples/mt_rainier.geojson")
bbox = rainier_bbox_gdf.total_bounds
bbox
array([-121.94224976,   46.72842173, -121.54136001,   46.99728203])
search = catalog.search(
    collections=["sentinel-2-l2a"], # sentinel-2-c1-l2a if using earthsearch
    bbox=bbox,
    datetime=(start_date, end_date), 
    #query={"eo:cloud_cover": {"lt": 30}},
)

items = search.item_collection()
items

Now we’ll load the items into a lazily loaded xarray dataset#

bands = ["B02", "B03", "B04", "B08"]  # Blue, Green, Red, NIR

s2_ds = odc.stac.load(items,
                      #bands=bands, # you can specify bands here if you'd like, 
                      # but since we are using lazy loading, we won't take a performance hit 
                      # includind all bands unless we need them for a calculation
                      #chunks={"x": 256, "y": 256},
                      resolution=80,
                      chunks={},
                      groupby='solar_day',
                      bbox=bbox,
)

s2_ds
<xarray.Dataset> Size: 114MB
Dimensions:      (y: 380, x: 389, time: 12)
Coordinates:
  * y            (y) float64 3kB 5.206e+06 5.206e+06 ... 5.176e+06 5.176e+06
  * x            (x) float64 3kB 5.804e+05 5.805e+05 ... 6.114e+05 6.115e+05
    spatial_ref  int32 4B 32610
  * time         (time) datetime64[ns] 96B 2024-07-03T19:09:19.024000 ... 202...
Data variables: (12/16)
    AOT          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B01          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B02          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B03          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B04          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B05          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    ...           ...
    B11          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B12          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    B8A          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    SCL          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    WVP          (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>
    visual       (time, y, x) float32 7MB dask.array<chunksize=(1, 380, 389), meta=np.ndarray>

Try a rough RGB composite time-series!#

rgb_da = s2_ds[["B04", "B03", "B02"]].to_array(dim='band').compute() # What do you think compute does here?
rgb_da
/home/eric/miniconda3/envs/uwgda2025/lib/python3.12/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  dest = _reproject(
/home/eric/miniconda3/envs/uwgda2025/lib/python3.12/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  dest = _reproject(
<xarray.DataArray (band: 3, time: 12, y: 380, x: 389)> Size: 21MB
array([[[[ 1418.,  1736.,  1470., ...,    nan,    nan,    nan],
         [ 1171.,  1415.,  1284., ...,    nan,    nan,    nan],
         [ 1312.,  1520.,  1125., ...,    nan,    nan,    nan],
         ...,
         [ 1274.,  1237.,  1250., ...,    nan,    nan,    nan],
         [ 1108.,  1140.,  1139., ...,    nan,    nan,    nan],
         [ 1161.,  1121.,  1116., ...,    nan,    nan,    nan]],

        [[ 1428.,  1507.,  1472., ...,  1178.,  1213.,  1261.],
         [ 1335.,  1269.,  1185., ...,  1192.,  1070.,  1632.],
         [ 1143.,  1173.,  1134., ...,  1153.,  1072.,  1956.],
         ...,
         [ 1305.,  1256.,  1285., ...,  1185.,  1216.,  1256.],
         [ 1153.,  1173.,  1165., ...,  1211.,  1334.,  1233.],
         [ 1199.,  1144.,  1147., ...,  1305.,  1341.,  1370.]],

        [[ 1383.,  1424.,  1390., ...,    nan,    nan,    nan],
         [ 1295.,  1238.,  1155., ...,    nan,    nan,    nan],
         [ 1111.,  1135.,  1104., ...,    nan,    nan,    nan],
         ...,
...
         [ 7645.,  7786.,  7224., ..., 13165., 13244., 13690.],
         [ 7044.,  7070.,  5597., ..., 13576., 13199., 12536.],
         [ 6430.,  5693.,  5819., ..., 13065., 12393., 12398.]],

        [[ 1348.,  1391.,  1365., ...,    nan,    nan,    nan],
         [ 1321.,  1263.,  1236., ...,    nan,    nan,    nan],
         [ 1194.,  1202.,  1182., ...,    nan,    nan,    nan],
         ...,
         [ 1224.,  1185.,  1196., ...,    nan,    nan,    nan],
         [ 1213.,  1269.,  1217., ...,    nan,    nan,    nan],
         [ 1220.,  1176.,  1201., ...,    nan,    nan,    nan]],

        [[10509., 10432., 10241., ...,  5463.,  5112.,  5355.],
         [10578., 10403., 10128., ...,  6485.,  6905.,  6594.],
         [10090.,  9916.,  9837., ...,  9362.,  8993.,  8220.],
         ...,
         [ 5819.,  5637.,  5652., ...,  8536.,  8537.,  8094.],
         [ 4827.,  5094.,  4970., ...,  8827.,  8750.,  8499.],
         [ 4971.,  4684.,  4528., ...,  9035.,  8831.,  8816.]]]],
      dtype=float32)
Coordinates:
  * y            (y) float64 3kB 5.206e+06 5.206e+06 ... 5.176e+06 5.176e+06
  * x            (x) float64 3kB 5.804e+05 5.805e+05 ... 6.114e+05 6.115e+05
    spatial_ref  int32 4B 32610
  * time         (time) datetime64[ns] 96B 2024-07-03T19:09:19.024000 ... 202...
  * band         (band) object 24B 'B04' 'B03' 'B02'
rgb_da.plot.imshow(col='time',col_wrap=4,robust=True)
<xarray.plot.facetgrid.FacetGrid at 0x7efe4d3176b0>
../../_images/7229ee7f9416a8c2b413b3541f36e8316d6452bc8db884b1f36b34d4cb7ceaba.png

Plot a median RGB image for July 2024#

rgb_da.median(dim='time').plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x7efe4e3b46e0>
../../_images/badd40144fc78bd49dee1944cd289973a625c5c612c34ee617bf4eea765bb091.png

How can we tell when and where there are clouds?#

Check out the scene classification layer, SCL!

import matplotlib.colors
import matplotlib.ticker as mticker

# here's some colormap info
scl_class_info = {
    0: {"name": "No Data (Missing data)", "color": "#000000"},
    1: {"name": "Saturated or defective pixel", "color": "#ff0000"},
    2: {
        "name": "Topographic casted shadows",
        "color": "#2f2f2f",
    },  # (called 'Dark features/Shadows' for data before 2022-01-25)
    3: {"name": "Cloud shadows", "color": "#643200"},
    4: {"name": "Vegetation", "color": "#00a000"},
    5: {"name": "Not-vegetated", "color": "#ffe65a"},
    6: {"name": "Water", "color": "#0000ff"},
    7: {"name": "Unclassified", "color": "#808080"},
    8: {"name": "Cloud medium probability", "color": "#c0c0c0"},
    9: {"name": "Cloud high probability", "color": "#ffffff"},
    10: {"name": "Thin cirrus", "color": "#64c8ff"},
    11: {"name": "Snow or ice", "color": "#ff96ff"},
}

scl_cmap = plt.cm.colors.ListedColormap([info["color"] for info in scl_class_info.values()])
class_values = sorted(list(scl_class_info.keys()))
bounds = [(class_values[i] + class_values[i + 1]) / 2 for i in range(len(class_values) - 1)]
bounds = [class_values[0] - 0.5] + bounds + [class_values[-1] + 0.5]
norm = matplotlib.colors.BoundaryNorm(bounds, scl_cmap.N)
ticklabels = [item['name'] for item in scl_class_info.values()]

s2_ds['SCL'].plot.imshow(col='time',
                         col_wrap=4,
                         cmap=scl_cmap,
                         cbar_kwargs={"ticks": class_values, "format":mticker.FixedFormatter(ticklabels)},
                         norm=norm,
                         robust=True)
<xarray.plot.facetgrid.FacetGrid at 0x7efe4eef5400>
../../_images/908c638876d749734b652aba35bc12f71638050d6da5cfe2bc107451a8d8fa75.png

Calculate and plot an NDVI time-series#

ndvi_da = (s2_ds["B08"] - s2_ds["B04"]) / (s2_ds["B08"] + s2_ds["B04"])
ndvi_da
<xarray.DataArray (time: 12, y: 380, x: 389)> Size: 7MB
dask.array<truediv, shape=(12, 380, 389), dtype=float32, chunksize=(1, 380, 389), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 3kB 5.206e+06 5.206e+06 ... 5.176e+06 5.176e+06
  * x            (x) float64 3kB 5.804e+05 5.805e+05 ... 6.114e+05 6.115e+05
    spatial_ref  int32 4B 32610
  * time         (time) datetime64[ns] 96B 2024-07-03T19:09:19.024000 ... 202...
ndvi_da.plot.imshow(col='time',col_wrap=4,robust=True,cmap='RdYlGn',vmin=-1,vmax=1)
<xarray.plot.facetgrid.FacetGrid at 0x7efe4ef76060>
../../_images/71c6cd11b5e47cded69c413d4af73c9c2806eb15266129a0d2424b7a3c394a45.png

Try again, masking out bad data according to the SCL#

bad_scl = [0,1,2,3,8,9,10]

ndvi_masked_da = ndvi_da.where(~s2_ds['SCL'].isin(bad_scl)).compute() # What do you think compute does here? 

ndvi_masked_da.plot.imshow(col='time',col_wrap=4,robust=True,cmap='RdYlGn',vmin=-1,vmax=1)
<xarray.plot.facetgrid.FacetGrid at 0x7efe4f292f00>
../../_images/f32cb1e92d3db6ba8987a1e8878a98259a2ccfdb4faa5b753e71bc648acd7119.png

Plot a median NDVI image for July 2024#

ndvi_masked_da.median(dim='time').plot.imshow(robust=True,cmap='RdYlGn',vmin=-1,vmax=1)
<matplotlib.image.AxesImage at 0x7efe4d42c3e0>
../../_images/4e3b9c26becfd63ba2e85385538329cd1de57e55877a02b16c13dd2e3e88ddd5.png

Create a median NDVI time-series#

# def not the best way to do this, but quick approximation
proportion_pixels_valid = ndvi_masked_da.count(dim=['x','y'])/ndvi_masked_da.count(dim=['x','y']).max()
proportion_pixels_valid
<xarray.DataArray (time: 12)> Size: 96B
array([0.63730512, 0.99736894, 0.65268564, 1.        , 0.6497117 ,
       0.99948918, 0.6696616 , 0.99463991, 0.63063649, 0.20557282,
       0.48157556, 0.01757774])
Coordinates:
    spatial_ref  int32 4B 32610
  * time         (time) datetime64[ns] 96B 2024-07-03T19:09:19.024000 ... 202...
f,ax=plt.subplots()
ndvi_masked_da.where(proportion_pixels_valid > 0.8).median(dim=['x','y']).plot.scatter(ax=ax,x='time',y='data')
ax.set_title('Median NDVI over time for images with >80% valid data')
ax.set_ylabel('NDVI')
ax.set_xlabel('Date')
ax.grid(True)
../../_images/487cecc319d50a37255ecbacd91155fbace1346a9d65ca746633a28eab3a6d02.png

Great work! We covered APIs, STAC, and loading data from STAC into xarray datasets with odc-stac. Check out some other STAC catalogs, such as Element84’s Earth Search and NASA’s cmr-stac (Data Discovery notebook with CMR-STAC API).#