Lab 09 assignment notebook 2 (20 pts)#
Notebook 2 of 2
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean, Eric Gagliano, Quinn Brencher
Setup from Notebook 1#
import pystac_client
import planetary_computer
import xarray as xr
import rasterio as rio
import rioxarray
import geopandas as gpd
import odc.stac
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
import os
import rasterstats
from rioxarray import merge
import scipy.ndimage as ndimage
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage.measure import label
from shapely.geometry import shape
from rasterio.features import shapes, rasterize
import xyzservices.providers as xyz
import argparse
from typing import Tuple
# bounding box for area of interest in lat lon
minx = 141.9
maxx = 142.0
miny = 42.7
maxy = 42.75
bbox = (minx, miny, maxx, maxy)
# date in utc time
earthquake_date = "2018-09-05"
Functions defined in Notebook 1:
def calculate_date_before(date_str, days_padding=1):
date = pd.Timestamp(date_str) # Convert string to pandas Timestamp
date_before = date - pd.DateOffset(days=days_padding)
return date_before.strftime("%Y-%m-%d")
def calculate_date_range(date_str, months_padding=3):
date = pd.Timestamp(date_str) # Convert string to pandas Timestamp
start_date = date - pd.DateOffset(months=months_padding)
end_date = date + pd.DateOffset(months=months_padding)
return start_date.strftime("%Y-%m-%d"), end_date.strftime("%Y-%m-%d")
def step_function(t, a, b, t0, k=10):
return a + b / (1 + np.exp(-k * (t - t0)))
def fit_step_function(observed_values, time_values, date_of_interest):
"""Fit step function to a single pixel time series."""
# mask nodata values
mask = ~np.isnan(observed_values)
if np.sum(mask) < 5: # skip if not enough valid points
return np.nan, np.nan, np.nan
try:
# convert valid values
t_valid = time_values[mask].astype(float)
observed_valid = observed_values[mask]
# initial guesses: baseline observed value, rough estimate observed value change, and date of interest
p0 = [0, -0.2, np.datetime64(date_of_interest).astype("datetime64[ns]").astype(float)]
# fit the step function
params, _ = curve_fit(step_function, t_valid, observed_valid, p0=p0)
return params[0], params[1], params[2] # a (baseline), b (step size), t0 (change date)
except RuntimeError:
return np.nan, np.nan, np.nan # return NaN if fitting fails
Part 4: Writing functions (6 pts)#
Now we’re going to apply some of the best practices described in the demo to turn our landslide inventory workflow into a series of robust, well-documented functions that are ready to go into a module. Note that all of the code necessary to complete these functions should be already finished in the previous notebook (no need to do additional analysis here). We’ll provide some flexibility in terms of how you translate this code into functions, but please use best practices. In this section, each function that you create should have:
Good naming conventions
Helpful comments
A docstring (including a high-level overview and description of inputs and outputs)
Type hints
4.1 Data download and preprocessing#
Write a function (or a small group of functions) that prepares the seasonal NDVI time series.#
Your function (or group of functions, collectively) should have the following arguments:
an earthquake date
a bounding box
allowed cloud cover percent in Sentinel-2 images (with default value 80)
And return the following outputs:
an Xarray Dataset with dimensions
x,y, andseasonand data variableNDVI.
Run your function(s) and save the output to a variable to make sure it works.
# STUDENT CODE HERE
seasonal_ds = get_seasonal_ndvi(earthquake_date, bbox)
/opt/conda/lib/python3.11/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dest = _reproject(
Write a function (or a small group of functions) that prepares the NDVI time series spanning the earthquake date.#
Your function (or group of functions, collectively) should have the following arguments:
an earthquake date
a bounding box,
the seasonal NDVI time series Dataset (output by the function(s) above)
allowed cloud cover percent in Sentinel-2 images (with default set to 50)
And return the following outputs:
an Xarray Dataset with dimensions
x,y, andtime, and data variablesNDVIandNDVI_anomaly
Run your function(s) and save the output to a variable to make sure it works.
# STUDENT CODE HERE
s2_ds = get_ndvi_timeseries(earthquake_date, bbox, seasonal_ds)
4.2 Fitting a step function and creating a landslide mask#
Write a function (or a small group of functions) that fits a step function to the NDVI_anomaly time series and creates a landslide mask.#
Your function (or group of functions, collectively) should have the following arguments:
an earthquake date
a Sentinel-2 time series Xarray dataset with data variable
NDVI_anomaly
And return the following outputs:
the input Xarray dataset, now with additional data variable
landslide_mask.
Run your function(s) and save the output to a variable to make sure it works.
# STUDENT CODE HERE
s2_ds = create_landslide_mask(earthquake_date, s2_ds)
4.3 Landslide segmentation#
Write a function (or a small group of functions) that cleans up the landslide mask using binary erosion and dilation, then segments the landslide mask to label each individual landslide.#
Your function (or group of functions, collectively) should have the following arguments:
a Sentinel-2 time series Xarray dataset with data variable
landslide_mask
And return the following outputs:
the input Xarray dataset, now with additional variable
landslide_id
Run your function(s) and save the output to a variable to make sure it works.
# STUDENT CODE HERE
s2_ds = segment_landslides(s2_ds)
4.4 Polygonize landslides#
Finally, write a function (or a small group of functions) that turns our landslides_id raster into a vector dataset.#
Your function (or group of functions, collectively) should have the following arguments:
an earthquake date
a bbox
a Sentinel-2 time series Xarray dataset with data variable
landslide_id
Your function should return None, but it should write a GeoDataFrame with entries for each landslide to disk as a .geojson file.
Run your function(s) and check to see what was written to the disk to make sure it works.
# STUDENT CODE HERE
polygonize_landslides(earthquake_date, bbox, s2_ds)
Challenge question (GS required/UG +0.5):#
Raise an exception in a scenario where one of these functions might silently fail (hint: maybe where no Sentinel-2 data are available? Maybe when no landslides are found?)
# STUDENT CODE HERE
Challenge Question (GS required/UG +0.5): Write a test for one of these functions.#
Run your test to see if it’s working.
# STUDENT CODE HERE
test_polygonize_landslides()
Part 5: creating Python modules (4 pts)#
Now that we have all of these great functions that run our workflow, let’s create some Python modules for them! One of our modules will be a script that we can run to generate a coseismic landslide inventory for any area of interest.
Create a utils.py module#
This module should contain all of the functions defined in the third code cell of this notebook. Use the command line or the file browser to create the utils.py file, then copy and paste all of these functions into it. At the top of the file, make sure to import all of the packages these functions depend on. Then run !cat utils.py to preserve the output in this notebook.
# import statements for packages our utils depend on
# import pandas as pd
# import numpy as np
# from scipy.optimize import curve_fit
# STUDENT CODE HERE
Try out your utils.py module#
Import and run the calculate_date_before() function with our earthquake_date variable to make sure it works. Then run !cat utils.py to preserve the output in this notebook.
# STUDENT CODE HERE
# STUDENT CODE HERE
'2018-09-04'
Create a main() function#
This function will run when our script is excecuted. It should run our entire analysis all the way through, so it will need to call all of the functions we made in Part 4. Before each function (or related group of functions) is called, write a short print() statement that describes what our script is doing. For example: “creating seasonal median NDVI time series.” In addition, our main function should use argparse to add seven arguments to our script:
minx: Minimum longitude of the bboxminy: Minumum latitude of the bboxmaxx: Maximum longitude of the bboxmaxy: Maximum latitude of the bboxearthquake_datecloud_cover_seasonal: maximum cloud cover for seasonal median NDVIcloud_cover_time_series: maximum cloud cover for NDVI time series
Each argument should have a help string so that the user can understand how to use it. After parsing the arguments, your main() function should create the earthquake_date variable and bbox tuple variable from these arguments.
# STUDENT CODE HERE
Create a landslide_inventory.py module#
This module should contain the following, in order:
An appropriate shebang line at the top
Import statements for all of the packages needed to run the functions that comprise our workflow
Import statements for the functions in our
utils.pymodule needed to run our workflowDefinitions for all of the functions needed to run our workflow (written by you in Part 4)
Our
main()functionSome code to run
main()if our module is being run directly (hint:if __name__ == "__main__":)
When you’re finished creating this module, do !cat landslide_inventory.py to preserve its contents here.
# STUDENT CODE HERE
Use help to check your module’s usage#
First, run !chmod +x landslide_inventory.py to change the permissions of your script, making it executable. Then, run !./landslide_inventory.py -h to see what your new module’s usage is.
# STUDENT CODE HERE
# STUDENT CODE HERE
Part 6: Running scripts (2 pts)#
Ok, now we have a script that we can run for any earthquake after 2015 to detect coseismic landslides! Let’s see how easy it is to run our script for another location. Expect that the following code cell will take about five minutes to run.
%%time
# 2021 Haiti earthquake https://en.wikipedia.org/wiki/2021_Haiti_earthquake
earthquake_date = '2021-08-14'
minx = -74.03
miny = 18.35
maxx = -73.98
maxy = 18.39
cloud_cover_seasonal = 5
cloud_cover_time_series = 50
!./landslide_inventory.py $minx $miny $maxx $maxy $earthquake_date $cloud_cover_seasonal $cloud_cover_time_series
creating seasonal median NDVI dataset
/opt/conda/lib/python3.11/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dest = _reproject(
preparing NDVI time series
creating landslide mask
segmenting landslides
polygonizing landslides
done!
CPU times: user 3.97 s, sys: 742 ms, total: 4.71 s
Wall time: 4min 17s
Explore your new landslide inventory#
Uncomment the following code and explore the Haiti landslide inventory.
# # recomment when done exploring
# landslide_gdf = gpd.read_file(f'landslides_{earthquake_date}_{minx}_{miny}_{maxx}_{maxy}.geojson')
# landslide_gdf.explore(tiles=xyz.Esri.WorldImagery)
Written Response: For the earthquakes in Japan and Haiti, compare and contrast:#
The efficacy of our approach for generating a landslide inventory
The spatial distribution and appearance of landslides that occurred
How might you improve our approach? Consider ways you could make it more customizable, adaptive, user-friendly, organized, accurate, etc.
STUDENT WRITTEN RESPONSE HERE
Written Response: If you were going to make a Python package from our two modules and host it on Github so others could use it, what would be some additional components/features it would require?#
STUDENT WRITTEN RESPONSE HERE
Submit your work#
Make sure to do this process for both notebooks! Also push your Python modules, but do not push your data.
Save this notebook with all code and output (Make sure when you save the notebook, all cells show their outputs).
Use the terminal to stage, commit, and push your notebook to your GitHub repository. It should look something like this…
git add 01_lab.ipynb
git commit -m “Completed Lab 01 exercises”
git push
Verify that your notebook appears in your GitHub repository. Double check to make sure all the ouputs are visible!