Lab02 assignment (20 pts)#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean
modified by Eric Gagliano, Quinn Brencher
Introduction#
Objectives#
Solidify basic skills with NumPy, Pandas, and Matplotlib
Learn basic data manipulation, exploration, and visualization with a relatively small, clean point dataset (65K points)
Learn a bit more about the ICESat mission, the GLAS instrument, and satellite laser altimetry
Explore outlier removal and grouping
Instructions#
For each question or task below, write some code in the empty cell and execute to preserve your output
If you are in the graduate section of the class, please complete the challenge questions
Work together, consult resources we’ve discussed, post on Slack!
Background: ICESat GLAS#
The NASA Ice, Cloud, and land Elevation Satellite (ICESat) mission carried the Geoscience Laser Altimeter System (GLAS) instrument: a space laser, pointed down at the Earth (and unsuspecting Earthlings).
It measured surface elevations by precisely tracking laser pulses emitted from the spacecraft at a rate of 40 Hz (a new pulse every 0.025 seconds). These pulses traveled through the atmosphere, reflected off the surface, back up through the atmosphere, and into space, where some small fraction of that original energy was received by a telescope on the spacecraft. The instrument electronics precisely recorded the time when these intrepid photons left the instrument and when they returned. The position and orientation of the spacecraft were precisely known, so the two-way travel time (and assumptions about the speed of light and propagation through the atmosphere) allowed for precise forward determination of the spot on the Earth’s surface (or cloud tops, as was often the case) where the reflection occurred. The laser spot size varied during the mission, but was ~70 m in diameter.
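As a rough illustration of the ranging principle described above, the sketch below converts a two-way travel time into a one-way range and computes the pulse interval at 40 Hz. The 600 km altitude is an assumed round number used only for illustration, and atmospheric delay is ignored.

```python
# Speed of light in vacuum (m/s)
C = 299_792_458.0

def two_way_range(travel_time_s):
    """One-way range (m) from a two-way travel time (s), ignoring atmospheric delay."""
    return C * travel_time_s / 2.0

# Pulse interval for a 40 Hz laser
pulse_rate_hz = 40.0
pulse_interval_s = 1.0 / pulse_rate_hz

# Hypothetical spacecraft altitude, assumed here purely for illustration
example_altitude_m = 600e3
example_travel_time_s = 2.0 * example_altitude_m / C

print(f'Pulse interval: {pulse_interval_s} s')
print(f'Two-way travel time: {example_travel_time_s * 1e3:.3f} ms')
print(f'Recovered range: {two_way_range(example_travel_time_s) / 1e3:.1f} km')
```

Note how short the travel time is: a timing error of one nanosecond corresponds to roughly 15 cm of range, which is why precise timing matters.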
ICESat collected billions of measurements from 2003 to 2009, and was operating in a “repeat-track” mode that sacrificed spatial coverage for more observations along the same ground tracks over time. One primary science focus involved elevation change over the Earth’s ice sheets. It allowed for early measurements of full Antarctic and Greenland ice sheet elevation change, which offered a detailed look at spatial distribution and rates of mass loss, and total ice sheet contributions to sea level rise. Many measurements were also collected over lower latitudes, to meet other important science objectives (e.g., estimating biomass in the Earth’s forests, observing sea surface height/thickness over time).
Background: Sample GLAS dataset for CONUS#
A few years ago, David wanted to evaluate ICESat coverage of the Continental United States (CONUS). The primary application was to extract a set of accurate control points to co-register a large set of high-resolution digital elevation models (DEMs) derived from satellite stereo imagery. He wrote some Python/shell scripts to download, filter, and process all of the GLAH14 L2 Global Land Surface Altimetry Data granules in parallel (dshean/icesat_tools).
The high-level workflow is here: https://github.com/dshean/icesat_tools/blob/master/glas_proc.py#L24. These tools processed each HDF5 (H5) file and wrote out csv files containing “good” points. These csv files were concatenated to prepare the single input csv (GLAH14_tllz_conus_lulcfilt_demfilt.csv) that we will use for this tutorial.
The csv contains ICESat GLAS shots that passed the following filters:
Within some buffer (~110 km) of mapped glacier polygons from the Randolph Glacier Inventory (RGI)
Returns from exposed bare ground (landcover class 31) or snow/ice (12) according to a 30-m Land-use/Land-cover dataset (2011 NLCD, https://www.mrlc.gov/data?f%5B0%5D=category%3Aland%20cover)
Elevation values within some threshold (200 m) of elevations sampled from an external reference DEM (void-filled 1-arcsec [30-m] SRTM-GL1, https://lpdaac.usgs.gov/products/srtmgl1v003/), used to remove spurious points and returns from clouds.
Various other ICESat-specific quality flags (see comments in glas_proc.py for details)
The final file contains a relatively small subset (~65K) of the total shots in the original GLAH14 data granules from the full mission timeline (2003-2009). The remaining points should represent returns from the Earth’s surface with reasonably high quality, and can be used for subsequent analysis.
Part 0: Imports, matplotlib backend#
Let’s use this dataset to explore some of the NumPy and Pandas functionality, and practice some basic plotting with Matplotlib.
I’ve provided instructions and hints, but you will need to fill in the code to generate the output results and plots.
Import necessary modules#
#Use shorter names (np, pd, plt) instead of full (numpy, pandas, matplotlib.pyplot) for convenience
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Matplotlib backend selection#
#Use matplotlib inline to render/embed figures in the notebook for upload to github
%matplotlib inline
#Use matplotlib widget to enable interactive plotting (zoom/pan) in Jupyter lab
#%matplotlib widget
Define a path to the GLAS data csv#
glas_fn = './data/GLAH14_tllz_conus_lulcfilt_demfilt.csv'
Do a quick check of file contents#
Use iPython functionality to run the head shell command on your filename variable
!head $glas_fn
decyear,ordinal,lat,lon,glas_z,dem_z,dem_z_std,lulc
2003.13957078,731266.9433448168,44.157897,-105.356562,1398.51,1400.52,0.33,31
2003.13957081,731266.9433462636,44.150175,-105.358116,1387.11,1384.64,0.43,31
2003.13957081,731266.9433465529,44.148632,-105.358427,1392.83,1383.49,0.28,31
2003.13957081,731266.9433468423,44.147087,-105.358738,1384.24,1382.85,0.84,31
2003.13957081,731266.9433471316,44.145542,-105.359048,1369.21,1380.24,1.73,31
2003.13957081,731266.9433474210,44.143996,-105.359359,1366.60,1375.23,1.60,31
2003.13957081,731266.9433506038,44.126969,-105.362876,1355.14,1379.38,2.17,31
2003.13957084,731266.9433604418,44.074358,-105.373549,1369.53,1391.71,2.88,31
2003.13957084,731266.9433607311,44.072806,-105.373864,1380.02,1387.79,0.45,31
Part 1: NumPy (5pts)#
Load the file#
NumPy has some convenience functions for loading text files: loadtxt and genfromtxt
Use loadtxt here (simpler), but make sure you properly set the delimiter and handle the first row (see the skiprows option)
Use iPython ? to look up reference on arguments for np.loadtxt
Store the NumPy array as a variable called glas_np
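A minimal sketch of the loadtxt pattern on a tiny in-memory CSV. The toy_csv text and the toy_np name are made up for illustration; the lab file itself is not read here.

```python
import io
import numpy as np

# Tiny stand-in for a csv file with one header row (hypothetical values)
toy_csv = """decyear,lat,lon
2003.1,44.15,-105.35
2005.5,40.00,-110.00
"""

# delimiter splits on commas; skiprows=1 skips the text header,
# which cannot be parsed as a float
toy_np = np.loadtxt(io.StringIO(toy_csv), delimiter=',', skiprows=1)

print(toy_np.shape, toy_np.dtype)
```

Without skiprows, loadtxt would try to convert the header strings to floats and raise an error.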
# STUDENT CODE HERE
Written response: What are some advantages of reading our data into a Numpy array, rather than a simple list (or list of lists)?#
STUDENT WRITTEN RESPONSE HERE
Do a quick check to make sure your array looks good#
glas_np
array([[2.00313957e+03, 7.31266943e+05, 4.41578970e+01, ...,
1.40052000e+03, 3.30000000e-01, 3.10000000e+01],
[2.00313957e+03, 7.31266943e+05, 4.41501750e+01, ...,
1.38464000e+03, 4.30000000e-01, 3.10000000e+01],
[2.00313957e+03, 7.31266943e+05, 4.41486320e+01, ...,
1.38349000e+03, 2.80000000e-01, 3.10000000e+01],
...,
[2.00977600e+03, 7.33691238e+05, 3.78993190e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
[2.00977600e+03, 7.33691238e+05, 3.79008690e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01],
[2.00977600e+03, 7.33691238e+05, 3.79024200e+01, ...,
1.55644000e+03, 0.00000000e+00, 3.10000000e+01]])
How many rows and columns are in your array?#
# STUDENT CODE HERE
(65236, 8)
What is the datatype of your array?#
# STUDENT CODE HERE
dtype('float64')
Note that a NumPy array typically has a single datatype, while a Pandas DataFrame can contain multiple data types (e.g., string, float64)
Examine the first 3 rows#
Use slicing here
# STUDENT CODE HERE
Examine the column with glas_z values#
You will need to figure out which column number corresponds to these values (can do this manually from header), then slice the array to return all rows, but only that column
# STUDENT CODE HERE
Compute the mean and standard deviation of the glas_z values#
# STUDENT CODE HERE
Use print formatting to create a formatted string with these values#
Should be 'GLAS z: mean +/- std meters' using your mean and std values, both formatted with 2 decimal places (cm-precision)
For example: ‘GLAS z: 1234.56 +/- 12.34 meters’
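One way to build such a string is an f-string with the .2f format specifier. This sketch uses a small made-up array (toy_z) rather than the lab data:

```python
import numpy as np

# Hypothetical elevation values, for illustration only
toy_z = np.array([1.0, 2.0, 3.0, 4.0])

toy_mean = toy_z.mean()
toy_std = toy_z.std()  # NumPy default is the population std (ddof=0)

# :.2f formats a float with two digits after the decimal point
msg = f'Toy z: {toy_mean:.2f} +/- {toy_std:.2f} meters'
print(msg)
```

The same specifier also works with str.format, e.g. '{:.2f}'.format(toy_mean).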
# STUDENT CODE HERE
Create a simple scatter plot of the glas_z values using Matplotlib#
Careful about correctly defining your x and y with values for latitude and longitude - easy to mix these up
Use point color to represent the elevation
You should see points that roughly outline the western United States
Does this look familiar??
Label the x axis, y axis, and add a descriptive title
# STUDENT CODE HERE

Use conditionals and fancy indexing to extract points from 2005#
Design a “filter” to isolate the points from 2005
Can use boolean indexing
Can then extract values from original array using the boolean index
Store these points in a new NumPy array
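The boolean indexing pattern can be sketched on a small made-up array (toy, with decimal year in column 0). Note the parentheses around each comparison and the element-wise & operator:

```python
import numpy as np

# Hypothetical array: column 0 is decimal year, column 1 is some value
toy = np.array([[2004.9, 10.0],
                [2005.1, 20.0],
                [2005.8, 30.0],
                [2006.2, 40.0]])

# Boolean mask: True where the decimal year falls within 2005
mask = (toy[:, 0] >= 2005) & (toy[:, 0] < 2006)

# Indexing with the mask returns a new array containing only the True rows
toy_2005 = toy[mask]

print(mask)
print(toy_2005.shape)
```

Since True counts as 1, mask.sum() gives the number of matching rows without extracting them.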
# STUDENT CODE HERE
How many points were acquired in 2005?#
# STUDENT CODE HERE
13122
Part 2: pandas (5 pts)#
A significant portion of the Python data science ecosystem is based on pandas and/or pandas data models.
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way towards this goal.
https://github.com/pandas-dev/pandas#main-features
pandas offers:
A better way to deal with tabular data, built on top of NumPy arrays
With NumPy, we had to remember which column number (e.g., 3, 4) represented each variable (lat, lon, glas_z, etc)
pandas allows you to store data with different types, and then reference using more meaningful labels
NumPy: glas_np[:,4]
pandas: glas_df['glas_z']
A good “10-minute” reference with examples: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
There are newer packages that aim to improve on pandas (e.g. polars), but pandas is still the most popular and established Python package for handling tabular data.
Load the csv file with pandas#
Note that pandas has excellent readers for most common file formats: https://pandas.pydata.org/pandas-docs/stable/reference/io.html
Store as a DataFrame called glas_df
# STUDENT CODE HERE
Written response: We already loaded these same data using Numpy. What are some advantages of loading these data as a pandas DataFrame rather than a Numpy array?#
STUDENT WRITTEN RESPONSE HERE
Is a pandas DataFrame appropriate for all shapes and sizes of data?#
STUDENT WRITTEN RESPONSE HERE
That was easy! Let’s inspect the DataFrame#
glas_df
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 1 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
| 2 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 |
| 3 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 |
| 4 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65231 | 2009.775995 | 733691.238340 | 37.896222 | -117.044399 | 1556.16 | 1556.43 | 0.00 | 31 |
| 65232 | 2009.775995 | 733691.238340 | 37.897769 | -117.044675 | 1556.02 | 1556.43 | 0.00 | 31 |
| 65233 | 2009.775995 | 733691.238340 | 37.899319 | -117.044952 | 1556.19 | 1556.44 | 0.00 | 31 |
| 65234 | 2009.775995 | 733691.238340 | 37.900869 | -117.045230 | 1556.18 | 1556.44 | 0.00 | 31 |
| 65235 | 2009.775995 | 733691.238341 | 37.902420 | -117.045508 | 1556.32 | 1556.44 | 0.00 | 31 |
65236 rows × 8 columns
Check data types#
Can use the DataFrame info method
# STUDENT CODE HERE
Check the column labels#
Can use the DataFrame columns attribute
# STUDENT CODE HERE
If you are new to Python and object-oriented programming, take a moment to consider the difference between the methods and attributes of the DataFrame, and how both are accessed.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
If this is confusing, ask your neighbor or instructor.
Preview records using DataFrame head and tail methods#
# STUDENT CODE HERE
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc |
|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 |
| 1 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 |
| 2 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 |
| 3 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 |
| 4 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 |
# STUDENT CODE HERE
Compute the mean and standard deviation for all values in each column#
Don’t overthink this, should be simple (no loops!)
# STUDENT CODE HERE
decyear 2005.945322
ordinal 732291.890372
lat 40.946798
lon -115.040612
glas_z 1791.494167
dem_z 1792.260964
dem_z_std 5.504748
lulc 30.339444
dtype: float64
# STUDENT CODE HERE
Apply a custom function to each column#
For this example, first define a function to compute the Normalized Median Absolute Deviation (NMAD)
https://en.wikipedia.org/wiki/Median_absolute_deviation
Your function should accept an nD array and return an array containing a single NMAD value
For a normal distribution, this is equivalent to the standard deviation.
For data containing outliers, it is a more robust representation of variability.
Then use the pandas apply method to compute the NMAD for all values in each column
Take a moment to compare the NMAD values with the standard deviation values above.
# formula for nmad
c = 1.4826
a = np.random.rand(50) # example array
nmad = np.median(np.fabs(a - np.median(a))) * c
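# Note: a MAD function is distributed with scipy.stats. The median_absolute_deviation function linked here was later deprecated in favor of scipy.stats.median_abs_deviation (pass scale='normal' for the NMAD): https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.median_absolute_deviation.html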
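The formula above can be wrapped in a function and passed to apply. The sketch below is one possible arrangement on a made-up DataFrame (toy_df); the function name and default scale argument are choices made here, not requirements:

```python
import numpy as np
import pandas as pd

def nmad(a, c=1.4826):
    """Normalized median absolute deviation of all values in a."""
    a = np.asarray(a, dtype=float)
    return np.median(np.fabs(a - np.median(a))) * c

# Hypothetical data: column x contains one large outlier
toy_df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0, 100.0],
                       'y': [5.0, 5.0, 5.0, 5.0, 5.0]})

# apply calls the function once per column and collects the results in a Series
toy_nmad = toy_df.apply(nmad)
toy_std = toy_df.std()

print(toy_nmad)
print(toy_std)
```

For column x the single outlier inflates the standard deviation to more than 40, while the NMAD stays near 1.5.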
# STUDENT CODE HERE
# STUDENT CODE HERE
decyear 2.066488
ordinal 755.079010
lat 3.885421
lon 5.798237
glas_z 632.580942
dem_z 632.136162
dem_z_std 2.001510
lulc 0.000000
dtype: float64
Print quick stats for entire DataFrame with the describe method#
# STUDENT CODE HERE
Useful, huh? Note that the 50% statistic (50th percentile) is the median.
Use the pandas plotting functionality to create a 2D scatterplot of glas_z values#
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html
Note that labels and colorbar are automatically plotted!
Adjust the size of the points using the s=1 keyword
Experiment with different color ramps: https://matplotlib.org/examples/color/colormaps_reference.html (I prefer inferno)
Note: If your x axis label mysteriously disappears, see: https://uwgda-jupyterbook.readthedocs.io/en/latest/resources/core_packages.html#scatterplot-x-axis-label-and-x-tick-labels-disappear-when-using-colormap
Color ramps#
Information on how to choose a good colormap for your data: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
Another great resource (Thanks @fperez!): https://matplotlib.org/cmocean/
TL;DR Don’t use jet, use a perceptually uniform colormap for linear variables like elevation. Use a diverging color ramp for values where sign is important.
# STUDENT CODE HERE

Experiment by changing the variable represented with the color ramp#
Try decyear or other columns to quickly visualize spatial distribution of these values.
# STUDENT CODE HERE

Challenge Question: Create a 3D scatterplot (GS required/UG +0.5)#
See samples here: https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html
Explore with the interactive tools (click and drag to change perspective). Expect some lag given the number of points to be rendered, but this may be useful for visualizing small 3D datasets in the future. There are other 3D plotting packages that are built for performance and efficiency (e.g., ipyvolume: https://github.com/maartenbreddels/ipyvolume)
# STUDENT CODE HERE

Create a histogram that shows the number of points vs time (decyear)#
Should be simple with built-in method for your DataFrame
Make sure that you use enough bins to avoid aliasing. This could require some trial and error (try 10, 100, 1000, and see if you can find a good compromise)
Can also consider some of the options (e.g., ‘auto’) here, though I have found ‘auto’ doesn’t always work well: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges
I approached this by thinking about the number of bins required for ~weekly resolution over the ~6 year mission
You should be able to resolve the distinct campaigns during the mission (each ~1-2 months long). There is an extra credit problem at the end to group by years and play with clustering for the campaigns.
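The weekly-resolution reasoning can be turned into a bin count as in the sketch below. The toy_decyear values are random stand-ins spanning roughly the mission period, and 52 weeks per year is an approximation:

```python
import numpy as np

# Hypothetical decimal-year timestamps, for illustration only
rng = np.random.default_rng(0)
toy_decyear = rng.uniform(2003.1, 2009.8, size=1000)

# Time span in years, then approximately one bin per week
span_years = toy_decyear.max() - toy_decyear.min()
n_bins = int(np.ceil(span_years * 52))

# np.histogram returns the counts and the n_bins + 1 bin edges
counts, edges = np.histogram(toy_decyear, bins=n_bins)

print(n_bins, counts.sum())
```

The resulting integer can be passed as the bins argument to a histogram plotting call.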
# STUDENT CODE HERE

Create a histogram of all glas_z elevation values#
What do you note about the distribution?
# STUDENT CODE HERE

Create a 2d scatterplot showing the spatial distribution of glas_z values above 4000 m#
Use indexing
Plot the rest of the points (glas_z values below 4000 m) in light gray
# STUDENT CODE HERE

Part 3: Elevation Difference Calculations, Outlier Removal (5 pts)#
Compute the elevation difference between ICESat glas_z and SRTM dem_z values#
Earlier, David mentioned that he had sampled the SRTM DEM for each GLAS shot
Compute the difference using pandas and store in a new column called glas_srtm_dh
Remember the order of this calculation (if the difference values are negative, which dataset is higher elevation?)
Check values with head
# STUDENT CODE HERE
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc | glas_srtm_dh |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 | -2.01 |
| 1 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 | 2.47 |
| 2 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 | 9.34 |
| 3 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 | 1.39 |
| 4 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 | -11.03 |
Compute the time difference between each ICESat point timestamp and the SRTM timestamp#
Store in a new column named glas_srtm_dt
The SRTM data were collected between February 11-22, 2000
Can assume a constant decimal year value of 2000.112 for now
Check values with head
# STUDENT CODE HERE
Compute “apparent” annualized elevation change rates (meters per year) from these new columns#
Store in a new column named glas_srtm_dhdt
This will be rate of change between the SRTM timestamp (2000) and each GLAS point timestamp (2003-2009)
This is $\frac{dh}{dt}$ : a common metric used for elevation change analysis
Check values with head
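As a worked check using the first row of the table (index 0), with $z_{\mathrm{GLAS}}$, $z_{\mathrm{SRTM}}$, $t_{\mathrm{GLAS}}$, and $t_{\mathrm{SRTM}}$ as shorthand introduced here for the glas_z, dem_z, decyear, and assumed SRTM decimal year values:

$$
\begin{aligned}
dh &= z_{\mathrm{GLAS}} - z_{\mathrm{SRTM}} = 1398.51 - 1400.52 = -2.01\ \mathrm{m} \\
dt &= t_{\mathrm{GLAS}} - t_{\mathrm{SRTM}} = 2003.139571 - 2000.112 = 3.027571\ \mathrm{yr} \\
\frac{dh}{dt} &= \frac{-2.01\ \mathrm{m}}{3.027571\ \mathrm{yr}} \approx -0.664\ \mathrm{m/yr}
\end{aligned}
$$

A negative value here means the GLAS elevation is lower than the SRTM elevation at that location.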
# STUDENT CODE HERE
| | decyear | ordinal | lat | lon | glas_z | dem_z | dem_z_std | lulc | glas_srtm_dh | glas_srtm_dt | glas_srtm_dhdt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003.139571 | 731266.943345 | 44.157897 | -105.356562 | 1398.51 | 1400.52 | 0.33 | 31 | -2.01 | 3.027571 | -0.663899 |
| 1 | 2003.139571 | 731266.943346 | 44.150175 | -105.358116 | 1387.11 | 1384.64 | 0.43 | 31 | 2.47 | 3.027571 | 0.815836 |
| 2 | 2003.139571 | 731266.943347 | 44.148632 | -105.358427 | 1392.83 | 1383.49 | 0.28 | 31 | 9.34 | 3.027571 | 3.084982 |
| 3 | 2003.139571 | 731266.943347 | 44.147087 | -105.358738 | 1384.24 | 1382.85 | 0.84 | 31 | 1.39 | 3.027571 | 0.459114 |
| 4 | 2003.139571 | 731266.943347 | 44.145542 | -105.359048 | 1369.21 | 1380.24 | 1.73 | 31 | -11.03 | 3.027571 | -3.643185 |
Create a scatterplot of the rates#
Use a RdBu (Red to Blue) color ramp
Set the color ramp limits using vmin and vmax keyword arguments to be symmetrical about 0
Generate two plots with different color ramp range to bring out some detail
Do you see outliers (values far outside the expected distribution)?
Do you see any coherent spatial patterns in the difference values?
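A minimal sketch of symmetric color limits with a diverging colormap, using random stand-in data (toy_x, toy_y, toy_rate). Choosing the limit from the 98th percentile of the absolute values is just one option assumed here; a fixed value works equally well:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical point locations and signed rates, for illustration only
rng = np.random.default_rng(0)
toy_x = rng.uniform(-125, -105, size=500)
toy_y = rng.uniform(35, 49, size=500)
toy_rate = rng.normal(0.0, 2.0, size=500)

# One limit applied with both signs keeps zero at the center of the color ramp
lim = np.percentile(np.abs(toy_rate), 98)

fig, ax = plt.subplots()
sc = ax.scatter(toy_x, toy_y, c=toy_rate, s=1, cmap='RdBu', vmin=-lim, vmax=lim)
fig.colorbar(sc, ax=ax, label='rate (m/yr)')
ax.set_xlabel('lon')
ax.set_ylabel('lat')
```

With symmetric limits, white corresponds to zero change, so red and blue can be read directly as negative and positive values.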
# STUDENT CODE HERE

Create a histogram of the difference values#
Increase the number of bins, and limit the range to bring out detail of the distribution
Optional: add a vertical line for the median difference using axvline
# STUDENT CODE HERE

Compute the mean, median and standard deviation of the differences#
Thought question: why might we have a non-zero mean or median difference?
# STUDENT CODE HERE
Create a scatterplot of elevation difference glas_srtm_dhdt values vs elevation values#
glas_srtm_dhdt should be on the y-axis
glas_z values on the x-axis
# STUDENT CODE HERE

Challenge Question: Remove outliers (GS required/UG +0.5)#
The initial filter in glas_proc.py removed GLAS points with absolute elevation difference >200 m compared to the SRTM elevations. We expect most real elevation change signals to be less than this for the given time period. But clearly some outliers remain.
Design and apply a filter that removes outliers. One option is to define outliers as values outside some absolute threshold. Can set this threshold as some multiple of the standard deviation (e.g., 3*std) about the mean. This is also known as a Z-score filter. Can also use quantile or percentile values for this threshold.
Create new plot(s) to visualize the distribution of outliers and inliers. I’ve included my figure as a reference, but please experiment to develop your own, don’t just try to reproduce! Focus on the filtering strategy and create some quick plots to verify that things worked.
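The 3*std threshold idea can be sketched on synthetic values as follows. The toy_vals array and its three injected outliers are made up, and this is only one of the filtering options mentioned above:

```python
import numpy as np

# Hypothetical data: 1000 well-behaved values plus three injected outliers
rng = np.random.default_rng(0)
toy_vals = np.concatenate([rng.normal(0.0, 1.0, size=1000),
                           [50.0, -60.0, 80.0]])

# Flag values within 3 standard deviations of the mean as inliers
mean = toy_vals.mean()
std = toy_vals.std()
inlier_mask = np.abs(toy_vals - mean) <= 3 * std

inliers = toy_vals[inlier_mask]
outliers = toy_vals[~inlier_mask]

print(len(inliers), len(outliers))
print(f'std before: {std:.2f}, after: {inliers.std():.2f}')
```

Note that the outliers themselves inflate the mean and std used to define the threshold, which is one reason to consider percentile-based or median/NMAD-based thresholds instead.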
# STUDENT CODE HERE
# STUDENT CODE HERE

Active remote sensing confidence check#
Even after removing outliers, there are still some big differences between the SRTM and GLAS elevation values.
Please consider the following thought questions (discuss with neighbor, maybe jot down some notes in a new cell, but formal responses not required):
Do you see systematic differences between the glas_z and dem_z values?
Any clues from the scatterplot? (e.g., do some tracks (north-south lines of points) display systematic bias?)
Brainstorm some ideas about what might be going on here. Think about the nature of each sensor:
ICESat was a Near-IR laser (1064 nm wavelength) with a big ground spot size (~70 m in diameter)
Timestamps span different seasons between 2003-2009
SRTM was a C-band radar (5.3 GHz, 5.6 cm wavelength) with approximately 30 m ground sample distance (pixel size)
Timestamp was February 2000
Data gaps (e.g., radar shadows, steep slopes) were filled with ASTER GDEM2 composite, which blends DEMs acquired over many years ~2000-2014
Consider different surfaces and how the laser/radar footprint might be affected:
Flat bedrock surface
Dry sand dunes
Steep mountain topography like the Front Range in Colorado
Dense vegetation of the Hoh Rainforest in Olympic National Park
Part 4: pandas Groupby (5 pts)#
Let’s check to see if differences are due to our land-use/land-cover classes#
Determine the unique values in the lulc column and the total count for each (hint: see the value_counts method)
In the introduction, I said that I initially preserved only two classes for these points (12 - snow/ice, 31 - barren land), so this isn’t going to help us over forests: https://www.mrlc.gov/data/legends/national-land-cover-database-2011-nlcd2011-legend
# STUDENT CODE HERE
Use pandas groupby to compute stats for the LULC classes#
https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
This is one of the most powerful features in pandas: efficient grouping and analysis based on some values
Compute mean, std, median, and nmad of the glas_srtm_dhdt for each LULC class
These can be computed individually, or by passing a list to the agg function: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#applying-multiple-functions-at-once
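A sketch of the list-based agg pattern on a made-up DataFrame (toy_df with a generic dhdt column). Built-in statistics can be given as strings, and a custom function such as the hypothetical nmad below can be passed directly; its function name becomes the column label:

```python
import numpy as np
import pandas as pd

def nmad(a, c=1.4826):
    """Normalized median absolute deviation of all values in a."""
    a = np.asarray(a, dtype=float)
    return np.median(np.fabs(a - np.median(a))) * c

# Hypothetical values for two land cover classes
toy_df = pd.DataFrame({'lulc': [12, 12, 12, 31, 31, 31],
                       'dhdt': [-1.0, 0.0, 1.0, 2.0, 2.0, 5.0]})

# Group rows by class, select one column, then compute several stats at once
toy_stats = toy_df.groupby('lulc')['dhdt'].agg(['mean', 'std', 'median', nmad])

print(toy_stats)
```

The result is a DataFrame with one row per group and one column per statistic.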
# STUDENT CODE HERE
glas_srtm_dhdt
| lulc | mean | std | median | nmad |
|---|---|---|---|---|
| 12 | -0.383826 | 2.912049 | -0.162833 | 1.281189 |
| 31 | -0.173273 | 2.527328 | -0.137926 | 0.421701 |
Prepare scatterplots for each LULC class#
Remember that LULC 12 is snow/ice and LULC 31 is exposed bare ground
# STUDENT CODE HERE

Written response: Interpretation#
The data are noisy, but do you see any statistically significant differences and/or coherent spatial patterns for points over ice vs. bare rock?
The answer could be no! See the active remote sensing confidence check above.
STUDENT RESPONSE HERE
groupby year#
See if you can use pandas groupby to count the number of shots for each year
Multiple ways to accomplish this
One approach might be to create a new column with integer year, then groupby that column
Can modify the decyear values (see floor), or parse the Python time ordinals
Create a bar plot showing number of shots in each year
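The integer-year approach can be sketched on a made-up DataFrame (toy_df). The year column name is a choice made here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical decimal-year timestamps
toy_df = pd.DataFrame({'decyear': [2003.2, 2003.9, 2004.5, 2005.1, 2005.3, 2005.99]})

# floor drops the fractional part of the year; cast to int for clean labels
toy_df['year'] = np.floor(toy_df['decyear']).astype(int)

# size counts the rows in each group
shots_per_year = toy_df.groupby('year').size()

print(shots_per_year)
```

The resulting Series is indexed by year, so it can be passed directly to a bar plot.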
# STUDENT CODE HERE

Challenge Question: Annual scatterplots (GS required/UG +0.5)#
Create a figure with multiple subplots showing scatterplots of points for each year
Try to create the plots using a loop rather than individually
# STUDENT CODE HERE

Submit your work#
Save this notebook with all code and output (Make sure when you save the notebook, all cells show their outputs).
Use the terminal to stage, commit, and push your notebook to your GitHub repository. It should look something like this…
git add 02_lab.ipynb
git commit -m "Completed Lab 02 assignment"
git push
Verify that your notebook appears in your GitHub repository. Double check to make sure all the outputs are visible!