02: Core Python modules - NumPy, Pandas, and Matplotlib#
UW Geospatial Data Analysis
CEE467/CEWA567
David Shean, Eric Gagliano, Quinn Brencher
Please quickly read through this entire document once, then go back and start tackling the various tasks.
Overview#
This week is our final “flip the classroom” situation with longer interactive reading assignments, yay!! Before class, you will be responsible for reviewing material from external resources. During lecture/lab, we will briefly review some of this material, do some interactive demos, discuss questions and clarify concepts as a class, and then collaboratively work on some problems/exercises to help solidify the concepts.
Reading and Tutorials#
This week we are reviewing core Python modules: NumPy, Pandas, and Matplotlib. These are essential for the rest of the course and future data science endeavors (geospatial and otherwise). Most of the geospatial modules we use are built on these packages, so you must be comfortable with the underlying functionality. This review will ensure that we have a common baseline and set of references moving forward. If you are relatively new to Python and these modules, it is critical that you spend extra time with self-study this week.
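As a rough illustration of how the three modules fit together, here is a minimal sketch. The station names and elevation values are made up for this example and are not course data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data: elevations (m) at four made-up stations
elev = np.array([120.0, 450.5, 980.2, 1510.7])

# A Pandas DataFrame can be built directly from NumPy arrays
df = pd.DataFrame({"station": ["A", "B", "C", "D"], "elev_m": elev})

# A DataFrame column converts back to a NumPy array
values = df["elev_m"].to_numpy()
mean_elev = values.mean()

# DataFrame.plot draws with Matplotlib and returns a Matplotlib Axes,
# which can then be customized with the usual Axes methods
ax = df.plot(x="station", y="elev_m", kind="bar", legend=False)
ax.set_ylabel("Elevation (m)")
fig = ax.figure
```

The same pattern, with labeled tables on top of NumPy arrays and plots drawn through Matplotlib, shows up again in the geospatial modules built on these packages.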
Again, this is intended to be an individual review. Tailor it to your needs, and adjust emphasis so you are making the best use of your time outside of class. Even if you’ve been using these tools for many years, it can still be valuable to review, as you will inevitably learn (or re-learn) some new tricks and develop a better grasp of more complex concepts. If this is new, please dedicate the time to explore interactively. Don’t just skim rendered versions on the web.
As with the previous homework assignments, don’t wait until Friday morning to start. This material will be much more useful if broken up over several sessions throughout the week - try an hour or two a day.
Python Data Science Handbook: NumPy, Pandas and Matplotlib (~2-6 hours)#
Review the following sections of Jake VanderPlas’ Python Data Science Handbook. Last week we learned how to clone a repository from GitHub to the JupyterHub. Try it now on your own! Get through what you can, but if you’re feeling overwhelmed or pressed for time, try to at least work through the first half of each section, which should cover most of the basic functionality.
2. Introduction to NumPy
Can skip section on Structured Arrays (we’ll use Pandas)
3. Introduction to Pandas
If new to Pandas, can skip section on Hierarchical Indexing, Pivot Tables and High-performance Pandas
4. Introduction to Matplotlib
Skip the section on “Geographic Data with Basemap” as this is largely outdated
Can skip Customizing Matplotlib: Configurations and Stylesheets
Work through some of the interactive examples, and explore new concepts (don’t just shift-Enter as quickly as possible)
The section on Machine Learning with Scikit-Learn is also great, but not required for this course
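One way to explore beyond the provided cells is to write a small experiment of your own. The sketch below compares an explicit Python loop with the equivalent vectorized NumPy expression, then applies a boolean mask. The temperature values are hypothetical.

```python
import numpy as np

# Hypothetical temperatures in degrees Celsius
temps_c = np.array([-5.0, 0.0, 12.5, 30.0])

# Element-by-element conversion with a Python loop
temps_f_loop = np.array([t * 9 / 5 + 32 for t in temps_c])

# Same conversion as one vectorized expression on the whole array
temps_f_vec = temps_c * 9 / 5 + 32

# Boolean mask: keep only the values above freezing
above_freezing = temps_c[temps_c > 0]

print(temps_f_vec)
print(above_freezing)
```

Try changing the array size or the mask condition and predict the result before running the cell.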
(Optional) Official Quickstart/Doc#
Skip any installation instructions
Pandas#
https://pandas.pydata.org/docs/user_guide/10min.html
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Matplotlib#
https://matplotlib.org/stable/tutorials/introductory/usage.html
Can skip the “interactive” mode
https://matplotlib.org/cheatsheets/cheatsheets.pdf
See other handouts: https://github.com/matplotlib/cheatsheets
NumPy#
https://numpy.org/doc/stable/user/absolute_beginners.html
https://numpy.org/doc/stable/user/quickstart.html
Assignment - Due before class this Friday#
Complete the above reading assignments
Fill out this feedback form about the reading assignment
Submit last week’s lab assignment
Outlook#
You did it! Congrats on getting through the first two weeks of readings. From here on out, reading assignments will be much shorter. And next week we start geospatial data analysis in earnest!