# 02: Core Python modules - NumPy, Pandas, and Matplotlib

UW Geospatial Data Analysis  
CEE467/CEWA567  
David Shean, Eric Gagliano, Quinn Brencher

Please quickly read through this entire document once, then go back and start tackling the various tasks.

## Overview
This week is our final “flip the classroom” situation with longer interactive reading assignments, yay!! Before class, you will be responsible for reviewing material from external resources. During lecture/lab, we will briefly review some of this material, do some interactive demos, discuss questions and clarify concepts as a class, and then collaboratively work on some problems/exercises to help solidify the concepts.

## Reading and Tutorials
This week we are reviewing core Python modules: NumPy, Pandas, and Matplotlib. These are essential for the rest of the course and future data science endeavors (geospatial and otherwise). Most of the geospatial modules we use are built on these packages, so you must be comfortable with the underlying functionality. This review will ensure that we have a common baseline and set of references moving forward. **If you are relatively new to Python and these modules, it is critical that you spend extra time with self-study this week.**

Again, this is intended to be an individual review. Tailor it to your needs, and adjust emphasis so you are making the best use of your time outside of class. Even if you’ve been using these tools for many years, it can still be valuable to review, as you will inevitably learn (or re-learn) some new tricks and develop a better grasp of more complex concepts. If this is new, please dedicate the time to explore interactively; don't just skim rendered versions on the web.

As with the previous homework assignments, don’t wait until Friday morning to start.  This material will be much more useful if broken up over several sessions throughout the week - try an hour or two a day.
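
As a quick taste of how the three modules fit together, here is a minimal sketch (not taken from the Handbook). The elevation values and the `station`/`elevation_m` names are made up for illustration. It shows a NumPy array wrapped in a Pandas `DataFrame` and then plotted on a Matplotlib `Axes`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: a plain array of made-up elevation values (meters)
elev = np.array([120.0, 135.5, 150.25, 149.0, 160.75])

# Pandas: wrap the array in a labeled table (hypothetical column/index names)
df = pd.DataFrame({"elevation_m": elev}, index=pd.Index(range(5), name="station"))
mean_elev = df["elevation_m"].mean()   # labeled, column-wise statistics
values = df["elevation_m"].to_numpy()  # back to a NumPy array when needed

# Matplotlib: pandas' .plot() draws onto a Matplotlib Axes that we create
fig, ax = plt.subplots()
df["elevation_m"].plot(ax=ax, marker="o")
ax.set_xlabel("station")
ax.set_ylabel("elevation (m)")
```

Try running this in a notebook cell on the JupyterHub, then change the values or add a second column to see how each layer responds.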

### Python Data Science Handbook: NumPy, Pandas and Matplotlib (~2-6 hours)
* Review the following sections of Jake VanderPlas’ [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook). Last week we learned how to clone a repository from GitHub to the JupyterHub; try it now by yourself! Get through what you can, but if you're feeling overwhelmed or pressed for time, try to at least work through the first half of each section, which should cover most basic functionality.
   * **2. Introduction to NumPy**
      * Can skip section on Structured Arrays (we'll use Pandas)
   * **3. Introduction to Pandas**
      * If new to Pandas, can skip the sections on Hierarchical Indexing, Pivot Tables, and High-performance Pandas
   * **4. Introduction to Matplotlib**  
      * Skip the section on "Geographic Data with Basemap" as this is largely outdated
      * Can skip Customizing Matplotlib: Configurations and Stylesheets
* Work through some of the interactive examples, and explore new concepts (don't just shift-Enter as quickly as possible)
* The section on Machine Learning with Scikit-Learn is also great, but not required for this course

### (Optional) Official Quickstart/Doc
*Skip any installation instructions*
#### Pandas
* https://pandas.pydata.org/docs/user_guide/10min.html
* https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
#### Matplotlib
* https://matplotlib.org/stable/tutorials/introductory/usage.html
   * Can skip the "interactive" mode
* https://matplotlib.org/cheatsheets/cheatsheets.pdf
   * See other handouts: https://github.com/matplotlib/cheatsheets
#### NumPy
* https://numpy.org/doc/stable/user/absolute_beginners.html
* https://numpy.org/doc/stable/user/quickstart.html

## Assignment - *Due before class this Friday*
* Complete the above reading assignments
* Fill out this [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSe8UBUcEW5gEqdlzK2Trc1nqW53FVZClSfBXjNZCozuQgQoDA/viewform?usp=sharing&ouid=109959221683421850946) about the reading assignment
* Submit last week's lab assignment

## Outlook
You did it! Congrats on getting through the first two weeks of readings. From here on out, reading assignments will be much shorter. And next week we start geospatial data analysis in earnest!
