Galvanize Data Science Prep, pt 2

Hey, so guess who forgot to write an update last week? The course has been great and I highly recommend it. Part 2 was released yesterday (Wednesday; the course concludes Friday). [edit: and an update to Part 1, see the list below] Pt 2 is intended as a primer for students who are enrolling in the data science intensive, and it looks like it expands on the topics covered in the prep course. The instructor won’t be available after Friday, but the Slack team will still be active, and the other students have been helpful whenever there’s a question posted. Here’s what’s additionally on offer:

  • Git Primer
    • Primer, Concepts, submitting work to the instructor
  • In-terminal Editors
    • Vim
    • Nano
  • Object Oriented Programming
    • Terminology
    • Classes
    • Initialization and ‘self’
    • Methods
    • Magic Methods
  • Pandas
    • Pandas Series
    • Pandas DataFrames
    • Merging DataFrames
    • Split, Apply, Combine data
    • Visualize Data with Pandas
    • File I/O
    • Intro to Exploratory Data Analysis
  • Interpret functions as integrals and derivatives
    • Mathematical Limits
    • Derivatives and rates of change
    • Integrals of functions
    • Connection between derivatives and integrals
  • Linear Algebra 1
    • Matrix Inversion
    • Systems of Linear Equations
    • Vector Similarity
  • Linear Algebra 2
    • Linear Algebra from a Geometric Perspective
    • Linear Transformations Overview
    • Rotations
    • Changing Dimensions
    • Eigenvectors and Eigenvalues
  • Statistics and Probability
    • Random Variables
    • Distributions
    • Estimation
  • Back in Part 1, SQL was added sometime this week
    • Database Structure
    • Populating a database
    • Writing simple queries
    • Writing aggregate queries
    • Joining tables
    • SQL style conventions
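
For anyone wondering what the SQL items above look like in practice, here is a minimal sketch using Python’s built-in sqlite3 module. It is not taken from the course; the table names, column names, and rows are all made up for illustration. It populates a tiny database, then runs one query that joins two tables and aggregates the result.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented for this example only.
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (student_id INTEGER, module TEXT, score REAL);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO scores VALUES (1, 'NumPy', 90), (1, 'SQL', 80), (2, 'SQL', 70);
""")

# Join the two tables, then aggregate per student.
rows = conn.execute("""
    SELECT s.name, COUNT(*) AS n_modules, AVG(sc.score) AS avg_score
    FROM students AS s
    JOIN scores AS sc ON sc.student_id = s.id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

print(rows)
conn.close()
```

Each row that comes back is a tuple of (name, number of modules, average score).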

So wow, that’s a lot of ground to cover. It looks like a really good expansion on the topics in Part 1, and I’m looking forward to going through that material.

The modules I outlined in my previous post have been generally good. There are some spots where the material doesn’t draw a clear line from A to B. In a lot of ways that mimics my experience in engineering school; the primary difference is that I now have the web to look for explanations of things that aren’t making sense. Back then I could reread the textbook, review my crummy notes, or try the math tutoring center (if it was open and if I could get over my anxiety about asking for help). I’m much better now about using the resources that are available to me.

Now I have a plethora of examples to search through until I find one that makes sense to my brain about how a particular math operation should work. There are some aspects of using NumPy that simply need practice and repetition, but fortunately I’m comfortable and experienced with googling my code problems.

So far there’s only been one challenge (end of module) problem that I’ve called shenanigans on. It required a technique that had been discussed earlier, with two basic practice problems to show how it works. The question in question required recalling this technique several modules later, then applying it to a new method that behaves very differently from anything we’d encountered previously. I was able to figure it out, but it felt like the learning was less about how to use the method and more about deep-diving on problem solving. Whether this question was a success depends on what outcome the course authors intended.

Having the instructor has been really helpful. I’m new-ish to Python and how the syntax parses. Having someone to pair with to review my code and tidy it up was fantastic.

I’m looking forward to finishing this first probability module today. As I’m able to continue with the new material, I’ll make some new posts to compare these “deep dives” with the modules in the first section.

Galvanize Data Science Intensive, Premium Prep Course (Day 3)

You’re not behind, I am. I’m writing this on day three but it’s the first blog post in what will be a short series.

Last Monday the venue for PuPPy’s Interview Prep Night was Seattle’s Galvanize location. I’ve been there before for meetups and I was so very impressed and grateful for their hospitality toward IPN. Snacks, water, swag (swag!) and discount codes for their Data Science Immersive prep course. Awesome!

My motivations are mostly altruistic. Over the course of TAing classes for the University of Washington, attending meetups, and organizing IPN, I’ve been asked numerous times, “What do I need to do to get into data science/machine learning/AI?” Previously my answer was along the lines of “git gud at Python… and how’s your math?” I have a slightly better idea of what’s involved in the foundational education now that I’ve seen how this course is laid out.

The GDSIPP is “self-directed” insofar as we’re left to work on the sections at our own pace, but there’s an instructor available from 9:30 to 4:30 Pacific Time for the two-week course. Each day begins with a roughly 30-minute stand-up and a short lesson about upcoming topics. Midday (1:30) there’s another check-in meeting for Q&A, and the instructor has additional lessons planned to fill that half hour. The video check-ins have been good for me, as I was concerned that I would start to wander off if left entirely to self-directed study and communications via Slack.

On day one we’re given access to Galvanize’s “Learn” platform, which has eight lessons. Each lesson is made up of sub-sections with code challenges, plus a final code challenge that tests student knowledge of the topics covered. The lessons are:

  • Intro to Python 3.6
    • Basic data types, flow control (if…else), while loops
  • Strings and Lists
    • String indexing and iterations, string formatting (notably Python 3.6’s f-string wasn’t introduced), lists and list comprehensions
  • Python tuples, dictionaries and sets
    • Mutability, tuples, dictionaries, sets, dictionary comprehensions
  • Python functions
    • Functions with parameters and arguments, function variables and scope
  • Linear Algebra
    • Vectors, matrices, matrix multiplication, rank and linear independence
  • NumPy
    • Create and modify arrays, use boolean indexing, linear algebra operations
  • Foundations of Probability
    • Language of probability, set arithmetic, set notation, dependence/independence, chain rule, law of total probability, Bayes’ Rule, Bayes’ Fallacy, Bayes’ probability trees, combinatorics, multiplication rule
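
To give a feel for two items in the list above, here is a minimal sketch (not from the course material) that applies Bayes’ Rule and prints the result with both str.format() and the Python 3.6 f-string that the lessons skipped. The function name and every number in it are assumptions chosen purely for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) using Bayes' Rule.

    P(B) in the denominator comes from the law of total probability.
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Made-up numbers: 1% base rate, 95% true-positive rate, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)

# str.format(), which predates Python 3.6.
with_format = "P(condition | positive test) = {:.1%}".format(posterior)

# f-string (Python 3.6+): the expression sits inside the string literal.
with_fstring = f"P(condition | positive test) = {posterior:.1%}"

print(with_format)
print(with_fstring)
```

Both lines print the same text; the f-string just keeps the value next to the place where it is used. With these made-up inputs the posterior is far lower than the 95% test accuracy might suggest, which is the usual surprise in this kind of problem.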

Students are encouraged to reach out to the Slack workspace if they are stuck on a problem for more than 10 minutes, and in the course intro they go over posting etiquette and using threads rather than cluttering the main Slack space with problem discussion. Students are encouraged to help each other without giving away answers or posting working code. Our instructor has been very responsive and I haven’t felt as though students are solely reliant on each other.

I’m starting day 3 with tuples and sets. My goal is to get into the linear algebra section Friday morning so I’ll have a week to pick the instructor’s brain about math concepts that I had difficulty with in college. So far I’ve tackled series and sequence questions without much difficulty, and I certainly understood what the question was asking even if I couldn’t immediately write code to solve the problem.

I plan to touch base at least three more times before the course is finished. I hope you find this series useful, and feel free to ask questions in the comments; I’ll do my best to answer in a timely manner.
