Galvanize Data Science Prep, pt 2

Hey, so guess who forgot to write an update last week? The course has been great, and I highly recommend it. Part 2 was released yesterday (Wednesday; the course concludes Friday). [edit: and an update to Part 1, see the list below] Pt 2 is intended as a primer for students who are enrolling in the data science intensive, and it looks like it expands on the topics covered in the prep course. The instructor won’t be available after Friday, but the Slack team will still be active, and the other students have been helpful whenever a question is posted. Here’s what’s additionally on offer:

  • Git Primer
    • Primer, Concepts, submitting work to the instructor
  • In-terminal Editors
    • Vim
    • Nano
  • Object Oriented Programming
    • Terminology
    • Classes
    • Initialization and ‘self’
    • Methods
    • Magic Methods
  • Pandas
    • Pandas Series
    • Pandas DataFrames
    • Merging DataFrames
    • Split, Apply, Combine data
    • Visualize Data with Pandas
    • File I/O
    • Intro to Exploratory Data Analysis
  • Interpret functions as integrals and derivatives
    • Mathematical Limits
    • Derivatives and rates of change
    • Integrals of functions
    • Connection between derivatives and integrals
  • Linear Algebra 1
    • Matrix Inversion
    • Systems of Linear Equations
    • Vector Similarity
  • Linear Algebra 2
    • Linear Algebra from a Geometric Perspective
    • Linear Transformations Overview
    • Rotations
    • Changing Dimensions
    • Eigenvectors and Eigenvalues
  • Statistics and Probability
    • Random Variables
    • Distributions
    • Estimation
  • SQL (added to Part 1 sometime this week)
    • Database Structure
    • Populating a database
    • Writing simple queries
    • Writing aggregate queries
    • Joining tables
    • SQL style conventions

So wow, that’s a lot of ground to cover. It looks like a really good expansion on the topics in Part 1, and I’m looking forward to going through that material.

The modules I outlined in my previous post have been generally good. There are some aspects where there isn’t a clear line between A and B. In a lot of ways that mimics my experience in engineering school; a primary difference being that I now have the web to look for explanations of things that aren’t making sense. Back then I could reread the textbook, review my crummy notes or try the math tutoring center (if it was open and if I could get over my anxiety about asking for help). I’m much better about using the resources that are available to me.

Now I have a plethora of examples to look for one that makes sense to my brain about how a particular math operation should work. There are some aspects of using NumPy that simply need practice and repetition, but fortunately I’m comfortable and experienced with googling my code problems.

So far there’s only been one challenge (end of module) problem that I’ve called shenanigans on. It required using a technique that was discussed and we were given two basic practice problems to see how it works. The question in question required recalling this technique several modules later, then applying said technique to a new method that behaves very differently from anything we’d encountered previously. I was able to figure it out, but it felt like the learning was less about how to use the method and more about deep-diving on problem solving. This question was either a success or not, depending on their intended outcomes.

Having the instructor has been really helpful. I’m new-ish to Python and how the syntax parses. Having someone to pair with to review my code and tidy it up was fantastic.

I’m looking forward to finishing this first probability module today. As I’m able to continue with the new material, I’ll make some new posts to compare these “deep dives” with the modules in the first section.


Galvanize Data Science Intensive, Premium Prep Course (Day 3)

You’re not behind, I am. I’m writing this on day three but it’s the first blog post in what will be a short series.

Last Monday the venue for PuPPy’s Interview Prep Night was Seattle’s Galvanize location. I’ve been there before for meetups and I was so very impressed and grateful for their hospitality toward IPN. Snacks, water, swag (swag!) and discount codes for their Data Science Immersive prep course. Awesome!

My motivations are mostly altruistic. Over the course of TAing classes for University of Washington, at meetups, and as an organizer for IPN I’ve been asked numerous times, “What do I need to do to get into data science/machine learning/AI?” Previously my answer was along the lines of “git gud at Python…. and how’s your math?” I have a slightly better idea of what’s involved in the foundational education now that I’ve seen how this course is laid out.

The GDSIPP is “self-directed” insofar as we’re left to work on the sections at our own pace, but there’s an instructor available from 9:30 to 4:30 Pacific Time for the two-week course. Each day begins with a roughly 30-minute stand-up and a short lesson about upcoming topics. At midday (1:30) there’s another check-in meeting for Q&A, and the instructor has additional lessons planned to fill that half hour. The video check-ins have been good for me, as I was concerned that I would start to wander off if left entirely to self-directed study and communication via Slack.

On day one we’re given access to Galvanize’s “Learn” platform, which has eight lessons. Each lesson comprises sub-sections with code challenges, plus a final code challenge that tests student knowledge of the topics covered. The lessons are:

  • Intro to Python 3.6
    • Basic data types, flow control (if…else), while loops
  • Strings and Lists
    • String indexing and iterations, string formatting (notably Python 3.6’s f-string wasn’t introduced), lists and list comprehensions
  • Python tuples, dictionaries and sets
    • Mutability, tuples, dictionaries, sets, dictionary comprehensions
  • Python functions
    • Functions with parameters and arguments, function variables and scope
  • Linear Algebra
    • Vectors, matrices, matrix multiplication, rank and linear independence
  • NumPy
    • Create and modify arrays, use boolean indexing, linear algebra operations
  • Foundations of Probability
    • Language of probability, set arithmetic, set notation, dependence/independence, chain rule, law of total probability, Bayes’ Rule, Bayes’ Fallacy, Bayes’ probability trees, combinatorics, multiplication rule

Students are encouraged to reach out to the Slack workspace if they are stuck on a problem for more than 10 minutes, and in the course intro they go over posting etiquette and using threads rather than cluttering the main Slack space with problem discussion. Students are encouraged to help each other without giving away answers or posting working code. Our instructor has been very responsive and I haven’t felt as though students are solely reliant on each other.

I’m starting day 3 with tuples and sets. My goal is to get into the linear algebra section Friday morning so I’ll have a week to pick the instructor’s brain about math concepts that I had difficulty with in college. So far I’ve tackled series and sequence questions without much difficulty, and I certainly understood what the question was asking even if I couldn’t immediately write code to solve the problem.

I plan to touch base at least three more times before the course is finished. I hope you find this series useful, and feel free to ask questions in the comments; I’ll do my best to answer in a timely manner.


Dutch National Flag Problem and Code Golf

Last night at the Interview Preparation Night meetup that I host, I was asked to implement the Dutch National Flag problem on the whiteboard. The way it was presented to me was: given an array and a pointer element whose value acts as a pivot, return an array that has all the values less than the pointer to the left of the pointer, the pointer, and all values greater than the pointer to the right of the pointer.
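To make the problem statement concrete, here is a minimal sketch of the classic three-way partition. It is not the implementation I wrote on the whiteboard (that lives at the repl.it link below); the function name `dutch_flag_partition` and the choice to return a copy are assumptions made for illustration.

```python
def dutch_flag_partition(values, pivot_index):
    """Return a copy ordered as: items < pivot, items == pivot, items > pivot."""
    items = list(values)  # copy so the caller's list is left untouched
    pivot = items[pivot_index]
    low, mid, high = 0, 0, len(items) - 1

    # Invariant: items[:low] < pivot, items[low:mid] == pivot,
    # items[high + 1:] > pivot; items[mid:high + 1] is still unsorted.
    while mid <= high:
        if items[mid] < pivot:
            items[low], items[mid] = items[mid], items[low]
            low += 1
            mid += 1
        elif items[mid] > pivot:
            items[mid], items[high] = items[high], items[mid]
            high -= 1  # do not advance mid: the swapped-in value is unchecked
        else:
            mid += 1
    return items


print(dutch_flag_partition([0, 1, 2, 0, 2, 1, 1], pivot_index=1))
```

This version makes a single pass and only needs the one copy of the list, so whether it beats a simpler "build three lists and join them" approach depends on where the bottleneck is.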

I came up with a solution that works, and is fast, but it led to a discussion of expectations when the asker changed the game: “oh, it’s supposed to be done in quicksort.”

Here’s all of the relevant code in one spot: https://repl.it/@kmskelton/Dutch-National-Flag-variations If you’d rather not take the time to check it out, I’m adding my comments and thoughts below.

The best implementation depends on the bottleneck in the system.
If I have to do this on a Raspberry Pi I’d probably try my implementation and see if there’s still storage space; if not, let’s implement best_dnf. Same if I’m doing this on a cloud computing platform, because there’s probably a storage limitation. If I’m in a browser I’d use my implementation because I’ll use all of your system’s resources all day.


These “code golf” answers drive me nuts. Essentially you can’t get to EPIP’s answer unless you a) know it or b) have most of the standard library memorized. I hadn’t seen a,b=b,a before, so I had to look it up:

https://stackoverflow.com/questions/21047524/how-does-swapping-of-members-in-the-python-tuples-a-b-b-a-work-internally

Python separates the right-hand side expression from the left-hand side assignment. First the right-hand side is evaluated, and the result is stored on the stack, and then the left-hand side names are assigned using opcodes that take values from the stack again.
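A small sketch of what that means in practice. The variable names and the helper `swap_ends` are made up for illustration; the point is only that the whole right-hand side is evaluated before any name on the left is assigned, so no temporary variable is needed.

```python
def swap_ends(items):
    """Swap the first and last elements of a list in place and return it."""
    # Both right-hand values are read before either left-hand slot is written.
    items[0], items[-1] = items[-1], items[0]
    return items


a, b = "left", "right"
a, b = b, a  # the tuple (b, a) is built first, then unpacked into a and b
print(a, b)

# The same rule is why this Fibonacci step works without a temp variable:
# the right side uses the OLD values of x and y.
x, y = 1, 1
x, y = y, x + y
print(x, y)

print(swap_ends([1, 2, 3, 4]))
```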


PuPPy Interview Preparation Night – A Philosophy in Progress

I recently agreed to co-organize the Puget Sound* Programming in Python (PuPPy) Interview Practice Night. It has had much more attention than I expected and I’ve found that I’ve honed my message considerably, which is a benefit for me and my audience. This is a work in progress and I expect it will move in several different directions as we host an increasing number of events. However, I want to put this out there for groups that want to implement a similar practice paradigm. I’ll skip the discussion of finding locations and advertising, since that will be entirely different for each group. Rather, I will focus on my philosophy for the evening, starting with the problem as I see it.

A caveat I’d like to add is that this is not entirely my project, though I do get to be one of the two or three hands on the steering wheel. I have the impression that others involved with IPN have similar feelings but I can’t guarantee every event will be run the same way. I can guarantee that the events I host will be run this way and I’ll encourage the other hosts to adopt this philosophy.

The Problem

Most whiteboard interviews seem to follow a pattern: A developer** who is relatively new to the field applies for a job; Now “Applicant,” this person is told there’s a tech interview scheduled; Applicant freaks out and deep dives into Cracking the Coding Interview; Applicant arrives and is put in front of one or more engineers who ask them to perform a coding exercise that is out of context (these typically lack a use case and are divorced from the tools of the software developer); Applicant’s mouth dries as they stammer and piece together some code on the board; Interviewer says “thanks for your time” and Applicant leaves, having no idea how they performed; Applicant (maybe) receives an email informing them that Company is not moving forward with the interview process.

A Solution

Interview Practice Night hopes to develop the skill of performing a whiteboard interview. I want attendees to practice a nerve-wracking skill in front of relative strangers, but in a low-stakes environment. Ideally after a few attempts at coding in front of strangers, attendees will lose a lot of that performance anxiety. Our foundational texts are McDowell’s Cracking the Coding Interview as well as Elements of Programming Interviews in Python. Problems that attendees have encountered in technical interviews or on a code-challenge website are also encouraged (so long as the person bringing the problem has a reasonable path to solving their problem).

For the person writing, I encourage thinking out loud, writing out givens and assumptions, and pseudocoding. I understand that frequently the (un)stated goal of tech interviews is to learn how the applicant thinks and communicates about code. I encourage the person writing to work until they’re actually stuck, take a moment to think about their solution and if they can go no further, ask for a hint. I have heard both sides of this argument, but I’m of the opinion that: if I’m stuck on a problem, not moving forward doesn’t work in my favor; asking for help shows both an understanding of my abilities and their limitations; asking for help demonstrates that my ego is not more important than solving this problem; asking for help is what I would actually do in real life; and if asking for help is a mark against my candidacy for a job, that tells me a lot about the culture of the company in question.

For the people who are not writing, I strongly encourage them to act like they are the teammates of the person writing. We live in a world where professional code is written by a team; in real life teammates wouldn’t let the person at the whiteboard drown when they became stuck and they also wouldn’t let their teammate follow a wrong path (assuming the team identifies that the solution is heading in a non-optimal direction). I want the writer to solve the problem on their own if they’re able. But if teammates notice there’s a drop in production (written or spoken) I want the non-writers to ask if the writer wants a hint; or wants to share what’s happening in their internal dialog. If the team sees the writer moving in the wrong direction, I want the team to offer gentle encouragement back to a “correct” path. Much to my pleasant surprise, once a problem is “solved” I’ve witnessed great conversations about optimizing solutions and iterating on solutions to solve related, but different, problems.

Related Interview Skills

Whiteboarding is a useless skill if the applicant never gets to the room with the board and the pen. My future plans include résumé review, behavioral interview practice and remote coding interview practice. All of these initiatives are in parallel with whiteboarding; not everyone at a practice night will be participating in these events at the same time.

Résumé review is what it sounds like. The key here is to avoid a situation of “the blind leading the blind”. I can review your résumé, but I haven’t looked at dozens or hundreds of résumés while trying to hire for an open position. Hiring managers and HR professionals can work with attendees to revise résumés, in particular looking for spelling and grammar errors that will turn off potential employers. Ideally our professional volunteers will help “punch up” the document to make the verbiage more attractive to potential employers. In some cases, something as simple as rearranging the bullet points in a section can make it more enjoyable to read.

Behavioral interviewing is yet another skill that people don’t have a lot of opportunity to practice, outside of “throwaway” interviews (which are not an ideal situation for either side of the table). My vision for this is to have the interviewer ask STAR/“tell me about a time when” types of questions. The “interview” should last about ten minutes, with about five minutes for feedback. The interviewee will get practice with their storytelling, honing the details and narrative. The stories should have only the details needed to tell the story without meandering, while also avoiding “stories” that don’t offer any details (the dreaded monosyllabic answer to the open-ended question). In addition to narrative feedback, the interviewee should expect to learn about tics and distracting mannerisms (“um…”, “…like…”, “… you know? …. you know?” as well as face touching, not making eye contact, etc.).

My dream for this project is to set up remote coding interviews. For many these are the worst scenario. The interview is conducted over the phone, which not only has the as-of-yet unsolved telephony issues but calls lack the non-verbal communication cues that we rely on. In addition, many people don’t have experience coding in a web-based IDE much less a shared web-based IDE. I would like to have the interviewer set up in one space with the interviewee in a separate space. The coding problem doesn’t have to be complex, this is a good use case for fizzbuzz or Fibonacci. The interviewee needs to practice with the interpreter/REPL and they should receive similar feedback as they would from a behavioral interview.

If you are in the Puget Sound region and would like to join us, please find the Interview Practice Night group through the PuPPy Meetup group, below.

Puget Sound Programming Python (PuPPy)

Seattle, WA
7,199 Pythonistas

Welcome! We are a fun and friendly user group dedicated to proliferating a diverse and talented Python community in the Puget Sound region. We are devoted to exploring Python-…

Next Meetup: Interview Practice Night, Monday, Feb 25, 2019, 6:15 PM (19 attending)

Please bring this to your group and try it out. It can be your coding bootcamp, your CS cohort, the informal coding group you’re a part of. If you find this useful, please let me know. If you tried this and found a different method that worked really well for your group, let me know. Good luck out there.

“*” PuPPy understands there needs to be some poetic license taken with the acronym.
“**” It is my observation that mid- and senior-level developers do not face this same challenge. Often whiteboards, if they’re used at all, are used to write out ideas collaboratively during interviews that feel like chat sessions.


instaSquared

I showed instaCropper to my social media clients, and they were impressed. One of them suggested that I increase the size of the canvas, rather than crop the images to squares. The tradeoff is that increasing the canvas makes the original image relatively smaller. The advantage is that I can bulk process photos in about a third of the time it took me to process them with instaCropper.

InstaSquared starts by ensuring the file in question is a photo by checking for EXIF data; if there is no EXIF the user is informed that the file is being skipped. Once it’s established that the file is a photo, it finds the longer side and uses that dimension to make a new canvas. There is some math to figure out offsets and the original photo is pasted onto the center of the new canvas.
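The offset math is short enough to sketch on its own. This is not the instaSquared source, just the arithmetic as described above; the function name `square_canvas_layout` is hypothetical, and the Pillow calls in the comments show one way the result could be used.

```python
def square_canvas_layout(width, height):
    """Return (side, (offset_x, offset_y)) for centering a photo on a square canvas."""
    side = max(width, height)          # the longer edge sets the canvas size
    offset_x = (side - width) // 2     # horizontal padding on the left
    offset_y = (side - height) // 2    # vertical padding on the top
    return side, (offset_x, offset_y)


# A 4000x3000 landscape photo gets a 4000x4000 canvas and sits 500px from the top.
print(square_canvas_layout(4000, 3000))

# With Pillow, the result could be applied roughly like this
# (the white background color is an assumption):
#   canvas = Image.new("RGB", (side, side), "white")
#   canvas.paste(photo, (offset_x, offset_y))
```

Integer division keeps the offsets whole pixels; when the difference is odd, the extra pixel of padding ends up on the right or bottom edge.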

instaSquared was designed to process bulk images that are sent to me by show photographers. With that in mind I eliminated all of the user decisions that were available in instaCropper. Now all files have “squared_” prefixed onto the original file name and the new file is saved to a “squared” directory in the parent directory.
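A sketch of that naming rule using `pathlib`. The helper name `squared_output_path` is made up, and I am reading "the parent directory" as the folder that holds the original photo; adjust if your layout differs.

```python
from pathlib import Path


def squared_output_path(photo_path):
    """Build the destination path: <photo's folder>/squared/squared_<original name>."""
    photo = Path(photo_path)
    return photo.parent / "squared" / f"squared_{photo.name}"


destination = squared_output_path("shows/spring/cast01.jpg")
print(destination.as_posix())

# Before saving, the folder would need to exist, for example:
#   destination.parent.mkdir(parents=True, exist_ok=True)
```

Because the output always lands in a different folder under a different name, the original file is never the save target.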

If you try it out, please let me know what you think.

Original photo from my trip to Oahu
InstaSquared version of the above (resized by WordPress)

instaCropper

I started managing social media for a local theatre group. We’re trying to improve engagement on Facebook (in our group and in each event), Twitter and Instagram. The process was eating into the time I allotted for this project, so in addition to setting up an account at Sendible to automate post scheduling, I created the instaCropper tool.

The problem with Insta’ (the ‘Gram) is that the images for upload must be perfectly square (except for story photos which can be something like 1.9:1). The next issue is that performers universally don’t ask their photographers to provide the equivalent of passport photos just so some social media dude can make Instagram posts. Certainly cropping a single photo perfectly square isn’t particularly time consuming…

But I write code! Why would I do this manually? Ten performers plus two producers times three photos for each of those, plus photos of three back-of-house people, plus show prep photos equals more time than I want to spend drawing perfect squares.

Starting with the Pillow library, I was able to write a script that will take in a single image or a directory for processing multiple images. Each image is opened in preview to confirm that image is correct. If the orientation is wrong I can rotate or flip the image as needed. Should there be more background than I want I can set the crop origin. The crop is confirmed via a preview image. Once it’s what I want I can save it with a custom filename and in a separate directory from the original photo – all from the command line.
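The crop step boils down to turning an origin into a square box. The sketch below is not the instaCropper source; `square_crop_box` is a hypothetical helper, and clamping the origin so the square stays inside the image is my assumption about reasonable behavior. The returned tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects.

```python
def square_crop_box(width, height, origin_x=0, origin_y=0):
    """Return a (left, upper, right, lower) box for the largest square crop."""
    side = min(width, height)  # the shorter edge limits the square
    # Clamp the origin so the square never runs past the image edges.
    left = max(0, min(origin_x, width - side))
    upper = max(0, min(origin_y, height - side))
    return (left, upper, left + side, upper + side)


# Landscape photo, crop starting 700px in from the left edge.
print(square_crop_box(4000, 3000, origin_x=700))

# With Pillow this box could be passed straight to crop():
#   squared = photo.crop(square_crop_box(*photo.size, origin_x=700))
```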

Processing a single image used to take at least 70 seconds. It now takes about 11 seconds, and there is no chance that I will accidentally overwrite the original photo (that happened more than once; fortunately I was working on local copies from a Google Drive folder). If you use the tool, please let me know. If you think of reasonable additions, feel free to create a GitHub issue.


Let’s Talk Template Literals

It’s interesting that there are so many ways to STDOUT text to the screen. Let’s look at a few of them in JavaScript (ES6) and Python (3.6)

Concatenation: This works in both JavaScript (console.log) and Python (print). “The first half of the sentence ” + “is joined to the second with a plus sign between the halves.” Also works with variables. “Hello, ” + user.name + ” and welcome to our shopping site.” Do pay attention to where spaces are.

Embedded expressions: JavaScript allows developers to include simple expressions as well.

console.log("Hello, " + user.name + ". Your total is $" + (purchase.subTotal + purchase.tax + purchase.delivery) + ".")

Hello, Kristopher. Your total is $123.45.
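For comparison, a rough Python counterpart. One difference worth knowing: Python will not quietly turn a number into a string during concatenation, so the total has to be converted explicitly. The function name and the sample amounts here are invented for illustration.

```python
def total_message(name, sub_total, tax, delivery):
    """Build the greeting with plain concatenation."""
    total = sub_total + tax + delivery
    # format(total, ".2f") converts the float to text with two decimal places;
    # "..." + total on its own would raise a TypeError.
    return "Hello, " + name + ". Your total is $" + format(total, ".2f") + "."


print(total_message("Kristopher", 100.00, 10.45, 13.00))

try:
    "Your total is $" + 123.45  # str + float is not allowed
except TypeError as error:
    print("TypeError:", error)
```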

String formatting: Python 3.6 allows developers to include variables mid-string as follows:

print('We are the {} who say "{}!"'.format('knights', 'Ni')) 

We are the knights who say "Ni!"

But if you need to refer to the position of the arguments (the values passed in to ‘format()’), there’s also:

print('{0} and {1}'.format('spam', 'eggs')) 
spam and eggs

or:

print('{1} and {0}'.format('spam', 'eggs')) 
eggs and spam

There are keyword arguments as well, though I find this exceptionally cumbersome and wonder why the developer wouldn’t just hard-code the values into the sentence. I’m including it because it’s in the Python docs and isn’t noted as being too archaic:

print('This {food} is {adjective}.'.format(food='spam', adjective='absolutely horrible')) 

This spam is absolutely horrible.

My personal preference, added to Python in 3.6, is the f-string:

print(f"Welcome, {user.name}! Today's special is {special.name} and is on sale for {special.price}.")

Welcome, Kristopher! Today's special is Singing Rooster Coffee 12oz vacuum bag and is on sale for $10.

This is really easy to debug, in part because it reads very much like the end sentence, without extra punctuation. I also don’t have to type-cast my variables like in Python 2.7 (which required %s, %f and %i when referring to strings, floats and integers). Take a look at the Python docs for more examples and explanations: https://docs.python.org/3/tutorial/inputoutput.html
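To show the difference side by side, here is a small sketch with made-up order values. The older `%` style needs a type code for each slot, while the f-string names the variable in place and only needs a format spec when I want one (here `.2f` for two decimal places).

```python
name, bags, price = "Kristopher", 2, 10.0

# Older %-style: each placeholder carries a type code, values follow in a tuple.
old_style = "%s ordered %i bags at $%.2f each." % (name, bags, price)

# f-string: the variable sits where its value will appear in the sentence.
new_style = f"{name} ordered {bags} bags at ${price:.2f} each."

print(old_style)
print(new_style)
print(old_style == new_style)
```

Both lines produce the same sentence; the f-string is simply easier to read back because the order of the values can never drift out of sync with the placeholders.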

In JavaScript I’m growing attached to template literals. Like the f-string a template literal reads very much like the final product.

console.log(`Welcome, ${user.name}! Today's special is ${special.name} and is on sale for ${special.price}.`)

Welcome, Kristopher! Today's special is tomato soup with grilled cheese sandwich and is on sale for $6.75.

The only catch here is to look for backticks (`, the key to the left of the “1” on a US keyboard). At first they can look like single quotes, though most IDEs I’ve worked with do a good job of making the angle very obvious. MDN has done a great job with the documentation: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
