Aug 08, 2016
Hiring a Data Scientist With Data
What process should you use to hire a data scientist?
Logically, it should include data, of course. But using data to hire a data scientist can be a catch-22 for organizations just getting started on their data science journey. If they had the resources to dedicate to the data, they wouldn’t need to hire. If they don’t have the resources, they can’t use data to move forward. In the beginning, it can very much be a case of not knowing what they don’t know.
There is a lot of talk about hiring the elusive data science unicorn: someone who “has mastered every single programming language, has become an expert in each big data platform, and is at the top of the class in statistics, mathematics, and development.” According to tech writer Rick Delgado in his article What to Know to Avoid Hiring a Bad Data Scientist, that person doesn’t exist. And if they did, they probably wouldn’t be looking for a job: data science as a career has been on a rocket trajectory for years.
Three Identifying Criteria
At the 2015 Teradata PARTNERS Conference in Anaheim, Michael Li explained in his presentation “Needle in a Haystack” that, in The Data Incubator’s experience, the best data scientists must span three core competencies: they must be part data analyst, part data engineer and part data scientist. The first two skill sets already exist in most organizations, but they often aren’t combined or structured in a way that allows them to function as a data scientist.
Looking at each individually, it’s clear how they contribute to effective data science.
As data analysts, they must move beyond the academic to truly understand the business environment, look at the data, and determine what is important and what has the most value for the organization. They should also be able to communicate these decisions and findings in language anybody can understand.
As data engineers, they still need the technical ability to move and wrangle data into the appropriate form and to procure it without additional assistance. This lets them work independently and at their own, more rapid pace.
As data scientists, they will of course need advanced mathematics, ETL and machine learning capabilities to go deeper into the data. But that can’t be the only thing. Without the other two competencies, data science is likely to drift within an organization, lacking the power either to get things done or to establish and communicate the right goals.
They Went to ____, They Must Be Good
In a lot of hiring situations, the university an applicant attended carries significant weight. This has been especially true for the “top 20” schools. But when Li put that hypothesis to the test with The Data Incubator’s applicants, they discovered that universities were actually not a good indicator of success in data science.
Much better indicators for them were degrees, with the best being pure mathematics and applied mathematics. But even that was not absolute: in their tests, for example, many chemistry majors performed better than pure math majors. They discovered that to truly identify the best candidates, they needed to discard their biases and look beyond the expected schools and degrees.
Blindly Removing Biases
One example Li gave was a study of bias in a famous orchestra. At the time, women made up less than 5% of performers, a share that didn’t reflect the applicant pool. Historically, auditions had been held live, face-to-face with the selection committee. When a curtain was put up and performers were identified only by non-gender-specific information, the share of women performers increased seven-fold, from less than 5% to more than 25%.
According to Li, when we hire, we look at the resume and make snap judgments based on school, previous companies, location and other biographical details, and those judgments are often wrong.
A better course of action is to present a series of challenge questions that lead to actual problem solving involving mathematics, statistics, data analytics and data engineering. It could begin with a simple question, followed by a related problem to see how applicants solve it. In Li’s case, when they asked applicants “Do you know Python?” and then followed up with a problem, they saw the following:
- 46% said they knew how to program
- 35% actually wrote Python code
- 26% got the answer right
- Only 12% wrote code that showed a deeper mastery
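These figures form a screening funnel, and it can help to see how many candidates each stage keeps from the one before it. The sketch below is a minimal illustration, assuming every percentage is a share of the same pool of applicants (the source does not say so explicitly); the stage labels and the helper name `stage_conversion` are hypothetical.

```python
# Screening funnel from Li's "Do you know Python?" question.
# Assumption: every percentage is a share of the same applicant pool.
funnel = [
    ("Said they knew it", 46),
    ("Actually wrote Python code", 35),
    ("Got the answer right", 26),
    ("Showed deeper mastery", 12),
]

def stage_conversion(stages):
    """Return (label, % of previous stage retained) for each stage after the first."""
    rates = []
    for (_, prev), (label, curr) in zip(stages, stages[1:]):
        rates.append((label, round(100 * curr / prev, 1)))
    return rates

for label, rate in stage_conversion(funnel):
    print(f"{label}: {rate}% of the previous stage")

# Share of self-reported Python users who showed deeper mastery.
print(f"Overall: {round(100 * 12 / 46, 1)}% of those who said yes")
```

Under that assumption, each of the first two steps keeps roughly three quarters of the previous stage, the last step keeps fewer than half, and only about one in four applicants who said yes wrote code showing deeper mastery.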
To Data Science or Not to Data Science
With the title “data scientist” in demand and commanding premium salaries, it is being thrown around by employers and job seekers alike. Everyone wants one, and everyone wants to be one. It is critical that organizations be diligent in defining what they need and the role they are trying to fill. Go beyond the resume to make sure applicants have actual skills. And don’t just hang the title on your open position to make everyone feel good: make sure the role actually calls for a true data scientist who covers all three competencies.
Looking to join data scientists for discussion and collaboration about Python, linear regressions or maybe just plain talk about pivot tables? You are sure to find all of that at Teradata PARTNERS Conference 2016.
Check out sessions in the Big Data/Analytics tracks for all of the latest strategies.
View the presentation, Needle in a Haystack, presented at Teradata PARTNERS 2015 by Michael Li, PhD, The Data Incubator.