Sep 11, 2017
My Data Scientists Are Failures
The Multiple Kinds of Failure and How to Succeed in the Long Run
“Ninety percent of what I do doesn’t work.” That’s what one brave data scientist said in a company meeting, according to Christopher (Chris) Hillman. It’s not that he wasn’t great at his job; it’s just the nature of the work. What inspired the man to speak out at the meeting was the idea of contributing all of his work to an asset repository. But contributing everything, even what doesn’t work, is exactly the point, said Chris. That is how he arrived at his presentation and message for PARTNERS: “Are your Data Scientists failures? They should be!”
“Because a lot of what we do in data science is iterative, it fails. That’s all part of the process,” said Chris. “But that very rarely gets recorded or saved for other people to use and learn from. The problem is that other people make exactly the same mistakes getting to the same place.” That makes growth incremental at best.
Fail—And Smile When You Say It
Our fear of failure has several roots. For many people, especially in business, failing is seen as a negative, so things that don’t work are rarely documented. The phenomenon even has a name: publication bias. As Chris relates, “In scientific papers, nobody publishes what didn’t work. Every scientific paper is about successes.” And we know from history that there are often mountains of failures in science before any breakthrough.
Others are adopting the new fail-fast mantra without quite understanding it. According to Chris, there are different types of failure in data science. “There are instances where the thing just doesn’t work. That’s bad failure. And there’s failure where you’re learning that the data doesn’t do this, and it doesn’t do that, but it does do this! That’s good. That’s a positive failure and really important information. We want to save it to make everything more efficient and make the best use of the data science team.”
This data visualization by Chris Hillman shows connections between Twitter users during a ‘Twitter Storm’.
Getting the Why Behind What Worked
When you know what’s been done before, what worked and what didn’t, you’re starting off ahead of the game. You’re not reinventing the wheel. “What a lot of companies should have, but don’t, are code repositories and descriptions of projects,” said Chris. “If I come to build a new churn model, for example, I should be able to look at the history of churn modeling in that company and see what other people have done, what did and didn’t work.”
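Chris doesn’t prescribe a particular format for such a repository, but as a minimal sketch of the idea, a team could keep an append-only log of modeling attempts that records failures alongside successes. The record fields and file layout below are illustrative assumptions, not anything from the talk:

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class ExperimentRecord:
    """One entry in a team's project history: what was tried and what happened."""
    project: str   # e.g. "churn-model" (hypothetical project name)
    approach: str  # technique or feature set that was tried
    worked: bool   # did it improve on the baseline?
    notes: str     # why it did or didn't work -- the part that rarely gets saved

def log_experiment(record: ExperimentRecord, log_path: Path) -> None:
    """Append the record to a JSON-lines log so later projects can search it."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

def past_attempts(project: str, log_path: Path) -> list[ExperimentRecord]:
    """Return every recorded attempt for a project, successes and failures alike."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        records = (ExperimentRecord(**json.loads(line)) for line in f)
        return [r for r in records if r.project == project]
```

The point of logging the `worked=False` entries is exactly the one Chris makes: the next person building a churn model can call `past_attempts("churn-model", ...)` and see what was already ruled out, instead of repeating it.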
“In the use cases we talk about, a lot of them have done this before. So they need to know: why should I try it differently than the way it works now? But if the way it works now hasn’t been documented, you can’t answer those questions.”
In some cases, it’s a matter of finding the right tool for the job. Right now, data science is hot, and everything from business intelligence to analytics to data mining is getting pointed in that direction. But some problems are simply bad use cases for data science. “Some projects are far better solved by traditional techniques—why not just do it like that? There’s no point in using a technique just because it’s there,” says Chris.
Failure is part of the data science job and how we learn. Make sure you document it so that your failures are good ones, and you build successes in the long run.
Why PARTNERS? Real Customer Stories
Go and see what interests you, and speak to other customers while you’re there. They’ll give you the true story of what is actually happening. There are multiple sessions dedicated to data science every day, with real customer stories you can hear, learn from, and use to avoid making the same mistakes.
PARTNERS Session Title: My Data Scientists Are Failures – Session 0147
Are your Data Scientists failures? They should be! This session discusses why failure is a good thing and how to encourage the “right kind of failure.” Some Data Science initiatives don’t work and, with hindsight, never could have worked—this is the wrong kind of failure, wasting time and money. However, some Data Science initiatives thrive on failure—it leads to an iterative pattern of discovery that ends with fantastic results. We also discuss how we can learn from failure so that the same mistakes are not made over and over, with each project stuck in the same initial iterative steps. Many teams are encouraged to document and promote successes, but how many of them accurately document the things that don’t work? With many Data Science teams experiencing high turnover, are your precious resources just failing in the same way as their predecessors? Come to this session and hear some solutions and war stories from the field.
Chris Hillman is a Principal Data Scientist in the International Advanced Analytics team at Teradata, based in London. He has over 20 years’ experience working with analytics across many industries, including Retail, Finance, Telecoms, and Manufacturing. Chris is involved in the pre-sales and startup activities of Analytics projects, helping customers understand and gain value from Advanced Analytics and Machine Learning. He has spoken on Data Science and analytics at Teradata events such as Universe and PARTNERS, as well as industry events such as Strata, Hadoop World, Flink Forward, and IEEE Big Data conferences. Currently, Chris is also studying part-time for a Ph.D. in Data Science at the University of Dundee, applying Big Data analytics to data from the Human Proteome Project.