Hi Jake! What is your experience in data analytics and science roles?

  • Computer vision researcher, UCLA, worked to get computers to recognize objects better.
  • Data scientist, Utopia Compression, sold my soul to do R&D for DARPA, erring on the side of the greater good (e.g. automated landmine removal as opposed to automated baby exploding).
  • Data scientist, New York Times R&D Lab, worked to understand how data would transform journalism and the world writ large.  One of the best jobs ever.
  • Founder + Executive Director, DataKind, working to get socially conscious data experts teamed up with visionary social orgs to make the world better through data.  The best job ever.

What is your educational background and how has it prepared you for your role in this field?  What skills did you not develop in school that you find important in your work?

I got extremely lucky in my academic choices.  I started out getting a B.S. in Computer Science with a focus on intelligent systems, which gave me the software skills and critical thinking for building tech solutions and dipping my toe into machine learning.  I got incredibly lucky in choosing my advisor for grad school, who forced me into a Statistics Ph.D. instead of a Computer Science Ph.D., where I learned the mathematical and statistical foundation for drawing conclusions from data and building computer systems to take advantage of that.

I was very fortunate to be in an extremely applied statistics program at UCLA, so thankfully we were taught many of the computing skills that I hear other stats programs lack.  However, like most tech disciplines, there isn’t enough time to learn every individual tool, so I found myself picking up Python and Processing on my own.  More than that, though, I would have loved to learn communication design, a topic that I think is *sorely* lacking in the scientific community.  90% of our jobs are (or should be) communicating our results to the non-technical as well as the technically oriented, so being able to visually and orally communicate what we’ve done is a hugely important skill.  I recall a vague “learn how to present!” course being offered through a well-intentioned career services group that I just never got around to taking amongst all my other commitments.  I wish that had been mandatory or that the culture of communicating results had been built into the coursework itself.  R’s default graphics are not only unsexy, they can be misleading if you don’t know what you’re doing.

What are the biggest challenges in data science and/or analytics?  What are the most important things to ‘get right’?  What are the best technologies available to solve these problems?

You’re going to hear this so much more in 2013, but the biggest challenge, IMO, is asking the right questions.  People are excited to dive into a new dataset and get “hacking”, or often come to us at DataKind with a big dataset, plop it down, and say “now what?”, but the data isn’t going to ask the questions.  Sure, there are lots of exciting and new things we can learn from data, but without someone with the vision of what needs to get done, be that hitting a performance metric in a company or broadly understanding a trend in your field, you’re just going to be spinning your wheels.  Data should be used in service of solving the bigger problems that an expert can help identify.

One of our major principles at DataKind is that we team data scientists with subject matter experts because, for all of our software writing and data analysis skills, we don’t know what app or analysis is going to be most useful for, say, alleviating poverty.  As the barriers to obtaining and analyzing/visualizing data disappear, you’re going to see a glut of projects that people pull together merely because they can.  The real distinguishing feature between these projects and the ones that really have lasting impact is that the latter solve a problem that was scoped with someone who understands what that analysis/visualization/data tool is ultimately going to be used for and why.

What’s your definition of data analytics and data science?

Woof, I don’t want to wade into a flame war but, simply put, data science (to me) is merely statistics souped up with some programming skills.  Academic statistics really missed out on a marketing opportunity by letting industry define “data science”.  The term data science was being batted around as a term for statistics in academic circles as early as 2001 when . makes this point better than I ever could, but statistics has always been the discipline dedicated to:

1) collecting data
2) exploring data for hypothesis generation and assumption testing
3) modeling data using mathematical models
4) drawing conclusions from the data about the world in general
5) communicating those results to the public.

The only fundamental shifts I see are that increased computing power has touched every one of those steps – the ubiquity of cellphones and computers means we need programming skills to collect and manage data, new software exists for visualization and analysis, more powerful computers have made previously impossible statistical methods like Monte Carlo methods tractable, and we can interactively display information that used to live on the printed page – and that the whole process has become democratized with the removal of barriers to cheap and accessible tech and data.  Aside from that, the core process of collecting data and making sense of it still squarely falls in the realm of statistics, and the sooner programmers pick up stats skills and statisticians become facile in programming, the better off the data science community will be.

What advice can you give someone with little experience in analytics to pursue a career in the field?

As Hal Varian put it some years ago, “the sexy job in the next 10 years will be statistics.”  You can’t browse the news without hearing that data scientists are the new “sexy rock stars” (proving that the term ‘sexy’ is very open to interpretation), so the impetus to join is there.  If you’re already sold and just want to know how to get started, I’d recommend taking advantage of this amorphous time to pick up some new skills alongside the growing data science community.  Like I said above, if you know how to program, join a statistical Meetup near you.  If you know statistics, take some classes on programming.  The data community, at least here in New York, is wildly inclusive and open, and I’d encourage anyone interested in this field to dive in by going to a hackathon and introducing yourself or following along on message boards and forums if face-to-face isn’t your bag.  Moreover, there are now more opportunities to learn “data science” than ever before, from new programs at universities like Columbia University and Rice University, to accelerated courses at places like 3rd Ward or the Insight Data Science Fellows program, to online courses from Coursera.  If you’re interested in the field of analytics and data science, I’d say roll up your sleeves and jump in!

How do you think the field will be different in 5-10 years?

Hah, things change so quickly I can’t even dream of a world more than 2 years from now.  Remember what things were like 10 years ago in 2003?  Facebook was a twinkle in Zuckerberg’s eye, the iPhone was a good presidential term away, and even companies like Google weren’t sure how or why to hire statisticians.  It was like the dark ages.

I will say this, regardless of the time frame:  You’re going to see data everywhere, and very soon.  The hype of “big data” is already reaching a fever pitch where most people have heard of it and, if we do our jobs, that hype will settle into a world where data and analytics are first-class citizens in decision making.  The fun thing about that last idea is that the term “decision making” applies to everything.  We’re not just talking about industry decisions like optimizing supply chains or understanding customer sentiment, we’re talking about everything from healthcare delivery decisions to government aid distribution decisions to even just what you eat every day.  We’re going to be living in a world where everything is instrumented, everything recorded, and all of that information is going to be used to adjust our practices for the better in real time.  Lest you see that as a dystopian neo-Tokyo world of endless circuitry and Big Brother style privacy erosion, I believe that we will apply these new technologies and information streams to improving our world.  I know that’s what I’ll be working toward.

