30 Sep To get by (doing data science), with a little help from your friends
In an earlier blog post, I spoke about how each of us, despite our varied backgrounds, brings a unique and useful perspective to a data science team. As statisticians, computer scientists, software engineers, and/or hybrids, we apply our individual strengths towards the common goal of answering questions using the available data sets. This brings us to the question – what does an ideal data science team look like?
Perhaps, prior to exploring this question, we need to address why there is a need to work in a team. Quite simply, it is because the results obtained from planned collaborative efforts greatly exceed those of one individual, not just in data science but in almost any endeavor. A team effort, when carried out well, thrives on the right combination of complementary individual strengths dedicated towards a unified goal.
Suppose we look instead for one individual who has all the skill-sets that we need from a team – the so-called unicorn data scientist. First, we can rarely expect to find these super data scientists who know everything, can do everything, and deliver everything. Second, even if we do find someone who has this superhuman combination of skills, we eventually come to the realization that one person cannot address every organizational need. With the help of big data cloud services and machine learning black box technologies offered by various vendors, a single data scientist could accomplish a lot these days. But with increasing demands and a growing need for resources, this setup is bound to fail over time.
Given that having a team is imperative for a company’s data science needs, how do we build that team? The composition of our data science team is, in many ways, defined by the type of data science we intend to do. As explained by Robert Chang in this Quora response, it is important to clearly understand why we need data science. What is the nature of the data science we are doing? What are the problems we are addressing? Answering these questions helps us understand whether the company is skewed more in the direction of analysis or product development (usually, some combination of these two categories, with one outweighing the other). A good understanding of these basic issues, within a company’s environment, will go a long way in putting together an effective team.
Are there certain basic components that are absolutely necessary for a team, irrespective of the type of data science efforts that the company is involved in? Michael Wu, chief scientist of Lithium Technologies, prescribes at least three necessary team members – (a) a business analyst who works with front-end tools closest to the organization’s core business, (b) a machine learning expert with knowledge of statistics who develops algorithms and builds data models, and (c) a data engineer who works with the bottom layer to capture, store, and process data. Compromising on any of these three areas could damage the team’s productivity, as explained in this blog post from Codecademy. These roles may, of course, become more fine-grained and branch out into several specialized ones (e.g. data engineer, designer, full-stack developer, solutions architect).
DJ Patil, who along with Jeff Hammerbacher is considered largely responsible for popularizing the term data science, has a refreshingly different view on what to look for in data scientists, when hiring for a team. His criteria speak more to the general mindset of a data scientist as opposed to the technical skill-set – having expertise in at least one scientific discipline; having the desire to discover while simultaneously abstracting problems into actionable hypotheses; having the ability to tell a story visually, using data; and having the ability to think about solutions to a problem creatively (i.e. without functional fixedness).
All these ideas can be summarized using a musical metaphor (a favorite exercise of mine). If I were to compose music and release it in the form of an album or a song to an audience, how should I go about doing it?
Option 1: I could try being a superhuman data scientist with the help of additional tools. So, with the help of existing digital audio workstations and sequencers (e.g. Pro Tools, Logic Pro, Ableton Live, Liquid Music), I could compose, arrange, and produce my own music. While this endeavor would involve some initial investment, it would considerably cut down on the effort, time, and expense involved in collaborating with a team of other musicians. The advantages are clear. The disadvantages, however, emerge with time. The scope of my compositions would be limited to the techniques I use and restricted within the boundaries of my own creative knowledge schemas. Eventually, the need to collaborate with other musicians becomes hard to ignore.
Option 2: I could collaborate with other musicians. Before I collaborate, I need to decide on the musical genre, which determines the type of musicians I would need to collaborate with. In other words, this is the phase where a company understands its data science needs before hiring a team (i.e. what type of data science are we doing: more analysis, more product development, etc.?). Since I approach music from a guitar player’s perspective, if I wanted to put together an acoustic duo, all I would need is another acoustic guitar player who can also write lyrics and sing. If my music fit more in line with rock, then I would need a bass player and a drummer, and perhaps a singer. If my musical choice were funk, then I would likely need a couple of musicians playing horns, in addition to the bass player and drummer, and perhaps a keyboard player depending on how much of the musical space I intend to fill. Given my musical choices, it seems that I would need at least a bass player and a drummer (analogous to the minimum data science team requirement of an engineer, business analyst, and machine learning expert).
Good luck with building your data science team!
By: Naresh Vempala
Image credit: freepik.com