• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • Whenever you search in PBworks, Dokkio Sidebar (from the makers of PBworks) will run the same search in your Drive, Dropbox, OneDrive, Gmail, and Slack. Now you can find what you're looking for wherever it lives. Try Dokkio Sidebar for free.



This version was saved 4 years, 2 months ago View current version     Page history
Saved by Irina K
on November 9, 2018 at 5:08:40 am

 This Wiki is licensed CC-BY-NC-SA -
Creative Commons

Attribution-Noncommercial-Share Alike 3.0 License.
Authors, learn more about your rights.




Why Do We Need Data Science when We’ve Had Statistics for Centuries?


Data Science is emerging as as one of the hottest new professions and academic disciplines in these early years of the 21st century.  A number of articles have noted that the demand for data scientists is racing ahead of supply.  People with the necessary skills are scarce, primarily because the discipline is so new.  But, the situation is rapidly changing, as universities around the world have started to offer different kinds of graduate programs in data science.  This year, for example, NYU is offering two new degrees, - a general Master in Data Science, and a more domain-specific Master in Applied Urban Science and Informatics

It’s very exciting to contemplate the emergence of a major new discipline.  It reminds me of the advent of computer science in the 1960s and 1970s.  Like data science, computer science had its roots in a number of related areas, including math, engineering and management.  In its early years, the field attracted people from a variety of other disciplines who started out using computers in their work or studies, and eventually switched to computer science from their original field.

This was the case with me.  I used computers extensively while a student at the University of Chicago, where I worked closely with Professor Clemens Roothaan, - one of the pioneers in the use of computers in physics and chemistry.  As an undergraduate, I worked part-time at the university’s supercomputing center which he founded, and later he was my thesis advisor as a graduate student in physics.  When the time came to look for a job, I realized that I enjoyed the computing side of my work more than the physics.  I decided to switch fields and in 1970 joined the computer science department at IBM’s Watson Research Center.

Not unlike data science today, computing had to overcome the initial resistance of some prominent academics.  I still remember a meeting in 1965 with a very eminent physicist from whom I was taking a graduate course.  He asked me what I planned to do research on for my degree, and I told him that I was already working with Professor Roothaan on atomic and molecular calculations.  He just said that good theoretical physics should require no more than pencil and paper, rather than these elaborate new computers.  In his mind, this wasn’t real physics.  A number of the physics faculty felt the same way.  Change does not come easy, even for brilliant physicists.


Computer science has since become a well respected academic discipline.  It has grown extensively since its early days and expanded in many new directions.  It’s quite possible that being around in the early days of computer science and computing in general is part of the reason I’m so interested in the evolution of data science today.  

So, what is data science all about?  One of the best papers on the subject is Data Science and Prediction by Vasant Dhar, - professor in NYU’s Stern School of Business and Director of NYU’s Center for Business Analytics, - which was published in the Communications of the ACM in December, 2013.

“Use of the term data science is increasingly common, as is big data,” Dhar writes in the opening paragraph.  “But what does it mean?  Is there something unique about it?  What skills do data scientists need to be productive in a world deluged by data?  What are the implications for scientific inquiry?”

He defines data science as being essentially the systematic study of the extraction of knowledge from data.  But, analyzing data is something people have been doing with statistics and related methods for a while.  “Why then do we need a new term like data science when we have had statistics for centuries?  The fact that we now have huge amounts of data should not in and of itself justify the need for a new term.”

In short, it’s all about the difference between explaining and predicting.  Data analysis has been generally used as a way of explaining some phenomenon by extracting interesting patterns from individual data sets with well-formulated queries.  Data science, on the other hand, aims to discover and extract actionable knowledge from the data, that is, knowledge that can be used to make decisions and predictions, not just to explain what’s going on.

The raw materials of data science are not independent data sets, no matter how large they are, but heterogeneous, unstructured data set of all kinds, - e.g., text, images, video.  The data scientist will not simply analyze the data, but will look at it from many angles, with the hope of discovering new insights.  

One of the problems with conducting such an in-depth, exploratory analysis is that the multiple data sets that are typically required to do so have are often found within organizational silos, - be they different lines of business in a company, different companies in an industry or different institutions across society at large.  Data science platforms and tools aim to address this problem by working with, linking together and analyzing data sets previously locked away in disparate silos.

“Unlike database querying, which asks What data satisfies this pattern (query)? discovery asks What patterns satisfy this data?,” notes Dhar.  “Specifically, our concern is finding interesting and robust patterns that satisfy the data, where interesting is usually something unexpected and actionable and robust is a pattern expected to occur in the future.”

The article discusses the key skills data scientists should have, starting with machine learning, a complex concept which Dhar explains in a particularly simple way.  

“Most of us are trained to believe theory must originate in the human mind based on prior theory, with data then gathered to demonstrate the validity of the theory.  Machine learning turns this process around.  Given a large trove of data, the computer taunts us by saying, If only you knew what question to ask me, I would give you some very interesting answers based on the data.  Such a capability is powerful since we often do not know what question to ask. . .” 

“Suitably designed machine learning algorithms help find such patterns for us.  To be useful both practically and scientifically, the patterns must be predictive.  The  emphasis on predictability typically favors Occam’s razor, or succinctness, since simpler models are more likely to hold up on future observations than more complex ones, all else being equal. . .”

Data scientists should also have good computer science skills, - including data structures, algorithms, systems and scripting languages, - as well as a good understanding of correlation, causation and related concepts which are central to modeling exercises involving data.

“The final skill set is the least standardized and somewhat elusive and to some extent a craft but also a key differentiator to be an effective data scientist - the ability to formulate problems in a way that results in effective solutions. . . formulation expertise involves the ability to see commonalities across very different problems . . .”

Like computing, one of the most exciting part of data science is that it can be applied to many domains of knowledge.  But, doing so effectively requires domain expertise to identify the important problems to solve in a given area, the kinds of questions we should be asking and the kinds of answers we should be looking for, as well as how to best present whatever insights are discovered so they can be understood by domain practitioners in their own terms.  Garbage-in, garbage-out, a phrase I often heard in the early days of computing, is just as applicable to data science today.

Physics, chemistry, biology and other natural science disciplines have long been practicing their own version of data science.  In physics, for example, “a theory is expected to be complete in the sense a relationship among certain variables is intended to explain the phenomenon completely, with no exceptions. . .  In such domains, the explanatory and predictive models are synonymous.”  

But, given our newfound ability to gather valuable data on almost any topic, prediction can now apply to softer disciplines like the health and social sciences.  Dhar points out that while these fields generally lack solid theories “large amounts of data can result in accurate predictive models, even though no causal insights are immediately apparent.  As long as their prediction errors are small, they could still point us in the right direction for theory development.”

Finally, beyond access to the appropriate skills, are there cultural and management implications in embracing data science in the business world?

“Besides recognizing and nurturing the appropriate skill sets, it requires a shift in managers’ mind-sets toward data-driven decision making to replace or augment intuition and past practices.  A famous quote by 20th-century American statistician W. Edwards Demming - In God we trust, everyone else please bring data - has come to characterize the new orientation, from intuition-based decision making to fact-based decision making. . . It is suddenly possible to test many of their established intuitions, experiment cheaply and accurately, and base decisions on data.  This opportunity requires a fundamental shift in organizational culture, one seen in organizations that have embraced the emerging world of data for decision making.”


Things that matter

Among the many reasons you would want to become a data scientist is that you can make a positive contribution to society. Data science can give you some pretty super superpowers. One of them is reshaping industries like healthcare. The amount of data produced about patients and illnesses rises by the second, opening new opportunities for better structured and more informed healthcare. The challenge is to carefully analyze the data in order to be able to recognize problems quickly and accurately – like deepsense.ai did in diagnosing diabetic retinopathy with deep learning.

Did you know that deep learning can help predict dangerous seismic events and keep miners safe? Underground mining is fraught with threats including fires, methane outbreaks or seismic tremors and bumps. An automatic system for predicting and alerting against such dangers is of utmost importance – and also a great challenge for data scientists. Our deepsense.ai team created a machine learning model for the Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines, which was the winning solution, and one we take great pride in.

Another superpower is saving rare species. When you think of rescuing endangered animals, you see remote jungles and scientists chasing them. This is a stereotype that has changed a lot in recent years. Complex predictive models and algorithms can create insights that help scientists analyze threats to wildlife and create a solution that can save animals – all from the relative comfort of a desk. In fact, it was at our very desktops that we created the Facebook for whales, and It works with 87% accuracy!





Why do we need more data scientists and why should you become one?



According to LinkedIn’s 2017 U.S. Emerging Jobs Report, the number of data scientists has grown over 650% since 2012. Yet there are still too few people exploiting the opportunities in this field. Why has it grown so fast?

Companies need to use data to run and grow their everyday business. The fundamental goal of data science is to help companies make quicker and better decisions, which can take them to the top of their market, or at least – especially in the toughest red oceans – be a matter of long-term survival. The number of companies prepared to use big data is increasing. As Dresner Advisory Services laid out in their Big Data Analytics Market Study, forty percent of non-users expect to adopt big data in the next two years.

What is more, you can apply machine learning on smaller data sets, such as ones from a local company’s social media or shopping gift card history. This provides even more opportunities and increases the demand for data scientists. Job growth in the next decade is expected to exceed growth from the past ten years, creating 11.5M jobs by 2026, according to the U.S. Bureau of Labor Statistics. Companies are building up their data science teams to embrace data analytics and will make it integral to their success. Why are these analytics so important? Is it worth working for one of these companies? You will find the answer in the next two chapters.


Data science changes how decisions are made and companies are adapting a data-driven approach on a huge scale. Data-driven decisions made with advanced data analytics benefit all manner of company, from global behemoths to medium-sized companies down to local businesses looking to get ahead. Lack of data is rarely an issue – mountains of it are collected every single second, and we are beginning to understand the potential and influence it can have. Data sets in the right hands can help predict and shape the future.

The problem is getting data sets to mingle. It is the data scientist’s role to transform organisations from reactive environments with static and aged data, to automated ones that continuously learn in real time. Forecasts are simple – data is a valuable resource and investing in it will definitely pay off.

Tractica forecasts that worldwide revenue from deployments of AI software, hardware, and services will increase from $14.9 billion in 2017 to $23.6 billion in 2018, a year-over-year increase of 58%.

Do we need more data scientists?

Now, knowing that data science is in huge demand, you are probably wondering who is going to do all the work. Do we have enough data scientists? Maybe the market is already flush with experts. Nothing could be further from the truth – data scientists are few and far between, and highly sought after. IBM predicts demand for data scientists will soar 28% by 2020. Machine learning and data science are generating more jobs than there are experts to fill them, which is why these two fields are the fastest growing tech employment areas today.

Why should you become a data scientist?

Let’s start from the bottom of Maslow’s pyramid of human needs, which you secure with money. According to Glassdoor, in 2016 data science was the single highest paid profession. If data is money, as they say, then this should come as no surprise. The combination of skills necessary to do data science the right way is not common. The good news, however, is that if you want to become a data scientist and are willing to develop yourself, you are very likely to succeed. A background in mathematics, statistics or physics is a good foundation to build upon. You don’t necessarily need to have finished a data science program. We write a lot about learning methods on our blog, which you’ll see if you read our next post. Sign up for our newsletter if you would like to be updated.

Make the world easier

Besides its financial and economic aspects, data science is simply a fascinating discipline, one which affects many areas of our everyday lives and makes the world a better place. We already use it in many fields, such as quick and easy customer service, intelligent navigation, recommendations and voice-to-text. You can even improve the resolution of an image with deep learning.
We don’t have space enough to chronicle as of the ways that data science is improving people’s lives. It is indispensable to the banking sector as it is used to detect fraud by analyzing the behavior of financial institutions in real time. Elsewhere, robots will be used to help the elderly and the disabled gain mobility and independence. Data science makes these breakthroughs accessible to individuals, solves social problems and modernizes business. Most importantly, you can take part in the revolution data science is bringing about.


If we consider that the population of the US is close to 400 million, this would effectively mean that well over half a percent of the US population will be in data science by 2020! Even now, data science jobs typically remain open for 45 days, which is “five days longer than the market average” (Burning Glass Technologies, 2017). This means that it’s 10% harder for companies to fill these roles, which tells us that the personnel are currently not there.

“By 2020, the number of jobs for all US data professionals will increase by 364,000 openings to 2,720,000 according to IBM.”

As data grows exponentially, it stands to reason that we will need more data scientists to produce the tools that can handle it. This might make us believe that a career in data science is absolutely safe from automation.

This, however, is not the whole story – the better we get at producing tools to cope with data, the more automated we make our own field. There are robots that can handle the entire Data Science Process, without any need for human interaction.

So, will data scientists get replaced by their own algorithms?

Hadelin and Kirill have discussed this exact matter with Jeremy Achin, CEO of DataRobot. Achin and his team have created a self-serve ML platform where anyone can upload their data for a machine to process. These robots are hardly rudimentary. They allow you to get insights not only from your inputted data but also for a specific problem in need of a solution. DataRobot is one of many companies automating data analysis.

We know the way that the field is moving, and we don’t want to sell you any false dreams that, if you become a data scientist, your job will be 100% secure. The way to keep ahead of the game is to understand that while a data robot may indeed push out the number crunchers, data science is also reliant on people dreaming up new ways to capture, store and process information.

A machine might be able to do things more quickly, and even to identify the optimal solution for a task, but it cannot (yet) devise entirely new approaches to a data science project.

Getting into the field right now is the best time to upskill yourself to a level where you aren’t doing the ‘at risk’ tasks, protecting you against the threat of automation. The best advice we can give is therefore to focus on strengthening your creativity.

For more insights into future-proofing your career, there is only one real advice to follow: keep yourself curious, keep yourself hungry and learn as much as you possibly can. Even though there is a data tidal wave moving this way, we don’t want to alarm you, with it comes hundreds of new and exciting opportunities. You only need to be prepared and the best way is to get started right away!

To Conclude:

Robots will become smarter, and data science isn’t entirely immune to automation. But we also need data scientists to understand the mechanics of these robots, bots and algorithms: how do their functions work? how they can be managed? how can we get actionable outputs for companies? We also need thinkers – people who can take on the creative tasks that machines are less capable of handling. Get into the game early, and you’ll be streaks ahead when automation starts to happen.

There are so many ways to get started, and you can do so today: Why not take a course, grab a couple of good books, become an intern, practice with real-world datasets, or help out in a citizen science project? Even if you’re ‘only’ educating yourself at this stage, you’re still keeping yourself in the game and in the community. And if you’re just biding your time, don’t. Companies are looking for you, and they’re willing to pay you handsomely.


How do you feel about the future of data science as a discipline? Where do you see the most opportunity for data scientists? If you’re already a practitioner, do you have a different view of any of these questions? Let us know, and let’s get the conversation going!



Today, data is everywhere. Over the centuries, the world has generated huge data.

In fact, 80% of data comes from the web in various form. From the social channels to the blogs, images, sounds, and videos.

Every minute, the world generates yottabytes of data.

data generated every secondSource: Data Science Central

In a case study, IBM estimates 90% of the world’s data generated in the past few years. This means every day, the world creates 2.5 quintillion bytes of data.

The amounts of data are so huge that, organizations are struggling to handle it. They need technical experts to extract the powerful insights and make smarter business decisions.

To address this challenge, the Data Science emerged as hottest professions. Since 2012, the data science exploded in the data analytics industry.


According to LinkedIn’s 2017 Emerging Jobs Report, data scientist job demand has grown over 650% since 2012. It’s the same year that Harvard Business Review reported it “Sexiest job of the 21st century.”

As a matter of fact, the demand for data scientists is increasing over a short period.

McKinsey reported that by 2018, we will see 50% gap in the data scientists supply versus demand. And, the United States alone will experience the shortage of 190,000 skilled data scientists.

Hence, the data science job will continue to grow over the next decade. We think this is the right time to advance your career as data professionals.




hen Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.”
Goldman, a PhD in physics from Stanford, was intrigued by the linking he did see going on and by the richness of the user profiles. It all made for messy data and unwieldy analysis, but as he began exploring people’s connections, he started to see possibilities. He began forming theories, testing hunches, and finding patterns that allowed him to predict whose networks a given profile would land in. He could imagine that new features capitalizing on the heuristics he was developing might provide value to users. But LinkedIn’s engineering team, caught up in the challenges of scaling up the site, seemed uninterested. Some colleagues were openly dismissive of Goldman’s ideas. Why would users need LinkedIn to figure out their networks for them? The site already had an address book importer that could pull in all a member’s connections.

Luckily, Reid Hoffman, LinkedIn’s cofounder and CEO at the time (now its executive chairman), had faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high degree of autonomy. For one thing, he had given Goldman a way to circumvent the traditional product release cycle by publishing small modules in the form of ads on the site’s most popular pages.

Through one such module, Goldman started to test what would happen if you presented users with names of people they hadn’t yet connected with but seemed likely to know—for example, people who had shared their tenures at schools and workplaces. He did this by ginning up a custom ad that displayed the three best new matches for each user based on the background entered in his or her LinkedIn profile. Within days it was obvious that something remarkable was taking place. The click-through rate on those ads was the highest ever seen. Goldman continued to refine how the suggestions were generated, incorporating networking ideas such as “triangle closing”—the notion that if you know Larry and Sue, there’s a good chance that Larry and Sue know each other. Goldman and his team also got the action required to respond to a suggestion down to one click.


The shortage of data scientists is becoming a serious constraint in some sectors.


It didn’t take long for LinkedIn’s top managers to recognize a good idea and make it a standard feature. That’s when things really took off. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.

A New Breed

Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.

Much of the current enthusiasm for big data focuses on technologies that make taming it possible, including Hadoop (the most widely used framework for distributed file system processing) and related open-source tools, cloud computing, and data visualization. While those are important breakthroughs, at least as important are the people with the skill set (and the mind-set) to put them to good use. On this front, demand has raced ahead of supply. Indeed, the shortage of data scientists is becoming a serious constraint in some sectors. Greylock Partners, an early-stage venture firm that has backed companies such as Facebook, LinkedIn, Palo Alto Networks, and Workday, is worried enough about the tight labor pool that it has built its own specialized recruiting team to channel talent to businesses in its portfolio. “Once they have data,” says Dan Portillo, who leads that team, “they really need people who can manage it and find insights in it.”

Who Are These People?

If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive. None of those tasks is as straightforward as it is with other, established organizational roles. Start with the fact that there are no university programs offering degrees in data science. There is also little consensus on where the role fits in an organization, how data scientists can add the most value, and how their performance should be measured.

The first step in filling the need for data scientists, therefore, is to understand what they do in businesses. Then ask, What skills do they need? And what fields are those skills most readily found in?

More than anything, what data scientists do is make discoveries while swimming in data. It’s their preferred method of navigating the world around them. At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible. They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set. In a competitive landscape where challenges keep changing and data never stop flowing, data scientists help decision makers shift from ad hoc analysis to an ongoing conversation with data.

Data scientists realize that they face technical limitations, but they don’t allow that to bog down their search for novel solutions. As they make discoveries, they communicate what they’ve learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling. They advise executives and product managers on the implications of the data for products, processes, and decisions.

Given the nascent state of their trade, it often falls to data scientists to fashion their own tools and even conduct academic-style research. Yahoo, one of the firms that employed a group of data scientists early on, was instrumental in developing Hadoop. Facebook’s data team created the language Hive for programming Hadoop projects. Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and refined the tool kit.

What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful—and rare.



Further Reading


Big Data: The Management Revolution
Competitive Strategy Feature
  • Andrew McAfee and Erik Brynjolfsson
The challenges of becoming a big data–enabled organization require hands-on—or in some cases hands-off—leadership.



Data scientists’ most basic, universal skill is the ability to write code. This may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both.

But we would say the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested. This often entails the associative thinking that characterizes the most creative scientists in any field. For example, we know of a data scientist studying a fraud problem who realized that it was analogous to a type of DNA sequencing problem. By bringing together those disparate worlds, he and his team were able to craft a solution that dramatically reduced fraud losses.

Perhaps it’s becoming clear why the word “scientist” fits this emerging role. Experimental physicists, for example, also have to design equipment, gather data, conduct multiple experiments, and communicate their results. Thus, companies looking for people who can work with complex data have had good luck recruiting among those with educational and work backgrounds in the physical or social sciences. Some of the best and brightest data scientists are PhDs in esoteric fields like ecology and systems biology. George Roumeliotis, the head of a data science team at Intuit in Silicon Valley, holds a doctorate in astrophysics. A little less surprisingly, many of the data scientists working in business today were formally trained in computer science, math, or economics. They can emerge from any field that has a strong data and computational focus.

It’s important to keep that image of the scientist in mind—because the word “data” might easily send a search for talent down the wrong path. As Portillo told us, “The traditional backgrounds of people you saw 10 to 15 years ago just don’t cut it these days.” A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed. A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data—and also not at actually analyzing the data. And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.

How to Find the Data Scientists You Need


1. Focus recruiting at the “usual suspect” universities (Stanford, MIT, Berkeley, Harvard, Carnegie Mellon) and also at a few others with proven strengths: North Carolina State, UC Santa Cruz, the University of Maryland, the University of Washington, and UT Austin.

2. Scan the membership rolls of user groups devoted to data science tools. The R User Groups (for an open-source statistical tool favored by data scientists) and Python Interest Groups (for PIGgies) are good places to start.

3. Search for data scientists on LinkedIn—they’re almost all on there, and you can see if they have the skills you want.

4. Hang out with data scientists at the Strata, Structure:Data, and Hadoop World conferences and similar gatherings (there is almost one a week now) or at informal data scientist “meet-ups” in the Bay Area; Boston; New York; Washington, DC; London; Singapore; and Sydney.

5. Make friends with a local venture capitalist, who is likely to have gotten a variety of big data proposals over the past year.

6. Host a competition on Kaggle or TopCoder, the analytics and coding competition sites. Follow up with the most-creative entrants.

7. Don’t bother with any candidate who can’t code. Coding skills don’t have to be at a world-class level but should be good enough to get by. Look for evidence, too, that candidates learn rapidly about new technologies and methods.

8. Make sure a candidate can find a story in a data set and provide a coherent narrative about a key data insight. Test whether he or she can communicate with numbers, visually and verbally.

9. Be wary of candidates who are too detached from the business world. When you ask how their work might apply to your management challenges, are they stuck for answers?

10. Ask candidates about their favorite analysis or insight and how they are keeping their skills sharp. Have they gotten a certificate in the advanced track of Stanford’s online Machine Learning course, contributed to open-source projects, or built an online repository of code to share (for example, on GitHub)?


Read more

Roumeliotis was clear with us that he doesn’t hire on the basis of statistical or analytical capabilities. He begins his search for data scientists by asking candidates if they can develop prototypes in a mainstream programming language such as Java. Roumeliotis seeks both a skill set—a solid foundation in math, statistics, probability, and computer science—and certain habits of mind. He wants people with a feel for business issues and empathy for customers. Then, he says, he builds on all that with on-the-job training and an occasional course in a particular technology.

Several universities are planning to launch data science programs, and existing programs in analytics, such as the Master of Science in Analytics program at North Carolina State, are busy adding big data exercises and coursework. Some companies are also trying to develop their own data scientists. After acquiring the big data firm Greenplum, EMC decided that the availability of data scientists would be a gating factor in its own—and customers’—exploitation of big data. So its Education Services division launched a data science and big data analytics training and certification program. EMC makes the program available to both employees and customers, and some of its graduates are already working on internal big data initiatives.


Data scientists want to build things, not just give advice. One describes being a consultant as “the dead zone.”


As educational offerings proliferate, the pipeline of talent should expand. Vendors of big data technologies are also working to make them easier to use. In the meantime one data scientist has come up with a creative approach to closing the gap. The Insight Data Science Fellows Program, a postdoctoral fellowship designed by Jake Klamka (a high-energy physicist by training), takes scientists from academia and in six weeks prepares them to succeed as data scientists. The program combines mentoring by data experts from local companies (such as Facebook, Twitter, Google, and LinkedIn) with exposure to actual big data challenges. Originally aiming for 10 fellows, Klamka wound up accepting 30, from an applicant pool numbering more than 200. More organizations are now lining up to participate. “The demand from companies has been phenomenal,” Klamka told us. “They just can’t get this kind of high-quality talent.”

Why Would a Data Scientist Want to Work Here?

Even as the ranks of data scientists swell, competition for top talent will remain fierce. Expect candidates to size up employment opportunities on the basis of how interesting the big data challenges are. As one of them commented, “If we wanted to work with structured data, we’d be on Wall Street.” Given that today’s most qualified prospects come from nonbusiness backgrounds, hiring managers may need to figure out how to paint an exciting picture of the potential for breakthroughs that their problems offer.

Pay will of course be a factor. A good data scientist will have many doors open to him or her, and salaries will be bid upward. Several data scientists working at start-ups commented that they’d demanded and got large stock option packages. Even for someone accepting a position for other reasons, compensation signals a level of respect and the value the role is expected to add to the business. But our informal survey of the priorities of data scientists revealed something more fundamentally important. They want to be “on the bridge.” The reference is to the 1960s television show Star Trek, in which the starship captain James Kirk relies heavily on data supplied by Mr. Spock. Data scientists want to be in the thick of a developing situation, with real-time awareness of the evolving set of choices it presents.

Considering the difficulty of finding and keeping data scientists, one would think that a good strategy would involve hiring them as consultants. Most consulting firms have yet to assemble many of them. Even the largest firms, such as Accenture, Deloitte, and IBM Global Services, are in the early stages of leading big data projects for their clients. The skills of the data scientists they do have on staff are mainly being applied to more-conventional quantitative analysis problems. Offshore analytics services firms, such as Mu Sigma, might be the ones to make the first major inroads with data scientists.



But the data scientists we’ve spoken with say they want to build things, not just give advice to a decision maker. One described being a consultant as “the dead zone—all you get to do is tell someone else what the analyses say they should do.” By creating solutions that work, they can have more impact and leave their marks as pioneers of their profession.

Care and Feeding

Data scientists don’t do well on a short leash. They should have the freedom to experiment and explore possibilities. That said, they need close relationships with the rest of the business. The most important ties for them to forge are with executives in charge of products and services rather than with people overseeing business functions. As the story of Jonathan Goldman illustrates, their greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes.

LinkedIn isn’t the only company to use data scientists to generate ideas for products, features, and value-adding services. At Intuit data scientists are asked to develop insights for small-business customers and consumers and report to a new senior vice president of big data, social design, and marketing. GE is already using data science to optimize the service contracts and maintenance intervals for industrial products. Google, of course, uses data scientists to refine its core search and ad-serving algorithms. Zynga uses data scientists to optimize the game experience for both long-term engagement and revenue. Netflix created the well-known Netflix Prize, given to the data science team that developed the best way to improve the company’s movie recommendation system. The test-preparation firm Kaplan uses its data scientists to uncover effective learning strategies.


Data scientists today are akin to the Wall Street “quants” of the 1980s and 1990s.


There is, however, a potential downside to having people with sophisticated skills in a fast-evolving field spend their time among general management colleagues. They’ll have less interaction with similar specialists, which they need to keep their skills sharp and their tool kit state-of-the-art. Data scientists have to connect with communities of practice, either within large firms or externally. New conferences and informal associations are springing up to support collaboration and technology sharing, and companies should encourage scientists to become involved in them with the understanding that “more water in the harbor floats all boats.”

Data scientists tend to be more motivated, too, when more is expected of them. The challenges of accessing and structuring big data sometimes leave little time or energy for sophisticated analytics involving prediction or optimization. Yet if executives make it clear that simple reports are not enough, data scientists will devote more effort to advanced analytics. Big data shouldn’t equal “small math.”

The Hot Job of the Decade

Hal Varian, the chief economist at Google, is known to have said, “The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”

If “sexy” means having rare qualities that are much in demand, data scientists are already there. They are difficult and expensive to hire and, given the very competitive market for their services, difficult to retain. There simply aren’t a lot of people with their combination of scientific background and computational and analytical skills.

Data scientists today are akin to Wall Street “quants” of the 1980s and 1990s. In those days people with backgrounds in physics and math streamed to investment banks and hedge funds, where they could devise entirely new algorithms and data strategies. Then a variety of universities developed master’s programs in financial engineering, which churned out a second generation of talent that was more accessible to mainstream firms. The pattern was repeated later in the 1990s with search engineers, whose rarefied skills soon came to be taught in computer science programs.

One question raised by this is whether some firms would be wise to wait until that second generation of data scientists emerges, and the candidates are more numerous, less expensive, and easier to vet and assimilate in a business setting. Why not leave the trouble of hunting down and domesticating exotic talent to the big data start-ups and to firms like GE and Walmart, whose aggressive strategies require them to be at the forefront?

The problem with that reasoning is that the advance of big data shows no signs of slowing. If companies sit out this trend’s early days for lack of talent, they risk falling behind as competitors and channel partners gain nearly unassailable advantages. Think of big data as an epic wave gathering now, starting to crest. If you want to catch it, you need people who can surf.




Welcome to LikeInMind! 



Movies, Television, Music ...


Jobs, Real Estate, Investing...


Internet, Software, Hardware...


Video Games, RPGs, Gambling...


Fitness, Medicine, Alternative...


Family, Consumers, Cooking...

Kids and Teens

Arts, School Time, Teen Life...


Media, Newspapers, Weather...


Travel, Food, Outdoors, Humor...


Maps, Education, Libraries...


NZRussia, US, ...


Biology, Psychology, Physics...


Clothing, Food, Gifts...


People, Religion, Issues...


Baseball, Soccer, Basketball...


Català, Česky, Dansk, Deutsch, Español, Esperanto, Français, Galego, Hrvatski, Italiano, Lietuvių, Magyar, Nederlands, Norsk, Polski, Português, Română, Slovensky, Suomi, Svenska, Türkçe, Български, Ελληνικά, Русский, Українська, עברית‭, ‬العربية, ไทย, 日本語, 简体中文, 繁體中文, …




"We do stuff ... and tell stories..." Esteban Trev 

"Persistence and Consistency (in following expectations)" David Braden

"Intent. Patience. Persistence"   Michael Josefowicz 

World Brain 2 
Online Planning Game 
SkillMatcher NZ 
 Language Metaphor MJ +ET collaboration  
What leadership means - STN
The Creative Mind
From  Academic Tome to Commercial Option

Skill Matcher

MJ Home



A                     AK           AR
D                                   DVS              
E                                     ET
M                   MJ






http://i.creativecommons.org/l/by-nc-sa/2.5/ca/88x31.png This Wiki is licensed CC-BY-NC-SA -
Creative Commons

Attribution-Noncommercial-Share Alike 3.0 License.
Authors, learn more about your rights.

Comments (0)

You don't have permission to comment on this page.