(Note: This is a rant, so what I’m trying to say is possibly written between the lines.)
In 2012, three years ago, I was working in computer vision, teaching computers to see. While this is still a fascinating field that yields exciting technology, no one is really making money in it so far, because its products are solutions looking for a problem: there is no itch to be scratched. Our department was selling tunnel surveillance systems to the traffic industry, which was quite a niche market and did little to get our company out of financial trouble. So I started a learning phase, digging deeper into that machine learning thing. I pictured myself, a few years on, as a technical expert known for bringing complex theoretical concepts to life in successful solutions, at a place where such skills actually pay off.
While doing general research, I gathered more and more knowledge about the hot new field called data science, a magical mixture of statistical modeling and modern computer technology applied to business. Since the media named IBM as one of the front-row players, I got in touch with their local office. And they really hired me! However, I found myself placed on the wrong track: I was expected to make sure that others did the work I was interested in doing, to generate projects, to take proposals from zero to signing, and to tell bank reps that they had to understand their customers as individuals to compete in today’s market. They definitely did not need me there as a mathematician who knew data mining algorithms. They needed business economists, marketers and salespeople who understood their industries. The actual work I was interested in, hacking fancy predictive models, would be delivered by folks at external business partners. How could that have happened? Apparently both sides had different expectations and interpretations. So I was immediately job hunting again, and data science disappeared from my career radar on my way back to the software engineering world.
At my current employer, I’m somewhat known as the guy who knows about big data (although I have never tried Hadoop) and data mining (although some of my coworkers are “real” statisticians). But over the past few months I have concluded that all this data science is just one good old thing: marketing. The big part that actually defines data science is not explained by its name at all: it is exclusively about solving business problems. Data mining, on the other hand, is interpreted in different ways. I, too, was blinded by what tech people see when they are hit with this buzzword: Hadoop, MapReduce, statistical algorithms and other fancy, formula-heavy or technological stuff, applied to data of manifold origin. The business folks, however, have the marketing interpretation:
Data mining is finding more people to sell stuff to.
Data mining is market basket analysis (what stuff people buy), upselling (more expensive stuff), cross-selling (other, additional stuff), and understanding a company’s customers (people who buy stuff) to prepare marketing campaigns (telling people to buy stuff). Hey, business analyst, find more people to sell our stuff to! Oh, you’re a data scientist? Well, what difference does it make? Find more people: they might be customers already, possibly thinking about leaving us, or they might not be our customers just yet. Or maybe create a new product. Data mining is also about creating more stuff to sell to more people.
So be careful not to confuse data science with data mining. As a data scientist, you won’t just be programming in R, cleaning data, analyzing data, doing statistical inference, or creating data products. Someone who wants to hire a data scientist is looking for a business professional who points at data in a spreadsheet and tells CEOs how they should transform their company. Every now and then, a headhunter approaches me about a position involving “[…] acting as a partner for marketing executives and collaborating with colleagues in management accounting […] Developing procedures to measure marketing campaigns on a global level together with managers and executives in marketing and sales […] identify new business opportunities […] Demonstrate business acumen […]”.
Only rarely does it go like “[…] work with complex, varied, high-volume data sets that have real meaning for our customers’ health and wellbeing […] Identify patterns and correlations of a user’s fitness data […] Good statistical, mathematical and predictive modelling skills to build the algorithms […]”. Wait, what, Runtastic are Austrian!? (Or rather: Runtastic are awesome although they are Austrian!?)
Maybe the topic will come back to me once that pile of sensor data has grown higher and the internet of things takes off. But I’m not in my twenties anymore, so the doors and gaps to slip through have become narrower.