Sunday, August 5. 2012
-
An interesting comparison of Hadoop today with the Linux story of the past. If the analogy holds, Hadoop/MapReduce would be state of the art around 2020.
tags: bigdata cloud technology linux
-
While Hadoop is all the rage in the technology media today, it has barely scratched the surface of enterprise adoption
-
Hadoop seems set to win despite its many shortcomings
-
still in the transition from zero per cent adoption to one per cent adoption
-
IBM points to a few specific deficiencies
-
lack of performance and scalability, inflexible resource management, and a limitation to a single distributed file system
-
IBM, of course, promises to resolve these issues with its proprietary complements to Hadoop
-
Hadoop is batch oriented in a world increasingly run in real-time
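Since “batch oriented” is easiest to see in code, here is a minimal sketch of the canonical Hadoop word-count job, close to the example in the Hadoop tutorial: it reads a finite input path, runs to completion, and only then publishes its results, with no way to fold in new data as it arrives.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in this input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // finite input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // written only on completion
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Anything that arrives after the job starts is invisible until the next run, which is exactly the gap the real-time alternatives aim at.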
-
customers are buying big into Hadoop
-
it’s still possible that other alternatives, like Percolator, will claim the Hadoop crown
-
Back in 2000 IBM announced that it was going to invest $1bn in advancing the Linux operating system. This was big news
-
it came roughly 10 years after Linus Torvalds released the first Linux source code, and it took another 10 years before Linux really came to dominate the industry
-
The same seems true of Hadoop today
-
we’re just starting the marathon
-
Three data-wrangling job sites, all US-only
tags: bigdata career
-
Bright, one of several new companies
-
another new job site, Path.to
-
Gild, a third major player
-
How the potential insights from big data differ from what managers get from traditional analytics, summarized in three points.
tags: bigdata datascience technology opinion
-
Thomas H. Davenport, Paul Barth and Randy Bean
-
how do the potential insights from big data differ from what managers generate from traditional analytics?
-
1. Paying attention to flows as opposed to stocks
-
the data is not the “stock” in a data warehouse but a continuous flow
-
organizations will need to develop continuous processes
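By contrast with periodic batch loads, a “flow” process updates its answers as each event arrives. A toy sketch of the idea (my own illustration, not from the article), assuming events arrive as lines on standard input:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    /** Keeps a running count per event key and reports as data arrives,
     *  instead of waiting for the next warehouse load to finish. */
    public class RunningCounts {
      public static void main(String[] args) throws Exception {
        Map<String, Long> counts = new HashMap<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {  // unbounded stream of events
          String key = line.trim();
          long seen = counts.merge(key, 1L, Long::sum);
          System.out.println(key + " -> " + seen); // insight available immediately
        }
      }
    }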
-
data extraction, preparation and analysis took weeks to prepare — and weeks more to execute
-
conventional, high-certitude approaches to decision-making are often not appropriate
-
new data is often available that renders the decision obsolete
-
2. Relying on data scientists and product and process developers as opposed to data analysts
-
the people who work with big data need substantial and creative IT skills
-
programming, mathematical and statistical skills, as well as business acumen and the ability to communicate effectively
-
started an educational offering for data scientists
-
3. Moving analytics from IT into core business and operational functions
-
new products designed to deal with big data
-
Relational databases have also been transformed
-
Statistical analysis packages
-
“virtual data marts” allow data scientists to share existing data without replicating it
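The article doesn’t spell out the mechanics, but the simplest form of a “virtual data mart” is a database view: a stored query over existing warehouse tables that analysts can share and query like a table, without copying a single row. A hedged sketch; the connection URL, credentials, and table names below are made up, and a JDBC driver for the warehouse is assumed to be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class VirtualMart {
      public static void main(String[] args) throws Exception {
        // Connection details are placeholders for an existing warehouse.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://warehouse.example.com/prod", "analyst", "secret");
             Statement stmt = conn.createStatement()) {
          // A view is just a stored query over existing tables: no rows
          // are replicated, yet analysts can share and query it directly.
          stmt.execute("CREATE VIEW retail_mart AS "
              + "SELECT customer_id, SUM(amount) AS total_spend "
              + "FROM transactions GROUP BY customer_id");
        }
      }
    }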
-
the traditional role of IT, automating business processes, imposes precise requirements
-
Analytics has been more of an afterthought for monitoring processes
-
where the old business and IT capabilities were stability and scale, the new advantages are based on discovery and agility
-
discovery and analysis as the first order of business
-
IT processes and systems need to be designed for insight, not just automation
-
The title is misleading: it’s not about what data science is, but rather a vision of the ideal solution.
tags: datascience technology opinion
-
the old state and the ideal future state, which he calls “Analyst 1.0” and “Analyst 2.0”
-
Analyst 1.0 as the state of maturity achieved by using the last generation of business intelligence tools
-
Analyst 1.0 has some coding skills, and perhaps writes an SQL query here and there
-
inflexibility of data warehouses and relational databases
-
Our current state of affairs, which we’ll call Analyst 1.5, finds us in limbo
-
two primary limitations: the immense size and variety of the data, and the complexity of the tools needed
-
-
to get value from big data, business analysts cannot simply be presented with a programming language
-
Analyst 1.5 is characterized by a disconnect between data scientists and the tools and systems in the more complex camp of programmers and computer scientists
-
caused data to be totally fragmented
-
Analyst 2.0 will have arrived when vendors and IT make analysis easy enough that a typical business user can conduct analysis entirely by themselves
-
Tools such as self-learning recommendations engines
-
demands new skills, such as a more precise focus on aberrant or statistically significant data in a stream, as well as better tools
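To make “focus on aberrant or statistically significant data in a stream” concrete, here is a small sketch of one standard approach (my illustration, not from the article): Welford’s online algorithm maintains a running mean and variance in a single pass and flags values more than three standard deviations out.

    import java.util.Random;

    /** Flags stream values more than three standard deviations from the running mean. */
    public class StreamOutlierFlag {
      private long n = 0;
      private double mean = 0.0;
      private double m2 = 0.0; // sum of squared deviations (Welford's algorithm)

      /** Feed one observation; returns true if it looks aberrant. */
      public boolean observe(double x) {
        // Only flag after a short warm-up, using the stats seen so far.
        boolean aberrant = n > 30 && Math.abs(x - mean) > 3 * Math.sqrt(m2 / (n - 1));
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return aberrant;
      }

      public static void main(String[] args) {
        StreamOutlierFlag flag = new StreamOutlierFlag();
        Random rnd = new Random(42);
        for (int i = 0; i < 1000; i++) {
          double x = rnd.nextGaussian(); // simulated normal traffic
          if (i == 500) x = 10.0;        // injected anomaly
          if (flag.observe(x)) {
            System.out.println("aberrant value at i=" + i + ": " + x);
          }
        }
      }
    }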
-
somehow at some point you have to get your analytical inspection down to the equivalent of code level
-
what we’re trying to model is every person’s brain, at least the part of the brain that decides how to shop, when to shop, and what you want
-
we need to continue to mine for behavioral data, such as what people looked at before and after they made transactions
-
among the top pitfalls is the tendency to focus on a very small piece of data without occasionally stepping back
-
tendency to over-focus on technology
-
organizations are tempted to put the most technology-savvy person on the job, rather than the most business-savvy
-
computer scientists are not trained to ask the right business questions