<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
   xmlns:atom="http://www.w3.org/2005/Atom"
   xmlns:sc="http://podlove.org/simple-chapters"
>
<channel>
     

<itunes:subtitle>Stephan Paukner :: syslog</itunes:subtitle>
<itunes:author>Stephan Paukner :: syslog</itunes:author>
<itunes:summary>#include&amp;lt;rant.h&amp;gt;</itunes:summary>
<itunes:image href="http://stephan.paukner.cc/syslog/itunes.jpg" />
<itunes:category text="Technology" />                
                
    <title>Stephan Paukner :: syslog - Projects</title>
    <link>https://stephan.paukner.cc/syslog/</link>
    <description>#include&amp;lt;rant.h&amp;gt;</description>
    <dc:language>en</dc:language>
    <admin:errorReportsTo rdf:resource="mailto:paux+www@paukner.cc" />
    <generator>Serendipity 2.5.0 - http://www.s9y.org/</generator>
    <pubDate>Fri, 15 Dec 2017 08:17:47 GMT</pubDate>

    <image>
    <url>https://stephan.paukner.cc/syslog/templates/2k11/img/s9y_banner_small.png</url>
    <title>RSS: Stephan Paukner :: syslog - Projects - #include&amp;lt;rant.h&amp;gt;</title>
    <link>https://stephan.paukner.cc/syslog/</link>
    <width>100</width>
    <height>21</height>
</image>

<item>
    <title>Data science = marketing + advertising</title>
    <link>https://stephan.paukner.cc/syslog/archives/463-Data-science-marketing-+-advertising.html</link>
            <category>Data Science</category>
            <category>Personal</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/463-Data-science-marketing-+-advertising.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=463</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=463</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;p&gt;(Note: This is a follow-up rant to &lt;a href=&quot;https://stephan.paukner.cc/syslog/archives/447-A-pessimistic-conclusion-about-what-data-science-means.html&quot;&gt;my pessimistic conclusion about what data science means&lt;/a&gt; from two years ago.)&lt;/p&gt;

&lt;p&gt;Let me come straight to the point: Data science is synonymous with marketing. Period. Do not let yourself be misled by online data science/machine learning/statistics lectures which only cover topics from the mathematical or programming area. These will only take up 5% of your work! The rest is 60% hot-air blowing, 50% managing/organizing and 40% delegating to juniors or externs, making that a total workload of 155% (60+ h/week), with expectations on you to generate projects and to build Big Data strategies. (After all, you&amp;#8217;re smart, right? Otherwise you wouldn&amp;#8217;t know that much math and &lt;span title=&quot;Artificial Intelligence&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;AI&lt;/span&gt;. You&amp;#8217;re smart, thus you can land big projects, right?) And the area of application is advertising, the data is customers, the outcome is customers: more people buying more stuff.&lt;/p&gt;

&lt;p&gt;No, thanks for the offers, but I won&amp;#8217;t work in advertising.&lt;/p&gt;

&lt;p&gt;Who is actually making Big Data a hot topic? It&amp;#8217;s those who create these solutions in the first place and sell them to companies that use them in their marketing processes.&lt;/p&gt;

&lt;p&gt;&amp;#8220;Data is the new oil.&amp;#8221; Really? So, how do you power your cars&amp;#160;with data? BS! We&amp;#8217;re still living in the oil age.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s so sad that AI is not solving any problems. It&amp;#8217;s used for making game characters act smarter, it&amp;#8217;s used for more effective advertising, it&amp;#8217;s used for making call centers obsolete (by executing voice commands), it&amp;#8217;s used for shopping agents (by executing voice commands), it&amp;#8217;s used for autonomous driving of cars, vacuums and lawnmowers. It&amp;#8217;s entirely used for making us (fat first-world people) even more lazy (fat), even more consumeristic (fat) and even more entertained (fat). These are solutions looking for a problem, but they can&amp;#8217;t find one. Seriously, I see much more potential in the blockchain, which deserves its own rant soon.&lt;/p&gt;
 
    </content:encoded>

    <pubDate>Thu, 24 Aug 2017 19:17:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/463-guid.html</guid>
    <category>career</category>
    <category>rant</category>

</item>
<item>
    <title>A pessimistic conclusion about what data science means</title>
    <link>https://stephan.paukner.cc/syslog/archives/447-A-pessimistic-conclusion-about-what-data-science-means.html</link>
            <category>Data Science</category>
            <category>Personal</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/447-A-pessimistic-conclusion-about-what-data-science-means.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=447</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=447</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;p&gt;(Note: This is a rant, so what I&amp;#8217;m trying to say is possibly written between the lines.)&lt;/p&gt;

&lt;p&gt;In 2012, three years ago, I was working in the context of computer vision, teaching computers to see. While this is still an &lt;a href=&quot;https://stephan.paukner.cc/syslog/archives/72-Exciting-new-challenge.html&quot;&gt;exciting&lt;/a&gt; field, yielding exciting technology, no one is really making money there so far, because these are solutions looking for a problem: there is no itch to be scratched. Our department was selling tunnel surveillance systems to the traffic industry, which was quite a niche and didn&amp;#8217;t contribute to getting our company out of financial trouble. &lt;!--Sensing the collapse of our shack, --&gt;I started a learning phase, trying to get deeper into that machine learning thing, seeing myself as a technical expert in a few years, being known for bringing complex theoretical concepts to life in successful solutions, at a place where such skills throw off money.&lt;/p&gt;

&lt;p&gt;During generic research I &lt;a href=&quot;https://stephan.paukner.cc/syslog/plugin/tag/linkroll&quot;&gt;collected&lt;/a&gt; more and more knowledge about the new hot field called data science, a magical mixture of statistical modeling and modern computer technology with its application in business. Since the media mentioned IBM as a front-row player, I got in touch with their local office. And really, they hired me! However, I found myself placed onto the wrong track: I was expected to ensure that others do the work I was interested in doing, to generate projects, to take proposals from zero to signing, to tell bank reps that they had to understand their customers as individuals to compete in today’s market. I was definitely not needed as a mathematician with a knowledge of data mining algorithms there. They needed business economists, marketers and sellers with an understanding of industries. The actual work that I was interested in doing (hacking fancy predictive models) would be delivered by folks who work at external business partners. How could that have happened? Both sides seemed to have had different expectations and interpretations. So, I was immediately job hunting again, and data science disappeared from my career radar on my way back to the software engineering world.&lt;/p&gt;

&lt;p&gt;At my current employer, I&amp;#8217;m somewhat known as the guy who knows about big data (although I haven&amp;#8217;t ever tried &lt;span title=&quot;An open source software framework for data-intensive distributed applications&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Hadoop&lt;/span&gt;) and data mining (although some of my coworkers are &amp;#8220;real&amp;#8221; statisticians). But over recent months I concluded that all this data science is just one good old thing: marketing. The big part that actually defines data science is not explained by its name at all: It&amp;#8217;s definitely and exclusively solving business problems&lt;!--, namely those of sales--&gt;. Data &lt;em&gt;mining,&lt;/em&gt; on the other hand, has different interpretations. I, too, was blinded by what tech people see when hit with this buzzword: Hadoop, MapReduce, statistical algorithms, other fancy formula-heavy or technological stuff, applied to data of manifold origin. The business folks, however, have the marketing interpretation:&lt;/p&gt;

&lt;p&gt;Data mining is &lt;em&gt;finding more people to sell stuff to.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Data mining is market basket analysis (what stuff people buy), upselling (more expensive stuff), cross-selling (additional other stuff), understanding a company&amp;#8217;s customers (people who buy stuff) to prepare marketing campaigns (telling people to buy stuff). Hey, business analyst, find more people to sell our stuff to! Oh, you&amp;#8217;re a data scientist? Well, what difference does it make? Find more people—they might be customers already, possibly thinking about leaving us, or they aren&amp;#8217;t our customers just yet. Or, possibly create a new product. Data mining is also about &lt;em&gt;creating more stuff&lt;/em&gt; to sell to more people.&lt;/p&gt;

&lt;p&gt;So, be careful not to mistake data science for data mining. As a data scientist, you won&amp;#8217;t just practice &lt;!--a href=&quot;https://www.coursera.org/specialization/jhudatascience/1/courses&quot; target=&quot;_blank&quot;--&gt;R programming, cleaning data, data analysis, statistical inference, or creating data products&lt;!--/a--&gt;. If someone wants to hire a data scientist, they are looking for a business professional who, pointing at data in a spreadsheet, tells &lt;span title=&quot;Chief Executive Officer&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;CEO&lt;/span&gt;&lt;!--@--&gt;s how they should transform their company. See, sometimes, someone tries to headhunt me for &lt;em&gt;&amp;#8220;[…] acting as a partner for marketing executives and collaborating with colleagues in management accounting […] Developing procedures to measure marketing campaigns on a global level together with managers and executives in marketing and sales […] identify new business opportunities […] Demonstrate business acumen […]&amp;#8221;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Only rarely does it go like &lt;em&gt;&amp;#8220;[…] work with complex, varied, high-volume data sets that have real meaning for our customers’ health and wellbeing […] Identify patterns and correlations of a user&amp;#8217;s fitness data […] Good statistical, mathematical and predictive modelling skills to build the algorithms […]&amp;#8221;&lt;/em&gt; Wait, what, Runtastic are Austrian!? (Or rather: Runtastic are awesome &lt;em&gt;although&lt;/em&gt; they are Austrian!?)&lt;/p&gt;

&lt;p&gt;Maybe that topic comes back to me once that pile of sensor data has become higher and the internet of things takes off. But&lt;!--, unfortunately,--&gt; I&amp;#8217;m not in my twenties anymore, so the doors and clefts to slip through have become narrower.&lt;/p&gt;
 
    </content:encoded>

    <pubDate>Wed, 08 Jul 2015 18:34:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/447-guid.html</guid>
    <category>career</category>
    <category>rant</category>

</item>
<item>
    <title>Link roundup, week 32/2012</title>
    <link>https://stephan.paukner.cc/syslog/archives/420-Link-roundup,-week-322012.html</link>
            <category>Data Science</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/420-Link-roundup,-week-322012.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=420</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=420</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;ul class=&quot;diigo-linkroll&quot;&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://iianalytics.com/2012/08/a-strategic-mistake-with-big-data/&quot;&gt;A Strategic Mistake With Big Data — International Institute for Analytics&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;span class=&quot;author vcard&quot;&gt;&lt;span class=&quot;fn&quot;&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;http://iianalytics.com/author/bfranks_teradata/&quot; title=&quot;Bill Franks&quot; class=&quot;fn n&quot;&gt;Bill Franks&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;That mistake is the development of a siloed, distinct big data strategy&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;strategy for big data is a new facet of their overall enterprise data and analytic strategy&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the mess that many multi-channel retailers got themselves into through their entry into e-commerce&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;retailers launched distinct e-commerce divisions. Some were even separate legal entities. As opposed to viewing e-commerce as a new facet of an overall retail strategy&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;distinct processes and distinct infrastructure was created&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;provide a consistent experience for customers across channels&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;go into a store and grab a product and then find that same product on the retailer’s website.&amp;#160; Guess what? They have no way to match those products&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;separate, non-integrated strategies for big data will likely end up with systems and processes that are very difficult to integrate&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;integrate big data into the overall infrastructure and current and future analytic processes&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://news.cnet.com/8301-11386_3-57488781-76/machine-learning-system-can-id-cities-via-pics/&quot;&gt;Machine learning system can &lt;span title=&quot;Identification&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;ID&lt;/span&gt; cities via pics | Cutting Edge - CNET News&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/machinelearning&quot;&gt;machinelearning&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/computervision&quot;&gt;computervision&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datamining&quot;&gt;datamining&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/research&quot;&gt;research&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;The system automatically picks out relevant architectural details from photos&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the details woven into the urban fabric that form a pattern&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;computers are learning to ID your city just by looking at random photos&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Google Street View images of Paris, London, New York, and Barcelona&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;features like the street signs, balconies, and lampposts of Paris to be distinct&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;lack of stylistic coherence in American cities&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;presented at &lt;a rel=&quot;nofollow&quot; href=&quot;http://s2012.siggraph.org/&quot;&gt;Siggraph 2012&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;emerging field of visual data mining, which is more complex than looking for patterns in text or numbers&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Alexei Efros&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;we wish to automatically build a digital visual atlas of not only architectural but also natural geo-informative features for the entire planet&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://searchbusinessanalytics.techtarget.com/news/2240160935/Interviewing-data-scientist-candidates-Ask-these-questions&quot;&gt;Interviewing data scientist candidates? Ask these questions&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/career&quot;&gt;career&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&amp;#8220;It&amp;#8217;s way over-hyped,&amp;#8221; said Franks&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;chief analytics officer for the data warehouse appliance vendor Teradata&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&amp;#8220;softer&amp;#160;skills.&amp;#8221;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&amp;#8220;data artist.&amp;#8221;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;data scientists are those that are able to understand the business problem; they&amp;#8217;re able to apply&amp;#160;creativity and present the results well&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;intuition,&amp;#160;which is hard to teach&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;five core areas I look at&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Commitment&lt;/div&gt; 
&lt;/div&gt; 
&lt;ul class=&quot;diigo-sticky-notes&quot;&gt; 
&lt;li&gt; 
&lt;div&gt;1.&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Creativity&lt;/div&gt; 
&lt;/div&gt; 
&lt;ul class=&quot;diigo-sticky-notes&quot;&gt; 
&lt;li&gt; 
&lt;div&gt;2.&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;thought process they went through deciding what&amp;#160;to do. Someone who&amp;#8217;s not creative is going to give me a list of the steps they went through&amp;#160;one-by-one more from a technical perspective&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Business savvy&lt;/div&gt; 
&lt;/div&gt; 
&lt;ul class=&quot;diigo-sticky-notes&quot;&gt; 
&lt;li&gt; 
&lt;div&gt;3.&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;what I want to hear is not just some technical&amp;#160;reasons&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Knowing how much information to give to the non-technical people&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Presentation&lt;/div&gt; 
&lt;/div&gt; 
&lt;ul class=&quot;diigo-sticky-notes&quot;&gt; 
&lt;li&gt; 
&lt;div&gt;4.&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;specific presentation as part of the interview&amp;#160;process&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;intuition&lt;/div&gt; 
&lt;/div&gt; 
&lt;ul class=&quot;diigo-sticky-notes&quot;&gt; 
&lt;li&gt; 
&lt;div&gt;5.&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;art or music or some other type of creative area&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;/ul&gt; &lt;a class=&quot;block_level&quot; href=&quot;https://stephan.paukner.cc/syslog/archives/420-Link-roundup,-week-322012.html#extended&quot;&gt;Continue reading &quot;Link roundup, week 32/2012&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Sun, 12 Aug 2012 17:28:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/420-guid.html</guid>
    <category>linkroll</category>

</item>
<item>
    <title>Link roundup, week 31/2012</title>
    <link>https://stephan.paukner.cc/syslog/archives/418-Link-roundup,-week-312012.html</link>
            <category>Data Science</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/418-Link-roundup,-week-312012.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=418</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=418</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;ul class=&quot;diigo-linkroll&quot;&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.theregister.co.uk/2012/08/01/hadoop_will_only_get_bigger/&quot;&gt;&lt;span title=&quot;A popular open-source computer operating system (kernel)&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Linux&lt;/span&gt; lessons for &lt;span title=&quot;An open source software framework for data-intensive distributed applications&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Hadoop&lt;/span&gt; doubters • The Register&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;Interesting comparison of Hadoop today with the Linux story from the past. This could mean Hadoop/MapReduce becoming the state of the art around 2020.&lt;/p&gt;
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/cloud&quot;&gt;cloud&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/linux&quot;&gt;linux&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;While Hadoop is all the rage in the technology media today, it has barely scratched the surface of enterprise adoption&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop seems set to win despite its many shortcomings&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;still in the transition from zero per cent adoption to one per cent adoption&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;http://www.datanami.com/datanami/2012-07-16/top_5_challenges_for_hadoop_mapreduce_in_the_enterprise.html&quot;&gt;IBM points&lt;/a&gt; to a few specific deficiencies&lt;/div&gt;
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;lack of performance and scalability, inflexible resource management, and a limitation to a single distributed file system&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;IBM, of course, promises to resolve these issues with its proprietary complements to Hadoop&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop is batch oriented in a world increasingly run in real-time&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;customers are buying big into Hadoop&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;it&amp;#8217;s still possible that other alternatives, like Percolator, &lt;a rel=&quot;nofollow&quot; href=&quot;http://www.theregister.co.uk/2012/07/10/hadoop_past_its_prime/&quot;&gt;will claim the Hadoop crown&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Back in 2000 &lt;a rel=&quot;nofollow&quot; href=&quot;http://news.cnet.com/2100-1001-249750.html&quot;&gt;IBM announced&lt;/a&gt; that it was going to invest $1bn in advancing the Linux operating system. This was big news&lt;/div&gt;
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;it came roughly 10 years after Linus Torvalds released the first Linux source code, and it took another 10 years before Linux really came to dominate the industry&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;The same seems true of Hadoop today&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;we&amp;#8217;re just starting the marathon&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://management.fortune.cnn.com/2012/08/02/big-data-job-search/&quot;&gt;Can Big Data cut through your growing resume pile? - Fortune Management&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;Three data-mangling job sites, all only for the US&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/career&quot;&gt;career&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Bright, one of several new companies&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;another new job site, Path.to&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Gild, a third major player&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://sloanreview.mit.edu/the-magazine/2012-fall/54104/how-big-data-is-different/&quot;&gt;How &amp;#8216;Big Data&amp;#8217; Is Different&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Thomas H. Davenport, Paul Barth and Randy Bean&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;how do the potential insights from big data differ from what managers generate from traditional analytics?&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;1. Paying attention to flows as opposed to stocks&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the data is not the “stock” in a data warehouse but a continuous flow&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Streaming analytics&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;organizations will need to develop continuous processes&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;data extraction, preparation and analysis took weeks to prepare — and weeks more to execute&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;conventional, high-certitude approaches to decision-making are often not appropriate&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;new data is often available that renders the decision obsolete&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;2. Relying on data scientists and product and process developers as opposed to data analysts&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the people who work with big data need substantial and creative &lt;span title=&quot;Information Technology&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;IT&lt;/span&gt; skills&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;programming, mathematical and statistical skills, as well as business acumen and the ability to communicate effectively&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;EMC Corporation&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;started an educational offering for data scientists&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;3. Moving analytics from IT into core business and operational functions&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;new products designed to deal with big data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Relational databases have also been transformed&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Statistical analysis packages&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the cloud&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;“virtual data marts” allow data scientists to share existing data without replicating it&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;traditional role of IT, automating business processes, imposes precise requirements&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Analytics has been more of an afterthought for monitoring processes&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;discovery and analysis as the first order of business&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;IT processes and systems need to be designed for insight, not just automation&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.forbes.com/sites/danwoods/2012/07/25/paypals-mok-oh-on-what-is-a-data-scientist/print/&quot;&gt;PayPal&amp;#8217;s Mok Oh On What Is A Data Scientist? - Forbes&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;The title is misleading: it&amp;#8217;s not about what DS is, but rather a vision of the ideal solution.&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the old state and the ideal future state, which he calls “Analyst 1.0” and “Analyst 2.0,”&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Analyst 1.0 as the state of maturity achieved by using the last generation of business intelligence tools&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Analyst 1.0 has some coding skills, and perhaps writes an &lt;span title=&quot;Structured Query Language&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;SQL&lt;/span&gt; query here and there&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;inflexibility of data warehouses and relational databases&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Our current state of affairs, which we’ll call Analyst 1.5, finds us in limbo&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;two primary limitations: the immense size and variety of the data, and the complexity of the tools needed&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;to get value from big data, business analysts cannot simply be presented with a programming language&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Analyst 1.5 is characterized by a disconnect between data scientists and the tools and systems in the more complex camp of programmers and computer scientists&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;caused data to be totally fragmented&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Analyst 2.0 will have arrived when vendors and IT make analysis easy enough that a typical business user can conduct analysis entirely by themselves&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Tools such as self-learning recommendations engines&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;demands new skills, such as a more precise focus on aberrant or statistically significant data in a stream, as well as better tools&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;somehow at some point you have to get your analytical inspection down to the equivalent of code level&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;what we’re trying to model is every person’s brain–at least the part of the brain that decides how to shop, when to shop, and what you want&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;we need to continue to mine for behavioral data, such as what people looked at before and after they made transactions&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;among the top pitfalls is the tendency to focus on a very small piece of data without occasionally stepping back&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;tendency to over-focus on technology&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;organizations are tempted to put the most technology-savvy person on the job, rather than the most business-savvy&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;computer scientists are not trained to ask the right business questions&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;/ul&gt; &lt;a class=&quot;block_level&quot; href=&quot;https://stephan.paukner.cc/syslog/archives/418-Link-roundup,-week-312012.html#extended&quot;&gt;Continue reading &quot;Link roundup, week 31/2012&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Sun, 05 Aug 2012 12:08:13 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/418-guid.html</guid>
    <category>linkroll</category>

</item>
<item>
    <title>Link roundup, week 30/2012</title>
    <link>https://stephan.paukner.cc/syslog/archives/416-Link-roundup,-week-302012.html</link>
            <category>Data Science</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/416-Link-roundup,-week-302012.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=416</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=416</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;ul class=&quot;diigo-linkroll&quot;&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html&quot;&gt;David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/machinelearning&quot;&gt;machinelearning&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/books&quot;&gt;books&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Information Theory, Inference, and Learning Algorithms&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;640 pages, Published September 2003&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;strong&gt;&lt;span title=&quot;Portable Document Format&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;PDF&lt;/span&gt; (A4)&lt;/strong&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://www.inference.phy.cam.ac.uk/itprnn/book.pdf&quot; checklongurl=&quot;true&quot;&gt;pdf&lt;/a&gt; (9M)  (fourth printing, March 2005)&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://wiki.stdout.org/rcookbook/&quot;&gt;Cookbook for R » Cookbook for R&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;&lt;span title=&quot;HyperText Markup Language&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;HTML&lt;/span&gt;&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/r&quot;&gt;r&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/statistics&quot;&gt;statistics&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/programming&quot;&gt;programming&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;formerly named &lt;em&gt;R Cookbook&lt;/em&gt;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;It is not related to Paul Teetor&amp;#8217;s excellent &lt;a rel=&quot;nofollow&quot; href=&quot;http://shop.oreilly.com/product/9780596809164.do&quot; checklongurl=&quot;true&quot;&gt;R Cookbook&lt;/a&gt;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://tdwi.org/Articles/2012/07/24/Big-Data-4th-V.aspx&quot;&gt;Big Data -- Why the 3Vs Just Don&amp;#8217;t Make Sense -- TDWI -The Data Warehousing Institute&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;When are we done defining big data? p.1&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;settled on 3 Vs -- volume, variety, and velocity&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;another V: value&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;if big data is understood solely on the basis of these trends, it isn&amp;#8217;t clear that it&amp;#8217;s at all hype-worthy&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;if &amp;#8220;big data&amp;#8221; simply describes the volume, variety, and velocity of the information that constitutes it, our existing data management practices are still arguably up to the task&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;big data is hyped on the basis of its real or imagined outputs&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;a lot more interesting when you bring in &amp;#8216;V&amp;#8217; for value&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://cscs.umich.edu/~crshalizi/weblog/925.html#b3&quot;&gt;No, Really, Some of My Best Friends Are Data Scientists&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;When are we done defining data science?&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/statistics&quot;&gt;statistics&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the skills of a &amp;#8220;data scientist&amp;#8221; are&amp;#160;those of a modern statistician&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;know how to move data around and manipulate data with some programming language&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;know how to draw informative pictures of data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Knowledge of stats, errorbars, confidence intervals&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;try to get people from different backgrounds&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Great communication skills&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;a lot of what we teach The Kids now looks a lot more like machine&amp;#160;learning than statistics as it was taught circa 1970, or even circa&amp;#160;1980&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Everything I &lt;em&gt;know&lt;/em&gt;&amp;#160;about statistics I&amp;#8217;ve learned without formal instruction&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;is not, in my experience, intrinsically hard for anyone who&amp;#160;already has a decent grounding in some other mathematical science&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;mastering them&amp;#160;really does mean trying to do things and failing&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;em&gt;potentially&lt;/em&gt; hazardous.&amp;#160;This is the idea that all that really matters is being &amp;#8220;smart&amp;#8221;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;counter-productive for students to&amp;#160;attribute their success or failure in learning about something to an innate&amp;#160;talent&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.amazon.de/Taming-Data-Tidal-Wave-Opportunities/dp/1118208781/ref=sr_1_1?ie=UTF8&amp;amp;qid=1343220375&amp;amp;sr=8-1&quot;&gt;Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (Wiley &amp;amp; SAS Business): Bill Franks&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;Bill Franks is Chief Analytics Officer at Teradata&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/books&quot;&gt;books&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;April 27, 2012&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;336 pages&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://smartdatacollective.com/rwang/57681/big-question-big-data-iswhats-question?utm_source=feedburner&amp;amp;utm_medium=feed&amp;amp;utm_campaign=Smart+Data+Collective+%28all+posts%29&amp;amp;utm_content=Google+Reader&quot;&gt;The Big Question In Big Data Is...What&amp;#8217;s The Question? | SmartData Collective&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Some folks like to confuse &lt;span title=&quot;An open source software framework for data-intensive distributed applications&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Hadoop&lt;/span&gt; with big data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Focus On the Questions To Ask, Not The Answers&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;The failure of data warehouses to provide real-time data led to the creation of data marts&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Data marts failed to provide complete and updated and comprehensive views&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;existing solutions still don&amp;#8217;t solve the problem. Why? The market and business environment have changed&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Data moves from structured to unstructured. Sources exponentially proliferate. Data quality is paramount.&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Real-time is irrelevant because speed does not trump fidelity. Quantity does not trump quality&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Business questions remained unanswered despite the massive number of reports and views and charts&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;The big shift is about moving from data to decisions&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.inference.phy.cam.ac.uk/itprnn_lectures/&quot;&gt;Information Theory, Pattern Recognition, and Neural Networks&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/machinelearning&quot;&gt;machinelearning&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/video&quot;&gt;video&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/lectures&quot;&gt;lectures&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Draft videos (editing incomplete)&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Entropy and Data Compression&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Shannon&amp;#8217;s Source Coding Theorem&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Inference and Information Measures for Noisy Channels&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Introduction to Bayesian Inference&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Approximating Probability Distributions&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Clustering&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Monte Carlo Methods&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Gibbs sampling&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Neural Networks&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;http://www.inference.phy.cam.ac.uk/mackay/itprnn/&quot; checklongurl=&quot;true&quot;&gt;Other course materials - free online text book [Information Theory, Inference, and Learning Algorithms]&lt;/a&gt;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;p class=&quot;diigo-ps&quot;&gt;Posted from &lt;a href=&quot;http://www.diigo.com&quot;&gt;Diigo&lt;/a&gt;. The rest of my favorite links are &lt;a href=&quot;http://www.diigo.com/user/paux&quot;&gt;here&lt;/a&gt;.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Sun, 29 Jul 2012 12:16:15 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/416-guid.html</guid>
    <category>linkroll</category>

</item>
<item>
    <title>Link roundup, week 29/2012</title>
    <link>https://stephan.paukner.cc/syslog/archives/414-Link-roundup,-week-292012.html</link>
            <category>Data Science</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/414-Link-roundup,-week-292012.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=414</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=414</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    &lt;p&gt;It took me quite a long time to discover that my favorite knowledge management tool, &lt;a href=&quot;http://www.diigo.com/&quot; target=&quot;_blank&quot; title=&quot;Diigo.com&quot;&gt;Diigo&lt;/a&gt;, can post one&amp;#8217;s bookmarks to a blog. As I have often wanted to repost links I stumbled upon, I will do that occasionally from now on, mainly on topics around data mining (and related buzzwords), with flavors ranging from theory to applications, from technology to business. (I can&amp;#8217;t really do that on social media sites, as it&amp;#8217;s almost impossible to follow posts there by topic. So blogs aren&amp;#8217;t really obsolete, at least not yet.)&lt;/p&gt; 
&lt;p&gt;&lt;span title=&quot;By the way&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Btw&lt;/span&gt;, Diigo is really awesome: you can highlight text on webpages and add annotations, which helps you understand an article and creates a summary on the fly while you read it. In that sense: if you want to be briefed, read at least this. (And don&amp;#8217;t worry, the next episodes will contain less content; this one reaches back a few weeks.)&lt;/p&gt; 
&lt;ul class=&quot;diigo-linkroll&quot;&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.technologyreview.com/news/428497/your-laptop-can-now-analyze-big-data/&quot;&gt;Your Laptop Can Now Analyze Big Data - Technology Review&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/computerscience&quot;&gt;computerscience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/software&quot;&gt;software&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;GraphChi, exploits the capacious hard drives&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;a Mac Mini running GraphChi can analyze Twitter&amp;#8217;s social graph from 2010—which contains 40 million users and 1.2 billion connections—in 59 minutes&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;graph computation is becoming more and more relevant&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;GraphChi is capable of effectively handling many large-scale graph-computing problems without resorting to cloud-based solutions or supercomputers&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://research.google.com/pubs/pub36726.html&quot;&gt;Large-scale Incremental Processing Using Distributed Transactions and Notifications&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-description&quot;&gt;Google&amp;#8217;s Percolator paper&lt;/p&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/google&quot;&gt;google&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Publication Year&amp;#160;2010&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;MapReduce and other batch-processing systems cannot process small updates individually&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Percolator, a system for incrementally processing updates to a large data set&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://www.silicon.de/41568978/big-data-in-deutschland-der-status-quo/&quot;&gt;Big Data in Deutschland – der Status Quo | silicon.de&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the topic of Big Data is still at an early stage&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;still in the analysis and planning phase&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;availability of new analytics and database technologies&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;dynamic growth of company-internal data traffic&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Big Data often enters the company ‘through the back door’&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;data growth of 42 percent by the end of 2014&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;a lot of work on the storage infrastructure side&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;mid-sized companies (500-999 employees) and&lt;br /&gt; large enterprises (1,000 employees and up)&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;More than a third expect cost savings. Almost half expect better insight into the information and consumption behavior of customers&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;high expectations placed on service providers and solution vendors&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;h5&gt;&lt;a href=&quot;http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop-as-we-know-it/&quot;&gt;Why the days are numbered for &lt;span title=&quot;An open source software framework for data-intensive distributed applications&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;Hadoop&lt;/span&gt; as we know it — Cloud Computing News&lt;/a&gt;&lt;/h5&gt; 
&lt;p class=&quot;diigo-tags&quot;&gt;&lt;span&gt;tags:&lt;/span&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/datascience&quot;&gt;datascience&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/bigdata&quot;&gt;bigdata&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/cloud&quot;&gt;cloud&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/technology&quot;&gt;technology&lt;/a&gt; &lt;a href=&quot;http://www.diigo.com/user/paux/opinion&quot;&gt;opinion&lt;/a&gt;&lt;/p&gt; 
&lt;ul class=&quot;diigo-annotations&quot;&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;it has become synonymous with big data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;de facto standard&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Is the enterprise buying into a technology whose best day has already passed?&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop’s inspiration – Google’s MapReduce&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;a rel=&quot;nofollow&quot; href=&quot;http://research.google.com/archive/gfs.html&quot; checklongurl=&quot;true&quot;&gt;Google File System&lt;/a&gt;&amp;#160;(GFS) and &lt;a rel=&quot;nofollow&quot; href=&quot;http://research.google.com/archive/mapreduce.html&quot; checklongurl=&quot;true&quot;&gt;Google MapReduce&lt;/a&gt;&amp;#160;(GMR)&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;make big data processing approachable to Google’s typical user/developer&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop Distributed File System and Hadoop MapReduce — was born in the image of GFS and GMR&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Your code is turned into map and reduce &lt;em&gt;jobs&lt;/em&gt;, and Hadoop runs those &lt;em&gt;jobs&lt;/em&gt; for you&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Google evolved. Can Hadoop catch up?&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;GMR no longer holds such prominence in the Google stack&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Here are technologies that I hope will ultimately seed the post-Hadoop&amp;#160;era&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;it will require new, non-MapReduce-based architectures that leverage the Hadoop core (HDFS and Zookeeper) to truly compete with Google&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Percolator for incremental indexing and analysis of frequently changing datasets&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;each time you want to analyze the data (say after adding, modifying or deleting data) you have to stream over the entire dataset&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;displacing GMR in favor of an incremental processing engine called &lt;a rel=&quot;nofollow&quot; href=&quot;http://research.google.com/pubs/pub36726.html&quot; checklongurl=&quot;true&quot;&gt;&lt;strong&gt;Percolator&lt;/strong&gt;&lt;/a&gt;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;dealing only with new, modified, or deleted documents&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Dremel for ad hoc analytics&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;&lt;span title=&quot;Structured Query Language&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;SQL&lt;/span&gt;-like familiarity&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;many interface layers have been built&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;purpose-built for organized data processing (&lt;em&gt;jobs&lt;/em&gt;). It is baked from the core for workflows, not ad hoc exploration&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;BI/analytics queries are fundamentally ad hoc, interactive, low-latency&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Google invented &lt;a rel=&quot;nofollow&quot; href=&quot;http://research.google.com/pubs/pub36632.html&quot; checklongurl=&quot;true&quot;&gt;&lt;strong&gt;Dremel&lt;/strong&gt;&lt;/a&gt; (now &lt;a rel=&quot;nofollow&quot; href=&quot;http://gigaom.com/cloud/google-opens-up-its-biq-query-data-analytics-service-to-all/&quot; checklongurl=&quot;true&quot;&gt;exposed as the BigQuery product&lt;/a&gt;)&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;I’m not aware of any compelling open source alternatives to Dremel&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Pregel for analyzing graph data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;certain core assumptions of MapReduce are at fundamental odds with analyzing networks of people, telecommunications equipment, documents and other&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;petabyte-scale graph processing on distributed commodity machines&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Hadoop, which often causes exponential data amplification in graph processing&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;execute graph algorithms such as SSSP or PageRank in dramatically shorter time&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;near linear scaling of execution time with graph size&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;the only viable option in the open source world is &lt;a rel=&quot;nofollow&quot; href=&quot;http://giraph.apache.org/&quot; checklongurl=&quot;true&quot;&gt;Giraph&lt;/a&gt;&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;if you’re trying to process dynamic data sets, ad-hoc analytics or graph data structures, Google’s own actions clearly demonstrate better alternatives to the MapReduce paradigm&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;Percolator, Dremel and Pregel make an impressive trio and comprise the new canon of big data&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;li&gt; 
&lt;div class=&quot;diigoContent&quot;&gt; 
&lt;div class=&quot;diigoContentInner&quot;&gt;similar impact on &lt;span title=&quot;Information Technology&quot; class=&quot;serendipity_glossaryMarkup&quot;&gt;IT&lt;/span&gt; as Google’s original big three of GFS, GMR, and BigTable&lt;/div&gt; 
&lt;/div&gt; 
&lt;/li&gt; 
&lt;/ul&gt; 
&lt;/li&gt;
&lt;/ul&gt; &lt;a class=&quot;block_level&quot; href=&quot;https://stephan.paukner.cc/syslog/archives/414-Link-roundup,-week-292012.html#extended&quot;&gt;Continue reading &quot;Link roundup, week 29/2012&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Fri, 20 Jul 2012 12:52:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/414-guid.html</guid>
    <category>linkroll</category>

</item>
<item>
    <title>We are leaving the city</title>
    <link>https://stephan.paukner.cc/syslog/archives/278-Wir-verlassen-die-Stadt.html</link>
            <category>New Home</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/278-Wir-verlassen-die-Stadt.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=278</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=278</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    
&lt;p&gt;Today we signed the contract for our new apartment. It is located well outside Vienna.&lt;/p&gt;&lt;p&gt;&lt;nobr&gt;A few years&lt;/nobr&gt; ago I rented a 35 m² apartment in Vienna that served, so to speak, as my student digs. As things turned out, the two of us have been living in it together for quite some time now, which pushes the limits of both capacity and comfort. After &lt;a href=&quot;/logbook/categories/5-Master-Thesis&quot;&gt;graduating&lt;/a&gt; last year and starting my &lt;a href=&quot;/logbook/archives/72-Exciting-new-challenge.html&quot;&gt;new job&lt;/a&gt;, another chapter of &amp;#8220;real life&amp;#8221; now begins for me. My personal highlights of the new apartment: it is a new building with controlled residential ventilation, a generous 95 m² of living space + an incredible 24 m² terrace + a basement storage compartment + a garage space = an area factor of 3.5(!), &lt;nobr&gt;2 children&amp;#8217;s rooms,&lt;/nobr&gt; a bathtub (finally!), and (drum roll!) a dishwasher!! &lt;img src=&quot;https://stephan.paukner.cc/syslog/plugins/serendipity_event_emoticate/img/emoticons/wink.png&quot; alt=&quot;;-)&quot; class=&quot;emoticon&quot; /&gt;&lt;/p&gt;&lt;p&gt;That I mention children&amp;#8217;s rooms already leads to the main reason why we do not want to live in Vienna (any longer): since we are country folk ourselves and never warmed to the dreary flair of Vienna&amp;#8217;s &lt;nobr&gt;11th district&lt;/nobr&gt;, we cannot imagine raising our children, should they arrive one day, between concrete, asphalt, &lt;a href=&quot;/logbook/archives/138-Angekotzt-und-angeschissen.html&quot;&gt;dog droppings&lt;/a&gt;, and headscarves. It matters far more to us to be able to reach nature within a minute. The outskirts of Vienna are unaffordable for us, however, so the radius had to be drawn wider.&lt;/p&gt;&lt;p&gt;The town we will now live in is fairly tranquil but large enough to host the most important infrastructure (schools, doctors, etc.). The decision, however, rests more on its strategic geographic location: it is a railway junction, the commute to work in Vienna takes exactly(!) as long as the previous one &lt;a href=&quot;/logbook/archives/122-Von-der-Unmoeglichkeit-in-den-Nordwesten-Wiens-zu-kommen.html&quot;&gt;across the big city&lt;/a&gt;, and two larger cities are just a stone&amp;#8217;s throw away, even by train.&lt;/p&gt;&lt;p&gt;Building a house was out of the question for us, since that would leave far less financial leeway for a good lifestyle. However, the building will only be completed over the course of the summer and handed over in the fall. On Friday we picked out the parquet flooring. We can hardly wait!&lt;/p&gt; 
    </content:encoded>

    <pubDate>Mon, 30 Jun 2008 18:48:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/278-guid.html</guid>
    <category>german</category>
<category>lifestyle</category>

</item>
<item>
    <title>Project summary</title>
    <link>https://stephan.paukner.cc/syslog/archives/8-Project-summary.html</link>
            <category>Master's Thesis</category>
    
    <comments>https://stephan.paukner.cc/syslog/archives/8-Project-summary.html#comments</comments>
    <wfw:comment>https://stephan.paukner.cc/syslog/wfwcomment.php?cid=8</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>https://stephan.paukner.cc/syslog/rss.php?version=2.0&amp;type=comments&amp;cid=8</wfw:commentRss>
    

    <author>paux+www15@paukner.cc (Stephan Paukner)</author>
    <content:encoded>
    
&lt;p&gt;While employed 40 hours/week, I started reviewing the basics of functional analysis in January 2006. In April I started to do some general reading on the subject of time-frequency analysis. I wanted to have the topic of my Master&amp;#8217;s thesis set by May, but it took me until August to file it with the working title &amp;#8220;Gabor Analysis for Image Processing&amp;#8221;. Completion of my thesis had originally been targeted for Christmas 2006, but it soon became clear that it would also take the whole spring of 2007.&lt;/p&gt;&lt;p&gt;In May 2006 I reduced my working hours to &lt;nobr&gt;26 hours/week&lt;/nobr&gt;, and I quit my job in January 2007 when I was granted a 6-month scholarship. Now I had time to do numerical experiments and to actually write my thesis. The scholarship ended in July 2007, and I hoped to have my thesis finished by September. The final work, entitled &lt;a href=&quot;http://paukner.cc/math/ga4ip/&quot; target=&quot;_blank&quot; title=&quot;Foundations of Gabor Analysis for Image Processing&quot;&gt;Foundations of Gabor Analysis for Image Processing&lt;/a&gt;, was printed in mid-November and graded an A. I took my master&amp;#8217;s exam on December 18th and finished my studies with distinction.&lt;/p&gt;&lt;p&gt;Some statistics: It took me 8 months to read up on time-frequency analysis (while being employed). I held the official status of a graduand for 16 months. I wrote my thesis within 9 months.&lt;/p&gt; 
    </content:encoded>

    <pubDate>Tue, 18 Dec 2007 14:55:00 +0000</pubDate>
    <guid isPermaLink="false">https://stephan.paukner.cc/syslog/archives/8-guid.html</guid>
    
</item>

</channel>
</rss>
