Entries from Stephan Paukner

Tuesday, July 9. 2013

Cheap time lapses, II

Some quotes of myself from four years ago:

“Cheap” should mean here that you don’t need to spend money on extra hardware like a remote timer or on extra software like Windoze [...]

My hope is that I can use gphoto2 with an Android smartphone.

Somehow this is still a Linux-post, as Android simply is the most popular Linux distribution to date. Btw, PC and smartphone don’t count as dedicated extra hardware. But you might need to purchase an app and a USB adapter:

Meanwhile I use the awesome DSLR Controller on my Android smartphone in USB host-mode to create time-lapse picture sequences on my aging Canon EOS 40D—a thing that wouldn’t work with a diePhone, I guess. This app can do a lot more, of course. Up to now, you have to rely on USB. However, as Wi-Fi enabled DSLRs are appearing, there is some hope to get rid of that cable and adapter one day; Canon’s own EOS Remote app doesn’t support time-lapse shooting yet. [Update: To equip your camera with Wi-Fi, you could hack an Android TV stick!]

The most elegant solution would of course be located within the camera itself. Canon is still sleeping in this regard, but at least there’s the third-party Magic Lantern firmware add-on for some newer models, also sporting an intervalometer. My 40D is not (yet, but I guess won’t ever be) supported.

You should still set the camera to a resolution at or slightly above the HD 1080p resolution, which in my case is 1936×1288. You should fix the aperture and ISO values, and probably also the exposure time. [Update: In addition, fix the white balance. Also, don’t forget to cover the viewfinder with the eyepiece cover on your strap, as otherwise you might get different exposures that result in flickering!]

After getting the images to your Linux machine, you need to crop the pictures from 3:2 to 16:9 (in my case 1936×1089) or crop a 1920×1080 patch directly. You can do this with a simple script using ImageMagick:

#!/bin/bash

[ ! -z “$1” ] && v=$1 || v=0
[ ! -z “$2” ] && h=$2 || h=0

for img in *JPG; do
num=$(echo $img | tr -d ’[:alpha:]_.’)
convert $img -crop 1920x1080+$h+$v img_${num}c.jpg
done

(GNU Parallel didn’t work for me.) You can then issue

$ mencoder ’mf://*jpg’ \
-nosound \
-ovc x264 \
-x264encopts nocabac:level_idc=41:bframes=0:bitrate=9500:\
global_header:threads=auto:pass=1 \
-mf type=jpg:fps=24 \
-vf dsize=1920:1080:2 \
-of lavf -lavfopts format=mp4 \
-o timelapse-f24-1080.mp4

to render an HD video into a format that’s also recognized by your smartphone.

Finally, you could use OpenShot to edit your videos and add background music. (You could of course compose that music yourself using FOSS as well...)

Posted by Stephan Paukner in Information Technology, Photography at 15:28 | Comments (0) | Trackbacks (0)

Defined tags for this entry: gnu-linux, photography, software, time lapse, video

Wednesday, May 22. 2013

My quest for a tablet

I started the following section as a draft in 2008, with the generic title “Mobile device wanted”:

I need an electronic device with about the shape and weight of a usual magazine or book.

It should have an electronic touch-screen with a reasonably high resolution, in a size that covers most of the front area.

The device should be capable of storing files of all type.

The device’s software should be capable of:

Showing documents of various formats (especially PDF) that can be scrolled by touching the screen.

The screen should capture and visualize touch-pen movements correspondingly, e.g. to underline phrases or add hand-written notes. (Most important!)

Bookmark functionality and cross-document references. (OK, just nice to have.)

These notes should be stored in some way, either directly in the document (if the format allows it), or in a separate database or file that maybe even allows exporting the additional information.

Editing editable document formats such as DOC or ODT by translating touch-pen movements to text is optional.

Optional features:

Keyboard underneath or attachable.

Network connection (LAN/Wi-Fi) and internet/e-mail/WWW software.

I want such a device as I read many scientific papers/PDFs for work. It’s not convenient to print them all out, gather them as a bunch of single sheets and take notes with a real pen while e.g. on a train. It would be nice to have a PC-like device, as software and operating systems already exist, and it would immediately be possible to search the web for further information.

It seems that such a device is finally coming in two different incarnations: As netbook or as e-reader.

Yes. The ¡Pad didn’t exist yet. That mentioned netbook back then was a convertible, with a touch-screen LCD that could be rotated to be operated like a tablet. Similar solutions still exist, e.g. the Lenovo ThinkPad X230, but I dislike the low screen resolution, its weight and the MS Windows 7 OS that’s not tailored for touch use. But the main reason why I disliked both the netbook and the e-readers was that these were no dedicated solutions for stylus input—I wanted to write formulas and draw graphs and arrows. I followed the evolution of e-readers with e-ink displays, but they were just too laggy.

When the ¡Pad was introduced in 2010, I saw the desired technology approaching, but I was leering at a more open Android solution. Sadly, in the first time those tablets were only targeted at game and movie enthusiasts, and were more considered gadgets than productive devices. I was drooling over Plastic Logic’s QUE e-reader that also had a stylus, but of course it never appeared on the market.

The finally tablet-optimized Android 3 of 2011 wasn’t mature. Also, I still think that 10” screen diagonals are somewhat small compared to textbooks or magazines, so I was drooling over Kno’s 14” tablet. (I shake my head over opinions that 10” are already too large.) Of course, Kno went out of the hardware business before reaching production. From a different manufacturer, and while too underweight for my needs, the NoteSlate was a nice concept, but it still doesn’t exist.

2011 also brought Samsung’s S Pen technology with their Android-based Note brand, and I thought that a tablet variant would be just around the corner. Indeed, the Note 10.1 came in 2012, but it still had that pixelated 720p resolution (regarding font rendering) instead of more reasonable 1080p. Finally, there seem to be two 11” Samsung tablets lurking for 2013, but none of them is Note branded; there’s also a rumor that there won’t be a Note 10.1 successor soon due to weak sales of the current model.

It appears I’ll have to wait for yet another year.

Posted by Stephan Paukner in Information Technology at 19:14 | Comments (0) | Trackback (1)

Defined tags for this entry: android, hardware, tablet

Monday, January 14. 2013

Technical job board FAIL

From: me
To: info@[a technical job board].com
Subject: Re: Please delete my account

[a technical job board] wrote:

Your account has been deleted. Please inform us about the reason you want to close your account.
We would like to improve our services and the website all the time so I hope you can give me some valuable feedback.

Reason for deletion: I was shocked to see that after registration, I was sent my password in an email in clear text.
There are two things wrong with that:
1. You obviously store users’ passwords in clear text.
2. You send passwords via email, an inherently insecure protocol.
While your service might be intended for technically versed professionals, these facts state that your service isn’t run by such.

Posted by Stephan Paukner in Curiosities at 10:05 | Comments (0) | Trackbacks (0)

Defined tags for this entry: career, fail, internet, security, www

Sunday, August 12. 2012

Link roundup, week 32/2012

A Strategic Mistake With Big Data — International Institute for Analytics

tags: bigdata opinion
- Bill Franks
- That mistake is the development of a siloed, distinct big data strategy
- strategy for big data is a new facet of their overall enterprise data and analytic strategy
- the mess that many multi-channel retailers got themselves into through their entry into e-commerce
- retailers launched distinct e-commerce divisions. Some were even separate legal entities. As opposed to viewing e-commerce as a new facet of an overall retail strategy
- distinct processes and distinct infrastructure was created
- provide a consistent experience for customers across channels
- go into a store and grab a product and then find that same product on the retailer’s website. Guess what? They have no way to match those products
- separate, non-integrated strategies for big data will likely end up with systems and processes that are very difficult to integrate
- integrate big data into the overall infrastructure and current and future analytic processes
Machine learning system can ID cities via pics | Cutting Edge - CNET News

tags: machinelearning computervision datamining research technology
- The system automatically picks out relevant architectural details from photos
- the details woven into the urban fabric that form a pattern
- computers are learning to ID your city just by looking at random photos
- Google Street View images of Paris, London, New York, and Barcelona
- features like the street signs, balconies, and lampposts of Paris to be distinct
- lack of stylistic coherence in American cities
- presented at Siggraph 2012
- emerging field of visual data mining, which is more complex than looking for patterns in text or numbers
- Alexei Efros
- we wish to automatically build a digital visual atlas of not only architectural but also natural geo-informative features for the entire planet
Interviewing data scientist candidates? Ask these questions

tags: datascience career opinion
- “It’s way over-hyped,” said Franks
- chief analytics officer for the data warehouse appliance vendor Teradata
- “softer skills.”
- “data artist.”
- data scientists are those that are able to understand the business problem; they’re able to apply creativity and present the results well
- intuition, which is hard to teach
- five core areas I look at
- Commitment
  - 1.
- Creativity
  - 2.
- thought process they went through deciding what to do. Someone who’s not creative is going to give me a list of the steps they went through one-by-one more from a technical perspective
- Business savvy
  - 3.
- what I want to hear is not just some technical reasons
- Knowing how much information to give to the non-technical people
- Presentation
  - 4.
- specific presentation as part of the interview process
- intuition
  - 5.
- art or music or some other type of creative area

Continue reading "Link roundup, week 32/2012"

Posted by Stephan Paukner in Data Science at 17:28 | Comments (0) | Trackbacks (0)

Defined tags for this entry: linkroll

Sunday, August 5. 2012

Link roundup, week 31/2012

Linux lessons for Hadoop doubters • The Register

Interesting comparison of Hadoop today with the Linux story from the past. This could mean Hadoop/MapReduce as state of the art around 2020.

tags: bigdata cloud technology linux
- While Hadoop is all the rage in the technology media today, it has barely scratched the surface of enterprise adoption
- Hadoop seems set to win despite its many shortcomings
- still in the transition from zero per cent adoption to one per cent adoption
- IBM points to a few specific deficiencies
- lack of performance and scalability, inflexible resource management, and a limitation to a single distributed file system
- IBM, of course, promises to resolve these issues with its proprietary complements to Hadoop
- Hadoop is batch oriented in a world increasingly run in real-time
- customers are buying big into Hadoop
- it’s still possible that other alternatives, like Percolator, will claim the Hadoop crown
- Back in 2000 IBM announced that it was going to invest $1bn in advancing the Linux operating system. This was big news
- it came roughly 10 years after Linus Torvalds released the first Linux source code, and it took another 10 years before Linux really came to dominate the industry
- The same seems true of Hadoop today
- we’re just starting the marathon
Can Big Data cut through your growing resume pile? - Fortune Management

Three data-mangling job sites, all only for the US

tags: bigdata career
- Bright, one of several new companies
- another new job site, Path.to
- Gild, a third major player
How ‘Big Data’ Is Different

tags: bigdata datascience technology opinion
- Thomas H. Davenport, Paul Barth and Randy Bean
- how do the potential insights from big data differ from what managers generate from traditional analytics?
- 1. Paying attention to flows as opposed to stocks
- the data is not the “stock” in a data warehouse but a continuous flow
- Streaming analytics
- organizations will need to develop continuous processes
- data extraction, preparation and analysis took weeks to prepare — and weeks more to execute
- conventional, high-certitude approaches to decision-making are often not appropriate
- new data is often available that renders the decision obsolete
- 2. Relying on data scientists and product and process developers as opposed to data analysts
- the people who work with big data need substantial and creative IT skills
- programming, mathematical and statistical skills, as well as business acumen and the ability to communicate effectively
- EMC Corporation
- started an educational offering for data scientists
- 3. Moving analytics from IT into core business and operational functions
- new products designed to deal with big data
- Hadoop
- Relational databases have also been transformed
- Statistical analysis packages
- the cloud
- “virtual data marts” allow data scientists to share existing data without replicating it
- traditional role of IT— automating business processes — imposes precise requirements
- Analytics has been more of an afterthought for monitoring processes
- business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility
- discovery and analysis as the first order of business
- IT processes and systems need to be designed for insight, not just automation
PayPal’s Mok Oh On What Is A Data Scientist? - Forbes

The title is misleading: It’s not about what DS is. It’s rather a vision of the ideal solution.

tags: datascience technology opinion
- the old state and the ideal future state, which he calls “Analyst 1.0” and “Analyst 2.0,”
- Analyst 1.0 as the state of maturity achieved by using the last generation of business intelligence tools
- Analyst 1.0 has some coding skills, and perhaps writes an SQL query here and there
- inflexibility of data warehouses and relational databases
- Our current state of affairs, which we’ll call Analyst 1.5, finds us in limbo
- two primary limitations: the immense size and variety of the data, and the complexity of the tools needed
- Hadoop
- to get value from big data, business analysts cannot simply be presented with a programming language
- Analyst 1.5 is characterized by a disconnect between data scientists and the tools and systems in the more complex camp of programmers and computer scientists
- caused data to be totally fragmented
- Analyst 2.0 will have arrived when vendors and IT make analysis easy enough that a typical business user can conduct analysis entirely by themselves
- Tools such as self-learning recommendations engines
- demands new skills, such as a more precise focus on aberrant or statistically significant data in a stream, as well as better tools
- somehow at some point you have to get your analytical inspection down to the equivalent of code level
- what we’re trying to model is every person’s brain–at least the part of the brain that decides how to shop, when to shop, and what you want
- we need to continue to mine for behavioral data, such as what people looked at before and after they made transactions
- among the top pitfalls is the tendency to focus on a very small piece of data without occasionally stepping back
- tendency to over-focus on technology
- organizations are tempted to put the most technology-savvy person on the job, rather than the most business-savvy
- computer scientists are not trained to ask the right business questions

Continue reading "Link roundup, week 31/2012"

Posted by Stephan Paukner in Data Science at 12:08 | Comments (0) | Trackbacks (0)

Defined tags for this entry: linkroll

Sunday, July 29. 2012

Link roundup, week 30/2012

David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book

tags: machinelearning books
- Information Theory, Inference, and Learning Algorithms
- 640 pages, Published September 2003
- PDF (A4) pdf (9M) (fourth printing, March 2005)
Cookbook for R » Cookbook for R

HTML

tags: r statistics programming
- formerly named R Cookbook
- It is not related to Paul Teetor’s excellent R Cookbook
Big Data -- Why the 3Vs Just Don’t Make Sense -- TDWI -The Data Warehousing Institute

When are we done defining big data? p.1

tags: datascience bigdata opinion
- settled on 3 Vs -- volume, variety, and velocity
- another V: value
- if big data is understood solely on the basis of these trends, it isn’t clear that it’s at all hype-worthy
- if “big data” simply describes the volume, variety, and velocity of the information that constitutes it, our existing data management practices are still arguably up to the task
- big data is hyped on the basis of its real or imagined outputs
- a lot more interesting when you bring in ‘V’ for value
No, Really, Some of My Best Friends Are Data Scientists

When are we done defining data science?

tags: datascience statistics opinion
- the skills of a “data scientist” are those of a modern statistician
- know how to move data around and manipulate data with some programming language
- know how to draw informative pictures of data
- Knowledge of stats, errorbars, confidence intervals
- try to get people from different backgrounds
- Great communication skills
- a lot of what we teach The Kids now looks a lot more like machine learning than statistics as it was taught circa 1970, or even circa 1980
- Everything I know about statistics I’ve learned without formal instruction
- is not, in my experience, intrinsically hard for anyone who already has a decent grounding in some other mathematical science
- mastering them really does mean trying to do things and failing
- potentially hazardous. This is the idea that all that really matters is being “smart”
- counter-productive for students to attribute their success or failure in learning about something to an innate talent
Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (Wiley & SAS Business): Bill Franks

Bill Franks is Chief Analytics Officer at Teradata

tags: datascience bigdata books
- 27. April 2012
- 336 Seiten
The Big Question In Big Data Is...What’s The Question? | SmartData Collective

tags: datascience opinion bigdata
- Some folks like to confuse Hadoop with big data
- Focus On the Questions To Ask, Not The Answers
- The failure of data warehouses to provide real-time data led to the creation of data marts
- Data marts failed to provide complete and updated and comprehensive views
- existing solutions still don’t solve the problem. Why? The market and business environment have changed
- Data moves from structured to unstructured. Sources exponentially proliferate. Data quality is paramount.
- Real-time is irrelevant because speed does not trump fidelity. Quantity does not trump quality
- Business questions remained unanswered despite the massive number of reports and views and charts
- The big shift is about moving from data to decisions
Information Theory, Pattern Recognition, and Neural Networks

tags: machinelearning video lectures
- Draft videos (editing incomplete)
- Entropy and Data Compression
- Shannon’s Source Coding Theorem
- Inference and Information Measures for Noisy Channels
- Introduction to Bayesian Inference
- Approximating Probability Distributions
- Clustering
- Monte Carlo Methods
- Gibbs sampling
- Neural Networks
- Other course materials - free online text book [Information Theory, Inference, and Learning Algorithms]

Posted from Diigo. The rest of my favorite links are here.

Posted by Stephan Paukner in Data Science at 12:16 | Comments (0) | Trackbacks (0)

Defined tags for this entry: linkroll

Friday, July 20. 2012

Link roundup, week 29/2012

It took me quite a long time to discover that my favorite knowledge management tool, Diigo, provides a feature to post one’s bookmarks to a blog. As I often had the desire to repost certain links I stumbled upon, I will do that occasionally from now on, mainly about everything from the topic pool of data mining (and related buzzwords), with flavors ranging from theory to applications, from technology to business. (I can’t really do that to social media sites, as it’s almost impossible to explicitly consume posts topic-wise. So, blogs aren’t really obsolete—yet.)

Btw, Diigo is really awesome: You can highlight text on webpages and add annotations to help understanding an article and create a summary on the fly, right while going through it. In this sense: If you want to be briefed, read at least this. (And don’t worry, the next episodes will contain less content; this one ranges back a few weeks.)

Your Laptop Can Now Analyze Big Data - Technology Review

tags: computerscience datascience technology software
- GraphChi, exploits the capacious hard drives
- a Mac Mini running GraphChi can analyze Twitter’s social graph from 2010—which contains 40 million users and 1.2 billion connections—in 59 minutes
- The previous published result on this problem took 400 minutes using a cluster of about 1,000 computers
- graph computation is becoming more and more relevant
- GraphChi is capable of effectively handling many large-scale graph-computing problems without resorting to cloud-based solutions or supercomputers
Large-scale Incremental Processing Using Distributed Transactions and Notifications

Google’s Percolator paper

tags: google datascience bigdata technology
- Publication Year 2010
- MapReduce and other batch-processing systems cannot process small updates individually
- Percolator, a system for incrementally processing updates to a large data set
Big Data in Deutschland – der Status Quo | silicon.de

tags: bigdata datascience technology opinion
- das Thema Big Data noch in einem frühen Stadium
- noch in der Analyse- und Planungsphase
- Verfügbarkeit neuer Analyse- und Datenbanktechnologien
- dynamische Zunahme des unternehmensinternen Datenverkehrs
- Big Data vielfach ‘durch die Hintertür’ ins Unternehmen
- Datenwachstum von 42 Prozent bis zum Ende des Jahres 2014
- auf Seiten der Storage-Infrastruktur eine Menge Arbeit
- mittelständischen (500-999 Mitarbeiter) und
  den Großunternehmen (ab 1.000 Mitarbeiter)
- Über ein Drittel erwarten sich Kosteneinsparungen. Fast die Hälfte verspricht sich bessere Einsichten in das Informations- und Konsumverhalten der Kunden
- hohen Erwartungen, die an Dienstleister und Lösungsanbieter gestellt werden
Why the days are numbered for Hadoop as we know it — Cloud Computing News

tags: datascience bigdata cloud technology opinion
- it has become synonymous with big data
- de facto standard
- Is the enterprise buying into a technology whose best day has already passed?
- Hadoop’s inspiration – Google’s MapReduce
- Google File System (GFS) and Google MapReduce (GMR)
- make big data processing approachable to Google’s typical user/developer
- Hadoop Distributed File System and Hadoop MapReduce — was born in the image of GFS and GMR
- Your code is turned into map and reduce jobs, and Hadoop runs those jobs for you
- Google evolved. Can Hadoop catch up?
- GMR no longer holds such prominence in the Google stack
- Here are technologies that I hope will ultimately seed the post-Hadoop era
- it will require new, non-MapReduce-based architectures that leverage the Hadoop core (HDFS and Zookeeper) to truly compete with Google
- Percolator for incremental indexing and analysis of frequently changing datasets
- each time you want to analyze the data (say after adding, modifying or deleting data) you have to stream over the entire dataset
- displacing GMR in favor of an incremental processing engine called Percolator
- dealing only with new, modified, or deleted documents
- Dremel for ad hoc analytics
- SQL-like familiarity
- many interface layers have been built
- purpose-built for organized data processing (jobs). It is baked from the core for workflows, not ad hoc exploration
- BI/analytics queries are fundamentally ad hoc, interactive, low-latency
- Google invented Dremel (now exposed as the BigQuery product)
- I’m not aware of any compelling open source alternatives to Dremel
- Pregel for analyzing graph data
- certain core assumptions of MapReduce are at fundamental odds with analyzing networks of people, telecommunications equipment, documents and other
- petabyte -scale graph processing on distributed commodity machines
- Hadoop, which often causes exponential data amplification in graph processing
- execute graph algorithms such as SSSP or PageRank in dramatically shorter time
- near linear scaling of execution time with graph size
- the only viable option in the open source world is Giraph
- if you’re trying to process dynamic data sets, ad-hoc analytics or graph data structures, Google’s own actions clearly demonstrate better alternatives to the MapReduce paradigm
- Percolator, Dremel and Pregel make an impressive trio and comprise the new canon of big data
- similar impact on IT as Google’s original big three of GFS, GMR, and BigTable

Continue reading "Link roundup, week 29/2012"

Posted by Stephan Paukner in Data Science at 12:52 | Comments (0) | Trackbacks (0)

Defined tags for this entry: linkroll

Saturday, March 17. 2012

Status quo of my web usage, II

Changes during the recent months:

Deleted accounts

Gowalla (meanwhile shut down by FB)
Foursquare
FootFeed (that combined the above two)
FB (after 8 months of deactivation)
FriendFeed
Flickr and Yahoo—I had already abandoned Flickr in 2010 and moved to SmugMug last year, but now I also deleted ...
... SmugMug, haven’t really dived into it, using Google+ as photo showroom
PicPlz (only possible via mail to support), using Google+ for random pics
Last.fm (never used)
Blip.fm (rarely used)
Posterous (never used)
Flattr, had abandoned it in 2010, was too expensive

Not (yet) deleted

Brightkite, service was deactivated for weeks, site meanwhile unreachable
Tupalo, maybe give another chance
PayPal, closing didn’t work for days due to a “temporary communication problem”—at least I removed critical data
eBay, not sure if I really don’t need it anymore
Soup.io, another platform for devotedly wasting one’s time