Personal note: No, I didn’t gain that much weight. I mean, you tell me. This talk is about large-scale Web companies, which, from the perspective of mainstream computing, abide by styles and approaches to system design and implementation that seem foreign. Today’s off-beat approach is quickly becoming tomorrow’s common knowledge, and this talk takes a step in that direction.


Working with very large data sets, only a few years ago the monopoly of a few companies (such as Google, Walmart, Yahoo, or Morgan Stanley), is becoming increasingly commonplace. Dealing with massive quantities of data on parallel computational networks shifts the usual design tradeoffs substantially: operations traditionally considered cheap become prohibitive, and algorithms that seem ungainly become lifesavers. This talk draws on Andrei’s experience working with large data sets during his doctoral work and one year of Natural Language Processing research for Facebook.