Learning Outcomes
Recent advances in technology have led to rapid growth in the volume of data ("big data"), which in turn
has created a need for cost-efficient and scalable analysis algorithms. This course presents concepts for
the scalable analysis of big data sets and applies them using open-source technologies. Participants in
this module will gain an in-depth understanding of concepts and methods, as well as practical experience,
in the area of scalable data analysis and data mining.
The course is principally designed to impart:
technical skills: 50%
method skills: 30%
system skills: 10%
social skills: 10%
Content
The focus of this module is on becoming familiar with different parallel processing platforms and
paradigms and on understanding their suitability for different kinds of data mining problems. To this
end, students will learn how to adapt popular data mining and standard machine learning algorithms,
such as Naïve Bayes, K-Means clustering, or PageRank, to scalable processing paradigms, and will
subsequently gain practical experience in implementing them on parallel processing platforms such as
Apache Hadoop, Stratosphere, and Apache Giraph.
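To illustrate what adapting an algorithm to a scalable processing paradigm can look like, the following minimal sketch expresses one K-Means iteration as a map function and a reduce function in plain Python. It is an illustration only, not the course's reference implementation: the function names are hypothetical, and the in-memory grouping step merely simulates the shuffle that a platform such as Apache Hadoop would perform across machines.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (coordinate sum, count)).

    The value is a partial sum plus a count, so a combiner could
    pre-aggregate values locally before the shuffle.
    """
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: add up all partial sums for one centroid and average them."""
    total = None
    count = 0
    for partial_sum, n in values:
        if total is None:
            total = list(partial_sum)
        else:
            total = [t + x for t, x in zip(total, partial_sum)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One K-Means iteration: map every point, group by key, reduce each group."""
    groups = defaultdict(list)  # simulated shuffle: collect values per key
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for key, values in groups.items():
        new_centroids[key] = kmeans_reduce(values)
    return new_centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_iteration(pts, [(1.0, 1.0), (9.0, 9.0)]))
```

Because each map call looks at a single point and each reduce call sees only the values for one centroid, both steps can run in parallel across a cluster. The full algorithm would repeat the iteration until the centroids stop changing.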
Description of Teaching and Learning Methods
This "integrated course" (Integrierte Veranstaltung, IV) consists of lectures on key concepts and
exercise sessions with smaller and larger exercises, including one complex task to be completed in
teams. This task involves working through one of the key topics based on independent literature
research, giving a short presentation, and developing an implementation. Active contribution to all
parts of the course is essential, as all course members will give a final presentation of the complex
exercise.
Registration Procedures
Students are required to register via the DIMA course registration tool (http://www.dima.tu-berlin.de/)
before the first lecture. In addition, within the first six weeks after lectures begin, students must
register for the course in QISPOS (the university's examination administration tool) and ISIS (the
course organization tool).