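Module #40311 / version 1

Valid: summer semester 2014 - summer semester 2014

Module language: English

Advanced Information Management 3 - Scalable Data Analysis and Data Mining

Credit points: 6

Responsible: Markl, Volker

Grading: graded

Examination form: oral exam

Examination language: English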

Affiliation

Faculty IV

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement (DIMA)

Not specified

Contact

EN 7

Soto, Juan

sekr@dima.tu-berlin.de

Learning outcomes

Recent advances in technology have led to a rapid growth of big data, creating a need for cost-efficient, scalable analysis algorithms. This course presents concepts for the scalable analysis of big data sets and applies them using open-source technologies. Participants will gain an in-depth understanding of the relevant concepts and methods as well as practical experience in scalable data analysis and data mining. The course is principally designed to impart: technical skills 50%, method skills 30%, system skills 10%, social skills 10%.

Course content

The focus of this module is on becoming familiar with different parallel processing platforms and paradigms and understanding their suitability for different kinds of data mining problems. To this end, students learn how to adapt popular data mining and standard machine learning algorithms, such as Naïve Bayes, k-means clustering, or PageRank, to scalable processing paradigms. They subsequently gain practical experience in implementing these algorithms on parallel processing platforms such as Apache Hadoop, Stratosphere, and Apache Giraph.
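To illustrate what adapting an algorithm to a scalable processing paradigm can involve, the following minimal sketch expresses k-means clustering, one of the algorithms named above, in MapReduce style. It is plain Python that only simulates the map, shuffle, and reduce phases on a single machine; it does not use the Hadoop, Stratosphere, or Giraph APIs, and all function names are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point, centroids):
    """Map: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(key, values):
    """Reduce: add up the (vector, count) pairs for one centroid and return
    their mean as the new centroid. The pairs may be single points or
    partial sums produced by a combiner."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vector, n in values:
        for d in range(dim):
            total[d] += vector[d]
        count += n
    return key, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One simulated MapReduce job: map every point, group the outputs by
    key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for key, values in groups.items():
        k, centroid = kmeans_reduce(key, values)
        new_centroids[k] = centroid
    return new_centroids


def kmeans(points, initial_centroids, max_iter=20):
    """Driver loop: rerun the job until the centroids stop changing
    or max_iter iterations have been performed."""
    centroids = [tuple(float(x) for x in c) for c in initial_centroids]
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` returns `[(0.0, 0.5), (10.0, 10.5)]`. Emitting a count together with each point is a deliberate choice: it lets a combiner pre-aggregate partial sums on the mapper side before the shuffle, and the same reduce function still computes the correct mean.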

Module components

Compulsory area

The following courses are compulsory for this module:

| Course | Type | Number | Offered | Language | SWS |
|---|---|---|---|---|---|
| Advanced Information Management 3 - Scalable Data Analytics and Data Mining | IV | 0434 L 472 | Summer semester | Not specified | 4 |

Workload and credit points

Advanced Information Management 3 - Scalable Data Analytics and Data Mining (IV):

| Activity | Multiplier | Hours | Total |
|---|---|---|---|
| Exercises / practice | 15.0 | 2.0 h | 30.0 h |
| Final report & presentation (as preparation for oral exam) | 15.0 | 2.0 h | 30.0 h |
| Plenary sessions | 15.0 | 4.0 h | 60.0 h |
| Preparation / consolidation (including literature work and seminar presentation) | 15.0 | 4.0 h | 60.0 h |
| Total | | | 180.0 h (~6 LP) |

The total workload of the module is 180 hours, which corresponds to 6 credit points (LP).
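The totals above can be checked by hand. Reading the multiplier of 15 as the number of weeks in the semester is an assumption, and the conversion to credit points assumes 30 hours per LP, which is the ratio implied by 180 hours corresponding to 6 LP:

```latex
\begin{aligned}
\text{Total workload} &= 15 \times (2 + 2 + 4 + 4)\,\text{h} = 15 \times 12\,\text{h} = 180\,\text{h} \\
\text{Credit points} &= \frac{180\,\text{h}}{30\,\text{h/LP}} = 6\,\text{LP}
\end{aligned}
```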

Teaching and learning methods

This "integrated course" (Integrierte Veranstaltung, IV) consists of lectures on key concepts and exercise sessions with smaller and larger exercises, in particular one complex task to be completed in teams. This includes working out one of the key topics through independent literature research, giving a short presentation, and developing an implementation. Active contribution to all parts of the course is essential, as all course members will take part in a final presentation of the complex exercise.

Prerequisites for participation / examination

Recommended prerequisites for participating in the courses:

Prerequisites: The material covered in the basic modules MPGI 1-5 of the Bachelor's curriculum in Computer Science at TU Berlin, in particular MPGI 5 ("Datenbanksysteme"), as well as good Java programming skills are required. A basic understanding of probability and statistics as well as linear algebra is helpful. The AIM-3 / SDADM course is taught in English, so fluency in English is required.

Mandatory prerequisites for registering for the module examination:

Module completion

Grading

Graded

Form of examination

Oral exam

Language(s)

English

Duration/scope

Not specified

Examination description (module completion)

The grade is determined by an oral examination. To be admitted to this final exam, participants must complete all required coursework during the course: seminar work and active participation in the home/lab exercises, including the final report and presentation.

Module duration

The module is scheduled to be taken and completed within the following number of semesters:
1 semester.

This module can be started in the following semesters:
Summer semester.

Maximum number of participants

The maximum number of participants is 30.

Registration procedures

Students must register via the DIMA course registration tool (http://www.dima.tu-berlin.de/) before the first lecture. Within the first six weeks after the course begins, students must additionally register for the course in QISPOS (the university's examination registration system) and ISIS (the course organization platform).

Literature and lecture notes

Lecture notes in print

Availability: not available

Lecture notes in electronic form

Availability: available
Additional information:

Literature

Recommended literature
Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (freely available online: http://infolab.stanford.edu/~ullman/mmds/book.pdf)
Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques
Tom White: Hadoop: The Definitive Guide

Assigned degree programs

This module is not used in any degree program.

Students from other degree programs can take this module without a capacity check.

Open to the remaining Diplom students in the areas mentioned.

Miscellaneous

This module is offered every summer term. Additional research papers and reports are used for each topic covered in the course.