
Advanced Information Management 3 - Scalable Data Analysis and Data Mining

6 LP

English

#40311 / #1

SS 2014 - SS 2014

Fakultät IV

EN 7

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement

Markl, Volker

No information

sekr@dima.cs.tu-berlin.de

No information

POS number | PORD number | Module title
62950 | 27675 | Advanced Information Management 3 - Scalable Data Analysis and Data Mining

Learning Outcomes

Recent advances in technology have led to rapid growth in the volume of data, creating a need for cost-efficient and scalable analysis algorithms. This course presents concepts for the scalable analysis of big data sets and applies them using open source technologies. Participants will gain an in-depth understanding of the concepts and methods, as well as practical experience, in the area of scalable data analysis and data mining. The course is principally designed to impart: technical skills 50%, method skills 30%, system skills 10%, social skills 10%.

Content

The focus of this module is to become familiar with different parallel processing platforms and paradigms and to understand their suitability for different kinds of data mining problems. To that end, students will learn how to adapt popular data mining and standard machine learning algorithms, such as Naïve Bayes, K-Means clustering, or PageRank, to scalable processing paradigms. They will subsequently gain practical experience implementing them on parallel processing platforms such as Apache Hadoop, Stratosphere, and Apache Giraph.
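As an illustration of what adapting an algorithm to a scalable processing paradigm can look like, the following minimal sketch expresses one K-Means iteration as a map phase, a shuffle, and a reduce phase. It is not taken from the course material: it runs in a single Python process rather than on Hadoop, Stratosphere, or Giraph, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `kmeans_iteration`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map: for each point, emit (index of the nearest centroid, point)."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, p


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    """Shuffle: group emitted values by key (a real platform does this step itself)."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[int, List[Point]], centroids: List[Point]) -> List[Point]:
    """Reduce: each new centroid is the mean of its assigned points.

    A centroid with no assigned points keeps its previous position.
    """
    new_centroids = list(centroids)
    for idx, members in groups.items():
        dims = len(members[0])
        new_centroids[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dims)
        )
    return new_centroids


def kmeans_iteration(points: Iterable[Point], centroids: List[Point]) -> List[Point]:
    """Run one map/shuffle/reduce round and return the updated centroids."""
    return reduce_phase(shuffle(map_phase(points, centroids)), centroids)
```

For example, calling `kmeans_iteration([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. On a parallel platform, the map calls would run independently on partitions of the data, with only the current centroids shared between workers, which is what makes this formulation a candidate for scaling.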

Module Components

Compulsory part:

All courses are mandatory.

Course Name | Type | Number | Cycle | Language | SWS
Advanced Information Management 3 - Scalable Data Analytics and Data Mining | IV | 0434 L 472 | SS | No information | 4

Workload and Credit Points

Advanced Information Management 3 - Scalable Data Analytics and Data Mining (IV):

Workload description | Multiplier | Hours | Total
Exercises / practice | 15.0 | 2.0h | 30.0h
Final report & presentation (as preparation for oral exam) | 15.0 | 2.0h | 30.0h
Plenary sessions | 15.0 | 4.0h | 60.0h
Preparation / Consolidation (including literature work and seminar presentation) | 15.0 | 4.0h | 60.0h
Sum | | | 180.0h (~6 LP)
The workload of the module sums to 180.0 hours. The module is therefore worth 6 credit points.

Description of Teaching and Learning Methods

This "integrated course" (Integrierte Veranstaltung, IV) consists of lectures on key concepts and exercise sessions with smaller and larger exercises, in particular one complex task to be completed in teams. The task includes working through one of the key topics using independent literature research, giving a short presentation, and developing an implementation. Active contribution to all parts of the course is essential, as all course participants give a final presentation of the complex exercise.

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

Prerequisites: The material covered in the basic modules MPGI 1-5 of the Bachelor curriculum in Computer Science at TU Berlin, including MPGI 5 (“Datenbanksysteme”), as well as good Java programming skills, are required. A basic understanding of probability and statistics as well as linear algebra is helpful. The AIM-3 / SDADM course is taught in English, so fluency in English is required.

Mandatory requirements for the module test application:

1. Requirement:

Module completion

Grading:

graded

Type of exam:

Oral exam

Language:

English

Duration/Extent:

No information

Duration of the Module

This module can be completed in one semester.

Maximum Number of Participants

The maximum capacity of students is 30.

Registration Procedures

Students must register via the DIMA course registration tool (http://www.dima.tu-berlin.de/) before the first lecture. Within the first six weeks after the lectures begin, students must additionally register for the course in QISPOS (university examination protocol tool) and ISIS (course organization tool).

Recommended reading, Lecture notes

Lecture notes

Availability:  unavailable

Electronic lecture notes

Availability:  available
Additional information:
http://www.dima.tu-berlin.de

Literature

Recommended literature
Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques.
Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (free online: http://infolab.stanford.edu/~ullman/mmds/book.pdf).
Tom White: Hadoop: The Definitive Guide.

Assigned Degree Programs

The data structure is currently being migrated. For technical reasons, the use of this module is shown in two lists during the migration process.

This module is not used in any degree program.

Students of other degrees can participate in this module without capacity testing.

Open for the remaining diploma students in the mentioned areas.

This module is used in the following Degree Programs (new System):

Students of other degrees can participate in this module without capacity testing.

Open for the remaining diploma students in the mentioned areas.

Miscellaneous

This module is offered each summer term. For each topic covered in the course, additional research papers and reports will be used.