
#40311 / #1

SS 2014 - SS 2014

English

Advanced Information Management 3 - Scalable Data Analysis and Data Mining

6

Markl, Volker

graded

Oral exam

Affiliation


Faculty IV

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement (DIMA)

No information

Contact


EN 7

Soto, Juan

sekr@dima.tu-berlin.de

Learning Outcomes

Recent advances in technology have led to rapid growth in the volume of data, which in turn has created a need for cost-efficient and scalable analysis algorithms. In this course, concepts for the scalable analysis of big data sets are presented and applied using open source technologies. Participants will gain an in-depth understanding of the concepts and methods, as well as practical experience, in the area of scalable data analysis and data mining. The course is principally designed to impart: technical skills 50%, method skills 30%, system skills 10%, social skills 10%.

Content

The focus of this module is to become familiar with different parallel processing platforms and paradigms and to understand their suitability for different kinds of data mining problems. To that end, students will learn how to adapt popular data mining and standard machine learning algorithms, such as Naïve Bayes, K-Means clustering or PageRank, to scalable processing paradigms. They will subsequently gain practical experience in implementing them on parallel processing platforms such as Apache Hadoop, Stratosphere and Apache Giraph.
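
As an illustration of what adapting an algorithm to a scalable processing paradigm can look like, the following minimal sketch expresses one K-Means iteration as a map phase and a reduce phase. It is not taken from the course material: it is plain single-machine Python, the function names (nearest, map_phase, reduce_phase, kmeans_step) are hypothetical, and a platform such as Apache Hadoop would run the two phases distributed across many machines rather than in one process.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: for every input point, emit (index of nearest centroid, (point, 1))."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the points and counts per key, then average to get new centroids."""
    sums = {}
    for key, (point, count) in pairs:
        acc, n = sums.get(key, ([0.0] * len(point), 0))
        sums[key] = ([a + p for a, p in zip(acc, point)], n + count)
    # A centroid that received no points keeps its previous position.
    return [
        [a / sums[i][1] for a in sums[i][0]] if i in sums else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans_step(points, centroids):
    """One K-Means iteration: assign points (map), then recompute centroids (reduce)."""
    return reduce_phase(map_phase(points, centroids), centroids)
```

Calling kmeans_step repeatedly until the centroids stop changing gives the usual iterative algorithm. The point of the split is that the map phase touches each point independently, so it can be parallelised over partitions of the data, while the reduce phase only needs the per-centroid sums and counts.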

Module Components

Compulsory part:

All courses are mandatory.

Course Name | Type | Number | Cycle | Language | SWS | VZ
Advanced Information Management 3 - Scalable Data Analytics and Data Mining | IV | 0434 L 472 | SoSe | No information | 4 |

Workload and Credit Points

Advanced Information Management 3 - Scalable Data Analytics and Data Mining (IV):

Workload description | Multiplier | Hours | Total
Exercises / practice | 15.0 | 2.0h | 30.0h
Final report & presentation (as preparation for oral exam) | 15.0 | 2.0h | 30.0h
Plenary sessions | 15.0 | 4.0h | 60.0h
Preparation / Consolidation (including literature work and seminar presentation) | 15.0 | 4.0h | 60.0h
Total | | | 180.0h (~6 LP)
The workload of the module adds up to 180.0 hours. The module is therefore worth 6 credits.

Description of Teaching and Learning Methods

This “integrated course” (Integrierte Veranstaltung, IV) consists of lectures on key concepts and exercise sessions with smaller and larger exercises, in particular one complex task to be completed as a team. This task includes working through one of the key topics using independent literature research, giving a short presentation, and developing an implementation. Active contribution to all parts of the course is essential, as all members of the course will give a final presentation of the complex exercise.

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

Prerequisites: The material covered in the basic modules MPGI 1-5 of the Bachelor curriculum in Computer Science at TU Berlin, including MPGI5 (“Datenbanksysteme”), as well as good Java programming skills are required. A basic understanding of probability and statistics as well as linear algebra is helpful. The AIM-3 / SDADM course is taught in English, so fluency in English is required.

Mandatory requirements for the module test application:

1. Requirement

Module completion

Grading

graded

Type of exam

Oral exam

Language

English

Duration/Extent

No information

Duration of the Module

The following number of semesters is estimated for taking and completing the module:
1 Semester.

This module may be commenced in the following semesters:
Summer semester.

Maximum Number of Participants

The maximum capacity of students is 30.

Registration Procedures

Students are required to register via the DIMA course registration tool (http://www.dima.tu-berlin.de/) before the start of the first lecture. Within the first six weeks after the lectures begin, students must additionally register for the course in QISPOS (university examination protocol tool) and ISIS (course organization tool).

Recommended reading, Lecture notes

Lecture notes

Availability:  unavailable

 

Electronic lecture notes

Availability:  available
Additional information:
http://www.dima.tu-berlin.de

 

Literature

Recommended literature
Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (free online: http://infolab.stanford.edu/~ullman/mmds/book.pdf)
Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques
Tom White: Hadoop: The Definitive Guide

Assigned Degree Programs

This module is not used in any degree program.

Students of other degrees can participate in this module without capacity testing.

Open to the remaining diploma students in the mentioned areas.

Miscellaneous

This module is offered each summer term. Additional research papers and reports will be used for each topic covered in the course.