Course number (LV-Nummer): 0434 L 472
Total teaching load: 85.33 UE
Semester: Winter semester 2020/21 (WiSe 2020/21)
Contact
Responsible: Markl, Volker
Assigned to: Technische Universität Berlin
↳ Faculty IV
    ↳ Institute of Software Engineering and Theoretical Computer Science (Institut für Softwaretechnik und Theoretische Informatik)
        ↳ 34351500 FG Database Systems and Information Management (DIMA)
Language: German
All classes in this course
Advanced Information Management 3 (AIM-3) - Scalable Data Science: Systems and Methods (SDSSM)
Group: AIM 3 Advanced Information Management III – Scalable Data Science: Systems and Methods

Dates (16)

Fri, 06.11.2020 - Fri, 19.02.2021

14:00 - 18:00

No location

85.33 UE
The module focuses on mainstream distributed processing platforms and paradigms, and on how to employ them to solve challenging big data problems using popular data mining methods. Students will learn how to implement and apply various data mining algorithms, such as Naïve Bayes, K-Means Clustering, and PageRank, on various open-source systems (e.g., Apache Hadoop, Apache Flink).

The module can be completed within one semester.

The lab capacity limits this course to a maximum of 30 participants.

This module has been offered every summer and winter term since 2014.

For each topic, additional research papers and reports are used.

Recent advances in technology have led to rapid growth in big data, which in turn has created a need for cost-efficient and scalable analysis algorithms. This course presents concepts for the scalable analysis of big data sets and applies them using open-source technologies.

Participants of this module will gain an in-depth understanding of the concepts and methods of scalable data science, as well as practical experience in the area. The course is designed to impart technical skills (50%), method skills (30%), system skills (10%), and social skills (10%).

Students are required to register via the DIMA course registration tool before the start of the first lecture (http://www.dima.tu-berlin.de/).

Within the first six weeks after the start of lectures (31.5.2020), students must also register for the course in QISPOS (the university's examination registration tool) and in ISIS (the course organization tool), in addition to registering in the DIMA course registration tool.

This course targets Master’s students focusing on Database Systems and Information Management. It is a compulsory elective module in the Master’s programs Computer Science (major: System Engineering), Computer Engineering (major: Information Systems & Software Engineering), and Industrial Engineering (IuK track). It is also a compulsory elective module for ERASMUS MUNDUS IT4BI and for EIT-ICT Cloud Computing and Services (CCS), and a compulsory module for EIT-ICT Data Science (DS).

Subject to space availability, Master’s students in other academic programs may also enroll and take the module as an elective.

This Integrated Course (Integrierte Veranstaltung, IV) consists of (i) lectures on key concepts, (ii) theoretical and programming exercises, and (iii) student-led presentations (including a literature search).

Active participation and contributions to all parts of this course are essential.

Students are expected to know the computer science topics covered in the TU Berlin Bachelor’s curriculum, in particular the database course (“Information Systems and Data Analysis”) or an equivalent, and to have good Java programming skills. Basic knowledge of linear algebra, numerical analysis, probability, and statistics is strongly recommended. Furthermore, students should preferably have already completed (or be currently enrolled in) a machine learning course.

Since the course will be offered in English, fluency in English is also required.

Form of examination: portfolio examination (Portfolioprüfung)

The portfolio examination consists of three parts:

  1.  Homework (30 portfolio points),
  2.  In-class presentations (20 portfolio points), and
  3.  A written final exam ("Testat") (50 portfolio points).

The final grade is computed according to grading table 2 of Faculty IV, as stipulated in § 47 (2) AllgStuPO TU Berlin.

Literature:

  • Anand Rajaraman, Jeffrey David Ullman: Mining of Massive Datasets (free online: http://infolab.stanford.edu/~ullman/mmds/book.pdf)
  • Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011.
  • Tom White: Hadoop: The Definitive Guide (4th Edition), O’Reilly Media, 2015.
  • Supplementary reading material may be assigned to complement the course lectures.