#40311 / #7

SoSe 2022 - SoSe 2023

English

SDS Scalable Data Science

6

Markl, Volker

graded

Portfolio examination

Affiliation


Faculty IV

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement (DIMA)

No information

Contact


EN 7

Soto, Juan

sekr@dima.tu-berlin.de

Learning Outcomes

Recent advances in technology have led to rapid growth in big data, creating a need for cost-efficient and scalable analysis algorithms. This course presents concepts for the scalable analysis of big data sets and applies them using open-source technologies. Participants will gain an in-depth understanding of the concepts and methods of scalable data science, as well as practical experience in the area. The course is principally designed to impart technical skills (50%), method skills (30%), system skills (10%), and social skills (10%).

Content

The module focuses on mainstream distributed processing platforms and paradigms. Students will learn how to employ them to solve challenging big data problems using popular data mining methods, and how to implement and apply data mining algorithms such as Naïve Bayes, K-Means clustering, and PageRank on various open-source systems (e.g., Apache Hadoop, Apache Flink).

Module Components

Compulsory part:

All courses are mandatory.

Course Name | Type | Number | Cycle | Language | SWS | VZ
Advanced Information Management 3 (AIM-3) - Scalable Data Science: Systems und Methods (SDSSM) | IV | 0434 L 472 | SoSe | No information | 4 |

Workload and Credit Points

Advanced Information Management 3 (AIM-3) - Scalable Data Science: Systems und Methods (SDSSM) (IV):

Workload description | Multiplier | Hours | Total
Exercises/Practice | 15.0 | 4.0h | 60.0h
Plenary sessions | 15.0 | 4.0h | 60.0h
Preparation & Consolidation (incl. literature studies) | 15.0 | 4.0h | 60.0h
Total | | | 180.0h (~6 LP)
The workload of the module totals 180.0 hours. The module is therefore worth 6 credits.
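The workload arithmetic above can be checked with a short sketch. The figure of 30 working hours per credit point is an assumption inferred from the stated 180.0 hours and 6 credits, and the function names are purely illustrative.

```python
# Workload rows as listed for the course: (component, multiplier, hours per unit).
WORKLOAD = [
    ("Exercises/Practice", 15, 4.0),
    ("Plenary sessions", 15, 4.0),
    ("Preparation & Consolidation (incl. literature studies)", 15, 4.0),
]

# Assumption: 30 working hours per credit point, inferred from 180 h ~ 6 LP.
HOURS_PER_CREDIT = 30.0


def total_hours(rows):
    """Sum multiplier * hours over all workload rows."""
    return sum(multiplier * hours for _, multiplier, hours in rows)


def credit_points(hours, hours_per_credit=HOURS_PER_CREDIT):
    """Convert a workload in hours into credit points."""
    return hours / hours_per_credit


if __name__ == "__main__":
    hours = total_hours(WORKLOAD)
    print(f"{hours} h -> {credit_points(hours)} LP")
```

Each of the three components contributes 15 x 4.0 h = 60.0 h, which gives the 180.0 h total stated above.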

Description of Teaching and Learning Methods

This Integrated Course (Integrierte Veranstaltung, IV) consists of: (i) lectures on key concepts, (ii) theoretical and programming exercises, and (iii) student-led presentations (including literature search). Active participation in, and contributions to, all parts of the course are essential.

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

Knowledge of the computer science topics covered in the TU Berlin Bachelor’s curriculum, in particular the database course (“Information Systems and Data Analysis”) or an equivalent, and excellent Java and SQL programming skills are strictly required. Basic knowledge of linear algebra, numerical analysis, probability, and statistics is strongly recommended. Furthermore, students are strongly advised to have completed (or to be currently enrolled in) a machine learning course. Since the course is taught in English, fluency in English is also required.

Mandatory requirements for the module test application:

This module has no requirements.

Module completion

Grading

graded

Type of exam

Portfolio examination

Type of portfolio examination

100 points in total

Language

English

Test elements

Name | Points | Category | Duration/Extent
(Deliverable assessment) Homework | 30 | written | 30 hours / 20 pages
(Deliverable assessment) In-class presentations | 20 | oral | 40 min. / about 35 slides
(Examination) Written test | 50 | written | 60 min.

Grading scale

Grading scale »Notenschlüssel 2: Fak IV (2)«

Total points | 1.0 | 1.3 | 1.7 | 2.0 | 2.3 | 2.7 | 3.0 | 3.3 | 3.7 | 4.0
100.0pt | 95.0pt | 90.0pt | 85.0pt | 80.0pt | 75.0pt | 70.0pt | 65.0pt | 60.0pt | 55.0pt | 50.0pt

Test description (Module completion)

The portfolio exam (worth 100 points) comprises three parts: (i) written homework (30 points), (ii) in-class presentations (20 points), and (iii) a written test (50 points). The final grade according to § 68 (2) AllgStuPO is determined using grading scale 2 of Faculty IV.
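As an illustration of how a portfolio total might map to a grade, the following sketch combines the three test elements with the grading scale listed above. It assumes that each point value in the scale is the minimum total required for the corresponding grade and that totals below 50 points count as a fail (5.0). Neither rule is spelled out on this page, and the function names and sample scores are hypothetical.

```python
# Maximum points per portfolio element, as listed under "Test elements".
MAX_POINTS = {"homework": 30, "presentations": 20, "written_test": 50}

# Grading scale 2 of Faculty IV as (points, grade) pairs.
# Assumption: each point value is the minimum total needed for that grade.
GRADE_THRESHOLDS = [
    (95.0, 1.0), (90.0, 1.3), (85.0, 1.7), (80.0, 2.0), (75.0, 2.3),
    (70.0, 2.7), (65.0, 3.0), (60.0, 3.3), (55.0, 3.7), (50.0, 4.0),
]

# Assumption: totals below 50 points are graded as failed (5.0).
FAIL_GRADE = 5.0


def portfolio_total(scores):
    """Add up the element scores after checking each against its maximum."""
    for name, points in scores.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {points} is outside 0..{MAX_POINTS[name]}")
    return float(sum(scores.values()))


def grade_for(total):
    """Return the best grade whose threshold the total reaches."""
    for threshold, grade in GRADE_THRESHOLDS:
        if total >= threshold:
            return grade
    return FAIL_GRADE


if __name__ == "__main__":
    # Hypothetical scores for one student.
    scores = {"homework": 24, "presentations": 17, "written_test": 41}
    total = portfolio_total(scores)
    print(total, grade_for(total))
```

With these sample scores the total is 82 points, which falls between the 80.0pt and 85.0pt entries of the scale and so maps to 2.0 under the stated assumption.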

Duration of the Module

The following number of semesters is estimated for taking and completing the module:
1 Semester.

This module may be commenced in the following semesters:
Summer semester.

Maximum Number of Participants

The maximum number of participants is 30.

Registration Procedures

Students are required to register for the course in the official TU Berlin examination system within six weeks of the first lecture or by the due date of the first graded assignment, whichever comes first.

Recommended reading, Lecture notes

Lecture notes

Availability: unavailable

Electronic lecture notes

Availability: available
Additional information:
https://www.dima.tu-berlin.de/menue/teaching/

Literature

Recommended literature
Hadoop: The Definitive Guide (4th Edition), Tom White, O’Reilly Media, 2015.
Mining of Massive Datasets (3rd Edition), J. Leskovec, A. Rajaraman, and J. Ullman, Cambridge University Press, 2019. http://mmds.org/.
Stream Processing with Apache Flink, Fabian Hueske and Vasiliki Kalavri, O’Reilly Media, 2019.
Supplementary reading material may be assigned to complement course lectures.

Assigned Degree Programs


This module is used in the following Degree Programs (new System):

Degree program / StuPO | StuPOs | Uses | First use | Last use
This module is not used in any degree program.
This course targets Master’s students focusing on Database Systems and Information Management in Computer Science (major: System Engineering), Information Systems Management, Computer Engineering (major: Information Systems & Software Engineering), and Industrial Engineering (IuK program). In addition, the course is a compulsory elective for ICT Innovation (e.g., EIT Data Science) students. Master’s students in other academic programs may also take the module as a compulsory elective, subject to space availability.

Miscellaneous

No information