
#40494 / #8

SoSe 2023 - SoSe 2023

English

BDSPRO Big Data Systems Project

9

Markl, Volker

graded

Portfolio examination

Affiliation


Faculty IV

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement (DIMA)

No information

Contact


EN 7

Soto, Juan

sekr@dima.tu-berlin.de

Learning Outcomes

In this course, students will learn how to systematically analyze a current issue in the Big Data management area and how to develop and implement a problem-oriented solution as part of a team. Students will learn how to cooperate as team members, to contribute to project organization, quality assurance, and documentation, and to evaluate the quality of their solution through analysis, systematic experiments, and test cases. After the course, students will be able to understand the architecture of large-scale data systems and to solve problems in the end-to-end design, implementation, and testing of large-scale data management systems. They will be capable of designing and implementing solutions that improve the performance and feature-completeness of large-scale data systems in a collaborative team.

Content

Both the sciences and industry are currently undergoing a profound transformation: large-scale, diverse data sets, derived from Internet-of-Things sensors, the web, or crowdsourcing, present a huge opportunity for data-driven decision making. This data poses new challenges in several dimensions: its unprecedented volume, the speed at which it is generated (its velocity), and the variety of data sources that need to be integrated. The field of Big Data Systems deals with the technological means of processing high volumes of data to gain insights from it. A whole new breed of systems and paradigms has been developed to cope with these challenges. However, current systems still fall short in addressing many user needs, so further systems research is necessary to achieve robust query performance in any scenario.
Students will therefore conduct projects on topics related to current data management trends, such as Modern Hardware, Stateful Data Stream Processing, Sensor Data Management, Compiler Technology, Query Optimization, Machine Learning for Databases, and many others. For these projects, students will learn the algorithms, system design, and actual implementation of so-called Distributed Processing Platforms (e.g., Flink, NebulaStream, Spark). These are systems that execute parallel computations on terabytes of data on clusters as well as on distributed Internet-of-Things topologies of up to several thousand machines. The scope of the project will be adjusted to the final group size to reflect the overall workload of the course (i.e., 270h of work per student).
At the start of the project, students will receive a topic as well as some information material. The team, with the assistance of the lecturer, will decide on a project environment with suitable tools for teamwork, project communication, development, and testing.
Next, the problem will have to be analyzed, modelled and decomposed into individual components, from which tasks are derived that are subsequently assigned to smaller teams or individuals. At weekly project meetings, the project team presents progress and milestones that have been reached. In consultation with the lecturer, it is decided which further steps to take. The project is concluded with a final presentation which includes a demonstration of the prototype.

Module Components

Compulsory part:

All courses are mandatory.

Course Name | Type | Number | Cycle | Language | SWS | VZ
BDAPRO - Big Data Analytics Project | PJ | 0434 L 484 | WiSe/SoSe | No information | 6 |

Workload and Credit Points

BDAPRO - Big Data Analytics Project (PJ):

Workload description | Multiplier | Hours | Total
Documentation, Presentation | 1.0 | 40.0h | 40.0h
Implementation, Tests, Experiments | 1.0 | 130.0h | 130.0h
Participation in Meetings | 20.0 | 3.0h | 60.0h
Preparation Phase and Design | 1.0 | 40.0h | 40.0h
Total: 270.0h (~9 LP)
The Workload of the module sums up to 270.0 Hours. Therefore the module contains 9 Credits.
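
The arithmetic behind this total can be checked with a short script. This is only an illustrative sketch: the names `WORKLOAD`, `total_hours`, and `credit_points` are hypothetical, and the rate of 30 hours per credit point is an assumption inferred from the stated 270 h for 9 credits.

```python
# Workload items as (description, multiplier, hours per unit),
# copied from the workload table of this module.
WORKLOAD = [
    ("Documentation, Presentation", 1.0, 40.0),
    ("Implementation, Tests, Experiments", 1.0, 130.0),
    ("Participation in Meetings", 20.0, 3.0),
    ("Preparation Phase and Design", 1.0, 40.0),
]

# Assumption: 30 hours per credit point, inferred from 270 h for 9 credits.
HOURS_PER_CREDIT = 30.0


def total_hours(items):
    """Sum multiplier * hours over all workload items."""
    return sum(multiplier * hours for _, multiplier, hours in items)


def credit_points(hours, hours_per_credit=HOURS_PER_CREDIT):
    """Convert a workload in hours to credit points."""
    return hours / hours_per_credit


if __name__ == "__main__":
    hours = total_hours(WORKLOAD)
    print(f"{hours:.1f} h -> {credit_points(hours):.0f} LP")  # 270.0 h -> 9 LP
```

The meeting entry is the only one with a multiplier other than 1.0: 20 meetings of 3 hours each contribute 60 of the 270 hours.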

Description of Teaching and Learning Methods

Guided and self-organized project work.

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

1. General computer science knowledge (e.g., algorithms, data structures, systems architecture, distributed systems) obtained via the completion of a Bachelor's (e.g., B.Sc. in Computer Science) is required, as well as linear algebra and statistics.
2. Depending on the topic, additional prerequisites may be required (e.g., Database Technology, Data Management on Modern Hardware, Machine Learning, Compilers).
3. Solid programming skills in at least one of the following programming languages: Java, C++, Scala, or Python.
4. Basic knowledge of functional programming.
5. Basic knowledge of distributed source control management systems (i.e., Git) and software development processes such as Agile.
6. Students must have completed DBT or an equivalent course on database internals prior to enrolling in BDSPRO. Alternatively, students must be enrolled in DBT and BDSPRO concurrently in the same semester.

Mandatory requirements for the module test application:

This module has no requirements.

Module completion

Grading

graded

Type of exam

Portfolio examination

Type of portfolio examination

100 points in total

Language

English

Test elements

Name | Points | Category | Duration/Extent
(Deliverable Assessment) Experiments Analysis | 20 | practical | about 30h
(Deliverable Assessment) Final Presentation | 20 | oral | about 20 minutes
(Deliverable Assessment) Intermediate Presentation | 10 | oral | about 10-15 minutes
(Learning Process Review) Experiment Design and Execution | 20 | practical | about 30h
(Learning Process Review) Prototype with Test Cases and Documentation | 30 | practical | about 60h

Grading scale

Grading scale »Notenschlüssel 2: Fak IV (2)«

Total points | 1.0 | 1.3 | 1.7 | 2.0 | 2.3 | 2.7 | 3.0 | 3.3 | 3.7 | 4.0
100.0pt | 95.0pt | 90.0pt | 85.0pt | 80.0pt | 75.0pt | 70.0pt | 65.0pt | 60.0pt | 55.0pt | 50.0pt

Test description (Module completion)

The overall grade for the module consists of the results of the course work ('portfolio exam'). The following are included in the final grade:
1. Prototype with test cases and documentation (30p.)
2. Experiment design and execution (20p.)
3. Intermediate presentation (10p.)
4. Experiments analysis (20p.)
5. Final presentation (20p.)
Students are graded individually, i.e., final grades of students within a group can vary depending on the amount of work carried out by each person. The final grade according to § 68 (2) AllgStuPO will be calculated with grading table 2 of Faculty IV (Notenschlüssel 2 der Fakultät IV).
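
As an illustration of how the five components and the grading scale combine, the sketch below sums per-component points and looks up the resulting grade. It rests on two assumptions that this description does not state explicitly: each value in the grading scale is taken as the minimum total required for that grade, and a total below 50 points is treated as a fail (5.0). All names (`GRADING_SCALE`, `MAX_POINTS`, `grade_for`) and the example scores are hypothetical.

```python
# (minimum points, grade) pairs taken from the grading scale of this module.
# Assumption: each value is the minimum total needed for that grade.
GRADING_SCALE = [
    (95.0, "1.0"), (90.0, "1.3"), (85.0, "1.7"), (80.0, "2.0"), (75.0, "2.3"),
    (70.0, "2.7"), (65.0, "3.0"), (60.0, "3.3"), (55.0, "3.7"), (50.0, "4.0"),
]

# Maximum points per test element, as listed in the test description.
MAX_POINTS = {
    "prototype": 30,
    "experiment_design": 20,
    "intermediate_presentation": 10,
    "experiments_analysis": 20,
    "final_presentation": 20,
}

FAIL_GRADE = "5.0"  # Assumption: totals below 50 points count as failed.


def grade_for(scores):
    """Return (total points, grade) for a dict of per-element scores."""
    for name, points in scores.items():
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name}: {points} outside 0..{MAX_POINTS[name]}")
    total = sum(scores.values())
    # The scale is sorted from best to worst, so the first match wins.
    for minimum, grade in GRADING_SCALE:
        if total >= minimum:
            return total, grade
    return total, FAIL_GRADE


if __name__ == "__main__":
    # Hypothetical scores for one student.
    example = {
        "prototype": 26,
        "experiment_design": 17,
        "intermediate_presentation": 8,
        "experiments_analysis": 16,
        "final_presentation": 18,
    }
    print(grade_for(example))  # (85, '1.7')
```

Because students are graded individually, such a lookup would apply per student rather than per group.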

Duration of the Module

The following number of semesters is estimated for taking and completing the module:
1 Semester.

This module may be commenced in the following semesters:
Winter and summer semester.

Maximum Number of Participants

The maximum capacity of students is 12.

Registration Procedures

Students are required to register for the course in the official TUB examination system within six weeks of the first lecture or by the due date of the first graded assignment, whichever comes first.

Recommended reading, Lecture notes

Lecture notes

Availability:  unavailable

 

Electronic lecture notes

Availability:  unavailable

 

Literature

Recommended literature
Project-specific literature will be announced in the first lecture.

Assigned Degree Programs


This module is used in the following Degree Programs (new System):

Degree program / StuPO | StuPOs | Uses | First use | Last use
This module is not used in any degree program.
This course is aimed at master's students with a focus on database systems and information management, after their first (master's) term, in “Informatik - System Engineering”, “Technische Informatik -- Informationssysteme”, or “Wirtschaftsingenieurwesen -- IuK”. If capacity is available, it will also be open to students from other faculties.

Miscellaneous

No information