BDAPRO - Big Data Analytics Project

9 LP

English

#40494 / #5

Since SS 2017

Fakultät IV

EN 7

Institut für Softwaretechnik und Theoretische Informatik

34351500 FG Datenbanksysteme und Informationsmanagement

Markl, Volker

Alexandrov, Alexander

sekr@dima.tu-berlin.de

POS number  PORD number  Module title
2346454     37407        BDAPRO - Big Data Analytics Project

Learning Outcomes

In this course you will learn to systematically analyze a current issue in the area of information management and to develop and implement a problem-oriented solution as part of a team. You will learn to cooperate as a team member and to contribute to project organization, quality assurance and documentation. The quality of your solution has to be demonstrated through analysis, systematic experiments and test cases. Examples of IMPRO projects carried out in recent semesters are a tool for analysing Web 2.0 forum data, an online multiplayer game for mobile phones, the implementation and analysis of new join methods for a cloud computing platform, and the development of data mining operations on the massively parallel system Hadoop as part of the Apache open source project Mahout. After the course, students will understand methods for large-scale data analytics and be able to solve large-scale data analytics problems. They will be capable of designing and implementing large-scale data analytics solutions in a collaborative team.

Content

Both the sciences and industry are currently undergoing a profound transformation: large-scale, diverse data sets, derived from sensors, the web, or crowdsourcing, present a huge opportunity for data-driven decision making. This data poses new challenges in several dimensions: its unprecedented volume, the speed at which it is generated (its velocity), and the variety of data sources that need to be integrated. A whole new breed of systems and paradigms is currently being developed to cope with these challenges. The field of Big Data Analytics deals with the technological means of gaining insights from huge amounts of data. Students will conduct projects that apply data mining algorithms to large datasets. To do so, they will learn to use so-called parallel processing platforms (e.g. Flink, Spark, Hadoop, HBase), systems that execute parallel computations on terabytes of data on clusters of up to several thousand machines.

At the start of the project, students will receive a topic as well as some information material. The team, with the assistance of the lecturer, will decide on a project environment with suitable tools for teamwork, project communication, development and testing. Next, the problem has to be analyzed, modelled and decomposed into individual components, from which tasks are derived and assigned to smaller teams or individuals. At weekly project meetings, the project team presents its progress and the milestones it has reached, and decides on further steps in consultation with the lecturer. The project concludes with a final report and a final presentation that includes a demonstration of the prototype.

Module Components

Compulsory part:

All courses are mandatory.

Course Name                          Type  Number      Cycle  Language        SWS
BDAPRO - Big Data Analytics Project  PJ    0434 L 484  WS/SS  No information  6

Workload and Credit Points

BDAPRO - Big Data Analytics Project (PJ):

Workload description                Multiplier  Hours   Total
Documentation, Presentations        1.0         40.0h   40.0h
Implementation, Tests, Experiments  1.0         130.0h  130.0h
Participation in Meetings           20.0        3.0h    60.0h
Preparation Phase and Design        1.0         40.0h   40.0h
Total                                                   270.0h (~9 LP)
The workload of the module sums to 270.0 hours. The module is therefore worth 9 credit points.
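
The total can be checked with a few lines of Python. This is only a sketch: the dictionary restates the workload table above, and the conversion factor of 30 hours per credit point is an assumption (it is the value implied by 270 h corresponding to about 9 LP), not something the module description states explicitly. The names `WORKLOAD`, `total_hours` and `credit_points` are illustrative.

```python
# Workload items from the table: (multiplier, hours per unit).
WORKLOAD = {
    "Documentation, Presentations": (1.0, 40.0),
    "Implementation, Tests, Experiments": (1.0, 130.0),
    "Participation in Meetings": (20.0, 3.0),
    "Preparation Phase and Design": (1.0, 40.0),
}

# Assumption: 30 working hours correspond to one credit point (LP).
HOURS_PER_LP = 30.0


def total_hours(workload):
    """Sum multiplier * hours over all workload items."""
    return sum(multiplier * hours for multiplier, hours in workload.values())


def credit_points(hours, hours_per_lp=HOURS_PER_LP):
    """Convert a workload in hours to credit points."""
    return hours / hours_per_lp


hours = total_hours(WORKLOAD)
print(f"{hours:.1f}h -> {credit_points(hours):.0f} LP")
```

The 20 weekly meetings of 3 hours each contribute 60 hours, so attendance accounts for a little under a quarter of the total workload.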

Description of Teaching and Learning Methods

Guided and self-organized project work.

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

Knowledge from the complete Bachelor program (Informatik or Technische Informatik) is required, as well as linear algebra and statistics. Depending on the topic, additional prerequisites may be required, e.g. “DBT - Database Technology”. Solid programming skills in at least one of the following programming languages: Java, C++, Scala, Python. Basic knowledge of functional programming. Basic knowledge of distributed source control management systems (Git, Mercurial) and software processes such as Scrum.

Mandatory requirements for the module test application:

No information

Module completion

Grading:

graded

Type of exam:

Portfolio examination

Language:

English

Type of portfolio examination

100 points in total

Test elements

Name                                                                    Points  Category   Duration/Extent
(Deliverable assessment) Experiments analysis                           20      practical  about 30h
(Deliverable assessment) Final presentation                             20      oral       about 20 minutes
(Deliverable assessment) Intermediate presentation                      10      oral       about 10-15 minutes
(Learning process review) Experiment design and execution               20      practical  about 30h
(Learning process review) Prototype with test cases and documentation   30      practical  about 60h

Grading scale

Grade:   1.0   1.3   1.7   2.0   2.3   2.7   3.0   3.3   3.7   4.0
Points:  95.0  90.0  85.0  80.0  75.0  70.0  65.0  60.0  55.0  50.0
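
As an illustration, the scale can be read as a lookup from points to grade. The sketch below rests on two assumptions that the table itself does not spell out: each point value is the minimum required for the grade paired with it, and any result below 50 points counts as a fail (5.0). The function name `grade_for` is illustrative.

```python
# Grading scale from the table above: (minimum points, grade),
# ordered from the best grade to the lowest passing grade.
GRADING_SCALE = [
    (95.0, 1.0), (90.0, 1.3), (85.0, 1.7), (80.0, 2.0), (75.0, 2.3),
    (70.0, 2.7), (65.0, 3.0), (60.0, 3.3), (55.0, 3.7), (50.0, 4.0),
]

# Assumption: anything below the lowest threshold is a fail (5.0).
FAIL_GRADE = 5.0


def grade_for(points):
    """Return the grade for a portfolio score between 0 and 100 points."""
    if not 0.0 <= points <= 100.0:
        raise ValueError(f"points must be within 0..100, got {points}")
    # The first threshold that the score reaches determines the grade.
    for minimum, grade in GRADING_SCALE:
        if points >= minimum:
            return grade
    return FAIL_GRADE


for score in (97.0, 84.0, 50.0, 42.5):
    print(score, "->", grade_for(score))
```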

Test description (Module completion)

The overall grade for the module consists of the results of the course work ('portfolio exam'). The following are included in the final grade:
1. Prototype with test cases and documentation (30p.)
2. Experiment design and execution (20p.)
3. Intermediate presentation (10p.)
4. Experiments analysis (20p.)
5. Final presentation (20p.)
The final grade according to § 47 (2) AllgStuPO is determined using grading table 2 of Fakultät IV.
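
The way the five elements add up to the 100-point total can be sketched as follows. The maxima are taken from the list of test elements; the example scores are hypothetical, and treating a missing element as 0 points is an assumption made for this sketch only.

```python
# Maximum points per test element, as listed in this module description.
MAX_POINTS = {
    "Prototype with test cases and documentation": 30,
    "Experiment design and execution": 20,
    "Intermediate presentation": 10,
    "Experiments analysis": 20,
    "Final presentation": 20,
}


def portfolio_total(achieved):
    """Sum the achieved points, checking each against its maximum."""
    total = 0.0
    for element, maximum in MAX_POINTS.items():
        # Assumption: an element that was not handed in counts as 0 points.
        points = achieved.get(element, 0.0)
        if not 0.0 <= points <= maximum:
            raise ValueError(f"{element}: {points} is outside 0..{maximum}")
        total += points
    return total


# Hypothetical scores, for illustration only.
example_scores = {
    "Prototype with test cases and documentation": 25,
    "Experiment design and execution": 16,
    "Intermediate presentation": 8,
    "Experiments analysis": 17,
    "Final presentation": 18,
}
print(portfolio_total(example_scores), "of", sum(MAX_POINTS.values()))
```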

Duration of the Module

This module can be completed in one semester.

Maximum Number of Participants

The module is limited to a maximum of 27 participants.

Registration Procedures

Students are required to register via the DIMA course registration tool (http://www.dima.tu-berlin.de/) before the start of the first lecture. In addition, within the first six weeks after the lectures begin, students have to register for the course at QISPOS (university examination protocol tool).

Recommended reading, Lecture notes

Lecture notes

Availability:  unavailable

Electronic lecture notes

Availability:  unavailable

Literature

Recommended literature
[1] H. Garcia-Molina, J. Ullman, J. Widom: Database Systems – The Complete Book, Pearson 2009
[2] A. Rajaraman, J. Ullman: Mining of Massive Datasets, Cambridge 2010
Further project-specific literature will be announced in the first lecture.

Assigned Degree Programs

This module is used in the following module lists:

This course addresses master students with a focus on database systems and information management after the first (master) term in “Informatik – System Engineering”, “Technische Informatik – Informationssysteme” and “Wirtschaftsingenieurwesen – IuK”. If capacity is available, it is also open to students from other faculties. Moreover, it is a compulsory class for the ERASMUS MUNDUS IT4BI master programme and a compulsory elective for the EIT-Digital Data Science Master.

Miscellaneous

No information