#40791 / #1

WS 2016/17 - WS 2016/17

English

The 800-pound Gorilla in the corner: Data Integration

6

Abedjan, Ziawasch

graded

Oral exam

Affiliation


Faculty IV

Institute of Software Engineering and Theoretical Computer Science

34352300 FG Big Data Management (junior professorship)

No information

Contact


EN 7

Abedjan, Ziawasch

sekr@bigdama.tu-berlin.de

Learning Outcomes

“Data integration is the 800-pound gorilla in the corner, and everyone’s got it in spades,” according to Mike Stonebraker, MIT professor and Turing Award Laureate. The most challenging and time-consuming task for data scientists in the era of Big Data is consolidating data from different sources while overcoming dirty data, heterogeneous data representations, and incomplete data. This course covers the entire pipeline of an information integration workflow: existing integration architectures and algorithms for data cleansing, schema matching, and data fusion. We will also discuss state-of-the-art systems and prominent use cases of information integration techniques.

Content

The course has the following main topics:
- Distribution and autonomy
- Foundations of data integration
- String matching
- Schema matching/mapping
- Global-as-View and Local-as-View modelling
- Data cleansing
- Duplicate detection
- Data quality
- Hidden Web
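To give a flavour of two of the listed topics, string matching and duplicate detection, here is a minimal Python sketch. It is not taken from the course material: the function names (`ngrams`, `jaccard`, `find_duplicates`), the choice of character trigrams with Jaccard similarity, and the threshold of 0.5 are all illustrative assumptions.

```python
from itertools import combinations


def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        # Strings shorter than n are treated as a single gram (assumption).
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the n-gram sets of two strings, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two empty strings are considered identical
    return len(ga & gb) / len(ga | gb)


def find_duplicates(records, threshold=0.5):
    """Compare all record pairs and return (i, j, similarity) for likely duplicates."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        sim = jaccard(a, b)
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


if __name__ == "__main__":
    # Hypothetical example records, not real course data.
    records = ["Data Integration", "data  integration", "Schema Matching"]
    print(find_duplicates(records))
```

The all-pairs comparison is quadratic in the number of records, so it only suits small inputs; larger data sets typically need some form of candidate pre-selection before pairwise comparison.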

Module Components

Compulsory group:

All courses are mandatory.

Course Name | Type | Number | Cycle | Language | SWS | VZ
Data Integration | IV | 4523 L 08957 | WiSe/SoSe | No information | 4 |

Workload and Credit Points

Data Integration (IV):

Workload description | Multiplier | Hours | Total
Written report | 15.0 | 2.0h | 30.0h
Attendance | 15.0 | 4.0h | 60.0h
Exercises/practical part (case study) | 15.0 | 2.0h | 30.0h
Preparation and follow-up (incl. reading the primary literature and giving a presentation) | 15.0 | 4.0h | 60.0h
Total: 180.0h (~6 LP)
The workload of the module sums to 180.0 hours. The module therefore carries 6 credits.

Description of Teaching and Learning Methods

Lecture and exercise sessions

Requirements for participation and examination

Desirable prerequisites for participation in the courses:

The prerequisites are a completed Bachelor's degree, basic knowledge of database management, and basic knowledge of at least one modern programming or scripting language.

Mandatory requirements for the module test application:

This module has no requirements.

Module completion

Grading

graded

Type of exam

Oral exam

Language

English

Duration/Extent

No information

Duration of the Module

The following number of semesters is estimated for taking and completing the module:
1 Semester.

This module may be commenced in the following semesters:
Winter and summer semester.

Maximum Number of Participants

This module is not limited to a number of students.

Registration Procedures

All participants must register for this module with BigDaMa before the first class session, using the registration tool on the BigDaMa website (http://www.bigdama.tu-berlin.de/). Be sure to observe all regulations of your degree program! Registration for the exam takes place via an electronic exam registration system.

Recommended reading, Lecture notes

Lecture notes

Availability: unavailable

Electronic lecture notes

Availability: unavailable

Literature

Recommended literature
AnHai Doan, Alon Halevy, Zachary Ives: Principles of Data Integration. Morgan Kaufmann, 1st edition, 2012, 520 pages.
Ulf Leser, Felix Naumann: Informationsintegration. dpunkt Verlag, 2006.

Assigned Degree Programs


This module is used in the following degree programs (new system):

Degree program / StuPO | StuPOs | Uses | First use | Last use
This module is not used in any degree program.

Students of other degree programs can participate in this module without a capacity check.

Miscellaneous

No information