Big Data: Technologies, Methods, Concepts
Prof. Dr. Andreas Harth
Lectures with exercises
4 SWS (contact hours per week), ECTS course, ECTS credits: 5
Time and location: see campo.
No specific prerequisites are required. Basic knowledge of databases and web technologies is helpful.
Big Data refers to datasets that are too large or too complex to be handled by traditional data management and processing systems. The course gives an overview of methods and technologies for storing and processing Big Data, including:
Distributed Systems and Cloud Computing
Big Data Processing Systems, including Map/Reduce
Theory and Practice of NoSQL Systems
Data Mapping and Integration
The course concludes with an outlook on further topics, including data mining and machine learning.
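As a rough illustration of the Map/Reduce programming model listed above, the following sketch counts word occurrences in plain Python. It is an in-memory simulation of the map, shuffle and reduce phases, not the API of Hadoop or any other framework; the function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce function: combine all partial counts for a single word."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster, map tasks would run in parallel on separate input splits and
    # reduce tasks in parallel on separate groups of keys; here everything runs
    # sequentially in one process.
    mapped = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


print(word_count(["big data big ideas", "data beats ideas"]))
# → {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

In a real Map/Reduce system the programmer typically supplies only the map and reduce functions; grouping the intermediate pairs by key (the shuffle) and distributing the work across machines is handled by the framework.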
The course teaches the fundamentals of Big Data, including real-world use cases as well as current technical challenges and opportunities. Students will learn the foundational algorithms used in large-scale distributed systems. They will also learn how to use available technologies to store, process and integrate Big Data on cloud infrastructure and to perform data analytics tasks. The hands-on sessions include setting up a cloud environment and querying and visualizing a large dataset.
After completing the course, students should be able to:
Explain the V’s of Big Data (e.g., volume, velocity, variety)
Outline the distributed architectures used in Map/Reduce systems
Explain Brewer’s CAP theorem
Write algorithms with map and reduce functions
Outline the use of similarity metrics for data mapping
Explain steps involved in large-scale data integration and data analytics
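To make the use of similarity metrics for data mapping more concrete, here is a small sketch that matches attribute names between two schemas using the Jaccard similarity of character 3-grams (shingles). Jaccard over shingles is only one of several possible metrics (edit distance or cosine similarity are common alternatives); the attribute names, the helper functions and the threshold of 0.2 are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of character k-grams of a lower-cased, underscore-free string."""
    s = text.lower().replace("_", " ").strip()
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_attributes(source: List[str], target: List[str],
                   threshold: float = 0.2) -> Dict[str, Optional[str]]:
    """Map each source attribute to its most similar target attribute.

    The threshold (an assumed value) rejects weak matches, which map to None.
    """
    mapping: Dict[str, Optional[str]] = {}
    for s in source:
        best: Optional[str] = None
        best_score = 0.0
        for t in target:
            score = jaccard(shingles(s), shingles(t))
            if score > best_score:
                best, best_score = t, score
        mapping[s] = best if best_score >= threshold else None
    return mapping


print(map_attributes(["customer_name", "zip_code", "phone"],
                     ["CustomerName", "postal_code", "city"]))
# → {'customer_name': 'CustomerName', 'zip_code': 'postal_code', 'phone': None}
```

Here `customer_name` and `CustomerName` share 8 of 13 distinct shingles (similarity about 0.62), while `zip_code` and `postal_code` only share the shingles of "code" (0.25). Character shingles tolerate differences in case and separators, but the threshold controls the trade-off between missed and spurious matches and would need tuning on real schemas.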
Recommended literature:
Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, http://mmds.org/.
AnHai Doan, Alon Halevy, Zachary Ives, Principles of Data Integration, Morgan Kaufmann, 2012.