Big Data: Technologies, Methods, Concepts

Lecturer:

Prof. Dr. Andreas Harth

Details:

Lecture with exercises

ECTS credits: 5

Language: English

Module no.: 85765

Time and location: see Campo

Prerequisites

There are no specific prerequisites. Basic knowledge of databases and web technologies is helpful.

Contents
Big Data refers to datasets that are too large or too complex to be handled by traditional data management and processing systems. The course presents an overview of methods and technologies for storing and processing Big Data, and concludes with modern approaches to Big Data, in particular Transformers (Large Language Models) and agent-based systems.
The goal of the course is to provide a solid foundation in the traditional design aspects of Distributed Computing and Distributed Databases, and to show how they have influenced modern developments in cloud computing, including distributed data storage (e.g., NoSQL storage techniques) and data processing abstractions (e.g., MapReduce/Hadoop).

Course Objectives
The course teaches the fundamentals of Big Data, including real-world use cases and the current technical challenges and opportunities that Big Data presents. Students will learn about the foundational algorithms used in large-scale distributed systems. Further, students will learn how to use available technologies to store, process and integrate Big Data on cloud infrastructures and to perform data analytics tasks. Students will also get an overview of methods and approaches for natural language processing with Transformers (Large Language Models) and for multi-agent systems.

Learning Goals:

The course gives students a comprehensive understanding of modern data processing techniques, distributed systems, and their applications in business and technology:

Analyse and apply fundamental concepts of Big Data, including data classification, algorithm complexity and distributed systems design
Understand and implement Web of Things (WoT) concepts, including device interfaces and scripting
Apply Business Intelligence principles, including data quality assessment, processing strategies and visualisation techniques
Understand NoSQL database concepts, including ACID/BASE properties, the CAP theorem and various NoSQL store types
Explain and use data processing paradigms, particularly MapReduce and Hadoop, including functional programming concepts
Comprehend and apply Information Retrieval techniques, including web crawling, indexing and ranking algorithms
Describe the architecture, advantages and applications of Transformer models in natural language processing
Explain the principles of intelligent agents, including their properties, architectures and multi-agent system design

Literature:

A. S. Tanenbaum, M. Van Steen. Distributed Systems: Principles and Paradigms (2nd Edition). Prentice Hall, 2006.
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski. Pregel: a system for large-scale graph processing. SIGMOD Conference 2010: 135-146.
K. Hwang, J. Dongarra, G. C. Fox. Distributed and Cloud Computing: From Parallel Processing to the Internet of Things (1st Edition). Morgan Kaufmann, 2011.
M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Springer, 2011.
T. White. Hadoop: The Definitive Guide. O’Reilly, 2012.
P. J. Sadalage, M. Fowler. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Professional, 2012.
J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Datasets.
A. Doan, A. Halevy, Z. Ives. Principles of Data Integration. Morgan Kaufmann, 2012.
D. Guinard, V. Trifa. Building the Web of Things: With Examples in Node.js and Raspberry Pi. Manning, 2016.
S. Russell, P. Norvig. Artificial Intelligence: A Modern Approach (4th Edition). 2020.

Campo instance: https://www.campo.fau.de:443/qisserver/pages/startFlow.xhtml?_flowId=detailView-flow&unitId=89949&periodId=397&navigationPosition=hisinoneLehrorganisation,examEventOverview

Join the StudOn instance: https://www.studon.fau.de/crs5831215_join.html