Foundations of Data Science


Lecture in the winter term 2017/2018



Tue, 1:15 - 2:45pm (AH II)
Thu, 8:30 - 10:00am (AH II)

Exercise Class
Tue, 10:15 - 11:45am (5056)



In the age of "big data" and "advanced analytics", data processing faces new challenges. Queries become more complex and often involve data mining and machine learning tasks, and the scale of the datasets requires new algorithmic approaches.

This course will cover the "theoretical foundations" of modern data processing and analytics. This includes topics from database theory, such as data models, the analysis of query languages, and basic algorithmic and complexity-theoretic questions related to query processing. It also includes topics from algorithmic learning theory, such as basic machine learning algorithms, support vector machines, the PAC model, and the VC dimension. Furthermore, it includes new models of computation on massive datasets, such as the streaming model and the map-reduce paradigm, and algorithms for these models.

We will focus on "computational aspects" of the theory. Statistics, though undoubtedly one of the foundations of data science, will not play a central role in this course.


This lecture can be taken as a Bachelor's or Master's course.

There are no prerequisites.



The course will be held in English.

Time and Place

Tuesday, 1:15pm - 2:45pm in 2350|111 (AH II)
Thursday, 8:30am - 10:00am in 2350|111 (AH II)

This 3-hour course will be held as a 4-hour course, but not every week. The exact dates will be announced in the first lecture and can be found in CampusOffice.


Martin Grohe



There will be weekly exercise sets. Completing these successfully, that is, reaching at least 50% of the possible points, is necessary for admission to the examination.

The exercise sheets will be released on Thursdays and have to be handed in before the Thursday lecture one week later. Alternatively, the solutions can be put in the exercise box of the chair, building E1, first floor, before 10:00am.

Groups of up to three students are allowed, and in fact encouraged, to work together and hand in their solutions jointly.

The solutions to the exercises will be presented on Tuesdays, 8:30am - 10:00am, in 2356|056 (5056).


There will be written exams. The exact modalities of the exams will be announced later. The planned exam dates are:

Thursday, February 15, 2018, 11:30am, 2350|111 (AH II)
Thursday, March 22, 2018, 11:30am, 2350|009 (AH I)



S. Abiteboul, R. Hull, V. Vianu. Foundations of Databases. Addison Wesley 1995.

J. Hopcroft, R. Kannan. Foundations of Data Science. Unpublished, draft available online.

M. Kearns, U. Vazirani. An Introduction to Computational Learning Theory. MIT Press 1994.

J. Leskovec, A. Rajaraman, J. Ullman. Mining of Massive Datasets. Cambridge University Press 2014.

S.J. Russell, P. Norvig. Artificial Intelligence: A Modern Approach. 3rd Edition, Pearson 2014.

