MSc thesis supervision
I am supervising students who conduct research in information retrieval, natural language processing and learning analytics (basically data science applied to the learning domain). For information retrieval, ongoing benchmark competitions give you a good idea of hot-topic tasks and research directions:
Natural language processing has a lot of public benchmarks too, though they are typically not organized within a particular conference/workshop series. The nlpprogress.com website contains a good overview of tasks/datasets and the current state of the art. IR and NLP are not completely separate fields, and naturally some of the NLP tasks are more relevant for IR than others. The following tasks are relevant to IR (the links lead to the particular section on the nlpprogress website):
If you are interested in learning analytics, have a look at the proceedings of different editions of the Learning At Scale conference (2014, 2015, 2016, 2017, 2018, 2019). I am particularly interested in approaches that require the implementation of tooling that is hypothesized to aid learning (either in a MOOC or in the classroom), which is then deployed in either a crowdsourcing study or an actual class.
Below are the resources I have developed for my courses (some are more up-to-date than others): Big Data Processing, Web and Database Technology and Information Retrieval.
Big Data Processing
Since 2013/2014 I have been teaching the second-year Bachelor course Big Data Processing at TU Delft (with 2016/17 being the last time for now). The course covers a range of technologies in the Hadoop ecosystem after a short excursion into the streaming world; I created the material based on a number of great books, including Mining of Massive Datasets, Data-Intensive Text Processing with MapReduce, Hadoop: The Definitive Guide, Programming Pig and ZooKeeper.
Slides - 2016/17 Edition
- Streams I
- Streams II
- Algorithm design for MapReduce
- Pig I
- Pig II
- Graph algorithms
- Two more lectures on Spark completed the course.
Assignments - 2016/17 edition
- Assignment 1: Streaming
- Assignment 2: Streaming and Hadoop
- Assignment 3: Hadoop
- Assignment 4: Pig data
- Assignment 5: Pig data
- Assignment 6: Giraph
- Assignment 7: Spark
A Sample of Previous Exams
- 24 questions on streaming
- 32 questions on MapReduce/Hadoop
- 10 questions on graphs and Giraph
- 12 questions on Pig/Pig Latin
Web (and Database) Technology
Since 2013/2014 I have also been teaching the first-year Bachelor course Web and Database Technology (known as TI1506 or CSE1500) at TU Delft, together with Alessandro Bozzon. I teach the Web technology part, which turned out to be quite a challenge due to the wide variety of skill sets our incoming students possess (some work as Web developers, others have never written a single line of HTML before the start of this course). To level the playing field for the more than 300 (or 400, 500, 600,… it is increasing every year) students we have, we rely on the Learning Web App Development book, which introduces the modern Web stack in an accessible and practical manner. The lectures build on the material introduced in the book.
Web Technology Transcripts, Demo code, etc.
In the 2018/19 edition, we have roughly 900 students taking the course, so I finally bit the bullet, prepared extensive lecture transcripts and demo code, and wrote it all up in a GitHub repository: https://github.com/chauff/Web-Teaching.
Feel free to use the materials (with acknowledgement)!
Web Technology Slides - 2017/18 Edition
- HTTP: slide decks 1 and 2
- HTML and Web app design
- Cookies and sessions
- Securing your application
Lab Assignments - 2017/18 Edition
Note: this course has five assignments, and by the end the students will have developed a fully functioning Web application (this year, a habit tracker) with a database backend. The assignments build on each other and are carried out in groups of two students; interviews with TAs determine whether an assignment is passed or failed.
A Sample of Previous Exams
Note: the samples below include database questions, which are covered in the other half of the course.
Information Retrieval
Slides - 2017/18 Edition
After a few years of not teaching IR, I am back at it. This time, the Information Retrieval course covers core IR topics for half of the lectures and NLP topics (taught by Nava Tintarev) for the other half.
- IR evaluation
- Retrieval models
- Query refinement
- Interactive IR
- Personalization in (Web) search
- Neural models
- Learning to rank
I have also written a blog post about the IR project setup.
Slides - 2011/12 Edition
In 2011/12 I taught my core area of research in the Master's course Information Retrieval at TU Delft. This course was my first excursion into the use of Hadoop & Co as part of my teaching, thanks to a grant from Amazon and their (at the time) “AWS Education” scheme: $3500 to allow students to use a real Hadoop cluster for their experiments.
The course material is quite old by now, but it may still be useful to some. It was also my first venture into the teaching of large classes, and the structure and design of the course certainly reflect that.