MSc thesis supervision
I supervise students who conduct research in information retrieval and natural language processing. For both research directions, taking a look at papers from recent conferences (such as SIGIR, CIKM, WSDM, EMNLP, ACL) and at ongoing benchmark efforts (MS MARCO, SQuAD 2.0, GLUE, decaNLP, TREC) will help you find a topic of interest.
I also have a set of topics that I am ready to give away:
- Evaluate the usability of Macaw, a recently introduced Conversational Information Seeking Platform, possibly extend it and run an interactive IR study with it.
- Extend SearchX, a collaborative search engine we built in-house, with shared workspace capabilities and run an interactive IR study with it.
- Analyze the effectiveness of multi-task learning for different IR tasks.
- Evaluate the use of efficient context-sensitive embedding approaches (variations of BERT & Co that do not rely on hundreds of millions of parameters) for different IR tasks and under performance constraints.
- Design, build and evaluate an extension to Visual Studio Code that enables information seeking for programming tasks directly in the IDE.
- Investigate UI elements that make collaborative search in the mobile setting (where screen space is at a premium) a real possibility.
Below are the resources I have developed for my courses (some are more up-to-date than others): Big Data Processing, Web and Database Technology and Information Retrieval.
Big Data Processing
Since 2013/2014 I have been teaching the second-year Bachelor course Big Data Processing at TU Delft (with 2016/17 being the last time for now). The course covers a range of technologies in the Hadoop ecosystem after a short excursion into the streaming world; I created the material based on a number of great books, including Mining of Massive Datasets, Data-Intensive Text Processing with MapReduce, Hadoop: The Definitive Guide, Programming Pig and ZooKeeper.
Slides - 2016/17 Edition
- Streams I
- Streams II
- Algorithm design for MapReduce
- Pig I
- Pig II
- Graph algorithms
- Two more lectures on Spark completed the course.
Assignments - 2016/17 Edition
- Assignment 1: Streaming
- Assignment 2: Streaming and Hadoop
- Assignment 3: Hadoop
- Assignment 4: Pig data
- Assignment 5: Pig data
- Assignment 6: Giraph
- Assignment 7: Spark
A Sample of Previous Exams
- 24 questions on streaming
- 32 questions on MapReduce/Hadoop
- 10 questions on graphs and Giraph
- 12 questions on Pig/Pig Latin
Web (and Database) Technology
Since 2013/2014 I have also been teaching the first-year Bachelor course Web and Database Technology (known as TI1506 or CSE1500) at TU Delft, together with Alessandro Bozzon. I teach the Web technology part, which turned out to be quite a challenge due to the wide variety of skill sets our incoming students possess (some work as Web developers, others have never written a single line of HTML before the start of this course).
In the 2018/19 edition, roughly 900 students took the course, so I finally bit the bullet: I started writing extensive lecture transcripts (with self-check questions, demo code, assignments, etc.), split the materials into GitHub repos and created a good-looking website: https://chauff.github.io/Web-Teaching/.
Feel free to use the materials with acknowledgement.
Needless to say, this is ongoing work at all times: web tech changes quickly.
Information Retrieval
Slides - 2017/18 Edition
After a few years of not teaching IR, I am back at it. This time, the Information Retrieval course covers core IR topics for half of the lectures and NLP topics (taught by Nava Tintarev) for the other half.
- IR evaluation
- Retrieval models
- Query refinement
- Interactive IR
- Personalization in (Web) search
- Neural models
- Learning to rank
I have also written a blog post about the IR project setup.
Slides - 2011/12 Edition
In 2011/12 I taught my core area of research in the Master's course Information Retrieval at TU Delft. This course was my first excursion into the use of Hadoop & Co as part of my teaching, thanks to a grant from Amazon and their (at the time) “AWS Education” scheme: $3500 to allow students to use a real Hadoop cluster for their experiments.
The course material is quite old by now, but it may still be useful to some. It was also my first venture into the teaching of large classes, and the structure and design of the course certainly reflect that.