After quite a few years of not teaching information retrieval, I finally taught an IR MSc course (140 hours study load, 2 lectures a week for 8 weeks; 4 support hours per week) together with my colleague Nava Tintarev earlier this year. While Nava introduced the students to the basics of NLP, I focused on the core IR topics. Part of the course is a group project (60% of the final grade). Our students could choose to conduct their project work within either the IR or the NLP ‘track’. When setting up the IR project, I was inspired by numerous public resources from other lecturers; in return, I want to give back by publishing my own project setup. Here it is (addressed directly to the students).
I also added a short reflection and links to my lecture slides at the very end.
You (groups of 2 or 3 students) can choose between two types of projects:
Reproducing a human-in-the-loop IR paper
You first choose a published paper that describes an IR experiment that involves actual users. We have a list of papers that are well suited for this project below, but of course you can also suggest reproducing another one. You then implement and execute the experiments described in the paper, analyze your results and discuss them in light of the original paper’s results. Note that reproducibility does not mean creating an exact replica of the original paper (that would be replicability). Instead, reproducibility asks whether the idea can be reproduced by a different team, via a slightly different experimental setup and independently developed artifacts. ACM has more information on this issue. This paper on algorithm reproducibility will provide you with advice on what to look out for in terms of reproducibility (in particular Table 2).
Why should you choose this project type? Because human-in-the-loop IR experiments require you to understand and apply the whole breadth of IR: from finding the right search system to use (or programming one from scratch if you desire) and choosing the right document collection, to the implementation and the evaluation of user logs.
Replication can be a lot of fun and yes, it sometimes looks like this!
Investigating a research idea of your own choice
You can also propose a research project within IR of your own choice in consultation with me.
One option is to participate in a benchmark (either a current or previous one, depending on the timeline). The most popular benchmark is TREC (note that we have some older TREC corpora as well; contact us to find out which ones).
To get inspired, take a look at recent conference proceedings at the major IR conferences:
- WSDM: International Conference on Web Search and Data Mining
- SIGIR: International Conference on Research and Development in Information Retrieval
- CIKM: Conference on Information and Knowledge Management
- CHIIR: Conference on Human Information Interaction and Retrieval
Why should you choose this project type? Because not all IR topics (e.g. deep learning for IR, conversational search, music retrieval) are covered in the lectures and you can investigate them further by choosing them as your research project.
List of Human-in-the-loop IR papers
These papers have different lengths (some are 4 pages, some are 10+ pages). While it is feasible to attempt to reproduce a 4-page paper in full, for the 10+ page papers you have to select which aspects of the paper to reproduce; use the proposal to argue for your decisions.
- QWERTY: The Effects of Typing on Web Search Behaviour
- Time Pressure and System Delays in Information Search
- Using Information Scent to Understand Mobile and Desktop Web Search Behaviour
- Perceptions of the Effect of Fragmented Attention on Mobile Web Search Tasks
- Playing your cards right: the effect of entity cards on search behaviour and workload
- Quality through flow and immersion: gamifying crowdsourced relevance assessments
- AspecTiles: tile-based visualization of diversified web search results
- An Eye-Tracking Study of Query Reformulation
- SearchGazer: Webcam Eye Tracking for Remote Studies of Web Search
- A comparison of query and term suggestion features for interactive searching
- Effects of popularity and quality on the usage of query suggestions during information search
- Learning by Example: Training Users with High-quality Query Suggestions
- How does search behavior change as search becomes more difficult?
- User performance versus precision measures for simple search tasks
- Find it if you can: a game for modeling different types of web search success using interaction data
- SearchTogether: An Interface for Collaborative Web Search
- Optimizing search by showing results in context
- Mouse movement during relevance judging: implications for determining user attention
- To hint or not: exploring the effectiveness of search hints for complex informational tasks
Your peers are your study participants
In order to make the experimental phase efficient, all course participants will also be required to act as participants in each other’s experiments. We expect this to not only be a fun experience but also an educational one (though admittedly some user experiments may be more interesting than others). Participating will not take a lot of time and will take place in the last three weeks of the course.
To make this possible, we have developed APONE (Academic Platform for Online Experiments), an A/B testing platform through which you can deploy your own experiment as well as participate in others’ experiments. If you reproduce a paper that conducted a lab study, explore the possibilities of converting the setup into an online experiment. The recruitment of participants will become easier this way.
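If you convert a lab study into an online experiment, one thing you need to decide is how participants end up in the different conditions. The snippet below is only a minimal sketch of one common approach (deterministic, hash-based assignment); it is not APONE's API, and the function name `assign_condition` as well as the experiment and participant identifiers are made up for illustration.

```python
import hashlib
from typing import Sequence


def assign_condition(participant_id: str, experiment_id: str,
                     conditions: Sequence[str]) -> str:
    """Map a participant to one condition of an experiment.

    Hypothetical helper: the same participant always gets the same
    condition within one experiment, without storing any state.
    """
    # Include the experiment id so that assignments in different
    # experiments are independent of each other.
    key = f"{experiment_id}:{participant_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(conditions)
    return conditions[bucket]


# Example usage with made-up identifiers.
conditions = ["control", "treatment"]
for pid in ["student-01", "student-02", "student-03"]:
    print(pid, assign_condition(pid, "query-suggestions", conditions))
```

Hashing keeps a returning participant in the same condition, which matters when an experiment spans several sessions. With few participants the groups may end up unbalanced, so check the group sizes before you analyze your data.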
The project work has five milestones:
- Start of week 2: Choice between NLP and IR group project made; project group settled
- Start of week 3: Project proposal due
- Start of week 6: Intermediate project report due
- Start of week 10: Final project report due
- Week 10: ~15 minute project interview
Among the 15 groups of students who conducted an IR project, 12 chose to reproduce a paper from my list, one decided to reproduce a neural IR paper and two groups tried their hand at TREC CAR. No group decided to follow their own research idea. In hindsight, this is not the result I aimed for; for the students, picking a paper from a pre-approved list was just a lot easier than the back and forth of discussing an idea with me until it was suitable. The reproducibility results were as expected: the older the paper being reproduced, the lower the likelihood of observing the same or similar results.
Most groups deployed their online experiments very late (in the last 1.5 weeks of the class); this meant that students had to participate in each other’s experiments, some of which lasted up to an hour, at a time when they were also busy with final projects and exams for other courses. The next edition of the course will contain an additional milestone at which point the experiments need to be deployed.
An issue arose when two groups chose to reproduce the same paper; they submitted the same experiment to APONE. This was not ideal for the students acting as study participants as some ended up doing essentially the same experiment twice in different conditions. In the next iteration of the course each paper can only be chosen by a single group.
While only a few groups struggled with the implementation aspects of their project, understanding the statistical analyses conducted in the original papers and selecting the right ones for their own experiments proved to be very challenging for many students. This is a clear signal that an additional lecture on the topic is needed.
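To give a flavour of the kind of analysis involved, here is a minimal sketch of a two-sided permutation test on the difference in means between two experimental conditions. It is an illustration only and not the test used in any of the listed papers; the function name `permutation_test` and the numbers in the usage example are made up.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns an estimated p-value for the null hypothesis that the
    observations in `a` and `b` come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Made-up task completion times (in seconds) for two conditions.
baseline = [41, 38, 52, 47, 44, 39, 50, 45]
variant = [36, 33, 40, 42, 35, 31, 38, 37]
print(permutation_test(baseline, variant))
```

A permutation test makes few distributional assumptions, which is convenient for the small samples a course experiment typically yields. Whether it is the right choice still depends on the experimental design (for example, within-subject designs need a paired variant).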
Last but not least, here are the topics and my slides of the “core IR” part of this course: