Multimedia Information Retrieval

Students: In general I respond to emails within 2 workdays (Mon-Fri). If you do not get a response, please resend the email - Dr. Lew

It is necessary to register on the LML Course Manager:

LML Course Manager, lcm.liacs.nl

Period: (tentative - may change) Wednesdays, 2022
Time: 14:15 - 16:00
Place: Snellius rm. 313

Organizer:

Dr. Michael S. Lew, (Lecturer) email: lewmsk@gmail.com (Email to make an appointment)

Assistance and contributions given by:

Mingrui Lao, email: m.lao@liacs.leidenuniv.nl

Kai He, email: k.he@liacs.leidenuniv.nl

Nan Pu, email: n.pu@liacs.leidenuniv.nl

Students: In emails, please use subject lines which start with MIR::

Recommended Prior Knowledge:

Because this course depends on knowledge in deep learning and visual imagery analysis, it is strongly recommended that students have successfully completed:

Introduction to Deep Learning
or Deep Learning & Neural Networks

In addition, students must be fluent in C and C++ programming (Python is also useful) and in image processing.

Goals:

Briefly, this course is a research seminar that covers the fundamentals of multimedia information retrieval using deep learning techniques, computer vision, and graphics. In this course, students present important or recent research on multimedia understanding aimed at providing relevant information for search.

The state-of-the-art techniques for understanding multimedia data are currently based in the fields of deep learning and computer vision. Therefore, this course discusses the strengths and weaknesses, challenges and issues, and future directions of computer vision and deep learning as methods of understanding diverse multimedia data.

At the end of the Multimedia Information Retrieval course, the student should be able to

- understand the fundamental principles of multimedia information retrieval
- understand the primary paradigms in deep learning and computer vision
- have insight into traditional and state-of-the-art multimedia features
- have insight into traditional and state-of-the-art multimedia learning algorithms
- analyze a multimedia information retrieval system with regard to strengths and weaknesses and potential areas for improvement
- explain the differences between modern search engines and database systems.
- have insight into scientifically evaluating an information retrieval system
- have insight into the integration of intelligent algorithms into the retrieval process
- develop and write scientific reports
- develop and give scientific presentations

Description:

Multimedia information exists in diverse forms, from personal images to movies to MRI and X-ray imagery. It is frequently combined with other types of media such as text (on the WWW) or audio, and more recently with location and range sensing data (e.g. smartphones and self-driving automobiles), which is why we call it multimedia. Because multimedia sensors are everywhere, there is a worldwide need for algorithms and systems for finding and understanding multimedia data (e.g. all the images on the WWW, digital libraries, medical imagery, shop cameras, smartphones, etc.).

The state-of-the-art techniques for understanding multimedia data are currently based in the fields of computer vision and deep learning. Therefore, this course discusses the strengths and weaknesses, challenges and issues, and future directions of computer vision and deep learning as methods of understanding diverse multimedia data.

Work-forms

- lectures
- seminar
- student discussions
- presentations
- homework and software assignments

Examination (for 6 ECs):

The final grade is composed of (1) 50% for Paper Presentation/Seminar (Class Participation & Questions & Homework) and (2) 50% for Assignments and Final Project.

All programming assignments must work on Linux (Ubuntu 18 or 20)

For programming homework/assignments, the grading focuses on the software: source code, documentation, and how well the program works. Non-programming homework may be a blend of theoretical questions and/or scientific assessments of particular algorithms, where the main focus is not on programming, but on assessing and evaluating methods.

Assignments turned in late: the grade is reduced by 1 per 24 hours (1 day) late.

Source code for assignments must include instructions for compiling and running it on the machines in rooms 302 and 303. This is necessary for grading/evaluating the work by the class organizers.

As this is a seminar, attendance is mandatory

Leiden University students complete the work (see Examination above), including a Final Project, for 6 ECs.
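As a rough illustration of the grading arithmetic described under Examination, the sketch below combines the two 50% components and computes a late penalty. The function names are hypothetical, and two points are assumptions not stated above: that every started 24-hour period counts as a full day, and that the penalty is subtracted from the grade of the individual late assignment. The course organizers decide how the rules are applied in practice.

```python
import math


def late_penalty(hours_late):
    """Grade points deducted for a late assignment.

    Assumption: each started 24-hour period costs 1 point.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def final_grade(seminar, assignments_and_project):
    """Weighted final grade: 50% seminar part, 50% assignments and final project."""
    return 0.5 * seminar + 0.5 * assignments_and_project


if __name__ == "__main__":
    print(late_penalty(25))      # 25 hours late falls in the second 24-hour period
    print(final_grade(8, 7))     # equal weighting of the two components
```

For instance, under these assumptions late_penalty(25) gives 2 and final_grade(8, 7) gives 7.5.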

Final Project

In principle, this is meant to be a scientific-toned comparative study of at least 2 competing algorithms for the same task, such as VGG vs. ResNet in image classification. The goal is to gain insights into the strengths and weaknesses of each algorithm and also into the challenges of performing fair quantitative comparisons.
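To make the idea of a fair quantitative comparison more concrete, here is a minimal sketch of one possible evaluation harness. It is only an illustration: the synthetic 2-D data and the two toy classifiers (nearest centroid and 1-nearest-neighbour) are stand-ins chosen so the example runs with the Python standard library, not VGG or ResNet, and all function names are hypothetical. The point it tries to show is that both algorithms see exactly the same train/test split for each seed and are scored with the same metric over several runs.

```python
import random
import statistics


def make_data(seed, n_per_class=100):
    """Two noisy 2-D Gaussian blobs; a synthetic stand-in for real features."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_centroid(train):
    """Fit: compute one centroid per class. Returns a predict function."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}
    return lambda p: min(centroids, key=lambda lab: sq_dist(p, centroids[lab]))


def one_nn(train):
    """Fit: memorize the training set. Predict the label of the closest sample."""
    return lambda p: min(train, key=lambda item: sq_dist(p, item[0]))[1]


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(p) == label for p, label in test) / len(test)


def compare(algorithms, seeds, train_fraction=0.7):
    """Evaluate every algorithm on the same split for each seed.

    Returns {name: (mean accuracy, standard deviation)} over the seeds.
    """
    scores = {name: [] for name in algorithms}
    for seed in seeds:
        data = make_data(seed)
        cut = int(len(data) * train_fraction)
        train, test = data[:cut], data[cut:]  # identical split for all algorithms
        for name, fit in algorithms.items():
            scores[name].append(accuracy(fit(train), test))
    return {name: (statistics.mean(s), statistics.stdev(s))
            for name, s in scores.items()}


if __name__ == "__main__":
    results = compare({"nearest_centroid": nearest_centroid, "1-NN": one_nn},
                      seeds=range(5))
    for name, (mean, std) in results.items():
        print(f"{name}: accuracy {mean:.3f} +/- {std:.3f}")
```

Reporting the mean and spread over several seeds, rather than a single run, is one simple way to check whether an observed difference between two algorithms is larger than the run-to-run variation.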

Recommended Reading: Principles of Visual Information Retrieval, M. S. Lew, Springer, 2001, ISBN: 978-1-85233-381-2

Suggestions for class presentations

- Introduce the problem and main idea
- Explain the motivation for the paper
- Cover the main points of the algorithm
- Make sure you have a slide which clearly states the main contribution of the paper at the end
- There should be a little humor in the presentation to make it interesting for the audience

Topics

- Introduction to Multimedia Retrieval using Computer Vision and Multimedia Deep Learning
- Major Computer Vision Paradigms
- Major Deep Learning Paradigms
- Challenges in information retrieval
- Color, texture, and shape features
- Multi-modal features and salient points
- Similarity and Ranking
- Building search engines
- Bag-of-words approaches
- Concept detection
- Interactive and intelligent search algorithms
- Advanced browsing paradigms
- High performance search systems

Schedule (TENTATIVE):

Week 1: 9 Feb - Challenges in Computer Vision (CV) and Multimedia Deep Learning (DL)

Week 2: 16 Feb - First Major Paradigm of Computer Vision and Multimedia

Week 3: 23 Feb - Second Major Paradigm of Computer Vision and Multimedia

Week 4: 2 Mar - Advanced Deep Learning (guest lecture)

Week 5: 9 Mar - [optional] Reading and preparation for paper presentations

Week 6: 16 Mar - [optional] Giving Scientific Presentations Notes and Q&A

Week 7: 23 Mar - Scientific Paper Presentations

Week 8: 30 Mar - Homework on Deep Learning

Week 9: 6 April - Scientific Paper Presentations

Week 10: 13 April - [optional] Final Project Advice

Week 11: 20 April - Scientific Paper Presentations

Week 12: 27 April - No Class - King's Day

Week 13: 4 May - Scientific Paper Presentations

Week 14: 11 May - No Class - Science Career Event

Week 15: 18 May - Scientific Paper Presentations

Week 16: 25 May - Final Projects