May 1st, 2021: ACM SIG Multimedia - 7th most cited paper (out of 16,000 from 1994-2021)
September 5th, 2020: over 1000 Google Scholar citations and roughly 2 terabytes per month of downloads from universities (MIT, Cambridge, Stanford, Oxford, Columbia, UIUC, NUS, Tsinghua, Univ. Tokyo, KAIST, etc.) and companies (IBM, Microsoft, Google, Yahoo!, Facebook, Philips, Sony, Nokia, etc.) worldwide
Current Organizers: Mark Huiskes, Bart Thomee and Michael Lew

In 2019: Special issue on Deep Learning in Image and Video Retrieval. International Journal of Multimedia Information Retrieval, CFP.
In 2015: Special issue on Concept Detection with Big Data. International Journal of Multimedia Information Retrieval, 4(2), 2015, weblink.
In 2013: Special issue on visual concept detection in the MIRFLICKR/ImageCLEF benchmark. Computer Vision and Image Understanding 117(5): 451-452 (2013).
In 2012, the MIRFLICKR-1M collection will be used in ImageCLEF 2012 for the photo annotation and retrieval task. Please take a look at the Photo Annotation task description for further details.
In 2011, the MIRFLICKR-1M collection will be used in ImageCLEF 2011 for the visual concept detection and annotation task. Please take a look at the Photo Annotation task description for further details.
In 2010, the MIRFLICKR-25000 collection will be used in ImageCLEF 2010 for the visual concept detection and annotation task. Please take a look at the Photo Annotation task description for further details.
In 2009, the MIRFLICKR-25000 collection will be used in ImageCLEF 2009 for the visual concept detection and annotation task. Please take a look at the Photo Annotation task description for further details.

Introduction
The MIRFLICKR-25000 open evaluation project consists of 25000 images downloaded from the social photography site Flickr through its public API, coupled with complete manual annotations, pre-computed descriptors, software for bag-of-words based similarity and classification, and a MATLAB-like tool for exploring and classifying imagery. We are doing our best to make sure that MIRFLICKR will be:

MIRFLICKR-25000 is an evolving effort with many ideas for extension. So far the image collection, metadata, annotations, descriptors and software can be downloaded below. If you enter your email address before downloading, we will keep you posted on the latest updates.

Copyright and Licenses
Although most images on Flickr are published with all rights reserved, a large number of images are offered under Creative Commons copyright licenses. The Creative Commons attribution licenses allow image use as long as the photographer is credited for the original creation. Some licenses grant use under additional restrictions, but none of these preclude the use of the images for benchmarking purposes.

While compiling the MIRFLICKR-25000 collection we made sure that only Creative Commons images were included, and took care to collect as much information as possible about the creators of the images. The creator information, as well as the exact license type and image title, is collected in image license metafiles, which are distributed together with the images.

We would like to take the opportunity here to express our gratitude to the image photographers for allowing us to use their pictures: we greatly appreciate this and gladly acknowledge your work. Your names and license details are also listed in this credit document. Please let us know if you have special wishes on how you would like to be credited or have additional details that must be incorporated.

Flickr Tags
One of the great attractions of Flickr is the platform it offers its users to search and share their pictures based on image tags. We supply these image tags in two forms: the raw form in which they are obtained from the users, and a processed form in which the raw data has been cleaned up (a bit) by Flickr. For retrieval research we are mainly interested in concrete visual concepts. The most common tags of this type are listed below (colors, seasons and place names were left out):

The average number of tags per image is 8.94. In the collection there are 1386 tags which occur in at least 20 images. Most tags are in English, but terms in other languages occur as well.
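
The two tag statistics quoted above (average tags per image, and the number of tags occurring in at least 20 images) could be recomputed along the following lines. This is only a minimal sketch: the function names `read_tag_file` and `tag_statistics` are hypothetical, and the one-tag-per-line file layout is an assumption that should be checked against the files actually distributed.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def read_tag_file(path: Path) -> List[str]:
    """Read one tag per non-empty line (ASSUMED layout, not confirmed by the source)."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def tag_statistics(
    tags_per_image: Dict[str, Iterable[str]], min_images: int = 20
) -> Tuple[float, List[str]]:
    """Return (average tags per image, sorted tags found in >= min_images images)."""
    image_counts: Counter = Counter()
    total_tags = 0
    for tags in tags_per_image.values():
        tags = list(tags)
        total_tags += len(tags)
        # Count each tag at most once per image, so the counter holds
        # the number of images a tag occurs in, not raw occurrences.
        image_counts.update(set(tags))
    average = total_tags / len(tags_per_image) if tags_per_image else 0.0
    frequent = sorted(t for t, n in image_counts.items() if n >= min_images)
    return average, frequent
```

With the tags loaded into a dictionary keyed by image identifier, `tag_statistics(tags, min_images=20)` returns the average and the list of frequent tags; whether the result matches the published figures depends on using the same (raw or processed) tag form.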

EXIF
EXIF (Exchangeable image file format) metadata represents a number of properties and settings of the digital camera at the time of taking a picture. This includes information on:

Flickr separates the EXIF data from the images: the information is no longer embedded in the image files! For about 85% of the images in the collection, EXIF data is available and the creator has granted permission to access it through the API. For these images we have collected the data (with the exception of binary data, such as thumbnails) and made it available in plain text files. Note that even when EXIF data was collected, not all fields are always present. The table below shows the availability of a number of common fields. EXIF geolocation fields are particularly scarce and are available for only 152 images.
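
Because not every field is present in every EXIF record, it can help to measure per-field availability before relying on a field. The sketch below assumes the plain text files have already been parsed into one dictionary per image; the parsing step is omitted because the file layout is not described here, and `field_coverage` and the field names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def field_coverage(
    exif_records: Mapping[str, Mapping[str, str]], fields: Iterable[str]
) -> Dict[str, float]:
    """Fraction of records in which each field has a non-empty value."""
    total = len(exif_records)
    coverage: Dict[str, float] = {}
    for field in fields:
        # A field counts as present only if it exists and is not blank.
        present = sum(
            1 for record in exif_records.values() if record.get(field, "").strip()
        )
        coverage[field] = present / total if total else 0.0
    return coverage
```

For example, `field_coverage(records, ["Make", "Exposure"])` would return a fraction between 0 and 1 for each of the two (illustrative) field names, relative to the images for which EXIF data was collected.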

Annotations
The annotation scheme has been set up in a way to make it easy to extend it with new keywords without having to go through all 25000 images again. This is possible by stepwise refinement along two dimensions:

Download
Please proceed to the download page, which has both the 25K and 1M collections.

Publications
If you use the MIRFLICKR-25000 image collection in your work, please cite:

M. J. Huiskes, M. S. Lew (2008). The MIR Flickr Retrieval Evaluation. ACM International Conference on Multimedia Information Retrieval (MIR'08), Vancouver, Canada.

Extension
The MIR Flickr collection has been extended in two ways. First, the number of images has been extended to 1 million images. Second, we now supply a number of content-based visual descriptors for the entire new set of images.

The new images are obtained in the same way as the original images. All images are made available under a Creative Commons Attribution License. To obtain high-quality photography, the images are also selected based on their Flickr interestingness score. Note that the new images are not manually annotated like the core set of 25000 images, but all original Flickr user tag data, as well as the EXIF metadata, are again made available.

The content-based visual descriptors that are supplied for the new images are the MPEG-7 Edge Histogram and Homogeneous Texture descriptors, and the ISIS Group color descriptors.

All original images are made available through BitTorrent. Since, for many, the full collection may prove too large to download, we also provide 64x64 pixel JPEG thumbnails. For further details, see the download page (which has both the 25K and 1M collections).

The extension is described in:

M. J. Huiskes, B. Thomee, M. S. Lew (2010). New Trends and Ideas in Visual Concept Detection. ACM International Conference on Multimedia Information Retrieval (MIR'10), Philadelphia, USA.