The MIRFLICKR-25000 Image Collection
25000 beautiful Flickr pictures available under Creative Commons license
Offered by the LIACS Medialab at Leiden University, The Netherlands
Main contacts: Mark Huiskes and Michael Lew
Introduction
The new MIRFLICKR-25000 collection consists of 25000 images downloaded from the social photography site Flickr through its public API. We are doing our best to make sure the image collection is going to be:
MIRFLICKR-25000 is an evolving effort with many ideas for extension. So far, the image collection, metadata, and annotations can be downloaded below. If you enter your email address before downloading, we will keep you posted on the latest updates.
Copyright and Licenses
Although most images on Flickr are published with all rights reserved, a large number of images are offered under Creative Commons copyright licenses. The Creative Commons attribution licenses allow image use as long as the photographer is credited for the original creation. Some licenses impose additional restrictions, but none of these preclude the use of the images for benchmarking purposes. While compiling the MIRFLICKR-25000 collection, we made sure that only Creative Commons images were included and took care to collect as much information as possible about the creators of the images. The creator information, the exact license type, and the image title are collected in image license metafiles, which are distributed together with the images.
We would like to take the opportunity here to express our gratitude to the photographers for allowing us to use their pictures: we greatly appreciate this and gladly acknowledge your work. Your names and license details are also listed in this credit document. Please let us know if you have special wishes on how you would like to be credited or have additional details that must be incorporated.
Flickr Tags
One of the great attractions of Flickr is the platform it offers its users to search and share their pictures based on image tags. We also supply these image tags in two forms: the raw form in which they are obtained from the users and in processed form with raw data cleaned up (a bit) by Flickr. For retrieval research we are mainly interested in concrete visual concepts. The most common tags of this type are listed below (colors, seasons and place names were left out):
The average number of tags per image is 8.94. In the collection there are 1386 tags that occur in at least 20 images. Most tags are in English, but terms in other languages occur as well.
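Statistics of this kind can be recomputed once the tags have been loaded. The following is a minimal Python sketch, assuming the tags are already held in memory as a mapping from an image identifier to its list of tags. The function name `tag_statistics`, the mapping layout, and the toy identifiers are illustrative assumptions, not part of the distributed files, and reading the tag files themselves is left out because their layout is not described here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tag_statistics(
    tags_by_image: Dict[str, List[str]], min_images: int = 20
) -> Tuple[float, List[str]]:
    """Return (mean tags per image, sorted tags found in at least min_images images).

    tags_by_image is a hypothetical in-memory layout: image id -> list of tags.
    """
    if not tags_by_image:
        return 0.0, []

    # Mean number of tags per image over all images in the mapping.
    total_tags = sum(len(tags) for tags in tags_by_image.values())
    mean_tags = total_tags / len(tags_by_image)

    # Count each tag at most once per image, so that a tag repeated
    # within one image does not inflate its image count.
    image_counts: Counter = Counter()
    for tags in tags_by_image.values():
        image_counts.update(set(tags))

    frequent = sorted(tag for tag, n in image_counts.items() if n >= min_images)
    return mean_tags, frequent


# Toy data for illustration only; not taken from the collection.
toy = {
    "im1": ["sky", "water"],
    "im2": ["sky"],
    "im3": ["sky", "tree", "water"],
}
mean_tags, frequent = tag_statistics(toy, min_images=2)
print(mean_tags, frequent)
```

With the default `min_images=20`, the second return value corresponds to the kind of frequent-tag list counted above; whether it reproduces the reported figures depends on which tag form (raw or processed) is loaded.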
EXIF
EXIF (Exchangeable image file format) metadata represents a number of properties and settings of the digital camera at the time of taking a picture. This includes information on:
Flickr separates the EXIF data from the images: the information is no longer embedded in the image files! For about 85% of the images in the collection, EXIF data are available and the creator has granted permission to access them through the API. For these images we have collected the data (with the exception of binary data such as thumbnails) and made them available in plain text files. Note that even when EXIF data were collected, not all fields are always present. The table below shows the availability of a number of common fields.
EXIF geolocation fields are particularly scarce and are available for only 152 images.
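Field availability of this kind can be tallied once the EXIF text files have been parsed. The sketch below is a hedged Python example that assumes each image's EXIF data has already been read into a dictionary of field name to value. The function name `field_coverage` and the field names in the toy records are hypothetical, and parsing of the plain text files is omitted because their layout is not described here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def field_coverage(exif_records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return, per field, the fraction of records holding a non-empty value.

    exif_records is a hypothetical in-memory layout: one mapping of
    field name -> value per image for which EXIF data were collected.
    """
    records = list(exif_records)
    if not records:
        return {}

    present: Counter = Counter()
    for record in records:
        for field, value in record.items():
            # Treat blank values as missing.
            if value.strip():
                present[field] += 1

    return {field: count / len(records) for field, count in present.items()}


# Toy records for illustration only; field names are assumptions.
toy_records = [
    {"Make": "Canon", "Exposure": "1/60"},
    {"Make": "Nikon", "Exposure": " "},
    {"Make": "Sony"},
    {},
]
print(field_coverage(toy_records))
```

The fractions are relative to the records passed in, so the denominator is the set of images with collected EXIF data rather than the full collection unless empty records are included for the remaining images.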
Annotations
The annotation scheme has been designed so that it can easily be extended with new keywords without having to go through all 25000 images again. This is possible by stepwise refinement along two dimensions:
Download
Please proceed to the download page.
Publications
If you use the MIRFLICKR-25000 image collection in your work, please cite:
M. J. Huiskes, M. S. Lew (2008). The MIR Flickr Retrieval Evaluation. ACM International Conference on Multimedia Information Retrieval (MIR'08), Vancouver, Canada.
Extension
The MIR Flickr collection has been extended in two ways. First, the number of images has been increased to 1 million. Second, we now supply a number of content-based visual descriptors for the entire new set of images.
The new images are obtained in the same way as the original images. All images are made available under a Creative Commons Attribution Licence. To obtain high-quality photography, the images are also selected based on their Flickr interestingness score. Note that the new images are not manually annotated like the core set of 25000 images, but all original Flickr user tag data, as well as the EXIF metadata, are again made available.
The content-based visual descriptors supplied for the new images are the MPEG-7 Edge Histogram and Homogeneous Texture descriptors, and the ISIS Group color descriptors.
All original images are made available through BitTorrent. Since the full collection may prove too large for many to download, we also provide 64x64 pixel JPEG thumbnails. For further details, see the download page.
The extension is described in:
M. J. Huiskes, B. Thomee, M. S. Lew (2010). New Trends and Ideas in Visual Concept Detection. ACM International Conference on Multimedia Information Retrieval (MIR'10), Philadelphia, USA.