Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos, and Pinar Duygulu. Automatic Multimedia Cross-modal Correlation Discovery.
In Proceedings of the 10th ACM SIGKDD Conference, 2004.
Seattle, WA, August 22-25, 2004
[PDF] (135.4 kB) · [gzipped PostScript] (468.1 kB)
Given an image (or video clip, or song), how do we automatically assign keywords to it? The general problem is to find
correlations across media in a collection of multimedia objects, such as video clips that combine color, motion, audio,
and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations.
The "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia
collection, as long as we have a similarity function for each medium; and it scales linearly with the database size.
We report auto-captioning experiments on the "standard" 680 MB Corel image database, where MMG outperforms
domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).
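The abstract does not spell out the scoring mechanism, only that MMG is graph-based, needs one similarity function per medium, and has no tunable constants. The sketch below assumes a random-walk-with-restarts style propagation over a mixed-media graph, the family of techniques MMG belongs to: nodes are images, regions, and caption words; edges link nearest neighbors under each medium's similarity function; and word nodes are ranked by how often a walk restarting at the query image visits them. The graph layout, parameter values, and function name are illustrative, not taken from the paper.

```python
import numpy as np

def rwr_scores(A, seed, restart=0.15, iters=100, tol=1e-8):
    """Random walk with restarts on a mixed-media graph (illustrative sketch).

    A       : (n, n) symmetric adjacency matrix over all nodes (images,
              regions, caption words), built by linking each node to its
              nearest neighbors under the per-medium similarity function.
    seed    : index of the query node (e.g., an uncaptioned image).
    restart : probability of jumping back to the seed at each step.

    Returns the steady-state visit probabilities; the highest-scoring
    word nodes are the suggested caption terms.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Column-normalize so each column is a transition distribution.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    W = A / col_sums
    r = np.zeros(n)
    r[seed] = 1.0          # restart distribution concentrated on the query
    p = r.copy()
    for _ in range(iters):
        p_new = (1 - restart) * (W @ p) + restart * r
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new
    return p
```

To caption an image, one would rank the word-node entries of the returned vector and keep the top few; because the graph is sparse (each node keeps only a handful of nearest neighbors), each iteration costs time proportional to the number of edges, consistent with the linear scaling the abstract claims.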
@InProceedings{KDD04CrossModalCorrelation,
  author    = {Jia-Yu Pan and Hyung-Jeong Yang and Christos Faloutsos and Pinar Duygulu},
  title     = {Automatic Multimedia Cross-modal Correlation Discovery},
  booktitle = {Proceedings of the 10th ACM SIGKDD Conference},
  year      = 2004,
  wwwnote   = {Seattle, WA, August 22-25, 2004},
  abstract  = {Given an image (or video clip, or song), how do we automatically assign keywords to it? The general problem is to find correlations across media in a collection of multimedia objects, such as video clips that combine color, motion, audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. <br> The "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to <i>any</i> multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the ``standard'' 680 MB Corel image database, where MMG outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50\% relative improvement).},
  bib2html_pubtype = {Refereed Conference},
  bib2html_rescat  = {Multimedia Data Mining},
}
Generated by bib2html (written by Patrick Riley) on Wed Sep 01, 2004 13:24:30