Linking Wikipedia to the Web

From Brede Wiki
Conference paper
Authors: Rianne Kaptein, Pavel Serdyukov, Jaap Kamps
Citation: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval: 2010
Editors:
Publisher: ACM, New York, NY, USA
Meeting: SIGIR-10
Database(s):
DOI: 10.1145/1835449.1835642
Link(s): http://riannekaptein.woelmuis.nl/2010/kapt-linking10.pdf

Linking Wikipedia to the Web reports on an experiment in predicting external links for Wikipedia: given a Wikipedia article, predict which external Web pages are appropriate for its external links section.


Data

The authors constructed a test set by removing the existing external links from Wikipedia articles and using these links as the ground truth. The data are reported to be available at:

http://staff.science.uva.nl/~kamps/effort/data

However, as of April 2011 the data were still not available.

The data set is reported to contain "53 topics with 84 relevant home pages". (It is not entirely clear whether each "topic" here corresponds to one Wikipedia article.)
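A minimal sketch of how held-out links can serve as relevance judgments, assuming each topic is one Wikipedia article and each removed external link marks one relevant page (the point left open above). The function name `build_ground_truth` and the toy input are hypothetical and are not the authors' data or code.

```python
from typing import Dict, List, Set, Tuple


def build_ground_truth(
    external_links: Dict[str, List[str]],
) -> Tuple[Dict[str, Set[str]], int]:
    """Turn each article's removed external links into relevance judgments.

    Returns the judgments (topic -> set of relevant URLs) and the total
    number of relevant pages across all topics.
    """
    qrels: Dict[str, Set[str]] = {}
    for article, urls in external_links.items():
        relevant = {u.strip() for u in urls if u.strip()}
        if relevant:  # an article with no removed links cannot be evaluated
            qrels[article] = relevant
    total = sum(len(urls) for urls in qrels.values())
    return qrels, total


# Hypothetical toy input, not the authors' data set.
links = {
    "Article A": ["http://www.example.org/"],
    "Article B": ["http://a.example.com/", "http://b.example.com/"],
    "Article C": [],
}
qrels, total = build_ground_truth(links)
print(len(qrels), total)  # 2 topics, 3 relevant pages
```

Under this reading, the reported "53 topics with 84 relevant home pages" would correspond to `len(qrels) == 53` and `total == 84`.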

The 50 million Web pages of ClueWeb09 Category B [1] were used as the test collection.

Method

The method combined:

  • an anchor text index and a full-text index
  • URL class priors
  • document priors
  • the Krovetz stemmer
  • Dirichlet smoothing of document language models
  • Delicious, the social bookmarking service
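A sketch of a URL class prior, assuming the four URL classes commonly used in entry-page finding (root, subroot, path, file); the source does not say which classes the authors used. The prior is estimated as the fraction of relevant pages per class in some labeled training set. The names `url_class`, `class_priors`, `DEFAULT_PAGES` and the training pairs are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical set of file names treated as "the directory itself".
DEFAULT_PAGES = {"index.html", "index.htm", "index.php", "default.htm"}


def url_class(url: str) -> str:
    """Classify a URL as root, subroot, path or file by its path structure."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and segments[-1].lower() in DEFAULT_PAGES:
        segments.pop()  # "/a/index.html" is treated like "/a/"
    elif segments and not path.endswith("/"):
        return "file"  # the path ends in a (non-default) file name
    if not segments:
        return "root"  # e.g. http://www.example.org/
    if len(segments) == 1:
        return "subroot"  # e.g. http://www.example.org/a/
    return "path"  # e.g. http://www.example.org/a/b/


def class_priors(labeled: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate P(relevant | URL class) from (url, is_relevant) pairs."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for url, relevant in labeled:
        c = url_class(url)
        totals[c] += 1
        hits[c] += int(relevant)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical training pairs, for illustration only.
training = [
    ("http://www.example.org/", True),
    ("http://www.example.com/", False),
    ("http://www.example.net/docs/guide.html", False),
    ("http://www.example.net/about/", True),
]
priors = class_priors(training)
print(priors)
```

With realistic data one would want many more labeled pages per class, and probably smoothing so that no class receives a prior of exactly zero.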

They used the Indri toolkit.
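A minimal sketch of how these pieces can fit together in a plain query-likelihood model: each query term's probability in a document is smoothed with the collection model using a Dirichlet prior, and a document prior (for example a URL class prior) is added in log space, i.e. score(d) = log P(d) + Σ_t log((tf(t,d) + μ·P(t|C)) / (|d| + μ)). This is not the authors' code or Indri's implementation. The function name, the toy documents and μ = 2500 are assumptions, and tokens are assumed to be already stemmed (for instance by a Krovetz stemmer).

```python
import math
from collections import Counter
from typing import List


def dirichlet_score(
    query: List[str],
    doc: List[str],
    collection_tf: Counter,
    collection_len: int,
    log_prior: float = 0.0,
    mu: float = 2500.0,  # assumed smoothing parameter
) -> float:
    """Query likelihood with Dirichlet smoothing plus a log document prior."""
    tf = Counter(doc)
    score = log_prior
    for term in query:
        p_coll = collection_tf[term] / collection_len
        if p_coll == 0:
            continue  # term unseen in the collection: skip it instead of log(0)
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


# Hypothetical, already-stemmed toy collection.
docs = {
    "d1": ["krovetz", "stemmer", "wikipedia"],
    "d2": ["web", "page", "anchor", "text"],
}
collection_tf = Counter(t for d in docs.values() for t in d)
collection_len = sum(len(d) for d in docs.values())
query = ["anchor", "text"]
scores = {
    name: dirichlet_score(query, d, collection_tf, collection_len)
    for name, d in docs.items()
}
print(sorted(scores, key=scores.get, reverse=True))  # d2 ranks above d1
```

Adding the prior in log space means it acts as one more multiplicative factor on the query likelihood, so a strong prior can reorder documents whose text scores are close.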

Results

The highest mean reciprocal rank (MRR) obtained was 0.7119. It was achieved with the anchor text index, anchor length and URL class document priors, and a combination with Delicious.
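For reference, MRR averages, over all topics, the reciprocal of the rank at which the first relevant page appears (0 if none is retrieved). A minimal sketch follows; the function name and the toy run are hypothetical.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    run: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of 1/rank of the first relevant result per judged topic."""
    total = 0.0
    for topic, relevant in qrels.items():
        rr = 0.0  # topics with no relevant result retrieved contribute 0
        for rank, url in enumerate(run.get(topic, []), start=1):
            if url in relevant:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(qrels)


# Hypothetical toy run: first relevant page at rank 1 and rank 3.
qrels = {"t1": {"a"}, "t2": {"b"}}
run = {"t1": ["a", "x"], "t2": ["x", "y", "b"]}
print(round(mean_reciprocal_rank(run, qrels), 4))  # 0.6667
```

An MRR of 0.7119 therefore means that, on average, the first relevant page appears close to the top of the ranking, though not always at rank 1.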

Related papers

  1. Collaborative knowledge management: evaluation of automated link discovery in the Wikipedia
  2. Discovering missing links in Wikipedia
  3. Evaluation of automatic linking strategies for Wikipedia pages

Critique

  1. The test set appears to contain only 53 Wikipedia articles. Since 13 results are reported for different variants of the algorithm, the best reported result may be subject to selection bias.
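A small simulation can illustrate the concern: even if 13 variants had exactly the same true effectiveness, the best of 13 noisy MRR estimates over 53 topics would tend to exceed that true value. The per-topic model below (first relevant page at rank 1, 2 or 3 with equal probability, variants independent) is purely hypothetical. Real variants of one system are strongly correlated, which would shrink the effect.

```python
import random
import statistics
from typing import List, Tuple


def best_of_variants(
    n_topics: int = 53, n_variants: int = 13, seed: int = 0
) -> Tuple[float, float, List[float]]:
    """Return (true MRR, best observed MRR, all observed MRRs)."""
    rng = random.Random(seed)
    # Hypothetical model: on every topic, each variant places the first
    # relevant page at rank 1, 2 or 3 with equal probability.
    true_mrr = (1 + 1 / 2 + 1 / 3) / 3
    observed = [
        statistics.mean(1.0 / rng.randint(1, 3) for _ in range(n_topics))
        for _ in range(n_variants)
    ]
    return true_mrr, max(observed), observed


true_mrr, best, observed = best_of_variants()
print(f"true MRR {true_mrr:.4f}, best of {len(observed)} variants {best:.4f}")
```

Running this with different seeds shows how much the maximum can drift above the common true value purely by chance.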