Linking Wikipedia to the Web
|Conference paper|
|Authors:||Rianne Kaptein, Pavel Serdyukov, Jaap Kamps|
|Citation:||Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010|
|Publisher:||ACM, New York, NY, USA|
Linking Wikipedia to the Web reports on an experiment with predicting external links on Wikipedia, i.e., given a Wikipedia article, predict which external Web pages are appropriate for its external links section.
The authors constructed a test set by removing the existing external links from Wikipedia articles and using these links as the ground truth. The data is reported to be available at:
As of April 2011, however, the data was still not available.
The data set is reported to contain "53 topics with 84 relevant home pages". (It is not entirely clear whether each "topic" here represents a Wikipedia article.)
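The paper does not describe how the link removal was implemented; a minimal sketch of the idea, stripping external links from wikitext with a hypothetical regex-based helper, could look like this:

```python
import re

def extract_external_links(wikitext):
    """Illustrative helper (not from the paper): pull out the URLs of
    bracketed external links, [http://... label], and return both the
    URL list (the ground truth) and the text with those links removed
    (the topic the system must predict links for)."""
    urls = re.findall(r'\[(https?://[^\s\]]+)', wikitext)
    stripped = re.sub(r'\[https?://[^\]]*\]', '', wikitext)
    return urls, stripped

sample = "Some article text.\n== External links ==\n* [http://example.org Example site]\n"
ground_truth, topic_text = extract_external_links(sample)
```

Real Wikipedia markup also has bare URLs and templated links, so an actual test-set builder would need a proper wikitext parser.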
The 50 million Web pages of the ClueWeb09 category B collection were used as the test collection.
Among the components and features used:
- Anchor text index and full text index
- URL class priors
- Document priors
- Krovetz stemmer
- Dirichlet document smoothing
They used the Indri toolkit.
The highest mean reciprocal rank (MRR) obtained was 0.7119, reached when using the anchor text index with anchor length and URL class document priors, combined with Delicious.
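Mean reciprocal rank averages, over topics, the reciprocal of the rank at which the first relevant page appears; a compact implementation:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR over a set of topics.
    ranked_results: {topic: [doc ids in ranked order]}
    relevant: {topic: set of relevant doc ids}
    A topic contributes 1/rank of its first relevant result, or 0 if none."""
    total = 0.0
    for topic, ranking in ranked_results.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant.get(topic, set()):
                total += 1.0 / rank
                break
    return total / len(ranked_results)

# Toy example: relevant page at rank 1 for one topic, rank 2 for the other.
mrr = mean_reciprocal_rank(
    {"t1": ["a", "b"], "t2": ["x", "y"]},
    {"t1": {"a"}, "t2": {"y"}},
)
```

So an MRR of 0.7119 means the first relevant home page typically appeared at or near the top of the ranking.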
Related papers
- Collaborative knowledge management: evaluation of automated link discovery in the Wikipedia
- Discovering missing links in Wikipedia
- Evaluation of automatic linking strategies for Wikipedia pages
- They seem to have only 53 Wikipedia articles in the test set. Given that they report 13 results from different variations of the algorithm, there may be selection bias in the results.