Modeling user reputation in wikis
|Authors:||Sara Javanmardi, Cristina Lopes, Pierre Baldi|
|Citation:||Statistical Analysis and Data Mining 3 (2): 126-139. 2010 April|
The researchers wanted to estimate the reputation R_i(t) of a user i at time t as a value scaled between 0 and 1.
They regard "tokens" as either good-quality or poor-quality. A good-quality token is a token that "is present after the intervention of the admin" (page 4). They then consider N_i(t), the number of tokens inserted by author i until time t, and n_i(t), the number of good-quality tokens inserted by author i until time t.
They consider three models:
- The fraction of good-quality tokens among all tokens inserted by a user.
- The first model extended so that quickly deleted tokens are weighted more heavily as poor-quality.
- The second model extended with the reputation of the user deleting the token.
There is only one parameter in the model: the decay parameter of the exponential decay that determines how "quick" a deletion must be to count heavily against the inserting user.
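The first two models can be sketched as follows. This is an illustrative Python sketch under assumed data shapes, not the authors' implementation; the function names and the `tau` parameter (standing in for the paper's single decay parameter) are hypothetical:

```python
import math

def reputation_v1(good_tokens, total_tokens):
    # Model 1: fraction of good-quality tokens among all tokens inserted.
    return good_tokens / total_tokens if total_tokens else 0.0

def reputation_v2(insertions, tau=10.0):
    """Model 2 sketch: quickly deleted tokens are penalized more.

    `insertions` is a list of (insert_time, delete_time) pairs, one per
    token, with delete_time=None for tokens still present. A deletion at
    lag dt keeps only 1 - exp(-dt / tau) of the token's credit, so an
    immediate deletion yields no credit while a late deletion is nearly
    as good as survival. Data shape and names are illustrative.
    """
    if not insertions:
        return 0.0
    good = 0.0
    for t_ins, t_del in insertions:
        if t_del is None:
            good += 1.0                        # token survived: good quality
        else:
            dt = t_del - t_ins
            good += 1.0 - math.exp(-dt / tau)  # quick deletion ~ full penalty
    return good / len(insertions)
```

For example, a user whose three tokens survive, get deleted almost immediately, and get deleted much later, respectively, lands between the all-good and all-bad extremes.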
They use MD5 signatures to compare revisions (page 11) and an algorithm by P. Heckel for diffs ("A technique for isolating differences between files"). Their tool, the Wikipedia Event Extractor, was (apparently) publicly available:
- http://mondego.calit2.uci.edu/WikipediaEventExtractor/ (link apparently no longer working)
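The MD5 comparison presumably serves to skip identical revisions (e.g. reverts) cheaply before running the token-level Heckel diff. A minimal sketch of that idea, with hypothetical function names, not the tool's actual code:

```python
import hashlib

def md5_signature(revision_text: str) -> str:
    # Hash the full revision text; identical revisions (e.g. a revert
    # restoring an earlier version) get identical signatures.
    return hashlib.md5(revision_text.encode("utf-8")).hexdigest()

def needs_diff(prev_text: str, curr_text: str) -> bool:
    # Only run the more expensive token-level diff when the cheap
    # signature comparison says the texts actually differ.
    return md5_signature(prev_text) != md5_signature(curr_text)
```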
They used crawler4j to crawl the English Wikipedia in the summer of 2009, downloading 1.9 million articles and their revisions.
Properties of the dataset:
- 124 million revisions
- 83 million by anonymous users
- 41 million by registered users
- 12.8 million users
- 1.7 million registered users
- 11 million anonymous users
- Admins on average submit 11% of the revisions of a page.
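A quick consistency check on these counts (a throwaway snippet, not from the paper; note the user totals only agree up to rounding):

```python
# Revision counts from the dataset summary above.
anon_revisions = 83_000_000
registered_revisions = 41_000_000
assert anon_revisions + registered_revisions == 124_000_000  # matches the total

# User counts: 1.7M registered + 11M anonymous = 12.7M, which agrees
# with the reported 12.8M total only up to rounding.
registered_users = 1_700_000
anon_users = 11_000_000
print(registered_users + anon_users)  # 12700000
```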
Related papers
- A content-driven reputation system for the Wikipedia
- A utility for estimating the relative contributions of wiki authors
- Computing trust from revision history
- Evaluating authoritative sources in collaborative editing environments
- Investigations into trust for collaborative information repositories: a Wikipedia case study
- Measuring article quality in Wikipedia: models and evaluation
- Mining revision history to assess trustworthiness of article fragments
- Modeling trust in collaborative information systems
- Structuring wiki revision history
- Wikirep: digital reputation in virtual communities
Questions
- What is a "token", precisely?
- It is unclear why there is a minus subscript in equation 1.