Wiki Workshop 2016 at ICWSM
 Invited talks
Quick notes not necessarily completely correct.
Wikipedia Text Mining — Uncovering Quality and Reuse
- Improve low-quality content
- Maintain high-quality content
- There is quite a lot of research on classifying articles as featured or not. Although academically interesting, it has little practical importance.
Anderka 2012 counts 445 types of quality flaws in the English Wikipedia. Verifiability is the most frequent quality flaw. Potthast talked about a one-class estimation of quality flaws, with the operating point on the precision-recall curve chosen based on F_0.2.
Wikipedia text reuse detection (sometimes a euphemism for plagiarism).
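As a rough illustration of choosing an operating point with F_0.2, here is a minimal sketch that assumes the standard F-beta definition, where beta below 1 favours precision over recall. The curve values and the function names are made up for this example and are not from the talk.

```python
def f_beta(precision, recall, beta=0.2):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_operating_point(curve, beta=0.2):
    """Return the (threshold, precision, recall) triple with the highest F-beta."""
    return max(curve, key=lambda point: f_beta(point[1], point[2], beta))


# Hypothetical precision-recall curve: (threshold, precision, recall).
curve = [
    (0.9, 0.95, 0.10),
    (0.7, 0.85, 0.30),
    (0.5, 0.70, 0.55),
    (0.3, 0.50, 0.80),
]

threshold, precision, recall = best_operating_point(curve)
print(threshold, round(f_beta(precision, recall), 3))
```

With such a small beta the selected point sits at the high-precision end of the curve, which fits a setting where wrongly flagging an article is worse than missing a flaw.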
Text reuse: Quotation, boilerplate, translation (metaphrase, paraphrase), summarization. All of these could be plagiarism. See also Potthast 2011.
Algorithm: Keyword extraction, source retrieval, 4-gram match, knowledge-based post-processing.
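The 4-gram match step can be sketched as follows. This assumes word-level 4-grams on lowercased text; the notes do not say whether the talk used word or character n-grams, and the sample sentences and function names are invented for illustration.

```python
import re


def word_ngrams(text, n=4):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(suspicious, source, n=4):
    """Return the n-grams that occur in both texts."""
    return word_ngrams(suspicious, n) & word_ngrams(source, n)


# Hypothetical texts: a Wikipedia sentence and a candidate source from retrieval.
article = "The quick brown fox jumps over the lazy dog near the river."
candidate = "Yesterday the quick brown fox jumps over a fence."

matches = shared_ngrams(article, candidate)
for gram in sorted(matches):
    print(" ".join(gram))
```

Any shared 4-gram is only a candidate for reuse; in the pipeline described above, the post-processing step would then decide which matches are meaningful.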
Question: Why is the quality of Wikipedia so high? Potthast attempted an answer: Errors can be fixed and there are more people that correct errors than make them.
Gender Inequalities in Wikipedia
- How are notable men and women presented in Wikipedia?
- How are professions described on Wikipedia?
- "Human accomplishment"
- Pantheon (11000 individuals, 13% women)
How are wo/men depicted in Wikipedia?
- Linguistic bias, abstract positive and abstract negative. Men are described with more positive words.
- Structural issues.
- How do articles about wo/men link to wo/men?
- Men more well-connected
Pictures in Wikipedia on pages about occupations, e.g., journalist.
Emergent Work in Wikipedia
"Emergent roles": all-round contributors, quick and dirty editors, copy editors, layout shapers, vandals, ...
- Wiki work manual annotation (remove vandalism, insert vandalism, ...)
- 90 articles.
- 13'592 reliable annotations
- Trained a model to predict the manual annotation
Applying Social Network Analysis Metrics to Large-Scale Hyperlinked Data
Linton C. Freeman, Centrality in social networks: conceptual clarification. Describes different network centrality measures.
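To make two of the centrality measures concrete, here is a minimal sketch of normalized degree and closeness centrality. It assumes a small connected, undirected graph stored as an adjacency dictionary; the toy graph is invented, and betweenness centrality is left out to keep the example short.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    n = len(graph)
    result = {}
    for start in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        # Nodes that cannot reach the whole graph get 0.0 in this sketch.
        result[start] = (n - 1) / total if total > 0 and len(dist) == n else 0.0
    return result


# Hypothetical star-shaped link graph: "A" links to every other page.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree["A"], closeness["A"], closeness["B"])
```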
"Graph" and "network": "networks" are "graphs" with meaning.
A Hitchhiker's Guide to Ontology
Spoke about YAGO:
- Extraction from Wikipedia
- Extraction from GeoNames and WordNet.
- Intermediate extractor
- Clean facts
- Ensuring "high quality" (95%)
- 10 languages
- 100 relations
- 100 million facts
- 10 million entities
- Used by DBpedia
Extension with products based on product ID.
- AMIE "is based on an efficient in-memory database implementation"
- For instance, type(x, pope) => diedIn(x, Rome)
- Adding fake facts
- Removal of facts (ISWC 2011)
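A rule such as type(x, pope) => diedIn(x, Rome) can be scored against a fact base by counting how often the head holds when the body holds. The sketch below computes plain support and standard confidence on an invented toy fact set; it is not AMIE's implementation, and the entity names p1 to p4 and the `rule_stats` helper are hypothetical.

```python
def rule_stats(facts, body, head):
    """Score the rule body(x) => head(x).

    facts is a set of (relation, subject, object) triples;
    body and head are (relation, object) pairs applied to the same variable x.
    """
    body_rel, body_obj = body
    head_rel, head_obj = head
    # All x for which the rule body holds.
    body_matches = {s for (r, s, o) in facts if r == body_rel and o == body_obj}
    # Support: how many of those x also satisfy the head.
    support = sum(1 for x in body_matches if (head_rel, x, head_obj) in facts)
    confidence = support / len(body_matches) if body_matches else 0.0
    return support, confidence


# Hypothetical facts; p1..p4 are placeholder entities, not real people.
facts = {
    ("type", "p1", "pope"),
    ("type", "p2", "pope"),
    ("type", "p3", "pope"),
    ("type", "p4", "painter"),
    ("diedIn", "p1", "Rome"),
    ("diedIn", "p2", "Rome"),
    ("diedIn", "p3", "Avignon"),
    ("diedIn", "p4", "Rome"),
}

support, confidence = rule_stats(facts, ("type", "pope"), ("diedIn", "Rome"))
print(support, round(confidence, 2))
```

Adding fake facts or removing true ones, as in the bullets above, would shift these counts, which is one way to probe how robust a mined rule is.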
Mining Le Monde
- Enriching Le Monde from YAGO.
- Statistics on people mentioned in Le Monde
His slides were made with Powerline using his own font.
- Semi-Supervised Automatic Generation of Wikipedia Articles for Named Entities
- ENRICH: A Query Expansion Service Powered by Wikipedia Graph Structure
- Similar Gaps, Different Origins? Women Readers and Editors at Greek Wikipedia
- Wiki Editors' Acceptance of Additional Guidance on Talk Pages
- What Can Wikipedia Tell Us about the Global or Local Character of Burstiness?
- State of the Union: A Data Consumer's Perspective on Wikidata and Its Properties for the Classification and Resolution of Entities
- Literature, geolocation and Wikidata
- Graph-based breaking news detection on Wikipedia
- A proposed solution for discovery of reusable technology pictures using textmining of surrounding article text, based on the infrastructure of Wikidata, Wikisource and Wikimedia Commons
- Extracting Semantics from Random Walks on Wikipedia: Comparing Learning and Counting Methods
- In Wikipedia We Trust: A Case Study
- Wikipedia Knowledge Graph with DeepDive
- Hidden Gems in the Wikipedia Discussions: The Wikipedians' Rationales
- Topical Interest and Degree of Involvement of Bilingual Editors in Wikipedia
- On the Reliability of Information and Trustworthiness of Web Sources in Wikipedia
- Collective Remembering in Wikipedia: The Case of Aircraft Crashes
- The Political Salience Dynamics and Users' Interaction Using the Example of Wikipedia within the Authoritarian Regime Context
- WikiLayers – A Visual Platform for Analyzing Content Evolution and Editing Dynamics in Wikipedia
- Cultural Relation Mining on Wikipedia: Beyond Culinary Analysis
- Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal