9th Extended Semantic Web Conference

9th Extended Semantic Web Conference (ESWC 2012)
Location: Heraklion, Greece
Date & time: 2012-05-27 – 2012-05-31

Extended Semantic Web Conference


SePublica 2012

Link: http://2012.eswc-conferences.org/

The 9th Extended Semantic Web Conference (ESWC 2012) was the 2012 edition of the Extended Semantic Web Conference, held in Heraklion (Hersonissos) on Crete, Greece.

Among the workshops was SePublica 2012, and there was a panel titled "Semantic Technology and Big Data". The conference was recorded by Videolectures.

The ESWC 7-year most influential paper award went to "Towards semantically-interlinked online communities", which described SIOC (Semantically-Interlinked Online Communities).

The proceedings are published by Springer http://www.springerlink.com/content/978-3-642-30283-1

Elena Simperl was General Chair of the organizing committee.


Keynotes

Julius van de Laar 
Digital media strategist Julius van de Laar gave the "preconference" keynote "From storytelling to micro-targeting strategies". He spoke about the online campaigning for the 2008 US presidential election, arguing that "political campaigning = retail marketing" and mentioning the "National Voter File".
Abraham Bernstein 
In his presentation "Should we throw the Semantic in the garbage can", he mentioned localization of user interfaces, "Hofstede's culture dimensions", smurfing in financial institutions with graph-flow pattern detection, Hexastore, Garbage Can Theory (Garbage Can Model) and CrowdLang (low-cost crowdsourced translation of reasonable quality).[1]
Jeroen van Grondelle 
He was from the company Be Informed and presented "New audiences for ontologies". Van Grondelle described a system that is also described in the papers "A knowledge infrastructure for the Dutch Immigration Office" and "Acquiring and modelling legal knowledge using patterns: an application for the Dutch immigration and naturalisation service" from ESWC 2010.
Monica Lam 
From Stanford University, she spoke about egocentric social networks and her group's project Musubi, which works on mobile phones in a peer-to-peer way with little dependence on server services. The system supports apps too. She also showed a custom search engine that would use emails and Twitter.
Márta Nagy-Rothengass 
From DG Information Society and Media, European Commission, she presented "Data value chain in Europe". Related ICT calls in 2013: "Content analytics and language technologies", "SME initiative on analytics", "Scalable data analytics".
John Domingue
Gave the dinner speech.
Alon Halevy 
From Google, he presented "Bringing (Web) databases to the masses", where he described Google Fusion Tables and WebTables.
Aleksander Kołcz 
From Twitter, he presented "Large scale (machine) learning at Twitter". At Twitter, machine learning might be used for information relevance ranking, "who to follow" and topics. He believes simple machine learning models are good for Big Data. He mentioned tools used by Twitter: Apache Mesos, Hadoop, Mallet, Zookeeper, Mahout, Cassandra and Apache HBase, as well as Apache Pig and Scalding. Kołcz showed online learning with stochastic gradient descent trained on different subsets, and classifier committees. He showed sentiment analysis that used emoticons as training labels and logistic regression over character 4-grams, with data sets of 1 million, 10 million and 100 million tweets. He also showed topic modelling with PigML for processing and Mahout Latent Dirichlet Allocation. He would like to anchor LDA results for interpretability with "Labeled LDA". Kołcz also mentioned the issue of Twitter spam and showed that the geographic patterns differ between spammers and normal users. The topics he spoke about are covered in http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
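The sentiment analysis setup mentioned in the Kołcz keynote (emoticons as labels, logistic regression over character 4-grams, training by stochastic gradient descent) can be illustrated with a toy sketch. This is not Twitter's implementation: the emoticon lists, the hashed feature space size `DIM`, the learning rate and all function names are assumptions made for illustration.

```python
import math
import re
import zlib

POSITIVE = (":)", ":-)", ":D")   # assumed positive emoticons
NEGATIVE = (":(", ":-(")         # assumed negative emoticons
DIM = 2 ** 16                    # size of the hashed feature space (assumed)


def emoticon_label(text):
    """Return 1 (positive) or 0 (negative); None if emoticons are absent or mixed."""
    pos = any(e in text for e in POSITIVE)
    neg = any(e in text for e in NEGATIVE)
    if pos == neg:
        return None
    return 1 if pos else 0


def strip_emoticons(text):
    """Remove the emoticons so the classifier cannot simply read off the label."""
    for e in POSITIVE + NEGATIVE:
        text = text.replace(e, " ")
    return re.sub(r"\s+", " ", text).strip().lower()


def char_4grams(text):
    """All overlapping character 4-grams of the text."""
    return [text[i:i + 4] for i in range(len(text) - 3)]


def features(text):
    """Hash each character 4-gram into a sparse {index: count} vector."""
    vec = {}
    for gram in char_4grams(text):
        idx = zlib.crc32(gram.encode("utf-8")) % DIM
        vec[idx] = vec.get(idx, 0) + 1
    return vec


def predict(weights, vec):
    """Logistic regression: probability that the text is positive."""
    z = sum(weights.get(i, 0.0) * v for i, v in vec.items())
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def sgd_train(tweets, epochs=20, lr=0.1):
    """Online learning: one stochastic gradient step per labeled tweet."""
    weights = {}
    for _ in range(epochs):
        for tweet in tweets:
            y = emoticon_label(tweet)
            if y is None:
                continue  # skip tweets without a usable emoticon label
            vec = features(strip_emoticons(tweet))
            err = predict(weights, vec) - y
            for i, v in vec.items():
                weights[i] = weights.get(i, 0.0) - lr * err * v
    return weights
```

Calling `sgd_train` on a list of raw tweets returns a sparse weight dictionary, and `predict(weights, features(text))` scores new text. Training several such models on different subsets and averaging their predictions would be one way to form the classifier committees mentioned in the talk.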

Papers

Orals

Automatic identification of best answers in online inquiry communities 
Studies inquiry communities such as Stack Overflow (April 2011 dataset) and the SAP Community Network forums. Identifies features (19 features + 23 extended features): user features (reputation, number of best answers, normalised activity entropy), content features and thread features. Features are ranked with information gain (IG). Finds an F1 score of 0.83-0.87.
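Feature ranking by information gain, as mentioned for the best-answer paper, can be sketched as follows. The feature names and the toy rows are hypothetical, and the sketch assumes discrete feature values (continuous features such as reputation would first need to be binned).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in by_value.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {name: information_gain([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one row per answer, label 1 = marked as best answer.
ROWS = [
    {"high_reputation": True, "long_answer": True},
    {"high_reputation": True, "long_answer": False},
    {"high_reputation": False, "long_answer": True},
    {"high_reputation": False, "long_answer": False},
]
LABELS = [1, 1, 0, 0]
```

In this toy data `high_reputation` separates the classes perfectly while `long_answer` carries no information, so `rank_features(ROWS, LABELS)` puts `high_reputation` first.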
Characterising emergent semantics in Twitter lists 
(Page 530 in the Proceedings) Focuses on the lists that Twitter users can create. "Lists where the NASDAQ user is a member grouped by number of subscriptions". Uses the Vector Space Model and Latent Dirichlet Allocation, with similarity based on WordNet. The dataset was based on over 200,000 lists.
Crowdsourcing taxonomies 
Describes a system for constructing a taxonomy from individual "votes" on ancestor-descendant relations. Dataset: ACM Computing Classification System, 245 nodes, 102 students annotating ancestor-descendant relationships.
Linked Data-based concept recommendation: comparison of different methods in open innovation scenario 
Presented by Danica Damljanovic. Uses hyProximity, Zemanta, DBpedia, Random indexing, AdWords, ...
Finding co-solvers on Twitter, with a little help from Linked Data 
Presented by Milan Stankovic. Finding collaborators: complementary competences, interest similarity, social similarity.
An approach for named entity recognition in poorly structured data 
Named entity recognition with conditional random fields, the Virtual International Authority File (VIAF)[2] and features such as personFirstName, ..., posNoun (this and the following based on WordNet), posVerb, isCapitalized, isUppercased, capitalizedFrequency, token, tokenLength, startOfElement, endOfElement, dataElement. Evaluation data from Europeana: 120 records containing 584 references to persons, ... The system was compared to OpenNLP and the Stanford Named Entity Recognizer, which also uses conditional random fields. Cross-validation was used with precision, recall and F1 measures.
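A few of the surface features listed for the named entity recognition paper can be computed per token as in the sketch below. The feature names are taken from the list above, but their exact definitions are assumptions (for example, that isCapitalized means "first character is uppercase"), and the WordNet-based and frequency-based features are left out.

```python
def token_features(tokens, index):
    """Surface features for one token of a metadata element (definitions assumed)."""
    token = tokens[index]
    return {
        "token": token,
        "tokenLength": len(token),
        "isCapitalized": token[:1].isupper(),  # first character is uppercase
        "isUppercased": token.isupper(),       # every cased character is uppercase
        "startOfElement": index == 0,
        "endOfElement": index == len(tokens) - 1,
    }


def element_features(tokens):
    """Feature dictionaries for every token, e.g. as input to a CRF toolkit."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeller such as a conditional random field would take one such feature dictionary per token together with a label sequence for training.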
Generating possible interpretations for statistics from Linked Open Data 
Presented by Heiko Paulheim. Their simple feature-generating toolkit FeGeLOD,[3] named entity recognition and Linked Open Data were used together with correlation analysis and rule learning to generate hypotheses across a number of variables. The first data set used was Mercer Quality of Living for 216 cities worldwide. The second data set was from Transparency International. http://www.ke.tu-darmstadt.de/resources/explain-a-lod
Product customization as Linked Data 
The presenters were from Renault and used Linked Data to represent the configuration of cars (color, gears, ABS, ...), also considering the constraints in the configuration. A configuration ontology is available: http://purl.org/configurationontology
Green-Thumb Camera: LOD application for field IT 
A plant recommendation agent available on the user's smartphone. Smartphone sensors give illuminance and GPS information, from which temperature, ... can be determined.
The current state of SKOS Vocabularies on the Web
Supporting Linked Data production for cultural heritage institutes: the Amsterdam Museum case study 
data.europeana.eu. XML ingestion (OAI), ClioPatria (XMLRDF, Amalgame). The tools were applied in a case study with the Amsterdam Museum (formerly Amsterdam Historic Museum), which in March 2010 published its whole collection online under a CC license: 73,000 objects comprising 256 MB of nested XML for object metadata, a thesaurus of 27,000 concepts in 9 MB, and 67,000 persons (Person Authority File). They translated the XML to rough RDF and then refined the RDF. The result was mapped to the Europeana Data Model (Dublin Core, SKOS, RDA Group 2 elements, OAI-ORE, EDM-specific). They did the alignment to GeoNames, VIAF, ... with Amalgame.
From Web 1.0 to social semantic web: lessons learnt from a migration to a medical semantic wiki 
Describes a system for medical guidelines based on Semantic MediaWiki. They have a special "travail" workspace for draft documents.
LODifier: generating Linked Data from unstructured text 
Presented by Isabelle Augenstein. Their pipeline contains a tokenizer, named entity recognition (Wikifier[4]), parsing with C&C using combinatory categorial grammar, lemmatization, word sense disambiguation with UKB, and deep semantic analysis with Boxer. "bag-of-URI"
Semi-automatically mapping structured sources into the Semantic Web
Presented by Craig A. Knoblock. His system tries to predict the semantic type of the columns using conditional random fields, and tries to infer relationships between the columns using concepts such as Steiner trees and Steiner nodes. It works on data related to PharmGKB, ABA, KEGG Pathway and UniProt. More information is available at http://www.isi.edu/~knoblock and the software is available under the Apache license at https://github.com/InformationIntegrationGroup/Web-Karma-Public
Enhancing OLAP analysis with Web Cubes 
Presented by Lorena Etcheverry, describing a system that combines ideas from OLAP and the Semantic Web. Their contributions are RDF for a multidimensional model ("The Open Cube Vocabulary") and SPARQL for multidimensional models. OLAP operations: Roll up, Slice, Dice. Related work includes the RDF Data Cube Vocabulary. Open questions: What is the relationship to SCOVO? Why is the vocabulary not matched to SCOVO? The Open Cube vocabulary was not available at the time of the presentation.
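The three OLAP operations named above can be illustrated on a tiny in-memory cube. This is a plain Python sketch, not the RDF/SPARQL approach of the paper; the fact table, dimension names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions plus one measure ("sales").
FACTS = [
    {"city": "Heraklion", "country": "Greece", "year": 2011, "sales": 10},
    {"city": "Heraklion", "country": "Greece", "year": 2012, "sales": 20},
    {"city": "Athens", "country": "Greece", "year": 2012, "sales": 5},
    {"city": "Montevideo", "country": "Uruguay", "year": 2012, "sales": 7},
]


def slice_cube(facts, dimension, value):
    """Slice: fix one dimension to a single value and drop it from the result."""
    return [
        {k: v for k, v in fact.items() if k != dimension}
        for fact in facts
        if fact[dimension] == value
    ]


def dice(facts, **allowed):
    """Dice: keep facts whose values lie in the allowed set for each given dimension."""
    return [f for f in facts if all(f[d] in values for d, values in allowed.items())]


def roll_up(facts, keep, measure="sales"):
    """Roll up: sum the measure over all dimensions not listed in keep."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in keep)] += fact[measure]
    return dict(totals)
```

For example, `roll_up(FACTS, ["country"])` aggregates the city level up to the country level, `slice_cube(FACTS, "year", 2012)` fixes the year, and `dice(FACTS, country={"Greece"}, year={2012})` selects a sub-cube.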
Exchange and consumption of huge RDF data
Presented by Mario Arias. Proposes a binary format for RDF data, Header Dictionary Triples (HDT), which uses dictionary mapping. Showed performance on DBpedia (258 million triples), GeoNames, dblp and LinkedMDB. They can store DBpedia in 5-6 GB. More information is available from http://www.rdf-hdt.org. There is another project that considers compression of triples.
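The dictionary mapping idea behind HDT, replacing each RDF term by a small integer ID and storing the triples as ID tuples, can be sketched as below. This only illustrates the principle under simplifying assumptions (a single shared dictionary, no compression of the ID triples); the real format is considerably more elaborate, and the function names and example terms are hypothetical.

```python
def encode(triples):
    """Map every distinct term to an integer ID (starting at 1) and encode the triples."""
    ids = {}
    terms = []      # terms[i - 1] is the term with ID i
    encoded = []
    for triple in triples:
        row = []
        for term in triple:
            if term not in ids:
                terms.append(term)
                ids[term] = len(terms)
            row.append(ids[term])
        encoded.append(tuple(row))
    return terms, encoded


def decode(terms, encoded):
    """Recover the original triples from the dictionary and the ID triples."""
    return [tuple(terms[i - 1] for i in row) for row in encoded]


# Hypothetical example triples using prefixed names.
TRIPLES = [
    ("dbr:Heraklion", "dbo:country", "dbr:Greece"),
    ("dbr:Athens", "dbo:country", "dbr:Greece"),
]
```

Long, frequently repeated URIs such as the predicate are then stored once in the dictionary, which is where much of the space saving comes from.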
Unsupervised learning of data linking configuration 
Aims to link data items (instance matching), e.g., "Bill Clinton" with "Clinton, W.J.". They use "pseudo-precision", "pseudo-recall" and "pseudo-F-measure" fitness functions with a genetic algorithm. Evaluated with the Person/Restaurant OAEI 2010 data set and the New York Times OAEI 2011 data sets. The main idea is that each item in one ontology/data set should match one and only one item in the other.
Graph kernels for RDF data
Describes a system that could be used to, e.g., predict links in RDF graphs or do property value prediction with kernel machines.
Exploiting information extraction, reasoning and machine learning for relation prediction 
Represents the RDF graph as a matrix where subjects are rows and (verb-object) pairs are columns. Uses a probabilistic latent factor model and performs an experiment on gene-disease relationships using LOD's Linked Life Data and Bio2RDF with 2462 genes and 331 diseases.
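The matrix representation described for the relation prediction paper, with subjects as rows and (verb, object) pairs as columns, can be built from a triple list as in the sketch below. The gene and disease identifiers are hypothetical, and the latent factor model itself is not implemented here; it would take such a binary matrix as input and score the zero entries as candidate relations.

```python
def build_matrix(triples):
    """Binary matrix with one row per subject and one column per (predicate, object) pair."""
    subjects = sorted({s for s, _, _ in triples})
    columns = sorted({(p, o) for _, p, o in triples})
    row_index = {s: i for i, s in enumerate(subjects)}
    col_index = {c: j for j, c in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in subjects]
    for s, p, o in triples:
        matrix[row_index[s]][col_index[(p, o)]] = 1
    return subjects, columns, matrix


# Hypothetical toy triples.
TRIPLES = [
    ("gene:A", "associatedWith", "disease:X"),
    ("gene:A", "locatedOn", "chr:1"),
    ("gene:B", "associatedWith", "disease:X"),
]
```

Here `build_matrix(TRIPLES)` yields the rows `gene:A` and `gene:B` and the columns `(associatedWith, disease:X)` and `(locatedOn, chr:1)`.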

Posters

SciNet: augmenting access to scientific information 
Tuukka Ruotsalo et al. from the Helsinki Institute for Information Technology presented a PDF reader on a tablet computer, linked to the millions of scientific papers available in their university repository. They monitored how the reader navigated the document and uploaded the information to a server.
Named entity disambiguation using Linked Data 
Danica Damljanovic and Kalina Bontcheva used ANNIE from GATE with the Large Knowledge Gazetteer (LKB) for named entity recognition.

References

  1. CrowdLang - first steps towards programmable human computers for general computation
  2. VIAF (Virtual International Authority File): Linking Die Deutsche Bibliothek and Library of Congress Name Authority Files
  3. Unsupervised feature generation from Linked Open Data
  4. Learning to link with Wikipedia