International Semantic Web Summer School in Bertinoro
This summer I had a great opportunity to attend the International Semantic Web Summer School (ISWS). The school lasted one week and was mainly organized for PhD students, young professionals, and graduates interested in topics like the Semantic Web and surrounding technologies (e.g. Natural Language Processing, Machine Learning, Ontology Design, Blockchain). My master thesis topic was 'Usage of Linked Open Data in Content-Based Recommender Systems for Real World E-Commerce', which I wrote under the supervision of Prof. Harald Sack (Karlsruhe University), one of the directors of the summer school. Thus, I decided to take the chance to refresh my knowledge of semantic topics, to network with researchers from the area, and at the same time to represent Springer Nature to the target audience. As an outcome I gained a very special experience, many lessons learned, great socializing, and two awards, for the best research paper and the best presentation.
This April there was an open SN Hack Day on the Semantic Web, which was kindly organized by our SciGraph team (Arian Celina, Markus Kaindl, Sebastian Bock and Michele Pasin). I had the chance to attend this event and to learn more about SN SciGraph and our semantic tools and infrastructure. My team's project idea was to build a Google Chrome plugin which enriches a SpringerLink article page with semantic context:
- the description of the article is analyzed and semantic entities are highlighted; on mouseover, a Wikipedia abstract appears
- enrich the article with SciGraph information (a link to the SciGraph connections graph)
- enrich the article with UNSILO information about related articles
- enrich the article with Dimensions information about related categories
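The first of these features can be illustrated with a small sketch. This is not the plugin's actual code; it is a minimal, self-contained illustration of the highlighting step, assuming the entity mentions and their Wikipedia abstracts have already been obtained from an entity-linking service (all data below is made up):

```python
import re

def highlight_entities(description: str, abstracts: dict) -> str:
    """Wrap every known entity mention in a <span> whose title attribute
    the browser shows as a tooltip on mouseover (a hypothetical stand-in
    for the plugin's richer abstract popup)."""
    for mention, abstract in abstracts.items():
        replacement = f'<span class="entity" title="{abstract}">{mention}</span>'
        description = re.sub(re.escape(mention), replacement, description)
    return description

# Illustrative data; the mention and abstract are invented for this example.
text = "We study ontologies for the Semantic Web."
abstracts = {"Semantic Web": "An extension of the Web with machine-readable data."}
print(highlight_entities(text, abstracts))
```

In the real plugin this logic runs in the browser and manipulates the page's DOM directly; the Python version just makes the transformation easy to see.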
Our team was lucky to win the prize for the most innovative project idea.
Since I'm responsible for the springernature.com website, I've used some of our 10% time hack days to implement similar functionality for www.springernature.com blog pages. The resulting prototype is able to extract semantic entities and to retrieve semantic categories for them. As a next step, I need to identify the most relevant categories, which can then be used for automatic tagging.
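One simple way to pick the most relevant categories, sketched below with invented data, is to rank them by how many of the extracted entities share them. This is only a plausible heuristic for the tagging step, not the prototype's actual implementation:

```python
from collections import Counter

def rank_categories(entity_categories: dict, top_n: int = 2) -> list:
    """Rank semantic categories by how many extracted entities belong to
    them; the most widely shared categories become the page's tags."""
    counts = Counter()
    for categories in entity_categories.values():
        counts.update(categories)
    return [category for category, _ in counts.most_common(top_n)]

# Hypothetical entities extracted from a blog post, each mapped to the
# semantic categories (e.g. DBpedia categories) retrieved for it.
entities = {
    "Open access":      ["Academic publishing", "Scholarly communication"],
    "Peer review":      ["Academic publishing", "Research"],
    "Creative Commons": ["Scholarly communication", "Licenses"],
}
print(rank_categories(entities))
```

Categories mentioned by only one entity drop out naturally, which filters noise from imperfect entity extraction.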
Preparations for the Summer School
Every participant had to make an A1 poster about their research or working topic. My poster was based on my master thesis and was titled 'Semantic Recommender System for Scientific Publishing'; in it I described my ideas for an approach to semantic content-based recommendations for Springer Nature publications and books. Many thanks to Markus Kaindl and Michele Pasin for the review and very valuable improvements.
To my knowledge, we currently use collaborative filtering (a user who bought item A also bought item B) for our recommendations. But in the scientific domain it is not very feasible to recommend to a researcher items that were purchased by another researcher, since every researcher has their own unique area of interest. My idea was to use semantic connections between products to calculate how similar or related products are to each other, and to give recommendations based on that. As data sources I suggested DBpedia (a semantic, machine-interpretable version of Wikipedia), SN SciGraph, and third-party data sources (e.g. Dimensions, UNSILO). DBpedia already had a project with the SN SciGraph team to interlink these datasets (see: interlinking of Springer Nature's SciGraph and DBpedia datasets).
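To make the idea concrete, here is a toy sketch: represent each publication by the set of semantic entities linked to it (e.g. via DBpedia or SN SciGraph) and measure relatedness with Jaccard similarity. The titles, entity sets, and function names are all invented for illustration; a real system would use richer graph-based similarity measures:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared entities divided by all entities."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target: str, catalog: dict, top_n: int = 2) -> list:
    """Rank other items by how semantically related they are to the target."""
    scores = [
        (other, jaccard(catalog[target], entities))
        for other, entities in catalog.items() if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical catalog: each title mapped to its linked semantic entities.
catalog = {
    "Intro to Machine Learning": {"Machine learning", "Statistics", "Python"},
    "Deep Learning Basics":      {"Machine learning", "Neural networks", "Python"},
    "Medieval History":          {"History", "Europe"},
}
print(recommend("Intro to Machine Learning", catalog))
```

Note that this needs no purchase history at all, which is exactly what makes the content-based approach attractive for the cold-start cases discussed below.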
This approach has several advantages compared to the collaborative filtering:
- the cold start problem can be avoided. Which items should we recommend to a new user who hasn't purchased anything yet? How can a new item be recommended if nobody has purchased it?
- the discoverability of our products can be improved, since even rarely purchased items will be recommended if they are related to the topic
- the user experience can be improved, since the provided recommendations are better in terms of novelty and serendipity (the occurrence and development of events by chance in a happy or beneficial way)
- the reasons for a particular recommendation can be provided
International Semantic Web Summer School
The school was held from 1 to 7 July in the University Residence Center in Bertinoro, a small Italian town. The residence center is a very beautiful, 1,000-year-old castle which is used for conferences and training courses. I was one of 60 selected participants and had the honor of representing Springer Nature. You can see some statistics about the participants in the image below (37% were female, 73% PhD students, mostly from European universities in France, Italy, and Germany), but there were people from all corners of the world.
The teaching took the form of tutorials, keynotes, and so-called in-depth pills. In addition, there was a larger group project in which each team worked on a research topic, with a paper and a presentation as the outcome. Overall, the summer school aimed to model a researcher's life in one week, including team work, socializing, and deadlines!
Keynotes were given by Enrico Motta on Data Analytics, Marta Sabou on Rigour and Relevance with Design Science, and Sebastian Rudolph on a Logician's View of the Semantic Web. The keynotes covered higher-level concepts and lessons learned. They were very inspiring and gave a good perspective on a researcher's life.
The tutorials by Maria-Esther Vidal and Sebastian Rudolph covered the basics of reasoning and SPARQL query execution. Claudia d'Amato and Michael Cochez talked about Machine Learning, and John Domingue about Blockchain and decentralization.
As a third kind of talk, the summer school offered so-called in-depth pills: Aldo Gangemi talked about ontology design patterns, Marieke van Erp about Natural Language Processing, and Marta Sabou about crowdsourcing. Overall, each talk gave a lot of insights, and the numerous questions from the audience were all answered.
[Thanks to Sven Lieber for the nice summary in his blog.]
Research Task Forces
All students worked on the same overarching topic, "Validity in Linked Open Data". The participants were divided into teams, so-called task forces, of 6 students and a tutor each. Every team got a specific topic about data validation and a cool team name. The supporting tutor was in most cases a researcher or a professor teaching courses in that particular area. The expected outcome was a research paper and a presentation. After the school, the organizers plan to combine all the papers and publish one big paper about "Linked Data Validity" with 75 co-authors.
As mentioned above, each team got a cool team name, and to assign the names the organizers used a sorting hat like in the Harry Potter movies. They used the Harry Potter metaphor quite often, as it fit the castle environment very well. They attached a loudspeaker to the hat and used speech synthesis, complete with funny jokes, to assign a name to each team. In the picture below you can see the tutors with the sorting hat on their heads.
The presented topics were:
- Linked Data Validity in a Decentralized Disintermediated World using Blockchains (Tutor: Prof. John Domingue) (Team: Hufflepuff) (my task force)
- What is a definition of validity? Does it apply to a single statement (e.g. a triple) in LOD or to a collection of statements? Would it be a general definition, a context/domain dependent definition, or both? (Tutor: Prof. Claudia d'Amato)
- Can we find out whether data is valid or invalid by only looking at the graph itself, e.g., using machine learning to detect anomalies in subgraphs? (Tutor: Dr. Michael Cochez)
- When you go from text to structured data, how do you assess the validity of a piece of information? How do you cope with imperfect systems that extract information from text into structured formats? How do you deal with contradicting or incomplete information? (Tutor: Dr. Marieke van Erp) (Team: The 42's)
- What are exemplary use cases for LOD validity? How can we establish validity metrics that are sensitive both to structure (internal) and to tasks, existing knowledge, and sustainability (external)? What are the patterns to check for validity? (Tutor: Dr. Aldo Gangemi) (Team: Gryffindor)
- Can LOD validity be established using common sense? What is common sense in terms of linked open data? (Tutor: Dr. Valentina Presutti) (Team: The Delorians)
- How can logical validity be defined using a mathematical calculus? Can there be different degrees of logical validity? (Tutor: Prof. Sebastian Rudolph) (Team: Dragons)
- Context of validity. Will the validity stay the same? Will it evolve over time? (Tutor: Prof. Harald Sack) (Team: Ravenclaw)
- How to express the degree of validity of a dataset? (Tutor: Prof. Ruben Verborgh) (Team: Hobbits)
- Completeness of linked data in case of federated queries. Completeness models for RDF. Federated query engines. (Tutor: Prof. Maria-Esther Vidal) (Team: Jedis)
I was part of the first team, and we investigated how validity can be ensured in a distributed environment.
Each task force had to prepare an 8-page research paper, a 10-minute presentation, and a 1-minute funny video. At the end there was an award session for the best work. My team (Hufflepuff) was awarded for both the best research paper and the best presentation.