BiOnIC - Catalog of User Interactions with Biomedical Ontologies

About BiOnIC

BiOnIC is a catalog of aggregated statistics of user clicks, queries, and reuse counts for access to over 200 biomedical ontologies. BiOnIC also provides anonymized sequences of classes accessed by users over a period of four years. To generate the statistics, we processed the access logs of BioPortal, a large open biomedical ontology repository. We publish the BiOnIC data using DCAT and SKOS metadata standards. The BiOnIC catalog has a wide range of applicability, which we demonstrate through its use in three different types of applications. To our knowledge, this type of interaction data stemming from a real-world, large-scale application has not been published before.

Download Paper || Download Presentation Slides

Dataset Characteristics

User-level features

We compute summary statistics for the following user-level features: the total and unique number of entities in a sequence, and the total time of interaction.

Ontology-level features

We compute the following ontology-level features: the percentage of classes accessed using the WebUI and the API, the size of each ontology, the number of unique users for each access mode, and the Spearman correlation between the classes browsed using the WebUI and those queried using the API.
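The user-level features listed above can be illustrated with a minimal sketch. It assumes that each access in a sequence carries a class IRI and an ISO 8601 timestamp; that record layout, the function name, and the example IRIs are hypothetical and are not taken from the BiOnIC schema.

```python
from datetime import datetime


def user_level_features(sequence):
    """Summarise one interaction sequence.

    `sequence` is a list of (class_iri, iso_timestamp) tuples in access order.
    This layout is an assumption made for illustration only.
    """
    classes = [iri for iri, _ in sequence]
    times = [datetime.fromisoformat(ts) for _, ts in sequence]
    # Total time of interaction: span between first and last access.
    total_seconds = (max(times) - min(times)).total_seconds() if times else 0.0
    return {
        "total_entities": len(classes),
        "unique_entities": len(set(classes)),
        "total_time_seconds": total_seconds,
    }


# Hypothetical sequence of three accesses to two distinct classes.
example = [
    ("http://example.org/onto#A", "2016-01-01T10:00:00"),
    ("http://example.org/onto#B", "2016-01-01T10:00:30"),
    ("http://example.org/onto#A", "2016-01-01T10:02:00"),
]
features = user_level_features(example)
```

Summary statistics such as the mean or median can then be taken over the per-sequence dictionaries.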

BiOnIC Schema

We used elements from the Simple Knowledge Organization System (SKOS), the Data Catalog vocabulary (DCAT), and the Provenance Ontology (PROV-O) for our vocabulary schema.

Class Statistics Datasets. The bionic:StatDataset class represents datasets that publish the aggregated statistics for each ontology class. The bionic:ClassInfo captures the structural characteristics of the class. The bionic:ReuseCount represents the number of ontologies that reuse a specific ontology class and the type of reuse (IRI, CUI). The bionic:RequestCount represents the total and unique counts of clicks and queries for each ontology class. These RDF datasets can be queried in conjunction with the ontologies and ontology mappings in BioPortal using owl:Class IRIs.

User Interaction Sequences Datasets. The bionic:SeqDataset class represents datasets of user interaction sequences for a particular ontology. The anonymized user identifiers are represented as prov:Agent instances. A sequence of user interactions, captured as an instance of bionic:Sequence, is represented as a list of bionic:SeqEntity instances linked via the bionic:nextEntity property. bionic:Sequence and bionic:SeqEntity are subclasses of skos:Collection and skos:Concept, respectively.
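Because a sequence is stored as a linked list of bionic:SeqEntity instances, recovering the visit order means following the bionic:nextEntity links. The sketch below does this over plain (subject, predicate, object) tuples. It assumes the first entity is the one with no incoming bionic:nextEntity link; the `ex:` identifiers and the function name are hypothetical, and a real dataset would normally be loaded with an RDF library instead.

```python
NEXT = "bionic:nextEntity"


def order_entities(triples):
    """Return SeqEntity identifiers in visit order by following nextEntity links."""
    successor = {s: o for s, p, o in triples if p == NEXT}
    targets = set(successor.values())
    # Assumption: the head of the list has no incoming nextEntity link.
    heads = [s for s in successor if s not in targets]
    if len(heads) != 1:
        raise ValueError("expected exactly one entity without a predecessor")
    ordered, seen = [], set()
    current = heads[0]
    # Stop at the end of the list, or if a cycle is encountered.
    while current is not None and current not in seen:
        ordered.append(current)
        seen.add(current)
        current = successor.get(current)
    return ordered


# Hypothetical triples, deliberately listed out of order.
triples = [
    ("ex:e2", NEXT, "ex:e3"),
    ("ex:e1", NEXT, "ex:e2"),
    ("ex:e1", "rdf:type", "bionic:SeqEntity"),
]
ordered = order_entities(triples)
```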


Characterizing User Behaviors

We used the BiOnIC WebUI and API sequence datasets to model the browsing behavior of BioPortal users using memoryless Markov chains. We represented the user behavior as a vector, and we clustered these vectors using k-means. We were able to identify seven distinct browsing types, all relying on different functionality provided by BioPortal. For example, Search Explorers extensively use the search functionality while Ontology Tree Explorers mainly rely on the class hierarchy for exploring ontologies. Further, we show that specific characteristics of ontologies influence the way users explore and interact with the website.
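As an illustration of how a memoryless (first-order) Markov chain can be turned into a behavior vector, the sketch below counts transitions between consecutive actions, normalizes each row into probabilities, and flattens the matrix. The state names are hypothetical placeholders and not the action types used in the study; the resulting vectors could then be passed to any k-means implementation.

```python
from collections import Counter


def transition_vector(sequence, states):
    """Flatten row-normalised first-order transition probabilities into a vector."""
    # Count each observed (current state, next state) pair.
    counts = Counter(zip(sequence, sequence[1:]))
    vector = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        for dst in states:
            # Rows for states that never occur as a source are left at zero.
            vector.append(counts[(src, dst)] / row_total if row_total else 0.0)
    return vector


# Hypothetical action types and one hypothetical user session.
states = ["search", "tree", "class_page"]
session = ["search", "class_page", "search", "class_page", "tree"]
vec = transition_vector(session, states)
```

Each user yields one vector of length `len(states) ** 2`, so users with similar transition habits end up close together in that space.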

Reference: Simon Walk, Lisette Espín-Noboa, Denis Helic, Markus Strohmaier, and Mark A. Musen. 2017. How Users Explore Ontologies on the Web: A Study of NCBO's BioPortal Usage Logs. In Proceedings of the 26th International Conference on World Wide Web (WWW '17), 775-784. DOI

Identifying Structural Exploration and Querying Patterns in Ontologies

We used the BiOnIC statistics on user clicks, queries, and reuse counts for each class in every ontology, and did not find a significant Spearman correlation between class access and reuse. We also investigated whether user browsing behaviors through the BioPortal WebUI and the API correlate with each other. We developed the PolygOnto visualization, which exploits the class hierarchy to reveal regions in an ontology where users tend to explore and query more. We observe two types of exploration patterns: i) Triangles: 1 parent -> 2 child classes, and ii) Inverted Triangles: 1 child -> 2 parent classes. We also observe that classes in the lower levels of the class hierarchy are rarely explored or queried by users.
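The Spearman correlation mentioned above is the Pearson correlation of the ranks of two variables. A minimal sketch, assuming two equally long lists of per-class counts (the counts and function names are hypothetical, and ties receive their average rank):

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Hypothetical per-class access counts and reuse counts.
access_counts = [120, 45, 45, 3, 0]
reuse_counts = [1, 7, 2, 2, 0]
rho = spearman(access_counts, reuse_counts)
```

This sketch returns only the coefficient; judging significance additionally requires a p-value, which a statistics library would normally provide.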

Reference: Maulik R. Kamdar, Simon Walk, Tania Tudorache, and Mark A. Musen. 2017. Visualizing Request and Reuse Data across Biomedical Ontologies. Journal of Web Semantics (under review). Download

Comparing BioPortal Access Modes and Temporal Influences

We use Fisher's exact test over the BiOnIC aggregated statistics, with correction for multiple hypothesis testing, to investigate whether users access certain ontology classes significantly more often in one access mode or time period than in another. The VisIOn Web application provides insights into the influence of access modes and time on information retrieval in ontologies. For example, as seen in the word cloud perspective of VisIOn, more users queried the Gene Ontology using the BioPortal API for classes related to pigmentation in 2015 than in 2016. Certain classes (e.g., protein transmembrane transporter activity) are requested multiple times using the BioPortal API, but are never requested using the BioPortal WebUI. Moreover, the VisIOn word cloud and volcano plot perspectives show a rise in queries for certain classes related to Zika virus and Ebolavirus in several disease ontologies.
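A per-class comparison of this kind can be sketched as a 2x2 contingency table: requests for the class versus requests for all other classes, in one access mode (or period) versus the other. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution and a Benjamini-Hochberg adjustment. The choice of Benjamini-Hochberg is an assumption, since the correction procedure is not named above, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (class via API, other via API, class via WebUI, other via WebUI).
tables = [(30, 970, 2, 998), (10, 990, 12, 988), (0, 1000, 9, 991)]
p_values = [fisher_exact_two_sided(*t) for t in tables]
adjusted = benjamini_hochberg(p_values)
```

Classes whose adjusted p-value falls below a chosen threshold would be the ones flagged as accessed significantly more in one mode or period.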

Visit VisIOn