The social value of open data initiatives can be assessed and benchmarked along several dimensions. We propose a methodology that compares and evaluates the social value of open data along a spectrum of measures ranging from intensional completeness to subjective meaning. We first suggest that the open data an organization makes available online can be modelled by a corresponding integrated conceptual schema, which serves as a uniform construct. A global schema is then created from the integrated schemas, and both intensional and extensional social value can be defined over these conceptual schemas.
Valuable information is then extracted through queries over these constructs; such information may prove useful in the different contexts and related needs that users experience in the health domain. In this way, we propose to compare and measure the social value of different open data initiatives as it emerges from three analyses: the information that can be modelled and extracted from their conceptual schemas, the quality of their instances, and the subjective perception of their valuable information in different contexts and for different needs.
Phases of the Open Data Social Value Methodology
Data Selection. A domain of interest is selected, together with the related open datasets available online for that domain. In particular, the main sources of datasets may be the institutional and governmental portals specialized in government-to-citizen communication.
Integrated Schema Creation. A Conceptual Integrated Schema (IS) is created that groups datasets according to a criterion (e.g., all the datasets about nationwide hospitals, which have the entity hospital in common). Such a schema should represent all the available domain entities and relationships for the group of datasets under examination. For example, in the hospital domain, if one dataset contains the list of hospitals of a region with their addresses and another dataset contains the same list with the diagnoses of patients grouped by pathology, the conceptual schema for this group of datasets will contain the entities “Hospital”, “Patient hospitalization” and “Pathology”, which may for example be pairwise related by the “deals with” and “due to” relations, respectively.
Global Schema Construction. The construction of a global schema (GS) as the result of the integration of all the ISs produced in the previous step. This schema is an overview of all the available information pieces for a particular domain, represented by a selected group of datasets.
Schema Extension and Addition. A further step extends the Global Schema with new concepts as they emerge from interviews with domain experts and with all the users of domain services (e.g., patients, stakeholders, taxpayers, and so on), as well as from literature studies on the matter.
Intensional Social Value Measurement. For each IS, a “coverage” measure can be defined that captures how many of the GS concepts are represented in that IS. For example, the intensional social value of an IS could be expressed as the ratio between the number of entities in the IS and the number of entities in the GS.
Extensional Social Value Measurement. A measure of the “extensional social value” of each IS can be defined, for example, as the degree of completeness, accuracy and timeliness of the instances contained in each dataset modelled within the IS, compared, for example, against the GS instances (or against a gold standard on hospitals, if any).
Context-Need-Information or Query Formulation. To measure the “user perceived social value” of the information, a set of contexts (intended as user scenarios) and needs is devised for the domain of interest (e.g., the sudden stroke of a relative and the need to handle the emergency), together with the related information that could be judged valuable for those contexts and needs (e.g., the geo-referenced position of hospitals on a map, in order to find the nearest one).
User Perceived Social Value Measurement. For each schema and for each individual context-need-information query, it is checked whether the information can be retrieved from single ISs or from the GS. A weight of importance can then be assigned to each query on the basis of how users perceive the same information in different scenarios.
Final Social Value Measurement. The output of the methodology should be a final ranking of the groups of open data, based on the intensional, extensional, and perceived social value of each group, as measured through the uniform construct of their IS, the GS, and the related queries (i.e., information pieces).
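The measurement phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the cited studies: the class IntegratedSchema, all function names, the sample data, and the query weights are hypothetical; the GS is taken as the plain union of IS entities (schema extension from interviews is omitted); extensional value is reduced to completeness alone; and the final score is an unweighted mean of the three measures, which the methodology does not prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class IntegratedSchema:
    """Hypothetical stand-in for an IS: entity names plus sample instances."""
    name: str
    entities: set[str]
    # Each instance is a record; None marks a missing value.
    instances: list[dict] = field(default_factory=list)


def global_schema(schemas):
    """GS entities, assumed here to be the union of all IS entities."""
    return set().union(*(s.entities for s in schemas))


def intensional_value(schema, gs_entities):
    """Ratio of IS entities to GS entities (the coverage example in the text)."""
    return len(schema.entities & gs_entities) / len(gs_entities)


def extensional_value(schema):
    """Completeness only: share of non-missing values across all instances."""
    values = [v for record in schema.instances for v in record.values()]
    return sum(v is not None for v in values) / len(values) if values else 0.0


def perceived_value(schema, queries):
    """Weighted share of queries whose required entities the IS contains."""
    total = sum(weight for _, weight in queries)
    answered = sum(w for required, w in queries if required <= schema.entities)
    return answered / total if total else 0.0


def rank(schemas, queries):
    """Rank schemas by the unweighted mean of the three measures (an assumption)."""
    gs = global_schema(schemas)
    scores = {
        s.name: (intensional_value(s, gs)
                 + extensional_value(s)
                 + perceived_value(s, queries)) / 3
        for s in schemas
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented sample data for the hospital example.
regional = IntegratedSchema(
    "regional",
    {"Hospital"},
    [{"name": "A", "address": "Via X"}, {"name": "B", "address": None}],
)
national = IntegratedSchema(
    "national",
    {"Hospital", "Patient hospitalization", "Pathology"},
    [{"name": "A", "pathology": "stroke"}],
)
# Each query: (entities needed to answer it, importance weight from users).
queries = [({"Hospital"}, 2.0), ({"Hospital", "Pathology"}, 1.0)]

ranking = rank([regional, national], queries)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Keeping the three measures as separate functions mirrors the separate phases of the methodology, so accuracy or timeliness checks, or a different aggregation for the final ranking, could be swapped in without touching the other measures.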
Figure 1 shows the workflow of the methodological steps described above.
Figure 1 – The Methodology Workflow.
Viscusi, G., Castelli, M., Batini, C. (2014). Assessing Social Value in Open Data Initiatives: A Framework. Future Internet, 6(3), 498-517.
Batini, C., Locoro, A. (2015). Putting Open Data to the Test of Life: Conceptual Schemas as a Means to Compare and Measure Social Value. Accepted at SEBD 2015.
Cabitza, F., Locoro, A., Batini, C. (2015). A User Study to Assess the Situated Social Value of Open Data in Healthcare. Accepted at HCIST 2015.