In the previous article by Mr. Lubor Illek (Security concerns of Open Data publishing), we outlined some security concerns that apply to the publication of Open Data: the security of the publisher’s IT environment and the security (in the sense of “protection”) of the published data. In this article we describe in more detail what exactly Open Data Node (ODN) does for publishers in that regard.
Separate computing environment
Quoting the previous article:
“For example if you use the ODN software, and have it placed in the DMZ or similar security zone, all inputs should already be “cleaned” from what data that has not to be disclosed – such cleaning should not be implemented in the ODN using its data transformations.“
With the current ODN release 1.x, that is indeed the main supported method of deployment: placing ODN into a separate DMZ, similar to how web portals are already deployed in organizations. This is a typical and common scenario in which publishers can use ODN on their own, as it is, without difficulties.
But what to do in cases where some sensitive data must not be disclosed, yet it is desirable to perform the cleaning of data in ODN? After all, data transformation before publication is one of the primary purposes of ODN, so there should be a way to do so. With the current ODN releases, such use cases are not supported by default, so the answer is “custom deployment”. Thanks to its modular design, an ODN deployment can be split across several servers:
- The internal part of ODN (responsible for data transformations, i.e. mainly the internal catalogue and UnifiedViews) can be deployed on one server and placed into a secure internal environment right next to the internal database server
- The public part of ODN (responsible for publication of Open Data, i.e. mainly the public catalogue) can be deployed on a second server and placed into a publicly accessible DMZ right next to the existing web server
Because the internal part of ODN pushes data to the public part, the public part does not need access to the internal part. The network layer and other mechanisms can therefore be used to deny access from the DMZ to the internal parts (as is usual). In this way, the internal part of ODN has access to the full original data (including sensitive parts) and can thus perform data cleansing, but during that process:
- sensitive data never leave the secure internal environment
- the public part contains only non-sensitive Open Data
- it is not possible to gain access to sensitive internal data via the public part of ODN
Such a custom deployment is harder and depends greatly on the specific architecture of the IT environment, which differs from organization to organization, so if the publisher is not able to perform it on their own, we can provide commercial support.
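The split between the internal and the public part can be illustrated with a minimal Python sketch. It is not ODN code: the field names in `SENSITIVE_FIELDS`, the `clean_record` and `publish` helpers, and the `push` callback are all hypothetical and only show the idea that cleansing happens inside the internal zone and that data flows in one direction, outwards.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

# Hypothetical field names; a real publisher would define its own list.
SENSITIVE_FIELDS: Set[str] = {"birth_number", "home_address"}


def clean_record(record: Dict[str, Any],
                 sensitive: Set[str] = SENSITIVE_FIELDS) -> Dict[str, Any]:
    """Return a copy of the record without the sensitive fields."""
    return {key: value for key, value in record.items() if key not in sensitive}


def publish(records: Iterable[Dict[str, Any]],
            push: Callable[[Dict[str, Any]], None]) -> int:
    """Clean each record in the internal zone, then push it outwards.

    `push` stands in for whatever one-way transport delivers data to the
    public part (an assumption of this sketch). The public part never
    calls back into the internal zone.
    """
    count = 0
    for record in records:
        push(clean_record(record))
        count += 1
    return count


# Usage: a plain list plays the role of the public catalogue here.
public_store: List[Dict[str, Any]] = []
internal_rows = [
    {"id": 1, "name": "School A", "birth_number": "000000/0000",
     "home_address": "Example Street 1"},
]
published = publish(internal_rows, public_store.append)
```

Only the cleaned copies ever reach `public_store`, so compromising the public side in this model exposes nothing that was not meant for publication.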
Integrity and authenticity of published data
Again, quoting the previous article:
“If data publication is for informational purposes only, normally it is sufficient to ensure the protection of integrity at the same level as for web servers or sites of the organisation. The authenticity of the data is achieved at the level of metadata, which means by declaration.“
And again, with the current ODN release, that is the main supported method of ensuring the integrity and authenticity of published Open Data: publishing “machine readable information” (in this case Open Data) in a way very similar to the existing way of publishing “human readable information” (web pages) on web portals. This is fully supported by ODN, and only a little additional effort is expected from the publisher:
- monitoring of ODN, to ensure smooth operation
- obtaining and using a proper SSL certificate to ensure the integrity and authenticity of communication between ODN and its users (both internal, within the publisher’s organization, and external, i.e. data users)
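From the data user’s side, the certificate only helps if the client actually verifies it. The following sketch, using only the Python standard library, shows one way a consumer might download a dataset with certificate and hostname verification enabled. The helper names and the example URL are assumptions for illustration, not part of ODN.

```python
import ssl
import urllib.request


def make_verifying_context() -> ssl.SSLContext:
    """Build a TLS context that checks the server certificate and hostname."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_dataset(url: str, timeout: float = 10.0) -> bytes:
    """Download a resource over HTTPS, failing if the certificate is invalid."""
    context = make_verifying_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()


# Usage (hypothetical URL, not executed here):
# data = fetch_dataset("https://data.example.org/dataset/schools.csv")
```

If the server presents an expired, self-signed, or mismatched certificate, `urlopen` raises an error instead of returning data, which is the behaviour a data user would normally want.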
And what to do in cases where more protection is needed? As part of some COMSODE pilots, we’re employing digital encryption and signing to ensure much greater integrity for both “bulk data” (file dumps) and API calls. But the technology in those pilots is to some degree country-specific, so those wishing to use such functionality should contact us.
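The signing technology used in the pilots is not described here, so the sketch below does not reproduce it. As a deliberately simpler stand-in, it shows how a data user might check a file dump against a SHA-256 digest published by the data provider. The function names and the sample data are assumptions. Note that a plain digest is weaker than a digital signature: it only helps if the digest itself is obtained through a trusted channel.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_hex: str) -> bool:
    """Check downloaded data against a digest published by the provider."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sha256_digest(data), published_hex.lower())


# Usage with made-up sample data standing in for a file dump.
dump = b"id,name\n1,School A\n"
published = sha256_digest(dump)          # what the publisher would announce
intact = matches_published_digest(dump, published)
tampered = matches_published_digest(dump + b"2,Fake\n", published)
```

Here `intact` is true and `tampered` is false, since any change to the dump changes its digest.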
Peter Hanečák is a Senior Researcher and team leader at the EEA Company. He is also an Open Data enthusiast.