Publishing Health Records
Leveraging the power of XML data validation

In Europe, hospitals are required to submit health records of their patients to the government through an institute. Proper submission of these records helps the government deliver benefits to patients with medical insurance and understand the medical conditions of the patients in their country.
The problem with the submission system
Each hospital recorded all its patient information in a *.csv document and submitted it to the institute’s server in a .zip file. That .zip file contained the records of thousands of patients. If even one record was incorrect, the institute had to reject the entire package.
For example, some entries in the “date” field were invalid or inaccurate, such as ‘Last Sunday’ or ‘32nd March’. There was no system in place to check whether the data were valid.
This put the responsibility on the hospital to sort through every entry and make sure it was within the appropriate parameters. Correcting the content was a time- and labor-intensive task for the hospitals.
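The kind of check that was missing can be illustrated with a minimal Python sketch. It is not the institute's actual system: the column names `patient_id` and `date`, the sample rows, and the assumption that dates are expected in ISO format (YYYY-MM-DD) are all hypothetical.

```python
import csv
import io
from datetime import date


def find_invalid_dates(csv_text, field="date"):
    """Return (row_number, value) pairs whose date field is not a real ISO date."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        value = (row.get(field) or "").strip()
        try:
            # Raises ValueError for free text and for impossible dates.
            date.fromisoformat(value)
        except ValueError:
            problems.append((row_number, value))
    return problems


# Hypothetical sample data; the real submission layout is not described.
sample = (
    "patient_id,date\n"
    "P001,2021-03-14\n"
    "P002,Last Sunday\n"
    "P003,2021-03-32\n"
)

print(find_invalid_dates(sample))
# → [(2, 'Last Sunday'), (3, '2021-03-32')]
```

Reporting the row number next to the offending value is what lets a single record be corrected instead of the whole package being rejected.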
The government institute began to look for a solution because it was responsible for ensuring that the government received correct information from all the hospitals. During its research, the institute came across XML and concluded that its validation capabilities could help clean up the data.
Reining in the stakeholders
The project involved many stakeholders, such as hospital administrators, data analysts, and lawyers from the institute and the government. The range of what XML could do led each group of stakeholders to look for ways to optimize their own work.
Each stakeholder had different requirements based on their role in the process. For example, a data entry executive from the hospital wanted the schema organized so that it directed users to correct any invalid data they had entered. The lawyers were concerned about the legal requirements around processing, storing, and sharing data. The data analysts were focused on checking criteria like the minimum number of visits needed for refunds. Due to these varying requirements and interests, each hospital had its own opinion about the model it would like to provide to match its systems.
Therefore, the solution had to be approached through the following steps:
- Define the information model to bring all the stakeholders into the picture.
- Derive the XML schema based on the information model.
- Determine what data the hospital needs to enter to ensure that the necessary information is submitted.
- Provide a small ‘transformation’ that accommodates the different requirements of stakeholders, enabling hospitals to provide data the way they want to while ensuring that all the legally required information is included.
Implementation of XML Schema
The hospital database stores data as *.csv files in a flat, tabular structure. XML documents, on the other hand, are hierarchical, which means that some entries are nested within other entries.
In this project, the data that was first entered in a flat structure had to be transformed into a hierarchical structure and later fed into a database with a flat structure again. But the process of loading the XML directly into the database was riddled with issues. So, an intermediate, generic XML was created to mimic the *.csv structure. This made it easy for the XML data to be loaded into the database.
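The round trip described above can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the element names (`patients`, `patient`, `visit`, `table`, `row`, `col`) and the columns `patient_id` and `visit_date` are hypothetical, since the actual schema is not given here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_hierarchy(csv_text):
    """Group flat visit rows under one <patient> element per patient_id."""
    root = ET.Element("patients")
    patients = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = row["patient_id"]
        if pid not in patients:
            patients[pid] = ET.SubElement(root, "patient", id=pid)
        ET.SubElement(patients[pid], "visit", date=row["visit_date"])
    return root


def hierarchy_to_generic(root):
    """Flatten the hierarchy into generic <row>/<col> elements, one row per visit."""
    table = ET.Element("table")
    for patient in root.findall("patient"):
        for visit in patient.findall("visit"):
            row = ET.SubElement(table, "row")
            ET.SubElement(row, "col", name="patient_id").text = patient.get("id")
            ET.SubElement(row, "col", name="visit_date").text = visit.get("date")
    return table


# Hypothetical sample data in the flat, tabular shape of a *.csv file.
sample = (
    "patient_id,visit_date\n"
    "P001,2021-03-14\n"
    "P001,2021-04-02\n"
    "P002,2021-03-20\n"
)

hierarchy = csv_to_hierarchy(sample)       # hierarchical XML, suited to validation
generic = hierarchy_to_generic(hierarchy)  # generic XML that mirrors the flat layout

print(ET.tostring(hierarchy, encoding="unicode"))
print(ET.tostring(generic, encoding="unicode"))
```

Because the generic form keeps one `row` per table row and one `col` per column, a loader can map it to database columns without knowing anything about the hierarchy.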
What was the value?
Now, several hospitals no longer need to sort through thousands of entries when the large .zip file is rejected by the server. They know exactly which record is incorrect and have the means to correct it without spending several man-hours. Hospitals that require data validation for other projects can now use this model for those projects as well. Moreover, there is scope for expanding the business rules for validation to further improve data quality.
What can publishers take away from this?
In this project, XML was chosen specifically for its validation powers. Publishers are in the business of validating information and making it accessible to their readers. Therefore, publishers need to ensure that their data is valid and accurate. Much like the government institute in this episode of XML Stories, journal publishers also stand to gain from XML. In the case of journals, there is a need to ensure that the dates and references in articles are correct. For example, when a chemical substance is referenced in a journal from an existing list, the power of validation can ensure that the references are accurate, thereby creating richer metadata that makes content easily accessible and visible to the right people.
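As a rough illustration of checking references against an existing list, the following Python sketch flags any substance reference that is not in the list. The element name `substance-ref`, its `name` attribute, and the list contents are hypothetical. In a real XML workflow, a rule like this would more likely be expressed in the schema itself, for example as an enumeration, than in separate code.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list of substances.
KNOWN_SUBSTANCES = {"ethanol", "benzene", "acetone"}


def unknown_substance_refs(article_xml, known=KNOWN_SUBSTANCES):
    """Return the names of referenced substances that are not in the known list."""
    root = ET.fromstring(article_xml)
    return [
        ref.get("name")
        for ref in root.iter("substance-ref")
        if ref.get("name") not in known
    ]


# Hypothetical article fragment with one misspelled reference.
article = (
    "<article><p>The sample was dissolved in "
    '<substance-ref name="ethanol"/> and rinsed with '
    '<substance-ref name="acetne"/>.</p></article>'
)

print(unknown_substance_refs(article))
# → ['acetne']
```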
