An open conversation around open access

Open access

Open Access Week 2021 stimulated many discussions across the world pertaining to the theme, “It matters how we open knowledge: Building structural equity”. In light of this year’s event, we discuss open access (OA) and open research with

Lisa Walton, Global Publishing Strategy Manager at BMJ, a global healthcare knowledge provider with a wide variety of products and services; 

Bryan Hibbard, Journal Editorial and Production Manager at Society of Petroleum Engineers (SPE), an independent, nonprofit global society focused on the upstream oil and gas industry; and 

Melissa Harrison, former Head of Production Operations at eLife, a non-profit organization that works across publishing, technology, and research culture.  

Where are we now in terms of opening knowledge equitably and what needs to be done? What can publishers do to build a sustainable open ecosystem?

Melissa Harrison: Science is a global endeavor, and it should work to improve the lives of all of humanity. Yet this is not achieved, in part because the current scientific enterprise frequently perpetuates inequalities that exclude people from participating in science or accessing its outputs. A diverse scientific workforce is key to solving the complex problems facing society today, but not everyone is welcomed, appreciated or empowered in science. Discrimination and other inequalities deprive the scientific workforce of many talented individuals. Biases – whether explicit, implicit or systematic – often go unacknowledged and unaddressed. Only a narrow set of perspectives, backgrounds, contributions and career paths are commonly valued and respected, which combined with intense competition for jobs and funding, can make scientific and medical research unhealthy places to work with poor work-life balance.

Open access is just the beginning. To truly democratize scientific outputs we need to go further and build open source tooling and apply things like The Principles of Open Scholarly Infrastructure (POSI) to this infrastructure. The roles of funders and institutions are even more important than publishers in creating a sustainable ecosystem.

Bryan Hibbard: Many of the early open access initiatives focused on the largest players as well as those that were well funded. Requiring an author to pay an article processing charge (APC) to publish OA can be prohibitively expensive for many researchers. While hybrid open access journals still offer a free publishing path, they disadvantage those who cannot afford OA. While from a reader standpoint, the OA movement has increased access, it has not done the same for authors. We need to work towards fair models that cover the cost of publication but give equal access to OA for all authors. 

OA models need to be fair to all and sustainable for both large and small publishers. I have yet to see a sustainable Gold OA option. While many publishers using transformative agreements are dependent on research libraries to continue to fund research that they can access for free, I believe this will be the first area that is cut when libraries are asked to contract their budget.

Lisa Walton: Reducing the burden on researchers to publish open access and ensuring equitable access to publish openly both contribute to the sustainability of the open access ecosystem. As part of our work towards that, BMJ is evaluating its waiver policy and working on offering transitional agreements that remove administrative barriers to publishing open access.

Please tell us about the OA initiatives adopted by your journal and the impact these have had on your authors and readers.

Lisa Walton: The research in our flagship journal, The BMJ, has always been free to read, and in 2011 we launched our first and largest open access medical journal, BMJ Open. Today, a third of our journals are fully open access, and we also make academic research freely accessible and discoverable with hybrid publication models. The majority of our hybrid journals have been given Transformative Journal status by cOAlition S. Through BMJ’s OA campaign, we support authors to achieve global impact, broader reach, and exceptional quality. 

In 2019, BMJ co-launched medRxiv with Yale University and Cold Spring Harbor Laboratory. It is the first health sciences preprint server, and it allows fast sharing of preliminary research findings to the widest possible audience. medRxiv now highlights when preprints have been accepted or published across BMJ’s journal portfolio, further contributing to the reliability and value of preprints as part of the scientific record. Journals like BMJ Open Science improve the validity and quality of pre-clinical research through open practices.

BMJ is also part of initiatives like Initiative for Open Citations (I4OC) and Initiative for Open Abstracts (I4OA), making it easier for articles to be found, read, and cited. 

Bryan Hibbard: SPE will be introducing a hybrid open access model in 2022 using traditional APCs with discounts for members and authors from low- and middle-income countries. Open access will be optional, and the subscription model will still be available to all.

Melissa Harrison: eLife has been open access from the outset so we focus our attention on open science, which includes reproducibility, open data, open software and so on.

eLife, via Mark Patterson, was one of the organizations that spearheaded DORA (the Declaration on Research Assessment, which recognizes the need to improve the ways in which researchers and the outputs of scholarly research are evaluated) and I4OC (the Initiative for Open Citations). We have since supported I4OA.

We ensure our content is machine readable, and we submit as much metadata as we have and as the Crossref schema allows. Crossref APIs are used so far and wide, and so much is being built upon them, that we feel it’s important to make as much available as possible there, hence the I4OA and I4OC initiatives.

Editorial policies and eLife staff QC support open science, and we publish transparent reporting forms, key resources tables, data availability statements, as well as encouraging open underlying code and data that supports the research. We support many other initiatives, including the CRediT taxonomy, FAIR sharing principles, the Open Funder Registry, and ROR. From all of this, we also generate full text XML that semantically demonstrates these initiatives as well as developing standardization across corpus XML via JATS4R recommendations, and delivery downstream to indexers, and Crossref.

What role do you think DEI initiatives play in making research accessible to a wider audience?

Lisa Walton: DEI initiatives can give authors and researchers equal support to get their work published, regardless of their sex, gender, race or ethnicity, first language, sexual orientation, religion, beliefs, disability status, age, status, nationality or citizenship. This approach serves to dismantle the barriers that have previously prevented women and underrepresented groups from being published, having a voice or advancing their careers.

Melissa Harrison: Fortunately, there is increasing recognition of these issues, and a willingness to face them. As a publisher and organization looking to reform research communication, eLife has the ability to influence the community and promote greater equity, diversity and inclusion in research and publishing.

eLife’s Community Ambassadors programme and Early-Career Advisory Group are active in helping us change and we have targets to diversify our editorial board and staff composition. We have introduced a code of conduct for all eLife interactions, as well as a social media policy, to ensure all voices are heard and for mutual respect to be adhered to. There are many other areas of activity ongoing and in development.

Where do you see Open Access in the next few years? What are the changes you anticipate?

Bryan Hibbard: I expect many more new ideas to appear in the OA field and expect to see a crystallization of what actually works in OA. I think we are still in the very early stages, and publishers are throwing ideas at the wall to see what sticks. I do believe that many of the current models that publishers are pursuing will turn out to be unsustainable. This is one of the reasons that we are dipping our toe in, so to speak, by utilizing hybrid open access and waiting to see what comes next.

Lisa Walton: BMJ supports and promotes a future built on the principle of unrestricted access to the outputs of medical research, encouraging scientific discourse and facilitating further medical advances. We expect to see increases in the proportion and amount of research published open access.

Melissa Harrison: Any new journal being launched is open access. The challenge is to flip the long standing hybrid or closed access journals to an open access model. Funders have signaled their commitment to open access and initiatives like Plan S are pushing the needle further. The key issue will be to change the whole business models around publishing and for this to be achievable for any journal, and not just monopolized by the big publishers.

Events like Open Access Week 2021 are critical for the growth of the global research community as it encourages all stakeholders to discuss pressing issues. We are grateful to Lisa Walton, Bryan Hibbard, and Melissa Harrison for sharing their insights with us!


Image courtesy: Business vector created by vectorjuice

Leveraging XML data validation

Publishing Health Records

Leveraging the power of XML data validation

In Europe, hospitals are required to submit health records of their patients to the government through an institute. Proper submission of these records helps the government deliver benefits to patients with medical insurance and understand the medical conditions of the patients in their country.

The problem with the submission system

The hospital recorded all its patient information in a *.csv document and uploaded it, as a .zip file, to the institute’s server. The .zip file that was shared with the institute contained records of thousands of patients. If even one record was incorrect, the institute had to reject the entire package.

For example, some of the entries in the “date” field were either invalid or inaccurate, such as ‘Last Sunday’ or the ‘32nd March’. There was no system in place to check whether the data were valid.

This would put the responsibility on the hospital to sort through every entry and ensure that it is within the appropriate parameters. Correcting this content was a time- and labor-intensive task for the hospitals.
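The kind of per-record check that was missing can be sketched in a few lines of Python. This is only an illustration, not the institute's actual system: the column names, the sample rows, and the ISO-style date format are all assumptions, since the story does not say what format the records used.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format; the source does not specify one


def find_invalid_dates(csv_text, date_column="date"):
    """Return (line_number, raw_value) for every row whose date does not parse."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start at line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        raw = (row.get(date_column) or "").strip()
        try:
            datetime.strptime(raw, DATE_FORMAT)
        except ValueError:
            problems.append((line_number, raw))
    return problems


# Hypothetical sample data, echoing the kinds of errors described above.
sample = """patient_id,date
P001,2021-03-14
P002,Last Sunday
P003,2021-03-32
"""

for line_number, raw in find_invalid_dates(sample):
    print(f"line {line_number}: invalid date {raw!r}")
```

Reporting the line number of each bad entry, rather than rejecting the whole package, is what lets the sender correct only the records that need it.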

The government institute began to look for a solution because it was their responsibility to ensure that the government received correct information from all the hospitals. Upon doing their research, they came across XML and thought its validation power could help clean up the data.

Reining in the stakeholders

The project involved many stakeholders, such as hospital administrators, data analysts, and lawyers from the institute and the government. The scope of XML created an ambition among stakeholders to optimize their work.

Each stakeholder had different requirements based on their role in the process. For example, the data entry executive from the hospital would want the schema organized in a manner that directed the users to correct any incorrect data entered by them. The lawyers were concerned about the legal requirements around processing, storing, and sharing data. The data analysts were focused on checking criteria like the minimum number of visits needed for refunds. Due to these varying requirements and interests, each hospital had a unique opinion about the model they would like to provide to match their systems. 

Therefore, the solution had to be approached through the following steps: 

  • Define the information model to bring all the stakeholders into the picture.
  • Derive the XML schema based on the information model.
  • Determine what data needs to be entered by the hospital to ensure that the necessary information is submitted.
  • Provide a small ‘transformation’ to accommodate the different requirements of stakeholders, enabling hospitals to provide data the way they want to while ensuring that all the legally required information is included.

Implementation of XML Schema

The hospital database stores data as *.csv files in a flat, tabular structure. XML documents, on the other hand, are hierarchical, which means that some entries are subsets of other entries.

In this project, the data that was first entered in a flat structure had to be transformed into a hierarchical structure and later fed into a database with a flat structure again. But the process of loading the XML directly into the database was riddled with issues. So, an intermediate, generic XML was created to mimic the *.csv structure. This made it easy for the XML data to be loaded into the database.
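The round trip described above (flat rows, then a hierarchy, then a generic XML that mirrors the flat rows again) can be sketched with Python's standard library. The element names (`patients`, `patient`, `visit`, `table`, `row`, `field`) and the sample rows are hypothetical; the project's real schema is not given in the story.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_hierarchical(csv_text):
    """Group flat visit rows under one <patient> element per patient_id."""
    root = ET.Element("patients")
    patients = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = row["patient_id"]
        if pid not in patients:
            patients[pid] = ET.SubElement(root, "patient", id=pid)
        ET.SubElement(patients[pid], "visit", date=row["visit_date"])
    return root


def hierarchical_to_generic(root):
    """Flatten the hierarchy into generic <row>/<field> elements that mirror the CSV."""
    table = ET.Element("table")
    for patient in root.findall("patient"):
        for visit in patient.findall("visit"):
            row = ET.SubElement(table, "row")
            ET.SubElement(row, "field", name="patient_id").text = patient.get("id")
            ET.SubElement(row, "field", name="visit_date").text = visit.get("date")
    return table


# Hypothetical sample data.
sample = """patient_id,visit_date
P001,2021-03-14
P001,2021-04-02
P002,2021-03-20
"""

hierarchical = csv_to_hierarchical(sample)
generic = hierarchical_to_generic(hierarchical)
print(ET.tostring(hierarchical, encoding="unicode"))
print(ET.tostring(generic, encoding="unicode"))
```

Because every `<row>` in the generic form has the same shape as a CSV line, a loader can map it to a table without knowing anything about the hierarchical schema.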

What was the value?

Now, there are several hospitals that no longer need to sort through thousands of entries when the large .zip file is rejected by the server. They know exactly which record is incorrect and have the means to correct it without spending several man-hours. Hospitals that require data validation for other projects can now use this model to verify all their projects. Moreover, there is scope for expanding the business rules for validation to further improve data quality.

What can publishers take away from this?

In this project, XML was chosen specifically for its validation powers. Publishers are in the business of validating information and making it accessible to their readers. Therefore, publishers need to ensure that their data is valid and accurate. Much like the government institute in this episode of XML Stories, journal publishers also stand to gain from XML. In the case of journals, there is a need to ensure that the dates and references in articles are correct. For example, when a chemical substance is referenced in a journal from an existing list, the power of validation can ensure that the references are accurate, thereby creating richer metadata that makes content easily accessible and visible to the right people.
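As a rough illustration of checking references against an existing list, the sketch below flags any tagged substance that is not in a controlled vocabulary. The `<substance>` element and the list contents are hypothetical; a real journal would use its own tagging and registry.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list; a real journal would load this from its own registry.
KNOWN_SUBSTANCES = {"ethanol", "benzene", "sodium chloride"}


def unknown_substances(article_xml, known=KNOWN_SUBSTANCES):
    """Return the text of every non-empty <substance> element not in the controlled list."""
    root = ET.fromstring(article_xml)
    return [
        el.text.strip()
        for el in root.iter("substance")
        if el.text and el.text.strip().lower() not in known
    ]


# Hypothetical article fragment with one misspelled substance name.
article = """<article>
  <p>The sample was dissolved in <substance>Ethanol</substance> and
  washed with <substance>benzine</substance>.</p>
</article>"""

print(unknown_substances(article))
```

The same check is much harder on untagged text, which is why the structure has to be captured in the XML first.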


Optimize book revision cycles with XML

Publishing Textbooks

Optimize revision cycles and publish digital collaterals with XML

About a quarter of the world’s population is under 15 years of age, and the majority of them are in school. One of the successful and time-honored ways by which education is imparted to them is through textbooks.

In the case of the educational publisher covered in this episode of XML Stories, the publisher mainly produces books for school children. They also publish some books for university students. Their authors are school-subject specialists and university professors. So, the content ranges from school-level material to advanced material for university students. The common factor is that all published materials are educational.

The publisher wants to deliver high-quality educational material in a well-printed book. To accompany the book, they also want interactive collaterals such as DVDs and websites.

What was the challenge?

Most authors were not comfortable with using FrameMaker to edit content. As a result, the book had to be physically printed and given to authors for editing. The edits were made on sticky notes and returned to the typesetter for correction. This feedback cycle would occur a few times. In addition, there was an editorial and redaction cycle. This classical workflow is not cost-effective, especially when a book has been written by multiple authors. So, the publisher began searching for solutions to address these challenges.

What kind of solution was the publisher looking for?

Initially, the publisher was hoping for a single source for their printed books and the collaterals that come with it, like the website, animations, videos, references, etc. They did their research and discovered the benefit of XML: create in XML and publish in a wide variety of formats.
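The single-source idea can be sketched as follows: one XML chapter, two renderers. This is a toy illustration rather than the publisher's toolchain; the `<chapter>`, `<title>`, and `<para>` elements and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source chapter; element names are illustrative only.
SOURCE = """<chapter>
  <title>Photosynthesis</title>
  <para>Plants convert light into chemical energy.</para>
  <para>Chlorophyll absorbs red &amp; blue light.</para>
</chapter>"""


def to_html(chapter):
    """Render the chapter as HTML, e.g. for a companion website."""
    parts = [f"<h1>{escape(chapter.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in chapter.findall("para")]
    return "\n".join(parts)


def to_plain_text(chapter):
    """Render the same chapter as plain text, e.g. for a review proof."""
    title = chapter.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text for p in chapter.findall("para")]
    return "\n".join(lines)


chapter = ET.fromstring(SOURCE)
print(to_html(chapter))
print(to_plain_text(chapter))
```

A correction made once in the source appears in every output the next time the renderers run.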

When the publisher discovered the scope of XML, they wondered if they could also find a solution for their revision cycle issue, where subject matter experts and authors were reviewing books from a print delivered by a FrameMaker typesetter.

What is the result of shifting to XML?

The authors, editors, and other stakeholders involved in the revision process no longer need to work with physical materials. They use a virtual, WYSIWYG interface to complete their edits without the need for typesetters to revise every version of the book.

In the earlier FrameMaker setup, the publisher created interactive content in the form of CDs and websites. However, this process had limitations.

The publisher desired greater control over the scope of the users’ interactive experience, and this level of control could not be achieved in the FrameMaker workflow.

For instance, there may be a case where students need to be guided from one exercise to another only after completing the first. Structured XML content makes this kind of sequencing possible. Many digital publishers in the education field have entered the space of creating learning modules. A background in XML empowers the publisher with the tools they need to build such modules.
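One way such sequencing could be expressed is sketched below, assuming each exercise is tagged with the exercise it depends on. The `<module>` and `<exercise>` elements and the `requires` attribute are hypothetical names chosen for illustration, not the publisher's markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element and attribute names are illustrative only.
MODULE = """<module>
  <exercise id="ex1"/>
  <exercise id="ex2" requires="ex1"/>
  <exercise id="ex3" requires="ex2"/>
</module>"""


def available_exercises(module_xml, completed):
    """Return ids of exercises not yet completed whose prerequisite (if any) is done."""
    root = ET.fromstring(module_xml)
    available = []
    for exercise in root.findall("exercise"):
        if exercise.get("id") in completed:
            continue
        required = exercise.get("requires")
        if required is None or required in completed:
            available.append(exercise.get("id"))
    return available


print(available_exercises(MODULE, completed=set()))
print(available_exercises(MODULE, completed={"ex1"}))
```

Because the dependency lives in the content itself, a website or app can enforce the order without any exercise-specific logic.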

In this case, the textbook publisher was looking for a solution to a specific problem related to content revision cycles. When the publisher explored the scope of XML technology further, they were able to reimagine the technology as a solution for various other publishing workflow-related optimizations as well.


Increase publishing turnaround with XML

Publishing Drilling Manuals

Increase publishing turnaround by 2500% at no additional cost

What’s in a drilling rig manual?

Drilling rigs are used for a variety of purposes such as mining, taking samples from the ground, studying rocks by drilling hundreds of meters beneath the surface, and more. Depending on the task, a user buys different types of drills with different platforms, and sometimes, they might require a different type of tower or power supply setup. Drilling rigs are portable devices made up of several individual machines. They are disassembled and reassembled at every point of use.

When a user buys a drilling rig, they are entitled to a manual as per their specifications and language requirements. These manuals are necessary because many drilling operations occur in locations that are far from the reach of mobile towers and service specialists. Although the company that makes these rigs sold over 300 a year, they were able to deliver only a dozen near-specific manuals. This meant that many customers, despite paying millions for the product, would not receive the manual suited to their exact needs based on their selection of drilling equipment.

What’s the scope of delivery?

The manufacturer sourced the job from a publisher who edited the manuals and created the layout as per specifications. As each manual was hundreds of pages long, the copyeditor had to take care to find every instance in the manual where corrections were required. This work was tedious and subject to human error. Every time there was a change in the content, the layout specialist also had to come in and edit every page.

Considering the number of pages and the size of the task at play, it is evident why the publisher found it difficult to deliver the required 300 manuals per year.

What was the challenge?

A variety of tools can be used to create a document like a product manual. One might use a word processor such as Microsoft Word or an advanced tool for technical authors such as Adobe FrameMaker.

The challenge lies in the fact that these tools are proprietary products; the user’s ability to perform operations like translations is heavily dependent on the program’s inherent functionalities. At the time of this project, the publisher needed to publish manuals in CJK characters and right-to-left languages, but Adobe did not provide this facility. Therefore, an equipment user in Iraq or China would have access to only an English language manual.

There are other challenges as well. For example, if the content written in English is 100 pages long, the same content in another language may require up to 50% more space. This means that the layout specialist would have to prepare the entire document from scratch. Another challenge was changing the mindset of the authors and having them accept that they can’t know how the final content is going to look in different output formats while they’re working on it, considering the variety of ways in which documents are accessed today. To solve these problems, XML was brought into the picture.

How does XML address these challenges?

When the publisher uses an XML-first approach, they make a one-time entry for all their reusable content. For instance, the function of a diesel engine is the same, regardless of the drilling rig setup. The text that describes the use of the diesel engine remains uniform across the board.

However, the diesel engine may vary in size. So, every instance where the size of the diesel engine is referenced is identified and profiles are built around it. Every time a new diesel engine is ordered, the publisher only needs to create a profile of the engine with specifications and images, and the appropriate details are automatically entered into the manual.
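The write-once, profile-per-order idea can be sketched as below. The shared text is stored once with placeholders, and each engine profile supplies the values. The `<var>` placeholder element and the profile figures are invented for illustration; they are not taken from the manufacturer's manuals.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <var name="..."/> marks a value supplied by a profile.
TOPIC = """<topic>
  <title>Diesel engine</title>
  <para>The engine delivers <var name="power"/> and weighs <var name="weight"/>.</para>
</topic>"""

PROFILES = {  # illustrative values, not taken from any real rig
    "small": {"power": "150 kW", "weight": "900 kg"},
    "large": {"power": "400 kW", "weight": "2100 kg"},
}


def render(topic_xml, profile):
    """Return the paragraph text with each <var> replaced by the profile's value."""
    root = ET.fromstring(topic_xml)
    paragraphs = []
    for para in root.findall("para"):
        text = para.text or ""
        for var in para:
            # The text following a child element is stored in its .tail.
            text += profile[var.get("name")] + (var.tail or "")
        paragraphs.append(text)
    return "\n".join(paragraphs)


print(render(TOPIC, PROFILES["small"]))
print(render(TOPIC, PROFILES["large"]))
```

Adding a new engine then means adding one entry to the profiles, not editing every page that mentions the engine.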

The other benefit of the XML approach is that layouts could be built as templates, and once the text was finalized, it was automatically laid onto the manual. Therefore, the page breaks and layouts were done as per coded specifications.

Does this solution take away the artistic touch?

When this transition occurred, copyeditors and layout specialists expressed concern because some page breaks and image positions were laid out differently than they would have preferred.

From the author to the page designer, each person has a unique style with which they craft the manual. They had established a workflow when they were working on each page individually with FrameMaker. But when XML was brought in, they no longer had to work on pages individually. With the XML solution, the publishing team now had the opportunity to focus on improving the quality of language, images, and diagrams instead of focusing on the layout.

Therefore, the solution that enabled the publisher to accelerate their workflows also allowed them to spend more time on improving the content.

What was the business outcome?

The drilling rig manufacturer now has very happy clients who have access to a manual that is highly suited to their specific customization.

The publishing team of authors and designers that previously published around a dozen manuals annually now publishes over 300 manuals a year in 17 to 18 different languages, including languages written in CJK characters and right-to-left scripts, using the same style sheet and content. The quality of content has also improved as the team no longer needs to re-lay out every version of the manual. The team was able to publish the drilling rig manuals in different formats such as print PDFs, eBooks, HTML for web versions as well as mobile- and tablet-friendly versions.


How to optimize high-intensity publishing cycles

Publishing Car Manuals

Optimize high-intensity publishing cycles

Every year, about 70 million cars are manufactured and sold. Every year, every manufacturer produces a series of brands, each brand has variations, each variation must comply with the regulations of a country, and the manuals for these cars need to be translated to roughly 30 local languages. Furthermore, brands may introduce changes and redesigns mid-year or custom-build some vehicles. So, a publisher is obliged to produce a car manual for every iteration, leaving them in a continuous publishing cycle.

What did this production cycle look like?

The publisher built a page layout in English with a series of InDesign files for one model of the car. When the version was changed, the production team had to identify the pages where alterations were made and ensure that the layout was maintained. Then, this file went to at least 30 translation partners who would send content back, and the publisher needed to ensure that all these variations fell within the layout. Therefore, the publisher spent a considerable amount of time re-laying out pages to meet their quick turnaround times.

Why did XML come into the picture?

The publisher recognized that they were finding it difficult to keep up with all their time-bound obligations to the market and looked for solutions. After some research, they discovered a potential solution—XML. To learn more about XML, they attended a DITA conference where they connected with experts who helped them realize the scope of the solution.

What was the transition to XML like?

The publisher’s client, the car manufacturer, could not stop production while the publisher transitioned to this new technology. Therefore, the publisher had to manage this transition while simultaneously delivering the product.

During the transition, in addition to their daily tasks, the production staff learned how to navigate through XML technologies and transform their workflow to suit the solution. In a few months, the production team reduced their turnaround time from a few days to a few hours. Before adopting this solution, each revision would entail formatting and layout tasks that could take up to 2 weeks for one manual. Now, with XML, the team was able to complete and share the revised version in just 1-2 days after receiving feedback. 

In every transition, there are some inevitable trade-offs. When using InDesign earlier, the information architect would immediately make tweaks to the layout based on the revisions. In an XML-driven solution, the same tweak was achieved with a 95% similarity to the InDesign solution. At first, the publisher was apprehensive of the 5% gap in similarity. However, the value of the time gained in this process outweighed the need for the InDesign quality finish. Instead of spending a few days formatting a 100-page manual, it could now be achieved in a matter of hours.

The success of this publisher’s efforts can be attributed to various players. It was the publisher who initiated research to solve the problem. The DITA conference served as a place where a community of practitioners could meet and develop ideas. The support of the production staff enabled a successful transition. XML is a powerful tool waiting to be tapped by a collaborative group of stakeholders willing to embrace change.
