Rucio is an exascale data management system powering the ATLAS & CMS experiments at the Large Hadron Collider at CERN. It is also being used to support the needs of diverse scientific communities beyond the LHC, such as astrophysics, which makes it necessary for the documentation to be as relevant and as accessible as possible. With the help of this project, CERN wants to give the end-users of Rucio a seamless experience with the framework by providing a centralized view of all the relevant documentation.

 

During the Google Season of Docs program, I worked with CERN-HSF on the project - Modernize (restructure & rewrite) the Rucio documentation. The original proposal for the project can be found here.

 

Project Goals

 

After the project was accepted by CERN-HSF, my mentors & I narrowed down the project scope by clearly defining goals that could be achieved within three months. The three main problems identified with existing documentation were: 

 

  • Disparate sources of documentation & lack of centralization

  • Existing documentation at https://rucio.readthedocs.io/en/latest/ had the following drawbacks:

    • Built by Sphinx & therefore, located in the source code

    • Redundant information

    • Complex navigation

  • No support for newly introduced JSX modules

 

Based on the above challenges & limited time frame, we formulated the goals listed below:

 

  • Eliminate dependency on source code as much as possible

  • Centralize the various disparate sources

  • Move to a documentation-as-a-service tool that would support JSX and the existing APIs

  • Restructure it for easier navigation

  • Introduce a documentation contribution guide to lower the barrier to entry for new contributors

  • Improve quality of documentation

 

Community Bonding phase: Setting the tone

 

Owing to the timezone difference, the first agenda item was to set the cadence & channel of communication. Along with this, my mentors encouraged me to gain a solid understanding of the existing toolset so that I could familiarize myself with the challenges they were facing. Once I had gained enough exposure, I commenced research on the various documentation-as-a-service tools that would fit the bill for this project. With the help of my mentors, I also refined my goals, since we identified that it would not be possible to completely eliminate the dependency on source code & that we would have to retain it for the API documentation derived directly from it.

 

Documentation Development phase

 

  • Eliminating dependency on source code as much as possible

 

As mentioned in the Community Bonding phase, we had identified that a complete elimination of the dependency on source code would not be possible. Therefore, keeping in mind the various tools available, we opted to use Sphinx to convert the documentation from RST to HTML only for the API documentation that was derived from the source code. The rest of the documentation would be converted to Markdown owing to its simplicity & ease of maintenance.

 

  • Centralize the various disparate sources

 

For a person entirely new to the Rucio ecosystem, one of the common problems encountered was the number of scattered documentation sources. With the revamped website developed during this phase, we have linked each of these documentation sources on the landing page for ease of access.

 

  • Move to a documentation-as-a-service tool that would support JSX and the existing APIs 

 

This was partially achieved during the Community Bonding phase, where we tried out various documentation-as-a-service tools. I worked with the existing toolset, Sphinx, & tried to integrate various extensions that would help us convert the RST files directly into Markdown.

 

One of the options that we considered was the sphinx-markdown-builder extension. Because its output still required manual editing of the Markdown files, we decided not to use it. Since JSX support was also a key future requirement in addition to the ease of maintenance, we settled on Docusaurus as our primary documentation-as-a-service tool. The challenge with this approach was the conversion of existing documentation from RST files into Markdown format. While researching potential solutions to this problem, we noticed a few things:

 

  • Only the actual API documentation was derived from the source code

  • The rest of the documentation was manually written

 

This simplified our approach to a very large extent. The manually written documentation was easily converted to Markdown via pandoc. For the API documentation, however, we stuck with Sphinx as the tool. We used Sphinx to convert the RST files into HTML & placed the output within the static folder of Docusaurus. This also gave us the ability to integrate the API documentation within our website while adjusting its look & feel using Sphinx themes.
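To make the two conversion paths above a little more concrete, here is a minimal Python sketch of how they could be scripted. It is not the script we used in the project: the function names, the folder layout & the "api" subfolder inside the static folder are all assumptions made for illustration. It relies only on the standard pandoc and sphinx-build command-line tools being installed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def markdown_target(rst_file: Path, rst_root: Path, docs_root: Path) -> Path:
    """Map <rst_root>/a/b.rst to <docs_root>/a/b.md, keeping the folder layout."""
    return docs_root / rst_file.relative_to(rst_root).with_suffix(".md")


def pandoc_command(rst_file: Path, md_file: Path) -> list[str]:
    """pandoc call that reads reStructuredText and writes GitHub-flavoured Markdown."""
    return ["pandoc", "--from", "rst", "--to", "gfm",
            str(rst_file), "--output", str(md_file)]


def sphinx_command(api_source: Path, static_dir: Path) -> list[str]:
    """sphinx-build call that renders the API docs as HTML.

    The "api" subfolder is a hypothetical choice, not the project's actual path.
    """
    return ["sphinx-build", "-b", "html", str(api_source), str(static_dir / "api")]


def convert_all(rst_root: Path, docs_root: Path,
                api_source: Path, static_dir: Path) -> None:
    """Convert every manually written .rst page, then build the API docs."""
    for rst_file in sorted(rst_root.rglob("*.rst")):
        md_file = markdown_target(rst_file, rst_root, docs_root)
        md_file.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(pandoc_command(rst_file, md_file), check=True)
    subprocess.run(sphinx_command(api_source, static_dir), check=True)
```

Calling convert_all(Path("old-docs"), Path("docs"), Path("api-src"), Path("static")) would, under these assumed folder names, write one Markdown file per manually written RST page & leave the Sphinx-built HTML where Docusaurus can serve it as static content.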

 

While this was simple to achieve on a local machine, setting up an automated build to publish on GitHub Pages was another challenging aspect. Since our project had dependencies on the source code and all of it was built on Docusaurus, existing Docusaurus/Sphinx actions weren’t able to meet our requirements. With the help of Rosemary’s GitHub Actions for Docusaurus & a customized one for building the Sphinx documentation, we managed to segregate our repository into three different branches:

 

  • Sphinx: Containing the Sphinx source files that would undergo a rebuild every time a pull request was submitted to the repository.

  • Master: The default branch containing all the source files for building Docusaurus

  • GitHub pages: The branch hosting our website for documentation.

 

  • Restructure it for easier navigation

 

Once the structure for our repository was decided upon, we needed to ascertain how best to tailor it for ease of navigation. Centralization, as mentioned earlier, was one of the strategies adopted. However, the existing documentation suffered from redundancy, both in structure & content. We have worked on eliminating this by sectioning it appropriately, ranging from documentation that helps users get started with Rucio to documentation for developers who wish to experiment with Rucio via calls to the REST & Client APIs. These sections are also presented on the landing page so that users have a fair idea of what is available before diving right in. We've also integrated Algolia search, which Docusaurus supports out of the box, to simplify the search experience for users.

 

  • Introduce a documentation contribution guide to lower the barrier to entry for new contributors

 

While starting off with this project, it was important to all of us that we make the maintenance & upgrade of documentation as community-driven as possible. To ensure that, it was necessary for us to have a documentation contribution guide in place outlining the ways anybody could contribute. In addition to explaining how to contribute, it also provides insight into the sources our documentation is derived from so that users are able to contribute effectively.

 

  • Improve quality of documentation

 

This, of course, will always be a work in progress! Based on some of the feedback we received from existing users, we have tailored the documentation to include more content around the motivations behind Rucio & the different layers/resources within it. I’ve also rewritten certain areas in the documentation to make it simpler from a user’s perspective.

 

Work done

 

All of the work done during Google Season of Docs is viewable on the website - http://rucio.cern.ch/documentation/. As mentioned earlier, this is hosted on GitHub Pages & a separate repository was created containing the source files here. It uses a combination of the static site generator Docusaurus & Sphinx.

 

You can refer to the documentation contribution guide to make your very first contribution to Rucio’s documentation! Some very good first issues have already been listed, should you want to start immediately.

 

Challenges

 

There were two major challenges that we faced during this project & I’m thankful to my mentors for guiding me through these. The first challenge was that of converting files from RST to Markdown format. Without my mentors’ observation that all the files except the API documentation were manually written, I would have had a tough time utilizing the full capability of Docusaurus as a tool. With that insight, we were able to come up with a solution of segregating the documentation into two different categories & relying less on the source code. The second roadblock, as mentioned earlier, was the setup of GitHub Actions, which the special nature of our project made harder than usual. After brainstorming sessions with my mentors & help from other members of the community (special thanks to Ben), we managed to figure out the entire setup in a way that would simplify the contribution experience for everyone.

 

My takeaways

 

Before this project, I wasn’t aware of Rucio or any of the work around it. However, as a systems administrator I was always curious about how scientific communities maintained & used the large amounts of data generated from experiments. Working on this project has given me insight into how that is achieved, in addition to enhancing my Python knowledge. There were also multiple opportunities within the project to brush up on languages/formats I already knew - HTML, CSS, JavaScript, Markdown.

Overall, contributing to this project has been one of the best things to happen to me this year! Working alongside my super-talented mentors, Martin, Thomas, & Mario has been an honor & an immense learning experience that I shall treasure for years to come.

 


©2021 by divya-mohan.com.