HIFIS Survey 2020: Programming, CI and VCS

Introduction

At the beginning of 2020, the HIFIS team conducted a survey among Helmholtz scientists to learn more about current practices in research software development and to identify future challenges.

This blog post presents a glimpse into the survey’s results and our take on the gathered data. Specifically, we look at the distribution of programming languages across the different research fields, as well as the use of Version Control Systems (VCS) in the same context. Finally, a short look at the prevalence of various Continuous Integration (CI) systems rounds out the post.

Programming Languages

We asked the survey participants which programming languages they regularly use for writing research software. The following heatmap displays the relative usage of the most common programming languages for each research field.

Plot: Languages by Research field

All presented numbers are the relative usage of a given language in a given field. They might not always add up to exactly 1.00 per field or per language due to multiple factors:

  • Some participants did not answer both questions. These answers are not represented in the plot.
  • Languages that did not reach a share of at least 5% in at least one field were omitted to focus on the most prominent ones and make the graphic easier to read.
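
The two filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used for the survey: the `responses` data is made up, the function name `relative_usage` is hypothetical, and "relative usage" is assumed here to mean a language's share of all language mentions within one field.

```python
from collections import Counter, defaultdict

# Hypothetical answers: (research field, languages used).
# None marks a question the participant did not answer.
responses = [
    ("Health", ["Python", "R"]),
    ("Health", ["R"]),
    ("Matter", ["Python", "C++"]),
    ("Matter", ["C++", "Fortran"]),
    (None, ["Python"]),   # field missing: excluded
    ("Energy", None),     # languages missing: excluded
]


def relative_usage(responses, min_share=0.05):
    """Return {field: {language: share}} after both filtering steps."""
    counts = defaultdict(Counter)
    for field, languages in responses:
        if not field or not languages:
            continue  # step 1: skip participants who did not answer both questions
        counts[field].update(set(languages))

    # Assumed normalization: share of all language mentions within a field.
    shares = {
        field: {lang: n / sum(c.values()) for lang, n in c.items()}
        for field, c in counts.items()
    }

    # Step 2: keep only languages that reach min_share in at least one field.
    keep = {
        lang
        for per_field in shares.values()
        for lang, share in per_field.items()
        if share >= min_share
    }
    return {
        field: {lang: s for lang, s in per_field.items() if lang in keep}
        for field, per_field in shares.items()
    }


# A deliberately high threshold makes the effect visible on this tiny sample.
result = relative_usage(responses, min_share=0.3)
```

With the threshold raised to 0.3 for this small sample, Fortran is dropped everywhere, so the remaining shares for the "Matter" field no longer sum to 1.00. This is the same effect that the 5% cut-off has on the plot.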

What Can We Learn?

The first thing that catches the eye is that Python appears to be dominant in every research field. This should be taken with a grain of salt, since the survey did not distinguish between the outdated but still widely used Python 2 and the current Python 3. The popularity of the language amongst researchers is not surprising: it is well suited for quickly creating small-scale scripts and offers an extensive choice of libraries for many use cases.

Consequently, our education and training efforts will continue to include Python programming, with appropriate courses and materials to further the knowledge of this language and its best practices amongst scientists and research software developers.

Regarding consultations, we expect the team to receive requests for porting older Python 2 applications to Python 3, as well as support requests for dealing with the variety of virtual environment and package management tools for this language.

Another frequently selected language was C++, which is a popular choice for high-performance computing and larger applications.

This indicates a potential future demand for supporting this language too, especially in training and consulting.

Another notable finding is the strong presence of the statistics language R in the Health and the Earth and Environment research fields, which suggests an opportunity to tailor and advertise education and consulting offers more towards these areas.

Version Control Systems

In the same way, we analyzed a second question on the usage of Version Control Systems (VCS) amongst the participants, again broken down by field of research.

Plot: VCS Usage by Research field

The strong prevalence of Git is apparent at first glance. As a runner-up, some projects still rely on SVN for version control, which, together with a few mentions of CVS, might indicate older, long-lived projects. The number of projects not using any version control at all is comparatively low, which suggests that using a VCS is an established step in setting up projects across all research fields.

From an education perspective, it appears to be the right approach to keep focusing on basic and advanced Git courses and to promote version control as one of the standard practices in every scientist's toolbox. The consulting team can expect requests for help with migrating projects from SVN or CVS to Git in the future.

Continuous Integration

As a third question, we wanted to know which Continuous Integration (CI) services the participants use to automate tasks surrounding their projects. This, again, was a multiple-choice question, and the following plot shows the relative distribution of the given answers:

Plot: Overall CI Usage

One very prominent outcome is that over half of the participants stated that they do not use any CI at all. Several possible reasons for this finding come to mind:

  • The question was not clear enough and participants who actually use CI were not aware of that fact.
  • Participants are not aware that CI exists.
  • Participants do not see any potential benefit of CI for their projects.
  • Participants do not know how to set up and use CI.

Practically any project can benefit from Continuous Integration services, if only by automating mundane management tasks such as license checking, documentation generation, or style checks. All four reasons given above can therefore be attributed to a lack of awareness and education.
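
To illustrate how small such an automated task can be, here is a minimal sketch of a license check that a CI job could run. It is an assumption-laden example rather than a HIFIS recipe: the function name `files_missing_license`, the sample file contents, and the choice of an `SPDX-License-Identifier` marker near the top of each file are all hypothetical.

```python
def files_missing_license(sources, marker="SPDX-License-Identifier", max_lines=5):
    """Return the names of files whose first `max_lines` lines lack `marker`."""
    missing = []
    for name, text in sources.items():
        head = text.splitlines()[:max_lines]
        if not any(marker in line for line in head):
            missing.append(name)
    return sorted(missing)


# Hypothetical file contents standing in for a checked-out repository.
sources = {
    "analysis.py": "# SPDX-License-Identifier: MIT\nimport math\n",
    "plot.py": "import math\n",
}

missing = files_missing_license(sources)

# A CI job would typically fail (non-zero exit code) if any file is reported.
exit_code = 1 if missing else 0
print(missing, exit_code)
```

In a real pipeline, the `sources` dictionary would be filled by reading the project's files, and the script would end with `sys.exit(exit_code)` so that the CI service marks the job as failed.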

Further, the plot reveals that the CI solutions currently in use are, in descending order of share, GitLab CI (which accounts for over a quarter of all answers), Jenkins, and Travis CI, with all other services barely represented.

Building on the insights from this analysis, three actions clearly stand out to improve CI usage across all projects:

  • The education team will have to expand its portfolio and offer more courses centered around CI usage.
  • The popularity of GitLab CI will likely increase the demand to migrate other projects to this system. It will fall to the consulting branch to be prepared to deal with such requests.
  • The technology team has already begun to offer pre-made recipes for CI pipelines and has an incentive to grow the collection of ready-to-use solutions for popular scenarios.

Further insights on the usage of Continuous Integration platforms can be gained from another blog post discussing the survey analysis from a technology perspective.

Conclusion

Thanks to the participants of the HIFIS survey in 2020, it was possible to gain a first glimpse into the status quo of research software engineering within the Helmholtz centers. With this data, the needs of the scientists could be assessed from a bird's-eye perspective, and concrete steps can now be determined to offer better support for the scientists at Helmholtz.