HIFIS transfer service
Introduction and purpose
The HIFIS transfer service enables Helmholtz scientists to transfer large data sets between sites.
In order to provide a comfortable solution, we make use of CERN’s File Transfer Service (FTS), which is used in WLCG for the distribution of experimental data to hundreds of LHC tier centres.
Transfers can be commissioned through the web UI WebFTS or through FTS3’s REST API by submitting a JSON file with the transfer’s details.
The advantage of FTS over tools such as rsync or Dropbox is that users can commission data transfers between endpoints that then run asynchronously, i.e. without any need for the user to intervene.
Transfers of large data sets can thus be commissioned in a ‘fire-and-forget’ manner.
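As an illustration of the JSON file mentioned above, the following minimal sketch builds a job description for a single file. The field names (`files`, `sources`, `destinations`, `params`, `verify_checksum`, `overwrite`) follow the FTS3 REST job submission format as commonly documented and should be checked against the FTS3 documentation for your server version. The host names and paths are hypothetical placeholders.

```python
import json


def build_transfer_job(source_url, destination_url, verify_checksum=False):
    """Return a job description dict for one source/destination pair.

    The layout is an assumption based on the FTS3 REST job submission
    format; verify it against the documentation of your FTS3 server.
    """
    return {
        "files": [
            {
                "sources": [source_url],
                "destinations": [destination_url],
            }
        ],
        "params": {
            # Ask FTS to compare checksums after the transfer.
            "verify_checksum": verify_checksum,
            # Do not overwrite an existing file at the destination.
            "overwrite": False,
        },
    }


# Hypothetical endpoints, for illustration only.
job = build_transfer_job(
    "https://source.example.org:443/data/run01.tar",
    "https://destination.example.org:443/incoming/run01.tar",
)

# Serialise the job so it can be written to a file or sent to the REST API.
job_json = json.dumps(job, indent=2)
print(job_json)
```

The resulting JSON can be saved to a file and submitted once; FTS then carries out the transfer without further interaction.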
In order to use FTS at a Helmholtz centre, an endpoint in the form of a web server capable of communicating with FTS and other endpoints needs to be present there. Up until now, dedicated WLCG storage solutions, e.g. dCache, DPM and EOS, had to be installed on site. For HIFIS, an Apache web server with some modified modules can be used instead. Configuration examples and instructions can be found in the GitLab repositories linked under Endpoint installation below.
If you already have two endpoints between which you want to transfer data, you can use WebFTS to start the transfer.
- Click on “Login” (top left corner) and choose “Helmholtz AAI” as your IdP.
- After going through the login process, click on the tab “Submit a transfer”.
- Enter the URLs of both endpoints with the https:// prefix and optional port after the domain name (the endpoint administrator should have given you the correct format of the URL including the directories you can access).
- Optionally, you can enable “Compare Checksums” below the arrows before submitting the transfer in order to have the data integrity checked after the transfer. Note that this might lead to errors if one of the endpoints does not support the comparison of checksums.
- Start the transfer by clicking on one of the arrows in the centre of the screen according to your choice of source and destination endpoint.
- The status of each transfer can be checked either on the tab “My Jobs” within WebFTS or on the FTS3 status page. For the latter option you might need the job id displayed after submitting the transfer job.
Please note that CERN uses its own certificates for the FTS3 status page, so you will likely encounter a security warning in your browser when accessing the above link. This is because CERN’s root CA certificate is not part of the standard certificate packages of any operating system. You can either accept the warning or install the root CA certificate in order to continue.
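The URL format described in the steps above (an https:// prefix, a domain name, an optional port and a directory path) can be checked before it is entered into WebFTS. The following is a minimal sketch using only the Python standard library; the function name `check_endpoint_url` and the example host are hypothetical, and the check covers only the syntax of the URL, not whether the endpoint is reachable.

```python
from urllib.parse import urlsplit


def check_endpoint_url(url):
    """Split an endpoint URL into (hostname, port, path).

    Raises ValueError if the URL does not use https, has no host name,
    or has a port that is not a valid number.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("endpoint URL must start with https://")
    if not parts.hostname:
        raise ValueError("endpoint URL must contain a domain name")
    # Accessing .port raises ValueError for a malformed port;
    # it is None when no port is given.
    port = parts.port
    return parts.hostname, port, parts.path


# Hypothetical endpoint, for illustration only.
host, port, path = check_endpoint_url("https://storage.example.org:8443/data/project")
print(host, port, path)
```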
If you need automated and/or script-based file transfers, please have a look at the FTS3 REST API via cURL or the FTS3 Python bindings.
There, the basic usage of the API via the command-line tool curl and via Python scripts is described.
Please note that the documentation assumes the use of X.509 certificates as the means of user authentication and authorization.
For use with Helmholtz AAI access tokens, you need to specify -H "Authorization: Bearer ${ACCESS_TOKEN}" instead of -E ~/proxy.pem with curl, and the access-token=ACCESS_TOKEN parameter for the context creation with the Python bindings.
A token from the Helmholtz AAI can be obtained on the command line by using oidc-agent (official website). Please have a look at our documentation for setting up oidc-agent with the Helmholtz AAI.
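To show how the bearer token replaces the X.509 proxy in a scripted submission, the following minimal sketch prepares a request with the Python standard library instead of curl or the FTS3 Python bindings. Several things here are assumptions: the FTS3 server address `https://fts3.example.org:8446` is a hypothetical placeholder, the `/jobs` path and the payload layout follow the FTS3 REST API as commonly documented, and the token is expected in an `ACCESS_TOKEN` environment variable (filled, for example, from the output of oidc-agent). The request is only built, not sent.

```python
import json
import os
import urllib.request


def build_submission_request(fts_endpoint, job, access_token):
    """Prepare a POST request that submits a job with a bearer token.

    The "/jobs" path is an assumption based on the FTS3 REST API;
    check the documentation of your FTS3 server.
    """
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url=fts_endpoint.rstrip("/") + "/jobs",
        data=body,
        method="POST",
        headers={
            # The bearer token takes the place of the X.509 proxy certificate.
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


# Hypothetical job description and endpoints, for illustration only.
job = {
    "files": [
        {
            "sources": ["https://source.example.org/data/run01.tar"],
            "destinations": ["https://destination.example.org/incoming/run01.tar"],
        }
    ],
    "params": {"verify_checksum": False},
}

# Fall back to a dummy value so the sketch runs without a real token.
token = os.environ.get("ACCESS_TOKEN", "dummy-token-for-illustration")
request = build_submission_request("https://fts3.example.org:8446", job, token)
print(request.get_method(), request.full_url)

# To actually submit the job, the prepared request would be sent with:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the request construction separate from sending it makes it possible to inspect the headers and body before any transfer is commissioned.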
Since the endpoint needs to be integrated with each centre’s infrastructure, installing it is generally a task for the local IT. We as the HIFIS team can provide the configuration of an endpoint and support for installing it. If you need data transfers via FTS, please contact your local IT first and enquire whether a) such an endpoint already exists, or b) it would be possible to install one. A detailed description of the steps necessary to install an Apache web server as an FTS endpoint can be found in the repositories linked below.
- A collection of the modules and the modifications that need to be patched in for a manual setup can be found in this repository.
- A Docker image for either a standalone or a Kubernetes deployment, together with the corresponding Helm chart, is located in this repository.