Data on the Eagle cluster

== New data structure ==

'Home directory' : a limited data space with a small capacity of a few hundred megabytes. In practice, it serves as a container for the project directories.

'Project directory' : Each user with an active computing grant (scientific or commercial) on the cluster has, in their 'Home directory', an automatically created directory named 'grant_<grant_number>' for each grant in which they participate. This directory contains three important subdirectories:

  • 'archive' - a space for storing data that is not actively used, such as processed calculation results. This space is slower than project_data and scratch, but is much larger - it is possible to store tens or, if necessary, hundreds of TB of data;
  • 'project_data' - a directory shared among all users of a given grant. Data located in all subdirectories of this directory is protected from accidental deletion by an automatic backup mechanism. PSNC guarantees data space for this directory in the size granted under the grant application or commercial contract, and this space can be used only by users of the grant. Within this directory, users can freely create and delete files/directories and manipulate access rights. The default space available in this directory is 5 TB;
  • 'scratch' - a space dedicated to a given grant for running calculations and storing input data. The usage rules are identical to those of the previously used scratch space. The default size of this space is 10 TB.

The data in the 'project_data' directory will remain available for the entire duration of the grant, extended by some additional time for archiving the data or moving it to the space of a new grant. Currently, data remains available for 6 months after the end of the grant, plus an additional 6 months during which it can be retrieved upon request. The new data structure is as follows:

-> ~<user_name> : home directory
--> grant_X
---> project_data : shared data space for grant_X
---> scratch : shared space for data used during calculations for grant_X
---> archive : shared space for archive data for grant_X

--> grant_Y
---> project_data : shared data space for grant_Y
---> scratch : shared space for data used during calculations for grant_Y
---> archive : shared space for archive data for grant_Y


In the future, there will be additional directories providing, among other things, easier access to the archiving system or the possibility of exchanging data with the box.pionier.net.pl service.

Note: the project_data and scratch directories are symbolic links to the actual storage system mount location. Users are asked to use only the paths available via the home directory (e.g. /home/users/<user_name>/grant_<grant_number>/project_data), not the physical mount paths, as these may change. The new data structure allows us to dynamically move grants between different storage systems "on the fly", so there is no guarantee that the data will be stored on the same physical system throughout the life of the grant.
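Since the link target can change over time, it may be useful to check where a grant directory currently points. A minimal sketch using readlink (the grant path is a placeholder and the output shown is only an example):

readlink -f /home/users/<user_name>/grant_<grant_number>/project_data
# example output - the physical location may differ and may change over time:
# /mnt/storage_2/project_data/grant_399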

The storage systems currently in use are being phased out starting at the end of 2021. - the /tmp/lustre directory will disappear and space in the home directory will be significantly reduced - the 'project_data' and 'scratch' directories of grants should become the main storage location. Users will be informed of any planned work at least two months in advance to allow data migration.
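When migrating data from the old structure, a plain copy is sufficient. A minimal sketch using rsync, assuming rsync is available on the access server and that data currently lives under /tmp/lustre/<user_name> and should move to the scratch space of a grant (both paths are placeholders):

# -a preserves permissions and timestamps, -v is verbose, -h prints human-readable sizes
rsync -avh --progress /tmp/lustre/<user_name>/ /home/users/<user_name>/grant_<grant_number>/scratch/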

To check how much space a given directory (archive, project_data, scratch) occupies and what limit it has, use the following commands:

Quota check for archive:

getfattr -n ceph.quota.max_bytes /some/dir - quota on total size
getfattr -n ceph.quota.max_files /some/dir - quota on number of files
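These commands show only the limits. To see how much space is currently used, CephFS also exposes recursive usage statistics as virtual extended attributes; a minimal sketch, assuming the archive directory resides on CephFS (the grant path is a placeholder):

getfattr -n ceph.dir.rbytes /home/users/<user_name>/grant_<grant_number>/archive - bytes currently used (recursive)
getfattr -n ceph.dir.rfiles /home/users/<user_name>/grant_<grant_number>/archive - number of files (recursive)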

Quota check for project_data and scratch:

project_data:
lfs quota -h -p 399 /mnt/storage_2/project_data/grant_399

scratch:
lfs quota -h -p 3990000 /mnt/storage_2/scratch/grant_399
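In the example above the project IDs follow from the grant number (399 for project_data, 3990000 for scratch). Assuming this pattern holds for other grants, the checks can be parameterized; a minimal sketch (the grant number is an example, and the /mnt/storage_2 mount paths may change over time):

GRANT=399                 # replace with your grant number
lfs quota -h -p ${GRANT} /mnt/storage_2/project_data/grant_${GRANT}
lfs quota -h -p ${GRANT}0000 /mnt/storage_2/scratch/grant_${GRANT}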

== Previous data structure on the cluster ==

'Home directory' : Each user, after logging into the access server, has a personal file storage space, which is used to store the final results of calculations and other valuable information. This space is usually located in the /home/users/<username> directory and is available on the access server and all computing servers of the cluster under the same path. PSNC makes regular backups of this data, so files can be restored in case of a storage system failure or mistaken deletion by the owner of the data. According to the regulations, this directory must not be shared with others.

'Directory for conducting calculations' / scratch : Each user has access to a personal space on the high-speed storage system. This system is designed to store data used during calculations and is the place where temporary data generated during the operation of an application should be saved. This space is available under the path /tmp/lustre/<username>. This system is many times larger than the 'Home directories' space and offers several times higher performance. However, the space is not protected by automatic backups - mistakenly or intentionally deleted data cannot be recovered. Both the 'Home directories' and the scratch space make all of the available space accessible to every user, without per-user quotas.

The above solution was convenient, but a data management error by a single user, occupying all available space, could block the ability of all users to perform calculations. Due to recurring problems with space availability and the need to ensure the quality of the service provided, it was decided to gradually abandon this model in favor of the new structure described above. Another reason for the change of the directory structure is the problematic migration from one central file system to another: projects on the cluster currently occupy many petabytes of space, and migrating this data to a new storage system without significantly affecting the day-to-day operation of the cluster would take many months.