Showing posts from December, 2017

Mounting a federated storage cluster as part of a local file system

For the Belle-II experiment, we run more than 3500 user jobs in parallel on 6 different clouds, all at geographic locations far away from each other. Each of these jobs runs physics simulations and needs a set of 5 input files with about 5GB of input data. All available sets together are about 100GB in size, and each job chooses one of the sets as its input data. However, if all jobs access a single storage site, it is very easy to run into problems, mainly due to:

- high load on the storage servers
- timeouts caused by data transfers that are too slow when the bandwidth of the storage site is shared
- slow (random) read access to the disks when providing the files for many different jobs in parallel, especially since the central storage server also serves data for other experiments
- inefficiencies due to long-distance data transfers

The best solution here would be to have the data files at different locations close to where the jobs run. This isn't easily possible