Posts

Monitoring Dynafed with ELK

The Dynamic Federation project (DynaFed), being developed at CERN, is intended to federate any number of different types of storage endpoints, allowing users to read or write data transparently and efficiently. How it works in a nutshell:
1. A client sends a COPY, GET, PUT, or DELETE HTTP request to the DynaFed instance's URL.
2. DynaFed decides which of the storage endpoints it federates is the "best" one to handle the request, ranked by geographic location and available free space.
3. The client receives a 302 redirect pointing to this storage endpoint, together with the protocol and authentication tokens necessary for the transaction.
4. The client re-sends the HTTP request directly to this storage endpoint.

As we can see, once the redirect link is provided, DynaFed is out of the loop and is therefore unaware of whether the client-storage endpoint transaction is successful or not. So while we cannot get this valuable information from DynaFed, we instead are interested in accountin…
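The client side of the redirect flow above can be sketched as follows. The class, URLs, and injected transport are hypothetical stand-ins for illustration; a real client would simply use an HTTP library (or a grid tool) against the actual federation endpoint.

```python
# Minimal sketch of the DynaFed redirect flow (steps 1, 3, and 4 above).
# The transport is injected so the logic can be shown without a real endpoint.

class DynafedClient:
    def __init__(self, transport):
        # transport(method, url) -> (status_code, headers, body)
        self.transport = transport

    def get(self, federation_url):
        # Step 1: send the request to the DynaFed instance.
        status, headers, body = self.transport("GET", federation_url)
        if status == 302:
            # Step 3: DynaFed answers with a redirect to the "best" storage
            # endpoint; any tokens are embedded in the Location URL.
            endpoint = headers["Location"]
            # Step 4: re-send the request directly to that endpoint;
            # DynaFed is no longer involved from here on.
            status, headers, body = self.transport("GET", endpoint)
        return status, body

# Fake transport standing in for DynaFed and one storage endpoint.
def fake_transport(method, url):
    if "dynafed.example.org" in url:
        return 302, {"Location": "https://se1.example.org/data/f?token=abc"}, b""
    return 200, {}, b"file contents"

client = DynafedClient(fake_transport)
status, body = client.get("https://dynafed.example.org/fed/belle/f")
print(status, body)  # 200 b'file contents'
```

Note that the final status code is returned by the storage endpoint, not by DynaFed, which is exactly why DynaFed itself cannot account for the success or failure of the transfer.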

HEPiX Presentations

We had the opportunity to present our compute and storage system for HEP at the Fall HEPiX workshop at KEK, Japan.

In our first presentation, we showed how we run HEP workloads on distributed clouds: how Cloudscheduler automatically starts VMs on any cloud accessible to us when needed, how Shoal connects each VM to the closest squid proxy, and how we implemented an Accounting and Benchmark system.

Our second presentation covered the data storage part, which for a distributed cloud system is very different from that of a traditional Grid storage site. In this presentation, we showed why a distributed cloud compute system needs a different storage approach, how we realize it, and how authentication and authorization work for such a system.

We will soon present updates on our accounting and monitoring system, as well as on the production usage of our distributed storage system, at CHEP in Sofia, Bulgaria.

Mounting a federated storage cluster as part of a local file system

For the Belle-II experiment, we run more than 3500 user jobs in parallel on 6 different clouds at geographically distant locations. Each of these physics-simulation jobs needs a set of 5 input files amounting to about 5 GB of input data. All available sets together are about 100 GB in size, and each job chooses one of the sets as its input data.
However, if all jobs access a single storage site, it is very easy to run into problems, mainly due to:

- high load on the storage servers
- timeouts due to slow data transfers when sharing the bandwidth of the storage site
- slow (random) read access to the disks when providing the files for many different jobs in parallel, especially since the central storage server also serves data for other experiments
- inefficiencies due to long-distance data transfers
The best solution here would be to have the data files at different locations close to where the jobs run. This isn't easily possible because all stor…

Shoal - Squid Proxy Discovery and Management

The Shoal system has been running stably in a production environment for several years now without much change. The goal of Shoal is to help provide contextualization to new virtual machines in a production cloud environment. More simply, Shoal provides virtual machines with a set of squid proxies from which they can retrieve the software and data they need to run their payloads without going all the way to the source.

The Shoal system is broken down into three components:
- shoal-agent
- shoal-server
- shoal-client

The shoal-agent is a daemon process that runs on a squid proxy cache. The daemon collects various health metrics and configuration information about the squid and sends the shoal-server a message via AMQP (Advanced Message Queuing Protocol). Each installation of shoal-agent has a configuration file, typically found at /etc/shoal/shoal_agent.conf. This file allows you to configure several things about the squid cache, such as which shoal-server to register to, who the shoal-serv…
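The agent's reporting loop described above can be sketched roughly as below. The config keys and message fields are assumptions for illustration only; the real shoal-agent defines its own configuration names and message schema, and would publish the message over AMQP (e.g. with an AMQP client library) rather than print it.

```python
# Rough sketch of a shoal-agent-style status report.
# Config keys and message fields are hypothetical, for illustration.
import configparser
import json
import socket
import time

SAMPLE_CONFIG = """
[general]
amqp_server_url = shoal.example.org
amqp_exchange = shoal
interval = 30
"""

def load_config(text):
    # In the real agent this would read /etc/shoal/shoal_agent.conf.
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg["general"]

def build_status_message():
    # Health/identity data the agent reports to the shoal-server.
    return json.dumps({
        "hostname": socket.gethostname(),
        "timestamp": int(time.time()),
        "squid_port": 3128,   # assumed default squid port
        "load": 0.0,          # placeholder for real squid metrics
    })

cfg = load_config(SAMPLE_CONFIG)
msg = build_status_message()
# The real agent would publish `msg` to cfg["amqp_server_url"] on exchange
# cfg["amqp_exchange"], repeating every cfg["interval"] seconds.
print(cfg["amqp_exchange"], json.loads(msg)["squid_port"])
```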

ACAT 2017

The 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT) took place in Seattle this week. We presented our work on integrating the dynamic web federation into HEP computing as a poster.

The conference focused on the use of machine learning algorithms in physics research, with contributions from industry offering effective computing technology for executing workflows that employ deep neural nets.

These technologies offer solutions to the computing issues the field is facing in light of a great increase in data with a constant computing budget. When the LHC experiments were planned, it was assumed that Dennard scaling would solve this problem for us; it has since become clear that this is not the case.

It was shown that generative adversarial neural nets may be used to do simulation, and that supervised learning may provide options for triggering and reconstruction. In some places these technologies are already used. NVIDIA, Microsoft, and D-Wave pres…

Glint Version 2 Enters Production

After several months of prototyping and development, Glint version 2.0 (Glint v2) has entered production. Glint v2 is a standalone web service inspired by Colin Leavett-Brown and Ron Desmarais' original Glint service (Glint v1). The idea of Glint was to allow image replication across multiple OpenStack clouds using a simple interface, instead of manually downloading and uploading images to new locations. Version 2 differs from the original in that it is a dedicated web service instead of an extension of the OpenStack Horizon dashboard. Unfortunately, the OpenStack developers had a different philosophy regarding image and repository management and decided not to accept Glint v1 as a proprietary module.

The OpenStack 6-month development cycle made it unreasonable for a small group like UVic's HEPRC group to maintain Glint v1 as an OpenStack plugin. Instead, a new version of the service was concei…

Authorization in DynaFed, Part 2

As we showed previously, there is an easy way to use the information derived from a VOMS server, based on grid-mapfiles, to authorize a specific user to access a specific part of the dynamic federation. This solution was based on 3 parts:

- a grid-mapfile listing the DNs of all users from all supported VOs with all possible roles
- a text file (accessfile) that specifies the different privileges for the different parts of the storage federation
- a python script that does the authentication and authorization based on the two previously mentioned files

While in this solution the grid-mapfile and accessfile can be changed at any time without reloading/restarting the httpd and memcached processes, there is also a simpler solution based on DynaFed's internal authentication methods, which however requires restarting httpd and memcached after each change. This one will be explained in the following.
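The core of the script-based scheme (the first of the two solutions) can be sketched as below. The in-memory dictionaries stand in for the parsed grid-mapfile and accessfile, and the `isallowed` signature is a simplification for illustration, not the exact interface of DynaFed's authorization plugin.

```python
# Condensed sketch of the grid-mapfile/accessfile authorization scheme.
# Data and function signature are simplified assumptions.

GRID_MAPFILE = {
    # DN -> role, as derived from the VOMS-based grid-mapfile
    "/DC=org/DC=example/CN=Alice": "belle",
    "/DC=org/DC=example/CN=Bob": "atlas",
}

ACCESSFILE = {
    # path prefix in the federation -> {role: allowed modes}
    "/fed/belle/": {"belle": "rw"},
    "/fed/atlas/": {"atlas": "r"},
}

def isallowed(client_dn, resource, mode):
    """Return True if client_dn may perform `mode` ('r' or 'w') on resource."""
    role = GRID_MAPFILE.get(client_dn)
    if role is None:
        return False  # DN not in the grid-mapfile: no access at all
    for prefix, perms in ACCESSFILE.items():
        if resource.startswith(prefix):
            return mode in perms.get(role, "")
    return False  # no matching part of the federation

print(isallowed("/DC=org/DC=example/CN=Alice", "/fed/belle/data1", "w"))  # True
print(isallowed("/DC=org/DC=example/CN=Bob", "/fed/belle/data1", "r"))    # False
```

Because the real script re-reads the grid-mapfile and accessfile from disk, both can be updated at any time without restarting httpd or memcached, which is the main advantage of this approach over the built-in method.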


Using the built-in authentication in DynaFed, one can grant access to a specific part of the federation to…