Minutes of storage phone conference 2005-11-23

Present:
Durham: Mark
QMUL: Giuseppe
Bristol: Yves
Glasgow: Jamie, Fraser, Graeme
Liverpool: Michael, Paul
Imperial: Mona, Olivier
Tier 1: Derek
Lancaster: Matt
Manchester: Alessandra
RAL Storage: Owen, Jiri, Jens (chair+mins)

0. Action review, bug summary.

Actions, see below. 26 open bugs, I will close a few now that the wiki is back with us. Worries me slightly that many are unassigned.

1. Delta site.

Peak yesterday, nearly 180 TB probably due to over-eager Glasgow DPM. Back to 160 now which is still pretty good, given that it's not long ago we reached 100. Lancaster will publish another 10 soon. Let's call Sarah again when we reach 250.

Liverpool
Running test dCache. Need more testing before can put into production.
ACTION Jens Follow up with Owen and Greig about testing Liverpool.
See also bug #10368
https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=10368
and http://wiki.gridpp.ac.uk/wiki/SRM_commands

Sheffield
Only 0.5 person on Grid stuff. Andrew is believed to have installed dCache.

UCL Physics
Now publishing a test DPM.

UCL CCC
Report from Olivier, no change since last week, storage wise. William is currently doing cluster stuff.

Cambridge
After a certificate problem, publishing DPM. Some problems with publishing, which are being sorted out.

Durham
Downtime (Hibernating?), but migrating Classic SE to DPM this week.

RHUL
In progress, report by Olivier. Duncan working with Simon; two 2.8 TB disks need to be turned into DPM pools.

Tier 1
Using scheduled downtime to upgrade dCache from 1.6.5 to 1.6.6.

2. SC4 prep: what to do with your classic SE.

Some sites still have Classic SEs; we agreed that we should try to get rid of them. There are two paths: either migrate them to DPM or shut them down. For example, the one in RAL PP is being shut down. The shutdown procedure was discussed on tb-support between 7 and 10 Oct 2005.
In particular, see
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0510&L=tb-support&T=0&F=&S=&X=4CBDA80E418969725D&Y=j.jensen%40rl.ac.uk&P=8526

ACTION Jens: Clarify with sites who have a C-SE whether they (the C-SEs, not the sites) are being shut down or migrated. Do we need the shutdown procedure described in the wiki?

3 & 4. SC4 prep: {status of, plans for} 2.1 testing, and DPM and 2.1.1-modified.

Jens explained that SRM2 is part of the goals for SC4. Resounding cries of excitement spectacularly failed to be heard. But then again neither did cries of woe and anguish, nor gnashing of teeth.

The only 2.1 client tools we have now are those for DPM which Jiri has tested. The tools are built for 2.1.1 and most (other) SRM 2s require 2.1.1-modified.

* Jiri has tested the 2.1 interface in DPM, but only with upload and download so far (which is good because that is what we need for SC4).
* Jiri is working on modifying the 2.1.1 tools to 2.1.1-modified.

Related actions: #8 (Resurrect SRM client API), now closed because replaced with #60 (Add client wsdl->library recipe to wiki).

Related to that, I plan to put the SC4 goals on the agenda for hepsysman and neither Pete nor Jeremy have complained loudly.

5. Report from the LCG computing review (Graeme)

Graeme reports that there isn't much interesting to report from the review. He gave a talk which was well received. SC3 throughput will be repeated in January, but this should not affect Tier 2s.

6. Overview of monitoring framework (Jens)

Jens reported that the monitoring framework will be extended to:

1. Cover all SRMs rather than just the close SEs - it will even work with SRMs not in the BDII.
2. Be extensible to be able to do more stuff. Which will be handy:
   * Once the SRM2s are deployed.
   * If/when we want to run further tests against higher level services (currently covered by SFT, but only for close SEs);
   * If/when we want to run further tests against lower level services (currently not covered directly).
In the first instance it will do SRM upload, download (and compare, obviously), delete (via srmcp and srm-advisory-delete, i.e. the FNAL SRM tools). The test will be done by a job running on a local CE, thus exercising the site-internal interface. Site-external interfaces are not necessarily covered by this monitoring framework; they are already covered by the SFT etc.

7. AOB

Owen reported YAIM script is now ready for 1.6.6.
ACTION ALL: Send site-info.def to Owen.

Olivier had a few specific questions on DPM:

* Quotas can be done (only) with one pool per VO, but Graeme reports that a subgroup of DPMers is working on improving this;
* Can DPM filesystems share a disk partition? Answer: no.
* How does DPM select a pool? Graeme speculated that it will pick the one with most free space. Graeme reports:

> If one does a "dpm-qryconf" then one of the parameters is "FSS_POLICY", which
> is "File System Selection policy name" (says the man page). At the moment
> this is set to "maxfreespace", which I think confirms my speculation.
> dpm-modifypool doesn't have any facility to change this policy, so I think
> it's a placeholder for future functionality.

------------------------------------------------------------------------

ACTIONS:

35  13/07/2005  Find out correct behaviour for what to do on full system
    Owen    Ongoing    Currently testing on 1.6.6
    The original question is actually closed, fixed in 1.6.5:
    https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=10120
    Since sites will upgrade from 1.5.X to 1.6.6, rather than go via 1.6.5, it will be useful to test the filling up behaviour on 1.6.6 as well.

41  10/08/2005  Agree licence with DESY
    Jens    Open
    No change since last week, still with external advisors.

50  14/09/2005  Can DPM use space on WNs (Durham)
    Owen    Open    Low priority
    Now done at RHUL? => Simon George?

53  12/10/2005  Find reasonable % for SE uptime for SC4
    Jeremy    Open    Reassigned. Follow up with GDB et al
    No news.
54  02/11/2005  Report on performance/scalability with pools on WNs
    Paul    Open
    No news.

55  02/11/2005  Report on upgrade to 1.6.6
    Greig    Open    Reassigned
    'Tis done.

57  02/11/2005  Investigate StoRM
    UNASSIGNED    Open    Wait for INFN report
    Depends on 61.

60  16/11/2005  Add client wsdl->library recipe to wiki
    Jens    Open
    Not done.

61  16/11/2005  Figure out who in INFN is testing StoRM
    Jens    Open
    Not done.

62  16/11/2005  Follow up with JC about NorthGrid being dCache only
    Jens    Open    High priority
    Done.

63  16/11/2005  Follow up with Sheffield about timeline
    Jens    Open
    On-going.

64  16/11/2005  Publish FTS performance to list
    Jamie    Open
    Done.

The FTS stuff is on-going: Jamie reported an FTS channel between the DPMs at Edinburgh and Glasgow is seeing fully satisfactory performance (1 GB transferred in 1-1.5 mins), as opposed to the unsatisfactory performance RAL <-> Glasgow reported last week. Jamie will be doing more work shortly.

Graeme reports that FTS (at least 1.4.0) can take SURLs in a different format. I have located the description here:
https://uimon.cern.ch/twiki/bin/view/EGEE/DMFtsSupport#DCacheSrmCopyUrl
Note there is a problem with dCache 1.6.6.

Another potential problem was reported by Jane Liu from LBNL on the dCache list: when FTS-1.3.0'ing files between dCache-1.6.5.2 and dCache-1.6.5.3, a log says "not enough available space to lock".

Since UK Tier2s will skip 1.6.5, we will need to test FTS transfers between 1.5.X and 1.6.6, and obviously test between 1.6.6 and 1.6.6, as well as to/from DPMs. It would be good to debug the current Tier 1 to Glasgow problem, but given that Tier 1 is upgrading now, we will have to rerun tests after the upgrade.
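
For comparing FTS results across channels, the Edinburgh-Glasgow figure quoted above (1 GB in 1-1.5 minutes) can be turned into a sustained rate. The following is only a rough sketch: it assumes that 1 GB means 10^9 bytes, that 1 MB means 10^6 bytes, and that the quoted times are wall-clock times for the whole transfer. The function name is made up for illustration.

```python
def rate_mb_per_s(size_bytes, seconds):
    """Average transfer rate in MB/s (assuming 1 MB = 10**6 bytes)."""
    return size_bytes / seconds / 1e6

ONE_GB = 10**9  # assumed decimal gigabyte

fastest = rate_mb_per_s(ONE_GB, 60)  # 1 GB in 1 minute
slowest = rate_mb_per_s(ONE_GB, 90)  # 1 GB in 1.5 minutes

print(f"{slowest:.1f} to {fastest:.1f} MB/s")
print(f"{slowest * 8:.0f} to {fastest * 8:.0f} Mbit/s")
```

Under those assumptions the rate comes out at roughly 11 to 17 MB/s. The same conversion can be applied to the reruns after the Tier 1 upgrade so that the numbers are directly comparable.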

NEW ACTIONS
--------------------------

65  Send site-info.def to Owen
    ALL    Open

66  Follow up with Owen and Greig re testing Liverpool
    Jens    Open

67  Clarify future of classic SEs for sites that still have them
    Jens    Open    Not urgent

There are a few other open issues which I haven't listed as actions, such as following up with FTS-dCache testing, and the 2.1 interfaces and testing them.