Minutes for Storage phone conf, 25 Jan 2006

Present:
  Tier 1: Derek
  Lancaster: Matt, Brian
  Edinburgh: Greig (chair + mins), Phil
  Glasgow: Graeme
  Durham: Mark
  Liverpool: Paul
  RAL Storage: Owen
Apologies: Jens, Jiri (CASTOR meeting)

0. Review of actions (see below).

1. Update on the dCache -> DPM transfer problem.

Setting RFIO_TCP_NODELAY=yes in /etc/sysconfig/dpm-gridftp fixes the dCache -> DPM issue. The setting is not needed when the DPM node uses long hostnames, and is unnecessary altogether when running DPM version >= 1.4.4. However, it is still necessary for gethostname() to return the FQDN in DPM <= 1.4.5-1 (the current candidate for LCG 2.7.0).

2. LCG 2.7.0, and upgrade schedule for sites.

2.1 LCG 2.7.0 version of dCache installed and tested.

dCache 1.6.6-5 with gdbm database.
http://wiki.gridpp.ac.uk/wiki/Scotgrid_LCG_2.7_Pre-Release_Testing#SE_.28dCache.29_Installation

What should dCache sites do? It is recommended that sites upgrade to the PostgreSQL PNFS database now, rather than waiting until their dCache is full of data and then migrating from gdbm.

Information publishing: the new dCache dynamic information plugin is working, publishing storage used/available on a per-VO basis. In the testzone version of dCache that was installed, the LCG dynamic plugin had to be removed and replaced with the dCache-specific one. It is also necessary to set up dCache to use pool groups with the names of the VOs that your site supports.

2.2 LCG 2.7.0 version of DPM installed and tested.

DPM version 1.4.5.
http://wiki.gridpp.ac.uk/wiki/Scotgrid_LCG_2.7_Pre-Release_Testing#SE_.28DPM.29_Installation

A working DPM was installed from scratch. Static and dynamic storage information publishing are working. YAIM functionality has improved, since the DPM pools are now trusted hosts, with the pools added to /etc/shift.conf. Graeme has contributed fixes to bugs that he found in YAIM. A DPM upgrade script exists (config_DPM_upgrade) that makes the changes to the database schema between v1.3.8 and later versions.
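The FQDN requirement from item 1 can be sanity-checked from Python. This is a minimal sketch, not part of DPM or LCG tooling; the helper name looks_like_fqdn is ours, and the dotted-name test is only a heuristic:

```python
import socket

def looks_like_fqdn(name):
    """Heuristic: a fully qualified name carries a domain part, i.e. an interior dot."""
    return "." in name.strip(".")

# gethostname() may return only the short host name; getfqdn() asks the
# resolver for the fully qualified one. DPM <= 1.4.5-1 needs the former
# to already be fully qualified.
short_name = socket.gethostname()
full_name = socket.getfqdn()
```

If looks_like_fqdn(short_name) is False while full_name is fully qualified, the host is configured with a short hostname and would hit the item-1 problem on affected DPM versions.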
This works when upgrading from v1.3.8 of DPM to the v1.4.5 in LCG 2.7.0, but not if sites have previously upgraded to 1.4.1 and manually changed the schema. Graeme is working on a fix.

DPM pool nodes as WNs (dependency issues): there is a conflict between the DPM rfio and the CASTOR rfio that comes with the WN install. It is possible to hack around this to get it working, but this is not recommended since the CPU usage will be high.

What should sites do? Wait until LCG 2.7.0.

2.3 Only Sheffield still need to install an SRM. This should be done on 26/01/06.

3. SC4 workshop - dCache questions.

Workshop agenda: http://agenda.cern.ch/fullAgenda.php?ida=a056461
dCache sites should email Greig with any questions they would like answered. Issues will be placed in the Savannah bug/issue tracker: http://savannah.cern.ch/bugs/?group=srmsupportuk
Sites have been emailed about this.

4. Status of transfer tests.

Which sites are to perform the next set of transfer tests?
  Oxford
  Brunel
  Liverpool
  Durham - retest required, since only 88Mb/s was achieved first time round. Mark is looking at the network configuration.
We need to keep in mind that we will eventually want to perform simultaneous transfers to multiple sites.

5. Status of SRM v2.1 testing (Jiri).

Should we be adding information to the wiki about this? Yes, see actions 75 and 76. Owen mentioned that they plan to release an rpm to allow sites to use the test suite and help develop more SRM2 test scenarios. Jiri has added that he can release some basic tests next week, but does not think that there will be any stress scripts ready by then. The client needs a bit more coding.

6. AOB.

6.1 Lancaster -> RAL transfer of ~830Mb/s over UKLight. Not using FTS.

6.2 Need to find a way of letting users/experiments know about the setup at Tier-2s. Greig asked sites to send details of their disk setup (i.e. is RAID being used, and if so, which level?). See action 78. Graeme mentioned that there is an Architecture field in the GLUE schema.
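The Mb/s figures quoted above (88Mb/s at Durham, ~830Mb/s Lancaster -> RAL) are average throughputs; a quick sketch of the arithmetic, assuming decimal units (1 Mb = 10**6 bits):

```python
def rate_mbps(bytes_transferred, seconds):
    """Average throughput in megabits per second for a completed transfer."""
    return bytes_transferred * 8 / seconds / 1e6

# e.g. 1 GB (10**9 bytes) moved in 100 s averages 80 Mb/s,
# roughly the rate Durham saw in its first test.
```

Note that tools sometimes report MB/s (megabytes); dividing a Mb/s figure by 8 converts it.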
The latest draft is here: http://infnforge.cnaf.infn.it/glueinfomodel/uploads/Spec/GLUEInfoModel_1_2_final.pdf

From p19 of the document:
  Architecture SEArch_t 1 Underlying architectural system category. String enumeration: disk, tape, multidisk, other (N)
Presumably RAID-5/resilient dCache -> multidisk; no failover -> disk.

6.3 The RAL SC3 re-run is over, so we can go back to testing the T1-T2 links. The target is sustained transfers of 1TB between the T1 and 50% of T2s by the end of the month (transfers in both directions). We may not quite manage to make the target, but a lot of transfers have been taking place involving Glasgow and other T2s. The target rate for T1-T2 transfers is 300-500Mb/s.

------------------------------------------------------------------------
ACTIONS

41 10/08/2005 Agree licence with DESY. Jens. Open. Progress: Jens has had an email back from RAL's legal advisor, with some questions, which he is currently answering.
50 14/09/2005 Can DPM use space on WNs (Durham)? Graeme. Open. Low priority. Done - see above.
53 12/10/2005 Find reasonable % for SE uptime for SC4. Jeremy. Open. Reassigned; follow up with GDB et al.
54 02/11/2005 Report on performance/scalability with pools on WNs. Paul. Open. Low priority.
57 02/11/2005 Investigate StoRM. Jiri. Open. Wait for INFN report. Low priority.
60 16/11/2005 Add client wsdl->library recipe to wiki. Jens. Open. Low priority.
61 16/11/2005 Figure out who in INFN is testing StoRM. Jens. Open. Low priority.
74 18/01/2006 Report on YAIM/gLite-installer upgrade 1.4.1->1.4.2. Graeme. Open. Owen has been trying to look at the glite-installer, although the priority remains YAIM development for dCache.
75 18/01/2006 Describe 2.1 testing tool in wiki. Jiri. Open.
76 18/01/2006 Describe status of DPM tests in wiki (test script, results). Jiri. Open.
77 18/01/2006 Follow up with Alessandra and Andrew re Sheffield install. Jens. Open. Done - Greig visiting Sheffield on 26th Jan.
New Actions
-----------
78 25/01/2006 Get sites to forward details of their disk setups. Greig. Open.
79 25/01/2006 Contact Lancaster RE: dCache problems. Greig. Open.
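As a back-of-envelope check on the 6.3 target (1TB sustained between T1 and each T2 at 300-500Mb/s), a minimal sketch, assuming decimal units (1 TB = 10**12 bytes, 1 Mb = 10**6 bits):

```python
def hours_to_transfer(terabytes, rate_mbps):
    """Hours needed to move a volume at a sustained rate given in Mb/s."""
    bits = terabytes * 10**12 * 8
    seconds = bits / (rate_mbps * 10**6)
    return seconds / 3600
```

At the target rates, 1TB takes roughly 4.5 hours at 500Mb/s and 7.5 hours at 300Mb/s, so a single site's test fits comfortably within a day even allowing for restarts.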