Minutes of storage phone conference 9th Aug 2006

Present:
Edinburgh: Greig
RAL Tier 1: Derek
Glasgow: Graeme, Jamie
Lancaster: Matt
DESY: Owen
RAL Storage: Jiri, Jens (chair+mins)

0. Review of actions

(see below)

1. Updates and status

We discussed whether any sites need special attention. Manchester is upgrading but Alessandra is on leave.

Matt reported a dCache accounting problem which may just be due to extra replicas being kept. Maybe it is necessary to lie about the free space (again) when more than one copy is being kept? We need some more documentation in this area.

2. Input needed for site testing?

Jamie is coordinating the new tests, with Graeme as the éminence grise. Jamie and Greig will get in touch about giving sites a heads up in preparation for the upcoming challenge. It is worth noting that CMS will make T0->T1 transfers to CASTOR at the same time.

3. Storage accounting & monitoring & GOC DB

Really covered under the actions.

4. dCache licence discussion?

Also covered under the actions. See also new actions.

5. UK support status, dCache support

How is support going at the moment, given that Greig basically has to support all Tier 2s singlehandedly? It's sort of going OK at the moment, but when releases need testing, more effort will be required. There is also more to do than support: a slew of ongoing issues such as monitoring, plugins, testing, and of course documentation.

Owen can and will provide some support, but will occasionally have higher priorities, like the LCG deployment.

We discussed briefly whether GridPP can continue to support all three of DPM, dCache, and CASTOR SRM. The official position is that we can, because we have to. No one size fits all. It was agreed that we cannot provide the service without supporting all three, even if we have our collective hands full. As a corollary, we cannot really support anything else, e.g. StoRM. GridPP sites should be discouraged from running unsupported solutions.

Part of the question is also whether we can, and should, support one or more distributed file systems as underlying technology to enable missing storage to be made available. At the moment, we support none, and we don't have particularly promising experiences with any either.

6. CASTOR update

RAL's CASTOR is now publishing its accounting information. Oh, and it's running? Did I mention that?

7. AOB

Graeme mentioned the GDB site storage group, which is getting input from OSG on recovering lost files - he will circulate this to the list.

------------------------------------------------------------------------
OPEN ACTIONS

41  10/08/2005  Agree licence with DESY  Jens  Open
    We now have a licence but there are bits we don't understand (e.g. source access?), and we need legal review, and should probably make a combined response as GridPP - does it meet our needs, or do we still need a special licence for GridPP?

53  12/10/2005  Find reasonable % for SE uptime for SC4  Jeremy  Open
    Ongoing - now depends on us being able to monitor uptime better.

54  02/11/2005  Report on performance/scalability with pools on WNs  Paul  Open
    Was reassigned to Greig, and closed.

86  08/02/2006  Extend monitoring to do sites per VO and VOs per site  Greig  Open
    Ongoing - Greig's RGMA is doing the accounting, and Dave Kant is working on the publishing side. It is also important to distinguish between accounting and monitoring - see also minutes from 19/07/06 about Nagios plugins.
    The Nagios plugin for DPM has progressed; it is currently being built to publish via Ganglia, but will then be modified to use Nagios. It should eventually query rfio_statfs, which can provide more detailed information. It will also be monitoring the daemons.
    The FNAL plugins just query the dCache status web page. More detailed monitoring may be required.
    CASTOR has made progress; we can now do the accounting bit via a GIP plugin run directly from the BDII.
    Owen made the point that it will be easier to get tools to use the free space information when it is reliable.

105  03/05/2006  Re-poke DESY or FNAL about SRM 2.1 for dCache  Owen  Open
    The official statement is that GridPP wants to play with FNAL's 2.2 even if it isn't finished, so we would like it to go into the 1.7.0 release. Owen should talk to Tigran and Patrick (who are currently out).

116  31/05/2006  Progress of Durham-MAN networking discussions  Mark  Open
    No news.

119  07/06/2006  Circulate next version of VO storage to list  Jens  Open
    Jens will check where this has gone.

121  05/07/2006  Get report from NGS on GPFS  Owen  Open
    Progress reported last time. Jens will follow up.

125  12/07/2006  Add SURL publishing recipe for DPM to Wiki  Graeme  Open
    Done.

126  19/07/2006  Wiki page describing dCache specific steps when storage lost  Greig  Open
    This is for admins to be able to recover SURLs from PNFS. Derek has a recipe. Ongoing.

127  19/07/2006  Test out dCache Nagios plugin  Greig  Open
    Ongoing, see monitoring action #86 above. RAL also has Nagios and can do something.

128  19/07/2006  Circulate DPM monitoring wiki page to the list  Graeme  Open
    Done.

------------------------------------------------------------------------
NEW ACTIONS

129  09/08/2006  Produce GridPP response to dCache licence  Jens/ALL  Open
130  09/08/2006  Get legal input on dCache licence  Jens  Open
131  09/08/2006  Give sites heads up regarding next SC  Greig/Jamie  Open
132  09/08/2006  Circulate OSG email about recovering lost files to list  Graeme  Open