Minutes of storage phone conference, 11 Oct 2006

Present:
Edinburgh: Greig
RAL Tier 1: Derek
Durham: Mark
Lancaster: Matt, Brian
Bristol: Jon
Glasgow: Graeme, Paul
RAL Storage: Jens (chair + minutes)

0. Review of actions
(See below.)

1. Deployment of MonAMI plugins for GridPP (DPM & dCache)
Most of the meeting was spent discussing MonAMI status and plans. Status can also be found in the wiki when it resurfaces.
Paul has made MonAMI asynchronous and multithreaded and has improved the output plugins; the input plugins may now need improving. Graeme is currently busy deploying the Glasgow infrastructure but will return to monitoring as soon as possible. He aims to have something deployable before the end of the month and to deploy the MonAMI monitoring, probably before the next GridPP meeting. For DPM it currently does no more than the improved GIP plugin, but adding other features, such as watching pools, is being looked into. It is built to run on the head node only.
Greig's monitoring for dCache is available from CVS. It currently monitors the number of requests, but doesn't specifically account for how many of those fail. It could be installed in 10 minutes. It also needs to run on the pool nodes, e.g. to watch for too many sockets in CLOSE_WAIT. Lancaster volunteered to test it soon (ACTION).

2. Storage metrics, monitoring, redux
Monitoring via MonAMI can also publish to R-GMA, so if we get it to watch the metrics (cf. the storage accounting page in the wiki), the data can be gathered centrally.
How is this different from the FTS (or heartbeat) monitoring? FTS watches one VO and checks whether the SE is running (active monitoring); the plugins can watch the SE "from inside" and report not just whether it is running, but also how busy it is, what the transfer rates are, etc. The two kinds of monitoring are complementary.

3. Not a word about the GLUE storage schema
Except for the actions mentioned below.

4. StoRM update
Jon is running StoRM because he already has GPFS and needs an SRM.
We are all agog for the results, but it is (very) unlikely that GridPP storage can support this at the moment, because nobody else in GridPP has StoRM experience. Jens reports that FZK, whom he talked to last week, have dCache running on top of GPFS and report good experiences with it (similar to those of HPCx, see below).

5. The ongoing experimental survey
It is still ongoing - get your votes in before Monday. Report next week.

6. AHM round-up
Brian gave a report last week; nothing else to report.

7. GridPP17 plans
It was suggested we give this more thought and return to it next week.

8. AOB
NOB.

------------------------------------------------------------------------
ACTIONS

41  10/08/2005  Agree licence with DESY
    Owner: Jens   Status: Open
    Ongoing - see survey.

86  08/02/2006  Extend monitoring to do sites per VO and VOs per site
    Owner: Greig   Status: Open
    Ongoing - waiting for Dave Kant to implement something. Greig will poke (ACTION). Jens asked about output that can be imported into spreadsheets; this needs investigation, but job accounting does have CSV.

116 31/05/2006  Progress of Durham-MAN networking discussions
    Owner: Mark   Status: Open
    The local MAN is to upgrade its connection to Janet to 2.5 Gb/s. The cap was raised to 400 Mb/s for the last transfer.

119 07/06/2006  Circulate next version of VO storage to list
    Owner: Jens   Status: Open
    No news as far as the Tier 2s are concerned; this is becoming urgent.

121 05/07/2006  Get report from NGS on GPFS
    Owner: Jens   Status: Open
    Ongoing. It turns out Jon is the person "from NGS" testing GPFS. However, more people in (or affiliated with) NGS are using GPFS:
    * Southampton's National Oceanographic centre (http://www.soc.soton.ac.uk/)
    * HPCx in Daresbury (http://www.hpcx.ac.uk/) has 70 TB, but on AIX, and reports: "performance scalability reliability is all pretty good. However it sometimes needs a bit of coaxing to recover when failures do occur"
    * Rumour has it that some people in Nottingham are investigating GPFS for HPC.
129 09/08/2006  Produce GridPP response to dCache licence
    Owner: Jens/ALL   Status: Open
    Ongoing - see survey.

130 09/08/2006  Get legal input on dCache licence
    Owner: Jens   Status: Open
    Depends on the outcome of 129.

138 30/08/2006  Check DPM drain functionality
    Owner: Greig/Yves?   Status: Open
    Done - it doesn't work, but no one has tested 1.5.9 yet.

139 06/09/2006  Test 1.7.[45] dCache, and the 2.2 in particular
    Owner: Greig   Status: Open
    Done, except that 2.2 hasn't been tested yet (mostly waiting for FTS now, mid Nov). We cannot yet recommend upgrading from 1.6.x. Closed.

141 06/09/2006  Circulate new schema diagram
    Owner: Jens   Status: Open
    Ongoing in the SRM group - circulate the latest version.

142 06/09/2006  Circulate new schema document when available
    Owner: Jens   Status: Open
    As 141.

143 06/09/2006  Volunteer DPM site for FTS testing with 2.2 support
    Owner: ALL   Status: Open
    Again, waiting for FTS to support 2.2, but it would be good if a site raised its hand and said "me, me, me".

144 06/09/2006  Volunteer dCache disk site for FTS testing with 2.2 support
    Owner: ALL   Status: Open
    As 143.

152 13/09/2006  Investigate using jiscmail surveys for, er, surveys
    Owner: Jens   Status: Open
    Ongoing - report on the outcome next week.
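Item 1 notes that the dCache monitoring also needs to run on the pool nodes, e.g. to watch for too many sockets in CLOSE_WAIT. As a rough illustration only, such a check could look like the minimal Python sketch below. It is not Greig's CVS code or a MonAMI plugin; it assumes a Linux pool node exposing /proc/net/tcp and /proc/net/tcp6 (where the kernel encodes CLOSE_WAIT as state 08), and the alarm threshold CLOSE_WAIT_THRESHOLD is a hypothetical value.

```python
import os

# Linux reports each socket's TCP state as hex in the 4th column of
# /proc/net/tcp and /proc/net/tcp6; 0x08 means CLOSE_WAIT.
CLOSE_WAIT = "08"

# Hypothetical alarm threshold; a real value would need tuning per pool node.
CLOSE_WAIT_THRESHOLD = 100


def count_close_wait(lines):
    """Count sockets in CLOSE_WAIT, given the lines of /proc/net/tcp(6)."""
    count = 0
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def close_wait_on_host(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum CLOSE_WAIT sockets over the given files, skipping missing ones."""
    total = 0
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += count_close_wait(f.read().splitlines())
    return total


if __name__ == "__main__":
    n = close_wait_on_host()
    status = "WARNING" if n > CLOSE_WAIT_THRESHOLD else "OK"
    print(f"{status}: {n} sockets in CLOSE_WAIT")
```

Counting from /proc avoids depending on the output format of external tools; a fuller check might also filter on the local ports the dCache pool actually uses, which are site-specific and not given here.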