Storage Phone Conf 30 Nov 2005 Minutes

The following participants logged in at startup or were detected during the call:
RAL Tier 1: Steve
Edinburgh: Greig
Glasgow: Jamie
Lancaster: Matt, Brian
Cambridge: Santanu
RAL Storage: Jiri, Owen, Jens (chair+mins)

Apologies:
Glasgow: Fraser and Graeme

Any lurkers, let me know.

0. Review of actions and bug status

I forgot to count the bugs before the meeting, but there are 19 open bugs, down from 26 last time. That's probably good, at least from a pointy-haired perspective. There's not much activity on Savannah, which is less good.

Actions are listed below.

1. Delta sites

The Tier 1 is upgrading today. We agreed that there are no Tier 2s we're currently too worried about.

2. Recommendations for upgrade (LCG-soon-to-be-2.7.0-or-something)

Upgrading is more a question of _when_ than _whether_. This was the main point of the meeting.

2.1. DPM

1.4.1 came out this Monday, so it is likely to go into the next (imminent) LCG release. Jiri has been testing it, compiled from source, by upgrading a 1.3.8 installation. 1.4.1 has 2.1.1-modified WSDL and marvellous Python and Perl dependencies. Greig is now also testing the 1.4.1 install/upgrade. Installation is manual for now, and the GridFTP server is missing (use the one from 1.3.8).

DPM also comes with SRM 2.1 clients: command line tools plus a hacky Perl script to run them. They may not be perfect, but they work and have been used to put and get files to/from both DPM (obviously) and the CASTOR SRM2. Yep, the CASTOR SRM2. As mentioned above, the DPM clients use 2.1.1-modified.

2.2. dCache

1.6.6 has much improved logging, which alone is a strong reason to recommend upgrading soon. See Greig's excellent account of the new (billing) logs:
http://wiki.gridpp.ac.uk/wiki/DCache_Billing_Information

Note that the DN is not recorded for srmCopy()ing into dCache (i.e., when you tell dCache A to srmCopy a file from SRM B). Even more worrying, it is not recorded for deletes either. But it is a huge improvement over what we had previously.
Here is a recent 2.1 status presentation from Michael Ernst:
http://agenda.cern.ch/askArchive.php?base=agenda&categ=a056625&id=a056625s1t2/moreinfo

Much of it looks recycled from his previous talks. The baseline API requirements were decided back in April 05 and things don't seem to have changed much, which is good. Note that it says "unlikely to have full 2.1 by end of March" or some such, but we only need the get/put cycles. Peter confirmed that FTS will not need more (i.e., it will not depend on pinning and other bells and whistles).

The other reason to upgrade is that going 1.5.X -> 1.6.6 -> {whichever version has SRM 2.1} is probably easier, safer, or both, compared with going 1.5.X -> {whichever version has SRM 2.1} directly.

Owen should publish the YAIM script for brave people to test by the end of this week.

2.3. FTS

Reminder: the storage goals for SC4 are defined in terms of FTS. If FTS is able to transfer files via a 2.1 interface, then we get a gold star. Each, presumably. This happy experiment is not affected by the performance problems (see item 3 below). All we need to know is "does it work or not?" However, there are separate goals for transfer rates, so you do need to worry about performance.

Greig reported a potential problem with FTS support for 2.1 gets and puts (also sent to the list). See the FTS work plan:
https://uimon.cern.ch/twiki/bin/view/EGEE/DMFtsWorkPlan

In an email dated 14/11/05, Peter told me the expected date for FTS supporting 2.1 was "end of the year", so I am hoping we can start testing in January. Conversely, we can let him have a go at our SRMs. We know JRA1 are working on 2.1 clients.

2.4. 1.1 and 2.1 coexisting

There was a quick discussion this morning on the list and at the phone conf about 1.1 and 2.1. Basically, an SRM should be able to publish both on the same port (but with different endpoints). This is good for your firewall setup and is one of the selling points of web services. However, DPM runs them as two different daemons, which obviously cannot bind to the same port, so 2.1 uses 8444.
That is one extra port to open (and a non-standard one too).

The related issue is that of publishing. Clearly the 1.1 and 2.1 interfaces should be different interfaces to the _same_ SRM, i.e., files you put in via 1.1 can be taken out via 2.1 and vice versa. We believe this works with DPM, but current DPMs publish only their 1.1 interface (GlueSEArchitecture: srm_v1). We need to figure out how to publish the 2.1 interface. A reminder that the tapestore will (if all goes to plan) publish only the 2.1 interface (2.1.1-modified).

3. FTS performance update (Glasgow)

I nearly forgot this point until Jamie reminded me. Thanks, Jamie. The regulars will remember the performance problems with FTS transfers out of dCache. It turns out there are also problems with lcg-rep. Does dCache throttle the transfer? Work has been done to pin down the problem. See Graeme's summary to the list yesterday:
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0511&L=gridpp-storage&T=0&O=A&P=21711

Since FTS is essential for SC4, we need to report the problem to LCG soon.

Item 4 was covered mainly when we discussed the improved logging. We don't want to do our own security challenges unless we have to, but we must advise admins on how to respond if a file contains naughty stuff.

5. AOB

None.

------------------------------------------------------------------------
ACTIONS

35  13/07/2005  Find out correct behaviour for what to do on full system
    Owner: Owen. Status: Ongoing.
    #10120 closed, but testing on 1.6.6. Ongoing.

41  10/08/2005  Agree licence with DESY
    Owner: Jens. Status: Open.
    No news. None at all. Not a sausage.

50  14/09/2005  Can DPM use space on WNs (Durham)
    Owner: Simon. Status: Open. Low priority.
    No news.

53  12/10/2005  Find reasonable % for SE uptime for SC4
    Owner: Jeremy. Status: Open.
    Reassigned. Follow up with GDB et al.
    No news.

54  02/11/2005  Report on performance/scalability with pools on WNs
    Owner: Paul. Status: Open.
    No news.

57  02/11/2005  Investigate StoRM
    Owner: UNASSIGNED. Status: Open.
    Wait for INFN report.
    No news. Depends on 61.

60  16/11/2005  Add client wsdl->library recipe to wiki
    Owner: Jens. Status: Open.
    No news.
    Not a biggie, I have just had too many meetings since last week.

61  16/11/2005  Figure out who in INFN is testing StoRM
    Owner: Jens. Status: Open.
    No news.

62  16/11/2005  Follow up with JC about NorthGrid being dCache only
    Owner: Jens. Status: Open. High priority.
    Done. Not a problem: NorthGrid can have what they want/like (or what Alessandra likes :-)

63  16/11/2005  Follow up with Sheffield about timeline
    Owner: Jens. Status: Open.
    Ongoing, but no news since last week.

65  23/11/2005  Send site-info.def to Owen
    Owner: ALL. Status: Open.
    Three sites have sent theirs (Liverpool, Lancaster and one more). That may be enough. Lancaster's setup suggested a YAIM improvement, namely to separate the pool setup (names of nodes) from that of the doors (which nodes have doors).

66  23/11/2005  Follow up with Owen and Greig re testing Liverpool
    Owner: Jens. Status: Open.
    Ongoing. There's probably not much else to do now. The best thing is to hook it up to LCG and see what happens.

67  23/11/2005  Clarify future of classic SEs for sites that still have them
    Owner: Jens. Status: Open. Not urgent.
    Almost done, two unaccounted for.

------------------------------------------------------------------------
NEW ACTIONS, HURRAH!

68  30/11/2005  Report on DPM 1.4.1 upgrade at Edinburgh
    Owner: Greig. Status: Open.
    Already done at RAL by Jiri. Greig is now doing it at Edinburgh.

69  30/11/2005  Follow up with LCG re FTS from dCache performance
    Owner: Glasgow. Status: Open.
    See above.

70  30/11/2005  Publish YAIM script for 1.6.6 for testing by brave people
    Owner: Owen. Status: Open.
    By the end of this week.

71  30/11/2005  Figure out how to publish 1.1 interface and 2.1 to same SE
    Owner: Jens. Status: Open.
    I've assigned it to myself by default. Maybe I'll ask LCG about it. Coming up with a proposal shouldn't be terribly difficult.