UKI Monthly Operations Meeting (TB-SUPPORT) - F2F

Europe/Zurich
IT Auditorium (CERN - IT Amphitheatre)

Jeremy
Description
The meeting was held in conjunction with the Tier-2 workshop at CERN. The original slot (09:00-10:40) was extended to end at 12:30. The agenda below was followed, though not at the times listed.
minutes
    • 09:00 09:10
      Introductions 10m IT Auditorium (VRVS)

      - A chance to know who is who!
      Speaker: All
    • 09:10 09:15
      Discussion on ALICE 5m IT Auditorium (VRVS)

      - Which UK sites already support them?
      - Who may wish to take part in PDC'06 (30-45 days?)
      - VOBox requirements (Birmingham feedback)
      - Require LFC locally at Tier-2
      - Require xrootd on SEs
    • 09:15 09:30
      Discussion on ATLAS 15m IT Auditorium (VRVS)

      - Direct contact with sites
      - Requirements (shared file system, 20-30GB disk, glibc)
      - Thoughts on MySQL (or other DB) at Tier-2s
      - Sites wishing to run own services (i.e. not dependent on T1)
      - Tier-2 specialisation
      - Bandwidth (AOD 20 MB/s + reprocessing 20 MB/s; 3.2 MB/s continuous for simulated data)
      - Ability of Tier-1 to take MC output
      - Network planning for outside UK storage (what testing is required?)
      - "Send jobs to data". How do we ensure input data is available?
      - Consequences of a large number of user job failures. What can we do to deal with it?
      - Data services still unclear
    • 09:30 09:45
      Discussion on CMS 15m IT Auditorium (VRVS)

      - Some CPU bounding of jobs
      - Locally merging small files (heavily I/O bound). Site consequences
      - Partitioning of resources (any concerns/fears!?)
      - 1Gb/s bandwidth requirement: "any less and the disk is essentially static"
      - Installing a Squid cache (HTTP cache server; is 15 mins realistic?)
      - Optional services
      - Assumption that "Tier-2 runs efficiently throughout the year"
      - Experience with PhEDEx
      - "It is not too late to join the deployment activities". What is the LT2 experience and which other sites are interested in joining?
    • 09:45 10:00
      Discussion on LHCb 15m IT Auditorium (VRVS)

      - Contribution good so far. How will this be affected when scheduling kicks in?
      - Primarily using T2s for MC. Will any of our larger Tier-2s support analysis?
      - LHCb problems log: http://santinel.home.cern.ch/santinel/cgi-bin/logging_info Is such logging useful to sites? What else do sites need?
      - WN requirements (Python >2.2, 2.5GB disk/CPU, outbound connectivity)
      - Storing data (Lyon or NIKHEF?)
      - Question about default s/w stack and missing site software. What is the feedback?
      - Other problems:
        -- Wrong queue settings; faulty WN (ssh keys); expired certificates; overloaded file servers; SE and FC failures
      - Links:
        -- Monitoring: http://lhcb01.pic.es/DIRAC/Monitoring
        -- Accounting: http://lhcb01.pic.es/DIRAC/Accounting
    • 10:00 10:15
      Maintaining a site 15m IT Auditorium (VRVS)

      - What does the MoU say about "availability" & response times? (http://lcg.web.cern.ch/lcg/C-RRB/MoU/WLCGMoU.pdf p22-23). GridPP MoU (2004) is here: http://www.gridpp.ac.uk/tier2/
      - How close are we!?
      - Ways to improve
        -- Monitoring tools (DTEAM discussion on sharing Nagios extension scripts to identify common failure modes). Which sites currently run Nagios? What else??
        -- Site coverage
        -- Use of trouble tickets (aside: there are currently a lot of open tickets. What is the site experience of what is happening?)
        -- "Care and maintenance guide" (Stephen Burke will present on this at GridPP16)
        -- Other communications issues
    • 10:15 10:35
      Catch ALL discussion! 20m IT Auditorium (VRVS)

      - Current views on the middleware (releases) and move towards more incremental/ongoing updates
      - SFTs. Move to daily report availability. Noting problems as non-relevant/relevant. How the reports are now used. Move to Service Availability Monitoring (https://lcg-sam.cern.ch:8443/sam/sam.cgi)
      - Security. What are the current concerns? Membership of the vulnerability group. Where are we with an incident response procedure?
      - Getting a feel for job efficiencies across sites. The Tier-1 scripts are available and it would be useful to roll them out (http://www.gridpp.rl.ac.uk/stats/)
      - Ganglia deployment, for tools that federate Ganglia information, such as: http://ganglia.gridpp.rl.ac.uk/cgi-bin/ganglia-fs/fs-page.pl?m=%5Bnone%5D&r=day&s=descending
      - Use of site wiki pages and blogs
    • 10:35 10:40
      AOB 5m IT Auditorium (VRVS)

      - Areas where more information is required/wanted (e.g. GPBox: https://twiki.cern.ch/twiki/bin/view/DILIGENT/GLiteThree http://infnforge.cnaf.infn.it/docman/view.php/5/101/G-PBbox_JRA1_All-Hands_Meeting.ppt)
    • 10:40 12:30
      Meeting extension 1h 50m IT Auditorium (VRVS)
