Service Challenge Meeting at CERN

Timezone: Europe/Zurich
513/1-024 (CERN)

Chair: Jamie Shiers
Description
Meeting meant primarily for the planning of SC2 (March) and SC3 (July).

VRVS: Island Virtual Room. Phone access: +41 22 767 7000; ask for the "Service Challenge Meeting" chaired by Jamie Shiers. For visitor portable access, please see http://it-div.web.cern.ch/it-div/gencomputing/VisitorPortables.asp and give Jamie Shiers (IT-GD) as contact person.

It is proposed that each site address the following questions, and give a timetable in weeks:
-- what is your data transfer cluster, and if you don't have one, when will you have it?
-- what is your network, and how much bandwidth can be used when?
-- what is your software: OS, kernel version, globus version?
-- when do you need to be alone for performance testing?
-- what software will be tested, and when?
-- what is your SRM implementation / timeline?
-- what are your performance milestones?
    • 10:00 - 10:15
      Introduction & Goals of SC2 15m
      Speaker: Jamie Shiers
    • 10:15 - 10:45
      CERN plans for SC2 30m
      Speaker: Vlado Bahyl
    • 10:45 - 11:15
      CNAF plans for SC2 30m
      This is the time schedule for the INFN SC2 participation.

      CNAF roadmap for SC2:
      - 7-11 Feb: network set-up and performance testing of the connection to CERN
      - 14-18 Feb: installation of servers
      - 21 Feb - 11 Mar: configuration tuning and software set-up
      - 14-25 Mar: SC2

      A more detailed action list will follow.
    • 11:15 - 11:45
      FZK plans for SC2 30m
    • 11:45 - 12:15
      IN2P3 plans for SC2 30m
      Proposal for SC milestones at CCIN2P3:
      - From Feb 14: set up SRM-dCache on new hardware + tests.
      - Feb 15-17: sustained CERN-CC disk-to-disk transfers (100MB/s).
      - Weeks 9-10: sustained CERN-CC disk-to-disk transfers (100MB/s), or weeks 11-12 if CERN wants all Tier-1s at the same time. The cluster will be 2 nodes with 256GB of disk each.
      - Week 17: SRM-dCache available as a front-end to HPSS; we could then begin testing writes to tape from CERN. The cluster will be 2 dCache pool nodes with 2TB of disk each.
      - Week 21: 3 days of sustained transfers between CERN and Lyon tapes; the rate target is 50-60 MB/s.
      - Weeks 27-30: SC3, expected rate 50-60 MB/s.
      - September 2005: a direct link to CERN at 10Gb/s (dark fibre) will be available; we also expect to add a few dCache nodes to the cluster.
      - Beginning of 2006: Lyon will be connected to GEANT at 10Gb/s.
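As a sanity check on the disk-only milestones above, a quick back-of-the-envelope calculation (using only the figures quoted in the proposal: the 100 MB/s target and the two pool nodes with 256 GB of disk each; the assumption that the buffer must be drained continuously is ours) shows how quickly the disk buffer fills during a sustained transfer:

```python
# Rough capacity check for the disk-to-disk phase (figures from the plan above).
rate_mb_s = 100            # sustained transfer target, MB/s
buffer_gb = 2 * 256        # two pool nodes with 256 GB disk each

# Time until the buffer is full if nothing is drained to tape:
seconds_to_fill = buffer_gb * 1024 / rate_mb_s
print(f"Buffer fills in ~{seconds_to_fill / 60:.0f} minutes")
```

At this rate the buffer holds well under two hours of incoming data, which is one way to see why the week-17 HPSS tape back-end matters for anything longer than short bursts.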
    • 12:15 - 12:45
      FNAL plans for SC2 30m
    • 12:45 - 14:15
      Lunch 1h 30m
    • 14:15 - 14:45
      RAL plans for SC2 30m
      Here is a preliminary proposal - but of course it depends on meshing in with James's plans.

      Intention:
      - Obtain a 2.5Gb UKLIGHT link end to end with CERN (needs negotiation).
      - Try to fill as much as possible of the 2 x 1Gbit/s pipe from RAL over a short-term test.
      - Sustain 100MB/s for 2 weeks - last 2 weeks of November.
      - Intend to deploy up to 16 worker nodes as gridftp servers; start with 4, but throw more at the problem if needed.
      - Back-end to a number of disk servers/RAID arrays to deliver sufficient performance. Exact details to be finalised after resources are obtained from the GRIDPP User Board - probably a significant number of RAID arrays.
      - Run this test on a new dCache infrastructure in parallel to the existing framework currently being used by CMS.

      Week ending 11th February:
      - Agreement reached with UKLIGHT (and onward) for end-to-end provisioning of lightpath/bandwidth to CERN, able to deliver at least 2Gbit capacity.
      - All hardware available - freed from production (except extra network cards?).
      - Deployment of new dCache infrastructure commenced.
      - Hardware benchmarking/profiling commenced.

      18th February:
      - End-to-end connectivity established to CERN. This depends on UKLIGHT being live at RAL by 14th February, which is our current best estimate from UKERNA; not within our control.
      - Tier1 attached to UKLIGHT.
      - Local system benchmarking complete.

      25th February:
      - Network team completes capacity/throughput tests and hands over.
      - dCache infrastructure deployed.

      11th March:
      - End-to-end transfers tested over the dCache SRM, CERN to RAL.

      18th March:
      - dCache tested to peak load (try to achieve 2Gb/s) - depends on bandwidth being made available by UKLIGHT/CERN.

      21st March:
      - 2-week SC2 starts at 100MB/s and runs unattended (this is what Jamie said to me and I take him at his word). Note also the UK networkshop, 22-24th March.

      Longer range:
      1) Deploy prototype SRM to tape - end of February.
      2) Complete internal stress test (not throughput) of SRM to tape by end of March.
      3) Deploy new RAID controllers to enhance robot capacity to tape (end of April).
      We continue to plan/stress-test towards 50MB/s for 1 month in July.
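The headline numbers in the RAL proposal can be cross-checked with simple arithmetic (all figures are the ones quoted above: the 100 MB/s sustained target, the 2 x 1 Gbit pipe, and the 2-week run; decimal unit conversions are our assumption):

```python
# Link utilisation and total data volume for the proposed SC2 run.
rate_mb_s = 100                    # sustained target, MB/s
link_gbit = 2                      # 2 x 1 Gbit/s pipe from RAL

rate_gbit = rate_mb_s * 8 / 1000   # MB/s -> Gbit/s (decimal units)
utilisation = rate_gbit / link_gbit
total_tb = rate_mb_s * 86400 * 14 / 1e6   # 14-day unattended run

print(f"{rate_mb_s} MB/s = {rate_gbit:.1f} Gbit/s "
      f"({utilisation:.0%} of the {link_gbit} Gbit pipe)")
print(f"A 2-week run moves ~{total_tb:.0f} TB")
```

So the sustained target uses well under half of the nominal pipe, leaving headroom for the separate peak-load test that aims for the full 2Gb/s.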
    • 14:45 - 15:15
      NIKHEF/SARA plans for SC2 30m
      Speaker: Kors Bos
    • 15:15 - 15:45
      BNL plans for SC2 30m
    • 15:45 - 16:15
      Triumf plans for SC2 30m
      TRIUMF's plans for the 2005 Service Challenges (February 7, 2005)

      The following outlines the goals we would like to achieve in order to participate in the Service Challenges.

      A) 10 GigE lightpath tests between Vancouver and Ottawa

      February: We have 3 machines available for the test:
      - 1 Sun Fire V40z quad Opteron (2.4GHz processors) with 8GB memory but only 2 x 72GB + 3 x 144GB SCSI drives. We are looking at an economical way to connect more storage to this unit - possibly by housing 8-16 SATA disks in an external box that simply powers them and connects them, via SATA cables through the rear open PCI slots, to a pair of RAID cards installed in the Sun.
      - 2 Tyan-based dual 2.4GHz Opteron machines, each with 4GB memory and 16 x 300GB SATA drives connected to 2 RocketRAID 1820 controllers. These need to be configured this week with 64-bit Fedora Core 3 kernels with support for bbftp, bonnie++, the RocketRAID 1820A, the Intel and S2io 10Gbit cards, xfs and iperf, plus cacti monitoring and ssh configured to allow easy interconnect.

      Tests already indicate good xfs read performance in RAID5 configuration - 420MB/s being standard and 620MB/s being available under circumstances that need to be better understood. xfs writing is currently limited to about 250MB/s.

      We have 2 Intel 10GbE cards and 1 S2io 10GbE card, so tests could try aggregating transfers to/from 2 machines to the third - lots of combinations to explore. We need to establish stable disk-to-disk transfers over the next week, at a minimum 200MB/s. As soon as we have this, we should have the Ottawa end configured likewise and start transfers to/from Ottawa. We have kept two 8-channel 3ware 9500S-8 SATA RAID cards from Ciara for use in the Sun, or alternatively for when the RocketRAIDs fail to perform as required in either read or write mode.

      The 10Gbit link between TRIUMF and Ottawa should be checked out and established this week. Consideration should be given to implementing gridftp and using it instead of bbftp.

      B) March Service Challenge hardware and 1 GigE lightpath

      February: By mid-February we will finalize the purchase of a few more servers (4-5). These machines will effectively be used in the upcoming service challenges: typically dual processor, 2GB RAM, RAID5 with at least 8 disks (2.4+ TB), and dual GigE with channel bonding. The goal is to aggregate these servers to be able to write at a speed of 500 MB/s with an SRM interface.

      1 GigE networking preparation (needed for the end-of-March service challenge): a 1 GigE lightpath to CERN can be established immediately. TRIUMF has the necessary lambda and optics but must make a request to CANARIE for the lightpath, to be carried across CA*net4 by CANARIE and by SURFnet from either MANLAN in New York or STARLIGHT in Chicago to CERN. A request will be submitted in the week of 7-11th Feb for a 1 GigE lightpath until the end of the year, or until CANARIE can contract the permanent 10G lightpath they are currently procuring. We will also request a routable address space from BCNET.

      March: Prepare the new machines for the March service challenge:
      - installation/configuration of the dCache/SRM service on the new servers
      - site tuning / performance tests for stable operations
      - Service Challenge at 100 MB/s (disk to disk)

      C) June Service Challenge and 10 GigE lightpath

      April-May: 10 GigE networking preparation:
      - 10G lightpath status: CANARIE is currently in the process of procuring a permanent 10G lightpath to CERN. TRIUMF has 10GigE equipment on loan from Foundry; a purchased solution awaits clarification on the availability of 10GigE WAN PHY 1550nm optics, as well as whether 10GigE LAN PHY 1550nm optics will be available at the BCNET gigapop. This will not be known until the end of March.
      - A 10G lightpath between TRIUMF and CERN will be requested for June 13th-24th for a Service Challenge test. The specific 10G equipment to be used depends on the availability of the optics mentioned above and cannot be determined at this time.

      June: 10 GigE tests between Vancouver and CERN (via Amsterdam):
      - allocated time slot: 13/6-24/6
      - site tuning
      - Service Challenge (single site), disk to disk at 500 MB/s

      D) Infrastructure and hardware for the Service Challenge (disk/tape to tape)

      Summer-Fall: Work on Tier 1 site infrastructure (computing room preparation / engineering work). The exact timetable is not known yet.

      Fall 2005: Acquisition of a tape library system (when the computing room is ready):
      - tape library unit (base frame, IBM 3584 or similar)
      - 3 drives
      - 100-200 tapes
      - dCache/SRM + tape back-end configuration
      - site tuning / performance tests

      December 2005: Service Challenge (to tape at 50 MB/s)
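The 500 MB/s aggregate-write goal for the new servers can be checked against their network capacity (using only the figures in the plan: 4-5 servers, each with dual GigE channel bonding; the decimal unit conversion is our assumption):

```python
# Back-of-envelope check on the 500 MB/s aggregate-write goal.
servers = 4                       # lower end of the planned 4-5 purchase
nic_gbit_per_server = 2           # dual GigE with channel bonding

per_server_mb_s = nic_gbit_per_server * 1000 / 8   # Gbit/s -> MB/s
aggregate_mb_s = servers * per_server_mb_s

goal_mb_s = 500
print(f"Nominal NIC ceiling: {aggregate_mb_s:.0f} MB/s for {servers} servers")
print(f"Goal of {goal_mb_s} MB/s uses {goal_mb_s / aggregate_mb_s:.0%} of that ceiling")
```

Even at the low end of the purchase, the goal sits at about half of the nominal NIC ceiling, leaving headroom for disk, bonding, and protocol overheads.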
    • 16:15 - 16:30
      Update on Computing Models & T2 plans 15m
      Speaker: Jamie Shiers
    • 16:30 - 16:45
      SC3 draft milestones 15m
      Speaker: Jamie Shiers
    • 16:45 - 17:00
      Coordination with 3D Project 15m
      Speaker: Dirk Duellmann
    • 17:00 - 17:15
      Future meetings and events 15m
      The tentative schedule and goals of future SC meetings and workshops will be discussed.
      Speaker: Kors Bos, Jamie Shiers