SC4 / pilot WLCG Service Workshop

Europe/Zurich
TIFR, Mumbai

Description
Mailing list for attendees: hep-event-sc4workshop@cern.ch. The conclusions on data management are attached in the following "document".
attendees
document
    • 09:00–18:50
      Data Management (Mumbai)

      • 09:00
        Introduction & Workshop Goals 15m
        This talk will try to identify the key issues that we need to understand as a result of this workshop, including:
        - discussing and obtaining draft agreement on the detailed Service Plan for 2006 (what services are deployed where, the schedule, the agreed functionality delivered by each release, etc.);
        - data flows / rates (reprocessing at Tier1s, calibration, etc., e.g. the TierX-TierY ones...);
        - the corresponding impact on Tier1s and Tier2s.
        The Tier0-Tier1 rates, as well as first-pass processing, are assumed to be well known ...
        Speaker: Jamie Shiers
      • 09:15
        Update on LCG OPN 15m
        The status and timeline of the LCG Optical Private Network (OPN) is presented.
        Speaker: David Foster
        more information
        transparencies
      • 09:30
        FTS status and directions / FTS requirements from a site point of view 1h 15m
        This session will cover recent and foreseen enhancements to the FTS software and service (20'). It will also provide an opportunity for sites to express their requirements in terms of operation, manageability and so forth.
        Speaker: Gavin McCance, Andreas Heiss (tbc) (CERN, FZK)
        more information
        transparencies
      • 10:45
        DPM / LFC 1h 15m
        This session will summarise recent and foreseen enhancements to the DPM & LFC software and services (20'). It will also cover the site perspective from a deployment / service point of view.
        Speaker: Jean-Philippe Baud, Graeme Stewart (CERN, Glasgow)
        transparencies
      • 12:00
        Lunch 1h 30m
      • 13:30
        CASTOR2 + CASTOR SRM 2h
        Deployment schedule of CASTOR2 and CASTOR SRM across the sites, including:
        - the proposed hardware configuration to be used at CERN for SC4;
        - the specific CASTOR and SRM 2.1 features provided by each release;
        - the testing schedule, site & experiment validation, etc.
        Speaker: Jan van Eldik, Ben Couturier - for the CASTOR / SRM team (CERN - RAL)
        transparencies
      • 15:30
        dCache 2h
        - Summary of the dCache workshop.
        - Deployment schedule of dCache 2.1 SRM across the sites, including the specific features provided by each release, the testing schedule, experiment validation, etc.
        Speaker: Patrick Fuhrmann/Michael Ernst for the dCache team (DESY)
      • 17:30
        SRM / FTS / etc interoperability issues 1h
        By definition, the various components cited above must interoperate. We should identify any outstanding issues and produce a plan to resolve them.
      • 18:30
        End of day 1 20m
    • 09:00–18:20
      Services and Service Levels (Mumbai)

      more information
      • 09:00
        Service Checklist 1h
        List of requirements for setting up a production service - the "service dashboard" / "checklist"
        Speaker: Tim Bell
        more information
      • 10:00
        Deploying Production Grid Servers & Services 1h
        By the time of this workshop, all main Grid services at CERN will have been redeployed to address the MoU targets. This session covers the various techniques deployed as well as initial experience with running the services. Recommendations for sites deploying the same services will be offered.
        Speaker: Thorsten Kleinwort
        transparencies
      • 11:00
        COFFEE 30m
      • 11:30
        Service Availability Monitoring 1h
        Experience with monitoring services using the Site Functional Tests
        Speaker: Piotr Nyczyk
        transparencies
      • 12:30
        Lunch 1h 30m
      • 13:30
        Service Level Metrics - Monitoring and Reporting 1h
        A proposal for how component level testing could be aggregated into MoU-level services; metrics; reporting etc.
        Speaker: Harry Renshall
        more information
        transparencies
      • 14:30
        Deploying Agreed Services at Tier1 / Tier2 Sites 1h 30m
        In this session we attempt to reach draft agreement on the detailed Service Schedule for 2006. This includes:
        - the list of services deployed at which sites;
        - the service roll-out schedule;
        - the service roll-out procedure, which must include both pre-testing / certification by the deployment teams and prolonged testing in a production environment (the Pre-Production System is assumed) by the experiments.
        Speaker: SC team, Sites, Experiments
      • 16:00
        Operations Model 45m
        - The operations model for 2006 and its roll-out schedule (from 'SC operations' to standard operations: what, if anything, remains particular to WLCG production?)
        - Coordinating the various mailing lists, announcement mechanisms, out-of-hours coverage...
        - The support model for 2006: moving from the stop-gap support lists set up for SC3 to GGUS support; timetable, scope, etc.
        Speaker: Maite Barroso
        transparencies
      • 16:45
        Service Levels by Service / Tier / VO 45m
        A discussion on the target service levels. The baseline is given in the attached planning document. All sites - at least the Tier1s and DESY - should come prepared to report on service levels and availability at their site.
        Speaker: Helene Cordier + ALL SITES (IN2P3)
        actionlist
        more information
      • 17:30
        Distributed Database Deployment 30m
        Conclusions of the workshop held at CERN on February 6th (agenda: http://agenda.cern.ch/fullAgenda.php?ida=a058495). The workshop goals are to review the status of the database project milestones on the Tier0/Tier1 side and on the experiment side. This boils down to:
        - To what extent are the production database servers and Frontier installations at T0 and T1 becoming available? What setup problems have been encountered / solved? Which steps still need to be completed before opening the first distributed LCG database setup to users from the experiments in March? How can the remaining Tier1 sites join / participate during the first production phase (March-September) in order to be ready to also provide database services for the full deployment milestone in autumn?
        - To what extent has the development of the main applications (conditions data and other applications to be tested as part of SC4) been completed? This includes the definition of concrete detector data models, including refined estimates for volume and access patterns, the connection between online and offline, and the connection between T0 and T1 (reports from the various distribution tests).
        Speaker: Dirk Duellmann, Andrea Valassi
      • 18:00
        End of day 2 20m
    • 09:00–18:00
      Experiment and ARDA Use Cases (Mumbai)

      The various experiment Use Cases, data volumes, flows and rates will be presented.

      In particular, reprocessing at Tier1s, detector calibration processing and analysis use cases should be addressed (including Tier1-Tier1, Tier1<->Tier2 transfers, and TierX<->TierY transfers).

      • 09:00
        LHCb Use Cases for SC4 1h
        Speaker: Nick Brook / Ricardo Graciani
        more information
        transparencies
      • 10:00
        CMS Use Cases for SC4 1h
        Speaker: Ian Fisk (TBC)
        more information
        transparencies
      • 11:00
        ATLAS Use Cases for SC4 1h
        Speaker: Dario Barberis
        more information
        transparencies
      • 12:00
        Lunch 1h 30m
      • 13:30
        ALICE Use Cases for SC4 1h
        Speaker: Federico Carminati / Piergiorgio Cerello
        transparencies
      • 14:30
        ARDA Use Cases for SC4 1h
        Speaker: Massimo Lamanna
        transparencies
      • 15:30
        ROOT / PROOF / xrootd 1h
        Service and deployment issues... Target sites, service levels, monitoring, procedures, etc.
        Speaker: Rene Brun / Fons Rademakers / Andy Hanushevsky
      • 16:30
        Discussion on Analysis Requirements and Services Required 30m
        Speaker: all
      • 17:00
        Workshop Summary 30m
        Speaker: Jamie Shiers
      • 17:30
        END OF WORKSHOP 20m
    • 18:00–19:30
      Storage Types BOF

      The text below is now a summary of the BOF conclusions ordered logically; see the "minutes" link for a more faithful record of the discussions.

      1. It was agreed that the reasons for the volatile/durable/permanent distinction make little sense for HEP centres storing HEP data, and that issues of interest to HEP physicists (is a file on tape, who manages the cache) are not covered.
      2. Although the discussions on the Sunday talked about changing the SRM specification (and this may indeed be done), the discussions on the Monday and Tuesday focussed more on the Glue Schema, which uses these same terms to describe Storage Areas provided by a Storage Element. Although changing/extending the Glue schema could take some time, there is little to stop people using a new type ("fred") for these storage areas, provided all agree on what is meant by the new type name.
      3. One of the Word documents attached here describes a list of possible storage types. If this list can be agreed to be exhaustive, and names assigned to all types, in the next week or so, then fine. (It may be possible to pick [replacement] names for just the two initial SC4 classes, but time presses. The advantage would be that a migration from these names would thereby be avoided.) If not, we use durable and permanent as the sole storage types for the start of SC4, with the meanings described there.
      4. The SRM 1.1 interface will be used for the start of SC4.
      5. The consequence of points 3 & 4 is that HSM providers and sites must decide how permanent and durable storage areas can be implemented such that someone checking the information service can use the areas as required. This is to be done by February 27th.
      6. Thereafter, HSM providers and sites have a little longer to name the different storage types and implement all of these in the SRM 2.1 interface. The target is to have all of this done (and a migration path defined) such that we can move to the SRM 2.1 interface by October. SRM 2.1 interfaces that are not compatible with what is agreed are not to be introduced.
      7. Maarten Litmaath of CERN will be responsible for fostering discussions and convening meetings as necessary to ensure agreement for the SRM 2.1 introduction in October.

      Those present reviewed possible storage service classes that could be provided by storage systems, to see which were relevant for the LHC experiments. In the interest of time, we concentrated on the storage of large production data files rather than user files. The list of possible service classes is shown in the attached document. Service classes in bold font were identified as required for SC4; greyed-out service classes were thought to be of little relevance even in the long term.

      Three possible options for the experiments to specify which storage class to use were also discussed, although rather briefly.

      It was agreed that

      • For SC4 only
        1. Glue schema storage types “permanent” and “durable” should be used to refer to tape-backed storage (e.g. for raw data) and disk-resident storage (e.g. ESD files), respectively;
        2. as there are only two service classes, these are distinguished by using a separate SRM endpoint for each (see the sketch after this list).
      • In the long term
        1. Mass storage system providers, site representatives and experiments should agree a full classification of all the storage types, assigning a name to each. It was considered that we can add SRM types to refer to these once this is done, although care has to be taken for sites which support non-HEP users where issues other than file safety and ease of access are relevant (e.g. data can be deleted automatically after a set period of time without informing the user).
        2. Following this, agree an efficient/effective way for users to select amongst the available storage classes.
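
      To make point 2 of "For SC4 only" concrete, the sketch below shows how an experiment framework might route production files to one of the two agreed storage classes by picking the corresponding SRM 1.1 endpoint. This is a minimal illustration, not something presented at the BOF: the endpoint URLs, base paths, VO and file names are hypothetical placeholders; the only inputs taken from the discussion are the two class names ("permanent" for tape-backed data, "durable" for disk-resident data) and the agreement to use one endpoint per class.

      # Hypothetical sketch (Python): route a production file to one of the two
      # SC4 storage classes. Endpoints and VO paths below are placeholders.
      SC4_ENDPOINTS = {
          # storage class -> (SRM 1.1 endpoint, VO-specific base path)
          "permanent": ("srm://srm-tape.example.org:8443/srm/managerv1",
                        "/castor/example.org/grid/myvo/raw"),
          "durable":   ("srm://srm-disk.example.org:8443/srm/managerv1",
                        "/dpm/example.org/home/myvo/esd"),
      }

      def target_surl(storage_class, filename):
          """Build a full SURL for a production file in the given storage class."""
          endpoint, base = SC4_ENDPOINTS[storage_class]
          return "%s?SFN=%s/%s" % (endpoint, base, filename)

      # Raw data goes to the tape-backed ("permanent") area, ESD files to disk ("durable").
      print(target_surl("permanent", "run001234.raw"))
      print(target_surl("durable", "run001234.esd"))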

      [Discussion on Monday during coffee (Tony, Don, Timur, Don, Graeme): multiple SRM endpoints look difficult, but are probably not required. Advertise multiple Storage Areas (SAs) (with GlueSAType durable or permanent), each of which has an associated path which is VO-specific. To be discussed internally by each storage system provider; response in the next 10 days. Note: this is in the context of SRM 1.1, not SRM 2.1, which won't be used for (the start of) SC4.] It is still the case that the long-term solution is to be decided.
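
      To make the "advertise multiple Storage Areas" idea concrete, here is a minimal sketch (not part of the workshop material) of how a client could list the Storage Areas a site publishes and their GlueSAType. Only the GlueSAType attribute comes from the discussion above; the other Glue 1.x attribute names (GlueSAPath, GlueSAAccessControlBaseRule), the BDII host, port, base DN and VO name are assumptions or placeholders. It requires the python-ldap package.

      # Hypothetical sketch (Python): list the Storage Areas published by a site BDII
      # and their GlueSAType. Host and VO are placeholders; attributes assume Glue 1.x.
      import ldap  # python-ldap

      BDII = "ldap://bdii.example.org:2170"   # placeholder host; 2170 is the usual BDII port
      BASE = "mds-vo-name=local,o=grid"       # conventional BDII base DN

      conn = ldap.initialize(BDII)
      results = conn.search_s(
          BASE, ldap.SCOPE_SUBTREE,
          "(&(objectClass=GlueSA)(GlueSAAccessControlBaseRule=myvo))",  # placeholder VO
          ["GlueSAType", "GlueSAPath"],
      )
      for dn, attrs in results:
          sa_type = attrs.get("GlueSAType", [b"?"])[0].decode()
          sa_path = attrs.get("GlueSAPath", [b"?"])[0].decode()
          print("%s\n  type=%s  path=%s" % (dn, sa_type, sa_path))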

      At the closeout session on Monday, it was agreed that system providers should return with a proposal for how the GlueSAType can be advertised (see the attached Glue considerations document). It was further agreed that:

    • A clear message should be given to the experiments on how to interface with the SRM 1.1 client (i.e. how does FTS/GFAL interface to this?).
    • SC4 starts without SRM 2.1. SRM 2.1 interfaces to storage systems must support whatever is agreed as the future ontology/access system, but October remains the deadline for introducing an SRM 2.1 service.
    • Sites without tape must advertise their GlueSAType as durable.
    • A plan for the migration from SRM 1.1 to SRM 2.1 must be prepared. In particular, this must explain how to deal with the experiment catalogues (the catalogue stores a reduced SURL; does anything change in what is stored, e.g. the endpoint, ...???). See the sketch after this list.
    • Maarten Litmaath will coordinate/chair future meetings and is responsible for ensuring agreement and delivery of the SRM servers by September.
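
      The catalogue point above can be illustrated with a small sketch. It assumes (an assumption, not something agreed at the BOF) that a "reduced SURL" means an entry of the form srm://host/path, i.e. with the port and web-service part of the endpoint stripped off; under that assumption an SRM 1.1 to 2.1 migration only changes the client-side expansion rule, not what the catalogue stores. Host names, ports and web-service paths below are placeholders.

      # Hypothetical sketch (Python): expand a stored "reduced SURL" (srm://host/path)
      # into a full SURL for a given SRM version. All concrete values are placeholders.
      from urllib.parse import urlparse

      WS_PATH = {
          "1.1": ":8443/srm/managerv1",   # assumed SRM v1 web-service endpoint path
          "2.1": ":8443/srm/managerv2",   # assumed SRM v2 web-service endpoint path
      }

      def expand(reduced_surl, srm_version="1.1"):
          """Turn a stored reduced SURL into a full SURL for the chosen SRM version."""
          u = urlparse(reduced_surl)      # e.g. srm://srm-disk.example.org/dpm/...
          return "srm://%s%s?SFN=%s" % (u.hostname, WS_PATH[srm_version], u.path)

      stored = "srm://srm-disk.example.org/dpm/example.org/home/myvo/esd/run001234.esd"
      print(expand(stored, "1.1"))
      print(expand(stored, "2.1"))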