Emails on procurement planning and resource usage scheduling
-----Original Message-----
From: Alberto Aimar
Sent: 28 November 2005 18:38
To: worldwide-lcg-management-board (LCG Management Board)
Subject: Procurement Plans Issue
Dear MB members,
For some sites, if I have not misunderstood their milestone plans, there is a considerable mismatch between:
- what is planned in terms of hardware procurement at the Tier1 site, and
- what is being signed in the WLCG MoU Annexe 6.3 (Tier1 Computing Capabilities):
https://uimon.cern.ch/twiki/pub/LCG/SitesPlans/PagesfromMoU_Annexe_6_3.pdf
To simplify the analysis, I have summarized the procurement milestones of all Tier1 sites in a single (short) Excel table:
https://uimon.cern.ch/twiki/pub/LCG/SitesPlans/ProcurementPlans-200511.xls
(planned values in white, MoU values in pink)
I have added this issue as the last item on tomorrow's MB agenda because it would be useful to clarify it.
Regards.
Alberto.
-----Original Message-----
From: Kors Bos [mailto:bosk@nikhef.nl]
Sent: 28 November 2005 22:00
To: Alberto Aimar
Cc: worldwide-lcg-management-board (LCG Management Board); Jeff Templon
Subject: Re: Procurement Plans Issue
Alberto, All,
Of course there is a mismatch between what we plan to install in 2006 and what we said we would install in the Phase-II-Planning tables. For the P2P estimates we took what the experiment TDRs said was needed. We have always said that we would be as conservative as possible in installing this equipment, as our money would be worth more if we could wait longer, and this would very much be in the interest of the experiments as well.
In this service phase of Service Challenge 3 we have seen such low demand from the experiments that we are very reluctant to invest any more money in additional hardware at this point in time, as the current infrastructure is already over-dimensioned by a large factor: over the past 3 months a few TBytes have been written and nothing read back, and the farms are running idle as I speak (not speaking about CMS, as we don't have that VO in
I think it is unrealistic to maintain the overall planning for 2006 as it is written in the MoU tables, as we simply will not do it. We are not going to spend a lot of money just because it is written in some tables and then let it sit unused. If this means not signing the MoU, so be it (although I am not the person to decide that).
I am very worried about this development, as I have said many times. It means that the experiments want us to scale up by orders of magnitude by the time the beam comes on without ever having had the chance to test this. We have now written a few TBytes over three months, something we should be able to do in a few hours in 2007, and we have still never read anything back, something we should be doing constantly. We are still far from knowing what we can expect under real data-taking circumstances, and there are only 487 days left.
Kors
From: Les Robertson
Sent: 01 December 2005 09:06
To: worldwide-lcg-management-board (LCG Management Board); Jeff Templon; Chris Eck
Subject: RE: Procurement Plans Issue
Dear MB Members,
Kors raises important issues here that I think need to be discussed at the next MB.
The Phase 2 Planning Group intended to collect the plans of the regional centres. These are of course based on the requirements of the experiments, but they also had to take account of the funding possibilities and, very importantly, the need for each centre to ramp up its services in terms of reliability, availability, functionality, performance, capacity, and so on. Kors has pointed out that the experiments are not at present filling the available capacity, and Ricardo and others have noted that this is partly because the services are not yet working smoothly. We also know that we failed to achieve the data transfer goals of SC3 (which we had thought rather modest) before the summer, and it looks as if it will be April next year before we are able to complete them. This means that a lot of the equipment needed to get us up to the target performance (tape drives, disk servers, network switches, WAN links, ...) will have lain idle for long periods, but that is unavoidable. Recent experience proves that we will definitely not achieve the performance we need when the accelerator starts running if we do not go through a lengthy learning process.
In terms of Service Challenges for the Service Providers, we have got stuck for the moment on data distribution, but a substantial set of problems will undoubtedly be uncovered as we expand to the fuller set of services that the Tier-0 and Tier-1s must provide. Adding the less predictable workload of calibration, re-processing, ESD/AOD skimming, etc. places significant demands on the disk and LAN infrastructure, and it is important to show that our design estimates stand up in practice. The experiments need to test their full software chain, which of course adds complications to the testing cycles. It may be that we should plan some more artificial tests that concentrate on the infrastructure, as we did this year for data transfer. This should be discussed at the Mumbai workshop.
I also wonder whether we should set up an MB working group to assemble the overall resource requirements for LCG over the next twelve months and come up with an overall schedule: one person from each experiment and 3-4 people from the sites. I am worried about each site drawing its own independent conclusions about the needs for next year, when we have to demonstrate the full data rates (resource intensive) as well as run decent services for the experiments. While the CPU capacity may be over-sized for 2006, I doubt that the disk systems will be. For these, it is not capacity that matters but accessibility.
A final, but very important, point: I really do think it is important that sites put their real and realistic plans into the MoU tables. We are asking the funding agencies to make real commitments and stick to them. The MoU process was specifically constructed so that firm commitments are only made for the coming year. Please look at your plans and the MoU tables for 2006, and if there are discrepancies, please change them now so that the MoU can be signed with the real numbers. We must take this seriously if we expect the agencies to do so.
Regards, Les
From: Gordon, JC (John) [mailto:J.C.Gordon@rl.ac.uk]
Sent: 01 December 2005 10:26
To: Les Robertson; worldwide-lcg-management-board (LCG Management Board); Jeff Templon; Chris Eck
Subject: RE: Procurement Plans Issue
Les, I (and my colleagues) recognise the requirement to test the LHC computing systems in advance at their proposed level of complexity, and we have plans to install all we can afford, and PPARC will commit to this. However, PPARC as a funding body are not at all convinced of the point of installing computing resources that only get used for a small fraction of the time during these tests. They submit us to external review, and the reviewers are appalled at our levels of utilisation. Each year that we show these unused resources, it gets harder to convince them to commit. There is a real risk that one year they will refuse completely and not commit to the MoU until they see that there is pressure on the existing resources. They did that this year and we had to fight hard to reverse the decision; that is why our installation plan is later than we forecast in earlier P2P meetings. Next year it will be harder still. They have approved the funding, but it is hard to make the case for its release when we will get closer to the experiments' 2008 requirements the longer we wait. I have raised this issue before at the GDB, and my prediction came true this year.
John
-----Original Message-----
From: Mirco Mazzucato [mailto:mirco.mazzucato@pd.infn.it]
Sent: 01 December 2005 15:34
To: John Gordon
Cc: Les Robertson; worldwide-lcg-management-board (LCG Management Board); Jeff Templon; Chris Eck
Subject: Re: Procurement Plans Issue
Dear Les,
just to confirm that I have similar difficulties at INFN, even if I try, with more or less success, to overcome them by opening the Center more to currently running experiments.
Cheers
Mirco
From: Les Robertson
Sent: 01 December 2005 17:56
To: John Gordon; worldwide-lcg-management-board (LCG Management Board); Jeff Templon
Subject: RE: Procurement Plans Issue
Dear John
I agree that we need to improve our understanding of exactly which resources the experiments need for their work, when, and at which sites. But we have to be clear, to ourselves and to the funding agencies, that we are not at present in a mode where systems are expected to be kept full throughout the year. At present, in many cases, the oversight/funding agencies should be more concerned with building up performance and demonstrating readiness than with sustained throughput.
Regards, Les
From: Holger Marten
Sent: 01 December 2005 14:03
To: worldwide-lcg-management-board (LCG Management Board)
Subject: AW: Procurement Plans Issue
Hi Alberto,
Concerning GridKa:
There is an apparent mismatch between the disk milestones and what has been written in the MoU ("missing" 35 TB). The reason is that we submitted our planning for extending the **dCache disk space**. More than 40 TB of additional disk space is already in use by the LHC experiments; it is not dCache and thus has to be added to the total!
Comparing the tape plans with the MoU numbers, 20 TB are missing. This is correct, as we are still discussing whether to extend our existing robots with slots or go for another new robot system at the beginning of 2006. It is likely that these discussions will continue in January, and I didn't want to promise any timescales in the current version of the milestones document.
Regards,
Holger
-----Original Message-----
From: Jeff Templon [mailto:templon@nikhef.nl]
Sent: 01 December 2005 17:04
To: Les Robertson
Cc: Gordon, JC (John); worldwide-lcg-management-board (LCG Management Board)
Subject: Re: Procurement Plans Issue
We are developing a min/max system here. We will give Alberto Aimar the "min" stuff ... this is what we will deploy regardless of how it gets used, because we want to start building up certain parts of the infrastructure that are more or less capacity independent. Then there is a "max", which is what is in the MoU. We try to make sure we can deploy this IF IT IS NEEDED in an expeditious manner. But if we are clearly over-dimensioned, we are not going to blindly follow the MoU capacity tables. Every day we wait, things get cheaper and better.
This of course is from the Tier-1 point of view. From the experiments' point of view, every day we wait is another day you don't have to figure out what bugs are left. Bring it on.
JT
-----Original Message-----
From: Dominique Boutigny [mailto:boutigny@in2p3.fr]
Sent: 02 December 2005 09:47
To: Les Robertson
Cc: worldwide-lcg-management-board (LCG Management Board)
Subject: Re: Procurement Plans Issue
Hi Les,
Concerning IN2P3, we have a rather strict budget profile between now and 2010 and we should stick to it. Even if it is financially more interesting to buy the hardware later, it is not possible to change the profile. We will buy the hardware according to the MoU as long as we get the budget. If the experiments are not able to use all the resources immediately after they are made available at CC-IN2P3, then it is up to the French LCG collaboration, and more generally to WLCG, to help them overcome the difficulties. In any case, the resources declared in the MoU for 2006 are pledged, so they have to be realistic numbers.
Best regards,
Dominique
-----Original Message-----
From: Federico Carminati
Sent: 05 December 2005 11:05
To: boutigny@in2p3.fr
Cc: Les Robertson; worldwide-lcg-management-board (LCG Management Board)
Subject: Re: Procurement Plans Issue
Dear All,
in relation to this thread, it is important to note that the under-utilisation of the resources comes from two facts:
- Lack of proper planning by the experiments
- Problems in the availability / usability of the resources
If the LHC software were working perfectly, and all the sites were perfectly configured and available 100% of the time, much more of the resources would have been used. Of course not 100% of them, because experiments (or at least
plan perfectly. In any case, I would like to see the discussion focused more on "how to make resources 100% (well, as close as possible) available" rather than "experiments used less than promised, so let's buy fewer resources". I think the discussion as it is now tends to focus more on the lack of planning (which is indeed real to some extent) than on the effective usability of the resources deployed (which is FAR from 100%). The lack of an agreed metric does not help this debate to become factual.
Best,
Federico Carminati
CERN-PH
1211
Tel: +41 22 76 74959
Fax: +41 22 76 79480