CERN Tier-0 Report
==================

CERN Grid services managed by GD:
---------------------------------

* Lemon update on gdrb01 to gdrb11 (RBs with LCG 2.7.0 installed).
* New classic SE volhcb01 in production. This SE will be used by LHCb to store their log files.
* rb101 to rb103 (gLite WMS) are now in production. The VOs supported are:
  - rb101: dteam, ops and atlas.
  - rb102: dteam, ops and cms.
  - rb103: dteam, ops, gear, unosat, sixt, na48 and geant4.
* UI lxplus configured to support the new WMS nodes (rb101 to rb103).
* A VOMRS upgrade to version 1.2.3 is planned for the first week of July on the production service lcg-voms.cern.ch. A detailed plan has been elaborated with FIO and FNAL.
* On June 21st lcg-voms.cern.ch suffered from bug https://savannah.cern.ch/bugs/?func=detailitem&item_id=15638
* On June 27th voms.cern.ch could not contact its database due to work on the Oracle server grid8.cern.ch. Users who have not yet registered on lcg-voms.cern.ch (the primary VOMS service) were affected by this.

CERN Grid services managed by FIO:
----------------------------------

* Problem with the configuration of the CEs (ce101 and ce102): the special accounts with suffix prd were not included in the /etc/security/limits.conf file, causing problems on the new RBs rb104 to rb108, which were unable to determine the exit status of the user's job (see for example GGUS ticket #9743); see the sketch after this list.
* Inconsistency between two rpms on rb104 to rb108:
  - Package edg-fabricMonitoring-agent-2.12.1-1, coming from Quattor/Lemon.
  - Package edg-fabricMonitoring-2.5.4-4, coming from the middleware (more precisely the metapackage lcg-RB), which provides the client for Lemon and Gridice.
* The upgrade to lcg-vomscerts-4.2.0-1 was not done automatically on rb104 to rb108 (this problem was discovered thanks to GGUS ticket #9476). The auto-update of the middleware packages had not been done before because the package apt-autoupdate is not installed by default by Quattor.
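A minimal sketch of the check implied by the limits.conf item above, in Python: it lists local accounts whose names end in "prd" and prints limits.conf-style entries for those that have none. The "hard" limit type and the nproc/nofile values are purely illustrative assumptions, not the settings actually deployed on the CEs.

    #!/usr/bin/env python
    # Hypothetical sketch: report "prd" pool accounts that have no explicit
    # entry in /etc/security/limits.conf and print example lines to add.
    # The limit items and values are illustrative only.

    import pwd

    LIMITS_FILE = "/etc/security/limits.conf"
    EXAMPLE_ITEMS = {"nproc": "4096", "nofile": "8192"}  # assumed example limits

    def configured_domains(path):
        """Return the set of domains (first column) already listed in limits.conf."""
        domains = set()
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    domains.add(line.split()[0])
        return domains

    def main():
        prd_accounts = [p.pw_name for p in pwd.getpwall() if p.pw_name.endswith("prd")]
        present = configured_domains(LIMITS_FILE)
        for account in prd_accounts:
            if account not in present:
                # Each limits.conf entry is a line "<domain> <type> <item> <value>".
                for item, value in EXAMPLE_ITEMS.items():
                    print("%s hard %s %s" % (account, item, value))

    if __name__ == "__main__":
        main()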
CASTOR2:
--------

* LSF license shortage: on Monday evening the CMS CASTOR2 instance started failing to get new licenses, and the failures continued during the night. As a workaround, until the new license file arrived, non-essential services (some batch nodes, LSF test instances) were shut down to free enough licenses to bring c2cms back into production. At 17:20 we received an updated license file with 7000 licenses for LSF 6.2, which was deployed a few minutes later.
* Oracle memory problem on the ATLAS CASTOR2 stager on Tuesday morning. The DBA took a dump and flushed the memory. The dump pointed to a deadlock problem in one of the Oracle procedures, which is now being investigated by the CASTOR2 developers. Many thanks to DES for the dump analysis.
* Wednesday night's power cut unfortunately also involved the CASTOR Oracle databases in the critical area. This caused a slightly slower startup of CASTOR and CASTOR2 than would normally be the case.
* The upgrade of the CASTOR2 instances to the latest version of the CASTOR software is in full swing. ALICE and LHCb were upgraded this week, and we are planning to upgrade CMS and ATLAS on Monday. This version brings support for the extended TURLs required to access 'durable' data from the SRM v1.1 endpoints that we are setting up.
* We have upgraded all CASTOR2 instances to the latest software version 2.1.0-3, and we will now start to bring the clients (lxplus, lxbatch, desktops, etc.) to the same level. The upgrades will start on Monday on a subset of lxplus nodes and on Linux desktops. Unless showstopper problems are reported, we will upgrade all of lxplus and lxbatch in July. An announcement was sent to the info-experiments mailing list on Wednesday June 28.
* We are continuing the deployment of diskservers in the LHC CASTOR instances, in accordance with the LCG resource planning.
* A full SRM request spool halted the srm.cern.ch service on Tuesday morning, between 02:00 and 09:30. The problem with a cleaner script has been fixed.
* The blocking issue for writing tape marks on SLC4 has now been identified and will be solved in the next SLC4 release. This will allow testing of CASTOR2 on 64-bit large-memory tape servers to improve performance.

LSF:
----

* We still experience long configuration times with LSF. A new mbatchd provided by Platform has not yet solved the problem, but a glitch in our configuration was found and fixed on Thursday. The mbatchd is running in debug mode to allow further investigation of the problem.