CERN-PROD (Tier-0/1)
====================

* Added two new lcg-CE machines, CE105 and CE106, to relieve CE101 and CE102. Unfortunately, most of the jobs still go to the first two machines.
* Release of gLite-3.0 WMS & WN: All of lxbatch has been upgraded to gLite-3.0. Two new CEs (ce103 & ce104) as well as three new RBs have been put in production.

CASTOR:
------
* Last Thursday, at 17:00, we saw heavy lock activity on the Atlas stager database, the result of many jobs without associated data transfer. This was possibly caused by the problems on one of the Atlas LFC nodes. Eric, Nilo and Miguel worked until 3 AM to solve the problem (with an improved SQL procedure) and clean the DB. The Castor instance continued to function during this period, but requests took up to 30 minutes to be processed.
* Also on Thursday: CMS reported many failing PhEDEx SRM transfers. These were traced to bugs in the Castor stager_qry procedures. A manual (partial) clean-up was performed to allow PhEDEx transfers to work properly again. The offending procedure was also fixed on Monday. Unfortunately the bug was re-introduced on Monday night; the fix was re-applied at noon on Tuesday.
* The Castor development team have announced that they will have a database 'hotfix' this Friday. More manual cleanup work will still be needed, including on the other Castor-2 instances.
* An optimized version of a garbage collector procedure was introduced in production on the CMS stager. Unfortunately it contained a bug, and filesystems were no longer cleaned, leaving files in a messy state and requiring more cleanup work.
* The staged rollout of castor client s/w 2.1.0-3 on the central services and the linux desktops has started. No problems have been reported, so we will deploy this version on Monday July 10.
* We have announced durable SRM endpoints to Atlas and LHCb.
* We are adding >50 diskservers to the Alice and CMS stagers to prepare for their DAQ-T0-T1 tests.

LXBATCH/GRID:
------------
* LXBATCH has been upgraded to gLite 3.0.1.
* Three new gLite-2.7 CE machines have been added to remedy the situation with the heavily overloaded production CEs, ce101 and ce102.
* The gLite 3.0 installation at CERN-PROD is complete and was announced to the GRID.
* We are preparing to enable write access to AFS for the grid software management accounts via gssklog. The necessary patches to the job starter have been made, and the new RPMs are under test. The change was announced via broadcast on 5-7-2006. If no complaints are received and no problems are found, they will be rolled out early next week.

Special Clusters:
----------------
* 10 lxbuild machines installed for ATLAS: 4 with SLC3/32, 5 with SLC4/64, 1 with SLC4/32.

Fabric Developments:
-------------------
Quattor:
- CDB: Minor fixes, reported by outside users, applied to the CDB shipped with Quattor 1.2. The CERN-CC CDB will be upgraded to Quattor 1.2 before the end of the month.
- Secure transfer of profiles with X.509 authentication under testing:
  . tests for already installed clients successfully done;
  . tests for new node installation underway (modifications to PrepareInstall required).
  Deployment will follow ASAP; phasing-out of clear HTTP access to be carefully planned, since other services (i.e., CDBSQL/Remedy) depend on it.

Lemon:
- Preparing a new version of lemon-sensor-exception, providing in particular support for defining exceptions on metrics reported on behalf of remote entities.
- Much work has been done to track down CDB synchronization issues between SURE and LAS alarms.
- Following the official public release announcement of edg-fabricMonitoring-2.12.1-2 last week, many bugs have been found in the ncm component and pan-templates and fed back to the development team at CERN.

Castor:
- A new release is being built: 2.1.0-4, including fixes for various bugs (among them rfcp and missing ns commands).
- xrootd plugin testing is stalled due to the lack of response from the XROOTD developer.
- rootd improvements for the new TURL are being tested by the ROOT team.
- The problem with stager_qry reported by CMS is now understood, and a workaround will be deployed ASAP.

CERN Grid Operations managed by GD:
-----------------------------------
* New classic SE voatlas01 is in production. This SE will be used by Atlas to store their log files.
* VO services suffered 2 short unscheduled interruptions, with loss of service to some users, due to hardware and configuration errors. Further details at https://uimon.cern.ch/twiki/bin/view/LCG/LcgScmStatusAas
* One security incident was reported from an EGEE site; it turned out to be an effective denial of service against the CE caused by a middleware failure.

GridView:
---------
* New Job Monitoring as well as SAM monitoring graphs are finished and will be put into production next week.
* Work on FTS monitoring has started.