========================================================================
NA4/Medical imaging meeting at CERN, November 29, 2006
========================================================================

Participants:
=============
NA4: Csaba Anderlink, Hugues Benoit-Cattin, Arnaud Fessy, Daniel Jouvenot,
     Johan Montagnat, Christophe Pera, Romain Texier
Phone participants: Miklos Kozlovszky, Tuodor Gurov
JRA1/SA3: Akos Frohner, Claudio Grandi, David Smith, Elisabetta Molinari

Agenda:
=======
10:00 Status (J. Montagnat)
10:15 Application portals (H. Benoit-Cattin)
10:45 Web interface to medical image repository (A. Fessy)
11:00 Discussion on web portals
11:20 Application reports (All)
12:30 Lunch
14:00 gLite Data Management (A. Frohner)
14:45 gLite Workload Management System (E. Molinari)
15:30 AoB, conclusions
16:00 End of meeting

Status, (J. Montagnat)
======================
Upcoming milestones: the first user survey (MNA4.3, PM7) is being submitted;
the Biomedical demonstration (MNA4.2, PM10, Jan. 07) is coming soon, as is the
first summary and evaluation of application usage of EGEE-II services
(DNA4.2.1, PM11, Feb. 07). The call for candidate demonstrations is open.

The first user survey was submitted this week. Feedback was requested on
(i) user support mechanisms, (ii) the infrastructure and (iii) middleware
services. A single survey was filled in for all biomedical activities.
Concerning user support, it was reported that GGUS is satisfactory for solving
daily/local problems but that it hardly addresses the follow-up of longer-term
issues. It proved difficult to get support for service deployment. Regarding
the infrastructure, the VO is using the production infrastructure only.
Resource availability is high. However, support for VO services is not
satisfactory. The most used middleware services are the WMS (through the RB)
and the DMS (through file catalogs). The major middleware limitations
encountered are the inefficient processing of short jobs and limitations in
the management of sensitive data.

The VO makes use of many non-EGEE services (AMGA for metadata, the MOTEUR and
Taverna workflow managers, PGRADE and other application portals, the Medical
Data Manager, and the EncFile and Perroquet data security services). In the
project TCG, the main request made is for the deployment of secured data
management services. The VO gets very little voice there, and very little time
and resources could be allocated to the prioritized list of requirements and
to this request. There is a pending request to the OAG to support gLite DMS
services in a reasonably stable environment.

The second User Forum will be organized in conjunction with the next Open Grid
Forum (OGF: May 7-9, UF2: May 9-11). The common day (May 9) will be dedicated
to tracks of common interest (standards, application experience, business) and
the two remaining days will be dedicated to parallel thematic sessions,
demonstrations and posters. A call for abstracts will be released very soon
(short, text-only abstracts). Everybody is encouraged to participate.

The NA4 proposal for the EGEE-III document has just been circulated. It will
be used as input for the EGEE-III draft technical document to be assembled by
the end of this year. The intent is to organize the "applications" activity so
that it can efficiently manage and respond to a growing user community. The
roles of this activity are (i) to provide support to set up and start new VOs,
(ii) to build self-reliant communities and (iii) to work on advanced
application service requirements.
To improve efficiency, a hierarchical structure based on regional activities
should be set up. Resources should be concentrated in reasonably large,
co-located teams to which missions can be assigned depending on the emerging
needs for application and service development.

The current status of the medical imaging activity is difficult to assess.
There has been no global statistics collection by JRA2 since the end of EGEE
phase I. Although incomplete, these statistics provided a global metric
showing the evolution of the VO activity that is lacking today. The last
morning slot gives some input on the current status of application and
service development.

Application portals, (H. Benoit-Cattin)
=======================================
There is a need to deploy web portals that ease access to grids for a large
user community by hiding the grid's internal complexity. The experience with
the SiMRI MRI simulator showed that setting up an application portal really
boosted the usage of the application.

A portal is expected to enable the execution of existing code on a grid
infrastructure without recompilation or new application development. Ideally,
different computing resources (grids, local clusters...) should be accessible
transparently. The portal is also expected to enable access to several medical
image sources. The targeted applications are heavy computations using MPI, so
there is no need for interactivity. The input data sources might be local
files, grid files, DICOM storage or parameters. The outputs might be local or
grid files, or visualization results.

A portal is expected to enable status checking of running jobs. In addition,
it is very useful to keep a history of the jobs run, for efficiency (caching),
accounting and reporting reasons. There is also a need for user space
management (temporary storage of I/O data, job history, data sharing for user
groups). Server administration is of course needed (portal user management,
accounting...). A portal should improve the quality of service of the grid WMS
(automatic resubmission of failed jobs, management of groups of jobs, MPI
submission...). It should include authentication and security features. The
SIMRI portal only includes login/password authentication and uses a single
portal certificate; ideally, the user certificates should be managed and used
by the portal.

The SIMRI portal architecture is based on three layers. There are several
other possibilities: PGRADE, GENIUS, GridSphere... What is the best solution?

Discussion on portals
=====================
MK: In the grid area, you need to deal with constantly changing technologies:
    it is better to rely on an external portal.
MK: Interoperability between different infrastructures is possible through
    PGRADE. Certificate management is available, etc.
HBC: I want to make my (gridified) application available to end users. What is
     the most efficient way to go?
MK: The portal needs to be able to handle the user load. Management of users,
    accounting...
MK: For a pilot project, a small, home-made solution is really feasible. For
    longer term / more intensive usage, an existing portal is a good idea.
MK: The PGRADE team is happy to help with the porting of the SIMRI
    application.
AK: We have some experience with non-portal solutions: Tomcat-based and
    Apache-based solutions for deploying Web Services. There are tricks to
    work with proxies but that is all. The gridsite.org site has Apache
    modules for grid credentials/VOMS.
HBC: How to integrate my (gridified) application in a portal?
MK: Use GridSphere to wrap the application in a portlet and create a user
    interface very easily.
HBC: Will it offer job monitoring and accounting as well?
MK: PGRADE uses the same portlet API and provides additional functionality.
MK: How many users are you targeting?
HBC: Hundreds, up to thousands for SIMRI if the quality of service is good
     enough.
MK: The portal needs to manage the load.
HBC: Can I access DICOM storage?
JM: MDM should provide this functionality. It is currently blocked by the
    deployment of an operational file catalog, but otherwise almost ready (the
    deployment procedure is implemented; final tests are needed as soon as the
    DMS allows for it).
HBC: Does the grid provide long-term archiving?
JM: We had bad experiences with that: SEs have been wiped out in the past
    without any backup of the data.
CP: Grid sites have no policy for grid data archiving.
HBC: How to obtain the job history?
AF: The gLite L&B (development version) handles it but it is not deployed yet.
HBC: Is it possible to get notified about job status?
AF: No, polling is needed.
MK: Notification can be configured in PGRADE but it is usually not asked for
    and not used.
HBC: Is multiple submission possible (cancelling all jobs but one, once the
     first one has started)?
JM: See the Casanova paper at HPDC06: it does not really hurt in general, but
    with this kind of strategy the more resources you have, the faster your
    jobs, and the fewer resources you have, the later your jobs. It is an
    unfair strategy.
AK: Physicists do worse: they send pilot jobs to the nodes. I do not like it
    but this is done.
HBC: This is even more critical for long MPI jobs.
HBC: Management of multiple submissions?
MK: Regular submission over time is possible, as are workflows.
HBC: How to handle user certificates?
MK: In PGRADE, users are authenticated by password on the portal. The portal
    uses a MyProxy server to get the user certificate.
HBC: User space management?
AK: Separate it. For data, use the DMS. For jobs, use the gLite L&B. For
    groups of users there is VOMS.
DJ: The PTM3D graphical interface is connected to the grid file catalog. An
    HTTP protocol access is being developed.

PGRADE status, (Miklos Kozlovszky)
==================================
PGRADE v2.4 has been available since last September. The "parameter study"
portal will be available by the end of this month. A new, redesigned portal
will be announced in January, with better scaling capabilities. License: no
change, freely available for education/universities. There are moves to make
it open.

Web interface to medical image repository (Arnaud Fessy)
========================================================
MIP (the Medical Image Platform) is a web portal for medical images with
upload functionality, visualization, anonymization of medical data, metadata
extraction and registration in AMGA. GATE application jobs (Monte Carlo) are
initiated and their results archived from MIP. Plans: GridFTP upload,
authentication by certificates, job management with resubmission, and an
image query engine.

Application reports, (all)
==========================

Freesurfer brain image analysis (Csaba Anderlink)
-------------------------------------------------
Users: the medical imaging group in Bergen. There are plans to have a web
portal for Freesurfer. The needs are simple for now (data transfers, job
management). Simple authentication is handled with passwords. We are in
contact with the users. They have expressed requirements similar to those
discussed this morning. Data sharing among several users will be needed.
In terms of time scale, a prototype is expected by the end of January. From
the users' point of view, the benefit of the grid is to gain access to large
resources (thus enabling the processing of many subjects).
JM: See the PGRADE parameter study portal (privately available only for now),
    or the MOTEUR workflow manager for that kind of need.
AK: There is an Apache module for ACL control.
MK: Data management in PGRADE: file manager portlets for (1) storage elements
    and (2) user space for workflows on the portal, plus a quota manager.

THiS, (H. Benoit-Cattin)
------------------------
A script language for PET and hadrontherapy simulation has been designed. THiS
optimizes GEANT simulations in 3D space for the needs of complex medical
structures. The integration of a part of THiS inside GEANT is being discussed.
The project is progressing very rapidly.

Cardiovascular analysis, (H. Benoit-Cattin)
-------------------------------------------
Non-rigid 2D-3D registration. The gridification is almost completed: an MPI
version is currently being debugged. It will reuse the SiMRI gridification
framework and portal. SiMRI3D: temporary problem with the CC-IN2P3 UI.

gPTM3D, (D. Jouvenot)
---------------------
Integration of the grid file catalog browser in the official gPTM3D
development branch.

MDM, (R. Texier)
----------------
Tests of the deployment procedure are temporarily suspended due to the
unavailability of the FiReMAN service. They will restart soon.

MOTEUR, (J. Montagnat)
----------------------
New developments are ongoing to enable the use of complex types in the
orchestrated Web Services.

SDJ
---
Will be ready in gLite 3.1 from a middleware point of view.

gLite Data Management (A. Frohner)
==================================
DM components available in gLite 3.0:
- DPM storage element
- FTS (File Transfer Service)
- file catalogs (LFC, FiReMAN)
- metadata catalog (AMGA)
- POSIX I/O and CLI (GFAL and lcg-utils, gLiteIO)
- encrypted storage (EDS library and Hydra)

SRMv2 is targeted for fine-grained access control (ACLs) in every SE. The GFAL
file catalog interface and lcg_utils are already SRMv2.2 compliant. The DPM
and FTS should be SRMv2 compliant by the end of this year. There is work on
CASTOR and dCache to move towards SRMv2, but with no guaranteed timeline.

GFAL should be interconnected to the EDS library soon. GFAL support in the CLI
and Csec encryption are planned for Feb. 07. A POSIX interface is also planned
but without a timeline. Communications with the LFC are authenticated but not
encrypted. The encryption key splitting strategy is implemented and its
integration with EDS is planned for Feb. 07. The DPM HTTPS transfer protocol
is planned for Feb. 07.

A biomed request is to get grid-wide ACL synchronization (to synchronize both
the key store ACLs and the ACLs on SEs in SRMv2). The EDS CLI should ensure
"best effort" consistency (planned for April 07). A dedicated synchronization
service might be available in June 07 (depending on the design).

DPM-DICOM integration has been mentioned. DPM support for a secondary storage
back-end is planned for March 07. DICOM could then be integrated as a specific
type of secondary storage.

For using the FiReMAN FC on the production infrastructure:
- The gLiteIO client is deployed in gLite 3.0.
- A FiReMAN server needs to be installed.
- A gLiteIO server is needed on the SEs.
It could be possible to ask a site to install a biomed VOBOX (host with root
access). On such a box it should be feasible to install a gLiteIO server.
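
As an illustration of the "POSIX I/O and CLI" layer listed above, a minimal
sketch of the usual lcg-utils workflow against an LFC is shown below. The VO
name (biomed) is real, but the SE host, the LFN and the local paths are
placeholders, and the options should be checked against the gLite UI release
actually installed.

  # Copy a local file to a storage element and register it in the LFC
  # (se.example.org and the LFN are hypothetical).
  lcg-cr --vo biomed -d se.example.org \
         -l lfn:/grid/biomed/demo/image001.hdr \
         file:///tmp/image001.hdr

  # List the replicas known to the catalog for this logical file name.
  lcg-lr --vo biomed lfn:/grid/biomed/demo/image001.hdr

  # Copy the file back to local disk using its logical file name.
  lcg-cp --vo biomed lfn:/grid/biomed/demo/image001.hdr \
         file:///tmp/image001_copy.hdr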

gLite WMS, (Elisabetta Molinari)
================================
This talk focuses on the gLite WMS and its main differences with LCG-2, the
file catalog interfaces, the high-level job control tools and the latest WMS
tests.

The WMS is a complex interaction of components. It includes in particular the
WMProxy (a Web Service interface to the WMS), the Workload Manager itself (the
main component), and the L&B (job monitoring). The matchmaking process is
based on JDL requirements and ranks. The rank is computed as a function of the
number of free slots in the queues and the CPU power. It sometimes proves not
to be very reliable. Therefore, fuzzy ranking is possible (through the JDL
attribute FuzzyRank = true): it selects "one of the best" sites, not
necessarily the best one.

New components in the gLite WMS are:
- A task queue to periodically retry non-matching requests (jobs are seen as
  submitted from the user's point of view).
- Shallow resubmission: resubmission of jobs that failed before the user job
  started running (improves the success rate a lot).
- Information supermarket: a cache repository of IS data on available
  resources (improves performance a lot).
- LBProxy: L&B server.
- CondorC: between the WM and the CE (more reliable than GRAM).

The new features are:
- Parametric jobs: the "PARAM" keyword in the JDL is substituted by a varying
  parameter.
- Job collections: sets of jobs all collected together (handled as DAGs).
  There is a bundle job ID. The jobs share their input sandbox. The submission
  is bulk (single authentication).
- DAGs: collections with dependencies.
- VOViews: access rules to the CEs can be specified for different VOs and
  different groups inside VOs.
- Automatic zipping of sandboxes directly to the UI (not going through the
  RB).
- Short Deadline Jobs.
- High load limiter: prevents new submissions to overloaded RBs (the UI may
  have a round-robin RB policy).
- Prolog and epilog: pre- and post-scripts around the user job (prolog and
  epilog are not considered part of the job: a failure in the prolog causes
  the job to be shallow-resubmitted; the epilog is executed after the end of
  MPI jobs).
- Job perusal: inspection of a job's file content while the job is running
  (for a specific list of files specified at submission time).

The WMS is interfaced to different data catalogs. The data catalog is
indicated through the 'DataRequirements' JDL tag value: DLI (Data Location
Interface, for LFC and DPM) or SI (gLite Storage Index, for FiReMAN).

Job types supported:
- interactive stdin/stdout streams
- DAGs
- collections
- parametric jobs

High-level job control tools:
- WSDL for job submission (WMProxy) + CLI + APIs
- L&B APIs

Deployment status: the gLite 3.0 WMS is too buggy and not really deployed. A
controlled number of new instances have been intensively tested on the PPS
(CMS and ATLAS tests) and should become available on the production
infrastructure very soon. Performance results on the development version show
0.5 s per job submission, 2.5 h to submit 20k jobs, and very few failures due
to the WMS itself (failures are usually due to application or site-specific
problems).

Plans:
- High availability of the RB (more robust).
- Job provenance: retain data on finished jobs.
- ICE: interface to the WS-based CE.
- DGAS: storage accounting system.

JM: Will the polling latency be reduced?
This will improve when using CondorG and CREAM: currently, a job monitor
parses the Condor log file regularly to get notifications. With ICE, there
will be a real notification mechanism.
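
For illustration, the sketch below puts several of the JDL attributes
mentioned in this talk (parametric jobs, fuzzy ranking, shallow resubmission,
prolog/epilog, job perusal, DataRequirements) into a single job description.
It assumes gLite 3.1-style JDL; the executable, the helper scripts, the file
names, the catalog endpoint and the LFN are placeholders, and not every
attribute is necessarily valid for every job type or WMS release, so it
should be checked against the deployed middleware before use.

  [
    JobType = "Parametric";
    Executable = "simulate.sh";            // hypothetical user script
    Arguments  = "_PARAM_";                // replaced by the varying parameter
    Parameters     = 100;                  // parameter values 0, 1, ..., 99
    ParameterStart = 0;
    ParameterStep  = 1;

    InputSandbox  = { "simulate.sh", "setup_env.sh", "collect_output.sh" };
    OutputSandbox = { "result._PARAM_.dat", "std.out", "std.err" };
    StdOutput = "std.out";
    StdError  = "std.err";

    // Matchmaking: rank CEs by free CPUs and pick "one of the best"
    Rank = other.GlueCEStateFreeCPUs;
    FuzzyRank = true;

    // Shallow resubmission: retry jobs that fail before the payload starts
    ShallowRetryCount = 3;
    RetryCount = 0;

    // Pre- and post-scripts around the user job (hypothetical scripts)
    Prolog = "setup_env.sh";
    Epilog = "collect_output.sh";

    // Job perusal: allow inspection of job files while the job runs
    PerusalFileEnable = true;

    // Schedule close to the input data, resolved through an LFC via DLI
    DataRequirements = {
      [
        DataCatalogType = "DLI";
        DataCatalog = "http://lfc-biomed.example.org:8085/";  // placeholder
        InputData = { "lfn:/grid/biomed/demo/phantom.mhd" };  // placeholder
      ]
    };
  ]

Submission of such a description would go through the WMProxy command line
(for instance glite-wms-job-submit -a jobs.jdl), which performs the bulk
submission mentioned above; the exact client commands again depend on the
installed UI version.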