========================================================================
NA4/Medical imaging meeting at CERN, November 29, 2006
========================================================================

Participants:
=============
NA4: Csaba Anderlink, Hugues Benoit-Cattin, Arnaud Fessy, Daniel Jouvenot,
     Johan Montagnat, Christophe Pera, Romain Texier
Phone participants: Miklos Kozlovszky, Tuodor Gurov
JRA1/SA3: Akos Frohner, Claudio Grandi, David Smith, Elisabetta Molinari

Agenda:
=======
10:00 Status (J. Montagnat)
10:15 Application portals (H. Benoit-Cattin)
10:45 Web interface to medical image repository (A. Fessy)
11:00 Discussion on web portals
11:20 Application reports (All)
12:30 Lunch
14:00 gLite Data Management (A. Frohner)
14:45 gLite Workload Management System (E. Molinari)
15:30 AoB, conclusions
16:00 End of meeting

Status, (J. Montagnat)
======================
Upcoming milestones: the first user survey (MNA4.3, PM7) is being submitted;
the Biomedical demonstration (MNA4.2, PM10, Jan. 07) is coming soon, as is the
first summary and evaluation of application usage of EGEE-II services
(DNA4.2.1, PM11, Feb. 07). The call for candidate demonstrations is open.

The first user survey was submitted this week. Feedback was requested on
(i) user support mechanisms, (ii) the infrastructure and (iii) middleware
services. A single survey was filled in for all biomedical activities.
Concerning user support, it was reported that GGUS is satisfactory for solving
daily/local problems but that it hardly addresses the follow-up of longer-term
issues. It proved difficult to get support for service deployment. Regarding
the infrastructure, the VO is using the production infrastructure only.
Resource availability is high. However, support for VO services is not
satisfactory. The most used middleware services are the WMS (through the RB)
and the DMS (through file catalogs). The major middleware limitations
encountered are the inefficient processing of short jobs and limitations in
the management of sensitive data.

The VO makes use of many non-EGEE services (AMGA for metadata, the MOTEUR and
Taverna workflow managers, PGRADE and other application portals, the Medical
Data Manager, and the EncFile and Perroquet data security services). In the
project TCG, the main request made is for the deployment of secured data
management services. The VO gets very little voice there, and very little time
and resources could be allocated to the prioritized list of requirements and
to this request. There is a pending request to the OAG to support gLite DMS
services in a reasonably stable environment.

The second User Forum will be organized in conjunction with the next Open Grid
Forum (OGF: May 7-9, UF2: May 9-11). The common day (May 9) will be dedicated
to tracks of common interest (standards, application experience, business) and
the two remaining days will be dedicated to parallel thematic sessions,
demonstrations and posters. A call for abstracts will be released very soon
(short, text-only abstracts). Everybody is encouraged to participate.

The NA4 proposal for the EGEE-III document has just been circulated. It will
be used as input for the EGEE-III draft technical document to be assembled by
the end of this year. The intent is to organize the "applications" activity so
that it can efficiently manage and respond to a growing user community. The
roles of this activity are (i) to provide support to set up and start new VOs,
(ii) to build self-reliant communities and (iii) to work on advanced
application service requirements.
To improve efficiency, a hierarchical structure based on regional activities
should be set up. Resources should be concentrated in reasonably large,
co-located teams to which missions can be assigned depending on the emerging
needs for application and service development.

The current status of the medical imaging activity is difficult to assess.
There has been no global statistics collection by JRA2 since the end of EGEE
phase I. Although incomplete, these statistics provided a global metric
showing the evolution of the VO activity that is lacking today. The last
morning slot gives some input on the current status of application and
service development.

Application portals, (H. Benoit-Cattin)
=======================================
There is a need to deploy web portals that ease access to grids for a large
user community by hiding the grid's internal complexity. The experience with
the SiMRI MRI simulator showed that setting up an application portal really
boosted the usage of the application.

A portal is expected to enable the execution of existing code on a grid
infrastructure without recompilation or new application development. Ideally,
different computing resources (grids, local clusters...) should be accessible
transparently. The portal is also expected to enable access to several medical
image sources. The targeted applications are heavy computations using MPI, so
there is no need for interactivity. The input data sources might be local
files, grid files, DICOM storage or parameters. The outputs might be local or
grid files, or visualization results.

A portal is expected to enable status checking of running jobs. In addition,
it is very useful to keep a history of the jobs run, for efficiency (caching),
accounting and reporting reasons. There is also a need for user space
management (temporary storage of I/O data, job history, data sharing for user
groups). Server administration is of course needed (portal user management,
accounting...). A portal should improve the quality of service of the grid WMS
(automatic resubmission of failed jobs, management of groups of jobs, MPI
submission...). It should include authentication and security features. The
SIMRI portal only includes login/password authentication and uses a single
portal certificate; ideally, the user certificates should be managed and used
by the portal.

The SIMRI portal architecture is based on three layers. There are several
other possibilities: PGRADE, GENIUS, GridSphere... What is the best solution?

Discussion on portals
=====================
MK: In the grid area, you need to deal with constantly changing technologies:
    it is better to rely on an external portal.
MK: Interoperability between different infrastructures is possible through
    PGRADE. Certificate management is available, etc.
HBC: I want to make my (gridified) application available to end users. What is
     the most efficient way to go?
MK: The portal needs to be able to handle the user load. Management of users,
    accounting...
MK: For a pilot project, a small, home-made solution is really feasible. For
    longer term / more intensive usage, an existing portal is a good idea.
MK: The PGRADE team is happy to help with the porting of the SIMRI
    application.
AK: We have some experience with non-portal solutions: Tomcat-based and
    Apache-based solutions for deploying Web Services. There are tricks to
    work with proxies but that is all. The gridsite.org site has Apache
    modules for grid credentials/VOMS.
HBC: How to integrate my (gridified) application in a portal?
MK: Use GridSphere to wrap the application in a portlet and create a user
    interface very easily.
HBC: Will it offer job monitoring and accounting as well?
MK: PGRADE uses the same portlet API and provides additional functionality.
MK: How many users are you targeting?
HBC: Hundreds, up to thousands for SIMRI if the quality of service is good
     enough.
MK: The portal needs to manage the load.
HBC: Can I access DICOM storage?
JM: MDM should provide this functionality. It is currently blocked by the
    deployment of an operational file catalog, but otherwise almost ready (the
    deployment procedure is implemented; final tests are needed as soon as the
    DMS allows for it).
HBC: Does the grid provide long-term archiving?
JM: We had bad experiences with that: SEs have been wiped out in the past
    without any backup of the data.
CP: Grid sites have no policy for grid data archiving.
HBC: How to obtain the job history?
AF: The gLite L&B (development version) handles it but it is not deployed yet.
HBC: Is it possible to get notified about job status?
AF: No, polling is needed.
MK: Notification can be configured in PGRADE but it is usually not asked for
    and not used.
HBC: Is multiple submission possible (cancelling all jobs but one, once the
     first one has started)?
JM: See the Casanova paper at HPDC06: it does not really hurt in general, but
    with this kind of strategy the more resources you have, the faster your
    jobs, and the fewer resources you have, the later your jobs. It is an
    unfair strategy.
AK: Physicists do worse: they send pilot jobs to the nodes. I do not like it
    but this is done.
HBC: This is even more critical for long MPI jobs.
HBC: Management of multiple submissions?
MK: Regular submission over time is possible, as are workflows.
HBC: How to handle user certificates?
MK: In PGRADE, users are authenticated by password on the portal. The portal
    uses a MyProxy server to get the user certificate.
HBC: User space management?
AK: Separate it. For data, use the DMS. For jobs, use the gLite L&B. For
    groups of users there is VOMS.
DJ: The PTM3D graphical interface is connected to the grid file catalog. An
    HTTP protocol access is being developed.

PGRADE status, (Miklos Kozlovszky)
==================================
PGRADE v2.4 has been available since last September. The "parameter study"
portal will be available by the end of this month. A new, redesigned portal
will be announced in January, with better scaling capabilities. License: no
change, freely available for education/universities. There are moves to make
it open.

Web interface to medical image repository (Arnaud Fessy)
========================================================
MIP (the Medical Image Platform) is a web portal for medical images with
upload functionality, visualization, anonymization of medical data, metadata
extraction and registration in AMGA. GATE application jobs (Monte Carlo) are
initiated and their results archived from MIP. Plans: GridFTP upload,
authentication by certificates, job management with resubmission, and an
image query engine.

Application reports, (all)
==========================

Freesurfer brain image analysis (Csaba Anderlink)
-------------------------------------------------
Users: the medical imaging group in Bergen. There are plans to have a web
portal for Freesurfer. The needs are simple for now (data transfers, job
management). Simple authentication is handled with passwords. We are in
contact with the users. They have expressed requirements similar to those
discussed this morning. Data sharing among several users will be needed.
In terms of time scale, a prototype is expected by the end of January. From
the users' point of view, the benefit of the grid is to gain access to large
resources (thus enabling the processing of many subjects).
JM: See the PGRADE parameter study portal (privately available only for now),
    or the MOTEUR workflow manager for that kind of need.
AK: There is an Apache module for ACL control.
MK: Data management in PGRADE: file manager portlets for (1) storage elements
    and (2) user space for workflows on the portal, plus a quota manager.

THiS, (H. Benoit-Cattin)
------------------------
A script language for PET and hadrontherapy simulation has been designed. THiS
optimizes GEANT simulations in 3D space for the needs of complex medical
structures. The integration of a part of THiS inside GEANT is being discussed.
The project is progressing very rapidly.

Cardiovascular analysis, (H. Benoit-Cattin)
-------------------------------------------
Non-rigid 2D-3D registration. The gridification is almost completed: an MPI
version is currently being debugged. It will reuse the SiMRI gridification
framework and portal. SiMRI3D: temporary problem with the CC-IN2P3 UI.

gPTM3D, (D. Jouvenot)
---------------------
Integration of the grid file catalog browser in the official gPTM3D
development branch.

MDM, (R. Texier)
----------------
Tests of the deployment procedure are temporarily suspended due to the
unavailability of the FiReMAN service. They will restart soon.

MOTEUR, (J. Montagnat)
----------------------
New developments are ongoing to enable the use of complex types in the
orchestrated Web Services.

SDJ
---
Will be ready in gLite 3.1 from a middleware point of view.

gLite Data Management (A. Frohner)
==================================
DM components available in gLite 3.0:
- DPM storage element
- FTS (File Transfer Service)
- file catalogs (LFC, FiReMAN)
- metadata catalog (AMGA)
- POSIX I/O and CLI (GFAL and lcg-utils, gLiteIO)
- encrypted storage (EDS library and Hydra)

SRMv2 is targeted for fine-grained access control (ACLs) in every SE. The GFAL
file catalog interface and lcg_utils are already SRMv2.2 compliant. The DPM
and FTS should be SRMv2 compliant by the end of this year. There is work on
CASTOR and dCache to move towards SRMv2, but with no guaranteed timeline.

GFAL should be interconnected to the EDS library soon. GFAL support in the CLI
and Csec encryption are planned for Feb. 07. A POSIX interface is also planned
but without a timeline. Communications with the LFC are authenticated but not
encrypted. The encryption key splitting strategy is implemented and its
integration with EDS is planned for Feb. 07. The DPM HTTPS transfer protocol
is planned for Feb. 07.

A biomed request is to get grid-wide ACL synchronization (to synchronize both
the key store ACLs and the ACLs on SEs in SRMv2). The EDS CLI should ensure
"best effort" consistency (planned for April 07). A dedicated synchronization
service might be available in June 07 (depending on the design).

DPM-DICOM integration has been mentioned. DPM support for a secondary storage
back-end is planned for March 07. DICOM could then be integrated as a specific
type of secondary storage.

For using the FiReMAN FC on the production infrastructure:
- The gLiteIO client is deployed in gLite 3.0.
- A FiReMAN server needs to be installed.
- A gLiteIO server is needed on the SEs.
It could be possible to ask a site to install a biomed VOBOX (host with root
access). On such a box it should be feasible to install a gLiteIO server.
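
As an illustration of the "POSIX I/O and CLI" layer listed above, a minimal
sketch of the usual lcg-utils workflow against an LFC is shown below. The VO
name (biomed) is real, but the SE host, the LFN and the local paths are
placeholders, and the options should be checked against the gLite UI release
actually installed.

  # Copy a local file to a storage element and register it in the LFC
  # (se.example.org and the LFN are hypothetical).
  lcg-cr --vo biomed -d se.example.org \
         -l lfn:/grid/biomed/demo/image001.hdr \
         file:///tmp/image001.hdr

  # List the replicas known to the catalog for this logical file name.
  lcg-lr --vo biomed lfn:/grid/biomed/demo/image001.hdr

  # Copy the file back to local disk using its logical file name.
  lcg-cp --vo biomed lfn:/grid/biomed/demo/image001.hdr \
         file:///tmp/image001_copy.hdr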

gLite WMS, (Elisabetta Molinari)
================================
This talk focuses on the gLite WMS and its main differences with LCG-2, the
file catalog interfaces, the high-level job control tools and the latest WMS
tests.

The WMS is a complex interaction of components. It includes in particular the
WMProxy (a Web Service interface to the WMS), the Workload Manager itself (the
main component), and the L&B (job monitoring). The matchmaking process is
based on JDL requirements and ranks. The rank is computed as a function of the
number of free slots in the queues and the CPU power. It sometimes proves not
to be very reliable. Therefore, fuzzy ranking is possible (through the JDL
attribute FuzzyRank = true): it selects "one of the best" sites, not
necessarily the best one.

New components in the gLite WMS are:
- A task queue to periodically retry non-matching requests (jobs are seen as
  submitted from the user's point of view).
- Shallow resubmission: resubmission of jobs that failed before the user job
  started running (improves the success rate a lot).
- Information supermarket: a cache repository of IS data on available
  resources (improves performance a lot).
- LBProxy: L&B server.
- CondorC: between the WM and the CE (more reliable than GRAM).

The new features are:
- Parametric jobs: the "PARAM" keyword in the JDL is substituted by a varying
  parameter.
- Job collections: sets of jobs all collected together (handled as DAGs).
  There is a bundle job ID. The jobs share their input sandbox. The submission
  is bulk (single authentication).
- DAGs: collections with dependencies.
- VOViews: access rules to the CEs can be specified for different VOs and
  different groups inside VOs.
- Automatic zipping of sandboxes directly to the UI (not going through the
  RB).
- Short Deadline Jobs.
- High load limiter: prevents new submissions to overloaded RBs (the UI may
  have a round-robin RB policy).
- Prolog and epilog: pre- and post-scripts around the user job (prolog and
  epilog are not considered part of the job: a failure in the prolog causes
  the job to be shallow-resubmitted; the epilog is executed after the end of
  MPI jobs).
- Job perusal: inspection of a job's file content while the job is running
  (for a specific list of files specified at submission time).

The WMS is interfaced to different data catalogs. The data catalog is
indicated through the 'DataRequirements' JDL tag value: DLI (Data Location
Interface, for LFC and DPM) or SI (gLite Storage Index, for FiReMAN).

Job types supported:
- interactive stdin/stdout streams
- DAGs
- collections
- parametric jobs

High-level job control tools:
- WSDL for job submission (WMProxy) + CLI + APIs
- L&B APIs

Deployment status: the gLite 3.0 WMS is too buggy and not really deployed. A
controlled number of new instances have been intensively tested on the PPS
(CMS and ATLAS tests) and should become available on the production
infrastructure very soon. Performance results on the development version show
0.5 s per job submission, 2.5 h to submit 20k jobs, and very few failures due
to the WMS itself (failures are usually due to application or site-specific
problems).

Plans:
- High availability of the RB (more robust).
- Job provenance: retain data on finished jobs.
- ICE: interface to the WS-based CE.
- DGAS: storage accounting system.

JM: Will the polling latency be reduced?
This will improve when using CondorG and CREAM: currently, a job monitor
parses the Condor log file regularly to get notifications. With ICE, there
will be a real notification mechanism.
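
For illustration, the sketch below puts several of the JDL attributes
mentioned in this talk (parametric jobs, fuzzy ranking, shallow resubmission,
prolog/epilog, job perusal, DataRequirements) into a single job description.
It assumes gLite 3.1-style JDL; the executable, the helper scripts, the file
names, the catalog endpoint and the LFN are placeholders, and not every
attribute is necessarily valid for every job type or WMS release, so it
should be checked against the deployed middleware before use.

  [
    JobType = "Parametric";
    Executable = "simulate.sh";            // hypothetical user script
    Arguments  = "_PARAM_";                // replaced by the varying parameter
    Parameters     = 100;                  // parameter values 0, 1, ..., 99
    ParameterStart = 0;
    ParameterStep  = 1;

    InputSandbox  = { "simulate.sh", "setup_env.sh", "collect_output.sh" };
    OutputSandbox = { "result._PARAM_.dat", "std.out", "std.err" };
    StdOutput = "std.out";
    StdError  = "std.err";

    // Matchmaking: rank CEs by free CPUs and pick "one of the best"
    Rank = other.GlueCEStateFreeCPUs;
    FuzzyRank = true;

    // Shallow resubmission: retry jobs that fail before the payload starts
    ShallowRetryCount = 3;
    RetryCount = 0;

    // Pre- and post-scripts around the user job (hypothetical scripts)
    Prolog = "setup_env.sh";
    Epilog = "collect_output.sh";

    // Job perusal: allow inspection of job files while the job runs
    PerusalFileEnable = true;

    // Schedule close to the input data, resolved through an LFC via DLI
    DataRequirements = {
      [
        DataCatalogType = "DLI";
        DataCatalog = "http://lfc-biomed.example.org:8085/";  // placeholder
        InputData = { "lfn:/grid/biomed/demo/phantom.mhd" };  // placeholder
      ]
    };
  ]

Submission of such a description would go through the WMProxy command line
(for instance glite-wms-job-submit -a jobs.jdl), which performs the bulk
submission mentioned above; the exact client commands again depend on the
installed UI version.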