

Tests on DPM 1.3.4 and DPM 1.3.7

Spinoso V. - Donvito G.
INFN Bari

SEPTEMBER 1, 2005



1 Starting notes

Recall that:

a) srmcp does not allow overwriting a file with the same name that is already stored on the DPM server;

b) by contrast, srmcp overwrites WITHOUT WARNING a file with the same name stored on the local fs.
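
Because of note (b), it can be worth guarding the local destination before launching a copy from the server. The following is only a minimal sketch of such a guard, not part of the tested tools: the function name check_local_destination is hypothetical, and the file://// URL form simply mirrors the srmcp command lines used in this report.

```python
import os


def check_local_destination(path: str) -> str:
    """Return a file:// URL for `path`, or raise if the path already exists.

    Hypothetical helper: it only protects against the silent local
    overwrite described in note (b); it does not call srmcp itself.
    """
    absolute = os.path.abspath(path)
    if os.path.exists(absolute):
        raise FileExistsError(
            f"refusing to overwrite existing local file: {absolute}"
        )
    # The commands in this report write local paths as file://// followed by
    # the path without its leading slash, i.e. "file:///" + absolute path.
    return "file:///" + absolute
```

The returned URL can then be passed as the destination argument of srmcp; if the file already exists, the copy is never started.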

These tests also use dCache 1.6.5-2. Unless otherwise specified, tests and results apply to both versions of DPM.

2 Test suite

2.1 Directory listing on DPM server via SRM: SUCCESSFUL

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/

-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 1399 3384502200 Jul 27 16:33 very_big

2.2 Copy local fs -> DPM server via SRM: SUCCESSFUL

$ /opt/d-cache/srm/bin/srmcp file:////home/eric/test_big1 srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big1

2.3 Directory listing on DPM server via SRM, verifying the presence of the copied file: SUCCESSFUL

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/

-rw-rw-r-- 1 11410 1399 1128167400 Aug 4 16:04 test_big1

-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 1399 3384502200 Jul 27 16:33 very_big
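
The check above was done by eye. As a sketch of how it could be automated, the snippet below parses an ls-style listing such as the one printed by gridftplist and returns the size of each entry. The function name parse_listing is hypothetical, and the field layout (permissions, links, uid, gid, size, month, day, time, name) is assumed from the output shown in this report.

```python
from typing import Dict


def parse_listing(text: str) -> Dict[str, int]:
    """Map entry name -> size in bytes for an ls-style listing."""
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: perms links uid gid size month day time name
        if len(fields) < 9 or not fields[4].isdigit():
            continue  # blank line, prompt, or anything that is not an entry
        name = " ".join(fields[8:])
        sizes[name] = int(fields[4])
    return sizes


# Sample lines modelled on the listing in this section.
listing = """\
-rw-rw-r-- 1 11410 1399 1128167400 Aug 4 16:04 test_big1
-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf
"""
sizes = parse_listing(listing)
print("test_big1" in sizes, sizes.get("test_big1"))
```

Comparing the reported size with the size of the local source file gives a slightly stronger check than the mere presence of the name.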

2.4 Multiple access to the same file by many applications (deleting while transferring file): SUCCESSFUL

This tests the SRM pinning features. After starting a copy from the DPM server to the local fs via SRM, we tried to delete the same file on the DPM server from a second shell.

(sh 1):$ /opt/d-cache/srm/bin/srmcp srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big1 file:////home/eric/test_big1 &

(sh 2):$ /opt/d-cache/srm/bin/srm-advisory-delete srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big1

When (sh 2) returned, and while (sh 1) was still transferring, we ran gridftplist:

(sh 2):$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/

-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 1399 3384502200 Jul 27 16:33 very_big

The transfer on (sh 1) continued until it completed successfully, despite the earlier deletion.

2.5 Copy local fs -> DPM server via SRM, into a nonexistent directory: SUCCESSFUL

In particular, srmcp automatically creates the missing directory (nonesisto/):

$ /opt/d-cache/srm/bin/srmcp -debug file:////home/eric/test_big srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/nonesisto/test_big

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/

drwxrwxr-x 1 11410 1399 0 Aug 4 16:38 nonesisto

-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 1399 3384502200 Jul 27 16:33 very_big

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/nonesisto

-rw-rw-r-- 1 11410 1399 496173056 Aug 4 16:40 test_big

2.6 Copy DPM server -> same DPM server via SRM: PARTLY SUCCESSFUL

On DPM 1.3.4, srmcp returns several "sleeping" messages; moreover, interrupting srmcp (CTRL+C) CRASHES the srmv1 daemon on the DPM server, which must then be restarted manually.

At https://savannah.cern.ch/bugs/?func=detailitem&item_id=9393 the developers state that this bug has been fixed in DPM >=1.3.5; DPM 1.3.7 returns "sleeping" messages too, but srmv1 no longer crashes if you press CTRL+C while srmcp is sleeping.

Apart from this bug, DPM is not able to transfer files from DPM server to DPM server, because it cannot act as a "client" (just like Castor).

Tests on DPM 1.3.4 follow.

$ /opt/d-cache/srm/bin/srmcp -debug

   srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/nonesisto/test_big

   srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big

Storage Resource Manager (SRM) CP Client version 1.17

Copyright (c) 2002-2005 Fermi National Accelerator Laborarory

SRM Configuration:

debug=true

gsissl=true

help=false

pushmode=false

userproxy=true

buffer_size=131072

tcp_buffer_size=0

stream_num=10

config_file=/home/eric/.srmconfig/config.xml

glue_mapfile=/opt/d-cache/srm/conf/SRMServerV1.map

webservice_path=srm/managerv1.wsdl

webservice_protocol=https

gsiftpclinet=globus-url-copy

protocols_list=http,gsiftp

save_config_file=null

srmcphome=/opt/d-cache/srm

urlcopy=/opt/d-cache/srm/sbin/url-copy.sh

x509_user_cert=/home/eric/.globus/usercert.pem

x509_user_key=/home/eric/.globus/userkey.pem

x509_user_proxy=/tmp/x509up_u501

x509_user_trusted_certificates=/etc/grid-security/certificates

retry_num=20

retry_timeout=10000

wsdl_url=null

use_urlcopy_script=false

connect_to_wsdl=false

delegate=true

full_delegation=true

from[0]=srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/nonesisto/test_big

to=srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big

Thu Aug 04 16:49:24 CEST 2005: starting SRMCopyPullClient

Thu Aug 04 16:49:24 CEST 2005: SRMClient(https,srm/managerv1.wsdl,true)

Thu Aug 04 16:49:24 CEST 2005: connecting to server

Thu Aug 04 16:49:24 CEST 2005: connected to server, obtaining proxy

SRMClientV1 : connecting to srm at httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

Thu Aug 04 16:49:26 CEST 2005: got proxy of type class org.dcache.srm.client.SRMClientV1

Thu Aug 04 16:49:26 CEST 2005: 

   copying srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/nonesisto/test_big

   into srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big

SRMClientV1 : copy, srcSURLS[0]="srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/nonesisto/test_big"

SRMClientV1 : copy, destSURLS[0]="srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big"

SRMClientV1 : copy, contacting service httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

Thu Aug 04 16:49:30 CEST 2005: srm returned requestId = 34

Thu Aug 04 16:49:30 CEST 2005: sleeping 1 seconds ...

Thu Aug 04 16:49:32 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:49:42 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:49:53 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:03 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:13 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:24 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:34 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:45 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:50:55 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:05 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:16 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:26 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:36 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:47 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:51:57 CEST 2005: sleeping 10 seconds ...

Thu Aug 04 16:52:08 CEST 2005: sleeping 10 seconds ...

> > > CTRL+C < < <

Thu Aug 04 16:52:08 CEST 2005: setting all remaining file statuses of request requestId=34 to "Done"

Thu Aug 04 16:52:08 CEST 2005: setting file request 0 status to Done

SRMClientV1 : getRequestStatus: try #0 failed with error

SRMClientV1 : (0)null

java.lang.RuntimeException: (0)null

at org.dcache.srm.client.SRMClientV1.setFileStatus(SRMClientV1.java:1097)

at gov.fnal.srm.util.SRMCopyPullClient.run(SRMCopyPullClient.java:309)

at java.lang.Thread.run(Thread.java:534)

$

After the command had finished, we checked the srmv1 status (DPM 1.3.4):

# /etc/init.d/srmv1 status

srmv1 dead but pid file exists [FAILED]

#

By contrast, srmv1 on DPM 1.3.7:

# /etc/init.d/srmv1 status 

srmv1 (pid 10476) is running...[  OK  ] 
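
Since the copy never leaves the "sleeping" loop, a wrapper script could detect the stall from the srmcp debug output rather than wait indefinitely. The following is a minimal sketch under stated assumptions: the message format "sleeping N seconds" is taken from the log above, while the function names and the 120-second limit are hypothetical choices.

```python
import re

# Matches the progress lines printed by srmcp -debug in the log above.
SLEEP_RE = re.compile(r"sleeping (\d+) seconds")


def total_sleep_seconds(log: str) -> int:
    """Sum the durations announced by all 'sleeping N seconds' lines."""
    return sum(int(match.group(1)) for match in SLEEP_RE.finditer(log))


def looks_stalled(log: str, limit_seconds: int = 120) -> bool:
    """True once the announced sleep time reaches the (arbitrary) limit."""
    return total_sleep_seconds(log) >= limit_seconds
```

On DPM 1.3.4 such a wrapper should not interrupt srmcp itself, given the srmv1 crash described above; it would only report the stall.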

2.7 dpm-addfs, putting, listing and deleting a file: SUCCESSFUL

Adding a new DPM fs (dpm-addfs), putting a file on the new fs via SRM, and then removing it (srm-advisory-delete) cause no problems.

2.8 Setting RDONLY on a fs (dpm-modifyfs): PARTLY SUCCESSFUL

We used dpm-modifyfs to set one partition RO, leaving the other one RW.

dpm-modifyfs --server pccms5 --fs /storage1 --st RDONLY

dpm-modifyfs --server pccms5 --fs /storage2 --st RDONLY

dpm-qryconf then reports wrong information:

[root@pccms5 dpm]# dpm-qryconf 

POOL pccms5.cmsfarm1 DEFSIZE 200.00M GC_START_THRESH 0 

GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 FSS_POLICY maxfreespace 

GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE - 

CAPACITY 76.27G FREE 75.80G ( 99.4%) 

pccms22 /home/storage CAPACITY 10.87G FREE 9.42G ( 86.7%) RDONLY 

pccms5 /storage1 CAPACITY 13.94G FREE 13.47G ( 96.7%) 

pccms5 /storage2 CAPACITY 62.33G FREE 62.32G (100.0%) 

We did not know whether the dpm daemon shared the wrong view reported by dpm-qryconf, so we started a transfer to the DPM fs. The file was wrongly put on pccms5, which shows that the dpm server does use the wrong information. The transfer was:

[donvito@gridba1 donvito]$ srmcp -debug=true 

   file:////tmp/trentadue.tar.gz

   srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/trentadue

and on DPM server:

[root@pccms5 dpm]# dpm-qryconf 

POOL pccms5.cmsfarm1 DEFSIZE 200.00M GC_START_THRESH 0 

GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 FSS_POLICY maxfreespace 

GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE -                               

CAPACITY 76.27G FREE 73.20G ( 96.0%)   

pccms22 /home/storage CAPACITY 10.87G FREE 9.42G ( 86.7%) RDONLY   

pccms5 /storage1 CAPACITY 13.94G FREE 10.87G ( 78.0%)   

pccms5 /storage2 CAPACITY 62.33G FREE 62.32G (100.0%)

Note that more than 2GB of space has been allocated on pccms5 /storage1, which should be RO! The workaround is to restart the dpm daemon after setting the fs RO. After that, if you put a file on the DPM fs, the DPM service correctly selects the RW partition, respecting the RO/RW flags.
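
The comparison of the two dpm-qryconf outputs can also be done mechanically. The sketch below parses the per-filesystem lines and reports which filesystems changed their FREE value between two snapshots. It assumes the line layout shown in this section; the names FsInfo, parse_filesystems and changed_free_space are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class FsInfo(NamedTuple):
    capacity: str
    free: str
    status: str  # "" when dpm-qryconf prints no status flag


# Assumed layout: <server> <fs> CAPACITY <size> FREE <size> (<pct>%) [STATUS]
FS_RE = re.compile(
    r"^\s*(\S+)\s+(/\S+)\s+CAPACITY\s+(\S+)\s+FREE\s+(\S+)"
    r"\s+\(\s*[\d.]+%\)\s*(\S*)\s*$"
)


def parse_filesystems(output: str) -> Dict[Tuple[str, str], FsInfo]:
    """Map (server, fs) -> FsInfo for every filesystem line in the output."""
    result: Dict[Tuple[str, str], FsInfo] = {}
    for line in output.splitlines():
        match = FS_RE.match(line)
        if match:
            server, fs, capacity, free, status = match.groups()
            result[(server, fs)] = FsInfo(capacity, free, status)
    return result


def changed_free_space(before: str, after: str) -> List[Tuple[str, str]]:
    """Filesystems whose FREE value differs between the two snapshots."""
    old, new = parse_filesystems(before), parse_filesystems(after)
    return sorted(k for k in old if k in new and old[k].free != new[k].free)
```

Applied to the two outputs above, the only filesystem reported as changed would be pccms5 /storage1, the one that was supposed to be read-only.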

2.9 Trying to store files with no available fs: FAILED

Let's start from this situation:

[root@pccms5 dpm]# dpm-qryconf 

POOL pccms5.cmsfarm1 DEFSIZE 200.00M GC_START_THRESH 0 

GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 

FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE - 

CAPACITY 10.87G FREE 9.42G ( 86.7%) 

pccms22 /home/storage CAPACITY 10.87G FREE 9.42G ( 86.7%) 

pccms5 /storage1 CAPACITY 13.94G FREE 13.47G ( 96.7%) RDONLY 

pccms5 /storage2 CAPACITY 62.33G FREE 62.32G (100.0%) RDONLY

The only available fs is on pccms22. Let's stop dpm-gsiftp on that host:

[root@pccms22 root]# /etc/init.d/dpm-gsiftp stop

Proceeding with a file copy, we expected DPM to return an error message saying that no space is available. Instead, it tries to contact pccms22, which "refuses connection"; meanwhile, space for the file has already been allocated on pccms22. Restarting the dpm daemon corrects the capacity reported for pccms22. We expected the space for the file to be allocated only AFTER negotiating the connection to dpm-gsiftp.
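
The ordering we expected (check that the transfer endpoint answers, then allocate) can be illustrated with a simple TCP reachability probe. This is only a sketch of the idea, not DPM code: the function name gridftp_reachable is hypothetical, and port 2811 is the gridftp port that appears in the logs and commands of this report.

```python
import socket


def gridftp_reachable(host: str, port: int = 2811, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A refused or timed-out connection returns False, which is the point at
    which we would have expected the request to fail, before any space is
    reserved on that pool node.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection does not prove that the gridftp service is healthy, so this is a necessary check rather than a sufficient one.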

2.10 dpm daemon sync: FAILED

Simply stopping, starting, and repeatedly restarting the dpm daemon gives inconsistent results:

[root@pccms5 dpm]# /etc/init.d/dpm stop 

Stopping dpm:                                              [  OK  ] 

[root@pccms5 dpm]# /etc/init.d/dpm start 

Starting dpm:                                              [  OK  ] 

[root@pccms5 dpm]# /etc/init.d/dpm restart 

Stopping dpm:                                              [  OK  ] 

Starting dpm: dpm already started:                         [FAILED] 

[root@pccms5 dpm]# /etc/init.d/dpm restart 

dpm already stopped:                                       [FAILED] 

Starting dpm:                                              [  OK  ] 

[root@pccms5 dpm]# /etc/init.d/dpm restart 

Stopping dpm:                                              [  OK  ] 

Starting dpm: dpm already started:                         [FAILED] 

[root@pccms5 dpm]# /etc/init.d/dpm restart 

dpm already stopped:                                       [FAILED] 

Starting dpm:                                              [  OK  ] 

[root@pccms5 dpm]# /etc/init.d/dpm restart 

Stopping dpm:                                              [  OK  ] 

Starting dpm:                                              [  OK  ] 

[root@pccms5 dpm]#
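
One plausible reading of the alternating failures, which we have not verified, is that "stop" returns before the daemon has actually exited, so the following "start" still finds it running. Under that assumption, a wrapper could poll until the daemon is really gone before starting it again. The helper below is a generic sketch; wait_until and its parameters are hypothetical names, and the status check itself is left to the caller.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true, False on timeout.
    `sleep` and `clock` are injectable so the helper can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A restart would then be: run the stop action, call wait_until with a predicate that reports whether the daemon has exited (for instance based on the init script's status action), and only then run the start action.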

2.11 Getting files stored on a removed DPM fs (dpm-rmfs): FAILED

Unexpectedly, gridftplist still returns information about the files on the fs removed with dpm-rmfs! We expected srmcp to report an error about the nonexistence or unavailability of those files; instead, srmcp bypasses the unavailability of the DPM fs, negotiates the get, and succeeds in transferring the file via gsiftp. The output of dpm-qryconf is often inconsistent with the removal: it still prints the old information, and you have to restart the dpm daemon to get it corrected.

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms

drwxrwxr-x 1 11410 1399 0 Aug 4 16:38 nonesisto

-rw-rw-r-- 1 11410 1399 60643264 Jul 26 17:15 test1344554eryrtttyty4rtrtrt

-rw-rw-r-- 1 11410 1399 60643264 Jul 26 17:25 test1344554eryrtttyty4rtrtrtt4t

-rw-rw-r-- 1 11410 1399 496173056 Aug 5 11:51 test_big17

-rw-rw-r-- 1 11410 1399 496173056 Aug 5 11:55 test_big18

-rw-rw-r-- 1 11410 1399 60643264 Jul 26 17:29 test_donvito3435

-rw-rw-r-- 1 11410 1399 60643264 Jul 27 12:49 test_donvito3435ef

-rw-rw-r-- 1 11410 1399 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 1399 3384502200 Jul 27 16:33 very_big

$ /opt/d-cache/srm/bin/gridftplist gsiftp://pccms5.cmsfarm1.ba.infn.it//dpm/ba.infn.it/home/cms/nonesisto

-rw-rw-r-- 1 11410 1399 496173056 Aug 4 16:40 test_big

$ /opt/d-cache/srm/bin/srmcp -debug srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_perf file:////home/eric/test_nonesisto

Storage Resource Manager (SRM) CP Client version 1.17

Copyright (c) 2002-2005 Fermi National Accelerator Laborarory

SRM Configuration:

debug=true

gsissl=true

help=false

pushmode=false

userproxy=true

buffer_size=131072

tcp_buffer_size=0

stream_num=10

config_file=/home/eric/.srmconfig/config.xml

glue_mapfile=/opt/d-cache/srm/conf/SRMServerV1.map

webservice_path=srm/managerv1.wsdl

webservice_protocol=https

gsiftpclinet=globus-url-copy

protocols_list=http,gsiftp

save_config_file=null

srmcphome=/opt/d-cache/srm

urlcopy=/opt/d-cache/srm/sbin/url-copy.sh

x509_user_cert=/home/eric/.globus/usercert.pem

x509_user_key=/home/eric/.globus/userkey.pem

x509_user_proxy=/tmp/x509up_u501

x509_user_trusted_certificates=/etc/grid-security/certificates

retry_num=20

retry_timeout=10000

wsdl_url=null

use_urlcopy_script=false

connect_to_wsdl=false

delegate=true

full_delegation=true

from[0]=srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_perf

to=file:////home/eric/test_nonesisto

Fri Aug 05 13:24:53 CEST 2005: starting SRMGetClient

Fri Aug 05 13:24:53 CEST 2005: SRMClient(https,srm/managerv1.wsdl,true)

Fri Aug 05 13:24:53 CEST 2005: connecting to server

Fri Aug 05 13:24:53 CEST 2005: connected to server, obtaining proxy

SRMClientV1 : connecting to srm at httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

Fri Aug 05 13:24:55 CEST 2005: got proxy of type class org.dcache.srm.client.SRMClientV1

SRMClientV1 : get: surls[0]="srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_perf"

SRMClientV1 : get: protocols[0]="http"

SRMClientV1 : get: protocols[1]="dcap"

SRMClientV1 : get: protocols[2]="gsiftp"

SRMClientV1 : get, contacting service httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

doneAddingJobs is false

copy_jobs is empty

Fri Aug 05 13:24:59 CEST 2005: srm returned requestId = 49

Fri Aug 05 13:24:59 CEST 2005: sleeping 1 seconds ...

Fri Aug 05 13:25:01 CEST 2005: FileRequestStatus with SURL=srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_perf is Ready

Fri Aug 05 13:25:01 CEST 2005: received TURL=gsiftp://pccms5.cmsfarm1.ba.infn.it/pccms5:/storage/cms/2005-07-26/test_perf.5.0

Fri Aug 05 13:25:01 CEST 2005: fileIDs is empty, breaking the loop

doneAddingJobs is true

copy_jobs is not empty

copying CopyJob, source = gsiftp://pccms5.cmsfarm1.ba.infn.it/pccms5:/storage/cms/2005-07-26/test_perf.5.0 destination = file:////home/eric/test_nonesisto

GridftpClient: memory buffer size is set to 131072

GridftpClient: connecting to pccms5.cmsfarm1.ba.infn.it on port 2811

GridftpClient: gridFTPClient tcp buffer size is set to 0

GridftpClient: gridFTPRead started

GridftpClient: parallelism: 10

GridftpClient: waiting for completion of transfer

GridftpClient: gridFtpWrite: starting the transfer in emode from pccms5:/storage/cms/2005-07-26/test_perf.5.0

GridftpClient: DiskDataSink.close() called

GridftpClient: gridFTPWrite() wrote 225633480bytes

GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@1554d32

GridftpClient: closed client

execution of CopyJob, source = gsiftp://pccms5.cmsfarm1.ba.infn.it/pccms5:/storage/cms/2005-07-26/test_perf.5.0 destination = file:////home/eric/test_nonesisto completed

setting file request 0 status to Done

doneAddingJobs is true

copy_jobs is empty

stopping copier
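
The TURL received in the log above names the pool server and the physical path of the file, so it can be used to see which filesystem a get is actually served from. The sketch below splits such a TURL and tests whether it points into a given filesystem. The TURL layout (gsiftp://host/server:/physical/path) is assumed from this single log line, and the names split_turl and on_filesystem are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_turl(turl: str) -> Tuple[str, str, str]:
    """Split a TURL into (gridftp host, pool server, physical path)."""
    parts = urlsplit(turl)
    server, sep, physical = parts.path.lstrip("/").partition(":")
    if not sep:
        raise ValueError(f"no 'server:/path' component in TURL: {turl}")
    return parts.hostname or "", server, physical


def on_filesystem(turl: str, server: str, fs: str) -> bool:
    """True if the TURL's physical path lies on filesystem `fs` of `server`."""
    _, turl_server, physical = split_turl(turl)
    prefix = fs.rstrip("/") + "/"
    return turl_server == server and (physical == fs or physical.startswith(prefix))


turl = (
    "gsiftp://pccms5.cmsfarm1.ba.infn.it/"
    "pccms5:/storage/cms/2005-07-26/test_perf.5.0"
)
print(split_turl(turl))
```

Checking the TURL against the list of filesystems that dpm-qryconf should report after dpm-rmfs makes the inconsistency described in this section visible without transferring the file.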

2.12 Interrupting transfer of a big file (9GB): FAILED

Interrupting the transfer of a big file (9GB), no bigger than the free space on the DPM server (15GB free), fails: after the interrupt (CTRL+C), a corresponding 0-byte file is left on the DPM server. We removed it (srm-advisory-delete), but dpm-qryconf on the DPM server still showed 9GB allocated (corresponding to the interrupted big file). Finally, if you restart the dpm daemon on the DPM server, dpm-qryconf returns the correct free space.

# dpm-qryconf

POOL pccms5.cmsfarm1 DEFSIZE 200.00M GC_START_THRESH 0 

   GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 

   FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE -

CAPACITY 13.94G FREE 4.61G ( 33.1%)

pccms5 /storage1 CAPACITY 13.94G FREE 4.61G ( 33.1%)

2.13 File access through many protocols: SUCCESSFUL

ROOT is not supported yet. All the other protocols work. Load balancing also works with both rfcp and gridftp.

[eric@pccms10 eric]$ export DPNS_HOST=pccms5.cmsfarm1.ba.infn.it 

[eric@pccms10 eric]$ /opt/lcg/bin/rfdir /dpm

drwxrwxr-x 1 root root 0 Jul 26 17:14 ba.infn.it

[eric@pccms10 eric]$ /opt/lcg/bin/rfdir /dpm/ba.infn.it 

drwxrwxr-x 6 root root 0 Jul 26 17:14 home 

[eric@pccms10 eric]$ /opt/lcg/bin/rfdir /dpm/ba.infn.it/home 

drwxrwxr-x 0 root 1395 0 Jul 26 17:14 alice 

drwxrwxr-x 0 root 1307 0 Jul 26 17:14 atlas 

drwxrwxr-x 10 root cms 0 Aug 05 15:03 cms 

drwxrwxr-x 0 root 2688 0 Jul 26 17:14 dteam 

drwxrwxr-x 0 root 1470 0 Jul 26 17:14 lhcb 

drwxrwxr-x 0 root 1077 0 Jul 26 17:14 sixt

[eric@pccms10 eric]$ /opt/lcg/bin/rfdir /dpm/ba.infn.it/home/cms

drwxrwxr-x 1 11410 cms 0 Aug 04 16:38 nonesisto 

-rw-rw-r-- 1 11410 cms 60643264 Jul 26 17:15 test1344554eryrtttyty4rtrtrt

-rw-rw-r-- 1 11410 cms 60643264 Jul 26 17:25 test1344554eryrtttyty4rtrtrtt4t

-rw-rw-r-- 1 11410 cms 496173056 Aug 05 11:51 test_big17

-rw-rw-r-- 1 11410 cms 496173056 Aug 05 11:55 test_big18

-rw-rw-r-- 1 11410 cms 0 Aug 05 15:03 test_bigger

-rw-rw-r-- 1 11410 cms 60643264 Jul 26 17:29 test_donvito3435

-rw-rw-r-- 1 11410 cms 60643264 Jul 27 12:49 test_donvito3435ef

-rw-rw-r-- 1 11410 cms 225633480 Jul 26 18:01 test_perf

-rw-rw-r-- 1 11410 cms 3384502200 Jul 27 16:33 very_big

[eric@pccms10 eric]$ export DPM_HOST=pccms5.cmsfarm1.ba.infn.it 

[eric@pccms10 eric]$ /opt/lcg/bin/rfcp edglog.log /dpm/ba.infn.it/home/cms/edglog.log 

79 bytes in 0 seconds through local (in) and eth0 (out)

rfcp commands tested. Files, submitted one by one, are stored alternately on our two pools by the server.

rfcp infn_cert_240.tar.gz  /dpm/ba.infn.it/home/cms/infn_cert_24091.tar.gz  

rfcp infn_cert_240.tar.gz  /dpm/ba.infn.it/home/cms/infn_cert_24091.tar.gz  

rfcp infn_cert_240.tar.gz  /dpm/ba.infn.it/home/cms/infn_cert_24091.tar.gz

Gridftp commands tested. gridftp sees DPM shared fs:

edg-gridftp-ls -v gsiftp://pccms5.cmsfarm1.ba.infn.it:2811//dpm/ba.infn.it/home/cms/

globus-url-copy refers to pccms5, which is the DPM server, but in this case the file is stored on the other pool; that is to say, globus-url-copy can store files on the other pool transparently.

globus-url-copy file:///home/donvito/infn_cert_240.tar.gz gsiftp://pccms5.cmsfarm1.ba.infn.it:2811//dpm/ba.infn.it/home/cms/infn_cert_240912211df1.tar.gz

2.14 Manipulating fs and permissions: SUCCESSFUL

No problems. Permissions are correctly handled.

2.15 Write a file bigger than free space (PreparetoPut): FAILED

With 4.6GB of free space on the DPM server:

# dpm-qryconf

POOL pccms5.cmsfarm1 DEFSIZE 200.00M GC_START_THRESH 0 

   GC_STOP_THRESH 0 DEFPINTIME 0 PUT_RETENP 86400 

   FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GID 0 S_TYPE -

CAPACITY 13.94G FREE 4.61G ( 33.1%)

pccms5 /storage1 CAPACITY 13.94G FREE 4.61G ( 33.1%)

we tried to put a 9GB file on the DPM server. srmcp correctly detects that there is no space left on the device and does not start the transfer; but when we then submitted THE SAME srmcp command line again, srmcp reported that the file EXISTS on the server, with a size of 0 bytes: THIS is incorrect.

$ /opt/d-cache/srm/bin/srmcp file:////home/eric/test_bigger srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_bigger

Fri Aug 05 15:02:24 CEST 2005: rs.state = Failed rs.error = No space left on device

srmcp error : rs.state = Failed rs.error = No space left on device

SRMClientV1 : getRequestStatus: try #0 failed with error

SRMClientV1 : Invalid state

java.lang.RuntimeException: Invalid state

at org.dcache.srm.client.SRMClientV1.setFileStatus(SRMClientV1.java:1097)

at gov.fnal.srm.util.SRMPutClient.run(SRMPutClient.java:370)

at java.lang.Thread.run(Thread.java:534)

$ /opt/d-cache/srm/bin/srmcp file:////home/eric/test_bigger srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_bigger

Fri Aug 05 15:02:40 CEST 2005: rs.state = Failed rs.error = File exists

srmcp error : rs.state = Failed rs.error = File exists

SRMClientV1 : getRequestStatus: try #0 failed with error

SRMClientV1 : Invalid state

java.lang.RuntimeException: Invalid state

at org.dcache.srm.client.SRMClientV1.setFileStatus(SRMClientV1.java:1097)

at gov.fnal.srm.util.SRMPutClient.run(SRMPutClient.java:370)

at java.lang.Thread.run(Thread.java:534)
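
Until the leftover 0-byte entry is handled by the server, a client-side pre-flight check can avoid the failed put in the first place by comparing the file size with the FREE value printed by dpm-qryconf. This is a minimal sketch under stated assumptions: the function names are hypothetical, the suffixes other than M and G do not appear in this report, and treating the units as binary multiples is our assumption, not something we verified in dpm-qryconf.

```python
# Assumed unit multipliers; dpm-qryconf output in this report only shows M and G.
UNITS = {"k": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text: str) -> int:
    """Convert a size such as '4.61G' or '200.00M' into bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def fits(file_bytes: int, free: str) -> bool:
    """True if a file of `file_bytes` bytes fits into the reported free space."""
    return file_bytes <= parse_size(free)
```

With the values of this test, fits(9 * 2**30, "4.61G") is false, so the put would be skipped and no 0-byte entry would be created. The reported free space is rounded, so files very close to the limit remain a borderline case.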

2.16 Remote copy, interaction among many SRM: SUCCESSFUL

Interactions tested:

dCache <-> DPM OK

DPM <-> DPM Sleeping forever

DPM <-> Castor Sleeping forever

In particular, DPM (towards dCache) behaves like Castor:

srm2srm: local dCache -> local DPM, pushmode=true

/opt/d-cache/srm/bin/srmcp -pushmode=true

   srm://alicegrid4.ba.infn.it:8443/pnfs/ba.infn.it/cms/prova_dpm 

   srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_donvito34

See http://www.dcache.org/manuals/experts_docs/pushpull.html for further details.

The transfers between DPM and another DPM or Castor server show that a pure srmcp copy between such servers is not yet supported.

2.17 Multiple requests: SUCCESSFUL

Multiple requests are shared between the two machines, using both srmcp and rfcp.

3 Failure test

3.1 Crashing mysql while PreparetoPut: SUCCESSFUL

If the transfer begins while mysql is down, srmcp waits and retries periodically; if mysql comes back up in the meantime, srmcp resumes handling the transfer, as expected.

$ /opt/d-cache/srm/bin/srmcp -debug file:////home/eric/test_big srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big

Storage Resource Manager (SRM) CP Client version 1.17

Copyright (c) 2002-2005 Fermi National Accelerator Laborarory

SRM Configuration:

debug=true

gsissl=true

help=false

pushmode=false

userproxy=true

buffer_size=131072

tcp_buffer_size=0

stream_num=10

config_file=/home/eric/.srmconfig/config.xml

glue_mapfile=/opt/d-cache/srm/conf/SRMServerV1.map

webservice_path=srm/managerv1.wsdl

webservice_protocol=https

gsiftpclinet=globus-url-copy

protocols_list=http,gsiftp

save_config_file=null

srmcphome=/opt/d-cache/srm

urlcopy=/opt/d-cache/srm/sbin/url-copy.sh

x509_user_cert=/home/eric/.globus/usercert.pem

x509_user_key=/home/eric/.globus/userkey.pem

x509_user_proxy=/tmp/x509up_u501

x509_user_trusted_certificates=/etc/grid-security/certificates

retry_num=20

retry_timeout=10000

wsdl_url=null

use_urlcopy_script=false

connect_to_wsdl=false

delegate=true

full_delegation=true

from[0]=file:////home/eric/test_big

to=srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big

Thu Aug 04 17:40:49 CEST 2005: starting SRMPutClient

Thu Aug 04 17:40:49 CEST 2005: SRMClient(https,srm/managerv1.wsdl,true)

Thu Aug 04 17:40:49 CEST 2005: connecting to server

Thu Aug 04 17:40:49 CEST 2005: connected to server, obtaining proxy

SRMClientV1 : connecting to srm at httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

Thu Aug 04 17:40:52 CEST 2005: got proxy of type class org.dcache.srm.client.SRMClientV1

SRMClientV1 : put, sources[0]="/home/eric/test_big"

SRMClientV1 : put, dests[0]="srm://pccms5.cmsfarm1.ba.infn.it:8443/dpm/ba.infn.it/home/cms/test_big"

SRMClientV1 : put, protocols[0]="http"

SRMClientV1 : put, protocols[1]="dcap"

SRMClientV1 : put, protocols[2]="gsiftp"

SRMClientV1 : put, contacting service httpg://pccms5.cmsfarm1.ba.infn.it:8443/srm/managerv1

doneAddingJobs is false

copy_jobs is empty

SRMClientV1 : put: try # 0 failed with error

SRMClientV1 : Can't get req uniqueid

SRMClientV1 : put: try again

SRMClientV1 : sleeping for 10000 milliseconds before retrying

SRMClientV1 : put: try # 1 failed with error

SRMClientV1 : Can't get req uniqueid

SRMClientV1 : put: try again

SRMClientV1 : sleeping for 20000 milliseconds before retrying

Thu Aug 04 17:41:27 CEST 2005: srm returned requestId = 38

Thu Aug 04 17:41:27 CEST 2005: sleeping 1 seconds ...

Thu Aug 04 17:41:28 CEST 2005: rs.state = Failed rs.error = File exists

Thu Aug 04 17:41:28 CEST 2005: ====> fileStatus state ==Failed

java.io.IOException: rs.state = Failed rs.error = File exists

at gov.fnal.srm.util.SRMPutClient.start(SRMPutClient.java:336)

at gov.fnal.srm.util.SRMCopy.work(SRMCopy.java:409)

at gov.fnal.srm.util.SRMCopy.main(SRMCopy.java:242)

doneAddingJobs is true

copy_jobs is empty

stopping copier

Exception in thread "main" java.io.IOException: rs.state = Failed rs.error = File exists

at gov.fnal.srm.util.SRMPutClient.start(SRMPutClient.java:336)

at gov.fnal.srm.util.SRMCopy.work(SRMCopy.java:409)

at gov.fnal.srm.util.SRMCopy.main(SRMCopy.java:242)

Thu Aug 04 17:41:28 CEST 2005: setting all remaining file statuses to "Done"

Thu Aug 04 17:41:28 CEST 2005: setting file request 0 status to Done

SRMClientV1 : getRequestStatus: try #0 failed with error

SRMClientV1 : Invalid state

java.lang.RuntimeException: Invalid state

at org.dcache.srm.client.SRMClientV1.setFileStatus(SRMClientV1.java:1097)

at gov.fnal.srm.util.SRMPutClient.run(SRMPutClient.java:370)

at java.lang.Thread.run(Thread.java:534)

$
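
The retry behaviour visible in the log (a 10000 ms wait after try #0, then 20000 ms after try #1, with retry_num=20 and retry_timeout=10000 in the configuration) is consistent with a linearly growing delay. The sketch below reproduces that pattern. Whether srmcp keeps growing the delay linearly beyond the two waits shown is our assumption, and the function name call_with_retries is hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retry_num: int = 20,
    retry_timeout_ms: int = 10000,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` up to `retry_num` times with a linearly growing wait.

    After failed attempt k (counting from 0) the wait is
    retry_timeout_ms * (k + 1) milliseconds, matching the 10000 ms and
    20000 ms waits in the log above. The last failure is re-raised.
    """
    if retry_num < 1:
        raise ValueError("retry_num must be at least 1")
    for attempt in range(retry_num):
        try:
            return operation()
        except Exception:
            if attempt == retry_num - 1:
                raise
            sleep(retry_timeout_ms * (attempt + 1) / 1000.0)
    raise AssertionError("unreachable")
```

In the test above, the request succeeded on the third try, once mysql was back, which corresponds to two waits of 10 and 20 seconds.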

3.2 Crashing a file server: SUCCESSFUL

If you kill, for example, the gridftp daemon on a pool while that pool is transferring, the transfer stops; no new transfers are then directed to the crashed server.

4 To do

Tests with some GFAL client.
