The "BOF" Storage Ontology for LHC SC4.

D. Petravick
Fermilab

During the SRM BOF meetings at CHEP 06, a small ontology was made that
includes concepts which are of urgent concern for SC4. Other concepts
that are considered significant by sites are not included, so it is
likely only a beginning.

Three terms are immediately relevant for the beginning of SC4. These
concepts were discussed in two SRM BOFs held at CHEP 06, and consensus
was reached.

The concepts are
================

File Retention Time, Quality of Retention, Uniformity of Transfer
Performance.

Summary table
=============

Abstract        SRM V1 Web Services   SRM V2 Web Services   GLUE
Concept         API notes             API notes             notes
=============   ===================   ===================   ===================
File            not modeled           modeled explicitly:   modeled:
Retention                             permanent             permanent
Time                                  durable               durable
                                      volatile              volatile
                                                            other
                                                            "SAElement SAType"

Quality of      not modeled           not modeled           modeled implicitly
Retention                                                   via "SAElement
                                                            Architecture"

Uniformity of   not discoverable;     not discoverable;     modeled implicitly
Transfer        manageable via pin    manageable via        via "SAElement
Performance                           PrepareToGet***       Architecture"

*** The current implementation has significant performance side effects.
    A proposal to remove the side effects exists but has not been
    accepted.

File Retention Time
===================

File Retention Time is a concept in the SRM V2 specification. It informs
storage system administrators and storage system software when files may
be autonomously deleted by the storage system in order to increase its
free capacity. Retention time is orthogonal to the probability of data
loss due to accident, and to the storage system user's ability to delete
files on command.

There are three kinds of file retention time.

"volatile" -- An expiration time is associated with the file. When the
    expiration time passes, the storage system may unilaterally delete
    the file.

"durable" -- An expiration time is associated with the file.
    When the expiration time passes, the storage system may free the
    capacity by deleting the file after taking some due care. The care
    is system specific. Conversations in the SRM collaboration have
    mentioned putting the file on tape backup and sending an email
    warning the user that the file will soon be deleted.

"permanent" -- The storage system may not unilaterally delete the file.

"other" -- This has no precise meaning.

LHC notes -- LHC experiments are not known to model retention times in
their catalogs. Therefore, it would seem that "permanent" is the only
appropriate retention policy for cataloged files.

The SRM V2 web services API implements the concepts of "permanent",
"durable", and "volatile". "other" is not modeled in the SRM V2 web
services API. The SRM V1 web services API models none of these concepts.

The GLUE "StorageArea" entity's "GlueSAType" property models
"permanent", "durable", "volatile", and "other".

Quality of Retention
====================

In our discussions, it was realized that there are many distinctions
that could be made, for example raided versus non-raided disk. The
consensus in the BOF was that an important distinction was "safe on
tape" or not. Every tape system, in practice, is accessed through a disk
cache.

The GLUE StorageElement entity has a property "Architecture", which
seems to be a place for modeling quality of retention. The property is
an enumeration which takes on the values "disk", "tape", "multidisk",
and "other". Quality of Retention is not a concept in either version 1
or version 2 of the SRM web services API.

In the interim:

"tape" means an architecture where the data are "safe on tape".
"disk" and "multidisk" mean the data are not "safe on tape".
"other" has no assigned meaning.
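
The retention types and the interim "safe on tape" reading of the
"Architecture" enumeration can be sketched in Python as follows. This is
only an illustration of the definitions above, not part of the SRM or
GLUE interfaces: the names RetentionType, deletion_policy, and
safe_on_tape are hypothetical, as are the string results, the use of
plain numbers for times, and the choice to treat a file as expired only
once the current time is strictly later than its expiration time.

```python
from enum import Enum
from typing import Optional


class RetentionType(Enum):
    """Retention types named in the ontology (hypothetical Python names)."""
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"
    OTHER = "other"


def deletion_policy(retention: RetentionType, now: float,
                    expiration: Optional[float]) -> str:
    """Say what the storage system may do on its own to reclaim capacity."""
    if retention is RetentionType.PERMANENT:
        return "keep"        # may never be deleted unilaterally
    if retention is RetentionType.OTHER:
        return "undefined"   # "other" has no precise meaning
    # Volatile and durable files carry an expiration time.
    if expiration is None:
        raise ValueError("volatile and durable files need an expiration time")
    if now <= expiration:
        return "keep"        # assumption: not yet expired
    if retention is RetentionType.VOLATILE:
        return "delete"
    return "delete-after-due-care"   # durable: system-specific care first


# Interim reading of the GLUE "Architecture" enumeration values.
# None stands for "no assigned meaning".
SAFE_ON_TAPE = {"tape": True, "disk": False, "multidisk": False, "other": None}


def safe_on_tape(architecture: str) -> Optional[bool]:
    """Map an "Architecture" value to the interim "safe on tape" reading."""
    return SAFE_ON_TAPE[architecture]
```

For example, deletion_policy(RetentionType.DURABLE, 10.0, 5.0) yields
"delete-after-due-care", and safe_on_tape("multidisk") yields False.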

Uniformity of Transfer Performance
==================================

A related concept is that experiments want to know whether the storage
system they are dealing with is made up of systems where files may exist
in different performance modes, and whether the mode a file is in might
be affected by user action. This is perhaps too abstract to be clear --
for SC4, the consensus was that there are two types of systems needing
consideration: ones with a storage hierarchy, and ones without such a
hierarchy. All known relevant hierarchical systems have tape at the
bottom. Some systems have several layers of disk in the hierarchy.

Given such hierarchical systems, there is a strong desire to implement
the functionality of moving a file to the top of the hierarchy and
maintaining it there. There seem to be two types of maintenance. In the
first, the maintenance is explicitly managed. An example feature of this
kind of system is that a storage system may refuse to promote a file to
the top of the hierarchy because it is already maintaining many files
there. In the second, the storage system only gives additional weight to
its consideration that the file be at the top of the hierarchy.

The ontology for this is not developed. However, information is
available by coincidence -- this aspect of systems is modeled in the
same GLUE property as "Quality of Retention". The mapping of the
enumeration is as follows:

"tape" means that files have relatively non-uniform transfer
    performance.
"disk" and "multidisk" mean the data have relatively uniform transfer
    performance.
"other" has no assigned meaning.

Comments: The SRM V1 web services API deals with promoting a file to the
top of the hierarchy with the notion of a pin. The SRM V2 web services
API handles the notion as a kind of "prepareToGet" of the file.
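
The enumeration mapping and the explicitly managed kind of maintenance
described in this section can be sketched in Python as follows. This is
a toy model under stated assumptions, not an SRM or GLUE interface: the
names has_uniform_transfer and ExplicitlyManagedTop are hypothetical,
and a simple limit on the number of files stands in for whatever
criterion a real storage system would use to refuse a promotion.

```python
from typing import Optional

# Reading of the GLUE "Architecture" enumeration for transfer uniformity.
# None stands for "no assigned meaning".
UNIFORM_TRANSFER = {"tape": False, "disk": True, "multidisk": True,
                    "other": None}


def has_uniform_transfer(architecture: str) -> Optional[bool]:
    """Map an "Architecture" value to relatively uniform (True) or not."""
    return UNIFORM_TRANSFER[architecture]


class ExplicitlyManagedTop:
    """Toy top-of-hierarchy store that refuses promotion when it is full."""

    def __init__(self, max_files: int) -> None:
        self.max_files = max_files   # assumption: capacity counted in files
        self.maintained = set()      # names of files kept at the top

    def promote(self, name: str) -> bool:
        """Try to move a file to the top; return False if refused."""
        if name in self.maintained:
            return True
        if len(self.maintained) >= self.max_files:
            return False             # already maintaining too many files
        self.maintained.add(name)
        return True

    def release(self, name: str) -> None:
        """Stop maintaining a file at the top of the hierarchy."""
        self.maintained.discard(name)
```

With ExplicitlyManagedTop(max_files=1), promoting a first file succeeds
and promoting a second is refused until the first is released. The
second kind of maintenance, where the request only adds weight to the
system's own placement decision, is not modeled here.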