The WMS shall allow users to run arbitrary executables and workloads. All files shall be catalogable with metadata. Computing systems shall support outbound HTTP requests, and the WMS shall be able to use these connections.

Audit tools shall be provided to produce storage usage reports. These reports shall list usage for each user. User storage space and production storage space shall be accounted for separately.

The WMS shall allow users and production teams to request minimum resources per job, for example CPU cores, run time, memory, scratch disk space, and I/O. The WMS shall match jobs with available resources that meet the requested requirements when possible, and report to the jobs what resources are available. Jobs shall be able to determine how much time is left before they are stopped or held. The system shall allow efficient use of resources. (Example: the lifetimes of the pilots introduce a discretization of available time that can result in inefficiency unless there is a method to take advantage of odd amounts of remaining time in the glidein.)

The WMS shall support both the model of sending jobs to the data and the model of sending data to the jobs. Sending jobs to data involves pre-staging data at sites. The distributed data management system (DDM) shall provide the ability to assess demand and track the popularity of datasets. Data management shall be dynamic: data handling choices such as pre-staging vs. streaming vs. on-demand copying shall be made dynamically by the DDM. Explicit data management choices, such as pre-staging and pinning datasets at sites, shall also be supported, and it shall be possible to explicitly choose protocols. The Data Management System shall be integrated with EOS and CASTOR. The system shall provide a subscription feature for data movement and deletion. The system shall implement appropriate permissions and privileges to authorize each operation and assign priority.
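The minimum-resource matching and remaining-lifetime reporting described above could be sketched as follows. This is a minimal illustration only: all class and field names are hypothetical, and the units and matching policy are assumptions, not part of the requirements.

```python
# Hypothetical sketch of per-job minimum-resource matching against
# pilot/glidein slots. Names and units are illustrative, not the spec.
from dataclasses import dataclass

@dataclass
class ResourceRequest:
    cpu_cores: int       # minimum CPU cores
    runtime_s: int       # requested wall-clock time, seconds
    memory_mb: int       # minimum memory, MB
    scratch_mb: int      # minimum scratch disk, MB

@dataclass
class Slot:
    cpu_cores: int
    lifetime_s: int      # remaining pilot/glidein lifetime, seconds
    memory_mb: int
    scratch_mb: int

def matches(req: ResourceRequest, slot: Slot) -> bool:
    """A slot satisfies a request when every requested minimum is met,
    including enough remaining pilot lifetime to finish the job."""
    return (slot.cpu_cores >= req.cpu_cores
            and slot.lifetime_s >= req.runtime_s
            and slot.memory_mb >= req.memory_mb
            and slot.scratch_mb >= req.scratch_mb)

def best_slot(req: ResourceRequest, slots: list) -> "Slot | None":
    """Prefer the matching slot with the least leftover lifetime, so that
    odd amounts of remaining glidein time are used rather than wasted."""
    usable = [s for s in slots if matches(req, s)]
    return min(usable, key=lambda s: s.lifetime_s - req.runtime_s,
               default=None)
```

The tight-fit preference in `best_slot` is one possible answer to the discretization inefficiency noted above; a real matchmaker (e.g. HTCondor's) would weigh many more attributes.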
The data handling system shall provide support for access to federated datasets via multiple protocols, including streaming protocols. The WMS shall be able to interface with multiple Data Management Systems via modular APIs.

-------------------
Terminology:

Abstract concepts:
Campaign: defined by a software release / configuration and a list of input datasets. Examples: MC challenge, data reprocessing, derivation/skim, keep-up processing (Tier-0).
Task: a series of data transformations on a single dataset.
Subtask: a single transformation.

Concrete concepts:
Cluster of jobs: consists of jobs; defines when the next dependent subtask can start.
Job: the actual execution of one or more transformations on one or more units of work on one compute resource.
-------------------

The WMS supports the above-named concepts. It shall be possible to clone the abstract concepts, after which the system shall rerun the concrete steps. It shall be possible to rerun the same execution of a job, both interactively and on a batch resource, for debugging purposes.

The WMS shall support this workflow:
1) Define a campaign.
2) Run the tasks and campaign.
3) Automatically recover failed jobs a configurable number of times.
4) Debug jobs (human intervention) that continue to fail after automatic retries.
5) Complete the campaign with patched software. The collaboration may decide to start a new campaign but should have the choice to continue with the patch.
---------------------

Logfiles consist of stdout and stderr for each job, as well as the associated Condor, POMS, and SAM file delivery logs, and database access logs. Job logfiles shall be kept in permanent storage. Job logfiles shall be indexed with metadata and recoverable by campaign, task, and job. Campaign, task, subtask, and job submission information shall be stored permanently. A mechanism shall be provided for identifying the raw data file a particular event is in.
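The campaign/task/subtask/job hierarchy and the automatic-retry step of the workflow above can be sketched as a small data model. All field names and the retry policy below are illustrative assumptions, not the specification.

```python
# Illustrative sketch of the campaign -> task -> subtask -> job hierarchy
# and the configurable automatic-retry rule. Names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: str
    failures: int = 0          # number of failed attempts so far
    done: bool = False

@dataclass
class Subtask:                  # a single transformation
    name: str
    jobs: list = field(default_factory=list)

@dataclass
class Task:                     # transformations on a single dataset
    dataset: str
    subtasks: list = field(default_factory=list)

@dataclass
class Campaign:                 # defined by release/config + input datasets
    release: str
    config: str
    tasks: list = field(default_factory=list)

def jobs_to_retry(campaign: Campaign, max_retries: int) -> list:
    """Failed jobs are resubmitted automatically up to a configurable
    number of times; jobs that keep failing beyond that limit are left
    for human debugging."""
    return [j for t in campaign.tasks
              for st in t.subtasks
              for j in st.jobs
              if not j.done and 0 < j.failures <= max_retries]
```

Cloning a campaign (the "rerun the concrete steps" requirement) would amount to copying the abstract `Campaign` definition and regenerating fresh `Job` records from it.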
Multiple roles shall be supported, and designated collaboration members shall be able to assign and revoke roles. The relative ratios of CPU and disk allocations between collaboration-defined categories of work shall be configurable.

It shall be possible to steer specific work to specific sites. This feature shall be both rule-based and configurable on a task-by-task basis. If a site requests to run specific work, or only DUNE jobs, such requests shall be honored.

Users shall be allowed to use as many features of the WMS as they would like: they shall be allowed to start campaigns, run tasks or individual jobs, and benefit from the WMS features. There shall be just one WMS. Users shall be able to run arbitrary jobs on arbitrary data.

Monitoring shall be provided to track the progress of jobs. CPU, memory, and disk usage shall be tracked. Monitoring shall display information indexed by site, by user, and by campaign. Production-specific monitoring pages and dashboards shall be provided. Historical job information shall be provided.

Adding a computing site to the list of available sites shall involve a minimal amount of work. Blacklisting a site shall also be possible in a short amount of time.

Accounting of resource usage shall be collected by site, by user, and by campaign. Project names and IDs shall be used to collect accounting information.

Databases
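The rule-based, per-task site steering and fast blacklisting described above might look like the following. The rule shape (a map from task name to allowed sites) and all names are hypothetical; they are one possible reading of the requirement, not a design decision.

```python
# Hypothetical sketch of rule-based site steering with a blacklist.
# `rules` maps a task name to the sites allowed to run it; tasks
# without a rule may run at any known site that is not blacklisted.
def eligible_sites(task_name: str, sites: list, rules: dict,
                   blacklist: set) -> list:
    """Return the sites where this task may run, honoring per-task
    steering rules and excluding blacklisted sites."""
    allowed = rules.get(task_name, sites)
    return [s for s in allowed if s in sites and s not in blacklist]
```

Because the blacklist is consulted at match time, removing a misbehaving site takes effect on the next placement decision, which addresses the "short amount of time" requirement.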