1. Do not pin "Replica-Online" files. Control this behavior with an option.

CMS T1 Issues:

0. CRC errors need to be logged into billing. Automatic CRC checking should be enabled. Create a Log4j billing appender or a structured message?
1. More info is needed. Replica Manager needs to consider whether the pool status is enabled or disabled. Gerd: RM only uses PoolStatusChangedMessage, which is sent when a pool is fully enabled. When you run PoolManager "pool ls", check the pool status too.
2. Replica Manager will use multiple threads for communication with pools.
3. Analyze the Stage and Suspended rc requests in the PoolManager and find the ones that can be served from the pools that just came back online. If serving the request fails, we go back to the same staging state. Use SMC for the states of the stage requests? The Stage Request Scheduler is a separate module in PoolManager. The code is replicated in the dCache source tree, so that old and new code can be easily activated in configuration. Patrick, Tigran and Irina need to document (Wiki page) the state machine for the Pool Manager Stage Request Scheduler. Investigate whether this problem can be solved without a redesign.
4, 5, 6. Specific examples would be needed. Some moves are illegal and will not work. What are the symptoms of the problems? Which commands do you type in? A ticket would help.
7. There is already a ticket for this. Gerd is looking at it; it will be fixed soon.
8. Use alternatives to console log4j appenders. Reduce the number of exception stack traces for non-errors and the SQL emitted in non-debug mode (File Not Found on create!!! etc.).
9. What evidence exists? SRM does not use pin info for selection of the pool; it uses PoolManager. More info is needed.
10. The issue is understood and will be fixed with some priority.
11. Might be caused by restarts of hardware. Is the healer able to resolve the issue? We need to know exactly what happens. Control files are leaked when files are removed, control files are corrupt or missing, etc.
Specific descriptions would help diagnose and solve the problem. Code will be added to the healer to remove control files when the data files are absent.
12. The 0-length files are there by design. The "From client" state is overloaded in the case of control files; with Berkeley DB it translates into the "From client" or "From Client error" states. "rep ls" should show the file as broken. It is up to the door (the dcap door never removes the files) or the cleaner to remove it. The file will be gone on restart.
13, 14, 15. Certain errors (e.g. OOME) can be handled in a generic way by installing error handlers from cell code. In order to check for missing components we need a way to describe what is expected in a given deployment. Detect a second deployment of a component as an error. Using JMS would allow easy deployment of redundant services that service messages of a certain type from a common queue, thus increasing reliability. Routing Manager needs to log errors. Impossible to solve in general. Specific problem reports would help.
16. The issue is understood, but no solutions are planned at this time.
17. PNFS Manager queueing in the presence of a large number of dCap transfers: try enabling message folding. A discussion with Jon is to follow.
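
A minimal sketch of the idea behind message folding in item 17, assuming it means that identical requests waiting in the queue are answered from a single namespace lookup. FoldingQueue, submit, drain and the request keys are hypothetical names for illustration only; they are not dCache classes or the PNFS Manager API, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of request folding; not dCache code. */
final class FoldingQueue {
    // Waiting clients grouped by request key, in order of first arrival.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private int lookups = 0;

    /** Queue a request; identical keys are folded into one pending entry. */
    void submit(String requestKey, String clientId) {
        pending.computeIfAbsent(requestKey, k -> new ArrayList<>()).add(clientId);
    }

    /**
     * Process the queue: one backend lookup per distinct key, and the
     * same reply is handed to every client that asked for that key.
     * Client ids are assumed to be unique.
     */
    Map<String, String> drain(Function<String, String> backend) {
        Map<String, String> replies = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : pending.entrySet()) {
            String reply = backend.apply(entry.getKey());
            lookups++;
            for (String clientId : entry.getValue()) {
                replies.put(clientId, reply);
            }
        }
        pending.clear();
        return replies;
    }

    /** Number of backend lookups performed so far. */
    int lookups() {
        return lookups;
    }

    public static void main(String[] args) {
        FoldingQueue queue = new FoldingQueue();
        // Three doors ask for the same (made-up) path, one for another.
        queue.submit("getattr:/data/file-a", "door-1");
        queue.submit("getattr:/data/file-a", "door-2");
        queue.submit("getattr:/data/file-a", "door-3");
        queue.submit("getattr:/data/file-b", "door-4");
        Map<String, String> replies = queue.drain(key -> "attrs-for-" + key);
        System.out.println(replies.size() + " replies from "
                + queue.lookups() + " lookups");
    }
}
```

With many dCap transfers asking for the same file, the queue length seen by the backend in this sketch grows with the number of distinct requests rather than the number of clients, which is the effect item 17 hopes for.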