Reviewers: Jim Bottum <jb@clemson.edu>, Ian Fisk <ifisk@fnal.gov>, Mark Neubauer <msn@illinois.edu>, Ewa Deelman <deelman@isi.edu>

Date: Mar 14, 2014

-----

*** Final ET / Reviewers discussion

- ...

- Dealing with large organizations leads to inherent inertia

- Expanding the user base with the current model (e.g. Software inheriting orphaned projects) is a challenge

- LHC experiments are probably willing to evolve as directed by OSG over long timescales

- LHC experiments would favor a model where resources / effort are provided in bursts rather than at a steady rate.

- Infrastructure is too static: it takes too long to set up a new site.
Goal: Set up a new site in half a day, then tear it down.
Either provide a large pool that can be easily tapped into or become more agile in setting up access to centers.

- Process to decide on new technologies is inefficient. You don't need more Blueprint meetings. Maybe focused workshops, user groups, a better planning process, ...

- OSG is missing feedback from users at all levels. The presentations gave no good sense of a roadmap driven by user feedback.

- Campus grid approach is a step in the right direction, but need to set the bar high, e.g. this may be a way to deal with the T3.

- Are the council members able to represent their communities in full? E.g. WLCG does not make decisions for the sites.

- Who do you engage on campus, and how? OSG failed to engage CIOs. Find a CIO who can organize a CIO group and advise on this. There is nothing wrong with dealing with the scientists directly. CIOs know how to connect within IT depts. Connection between faculty and IT is often loose: IT can learn what to do to help faculty better by dealing with you.

- OSG has strengths in packaging, testing, and deployment. Should capitalize on these e.g. to connect more campuses through them.

- A lot of focus on technical aspects and not enough on people communicating with users. CIOs have resources to allocate 0.5 FTE for communication. Jim can help with this strategy.

- Provide course material and examples online to spread knowledge. OSG has no funding for education. OSG only runs a summer school. A user goes to the OSG web site and cannot easily find documentation.

- Resources are deployed to run ops, security, make sure that LHC succeeds. Need to change allocation if you want to evolve beyond core mission.

- Analyze software stack and identify what is really used (gWMS, no cert VOs, etc.) and remove unneeded dependencies. Redeploy the resources from Software.

- You are trying to do too many things yourself and you do not leverage the community.

- Have we tried to "evangelize" - find 10 people and ask them to find 10 people each.

- Help scientists and let their competition know. Find under-served communities / universities: don't forget the little guys.

- Distribute information by creating a one-evening course with examples.

- Create a buzz on a few campuses to make it spread.

- Mark's final remarks: Define stakeholder, grid sites. Be more nimble. Facilitate new ideas instead of doing the Blueprint meetings.

---

*** Discussion during the talks

** 8:30 Lothar

Ian: What is the target amount of opportunistic resources in OSG?
Miron: We target to "hit the ceiling" of all available resources. We may find more resources at that point.
Chander: 90M CPU h is the usage: we don't know how much there is.

Miron: GLOW provided as many opportunistic resources to OSG as it consumed on OSG: the plumbing works.

Miron: How do we make sure that all the services can support growth from 60M h per month to, say, 100M h?

Jim Bottum: Does the utilization map to our investors?
Lothar: roughly
Miron: if we had more demand, we might find more resources

Miron: some communities are still not thinking about this order of magnitude in available resources, although OSG can provide them.
Jim / Ian: Double-edged sword: there may be various internal organizational (campus, DOE, ...) incentives to use resources owned by them, rather than going to OSG. Sociological issues.

Mark Neubauer: Make the case that OSG is the basis for professional training in the US. Better metrics than computational hours.

Miron: Google and Facebook are the "identity providers"
Jim: I wish that it was possible to rely on the commercial sector for that. We might not be there yet to run identity management for an entire university.


** 9:15 Chander - OSG User Support

Ian: Do people switch among VOs?
Miron: OSG is merging with Engage

Jim: you said that you want to increase the number of supported projects: have you reached out to Internet2, e.g. through the Net+ service?
Miron: yes, but no deal yet

Mark: You might want to change the way you present number of users (with the tables of XD projects, OSG-Direct, projects at campuses, ...). Underline that these are only the PIs.

Michael Ernst: what is the feedback from the community?
Chander: Users are satisfied. Metric: anecdotal evidence

Jim: need senior university management to tell local campus champions that they have to work with the national cyber-infrastructure as well as focusing on local computing


** 9:41 Rob Gardner - Campus Grid

Jim: it would be useful to have a glossary of acronyms (DHTC, etc.)

Ewa: Do you have a way of deploying components easily?
Rob: yes, they don't have to be exposed to the full complexity.

Mark: to get more users, going to the campuses and doing tutorials seems very effective to enable technology there. Need to also discuss under-represented universities.

Mark: what is the "coherent" OSG strategy in outreach to serve the communities? This seems missing from the agenda.
Lothar: The model started as VO-driven: bring resources and share. Now evolving a new business model: OSG VO and campus connect. Our new strategy is not mature yet.
Miron: still need to ask researchers "what would you do with 100,000 CPU hours?" Scientists need to think differently for usage to increase


** Break


** 10:30 Rob Quick - Operations

Miron: what is the impact of the government shutdown?
Rob: if FNAL and BNL had gone down, all critical services were expected to be up. Communication has a separate path from FNAL. Accounting is cached locally at sites.

Mark: your level of downtime for the critical services in 2 yrs is commendable. Do you have a disaster recovery plan?
Rob: yes. Should IU have a disaster, we can bring up the critical services within a week
Miron: distributing the services to various institutions is a strength, but it also increases some risks.

Jim: do you have a service methodology across the 5 sites to tie them together?
Rob: the process is ITIL-like

Jim: how do you develop senior staff?
Rob: one-on-one with Rob and leadership classes. Left for 1 month for paternity leave and ops went smoothly.

Mark: XSEDE gives you an environment; OSG works with an existing environment. What is the symmetry of the XSEDE / OSG relationship?
Chander: See Brian's talk

Ewa: What is the distribution of tickets from users vs. sites?
Rob: Probably close to 40%-60%. Need to check.


** 11:07 Brian Bockelman - Technology

Mike Ernst: Who is providing guidance on where the architecture is heading and what technologies are considered? What is the input process?
Brian: Miron is the technical director. John runs the process. Input from several stakeholders.
Requests from council, area coordinators, blueprint meetings. Not a formalized process.
Miron: if there is enough pressure from a stakeholder, then we do a blueprint meeting. We should increase the number of blueprint meetings (quarterly is too little).
Lothar: The fact that the Blueprint meeting process is not smooth is holding us back. Bringing experts to the table is an almost unique strength. We have to fix the process.
Miron: I accept responsibility for setting up 6 (?) blueprint meetings every year, if someone helps with logistics

Jim: is there a process to gather requests?
Lothar: yes, there is a request system
Michael: it is a good system to track the requests. We need a better way to help stakeholders formulate requests.
Brian: the system is good for specific requests, not high-level / general requests. Council meetings may be better to discuss those.

Michael Ernst: we should use workload manager, instead of workflow manager, for the gWMS / Condor technologies


** 11:35 Tim C. and Tim T. - Software and release management

Mark: is it manageable / too high-maintenance to customize 15M lines of code, most of which are out of your control?
Tim T.: we have automated processes to retrieve, apply patches, etc.
Mark: It can be a high barrier to change technologies
Brian: we struggle to count packages: is Globus 1 or 50? Today we say 50 with 50 automated downloads, but we update that as 1.
Tim C.: We use the terms "components" of "packages"
Miron: we should have better language / metrics. Number of packages is high as per reviewers' feedback
Brian: we are decreasing the number of packages we support and increasing the number supported by the community

Miron: separation of software and release management is working well. Release management is the interface to operations.

Jim: how do you financially support the taking over of orphaned software?
Miron: OSG pays for it.
Jim: do the funding agencies understand that you are taking this over?
Miron: they understand we do the best we can. If their stakeholders get the job done, the agencies are fine. The ET has been managing priorities well.
Lothar: we are in this space to support our communities
Mark: is it that the users don't want to overcome the barrier to update, or are there more substantial issues?
Miron: all of the above.
Jim: what is your budget for software?
Chander: 9.6 FTE for technology, including software. 1/3 of our effort.


** 12:09 Mine Altunay - Security

Mark: where can you push standardization for InCommon? Computing is an equalizer to allow contributions from small universities. Are we marginalizing them this way?
Mine: there are standardization efforts
Lothar: this is the 80% coverage of users: maybe it should be 95%. We have other identity mechanisms for the remaining users.


** Shawn McKee - Networking