Full HEPCloud Facility Board Meeting (Weekly)

Name: Full HEPCloud Facility Board Meeting (Weekly)
Start: 2020-07-27T15:00:00-05:00
End: 2020-07-27T16:00:00-05:00
Location: Fermilab

Monday 27 Jul 2020, 15:00 → 16:00 US/Central

https://fnal.zoom.us/j/91247475589?pwd=TnBuY2d2bDk3MGptRW1oZjJnSUpVdz09

Andrew Norman (Fermilab), Stuart Fuess (Fermilab)

Description

Topic: HEPCloud Facility Board
Time: Jul 27, 2020 03:00 PM Central Time (US and Canada)

Contact

anorman@fnal.gov

Present: M. Livny, B. Bockelman, A. Norman, S. Timm, A. Tiradani, M. Mambelli, M. Acosta

HEPCloud Theta

Users:

CMS, DUNE, other neutrino experiments.

Scope: what functionality are we delivering

All experiments can define a “campaign” and then, at experiment level, say that this “campaign” should run at ALCF.

Not a job-by-job decision.

Data delivery:

For purposes of this demo it is assumed that the right input data needed for the CMS jobs is already staged in to Argonne and the output data will be transferred out of band.

Discussion: CMS already has some hooks about data in its job descriptions. How accurate are they and how much are they used?

Use the whole block of allocation at once, or part of it, etc.

Would prefer DC work but will have capacity for allocating the whole burst if needed.

Technical

Tony:

Got instructions for replicating the Barcelona setup

Replicating on Wilson @ Fermilab first

Jim and Liz will make contact with people @ Argonne to see if there is a Kubernetes cluster available to work on.

If not (default assumption) presume we are working on a login node @ Argonne

CMS will give small test workflow and then later a bigger one.

Miron Q: are we following the Barcelona model?

They (Barcelona) had no schedd on the HPC system.

Barcelona system assumed you had individual jobs

Miron: if scheduling is a campaign, can we delegate some of the schedd activity to Argonne?

Miron is interested in moving a whole bucket of jobs from schedd to schedd.

Tony: wondering how long it would take to develop this. The Barcelona method is available in current HTCondor, and we are in a time crunch to deliver.

Miron: still wonders if they can really run a schedd. For fall the Barcelona model may be the right way, but longer term wants to run a schedd on that end if we can.

Have we checked with Argonne re. the networking assumptions? Tony: yes.

In theory there is some network between login and worker nodes but not robust.

Brian B.: likes the idea of keeping both in mind but starting with the Barcelona model and adding capability/capacity to it for larger bulk transfers.

Better to learn how to move sets of jobs

condor_b: submitting Condor jobs into SETI@home/BOINC; take a large chunk of jobs and assign it to a location.

Steve: involved with condor_annex? Brian: no.

Miron: question: do we have authority to run this on the login nodes? Andrew: not yet, but the division head is tasked with clearing this for us with Argonne.

Madison involvement

Brian: need a standing contact.

Propose it is the standing Fermi/Condor meeting (2nd Friday of each month).

Who needs to be there? Best answer at the moment: Jaime Frey (combination of schedd understanding and knowing how the Barcelona system works), but could change.

Who is the Fermi contact? Tony is the technical contact but Maria Acosta is doing most of the work.

Does Maria have the contacts she needs?

Maria:

So far yes

Got Jaime’s code and is following his instructions on the HTCondor wiki.

Have adapted the code for what our needs are.

Still trying to submit Slurm jobs remotely.

Have been talking to Jaime constantly

Chirp attributes to ship back, etc.

Meetings on Friday are a good first step—have a conflict but can try to make it.

E-mail is OK

Slack might be nice

When do they (CMS) want to run jobs?

Want to run jobs in this calendar year.

- 15:00 → 15:25
  
  HEPCloud Theta Plan Discussion 25m
  
  Discuss the plan with Brian and Miron
  
  Speaker: Anthony Tiradani (Fermilab)
- 15:25 → 15:50
  
  HTCondor and Madison Involvement Discussion 25m
  
  Discuss where Madison can contribute to the work
- 15:50 → 15:55
  
  Wrap Up 5m