# Getting Started with jobsub_lite

Note: All commands preceded by "!" are shell commands.

## Setup

The new jobsub_lite software is installed on the interactive nodes via RPM, so the jobsub_lite executables should already be in your PATH. You do not need to source any setup scripts or set up UPS/Spack.

You can confirm this with `which`:

In [1]:
!which jobsub_submit

/opt/jobsub_lite/bin/jobsub_submit
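If `which` finds nothing, a small bash helper can report which jobsub_lite commands are missing. This is a sketch: the function name `missing_commands` is hypothetical, and it relies only on the shell builtin `command -v`. Define it in a terminal or script rather than a `!` cell, because each `!` line in a notebook runs in its own shell.

```shell
# Hypothetical helper: print each given command that is NOT found on PATH.
# With no arguments, checks the jobsub_lite commands used in this notebook.
missing_commands() {
  if [ "$#" -eq 0 ]; then
    set -- jobsub_submit jobsub_q jobsub_hold jobsub_release jobsub_rm jobsub_fetchlog
  fi
  local cmd
  for cmd in "$@"; do
    # command -v succeeds if the name resolves to an executable, builtin, or function
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

Running `missing_commands` with no arguments prints nothing when all six commands are on your PATH.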


The following steps ensure that:

* There is a valid Kerberos ticket.

* Any existing Vault tokens or cached credential files are removed before starting the demo.

In [2]:
!klist

Ticket cache: FILE:/tmp/krb5cc_10610_i4IjDKvh5k
Default principal: sbhat@FNAL.GOV

Valid starting       Expires              Service principal
01/03/2023 16:20:26  01/04/2023 18:20:15  krbtgt/FNAL.GOV@FNAL.GOV


In [3]:
!rm -f /tmp/vt_u$(id -u)*
!rm -f ~/.config/htgettoken/credkey*
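To make the reset step safe to rerun, the sketch below wraps it in a bash function that uses `rm -f`, which stays quiet when no cached files exist. The helper name `reset_token_cache` and its optional arguments (UID, temporary directory, htgettoken config directory) are illustrative assumptions. The defaults match the paths in the cell above.

```shell
# Hypothetical helper: remove cached Vault tokens and htgettoken credential keys.
# Usage: reset_token_cache [UID] [TMPDIR] [HTGETTOKEN_CONFIG_DIR]
reset_token_cache() {
  local uid=${1:-$(id -u)}
  local tmpdir=${2:-/tmp}
  local confdir=${3:-$HOME/.config/htgettoken}
  # -f: succeed silently when a pattern matches no files
  rm -f "$tmpdir"/vt_u"$uid"* "$confdir"/credkey*
}
```

For the Kerberos check, `klist -s` (MIT Kerberos) prints nothing and exits non-zero when there is no valid ticket, so `klist -s || kinit` is a compact pre-flight test.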

## Submit a simple job

Submit a simple job with jobsub_submit. No UPS/Spack setup is required.

In [22]:
!jobsub_submit -G fermilab file:///usr/bin/printenv

Submitting job(s).
1 job(s) submitted to cluster 57107368.
Use job id 57107368.0@jobsub01.fnal.gov to retrieve output
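Scripts often need the job ID that jobsub_submit prints. The following is a minimal sketch. It assumes the output keeps the `Use job id <ID> to retrieve output` form shown above, and the helper name `extract_jobid` is hypothetical.

```shell
# Hypothetical helper: read jobsub_submit output on stdin and print the job ID
# from the "Use job id <ID> to retrieve output" line (assumed format).
extract_jobid() {
  sed -n 's/^Use job id \([^ ]*\) to retrieve output$/\1/p'
}
```

For example, `jobid=$(jobsub_submit -G fermilab file:///usr/bin/printenv | extract_jobid)` would capture an ID like `57107368.0@jobsub01.fnal.gov`, ready to pass to jobsub_q or jobsub_fetchlog.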


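To manage a job, use the standard jobsub_q, jobsub_hold, jobsub_release, and jobsub_rm commands. Remember to pass the -G flag to every jobsub command. The examples below act on job 57107356.0@jobsub01.fnal.gov (executable `simple.sh`), which appears in the queue listing.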

In [5]:
!jobsub_q -G fermilab

JOBSUBJOBID                             OWNER       	SUBMITTED     RUNTIME   ST PRIO   SIZE  COMMAND
57107356.0@jobsub01.fnal.gov            sbhat     	01/03 16:27   0+00:00:00 I    0    0.0 simple.sh 
1709.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/07 22:27   0+09:36:03 I    0    0.0  
1710.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/07 22:28   0+09:23:34 I    0    0.0  
1714.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/07 22:31   0+08:47:41 I    0    0.0  
1715.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/07 22:32   0+09:11:23 I    0    0.0  
1725.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/08 02:03   0+08:49:42 I    0    0.0  
1730.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/08 03:00   0+08:16:47 I    0    0.0  
1733.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/08 03:08   0+07:45:47 I    0    0.0  
1812.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/16 15:05   0+00:00:00 I    0    0.0 true 
1829.0@jobsubdevgpvm01.fnal.gov         sbhat     	11/17 1
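The jobsub_q listing is plain whitespace-separated text, so awk can filter it. The sketch below assumes the column layout shown above. In that layout the `ST` state column is the sixth field, because `SUBMITTED` spans two fields (date and time). It also assumes the first line is the header. The name `jobs_in_state` is hypothetical.

```shell
# Hypothetical helper: print IDs of jobs in state $1 (e.g. I, R, H) from
# jobsub_q output on stdin. Assumes: line 1 is the header, ST is field 6.
jobs_in_state() {
  awk -v st="$1" 'NR > 1 && $6 == st { print $1 }'
}
```

For example, `for id in $(jobsub_q -G fermilab | jobs_in_state H); do jobsub_release -G fermilab "$id"; done` would release every held job in the listing.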

Querying for a particular job:

In [7]:
!jobsub_q -G fermilab 57107356.0@jobsub01.fnal.gov

JOBSUBJOBID                             OWNER       	SUBMITTED     RUNTIME   ST PRIO   SIZE  COMMAND
57107356.0@jobsub01.fnal.gov            sbhat     	01/03 16:27   0+00:00:00 I    0    0.0 simple.sh 


Holding a job:

In [8]:
!jobsub_hold -G fermilab 57107356.0@jobsub01.fnal.gov

Job 57107356.0 held


Releasing a job:

In [9]:
!jobsub_release -G fermilab 57107356.0@jobsub01.fnal.gov

Job 57107356.0 released


Removing a job:

In [10]:
!jobsub_rm -G fermilab 57107356.0@jobsub01.fnal.gov

Job 57107356.0 marked for removal
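Hold, release, and rm all take the same `-G GROUP JOBID` arguments, so one helper can apply an action to several jobs. This is a sketch. The name `jobsub_apply` is hypothetical. It calls `jobsub_hold`, `jobsub_release`, or `jobsub_rm` once per job ID and returns non-zero if any call fails.

```shell
# Hypothetical helper: apply one jobsub action to several job IDs.
# Usage: jobsub_apply hold|release|rm GROUP JOBID...
jobsub_apply() {
  local action=$1 group=$2
  shift 2
  case "$action" in
    hold|release|rm) ;;
    *) echo "unsupported action: $action" >&2; return 2 ;;
  esac
  local id status=0
  for id in "$@"; do
    # Keep going after a failure, but remember it for the return status
    "jobsub_$action" -G "$group" "$id" || status=1
  done
  return "$status"
}
```

`jobsub_apply hold fermilab 57107356.0@jobsub01.fnal.gov` is equivalent to the hold command above.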


## Submitting DAGs

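Consider the DAG description file `TestDAG/mywork.dagnabbit`, which runs jobA and then jobB, then jobC and jobD in parallel, and finally jobE: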

In [11]:
!cat TestDAG/mywork.dagnabbit

<serial>
jobsub_submit -G fermilab file:///home/sbhat/TestDAG/jobA.sh
jobsub_submit -G fermilab file:///home/sbhat/TestDAG/jobB.sh
</serial>
<parallel>
jobsub_submit -G fermilab file:///home/sbhat/TestDAG/jobC.sh
jobsub_submit -G fermilab file:///home/sbhat/TestDAG/jobD.sh
</parallel>
<serial>
jobsub_submit -G fermilab file:///home/sbhat/TestDAG/jobE.sh
</serial>
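A .dagnabbit file can also be generated from a script. The sketch below assumes the format is exactly what the file above shows: `<serial>` or `<parallel>` tags wrapping one `jobsub_submit` line per job. The `dag_section` helper is hypothetical.

```shell
# Hypothetical helper: print one <serial> or <parallel> section of a .dagnabbit file.
# Usage: dag_section serial|parallel GROUP SCRIPT_URL...
dag_section() {
  local kind=$1 group=$2
  shift 2
  case "$kind" in
    serial|parallel) ;;
    *) echo "unknown section type: $kind" >&2; return 2 ;;
  esac
  echo "<$kind>"
  local url
  for url in "$@"; do
    # One jobsub_submit line per job, as in mywork.dagnabbit
    echo "jobsub_submit -G $group $url"
  done
  echo "</$kind>"
}

# Example (rebuilds the first two sections of mywork.dagnabbit):
# { dag_section serial fermilab file:///home/sbhat/TestDAG/jobA.sh file:///home/sbhat/TestDAG/jobB.sh
#   dag_section parallel fermilab file:///home/sbhat/TestDAG/jobC.sh file:///home/sbhat/TestDAG/jobD.sh
# } > mywork.dagnabbit
```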


Submit a DAG:

In [12]:
!jobsub_submit -G fermilab --dag file:///home/sbhat/TestDAG/mywork.dagnabbit

Submitting job(s).
1 job(s) submitted to cluster 57107357.
Use job id 57107357.0@jobsub01.fnal.gov to retrieve output


Query the status of the DAG:

In [14]:
!jobsub_q -G fermilab 57107357.0@jobsub01.fnal.gov

JOBSUBJOBID                             OWNER       	SUBMITTED     RUNTIME   ST PRIO   SIZE  COMMAND
57107357.0@jobsub01.fnal.gov            sbhat     	01/03 16:28   0+00:00:00 R    0    0.0 dagman_wrapper.sh -p 0 -f -l . -Lockfi
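One way to wait for a DAG (or any job) to finish before fetching its logs is to poll jobsub_q until the job ID disappears. In this sketch, the helper name `wait_for_job`, the 60-second default interval, and the 60-poll default limit are arbitrary choices. A failed or empty query is treated the same as "job gone", so check your credentials first.

```shell
# Hypothetical helper: poll jobsub_q until JOBID no longer appears.
# Usage: wait_for_job GROUP JOBID [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 once the job is gone, 1 if it is still listed after MAX_POLLS polls.
wait_for_job() {
  local group=$1 jobid=$2 interval=${3:-60} max=${4:-60}
  local polls=1
  while :; do
    # grep reads all of jobsub_q's output (no -q) and matches the ID literally (-F)
    if ! jobsub_q -G "$group" "$jobid" 2>/dev/null | grep -F -- "$jobid" >/dev/null; then
      return 0
    fi
    if [ "$polls" -ge "$max" ]; then
      return 1
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
}
```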


## Tarfiles

The `-f` flag sends files to the `$CONDOR_DIR_INPUT` directory in the job:

* `-f`: Transfer the file at runtime.
* `-f dropbox://`: Transfer the file at submission time.

The `--tar-file-name` flag copies a tarball to `$INPUT_TAR_FILE` and unpacks its contents into the same directory as `$INPUT_TAR_FILE`:

* `--tar-file-name dropbox://`: Transfer the tarball at submission time.
* `--tar-file-name tardir://`: Create a tar archive of the directory and transfer it to the job.
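You can also build the tarball yourself and send it with `--tar-file-name dropbox://`. The sketch below archives a directory's contents relative to the directory itself, which matches the `./a`, `./Subdir/a` listing printed by the `tardir://` example in this section. The helper name `make_input_tarball` is hypothetical. Whether jobsub_lite builds its archive exactly this way is an assumption.

```shell
# Hypothetical helper: tar up the contents of SRCDIR into OUT (uncompressed).
# Write OUT outside SRCDIR so the archive does not try to include itself.
make_input_tarball() {
  local srcdir=$1 out=$2
  if [ ! -d "$srcdir" ]; then
    echo "not a directory: $srcdir" >&2
    return 1
  fi
  # -C: change into srcdir first, so member names start with ./
  tar -cf "$out" -C "$srcdir" .
}
```

The result can then be submitted with, for example, `jobsub_submit -G fermilab --tar-file-name dropbox:///home/sbhat/TestDirTarfile.tar file:///usr/bin/printenv`.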

In [15]:
!jobsub_submit -G fermilab -f test_file.txt file:///usr/bin/printenv

Submitting job(s).
1 job(s) submitted to cluster 57107359.
Use job id 57107359.0@jobsub01.fnal.gov to retrieve output


In [18]:
!jobsub_submit -G fermilab -f dropbox:///home/sbhat/test_file.txt file:///usr/bin/printenv

test_file.txt
Using bearer token located at /tmp/bt_token_fermilab_Analysis_10610 to authenticate to RCDS
Submitting job(s).
1 job(s) submitted to cluster 57107365.
Use job id 57107365.0@jobsub01.fnal.gov to retrieve output


In [19]:
!jobsub_submit -G fermilab --tar-file-name dropbox:///home/sbhat/TestDirTarfile.tar file:///usr/bin/printenv

Using bearer token located at /tmp/bt_token_fermilab_Analysis_10610 to authenticate to RCDS
Submitting job(s).
1 job(s) submitted to cluster 57107366.
Use job id 57107366.0@jobsub01.fnal.gov to retrieve output


In [20]:
!jobsub_submit -G fermilab --tar-file-name tardir:///home/sbhat/TestDirTarfile file:///usr/bin/printenv

./
./a
./b
./c
./Subdir/
./Subdir/a
./Subdir/b
Using bearer token located at /tmp/bt_token_fermilab_Analysis_10610 to authenticate to RCDS
Submitting job(s).
1 job(s) submitted to cluster 57107367.
Use job id 57107367.0@jobsub01.fnal.gov to retrieve output
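Inside the job, you can locate the delivered files through the two environment variables described in this section. The following is a minimal sketch for a job script. It assumes that a file sent with `-f` keeps its base name inside `$CONDOR_DIR_INPUT`. The helper names `input_file` and `tar_contents_dir` are hypothetical.

```shell
# Job-side helpers (run inside the job, not on the submit node).

# Path of a file sent with -f; such files are delivered to $CONDOR_DIR_INPUT.
# Assumption: the file keeps its base name.
input_file() {
  if [ -z "${CONDOR_DIR_INPUT:-}" ]; then
    echo "CONDOR_DIR_INPUT is not set (not running in a jobsub job?)" >&2
    return 1
  fi
  printf '%s/%s\n' "$CONDOR_DIR_INPUT" "$(basename "$1")"
}

# Directory holding the unpacked --tar-file-name contents, which are
# unpacked alongside $INPUT_TAR_FILE.
tar_contents_dir() {
  if [ -z "${INPUT_TAR_FILE:-}" ]; then
    echo "INPUT_TAR_FILE is not set (no --tar-file-name given?)" >&2
    return 1
  fi
  dirname "$INPUT_TAR_FILE"
}
```

A job script could then run `cat "$(input_file test_file.txt)"` or `ls "$(tar_contents_dir)/Subdir"`.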


## Fetching logs

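Use jobsub_fetchlog to fetch the logs of completed jobs. In the example below, jobsub_fetchlog was run shortly after the job was submitted, so its output file likely did not exist yet and the transfer reports an error. A tarball for the job nevertheless appears in the `ls` listing that follows.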

In [23]:
!jobsub_fetchlog -G fermilab 57107368.0@jobsub01.fnal.gov 

Error in transfer_data(): DCSchedd::receiveJobSandbox:7003:File transfer failed for target job 57107368.0: SCHEDD at 131.225.161.93 failed to send file(s) to <131.225.152.173:34030>: error reading from /storage/local/data1/condor/spool/7368/0/cluster57107368.proc0.subproc0/printenv2023_01_03_16382445ea48cb-a0f2-43ec-b43b-7a377b77c228cluster.57107368.0.out: (errno 2) No such file or directory; TOOL failed to receive file(s) from <131.225.161.93:9615>
Transfer may be incomplete.


In [24]:
!ls

57106981.0@jobsub01.fnal.gov.tgz  TestCondorSubmitSimple  TestDirTarfile2
57107368@jobsub01.fnal.gov.tgz	  TestDAG		  TestDirTarfile.tar
anaconda3			  test_dir		  test_file.txt
jobsub_lite_demo.ipynb		  TestDirTarfile


To inspect the logs, unpack a previously fetched tarball (here, for job 57106981.0):

In [22]:
!mkdir -p test_dir
!tar -xvf 57106981.0@jobsub01.fnal.gov.tgz --directory=test_dir

simple.cmd
simple.sh
probe_sleep.sh
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.out
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.err
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.log


In [23]:
!ls test_dir

probe_sleep.sh
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.err
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.log
probe_sleep.sh2022_12_02_18312495358905-1008-406c-ada5-0cee90514e83cluster.57106981.0.out
simple.cmd
simple.sh
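The mkdir and tar steps above can be wrapped into one helper that also picks out the `.out`, `.err`, and `.log` files. This is a sketch, and `unpack_logs` is a hypothetical name. When no destination is given, it assumes the tarball name ends in `.tgz`. It also relies on tar detecting gzip compression when extracting, which GNU tar and bsdtar both do.

```shell
# Hypothetical helper: unpack a jobsub_fetchlog tarball and list the job's
# .out/.err/.log files. Usage: unpack_logs TARBALL [DESTDIR]
# DESTDIR defaults to the tarball path without its .tgz suffix.
unpack_logs() {
  local tarball=$1
  local dest=${2:-${tarball%.tgz}}
  mkdir -p "$dest"
  tar -xf "$tarball" -C "$dest"
  # Returns non-zero (from grep) if no output/log files were found
  ls "$dest" | grep -E '\.(out|err|log)$'
}
```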


## Condor Commands

Here we use the jobsub_lite-wrapped condor_submit to submit a cluster, and the standard condor_q that comes with HTCondor to query its status.

For a simple submit file:

In [32]:
!cat TestCondorSubmitSimple/simple_test.cmd

Universe   = vanilla
Executable = /usr/bin/printenv
Arguments  = 120
Log        = cout/printenv.$(Cluster).$(Process).log
Output     = out/printenv.out.$(Cluster).$(Process)
Error      = err/printenv.err.$(Cluster).$(Process)
RequestCpus = 1
RequestMemory = 1024
x509userproxy = /tmp/x509up_fermilab_Analysis_10610
delegate_job_GSI_credentials_lifetime = 0
Queue 5


We use the jobsub_lite-wrapped condor_submit to submit the 5-job cluster. The `--debug` flag prints the underlying `/usr/bin/condor_submit` command that the wrapper runs:

In [31]:
!condor_submit -G fermilab --debug TestCondorSubmitSimple/simple_test.cmd

schedd_name is : jobsub01.fnal.gov
cmd_args is : ['TestCondorSubmitSimple/simple_test.cmd']
Running: voms-proxy-info -exists -valid 0:10 -file /tmp/x509up_fermilab_Analysis_10610
proxy is : /tmp/x509up_fermilab_Analysis_10610
token is : /tmp/bt_token_fermilab_Analysis_10610
Running: _condor_CREDD_HOST=jobsub01.fnal.gov BEARER_TOKEN_FILE=/tmp/bt_token_fermilab_Analysis_10610 /usr/bin/condor_submit -pool gpcollector04.fnal.gov -remote jobsub01.fnal.gov 'TestCondorSubmitSimple/simple_test.cmd' '-spool'
Submitting job(s).....
5 job(s) submitted to cluster 57107372.
Use job id 57107372.0@jobsub01.fnal.gov to retrieve output
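To feed the new cluster straight into condor_q, you can parse the cluster number from the submit output. The following sketch assumes that the `N job(s) submitted to cluster C.` line shown above goes to standard output. The name `extract_cluster` is hypothetical.

```shell
# Hypothetical helper: print the cluster number from condor_submit output on stdin,
# e.g. "5 job(s) submitted to cluster 57107372." -> 57107372
extract_cluster() {
  # In a basic regex, ( and ) are literal; \( \) capture the number
  sed -n 's/^[0-9][0-9]* job(s) submitted to cluster \([0-9][0-9]*\)\.$/\1/p'
}
```

For example, `cluster=$(condor_submit -G fermilab TestCondorSubmitSimple/simple_test.cmd | extract_cluster)` could be followed by `/usr/bin/condor_q -pool gpcollector04.fnal.gov -name jobsub01.fnal.gov "$cluster"`.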


To use the unwrapped condor_q that comes with HTCondor (at `/usr/bin/condor_q`), you need to handle authentication yourself, for example by fetching a token with htgettoken:

In [33]:
!htgettoken -a htvaultprod.fnal.gov -i fermilab

Attempting to get token from https://htvaultprod.fnal.gov:8200 ... succeeded
Storing bearer token in /run/user/10610/bt_u10610
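Tools that follow the WLCG bearer token discovery convention look for the token in a standard place. The sketch below prints the path that convention points to, as we understand it: `$BEARER_TOKEN_FILE` if set, otherwise `$XDG_RUNTIME_DIR/bt_u<uid>`, otherwise `/tmp/bt_u<uid>`. The `/run/user/10610/bt_u10610` path above corresponds to the `XDG_RUNTIME_DIR` case. The helper name `bearer_token_path` is hypothetical, and it ignores a token passed directly in `$BEARER_TOKEN`.

```shell
# Hypothetical helper: print the bearer token file path per the WLCG discovery
# order (as we understand it). Usage: bearer_token_path [UID]
bearer_token_path() {
  local uid=${1:-$(id -u)}
  if [ -n "${BEARER_TOKEN_FILE:-}" ]; then
    echo "$BEARER_TOKEN_FILE"
  elif [ -n "${XDG_RUNTIME_DIR:-}" ]; then
    echo "$XDG_RUNTIME_DIR/bt_u$uid"
  else
    echo "/tmp/bt_u$uid"
  fi
}
```

The wrapped condor_submit avoids this lookup by setting `BEARER_TOKEN_FILE` explicitly, as its `--debug` output shows.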


Then run the HTCondor executable from `/usr/bin/` directly, pointing it at the collector and schedd:

In [35]:
!/usr/bin/condor_q -pool gpcollector04.fnal.gov -name jobsub01.fnal.gov 57107372



-- Schedd: jobsub01.fnal.gov : <131.225.161.93:9615?... @ 01/03/23 16:43:54
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
sbhat ID: 57107372   1/3  16:43      _      _      5      5 57107372.0-4

Total for query: 5 jobs; 0 completed, 0 removed, 5 idle, 0 running, 0 held, 0 suspended 
Total for all users: 22 jobs; 9 completed, 0 removed, 5 idle, 3 running, 5 held, 0 suspended
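Monitoring scripts can parse the summary line at the bottom of condor_q's default output. The following sketch assumes the `Total for query: N jobs; N completed, ...` format shown above. The name `query_total` is hypothetical.

```shell
# Hypothetical helper: print one count from condor_q's "Total for query:" line.
# Usage: condor_q ... | query_total FIELD
#   FIELD: jobs, completed, removed, idle, running, held, or suspended
query_total() {
  awk -v f="$1" '
    /^Total for query:/ {
      # Drop the label, then split the rest into "<count> <name>" pieces
      sub(/^Total for query: */, "")
      n = split($0, parts, /[;,] */)
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, " ")
        if (kv[2] == f) print kv[1]
      }
    }'
}
```

For example, piping the condor_q command above into `query_total idle` would print 5.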



### On to production jobs and the Managed Tokens Service (separate notebook)

If you have any comments or questions about this notebook, please open a ServiceNow ticket to the Job Management Development group.