CRDS Command Line File Submissions

CRDS provides the crds.submit command line tool, which interacts with the CRDS server to perform file submissions. Because it is a command line tool, it requires setup in the user's environment and must be run from systems (and by users) with privileged access to CRDS server file systems.

For optimal file transfer performance, the file submission tool presumes that it is installed on a computer or virtual machine located on site at STScI. Use of personal laptops and VPN is strongly discouraged because of the large amount of slow, hidden I/O that will occur.

Pre-installed (recommended)

A pre-installed copy of the CRDS command line file submission tools is on dmsinsvm.stsci.edu.

You can utilize it by updating your PATH in your .bash.exports or similar rc file:

export OLD_PATH="$PATH"
export PATH="/grp/crds/code/crds_stacks/anaconda3-2/bin:$PATH"
source activate crds-file-submission-2

This overrides your Python environment with a dedicated CRDS file submission installation.

To revert to your default Python environment after completing your submission work, you must reset your PATH:

export PATH="$OLD_PATH"

Installation (alternative)

Installation instructions for the nightly CRDS JWST development build are here:

JWST development calibration conda install

Required Permissions, Accounts, ssh setup

You must have:

  1. Membership in UNIX group "crdsoper". E-mail support@…
  1. An account on the CRDS servers. E-mail crds_team@…
  1. Login and/or ssh privileges on a machine that can see the CRDS Server submission directories. E-mail crds_team@…

The preferred usage of the CRDS command line submission tool is to run it directly from your account on dmsinsvm.stsci.edu. dmsinsvm has direct visibility to /ifs/crds so ordinary "cp" file copies will work on dmsinsvm for team members.

Alternately, you can run the tool remotely, configured to copy your files to your account on dmsinsvm using "scp". This requires a fully transparent ssh setup: you must initialize your ssh login and agent prior to running CRDS.

Finally, you can set up a machine similar to dmsinsvm but dedicated to this purpose and use it via any of the aforementioned methods.

For efficient file transfers, it is critical that both the ReDCaT staging file system and the CRDS Isilon file system be located on site at STScI with the shortest possible network path.

To avoid VPN file copy inefficiencies, it is critical that offsite users first "ssh" to an onsite VM, e.g. dmsinsvm. This ensures that data transfers between the submitter and CRDS server remain on fast onsite networks.

Setup

Setting up for file submission requires these steps:

  1. Create a $HOME/.crds.ini file which contains:
[authentication]
CRDS_USERNAME = homer
CRDS_PASSWORD = xxxxx

where the authentication credentials are your normal web file submission credentials on the CRDS Server.

  1. Set .crds.ini file permissions to 600 (readable and writable only by you)
chmod 600 $HOME/.crds.ini

This is REQUIRED or CRDS will not operate.

  1. Environment setup

Add one of the following to your shell environment:

  1. JWST OPS Configuration
export CRDS_SERVER_URL="https://jwst-crds.stsci.edu"
export CRDS_PATH="$HOME/crds_cache_ops"
  1. JWST TEST Configuration
export CRDS_SERVER_URL="https://jwst-crds-test.stsci.edu"
export CRDS_PATH="$HOME/crds_cache_test"
  1. HST OPS Configuration
export CRDS_SERVER_URL="https://hst-crds.stsci.edu"
export CRDS_PATH="$HOME/crds_cache_ops"
  1. HST TEST Configuration
export CRDS_SERVER_URL="https://hst-crds-test.stsci.edu"
export CRDS_PATH="$HOME/crds_cache_test"

Each configuration selects a CRDS server unique to one combination of (HST, JWST) x (OPS, TEST).

The CRDS cache stores limited configuration information needed for file submissions but can also be shared for more general CRDS purposes.

Unlike server selections, the same CRDS cache can support both HST and JWST; caches are segregated only by OPS vs. TEST.
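
The setup steps above can also be scripted. The following is a minimal Python sketch and not part of CRDS: the helper names crds_environment and write_crds_ini are hypothetical, and the URLs and cache directory names are simply the ones listed above. It creates the credentials file with mode 600 from the start, so the file never exists with looser permissions, and it returns the two environment variables for one (HST, JWST) x (OPS, TEST) combination.

```python
import configparser
import os
from pathlib import Path

# Server URLs copied from the four configurations listed above.
CONFIGS = {
    ("jwst", "ops"): "https://jwst-crds.stsci.edu",
    ("jwst", "test"): "https://jwst-crds-test.stsci.edu",
    ("hst", "ops"): "https://hst-crds.stsci.edu",
    ("hst", "test"): "https://hst-crds-test.stsci.edu",
}


def crds_environment(observatory, use, home):
    """Return the two environment variables for one configuration."""
    return {
        "CRDS_SERVER_URL": CONFIGS[(observatory, use)],
        # The cache depends only on OPS vs. TEST, not on the observatory.
        "CRDS_PATH": str(Path(home) / f"crds_cache_{use}"),
    }


def write_crds_ini(path, username, password):
    """Write the [authentication] section to a file with mode 600."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the option names upper case
    parser["authentication"] = {
        "CRDS_USERNAME": username,
        "CRDS_PASSWORD": password,
    }
    # Create the file with owner-only permissions before writing to it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)
    os.chmod(path, 0o600)  # also tighten a file that already existed
```

For example, crds_environment("jwst", "test", os.path.expanduser("~")) yields the values of the JWST TEST configuration, which can be merged into os.environ before running CRDS tools from the same process.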

Starting the Submission

Before starting, it's important to understand that crds.submit works in concert with the web server. Hence starting or killing crds.submit is independent of what is happening on the CRDS server. Once a submission is started, the server will process the submission until it is ready for confirmation or fails for some reason. CRDS provides tools to monitor the progress of submissions on the server, but does not currently provide a mechanism to control or abort the server-side processing. Only one submission for the same instrument should be active at the same time.

You can initiate a command line file submission as follows:

python -m crds.submit --files @files --monitor --wait --wipe --log-time --stats --creator "Homer" --description "Small scale command line submission test."

where the file "files" is a list of file paths to be submitted, one per line.

Alternately, list the file paths directly on the command line instead of using the file list @files.
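
If the list of files comes from another program, the file list and the command line can be generated rather than typed. The sketch below is an illustration under assumptions, not a CRDS API: write_file_list and submit_command are hypothetical helper names, the switches are exactly those of the command shown above, and sys.executable is assumed to be the Python of the CRDS file submission environment.

```python
import sys
from pathlib import Path


def write_file_list(list_path, reference_paths):
    """Write one path per line, the layout read via @files."""
    text = "".join(f"{path}\n" for path in reference_paths)
    Path(list_path).write_text(text)


def submit_command(list_path, creator, description):
    """Build the argument list for the submission shown above."""
    return [
        sys.executable, "-m", "crds.submit",
        "--files", f"@{list_path}",
        "--monitor", "--wait", "--wipe", "--log-time", "--stats",
        "--creator", creator,
        "--description", description,
    ]
```

The resulting list can be passed to subprocess.run(..., check=True). Passing a list rather than a single string avoids shell quoting problems in the creator and description text.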

Many of the command line switches specified above are optional:

--monitor          crds.submit will poll the server for status and print it out.  Otherwise no server status is output.
--wait             crds.submit will not exit until the submission is ready to cancel/confirm or fails.  Otherwise it exits as soon as submitted.

By default, pre-existing files are included in the submission, and repeat copies of the same files result in a collision, a failed copy, and an error. The default expectation is therefore that your submission directory is empty when you start. CRDS will error out if it is not, either because files from the same submission already exist or because you started out with "garbage" that is not included in the current submission list.

--wipe             crds.submit deletes all files already in the ingest directory, possibly forcing lengthy repeat copies but guaranteed to remove problem files.
--keep-files       crds.submit will keep any files in the ingest directory and add missing or wrong-length files.

In both cases, pre-existing garbage not included in the current file submission list will result in an error.
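
To make the --keep-files rules concrete, the following sketch models them as described above. It is only a model for reasoning about what will happen, not the code crds.submit runs: plan_keep_files is a hypothetical name, and both inputs are assumed to map a file's base name to its length in bytes.

```python
def plan_keep_files(local_sizes, ingest_sizes):
    """Model the --keep-files rules described above.

    local_sizes:  files in the current submission list, name -> bytes
    ingest_sizes: files already in the ingest directory, name -> bytes
    Returns (names that still need copying, names that are garbage).
    """
    to_copy = sorted(
        name for name, size in local_sizes.items()
        if ingest_sizes.get(name) != size  # missing or wrong length
    )
    # Anything in the ingest directory but not in the list is garbage.
    garbage = sorted(set(ingest_sizes) - set(local_sizes))
    return to_copy, garbage
```

A non-empty garbage list corresponds to the error case: those files are in the ingest directory but not in the current submission list.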

--log-time         time is added to log messages
--stats            summary status about overall runtime is emitted when complete

The submission above produced output like this:

% python -m crds.submit --files @files --monitor --wait --wipe --log-time --stats --creator "Homer" --description "Small scale command line submission test."
2016-07-18 18:32:39,471 - CRDS - INFO - =============================== setting up ===============================
2016-07-18 18:32:50,582 - CRDS - INFO - Symbolic context 'jwst-edit' resolves to 'jwst_0195.pmap'
2016-07-18 18:32:50,583 - CRDS - INFO - Logging in aquiring lock.
2016-07-18 18:32:51,868 - CRDS - INFO - =============================== wipe files ===============================
2016-07-18 18:32:51,869 - CRDS - INFO - Wiping files at 'dmsinsvm.stsci.edu:/ifs/crds/jwst/ops/server_files/ingest/homer'
2016-07-18 18:32:52,140 - CRDS - INFO - Preparing server logging.
2016-07-18 18:32:52,424 - CRDS - INFO - ============================== ingest files ==============================
2016-07-18 18:32:52,424 - CRDS - INFO - Copying 1 file(s) totalling 191.6 M bytes
2016-07-18 18:32:52,425 - CRDS - INFO - Copy started '/eng/ssb/crds/submission_testing/jwst_miri_dark_0025_a.fits' [ 1 / 1  files ] [ 191.6 M / 191.6 M  bytes ]
2016-07-18 18:32:56,273 - CRDS - INFO - Copy complete [       1 /       1 files ] [    0.3  files-per-second ]
2016-07-18 18:32:56,274 - CRDS - INFO - Copy complete [ 191.6 M / 191.6 M bytes ] [  49.8 M bytes-per-second ]
2016-07-18 18:32:56,274 - CRDS - INFO - STARTED 2016-07-18 18:32:52.42
2016-07-18 18:32:56,274 - CRDS - INFO - STOPPED 2016-07-18 18:32:56.27
2016-07-18 18:32:56,274 - CRDS - INFO - ELAPSED 0:00:03.84
2016-07-18 18:32:56,274 - CRDS - INFO -       1 files at    0.3  files-per-second
2016-07-18 18:32:56,275 - CRDS - INFO - 191.6 M bytes at  49.7 M bytes-per-second
2016-07-18 18:32:56,275 - CRDS - INFO - ===========================================================================
2016-07-18 18:32:56,275 - CRDS - INFO - Posting web request for '/batch_submit_references/'
2016-07-18 18:33:01,493 - CRDS - INFO - ======= monitoring server on '8c0ae186-0762-4551-95e3-ab3b7fe1c212' =======
2016-07-18 18:33:01,584 - CRDS - INFO - >> Starting submission processing.
2016-07-18 18:33:01,584 - CRDS - INFO - >> Certifying 'jwst_miri_dark_0025_a.fits'
2016-07-18 18:33:07,735 - CRDS - INFO - >> Processing 'jwst_miri_dark_0025_a.fits' [1 / 1 files] [191.6 M / 191.6 M / 191.6 M bytes]
2016-07-18 18:33:10,881 - CRDS - INFO - >> Renaming 'jwst_miri_dark_0025_a.fits' --> 'jwst_miri_dark_0107.fits'
2016-07-18 18:33:10,882 - CRDS - INFO - >> Linking jwst_miri_dark_0025_a.fits --> jwst_miri_dark_0107.fits
2016-07-18 18:33:10,882 - CRDS - INFO - >> Adding file 'jwst_miri_dark_0025_a.fits' to database.
2016-07-18 18:33:53,936 - CRDS - INFO - >> Generating new rmap 'jwst_miri_dark_0027.rmap' from 'jwst_miri_dark_0017.rmap'.
2016-07-18 18:33:53,937 - CRDS - INFO - >> Adding file 'jwst_miri_dark_0027.rmap' to database.
2016-07-18 18:33:57,033 - CRDS - INFO - >> Certifying 'jwst_miri_dark_0027.rmap'
2016-07-18 18:33:57,033 - CRDS - INFO - >> Checking for derivation collisions.
2016-07-18 18:33:57,033 - CRDS - INFO - >> Computing file differences.
2016-07-18 18:33:57,033 - CRDS - INFO - >> Differencing 'jwst_miri_dark_0017.rmap' vs. 'jwst_miri_dark_0027.rmap'
2016-07-18 18:33:57,033 - CRDS - INFO - >> COMPLETED: {'status': 0, 'result': 'https://jwst-crds.stsci.edu//display_result/2dc0463e-f32e-4684-af22-2cc4fc2ccfef'}
2016-07-18 18:34:00,035 - CRDS - INFO - ========================= monitoring server done =========================
2016-07-18 18:34:06,525 - CRDS - INFO - 0 errors
2016-07-18 18:34:06,525 - CRDS - INFO - 0 warnings
2016-07-18 18:34:06,525 - CRDS - INFO - 15 infos
2016-07-18 18:34:06,525 - CRDS - INFO - STARTED 2016-07-18 18:32:39.46
2016-07-18 18:34:06,526 - CRDS - INFO - STOPPED 2016-07-18 18:34:06.52
2016-07-18 18:34:06,526 - CRDS - INFO - ELAPSED 0:01:27.06
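
When this output is captured to a file, the result link and the final message counts can be extracted from it. The sketch below assumes the log format stays as in the sample above, in particular that the COMPLETED line ends with a Python dict literal; summarize_log is a hypothetical helper, not something CRDS provides.

```python
import ast
import re

COMPLETED = re.compile(r">> COMPLETED: (\{.*\})\s*$")
COUNT = re.compile(r" - INFO - (\d+) (errors|warnings|infos)\s*$")


def summarize_log(lines):
    """Collect the COMPLETED payload and the final message counts."""
    summary = {}
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            # The payload is printed as a Python dict literal.
            summary.update(ast.literal_eval(match.group(1)))
            continue
        match = COUNT.search(line)
        if match:
            summary[match.group(2)] = int(match.group(1))
    return summary
```

For the sample output above, the "result" entry holds the display_result link and "errors" is 0. A run without --wait exits before the COMPLETED line appears, so the "result" key would be absent.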

A fancier way of running any program, including crds.submit, is to use "nohup" and run it in the background, logging to a file.

% nohup program parameters... >& file.log &

where "nohup" means to not "hang up" if you are logged out somehow, ">& file.log" sends all program output to file.log instead of your terminal, and "&" means to run in the background, allowing you to continue to type other commands in the foreground and/or log out.

Without the nohup, you could still run your program in the background, but it would be killed when you log out or are disconnected. Without running in the background with "&", you'd be unable to issue other commands or type e.g. "logout". Together, the program runs silently in the background and continues to run when you log out or are otherwise disconnected.

One way to peek at background program output while it is running is to use the tail command in "follow" mode:

% tail -f file.log

That will watch file.log indefinitely and print each new line as it arrives. Hit control-c to stop, then run it again later the same way as needed.
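
A comparable effect can be had from a Python script that launches the program itself. This is a rough analogue under assumptions, not a replacement for nohup: instead of ignoring the hangup signal, it starts the child in a new session so that it is detached from your terminal (POSIX only). The name run_detached is hypothetical.

```python
import subprocess
import sys


def run_detached(argv, log_path):
    """Start argv in the background, appending all output to log_path."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # like ">&": both streams to the log
            start_new_session=True,    # detach from the terminal's session
        )
```

The call returns immediately with a Popen object, and the log file can be followed with "tail -f" exactly as above.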

Monitoring the Submission

After the submitted files have been copied to the CRDS ingest directory and the submission request has been posted to the CRDS server, a STARTED e-mail will be sent to crds-servers@… and to the file submitter's e-mail address. The STARTED e-mail contains a link to a new "monitor" web page which can be followed instead of command line output or the batch submission page's built-in monitor. The STARTED e-mail contains text similar to the following:

SUBMITTED 'Batch Submit References' for 'homer'.

Monitor at: 
-----------
https://jwst-crds.stsci.edu/monitor/8c0ae186-0762-4551-95e3-ab3b7fe1c212/

Description:
------------
Small scale command line submission test.


Uploaded Files:
----------------
jwst_miri_dark_0025_a.fits --> jwst_miri_dark_0025_a.fits

You must be logged in to watch the monitor page. Clicking the link should redirect you to the login page if you are not already logged in.

Confirming the Submission

If the submission processes successfully, a READY e-mail similar to the STARTED e-mail will be sent. The READY e-mail contains a link to the confirmation page, which contains certification results and rmap differences. Hence the confirmation web page contains the information required to review the submission results and has a confirmation button. The e-mail has contents similar to the following:

READY 'Batch Submit References' for 'homer'.

Review/Confirm at:
------------------
https://jwst-crds.stsci.edu//display_result/2dc0463e-f32e-4684-af22-2cc4fc2ccfef

Description:
------------
Small scale command line submission test.


Uploaded Files:
----------------
jwst_miri_dark_0025_a.fits --> jwst_miri_dark_0025_a.fits

Note that the link given in the READY e-mail is the same link given by the final output of the crds.submit command line program *if* it is run with --wait. If it is run without --wait, crds.submit will exit before the READY link has been determined.

Final Results

After the submitter clicks the "Confirm" or "Cancel" button on the confirmation page, a final CONFIRMED or CANCELLED e-mail will be emitted similar to the following:

confirmed 'batch submit' for 'homer'.

Final Results:
--------------
https://jwst-crds.stsci.edu//display_result/a11d1e4d-e3f5-4616-88bc-dc2e748d24b6

Description:
------------
Small scale command line submission test.


Uploaded Files:
----------------
jwst_miri_dark_0025_a.fits --> jwst_miri_dark_0096.fits


Generated Files:
-----------------
jwst_0194.pmap --> jwst_0195.pmap
jwst_miri_0091.imap --> jwst_miri_0092.imap
jwst_miri_dark_0015.rmap --> jwst_miri_dark_0017.rmap

The CONFIRMED e-mail includes a small amount of additional information directly, namely the mapping from uploaded names to newly generated names, as well as the old and new names of all derived/generated files.
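
If the name mappings are needed for bookkeeping, the "old --> new" lines can be read out of the e-mail text. The sketch below assumes the layout shown above (a section title ending in a colon, a dashed underline, then one mapping per line); parse_renames is a hypothetical helper.

```python
def parse_renames(email_text):
    """Return {section title: {old name: new name}} for '-->' lines."""
    sections = {}
    current = None
    for raw in email_text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "-->" not in line:
            current = line[:-1]  # e.g. "Uploaded Files"
        elif "-->" in line and current is not None:
            old, new = (part.strip() for part in line.split("-->", 1))
            sections.setdefault(current, {})[old] = new
    return sections
```

Applied to the e-mail above, the "Generated Files" entry maps jwst_0194.pmap to jwst_0195.pmap, and so on for the imap and rmap.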

Last modified on 06/02/17 14:44:00