wiki:delivery_workflow

Reference Delivery Workflow

This page documents the workflow and teams involved in delivering files into the CRDS system and distributing them to the pipeline.

Reference File Preparation

The IDTs, INS, ReDCaT, or SSB teams prepare a new set of reference files and perform validation and/or development with the calibration software.

Reference File Checkout (optional)

Prior to upload, file deliverers can pre-check references using the crds.certify command line tool. This requires an up-to-date CRDS installation, which can be difficult to obtain and maintain during ongoing development and type changes.

Alternatively, file deliverers can run pre-checks on the CRDS server using the Certify Files function to dry-run files before attempting actual submissions.
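In essence, a certify dry run validates candidate files against type constraints before submission. The toy sketch below is not the real crds.certify implementation; the keyword names are assumptions chosen only to illustrate the kind of check performed.

```python
# Illustrative only: the real crds.certify validates references against
# full CRDS type specifications, not just header keyword presence.
REQUIRED_KEYWORDS = {"INSTRUME", "REFTYPE", "USEAFTER"}  # assumed keyword set

def precheck(header: dict) -> list:
    """Return a list of error strings for missing required keywords."""
    return [f"Missing required keyword: {key}"
            for key in sorted(REQUIRED_KEYWORDS) if key not in header]

# A header lacking USEAFTER fails the pre-check with one error.
errors = precheck({"INSTRUME": "MIRI", "REFTYPE": "DARK"})
print(errors)
```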

Reference File Delivery

A ReDCaT team member logs into the CRDS server, locking the single instrument for which files will be delivered.

Typically the Batch Submit References function is used to both ingest new references and automatically generate a set of CRDS rules to refer to them.

The CRDS server issues STARTED, READY, and CONFIRMED e-mails during the file submission process.

CRDS File Staging

After the file submitter confirms a submission, new files are permanently added to the CRDS database and copied to a staging directory for pickup by the CRDS pipeline. A list of the delivered files (e.g. jwst_292.cat) is created to define and control the delivery.
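The delivery catalog is simply a list of the delivered file names. A minimal sketch of producing such a .cat file follows; the one-name-per-line layout and the file names are assumptions for illustration.

```python
from pathlib import Path
import tempfile

def write_delivery_catalog(staging_dir: Path, name: str, delivered: list) -> Path:
    """Write one delivered file name per line into a .cat file (assumed layout)."""
    cat_path = staging_dir / name
    cat_path.write_text("\n".join(delivered) + "\n")
    return cat_path

# Stage an example delivery in a throwaway directory.
staging = Path(tempfile.mkdtemp())
cat = write_delivery_catalog(staging, "jwst_292.cat",
                             ["jwst_miri_dark_0042.fits", "jwst_miri_0123.imap"])
print(cat.read_text().splitlines())
```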

CRDS Pipeline Archive Ingest

Periodically (e.g. every 10 minutes), the CRDS pipeline polls the CRDS server delivery directory for new .cat files. During the ingest process the .cat file is renamed with a status suffix attached, e.g. _PROC or _ERROR. Each file listed in the .cat file is ingested into the CRDS pipeline. When the CRDS pipeline has completed the ingest, the files and the .cat file are deleted from the ingest directory. The CRDS pipeline issues a success e-mail once the ingest is completed.
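One polling pass can be pictured as: find each .cat file, mark it in progress, ingest the listed files, and clean up. The following is a simplified, self-contained model of that loop, not the real pipeline code; it performs no actual archive I/O.

```python
from pathlib import Path
import tempfile

def ingest_deliveries(delivery_dir: Path) -> list:
    """One polling pass over a delivery directory (simulated; no real archive)."""
    ingested = []
    for cat in sorted(delivery_dir.glob("*.cat")):
        # Mark the catalog as in progress, mirroring the _PROC suffix rename.
        proc = cat.rename(cat.with_name(cat.name + "_PROC"))
        for name in proc.read_text().split():
            ingested.append(name)  # the real pipeline copies the file onward here
            (delivery_dir / name).unlink(missing_ok=True)  # remove staged copy
        proc.unlink()  # ingest complete: the catalog is deleted too
    return ingested

# Simulate one delivery awaiting pickup.
d = Path(tempfile.mkdtemp())
(d / "jwst_292.cat").write_text("jwst_miri_dark_0042.fits\n")
(d / "jwst_miri_dark_0042.fits").write_text("FITS data")
print(ingest_deliveries(d), list(d.iterdir()))
```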

Archive Backend Delivery

Further downstream from the CRDS pipeline, an archive process asynchronously completes the delivery, making the files available on the archive file server used to distribute CRDS files.

Pipeline Operator Set Context

At any point after the removal of the .cat file by the CRDS pipeline, a pipeline operator logs into the CRDS server and uses the Set Context function to update the default operational context. This defines what context will be used on the CRDS server and by any subsequently sync'ed remote system.

When a new context is set as the default, a new entry is made in the CRDS context history, which is tracked and displayed on the CRDS server. In addition, the CRDS affected datasets system is triggered to recommend to the pipeline, for reprocessing, the datasets affected by the arrival of the new references.
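The Set Context bookkeeping above amounts to updating a default and appending to a history. A toy model of that state, with invented context names (the real server records this in its database):

```python
import datetime

class ContextState:
    """Toy model of the server's default context and its history (illustrative)."""
    def __init__(self, initial: str):
        self.default = initial
        self.history = [(None, initial)]  # (timestamp, context) entries

    def set_context(self, new_context: str):
        """Make new_context the default and record it in the history."""
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.history.append((stamp, new_context))
        self.default = new_context

# An operator promotes a newly delivered context to operational default.
server = ContextState("jwst_0291.pmap")
server.set_context("jwst_0292.pmap")
print(server.default, len(server.history))
```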

Pipeline Operator Cache Synchronization

After updating the context, the pipeline operator runs a sync script that downloads files from the archive into the pipeline's local CRDS cache. The cache contains the necessary rules and references for CRDS to operate in the pipeline without a constant connection to the CRDS server. The cache records the default operational context. The sync'ed context can be verified locally by the operator, or on the CRDS server by any authenticated user.
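At its core, the sync step is a set difference: download whatever the new context names that the cache does not yet hold. A minimal sketch, with invented file names:

```python
def files_to_sync(context_files: set, cache_files: set) -> set:
    """Files named by the new context that the local cache still lacks."""
    return context_files - cache_files

# The new context requires three files; only one is already cached.
needed = {"jwst_0292.pmap", "jwst_miri_dark_0042.fits", "jwst_miri_flat_0007.fits"}
cached = {"jwst_miri_flat_0007.fits"}
print(sorted(files_to_sync(needed, cached)))
```

Only the missing files are downloaded, which is what lets routine syncs stay cheap once the cache is warm.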

CRDS Affected Datasets System

When the new context is selected as the default by the pipeline operator, the CRDS affected datasets system runs to recommend for reprocessing the datasets affected by the newly arrived files. The determination typically finishes within 10 minutes, but can require several hours in cases where large numbers of datasets are considered and potentially affected.
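Conceptually, the determination compares best-reference assignments under the old and new contexts and reports the datasets whose assignments changed. A toy sketch with invented dataset and file names (the real system replays bestrefs against parameters recorded in the archive):

```python
def affected_datasets(old_bestrefs: dict, new_bestrefs: dict) -> list:
    """Dataset IDs whose best references differ between two contexts."""
    return sorted(ds for ds, refs in new_bestrefs.items()
                  if old_bestrefs.get(ds) != refs)

# Under the new context, only the first dataset picks up the new dark.
old = {"jw00001_miri": {"dark": "jwst_miri_dark_0041.fits"},
       "jw00002_miri": {"dark": "jwst_miri_dark_0041.fits"}}
new = {"jw00001_miri": {"dark": "jwst_miri_dark_0042.fits"},
       "jw00002_miri": {"dark": "jwst_miri_dark_0041.fits"}}
print(affected_datasets(old, new))
```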

When the affected datasets computation completes, CRDS issues an e-mail containing a trimmed log as the body and an attachment listing one dataset ID per line. For HST, the IDs listed are product IDs. For JWST, the IDs are based on file set names and the detector, currently for the highest level of recorded bestrefs parameters in the archive, up to level-2b data only.

The CRDS server permanently records the log output and recommended dataset IDs. Both may be downloaded by querying a web service using the query_affected_datasets script included in the CRDS client software distribution. Once initialized, the script records the last-seen recommendation in the CRDS cache and can be called in a polling mode, without parameters, to download the next set of recommendations whenever they become available.
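The polling behavior can be sketched as a client that remembers the last recommendation it has seen and returns only newer ones. This is a simplified model, not the real script, which stores its marker in the CRDS cache and queries a web service:

```python
class AffectedQueryClient:
    """Toy polling client with a last-seen marker (illustrative only)."""
    def __init__(self):
        self.last_seen = -1  # index of the newest recommendation already fetched

    def poll(self, server_history: list) -> list:
        """Return recommendations newer than the marker, then advance it."""
        new = server_history[self.last_seen + 1:]
        self.last_seen = len(server_history) - 1
        return new

client = AffectedQueryClient()
history = [["ds1", "ds2"], ["ds3"]]  # two past affected-dataset recommendations
print(client.poll(history))  # first poll: everything not yet seen
print(client.poll(history))  # second poll: nothing new
```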

Pipeline Calibrations

Based on CRDS reprocessing recommendations or other requests, the pipeline uses the installed CRDS client software in conjunction with the updated CRDS cache to assign appropriate reference files to datasets and use them to calibrate data. Because of the CRDS cache, the pipeline can operate without contacting the CRDS server until the next reference file delivery occurs.

Last modified on 04/10/17 14:58:06