wiki:FY-18-initial-inputs

For new file types:

For HST, the only thing I know about is the ACS SNKCFILE and updated PCTETAB that are currently being tested for the 2017.2a quick-fix. For JWST, there are several reference types listed on the JWST CRDS web site for which there have been no deliveries. If they are needed for pipeline calibrations, then I'm guessing they'll be delivered sometime between the end of OTIS testing and whenever the calibration pipeline needs to be ready.

For large scale deliveries: The ACS Team will be redelivering all of their darks (about 5000 files) this summer, probably after they've finished with 2017.2a. I'm not aware of any other large deliveries, though the instrument teams would have a better idea.

For other areas of work, Rossy and I came up with the following list:

1.) The ability for CRDS to generate a new context (pmap) with the delivery of an imap. Rossy and I feel that not only should this be a high priority item, but it would satisfy the 'merging contexts' issue that has come up on occasion (most recently a couple of months ago). It would be especially useful when an instrument team has several new and/or updated reference types (e.g., the COS updates for 2017.2); not to mention that both you and I got burned by not updating the editing context.

2.) Having crds.uniqname handle HST TMG, TMC, and TMT files (collectively known as master tables). Each of these filetypes has the INSTRUME keyword set to HST. If crds.uniqname sees this value for INSTRUME, the last character in the 9 character filename would be an 'm' and the 3 character 'suffix' (tmg, tmc, or tmt) would depend on the value of the DBTABLE keyword. Currently, we use a roundabout method where we rename a random reference file, then use that as the name of the master table with the last character set to 'm'.

3.) Have crds.certify check and verify the subarray size of JWST references.

4.) Separate the core file per instrument. Currently, a keyword may have a valid value as per the core file, but that value would be invalid for the instrument in question. A per-instrument breakdown would also make things easier for some of the tools we're developing/updating.

5.) As part of CRDS requirement DMS-HST-03 (providing a tool that allows a user to list all of the selection criteria that result in the use of a given file, which is satisfied within the listings of a given reference under the instruments): in the Explore Best References area, the descriptions for dropdowns where an asterisk is an option should indicate that the asterisk is being used as a wildcard rather than as a valid value.

6.) The ability to search for and list all of the files for a given type and range of dates (e.g., all WFC3 UVIS darks for data taken between May 15, 2017 and May 22, 2017). Right now, you get all references for a given date for a given list of parameters.

7.) On the Explore Best References area, update the default values of the parameters and remove others. For example, the default detector for ACS is HRC and the default gain is 0.5. These should be WFC and 2.0, respectively. The LTV1 and LTV2 parameters aren't needed.

8.) CRDS requirement DMS-HST-11: CRDS shall provide a tool or web interface that allows users to access dictionaries with information on the selection parameters and valid values for a given reference file (type).

9.) Can bestref populate the HST header keywords if the correlation parameter for the pipeline step and the reference file are not present in the raw data files? This has happened quite often for files the instrument teams have on disk and need to use after a new reference file type has been introduced. Currently, this has to be done manually and is time-consuming. While the data could be re-retrieved, retrieval is not desirable when dealing with a large number of files.

I think CRDS work for HST will break down into 4 areas:

1.  Pacing JWST with, say, 2 HST catch-up releases

2.  Adding any new types, constraints, or other content changes specifically required for HST.  Timing may drive extra releases beyond (1) above.

3.  Supporting the FY18 ReDCaT requests below, where I can

4.  Updating those areas of CRDS I think could use improvement


For "Todd driven improvements",  these have been in the back of my mind:

1. Documentation overhaul to simplify and provide task-oriented cookbooks and cleaner formatting.

2. More traceability for the context syncing in the pipeline,  both as e-mails for setting and syncing and as an audit trail to prove the sync occurred.

3. More feedback for rules-only submissions,  showing differences to evaluate before confirming,  similar to the reference file submissions.

4. Improvements to reprocessing system parameter fetching from DADSOPS and also minimizing the total number of parameter sets fetched.   This is peculiar to HST because (a) HST has hundreds of thousands to millions of parameter sets and (b) HST has its own CRDS-implemented interface to the archive database, different from JWST, where the archive provides a web service.

Todd

-------- Forwarded Message --------
Subject: 	Re: CRDS FY18 planning
Date: 	Fri, 14 Jul 2017 15:30:39 -0400
From: 	Todd Miller <jmiller@stsci.edu>
To: 	Matt McMaster <mcmaster@stsci.edu>, Robert Jedrzejewski <rij@stsci.edu>


Hi Matt,

My thoughts are below in blue and green;  there's a mix of responses,  and more discussion is required for several.

On Jul 12 2017 2:49 PM, Matt McMaster wrote:

> For other areas of work, Rossy and I came up with the following list:
> 1.)  The ability for CRDS to generate a new context (pmap) with the
> delivery of an imap.  Rossy and I feel that not only should this be a high
> priority item, but it would satisfy the 'merging contexts' issue that has
> come up on occasion (most recently a couple of months ago).  It would be
> especially useful when an instrument team has several new and/or updated
> reference types (e.g., the COS updates for 2017.2); not to mention that
> both you and I got burned by not updating the editing context.

WILL DO:    I've also concluded this is needed.

> 2.)  Having crds.uniqname handle HST TMG, TMC, and TMT files (collectively
> known as master tables).  Each of these filetypes has the INSTRUME keyword
> set to HST.  If crds.uniqname sees this value for INSTRUME, the last
> character in the 9 character filename would be an 'm' and the 3 character
> 'suffix' (tmg, tmc, or tmt) would depend on the value of the DBTABLE
> keyword.  Currently, we use a roundabout method where we rename a random
> reference file, then use that as the name of the master table with the
> last character set to 'm'.

WILL DO:   I'll ask for more details later.
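
To pin down the request, here is a minimal sketch of the naming rule as described above. The DBTABLE-to-suffix mapping values are placeholders and the base-name generation is out of scope; this is not the actual crds.uniqname logic:

    # Hypothetical sketch of the master table naming rule described above.
    DBTABLE_TO_SUFFIX = {
        "GRAPHTAB": "tmg",    # placeholder DBTABLE values; the real
        "COMPTAB": "tmc",     # correspondence would need to be confirmed
        "THERMTAB": "tmt",    # with ReDCaT
    }

    def master_table_name(instrume, dbtable, base8):
        """Return a 9-character name ending in 'm' plus the suffix implied
        by DBTABLE, e.g. base8='abcdefgh' -> 'abcdefghm_tmg.fits'."""
        assert instrume.strip().upper() == "HST", "not a master table"
        suffix = DBTABLE_TO_SUFFIX[dbtable.strip().upper()]
        return base8 + "m_" + suffix + ".fits"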

> 3.) Have crds.certify check and verify the subarray size of JWST
> references.

MAYBE:   If you have information on what the SUBARRAY sizes/locations are,  CRDS can certainly do it.  
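
If the sizes/locations were provided, the check might look something like this sketch (assuming the JWST SUBSTRT1/SUBSTRT2/SUBSIZE1/SUBSIZE2 keywords and a detector-size table supplied by the instrument teams; not actual crds.certify code):

    from astropy.io import fits

    # Placeholder detector dimensions; real sizes/locations must come
    # from the instrument teams.
    DETECTOR_SIZE = {"NRCA1": (2048, 2048)}

    def check_subarray(path):
        """Verify the declared subarray lies within the detector."""
        hdr = fits.getheader(path)
        nx, ny = DETECTOR_SIZE[hdr["DETECTOR"]]
        x0, y0 = hdr["SUBSTRT1"], hdr["SUBSTRT2"]   # 1-indexed corner
        sx, sy = hdr["SUBSIZE1"], hdr["SUBSIZE2"]   # extents
        if not (1 <= x0 and x0 + sx - 1 <= nx and
                1 <= y0 and y0 + sy - 1 <= ny):
            raise ValueError("SUBARRAY exceeds detector bounds: " + path)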

> 4.) Separate the core file per instrument.  Currently, a keyword may have
> a valid value as per the core file, but it would be invalid given the
> instrument.  It would also make it easier for some of the tools we're
> developing/updating.

MAYBE:

If you mean the core schema,  that is cal code,  not CRDS.   So it would be an issue for Howard.   And,  if the core changed,  it would also force CRDS revisions.

However,  CRDS does have an "all_all" .tpn which is similar to the core schema:  it applies to all instruments and types.   CRDS also has more specialized instrument_all .tpn files that apply to each instrument;  those could encode the kind of info you're describing more strictly than the data models,  and also make those keywords "required" rather than "optional" as needed.

To use the CRDS .tpn facility,  someone would need to define the keyword value breakdown by instrument,  and it would also have to be kept up to date.
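
For illustration only (the column layout is approximated from the general .tpn convention of name / keytype / datatype / presence / values, not copied from a real file), a per-instrument constraint might look like:

    # hypothetical wfc3_all.tpn fragment
    DETECTOR      H    C    R    UVIS,IR
    SUBARRAY      H    L    R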

I think Howard did some of this in the schema,  as a descriptive but non-functional partitioning.  So some of this already happened recently for SUBARRAY.

> 5.)  As part of CRDS requirement DMS-HST-03, providing a tool that allows
> a user to list all of the selection criteria that result in the use of a
> given file

PROBABLY EXISTS:   CRDS currently has a few things:

1.  Maybe the best,  open up the Operational Context Display to a particular instrument and type,  then type the filename into the Search box.   That will list all rows / criteria assigning that file.   For HST some files will have multiple rows.

2.  The "crds matches" tool.   Used like this:

    $ crds matches --contexts jwst-operational --files jwst_nircam_distortion_0055.asdf

    CRDS - INFO -  Symbolic context 'jwst-operational' resolves to 'jwst_0341.pmap'
    
    jwst_nircam_distortion_0055.asdf : NIRCAM DISTORTION META.EXPOSURE.TYPE='NRC_IMAGE|NRC_TSIMAGE|NRC_FLAT|NRC_LED|NRC_WFSC' META.INSTRUMENT.DETECTOR='NRCA1' META.INSTRUMENT.CHANNEL='SHORT' META.INSTRUMENT.PUPIL='CLEAR|F162M|F164N|GDHS0|GDHS60|WLM8|WLP8|PINHOLES|MASKIPR' META.INSTRUMENT.FILTER='N/A' META.OBSERVATION.DATE='2014-10-01' META.OBSERVATION.TIME='00:00:00'

The matching output is on one line for each pattern a file matches for a particular context.    Like the Context display, one context and file can result in multiple rows.   This mechanism is also fully independent of (1), which is accomplished by the web page table s/w.

3.  The file details web page for each reference has a "Lookup Patterns" section that shows the equivalent of the crds matches output formatted as a table:

     https://jwst-crds.stsci.edu/browse/jwst_nircam_distortion_0055.asdf

This is an automatically run and web-formatted version of (2), crds.matches.

> (which is satisfied within the listings of a given reference
> under the instruments), the descriptions for those dropdowns in the
> explore best references area where an asterisk is an option should
> indicate that it is being used as a wildcard rather than a valid value.

MAYBE:   This sounds like a note that I can put directly on the explorer input page.  OK?

> 6.) The ability to search for and list all of the files for a given type
> and range of dates (e.g., all WFC3 UVIS darks for data taken between May
> 15 2017 and May 22 2017).  Right now, you get all references for a given
> date for a given list of parameters.

MAYBE:

The date range you mention seems to be on the datasets (DATE-OBS?),  not USEAFTER.  This exact query can be done as a fairly easy addition to the crds.bestrefs program:

    crds bestrefs --instrument wfc3 --datasets-since 2017-05-15 --datasets-until 2017-05-22 --types dark --print-new-references

where superficially --datasets-until is the only new thing.   Maybe CRDS will also need --some-new-output-format,  but nothing difficult.

On the plus side,  I think that addresses your current question leveraging portable CRDS resources that work for both projects the same way.   As long as you're not expecting completely different queries in the future,  it sounds solved.  

But...  we may be on the hairy edge of "CRDS is not SQL,  and CRDS is not the archive database."   So it's definitely advisable to think ahead to any other queries you may want to make so that they can be incorporated into the design.

> 7.)  On the Explore Best References area, update the default values of the
> parameters and remove others.  For example, the default detector for ACS
> is HRC and the default gain is 0.5.  These should be WFC and 2.0,
> respectively.  The LTV1 and LTV2 parameters aren't needed.

HERE I DISAGREE:    LTV1 and LTV2 are needed by ACS BIASFILE hook functions not visible in the rmap.

> 8.)  CRDS requirement DMS-HST-11: CRDS shall provide a tool or web
> interface that allows users to access dictionaries with information on the
> selection parameters and valid values for a given reference file (type).

HARD ONE:   I thought we agreed this was the Bestrefs Explorer on the website.   That's what I have down for milestones,  metrics, and Trac ticketing.    The Explorer (and Context Display) loosely satisfies the original phrasing of DMS-HST-11:

    CRDS shall provide a user interface to display required selection criteria for specified reference files, and the permitted values for those selection criteria through a web interface.

I think that's what happened with this requirement;  the change in wording was lost in the shuffle,  and even now I don't know what the revised wording means.   What are you trying to do instead of the explorer?   If the Explorer and Context Displays are "not it",  what does your ideal tool or interface look like?

> 9.)  Can bestref populate the HST header keywords if the  correlation
> parameter for the pipeline step and the reference file are not present in
> the raw data files?

MAYBE:
 
If a keyword is not present,  CRDS defines it as UNDEFINED.   When xxxCORR=UNDEFINED instead of OMIT or PERFORM,  CRDS may attempt to compute a bestref where it is not needed or possible,  and may generate an ERROR.   This is because many of the rmap_relevance expressions are of the form "xxxCORR != 'OMIT' and... other conditions".   The point of rmap_relevance is to assign N/A before attempting a bestref lookup that will fail or be meaningless.
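
For example, an rmap header might carry a relevance expression of roughly this shape (illustrative fragment, not copied from a delivered rmap):

    header = {
        'name' : 'hst_wfc3_darkfile_9999.rmap',      # hypothetical name
        'rmap_relevance' : '(DARKCORR != "OMIT")',   # assign N/A when the step is omitted
    }

Note that when DARKCORR is UNDEFINED the expression still evaluates true (UNDEFINED != "OMIT"), so CRDS proceeds with a lookup that may ERROR.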

> This has happened quite often for files the instrument
> teams have on disk and need to use them after a new reference file type
> has been introduced. Currently, this has to be done manually and is time
> consuming.  While the data could be re-retrieved, retrieving the files is
> not desirable when dealing with a large number of them.

EITHER: 

    If you really just want to assign bestrefs for one new type,  you can do:

        % crds bestrefs --files xxx.fits  --types darkfile  --update

    and if sufficient information is available in the FITS headers *for that type* CRDS will do it.

    Again,  where a bestref is needed and possible,  CRDS should work;  where it is irrelevant,  CRDS may ERROR but should not do a bestref update.  OTOH,  it also will not assign N/A.

    With all that said,  CRDS is probably not the only system that needs the xxxCORR keywords;  what happens in the cal code may be the real issue.  If xxxCORR is not set,  the cal code may not run and may outright crash.

OR:

    The real problem may be that the FITS headers are too incomplete,  period,  and not because of bestrefs. 

    So really we may be talking about updating the xxxCORR parameters rather than re-retrieving the files.

    If CRDS can relate the FITS files to corresponding archive dataset IDs it knows about,  and CRDS has the required parameters in its archive parameter keyword queries,   then CRDS could probably provide a utility to take values from the archive dataset parameters (used for CRDS reprocessing) and write those to FITS headers.   So this is more of a general "update FITS file header from archive database" utility.   With the limitations I gave on "which parameters are available",  CRDS has the infrastructure to create this fairly simply.

    Once the FITS parameters are updated,  CRDS bestrefs and the cal code would work normally.   This would of course break down if not even the archive database accurately knows what the xxxCORR values should be.  Maybe they are UNDEFINED or wrong even in the archive because the data has not been reprocessed...  in which case it may be "nobody knows."
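
A minimal sketch of that "update FITS file header from archive database" utility, assuming the caller can already map each file to its dataset ID and fetch its archive parameter set (both assumptions; get_archive_parameters below is a hypothetical stand-in for that query):

    from astropy.io import fits

    def update_header_from_archive(path, archive_params, keywords):
        """Copy selected archive dataset parameters into the FITS primary
        header so bestrefs and the cal code can run normally."""
        with fits.open(path, mode="update") as hdul:
            for key in keywords:
                if key in archive_params:
                    hdul[0].header[key] = archive_params[key]

    # e.g.:
    # update_header_from_archive("xxx.fits",
    #     get_archive_parameters("<dataset_id>"),   # hypothetical fetch
    #     ["DARKCORR", "FLATCORR"])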

Todd

> That should be everything.
>
> Cheers,
>
> Matt
>
>
> On 6/22/17, 3:42 PM, "Todd Miller" <jmiller@stsci.edu> wrote:
>
>> Hi Rossy and Matt,
>>
>> We're doing FY18 planning in OED.
>>
>> Do you have any feature requests or bug reports for CRDS?
>>
>> Do you have any predictions for new reference types to be added or large
>> scale deliveries for either HST or JWST?
>>
>> Any other areas of work you need done?
>>
>> Thanks,
>> Todd
>>
>>

More areas for improvement not included in response to Warren:

  1. Traceability enhancements for repro parameter fetching, including:
    • Recording the parameter source, e.g. a pathless mock file, database config, or archive URL
    • Recording the context defining the parameter set
  2. Improvements to server stack building to make re-builds more consistent and utilize saved conda packages.
  3. Upgrade of (HST, JWST) x (DEV, TEST, OPS) servers to RHEL-7.
  4. Discussion of possible server consolidation and reduction in VMs.

Areas of JWST work:

  1. Add support for association handling, as needed, to CRDS repro and the archive parameter handling.
  2. Optimize dataset handling for bestrefs to limit computation to exposures rather than the current simple but inefficient <product>:<exposure>.
  3. Continue refinement of CRDS repro required-types determination, including support for time series exposure types.
  4. Continue refinements to certify tests as needed, including possible subarray checks.
  5. Continue refinements to rmap update techniques, P_ keywords, data model based reference file production techniques, and overlap handling, as needed.
  6. Add support for SSO-based server authentication.