SAM PDS Procedure

This procedure outlines the process for creating a SAM Reduced Data Record archive. Make sure to follow all the steps outlined here. Edit this page if anything changes.

Important People

  • Heather Franz: Heather wrote the SAM RDR SIS (Software Interface Specification), a user's guide for the RDR archive. Heather generally attends the MSL DAWG meetings and will keep you informed if anything important happens there (you usually don't need to attend). Heather also generates the high-level QMS products.
  • Jean-Yves Bonnet: Jean-Yves generates the high-level GC products.
  • Greg Flesch: Greg generates all TLS products.
  • Susan Slavney: Susie is our contact at the PDS "Geosciences" Node and our main point-of-contact between the SAM and PDS teams. If you have questions about PDS validation, etc., contact her.
  • Joy Crisp: Joy is the chair of the MSL DAWG, but usually does not get directly involved in SAM's RDRs. I believe she announces the delivery schedule.

Before the Delivery

The MSL DAWG discusses a release schedule, usually shortly after the last release. You will usually have at least two months to get the products together, but since assembling the archive requires inputs from other parties, it's important to contact the other parties as soon as possible.

When you get the email from Susie labeled something like "MSL PDS Release 13 schedule", there will be an attachment indicating when certain things are due. The first milestone is "PDS asks the data providers for their archive readiness reports": an email that will come out on the listed date (about one month before the delivery is due). Plan to email Heather, Greg, and Jean-Yves a couple of days before this date, saying basically the following:

    Hi all,
    Release XX covers sols yy-zz (START_DATE – END_DATE). Heather, can you please send out a list of the TIDs that we need for this delivery?
    Delivery is due DELIVERY_DATE, so I will need all materials by 2_WEEKS_BEFORE_DELIVERY_DATE.

All the information will be in the attachments from Susie's email.

When you get the email asking for the readiness report, just indicate that delivery will be made on time and that there are no changes, unless Heather or Jean-Yves indicate something will be new, or you do not believe you can make the delivery in time.

Now just wait until you get all the materials from the team members. If someone has not delivered by the due date, make sure you message them (if they forget, they can usually get it to you in less than 24 hours).

Preparing the Inputs

SVN

Make sure you have all the current SAM data checked out fresh from SVN, and that your working copy does not contain any unversioned, modified, or conflicted files.
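
A quick way to verify this with standard SVN commands:

    svn update     # bring the working copy up to date
    svn status     # should print nothing; ?, M, or C lines mean unversioned, modified, or conflicted files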

EDRs

Generating an RDR requires EDRs as input. EDRs should not be kept in SVN, so you will probably need to download them separately. The process for getting EDRs is the same as it is for SAM Ops. Create an edrhub folder in the fmdata branch, and inside it run fei5kinit (to log in), feiget.py (to download the EDRs), and then movefei.py to move them to the appropriate telemetry directories.
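
As a sketch (the exact feiget.py arguments are whatever you normally use for SAM Ops; they are not listed here):

    cd fmdata/edrhub
    fei5kinit        # log in to the FEI server
    feiget.py        # download the EDRs
    movefei.py       # sort them into the telemetry directories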

An alternative approach is to get someone on the SAM team to generate the EDRs and submit the TIDs with them included. I usually contact Benito Prats on the SAM team; he can usually get them updated in a couple of days, so just let him know ahead of time.

RDR Configuration Files

For each of the TID directories in the release, you will need to create a file named "rdr.config". This file contains information that cannot be obtained automatically by the RDR generator, or at least could not at the time the software was written. It is a CSV file with one key,value pair per line: the first column is the key and the second is the value. I recommend you copy an existing one into each of the new TID directories, and then edit each one to fill out the information. All of the following lines need to be present:

SOURCE,F
MSL:SAM_GC_COLUMN_NUMBER,5
EXPERIMENT_TYPE,SPYR
VERSION,1
RELEASE,XX
PYRO_OVEN_NUMBER,1

  • The SOURCE field is a one-character description of where the data came from: calibration data ("C"), ATLO data ("A"), testbed data ("T"), or flight data ("F"). I do not know if we will ever archive a data set from any one of these alternate sources.
  • The MSL:SAM_GC_COLUMN_NUMBER is the number of the GC column. You can determine it by searching the message log with "tmmsg.py column".
  • The EXPERIMENT_TYPE is a four-character description of the type of experiment. The values are defined by the SIS, and in the code they are defined by the SAM_EXPERIMENT_TYPES dictionary. If you are unsure of the classification for a given experiment, ask Heather. Note that some of these will never apply to flight data (but could in theory apply to testbed or calibration data). The possible values are:
    • SPYR: Solid sample pyrolysis with GCMS
    • SDER: Solid sample derivatization
    • CSOL: Solid sample calibration
    • ADIR: Direct atmospheric measurement
    • AENR: Atmospheric enrichment
    • AMET: Atmospheric methane enrichment
    • ANGE: Atmospheric noble gas enrichment
    • SCMB: Solid sample combustion
    • CGAS: Gas calibration
  • The VERSION field is the release version of the data set. It should start at 1, but if you ever re-release a data set for some reason (usually because there was an error in a previous delivery), you increment it.
  • For RELEASE, put the current release number of the delivery. (Delivery due 11/4/16 is release 13).
  • The PYRO_OVEN_NUMBER field is more difficult to obtain. I find it much easier to set it to 1 at first and debug it while running tm2rdr.py. How to do this is explained in the processing section below, including what needs to be done when the pyro oven is actually oven 2.

Once you create these configuration files, commit them to SVN.
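
For example (the TID numbers here are hypothetical placeholders):

    # copy a template rdr.config into each new TID directory,
    # then edit each copy by hand to fill in the fields above
    for tid in 25001 25002 25003; do
        cp 24000/rdr.config "$tid/rdr.config"
        svn add "$tid/rdr.config"
    done
    svn commit -m "Add rdr.config files for release XX"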

High-level GC Products

Jean-Yves' inputs are the easiest to include, because he delivers them directly to SVN. For each TID he was assigned, he creates a directory called rdr_gc containing files named "notes.txt", "noise.csv", "species.csv", and "species.jpg". To my knowledge, he creates all four of these files for each TID. All you have to do is make sure that the files are there and named correctly.
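
A quick check for missing or misnamed files (TID numbers are placeholders; ls prints an error for anything absent):

    for tid in 25001 25002; do
        ls $tid/rdr_gc/notes.txt $tid/rdr_gc/noise.csv $tid/rdr_gc/species.csv $tid/rdr_gc/species.jpg
    done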

High-level QMS Products

Heather does not have SVN access, and her deliveries are small enough for email. She generally delivers her products as ZIP files, but names them .piz so they don't get blocked by the email server; just rename them back to .zip and they should extract easily. For each TID she delivers, create an rdr_qms directory in that TID directory, and unzip the contents of her ZIP file into that directory. Her files should only have the following names: "NOTES.TXT", "ATMCOMP.CSV", "ISOTOPE.CSV", "EGA.CSV", and "EGACOMP.CSV". If you are unsure what a file is supposed to be called, ask Heather.
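
For example (the TID number and the attachment name are assumptions; use whatever Heather actually sent):

    mv 25001_qms.piz 25001_qms.zip       # undo the .piz renaming
    mkdir 25001/rdr_qms
    unzip 25001_qms.zip -d 25001/rdr_qms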

High-level TLS Products

Greg does not have SVN access either, but his deliveries are too large to email. He usually uploads them to Dropbox and emails you a link to the products. For each TID, make an "rdr_tls" directory in that TID directory, copy the contents there, and add them all to SVN. Make sure the three high-level products are named "notes.txt", "abundance.csv", and "ratios.csv", respectively. (He prefixes his filenames with the TID and names the notes file TID.txt.)
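
A renaming sketch, run from the TID directory (the TID number and the exact prefix format are assumptions; check the actual filenames):

    mv rdr_tls/25001.txt rdr_tls/notes.txt
    mv rdr_tls/25001_abundance.csv rdr_tls/abundance.csv
    mv rdr_tls/25001_ratios.csv rdr_tls/ratios.csv
    svn add rdr_tls       # adds the directory and its contents recursively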

Creating the RDRs

Once the inputs are created, each RDR can be generated with a single command.

Change directories to the fmdata branch of the working copy, then go into each TID directory individually, run tm2rdr.py, and fix any errors that come up.

While running tm2rdr.py with the pyro oven set to 1, you may see the following error:

INFO Processing QMS science data
Traceback (most recent call last):
File "/Users/briancorrigan/labcode/699util/scripts-pds/tm2rdr.py", line 2017, in <module>
  exit(main())
File "/Users/briancorrigan/labcode/699util/scripts-pds/tm2rdr.py", line 2003, in main
  process_qms_data(tmfile, rdr)
File "/Users/briancorrigan/labcode/699util/scripts-pds/tm2rdr.py", line 1371, in process_qms_data
  pyro_time, pyro_temps = zip(*pyro_temps)
ValueError: need more than 0 values to unpack

The usual cause is that the pyro oven is actually oven 2, so do the following (a consolidated command sketch follows the list):

  • Run tmfields.py --sclk 86 223 > tmfields.txt.
  • Go into tmfields.txt and delete the first three lines that begin with # so it is just the 3 columns of data.
  • Change PYRO_OVEN_NUMBER in rdr.config to 2.
  • Re-run tm2rdr.py.
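
The same fix as shell commands (the sed line assumes the three '#' lines are the first three lines of the file, and uses macOS/BSD sed syntax):

    tmfields.py --sclk 86 223 > tmfields.txt
    sed -i '' '1,3d' tmfields.txt      # drop the three leading '#' header lines
    # edit rdr.config so the line reads: PYRO_OVEN_NUMBER,2
    tm2rdr.py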

If this still does not fix it, just set PYRO_OVEN_NUMBER to 0 and it will just skip over them (I only had to do this once).

If you want to run all the TIDs at once, you can try runall.py, but I have had errors with this in the past, and pyro oven issues are easier to debug by processing the TIDs individually.

You should have already created the pds_MMM-NNN.txt file; you can now use it as input to runall.py. Execute runall.py -d pds_MMM-NNN.txt tm2rdr.py. Running this command will take a long time: it generates the RDR for each TID, one at a time, so go get lunch or do something else. If anything goes wrong, all progress will cease, and you will probably have to fix something in tm2rdr.py.

If an error comes up while processing QMS data, this is probably because you indicated pyro oven 1 was on when it was really pyro oven 2; modify that TID's rdr.config and change PYRO_OVEN_NUMBER to 2 (see the pyro oven fix above).

Notes for Debugging

To create an RDR for a single TID, run tm2rdr.py from inside that TID's directory, like any other Python script.

If you need to re-run the RDR generation script, run rdrclean.sh for each TID before re-running. You will notice that the second time you run it, the script runs much faster: each major step in the RDR generation process creates a .pickle file, which is basically a large stored Python object, and these are reused on later runs. If you change sections of the code that create the data that goes into these files (e.g., housekeeping extraction), you should delete the associated pickle file.
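
For example (the TID and the pickle filename are placeholders; list the directory to see the actual names):

    cd 25001
    rdrclean.sh                # assuming it runs from inside the TID directory, like tm2rdr.py
    ls *.pickle                # cached intermediates, one per major step
    rm housekeeping.pickle     # hypothetical name: delete only the stage whose code you changed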

Assembling and Delivering

Creating an Archive Directory

Download the mslsam_1xxx directory from the current archive:

  wget -r -nH --cut-dirs 2 -l 2 ftp://pds-geosciences.wustl.edu/msl/msl-m-sam-2-rdr-l0-v1/mslsam_1xxx/

Make sure the data folder is empty. Move all the files out of the index directory to a temporary place; the tools will generate only this delivery's index rows, and you will have to manually merge them into these saved index files later.
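
For example:

    cd mslsam_1xxx
    mkdir ../index_prev          # temporary stash (the name is arbitrary)
    mv index/* ../index_prev/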

Summarize Your Changes

Edit the file mslsam_1xxx/errata.txt. In "SECTION A", copy and paste an entry from a previous release and put it at the top of the section. Fill out the information with the current release date (consult Joy's email) and the changes in this delivery. If you did not update/change any old files, you can write "N/A" under "REASON FOR UPDATES". Save the changes.

Install Software

The first time you do a delivery from a computer, you will need to install VTool. This can be downloaded from the PDS website here: http://pds.nasa.gov/tools/label-validation-tool.shtml. To install it, unpack the archive and put it somewhere on your filesystem. I put mine under /usr/local. Then, add the tool's "bin" directory to your path.
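
For example, in your shell profile (the install path is an assumption; use wherever you unpacked it):

    export PATH="$PATH:/usr/local/vtool/bin"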

You will also need to download and install md5deep. You can get the source code here: http://md5deep.sourceforge.net/. It's been a long time since I installed it, but I imagine it's a normal ./configure, make, [sudo] make install installation.

Navigate to mslsam_1xxx/data. Run the following command with each TID in the delivery as an argument:

rdrlink TID1 TID2 TID3 TID4 ...

This will create symbolic links to the RDR products you just made.

Update Index Files

This step requires that you have the index files you downloaded from the archive and moved aside.

Start by going into the mslsam_1xxx/ folder. Run rdrindex.py, which should add 8 files to your index folder (a .lbl file and a .tab table for each of the 4 levels). Next, open a new folder window and go to the previous delivery's index files (the ones you moved aside earlier). You are going to concatenate the old index tables onto the new ones (a command-line equivalent is shown after the list).

  • Open PREVIOUS_RELEASE/mslsam_1xxx/index/l0_index.tab.
  • Open NEW_RELEASE/mslsam_1xxx/index/l0_index.tab.
  • Copy every line in the previous release's file.
  • Paste them at the top of the new release's file (old rows first, new rows after).
  • Repeat for the remaining tables.
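
The equivalent merge from the command line, for one table (repeat for the other three; assumes the old index files were stashed in ../index_prev as in the earlier sketch):

    cd mslsam_1xxx/index
    cat ../../index_prev/l0_index.tab l0_index.tab > merged.tab && mv merged.tab l0_index.tab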

Now all that needs to be done is to update the label files to match the table files; a quick way to get the line count is shown after the list.

  • Open NEW_RELEASE/mslsam_1xxx/index/l0_index.lbl.
  • Open NEW_RELEASE/mslsam_1xxx/index/l0_index.tab.
  • Count how many lines are in the index table.
  • Change FILE_RECORDS in the label file to that number of lines.
  • Change ROWS in the label file to the same number.
  • Repeat for the remaining label/table pairs.
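
wc gives the count directly:

    wc -l l0_index.tab     # the line count goes into both FILE_RECORDS and ROWS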

The archive is now ready to be packaged and released.

Package the Archives

Navigate to the directory that contains mslsam_1xxx. Run rdrpackage, which should create "mslsam_1xxx.tar", "mslsam_1xxx_manifest", and "mslsam_1xxx_checksum". On a Mac, you should be able to Control-click the .tar file and hit "Compress", which will make a much smaller mslsam_1xxx.tar.zip.
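
The terminal equivalent of the Finder "Compress" step:

    zip mslsam_1xxx.tar.zip mslsam_1xxx.tar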

Deliver the Archive

Now all you need to do is use an SFTP client to connect to wuftp.wustl.edu and copy over the .zip as well as the checksum and manifest files. Email Susie to let her know you have made the delivery, and fix any errors she reports.
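
From the command line (the username and destination directory are assumptions; use the credentials Susie provided):

    sftp username@wuftp.wustl.edu
    sftp> put mslsam_1xxx.tar.zip
    sftp> put mslsam_1xxx_manifest
    sftp> put mslsam_1xxx_checksum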

BELOW ARE OLD INSTRUCTIONS FOR PACKAGING THAT I NO LONGER USE.

Navigate to the directory containing mslsam_1xxx. Create a directory called "rdrstage". Now, run the following command (with each TID in the delivery specified): rdrsemipackage TID1 TID2 TID3 .... This will create a tar.gz archive in the rdrstage folder. If you want to include older TIDs in the delivery (perhaps because you reprocessed them), include them in this command as well.

The rdrsemipackage script not only archives the files you want to deliver (and excludes the other stuff), but also creates a "manifest" file and a "checksum" file, both of which Susie needs to validate that she received everything correctly. Look at the script to see how these files are created. There is also an rdrpackage script, which I used for the first delivery. I have not needed to use it since.