Onsite analysis (LSTOSA)

Revision as of 09:25, 4 November 2021 by Maria.lainez (talk | contribs)

Code: https://github.com/gae-ucm/lstosa

Docs: https://lstosa.readthedocs.io

How to use lstosa

Access to the LST IT container

  • With your personal account (meant for developing and testing)

If you have an LDAP account (firstname.lastname@cta-consortium.org), follow these steps:

- Log in to an allowed machine. Only a few machines (IPs) are allowed to connect to the LST IT center, so you need to jump through one of them to reach it.
- Log in to a general login server with the general user CtAlAPaLmA:
$ ssh -l CtAlAPaLmA 161.72.87.1
- From this login server, log in with your LDAP account to either cp01 or cp02:
$ ssh -l firstname.lastname cp01
  • Alternatively, with the official lstanalyzer account (meant for data processing)

Log in to the tcs06 machine as lstanalyzer:

$ ssh lstanalyzer@tcs06

Install miniconda (python package manager)

Follow the installation instructions here or alternatively, follow the instructions here to work at the LST-IT container.

Install lstosa in your directory

For the installation of lstosa follow the instructions in: https://github.com/gae-ucm/lstosa

You will end up with a new conda environment containing all the python packages needed to run lstosa and lstchain, plus some others for development, building the documentation, and testing the code. The environment is named 'osa-env' by default, unless another name is given with the -n flag when creating it.

After activating the environment, make sure you complete the last installation step by running pip install -e . in the lstosa base directory.

Activate osa environment

$ conda activate <name of the environment>

Tip: you can list the environments in your conda installation with conda env list.

Configuration file for running lstosa

Set the proper paths in the configuration file you use to run the osa scripts (sequencer, closer, etc.). See the default file in cfg/sequencer.cfg

  • [ENV]:
- USER: firstname.lastname (the user name used for SLURM, if you have logged in with your personal LDAP account)
  • [LSTOSA]:
- HOMEDIR: home directory (e.g. /fefs/aswg)
- PYTHONDIR: e.g. /home/firstname.lastname/lstosa
  • [LST1] (paths to different data level files):
- DIR: your base working directory, where files are going to be produced (e.g. /fefs/aswg/workspace/firstname.lastname/data/real)
- PROD-ID = v0.7.5 (main production id for the running_analysis directory)
- CALIB-PROD-ID = v0.7.5
- DL1-PROD-ID = tailcut84_dynamic_cleaning (sub production id for the DL1AB subdirectory)
- DL2-PROD-ID = tailcut84_dynamic_cleaning (sub production id for the DL2 subdirectory)

The DL1 and DL2 sub-production IDs can differ from each other. If they are left empty, only the lstchain version is used automatically.

With these parameters the running_analysis directory where all analysis products are first produced is: /fefs/aswg/workspace/firstname.lastname/data/real/running_analysis/YYYYMMDD/v0.7.5

DL1/DL2 final directories (after running closer script) will be in: /fefs/aswg/workspace/firstname.lastname/data/real/{DL1,DL2}/YYYYMMDD/v0.7.5/tailcut84_dynamic_cleaning
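The way these paths are put together can be sketched in a few lines of Python. This is only an illustration of the directory layout described above, not code taken from lstosa; the function name analysis_directories and its arguments are hypothetical, and the case of empty sub-production IDs is not modelled.

```python
from pathlib import PurePosixPath


def analysis_directories(base_dir, night, prod_id, dl1_prod_id, dl2_prod_id):
    """Return the running_analysis, DL1 and DL2 directories for one night.

    Hypothetical helper: it only mirrors the layout described in this page.
    """
    base = PurePosixPath(base_dir)
    # Directories use YYYYMMDD, while the scripts take YYYY_MM_DD.
    date = night.replace("_", "")
    return {
        "running_analysis": base / "running_analysis" / date / prod_id,
        "DL1": base / "DL1" / date / prod_id / dl1_prod_id,
        "DL2": base / "DL2" / date / prod_id / dl2_prod_id,
    }


dirs = analysis_directories(
    "/fefs/aswg/workspace/firstname.lastname/data/real",
    "2020_12_07",
    "v0.7.5",
    "tailcut84_dynamic_cleaning",
    "tailcut84_dynamic_cleaning",
)
for name, path in dirs.items():
    print(name, path)
```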

Check ELOG

Check the ELOG (https://www.lst1.iac.es/elog/LST+commissioning/) sent the morning after data taking finishes. Look for the sky-data runs that are going to be processed by lstosa. In addition, look for information about problems during data taking and about the weather conditions. For bookkeeping, annotate the runs to be processed by lstosa in the spreadsheet LST-analysis-logbook.

RunSummary files

The official RunSummary files are produced every day by a cron job in the lstanalyzer account. The files of each night can be found in the following path: /fefs/aswg/data/real/monitoring/RunSummary

# %ECSV 0.9
# ---
# datatype:
# - {name: run_id, datatype: int64}
# - {name: n_subruns, datatype: int64}
# - {name: run_type, datatype: string}
# - {name: ucts_timestamp, datatype: int64}
# - {name: run_start, datatype: int64}
# - {name: dragon_reference_time, datatype: int64}
# - {name: dragon_reference_module_id, datatype: int16}
# - {name: dragon_reference_module_index, datatype: int16}
# - {name: dragon_reference_counter, datatype: uint64}
# - {name: dragon_reference_source, datatype: string}
# delimiter: ','
# meta: !!omap
# - {date: '2020-12-07'}
# - {lstchain_version: 0.7.1}
# schema: astropy-2.0
run_id,n_subruns,run_type,ucts_timestamp,run_start,dragon_reference_time,dragon_reference_module_id,dragon_reference_module_index,dragon_reference_counter,dragon_reference_source
3091,4,DRS4,1607381878299098079,1607381770000000000,1607381878299098079,90,0,6299097800,ucts
3092,4,PEDCALIB,1607383294862247802,1607383190000000000,1607383294862247802,90,0,2862247500,ucts
3093,144,DATA,1607384124176999635,1607384016000000000,1607384124176999635,90,0,5176999400,ucts
3094,143,DATA,1607385418722871215,1607385307000000000,1607385418722871215,90,0,8722871000,ucts
3095,217,DATA,1607386783224563365,1607386676000000000,1607386783224563365,90,0,4224563100,ucts
3096,139,DATA,1607388960101619394,1607388852000000000,1607388960101619394,90,0,6101619100,ucts
3097,135,DATA,1607390297501301237,1607390187000000000,1607390297501301237,90,0,6501301000,ucts
3098,127,DATA,1607393037807265129,1607392935000000000,1607393037807265129,90,0,1807264900,ucts
3099,102,DATA,1607394373634159580,1607394261000000000,1607394373634159580,90,0,8634159300,ucts
3100,44,DATA,1607406182086385013,1607406073000000000,1607406182086385013,90,0,7086384800,ucts
3101,37,DATA,1607407166313411768,1607407058000000000,1607407166313411768,90,0,4313411500,ucts
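To pick out the sky-data runs from a file like the one above, a minimal sketch using only the Python standard library could look as follows. The helper names read_run_summary and sky_data_runs are hypothetical and not part of lstosa or lstchain; the sketch assumes that every header line starts with '#', as in the example. Since the file is in ECSV format, astropy's Table.read with format="ascii.ecsv" should also be able to read it.

```python
import csv


def read_run_summary(text):
    """Parse RunSummary text: drop the '#' header lines, read the CSV table."""
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    # The first remaining line holds the column names.
    return list(csv.DictReader(data_lines))


def sky_data_runs(rows):
    """Return (run_id, n_subruns) for every run of type DATA."""
    return [
        (int(row["run_id"]), int(row["n_subruns"]))
        for row in rows
        if row["run_type"] == "DATA"
    ]


example = """# %ECSV 0.9
# delimiter: ','
run_id,n_subruns,run_type,ucts_timestamp
3091,4,DRS4,1607381878299098079
3092,4,PEDCALIB,1607383294862247802
3093,144,DATA,1607384124176999635
"""
print(sky_data_runs(read_run_summary(example)))
```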

The cron job is:

# [RUN SUMMARY] Produce run summary file the morning after data taking at 6:45 UTC
45 06 * * * obsdate=`date +\%Y\%m\%d -d yesterday`; export obsdate; /fefs/aswg/lstosa/run_summary.sh $obsdate >/dev/null 2>&1
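The cron entry passes the date of the previous day in YYYYMMDD format to the script. The same date arithmetic can be written in Python as below; the function name previous_obsdate is hypothetical and only illustrates what date +%Y%m%d -d yesterday computes.

```python
from datetime import date, timedelta


def previous_obsdate(today):
    """Return the day before `today` formatted as YYYYMMDD."""
    return (today - timedelta(days=1)).strftime("%Y%m%d")


# Running the morning after the night of 2020-12-07:
print(previous_obsdate(date(2020, 12, 8)))
```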

Note: you can always produce your own run summary files with this lstchain script.

Launch sequencer

This is where the OSA processing actually starts.

First simulate the sequencer by using the -s option:

$ sequencer -v -s -c cfg/sequencer.cfg  -d 2020_12_07 LST1

Then, after checking that everything is in the sequencer table, launch the sequencer script without the -s option:

$ sequencer -v -c cfg/sequencer.cfg  -d 2020_12_07 LST1

The “sequencer” script creates and executes the calibration sequence and prepares a SLURM job array which launches the data sequences for every subrun. The sequence scripts will be launched using sbatch.

A first calibration sequence creates the DRS4 pedestal, charge and time calibration files. Once the calibration sequence has finished, the rest of the sequences, corresponding to sky-data runs that make use of the previously produced calibration files, start executing. DL1 and DL2 files are generated. It takes around 1 hour to execute the calibration sequence and also around 1 hour in total to execute the data sequences (go from R0 to DL2).

What does sequencer do?

- Set up the ‘running_analysis’ directory according to the path and production ID set up in the cfg file.
- Extract run information from RunSummary.
- Build the sequences (each sequence corresponds to a run).
- Create jobs for each sequence.
- Submit jobs to queue.
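The sequence-building step can be pictured with the following sketch. It is not lstosa's actual implementation: the function build_sequences and the dictionary keys are hypothetical, and it assumes a single calibration sequence (the first PEDCALIB run, numbered 0) that acts as parent of every DATA sequence, which matches the layout of the sequencer table.

```python
def build_sequences(runs):
    """Build one sequence per run from (run_id, run_type, n_subruns) tuples.

    Illustrative assumption: the first PEDCALIB run becomes sequence 0 and
    every DATA run becomes a child sequence that depends on it.
    """
    calib_runs = [run for run in runs if run[1] == "PEDCALIB"]
    if not calib_runs:
        raise ValueError("no PEDCALIB run found for this night")

    run_id, run_type, n_subruns = calib_runs[0]
    sequences = [
        {"seq": 0, "parent": None, "type": run_type,
         "run": run_id, "subruns": n_subruns}
    ]
    data_runs = [run for run in runs if run[1] == "DATA"]
    for seq, (run_id, run_type, n_subruns) in enumerate(data_runs, start=1):
        sequences.append(
            {"seq": seq, "parent": 0, "type": run_type,
             "run": run_id, "subruns": n_subruns}
        )
    return sequences


night = [(3091, "DRS4", 4), (3092, "PEDCALIB", 4), (3093, "DATA", 144)]
for sequence in build_sequences(night):
    print(sequence)
```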

Monitor the processing status

At any moment you can run again the same sequencer command used to trigger the processing, this time with the -s option to simulate all the steps. It will display the sequencer table with the processing status of each run. This is also done automatically by the sequencer_webmaker script, which produces the table displayed at https://www.lst1.iac.es/datacheck/lstosa/sequencer.xhtml. The page is updated every 5 minutes, so there is no need to log in to the LST-IT to check the processing status.

(osa) [daniel.morcuende@cp01 lstosa]$ sequencer -c sequencer.cfg -s -d 2021_09_02 LST1
=========================== Starting sequencer.py at 2021-10-13 12:55:35 UTC for LST, Telescope: LST1, Night: 2021_09_02 ===========================
Tel   Seq  Parent  Type      Run   Subruns  Source  Wobble  Action  Tries  JobID     State    Host  CPU_time  Walltime  Exit  DL1%  MUONS%  DL1AB%  DATACHECK%  DL2%  
LST1    0  None    PEDCALIB  5979  5        None    None    Check       1  11721605  RUNNING  None  00:01:11  None      None  None  None    None    None        None  
LST1    1       0  DATA      5970  173      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    2       0  DATA      5971  139      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    3       0  DATA      5972  130      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    4       0  DATA      5973  125      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    5       0  DATA      5980  83       None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    6       0  DATA      5981  117      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0  
LST1    7       0  DATA      5982  114      None    None    Check       0  None      None     None  None      None      None     0       0       0           0     0

Note: this will be done automatically by the autocloser script.
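If you save the sequencer output to a text file, a short script can list the runs that are still pending. The sketch below is hypothetical and not part of lstosa: it assumes whitespace-separated columns with no spaces inside a field, as in the table above, and it assumes that a fully processed DATA run shows 100 in the DL2% column.

```python
def parse_sequencer_table(text):
    """Turn the printed sequencer table into a list of dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Skip the prompt and banner lines until the column names appear.
            if fields[0] == "Tel":
                header = fields
            continue
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def pending_data_runs(rows):
    """Return the DATA run numbers whose DL2% column is not yet 100."""
    return [
        int(row["Run"])
        for row in rows
        if row["Type"] == "DATA" and row["DL2%"] != "100"
    ]


table = """Tel Seq Parent Type Run Subruns Action DL2%
LST1 0 None PEDCALIB 5979 5 Check None
LST1 1 0 DATA 5970 173 Check 0
"""
print(pending_data_runs(parse_sequencer_table(table)))
```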

Launch closer/autocloser

Launch the "autocloser" script:

$ python osa/scripts/autocloser.py -c cfg/sequencer.cfg -d 2020_12_07 LST1

The -s option can be used first to simulate the autocloser launching:

$ python osa/scripts/autocloser.py -c cfg/sequencer.cfg -s -d 2020_12_07 LST1

If there are errors or incomplete sequences, the autocloser script will not close the day. In that case, it can be launched with the -y option, which forces it to close:

$ python osa/scripts/autocloser.py -c cfg/sequencer.cfg -y -s -d 2020_12_07 LST1
 
============================ Starting closer.py at 2021-11-03 14:33:19 UTC for LST, Telescope: LST1, Night: 2020_12_07 ============================
analysis.finished.timestamp=2021-11-03 14:33:20.517605
analysis.finished.night=2020_12_07
analysis.finished.telescope=LST1
analysis.finished.data.size=1728 GB
analysis.finished.data.files.r0=1092
analysis.finished.data.files.pedestal=0
analysis.finished.data.files.calib=0
analysis.finished.data.files.time_calib=0
analysis.finished.data.files.dl1=0
analysis.finished.data.files.dl2=0
analysis.finished.data.files.muons=0
analysis.finished.data.files.datacheck=0

Looping over the sequences and merging the dl2 files
Submitted batch job 12208392
Submitted batch job 12208393
Submitted batch job 12208394
Submitted batch job 12208395
Submitted batch job 12208396
Submitted batch job 12208397
Submitted batch job 12208398
Submitted batch job 12208399
Submitted batch job 12208400

What does closer do?

- Check that all jobs have finished properly by checking the history files.
- Check that all subrun-wise DL1, DL2, DL1 datacheck and muon files were produced by looking at the sequencer table.
- If the above conditions are fulfilled, merge the subrun-wise DL1 datacheck and DL2 files on a run basis.
- Move the files to their final directory.
- Produce provenance.

Note: the provenance information and the merging of the different analysis products (DL2, DL1 datacheck) are handled by the closer script. In previous versions of OSA this was done separately by hand.

Provenance files run-wise are produced in these directories:

/fefs/aswg/workspace/firstname.lastname/data/real/{DL1,DL2}/20201207/v0.7.5/tailcut84_dynamic_cleaning/log

DL1 files are in

/fefs/aswg/workspace/firstname.lastname/data/real/DL1/20201207/v0.7.5/tailcut84_dynamic_cleaning

DL2 files are in

/fefs/aswg/workspace/firstname.lastname/data/real/DL2/20201207/v0.7.5/tailcut84_dynamic_cleaning

Muon files are in

/fefs/aswg/workspace/firstname.lastname/data/real/DL1/20201207/v0.7.5

DL1 datacheck files are in

/fefs/aswg/workspace/firstname.lastname/data/real/DL1/20201207/v0.7.5/tailcut84_dynamic_cleaning

Merged DL1 datacheck h5 and PDF files are produced on a run basis.

Long-term DL1 monitoring script

These are not python scripts but bash scripts in lstosa/utils/

- Daily (long-term script restricted to one day to facilitate the daily datacheck)
- All-time monitoring (run over all the runs taken so far after July 2020)

Copy datacheck files to web

Copy pdf run-wise files to lst1.iac.es/datacheck/dl1/ and copy the DRS4 and calibration datacheck pdf files to lst1.iac.es/datacheck/drs4/ and lst1.iac.es/datacheck/enf_calibration/, respectively, by launching the copy_datacheck.py script.


Backup

Cron jobs

List of cron jobs executed automatically by lstanalyzer. Check the commands by running crontab -l while logged in as lstanalyzer:

  • [COPY OBS OVERVIEW] Copy Seiya Nozaki's overview plot to the lst1 web server (https://www.lst1.iac.es/datacheck/observation_overview/)
  • [RUN SUMMARY] Produce lstchain run summary file the morning after data taking
  • [SEQUENCER] Launch sequencer once in the morning
  • [WEB] Make the sequencer XHTML table and copy it to the lst1 web server. Updated every 5 mins.
  • [CLOSER] Launch the closer without forcing it (no -y option). It will fail if not all sequences are finished.
  • [COPY DATACHECK] Copy the available calibration and DL1 data check to the LST1 web server.
  • [LONG TERM DAILY] Launch the long-term script on a daily basis.
  • [LONG TERM] Launch the long-term script for all the observations

Git, python, documentation tips

[Developers] How to use git/GitHub

To implement a new feature or make any change you need to first create a new branch from the up-to-date main branch of the repository.

  • Always make sure that your local repository is updated with git pull.
  • Create and change to a new branch: git checkout -b <branch_name>.
  • Make the change and add it to the git history with: git add <path_to_file>.
  • Commit the changes: git commit -m "Message of the commit with short description".
  • Push the commit to the remote repository: git push -u origin <branch_name>.
  • Go to the lstosa GitHub repository web page. You will see a message offering to create a Pull Request from the committed and pushed changes.

Update lstosa in the IT container

To do so, synchronize the local main branch (in /fefs/aswg/lstosa or your own installation) with the lstosa remote repository (https://github.com/gae-ucm/lstosa) by running git pull from /fefs/aswg/lstosa and entering your GitHub username and password. Then, if the versions of the dependencies have changed or there are new dependencies, update the conda environment with conda env update -n osa -f environment.yml, and install lstosa again with pip install -e .

Note: conflicts may appear if changes were made directly in the local copy, which should be avoided. Resolving conflicts might be straightforward, but it must be done carefully. Some commands that may help are git status and git log. The command git stash can also help: it temporarily sets the local changes aside, after which you should be able to resync with the origin repository.

Look for problematic files during the processing

Sometimes you will see in the sequencer table that a sequence has failed. You then need to figure out which subruns are affected. To do that, you can look directly into the history files, which are produced on a subrun basis.

To check if there is any problem with the DL1 files:

$ grep "fits 1" /fefs/aswg/data/real/running_analysis/20200810/v0.6.0_v04/*.history

To check if there is any problem with the DL2 files:

$ grep "json 1" /fefs/aswg/data/real/running_analysis/20200810/v0.6.0_v04/*.history

And for memory issues:

$ grep "json -9" /fefs/aswg/data/real/running_analysis/20200812/v0.6.0_v04/*.history

In principle, if everything goes well, the exit code is 0. However, some warnings or errors (mainly in the DL1 datacheck part) may cause the exit code to be 1. This is not a fatal error and the DL1 h5 file is produced anyway. Other exit codes might appear (although this is not common) and they would probably be problematic. In that case, the only advice is to check the slurm and python out and err logs in the running_analysis directory.
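The grep commands above can also be reproduced from Python, for example to collect the affected history files in a script. This is a minimal sketch: the function name find_history_matches is hypothetical, and it performs a plain substring search per line, like grep with a fixed string, without assuming anything else about the format of the history files.

```python
from pathlib import Path


def find_history_matches(directory, pattern):
    """Return (file name, line) pairs for the lines containing `pattern`."""
    matches = []
    for path in sorted(Path(directory).glob("*.history")):
        for line in path.read_text().splitlines():
            if pattern in line:
                matches.append((path.name, line))
    return matches


if __name__ == "__main__":
    # Same directory and pattern as the memory-issue grep above.
    night_dir = "/fefs/aswg/data/real/running_analysis/20200812/v0.6.0_v04"
    for name, line in find_history_matches(night_dir, "json -9"):
        print(f"{name}: {line}")
```

If the directory does not exist or holds no history files, the function simply returns an empty list.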