Fast Offline Analysis Guide

Latest revision as of 18:20, 20 November 2024

Quick-look next-day analysis

Documentation is work in progress...

Description of the tasks for the fast offline analysis (FOA)

Perform the analysis of data from alerts, targets of opportunity (ToO) and transient observations, which comprise:

  • Gamma-ray burst observations
  • Gravitational wave alerts
  • Neutrino alerts and ToOs
  • Galactic transient ToOs
  • AGN ToOs

The FOA should also perform the standard next-day analysis whenever an expert (hardware/software) requests a quick-look analysis of any data, in order to assess the telescope performance or check whether there is any problem with data taking.

For bookkeeping, please also post the results on a new wiki page following Fast_Offline_Analysis_Template.

Analysis setup (SLURM, conda environment)

SLURM settings

To get higher priority in SLURM, use the account "foa" through the account option of sbatch/srun (-A, --account), for example:

sbatch -A foa script_name.sh
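Instead of passing -A on the command line, you can also set the account inside the batch script with an #SBATCH directive. A plain "sbatch script_name.sh" then also runs with the foa priority. This is a minimal sketch: the job name, memory, time limit and log name are hypothetical placeholders, not recommended values. Note that sbatch only reads #SBATCH lines that come before the first command in the script.

```bash
#!/bin/bash
# Equivalent to "sbatch -A foa" on the command line:
#SBATCH --account=foa
# Hypothetical job name, memory and wall-time placeholders; adapt them to your job:
#SBATCH --job-name=foa_quicklook
#SBATCH --mem=8G
#SBATCH --time=02:00:00
# %j is replaced by the SLURM job ID in the log file name:
#SBATCH --output=foa_%j.log

set -euo pipefail

# SLURM_JOB_ID is only defined when the script runs under SLURM.
job_label="${SLURM_JOB_ID:-interactive}"
echo "FOA job ${job_label} started on ${HOSTNAME:-unknown host}"

# Replace this comment with the actual analysis commands.
```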

Analysis environment

A conda environment called foa is already available, with lstchain 0.10.12 (the latest version at the time of writing), acceptance_modelisation 0.2.0 and Gammapy 1.1. The first thing to do after logging into the LST-IT cluster is to activate it:

conda activate foa

You can use a newer Gammapy version, but keep in mind that the acceptance_modelisation package may not be compatible with it, and installing both in the same environment may cause problems.
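After "conda activate foa", you can quickly confirm which package versions are actually installed with the sketch below. The helper name foa_env_versions is hypothetical. It only uses Python's standard importlib.metadata, and it assumes that the installed distribution names match the package names quoted above.

```bash
# Hypothetical helper: print the installed version of each named Python package.
# Usage: foa_env_versions <package> [<package> ...]
foa_env_versions() {
  python3 - "$@" <<'PY'
import sys
from importlib.metadata import version, PackageNotFoundError

# One line per requested package, whether or not it is installed.
for name in sys.argv[1:]:
    try:
        print(f"{name}: {version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
PY
}
```

For example, `foa_env_versions lstchain gammapy acceptance_modelisation` prints one line per package.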

Production of DL3 files for a quick-look analysis

Analysis guide made by Abelardo Moralejo (IFAE) after the LST-1 Analysis 2024 School



Outline of the procedure:

1. Start from DL1 files automatically produced by the on-site analysis, LSTOSA (each night's data is typically ready by noon the next day).

2. Select the relevant files for a given source, period, and zenith distance range.

3. Use the already existing standard RF models to create DL2 real data files: select the declination line closest to your source.

4. Use the already existing MC test DL2 files (corresponding to the same "production" as the RF models) to create the IRF files.

5. Produce the DL3 files.

6. Perform high-level analysis (theta2 distribution, sky-maps, SED and light curve).

7. Post the results and analysis details in a new wiki page following the Fast_Offline_Analysis_Template.


1. Selection of the DL1 real data files

Check ELOG, schedule webpage, Slack

Check the ELOG sent by email to the lst-onsite mailing list. It tells you whether observations were performed, on which targets, and whether they should be analyzed by the FOA (look for keywords such as ToO, GRB, alert, neutrino, GW, etc.).

Even before the data are processed (the day after data taking), you can see which observations were taken in the table: https://www.lst1.iac.es/datacheck/lstosa/sequencer.xhtml

There you can already identify the observation runs for the FOA.

Also check the LST schedule webpage: https://www.lst1.iac.es/schedule/ (ToOs may be scheduled in advance; they are indicated by a blinking flame icon).

Using data quality notebook

Although the notebook selects a subset of runs, the FOA should be performed using all available runs. The notebook provides information on data quality, which is useful to judge the validity of the results at later stages. Still, it is important to include all available data in the FOA to find out whether there is any gamma-ray excess, even if the observing conditions were not ideal.

Use the notebook cta-lstchain/notebooks/data_quality.ipynb

  • The version of the notebook used at the school (from lstchain v0.10.7) is at:

 /fefs/aswg/workspace/analysis-school-2024/

  • Copy it to a directory of your choice in the IT cluster, and open it with jupyter notebook. (check these instructions if you don’t know how to execute a jupyter notebook on a remote machine)
  • Look for the “USER INPUT” cells in the notebook and do the following:
  • In the “path to the necessary datacheck files” cell, set “files” to load all the datacheck files of 2023 data processed with version v0.10 of lstchain (explained in the notebook)
  • check that the source is set to “Crab Nebula”
  • set max_zenith = 35 * u.deg
  • set first_date = 20230117
  • set last_date = 20230117

This will select Crab observations (in standard wobble mode) up to 35 deg in zenith taken on the night of Jan 17, 2023. At the end of the notebook you will get a list of “good runs”:

/fefs/aswg/data/real/DL1/20230117/v0.10/tailcut84/dl1_LST-1.Run11699.h5
/fefs/aswg/data/real/DL1/20230117/v0.10/tailcut84/dl1_LST-1.Run11700.h5
/fefs/aswg/data/real/DL1/20230117/v0.10/tailcut84/dl1_LST-1.Run11701.h5
/fefs/aswg/data/real/DL1/20230117/v0.10/tailcut84/dl1_LST-1.Run11702.h5
/fefs/aswg/data/real/DL1/20230117/v0.10/tailcut84/dl1_LST-1.Run11703.h5

Copy this list and, using a text editor, paste it into a file called “file_list.txt”, which you will need later (yes, I know, the notebook could write the file directly…). To select data from any period of the whole sample, change the input in the notebook cell “path to the necessary datacheck files”. The directory below contains links to (the most recent version of) all datacheck files:

/fefs/aswg/data/real/OSA/DL1DataCheck_LongTerm/night_wise/all/
  • See this school session for more details on the data quality selection
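Instead of pasting the list by hand, you can also write file_list.txt from the shell when the run numbers are known (for instance from the sequencer table). This is a minimal sketch. make_file_list is a hypothetical helper, and it assumes the DL1 path layout of the list above: the night, then v0.10/tailcut84, then dl1_LST-1.Run<run>.h5, with the run number assumed to be zero-padded to five digits.

```bash
# Hypothetical helper: write a DL1 file list for one night and a set of run numbers.
# Usage: make_file_list <YYYYMMDD> <output_file> <run> [<run> ...]
# DL1_ROOT and DL1_SUBDIR are hypothetical overrides for the base directory
# and the version/cleaning subdirectory (defaults follow the example list).
make_file_list() {
  local night=$1 outfile=$2
  shift 2
  local dl1_dir="${DL1_ROOT:-/fefs/aswg/data/real/DL1}/${night}/${DL1_SUBDIR:-v0.10/tailcut84}"
  local run path
  : > "$outfile"    # start from an empty list
  for run in "$@"; do
    # Assumption: run numbers are zero-padded to five digits in the file names.
    path=$(printf '%s/dl1_LST-1.Run%05d.h5' "$dl1_dir" "$run")
    # Warn (without failing) if the file is not there, e.g. before processing has finished.
    [[ -f $path ]] || echo "warning: $path not found" >&2
    printf '%s\n' "$path" >> "$outfile"
  done
}
```

With the defaults, `make_file_list 20230117 file_list.txt $(seq 11699 11703)` writes the same five paths as the list above.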

2. Production of the real data DL2 files

  • For this you need to run the lstchain_dl1_to_dl2 script. It reads the DL1b data (image parameters such as width, length, intensity…) from the DL1 files and reconstructs the “physics parameters” (direction, energy, gammaness).
  • Besides the DL1 files (those selected by data_quality.ipynb), you will need Random Forests (RFs, built from MC simulations). We will use existing RFs created from the standard (“base”) MC, that is, with the default level of NSB (night sky background), i.e. approximately dark-sky conditions. They are stored under:
/fefs/aswg/data/models/AllSky/20240131_allsky_v0.10.5_all_dec_base/

The “training” MC used to build the RFs is generated at pointings along “declination lines”. Select the declination line closest to the declination of your source. Since the Crab is at δ ≃ 22 deg, we select the line:

/fefs/aswg/data/models/AllSky/20240131_allsky_v0.10.5_all_dec_base/dec_2276/

The directory above contains the RFs and also the .json configuration file that was used to create them. It is important to use the same configuration file in the DL1 to DL2 stage.
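Picking the closest declination line can also be scripted. The sketch below rests on two naming assumptions. First, judging from dec_2276 for the line near the Crab, a directory named dec_NNNN is assumed to encode the declination in hundredths of a degree. Second, lines at negative declination are assumed to be named dec_min_NNNN. Check both against the actual directory listing before relying on the helper. closest_dec_line is a hypothetical helper name.

```bash
# Hypothetical helper: print the declination-line directory closest to a source.
# Usage: closest_dec_line <source_dec_deg> <dir_name> [<dir_name> ...]
# Assumed naming: dec_NNNN = +NN.NN deg, dec_min_NNNN = -NN.NN deg.
closest_dec_line() {
  local src_dec=$1
  shift
  local name digits sign diff best="" best_diff=""
  for name in "$@"; do
    digits=${name#dec_}
    sign=1
    if [[ $digits == min_* ]]; then
      sign=-1
      digits=${digits#min_}
    fi
    # |source dec - line dec|, with the line dec given in hundredths of a degree.
    diff=$(awk -v s="$src_dec" -v d="$digits" -v g="$sign" \
      'BEGIN { x = s - g * d / 100; if (x < 0) x = -x; print x }')
    # Keep this line if it is the first one or closer than the best so far.
    if [[ -z $best ]] || awk -v a="$diff" -v b="$best_diff" 'BEGIN { exit !((a + 0) < (b + 0)) }'; then
      best=$name
      best_diff=$diff
    fi
  done
  printf '%s\n' "$best"
}
```

For example, run `closest_dec_line 22.01 dec_*` from inside /fefs/aswg/data/models/AllSky/20240131_allsky_v0.10.5_all_dec_base. If the naming assumptions hold, it should pick dec_2276 for the Crab.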

  • The jobs to convert DL1 into DL2 (which are quite memory-consuming) should be launched using SLURM. You can use the following script to do it:
/fefs/aswg/workspace/analysis-school-2024/helpful_scripts/launch_dl1_dl2.sh

It takes two arguments: the directory containing the RFs, and the JSON configuration file used to produce them. Run it from the directory that contains file_list.txt (the list of DL1 files); this directory must also have a DL2 subdirectory, where the DL2 files will be created. Copy the script to that directory and execute it:

MCMODELS=/fefs/aswg/data/models/AllSky/20240131_allsky_v0.10.5_all_dec_base
./launch_dl1_dl2.sh $MCMODELS/dec_2276 \
$MCMODELS/dec_2276/lstchain_config_2024-01-31.json
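Before running the command above, a short pre-flight check can catch two common setup problems. It stops if file_list.txt is missing or empty, and it creates the DL2 subdirectory if it does not exist yet. This is a minimal sketch. check_dl2_workdir is a hypothetical helper name, not part of lstchain or of launch_dl1_dl2.sh.

```bash
# Hypothetical pre-flight check for the directory where launch_dl1_dl2.sh will be run.
# Usage: check_dl2_workdir [<dir>]   (defaults to the current directory)
check_dl2_workdir() {
  local dir=${1:-.}
  if [[ ! -s $dir/file_list.txt ]]; then
    echo "error: $dir/file_list.txt is missing or empty" >&2
    return 1
  fi
  # The DL2 files are written to the DL2 subdirectory; create it if needed.
  mkdir -p "$dir/DL2"
  echo "$(wc -l < "$dir/file_list.txt") DL1 file(s) listed; DL2 output directory: $dir/DL2"
}
```

You can chain it with the launch command, e.g. `check_dl2_workdir && ./launch_dl1_dl2.sh ...`, so the jobs are only submitted when the check passes.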

You will get a few messages saying that the jobs were submitted. To check the status of your jobs, run:

squeue -u xxxx.yyyy

(where xxxx.yyyy is your username at the IT cluster)
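Once your jobs no longer appear in the squeue output, you can confirm that every DL1 file in file_list.txt produced a DL2 file. The sketch below uses a hypothetical helper, missing_dl2. It assumes, without confirmation here, that each DL2 file keeps the DL1 file name with the leading "dl1" replaced by "dl2" (e.g. dl2_LST-1.Run11699.h5). Adjust the name mapping if your output files are named differently.

```bash
# Hypothetical helper: print the DL1 files listed in file_list.txt that have no DL2 file yet.
# Usage: missing_dl2 [<file_list>] [<dl2_dir>]   (defaults: file_list.txt and DL2)
# Assumption: dl1_<name>.h5 becomes dl2_<name>.h5 in the DL2 directory.
missing_dl2() {
  local list=${1:-file_list.txt} dl2_dir=${2:-DL2}
  local dl1 name
  while IFS= read -r dl1 || [[ -n $dl1 ]]; do
    [[ -z $dl1 ]] && continue    # skip blank lines
    name=$(basename "$dl1")
    # ${name/dl1/dl2} replaces the first "dl1" in the file name.
    [[ -f $dl2_dir/${name/dl1/dl2} ]] || printf '%s\n' "$dl1"
  done < "$list"
}
```

Empty output from `missing_dl2` means that all listed runs have a DL2 file. Any path it prints points to a job worth checking in its SLURM log.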

3. Exploration of the DL2 data (theta2 and significance of detection)

  • With the DL2 files you can already check for a possible detection of the source. Use the notebook below to obtain a θ² plot around a given sky direction:

   cta-lstchain/notebooks/explore_DL2.ipynb

  • In the notebook you can easily see how to load the DL2 information into Pandas dataframes, and how to access the main reconstructed parameters: gammaness, direction and energy.
  • Note that for datasets longer than a few tens of hours the notebook may be rather slow and memory-hungry. In such cases, move on to the DL3 level (see below) and make the θ² plots from there using Gammapy.

Post DL3 analysis

Significance of the detection (theta^2 distribution)

Starting from DL2 files you can use the following notebook (global selection cuts):

https://github.com/cta-observatory/cta-lstchain/blob/main/notebooks/explore_DL2.ipynb

You can also use Gammapy to calculate the theta2 distributions:

https://indico.cta-observatory.org/event/5272/contributions/42843/attachments/25274/36920/plot_theta2_from_dl3.ipynb
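For reference, the detection significance from a θ² analysis is commonly computed with Eq. 17 of Li & Ma (1983). Its inputs are the ON-region counts N_on, the OFF-region counts N_off and the ON/OFF exposure ratio α (e.g. α = 1/n for n equivalent OFF regions). The excess is N_ex. Whether a given notebook uses exactly this formula should be checked in its code.

```latex
\begin{aligned}
N_{\mathrm{ex}} &= N_{\mathrm{on}} - \alpha\, N_{\mathrm{off}}, \\
S &= \sqrt{2}\left\{ N_{\mathrm{on}} \ln\!\left[\frac{1+\alpha}{\alpha}\,\frac{N_{\mathrm{on}}}{N_{\mathrm{on}}+N_{\mathrm{off}}}\right]
   + N_{\mathrm{off}} \ln\!\left[(1+\alpha)\,\frac{N_{\mathrm{off}}}{N_{\mathrm{on}}+N_{\mathrm{off}}}\right] \right\}^{1/2}
\end{aligned}
```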

Sky maps

Sky maps are important for transient alerts, which may not be well localized. Follow the notebook that uses the acceptance_modelisation library (both linked below):

https://github.com/mdebony/acceptance_modelisation

https://indico.cta-observatory.org/event/5272/contributions/43476/attachments/25278/36935/skymap_LST_analysis_school_2D_3D_ring_FoV_pointlike.ipynb

1D spectral analysis and light curve

You can follow this notebook:

https://indico.cta-observatory.org/event/5272/contributions/42843/attachments/25274/36922/post_DL3_analysis.ipynb

or the Gammapy tutorials for 1D spectral analysis using energy-dependent directional cuts and light curves:

https://docs.gammapy.org/1.2/tutorials/analysis-1d/spectral_analysis_rad_max.html

https://docs.gammapy.org/1.2/tutorials/analysis-time/light_curve_flare.html