# PhysLiteToOpenData
Package to create educational ATLAS Open Data ntuples from DAOD_PHYSLITE files, based on the [ATLAS Analysis Software Tutorial](https://atlassoftwaredocs.web.cern.ch/AnalysisSWTutorial/).

Useful links:
- [DAOD_PHYS/PHYSLITE reference](https://opendata.atlas.cern/docs/documentation/data_format/physlite) (public)
- [DAOD_PHYS/PHYSLITE reference](https://twiki.cern.ch/twiki/bin/view/AtlasProtected/DAODPhys#DAOD_PHYSLITE) (ATLAS internal)
- [CP algorithms example configuration](https://gitlab.cern.ch/atlas/athena/-/blob/master/PhysicsAnalysis/Algorithms/AnalysisAlgorithmsConfig/python/FullCPAlgorithmsTest.py)

## 1. Installation

### Setup 

First, make sure a recent version of `git` is set up (done with `lsetup git` below). Then create the working directories, clone the package and build it:

```bash
setupATLAS -q
lsetup git
mkdir OpenData; cd OpenData
mkdir source run build
cd source
git clone https://gitlab.cern.ch/atlas-outreach-data-tools/physlitetoopendata.git PhysLiteToOpenData
cp PhysLiteToOpenData/extras/CMakeLists-template.txt CMakeLists.txt
cd ../build
asetup AnalysisBase,25.2.40
cmake ../source
cmake --build .
source x86_64*/setup.sh
cd ..
``` 
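To confirm that the build environment is active, you can check that the package's scripts are on your `PATH`. This is a minimal sketch, assuming (as the running sections below imply) that sourcing `setup.sh` makes `OpenDataNtupler.py` and `submitToGrid.py` available as commands; the helper name `missing_commands` is hypothetical.

```python
import shutil

# Commands the README invokes directly after setup (assumed to be on PATH
# once x86_64*/setup.sh has been sourced).
EXPECTED_COMMANDS = ("OpenDataNtupler.py", "submitToGrid.py")


def missing_commands(commands=EXPECTED_COMMANDS):
    """Return the subset of `commands` that shutil.which() cannot find."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH (did you source setup.sh?):", ", ".join(missing))
    else:
        print("Environment looks ready.")
```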

#### With docker / podman
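Without access to CVMFS, you can use Docker instead. Set `$DATA_DIR` to the directory holding your input files and `$OUTPUT_DIR` to the directory where the output should be written: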

```bash
docker run -v $DATA_DIR:/workdir/data -v $OUTPUT_DIR:/workdir/run -it gitlab-registry.cern.ch/atlas-outreach-data-tools/physlitetoopendata:master OpenDataNtupler.py -i inputlist_test.txt
```
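If you prefer to launch the container from a Python script, the sketch below builds the same `docker run` argument list as the command above. The helper name `docker_run_command` is hypothetical; the image name, mount points and `inputlist_test.txt` are copied from the command. Host paths are made absolute so that the left side of each `-v` is treated as a bind-mount path rather than a named volume.

```python
import shlex
from pathlib import Path

IMAGE = "gitlab-registry.cern.ch/atlas-outreach-data-tools/physlitetoopendata:master"


def docker_run_command(data_dir, output_dir, input_list="inputlist_test.txt", engine="docker"):
    """Build the argument list for the containerised OpenDataNtupler.py run.

    data_dir   -> mounted at /workdir/data (input files)
    output_dir -> mounted at /workdir/run  (output ntuples)
    """
    data_dir = Path(data_dir).resolve()
    output_dir = Path(output_dir).resolve()
    return [
        engine, "run",
        "-v", f"{data_dir}:/workdir/data",
        "-v", f"{output_dir}:/workdir/run",
        "-it", IMAGE,
        "OpenDataNtupler.py", "-i", input_list,
    ]


if __name__ == "__main__":
    cmd = docker_run_command("/path/to/physlite", "./out")
    # Inspect the command first; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing an argument list to `subprocess.run` (rather than a shell string) avoids quoting problems with paths that contain spaces.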

To modify the code and run your version in a Docker container, follow these steps:

```bash
mkdir OpenData; cd OpenData
mkdir source run
cd source
git clone https://gitlab.cern.ch/atlas-outreach-data-tools/physlitetoopendata.git PhysLiteToOpenData
cp PhysLiteToOpenData/extras/CMakeLists-template.txt CMakeLists.txt
cd ..
sudo docker run --rm -it -v $PWD/source:/workdir/source -v $DATA_DIR:/workdir/data -v $PWD/run:/workdir/run gitlab-registry.cern.ch/atlas-outreach-data-tools/physlitetoopendata:master /bin/bash
```
Similarly, with Podman:
```bash
podman volume create atlasopendata_run  # create a run volume instead of a regular directory
podman run --rm -it -v $PWD/source:/workdir/source -v $PWD/data:/workdir/data -v atlasopendata_run:/workdir/run gitlab-registry.cern.ch/atlas-outreach-data-tools/physlitetoopendata:master /bin/bash
```
To see where the contents of the run volume are stored on the host, run `podman volume inspect atlasopendata_run`.
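`podman volume inspect` prints a JSON array describing the volume, including its `Mountpoint` on the host. A minimal sketch for extracting that path from a script, assuming one JSON object per inspected volume; the helper names are hypothetical:

```python
import json
import subprocess


def mountpoint_from_inspect(json_text):
    """Return the host Mountpoint from `podman volume inspect` JSON output.

    Assumes the output is a JSON array with one object per volume,
    each carrying a "Mountpoint" key.
    """
    volumes = json.loads(json_text)
    if not volumes:
        raise ValueError("no volume information in inspect output")
    return volumes[0]["Mountpoint"]


def run_volume_mountpoint(volume="atlasopendata_run"):
    """Call podman and return where the run volume lives on the host."""
    result = subprocess.run(
        ["podman", "volume", "inspect", volume],
        capture_output=True, text=True, check=True,
    )
    return mountpoint_from_inspect(result.stdout)


if __name__ == "__main__":
    print(run_volume_mountpoint())
```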

Then, inside the container (the same for Docker and Podman), build the package:
```bash
source /release_setup.sh
mkdir build && cd build
cmake ../source
cmake --build .
source x86_64*/setup.sh
```

## 2. Local Running
To run the code, pass a text file listing the input files with `-i`:
```bash
cd run
OpenDataNtupler.py -i inputlist.txt
```

So far, testing has been done on a file from `mc20_13TeV.700322.Sh_2211_Zee_maxHTpTV2_CVetoBVeto.deriv.DAOD_PHYSLITE.e8351_s3681_r13167_p5855`.
If you have CVMFS access, you can find a number of test files in `/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/ASG/DAOD_PHYSLITE`.
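An input list can be generated from a directory of files. This is a minimal sketch, assuming `OpenDataNtupler.py` reads its `-i` argument as a plain text file with one input path per line (check this against the script before relying on it); the helper name `write_input_list` and the `*.root*` pattern are hypothetical choices.

```python
from pathlib import Path


def write_input_list(input_dir, output="inputlist.txt", pattern="*.root*"):
    """Write one absolute path per line for every file matching `pattern`.

    Searches `input_dir` recursively. Assumes a one-file-per-line input
    list format. Returns the list of paths written.
    """
    files = sorted(str(p.resolve()) for p in Path(input_dir).rglob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {input_dir}")
    Path(output).write_text("\n".join(files) + "\n")
    return files


if __name__ == "__main__":
    # Picks up every matching file below the directory; trim the list if
    # you only want a single test file.
    write_input_list("/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/ASG/DAOD_PHYSLITE")
```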

## 3. Grid Running

To set up for running on the grid:
```bash
setupATLAS -q
cd build
asetup  # no arguments: restores the release saved in this directory by the earlier asetup
source */setup.sh
cd ..
voms-proxy-init -voms atlas
lsetup panda
```

Create a `.txt` file containing the list of datasets to run over.
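If you assemble the dataset list programmatically, a sketch like the one below can write it. The one-dataset-per-line layout and the helper name `write_dataset_list` are assumptions; the README itself only requires the file name to end in `.txt`. The example dataset is the test sample mentioned in the local-running section.

```python
from pathlib import Path


def write_dataset_list(datasets, path):
    """Write dataset names, one per line, to a .txt file for submitToGrid.py.

    One name per line is an assumed format. Blank entries and duplicates
    are dropped, preserving the original order. Returns the names written.
    """
    path = Path(path)
    if path.suffix != ".txt":
        raise ValueError(f"{path} must end with .txt")
    names = []
    for name in datasets:
        name = name.strip()
        if name and name not in names:
            names.append(name)
    path.write_text("\n".join(names) + "\n")
    return names


if __name__ == "__main__":
    write_dataset_list(
        ["mc20_13TeV.700322.Sh_2211_Zee_maxHTpTV2_CVetoBVeto.deriv.DAOD_PHYSLITE.e8351_s3681_r13167_p5855"],
        "datasets.txt",
    )
```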

Then, from the top-level directory, submit with:
```bash
submitToGrid.py -i <input_list> -p <prefix> -n <maxNFilesPerJob>
```

where:
- `<input_list>` is the name of your file containing the datasets to run over,
- `<prefix>` is the prefix of the grid job name, e.g. `OD_v0.1`,
- `<maxNFilesPerJob>` is the maximum number of files from the dataset to process per grid job.
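The three options map directly onto an argument list, which is convenient when submitting several productions from a script. A minimal sketch; the helper name `submit_command` and its input checks are hypothetical, while the flags are those of the command above:

```python
import shlex


def submit_command(input_list, prefix, max_files_per_job):
    """Assemble the submitToGrid.py call as an argument list.

    input_list        -> -i, the .txt file of datasets
    prefix            -> -p, prefix of the grid job name (e.g. OD_v0.1)
    max_files_per_job -> -n, maximum number of files per grid job
    """
    if not str(input_list).endswith(".txt"):
        raise ValueError("input list must be a .txt file")
    n = int(max_files_per_job)
    if n < 1:
        raise ValueError("max_files_per_job must be at least 1")
    return ["submitToGrid.py", "-i", str(input_list), "-p", prefix, "-n", str(n)]


if __name__ == "__main__":
    # Print the command for checking; after the grid setup above it could be
    # executed with subprocess.run(cmd, check=True).
    print(shlex.join(submit_command("datasets.txt", "OD_v0.1", 10)))
```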

## 4. Testing
Tests are located in `test/`. Run them with `pytest test`.
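pytest collects functions whose names start with `test_` from files named `test_*.py` (or `*_test.py`). A minimal sketch of a test in that style, using pytest's built-in `tmp_path` fixture; the file name `test/test_inputlist.py`, the `read_input_list` helper and the input-list format it checks are hypothetical, not part of the existing suite:

```python
# test/test_inputlist.py (hypothetical file name)
from pathlib import Path


def read_input_list(path):
    """Return the non-empty, stripped lines of an input list file."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def test_read_input_list_skips_blank_lines(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    listing = tmp_path / "inputlist.txt"
    listing.write_text("file1.root\n\n  file2.root  \n")
    assert read_input_list(listing) == ["file1.root", "file2.root"]
```

A single test can be selected by name with `pytest test -k input_list`.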

## 5. Package structure

- Header files are in the `PhysLiteToOpenData` directory, source files in the `Root` directory.
   - The `PhysLiteToOpenDataDict.h` and `selection.xml` files are used to generate "dictionaries" for reflection (making the C++ classes available in Python)
   - The `PhysLiteToOpenData.cxx` source file depends on the other source files that deal with individual objects
- The `scripts` directory contains the script for submission to the grid, and the `share` directory contains the script for local running
- The `extras` directory contains a project-level `CMakeLists.txt` file for convenience
- The `notebooks` directory contains a few helpful Jupyter notebooks for testing the ntuple output
- The `python` directory contains the Python configuration for the C++ code
- The `test` directory contains the tests and has its own README for convenience
- The `hist_branches_to_ntuples` directory contains some convenient scripts for adding branches to the ntuples after they have been created, for example for adding metadata

## License
This software is licensed under the terms of the [Apache 2.0 License](LICENSE).