license: mit
datasets:
  - jet-universe/jetclass2
tags:
  - particle physics
  - jet tagging

Model Card: Sophon

The Sophon model is a jet tagging model pre-trained on a 188-class classification task using the JetClass-II dataset. It is based on the Particle Transformer architecture. This model represents the first practical implementation under the Sophon (Signature-Oriented Pre-training for Heavy-resonance ObservatioN) methodology.

For more details, refer to the following links: [Paper], [Github].

Try out this [Demo on Colab] to get started with the model.

Model Details

The Sophon model functions both as a generic jet tagging model and as a pre-trained model tailored to analysis needs at the LHC. As a jet tagger, the model is trained to distinguish among 188 classes:

[Table: the 188 JetClass-II classes]

Key features of the model include:

  • Training in a mass-decorrelated scenario, achieved by (1) ensuring the training dataset covers a wide range of jet transverse momentum (pT) and soft-drop mass (mSD), and (2) reweighting samples in each major training class to achieve similar jet distributions for pT and mSD.
  • Enhanced "scale invariance" through the use of normalized 4-vectors as input.
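The second feature can be illustrated with a small sketch. The exact normalization is not spelled out here, so the code below assumes one plausible choice: dividing every constituent 4-vector by the jet pT. The function name and the toy jet are hypothetical. Under that assumption, rescaling all constituent momenta by a common factor leaves the model inputs unchanged.

```python
import math

def normalize_constituents(constituents):
    """Divide each (px, py, pz, E) by the jet pT.

    Assumption for illustration: the jet pT is the transverse momentum of
    the vector sum of all constituents.
    """
    jet_px = sum(c[0] for c in constituents)
    jet_py = sum(c[1] for c in constituents)
    jet_pt = math.hypot(jet_px, jet_py)
    return [tuple(x / jet_pt for x in c) for c in constituents]

# Toy jet with two constituents, given as (px, py, pz, E) in GeV.
jet = [(30.0, 40.0, 10.0, 51.0), (60.0, 80.0, -5.0, 100.2)]

# The same jet with every momentum component scaled up by a factor of 3.
scaled_jet = [tuple(3.0 * x for x in c) for c in jet]

norm_a = normalize_constituents(jet)
norm_b = normalize_constituents(scaled_jet)

# Both jets map to the same normalized inputs (up to rounding).
print(norm_a[0])
print(norm_b[0])
```

With inputs of this kind, the network sees the relative momentum sharing inside the jet rather than its absolute energy scale.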

Uses and Impact

Running inference with the Sophon model via ONNX

The Sophon model is valuable for future LHC phenomenological research, particularly for estimating physics measurement sensitivity using fast-simulation (Delphes) datasets. For a quick example of using this model in Python, or of integrating it into C++ workflows to process Delphes files, see [here].

This model also offers insights for the future development of generic and foundation AI models for particle physics experiments.

Training Details

Install dependencies

The Sophon model is based on the ParT architecture. It is implemented in PyTorch, with training based on the weaver framework for dataset loading and transformation. To install weaver, run:

pip install git+https://github.com/hqucms/weaver-core.git@dev/custom_train_eval

Note: We are temporarily using a development branch of weaver.

For instructions on setting up Miniconda and installing PyTorch, refer to the weaver page.

Download Sophon repository

git clone https://github.com/jet-universe/sophon.git
cd sophon

Download dataset

Download the JetClass-II dataset from [HuggingFace Dataset]. Only the training and validation files are used in this work; the test files are not.

Ensure that all Parquet files are accessible from:

./datasets/JetClassII/Pythia/{Res2P,Res34P,QCD}_*.parquet
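Before training, it can help to confirm that files matching each of the three prefixes are actually visible at that location. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and not part of the Sophon repository. Python's `glob` does not expand `{a,b,c}` braces, so the three prefixes are checked one by one.

```python
import glob
import os

# The three file prefixes from the path pattern above.
PREFIXES = ("Res2P", "Res34P", "QCD")

def find_dataset_files(base_dir="./datasets/JetClassII/Pythia"):
    """Return a dict mapping each prefix to the sorted list of matching files."""
    found = {}
    for prefix in PREFIXES:
        pattern = os.path.join(base_dir, f"{prefix}_*.parquet")
        found[prefix] = sorted(glob.glob(pattern))
    return found

def missing_prefixes(found):
    """Return the prefixes for which no file was found."""
    return [prefix for prefix, files in found.items() if not files]

files = find_dataset_files()
for prefix, paths in files.items():
    print(f"{prefix}: {len(paths)} file(s)")
print("missing:", missing_prefixes(files))
```

An empty `missing` list means at least one file was found for every prefix; it does not check that the download is complete.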

Training

Step 1: Generate dataset sampling weights according to the weights section in the data configuration. The processed config with pre-calculated weights will be saved to data/JetClassII.

./train_sophon.sh make_weight
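To give a feel for what sampling weights of this kind do, here is a toy sketch of flattening a class in (pT, mSD). It is not the weaver implementation: the flat target, the bin edges, and the function name are assumptions for illustration, and the actual scheme is defined by the weights section of the data configuration.

```python
from collections import Counter

def flattening_weights(jets, pt_edges, msd_edges):
    """Return one weight per jet so that every occupied (pT, mSD) bin
    carries the same total weight. Jets outside the binned range get 0."""

    def bin_index(value, edges):
        # Index of the half-open bin [edges[i], edges[i + 1]) containing value.
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None

    bins = [(bin_index(pt, pt_edges), bin_index(msd, msd_edges)) for pt, msd in jets]
    counts = Counter(b for b in bins if None not in b)
    return [1.0 / counts[b] if None not in b else 0.0 for b in bins]

# Toy jets as (pT, mSD) pairs in GeV: three share a bin, one sits alone.
jets = [(250.0, 40.0), (260.0, 45.0), (270.0, 30.0), (600.0, 120.0)]
weights = flattening_weights(jets, pt_edges=[200, 500, 1000], msd_edges=[0, 100, 200])
print(weights)
```

Jets in a crowded bin are down-weighted and jets in a sparse bin are up-weighted, which is the basic mechanism behind mass-decorrelated training by reweighting.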

Step 2: Start training.

./train_sophon.sh train

Note: Depending on your machine and GPU configuration, additional settings may be useful. Here are a few examples:

  • Enable PyTorch's DDP for parallel training, e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 DDP_NGPUS=4 ./train_sophon.sh train --start-lr 2e-3 (the learning rate should be scaled according to DDP_NGPUS).
  • Configure the number of data loader workers and the number of dataset splits. The script defaults to --num-workers 5 --data-split-num 200: the data files are divided among 5 workers, each handling 1/5 of the files, with all workers reading at the same time; each worker's share is further split into 200 parts, which the worker loads sequentially, one part at a time.
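The arithmetic behind these two settings can be sketched as follows. Two assumptions are made for illustration: the learning rate follows a linear scaling rule, so the 4-GPU value of 2e-3 corresponds to a single-GPU value of 5e-4 (a back-derived number, not one stated here), and each worker holds one part of its own share in memory at a time. The function names are hypothetical.

```python
from fractions import Fraction

def scaled_lr(base_lr, n_gpus):
    """Linear scaling rule (an assumption): learning rate proportional to GPU count."""
    return base_lr * n_gpus

def fraction_loaded_at_once(num_workers, data_split_num):
    """Fraction of the whole dataset held by all workers together at one time."""
    per_worker_share = Fraction(1, num_workers)           # e.g. 1/5 of the files
    per_worker_chunk = per_worker_share / data_split_num  # one part of that share
    return num_workers * per_worker_chunk

# Assumed single-GPU learning rate of 5e-4, scaled to 4 GPUs.
print(scaled_lr(5e-4, 4))

# Default loader settings: 5 workers, 200 splits.
print(fraction_loaded_at_once(5, 200))  # 1/200
```

Raising --data-split-num lowers the memory footprint per worker, at the cost of more frequent file reads.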

Step 3 (optional): Convert the model to ONNX.

./train_sophon.sh convert

Evaluation

The Sophon model has been evaluated on several LHC experimental tasks. The evaluation dataset is a dedicated Standard Model dataset, collected using a generic large-R jet trigger that selects large-R (R = 0.8) jets with pT > 400 GeV and trimmed mass mtrim > 50 GeV.

Key evaluation results include:

  • Superior performance in directly tagging X→bb jets against QCD background jets, and in tagging X→bs jets against QCD jets after fine-tuning. The model outperforms the best experimental taggers for X→bb and X→bs.
  • Significant potential for searching for unknown heavy resonances by constructing various tagging discriminants, selecting data, and performing generic bump hunts.
  • Excellent results in anomaly detection using a weakly-supervised training approach, showing greater sensitivity to signals at very low signal injection levels and improved significance with adequate signal.

For more details, refer to the [Paper].

Citation

If you use the JetClass-II dataset or the Sophon model, please cite:

@article{Li:2024htp,
    author = "Li, Congqiao and Agapitos, Antonios and Drews, Jovin and Duarte, Javier and Fu, Dawei and Gao, Leyun and Kansal, Raghav and Kasieczka, Gregor and Moureaux, Louis and Qu, Huilin and Suarez, Cristina Mantilla and Li, Qiang",
    title = "{Accelerating Resonance Searches via Signature-Oriented Pre-training}",
    eprint = "2405.12972",
    archivePrefix = "arXiv",
    primaryClass = "hep-ph",
    month = "5",
    year = "2024"
}