As part of my Ph.D. thesis work, I implemented a machine learning algorithm that learns the structure of objects called jets in proton collisions. This led to a new way to search for anomalies (or new physics) hidden in these jets. I implemented the full algorithm in C++, and the code is available for download at SourceForge. The code was later adapted by the ATLAS experiment and led to some of the best constraints on new physics hiding in jets. All the technical details were published in JHEP.
One of the main goals of proton collider experiments is to search for new particles. To do that, we need to understand what happens when two protons collide. Unfortunately, protons are quantum mechanical objects, and we cannot control the outcome of any individual collision. In a typical collision, uninteresting particles called hadrons are produced in abundance, creating a huge background one must contend with when searching for interesting new physics. Why? The answer lies in the structure of the proton.
The proton is a composite object made of smaller subatomic particles called quarks and gluons. These particles are bound together by the strong force, the same force that binds atomic nuclei and releases the energy in the fusion reactions that power the Sun. Because the strong force is so "strong", and the quarks and gluons are so abundant, the most common results of proton collisions are simply more quarks and gluons. The produced quarks and gluons fly away at nearly the speed of light and eventually convert into other particles called hadrons, which include (anti)protons, neutrons and pions. These hadrons tend to be collimated, and each collimated spray is called a jet.
A collision event recorded by the ATLAS experiment in 2015. The event includes multiple detected jets. The image is about 5 m wide, and the lines show the reconstructed tracks of the produced hadrons. The rectangles depict a 3D histogram of the energy deposits in the calorimeters. The hadrons are collimated and are clustered into 6 jets here.
Given that jets contain lots of particles, they are hard to understand and model. It is also difficult and computationally costly to simulate them. What if there is new physics hidden in these jets? How can we be sure that there is new physics? This is where machine learning can help.
Instead of using the standard procedure of modelling jets via simulations, I constructed an anomaly detection algorithm that uses the data itself to check for new physics. To do this, I divided the properties of each jet into two categories: bulk properties (e.g. total momentum, energy and location in the detector), and sub-structure properties (e.g. shape or energy/particle distributions within a jet). From particle physics, we know that bulk properties can be used to predict sub-structure properties. The next step is then to learn this prediction from data.
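To make the split between bulk and sub-structure properties concrete, here is a minimal Python sketch (not the original C++ code). It assumes a jet is given as a list of constituent four-vectors, and the names `FourVector` and `jet_properties` are hypothetical. The jet four-momentum is the sum of its constituents, from which the transverse momentum (a bulk property) and the mass-over-momentum ratio (a sub-structure property) follow.

```python
import math
from dataclasses import dataclass


@dataclass
class FourVector:
    """Energy and momentum components of one jet constituent (hypothetical type)."""
    e: float
    px: float
    py: float
    pz: float


def jet_properties(constituents):
    """Sum the constituents and return bulk and sub-structure quantities."""
    e = sum(c.e for c in constituents)
    px = sum(c.px for c in constituents)
    py = sum(c.py for c in constituents)
    pz = sum(c.pz for c in constituents)

    # Bulk property: momentum transverse to the beam (z) axis.
    pt = math.hypot(px, py)

    # Invariant mass from m^2 = E^2 - |p|^2, clipped at zero to guard
    # against small negative values from rounding.
    mass_squared = e * e - (px * px + py * py + pz * pz)
    mass = math.sqrt(max(mass_squared, 0.0))

    # Sub-structure property: mass-over-momentum ratio.
    mass_over_pt = mass / pt if pt > 0.0 else 0.0

    return {"energy": e, "pt": pt, "mass": mass, "mass_over_pt": mass_over_pt}
```

For example, two massless constituents flying at right angles, `FourVector(50, 50, 0, 0)` and `FourVector(50, 0, 50, 0)`, combine into a jet whose mass equals its transverse momentum.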
Here is a specific example using simulated data. I first select collision events with only two or three very energetic jets. Interesting signals tend to have many more jets, so we expect this sample to contain only background. I then look at a sub-structure property (i.e. the mass-over-momentum ratio) and train a model on it using Multivariate Kernels. The model depends on a smoothing parameter c, which limits the extent of over-training. We followed the recommendation found in the literature and showed that it performed fairly well!
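The idea of a kernel model with a smoothing parameter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published implementation: it uses Gaussian kernels, a single bulk property (transverse momentum), and a Scott-style rule-of-thumb bandwidth multiplied by c. All function names and the toy data are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def sample_std(values):
    """Unbiased sample standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def bandwidth(values, c=1.0, dims=2):
    """Scott-style bandwidth scaled by the smoothing parameter c (assumed form)."""
    return c * sample_std(values) * len(values) ** (-1.0 / (dims + 4))


def conditional_density(ratio, pt, training, c=1.0):
    """Estimate p(ratio | pt) from training pairs (pt_i, ratio_i)."""
    pts = [p for p, _ in training]
    ratios = [r for _, r in training]
    h_pt = bandwidth(pts, c)
    h_ratio = bandwidth(ratios, c)

    # Weight each training jet by how close its pT is to the query pT.
    weights = [gaussian_kernel((pt - p) / h_pt) for p in pts]
    norm = sum(weights)
    if norm == 0.0:
        return 0.0

    # Weighted sum of kernels centred on the training ratios.
    smoothed = sum(
        w * gaussian_kernel((ratio - r) / h_ratio) / h_ratio
        for w, r in zip(weights, ratios)
    )
    return smoothed / norm


def make_toy_training(n=200, seed=1):
    """Hypothetical toy data: m/pT falls slowly with pT, plus Gaussian noise."""
    rng = random.Random(seed)
    training = []
    for _ in range(n):
        pt = rng.uniform(200.0, 600.0)
        ratio = 0.1 + 40.0 / pt + rng.gauss(0.0, 0.03)
        training.append((pt, ratio))
    return training
```

Calling `conditional_density(0.2, 400.0, make_toy_training())` returns the estimated density of the ratio at 0.2 for a jet with transverse momentum 400. Increasing c widens both kernels, which gives a smoother estimate that follows statistical fluctuations in the training sample less closely.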
After validating our chosen machine learning algorithm, we used it to construct new event substructure variables that are useful for discriminating signal from background. These variables were constructed by combining the bulk and sub-structure properties of all the jets in an event, and were proposed in one of my publications. Here are two examples of such event substructure variables: the total jet mass and T21.
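As a rough illustration of how per-jet quantities combine into an event-level variable, the sketch below computes the total jet mass. It assumes the total jet mass is the sum of the individual jet masses, and that each jet is described by its transverse momentum and its mass-over-momentum ratio. The function name and the input layout are hypothetical. T21 is not sketched because its definition is not given here.

```python
def total_jet_mass(jets):
    """Sum the masses of all jets in one event.

    `jets` is a list of (pt, mass_over_pt) pairs, so each jet's mass is
    recovered as pt * mass_over_pt before summing (assumed definition).
    """
    return sum(pt * mass_over_pt for pt, mass_over_pt in jets)
```

For an event with two jets, `total_jet_mass([(500.0, 0.2), (400.0, 0.1)])` adds a jet of mass 100 to a jet of mass 40, in the same units as the transverse momentum.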