### Organisers: **Gabor Csanyi (Cambridge), Kyle Cranmer (NYU), Shirley Ho (Flatiron Institute), Michele Ceriotti (EPFL)**

### Date: August 18th

This workshop aims to explore current frontiers and challenges of machine learning for the physical sciences, covering a wide range of disciplines.

ET | CET | GMT+8 | Speaker |
---|---|---|---|
9:00am-9:30am | 15:00-15:30 | 21:00-21:30 | Christoph Ortner: Atomic Cluster Expansion and Beyond - ML with Symmetry for Particle Systems |
9:30am-10:00am | 15:30-16:00 | 21:30-22:00 | Julia Westermayr: Learning orbital energies and excited states of functional organic molecules |
10:00am-10:30am | 16:00-16:30 | 22:00-22:30 | Break/Gathertown |
10:30am-11:00am | 16:30-17:00 | 22:30-23:00 | George Booth: A Bayesian machine-learning perspective on the quantum many-electron problem |
11:00am-11:30am | 17:00-17:30 | 23:00-23:30 | Vanessa Bohm: Anomaly Detection with the Probabilistic AutoEncoder (PAE) |
11:30am-11:45am | 17:30-17:45 | 23:30-23:45 | Break/Gathertown |
11:45am-12:15pm | 17:45-18:15 | 23:45-00:15 | Kyle Cranmer: Studying the effectiveness of inductive bias with a physics-inspired generative model |

### Abstracts

#### Christoph Ortner (Warwick): Atomic Cluster Expansion and Beyond - ML with Symmetry for Particle Systems

I will briefly review machine learning for interatomic potentials and the atomic cluster expansion (ACE; Drautz, 2019), which is a classical linear regression scheme but has demonstrated exceptional performance in initial results. A key aspect is the construction of features that describe particle configurations and capture the physical symmetries. With this review I also want to motivate several immediate generalisations that will hopefully have applications far beyond interatomic potentials, including in particular equivariant properties, inputs with different symmetries, and the use of ACE in constructing a (new?) physically informed message-passing-type network architecture.
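To make the idea of symmetry-respecting features concrete, here is a minimal Python sketch. It is not ACE: the descriptor (sums of inverse powers of pair distances), the helper names, and the coefficients are all hypothetical stand-ins chosen for brevity. It only illustrates that a model which is linear in invariant features gives the same prediction when the particles are rotated or relabelled.

```python
import math


def invariant_features(positions, powers=(1, 2, 3)):
    """Toy descriptor: for each power p, sum r_ij**(-p) over all particle pairs.

    Pair distances do not change under rotation or translation, and the sum
    over pairs does not depend on particle order, so the features are invariant.
    This is a hypothetical stand-in, far simpler than the ACE basis.
    """
    n = len(positions)
    features = []
    for p in powers:
        total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                r = math.dist(positions[i], positions[j])
                total += r ** (-p)
        features.append(total)
    return features


def rotate_z(positions, angle):
    """Rotate every particle about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def linear_energy(features, coeffs):
    """Model that is linear in the features: E = sum_k c_k * f_k."""
    return sum(c * f for c, f in zip(coeffs, features))


cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.3, 0.4, 0.9)]
coeffs = [0.5, -1.0, 0.25]  # made-up values; in practice fitted by least squares

# Rotate the cluster and relabel the particles by reversing their order.
transformed = rotate_z(cluster, 0.7)
transformed.reverse()

e_original = linear_energy(invariant_features(cluster), coeffs)
e_transformed = linear_energy(invariant_features(transformed), coeffs)
print(abs(e_original - e_transformed) < 1e-9)
```

Because the symmetry is built into the features, the regression never has to learn it from data, which is the design choice the abstract highlights.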

#### Julia Westermayr (Warwick): Learning orbital energies and excited states of functional organic molecules

A computationally efficient description of the excited states of functional organic molecules can significantly advance the research field of optoelectronics. This is because knowing the level alignment of molecules can enable a targeted design of novel functional materials with tailored optoelectronic properties. However, high-throughput spectroscopic characterization of candidate molecules is tedious, and computational methods are limited by either high computational costs or low accuracy [1]. In this talk, we will show how machine learning models can be used to predict molecular excited states with experimental accuracy and low computational costs. Our new method is based on two interdependent machine learning models: the first is physically inspired by a quantum chemical Hamiltonian and describes orbital energies of molecules as eigenvalues of a latent machine learning Hamiltonian matrix. The second model is used to correct orbital energies to quasiparticle energies that can be compared to experiment [2]. We evaluate the accuracy and reliability of the model by learning molecules represented in the spectroscopy dataset “OE62” [3] and by predicting optoelectronic properties of unseen molecules relevant in optoelectronics [2].

[1] J. Westermayr and P. Marquetand, Chem. Rev., in press, doi:10.1021/acs.chemrev.0c00749 (2020).

[2] J. Westermayr and R. J. Maurer, Chem. Sci. (2021), doi:10.1039/D1SC01542G.

[3] A. Stuke, C. Kunkel, D. Golze, M. Todorović, J. T. Margraf, K. Reuter, P. Rinke, and H. Oberhofer, Sci. Data 7, 58 (2020).

#### George Booth (King’s College London): A Bayesian machine-learning perspective on the quantum many-electron problem

The quantum many-body problem is a keystone challenge, with developments impacting fields from materials science to nuclear structure. The problem at its heart is exponentially complex, and we have been looking into a new Bayesian framework for describing the complexity of these states. Specifically, we have developed a Gaussian process regression framework for entangled quantum states, with development of a physically motivated kernel. This has led to new levels of accuracy in describing the physics of strongly entangled quantum systems, new supervised learning optimization strategies and a novel perspective on this fundamental object of quantum many-body problems. However, in this recasting of quantum systems into a machine learning problem, we also find that many of the challenges in the field (e.g. frustrated quantum magnetism) can also be recast as problems of generalization in a machine learning context. Finally, we will briefly mention new research themes that have opened up from this work and that we are exploring, including the use of this wave function specification as a new model within classical machine learning classification and regression tasks.

Refs: A Glielmo, Y Rath, G Csányi, A De Vita, GH Booth, Physical Review X, 10, 041026 (2020); Y Rath, A Glielmo, GH Booth, Journal of Chemical Physics, 153, 124108 (2020).

#### Vanessa Bohm (UC Berkeley): Anomaly Detection with the Probabilistic AutoEncoder (PAE)

Deep generative models are powerful machine learning models that can learn complex, high-dimensional data likelihoods and should thus be naturally suited for the task of anomaly detection. However, a number of works have found that popular generative models can fail catastrophically in this task, assigning higher likelihoods to anomalous data than to the data they were trained on. In my talk I will outline possible explanations for this failure and introduce a simple but powerful generative model, the probabilistic autoencoder, which exhibits outstanding outlier detection accuracy. I will further show how the PAE can be used to mine a dataset of millions of high-resolution galaxy spectra for anomalous and potentially interesting objects.

#### Kyle Cranmer (NYU): Studying the effectiveness of inductive bias with a physics-inspired generative model

Motivated by the desire to better understand the interactions between the structure of the data, the models, and the learning algorithms, we have developed a physics-inspired generative model to explore the effectiveness of different types of inductive bias. The generative model is a Markov process that branches as it evolves such that each instance is a tree. The features associated with the leaves of the tree are observed, while the rest of the tree is latent. It is a simplification of the “parton shower” process that produces sprays of particles, known as “jets”, at the Large Hadron Collider, and it has similarities to phylogenetic trees. In parallel, we have developed dynamic programming algorithms to efficiently compute the marginal likelihood and maximum a posteriori tree given a set of leaves over the combinatorially large search space. These developments provide powerful tools to study the effectiveness of different forms of inductive bias including graph networks, message passing networks, deep sets, and TreeRNNs. This work also connects to recent ideas such as algorithmic alignment.
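As a rough illustration of the kind of model described above, the following Python sketch generates a tree from a branching Markov process and exposes only its leaves. It is not the speaker's model: the splitting rule, the `mass` variable, the stopping threshold, and all function names are assumptions made for this example. The last function counts the rooted binary trees that could sit above a given set of labelled leaves, which is why the search space is called combinatorially large.

```python
import random


def generate_tree(mass, rng, min_mass=1.0):
    """Split a node recursively until its value falls below `min_mass`.

    Each split depends only on the current node's value (Markov property).
    Internal nodes are (left, right) tuples; leaves are plain floats.
    The uniform splitting rule is a hypothetical choice for illustration.
    """
    if mass < min_mass:
        return mass  # leaf: this value is observed
    fraction = rng.uniform(0.2, 0.8)
    left = generate_tree(fraction * mass, rng, min_mass)
    right = generate_tree((1.0 - fraction) * mass, rng, min_mass)
    return (left, right)  # internal node: latent


def leaves(tree):
    """Collect the observed leaf values, discarding the latent tree structure."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def num_binary_trees(n_leaves):
    """Number of rooted binary trees over n labelled leaves: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


rng = random.Random(0)
tree = generate_tree(8.0, rng)
observed = leaves(tree)

print(len(observed), "leaves observed")
print(num_binary_trees(len(observed)), "candidate latent trees")
```

Even for a handful of leaves the count grows too fast to enumerate every tree, which is the motivation for dynamic programming over shared subtrees when computing the marginal likelihood or the maximum a posteriori tree.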