# INNF+ 2021

### Schedule

#### For a more detailed schedule, see the video stream link (registration needed).

For all the videos of accepted papers, please visit the accepted papers page.
(Times are local; )

- Opening Remarks
- Invited Talk, Charline Le Lan (Oxford): On the use of density models for anomaly detection

  Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications such as anomaly detection. However, the likelihood values that these models empirically assign to anomalies conflict with the expectations that these proposed applications suggest. This talk will review some of the density-based anomaly detection methods that have been widely used in the machine learning literature and question the expectation that density estimation should always enable anomaly detection. In particular, we will examine the extent of the issues that can arise from these practices and look at some practical consequences. Finally, the talk will cover some promising directions for reliably detecting anomalies through density, in particular highlighting the importance of prior knowledge. This project was joint work with Laurent Dinh.

- Invited Talk, Yingzhen Li (ICL): Inference with scores: slices, diffusions and flows

  In this talk I will discuss our recent efforts in developing Stein's method for approximate inference and model learning. I will start with an introduction to score matching and Stein discrepancies, with a comparison to approaches based on the KL divergence. Then I will discuss our recent work that tries to address the curse of dimensionality in existing Stein discrepancies. The idea is based on slicing, and an important step within the approach is to measure the score difference in a different basis of $\mathbb{R}^d$. Lastly, we extend the basis-modification idea to measuring the score difference in a local basis, and discuss ongoing work that aims to connect this approach with normalising flows. This talk will also feature Wenbo Gong, a student who has collaborated with me on the theory and applications of Stein's method.

- Poster Spotlights I
- Poster Session I (Poster Room 1, Poster Room 2; presenting papers)
- Invited Talk, Phiala Shanahan (MIT): Flow models for theoretical particle and nuclear physics

  I will discuss opportunities for machine learning, in particular approaches based on normalizing flows, to accelerate first-principles lattice quantum field theory calculations in particle and nuclear physics. Particular challenges in this context include incorporating complex (gauge) symmetries into model architectures and scaling models to the large number of degrees of freedom in state-of-the-art numerical studies. I will show results of proof-of-principle studies demonstrating that sampling from generative models can be orders of magnitude more efficient than traditional Hamiltonian/hybrid Monte Carlo approaches in this context.

- Invited Talk, Marcus Brubaker (York): Wavelet Flow: Fast Training of High Resolution Normalizing Flows

  This talk will introduce Wavelet Flow, a novel normalizing flow architecture that uses wavelets to explicitly represent the scale-space structure of signals within the architecture of the flow. The result is a generative model that automatically includes models of images at resolutions smaller than the one used for training and can perform super-resolution with no additional effort. Further, because of the structure of the architecture, each scale can be trained completely independently, which significantly improves training efficiency and enabled the first reported normalizing flow model for 1024x1024 resolution images. This project is joint work with Jason Yu and Kosta Derpanis.

- Break
- Invited Talk, Stefano Ermon (Stanford): Maximum Likelihood Training of Score-Based Diffusion Models

  Existing generative models are typically based on explicit representations of probability distributions (e.g., autoregressive models or VAEs) or on implicit sampling procedures (e.g., GANs). We propose an alternative approach based on directly modeling the vector field of gradients of the log-density of the data (scores). Our framework allows flexible architectures and requires neither sampling during training nor adversarial training methods. Additionally, score-based generative models enable exact likelihood evaluation through connections with normalizing flows. We produce samples comparable to those of GANs, achieving new state-of-the-art Inception scores, along with competitive likelihoods on image datasets.

- Contributed Talk I, Ann-Kathrin Dombrowski: Diffeomorphic Explanations with Normalizing Flows

  Normalizing flows are diffeomorphisms parameterized by neural networks. As a result, they can induce coordinate transformations in the tangent space of the data manifold. In this work, we demonstrate that such transformations can be used to generate interpretable explanations for decisions of neural networks. More specifically, we perform gradient ascent in the base space of the flow to generate counterfactuals that are classified with high confidence as a specified target class. We analyze this generation process theoretically using Riemannian differential geometry and establish a rigorous theoretical connection between gradient ascent on the data manifold and gradient ascent in the base space of the flow.

- Invited Talk, Maximilian Nickel (Facebook): Modeling Spatio-Temporal Events via Normalizing Flows
- Invited Talk, Aditya Ramesh (OpenAI): TBA
- Contributed Talk II, Marylou Gabrié: Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods

  Normalizing flows can generate complex target distributions and thus show promise in many applications in Bayesian statistics, as an alternative or complement to MCMC for sampling posteriors. Since no data set drawn from the target posterior is available beforehand, the flow is typically trained using the reverse Kullback-Leibler (KL) divergence, which only requires samples from a base distribution. This strategy may perform poorly when the posterior is complicated and hard to sample with an untrained normalizing flow. Here we explore a different training strategy that uses the direct KL divergence as the loss, in which samples from the posterior are generated by (i) assisting a local MCMC algorithm on the posterior with a normalizing flow to accelerate its mixing rate and (ii) using the data generated this way to train the flow. The method requires only a limited amount of a priori input about the posterior and can be used to estimate the evidence required for model validation, as we illustrate with examples.

- Poster Spotlights II
- Poster Session II (Poster Room 1, Poster Room 2; presenting papers)