High Dimensional Statistics

Chairs: Marylou Gabrie (NYU-Flatiron Institute) and Ilias Zadik (NYU/MIT)

Time: August 16th, 1:10pm-2pm ET, 19:10-20:00 CET, 01:10-02:00 GMT+8

  • Sharp threshold for alignment of graph databases with Gaussian weights, Luca Ganassali (INRIA (Paris).

Paper Highlight, by Galen Reeves

The problem of graph alignment is to find the correspondence between the vertices in two graphs based on matching values associated with the vertices. This paper studies the fundamental limits of a particular version of this problem where the weights are correlated Gaussian variables. Recent work by Wu, Xu, and Yu, proves a sharp threshold (in terms of the number of vertices and the degree of correlation) for the problem of detecting whether a correspondence exists. The main contribution of this paper is to show that that same threshold also delineates between achievability and impossibility for the problem of exactly recovering the underlying correspondence. On the theoretical side, there has been lots of interest in pinning down exact thresholds for detection and recovery in high-dimensional inference problems and this paper fits in nicely with this line of work. A particular strength of this paper is the results are sharp and the proofs are clearly described.

slides video paper

  • Construction of optimal spectral methods in phase retrieval, Antoine Maillard (Ecole Normale Supérieure), Yue Lu (Harvard University, USA); Lenka Zdeborova (EPFL); Florent Krzakala (École Polytechnique Fédérale de Lausanne)

Paper Highlight, by Gabor Csanyi

In the last years, analytical tools from the physics of disordered systems have been shown to be very effective in precisely characterizing asymptotic thresholds and behaviors in inference and learning settings, and in deriving message passing schemes saturating algorithmic optimality. A paradigmatic example is the phase retrieval problem, where approximate message passing achieves perfect recovery of the signal down to sampling rates very close to the information theoretic threshold, as recently (and rigorously!) showed by some of the authors. Quite surprisingly, in this nice paper, it is demonstrated that those very same tools can be used to investigate a seemingly unrelated family of spectral methods for phase retrieval, typically used as warm-start initialization in gradient-based approaches. The optimal spectral method is constructively obtained using two different approaches (Bethe hessian and BP linearization). Their non-trivial relationship is carefully discussed. A related analysis was carried out in the past for the community detection problem: it is nice to see how those analytical tools evolved and adapted to different settings, that many analogies persist but some surprising behaviors are found (I’m talking to you, top eigenvalue of linearized BP!), that the capacity to deal with the large family of rotationally invariant ensembles (instead of Gaussian i.i.d. matrices) is now part of our skillset. Another result, one with a good taste to it, is the fact that weak recovery is already achieved at the algorithmic threshold by the optimal method in this simple spectral family. This paper is a great read for a dive intersecting disciplines, sapiently combining many years of accumulated progress and bringing us a further step ahead.

slides video paper

  • Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers, Hayden Schaeffer (Carnegie Mellon University ), Kayla Bollinger (Carnegie Mellon University)

Paper Highlight, by Ben Peherstorfer

This work proposes an innovative model reduction approach that learns a reduced approximation of the inputs together with a surrogate map from the reduced inputs to outputs. This is in contrast to classical model reduction methods that first construct a reduced space and then learn a surrogate model in a second, separate step. The reviewers highlighted that the proposed surrogate model achieves high expressiveness even with only few network parameters that can be trained with few data points, which makes the proposed approach very suitable for science and engineering applications with scarce data. The reviewers also especially liked the clear formulation of the proposed surrogate model as a neural network where the first layer is confined to be on the Grassmann manifold so that it corresponds to a projection of the inputs, which leads to efficient training methods for the non-convex learning problem.

slides video paper

  • Certifying Robustness of Graph Laplacian-Based Semi-Supervised Learning, Matthew Thorpe (University of Manchester); Bao Wang (University of Utah)

Paper Highlight, by Sebastian Goldt

This paper considers the problem of adversarial robustness for Graph Laplacian (GL) based learning. It focuses on the semi-supervised setting, where only some of the graph nodes are labeled, and provides a bound on how the classification estimate is affected if an adversary replaces the clean dataset with another, corrupted dataset of the same size. I liked the mix of rigorous results - the authors bound on the difference in the classification error before and after the attack - and the empirical comparison to other algorithms, in this case k-nearest neighbour methods, and I hope to see more work on adversarial robustness in this context in the future!

slides video paper

  • The Gaussian equivalence of generative models for learning with shallow neural networks, Sebastian Goldt (International School for Advanced Studies (SISSA)), Bruno Loureiro (EPFL); Galen Reeves (Duke); Florent Krzakala (École Polytechnique Fédérale de Lausanne); Marc Mezard (ENS); Lenka Zdeborova (EPFL)

slides video paper