I was feeling a bit nostalgic (and hungry) today after the weekly Friday macroeconomic discussion session, which was supposed to end at 11:20 but stretched all the way to noon.

There happened to be an AI seminar right after, with lunch — I grabbed a sandwich, fully expecting to sit in the back, eating my feelings while absorbing some comforting high-level vibes. Instead, it turned out to be one of the most interesting theoretical advances at the forefront of self-supervised learning.

Slay

Slay

The paper went live on arXiv two days ago:

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Randall Balestriero, Yann LeCun | ArXiv https://doi.org/10.48550/arXiv.2511.08544

Apart from a brilliantly delicious Neapolitan submarine sandwich, here’s what I learnt:

  • JEPA (Joint-Embedding Predictive Architecture) is a family of self-supervised models that learn by predicting representations (not a fixed set of labels or reconstructed pixels), which makes them simpler, more stable, and more scalable than contrastive or generative approaches.
  • But in practice, JEPA training often relies on ad-hoc heuristics (stop-gradients, teacher–student EMA updates, and the like) to prevent collapsed or weirdly shaped embedding spaces.
  • LeJEPA’s key idea: enforce a clean, well-behaved latent geometry by nudging the embedding distribution toward an isotropic Gaussian, which the paper argues is the optimal target distribution for minimizing downstream prediction risk.
  • To do this efficiently, they use a “sketched” statistical test: project the embeddings onto a bunch of random 1D directions and penalize deviations from Gaussianity along each direction (a minimal code sketch follows this list).
  • Conclusion: By enforcing simple, mathematically grounded structure on the embedding distribution—rather than relying on heuristics—self-supervised learning becomes both simpler and provably well-behaved, giving JEPA a solid theoretical foundation while remaining scalable.
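
For the curious, here is roughly what that random-projection Gaussianity penalty could look like in code. This is a minimal PyTorch sketch of the idea, not the authors’ implementation: the characteristic-function comparison, the frequency grid, and every hyperparameter below are my own assumptions for illustration.

```python
# Minimal sketch of a "sketched" Gaussianity penalty: project embeddings onto
# random unit directions and penalize deviation of each 1D projection from a
# standard Gaussian. The 1D discrepancy here compares the empirical
# characteristic function against the Gaussian's exp(-t^2/2); the paper's
# exact statistic and weighting may differ.
import torch


def sketched_gaussianity_penalty(z: torch.Tensor,
                                 num_directions: int = 256,
                                 num_freqs: int = 17) -> torch.Tensor:
    """z: (N, D) batch of embeddings. Returns a scalar penalty that is ~0
    when every random 1D projection of z looks like N(0, 1)."""
    n, d = z.shape

    # K random unit directions on the sphere in R^D.
    dirs = torch.randn(d, num_directions, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)

    # Project, then standardize each projection so the penalty measures
    # the *shape* of the distribution, not its location or scale.
    proj = z @ dirs                                      # (N, K)
    proj = (proj - proj.mean(0)) / (proj.std(0) + 1e-6)

    # Compare the empirical characteristic function E[exp(i t X)] with the
    # standard Gaussian's exp(-t^2 / 2) on a grid of frequencies t.
    t = torch.linspace(-3.0, 3.0, num_freqs, device=z.device)  # (T,)
    tx = proj.unsqueeze(-1) * t                          # (N, K, T)
    ecf_real = torch.cos(tx).mean(0)                     # (K, T)
    ecf_imag = torch.sin(tx).mean(0)                     # (K, T)
    gauss_cf = torch.exp(-0.5 * t ** 2)                  # (T,)

    # Squared discrepancy, averaged over frequencies and directions.
    sq_err = (ecf_real - gauss_cf) ** 2 + ecf_imag ** 2
    return sq_err.mean()


# Hypothetical usage: add the penalty to the JEPA prediction loss,
# weighted by some coefficient lambda_reg.
# loss = prediction_loss + lambda_reg * sketched_gaussianity_penalty(embeddings)
```

The appeal of the random-direction trick is that it avoids ever estimating the full D×D covariance of the embeddings: each projection is a cheap 1D test, and averaging over many directions gives a scalable surrogate for "the whole distribution looks isotropic Gaussian."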