First Order Methods (Constrained) | Notes from YYYe's ORML Intensive Lectures

We’re back! Last time, we explored unconstrained first-order methods (FOMs), where gradient descent works well and its time-traveling cousins (momentum and acceleration) helped even more. Now we add constraints: here’s how FOMs extend to constrained problems, especially equality-constrained ones. We’ll walk through two major methods: the Augmented Lagrangian Method of Multipliers (ALMM) and its smoother, modular evolution, ADMM. For constrained problems like this: $$ \min_x \; f(x) \quad \text{s.t.} \quad h(x) = 0,\; x \in X $$ ...
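The excerpt cuts off before the algorithms themselves, but the flavor of ADMM can be shown on a toy instance. Below is a minimal numpy sketch (my own illustration, not taken from the lectures) of ADMM for nonnegative least squares, $\min \frac12\|Ax-b\|^2$ s.t. $x \ge 0$, using the split $x = z$ with the constraint moved onto $z$:

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=500):
    """ADMM sketch for min 0.5||Ax - b||^2 s.t. x >= 0, via the split x = z, z >= 0."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable for the constraint x - z = 0
    AtA, Atb = A.T @ A, A.T @ b
    L = AtA + rho * np.eye(n)  # x-update system matrix, fixed across iterations
    for _ in range(iters):
        x = np.linalg.solve(L, Atb + rho * (z - u))  # x-update: ridge-like solve
        z = np.maximum(0.0, x + u)                   # z-update: project onto z >= 0
        u = u + x - z                                # dual ascent on the residual x - z
    return z

# Tiny made-up instance; at convergence x and z agree and are feasible.
A = np.array([[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
b = np.array([2.0, 1.0, 1.0])
x_star = admm_nnls(A, b)
```

Note how the constraint never enters the $x$-update: the whole point of the splitting is that each subproblem stays easy (a linear solve and a projection).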

June 7, 2025

Implicitly Convex Fractional Optimization | Notes from YYYe's ORML Intensive Lectures

A lot of seemingly non-convex optimization problems are de facto convex. For example, $$ \begin{align*} \min_{a_i, r_i}& \;\frac{a_i}{r_i}\\ s.t.& \;a_i, r_i \ge0 \end{align*}\tag{1} $$ can actually be massaged into a convex optimization. Introduce an epigraph variable $x_i\ge \frac{a_i}{r_i}$; problem $(1)$ is then equivalent to $$ \begin{align*} \min_{a_i, r_i, x_i}& \;x_i\\ s.t.& \;x_i r_i \ge a_i\\ & \;a_i, r_i \ge0 \end{align*}\tag{2} $$ which in turn is equivalent to $$ \begin{align*} \min_{a_i, r_i, x_i}& \;x_i\\ s.t.& \;\begin{bmatrix} x_i & \sqrt{a_i}\\ \sqrt{a_i}& r_i \end{bmatrix}\succeq0\\ & a_i, r_i \ge 0 \end{align*} $$ So now it’s a Positive Semidefinite (PSD) program, and it is convex. ...
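The step from the bilinear constraint to the PSD constraint is the Schur complement; spelling out the omitted reasoning (my own filling-in), for $r_i > 0$:

$$ \begin{bmatrix} x_i & \sqrt{a_i}\\ \sqrt{a_i} & r_i \end{bmatrix}\succeq 0 \;\Longleftrightarrow\; x_i - \frac{(\sqrt{a_i})^2}{r_i} \ge 0 \;\Longleftrightarrow\; x_i r_i \ge a_i \;\Longleftrightarrow\; x_i \ge \frac{a_i}{r_i}, $$

so minimizing the linear objective $x_i$ under the PSD constraint recovers exactly the epigraph of $a_i/r_i$.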

June 6, 2025

Gittins Index Optimality Proof by the Prevailing Charge Argument from Weber (1992)

For the discrete-time Bandit Process definition, consider a total of $n$ of these discrete-time Bandit Processes. The states of the bandits are $x_1(t), \ldots, x_n(t)$. One arm may be pulled at each $t$. Taking action $a_j, j\in [n]$ corresponds to pulling arm $j$ and generates reward $R_j(x_j(t))$. The aim is to maximize the total discounted ($\beta < 1$) reward, given the initial state $\vec x(0)$. The Gittins Index Policy pulls the arm $j$ with the highest Gittins Index $\nu_j(x_j(t))$: $$ a(t) = \arg\max_j \nu_j(x_j(t)), $$ and is known to be optimal. ...
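As a minimal sketch (my own toy illustration, with made-up index values), the Gittins Index Policy is nothing more than an argmax over per-arm indices, each computed from that arm's own state:

```python
# Hypothetical per-arm Gittins index tables nu[j][state], for illustration only.
nu = [
    {"hot": 1.4, "cold": 0.3},   # arm 0
    {"hot": 1.1, "cold": 0.9},   # arm 1
    {"hot": 0.8, "cold": 0.7},   # arm 2
]

def gittins_policy(states):
    """Pull the arm whose current state has the highest Gittins index."""
    return max(range(len(nu)), key=lambda j: nu[j][states[j]])

arm = gittins_policy(["cold", "cold", "hot"])  # arm 1: index 0.9 beats 0.3 and 0.8
```

The decoupling is the whole appeal: each arm's index depends only on that arm's state, so an $n$-arm problem reduces to $n$ single-arm calculations.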

June 5, 2025

Gittins Index Closed-form Calculation

Definition Consider a discrete-time Bandit Process with one arm (dropping the arm index $i$ for convenience): Markov state transitions, binary action, and a reward $r(x(t))$ associated with the positive action (aka ‘pull’) at each time point $t = 1, \ldots$ The state $x(t)$ doesn’t change while the arm is idle. The goal is to maximize the discounted reward criterion: $$ \text{Reward}:=\mathbb E[\sum_t \beta^t r(x(t))]. $$ ($x(\cdot)$ or $x$ denotes the state.) The Gittins Index $v(x)$ is calculated for each state $x$ as $$ v(x) :=\sup_{\tau >0}\frac{\mathbb E[\sum_{t = 1}^\tau \beta^t r(x(t))\mid x(0) = x]}{\mathbb E[\sum_{t = 1}^\tau \beta^t\mid x(0) = x]} $$ Note $\tau$ is a past-measurable stopping time, so the expectation is taken w.r.t. the potentially stochastic state transitions, including $\tau$. ...
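For a deterministic arm with a known finite reward sequence, the sup over stopping times collapses to a max over horizons, so the definition can be evaluated directly. A minimal sketch (my own illustration; `rewards[t-1]` plays the role of $r(x(t))$):

```python
def gittins_deterministic(rewards, beta):
    """Gittins index of a deterministic arm:
    max over tau of  sum_{t=1}^{tau} beta^t * r_t  /  sum_{t=1}^{tau} beta^t."""
    best, num, den = float("-inf"), 0.0, 0.0
    for t, r in enumerate(rewards, start=1):
        num += beta**t * r   # discounted reward accumulated up to tau = t
        den += beta**t       # discounted time accumulated up to tau = t
        best = max(best, num / den)
    return best

# Sanity check: a constant-reward arm has index equal to that reward.
gittins_deterministic([1.0, 1.0, 1.0], beta=0.5)  # -> 1.0
```

The ratio form makes the interpretation visible: the index is the best achievable discounted reward *rate*, where the stopping time lets the arm quit while it is ahead.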

June 4, 2025

Bandit Superprocess Model and Whittle Condition

In the standard discounted multi‑armed bandit (MAB) with binary “play/rest” decisions, the Gittins index gives a provably optimal per‑arm priority rule (Section 3.2 of Q. Zhao (2021)). For a general bandit superprocess, index-based policies no longer work; a counterexample is in Section 3.2 of Q. Zhao (2021). Actually, in many cases the index is not even defined. But if the problem is “indexable”, satisfying the following condition, the Index Policy is still optimal ...

June 3, 2025

See It All — No Column Left Behind in Pandas

Tired of df.head() giving you a mysterious “…”? When working with wide DataFrames, pandas hides columns by default. This simple line ensures every column stands proudly in your console: import pandas as pd; pd.set_option('display.max_columns', None) # Show all columns. Default truncation really drives people crazy; to avoid it, one usually switches to VSCode or other graphical ipynb frontends for better visuals. But even in the terminal, the full DataFrame view is just one setting away. ...
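The setting in question, as a self-contained snippet (the wide DataFrame below is just a made-up example; the companion `display.width` option lets the repr use the whole terminal width):

```python
import pandas as pd

pd.set_option("display.max_columns", None)  # never elide columns with "..."
pd.set_option("display.width", None)        # auto-detect terminal width for the repr

# A deliberately wide frame: all 30 columns now appear in the console.
df = pd.DataFrame({f"col{i}": [i] for i in range(30)})
print(df)
```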

June 2, 2025

Children's Day Music

Schumann’s Kinderszenen (Op. 15) is not really meant for kids—it’s like a memory-scape written from adulthood, looking backward with tenderness, longing, and the complexity only hindsight can afford. The simplicity of the music is deceptive. It’s technically accessible “Leichte Stücke”—but emotionally layered. Happy children’s day :)

June 1, 2025

Duanwu — Chaos, Comfort, and Coming Home

Duanwu, also known as the Dragon Boat Festival, is a traditional Chinese holiday celebrated on the 5th day of the 5th lunar month. It commemorates Qu Yuan, a patriotic poet who lived—and died—for his country over 2,000 years ago. But for me, Duanwu means something more personal. It marks the true beginning of summer—thick with heat, humidity, and a kind of restless energy. It’s a season that feels chaotic, slightly uncomfortable, yet oddly peaceful. Everything about this festival is part of my comfort zone. The traditions. The food—especially zongzi (sticky rice dumplings), my all-time favorite. The heat. It all brings me back to childhood, to a place I belong. ...

May 31, 2025

Spotify, and general recommendation algorithms with incentives

Here’s an interesting article about Spotify’s music recommendation algorithm: The surprising thing I learned from quitting Spotify Vox | Ada Estes | https://www.vox.com/even-better/404896/spotify-youtube-apple-music-amazon-playlists The key to Spotify’s success is the platform’s superior recommendation algorithm. What distinguishes Spotify is its 15-year archive of the user’s listening habits and an advanced algorithm that understands and reinforces personal tastes — something competitors like Apple Music failed to replicate (hmm?). Spotify’s strength lies in its hybrid use of content-based filtering (analyzing song attributes like genre, mood, and artist) and collaborative filtering (recommending music based on what similar users enjoy). With access to the listening patterns of 675 million users, Spotify can create surprisingly effective and serendipitous recommendations. ...

May 30, 2025

Recommender systems as mechanisms for social learning

Believe it or not, ChatGPT dug this up for me among the ocean of economic literature! Recommender Systems as Mechanisms for Social Learning QJE 2017 | Yeon-Koo Che, Johannes Hörner. https://doi.org/10.1093/qje/qjx044 This article studies how a recommender system may incentivize users to learn about a product collaboratively. To improve the incentives for early exploration, the optimal design trades off fully transparent disclosure by selectively overrecommending the product (or “spamming”) to a fraction of users. Under the optimal scheme, the designer spams very little on a product immediately after its release but gradually increases its frequency; she stops it altogether when she becomes sufficiently pessimistic about the product. The recommender’s product research and intrinsic/naive users “seed” incentives for user exploration and determine the speed and trajectory of social learning. Potential applications for various Internet recommendation platforms and implications for review/ratings inflation are discussed. ...

May 29, 2025