Optimality Proof of the Gittins Index via the Prevailing Charge Argument of Weber (1992)
Following the discrete-time Bandit Process definition, consider a total of $n$ such discrete-time Bandit Processes, with states $x_1(t), \ldots, x_n(t)$. Exactly one arm may be pulled at each time $t$: taking action $a_j$, $j \in [n]$, corresponds to pulling arm $j$ and generates reward $R_j(x_j(t))$; the pulled arm's state evolves, while the states of all other arms remain frozen, i.e. $x_i(t+1) = x_i(t)$ for $i \neq a(t)$. The aim is to maximize the expected total discounted reward $\mathbb{E}\left[\sum_{t=0}^\infty \beta^t R_{a(t)}(x_{a(t)}(t))\right]$ with discount factor $\beta < 1$, given the initial state $\vec x(0)$. The Gittins Index Policy pulls the arm $j$ with the highest Gittins Index $\nu_j(x_j(t))$: $$ a(t) = \arg\max_j \nu_j(x_j(t)), $$ and this policy is known to be optimal.
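As a concrete illustration, the Gittins Index of a finite-state arm can be computed via the restart-in-state formulation of Katehakis and Veinott: solve an auxiliary MDP in which, at every step, one may either continue the arm from its current state or restart it from the reference state, and read off $\nu(i) = (1-\beta)V^{(i)}(i)$. The sketch below is not from the text; the function names, the value-iteration tolerance, and the $(1-\beta)$ calibration of the index are assumptions of this example.

```python
import numpy as np

def gittins_index(P, R, beta, state, tol=1e-9):
    """Gittins Index of `state` for a single arm with transition
    matrix P (n x n) and reward vector R (length n), via the
    restart-in-state MDP: at each step, either continue from the
    current state or restart from `state`.  (Illustrative sketch.)"""
    V = np.zeros(len(R))
    while True:
        cont = R + beta * (P @ V)                   # keep playing the arm
        restart = R[state] + beta * (P[state] @ V)  # jump back to `state`
        V_new = np.maximum(cont, restart)
        if np.max(np.abs(V_new - V)) < tol:
            return (1 - beta) * V_new[state]        # index calibration
        V = V_new

def gittins_policy(arms, states, beta):
    """Pull the arm with the largest index at its current state.
    `arms` is a list of (P, R) pairs, `states` the current states."""
    indices = [gittins_index(P, R, beta, s) for (P, R), s in zip(arms, states)]
    return int(np.argmax(indices))
```

For instance, an arm that always stays in a state paying reward $r$ has index exactly $r$, and an arm paying $1$ once before absorbing into a zero-reward state has index $1$ in its initial state, since the optimal stopping time in the index definition stops after one pull.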