Bandits with Switching Cost

Multi-armed bandits with switching costs are a special case of the restless-bandit model. Setup: consider the infinite-horizon, discounted MAB problem with finite state space $\mathcal S$, a binary action set $\{0,1\}$ per arm ($1$ = pull, $0$ = idle), discount factor $0\le\beta<1$, and per-pull reward $r_i(s)$; arms evolve only when pulled (i.e., they are “static” when $a_i=0$). We now add two costs for each arm $i$: a switch-in cost $c_i$, paid (once) whenever we switch to arm $i$, ...
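For concreteness, one plausible way to fold the switch-in cost into the discounted objective (an illustrative sketch only, since the excerpt truncates before the post's own formulation, and the second cost is omitted here; $i_t$ denotes the arm pulled at time $t$):

$$ \max_\pi\; \mathbb E\Big[\sum_{t=0}^{\infty}\beta^t\big(r_{i_t}(s_{i_t}(t)) - c_{i_t}\,\mathbf 1\{i_t \neq i_{t-1}\}\big)\Big] $$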

June 17, 2025

How music at work can fine-tune your research

Were you listening to anything the last time you read or wrote a paper? Sounds of science: how music at work can fine-tune your research Nature | https://www.nature.com/articles/d41586-023-00984-4 Researchers describe how listening to music at work can boost (or hamper) productivity, and share the tunes that keep them focused. TL;DR: music cheers you up—almost like a mental massage, a dopamine booster. So it makes tedious, repetitive work less unenjoyable. But music also takes up the brain’s processing power, especially for people with musical training. Therefore, for creative work, music hampers productivity. ...

June 16, 2025

RMAB Survey | Niño-Mora 2023

“The literature on the RMABP, whether on its theoretical, algorithmic, or application aspects, is currently vast to the point where it is virtually infeasible for researchers to keep up to date with the latest advances in the field.” True. Markovian Restless Bandits and Index Policies: A Review José Niño-Mora | Mathematics, 2023 The review is organized as follows. Section 2 surveys the antecedents to the RMABP, in particular, the classic MABP and the Gittins index policy. Section 3 formulates the RMABP and outlines the Whittle index policy. Section 4 reviews works on the complexity, approximation, and relaxations of the RMABP. Section 5 focuses on indexability, that is, the existence of the Whittle index and extensions. Section 6 discusses works on means of computing the Whittle index. Section 7 considers works that establish the optimality of the myopic index policy for the RMABP, whereas Section 8 reviews the asymptotic optimality of index policies. Section 9 surveys multi-action restless bandits. Section 10 considers policies that are different from Whittle’s and based on Lagrangian and fluid relaxations. Section 11 addresses reinforcement learning and, in particular, Q-learning solution approaches. Section 12 surveys works on the RMABP from the perspective of online learning. Sections 13 and 14 are devoted to works on applications of the RMABP in diverse settings. The former section focuses on MDP models and the latter on partially observable MDP (POMDP) models. Finally, Section 15 concludes this paper. ...

June 15, 2025

On the Alleged Lightness of Swan Lake as 'Ballet Music'

Someone wrote about the music of Swan Lake: Once ballet music leaves the stage and enters the concert hall or recording, it becomes a kind of suite-like symphony. Yet it lacks the weight of a true symphony or concerto, as the dance drama itself is inherently “light.” Composers have never used ballet to convey grand themes—after all, a fictional prince can leap and twirl, but imagine Peter the Great or Napoleon doing the same; the image borders on the absurd. It’s no surprise, then, that music historians have long undervalued ballet scores. ...

June 8, 2025

First-Order Methods (Constrained) | Notes from YYYe's ORML Intensive Lectures

We’re back! Last time, we explored unconstrained first-order methods (FOMs), where gradient descent works well and its time-traveling cousins (momentum and acceleration) help even more. Now we add constraints: here’s how FOMs extend to constrained problems, especially equality-constrained ones. We’ll walk through two major methods: the Augmented Lagrangian Method with Multipliers (ALMM) and its smoother, modular evolution—ADMM. For constrained problems like this: $$ \min_x \; f(x) \quad \text{s.t.} \quad h(x) = 0,\; x \in X $$ ...
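Since the excerpt stops here, a minimal numerical sketch of the basic augmented-Lagrangian loop may help (the quadratic objective and the single equality constraint are hypothetical, chosen only so the example is self-contained):

```python
import numpy as np

# Minimal augmented-Lagrangian sketch on a hypothetical toy problem
# (not from the lecture notes): minimize ||x - b||^2  s.t.  sum(x) = 1.
b = np.array([2.0, -1.0, 0.5])

def f_grad(x):           # gradient of f(x) = ||x - b||^2
    return 2.0 * (x - b)

def h(x):                # equality constraint h(x) = sum(x) - 1 = 0
    return np.sum(x) - 1.0

def h_grad(x):           # gradient of h
    return np.ones_like(x)

rho, lam = 10.0, 0.0     # penalty parameter and multiplier estimate
x = np.zeros_like(b)
for _ in range(50):
    # inner loop: gradient descent on the augmented Lagrangian
    #   L_rho(x, lam) = f(x) + lam * h(x) + (rho / 2) * h(x)^2
    for _ in range(200):
        x -= 0.01 * (f_grad(x) + (lam + rho * h(x)) * h_grad(x))
    lam += rho * h(x)    # multiplier (dual ascent) update

print(x, h(x))           # h(x) should be ~ 0 at convergence
```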

June 7, 2025

Implicitly Convex Fractional Optimization | Notes from YYYe's ORML Intensive Lectures

A lot of seemingly non-convex optimization problems are de facto convex. For example, $$ \begin{align*} \min_{a_i, r_i}&\;\frac{a_i}{r_i}\cr s.t.&\;a_i, r_i \ge0 \end{align*}\tag{1} $$ can actually be massaged into a convex optimization. Introducing an epigraph variable $x_i\ge \frac{a_i}{r_i}$, optimization $(1)$ is equivalent to $$ \begin{align*} \min_{a_i, r_i, x_i}&\;x_i\cr s.t.&\;x_i \ge \frac{a_i}{r_i}\cr &\;a_i, r_i \ge0 \end{align*}\tag{2} $$ which in turn is equivalent to $$ \begin{align*} \min_{a_i, r_i, x_i}&\;x_i\cr s.t.&\;\begin{bmatrix} x_i & \sqrt{a_i}\cr \sqrt{a_i}& r_i \end{bmatrix}\succeq0\cr & a_i, r_i \ge 0 \end{align*} $$ because, by the Schur complement, the $2\times 2$ matrix is PSD exactly when $x_i r_i \ge a_i$. So now it’s a Positive Semidefinite (PSD) Program, and it is convex. ...
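A quick numerical check of the PSD reformulation, sketched with cvxpy. The bounds (a ≥ 1, r ≤ 4) are hypothetical, added only so the minimum of $a_i/r_i$ is nontrivial; the off-diagonal entry stands in for $\sqrt{a_i}$ so that all matrix entries are affine:

```python
import cvxpy as cp

# M plays the role of [[x, sqrt(a)], [sqrt(a), r]] from the post.
M = cp.Variable((2, 2), symmetric=True)
x, s, r = M[0, 0], M[0, 1], M[1, 1]   # s stands for sqrt(a), so a = s**2

# M >> 0  <=>  x * r >= s**2 = a  (Schur complement), i.e., x >= a / r.
# The bounds s >= 1 (a >= 1) and r <= 4 are hypothetical extras.
prob = cp.Problem(cp.Minimize(x), [M >> 0, s >= 1, r <= 4])
prob.solve()

print(x.value)                 # ~ 0.25
print(s.value**2 / r.value)    # a / r at the optimum; matches x
```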

June 6, 2025

Gittin Index's Optimality Proof by the Prevailing Charge Argument from Weber (1992)

Consider a total of $n$ discrete-time Bandit Processes, as defined previously. The states of the bandits are $x_1(t), \ldots, x_n(t)$, and one arm may be pulled at each $t$. Taking action $a_j, j\in [n]$ corresponds to pulling arm $j$ and generates reward $R_j(x_j(t))$. The aim is to maximize the total discounted ($\beta < 1$) reward, given the initial state $\vec x(0)$. The Gittins Index Policy pulls the arm $j$ with the highest Gittins Index $\nu_j(x_j(t))$: $$ a(t) = \arg\max_j \nu_j(x_j(t)), $$ and is known to be optimal. ...
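A minimal simulation sketch of the index policy loop (everything here is a toy stand-in: the index table, rewards, and transition kernels are random placeholders, since the excerpt doesn't fix a concrete instance):

```python
import numpy as np

rng = np.random.default_rng(0)
n, S = 3, 4                              # arms and states (toy sizes)
nu = rng.random((n, S))                  # placeholder Gittins indices nu[j][s]
P = rng.dirichlet(np.ones(S), (n, S))    # P[j][s] = next-state dist. of arm j
R = rng.random((n, S))                   # R[j][s] = reward for pulling arm j
x = np.zeros(n, dtype=int)               # current states x_1(t), ..., x_n(t)
beta, total = 0.9, 0.0

for t in range(50):
    j = int(np.argmax(nu[np.arange(n), x]))  # pull the highest-index arm
    total += beta**t * R[j, x[j]]
    x[j] = rng.choice(S, p=P[j, x[j]])       # only the pulled arm moves

print(total)                                 # realized discounted reward
```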

June 5, 2025

Gittins Index Closed-form Calculation

Definition Consider a discrete-time Bandit Process with one arm (dropping the arm index $i$ for convenience): Markov state transitions, a binary action, and a reward associated with the positive action (aka ‘pull’), denoted $r(x(t))$ at each time point $t = 1, \ldots$ The state $x(t)$ doesn’t change while the arm is idle. The goal is to maximize the discounted reward criterion: $$ \text{Reward}:=\mathbb E\Big[\sum_t \beta^t r(x(t))\Big]. $$ (Here $x(\cdot)$ or $x$ denotes the state.) The Gittins Index $v(x)$ is calculated for each state $x$ as $$ v(x) :=\sup_{\tau >0}\frac{\mathbb E[\sum_{t = 1}^\tau \beta^t r(x(t))\mid x(0) = x]}{\mathbb E[\sum_{t = 1}^\tau \beta^t\mid x(0) = x]}. $$ Note that $\tau$ is a past-measurable stopping time, so the expectation is taken over the potentially stochastic state transitions, and hence over $\tau$ itself. ...
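As a sanity check of the formula, here is a tiny brute-force evaluation for the special case of deterministic transitions with a hypothetical reward stream; there a stopping time is just a fixed horizon $\tau \in \{1,\dots,T\}$, and truncating at finite $T$ approximates the supremum:

```python
import numpy as np

beta = 0.9
r = np.array([1.0, 0.2, 0.8, 0.5, 0.1])   # hypothetical rewards r(x(t)), t = 1..T

disc = beta ** np.arange(1, len(r) + 1)   # beta^t for t = 1..T
num = np.cumsum(disc * r)                 # sum_{t=1}^tau beta^t r(x(t)), for each tau
den = np.cumsum(disc)                     # sum_{t=1}^tau beta^t, for each tau
v = np.max(num / den)                     # sup over tau = 1..T

print(v)
```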

June 4, 2025

Bandit Superprocess Model and Whittle Condition

In the standard discounted multi‑armed bandit (MAB) with binary “play/rest” decisions, the Gittins index gives a provably optimal per‑arm priority rule (Section 3.2 of Q. Zhao (2021)). For the general bandit superprocess, index-based policies no longer work; a counterexample is given in Section 3.2 of Q. Zhao (2021). In fact, in many cases the index is not even well defined. But if the problem is “indexable”, i.e., satisfies the following condition, the index policy is still optimal ...

June 3, 2025

See It All — No Column Left Behind in Pandas

Tired of df.head() giving you a mysterious “…”? When working with wide DataFrames, pandas hides columns by default—really? This simple line ensures every column stands proudly in your console: `import pandas as pd; pd.set_option('display.max_columns', None)  # show all columns`. The default truncation really drives people crazy; to get around it, one usually has to switch to VSCode or another graphics-based ipynb console for better visuals. But even in the terminal, the full DataFrame view is just one setting away. ...
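For a quick self-contained check of the setting (the `display.width` option is another standard pandas display setting, added here so wide frames also use the full terminal width instead of wrapping at 80 characters):

```python
import pandas as pd

pd.set_option('display.max_columns', None)  # show every column
pd.set_option('display.width', None)        # auto-detect the terminal width

# A deliberately wide frame: 30 columns.
df = pd.DataFrame({f'col{i}': range(3) for i in range(30)})
print(df)  # no more "..." hiding columns
```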

June 2, 2025