Whittle index | an example

Consider the following RMAB problem instance. (For a review of what an RMAB is, see here.) A platform coordinates a total of $N$ volunteers for various tasks. At each time point, a task $k$ out of $K$ possible tasks arrives randomly with probability $f_k$, and the platform must decide whom to notify (activate) based on each volunteer's likelihood of acceptance $p_{k, i}, i\in [N]$ and the potential reward for each volunteer incentivized, exogenously given as $w_k$. ...
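The setup above can be sketched in a few lines. All sizes and probabilities below are made-up illustrations (not from the post), and the top-$B$ rule shown is only a myopic baseline, not the Whittle index policy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance sizes (illustration only).
N, K, B = 5, 3, 2                       # volunteers, task types, notification budget

f = np.array([0.5, 0.3, 0.2])           # task-arrival probabilities, sum to 1
w = np.array([4.0, 2.0, 1.0])           # exogenous reward w_k per accepted task
p = rng.uniform(0.1, 0.9, size=(K, N))  # p[k, i]: volunteer i accepts task k

def myopic_notify(k, budget=B):
    """Notify the `budget` volunteers with the largest one-step expected
    reward w[k] * p[k, i] for the arrived task k (a greedy baseline that
    ignores future arrivals)."""
    return np.argsort(p[k] * w[k])[-budget:]

k = rng.choice(K, p=f)                  # a task arrives at random
chosen = myopic_notify(k)
expected_reward = (w[k] * p[k, chosen]).sum()
```

A restless-bandit policy would instead account for how notifying a volunteer today changes their future acceptance behavior, which is what the Whittle index captures.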

June 7, 2024

Whittle index policy in RMAB problem | technicals

Consider an RMAB instance with $N$ arms, where each arm $i \in [N]$ has a finite state space $\mathbb S_i$ and can receive an action $y_i^t \in \{0, 1\}$ (representing not pulling or pulling the arm, respectively) at each time step $t$. The state of arm $i$ at time $t$ is denoted by $s_i^t$. Depending on the action taken, a reward $r_i(s_i^t, y_i^t)$ is accrued. As the decision maker, our objective is to maximize the time-averaged total reward over an infinite horizon, under the constraint that at most $B$ arms can be pulled at any time step. This setup can be formalized as follows: $$ \max_{\pi: \mathbb S \to Y} \lim_{T \to \infty} \frac{1}{T} \mathbb{E} \left[\sum_{t = 1}^T \sum_{i\in [N]} r_i(s_i^t, y_i^t)\right]\ \text{s.t. } \sum_{i\in [N]} y_i^t \le B, \forall t $$ Consider a relaxed version where the ‘hard’ budget constraint is only required to hold on average over time: $$ \max_{\pi: \mathbb S \to Y} \lim_{T \to \infty} \frac{1}{T} \mathbb{E} \left[\sum_{t = 1}^T \sum_{i\in [N]} r_i(s_i^t, y_i^t)\right]\ \text{s.t. } \lim_{T \to \infty}\frac1T\mathbb E\left[\sum_{t=1}^T\sum_{i\in [N]} y_i^t \right]\le B $$ Let $\lambda$ be the Lagrange multiplier for this relaxed budget constraint.
The dual problem is then $$ \min_{\lambda} \max_{\pi: \mathbb S \to Y} \lim_{T \to \infty} \frac{1}{T} \left( \mathbb{E} \left[\sum_{t=1}^T \sum_{i \in [N]} r_i(s_i^t, y_i^t)\right] - \lambda \left(\mathbb{E} \left[\sum_{t=1}^T \sum_{i \in [N]} y_i^t\right] - TB\right) \right) $$ Notice that we can decouple the inner maximization across arms, by pulling the budget term outside: $$ \begin{align} & \lim_{T \to \infty} \frac{1}{T} \left( \mathbb{E} \left[\sum_{t=1}^T \sum_{i \in [N]} r_i(s_i^t, y_i^t)\right] - \lambda \left(\mathbb{E} \left[\sum_{t=1}^T \sum_{i \in [N]} y_i^t\right] - TB\right) \right)\cr = & \lim_{T \to \infty} \frac{1}{T}\left(\sum_{i\in [N]} \left( \mathbb{E} \left[\sum_{t=1}^T r_i(s_i^t, y_i^t)\right] - \lambda \mathbb{E} \left[\sum_{t=1}^T y_i^t\right]\right) + \lambda TB\right) \end{align} $$ So, for each $i$, consider the inner maximization: $$ \max_{\pi: \mathbb S_i \to Y_i} \left(\lim_{T \to \infty} \frac{1}{T} \left( \mathbb{E} \left[\sum_{t=1}^T r_i(s_i^t, y_i^t)\right] - \lambda \mathbb{E} \left[\sum_{t=1}^T y_i^t\right] \right)\right) $$ Notice that the constant term $\lambda TB$ can be dropped from each arm's problem, because it is irrelevant to the inner maximization. Let: $$ \phi_i(\lambda) :=\max_{\pi: \mathbb S_i \to Y_i}\left(\lim_{T \to \infty} \frac{1}{T} \left( \mathbb{E} \left[\sum_{t=1}^T r_i(s_i^t, y_i^t)\right] - \lambda \mathbb{E} \left[\sum_{t=1}^T y_i^t\right] \right)\right) $$ The Lagrange multiplier now has an economic interpretation – the price charged for pulling the arm. Consider the optimal stationary policy $\pi_i^*$ for the above optimization problem. Define an arm $i$’s indexability: ...
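Under indexability, the Whittle index of a state is the smallest charge $\lambda$ at which the passive action becomes optimal there. Here is a minimal numerical sketch for a hypothetical two-state arm (all transition and reward numbers are invented for illustration): relative value iteration solves the single-arm average-reward problem behind $\phi_i(\lambda)$, and bisection finds the indifference point.

```python
import numpy as np

# Toy two-state arm (hypothetical numbers, not from the post).
# P[a][s, s'] is the transition matrix under action a (0: passive, 1: active);
# R[a, s] is the reward r(s, a).
P = np.array([[[0.9, 0.1],
               [0.6, 0.4]],   # passive dynamics
              [[0.3, 0.7],
               [0.1, 0.9]]])  # active dynamics
R = np.array([[0.0, 0.0],     # passive rewards
              [0.2, 1.0]])    # active rewards

def q_values(lam, iters=500):
    """Relative value iteration for the single-arm average-reward MDP with
    per-step reward r(s, a) - lam * a; returns Q[a, s] at (approximate)
    convergence."""
    h = np.zeros(2)
    for _ in range(iters):
        Q = R - lam * np.array([[0.0], [1.0]]) + P @ h
        h = Q.max(axis=0)
        h -= h[0]                     # pin the bias down at state 0
    return Q

def whittle_index(s, lo=-1.0, hi=2.0, tol=1e-4):
    """Smallest charge lam at which the passive action becomes optimal in
    state s, found by bisection (valid because this toy arm is indexable)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q = q_values(mid)
        if Q[1, s] > Q[0, s]:         # active still preferred: raise the charge
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For this toy arm the indices come out to roughly $0.8$ for state 0 and $1.0$ for state 1; Whittle's heuristic then activates, at each step, the $B$ arms whose current states have the largest indices.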

June 6, 2024

RMAB and Whittle index | a modern approach to decision-making

Decision-making in dynamic environments over time presents unique challenges, particularly when the conditions influencing decisions are constantly changing. In such scenarios, traditional decision-making models often fall short. This is where Restless Multi-Armed Bandits (RMAB) come into play: the framework provides a robust way to model and optimize decisions over time. Here’s a brief introduction to the concept of RMAB and the Whittle index policy. Understanding Restless Multi-Armed Bandits: what are RMABs? The multi-armed bandit problem is a classic version of the problem of optimal allocation of effort under uncertainty: simply put, one has $n$ projects, the state of project $i\in [n]$ being denoted by $x_i(t)$. ...

June 5, 2024

write-up | algorithmic classification and strategic effort

A memo from the Market Mechanism Design course’s final presentation report, based on: Algorithmic Classification and Strategic Effort, Jon Kleinberg and Manish Raghavan | ACM SIGecom Exchanges, Vol. 18, No. 2, November 2020, Pages 53–60. motivation: differences between the econ and CS perspectives in modeling strategic behavior and objectives. The principal-agent and strategic machine learning literatures appear to share a common goal: how should one structure a decision-making rule to account for the strategic actions of decision subjects? ...

June 4, 2024

Mostly OM diary | Practicing OR/OM in China

speaker: Zizhuo Wang | Prof., The Chinese University of Hong Kong-Shenzhen. Takeaway: a point of view on bridging practice and research. Industry solutions need to consider a lot of details, need to be fast and interpretable (and therefore simple), and don’t care about theory. Academic works need abstraction to focus, are graced with time, and expect some generality, and therefore theory. TALK ABSTRACT: In the past years, there has been a growing number of companies in China that adopt OR methods in their operations. There are also several interesting research questions that arise in those application problems. In this talk, we present some recent projects and some of the research questions that come from those scenarios. ...

June 3, 2024

Mostly OM diary | Randomization in Product, Fulfillment, and Pricing as a Profit Lever

speaker: Ming Hu | Prof., University of Toronto Keys: random product offering/demand allocation/pricing algorithm. TALK ABSTRACT: First, we study blind boxes as a novel selling mechanism in which buyers purchase sealed packages containing unknown items, with the chance of uncovering rare or special items. We show how such product randomization introduced by the blind box can improve the seller’s profitability over traditional separate selling. Second, we study how an e-commerce platform should assign sequentially arriving customers to sellers who compete to sell identical products on the platform. The allocation rule may be random and dependent on the sellers’ inventory levels. We show how such demand fulfillment randomization can incentivize sellers to hold more inventory and improve the platform’s profitability and customer welfare. Third, we study randomized promotions in which the firm randomly offers discounts over time to sequentially arriving customers with heterogeneous valuations and patience levels. We show how price randomization can improve the firm’s profitability beyond deterministic pricing policies. ...

June 2, 2024

Mostly OM diary | Optimal Conditional Drug Approval

speaker: Peng Sun | Prof., Duke University TALK ABSTRACT: New prescription drugs require regulatory approval before drug makers can sell them. In some countries, regulators may conditionally approve a drug, which allows sales to begin before the developer has proven the drug’s efficacy. After further testing, the regulator may either grant final approval or reject the drug. We show that conditional approval not only speeds access to drugs but also encourages the development of drugs that would not have been pursued otherwise. Using mechanism design principles, we show that regulators should conditionally approve a drug even if it is ex-ante less likely to prove efficacious, under certain conditions. The regulator should approve the drug to encourage investment, especially when only the firm knows the testing cost. However, drugs that are less likely to prove efficacious should only be conditionally approved for a portion of the patient population if that is enough to motivate testing. Additionally, the impact of conditional approval is greater when the firm’s revenue increases over time. Finally, a regulator should sometimes commit that in the future it will grant final approval for a drug that narrowly misses the efficacy threshold in return for the firm testing an otherwise unprofitable drug.

June 1, 2024

Mostly OM diary | Allocating Divisible Resources on Arms with Unknown and Random Rewards

speaker: Ningyuan Chen | Prof., University of Toronto related paper: Allocating Divisible Resources on Arms with Unknown and Random Rewards TALK ABSTRACT: We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i)=A_i\mu_i+A_i^b \xi_i$, where $\mu_i$ is the unknown mean and the noise $\xi_i$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.

May 31, 2024

Mostly OM diary | The Limits of Personalization in Assortment Optimization

speaker: Guillermo Gallego | Prof., The Chinese University of Hong Kong-Shenzhen. TALK ABSTRACT: To study the limits of personalization, we introduce the notion of a clairvoyant firm that can read the mind of consumers and sell them the highest revenue product that they are willing to buy. We show how to compute the expected revenue of the clairvoyant firm for a class of rational discrete choice models, and develop prophet-type inequalities that provide performance guarantees for the expected revenue of the traditional assortment optimization firm (a TAOP firm) relative to the clairvoyant firm, and therefore to any effort to personalize assortments. In particular, we show that the expected revenue of the clairvoyant firm cannot exceed twice the expected revenue of the TAOP for the RCS model, the MNL, the GAM and the Nested-Logit Model. On the other hand, there are random utility models for which personalized assortments can earn up to $n$ times more than a TAOP firm, where $n$ is the number of products. Our numerical studies indicate that when the mean utilities of the products are heterogeneous among consumer types, and the variance of the utilities is small, firms can gain substantial benefits from personalized assortments. We support these observations, and others, with theoretical findings. While the consumers’ surplus can potentially be larger under personalized assortments, clairvoyant firms with pricing power can extract all surplus, and earn arbitrarily more than traditional firms that optimize over prices but do not personalize them. For the price-aware MNL, however, a clairvoyant firm can earn at most $\exp(1)$ times more than a traditional firm. ...

May 30, 2024

Mostly OM 2024 Workshop at Tsinghua University, Beijing

What an elegant name! It reminded me of the book Mostly Harmless Econometrics. Mostly OM is an annual international workshop sponsored by the School of Economics and Management and the Research Center for Contemporary Management of Tsinghua University, focusing on state-of-the-art research in Operations Management (OM), broadly defined. Mostly OM aims to provide a forum for researchers, from China and overseas alike, to foster interaction and exchange ideas, learn new methodologies and applications from each other, and explore opportunities for collaboration. ...

May 29, 2024