UCL School of Management is delighted to welcome Hamed Mamani, University of Washington, to host a research seminar discussing ‘Dynamic risk stratification and precision medicine using multi-armed bandits’.
Multiple myeloma is an incurable cancer of bone marrow plasma cells with a median overall survival of 5 years. With the approval of new drugs to treat this disease over the last decade, physicians have more opportunities to tailor treatment to individual patients and thereby improve survival outcomes and quality of life. However, since the optimal sequence of therapy is unknown, selecting the treatment that will produce the best outcome for each individual patient is challenging. In this work, we present a data-driven, analytical approach to developing dynamic personalized treatment recommendations for multiple myeloma patients, with the primary goal of maximizing their overall survival.
We formulate the treatment recommendation problem as a Bayesian contextual bandit, an adaptive, online learning approach that sequentially selects treatments based on contextual information about the patients and the therapies. Because the performance of the policy is difficult to evaluate without field experiments, we develop a generative econometric model, a hidden Markov model, to profile patients’ dynamic risk levels and treatment responses. By integrating this structural econometric model into our bandit optimization framework and simulating patients’ treatment responses that are not observed in the data, we generate counterfactuals that support the theoretical exploration/exploitation framework with empirical evidence, comparing the resulting policy with current clinical practice as well as other benchmark strategies. Our evaluation suggests the merits of an adaptive, personalized approach that balances the exploration/exploitation trade-off in medical practice.
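For readers unfamiliar with the idea, the exploration/exploitation trade-off in a Bayesian contextual bandit can be illustrated with a minimal Thompson sampling sketch. This is not the speaker's model: it assumes a small discrete context (standing in for a patient risk level), a binary reward in place of survival, and an independent Beta posterior per context and treatment. The class name, function names, and response rates are all hypothetical and chosen only for illustration.

```python
import random


class BetaThompsonBandit:
    """Thompson sampling with an independent Beta posterior per (context, arm)."""

    def __init__(self, n_contexts, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        # Beta(1, 1) prior for every pair, stored as [alpha, beta].
        self.params = [[[1, 1] for _ in range(n_arms)] for _ in range(n_contexts)]

    def select(self, context):
        # Draw one plausible success rate per arm from its posterior and
        # pick the arm with the largest draw; uncertain arms still get tried.
        samples = [self.rng.betavariate(a, b) for a, b in self.params[context]]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, context, arm, reward):
        # A success increments alpha, a failure increments beta.
        self.params[context][arm][0 if reward else 1] += 1


def simulate(true_rates, n_rounds=5000, seed=0):
    """Run the bandit against simulated responses; return pull counts."""
    rng = random.Random(seed + 1)
    n_contexts, n_arms = len(true_rates), len(true_rates[0])
    bandit = BetaThompsonBandit(n_contexts, n_arms, seed)
    pulls = [[0] * n_arms for _ in range(n_contexts)]
    for _ in range(n_rounds):
        context = rng.randrange(n_contexts)      # a new patient arrives
        arm = bandit.select(context)             # choose a treatment
        reward = rng.random() < true_rates[context][arm]  # simulated response
        bandit.update(context, arm, reward)
        pulls[context][arm] += 1
    return pulls


# Hypothetical response rates: rows are risk levels, columns are treatments.
TRUE_RATES = [[0.7, 0.4], [0.3, 0.6]]
pulls = simulate(TRUE_RATES)
for context, counts in enumerate(pulls):
    print(f"risk level {context}: pulls per treatment = {counts}")
```

In this toy setting the policy gradually concentrates its choices on the treatment with the higher simulated response rate in each risk level, while still occasionally sampling the alternative early on. A simulator of patient responses plays the role here that the hidden Markov model plays in the abstract, namely supplying outcomes for treatments not observed in the data.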