Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback

Read original: arXiv:2207.03091 - Published 5/14/2024 by Adhyyan Narang, Omid Sadeghi, Lillian J Ratliff, Maryam Fazel, Jeff Bilmes

🌿

Overview

The paper extends previous work on submodular optimization to more general non-submodular objectives.
This includes objectives that are additively decomposable into a monotone submodular and monotone supermodular term, as well as those that are only weakly submodular.
This allows representing both competitive (submodular) and complementary (supermodular) relationships between objects, expanding the applicability to a broader range of real-world problems.
The paper also studies a novel framework where feedback is available separately for each term in the two-term case.
Nystrom sketching techniques are integrated to reduce computational cost, including for the purely submodular case.
Sub-linear theoretical regret bounds are shown in the Gaussian process contextual bandits setting.
Empirical results demonstrate good applicability to recommendation systems and data subset selection.

Plain English Explanation

The paper explores a new approach to online interactive machine learning with combinatorial objectives. Previous work had focused on optimizing submodular objectives, which capture competitive relationships between objects. However, many real-world problems, such as movie recommendations or medical treatments, also involve complementary relationships between objects.

The researchers extend the prior work to handle more general non-submodular objectives. This includes objectives that can be broken down into a sum of a monotone submodular term and a monotone supermodular term, as well as those that are only weakly submodular. By allowing both competitive and complementary relationships, the approach can be applied to a wider range of real-world applications.

In the two-term case, the paper also explores a novel framework where feedback is provided separately for each term, rather than a single monolithic feedback. This adds flexibility and could be beneficial in certain scenarios.

To make the approach more practical and scalable, the researchers integrate Nystrom sketching techniques to significantly reduce the computational cost, even for the purely submodular case. This is important for deploying these methods in real-world systems.

The paper also provides theoretical regret bounds in the Gaussian process contextual bandits setting, and demonstrates good empirical performance on recommendation systems and data subset selection tasks.

Technical Explanation

The paper extends prior work on submodular optimization to more general non-submodular objectives. Specifically, it considers two types of non-submodular objectives:

Additively decomposable objectives: These can be expressed as the sum of a monotone submodular term and a monotone supermodular term, known as a BP decomposition.
Weakly submodular objectives: These do not necessarily satisfy the strict submodularity condition but still exhibit some submodular-like properties.

By considering these more general objectives, the approach can capture both competitive (submodular) and complementary (supermodular) relationships between objects, expanding the applicability to a wider range of real-world problems.

In the two-term additively decomposable case, the paper studies not only the typical monolithic feedback approach but also a novel framework where feedback is available separately for each term. This could be beneficial in certain scenarios where the components of the objective have distinct interpretations or dynamics.

To address the computational challenges, the researchers integrate Nystrom sketching techniques, which allow for significant reductions in computational cost, even for the purely submodular case. This is an important consideration for deploying these methods in practical, large-scale systems.

The theoretical analysis shows sub-linear regret bounds in the Gaussian process contextual bandits setting for all the considered cases (submodular, non-submodular with BP decomposition, and weakly submodular).

The empirical evaluation demonstrates the applicability of the proposed methods to recommendation systems and data subset selection tasks, showcasing their practical relevance and performance.

Critical Analysis

The paper makes a valuable contribution by extending submodular optimization to more general non-submodular objectives, which expands the range of real-world problems that can be effectively addressed. The inclusion of complementary relationships between objects is a key enhancement that allows the approach to be applied to a broader set of applications.

However, the paper does not provide a detailed discussion of the limitations or potential downsides of the proposed methods. For example, it would be helpful to understand the computational complexity and scalability of the Nystrom sketching techniques, particularly as the problem size or number of terms in the decomposition increases.

Additionally, the paper could have delved deeper into the practical implications and considerations for deploying these methods in real-world systems. For instance, how would the separate feedback framework for the two-term case work in practice, and what are the potential challenges or trade-offs involved?

While the theoretical regret bounds and empirical results are promising, it would be valuable to see more extensive experimentation across a wider range of real-world scenarios to better understand the strengths, weaknesses, and practical applicability of the proposed approach.

Overall, the paper makes a valuable contribution, but a more in-depth discussion of the limitations, practical considerations, and potential avenues for future research would further strengthen the work.

Conclusion

This paper presents an important extension of submodular optimization to more general non-submodular objectives, which allows for the representation of both competitive and complementary relationships between objects. By considering additively decomposable objectives and weakly submodular objectives, the approach can be applied to a broader range of real-world problems, such as recommendation systems and data subset selection.

The integration of Nystrom sketching techniques to reduce computational cost is a key practical enhancement, making the methods more scalable and suitable for deployment in large-scale systems. The theoretical regret bounds and empirical results demonstrate the effectiveness of the proposed approach.

While the paper could have provided more detailed discussion of the limitations and practical considerations, it nevertheless represents a significant advancement in the field of combinatorial optimization and interactive machine learning. The ability to handle a wider range of objective functions opens up new possibilities for tackling complex real-world challenges in a more flexible and effective manner.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🌿

Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback

Adhyyan Narang, Omid Sadeghi, Lillian J Ratliff, Maryam Fazel, Jeff Bilmes

In the context of online interactive machine learning with combinatorial objectives, we extend purely submodular prior work to more general non-submodular objectives. This includes: (1) those that are additively decomposable into a sum of two terms (a monotone submodular and monotone supermodular term, known as a BP decomposition); and (2) those that are only weakly submodular. In both cases, this allows representing not only competitive (submodular) but also complementary (supermodular) relationships between objects, enhancing this setting to a broader range of applications (e.g., movie recommendations, medical treatments, etc.) where this is beneficial. In the two-term case, moreover, we study not only the more typical monolithic feedback approach but also a novel framework where feedback is available separately for each term. With real-world practicality and scalability in mind, we integrate Nystrom sketching techniques to significantly reduce the computational cost, including for the purely submodular case. In the Gaussian process contextual bandits setting, we show sub-linear theoretical regret bounds in all cases. We also empirically show good applicability to recommendation systems and data subset selection.

5/14/2024

Linear Submodular Maximization with Bandit Feedback

Wenjing Chen, Victoria G. Crawford

Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^Utomathbb{R}_{geq 0}$, where $f=sum_{i=1}^dw_iF_{i}$. It is assumed that we have value oracle access to the functions $F_i$, but the coefficients $w_i$ are unknown, and $f$ can only be accessed via noisy queries. We develop algorithms for this setting inspired by adaptive allocation algorithms in the best-arm identification for linear bandit, with approximation guarantees arbitrarily close to the setting where we have value oracle access to $f$. Finally, we empirically demonstrate that our algorithms make vast improvements in terms of sample efficiency compared to algorithms that do not exploit the linear structure of $f$ on instances of move recommendation.

7/4/2024

From Linear to Linearizable Optimization: A Novel Framework with Applications to Stationary and Non-stationary DR-submodular Optimization

Mohammad Pedramfar, Vaneet Aggarwal

This paper introduces the notion of upper linearizable/quadratizable functions, a class that extends concavity and DR-submodularity in various settings, including monotone and non-monotone cases over different convex sets. A general meta-algorithm is devised to convert algorithms for linear/quadratic maximization into ones that optimize upper quadratizable functions, offering a unified approach to tackling concave and DR-submodular optimization problems. The paper extends these results to multiple feedback settings, facilitating conversions between semi-bandit/first-order feedback and bandit/zeroth-order feedback, as well as between first/zeroth-order feedback and semi-bandit/bandit feedback. Leveraging this framework, new algorithms are derived using existing results as base algorithms for convex optimization, improving upon state-of-the-art results in various cases. Dynamic and adaptive regret guarantees are obtained for DR-submodular maximization, marking the first algorithms to achieve such guarantees in these settings. Notably, the paper achieves these advancements with fewer assumptions compared to existing state-of-the-art results, underscoring its broad applicability and theoretical contributions to non-convex optimization.

5/15/2024

🏅

Submodular Reinforcement Learning

Manish Prajapat, Mojm'ir Mutn'y, Melanie N. Zeilinger, Andreas Krause

In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are $textit{independent}$ of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose $textit{submodular RL}$ (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state- and action- spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.

5/27/2024