A/B Testing

A general summary of goals for A/B testing.

Overview

A/B testing is a reliable technique for evaluating Promoted's impact and for improving the relevance and accuracy of Promoted's predictions and rankings. Promoted constantly runs A/B tests when launching model improvements or making changes to allocation. During A/B testing, Promoted splits your users into two groups: one group served with Promoted, and one group served without Promoted (using your existing system). Promoted then measures and compares the sales attributed to each group of users.

The baseline is the best experience the client currently provides without Promoted, and Promoted's goal is to achieve better results. Promoted's framework allows experiment arm labels to be assigned to users, which supports any type of experiment assignment. In an A/B experiment, CONTROL is what users see without Promoted, and TEST is what they see with Promoted's optimizations. Promoted then compares the total sales in each group over the course of the experiment. Because Promoted integrates with clients' existing UI and is extremely low latency, the product experience is the same for both groups, and customers cannot tell which system is serving the rankings (with the exception that Promoted's rankings may be more relevant).
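
As an illustration only (not Promoted's actual assignment mechanism), a common way to implement this kind of split is deterministic hash-based bucketing, so a given user always lands in the same arm across sessions:

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "promoted-ab-test",
               treatment_fraction: float = 0.5) -> str:
    """Deterministically bucket a user into TEST or CONTROL.

    Hashing the (experiment, user_id) pair keeps assignments stable across
    sessions and independent across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # map the hash to [0, 1)
    return "TEST" if bucket < treatment_fraction else "CONTROL"

# The same user always receives the same arm.
print(assign_arm("f5c9b2a1-9d47-4e8c-b6f3-2a1ecf9b8e73"))
```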

Note that A/B testing does not capture all of Promoted's benefits, such as new seller activation and long-term user retention.

Cohorts

A user may or may not be part of an experiment that you and Promoted are running. Promoted supports experimentation, allowing users to be split into a TEST arm, a CONTROL arm, and other groups to test particular features. Before the experiment, all users are excluded by default. There are two ways to add users to TEST or CONTROL:

  1. If you prefer to pre-generate a list of users for each group, you may batch upload the assignments to Promoted.
  2. You can use a triggering event, called a cohort assignment event, that assigns the user into either TEST or CONTROL; otherwise, they remain excluded from the experiment. Specifying a cohort assignment event (e.g., a user logging in, submitting a search query, etc.) is a way to communicate to Promoted which users should be in which group, as sketched below. Excluding some users from the experiment also increases its relevance and measured impact, and avoids diluting the experiment with users who may have nothing to do with the feature you're testing. That said, in some experiments it may be better to include 100% of users for increased data.
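
As a rough sketch of option 2 (the endpoint URL and payload field names here are illustrative, not Promoted's exact API), the triggering event can post a cohort membership record the first time a user qualifies:

```python
from datetime import datetime, timezone

import requests

# Illustrative endpoint; use the logging endpoint from your Promoted integration.
METRICS_URL = "https://example.promoted.ai/cohort-membership"

def log_cohort_assignment(user_id: str, cohort_id: str, arm: str) -> None:
    """Send a cohort assignment event when a triggering action occurs
    (e.g., the user submits a search query)."""
    payload = {
        "cohort_id": cohort_id,  # e.g., "client_experiment_0214"
        "arm": arm,              # "TEST" or "CONTROL"
        "user_id": user_id,
        "client_log_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    requests.post(METRICS_URL, json=payload, timeout=2).raise_for_status()

# Example: assign the user on their first search of the session.
log_cohort_assignment(
    user_id="f5c9b2a1-9d47-4e8c-b6f3-2a1ecf9b8e73",
    cohort_id="client_experiment_0214",
    arm="CONTROL",
)
```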

The following table is an example cohort format from a real introspection report from a Promoted client.

cohort id | arm | membership id | membership event api timestamp | membership client log timestamp
client_experiment_0214 | CONTROL | f5c9b2a1-9d47-4e8c-b6f3-2a1ecf9b8e73 | 2024-06-25 18:04:32 | 2024-06-25 18:04:32
null-experiment | CONTROL | a5b9e3f7-4d2c-1a8f-9e0b-b5c2d4e6a7f8 | 2024-06-25 18:04:32 | 2024-06-25 18:04:32

Overlapping and limited tests

Clients can test Promoted while also testing other, non-Promoted changes (e.g., introducing new sites). Additionally, testing can be limited to a subset of users given specific constraints, such as their geographic location or surface.
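
For example (a sketch only, with made-up constraint names), eligibility can be gated before assignment so that only users matching the constraints enter the experiment at all:

```python
import hashlib
from typing import Optional

ELIGIBLE_REGIONS = {"US", "CA"}           # illustrative geographic constraint
ELIGIBLE_SURFACES = {"search", "browse"}  # illustrative surface constraint

def maybe_assign(user_id: str, region: str, surface: str) -> Optional[str]:
    """Assign only eligible users to an arm; everyone else stays excluded."""
    if region not in ELIGIBLE_REGIONS or surface not in ELIGIBLE_SURFACES:
        return None  # excluded from both arms
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest()[:8], 16) / 2**32
    return "TEST" if bucket < 0.5 else "CONTROL"
```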

Collaboration and Experiment Management

Promoted complements existing experiment tools: it actively collaborates on experiment evaluation and can conduct internal A/B testing relevant both to Promoted's system improvements and to potential customer insights. Customers can either run their own experiments and share the assignments with Promoted, or ask Promoted to manage the experiment assignments.

Importance of Experiment Labels

It's essential for Promoted to receive experiment group labels. This is especially true for Promoted-related experiments, which assess the safety and impact of integration and help determine whether users experience Promoted's rankings or the default setup. Early experiments may compare different objectives, like click-through versus conversion rates, aiding in refining the business objectives integration.

Promoted-internal experiments

Promoted's system can autonomously assign users to different groups for experiments that primarily affect its own predictions and rankings. Promoted constantly runs internal experiments, such as new infrastructure, which may or may not be relevant to expose to end users.

UI/UX Experimentation and Model Training

You may have experiment tooling that you prefer to use for consistency with other experiments, or you may track metrics that are outside the scope of Promoted's integration. Even if you evaluate your own experiment metrics, Promoted should still receive your experiment group labels for Promoted-related experiments. For other UI/UX and content experiments, Promoted still needs to receive this information, though it may be in other formats.

Promoted's model needs comprehensive data to predict user behavior accurately. For instance, in UI experiments, knowing the context like display format (FULL versus COMPACT) helps the model learn that reviews impact click probability only in FULL displays. Similarly, experiments altering product titles (e.g., 'wool socks' versus 'cozy socks') should provide these variations during the delivery API call. This ensures the model can accurately learn from these changes, avoiding the need to relearn from scratch after the experiment ends.
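
As a sketch of the idea (the field names are illustrative, not Promoted's exact request schema), the variant content and display context a user actually saw can be attached to each item in the delivery request:

```python
# Hypothetical helper that builds a delivery request; property names such as
# "display_format" and "title" are examples, not a fixed schema.
def build_delivery_request(user_id: str, display_format: str, items: list[dict]) -> dict:
    return {
        "user_info": {"user_id": user_id},
        "device": {"display_format": display_format},  # e.g., "FULL" or "COMPACT"
        "insertions": [
            {
                "content_id": item["content_id"],
                "properties": {
                    # Send the title the user actually saw in this variant so the
                    # model learns from the experiment instead of relearning later.
                    "title": item["title"],
                },
            }
            for item in items
        ],
    }

request = build_delivery_request(
    user_id="f5c9b2a1-9d47-4e8c-b6f3-2a1ecf9b8e73",
    display_format="FULL",
    items=[{"content_id": "sku-123", "title": "cozy socks"}],  # variant title under test
)
```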

Outcome Analysis

Customers can collaborate with Promoted to query the data warehouse for analysis to evaluate experiment outcomes. Outcomes are analyzed in data notebooks, and Promoted can provide periodic exports of assignments and the metrics needed for comprehensive analysis. This service is tailored to the unique goal metrics of each customer, acknowledging that many customers prefer their own experiment tools for consistency and comprehensive analysis, including metrics beyond Promoted's scope.
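
As an illustration of the kind of notebook analysis involved (the export file and column names are assumptions, not a fixed schema), per-arm totals and a simple significance test might look like:

```python
import pandas as pd
from scipy import stats

# Assumed export: one row per user with their arm and attributed sales.
df = pd.read_csv("experiment_export.csv")  # columns: user_id, arm, sales

# Per-arm summary of attributed sales.
summary = df.groupby("arm")["sales"].agg(["count", "mean", "sum"])
print(summary)

# Welch's t-test on per-user sales between arms (a simple first pass;
# real analyses often use more robust methods).
test = df.loc[df["arm"] == "TEST", "sales"]
control = df.loc[df["arm"] == "CONTROL", "sales"]
t_stat, p_value = stats.ttest_ind(test, control, equal_var=False)
print(f"lift: {test.mean() - control.mean():.2f}, p-value: {p_value:.3f}")
```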

Promoted also generates custom reports on feature importance, aiding in understanding the impact of UI/UX changes on user behavior and optimizing future experiments.

Promoted serves as a versatile partner in the A/B testing process, offering robust integration capabilities, detailed analytical support, and a commitment to improving experiment outcomes and user experience insights.