Feature Importance Report

The feature importance report is a spreadsheet that describes which features were seen, which models used them, how important they were, and how each feature's importance has changed over time. It helps data science, ML, and business teams understand which signals drive performance, and why.

The full Google Sheet is found here.

Daily Feature Report Links

Each time new models are trained, new feature reports are generated. The reports always use the latest daily model, even if that model was not published to production. Links to the reports appear in the Daily ML Reports (see Figure 1). The reports show the latest feature importances and how they have changed over time. The three reports are:

  • Descriptive Feature Importance: A simplified, unified report that best answers "which features drive model performance overall?" in an intelligible way. It excludes engagement features so that the importance of content features, which item engagement features would otherwise make redundant, is visible.
  • Debug Feature Importance: A more technical report that shows importance for all features in all models, including engagement features. It is more useful for engineering and debugging.
  • Feature Coverage Report: Statistics on the features used in the models.

Figure 1: Daily Feature Report links in the Daily ML Reports
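
This page does not list the exact statistics in the Feature Coverage Report. One common coverage statistic is the fraction of examples in which a feature has a value. The minimal sketch below computes that fraction, assuming each example is a dictionary and that a missing value is stored as None or left out entirely. The function and feature names are hypothetical.

```python
from typing import Any, Dict, List


def feature_coverage(rows: List[Dict[str, Any]], features: List[str]) -> Dict[str, float]:
    """Return, per feature, the fraction of rows where it has a non-missing value."""
    if not rows:
        return {f: 0.0 for f in features}
    coverage: Dict[str, float] = {}
    for f in features:
        # A key that is absent or set to None counts as missing (an assumption).
        present = sum(1 for row in rows if row.get(f) is not None)
        coverage[f] = present / len(rows)
    return coverage


# Hypothetical training rows.
rows = [
    {"item_price": 19.99, "item_category": "shoes", "user_age_bucket": None},
    {"item_price": 5.00, "item_category": None, "user_age_bucket": "25-34"},
    {"item_price": 42.50, "item_category": "bags"},
    {"item_price": 12.00, "item_category": "shoes", "user_age_bucket": None},
]
print(feature_coverage(rows, ["item_price", "item_category", "user_age_bucket"]))
# {'item_price': 1.0, 'item_category': 0.75, 'user_age_bucket': 0.25}
```

A low coverage value is worth checking before you read too much into a feature's importance, because a rarely present feature has few chances to contribute.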

Understanding Feature Importance Reports

See Figures 2 to 4 for examples of Feature Importance Reports. The reports are Google Sheets spreadsheets, so you can copy, share, comment on, and edit them like any other spreadsheet. They are saved in a shared Google Drive directory that you can share with your team.

Figure 2: Top-level feature categories in the Descriptive Feature Importance report

"Importance" is a heuristic for how much a particular feature influences prediction results. It is computed by training a gradient-boosted decision tree (GBDT) model, a popular non-linear ML algorithm, and reporting each feature's "Total Gain"; other importance metrics are on a separate tab in the Debug report. Note that Promoted's production models are neural networks. Promoted does not use GBDTs directly as production models except for very small customers, although depending on your data size, we may use one or several GBDTs as automated feature transformations.
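
"Total Gain" for a feature is the sum of the loss reductions (gains) from every tree split that uses that feature. The minimal sketch below illustrates the aggregation and is not Promoted's pipeline. It assumes that each tree has already been reduced to a list of (feature, gain) splits, and it uses hypothetical feature names and gain values. For comparison, it also reports each feature's share of total gain and its average gain per split.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    feature: str  # feature the tree split on
    gain: float   # reduction in training loss achieved by this split


def gain_importance(trees: List[List[Split]]) -> Dict[str, Dict[str, float]]:
    """Aggregate split gains into per-feature importance, highest total gain first."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for tree in trees:
        for split in tree:
            total[split.feature] += split.gain
            count[split.feature] += 1
    grand_total = sum(total.values())
    return {
        feature: {
            "total_gain": total[feature],  # sum of gains over every split on this feature
            "share": total[feature] / grand_total if grand_total else 0.0,
            "avg_gain": total[feature] / count[feature],  # mean gain per split
        }
        for feature in sorted(total, key=total.get, reverse=True)
    }


# Two hypothetical trees, each reduced to its list of splits.
trees = [
    [Split("item_ctr_7d", 40.0), Split("price", 10.0)],
    [Split("item_ctr_7d", 20.0), Split("device_type", 5.0), Split("price", 25.0)],
]
for feature, stats in gain_importance(trees).items():
    print(feature, stats)
```

Total gain rewards a feature that is split on often. Average gain only reflects how useful a typical split on that feature is. XGBoost exposes these two quantities as `importance_type="total_gain"` and `importance_type="gain"` in `Booster.get_score`.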

Figure 3: Expansion of the "Content" feature category, which contains features the customer provides in the Item CMS system.

Some features do not explain ranking performance by themselves. For example, "device type" is the same for every item on a search results page, so it is unlikely to change the order of search results much. Instead, such features capture bias in the labels, which lets the model use other features more effectively.
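
The device-type example can be made concrete with a toy model. The sketch assumes a purely additive logistic model in which each item has a logit from item-level features and the device contributes one offset shared by every item on the page. All item names, logits, and offsets are hypothetical. Because the offset is the same for every item and the sigmoid is monotonic, the ranking does not change. The predicted click probabilities do change, so the device feature can absorb device-level label bias.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical per-item logits from item-level features.
ITEM_LOGITS: Dict[str, float] = {"item_a": 1.2, "item_b": 0.4, "item_c": -0.3}
# Hypothetical learned offsets for a request-level feature that is constant per page.
DEVICE_OFFSET: Dict[str, float] = {"desktop": 0.0, "mobile": -0.8}


def score_page(device: str) -> Tuple[List[str], Dict[str, float]]:
    """Predict a click probability per item and rank the items on one page."""
    probs = {item: sigmoid(logit + DEVICE_OFFSET[device]) for item, logit in ITEM_LOGITS.items()}
    ranking = sorted(probs, key=probs.get, reverse=True)
    return ranking, probs


desktop_rank, desktop_probs = score_page("desktop")
mobile_rank, mobile_probs = score_page("mobile")
print(desktop_rank == mobile_rank)  # True: same order on both devices
print(desktop_probs["item_a"], mobile_probs["item_a"])  # lower probability on mobile
```

Real production models are neural networks and can learn interactions between request-level and item-level features, so in practice such a feature can still shift the order somewhat. The sketch covers only the purely additive case.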

Features are grouped by source. You can expand or collapse each group to make the report easier to read.

  • Promoted.ai: Features generated by Promoted.ai's systems, like counter features and personalization
  • User: Features derived from the User CMS system
  • Content: Features derived from the Item CMS system
  • Request: Features derived from information passed to the Delivery SDK in Request, Request.Properties, or Insertion.Properties
  • Custom-built: Special features that Promoted can custom build for some customers
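
This page does not say how a collapsed group's importance is computed. Because total gain is additive across splits, one natural choice is to sum the normalized total-gain shares of the features in each group. The minimal sketch below works under that assumption. The feature-to-source mapping, the feature names, and the "Unknown" fallback group are all hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical feature-to-source mapping; real feature names will differ.
FEATURE_SOURCE: Dict[str, str] = {
    "item_ctr_7d": "Promoted.ai",
    "user_country": "User",
    "item_price": "Content",
    "item_category": "Content",
    "device_type": "Request",
}


def importance_by_source(importance: Dict[str, float]) -> Dict[str, float]:
    """Sum per-feature importance into per-source totals, largest first."""
    groups: Dict[str, float] = defaultdict(float)
    for feature, value in importance.items():
        # Features missing from the mapping fall into a hypothetical "Unknown" group.
        groups[FEATURE_SOURCE.get(feature, "Unknown")] += value
    return dict(sorted(groups.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical normalized total-gain shares for five features.
shares = {"item_ctr_7d": 0.45, "item_price": 0.25, "item_category": 0.15,
          "user_country": 0.10, "device_type": 0.05}
print(importance_by_source(shares))
```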

Figure 4: Expansion of the "Promoted.ai" feature category, which contains features that Promoted.ai generates automatically