CMS and Request Property Feature Formats
How to use the User and Item CMS systems and Request and Insertion Properties to create automatic production features
Information sent to Promoted.ai via the User and Item Content Management Systems (CMS), or as part of the Delivery API Request Properties or Request Insertion Properties, is automatically included in all machine learning models for optimization. These systems support arbitrarily formatted JSON and perform general feature transformations to make information available for machine learning and AI. You do not need to do any special feature engineering or annotation for information sent to Promoted to be used in modeling.
Example JSON CMS Records
We'll reference these example CMS JSON records in the documentation below.
Example Item CMS Record
{
"valueScore": {
"c": 1.8462813912476075,
"a": 1.2222,
"b": 2
},
"status_details": {
"copyright": true,
"author": "John Doe"
},
"is_enabled": false,
"standards_label": [],
"subscores": {},
"categories": [
"food",
"drinks",
"chicken"
],
"title": "Grandmas Chicken Recipe with Ginger",
"created_at": "2022-11-10",
"is_holiday": true,
"seasonal_sale_pattern_usa": 0.1,
"seasonal_sale_pattern_australia": 0.9
}
Example User CMS Record
{
"country": "USA",
"num_purchases": 34,
"created_at": "2021-02-10",
"categories": [
"cats",
"food",
"chicken"
],
"purchased": [
"content_ID_1234",
"content_ID_3482"
],
"liked": [
"content_ID_1234",
"content_ID_3482"
],
"followed_authors": [
"John Doe",
"Mary Smith",
"Bill Gates"
]
}
How Features are Generated per Record
Numeric and Boolean Features
Key paths with float or integer values are converted to numeric features. Boolean features take the value 0 (false) or 1 (true).
Strings that can be cast to a numeric value will be converted and used like numeric values.
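The conversion rules above can be sketched in Python as follows. This is only an illustration: the function name to_numeric_feature is hypothetical and not part of any Promoted API, and the handling of non-finite strings such as "nan" is an assumption of the sketch.

```python
import math
from typing import Optional


def to_numeric_feature(value: object) -> Optional[float]:
    """Return a numeric feature value for a scalar, or None if it has none.

    Hypothetical helper for illustration; not a Promoted API.
    """
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            number = float(value)
        except ValueError:
            return None  # not castable, so no numeric feature
        # Assumption of this sketch: "nan" and "inf" are not treated as numbers.
        return number if math.isfinite(number) else None
    return None


print(to_numeric_feature(False))   # 0.0
print(to_numeric_feature(34))      # 34.0
print(to_numeric_feature("0.9"))   # 0.9
print(to_numeric_feature("USA"))   # None
```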
Categorical (string-valued) Features
Key paths with string values are converted to "Sparse ID" features, which are used in defining blender rules such as diversity rules. For example, status_details.author will create a Sparse ID feature that could be used to define diversity allocation rules. String values that can be cast to an integer are also used as a categorical feature, in addition to being used as a numeric feature.
For identifiers (ID), use string types, not integer types. Integer types may be cast to "numeric types" and lose precision, making them inappropriate for use as identifiers.
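To see why integer identifiers are risky, here is a small Python illustration. It assumes the numeric type is a 64-bit IEEE 754 float; the text does not name the type, so treat this as an assumption. The identifier value is made up for the example.

```python
# Hypothetical identifier just above 2**53, the largest range in which
# every integer is exactly representable as a 64-bit float.
big_id = 9007199254740993  # 2**53 + 1

# Casting to float and back silently changes the identifier.
round_tripped = int(float(big_id))
print(round_tripped)            # 9007199254740992
print(round_tripped == big_id)  # False

# Sending the identifier as a string keeps every digit intact.
str_id = str(big_id)
print(str_id)                   # 9007199254740993
```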
String (Text) Features
Strings that resemble text (words separated by spaces) are processed into word lists for use in sparse modeling on text, trimmed to the first few words to save resources. Words that match values in Request.Properties may set additional text-matching features.
Some string features represent titles and descriptions. By default these are title and description (the fields can be configured per customer by Promoted). These strings get additional processing to include more words and to generate more complex string-matching features. By default, titles and descriptions are also used in content embedding transformations.
Identity Features
Integer values, float values with at most 2 digits of precision, and string-valued key paths create a "sparse" identity feature whose name follows the pattern keypath=value and whose value is 1. Identity features are used in sparse ML models. For example, valueScore.b=2 = 1.0.
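A minimal Python sketch of the identity-feature naming rule follows. The function identity_feature is hypothetical. The sketch reads "at most 2 digits of precision" as "at most 2 decimal places" and skips booleans, which the text does not mention; both are assumptions.

```python
import math
from decimal import Decimal
from typing import Optional, Tuple


def identity_feature(keypath: str, value: object) -> Optional[Tuple[str, float]]:
    """Return ("keypath=value", 1.0) if the value qualifies, else None.

    Hypothetical helper for illustration; not a Promoted API.
    """
    if isinstance(value, bool):
        return None  # assumption: booleans are not covered by this rule
    if isinstance(value, float):
        if not math.isfinite(value):
            return None
        # Count decimal places from the shortest repr of the float.
        decimal_places = -Decimal(repr(value)).as_tuple().exponent
        if decimal_places > 2:
            return None
    elif not isinstance(value, (int, str)):
        return None
    return (f"{keypath}={value}", 1.0)


print(identity_feature("valueScore.b", 2))       # ('valueScore.b=2', 1.0)
print(identity_feature("valueScore.a", 1.2222))  # None
print(identity_feature("country", "USA"))        # ('country=USA', 1.0)
```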
Maps
Each path in every map is a unique key path and generates at least one feature. For example, valueScore generates at least 3 features:
valueScore.a = 1.2222
valueScore.b = 2.0
valueScore.c = 1.8462813912476075
Additionally, a feature indicating whether the map is empty is set per map. For example:
subscores.empty = 1.0
valueScore.empty = 0.0
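The map handling can be sketched in Python as a recursive flattening step. The function flatten_map is hypothetical, and for brevity it only emits numeric and boolean leaves plus the per-map empty indicator; strings and lists are ignored here.

```python
from typing import Any, Dict


def flatten_map(record: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested maps into dotted key paths with numeric values.

    Hypothetical helper for illustration; not a Promoted API.
    """
    features: Dict[str, float] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # One indicator per map: 1.0 if the map is empty, else 0.0.
            features[f"{path}.empty"] = 0.0 if value else 1.0
            features.update(flatten_map(value, path))
        elif isinstance(value, bool):
            features[path] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            features[path] = float(value)
    return features


record = {
    "valueScore": {"c": 1.8462813912476075, "a": 1.2222, "b": 2},
    "subscores": {},
}
flat = flatten_map(record)
for name, value in sorted(flat.items()):
    print(name, "=", value)
```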
Lists
Two keypaths are generated for every item in a list: an ordered path and an unordered path. The ordered path uses the integer position in the path, and the unordered path uses | for every position in the path. For example:
categories.0=food = 1.0
categories.1=drinks = 1.0
categories.|=food = 1.0
categories.|=drinks = 1.0
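As a rough Python sketch of the ordered and unordered paths for a list of strings (the function list_identity_features is hypothetical, and it does not model any list-length limit):

```python
from typing import Dict, List


def list_identity_features(keypath: str, items: List[str]) -> Dict[str, float]:
    """Build ordered and unordered identity features for a list of strings.

    Hypothetical helper for illustration; not a Promoted API.
    """
    features: Dict[str, float] = {}
    for position, item in enumerate(items):
        features[f"{keypath}.{position}={item}"] = 1.0  # ordered path
        features[f"{keypath}.|={item}"] = 1.0           # unordered path
    return features


list_features = list_identity_features("categories", ["food", "drinks", "chicken"])
for name in list_features:
    print(name, "=", list_features[name])
```

The unordered path lets a model or rule match "chicken" in categories regardless of where it appears in the list, while the ordered path preserves position.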
Lists are limited to at most 100 items. If you need a longer list (e.g., to send a list of the most recently viewed pages), send this information to Promoted through the Measure system infrastructure instead, which is dramatically more efficient and operates in real time.
Timestamps
If a string looks like an ISO 8601 timestamp, or a float or int falls within the reasonable range of a Unix epoch timestamp in seconds or milliseconds, then Promoted automatically represents this feature as a time-from-now feature with the same name. For example, if the value of an entry in the Item CMS is created_at="2022-11-10", then the value of the feature created_at is the number of milliseconds elapsed between that timestamp and the time of Delivery, both when logged and when used in machine learning inference or blender rules. If Delivery happens 25 hours later, then the feature value is 25 * 60 * 60 * 1000 = 90,000,000. If 43.2 hours later, then 43.2 * 60 * 60 * 1000 = 155,520,000.
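The time-from-now arithmetic can be checked with a short Python sketch. The function time_from_now_ms is hypothetical, and treating the date-only value as midnight UTC is an assumption of the sketch. Note that 25 hours is 25 * 60 * 60 * 1000 = 90,000,000 milliseconds.

```python
from datetime import datetime, timedelta, timezone


def time_from_now_ms(timestamp: datetime, now: datetime) -> int:
    """Whole milliseconds elapsed between `timestamp` and `now`.

    Hypothetical helper for illustration; not a Promoted API.
    """
    return (now - timestamp) // timedelta(milliseconds=1)


# Assumption: a date-only value such as "2022-11-10" means midnight UTC.
created_at = datetime(2022, 11, 10, tzinfo=timezone.utc)

after_25h = time_from_now_ms(created_at, created_at + timedelta(hours=25))
after_43_2h = time_from_now_ms(created_at, created_at + timedelta(hours=43.2))
print(after_25h)    # 90000000
print(after_43_2h)  # 155520000
```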
Promoted will still create a "key=value"=1 identity feature for the exact value of the timestamp, as for other features.
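The numeric range is between 946684800 (1999-12-31 16:00:00) and 2528150400 (2050-02-10 16:00:00) for values in seconds, or those same bounds multiplied by 1000 for values in milliseconds. The dates shown correspond to UTC-8; in UTC the bounds are 2000-01-01 00:00:00 and 2050-02-11 00:00:00.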
Supported ISO 8601 timestamp formats include:
YYYY-MM-DDTHH:MM:SSZ
YYYY-MM-DD
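A minimal Python sketch of how such detection might work is shown below. The function parse_timestamp and the constant names are hypothetical, only the two listed string formats are tried, and interpreting values as UTC is an assumption. Promoted's actual detection logic may differ.

```python
from datetime import datetime, timezone
from typing import Optional

# Epoch bounds in seconds, taken from the numeric range stated in the text.
MIN_EPOCH_S = 946684800
MAX_EPOCH_S = 2528150400
ISO_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def parse_timestamp(value: object) -> Optional[datetime]:
    """Return a UTC datetime if `value` looks like a timestamp, else None.

    Hypothetical helper for illustration; not a Promoted API.
    """
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        if MIN_EPOCH_S <= value <= MAX_EPOCH_S:
            return datetime.fromtimestamp(value, tz=timezone.utc)
        if MIN_EPOCH_S * 1000 <= value <= MAX_EPOCH_S * 1000:
            return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
        return None
    if isinstance(value, str):
        for fmt in ISO_FORMATS:
            try:
                # Assumption: values without an offset are treated as UTC.
                return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
            except ValueError:
                continue
    return None


print(parse_timestamp("2022-11-10"))
print(parse_timestamp(946684800000))
print(parse_timestamp(34))  # None: outside the timestamp range
```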