Default Semantic Relevance Rubric
Our default rubric for scoring semantic relevance by human reviewers and with LLMs.
Promoted.ai combines statistical learning, which predicts clicks and conversions, with AI that scores semantic relevance. Here is our default rubric for human and LLM-powered relevance judgments. We work with our customers to customize this template for their specific domain and needs and for different product experiences (search versus recommendations versus related items). When we say "semantic relevance," we mean a judgment as defined by this rubric. Note that these rubrics do not consider clicks, sales, profitability, or any end-user feedback; judgments are based solely on the "creative" of the listing as it is visible to real end users. Depending on the domain, we can customize these templates to also consider search filters, search history, recent user search and engagement history, and page context like domain type or page title.
Relevance Rubric Instructions
You are a choosy shopper on the e-commerce retailer website Macys.com searching for products to buy. You are evaluating the semantic relevance of products to your search query. Given your search query on Macys.com and a product listing returned for that query, assign a relevance rating judgment of 1 to 5 according to the Relevance Scale rubric below.
Prioritize the name and brand. Assume that search queries are motivated by shopping intent for products on Macys.com. Prioritize a match on color, implied gender, and implied age or size, if any. Assume an adult is searching for adult products in a typical retail department store setting unless otherwise implied by the query. If a query is a broad category, brand, general item type, or general topic, assign at most a 4 rating and, all else being equal, a lower rating than for a more specific query. If relevance is impossible to define or a judgment would be an obvious content safety violation, then assign the rating "X."
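The two query-level overrides in this paragraph (the cap of 4 for broad queries and the "X" rating when relevance cannot be defined) can be sketched as a post-processing step on a proposed rating. This is a minimal illustration under stated assumptions, not Promoted.ai's implementation: the function name `apply_query_overrides` and the flags `query_is_broad` and `relevance_undefined` are hypothetical, and deciding whether a query is broad or undefined is still left to the reviewer or model. The sketch only enforces the "at most a 4" part of the broad-query rule; the softer "lower rating than for a more specific query" guidance remains a judgment call.

```python
from typing import Union

# A rating is an integer 1-5, or the string "X" for undefined relevance.
Rating = Union[int, str]


def apply_query_overrides(
    proposed: int,
    query_is_broad: bool,
    relevance_undefined: bool = False,
) -> Rating:
    """Apply the instruction-level overrides to a proposed 1-5 rating.

    All parameter names are hypothetical; they are not part of the rubric.
    """
    # Undefined relevance or a content safety problem always yields "X".
    if relevance_undefined:
        return "X"
    if not 1 <= proposed <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {proposed}")
    # Broad category, brand, item type, or topic queries are capped at 4.
    if query_is_broad:
        return min(proposed, 4)
    return proposed
```

For example, a proposed 5 for a broad query such as a bare category name would be reduced to 4, while the same proposal for a specific query would be left unchanged.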
Relevance Scale (1-5 or X):
1. Irrelevant: No relationship to the query. Never show. A mismatch or an error. A shopper would not identify any reason why this product could be tangentially related to this query. A shopper who sees this product returned by this query would think the e-commerce search is broken or wrong. The product must have no topical relevance to the query to be rated 1. A similar but different brand, a complementary product, a replacement product, or a relevant product with an obviously mismatched queried attribute like color or size has some topical relevance and is a 2, "Tangentially Relevant," not a 1, "Irrelevant." Even a small amount of general relevance is a 2 rating. A random product and query pairing is a 1 rating. For general or broad category queries, if the item is not in that broad match or category in a common e-commerce retail setting, it is a 1.
2. Tangentially Relevant: Minimally relevant. Filter or show last. The product is tangentially related to the query. A similar or competing brand, a closely complementary product, a replacement product, or a relevant product with an obviously mismatched queried attribute like color or size has some topical relevance and is a 2. A product that only mentions a topic deep in the description or in passing but not in the name or first two sentences of the description is, at most, a 2. An important attribute clearly defined in the query but clearly violated by the product may be a 2.
3. Domain Relevant: Broadly related but not a good match. Show, but do not show first. The query does not match the product’s primary marketed focus or descriptive name. It only matches a broader, general shared category. If a user chooses this product for this query, the choice suggests relevant benefits that are implied but not explicitly stated in the product details. The product might be either too general or too specific for the subject, type, attribute, or qualifying details in or implied by the query. If the query is specific, then this product may be a great substitute product or a close complement product that frequently follows a more relevant product match in search rankings but is never ranked above a more relevant product. If the query is broad or categorical, then this product may match an alternative but less common interpretation of the query that still has shopper intent.
4. Highly Relevant: A good match with substantial relevant information. Show in top results. A broad or general query without qualifications, like a product category or general item type query, is at most a 4. If the query is more specific or a "long tail" query, then minor inconsistencies, a lack of relevant emphasis in the name, incomplete information, or a lack of specific supporting details make it a 4, not a 5 rating. The query must closely match the product’s primary focus in its name, description, and brand to rate as a 4; otherwise, assign a lower 3 rating. If color is specified in the query, then the primary, dominant color of the item must match for a 4 rating, with slight tolerance for color name synonyms. Otherwise, assign a lower 3 rating.
5. Exactly Relevant: A perfect match. Show first. The query must be sufficiently specific, a "semantic" search, or a "long tail" search to achieve a 5 rating. Clear, consistent evidence in the name, brand, description, and supporting details makes this product the obvious best match for the query. Ideally, the name and the first sentence of the description emphasize this product’s relevance to the query in all important aspects. The product should be well-described with supporting evidence and detail. If uncertain, assign a more conservative 4 rating. If a specific and well-defined attribute is in the query, it must be explicitly in the product for a 5 rating. A skeptical shopper searching for this query on Macys.com would be delighted to get this product as the first search result. If color is specified in the query, then this color must be the primary color of the item for a 5 rating, with low tolerance for color synonyms.
X. Relevance Undefined: Relevance is not defined for this query for this task. This query is certainly an error, not a real query of a shopper searching for e-commerce products on Macys.com. The query is gibberish, nonsense, or has obvious content safety issues. Real shoppers would never search using this query. Do not use this query to train a semantic relevancy model or to decide which products best match which queries for Macys.com search.
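The scale above can be summarized as a small lookup table that pairs each rating with its label and its display guidance. The following is a sketch for downstream tooling, not part of the rubric itself: the names `ScaleEntry`, `RELEVANCE_SCALE`, and `usable_for_training` are hypothetical, and the guidance strings are shortened paraphrases of the scale text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleEntry:
    """One row of the relevance scale (hypothetical helper type)."""
    rating: str
    label: str
    guidance: str


# Labels and guidance are condensed from the scale definitions.
RELEVANCE_SCALE = {
    "1": ScaleEntry("1", "Irrelevant", "Never show"),
    "2": ScaleEntry("2", "Tangentially Relevant", "Filter or show last"),
    "3": ScaleEntry("3", "Domain Relevant", "Show, but do not show first"),
    "4": ScaleEntry("4", "Highly Relevant", "Show in top results"),
    "5": ScaleEntry("5", "Exactly Relevant", "Show first"),
    "X": ScaleEntry("X", "Relevance Undefined", "Exclude from training and ranking"),
}


def usable_for_training(rating: str) -> bool:
    """Return True unless the rating is "X" (undefined relevance)."""
    if rating not in RELEVANCE_SCALE:
        raise KeyError(f"unknown rating: {rating!r}")
    return rating != "X"
```

Keeping "X" as a separate key rather than a numeric value such as 0 avoids accidentally averaging undefined judgments together with real 1 to 5 ratings.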
Respond with a single-character rating of 1, 2, 3, 4, 5, or X.
Optional. Defend your rating using specific examples and explain why a higher or lower rating would not be a better choice.
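When this rubric is used as an LLM prompt, the response format above can be validated before the rating is stored. The sketch below is one possible parser, with several assumptions that are not stated in the rubric: the rating is the first non-whitespace character of the response, a lowercase "x" is accepted as "X", and any text after the rating is treated as the optional justification. The names `VALID_RATINGS` and `parse_rating_response` are hypothetical.

```python
from typing import Optional, Tuple

VALID_RATINGS = {"1", "2", "3", "4", "5", "X"}


def parse_rating_response(response: str) -> Tuple[str, Optional[str]]:
    """Split a model response into (rating, optional justification)."""
    text = response.strip()
    if not text:
        raise ValueError("empty response")
    rating = text[0].upper()
    if rating not in VALID_RATINGS:
        raise ValueError(f"invalid rating: {text[0]!r}")
    # Reject answers like "45" or "Xylophone", where the first
    # character is not a standalone rating.
    if len(text) > 1 and text[1].isalnum():
        raise ValueError(f"rating is not a single character: {text[:10]!r}")
    # Drop separators such as ".", ":", or "-" before the justification.
    justification = text[1:].lstrip(" .:-\n\t").strip() or None
    return rating, justification
```

With this sketch, a bare response of `4` yields the rating with no justification, while a response such as `3: Matches category only.` yields the rating and the justification text separately.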
Copyright 2024 Promoted.ai, Inc. All Rights Reserved.