Search Engineering Q&A

From a real Q&A session with a search engineering team

Q. We're a publicly traded marketplace with a strong engineering brand. Give us a technical deep dive: what are your strengths, and why would your product be better than what we already have?

Promoted Delivery is the 2nd-stage ranking and blending layer that all large companies eventually build in multi-stage search and discovery. Our implementation is efficient enough to run both search & feed ranking and performance ads in a single system, combining design features of both with cost savings in infrastructure, measurement definitions, and latency. You realize these benefits as +5-10% more organic revenue driven by more efficiently matching buyers and sellers.

Our product was designed for ads, but it also works especially well for large marketplace S&D (search and discovery) because each online marketplace listing is effectively a performance ad: they all compete with each other, the inventory is dynamic and ephemeral, and they all have competing commercial objectives.

Behind the scenes, Promoted’s personalized search and feed ranking is ads ranking with the ads auction removed. For search teams, it is a very efficient 2nd- and 3rd-stage ranker that uses real-time signals and user features to be “personalized.” In practice, the “personalization” aspect (i.e., user features in recommendation models) is only one contributor to performance; the bulk of the A/B impact comes from optimizing each response in real time and dynamically controlling the final allocation functions and rules. Even if you disabled all “personalization” (user features in models), you’d still see a benefit over standard retrieval ranking systems.

Ads-grade measurement for everything

While many think of Promoted as Delivery (our core value proposition is that we increase your revenue by 5-10% through better marketplace ranking), Promoted’s secret sauce is our measurement engine, powered by our own streaming data warehouse. It is somewhere between infeasible and absurdly expensive to use typical data systems like Snowflake to power scaled performance ads because of the data volume of IAB impressions and the need for joins to produce p(CTR) and p(CVR) model training examples, measurement attribution, and inputs to pacing. For search, the quality bar for data correctness and completeness can be much lower because the data may be used for internal analytics only. For ads, this measurement level is required to produce the calibrated p(CTR) used in bidding and for advertising reporting. Typically, the ads team makes these measurement engineering investments, but only for ads, and at tremendous expense to the engineering team.

However, of course, the search team also benefits from more, higher-quality measurement! The most direct benefit is more data for ranking models. Other benefits include more sophisticated supply and demand analytics and first-class seller SEO performance reporting, which can be integrated into your ads manager product. The ads team, in turn, benefits from better measurement in search: “sponsored search” is effectively “search,” so data about what happens in search is relevant to sponsored search performance.
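To make the join problem concrete, here is a toy in-memory sketch of the impression-click join that produces p(CTR) training labels. Everything here (the Event shape, the 30-minute attribution window) is an illustrative assumption, not Promoted’s implementation; at ads scale this runs as a streaming join that must handle out-of-order events, deduplication, and multi-touch attribution.

```python
# Toy impression-click join for p(CTR) training labels. Illustrative only:
# a real ads-scale pipeline is a streaming join, not an in-memory dict.
from dataclasses import dataclass

@dataclass
class Event:
    insertion_id: str  # one item shown in one response
    kind: str          # "impression" or "click"
    ts: float          # event time, seconds since epoch

def label_impressions(events, attribution_window_s=30 * 60):
    """Yield (insertion_id, label) rows: label = 1 iff a click on the same
    insertion follows its impression within the attribution window."""
    impressions, clicks = {}, {}
    for e in events:
        if e.kind == "impression":
            impressions[e.insertion_id] = e
        elif e.kind == "click":
            clicks.setdefault(e.insertion_id, []).append(e)
    for iid, imp in impressions.items():
        hits = clicks.get(iid, [])
        label = any(0 <= c.ts - imp.ts <= attribution_window_s for c in hits)
        yield iid, int(label)
```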

Any reasonable staff engineer can launch an XGBoost GBDT for p(CTR) scoring and a PID controller for a discount-pacing MVP in about a month each, absent the typical blockers in a large engineering organization. The engineering challenge is getting all the data used for scoring and pacing logged, joined, transformed, and served back in streaming live production, where the stakes are millions of dollars of other people’s money in an adversarial environment. Promoted provides this. This level of engineering is not required to run organic search: something like Elastic with a reasonable quality score and filters will deliver reasonable, stable results, and the “search product” will work well enough. This level of engineering is required for scaled performance ads, which are effectively wringing out the last 5-10% of matching efficiency as profit. Presumably, the engineering team has already been making the product as efficient as possible for years, so it’s reasonable that a large improvement would be hard-won.
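For a sense of scale, here is roughly what those two month-long MVPs look like. The XGBoost calls are real library API; the PID gains and pacing target are illustrative placeholders, not tuned production values.

```python
# Sketch of the two MVPs named above: a GBDT p(CTR) model and a PID pacer.
import xgboost as xgb

def train_pctr(X_train, y_train):
    """Train a GBDT p(CTR) model on joined impression/click examples.
    model.predict_proba(X)[:, 1] then yields p(CTR) estimates."""
    model = xgb.XGBClassifier(objective="binary:logistic",
                              n_estimators=200, max_depth=6)
    model.fit(X_train, y_train)
    return model

class PIDPacer:
    """PID controller that nudges a bid multiplier so actual spend tracks
    a pacing target (e.g., fraction of daily budget spent so far)."""
    def __init__(self, kp=0.5, ki=0.05, kd=0.1):  # illustrative gains
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_spend_frac, actual_spend_frac):
        error = target_spend_frac - actual_spend_frac
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # >1 speeds up delivery when behind pace, <1 slows it when ahead.
        m = 1.0 + self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, m)
```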

Ultimately, the search team does not need to use Promoted’s allocation if you are using Promoted for ads, even if you send both search and ads data to Promoted via the Delivery and Metrics APIs. You can use Promoted only for unified data processing and for ads, and we fully support shadow traffic (non-blocking production test traffic) and input rank echoing. We return all components of our scores, like p(CTR) and p(CVR), for use in your own reranking function on top of Promoted, if you choose to do this. Because we use model stacking, we model our ranking on top of your best ranking, so it is unlikely to underperform your ranking. You can A/B test Promoted search ranking to verify performance at any time.
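As an illustration of a reranking function on top of Promoted, here is a minimal sketch. The field names (p_ctr, p_cvr, avg_order_value) are hypothetical placeholders, not Promoted’s actual response schema.

```python
# Minimal custom reranker over returned score components. Field names
# (p_ctr, p_cvr, avg_order_value) are hypothetical, not the real schema.
def rerank(insertions: list[dict], new_user: bool) -> list[dict]:
    """Re-sort insertions with your own utility: prioritize any conversion
    for new users, expected GMV for everyone else."""
    def utility(ins: dict) -> float:
        expected_conversions = ins["p_ctr"] * ins["p_cvr"]
        if new_user:
            return expected_conversions
        return expected_conversions * ins["avg_order_value"]
    return sorted(insertions, key=utility, reverse=True)
```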

Q. How do you plan to solve for availability?

We do this already for Outschool and Hipcamp, which have severe availability challenges (live classes with limited slots and vacation property rentals). We handle this in three ways:

  1. We are built on top of your existing search retrieval logic. While we may make multiple calls to your retrieval system and over-fetch, we only show items that your other systems already permit to be shown.
  2. Send us “availability” as features live on insertions in the Delivery API if real-time is critical, or asynchronously via the Item CMS if a delay of a few seconds is acceptable, which saves the serde processing costs of passing features on the Delivery API. We use these features to predict clicks and conversions for search and ads ranking. If “availability” signals are timestamps, we convert them to time-to-now with some sine/cosine transformations (see the sketch after this list). We attempt to use all information you send us about items in all models.
  3. We support more advanced forms of “availability optimization” in joint-modeling collaborations. For example, Outschool has a concept of recurring subscriptions versus one-time bookings, and they provide an LTV model to our models to account for this in optimization.
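For concreteness, here is one common way to encode an availability timestamp as model features: a time-to-now recency scalar plus sine/cosine encodings of daily and weekly cycles, so the model treats 23:59 and 00:01 as close. The period choices are illustrative assumptions, not Promoted’s exact transform.

```python
import math
import time

def availability_features(availability_ts: float, now: float | None = None):
    """Turn an availability timestamp into features: recency (time-to-now)
    plus cyclical sine/cosine encodings of daily and weekly periods."""
    now = time.time() if now is None else now
    feats = {"hours_to_now": (now - availability_ts) / 3600.0}
    for name, period_s in (("day", 86400.0), ("week", 7 * 86400.0)):
        angle = 2 * math.pi * (availability_ts % period_s) / period_s
        feats[f"{name}_sin"] = math.sin(angle)
        feats[f"{name}_cos"] = math.cos(angle)
    return feats
```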

Q. How do you understand the intent of a query in a 1- vs. 2-sided marketplace?

If by “intent” you mean semantic modeling, we re-use whatever models your team has already created to predict p(CTR), p(CVR), and the utility function optimizations used in search and ads ranking. We have some query-to-content modeling capabilities, and we can support text embeddings if you can cover the infrastructure costs of using them in 2nd-stage ranking. Also, since we are built on top of your existing retrieval systems, we don’t need to re-implement business logic about what is “allowed” to be shown.

Otherwise, Promoted functions by training on user engagement, primarily views, clicks, and sales.

If by “intent” you mean the distinction between intent in 1- vs. 2-sided marketplaces, the biggest difference is that in a 2-sided marketplace, both the searcher and the searchee have intent: the stakeholder of every listing wants to be “first.” The intent is to match, which is two-sided, not to select the best, which is one-sided. One challenge for marketplace product teams is communicating to stakeholders that appearing in search is not a “reward” for being “good.” It is a limited, competitive resource that must be allocated with exploration to match fairly and keep the marketplace healthy according to buyer feedback.

Sellers and merchandising teams invariably dislike this answer, which is why ads are useful for managing mature marketplaces: if you want more volume, then you pay a higher effective take rate. It’s bad management to capitulate to seller complaints for increased volume. This rewards corruption and your most expensive support cases at the expense of your business. This practice kills marketplaces. However, not having ads in your seller-management playbook to address the demand for increased seller volume and control can feel like stonewalling and gaslighting your sellers and business teams. That is better than corruption and business failure, but it is still undesirable to leave unmanaged.

Another challenge in managing seller-side intent is that marketplaces must try to deliver every item. In one-sided search, burying all but the top 0.1% of performing materials may be acceptable. In a marketplace, a listing that gets no traffic will delist. Delisting should be intentional, to manage supply, demand, and provider quality, not a consequence of a poorly functioning S&D marketplace product. In marketplace ads, the expectation of some best-effort delivery is even more explicit. The consequence of these competing intents between buyers and sellers is high variance in results between similar queries, because every listing rotates into top slots for some minimum exposure. Even observing this variance, let alone maximizing a business objective subject to it, requires a dramatic increase in measurement data infrastructure compared to stable rankings. Promoted provides this measurement data infrastructure.

Q. What's your POV on ads vs. search ranking (boosting vs. slotting vs. carousels, etc.) and the optimization function?

Carousel:
The limited resource is buyer attention, not “slots.” If the idea is “how do we pack in more ads without taking slots,” then this seems like a too-clever solution to a “fixed slots” organizational issue. My POV is that unless carousels would make sense for non-ads, don’t do them for ads. Unless carousels materially improve the end-user experience, you’re not gaining anything except additional implementation and optimization complexity. If a carousel is a natural way to feature a category of content, which could be promoted items but could also be other categories or multiple views of the same item, then carousels could be reasonable.

Boosting:
This is effectively an ads retrieval implementation, not an ad allocation solution: without allocation limits, all top items will be ads, and that’s probably not desirable. That caveat aside, yes, this is a reasonable implementation of “sponsored search” because it re-uses business logic rules already implemented in the organic product.

(Fixed) Slotting:
“Slotting” is good because it’s efficient and easy: you can run search 2nd-stage ranking and ads ranking independently and in parallel. This saves end-user latency and decouples engineering system dependencies. It’s easy to implement and understand, and it’s easiest to coordinate organizationally. Ads slots are frequently modeled as “product concessions” by the organic team. When ads revenue goals are under pressure, having fixed ad slots pre-negotiated at the beginning of the quarter is a reasonable failsafe against cannibalizing your core business with a spiraling ad load: many “reasons” can be pushed that always seem to result in “more ads” in the iterative game of “meeting revenue goals” over time. That’s bad. Explaining fixed ad slots to external stakeholders like advertisers, brands, and executive management is also easy.

Slotting is bad because it’s technically inefficient. It’s not hard to imagine how ads and search systems that don’t know about each other can produce allocation inefficiencies, and why dynamic ad load can be more efficient depending on market dynamics and how ad quality evolves. However, moving away from slotting is “hard mode.” Your engineering organization must be ready for the dramatic increase in operational overhead needed to justify the potential gains in technical efficiency. Promoted requires unified organic+ads measurement before any dynamic ad load optimization.

Optimization Function:
Typically, this means “ranking function,” or “the scoring formula you sort by.” Sorting is a simple allocation function, but there are more complex functions.

Short answer: by default, we rank by p(click)*p(gmv|click), where “click” is a click attributed to a search result impression and “gmv” is the value of a purchase attributed to that click. For ads, bids are converted to insertion bids depending on the optimization objective and the bidding function, then allocated into slots. Our customers frequently add additional rules like filters, boosters, or a utility function that prioritizes “conversion” versus “GMV, i.e., the value of the conversion” for new users.
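A minimal sketch of that default, assuming calibrated model outputs. The bid conversion shown is one simplified example (CPC bid to expected cost per impression), not Promoted’s full bidding function.

```python
def organic_score(p_click: float, p_gmv_given_click: float) -> float:
    """Default organic utility: expected GMV of showing this item."""
    return p_click * p_gmv_given_click

def insertion_bid(cpc_bid: float, p_click: float) -> float:
    """Simplified CPC-to-insertion-bid conversion: expected cost per
    impression, comparable across candidates for slot allocation."""
    return cpc_bid * p_click

# Example: the lower-CTR item wins because its expected GMV is higher.
a = organic_score(p_click=0.05, p_gmv_given_click=40.0)   # 2.0
b = organic_score(p_click=0.02, p_gmv_given_click=150.0)  # 3.0
assert b > a
```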

Longer answer: allocation functions can be complex and include variables that are not item-independent, like diversity or spacing controls or ad slot placements. We collect all “allocation parameters,” including utility function parameters, in a system called “blender” for management and, ultimately, joint automated optimization.
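As a hypothetical illustration of why centralizing these parameters matters, here is one possible shape for them. Every field name is an assumption for illustration, not the blender’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class BlenderParams:
    """All allocation knobs in one place, so they can be managed together
    and eventually jointly optimized against A/B metrics, rather than
    tuned one-off across scattered services."""
    # Utility function parameters (item-independent).
    gmv_weight: float = 1.0               # weight on p(gmv|click)
    new_user_conversion_boost: float = 1.5
    # Page-level allocation rules (not item-independent).
    max_ads_per_page: int = 2             # ad load cap
    min_organic_between_ads: int = 4      # spacing control
    same_seller_spacing: int = 5          # diversity control
```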