Behind the paper: Anand Muralidhar detects robotic clicks on advertising


Customer trust is a priority for Amazon, so there is no room for fraudulent clicks on advertising on its platforms. As bad actors have gotten more sophisticated at programming bots to impersonate humans clicking on ads, Amazon Ads has leveled up its security. The resulting model is described in this article and in a paper that was presented at the 2023 Conference on Innovative Applications of Artificial Intelligence, part of the annual meeting of the Association for the Advancement of Artificial Intelligence.

Lead author and Amazon Ads Principal Scientist Anand Muralidhar, who has a PhD in electrical and computer engineering from the University of Illinois Urbana-Champaign, talks here about the paper and his current research.

Why did you join Amazon Ads?

Right in the beginning, to be honest, I wasn't familiar with the kind of work that Amazon Ads does. In 2016, I was winding down my work on a startup and looking for a role that would allow me to work on machine learning models, and this role popped up. I got into it without an idea of the scale or the complexity, so it was a welcome surprise once I joined the team.

What areas of research do you focus on?

My research focus has evolved. I spent maybe the first three-quarters of my Amazon career detecting robotic traffic. In the last couple of years or so, I've started looking at contextual advertising. That's an important area of focus for Amazon Ads, as we try to understand the content of a web page or app that a user is looking at and then show ads matched to that. So if you're on a page that's talking about a recipe for, let's say, chocolate cake, then I want to show you ads related to chocolate cake—maybe a baking pan, butter, chocolate chips, and so on.

What is robotic traffic, and why does it happen?

Every day on Amazon.com, we show billions of ads that receive millions of clicks, and we charge advertisers every time somebody clicks on an ad. There are some unscrupulous elements on the web that want to exploit this, and they build robots to click on these ads.

There could be a variety of reasons to build a robot to click on ads. Let's say you want to exhaust the advertising budget of a seller of wrist watches. When somebody searches for watches on Amazon and the seller's watches show up as ads, if a robot clicks on every such ad, the seller's advertising budget will deplete very soon with no human ever seeing an ad. Another example of robotic traffic is when a robot drives up ad rankings for a product through clicks even if other ads are more relevant for a search query. This could confuse machine learning systems and inadvertently boost rankings.

People who come up with these robots have become very sophisticated, and they keep improving and evolving their algorithms.

How does your paper, "Real-Time Detection of Robotic Traffic in Online Advertising," address this problem?

This particular paper talks about a machine learning model to identify such robotic traffic: slice-level detection of robots (SLIDR). SLIDR runs in real time, and it looks at every click that is made on Amazon.com by somebody who views an ad. It figures out whether the click came from a human or a robot—and if it's from a robot, we do not charge the advertiser.

SLIDR was deployed in 2021, and it processes tens of millions of clicks and a few billion impressions every day. Today, deploying a deep learning model may not sound like such a big deal because everybody does it. But when we started this in 2020, it was probably the first model of its kind running at that scale on Amazon.com, and it was quite a challenge for us.

The SLIDR model looks at slices of traffic coming from different devices, such as a desktop, mobile app, or mobile web. Each of these needs to be handled differently to get the best performance from the system, and we came up with some techniques to do that. Also, we realized over time that we needed guardrails to ensure that when we deploy these systems in production, nothing goes wrong and we always have a fail-safe mode. The paper also has a few other technical details on how we set up the problem: the model architecture, the kind of metrics we use to evaluate performance, how the model works on different slices of traffic, and so on.
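
To make the slice-level idea concrete, here is a minimal sketch of per-slice scoring with a fail-safe default. It is not the production SLIDR system: the class names, features, weights, thresholds, and the fall-back behavior are all illustrative assumptions.

```python
# Minimal sketch (not the production SLIDR system) of slice-level scoring:
# each device slice gets its own model and threshold, with a fail-safe when
# a slice's model is missing. All names and values here are hypothetical.

import math
from dataclasses import dataclass

@dataclass
class ClickEvent:
    slice_name: str        # e.g. "desktop", "mobile_app", "mobile_web"
    features: list[float]  # engineered features for this click

class SliceModel:
    """Stand-in for a trained per-slice classifier with its own threshold."""
    def __init__(self, weights: list[float], threshold: float):
        self.weights = weights
        self.threshold = threshold

    def score(self, features: list[float]) -> float:
        # Logistic score as a placeholder for P(robot | features).
        z = sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

def is_robotic(click: ClickEvent, models: dict[str, SliceModel]) -> bool:
    model = models.get(click.slice_name)
    if model is None:
        # Guardrail: with no healthy model for this slice, fall back to a
        # conservative default (here, "human") rather than failing the request.
        # The actual fail-safe behavior is an assumption for illustration.
        return False
    return model.score(click.features) >= model.threshold

# Example usage with made-up weights and thresholds per slice.
models = {
    "desktop": SliceModel(weights=[0.8, -0.2], threshold=0.7),
    "mobile_app": SliceModel(weights=[0.5, -0.1], threshold=0.6),
}
click = ClickEvent(slice_name="desktop", features=[3.0, 1.0])
print(is_robotic(click, models))
```

The point of the sketch is the structure: each device slice carries its own model and decision threshold, and a guardrail decides what happens when a slice cannot be scored.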

What is exciting about this research and its impact?

SLIDR ends up saving advertisers money that would otherwise have been wasted on robotic clicks.

Another important thing is the scale: There are very few systems that match up to Amazon Ads in this regard. Even when people talk about building models for big data, they don't really run those models at that scale.

This is one of the wonderful things about working at Amazon Ads—you work with data at a scale that is quite unimaginable. We deal with billions of records in a day, and it becomes a huge amount of data over a month. So the kinds of models that we build need to be robust, very efficient, and closely monitored. At the same time, we use machine learning, so we also need to guarantee performance based on whatever metrics we’ve picked.

All of this makes it a fairly challenging and exciting space to work in. We end up seeing a lot of quirkiness in the data, which you will not see if you're just doing theoretical research or working with a proof of concept. Only when you start running things at this scale, where even a small movement in your model's performance can have a huge impact on Amazon's revenue or a customer's budget, does the complexity become apparent.

One more impact of this research was that it gave us a lot of confidence in how to deploy deep learning models in a production framework. Before this, we had no experience doing it, and we weren't sure how to pull it off. Now we're very comfortable running deep learning models at scale, and that was a fairly big jump for us.

Why did your team decide to pursue the SLIDR model?

Some of the initial solutions my team built for identifying robotic traffic were based on relatively simple rules that became quite complex over time. We were tracking various parameters such as the rate at which a particular IP address or user was making clicks and how many clicks were made in the last few hours, last few minutes, last few seconds, and so on.
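
As an illustration of that kind of rule (with made-up window lengths and limits, not Amazon's actual values), a sliding-window rate check might look roughly like this:

```python
# Hypothetical sketch of a handcrafted rate rule: count clicks per IP over
# several sliding windows and flag a click when any window exceeds its limit.
# Window lengths and limits are illustrative only.

from collections import defaultdict, deque

# (window length in seconds, max clicks allowed inside that window)
RATE_LIMITS = [(10, 5), (60, 20), (3600, 200)]

_clicks = defaultdict(deque)  # ip -> timestamps of recent clicks

def is_suspicious(ip: str, now: float) -> bool:
    times = _clicks[ip]
    times.append(now)

    # Drop clicks older than the longest window we track.
    longest = max(window for window, _ in RATE_LIMITS)
    while times and now - times[0] > longest:
        times.popleft()

    # Flag if any window's click count exceeds its limit.
    return any(
        sum(1 for t in times if now - t <= window) > limit
        for window, limit in RATE_LIMITS
    )

# Example: a burst of clicks from one IP within a few seconds.
print([is_suspicious("198.51.100.7", t) for t in range(8)])
```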

As Amazon Ads grew, so did the scale of the robotic traffic and the complexity of the algorithms that robot perpetrators were using. We realized that the rules we had in place were not scaling to match the challenge, and calibrating them manually every year or maybe every quarter was a fairly time-consuming exercise.

This led us to ask whether we should transition from handcrafted rules to a machine learning model. It was a challenging problem, not only because of the scale but also because of the real-time nature: we have just a few milliseconds to evaluate clicks as they happen. We built some models called gradient-boosted trees, which ran quite successfully for a couple of years. But then we experienced the deep learning wave, which provided an opportunity to take our models to the next level. These models continue to evolve, and we're building more complex techniques that can distinguish human clicks from robotic clicks even better.
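
For readers who want a concrete picture, here is a toy gradient-boosted-tree classifier on synthetic click features, using scikit-learn. The features, labels, and data are invented for illustration and say nothing about the real system's inputs or latency constraints.

```python
# Toy gradient-boosted-tree classifier on synthetic "click" features.
# Everything here (features, labels, data) is made up for illustration.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical features: clicks in the last minute, seconds since last click,
# page dwell time in seconds (purely illustrative).
X = np.column_stack([
    rng.poisson(2, n),
    rng.exponential(30, n),
    rng.exponential(20, n),
])
# Synthetic label: fast, high-rate clickers are marked "robotic".
y = ((X[:, 0] >= 3) & (X[:, 1] < 20)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```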

You mentioned being pleasantly surprised at the scale and complexity of Amazon Ads when you joined. What else have you noticed?

You might think scientists are sitting in their corner developing machine learning models and then just writing a spec for deployment and giving it to engineers who are sitting somewhere else. But that's not the case. Here, all of us are sitting on the same floor right next to each other, and that makes it a very interesting environment where we can iterate on ideas in tandem with the engineers.

Our team has built frameworks that allow the scientists to deploy a model in the production system with minimal effort. The cycle from coming up with a model concept to deploying it in production used to span many, many months, but now we've reduced it to a few weeks. Somebody can come up with a fantastic new idea or a new machine learning model, quickly test it, and launch it in production, and it will be running live. That's fantastic because it allows somebody to see the impact of what they've done in a very short period. I don't think that kind of opportunity is available elsewhere, where you can truly move the needle on a business that is measured in billions of dollars.

How are you re-imagining advertising in your role?

As internet browsers continue to move away from third-party cookies, my research has moved to contextual ads. For these, we identify the main topic, content, and top keywords of a web page and show the most appropriate ad based on that information. It's our responsibility at Amazon to ensure that the advertisers who are placing their trust in us continue to get the same performance as before.
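
A toy sketch of contextual matching, using TF-IDF and cosine similarity to pick the ad closest to a page's text, looks like this; it is only an illustration of the idea, not the technique used in production.

```python
# Toy contextual matching: represent the page text and candidate ads as
# TF-IDF vectors and pick the ad with the highest cosine similarity.
# This is an illustration of the idea, not the production approach.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

page = "An easy chocolate cake recipe with butter, cocoa, and chocolate chips."
ads = [
    "Nonstick round baking pan for cakes and brownies",
    "Semi-sweet chocolate chips, 12 oz bag",
    "Stainless steel wrist watch with leather strap",
]

vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([page] + ads)
scores = cosine_similarity(vectors[0], vectors[1:]).ravel()

best = scores.argmax()
print("best ad:", ads[best], "score:", round(float(scores[best]), 3))
```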

I’m excited that we’re driving innovation in the space of contextual ads by using state-of-the-art AI techniques to deliver the best experience for both the advertiser and the user.