How to Secure Your Site Against Web Scrapers

 1 year ago
source link: https://blog.bitsrc.io/how-to-secure-your-site-against-web-scrapers-the-best-browser-fingerprinting-techniques-9b0a996efb91

The Best Browser Fingerprinting Techniques. Learn how easy it is to implement your own fingerprint code in minutes!

[Image: photo by George Prentzas on Unsplash]

Browser fingerprinting is a security technique used in many different situations where you want to uniquely identify a user, no matter who the user is telling you they are.

One of those situations is, of course, when they’re scraping your website. If you rely on limited detection techniques, such as checking the User-Agent header or the IP address, they can easily change that information and you’ll be none the wiser.

On the other hand, if you implement something like browser fingerprinting, you’ll make their job a lot more difficult, because you’ll be relying on information that is much harder for them to fake.

In this article, I’m going to cover some of the most common techniques for creating this so-called “browser fingerprint”, and then I’ll show you how to implement one of them.

Common browser fingerprinting techniques

I know it sounds like a fancy thing, but a fingerprint is nothing more than a hash created from a certain set of input data.

The real trick is how you get to it.
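
To make the idea concrete, here is a minimal sketch that reduces a set of data points to one integer. The simpleHash and fingerprintFrom names, and the sample data points, are illustrative assumptions rather than anything from a specific library, and the 32-bit string hash is just one easy option among many.

```javascript
// Illustrative 32-bit string hash (same scheme as Java's String.hashCode).
function simpleHash(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash << 5) - hash + text.charCodeAt(i); // hash * 31 + char code
    hash |= 0; // clamp to a signed 32-bit integer
  }
  return hash;
}

// Hypothetical helper: serialize the data points in a stable key order, then hash.
function fingerprintFrom(dataPoints) {
  const canonical = Object.keys(dataPoints)
    .sort()
    .map((key) => `${key}=${dataPoints[key]}`)
    .join("|");
  return simpleHash(canonical);
}

// Example with made-up values; in a browser these would come from real APIs.
const sample = fingerprintFrom({ timezone: "UTC", language: "en-US" });
console.log(sample);
```

Sorting the keys first means the same set of data points always produces the same value, regardless of the order in which they were collected.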

Using the Canvas element

One way of getting that fingerprint is by drawing something without the user seeing it and checking the result.

You see, each browser renders content on the Canvas element with slight variations. The differences are subtle, so they’re really hard to spot if you’re not looking for them.

This particular use case takes advantage of several facts:

  • Fonts aren’t available on all systems, so other fallback options have to be used in those cases.
  • The anti-aliasing applied is not always the same.
  • Some other minor details based on the implementation of your browser.

Once rendered, the image needs to be turned into a hash (a single compact value derived from the image data), and you have your fingerprint.

Best part? The canvas element can be hidden, so the user doesn’t really know what you’re doing.

Media device list

Another way of fingerprinting a browser is to gather a list of the media devices connected to the system and create the hash based on that information.

How can you do that? It’s actually pretty easy: all modern browsers expose a Media Devices API (navigator.mediaDevices.enumerateDevices()) that lets you list the connected devices.

The following code lists all media devices from your system:

The code was taken from the official MDN documentation, and if I run it on my Firefox, I get this output:

[Screenshot: the list of media devices printed by Firefox]

I’m sure you can see how this can be used to construct a single hash value that can identify you in future requests, even if you happen to change the configuration of your browser.
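
As a rough sketch of that idea, the device list could be reduced to one value along these lines. The deviceSignature, mediaDeviceFingerprint and simpleHash names are hypothetical, and the choice of fields (kind, label, deviceId) is an assumption. Keep in mind that browsers return empty labels until the user grants media permissions, and that deviceId values can be reset when the user clears their cookies, so treat this as a starting point rather than a robust identifier.

```javascript
// Illustrative 32-bit string hash; any hashing method could be swapped in.
function simpleHash(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash << 5) - hash + text.charCodeAt(i);
    hash |= 0; // clamp to a signed 32-bit integer
  }
  return hash;
}

// Hypothetical helper: build an order-independent signature from device entries.
function deviceSignature(devices) {
  return devices
    .map((device) => `${device.kind}:${device.label}:${device.deviceId}`)
    .sort()
    .join("|");
}

// Browser-only part: enumerateDevices() resolves to an array of MediaDeviceInfo objects.
async function mediaDeviceFingerprint() {
  if (typeof navigator === "undefined" || !navigator.mediaDevices?.enumerateDevices) {
    return null; // API not available (for example, outside a browser)
  }
  const devices = await navigator.mediaDevices.enumerateDevices();
  return simpleHash(deviceSignature(devices));
}
```

In a page you would call mediaDeviceFingerprint().then(console.log) and send the resulting integer along with your requests.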

Audio wave fingerprinting

Just like with the canvas element technique, you could potentially generate an audio clip using the Web Audio API based on a fixed source (an oscillator that would give all systems the same input) and then calculate the hash of the output.

You see, the Web Audio API is complex and involves a lot of floating-point math, which means each browser has a slightly different implementation depending on who coded it and the OS it’s meant for. Those differences add up to the point where they produce a distinctive value that is consistent across executions.

In other words, the value differs from one browser and OS combination to another, but it stays the same as long as you keep calculating it on the same browser + OS combo.

This is enough to use that value as a reliable fingerprint, especially so because it remains the same even if you’re using the incognito mode on your browser.
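
Here is a minimal sketch of how that could look, assuming an OfflineAudioContext so that nothing is actually played. The function names, the triangle wave at 10 kHz, the compressor node and the sample range being summed are all arbitrary choices made for illustration; they are not prescribed by the Web Audio API.

```javascript
// Sum the absolute values of a slice of samples to get one number.
function sumSamples(samples, start, end) {
  let total = 0;
  for (let i = start; i < end && i < samples.length; i++) {
    total += Math.abs(samples[i]);
  }
  return total;
}

// Browser-only: render a fixed oscillator signal off-screen and reduce it to a number.
async function audioFingerprint() {
  if (typeof OfflineAudioContext === "undefined") {
    return null; // Web Audio API not available (for example, outside a browser)
  }
  // 1 channel, 44100 frames, 44100 Hz sample rate: one second of audio.
  const context = new OfflineAudioContext(1, 44100, 44100);

  const oscillator = context.createOscillator();
  oscillator.type = "triangle";
  oscillator.frequency.value = 10000; // same fixed input on every system

  // Extra processing gives implementations more room to differ (an assumption).
  const compressor = context.createDynamicsCompressor();
  oscillator.connect(compressor);
  compressor.connect(context.destination);
  oscillator.start(0);

  const buffer = await context.startRendering();
  return sumSamples(buffer.getChannelData(0), 4500, 5000);
}
```

The resulting number can be used directly, or turned into a string and hashed together with other data points.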

Implementing your own fingerprinting code

Now that you understand some of the most common fingerprinting techniques, let’s quickly go over an example of how you’d go about implementing one of them.

You already saw some code for the media devices list, but I’ll focus on the canvas element one, because I think it’s more interesting than collecting a lot of data points.

Remember, this technique requires us to draw a picture with text inside a canvas element, and then turn that drawing into a single hash.

All we have to do is define a canvas, draw some shapes into it, add some text, and then call the toDataURL method on it. By default, it’ll encode our image as a PNG and return it as a data URL string, which we’ll then iterate over, character by character, and turn into a single hash value.

In particular, our “hash” is going to be a single integer. You can use your own hashing method; I just picked one that was easy to code and fast.

Original code taken from there and modified

This code generates an output like this:

[Screenshot: the rendered canvas and the resulting hash in Chrome, Chrome in incognito mode, and Firefox]

Of course, the actual image is quite ugly. We’re not trying to make something pretty; we only care about the hash. Notice how the value is the same for Chrome and Chrome in incognito mode, but different for Firefox.

This is key, because it means you’d be able to know when your crawler/scraper tries to “fool you” by simply using an incognito mode on their browser.
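
The steps described earlier (define a canvas, draw some shapes, add some text, call toDataURL, and reduce the result to an integer) could look roughly like the following. This is an illustrative sketch under my own assumptions, not the listing from the original post: the function names, colors, font, text and canvas size are all made up, and the integer hash is the simple 32-bit string hash used for convenience.

```javascript
// Illustrative 32-bit string hash: easy to code and fast, not cryptographic.
function simpleHash(text) {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash << 5) - hash + text.charCodeAt(i);
    hash |= 0; // clamp to a signed 32-bit integer
  }
  return hash;
}

// Browser-only: draw shapes and text on an off-screen canvas, then hash the PNG data URL.
function canvasFingerprint() {
  if (typeof document === "undefined") return null; // not running in a browser
  const canvas = document.createElement("canvas"); // never attached to the page
  canvas.width = 240;
  canvas.height = 60;
  const ctx = canvas.getContext("2d");
  if (!ctx) return null; // 2D context unavailable

  // A colored rectangle plus overlapping text exercises fonts and anti-aliasing.
  ctx.fillStyle = "#f60";
  ctx.fillRect(100, 5, 80, 30);
  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#069";
  ctx.fillText("Browser fingerprint, 123!", 4, 20);
  ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
  ctx.fillText("Browser fingerprint, 123!", 6, 22);

  // toDataURL() defaults to image/png and returns a string we can hash.
  return simpleHash(canvas.toDataURL());
}
```

Because the canvas is created with document.createElement and never added to the DOM, the user sees nothing while the value is computed.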

Granted, they could change browsers and then you’d be none the wiser, but scrapers/crawlers usually stick to a single browser that tries to pass itself off as something else (through the User-Agent string).

In this situation, you could even protect against a headless browser trying to crawl/scrape your website. Simply have a script that executes the above code on a hidden Canvas element, gets the hash, and compares it against your own list of “blacklisted” values. If the hash is on the list, then you know what to do!
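
The comparison itself can be as small as a set lookup. In this sketch the blockedFingerprints values are made up and isBlocked is a hypothetical helper. Where the check runs is a design choice: a check done only in the browser can be skipped by a scraper that controls the page, so you would probably want to send the hash to your server and compare it there.

```javascript
// Hypothetical list of fingerprint hashes previously seen from unwanted clients.
const blockedFingerprints = new Set([1982347123, -553124987]);

// Returns true when the given fingerprint is on the list.
function isBlocked(fingerprint, blocklist = blockedFingerprints) {
  return blocklist.has(fingerprint);
}

console.log(isBlocked(1982347123)); // true
console.log(isBlocked(42)); // false
```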

Browser fingerprinting can be a fantastic tool to protect you from users trying to interact with your system in ways you don’t want them to. It covers everything from stopping people who try to scrape your site to making sure users don’t get around your paywall or that they only access your content a limited number of times. Really, your imagination is the limit here.

And as you can see, while the theory sounds complex, the actual implementation isn’t. You can have your own fingerprinting code in no time.

Have you used one of these techniques before? Or even better, have you had to fool one of them in the past? How did you do it? Share your stories in the comments!
