What Does the LinkedIn / hiQ Ruling Mean For Web Scrapers?


A few weeks ago, a California federal court issued an injunction that has some serious implications for SEOs of all stripes. If you’re into reading legal text, you can check out the ruling here, but if you prefer the tl;dr version, allow me:

Basically, Judge Edward Chen said that once a website like LinkedIn allows users to make data public, it can’t then decide that certain individuals or groups are not allowed to access that data.

Get it?  In other words: if it’s public data, it’s public data.

The backstory

hiQ has a business model that relies on scraping massive amounts of data from public LinkedIn profiles. It then sells that data to employers who might want to know, for example, whether any of their people are actively looking for other jobs.

But LinkedIn doesn’t like that idea. So it argued that hiQ was breaking the Computer Fraud and Abuse Act of… 1986.

Yes, 1986. Before the World Wide Web existed.

The law

The CFAA is pretty clear on what’s illegal. Basically, you break the law if you access a protected computer without authorization, or “exceed authorized access,” to obtain information from it. The CFAA was enacted by Congress in the days when hacking was the big new scary threat. But does it still work in the internet age?

Well, broadly, yes. The CFAA has been cited multiple times and enforced against people taking information from web servers, including Facebook’s. However, Judge Chen astutely notes that those were instances where the defendants were accessing data protected by passwords.

Chen then agrees that if the data being accessed is literally made public by the website and user, then the CFAA doesn’t apply.

Chen further asserts that if the CFAA were allowed to apply to publicly viewable data it would potentially allow websites to “weaponize… criminal sanctions” against any user they wanted to.

Robots.txt and user agreements

Interestingly, though it wasn’t the focus of Chen’s order, the ruling seems to indicate that neither the robots.txt file, nor IP blocking, nor the User Agreement are sufficient to prevent data scraping of public information.

Of course, one need not have a profile on LinkedIn to access the public profiles on the site. This is pretty important because if a site hid profile information until you signed up as a user yourself (as many forums do) then the whole ruling might have been different.

What does it mean for scrapers?

If you do anything in SEO, you’ve probably done at least some data scraping. Whether it’s a simple crawl with Screaming Frog or a full-on extraction tool you wrote yourself, getting key data off of big sites is essential to doing our work.

But if the site’s robots.txt blocks your scraper… can you keep going anyway?

Look, I’m not a lawyer, so I will not give you legal advice on your situation. But here’s what the recent ruling seems to be indicating for us:

  1. Public-facing data is really public… for now. (Keep watching; this has the potential to go to the Supreme Court.)
  2. Any data protected by a password, or requiring you to agree to terms or conditions, is probably NOT public, and if you try to scrape it you might be subject to criminal charges.
  3. If a site blocks your IP, or blocks your robot in robots.txt, that does not make the data on their site less public.
  4. The CFAA was written before the web existed and, according to Judge Chen, doesn’t apply unless you either obtain access without authorization or use authorized access improperly.
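
If you want to know whether a site’s robots.txt actually singles out your crawler, you can check it before you start. Here’s a minimal sketch using Python’s standard library. The robots.txt content, the `my-seo-bot` user agent, and the `is_allowed` helper are all made up for illustration, and what the file says is a separate question from what the law says.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: my-seo-bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/in/someone"
    print(is_allowed(SAMPLE_ROBOTS, "my-seo-bot", page))
    print(is_allowed(SAMPLE_ROBOTS, "other-bot", page))
```

In this made-up file, `my-seo-bot` is blocked from the whole site, while every other bot is only kept out of `/private/`.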

You can still get sued

Here’s something that most people don’t realize about our legal system.  Even if you’re in the right, even if you are doing everything you should be doing, you can still get sued.

An open and accessible legal system that allows every citizen fair access to justice must also deal with madmen and corporate brutes who want to use the system as a hammer to punish others. It’s the reality of the world we live in.

That means no matter how careful you are, if somebody wants to try and sue you, they can. Protect yourself by being as reasonable and fair as possible.  Don’t break terms and conditions. Keep scrapes slow. Don’t try to access data improperly.
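
“Keep scrapes slow” can be as simple as forcing a minimum gap between requests. Here’s a rough sketch of that idea. The `Throttle` class and the two-second interval are my own assumptions for illustration, not a standard anyone has agreed on, so pick a delay that makes sense for the site you’re crawling.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the interval has passed; return seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


if __name__ == "__main__":
    throttle = Throttle(min_interval_s=2.0)  # assumed value, adjust per site
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()
        print("would fetch", url)  # replace with your real request
```

Call `wait()` right before each request and the crawl never runs faster than the interval you set.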

We SEOs have some really cool tools available to us, but with great power comes great… caution. We have a duty to be responsible in how we interact with data and websites. Be smart, be informed, and be responsible in how you interact with the data of others.
