I noticed a while back in my Technorati blog backlinks thing that a certain blog was reposting all of the articles from my anime blog. It was an RSS scraper that ran on some software called “Autoblogger Pro.” Basically, the site scraped posts from a number of anime blogs and reposted the content with AdSense ads. While I didn’t mind the link back, it still bugged me that the person who set this up was simply profiting from other people’s work. I figured I’d do something about it.
The first thing I noticed was that all comments on the blog were turned off. This makes sense, since every comment would have probably been “stop stealing my content, you jackass!” There wasn’t any contact info or even an “about” page. I decided that contacting the owner of the website wouldn’t have helped anyway. It was time to take matters into my own hands.
As a Computer Scientist, I’ve learned to think as an adversary. For most automated systems, there’s usually a way to exploit some sort of flaw. I just had to figure out the weakness in Autoblogger Pro. In this case, the weakness is that everything is automated via RSS scraping. If they can’t get my RSS feed, they can’t scrape it.

The first thing I had to do was find their IP address. I used the DNS lookup tool at network-tools.com to find the IP of the website. Note that the website’s IP is not necessarily the one that’s pulling the feed. It was in my case, though.
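If you'd rather do the lookup from a script than from a web tool, here is a minimal Python sketch. It is not what I used; the function name `resolve_ipv4` is made up for illustration, and it only asks your system resolver for IPv4 addresses. As noted above, the address you get back is the web server's, which may or may not be the machine that fetches your feed.

```python
import socket

def resolve_ipv4(hostname):
    """Return the distinct IPv4 addresses that a hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (ip, port), so sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

# Usage (hypothetical hostname, replace with the scraper's domain):
#   print(resolve_ipv4("scraper-site.example"))
```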

Just to be safe, I checked out my logs to see if that IP was pulling my feed. Bingo! They were pulling the feed every few hours.
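Here is a rough sketch of that log check in Python. It assumes an Apache-style common/combined access log and a feed served under `/feed`; both are assumptions, so adjust the pattern and path to your own setup. The sample lines and IPs are invented for the example.

```python
import re
from collections import Counter

# Common/combined log lines start with:
#   IP ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)')

def feed_hits_by_ip(log_lines, feed_path="/feed"):
    """Count requests per client IP whose path starts with feed_path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(feed_path):
            hits[match.group(1)] += 1
    return hits

# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [10/Oct/2007:06:00:01 -0700] "GET /feed HTTP/1.1" 200 5120',
    '198.51.100.4 - - [10/Oct/2007:06:05:12 -0700] "GET /about HTTP/1.1" 200 2048',
    '203.0.113.7 - - [10/Oct/2007:09:00:02 -0700] "GET /feed HTTP/1.1" 200 5120',
]
print(feed_hits_by_ip(sample).most_common())
```

With a real log you would pass an open file instead of `sample`, then compare the busiest IPs against the address from the DNS lookup.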
In order to stop them from accessing the feed, all I’d have to do is deny that IP. It’s pretty simple with .htaccess and Apache’s rewrite engine. Now, you could simply throw a 403 Forbidden code, but where’s the fun in that? Sure, your site isn’t being scraped anymore, but what about all the other sites whose content is being stolen? Someone has to stand up to these bullies!
I decided to make a fake RSS feed to redirect to. This one would have 1000 entries in it, each 1 minute apart from the last. This would muddy up their site and keep anyone from actually seeing the stolen content. The fake feed would be auto-generated at the time of request, so each time they pulled the feed, the entries would be recent. I wrote this in my Rails application, but it would probably be just as easy in PHP. Then I made my .htaccess redirect requests coming from that IP’s address range to the fake RSS feed.
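RewriteEngine On
# Match any client address in the scraper's 209.200.12.x range
RewriteCond %{REMOTE_ADDR} ^209\.200\.12\.
# Redirect matching requests to the fake feed (placeholder URL, not really the fake feed)
RewriteRule .* http://www.thebadrss.com/feed [R,L]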
Voila! After waiting a while for the scraper to pick up the feed, I was happy to see the results. You’ll notice that I inserted random numbers into each entry, just to make sure they were all unique. I thought that maybe the software could detect duplicate entries.
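For anyone who wants to try the fake feed themselves, here is a minimal sketch of the idea in Python. This is not my Rails code, just an illustration under a few assumptions: the function name `build_fake_feed`, the placeholder titles, and the `example.com` URL are all made up. It generates RSS 2.0 with entries one minute apart counting back from the time of the request, and puts a random number in each entry so no two look alike.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from xml.etree import ElementTree as ET

def build_fake_feed(entry_count=1000, now=None, site_url="http://example.com/"):
    """Build an RSS 2.0 document of decoy entries, newest first."""
    now = now or datetime.now(timezone.utc)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Decoy feed"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Placeholder entries"
    for i in range(entry_count):
        # Random number so every entry is unique across pulls.
        token = random.randint(100000, 999999)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Placeholder entry {token}"
        ET.SubElement(item, "link").text = f"{site_url}?entry={i}-{token}"
        ET.SubElement(item, "guid", isPermaLink="false").text = f"decoy-{i}-{token}"
        ET.SubElement(item, "description").text = f"Nothing to see here ({token})."
        # Each entry is dated one minute before the previous one (RFC 2822 date).
        ET.SubElement(item, "pubDate").text = format_datetime(now - timedelta(minutes=i))
    return ET.tostring(rss, encoding="unicode")
```

Because the dates are computed when the function runs, calling it on every request keeps the decoy entries near the top of whatever page the scraper builds. You would serve the returned string with an XML content type from whatever web framework you use.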
This might not work for all content stealers, but if enough people start spamming the spammers, maybe they’ll stop. I’d really love to see the look on the face of the guy running this site when he sees what’s become of it!
Full Disclosure: I actually run an RSS aggregator of sorts, myself. It’s called Anime Nano. There are a few differences, though. First off, it’s opt-in. Second, it’s a community of readers and bloggers who like anime, not a one-man profit machine. Third, visit it at animenano.com!
Good article. It’s good to do this with the .htaccess file.
If you don’t have this file, you can use a WordPress plugin:
http://www.anieto2k.com/2006/06/15/autor-feed-elige-quien-te-sindica/
Bye.
Yay, revenge!
All you need is a programme that allows you to see through someone else’s computer screen to take a snapshot of their faces when they see it. LOL.
LOL, Congrats, way better than just blocking them …
Fantastic post, thank you very much!
I’ve been looking for a way to shut down RSS scrapers and I love the revenge angle you take.
Soultrance
I think it would be great if someone addressed the problem of identifying the scraping addresses with one-time content poisoning, in case they pull from a different IP address. There must be a module which allows you to watermark feeds on the fly. This, of course, puts load on your CPU, but it’s all in a good cause.
Hi, thanks for the tip for using network-tools.com to get the scraper’s IP address. I’m using the Antileech plugin which is supposed to do the same thing (feed fake RSS content based on IP address), however it didn’t work for one of the scrapers and I had to block their IP in my htaccess file after reading your post.
Have you written about how you created your own fake RSS feed? I’d love to learn how you did that.
Ugh, I was trying to do this but I don’t get the htaccess thing T_T These people are stealing, like, everyone’s content:
http://direct-anime.org/
Fantastic article!
But the true irony of the post is that your google ads on this page are now displaying sites such as “Easy web data scraping” etc…
How did you write the auto-generated feed?
Thanks.
Hmm, a few people have asked how I did a fake RSS feed. So I should really write a post about it, huh? I’ll put it on my to-do list.
Basically, I generated XML and used dates that were close to the current date so that the fake content would show up on the front page instead of the archives. I’ll leave it as an exercise for the reader to try this until I write a post about it.
I keep noticing these in my trackbacks. They’re a real pain. One guy was just wholesale reposting my entire content – not just a snippet – and to cap it off he was hotlinking my images. I sent him a strongly worded email (I whois’d his domain name and found his email address), and now the site is a blank page. Guess he got the message.
What’s the best RSS scraper service? an online one, please
ha ha! that’s genius. unfortunately, i don’t know how to do any of that stuff, so i have to foil my scraper with lame methods. i just put a line in my posts linking back to my blog as the source of the material. and i set my feed to “short” when it used to be “full”. but i can’t contact the douchebag, and i can’t leave comments on the stolen posts.
on the bright side, i’m getting back links every day :\
Good stuff,
The reason I came here is, of course, that I was looking for a solution to a similar problem. Currently I’m just going with an IP block, but I may eventually rename the feeds and resubmit them. It’s a pain, but it’s a radically reliable solution.
Thank you very much for the article
If you have RSS, you are syndicating your content and what you have done shows your skills but also your hostility towards people syndicating your content.
Scraping a web page is one thing, and I could understand it.
But preventing people from using RSS on other websites is simply nonsense.
RSS is for syndication, and if you don’t want syndication, don’t provide RSS.
Sorry, but this is bullshit. If you have RSS, which by definition today means Really Simple Syndication, you are basically inviting other people to “syndicate” the summary of your content.
You should not put full pages into the RSS anyway; put only summaries of the pages.
With your RSS on other people’s websites you gain new visitors, you gain benefits, and you should not complain when someone else is doing promotional work on your behalf.
This is a gross misunderstanding on your part of the purpose of RSS.
The major purpose of RSS is promotion of your content.
So, don’t make “smart” articles on how to stop RSS syndication, as in the first place you have offered RSS to other people, so don’t complain, as that is bullshit.
There’s a distinction between using the content for reading somewhere else (like Google Reader) and republishing it as your own. I syndicate for the former and not the latter.
You are right, there is a distinction. But if you offer RSS, too many people will think it is free for distribution. So, don’t blame people for “stealing” your content, because what is in the RSS is not “content” but rather a summary of the content. And if you don’t want them using your RSS on their websites, tell them: write legal terms of use, and incorporate them into the RSS or put them on your website.
A very small number of companies reject the use of their RSS on other websites. You are somewhere there, in the minority. Make a legal agreement with those people.
Blocking them or using smart network ideas, or firewalls, is too silly.
Just because a recipe in a cookbook is published for the use of others, doesn’t mean you are giving anyone permission to take the recipe and republish it as their own. The author has made the distinction perfectly clear and it’s black and white, there is nothing to question.
The purpose of RSS feeds is to make it simpler for your readers to read content from YOUR site, not to let others steal it and put their name on it. Most people know better, and so do you (and scrapers).
Also, scrapers don’t typically steal the summary feeds, but feeds with the entire article. There’s little point otherwise.
Yes, you can ask them to remove it, but 9 times out of 10 they will simply ignore you. I’ve had this happen with a client of mine, and when frequent requests fell on deaf ears, I eventually banned the IP.
Thanks. I redirected the offenders to a porn RSS feed. W00t!
Hi,
I feel so helpless. I write quality content and some people just pull the feed from my site and profit! I can’t take down RSS, since it’s the only way I can submit a sitemap to Google (I’m hosted on Blogger/Blogspot).
Admin of this site, I’d like to ask for help! Please mail me back if you have time!