Des let us know via our BBC contributors list that he found a site that was scraping all our content from BBC and reposting it, verbatim.
The blog in question was using our feed from Feedburner to access our content. The blog posts even included our Feedflare items (more on that later). However, there is no attribution saying that the posts come from BBC or were written by someone else. To make matters worse, this blogger even copied all our tags and categories for his own blog. I left a comment on one of my own scraped posts asking this person to stop and pointing out that he was in violation of the DMCA and international copyright law (the comment is still in moderation, imagine that). So, then, what can you do about a splog scrape? This is what we’ve done so far.
Right now we’ve thrown down the first gauntlet to try to embarrass this person. We’ve added to our Feedflare items (which you can see in our feed) a Creative Commons license, a copyright statement, and an attribution link. All of these things will make it really clear to a reader a) that the content doesn’t belong to him and b) who the content really does belong to.
Is that going to be enough? Probably not. We can also file DMCA notices with Google and his ISP, which is pretty serious stuff. Google does not take kindly to people using AdSense to make money off stolen content. ISPs get a little edgy about this kind of stuff too. One course of action that we haven’t taken, yet, is actually altering our feed. Right now we publish a full feed (that is, you get the complete content of the post). There are lots of debates about full vs partial feeds, and this isn’t the time or place. What we can do, and very easily with Feedburner, is to switch not only to a partial feed, but to a partial feed with a message like “Sorry for the inconvenience, but some blogs are stealing the content from this blog so the feed has been truncated. Stealing content is wrong.”
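For anyone who builds their own feed rather than flipping the setting in Feedburner, here is a rough sketch of what a truncated feed item with that message could look like. This is not how Feedburner does it internally; the function name `partial_feed_item` and the 50-word cutoff are made up for illustration.

```python
# Hypothetical helper: not part of Feedburner or any feed library.
NOTICE = (
    "Sorry for the inconvenience, but some blogs are stealing the content "
    "from this blog so the feed has been truncated. Stealing content is wrong."
)


def partial_feed_item(body: str, max_words: int = 50, notice: str = NOTICE) -> str:
    """Return a shortened plain-text post body followed by the notice."""
    words = body.split()
    summary = " ".join(words[:max_words])
    # Only mark the summary as cut off if words were actually dropped.
    if len(words) > max_words:
        summary += " ..."
    # The notice goes on every item, so a scraper republishes it too.
    return f"{summary}\n\n{notice}"


if __name__ == "__main__":
    post = "word " * 120  # stand-in for a full post body
    print(partial_feed_item(post, max_words=10))
```

Because the notice is part of the item itself, anyone republishing the feed verbatim ends up showing it to their own visitors.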
Does this tactic work? Sure does. Jim Turner and I helped a friend of ours do this and within a few days the scraping stopped. It’s rather embarrassing, and not good for clicks, when a website visitor sees that message above.
Beyond the tactics for how to combat scrapers, how did we find out in the first place? Des was the key to this. He was looking at our Technorati links and saw something hinky. A little digging led him to this blog and the discussion began. We’ll keep you posted on how it all turns out.
Now, there are legit ways to use and consolidate content from other blogs. You can list headlines on a topic, a couple of sentences, and a link back … doing this adds content and value to your blog, in addition to your own content. Recently I’ve been getting a lot of good traffic from a legit site in this way. Just the headline from one of my posts (about the whole podcasting – netcasting question) on a Mac site brought a goodly number of visitors and it was my #1 referrer yesterday. I’ve also seen my headlines and a few words on other sites as a “great links for the day” … always flattering to read that.
Where do you draw the line? Fair use. You can use a feed to bump up content on your site if you just use the headline, a short snippet of the post, don’t claim it’s yours, and link back to the author/original post. That’s cool and helps everyone. You may not, without permission, copy and republish an entire post on another site. Note the “without permission” part … I’ve been asked and have granted permission for a few of my articles to be republished from time to time. Again, always flattering.
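If you pull other people’s feeds into your own site, the legit version described above (headline, short snippet, credit, link back) can be sketched roughly like this. The function name `build_excerpt`, its parameters, and the 40-word limit are all assumptions for the sake of the example, not a legal definition of fair use.

```python
from html import escape


def build_excerpt(title: str, author: str, url: str, body: str,
                  max_words: int = 40) -> str:
    """Build an HTML excerpt: linked headline, author credit, short snippet."""
    words = body.split()
    snippet = " ".join(words[:max_words])
    # Show that the post continues at the original site.
    if len(words) > max_words:
        snippet += " ..."
    # Escape everything so the source post cannot inject markup.
    link = f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
    return (
        f"<p>{link} by {escape(author)}</p>\n"
        f"<blockquote>{escape(snippet)}</blockquote>"
    )


if __name__ == "__main__":
    print(build_excerpt(
        "Example headline",
        "Example Author",
        "http://example.com/original-post",
        "This is the opening of the original post. " * 20,
    ))
```

The reader gets the headline and a taste of the post, the author gets the credit and the click, and the full text stays where it was published.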
This will probably be the first post of many on this affair … so watch our feed.
Technorati tags: content scraping, splogs, copyright law, DMCA, Feedburner