Taming the Wild Wild Web
After Jonathan Bailey was beleaguered by digital plagiarists, he fought back by starting a blog about online content theft. Even as he blogs about the subject, Bailey still finds himself dealing with those bold enough to plagiarize PlagiarismToday.com.

According to Bailey, a threat most online authors don’t see coming is automated RSS scrapers, which use syndication channels to dump content minus attribution on a spam blog, or “splog.” Using keyword-rich content, the splogs set up a contextual ad scheme and divert search engine traffic away from the original author’s site to pick up the ad hits. And these days, Bailey says he’s seen RSS scrapers mining directly from Google or Technorati feeds the author set up himself.

“In the world of RSS feeds, it seems to me that most people don’t think of re-using content as plagiarism,” said Holt. The rush to set sites up for syndication may inadvertently put the content at risk of being ripped off by some bad bots.

That’s not to say that RSS is a harmful tool. The key, said Bailey, is to keep an eye on who’s subscribing to what. To that end, Rick Klau, vice president of publisher services at FeedBurner, which creates RSS-management tools, said that FeedBurner developed its Uncommon Uses tool to help content creators analyze traffic patterns, identify suspicious use, and see where their feed winds up.

“Uncommon Uses helps to identify re-syndication of your feed beyond standard consumption points, including contact with your feed by non-subscribers,” said Klau. “I think the combination of blog plugins, services like Uncommon Uses, and traffic analysis can go a long way to minimizing the potential harm.”

There’s no widely used tool to deal with the explosion of text-free intellectual property like video, graphics, and pictures yet, because pixel matching is still hard to implement on a large scale. However, a unique file name or tag can make duplicate copies more findable. Bailey recommended overlaying images with a subtle watermark and keeping an eye on server logs to see who’s linking to images and videos directly.
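As a rough illustration of the server-log check Bailey describes, the following Python sketch counts requests for image and video files that arrive with a referer from someone else's site. It assumes the server writes the common "combined" access log format; the function name, the list of file extensions, and the example hostnames are hypothetical choices for illustration, not part of any tool mentioned in this article.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: logs use the "combined" format, where each line contains
# "METHOD /path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Hypothetical list of media types worth watching; adjust to your site.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mov")


def find_hotlinkers(log_lines, own_hosts):
    """Count media requests whose referer host is not one of our own hosts."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = urlparse(match.group("path")).path.lower()
        if not path.endswith(MEDIA_EXTENSIONS):
            continue  # only media files are of interest here
        host = urlparse(match.group("referer")).hostname
        # A missing referer ("-" or empty) yields no hostname and is ignored.
        if host and host not in own_hosts:
            counts[(host, path)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2007:13:55:36 -0700] '
        '"GET /images/cat.jpg HTTP/1.1" 200 2326 '
        '"http://splog.example/post" "Mozilla/5.0"',
    ]
    for (host, path), hits in find_hotlinkers(sample, {"myblog.example"}).items():
        print(f"{host} linked to {path} {hits} time(s)")
```

A referer can be missing or forged, so a report like this is a starting point for a closer look rather than proof of misuse.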

The watermark principle is already being employed as protection on some self-publishing platforms, usually as a plugin or optional tool. For instance, blog host WordPress allows users to tag their content with an invisible digital watermark. The service, called Digital Fingerprinting, then monitors major information feeds and search engines to see if the watermark turns up anywhere other than your site.
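The article does not describe how the Digital Fingerprinting service builds or hides its watermark, so the following Python sketch is only an assumption about how the general idea could work: derive a unique, hard-to-guess token for each post, attach it to the content, and later check whether that token turns up in text found elsewhere. All function names and the secret key are hypothetical, and the token here is plain visible text rather than an invisible mark.

```python
import hashlib
import hmac


def make_fingerprint(secret: bytes, post_id: str, length: int = 16) -> str:
    """Derive a stable, hard-to-guess token for one post from a private key."""
    digest = hmac.new(secret, post_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # The "fp" prefix is an arbitrary marker chosen for this sketch.
    return "fp" + digest[:length]


def tag_content(body: str, fingerprint: str) -> str:
    """Append the fingerprint to the post body before it goes out in a feed."""
    return f"{body}\n{fingerprint}"


def contains_fingerprint(page_text: str, fingerprint: str) -> bool:
    """Report whether text copied from another page carries our fingerprint."""
    return fingerprint.lower() in page_text.lower()


if __name__ == "__main__":
    token = make_fingerprint(b"keep-this-key-private", "post-42")
    feed_item = tag_content("My original article text.", token)
    print(contains_fingerprint(feed_item, token))
```

Because the token depends on a private key, a scraper cannot predict it, and a search for the token on other sites is unlikely to match anything except copies of the tagged post.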

“A lot of bloggers that know there’s a problem don’t know how this can impact them or whether there’s anything they can do about it,” said Bailey. By plugging unique phrases or tags into a standard web search engine like Google every week or so, or setting up a search alert on those key phrases, internet authors can keep control over their content.
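Choosing which phrases to search for can itself be automated. The Python sketch below is one possible heuristic, not a method recommended by anyone quoted here: it splits a post into fixed-size runs of words and prefers the runs with the longest words, on the assumption that longer words tend to be less common. It then wraps each phrase in quotation marks so it can be pasted into a search engine or a search alert as an exact-match query. The function names and the default sizes are hypothetical.

```python
import re


def distinctive_phrases(text, size=6, count=3):
    """Pick non-overlapping runs of `size` words, favoring runs of longer words."""
    words = re.findall(r"[A-Za-z0-9'’]+", text)
    windows = [words[i:i + size] for i in range(0, len(words) - size + 1, size)]
    # Heuristic (an assumption): total character length stands in for rarity.
    windows.sort(key=lambda run: sum(len(word) for word in run), reverse=True)
    return [" ".join(run) for run in windows[:count]]


def as_exact_queries(phrases):
    """Wrap each phrase in double quotes for an exact-match search."""
    return [f'"{phrase}"' for phrase in phrases]


if __name__ == "__main__":
    post = (
        "It was a day like any other until automated syndication scrapers "
        "began republishing everything without attribution or permission."
    )
    for query in as_exact_queries(distinctive_phrases(post, count=2)):
        print(query)
```

Text shorter than one run of words produces no phrases, so very short posts are better checked by hand.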

If plagiarism is discovered, Bailey said, bloggers don’t need the legal resources of a corporation to regain that control. He recommended sending a cease-and-desist letter to the plagiarist or notifying advertising companies such as Google or Yahoo!, whose services are supporting the sploggers—the plagiarist is probably in violation of the company’s usage policy. And, if all else fails, a DMCA violation notice usually gets people’s attention.

Although content digitization has made it easier to point, click, and grab, tools are emerging that make it easier for authors, teachers, managers, and other online publishers to nab the grabbers and help balance the scales of just content use.


Sidebar: Dealing with Digital Plagiarism

There’s no detective unit devoted to uncovering crimes of plagiarism. Even if you’ve got the right tools to catch copying when it happens, here are some tips from digital content experts to make your work less vulnerable to misuse:

Post warning signs. Let readers know you’re paying attention to how your content is being used and re-used. “It’s still better to prevent your content from being stolen in the first place,” said Copyscape’s Gideon Greenspan. “A simple yet effective measure is to warn potential content thieves that they will be discovered. A range of warning banners is available for free at Copyscape.com. These banners are already included in over 20,000 different websites.” If a banner isn’t your style, a simple notice that the content is protected posted in a prominent place can get the message across.

For schools and enterprises, the knowledge that deeply sourced anti-plagiarism tools are in place to verify content’s authenticity can discourage authors from cutting corners. “At the end of the day, what’s important is the understanding that there’s a chance that documents you have been using are in the source file,” said Holt. “As an employee or student, you’d be more inclined to work carefully.”

Copyright it. Affixing copyrights all over the internet bogs down the flow of information. But, for digital authors, the ability to protect original work when necessary is important.

In the United States, most digital creators are assumed to have certain copyright protections over their creations. But, said Bailey, registering with U.S. copyright officials can give authors a wider range of control over their work, including the right to sue for punitive damages if the work is being egregiously misused.

Another option is to establish a Creative Commons license for the content. The more flexible model helps authors strike a suitable balance between reader freedom and user protection. FeedBurner’s RSS tool allows authors to publicize and attach Creative Commons protections to content on its way out to subscribers.

See what’s out there. Even if you’re using a tool like CopyGuard or Copyscape to check in on your content, it doesn’t hurt to venture beyond their limitations and make sure plagiarists aren’t slipping through the net.

“Listen to your readers, and do searches yourself,” suggested Bailey. “Even if you don’t find anything, at least you know. Unfortunately, the odds of you finding nothing these days are pretty slim.”


Companies Featured in this Article

Plagiarism Today
www.plagiarismtoday.com  
iParadigms, LLC.
www.iparadigms.com  
LexisNexis
www.lexisnexis.com/copyguard
Copyscape
www.copyscape.com  
FeedBurner
www.feedburner.com  
WordPress
www.wordpress.com  
SearchInform
www.searchinform.com