To Catch a Thief: Tools and Tips to Combat Digital Content Plagiarism

Page 1 of 3

The saying goes, “Imitation is the sincerest form of flattery.” Yet in the world of web content, imitation often goes beyond emulation of style or even subject matter. With cut-and-paste ease, imitation becomes plagiarism. Jonathan Bailey, a journalist and writer, had been posting his original poetry and essays to literature websites when he got a wake-up call in his email inbox about five years ago. “A reader of mine tipped me off that my site was being plagiarized,” Bailey said. “It was a jaw-dropping moment for me.”

Since then, Bailey said he’s found over 500 other instances of his work being duplicated across the internet. “I’m fine with copying—all I asked for was attribution,” he said.

Print plagiarism used to be considered an occupational hazard for scholars and writers like Bailey. With the advent of the internet, however, plagiarizing someone else’s original work requires less heavy lifting than ever. With a few keystrokes, content thieves can copy sentences and mirror entire sites, claiming false credit, taking ad revenue out of content creators’ pockets, or snagging search engine hits away from their rightful owners.

Luckily, the digital environment that makes plagiarizing content easier can make spotting poached pieces easier too. Digital tools are available to help authors discover when others are replicating their work without permission, and other developments can help teachers and managers make sure an author’s words are his or her own. For savvy users who want to protect their creations from falling into the wrong hands in the first place, there are some smart strategies to make it harder for human thieves and scheming bots to steal the credit for someone else’s original ideas. It’s about keeping others honest—and making sure you do the same.

A Tool to Match the Crime
First, the bad news: Digital tools can’t stop plagiarism before it happens. Original ideas get out—that’s the whole point of publication and distribution. And unless all proprietary content is locked away from public view, plagiarists will find a way to get their hands on it.

Until recently, a hunch was all teachers, editors, or other suspicious readers had to go on in order to catch the thief. They’d pore over the suspect work, trying to match key phrases, paragraphs, and even pages against possible sources to find what had triggered the warning bells. The process could take weeks or months and involved thumbing through paper pages in endless archives.

The good news is that certain tools can tap the power of automated search and the depth of digital content databases to make the same standard process faster and more thorough. The basic idea underlying these tools isn’t all that different from the manual approach: The checker uploads the document in question, each word pattern is checked against the full text of original sources numbering anywhere from dozens to billions, and then the system generates a report highlighting suspicious passages.
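For readers curious about what "checking each word pattern" might look like in practice, here is a minimal sketch in Python. It is not how any particular commercial tool works; it assumes the simplest possible approach, in which a "word pattern" is a run of five consecutive words and a passage is suspicious when the same run appears in a source. The function names (word_shingles, check_document) and the sample texts are invented for illustration.

```python
import re

def word_shingles(text, size=5):
    """Return the set of overlapping runs of `size` consecutive words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def check_document(suspect, sources, size=5):
    """Map each source name to the sorted word runs it shares with the suspect text."""
    suspect_shingles = word_shingles(suspect, size)
    report = {}
    for name, source_text in sources.items():
        shared = suspect_shingles & word_shingles(source_text, size)
        if shared:
            report[name] = sorted(shared)
    return report

# Made-up sample data: two tiny "original sources" and one suspect sentence.
sources = {
    "poem.txt": "The quiet river carries every borrowed word back to the sea",
    "essay.txt": "A good editor reads slowly and trusts nothing on first sight",
}
suspect = "As I wrote, the quiet river carries every borrowed word home again."

report = check_document(suspect, sources)
for name, passages in report.items():
    print(name, passages)
```

Only poem.txt appears in the report, because it is the only source that shares a five-word run with the suspect sentence. A real service would need indexing to search billions of sources rather than comparing texts one pair at a time.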

However, as Tom Holt, president and CEO of the search technology company Surf Wax, noted, “Each person is going to find that their definition of plagiarism is not identical [to anyone else’s]. The comparison engine has to accommodate some degree of defining what constitutes plagiarism, depending on the underlying purpose of the organization using the tool.”

Teachers might want to allow for the greatest amount of creative freedom. Lawyers might be looking for the smallest amount of leeway to protect the company’s reputation. And webmasters might need to check for duplicates of a site’s code. The following tools might share the underlying source-search report functions, but they all allow for different interpretations of what constitutes content theft.
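One way to picture how the same comparison can serve different definitions of plagiarism is a configurable threshold. The Python sketch below is purely illustrative: the policy names and cutoff values are assumptions invented for this example, not settings taken from any of the tools discussed, and the overlap measure (the share of a document's three-word sequences that also appear in a source) is just one simple choice among many.

```python
import re

# Hypothetical cutoffs: the share of a document's three-word sequences that
# may also appear in a source before the document is flagged.
POLICIES = {"teacher": 0.30, "lawyer": 0.05}

def sequences(text, size=3):
    """Return the set of runs of `size` consecutive words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(suspect, source, size=3):
    """Fraction of the suspect's word sequences that also occur in source."""
    suspect_seqs = sequences(suspect, size)
    if not suspect_seqs:
        return 0.0
    return len(suspect_seqs & sequences(source, size)) / len(suspect_seqs)

def is_flagged(suspect, source, policy):
    """Apply the named (hypothetical) policy's cutoff to the overlap score."""
    return overlap(suspect, source) > POLICIES[policy]

source = "Imitation is the sincerest form of flattery"
suspect = "Some say imitation is the highest praise a writer can receive"

score = overlap(suspect, source)
for policy in POLICIES:
    print(policy, is_flagged(suspect, source, policy))
```

Here one of the suspect sentence's nine three-word sequences matches the source, so the stricter "lawyer" cutoff flags it while the more lenient "teacher" cutoff lets it pass.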
