Detecting plagiarism by comparing sequences of references

October 14, 2013

I recently reviewed a manuscript bearing a suspicious resemblance to an already-published paper. The exact wording of the previous paper had not been copied; however, in a couple of sections, the new manuscript made a series of points in the same order as the previous paper, using a very similar sequence of references. The sequence of greatest overlap is shown below (with references from Paper B renumbered according to the bibliography of Paper A).

[Figure: reference sequence]

While this similarity does not itself prove that plagiarism occurred, it certainly constitutes grounds for further investigation.

It occurred to me that comparing sequences of references might be a good way of detecting possible cases of “subtle plagiarism,” in which the original text has been rephrased but the order of cited sources has been preserved. This would be roughly analogous to the BLAST (Basic Local Alignment Search Tool) algorithms used in biology to identify related nucleotide and amino acid sequences (as in the comparison below of Calcium-Dependent Protein Kinases from different organisms, taken from Ojo et al. 2010).

[Figure: CDPKs (sequence comparison from Ojo et al. 2010)]
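To make the analogy concrete, here is a minimal sketch of how two reference sequences could be locally aligned, using the classic Smith-Waterman algorithm. To be clear, this is not what BLAST itself does (BLAST is a faster heuristic), and it is not the method of any published plagiarism detector; it is simply the textbook local-alignment technique applied to reference numbers. The function name `local_align`, the scoring values, and the two example sequences are all hypothetical, chosen only for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two reference sequences.

    a, b: lists of reference IDs, with both papers numbered against
    the same bibliography. The scoring values are arbitrary
    assumptions, not tuned on real data.
    Returns (best_score, aligned_a, aligned_b); None marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; a cell never drops below zero, which is
    # what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # reference only in a
                score[i][j - 1] + gap,       # reference only in b
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


# Hypothetical data: the order in which two papers cite shared references.
paper_a = [12, 3, 7, 8, 15, 9, 21, 4]
paper_b = [30, 7, 8, 9, 21, 41]

best_score, seg_a, seg_b = local_align(paper_a, paper_b)
print(best_score)
print(seg_a)
print(seg_b)
```

A long, high-scoring shared run would be a reason to look more closely at the two papers, not proof of anything. Two reviews of the same narrow topic may legitimately cite the same sources in a similar order.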

In poking around online, I noticed that others had not only independently come up with “my” idea but had already implemented it and published papers about it. The 2-page Gipp & Beel contribution to the 21st ACM Conference on Hypertext and Hypermedia (June 2010) provides a nice introduction to the approach.

Gipp and coworkers are also developing CitePlag.org, a website intended to let others perform document comparisons of their own. According to Gipp, the site currently struggles with long documents and with citation styles it does not yet account for. My own testing indicates that it is not especially useful at the moment but is definitely on the right track.

One comment

  1. […] friend Greg Crowther suggests a way to detect subtle plagiarists, who are careful not to copy exact phrases: compare sequences of […]


