WeCP (We Create Problems) is an online platform that provides coding challenges and assessments for technical interviews and competitions. A critical aspect of managing a platform like WeCP is ensuring the integrity of code submissions. To detect and prevent code plagiarism, WeCP uses MOSS (Measure of Software Similarity), an automatic similarity-detection system developed at Stanford University.
In this help article, we will explain in detail how WeCP detects code plagiarism using MOSS, along with an example to help users better understand the process.
How WeCP Detects Code Plagiarism with MOSS:
Code Submission: Users submit their solutions to coding challenges on WeCP. These submissions are then collected and sent to MOSS for plagiarism detection.
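For readers who want to experiment with this step directly, below is a minimal sketch using the third-party mosspy Python client. The user ID and file paths are hypothetical placeholders; this is an illustration of the submission flow, not WeCP's internal integration.

```python
# Minimal sketch: sending two submissions to MOSS via the third-party
# "mosspy" client (pip install mosspy). The user ID and file paths below
# are hypothetical placeholders.
import mosspy

userid = 123456789                      # hypothetical MOSS account ID
m = mosspy.Moss(userid, "python")       # language of the submissions

m.addFile("submissions/alice.py")
m.addFile("submissions/bob.py")

url = m.send()                          # uploads files, returns report URL
print("MOSS report:", url)
m.saveWebPage(url, "moss_report.html")  # save the report for later review
```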
Normalization: MOSS starts by normalizing the submitted code, removing superficial differences like whitespace and comments, which do not impact the code's functionality. This step ensures that the comparison process focuses on the actual code structure and logic.
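As a rough illustration of this step, the sketch below strips comments and blank lines from Python source using regular expressions. The normalize helper is invented for this article; MOSS's real normalizer is language-aware and far more robust.

```python
# Rough illustration of normalization: strip comments, blank lines, and
# surrounding whitespace. MOSS's real normalizer is language-aware and
# more thorough than these regular expressions.
import re

def normalize(source: str) -> str:
    source = re.sub(r"#[^\n]*", "", source)   # drop line comments
    lines = (line.strip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)

code = "total = 0  # running sum\n\ntotal = total + x   # add next value\n"
print(normalize(code))
# total = 0
# total = total + x
```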
Tokenization: The normalized code is then converted into a sequence of tokens. Tokens are atomic elements that represent the code's syntax and structure, such as keywords, operators, identifiers, literals, and punctuation marks. Tokenization reduces the code to a standardized form, enabling easier comparison between different submissions.
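The sketch below tokenizes Python source with the standard tokenize module, mapping identifiers and literal values to generic tokens while keeping keywords and operators. The token_stream helper is invented for this article, but collapsing identifiers this way illustrates why renaming variables does not defeat detection.

```python
# Sketch of tokenization using Python's standard tokenize module.
# Identifiers and literals are mapped to generic tokens so that renaming
# a variable does not change the token stream.
import io
import keyword
import tokenize

def token_stream(source: str) -> list[str]:
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # keep language keywords, collapse user-chosen names
            tokens.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")            # collapse literal values
        elif tok.type == tokenize.OP:
            tokens.append(tok.string)       # keep operators and punctuation
    return tokens

print(token_stream("total = total + 1\n"))
# ['ID', '=', 'ID', '+', 'LIT']
```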
Fingerprinting: MOSS generates fingerprints for each submission by hashing overlapping k-grams (sequences of k consecutive tokens) with a Karp-Rabin rolling hash. It then selects a subset of these hashes, a technique known as winnowing, producing a compact set of fingerprints that still represents the overall structure and content of the code and allows for efficient comparison.
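A minimal sketch of the idea follows: a Karp-Rabin rolling hash over every window of k consecutive tokens, then winnowing (keeping the minimum hash in each sliding window of hashes) to select the fingerprint subset. The parameter values and helper names are illustrative, not MOSS's actual internals.

```python
# Sketch of Karp-Rabin fingerprinting with winnowing. The parameters k and
# w are illustrative, not MOSS's real values. Python's hash() is salted per
# process, so fingerprints are only comparable within a single run.

def kgram_hashes(tokens: list[str], k: int, base: int = 31,
                 mod: int = (1 << 31) - 1) -> list[int]:
    """Rolling (Karp-Rabin) hash of every window of k consecutive tokens."""
    vals = [hash(t) % mod for t in tokens]
    if len(vals) < k:
        return []
    h = 0
    for v in vals[:k]:
        h = (h * base + v) % mod
    hashes = [h]
    top = pow(base, k - 1, mod)             # weight of the outgoing token
    for i in range(k, len(vals)):
        h = ((h - vals[i - k] * top) * base + vals[i]) % mod
        hashes.append(h)
    return hashes

def fingerprints(tokens: list[str], k: int = 5, w: int = 4) -> set[int]:
    """Winnowing: keep the minimum hash from each window of w k-gram hashes."""
    hs = kgram_hashes(tokens, k)
    if not hs:
        return set()
    return {min(hs[i:i + w]) for i in range(max(1, len(hs) - w + 1))}

tokens = ["ID", "=", "ID", "+", "LIT", "for", "ID", "in", "ID", ":"]
print(fingerprints(tokens))
```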
Comparing fingerprints: MOSS compares the fingerprints of all submitted code files pairwise. By comparing fingerprints instead of the entire code, MOSS can efficiently detect similarities between large numbers of submissions. If two code submissions share a significant number of fingerprints, it indicates that there may be a high degree of similarity between them.
Ranking: Based on the number of matching fingerprints, MOSS calculates a similarity score for each pair of submissions. The higher the score, the more similar the two code files are. MOSS then generates a report that ranks the pairs of submissions according to their similarity scores.
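The comparison and ranking steps can be sketched together: score every pair of fingerprint sets, then sort the pairs so the most suspicious appear first. The Jaccard index below is a simple stand-in for MOSS's actual metric (MOSS reports the percentage of each file's tokens that were matched), and the fingerprint sets are hypothetical.

```python
# Sketch of pairwise comparison and ranking. Jaccard similarity is a
# simple stand-in for MOSS's matched-token percentages; the fingerprint
# sets below are hypothetical.
from itertools import combinations

def similarity(fp_a: set[int], fp_b: set[int]) -> float:
    if not (fp_a and fp_b):
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

fps = {                                   # submission -> fingerprint set
    "alice.py": {11, 23, 42, 57, 91},
    "bob.py":   {11, 23, 42, 57, 88},     # shares 4 of 5 with alice.py
    "carol.py": {3, 19, 64, 70, 85},
}

ranked = sorted(
    ((similarity(fa, fb), a, b)
     for (a, fa), (b, fb) in combinations(fps.items(), 2)),
    reverse=True,
)
for score, a, b in ranked:
    print(f"{score:5.2f}  {a} <-> {b}")
# 0.67  alice.py <-> bob.py   (most suspicious pair ranked first)
```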
Result visualization: WeCP presents the MOSS report to instructors or administrators, highlighting the similarities between code submissions. The report shows side-by-side comparisons of the code segments with matching fingerprints, making it easy to review them and determine whether plagiarism has occurred.
Example:
Suppose two users, Alice and Bob, submit solutions to a coding challenge on WeCP. Alice writes an original solution, while Bob copies Alice's code with minor modifications, such as changing variable names and comments. When the submissions are sent to MOSS, the following happens:
MOSS normalizes both code submissions, removing differences in whitespace and comments.
The code is tokenized, converting both submissions into sequences of tokens representing their syntax and structure.
Fingerprints are generated for each submission using the Karp-Rabin algorithm.
MOSS compares the fingerprints of Alice's and Bob's submissions.
Despite Bob's modifications, MOSS detects a high degree of similarity between the fingerprints (the sketch after this example shows why renaming alone does not change the token stream).
MOSS assigns a high similarity score to the pair of submissions and includes them in the report.
WeCP administrators or instructors review the MOSS report, which highlights the similar code segments between Alice's and Bob's submissions. They can then investigate further and determine if plagiarism occurred.
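Under the assumptions of the earlier sketches (identifier-blind tokenization), this scenario can be reproduced in a few lines. Alice's and Bob's functions below are invented for illustration; Bob has only renamed the function and its variables, so the two token streams are identical.

```python
# End-to-end sketch of the Alice/Bob scenario: identifier-blind
# tokenization makes Bob's renamed copy indistinguishable from the original.
import io
import keyword
import tokenize

def token_stream(source: str) -> list[str]:
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            tokens.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        elif tok.type == tokenize.OP:
            tokens.append(tok.string)
    return tokens

alice = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

bob = """
def compute_sum(numbers):   # renamed function and variables
    acc = 0
    for n in numbers:
        acc += n
    return acc
"""

print(token_stream(alice) == token_stream(bob))   # True
```

Identical token streams yield identical fingerprints, so this pair would receive a high similarity score and appear at the top of the ranked report.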
Note:
MOSS is not a definitive tool for proving plagiarism; it only helps identify potential cases. A manual inspection of the flagged code segments is necessary before making a final determination.