How is code plagiarism detected in WeCP?

Written by Team WeCP

WeCP (We Create Problems) is an online platform that provides coding challenges and assessments for technical interviews and competitions. A critical aspect of managing a platform like WeCP is ensuring the integrity of code submissions. To detect and prevent code plagiarism, WeCP uses MOSS (Measure of Software Similarity), a powerful automatic system developed at Stanford University.

Code Plagiarism Detection During Tests

Ensuring fairness in assessments shouldn’t rely on manual checks alone. WeCP automatically detects similar or plagiarized code across candidates attempting the same test, helping teams quickly identify potential copying and protect assessment integrity.

This detection runs seamlessly in the background and flags suspicious similarities for review, allowing recruiters and evaluators to focus on decision-making rather than manual comparison.

In this help article, we will explain in detail how WeCP detects code plagiarism using MOSS, along with an example to help users better understand the process.

Code plagiarism detection in WeCP is powered by MOSS and runs automatically once enabled at the test level.

Steps to Enable Plagiarism Detection

  1. Go to your WeCP Dashboard

  2. Select the test you want to configure

  3. Click the Integrity & Experience tab

  4. Navigate to the Integrity Experience (or Integrity Settings) section

  5. Enable Code Plagiarism Detection

How Plagiarism Results Appear in the Candidate Report

After candidates submit their test, plagiarism results become available in the candidate reports for review.

Viewing Plagiarism Details for a Candidate

  1. Go to your WeCP Dashboard

  2. Select the test you wish to review

  3. Click the candidate you wish to review

  4. Scroll to the Proctoring Analysis card and click Inspect under Code Plagiarism Detected (visible only if plagiarism detection is enabled)

How to Interpret the Results

  • A higher similarity score indicates a greater overlap with another candidate’s submission

  • Scores alone do not automatically disqualify a candidate

  • Recruiters should manually review the highlighted code sections before making a decision

Important: WeCP flags potential plagiarism but does not auto-reject candidates. Final decisions should always be based on human review.


When Plagiarism Results Are Generated

  • Results are generated after at least two candidates submit coding solutions

  • Reports update automatically as more submissions are received

  • Similarity comparisons are always scoped to the same test

How WeCP Detects Code Plagiarism with MOSS

  1. Code Submission: Users submit their solutions to coding challenges on WeCP. These submissions are then collected and sent to MOSS for plagiarism detection.

  2. Normalization: MOSS starts by normalizing the submitted code, removing superficial differences like whitespace and comments, which do not impact the code's functionality. This step ensures that the comparison process focuses on the actual code structure and logic.

  3. Tokenization: The normalized code is then converted into a sequence of tokens. Tokens are atomic elements that represent the code's syntax and structure, such as keywords, operators, identifiers, literals, or punctuation marks. Tokenization reduces the code to a standardized form, enabling easier comparisons between submissions (the first sketch after this list illustrates normalization and tokenization together).

  4. Fingerprinting: MOSS generates fingerprints for each code submission by hashing short runs of consecutive tokens (k-grams) with a Karp-Rabin-style rolling hash, then keeping only a representative subset of those hashes, a selection technique known as winnowing. The resulting fingerprints are compact representations that still reflect the code's overall structure and content, allowing for efficient comparisons (see the second sketch after this list).

  5. Comparing fingerprints: MOSS compares the fingerprints of all submitted code files pairwise. By comparing fingerprints instead of entire files, MOSS can efficiently handle large numbers of submissions. If two submissions share a significant number of fingerprints, there is likely a high degree of similarity between them (the third sketch after this list shows one way to score and rank pairs).

  6. Ranking: Based on the number of matching fingerprints, MOSS calculates a similarity score for each pair of submissions. The higher the score, the more similar the two code files are. MOSS then generates a report that ranks the pairs of submissions according to their similarity scores.

  7. Result visualization: WeCP presents the MOSS report to the instructors or administrators, highlighting the similarities between code submissions. The report shows side-by-side comparisons of the code segments with matching fingerprints, making it easy to review and determine if plagiarism has occurred.
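
To make these steps concrete, here is a minimal Python sketch of normalization and tokenization. It is illustrative only: WeCP and MOSS do not publish their internals, the function name normalize is our own, and Python's standard tokenize module stands in for MOSS's language-aware tokenizers.

```python
import io
import keyword
import tokenize

def normalize(source: str) -> list[str]:
    """Collapse Python source into a token sequence that ignores comments,
    whitespace, identifier names, and literal values."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
            continue                      # layout and comments are ignored
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")              # every identifier looks the same
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")             # every literal looks the same
        else:
            out.append(tok.string)        # keywords, operators, punctuation
    return out
```

Mapping identifiers and literals to placeholder tokens is what makes the comparison robust to renamed variables and edited comments.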
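
Fingerprinting can be sketched in the same spirit. Python's built-in hash stands in for a true Karp-Rabin rolling hash, and the k and window values below are arbitrary illustrative choices, not MOSS's actual parameters.

```python
def fingerprints(tokens: list[str], k: int = 5, window: int = 4) -> set[int]:
    """Hash every k-gram of the token stream, then keep the smallest hash
    in each sliding window (winnowing) to form a compact fingerprint set."""
    grams = [" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    # hash() is only stable within one process, which is enough for a
    # single comparison run; MOSS uses a proper rolling hash instead.
    hashes = [hash(g) for g in grams]
    if len(hashes) <= window:
        return set(hashes)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}
```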
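
Comparison and ranking round out the sketch. The Jaccard index over fingerprint sets is a stand-in for MOSS's internal scoring, which is based on matched token runs rather than a single set ratio.

```python
from itertools import combinations

def similarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Jaccard index: the fraction of fingerprints two submissions share."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def rank_pairs(submissions: dict[str, set[int]]) -> list[tuple[float, str, str]]:
    """Score every pair of submissions and sort, most similar first."""
    scored = [(similarity(submissions[a], submissions[b]), a, b)
              for a, b in combinations(submissions, 2)]
    return sorted(scored, reverse=True)
```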

Example:

Suppose two users, Alice and Bob, submit solutions to a coding challenge on WeCP. Alice writes a genuine solution, while Bob copies Alice's code with minor modifications, such as changing variable names and comments. When the submissions are sent to MOSS, the following happens (the sketch after this walkthrough reproduces the scenario in code):

  1. MOSS normalizes both code submissions, removing differences in whitespace and comments.

  2. The code is tokenized, converting both submissions into sequences of tokens representing their syntax and structure.

  3. Fingerprints are generated for each submission using the Karp-Rabin algorithm.

  4. MOSS compares the fingerprints of Alice's and Bob's submissions.

  5. Despite the modifications Bob made, MOSS detects a high degree of similarity in the fingerprints.

  6. MOSS assigns a high similarity score to the pair of submissions and includes them in the report.

  7. WeCP administrators or instructors review the MOSS report, which highlights the similar code segments between Alice's and Bob's submissions. They can then investigate further and determine if plagiarism occurred.
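
The illustrative functions from the previous section can reproduce this scenario. The two submissions below are hypothetical; only identifier names and comments differ, so after normalization they reduce to the same token stream.

```python
alice = """
def total(values):
    # add up the values
    result = 0
    for v in values:
        result += v
    return result
"""

bob = """
def sum_list(numbers):
    # compute the running sum
    acc = 0
    for n in numbers:
        acc += n
    return acc
"""

fps = {name: fingerprints(normalize(src))
       for name, src in {"alice.py": alice, "bob.py": bob}.items()}
for score, a, b in rank_pairs(fps):
    print(f"{a} vs {b}: {score:.0%} fingerprint overlap")
# -> alice.py vs bob.py: 100% fingerprint overlap
```

Bob's renaming and rewritten comments vanish during normalization, which is exactly why surface-level edits do not defeat fingerprint-based detection.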

Note:

It is important to note that MOSS is not a definitive tool for proving plagiarism; it merely helps in identifying potential cases of code plagiarism. A manual inspection of the flagged code segments is necessary to make a final determination.
