The Short Answer
Codility detects AI usage through four layered systems: Similarity Check, which cross-references each submission against a corpus reported to span roughly ten million historical solutions and a growing set of reference AI outputs; an AI-generated code classifier added to the Similarity Check pipeline in 2024; behavioral telemetry covering paste events, tab focus loss, and typing cadence; and the Device Integrity App, which scans for known capture and assistant binaries at session start. Unless the Device Integrity App is installed, none of these layers can see what is happening on the candidate's desktop outside the browser tab. Code written with outside help can still be flagged by the classifier as AI-generated based on its structural fingerprint, regardless of how it arrived in the editor. The detection surface is real, narrower than candidates often assume, and broader than vendor marketing usually implies.
How Codility's Similarity Check Actually Works
Similarity Check is the oldest layer in Codility's detection stack and remains the most heavily weighted in 2026. It runs after submission on solutions that score above the passing threshold. Each submission is tokenized, normalized to remove cosmetic differences such as identifier renames and whitespace, and compared against a database of historical candidate solutions, leaked solutions surfaced by Codility's LeakSweep crawler, and reference solutions produced by major frontier models on the same tasks. The comparison produces a similarity percentage that the recruiter reviews in the candidate report.
The engine is designed to recognize similarities even when the submission has been modified. Reformatting, identifier renaming, control-flow reordering, and minor structural changes are flattened out by the tokenization and normalization step. SQL tasks are explicitly excluded from similarity scoring because legitimate SQL solutions naturally converge on a small number of canonical forms. The same concern about convergence underpins the platform's documented note that recruiters should treat the score as a flag for review rather than as a verdict, an approach also reflected in the comparison piece on whether HackerRank detects AI in 2026.
The pseudocode below approximates the shape of a similarity-threshold check. It is a simplified illustration, not Codility's actual model, but it captures the general logic that produces a comparable flag.
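```python
# Illustrative only: tokenize, normalize, ngrams, and jaccard are
# placeholder helpers that this sketch leaves undefined.
def similarity_flag(submission, corpus, threshold=0.72):
    tokens = normalize(tokenize(submission))
    best_score = 0.0
    best_match = None
    for ref in corpus:
        ref_tokens = normalize(tokenize(ref.code))
        # Jaccard overlap between the two sets of token 5-grams
        score = jaccard(set(ngrams(tokens, 5)),
                        set(ngrams(ref_tokens, 5)))
        if score > best_score:
            best_score = score
            best_match = ref
    # Report the closest match only when it clears the threshold
    if best_score >= threshold:
        return {
            "flag": True,
            "score": best_score,
            "match_source": best_match.source,
            "match_kind": best_match.kind,
        }
    return {"flag": False, "score": best_score}
```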
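The helpers that the pseudocode leaves undefined could look something like the following. This is a minimal sketch under stated assumptions: the regex tokenizer, the small keyword set, and the `ID` placeholder are illustrative choices, not Codility's implementation. It shows why a pure identifier rename leaves the score unchanged once identifiers are normalized away.

```python
import re

# Hypothetical keyword set; a real normalizer would use a language grammar.
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}

def tokenize(code):
    """Split source into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def normalize(tokens):
    """Replace every non-keyword identifier with the placeholder 'ID'."""
    return [
        "ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
        for t in tokens
    ]

def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def jaccard(a, b):
    """Set overlap |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score(a, b):
    """Jaccard similarity over normalized token 5-grams."""
    return jaccard(set(ngrams(normalize(tokenize(a)), 5)),
                   set(ngrams(normalize(tokenize(b)), 5)))

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s"
renamed = "def acc(vals):\n    r = 0\n    for v in vals:\n        r += v\n    return r"

rename_score = score(original, renamed)
unrelated_score = score(original, "print('hello world')")
```

Because every identifier collapses to the same placeholder, the renamed function produces exactly the same 5-gram set as the original, while the unrelated snippet shares none of them.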
The AI-Generated Code Classifier
Codility added an AI-generated code classifier on top of the Similarity Check pipeline in 2024 and continued tuning it through 2025 and into 2026. The classifier looks at the final submitted code and assigns a probability that the submission was authored or substantially assisted by a large language model. The features it uses include stylistic markers — unusually consistent naming, idiomatic but textbook-pattern structure, the absence of dead code or false-start fragments, comment style associated with frontier-model output — and structural markers visible in the edit history that the editor records alongside the final code.
The classifier sits inside the Similarity Check workflow rather than as a separate product. Recruiters see its output as part of the integrity section of the candidate report. The probability rolls into the overall recommendation alongside the corpus similarity score, the behavioral telemetry, and any proctoring evidence. Codility does not publish a single headline accuracy number for the classifier; independent research on AI-code detectors from 2024 to 2026 places typical accuracy on unmodified LLM output in the 85 to 92 percent range depending on language, with sharp degradation once a candidate makes even small structural edits. That degradation is inherent to the detection problem rather than specific to Codility.
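An accuracy figure on its own does not say how often a flag is correct; that also depends on how common AI-assisted submissions are in the pool. The arithmetic can be sketched with Bayes' rule. All three inputs below are assumptions chosen for illustration, not published Codility figures.

```python
def flag_precision(true_positive_rate, false_positive_rate, base_rate):
    """P(AI-assisted | flagged), computed with Bayes' rule."""
    flagged_ai = true_positive_rate * base_rate
    flagged_human = false_positive_rate * (1.0 - base_rate)
    total_flagged = flagged_ai + flagged_human
    return flagged_ai / total_flagged if total_flagged else 0.0

# Assumed inputs: 90% of AI-assisted submissions are flagged, 10% of
# human-written submissions are flagged, and 1 in 10 submissions is
# actually AI-assisted.
precision = flag_precision(0.90, 0.10, 0.10)
print(f"{precision:.2f}")
```

Under these assumed inputs, only about half of the flagged submissions are actually AI-assisted, which is one reason a flag works better as a prompt for review than as a verdict.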
Need a defensive workflow that does not rely on external paste or screen sharing? TechScreen runs as an invisible desktop layer outside the browser sandbox where Codility's classifier and behavioral telemetry operate. Claim 3 free tokens at techscreen.app.
Paste Events, Focus Loss, and Typing Cadence
The behavioral telemetry layer runs in every Codility assessment by default and does not require any extra permission beyond the standard browser sandbox. It logs tab focus changes through the visibilitychange and blur events, paste operations into the code editor through the standard ClipboardEvent interface, the timestamp and length of each typed segment, and IDE shortcuts that touch the clipboard. Each event lands in the candidate timeline visible to the recruiter after submission, alongside the timeline of code edits.
Mini Q&A. Does Codility log the content of every paste? In effect, yes. The platform records the timestamp, the length in characters, and a hash of the pasted text, and because the pasted text also lands in the recorded edit history, the recruiter can later reconstruct which fragment of the final solution arrived as a paste and which was typed. Pasting a single line of helper code is materially different from pasting a 78-line block right after a forty-second focus loss, and the timeline makes that distinction visible at a glance.
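The distinction between a small paste and a large paste right after a focus loss can be sketched as a rule over the event timeline. The `Event` shape, the field names, and every threshold here are hypothetical; the platform's actual schema and weighting are not public.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float          # seconds since session start
    kind: str         # "blur", "focus", or "paste"
    length: int = 0   # pasted characters (paste events only)

def suspicious_pastes(events, min_chars=300, min_away=20.0, window=10.0):
    """Return pastes of at least min_chars that occur within `window` seconds
    of returning from an absence of at least min_away seconds."""
    flagged = []
    blur_at = None
    last_return = None  # (time of return, seconds spent away)
    for e in sorted(events, key=lambda ev: ev.t):
        if e.kind == "blur":
            blur_at = e.t
        elif e.kind == "focus" and blur_at is not None:
            last_return = (e.t, e.t - blur_at)
            blur_at = None
        elif e.kind == "paste" and last_return is not None:
            returned_at, away = last_return
            if (e.length >= min_chars and away >= min_away
                    and e.t - returned_at <= window):
                flagged.append(e)
    return flagged

timeline = [
    Event(30.0, "paste", length=40),      # one-line helper, no focus loss
    Event(100.0, "blur"),
    Event(140.0, "focus"),                # forty seconds away
    Event(143.0, "paste", length=2400),   # large block right after returning
]
flagged = suspicious_pastes(timeline)
```

Only the second paste is returned: the first is short and is not preceded by any focus loss.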
Typing Pattern Detection, expanded through 2025 and 2026, looks at the rhythm of keystrokes inside the editor. The model is trained on a large corpus of organic candidate sessions and learns the distribution of pauses, backspaces, and burst-typing patterns characteristic of a human writing code under interview pressure. Sessions where the keystroke distribution looks more like transcription than composition — long pauses, then long bursts of clean typing with very few edits — get flagged with a higher behavioral suspicion weight. The cadence model overlaps with the patterns described in the HackerRank detection breakdown, and many of the same false-positive cases apply.
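The signals named above (pauses, backspaces, clean bursts) can be turned into simple features. The sketch below is a hand-written stand-in for what the article describes as a trained model; the feature names, the five-second pause threshold, and the decision rule are all assumptions.

```python
from statistics import median

def cadence_features(keystrokes, pause_threshold=5.0):
    """keystrokes: list of (timestamp_seconds, key) in typing order."""
    times = [t for t, _ in keystrokes]
    gaps = [b - a for a, b in zip(times, times[1:])]
    backspaces = sum(1 for _, k in keystrokes if k == "Backspace")
    return {
        "backspace_ratio": backspaces / len(keystrokes) if keystrokes else 0.0,
        "long_pauses": sum(1 for g in gaps if g >= pause_threshold),
        "median_gap": median(gaps) if gaps else 0.0,
    }

def looks_transcribed(features, max_backspace_ratio=0.02):
    """Hypothetical rule: long pauses combined with almost no corrections."""
    return (features["long_pauses"] >= 1
            and features["backspace_ratio"] <= max_backspace_ratio)

# Clean burst, 30-second pause, clean burst: no corrections at all.
transcribed = [(i * 0.1, "a") for i in range(50)]
transcribed += [(35.0 + i * 0.1, "a") for i in range(50)]

# Steady typing in which every fifth key is a correction.
composed = [(i * 0.4, "Backspace" if i % 5 == 4 else "a") for i in range(50)]

transcribed_flag = looks_transcribed(cadence_features(transcribed))
composed_flag = looks_transcribed(cadence_features(composed))
```

A rule this crude also shows where false positives come from: a fast, accurate typist who stops to think and then writes a clean block would match the same pattern.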
What Each Codility Surface Actually Detects
The Codility product line has three distinct surfaces, and their detection capabilities differ meaningfully. Candidates often conflate them, then either overestimate or underestimate the surveillance they are facing in a given round. The table below summarizes the published capabilities of each surface as of 2026.
| Detection signal | CodeCheck (async) | CodeLive (live interview) | Codility Tasks (practice) |
|---|---|---|---|
| Similarity Check against corpus | Yes | Post-session, configurable | No |
| AI-generated code classifier | Yes | Configurable | No |
| LeakSweep cross-reference | Yes | Configurable | No |
| Paste event logging | Yes | Yes | No |
| Tab focus and blur events | Yes | Yes | No |
| Typing cadence analysis | Yes | Limited | No |
| Webcam snapshot proctoring | Optional | Built into video call | No |
| Continuous screen recording | Optional | Built into video call | No |
| Device Integrity App scan | Optional | Optional | No |
| Live human interviewer | No | Yes | No |
| Recruiter-facing detection report | Yes | Yes | No |
| ID verification | Optional | Optional | No |
CodeCheck, the default async assessment used for first-round screens at most European employers and a growing list of US ones, is the surface where automated detection does the heaviest work. CodeLive is the synchronous interview environment where the interviewer is the primary detection layer and the automated signals run quieter in the background. Codility Tasks is the practice library; nothing in that environment is reported anywhere, and candidates often confuse the absence of detection on Tasks with absence of detection on CodeCheck.
The candidate's actual risk profile depends entirely on which of these three surfaces is in front of them. A CodeCheck with full proctoring is a different environment than an open CodeLive session with one interviewer over a video call.
What Codility Structurally Cannot Detect
Every layer described above observes only what happens inside the browser, with the single exception of the Device Integrity App, which is an optional installed component the candidate is asked to run before a high-stakes proctored session. There is no kernel-level driver, no operating-system hook, and no virtual-desktop scan in the default configuration. This is the fundamental architectural boundary of Codility's detection surface, and it defines what the platform structurally cannot observe.
Codility cannot see windows that never enter the WebRTC capture buffer or the browser tab. It cannot see content rendered on a second monitor unless the candidate selected that monitor when prompted to share their screen. It cannot hear system audio. It cannot see a second device sitting next to the candidate's main computer. It cannot see anything that does not produce a keystroke or paste event inside the CodeCheck editor, and it cannot tell where a keystroke originated: simulated typing tools that emit individual keydown events rather than a single paste never register as pastes, although those tools can trigger the cadence model separately. The same browser-sandbox boundary is discussed in detail in the companion article on whether interviewers can see paste events in 2026.
The Device Integrity App, where installed, narrows but does not eliminate this gap. It enumerates the running process list and loaded kernel extensions for a curated set of known capture tools, remote-desktop daemons, and AI interview assistants whose binaries are on Codility's list at session start. Tools whose process signatures are not on that list, or that run on a different device entirely, are outside the App's observation.
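List-based matching of this kind can be sketched in a few lines. The process names, the known list, and the name canonicalization below are all hypothetical; the App's real list and matching logic are not public.

```python
def match_known_binaries(running, known):
    """Return names from `running` that appear in `known`, compared
    case-insensitively and ignoring a trailing .exe extension."""
    def canon(name):
        name = name.lower()
        return name[:-4] if name.endswith(".exe") else name

    known_canon = {canon(k) for k in known}
    return sorted(p for p in running if canon(p) in known_canon)

# Hypothetical names throughout.
known_list = {"screen-recorder", "remote-desk-daemon"}
process_snapshot = ["Finder", "Screen-Recorder.exe", "python3", "unlisted-tool"]
hits = match_known_binaries(process_snapshot, known_list)
```

The result contains only the process whose name appears on the list, which is the structural limit the paragraph above describes: a lookup can only report what it has been told to look for.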
False Positives and the Documented Risk Surface
Codility's documentation explicitly notes that behavioral signals on their own are not proof of cheating and asks recruiters to review the full report before drawing conclusions. The platform publishes no numerical false-positive rate, but the categories of benign behavior that trigger flags are well known and worth enumerating because they shape how candidates should think about the system.
The most common false-positive case is the canonical-solution flag. A candidate who has practiced extensively, learned the textbook approach to a common interview problem, and writes clean code with idiomatic naming will produce a submission whose token-level fingerprint resembles many other clean submissions on the same task. Similarity Check is more likely to flag this candidate than a candidate who arrives at a messier, more idiosyncratic solution. The AI classifier compounds the effect because clean idiomatic code is exactly what frontier models tend to produce.
The second common case is the IDE-shortcut flag. Candidates who use VS Code or JetBrains shortcuts for refactoring, multi-cursor edits, or code generation can produce edit-history patterns that the cadence model reads as paste-like bursts even when no paste event was actually recorded. Accessibility tools that simulate input fall into the same category.
The third case is the legitimate-paste flag. Candidates who copy the problem statement to a notes app to work through it offline, then paste back their own draft solution, generate a paste event whose hash does not match the problem description but whose content is entirely their own work. The platform cannot distinguish this case from a paste of externally-generated code without human review of the full session.
Common Mistakes Candidates Make About Codility
Most of the misunderstandings about Codility's detection cluster around the assumption that the platform sees more than it actually does, or that it sees less. The list below covers the six most common patterns that turn into real problems for candidates.
- Treating CodeCheck and CodeLive as the same surveillance environment. They are not. CodeCheck has automated detection running across every layer of the stack. CodeLive has lighter automation and a human interviewer as the primary signal. A defensive approach calibrated for one is the wrong approach for the other.
- Assuming the absence of webcam means the absence of detection. Behavioral telemetry runs in every Codility assessment by default, with no permission prompt and no visible indicator. Tab focus events, paste events, and cadence patterns are captured even when the recruiter has not enabled any optional proctoring.
- Pasting a self-written draft back into the editor without context. The platform sees the paste event and weighs it the same way as a paste of externally-generated code. Candidates who solve the problem in a separate editor and then paste their solution in look identical to candidates who solve it with an external tool. The defensive move is to compose directly in the Codility editor.
- Believing the AI classifier is deterministic. It produces a probability, not a verdict. An accuracy band of 85 to 92 percent means that a non-trivial fraction of submissions on either side of the line are misclassified. The classifier's output is one signal among several in the recruiter's report.
- Conflating Similarity Check with plagiarism. Similarity Check measures token-level overlap against a corpus. High overlap can mean copying. It can also mean two candidates independently arrived at the canonical solution. Codility's own documentation treats the score as a flag, not a verdict.
- Underestimating how much of the detection runs post-hoc. Many candidates assume that nothing was flagged because no warning appeared during the test. The Similarity Check, AI classifier, and corpus comparison all run after submission. The recruiter sees the result; the candidate does not.
Live versus Asynchronous: Why the Distinction Matters
The single most important framing for thinking about Codility detection in 2026 is whether the candidate is in CodeCheck or CodeLive. CodeCheck is asynchronous: the candidate works alone in the browser, the platform records everything, the recruiter reviews after submission, and the integrity report combines the corpus similarity, the AI classifier output, the behavioral timeline, and any proctoring evidence into a single recommendation. CodeLive is synchronous: a human interviewer is on a video call with the candidate in real time, the same editor is shared, automated signals run quieter in the background, and the interviewer's own observations dominate the post-session writeup.
The detection economics are different in each mode. In CodeCheck, the platform is doing the work, and the candidate's job is to produce a submission that does not light up the integrity report. In CodeLive, the interviewer is doing the work, and the candidate's job is to produce a session in which they can explain their reasoning, respond to follow-ups, and demonstrate that the code they wrote is code they understand. The same principles apply in other live environments: candidates who understand how employers such as Linear, Notion, Airbnb, Shopify, Cloudflare, Pinterest, Palantir, or Jane Street configure their live coding tooling can plan accordingly.
Mini Q&A. If I solve a CodeLive problem perfectly and the interviewer never seems suspicious, can Similarity Check still flag me post-session? Yes, if the customer has Similarity Check enabled on CodeLive sessions. The classifier and the corpus comparison run after submission regardless of whether anything seemed off during the live call. Companies that take the integrity report seriously will read it before sending an offer.
What It Means in Practice
The combined picture of Codility's 2026 detection stack is straightforward to summarize even though each layer is internally complex. The platform is good at catching submissions whose final form matches its corpus of historical solutions, leaked solutions, and reference AI outputs. It is good at catching paste events and tab focus loss. It is moderately good at catching typing cadence anomalies, with a meaningful false-positive surface on fast typists and on candidates with extensive prior practice. It is poor at catching anything that happens outside the browser tab and outside the Device Integrity App's known-binary list.
That picture has practical consequences. A candidate who pastes a 78-line block of code after a focus-loss event is lighting up the easiest signal in the stack. A candidate who composes directly in the CodeCheck editor, writes code that resembles their natural style, and avoids the canonical textbook idioms is producing a submission whose only detection vector is the AI classifier — and the classifier's accuracy on modified output is well below its accuracy on unmodified LLM output. A candidate who uses an external tool that does not enter the browser sandbox at all is outside the platform's structural observation surface.
The broader landscape of detection tooling across platforms is covered in the overview of how AI interview assistants work, the comparison of the best invisible AI tools for technical interviews in 2026, and the deeper-dive piece on how to use AI in coding interviews without getting caught in 2026. The companion article on whether interviewers can see paste events in 2026 covers the paste-event layer across all major platforms. For the broader ethical framing, the piece on whether using AI during a coding interview is cheating is the relevant reading.
TechScreen sits in the layer Codility cannot see — outside the browser, outside the WebRTC capture buffer, outside the known-binary list — and offers 3 free tokens at techscreen.app for new users.
The Closing Frame
Codility's AI detection in 2026 is a layered, post-hoc, signal-aggregation system rather than a single deterministic check. Similarity Check carries most of the weight; the AI classifier amplifies the signal when the submission carries LLM-style markers; the behavioral telemetry catches the loud cheating patterns; the Device Integrity App narrows the desktop observation gap when installed. Each layer has a documented false-positive surface, and the platform's own documentation positions the integrity report as a starting point for human review rather than as a verdict.
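The signal-aggregation idea can be sketched as a function that collects reasons for review instead of issuing a verdict. The function name, the report shape, and the 0.8 classifier threshold are hypothetical; the 0.72 similarity threshold simply reuses the value from the earlier pseudocode.

```python
def integrity_summary(similarity, ai_probability, behavioral_flags,
                      similarity_threshold=0.72, ai_threshold=0.8):
    """Collect reasons for human review; this never returns a verdict."""
    reasons = []
    if similarity >= similarity_threshold:
        reasons.append(f"similarity {similarity:.2f}")
    if ai_probability >= ai_threshold:
        reasons.append(f"classifier probability {ai_probability:.2f}")
    reasons.extend(behavioral_flags)
    return {"needs_review": bool(reasons), "reasons": reasons}

report = integrity_summary(0.81, 0.35, ["large paste after focus loss"])
clean = integrity_summary(0.10, 0.05, [])
```

Here `report` carries two reasons (the similarity score and the behavioral flag) while the classifier stays silent, and `clean` carries none.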
The right mental model for a candidate is to assume that everything inside the browser tab is logged, that the final submission will be compared against a corpus that includes both historical human solutions and reference LLM outputs, and that anything outside the browser tab and outside the Device Integrity App's known-binary list is structurally invisible to the platform. The detection surface is real and the false-positive surface is real, and the distance between the two is where most of the candidate-side and the recruiter-side decisions actually get made.
Frequently Asked Questions
Does Codility actually detect ChatGPT, Claude, or other AI assistants?
Codility does not directly detect external AI applications running on the candidate's desktop. The platform detects downstream artifacts: code that matches a corpus of known AI-generated reference solutions through Similarity Check, paste events into the CodeCheck or CodeLive editor, abnormal typing cadence, and tab focus loss. The AI-generated code classifier added to the Similarity Check pipeline in 2024 increases the weight of those signals when the final submission carries stylistic markers characteristic of frontier-model output.
How accurate is Codility's AI detection?
Codility does not publish a single headline accuracy number for its AI classifier. Independent reporting and 2024 to 2026 academic studies on AI-code detectors place the typical accuracy band for unmodified LLM output at roughly 85 to 92 percent depending on the language, with accuracy degrading sharply once a candidate makes even small structural edits. Codility's own documentation positions Similarity Check as a flag for human review rather than an automatic verdict.
What is the difference between CodeCheck, CodeLive, and Codility Tasks for AI detection?
CodeCheck is the asynchronous assessment with the full integrity stack enabled by default: Similarity Check, AI classifier, paste logging, tab focus tracking, and optional webcam or screen proctoring. CodeLive is the live interview environment where a human interviewer is on the call; automated detection is lighter and post-hoc, with the interviewer as the primary signal. Codility Tasks is the practice and training library, which does not produce a recruiter-facing detection report at all.
Can Codility see code typed in an external editor and then pasted in?
Codility can see the paste event itself with timestamp, length, and content. It cannot see the external editor where the code was composed, what tool generated it, or what tabs were open while the candidate was outside the browser. A paste of more than a few hundred characters after a long silence, or immediately after a tab focus loss, is one of the highest-weighted patterns in Codility's behavioral telemetry.
What triggers a Codility Similarity Check flag?
Flags are triggered by clusters of signals rather than any single match. Common triggers include high token-level similarity against another candidate's submission, similarity against a leaked solution that Codility's LeakSweep crawler has found online, similarity against a reference solution produced by a major large language model on the same task, and structural patterns that match the AI classifier's training distribution. SQL tasks are excluded from similarity scoring because canonical SQL solutions naturally converge.
Will canonical or idiomatic code falsely trigger Codility's AI classifier?
It can, and Codility's own documentation acknowledges this. Common false-positive cases include candidates who learned the textbook solution to a canonical interview problem, candidates who use clean naming conventions and avoid throwaway variables, and candidates with prior competitive programming experience whose code closely matches the canonical idiom. The classifier produces a probability, and recruiters are explicitly asked to review the full report rather than rely on a single score.
Does Codility detect AI in CodeLive interviews the same way as in CodeCheck?
No. CodeLive has lighter automated detection because a human interviewer is present and the session is recorded for collaborative review. Paste events are still logged into the editor timeline. Similarity Check runs after the session if configured. The AI classifier may or may not be applied depending on the customer configuration. The interviewer's own observations and follow-up questions remain the primary detection layer in CodeLive.
What is the practical risk of getting flagged by Codility's AI detection?
The flag itself does not auto-reject the candidate. It surfaces in the recruiter's report alongside the supporting signals: similarity score, paste timeline, focus events, and classifier probability. The outcome depends entirely on the hiring company's policy. Some companies disqualify on a single high-confidence flag, others ignore the flag, and most use it as grounds for a follow-up technical conversation in which the candidate is asked to explain their solution.