Why AI Detection Tools Are the Wrong Answer to the Right Question

A study published last week in Education and Information Technologies tested Turnitin's AI detector on 81 scripts with AI-produced text spliced in at proportions ranging from 5% to 100%. The results should end the conversation about AI detection as an integrity tool.

What the Turnitin Study Actually Found

At low levels of AI involvement — 5% to 10% — the detector returned no score. Between 15% and 40% it scored higher than the actual AI proportion, flagging honest assistive use as suspicious. From 70% upward it scored consistently lower than the real figure. A script written entirely by ChatGPT returned a score of 60%. When fully AI-produced text was run through a paraphrasing tool first, several combinations scored 0%.

The tool flags the honest and clears the strategic. It does both from the same broken number. Universities are suspending students and convening misconduct panels on the strength of this number. That should stop.

The Wrong Question

The appeal of AI detection tools is understandable. Institutions need confidence that students are genuinely learning. Professional programmes need assurance that graduates possess the knowledge and judgement that qualifications represent. These are legitimate concerns.

But AI detection addresses them indirectly, by asking whether AI was involved rather than whether learning occurred. Those are not the same question, and treating them as equivalent is the source of the problem.

A student can use AI extensively and still demonstrate deep understanding. A student can avoid AI entirely and still fail to engage with the subject matter. Detection cannot distinguish between the two because it is not measuring learning. It is measuring text patterns — and the Turnitin study confirms that it cannot do even that reliably.

The Governance Problem

When a detection score triggers a misconduct process, four questions immediately arise:

What level of confidence is required before action is taken?
What evidence supports the conclusion?
Can the student challenge the result?
Who is accountable if the detector is wrong?

Detection scores alone answer none of these questions. The score is a probability estimate derived from pattern analysis, not a record of what happened. It is retrospective and inferential: a machine guessing at the past from a finished document. In any other context where algorithmic outputs drive consequential decisions about individuals, we would require transparency, accountability, contestability, and proportionality. AI detection in its current form provides none of these.

The research also raises concerns that detector outputs may disproportionately penalise students with particular writing styles, non-native speakers, and neurodiverse students, the same groups already most vulnerable to institutional inequity. A tool that is both unreliable and systematically biased is not a neutral instrument with error margins. It is an active source of injustice.

Detection Versus Provenance

The alternative is not no governance. It is better governance.

Detection is retrospective — it examines a finished submission and attempts to reconstruct what happened. Provenance is contemporaneous — it records what happened as it happens, creating an accountable record of how work was developed, where AI assistance was provided, and where human judgement shaped the outcome.

The distinction matters practically. A provenance-based approach does not ask learners to prove they did not use AI. It asks them to demonstrate how they used it, what they learned from it, and where responsibility for the final work remains. That is a verification model rather than a suspicion model, and it remains valid regardless of how AI technology evolves — because it is measuring learning rather than attempting to detect a tool.

Clinical nursing education has been working through exactly this problem. When student nurses document clinical placements with AI assistance, the question that matters is not whether AI was involved in structuring a reflective account. The question is whether the student genuinely participated in the clinical experience, whether professional judgement was exercised, and whether an accountable record exists of how conclusions were reached. An immutable provenance record — showing the student's original notes, the AI contribution, and the final edited submission — makes that question answerable without requiring detection. The supervisor can see exactly what happened. The student cannot misrepresent their contribution. The institution has the evidence it actually needs.
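One way to picture such a record is as a hash-chained, append-only log, in which each entry commits to everything recorded before it. The sketch below is a minimal illustration under stated assumptions, not a description of any particular product: the names ProvenanceLog, entry_hash, and the stage labels are hypothetical, and the entry contents are made-up placeholders. Strictly speaking it makes the record tamper-evident rather than immutable; a real system would also need signed entries and storage the student and institution cannot silently rewrite.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, actor: str, stage: str, content: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "actor": actor, "stage": stage, "content": content},
        sort_keys=True,  # stable key order so the same entry always hashes the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Append-only log (hypothetical); each entry is chained to its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, stage: str, content: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {
            "actor": actor,
            "stage": stage,
            "content": content,
            "prev": prev,
            "hash": entry_hash(prev, actor, stage, content),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; editing any earlier entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev:
                return False
            if e["hash"] != entry_hash(prev, e["actor"], e["stage"], e["content"]):
                return False
            prev = e["hash"]
        return True


# Illustrative placeholder entries mirroring the stages named in the text.
log = ProvenanceLog()
log.append("student", "original_notes", "Placeholder: notes written on placement.")
log.append("ai", "ai_contribution", "Placeholder: suggested structure for the account.")
log.append("student", "final_submission", "Placeholder: edited reflective account.")
log.append("supervisor", "verification", "Placeholder: supervisor sign-off.")
intact = log.verify()

# Rewriting an earlier entry after the fact is detectable on the next check.
log.entries[0]["content"] = "Rewritten after the fact."
tampered_ok = log.verify()
```

Because each hash covers the previous one, a supervisor re-running verify() would see the chain fail if the original notes or the AI contribution were altered after submission, which is the property that lets the record stand in for a detection score.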

Academic assessment needs the same architecture.

Reportica Pulse's Integrity Trace provides the provenance architecture described in this article — an immutable record of student contribution, AI involvement, and supervisor verification for clinical placement documentation.

What This Requires

It is assessment that needs to be redesigned, not the detector. The redesign should centre on making learning visible rather than trying to make AI invisible.

That means moving from product-based assessment — evaluating finished submissions — toward process-based verification — evidencing how understanding was developed, how decisions were made, and how professional judgement was exercised. It means building systems that record provenance throughout the creation process rather than attempting to reconstruct it afterwards. And it means treating AI involvement as something to be declared and contextualised rather than hidden and detected.

The institutions that navigate this well will not be those with the most sophisticated detection infrastructure. They will be those that develop the clearest mechanisms for verifying that learning occurred — mechanisms that are transparent to students, accountable to institutions, and robust enough to survive the next iteration of AI capability.

The Path Forward

Detection tools emerged because institutions asked a reasonable question. The difficulty is that they answer a different question from the one education actually needs answered. Until that is recognised, institutions will continue investing in technologies that provide the appearance of assurance while leaving the underlying challenge — knowing whether learning occurred — entirely unresolved.

The future of academic integrity is not detection. It is provenance. It is transparency. It is systems that make learning visible and hold all parties accountable for the work they submit and verify.