Hallucinated Law: Why GenAI Gets Cases Wrong, Who’s Responsible and What to Do About It
Artificial Intelligence (“AI”) and Fabrication of Records
In State v. Coleman, an attorney allowed a paralegal to use ChatGPT to prepare an appellate filing. The AI tool generated fabricated transcript quotations, attributing to a real prosecutor specific inflammatory statements that were never spoken. When the Ohio Court of Appeals identified these fabrications, the attorney did not promptly correct the record. Instead, as the matter escalated, subsequent filings continued to exhibit unreviewed AI output, including a prompt left embedded in the legal brief.
The court’s position was unambiguous, holding that "An attorney who files a document containing AI-generated content is responsible for that content, fully and without qualification. The duty to verify, the duty of candor, the duty of competence and the duty of supervision cannot be delegated to a machine." The onus of evaluating AI-generated content rests with the attorney and cannot be shifted to the AI tool.
The attorney was sanctioned with a $2,000 fine, disqualification from the case, referral to the bar, mandatory CLE and a formal letter of apology to the affected individuals. State v. Coleman is not an isolated failure. Rather, it is part of a growing category of cases in which courts are identifying and sanctioning hallucinated legal content. As of March 2026, 1,222 such cases had been identified, with the United States (US) alone accounting for 810.
Understanding this pattern requires moving beyond isolated incidents to identify the root cause of the problem. The article begins with emerging empirical evidence on the scale and distribution of AI-generated legal errors, before evaluating how such errors manifest and often evade detection. It then considers whether tools built on authoritative legal databases (Westlaw, LexisNexis) meaningfully reduce hallucination risk, and locates these developments within the profession’s existing ethical framework and the measures available to control that risk.
From Anecdote to Systemic Risk
Over the past two years, courts across jurisdictions have begun encountering and explicitly recording AI-generated legal errors. Hallucinations were initially treated as anomalies, but they are increasingly part of a measurable and accelerating pattern.
The most comprehensive account of such cases is maintained by Damien Charlotin. This database tracks legal decisions where courts identified reliance on fabricated or inaccurate AI-generated content. As of early 2026, the database records 1,222 cases, with the US accounting for more than 800.

In 2024, the database recorded 36 cases; in 2025, the figure grew more than tenfold to 487. The growth remains rapid, with roughly 100 cases now being added every month. The pattern of hallucination is not incidental: a benchmarking study by Stanford found that LLM-based legal tools hallucinate in at least 1 out of 6 benchmark queries, with general-purpose LLMs generating incorrect text 58% to 88% of the time.
In the US, pro se litigants accounted for 488 cases while lawyers accounted for 308. Pro se litigants make up the majority because they are more likely to use general-purpose tools and less likely to verify those outputs against primary research.
This distribution does not mean the problem is confined to non-lawyers. A substantial number of cases involve licensed attorneys in court filings and civil litigation. The profession does not have a literacy problem so much as a verification problem, one that needs safeguards to ensure AI-generated errors do not enter the record.
What “Hallucination” Actually Means in Legal Practice
| Failure mode | What it looks like in practice | Why it is especially dangerous |
| --- | --- | --- |
| Fabricated authority | A realistic-looking citation for a case that does not exist. | Completely undetectable without checking a primary database. |
| Distorted holding | A real case, real citation but wrong ratio, wrong outcome or wrong jurisdiction. | Passes cursory verification; more dangerous than pure fabrication. |
| Temporal error | Overruled, amended, or lapsed law presented as current. | Silent compliance/negligence risk; lawyers may not think to check currency. |
| Hallucination by agreement | The model builds on a wrong premise embedded in the prompt. | Gets worse the more you prompt; the lawyer's own error is amplified, not corrected. |
The substance of AI-generated errors varies with the kind of mistake the tool has made. Not all hallucinations are alike: they differ in detectability, risk and form. For example, a hallucinated case citation in a court filing carries a higher risk than an incorrect citation in an internal memo. Hallucinations can be classified into the four categories summarised in the table above.
“Hallucination by agreement” is the most underreported form and the most important one for sophisticated practitioners to understand. When a lawyer asks a model to draft or research something based on a position “X”, and that position is questionable, the model will invent case law and clauses that reinforce the belief. This hallucination is difficult to catch because the error is amplified the more the model is prompted, especially in longer conversations, and legal documents are lengthy and substance-heavy.
Hallucinations as a feature, not a bug
The hallucination errors discussed in the previous section are not random. They arise from how LLMs are built: hallucinations are a feature of LLMs, not a bug, for three reasons.
1) The Confident Junior – LLMs are trained on swathes of data in a way that rewards fluent, coherent answers; during training they are effectively penalised for “I don’t know” responses. The model behaves like a confident junior associate who has digested the complete law library and will generate an answer even when it has nothing to draw from. A recent MIT study also highlighted the “sycophantic” tendency of AI, where models agree with and validate users’ ideas, reinforcing their beliefs. Over time, this creates a feedback loop in which users can spiral into believing incorrect ideas.
2) The Incomplete Library – A model can cite sources such as tribunal decisions, case law, statutes, procedural rules and guidelines only if it was trained on such data, and these sources are largely underrepresented in training data. The model cannot accurately cite what it has never seen, so it fills the gap with something pattern-plausible, because these models operate on probability.
3) The Trainee with a Folder – LegalTech tools employ Retrieval-Augmented Generation (RAG). Instead of relying purely on the model’s memory, the model retrieves documents from a database containing case law, statutes, guidelines and foreign judgements. This gives the trainee a defined set of documents to work with and significantly reduces fabricated citations. But the RAG folder has to contain the relevant law and be continuously updated: if it contains poor or incorrect legal sources, the model will generate an unsatisfactory response. A minimal sketch of this retrieval step appears below.
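To make the mechanism concrete, here is a minimal, illustrative sketch of the retrieval step in Python. The two-entry corpus, keyword scoring and prompt wording are toy assumptions for illustration only, not the implementation used by Harvey, Lexis+ AI or any other product.

```python
# Minimal illustrative sketch of retrieval-augmented generation (RAG) for a legal query.
# The corpus, scoring and prompt wording are toy assumptions, not any vendor's implementation.

CORPUS = {
    "Donoghue v Stevenson [1932] AC 562": "Established the modern duty of care in negligence.",
    "Carlill v Carbolic Smoke Ball Co [1893] 1 QB 256": "A unilateral offer can be accepted by performance.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank corpus entries by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        ((sum(t in text.lower() for t in terms), cite, text) for cite, text in CORPUS.items()),
        reverse=True,
    )
    return [(cite, text) for score, cite, text in scored[:k] if score > 0]

def build_prompt(query: str) -> str:
    """Ground the model: answer only from retrieved passages, cite them, or abstain."""
    passages = retrieve(query)
    if not passages:
        return f"Question: {query}\nNo sources retrieved. Reply: 'No supporting authority found.'"
    context = "\n".join(f"[{cite}] {text}" for cite, text in passages)
    return (
        "Answer using ONLY the sources below. Cite a source for every proposition. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

print(build_prompt("Is there a duty of care in negligence?"))
```

The grounding lives entirely in the prompt construction: the model is told to work from the retrieved documents, which is also why an incomplete or outdated folder still produces poor answers.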
Hallucination cannot be engineered out of the current LLM architecture. Lawyers have to build systems that can catch such errors before they cause harm.
Harvey and the Architecture of LegalTech
As of May 2025, a dataset of 125 verified hallucination incidents spanned 93 different tools, including ChatGPT, Claude, Westlaw, CoCounsel, Gemini and Microsoft Copilot. Harvey did not appear in any of these documented cases.
Harvey’s core design choice is “verticalization” - constraining and training intelligence within the boundaries of legal practice. The June 2025 strategic alliance with LexisNexis and the August 2025 launch of Ask LexisNexis are key grounding moves: lawyers receive citation-backed answers from primary LexisNexis sources powered by Shepard’s Citations. Harvey has also partnered with SCC to integrate India-specific case law into its suite of tools.
Though Harvey has grounded its model in authoritative legal sources, hallucinations remain possible. This is not because Harvey’s tools are inaccurate; it is because even the most architecturally responsible LegalTech tools rely on lawyers verifying the output. Harvey’s absence from the list of documented hallucination cases could reflect better design, or it may simply reflect issues that have not yet surfaced.
Lexis+ AI: Citation, Grounding and Residual Risk
LexisNexis markets Lexis+ AI, its GenAI solution, as producing “100% hallucination-free linked citations” because it is grounded in authoritative legal content. Despite that grounding, Lexis+ AI has been found to produce incorrect information 17% of the time, and the problem is not confined to LexisNexis: Westlaw’s AI-assisted research produced incorrect information 34% of the time.
The K&L Gates case is an instructive example of grounded tools remaining prone to hallucination risk. Attorneys from Ellis George LLP and K&L Gates LLP submitted a brief to Special Master Michael Wilner that contained several hallucinated citations. The attorneys had used tools such as CoCounsel, Westlaw Precision and Gemini to outline and generate the brief. Nine of the brief’s 27 citations were wrong, and the corrected brief still contained six errors.
Grounded tools reduce hallucination risk but do not eliminate it. Even when every answer is citation-linked, the residual task of verifying whether the citation actually supports the proposition for which it is cited still falls on the lawyer.
| Tier | Examples | Architecture | Hallucination risk | Key implication |
| --- | --- | --- | --- | --- |
| 1) Grounded, citation-linked | Lexis+ AI, Westlaw AI, Harvey + Ask LexisNexis | RAG over a curated, authoritative legal corpus; every answer links to a source document. | Lower, but not zero; depends on corpus completeness and user verification. | Always click through to the source. Never paste a generated summary without reading it. |
| 2) Legal-tuned general models | CoCounsel, Claude in a legal wrapper, Harvey standalone workflows | Better than raw chatbots, with some legal tuning; varies by workflow. | ~1 in 6 benchmark queries per Stanford HAI. | Suitable for drafting and issue-spotting but not for generating authorities without verification. Firm AI policy must explicitly cover this tier. |
| 3) General-purpose chatbots used ad hoc | ChatGPT, Google Gemini, Microsoft Copilot used without a legal system layer | No legal-database grounding; no citation architecture. | Highest; 58–82% hallucination rates on legal questions in some benchmarks. | Should be prohibited by firm policy from generating case citations for any client-facing work. Off-record ideation only. |
Risk is not binary; it depends on the kind of LegalTech tool you pick. Grounded, citation-linked tools are the best option, but even they can hallucinate. Lawyers have to set verification parameters based on the tier of system they are deploying. Even the most professional-grade tools do not remove the need for independent verification.
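One practical way to operationalise these tiers is a written policy that maps each tier to minimum verification steps. The sketch below is a hypothetical example of such a mapping; the tier labels echo the table, but the specific steps are illustrative assumptions rather than any bar association’s standard.

```python
# Hypothetical firm-policy mapping from tool tier (mirroring the table above) to
# minimum verification steps. The steps are illustrative assumptions, not a
# published bar-association standard.

VERIFICATION_POLICY = {
    "grounded_citation_linked": [      # e.g. Lexis+ AI, Westlaw AI
        "open every linked source and confirm it supports the proposition",
        "confirm the authority is still good law",
    ],
    "legal_tuned_general": [           # e.g. CoCounsel, legal wrappers
        "independently locate every cited authority in a primary database",
        "confirm the authority is still good law",
        "responsible-lawyer sign-off before filing",
    ],
    "general_purpose_chatbot": [       # e.g. ChatGPT or Gemini used ad hoc
        "treat every generated citation as unverified",
        "re-run the research from scratch in a primary database",
        "responsible-lawyer sign-off before filing",
    ],
}

def checklist(tier: str) -> None:
    """Print the minimum verification checklist for a given tool tier."""
    for step in VERIFICATION_POLICY[tier]:
        print(f"- {step}")

checklist("legal_tuned_general")
```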

Professional Responsibility in the Age of AI
The duties of lawyers, such as competence, confidentiality, supervision, candor and meritorious claims, are not new. What is new is the frequency, scale and subtlety with which these duties can be breached in AI-assisted workflows, because the resulting errors are difficult to catch.
1) Duty of Competence (Rule 1.1) – Lawyers must have a reasonable understanding of GenAI tools, their capabilities and their limitations. They need not be technical experts, but they must possess technological literacy, and this duty is not static: it evolves with emerging technologies. A lawyer’s reliance on a GenAI tool’s output without some level of independent verification could therefore violate the duty to provide competent representation. Competence also involves understanding when such systems start to fail and when human judgement needs to step in.
2) Duty of Confidentiality (Rule 1.6) – Lawyers have to maintain attorney-client privilege and ensure that client information remains confidential. Inputting confidential information into AI tools could lead to disclosure and violate that duty, a problem amplified for self-learning models and AI agents that can improperly disclose information. Lawyers must assess a tool’s data-handling policies (Terms of Service and Privacy) and obtain the client’s informed consent before inputting such information, clearly explaining the GenAI tool, its merits and its limitations.
3) Duty of Supervision (Rules 5.1 and 5.3) – Lawyers have a responsibility to supervise non-legal staff and employees. Law firms have to establish AI usage policies, monitor compliance and ensure that staff are properly trained. Whenever lawyers use third-party AI vendors, they have to ensure security, reliability and confidentiality. Delegating work to AI systems through junior staff does not dilute responsibility; it expands the scope of supervision.
The UK courts have flagged multiple AI-generated filings that contained hallucinations. Furthermore, the Solicitors Regulation Authority’s (SRA) existing obligations already cover technology use: its guidance on innovation states that solicitors “remain responsible for their work and client service, even when using innovative technology."
Judges are calling for a centralized, judiciary-wide mechanism for tracking sanctions so they can evaluate the frequency and pattern of misuse and the effectiveness of sanctions. In Fletcher v. Experian Info Solutions, the U.S. Court of Appeals for the Fifth Circuit cited Damien Charlotin’s database, making it the first federal court to do so. Another proposed rule for lawyers is a mandatory hyperlink requirement, under which every cited authority must be linked to a reputable database such as Westlaw, LexisNexis or Indian Kanoon. This is not an additional duty: lawyers already have a pre-existing duty to ensure that their filings are backed by accurate authorities.
Taken together, these duties point to one conclusion: LegalTech tools do not relax professional standards. Rather, the threshold has been raised, with the lawyer now having to verify AI-generated output.
Operationalizing Verification in Legal Practice
Hallucinations are structural, not a bug, so verification has to be built into the way legal work is done. The question is not whether lawyers use AI but whether they use it in a way that prevents unverified output from entering the record. The problem is not merely technical but economic: GenAI compresses the time required to research, summarise and draft, while lawyers are usually incentivised around speed, volume and client responsiveness. AI tools therefore increase the temptation to skip verification, precisely because verification is slow and invisible.
At the individual level, every lawyer has to treat a GenAI output as a first draft that requires detailed interrogation. The model’s confidence is not evidence of accuracy. Every case citation in a GenAI output has to be checked against a reliable database such as Westlaw, LexisNexis, Indian Kanoon or official court websites. If the case does not appear, the model has almost certainly hallucinated it.
Lawyers have to read the full judgement, not only to check that the case is real but also to evaluate the ratio decidendi and whether the case is still good law. They should also look out for red flags such as unfamiliar reporters, odd court/jurisdiction combinations and recent “decisions” absent from major databases.
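As a rough illustration of that red-flag screening, the sketch below scans a draft for US-style citations and flags unfamiliar reporters or implausible years. The reporter whitelist, citation pattern and year bounds are simplified assumptions; passing this screen is no substitute for pulling the case and reading it.

```python
import re

# Simplified red-flag screen for US-style citations in a draft: it flags unfamiliar
# reporters and implausible years. The whitelist, regex and bounds are illustrative only.

KNOWN_REPORTERS = {"U.S.", "S. Ct.", "F.2d", "F.3d", "F.4th", "F. Supp. 2d", "F. Supp. 3d"}
REPORTER_RE = re.compile(r"\b\d{1,4}\s+([A-Z][\w\. ]*?)\s+\d{1,4}\s*\((\d{4})\)")

def red_flags(draft: str) -> list[str]:
    """Return human-readable warnings for citations that look suspicious."""
    flags = []
    for match in REPORTER_RE.finditer(draft):
        reporter, year = match.group(1).strip(), int(match.group(2))
        if reporter not in KNOWN_REPORTERS:
            flags.append(f"unfamiliar reporter '{reporter}' in '{match.group(0)}'")
        if not 1789 <= year <= 2026:   # illustrative bounds only
            flags.append(f"implausible year {year} in '{match.group(0)}'")
    return flags

print(red_flags("See Smith v. Jones, 123 Q.9th 456 (2031)."))
```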
At the firm level, every firm needs an AI policy that addresses hallucinations through four components:
1) Permitted Uses – LegalTech tools can be used for research, drafting, template population and issue-spotting. However, they cannot be used to prepare final citation lists or court-related material without verification.
2) Prohibited Uses – Firms should ensure that confidential client information is not entered into public models, and that authorities and procedural rules drawn from any tool are verified.
3) Training – Firms have to ensure that partners and associates undergo training, through workshops, courses and conferences, so that they use AI tools responsibly and ethically. There should be an emphasis on prompt engineering, so that prompts do not reinforce a false premise. An example of a good prompt instruction is “Based only on [jurisdiction] primary law, are there any authorities that support [proposition]? If you are uncertain, say so." This shifts the model from confirming your premise to genuinely searching; a minimal template along these lines is sketched after this list. Lawyers should also learn to access the EDRM judicial order database to stay current with the standards courts apply.
4) Supervision and Sign-Off – Firms have to ensure that any AI-generated work product put before a court, tribunal or regulator is signed off by a competent lawyer handling the matter. This is not merely best practice; it is part of complying with the ABA’s ethical obligations under Rule 5.3 as discussed in Formal Opinion 512.
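As referenced in the training item above, a prompt template can bake the jurisdiction-scoped, uncertainty-permitting instruction into every research query. The sketch below is one hypothetical way to do this; the wording is illustrative, not a vendor- or firm-prescribed formula.

```python
# Hypothetical prompt template implementing the jurisdiction-scoped,
# uncertainty-permitting instruction from the training item above.

def research_prompt(jurisdiction: str, proposition: str) -> str:
    """Frame the query so the model searches for authority rather than confirming a premise."""
    return (
        f"Based only on {jurisdiction} primary law, are there any authorities that support "
        f"the following proposition? If you are uncertain or find no authority, say so "
        f"explicitly rather than guessing.\n\n"
        f"Proposition: {proposition}\n"
        "For every authority, give the full citation so it can be independently verified."
    )

print(research_prompt("England and Wales", "a commercial landlord owes a duty to mitigate loss"))
```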
Lawyers and law firms can equip themselves with citation and reference checkers to speed up the process of review. Damien Charlotin has developed an automated reference checker called “PelAlkan” that can detect hallucinations. These tools scan documents and verify citations against a primary database. Citation checkers are useful but human review is still required.
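For illustration, a generic checker of this kind might look like the sketch below. The citation pattern is simplified, and `lookup_in_primary_database` is a placeholder to be wired to a real research platform; this is not a description of how PelAlkan works internally.

```python
import re

# Generic sketch of an automated citation checker of the kind described above.
# The pattern is simplified; lookup_in_primary_database is a placeholder for a real
# research platform (Westlaw, LexisNexis, CourtListener, etc.).

CASE_CITATION_RE = re.compile(
    r"[A-Z][\w'\.]+(?: [A-Z][\w'\.]+)*"   # first party name
    r" v\.? "                             # 'v.' separator
    r"[A-Z][\w'\.]+(?: [A-Z][\w'\.]+)*"   # second party name
    r", [^;\n]*?\(\d{4}\)"                # volume/reporter/page ending in a year
)

def lookup_in_primary_database(citation: str) -> bool:
    """Placeholder: return True only when the citation resolves in a primary source."""
    return False  # wire this to your research platform; everything is unverified until then

def unverified_citations(draft: str) -> list[str]:
    """Return every detected citation that could not be verified and needs human review."""
    return [c for c in CASE_CITATION_RE.findall(draft) if not lookup_in_primary_database(c)]

draft = "Plaintiff relies on Smith v. Jones, 123 F.4th 456 (9th Cir. 2031)."
print(unverified_citations(draft))
```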
Conclusion
The emergence of GenAI in legal practice has not created a new category of risk; it has exposed existing risk at scale. Courts and regulators are becoming clearer about how the duties of competence, confidentiality and supervision apply. The ABA’s Formal Opinion 512 and the EU AI Act’s risk-based classification could apply to LegalTech used in adversarial proceedings. India’s Supreme Court has recently flagged AI-generated fake case law as a “menace” that could potentially be treated as misconduct. The direction of travel is that human-in-the-loop becomes a regulatory requirement, with mandatory human sign-off for AI-generated legal content.
Courts could fill this vacuum by introducing a mandatory hyperlink rule and by promoting the use of citation checkers such as PelAlkan that check references before a filing leaves the drafting environment. A related line of research focuses on “calibrated uncertainty”, where systems flag unreliable output or communicate its level of risk.
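A toy illustration of the idea, assuming the system exposes some confidence score (for example, retrieval coverage or a verifier model’s output), might look like this; no current legal tool is known to expose exactly this interface.

```python
# Toy illustration of calibrated uncertainty: label the risk of an answer and abstain
# below a threshold. The confidence score is an assumed input, not a real product feature.

def triage(answer: str, confidence: float, threshold: float = 0.7) -> str:
    """Attach a risk label to an answer, or abstain when confidence is too low."""
    if confidence < threshold:
        return "NO ANSWER: confidence too low; run the research manually."
    label = "low risk" if confidence >= 0.9 else "verify before relying on this"
    return f"[{label}, confidence={confidence:.2f}] {answer}"

print(triage("The limitation period is six years.", 0.62))
print(triage("The limitation period is six years.", 0.93))
```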
While this has not yet been reliably productised, the direction points towards tools moving from purely fluent systems to systems capable of expressing uncertainty. Until then, the profession's most reliable hallucination detector remains a competent lawyer with time to check their sources.
This article has been authored by Harshith Viswanath, LegalTech Fellow at the Indian LegalTech Network and a second-year student at the National Academy of Legal Studies and Research (NALSAR) University, Hyderabad.


