Recently, METR published a randomized controlled trial showing that AI tooling (Cursor Pro with Claude 3.5/3.7 Sonnet) slowed experienced open-source maintainers by 19%. For those familiar with practical AI adoption in complex codebases, this result wasn't surprising.
Once you examine the study setup, the outcome becomes easier to understand. The participants were highly experienced maintainers working in deeply familiar, mature repos: environments where generalized AI suggestions are rarely a clean fit. No ramp-up time, limited task diversity, outdated models, and surface-level integration all contributed to a scenario where AI was bound to underperform.
That said, this kind of outcome doesn't reflect what's possible when AI assistants are rolled out with structure and intention. At DataArt, we've seen dramatic productivity gains in real-world projects, especially when the setup includes documentation, context, and a deliberate onboarding phase. This post explores what might go wrong, why some AI rollouts stall, and what patterns have emerged from setups that consistently deliver results.
Why the METR Study Was Bound to Show Slowdowns
Worst-Case Task Selection
The METR trial results make more sense when you look at the task setup. Developers were working in deeply familiar, highly customized codebases, the kind where experienced maintainers rely on years of accumulated knowledge.
In these environments, AI suggestions rarely land well right away. Critical context often lives in the developer's head: custom build flags, test harness quirks, undocumented conventions. None of it is visible to the model unless it's been clearly documented or included in training. Without that scaffolding, the AI is left to guess.
Missing Ramp-Up Periods
What was missing was the preparation that makes success possible. Two ramp-ups matter:
- Ramp-up for the developer — learning how to work with AI, write better prompts, and integrate it naturally into workflow
- Ramp-up for the agent — either by refining the model (like Amazon Q Developer Customization), or by unlocking knowledge from your repo through documentation
Tools like Cursor aren't plug-and-play. In the trial, most developers had completed only ~20 AI-assisted tasks. That's early days. There was no structured onboarding, no prompt refinement, and no context setup.
At DataArt, we've learned it takes longer. A few quick tries don't do much. But once AI becomes part of the daily flow — after two to three weeks of steady use — things shift. That's when you begin to recognize the assistant's strengths, guide it more effectively, and see it adapt to your codebase.
Outdated Models & Low Acceptance Rates
The experiment used Claude 3.5/3.7 Sonnet, models that have since been surpassed by newer releases like Opus 4 and agentic tools like Claude Code. The trial saw a 39% acceptance rate on suggestions — meaning over 60% of the AI's output was rejected or required heavy editing. That kind of friction adds up.
At DataArt, we often use agents in auto-accept mode — but only once they've structured the problem and scoped the task properly. In those cases, the assistant iterates and tests autonomously with high success rates, particularly for peripheral or "safe" code paths.
Incentive & Measurement Biases
The study design also introduced subtle distortions. With an hourly pay structure and no incentive to move faster, some participants reportedly waited for AI output while checking email. AI wasn't integrated — just available.
Results were based mostly on task duration, which might miss other important effects like scaffolding, architectural improvements, or reduced mental load. At DataArt, we use AI not just to write code, but to generate runbooks, maintain documentation, and explain tricky system parts — all of which save time across the team, even if they don't speed up an individual commit.
Real-World Example: What Onboarding Actually Looks Like
At DataArt, we build data pipelines using declarative configurations in AILA — DataArt's AI Lake Accelerator — to keep things modular, reusable, and simple. When I first asked an agent to help configure a new pipeline, I expected it to understand that pattern.
Instead, it gave me something completely off-track: a mix of new Python and Terraform code. Useful in theory, but it undermined the whole point of our low-code, config-driven approach. It basically said: "write new code from scratch."
That's when it became clear: the agent needed better context.
So I followed OpenAI's prompt engineering guide and built a meta-agent prompt that rewrote my inputs into clearer, best-practice instructions. I used it to generate documentation for our repo by combining code comments, architecture diagrams, and platform principles spread across different sources.
Then, I created a prompt that required the agent to "read" this documentation first, before answering. And so, no more generic code dumps. The agent started generating accurate, declarative JSON configurations aligned with our standards.
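The docs-first pattern can be sketched as a small prompt builder. This is an illustrative sketch, not DataArt's actual tooling: the `docs/` folder, file names, and prompt wording are all assumptions, and any LLM client can consume the resulting string.

```python
from pathlib import Path

# Illustrative context files; real names depend on your repo.
ONBOARDING_DOCS = ["architecture.md", "pipeline-conventions.md", "config-schema.md"]

def build_docs_first_prompt(task: str, docs_dir: str = "docs") -> str:
    """Prepend onboarding docs so the agent reads them before answering."""
    sections = []
    for name in ONBOARDING_DOCS:
        path = Path(docs_dir) / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text()}")
    context = "\n\n".join(sections)
    return (
        "Read the project documentation below before answering. "
        "Follow its conventions; prefer declarative JSON configuration "
        "over writing new Python or Terraform code.\n\n"
        f"{context}\n\n# Task\n{task}"
    )
```

The key design choice is that the documentation travels with every request, so the agent never answers from generic training priors alone.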
After a few iterations, we layered in validation checks and self-reflection steps. The agent became part of the workflow. Not just a code generator, but a thinking partner. See how it works in our demo video.
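The validate-and-retry loop described above can be sketched like this. Everything here is an assumption for illustration: `agent` stands in for any LLM call, and `REQUIRED_KEYS` is a toy schema, not AILA's actual configuration format.

```python
import json

REQUIRED_KEYS = {"source", "destination", "transformations"}  # toy schema

def validate_config(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    missing = REQUIRED_KEYS - config.keys()
    return [f"missing keys: {sorted(missing)}"] if missing else []

def generate_with_reflection(agent, task: str, max_rounds: int = 3) -> str:
    """Ask the agent for a config; feed validation errors back until it passes."""
    prompt = task
    for _ in range(max_rounds):
        raw = agent(prompt)
        problems = validate_config(raw)
        if not problems:
            return raw
        # Self-reflection step: show the agent what failed and ask for a fix.
        prompt = f"{task}\nYour last answer failed validation: {problems}. Fix it."
    raise RuntimeError("agent did not produce a valid config")
```

The point of the loop is that the agent's mistakes become structured feedback instead of silent failures, which is what turns a code generator into a workflow participant.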
The Missing Piece: Structured Agent Onboarding
This is where most successful AI rollouts diverge from failed ones. Before anyone expects value from an AI assistant, there needs to be a foundation, one that mirrors how a senior hire would be onboarded.
At DataArt, we call this "agent onboarding," and it's become essential to our AI workflow. Here's what that foundation looks like:
Low-Level Documentation (Code & API)
- Module responsibilities, extension points, data models
- Run/test instructions: configs, test harnesses, dependencies
- Coding conventions, formatting, naming patterns, refactor idioms
Mid-Level (Components & Flows)
- Feature guides with component interaction flows
- Sequence diagrams for key request/response cycles
- Configuration references: flags, environments, presets
High-Level (Architecture & Domain)
- System overview and business context
- Domain-driven events and state boundaries
- Rationale for design choices, known pitfalls
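The three layers above can live in a single context file the assistant loads at the start of every session. The skeleton below is purely illustrative — the module names, commands, and conventions are placeholders for whatever your repo actually contains:

```markdown
# agent.md — assistant context (illustrative skeleton)

## Low-level (code & API)
- Modules: `ingest/` reads sources, `transform/` applies declarative rules
- Run tests: `make test` (requires the `dev` config preset)
- Conventions: snake_case, no new top-level dependencies without review

## Mid-level (components & flows)
- Feature guides with component interaction flows
- Sequence diagrams for key request/response cycles

## High-level (architecture & domain)
- System overview, business context
- Rationale for design choices, known pitfalls
```

Keeping this file versioned next to the code means it evolves through the same reviews as the code it describes.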
Even the best engineers take weeks or months to ramp up in unfamiliar code. AI agents are no different. They require context, examples, conventions, and time. The teams seeing real productivity gains are the ones who front-load this investment, and then compound it by keeping it alive.
DOs & DON'Ts for AI-Assisted Development
Setup & Ramp-Up
✅ DO
- Create an AI Onboarding Bundle: README.md, design docs, prompt libraries. Keep it versioned and close to your code
- Allocate 1–2 weeks of agent ramp-up: refine prompts, test suggestions, build trust
- Train or customize your model when possible (e.g., Amazon Q Developer Customization) to reflect your domain and coding patterns
❌ DON'T
- Drop AI into an undocumented repo and expect instant results
- Expect productivity boosts on day one
- Assume generic models will perform well on domain-specific or high-context tasks out of the box
Usage & Integration
✅ DO
- Use auto-accept or autonomous loops for scoped, low-risk tasks (e.g., UI scaffolding, test helpers)
- Mirror human onboarding: use checklists, buddy reviews, and quick-start guides
- Share and evolve prompt libraries across the team to reduce friction and improve consistency
❌ DON'T
- Blindly trust AI output for core business logic without review checkpoints
- Forget that AI agents, like humans, need structure and repetition to perform well
- Hardcode prompts into private IDEs — they'll disappear with the developer
Knowledge & Documentation
✅ DO
- Keep docs alive: maintain Claude.md, agent.md, inline comments, and session summaries that the assistant can reference
- Co-locate context files with your codebase: architecture guides, environment configs, key decisions
❌ DON'T
- Treat docs as one-time deliverables — stale docs confuse both humans and agents
- Expect AI to infer system design or domain knowledge without access to it
Measurement & Expectations
✅ DO
- Measure long-term impact: bug rate, review speed, code clarity, onboarding time
- Give AI workflows time to settle and scale. Track the curve over weeks, not hours
❌ DON'T
- Rely only on stopwatch-style task duration metrics
- Ignore the learning phase for both the human and the tool
The Bottom Line
AI agents and assistants do work, even at this early stage of the technology, but only when they're treated as part of the team, not as a magical shortcut.
At DataArt, the biggest gains come from structure: clear documentation, intentional onboarding, feedback loops, and thoughtful task design. The teams seeing consistent impact are the ones willing to invest upfront, stay disciplined, and give both the developer and the assistant time to learn.
There's no instant magic, but with the right setup, the payoff grows fast.