PR Analysis Beta: 15 Hours Validating What Actually Matters
The Feature Nobody Asked For (With 0 Customers)
A week or so ago, I read Zed's blog post about how they hired two developers purely through their GitHub PR contributions. No traditional interviews. Just found impressive PRs in open source projects and hired them.
What made it fascinating were the different contribution patterns:
- Developer #1: Became their top external contributor over 10 months of sustained contributions
- Developer #2: Worked on debugger backend infrastructure for over half a year, collaborating closely with the team
These weren't trivial bug fixes. These were different patterns of valuable contributions - sustained engagement as a top contributor, and deep feature ownership over extended periods.
My original plan was to implement PR analysis 1-2 months after launch, once I had some traction. But then I did what any rational founder would do: researched the competition.
Turns out no one was doing PR analysis for hiring.
Perfect moat opportunity.
So naturally, with 0 customers and 0 revenue, I decided to build a feature no one asked for. Why wait for customers to validate what you should build when you can just... build it anyway?
The idea: if we could capture those exact hiring signals - sustained contributions as a top contributor, deep feature ownership over months - then we'd have something companies would actually use.
The Validation Weekend
I tested it against both developers from the Zed article. Started with the one who worked on debugger backend infrastructure.
The analysis was working - generating insights, finding patterns, looking really promising. I was genuinely excited.
Then I noticed something: our analysis showed 230 PRs, but GitHub UI showed 216 authored PRs. Turned out we were correctly fetching both authored (216) and assigned PRs (14). One of those 14 assigned PRs was the debugger PR - 977 commits and 25,837 additions. The exact PR that got them hired at Zed.
And our AI never mentioned it.
The PR was in the data. All 977 commits. All 25,837 additions. But it was buried at position #149 out of 151 items in the evidence list. We ask the AI to provide 5-7 items per category. Position #149 might as well not exist.
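To make the failure mode concrete, here's roughly what the old flow looked like (function and field names are illustrative, not the real code): evidence went into the prompt in fetch order, so item #149 sat far beyond anything the model would ever surface.

def build_prompt(evidence_items):
    # Old behaviour: dump evidence in fetch order - no ranking at all,
    # so a massive PR near the end of the list effectively disappears.
    lines = [
        f"- {item['title']} ({item['commits']} commits, +{item['additions']} additions)"
        for item in evidence_items
    ]
    return "Pick the 5-7 most significant contributions per category:\n" + "\n".join(lines)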
Three Key Fixes
1. The API Evolution (REST → Hybrid → Pure GraphQL)
The first issue I tackled was the messy API approach. I'd started with REST (120+ calls per analysis), then built a hybrid during the initial implementation - 1 GraphQL call plus 5-8 REST calls per PR.
Switched to pure GraphQL:
- Single query gets everything: title, body, commits, reviews, comments, merge status, author
- Added pagination for complete PR history
- Separate assignee search (finds collaborative PRs like the debugger one)
- Total: 4-5 API calls vs 120+ originally
And pure GraphQL gave us MORE data - commit messages for co-authorship detection, review counts for collaboration depth, full PR body for documentation quality.
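For the curious, here's a rough sketch of what that single query looks like - a Python version using requests against GitHub's GraphQL search API. The field selection and page size are illustrative rather than the exact production query, and it assumes a personal access token in GITHUB_TOKEN.

import os
import requests

# One search() call per page replaces the 5-8 REST calls per PR the hybrid needed.
PR_QUERY = """
query($q: String!, $cursor: String) {
  search(query: $q, type: ISSUE, first: 50, after: $cursor) {
    pageInfo { hasNextPage endCursor }
    nodes {
      ... on PullRequest {
        title
        body
        merged
        mergedAt
        createdAt
        additions
        author { login }
        commits(first: 100) { totalCount nodes { commit { message } } }
        reviews(first: 1) { totalCount }
        comments(first: 1) { totalCount }
      }
    }
  }
}
"""

def fetch_prs(search_term):
    headers = {"Authorization": f"bearer {os.environ['GITHUB_TOKEN']}"}
    cursor, prs = None, []
    while True:
        resp = requests.post(
            "https://api.github.com/graphql",
            json={"query": PR_QUERY, "variables": {"q": search_term, "cursor": cursor}},
            headers=headers,
        )
        resp.raise_for_status()
        page = resp.json()["data"]["search"]
        prs.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return prs
        cursor = page["pageInfo"]["endCursor"]

# Two searches: authored PRs, plus the separate assignee search that surfaces
# collaborative PRs like the debugger one.
authored = fetch_prs("is:pr author:SOME_USER")
assigned = fetch_prs("is:pr assignee:SOME_USER")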
2. Evidence Sorting (The Real Fix)
The debugger PR was there, just invisible. Classic mistake: I tried prompt engineering first. I tweaked the prompts to emphasize larger contributions, architectural work, substantial PRs. The AI would briefly mention the debugger work, but it never got the prominence it deserved - this was the PR that got the developer hired at Zed.
Wrong approach. The AI wasn't broken. My evidence sorting was.
Instead of prompt engineering, I sorted evidence BEFORE the AI saw it:
def sort_by_actual_data(pattern):
    # Rank evidence by the contribution itself, not by fetch order.
    commits = extract_commit_count(pattern)
    additions = extract_additions(pattern)
    is_merged = check_if_merged(pattern)
    # Negate everything so the largest, merged contributions sort first.
    return (-commits, -additions, -is_merged)
Debugger PR jumped from position #149 to position #1. No prompt changes needed.
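With the evidence ranked before serialization, the rest of the pipeline stays the same (build_prompt as in the sketch above, evidence_patterns being whatever the extraction step produces):

# Sort the evidence before it ever reaches the model, so the biggest
# contributions land in the first handful of items it reads.
ranked_evidence = sorted(evidence_patterns, key=sort_by_actual_data)
prompt = build_prompt(ranked_evidence)  # same prompt, better-ordered input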
3. Quality Signals from Real Hiring Decisions
Zed's article was invaluable - they explicitly stated what they valued:
✅ Sustained engagement over years, not spurts
✅ Collaboration depth - review cycles, not rubber stamps
✅ Pair programming - Co-authored-by commits
✅ Feature ownership - concept to production
Not total PR count, lines of code, or commit frequency.
I rebuilt the evidence extraction to track these signals:
pr_dates = [] # Contribution timeline
co_authored_prs = [] # Pair programming evidence
high_review_prs = [] # Deep collaboration (3+ reviews)
owned_features = [] # Features to production
Now the evidence reflects what Zed actually hired for, not what's easy to count. It's one case study, but at least it's a real one.
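Classifying a PR into those buckets is mostly string and count checks. A rough sketch, using the PR shape from the GraphQL query above - the thresholds here are my assumptions, not Zed's rules:

def record_signals(pr):
    # Contribution timeline: when did this work actually land?
    pr_dates.append(pr["mergedAt"] or pr["createdAt"])
    # Pair programming: look for Co-authored-by trailers in commit messages.
    messages = [n["commit"]["message"] for n in pr["commits"]["nodes"]]
    if any("Co-authored-by:" in msg for msg in messages):
        co_authored_prs.append(pr)
    # Deep collaboration: 3+ review cycles, not a rubber stamp.
    if pr["reviews"]["totalCount"] >= 3:
        high_review_prs.append(pr)
    # Feature ownership: merged and substantial (the size cutoff is a stand-in).
    if pr["merged"] and pr["additions"] >= 1000:
        owned_features.append(pr)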
(And yes, there was a comedy bug where the analysis asked "How did you coordinate with yourself on these PRs?" when developers were assigned to their own PRs. Self-collaboration. Fixed.)
What I Learned (The Hard Way)
1. AI is only as good as the data you feed it
Prompt engineering is a band-aid for bad data architecture. Fix your evidence extraction, not your prompts.
2. Sort by what matters, not what's easy to measure
Sorting by PR size is easy. Sorting by actual hiring signals (review decisions, sustained contributions, pair programming) takes work.
But only one of those matters.
3. Real case studies unlock the right solution
I was always going to build PR analysis down the road. But without the Zed article as validation, I'm not sure I would've built what matters.
Testing against developers who actually got hired through PRs revealed:
- Self-collaboration bugs
- Missing critical contributions
- Evidence buried where AI couldn't see it
- Wrong quality signals (what's easy to measure vs what actually gets people hired)
The Zed case study was the key to unlocking what PR analysis needed to be. Not just "does it work?" but "does it surface the signals that matter?"
Was It Worth It?
Look, validating against two developers is an extremely small sample. It proves nothing statistically.
But it gave me a point of origin. A springboard.
Here we have two developers who were hired directly through PR contributions. Zed didn't ask them to solve LeetCode puzzles or whiteboard algorithms. They looked at actual work - merged PRs, documentation, collaboration patterns. The very thing you're hiring developers to do.
After 15 hours of work, our PR analysis now correctly identifies:
- The debugger PR (977 commits, 25,837 additions) as the #1 key insight
- Sustained contributions over 3+ years
- Pair programming evidence (65 co-authored PRs)
- Deep collaboration (25 PRs with 3+ review cycles)
- Feature ownership (26 major features taken to production)
These are the exact signals Zed valued when they hired these developers.
The next step? Ensure we can provide meaningful, actionable insights for anyone looking to recruit top talent in tech. Because if Zed's approach proves anything, it's that analysing the actual work developers do day-to-day - successful merges, code quality, collaboration - is more aligned with real job performance than asking candidates to solve puzzles under pressure.
You're hiring them to write production code and collaborate with teams. Now you can analyse evidence of exactly that, and get actual insights on how they work.
Building a feature with 0 customers? Still ridiculous.
But having a validated starting point based on real hiring decisions? That made it worth it.
What's Next
Look, I have 0 customers. No direct validation that anyone wants what I'm building.
But I believe if anyone does end up using this, PR analysis will provide even more value on top of the public repo analysis we currently offer. Public repos show what someone built. PR contributions show observable collaboration patterns - review cycles, co-authored commits, merge rates, assignment patterns.
And once people actually start using it - CTOs, engineering managers, whoever - we'll tweak the criteria based on real-world needs. The Zed case study is the starting point, not the final answer.
PR analysis is in beta. If I can convince anyone to actually use it, we'll be gathering feedback on whether:
- The quality signals match what hiring managers actually look for
- The evidence is comprehensive enough for meaningful discussions
- The insights help identify technical depth vs surface-level contributions
I'd rather spend a weekend validating something I actually want to build. Plus, the results were really something - and no one else is doing it.