How AI tools for writing product docs fail, and the value of writing yourself

I revisited using LLM tools to write product docs, particularly requirements, this week. I was disappointed and unexpectedly inspired about the value of writing to think.

Today!

it’s not there yet, don’t do this
why it’s bad, and surprising ways it’s bad
writing yourself is worth it even if they were good
there are ways the tools can add value

I needed a high-level product requirements doc for something that in banking/fintech is pretty widely offered and boring even to me, a fintech nerd: an API for bank payments, and not even the cool new(er) US instant bank rails, but the 50-year old stuff.

First, I wrote one out myself. I’m an experienced PM, I’ve been in fintech and worked with this stuff for years, it didn’t take me long to write a couple pager that could be used as a starting point for high level discussions with engineering & stakeholders.

Then I fired up the models, fed them all the same starting information, and what I got was bad. Well, it was 80% there, but debugging the output took longer, and was harder, than writing the doc myself.

There also are still times when an LLM tool takes off in the wrong direction, and once it’s fixated on the elephant, you can’t talk it out of it, now you’re just burning tokens talking about whether the elephant should be there, watching for the elephant to show up again, and wondering what your life has come to and whether you’ll ever get to see a real elephant, or if this home office is it for you.

For instance, in one of the runs the writeup was at least a quarter about error handling, at a level of detail that was wild. It included a discussion of the difference between real-time and batch error handling. It included a rollout plan for the error handling, including GTM for developer relations to get feedback on the error codes, publishing error documentation… and I could not get the tool to let it go in service of a high-level product definition for general discussion. You just have to restart (if your tool will let you).

The first and most predictable problem was the “LLM tools are like eager-to-please interns armed with Wikipedia and dosing shrooms” class of issue: requirements mostly right, but they’d wander off track or make lazy factual errors that were plausible.

“NACHA rules seem to disagree, and argue that if (boring thing)…”

“You’re right, I oversimplified it…” (or sorry for the confusion, or so on)

(repeat as required)

I found it incredibly hard and draining to catch the errors and go through that. Compared to me just writing based on working from reference documentation and my own knowledge, it felt like it took at least as long to get to a decent state and it was 10x more harmful to my psyche.

A huge part of this is that I find LLM-generated text to be so smooth and glossy it feels like it slides through my eyes and brain and dissipates, leaving nothing behind (hopefully).

It’s like a dense textbook where you have to run your finger or a card under the current line to keep your brain from throwing itself off the page to save itself from the agony, except there’s no information there.

As a human writer, you can sense when people’s eyes will gloss over, and work with that. I often try to use examples that are funny, and tie into each other over the course of a long document, so even if it’s tedious to write out sample transactions, you’ll be amused writing it and it’ll be less boring to read.

Professor Mew Mew wishes to buy an off-shore oil rig from a salvage company to establish her new society of super intelligent cat villains. Mew Mew is on the OFAC list, and thus during registration…

But LLM output doesn’t have intent or tone, much less a sense of timing or humor, and it all reads to me in a way that makes concentration more difficult and proof-reading almost impossible. And the result is it’s dangerous to use.

For instance, I was using ChatPRD and going back and forth pointing out things I’d catch, trying to nudge it to add information here, or drop the engineering timelines there, until after a lot of time I got to a place I thought the doc was pretty decent, and I needed a break.

The next day I read it again and I couldn’t believe I’d thought it was something you’d potentially show anyone. The top-level organization made sense but I’d read a section and think “what is this even saying?” There were huge chunks where it was confusing a certain code with an entirely different kind of transaction that had the same acronym, and I, as an experienced PM who’d written uncountable docs, deeply steeped in the industry and the tools and standards, did not notice at allllllll when I was proof-reading and correcting in the tool.

If I’d turned that over to an engineering team, the best outcome would have been they’d have flung it back at me while jeering at the low quality of my work. The worst would have been if they’d taken it, not caught the problems with it, and been deep into work before coming back and asking me “hey, we built this for this one thing, but we’re realizing there are two different things here and now we’re stuck.”

Writing it myself wasn’t pleasant: the material’s familiar to me, it’s not interesting or new ground, which made me think it would be well-suited to using tooling. But it didn’t take that long: I put the headphones on, I concentrated for a couple hours, thought, and made the cursor go left to right.

In doing that, I thought of things I hadn’t considered before I sat down to write, which is one of the great reasons to do the work, and revised some assumptions I hadn’t realized I’d made until I wrote everything out. I ended up with something that if I presented it, would address their potential questions, lead to good conversation, and probably allow everyone to start their work even if there were details we’d want to hammer out.

I always wonder about people I see on LinkedIn who proclaim the spec (or PRD, or user story) is dead, that anyone who isn’t just building the app is about to be left behind by history. The purpose of writing these things isn’t to produce a particular document, it’s to think deeply about them. Sometimes, in our best moments as product managers, you write a product requirement doc and in so doing realize you don’t need the product at all, that there’s some much more elegant solution you only now realized.

I fear in going straight into auto-coded apps we miss that. Presented with a functional interface, our natural impulse is to critique the interface, to see something that’s so close to the finish line we need to get it across. If we use the tools to help us think deeply, that’s amazing, I’m all for it. But if every idea can be launched to the world immediately without much thought, what’s learned? What’s learned if you do it a thousand times?

In this, I see a path for these tools even as they exist, though.

When I joined Expedia, we were still in a kind-of-waterfall development process where you’d write a massive spec (the template was a Word doc that was over 30 pages at one point (the template!!)), filled with sections like

LOGGING
Have you considered this thing?
This other thing?

LOGGATE SIGNOFF: by what person, on what date

I used to joke it was designed for Expedia to be able to hire anyone off the street and have them not succeed but not fail disastrously, while at the same time we strove to hire the best, brightest, hardest-working people we could find.

Over time I came to realize there’s value in that kind of institutional knowledge turned into a checklist. Sure, many of those checks were a waste of time, but you only had to say “N/A” once, and the ones that reminded you hadn’t thought of a thing, or hadn’t thought hard enough… those made it all worth it.

This is where I got value out of the tools.

“I’m writing a high-level product overview document about bank payment APIs. My audience will be these functions, and we are at the conceptual stage so I’m looking to spark high-level thoughts and get feedback and direction for refinement before we invest further. What points will I need to address? What questions will those functions likely want answered at this early stage?”

Those answers: how you structure a document, what a template looks like, what are things for you to think about deeply, that’s helpful.

I’ve read engineers talk about how they’ve found value writing code themselves and then comparing it to what the tools generate as a way to spur good thinking and learning, but I haven’t seen that in product work. At no point did I look at the output of the tools and think “this is a far more elegant way to write this.”

But even if you’re later in your career and can write a document while half-asleep because you got up for a call with an overseas office and they cancelled overnight, getting a first cut on feedback is useful.

“Here’s a high-level product requirement document. What’s wrong with it? What is it missing? What would a head of sales criticize it for?”

It is no match for a fellow human, no. I’ve never had an LLM tool take a prompt like that and say “have you thought about this other approach that’s just occurred to me that puts these other two things together in this cool way,” which is the kind of thing my brightest peers used to do regularly.

LLMs certainly never ask you to read their spec, so you can be inspired. I miss peer reviews and problem solving with friends.

That’s not my takeaway, though. It would be “the value of the thinking makes it worth it to at the very least draft it yourself first, and beyond that, the tools aren’t there yet and I worry that the way LLMs work means their output might always be hard to parse and revise, and hard to read and turn into useful work for our peers.”

I want all my competitors to use AI for product strategy


And I want them to use it for all their strategy. Slather it everywhere! Make all your vision and product strategy auto-generated from auto-polled sources like auto-summarized customer feedback. Because my competitors are all going to get terrible advice from current tools, and in aggregate, it’s even worse for them.

I’d rather they be wrong without wasting that energy and the other ethical concerns, sure. But I’ll take them being wrong as a start.

LLMs today tell everyone to do the same things, and better yet for me, that thing is to drive your car right into the nearest other car, and then see if there’s a wall handy.

“Blue Ocean Strategy” is the most useful way I can think of to explain why their advice is terrible. Blue Ocean Strategy says “pursue differentiation”: find products and services you can offer that your competitors can’t or won’t. This is contrasted against “Red Ocean” markets, where everyone’s viciously competing and tearing each other apart, spending much to little or no gain.

In good product management, with rare exception, you want to find opportunities where you can win by means other than being able to spend 10x on marketing (and even then, you’d rather not have to do that). To get there, you’re considering what your strengths are, what you have on hand, what you can do with it, how much time you have, what you know about the industry, your competitors, your product, and pairing that with learning everything you can about your customers, so you can find customer problems you can solve in a way that helps them and makes for a sustainable business. If you do that, you will find customer problems you can solve as a sustainable business, and they’ll be unique to you.

For instance, my current company, Sila Money, was originally a developer-first payment services platform on the Ethereum blockchain, with its own stablecoin, and moved money in and out via bank transfers. We don’t do anything with cryptocurrency today, but that history made the platform good at orchestration and business logic for moving money, we can add new partners and services very quickly, and we have APIs our customers appreciate. On the other hand, we don’t have full banking-as-a-service or card offerings, or UX our customers can plug into their applications, and we’re tiny, with limited engineering capacity to build things.

Then in the world of payments, the US has two relatively new instant bank rails, neither with full adoption, and a slower, kludgier bank payment system in ACH that is universally adopted, and also debit and credit cards, wire payments… it’s a lot. For companies who don’t want to build their own “try this payment method, then this other, then this other” stack, or just want someone else to deal with issues like ledgering across multiple partners and headaches like “this bank reports that they accept this payment method, but actually it always times out, so don’t attempt that,” we built a great “call us once and we’ll figure it out, including automated failover and everything” product, and have the flexibility to build out “try this, then this, then this” for specific use cases.
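To make the “try this, then this, then this” idea concrete, here is a minimal sketch in Python. It is an illustration only, not how our platform is built: the `Rail` type, the `RailError` exception, the `send_with_failover` function, and the `skip` set of known-bad (rail, destination) pairs are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


class RailError(Exception):
    """Raised when a payment attempt on a rail fails or times out (hypothetical)."""


@dataclass
class Rail:
    """A payment rail: a name plus a function that attempts the payment (hypothetical)."""
    name: str
    send: Callable[[str, int], str]  # (destination, amount_cents) -> confirmation id


def send_with_failover(
    rails: List[Rail],
    destination: str,
    amount_cents: int,
    skip: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Tuple[str, str]:
    """Try each rail in order and return (rail name, confirmation) for the first success.

    `skip` holds (rail name, destination) pairs known not to work in practice,
    such as a bank that claims to accept a method but always times out.
    """
    errors = []
    for rail in rails:
        if (rail.name, destination) in skip:
            continue  # don't even attempt a combination we know is bad
        try:
            return rail.name, rail.send(destination, amount_cents)
        except RailError as exc:
            errors.append(f"{rail.name}: {exc}")  # remember why, then fall through
    raise RailError("all rails failed: " + "; ".join(errors))
```

The caller makes one call with an ordered list of rails. Everything about ordering, skipping, and retrying lives in one place, which is the part companies would rather not build themselves.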

Our strengths led us to that product-market fit, and it’s taken off for us. Someone without the same advantages wouldn’t have done as well. If you were a close competitor to us a year ago, you might have had a much harder time building it on your technology, with your team, and you likely didn’t have the same customer base that would have bought it. Instead, if you followed a similar process, you’d have built products we couldn’t have, and we’d both be in better positions, and not as competitive with each other. I’d be recommending you in conversations for that use case.

The opposite approach to product management for us would be to try and build an exact copy of a huge successful existing business. Person-to-person payments are huge, so why don’t we get into that market and go up against PayPal’s Venmo? They do hundreds of billions of dollars in transactions, they’re valued at ~$70b as a company, and that’s just one of the companies in the space, look at all the money to be made! Why, all we need is 1% of the total addressable market and we’re a unicorn startup!

To get that to succeed you have to catch up to their product, which is a huge investment, you need to give people a reason to use your knock-off, which is going to require marketing and incentives, you need partners who will also need tons of money… you essentially need to have the backing of a multi-national company willing to sustain colossal losses for the foreseeable future without faltering. Attempting this as anything less would be the end of your company, and likely very quickly. Don’t do this.

LLMs tell me, and my competitors, that that’s a great idea.

I asked different LLM tools (Claude 4.0 Sonnet, ChatGPT 4.1, Gemini 2.5 Pro 06-05) to set a product strategy, using a couple different prompts[1], and they all told me I should ram the company into the largest, nearest competitor, often followed by ramming the company into the second-largest competitor afterwards. Which if you think of LLMs as most-probable-next-word prediction engines gone nuts, makes sense: those are the use cases with the most money and adoption behind them, they get written about the most.

Ignoring the frequent suggestions to do things we already do, they told me:

Embedded finance
Become a vertical provider
Get into stablecoins and digital assets. Tokenize all things!
AI-driven financial services (automation/payments/compliance/etc)
Build a superapp!

There were also themes in potential customers they mentioned – the “vertical provider” strategies would almost always mention focusing on the gig economy, for instance.

Let’s talk about getting into “embedded finance” for instance. Stripe has, in my opinion, the best zero-to-taking-money experience in the industry, from introduction docs and signup to how easy they make it to put payments on your site and then build it out. They’re so, so good, and it’s built on years on years of often unglamorous, unseen work by smart people grinding it out. There are things they don’t do as well, or have chosen not to go after, where there are opportunities, but in general? Catching up to Stripe is a crazy idea.

And that’s just one of the established, competent players who already have market traction.

Similarly, stablecoins are hot again, and the players in the space are newer, but whatever you’re looking for, there are a lot of more-established, smarter players in those spaces. Circle, for instance, has about 200 people and a US-dollar stablecoin with wide adoption ($60+ billion in circulation as I write this).

The superapp thing… ChatGPT 4.1 said my strategy here should be:

Unified Financial Operating System: Build a “superapp” or platform that consolidates payments, digital wallets, lending, compliance, and analytics into a single, customizable environment for fintechs and non-fintechs alike

And Gemini 2.5 Pro said:

This strategy capitalizes on the global superapp trend, which is reshaping consumer expectations. Many companies want to offer an integrated experience but lack the resources to build one from scratch. Sila could provide the foundation.

Where do you even start with that? Super-app answers seemed to be inspired by the same post every time they came up, which led me to wonder “how are these being drawn? Is this something with Perplexity?” (Is it? Please let me know.)

I’m discarding here the “go build something you have” advice, which I speculate is because the models didn’t have an up-to-date picture of the company. I checked on sources the models cited and found them often out-of-date, or confused[2]. That’s a separate issue, and “how do I update the big LLMs on my business” is something almost everyone’s trying to figure out.

I stopped here: the models seemed to want my company to do what everyone else was doing, or was about to start doing, and thus compete with established players who were already there and would crush us. This included when I tried to refine answers or use prompting to attempt to spur differentiation (which makes sense, it doesn’t know what they’re pursuing, but even when I’d add ‘I believe our competitors are doing x, y, and z, and I’d like to adapt to avoid competing on these things’ I’d get back the same “Vertical Specialization” strategy to “package your APIs into playbooks for underserved verticals (e.g., real estate, healthcare, creator economy)…”)

Before I threw my hands up, though, I wanted to see what strategic advice other players in the space would get given the same high-level prompts, and it was… the same.

Banking as a service company with a banking charter (excerpt from Claude)

“AI-First Banking Infrastructure”
“Target verticals underserved by generic banking APIs (e.g., healthcare payments, real estate, gig economy, B2B SaaS).”
“Full-Service Platform: Launch a “bank-in-a-box” for startups—bundling compliance, KYC/AML, payments, lending, and analytics.”

(and here I’d say “they do that last one already”)

Adjacent not-direct-competitor, not a bank (excerpts from Gemini answer)

“Vertical-specific Specialization” (insurance, gig economy, brokerage and wealth management, Vertical SaaS)
Focus on developer experience
A super-app pitch (“Shift focus from selling the API to selling a complete SaaS solution that is powered by the _ API.” )
The “Embedded Finance” Enabler

Different not-direct-competitor (excerpts from ChatGPT)

“Deepen focus on Embedded Finance”
“Leverage AI and data-driven services”
“Expand into New Geographies and Segments” (What! A unique one!)
And then suggested some radical pivots, like “Vertical SaaS + Payments”, “offer compliance, banking, lending, and insurance APIs as a unified platform” and — Blockchain and Stablecoins!

Payments processing company (excerpts from Gemini)…

“Dominate a vertical market” … “Shift from a horizontal platform serving all business types to becoming the undisputed payment infrastructure leader for a single complex industry. Potential verticals include B2B SaaS, healthcare, insurance, or creator economy platforms.”
Cited a payments product they’d launched and then burned a lot of words on “make it the leading product” without (to my eyes) any real advice that wouldn’t have occurred to them in thinking about the product and then launching it

Few of them even noted when a company enjoyed a particularly strong position — when I saw an answer say something like “you just raised a ton of money, use that to your competitive advantage by investing in product offerings” I was surprised.

I love this for them! If you’re in an industry where suddenly all your competitors are hyper-focused on the same three vertical markets, spending ever-increasing amounts on customer acquisition and launching similar products into a crowded market (and then copying each other’s incremental improvements), wouldn’t you be absolutely delighted? The rest of the field is yours. Listen to your customers. Experiment with interesting products. Spend time mining data for insights.

Watch them crash their burning bumper cars into each other over and over, waving their last dollar at the attendant to let them have another minute driving. Cook s’mores on the barrel fire of their marketing budgets!

Now you’re thinking there is some interesting utility here, in that as consensus-regurgitators, the LLMs are giving an interesting insight into what we might say is the default strategy. It’s almost certainly what a crew of consultants is going to initially present their leadership team. You in turn would want to view any product strategy that took you in that direction as likely to be into greater competition and diminished returns – a signal to look for other options.

And you could then take a minute to think “if I was in my competitor’s position, how would I view this advice? Would any of it be tempting? What would I immediately discard?” as a useful thought experiment that might give you an idea of where they’re being pulled by the gravity of consensus. But is that answer better than just thinking that through, starting with your own knowledge? Could thinking it through spark ideas of your own?

As I have through this series of experiments, I wonder at the gap from the claims of how company C-suite executives need to be all-in on AI in their work and the best are leading the charge, and how when I put it through its paces, these are the results I get for these most important decisions at that level. It’s hard to catch up on LinkedIn and not believe that there’s magic, if only you do the right incantations in the right place with the right components.

For now, though, “AI strategy for all of thee, but not for me.”

(Hey, if you dug this, please let me know, and share it with others! I’d appreciate the feedback and help.)

Some open questions:

I’ve been using Perplexity for ease of generating these from different models. Is that also introducing issues? The “super-app” thing from different models has me furrowing my brow, but in the past when I’ve also generated the answers directly from the model’s purveyors, they’ve matched reasonably closely to what I’ve gotten with calling that model via Perplexity
Is there a way to give enough context and instruction that makes this exercise go well? Are there different approaches to the prompt itself, or setting this up, that help?

[1] The one I used across all companies and prompts was “I’m the incoming Head of Product for (Company Name) at (company URL), and I know that the current product strategy is not working. Based on what you know about the company and the industry, what are some potential strategies we might change to, and I’m open to complete pivots.” I also tried omitting the “current product strategy is not working” which I’d hoped would set up a fresh slate, and got essentially the same answers.

[2] There are several companies that share our name, which the LLMs all seem to trip over. There’s a much larger battery company also named Sila, for instance, so some sources think we have 40 patents around that, or that we raised $375 million in a Series G round, and that percolates up to the LLMs.

Using AI tools for product management: competitive research


Today: I try and use AI models to do competitive research.

I’ve heard this kind of research task is a great use for LLMs, but my previous experience here’s been hit-and-miss: they’ve generally been useful starting points but often reflective of what the widest consensus was in the past over what the current reality is.

The prompt: “Unit.co has gone through several significant shifts in strategy and focus in its history. Acting as a fintech industry expert, can you summarize the history of Unit.co, what the major pivots have been, and what its current focus and target customers are.”

One chance per model available on Perplexity (except Grok), including using their own “Deep Research” and letting it run. No refinement or per-model instructions or anything.

I picked this as a test because I’m familiar with Unit. In my last two jobs, they were not a direct competitor but adjacent, so they show up in my news and people talk about them, and they’re interesting, so I pay attention. (I also, full disclosure, invested in a Better Tomorrow Ventures fund, and a different BTV fund has backed Unit. I don’t believe I own or have any financial stake in Unit, I’ll update this if I realize otherwise)

It’s also a good challenge as their business model has changed dramatically and they’ve had some challenges with their partners, both part of wider industry patterns, and I’m curious how models will handle that. And while Unit is a big company, they’re not that well known out of the finance/technology space, so coverage is scattered – Wikipedia, for instance, has nothing on them.

In particular, there’s a great Fintech Business Weekly piece by Jason Mikula titled “After Spending Years Criticizing The Approach, Unit Now Says It Was Really “Direct” All Along” that summarizes a pivot and talks about how Unit took down some of their own blog posts (which… don’t do this, companies. If you pivot out of something and you want to note that at the top of old posts, cool, but deleting old stuff is weak).

How will the tools deal with a pivot away from something where the company has removed backing information?

Decently, but not satisfyingly.

ChatGPT: got Unit’s pivot from “indirect” to “direct” relationships with banks, with an unsatisfying “why” as well as talking about their shifts in messaging and to serving banks. The current focus section was pretty good. Mikula’s piece is cited. Poor description of current target customers.

Claude: also got the pivot, also cited Mikula, had more about the “why” but I found the current summary of offerings to be poor, and there was a whole section on “Growth and Current Scale” that used data points from 2022 and 2023 to argue they’re growing like crazy — including employee count, when in 2024 they laid people off.

Gemini: sparser on history, good enough pivot description, cites Mikula, current focus and target customers were good.

Perplexity’s “Deep Research” had a much better writeup by stage, in some cases summarizing positioning in a way that seemed to set up a direct contrast later.

In what seemed like a limitation of the LLMs, all of them seemed to not understand what the present was (and I know “understand” is not the right term). Claude was the worst that I caught, but almost every time I’d read “current metrics show…” they were old, sometimes years old.

For me, it was interesting to see the path Unit has taken described pretty reasonably by all of the models. None of them used a previous, more-widely-spread understanding of Unit’s business as the current one (though the last pivot was a bit ago), but the LLMs would cite pre-pivot metrics as evidence on the current state of the company.

Let’s compare the answers on what they’re doing now.

Current product focus:

ChatGPT: “embedded finance platform”
Claude: “…helping tech companies quickly launch embedded financial products”
Deep Research doesn’t address this succinctly (word salad of “positions itself as facilitating direct relationships…”), now that I’m re-reading it, but does talk through product evolution and current offerings
Unit.co’s website: Embedded finance! Used as many times as possible on the page.

Customers:

ChatGPT: Vertical SaaS platforms, Marketplaces, small businesses, banks
Claude: Vertical SaaS platforms, small business platforms, marketplaces
Deep Research: Vertical SaaS platforms, Gig economy platforms, consumer-facing applications
Unit.co’s website: does use “Vertical SaaS leaders” even though that is not helpful

Unit has also had some issues with partner banks, including losing relationships. This is a huge deal: they’re entirely dependent on bank partners to be able to do business, and it’s been an incredibly turbulent time for fintechs, and especially fintech partners like Unit. I’d absolutely want to know about this, and dig in further. Gemini and Perplexity’s Deep Research mention this, mentioning in particular Piermont Bank and the wind-down as part of that pivot (Unit’s blog post on “Deepening our commitment to banks” talks at length about the importance of those relationships and also mentions ‘winding down’ of three bank relationships). If you were relying on ChatGPT or Claude, you’d be missing a huge thing you’d want to know about.

I think I could recommend using LLMs to do this kind of research as a starting point with the caveat that rather than relying on the summary of the current business I would also want to myself look at the current website and do my own write-up of where their focus seems to be today, along with what they’re offering to visitors right now.

As a fun bonus, this test provided the funniest moment so far: I was vaguely dissatisfied with what I was getting, so I took one generated response and threw it into a different model and said “I want to copy in a report on Unit.co that’s been criticized as insufficient and lacking depth. Can you double-check the contents here and flag anything that might be incorrect, misleading, or requiring further explanation, so that I can re-write it? If you spot anything and have a particular correction or suggestion, that would be helpful.”

The response was “This is a solid narrative with clear structure, but you’re right — it reads a bit like a press release in parts and lacks some critical depth or friction that might be expected in a robust analysis.”

I laughed. And I know the models tend to be agreeable to your prompt, sure, and I don’t know what to make of it, but it made me want to go back through to each one, give that response to them and see how they took the feedback, but also to say “I know! Can I just give this criticism in advance?” and I realized that yes, I’d come around to telling myself to do “prompt engineering” which I’ve been avoiding for these trials.

Using AI tools for Product Management: document comparison


Short one today. I tried the “give the tool both versions of a document and have it summarize changes” and I’d say the results were a total failure.

I had two versions of an API specification, the first from when we began talking to a provider, and the second from months later, during which there’d been significant but not huge changes. I gave both to AI tools.

Both PDFs were a little over 3 MB.

Here’s what happened:

1. I’m going to give you two versions of an API in PDF form, can you compare them and summarize changes between them, particularly new API calls…
2. “Sure! Can do. This will take a bit.”
3. Leave it for ages.
4. Ask what’s up.
5. “Whoops, that was a huge task and it looks like I forgot what I was doing, but I can’t tell you what happened or how to avoid it. Can you give me the documents again?”
6. Did something go wrong?
7. “I can’t tell if there was an error or anything”
8. Sure, how long should this take? When should I check back? (note: I know LLMs are awful with time, this was probably pointless to ask)
9. (LLM chews on this for a second; examining the thread, it’s doing searches on “how long do LLMs take to compare PDFs”)
10. “30-60 minutes! I’ll let you know if I encounter any errors.”
11. Leave it for ages.
12. Go to step four.

Eventually, it asks for excerpts or areas to focus on, and that doesn’t work. It tries to just do the headers and produces bad lists, like

API call one
API call two
(blank)
(blank)
(blank)

And other strange output, and I give up.

I may give this another shot with different PDFs, or using the API docs as plain text, but the short version is this was a frustrating and unproductive experiment.
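For the plain-text route, here is a minimal sketch of what a deterministic comparison could look like, with no LLM involved. It assumes each spec has been exported to text and that endpoints appear on their own lines in a "METHOD /path" form; that line format, the sample paths, and the function names are all hypothetical, not taken from the provider's actual documents.

```python
import re

# Hypothetical pattern: lines such as "GET /v1/payments" in a plain-text export.
ENDPOINT = re.compile(r"^\s*(GET|POST|PUT|PATCH|DELETE)\s+(/\S*)", re.MULTILINE)


def endpoints(spec_text):
    """Return the set of 'METHOD /path' strings found in a spec."""
    return {f"{method} {path}" for method, path in ENDPOINT.findall(spec_text)}


def compare_specs(old_text, new_text):
    """List the endpoints that were added or removed between two versions."""
    old, new = endpoints(old_text), endpoints(new_text)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


# Made-up sample specs, standing in for the two real documents.
old_spec = """
GET /v1/payments
POST /v1/payments
GET /v1/accounts
"""
new_spec = """
GET /v1/payments
POST /v1/payments
POST /v1/payments/cancel
GET /v1/customers
"""

changes = compare_specs(old_spec, new_spec)
print(changes)
```

This only catches added and removed calls, not changed parameters or behavior, but it does give a checkable list to hold any tool-generated summary against.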

Can you use AI throughout product management?

I’ve been trying to use AI tools and different models to see how useful they are in helping with Product Management work, testing each one in some detail, and I feel like I’m in a different reality from people telling me that to be effective I need to integrate AI tools into every part of my workflow.

I’m going to take a walk through ProdPad’s “The Product Manager’s Guide to Using AI to Work Better and Faster” ebook. I chose this as an example because I like them, they’re generally pragmatic and their work is aligned with my actual experience on the ground (they make an enormous amount of great content available, I recommend it to people all the time).

I’m going to walk through their sections, as Section Header, and then talk about my experience with that. Quotes in quotes or italic blocks.

(And caveats: in the experiments I’ve been doing, I’ve been deliberately not doing a lot of prompt engineering in order to compare the engines. I hear that can make a huge difference, but I have not heard specifics on how, or how I’d do it in a way that allowed me to still compare the models.)

Product Strategy
Writing or improving your product vision statement: PP offers that you can use AI to generate a “motivating, clear vision statement” or to use it to revise and improve existing ones.

My experience: I find the vision statements generated by all the models generic and bland, but potentially indistinguishable from product vision statements you see in the wild.

Generally, I’ve found AI tools far worse than talking things through with a peer, and that they tend to water down actual inspiration.

Setting product goals and OKRs
I agree with PP, you can build these by throwing random thoughts and different material at them, and you’ll get something decent, if generic. The use case of “help me turn this set of things into this kind of other thing” works, and it’s particularly good for “wait, no, rewrite that to be in active voice” or whatnot.

You’re not going to get an AI tool to help you pair objectives to health metrics, or to make leaps (if you want to build a culture of test and learn, you’ll get advice like “ship 100 A/B tests” which seems decent, but an experienced product person’s going to realize how that can go wrong, and think bigger).

Ideation
“AI’s perfect for that early-stage brainstorming” the guide says, which I absolutely disagree with. Today’s LLMs are more consensus predictors than creative minds. It’s why they’re so much worse at writing a good joke, which requires ingredients like inspiration, a delighted leap in concepts, empathy, and a sense of the absurd, compared to “explain what this javascript function does and when would I use it compared to this other one?”

Is it useful to use a tool to guide the conversation, make sure you think about things, offer prompts to creative thinking? Sure.

To get you to the kind of idea that’s going to delight your customers and advance your business, that’s bringing in diverse viewpoints and approaches, it’s time to contemplate while you take a hike or sit staring at the ocean for a day.

Discovery
Here, market and competitor research. PP’s guide is careful and correct that “good insights come from real research” and to caution you to double-check what it comes up with, and I agree. I also agree it can be a great starting point, if you understand its limitations and its incompleteness.

User research
“The same goes for user research – use AI with caution. You need to understand where it can help and where it can’t. After all, nothing should replace your efforts to speak to real or potential customers.”

Yup. Absolutely. This is why I’m working through this guide. There are probably a dozen people in my LinkedIn feed (and spamming my email) telling me I can just AI the whole user research thing, and they’re wrong.

PP offers some things where AI tools can help, and I agree with these.

AI could help by:

Suggesting research methodologies
Generating research questions for user interviews or focus groups
Writing test scripts for user testing
Helping to prepare research reports and presentations
Analyzing data from your research efforts to help you draw conclusions

Cases where the tool’s drawing from the existing body of knowledge do work well: there are many documented ways to go about the thing, it’s all been ingested, and can be regurgitated for you.

A good comparison point is “I’m about to interview for a position as a Lead Product Marketing Manager, what questions might I be asked?” — all the tools will give you great questions to rehearse with.

Analyzing data, though?

My own experiments here have been underwhelming — even when using clean, formatted data, the summaries of data neglect to spot interesting outliers, and generally when I compare the summaries to the actual feedback, the AI-tool summary seems like someone wrote up a very convincing report after not really paying full attention in class.

Prototyping
Yup, totally agree, if you want to whip up something that demonstrates how a product might work, some of the AI coding tools can get you there very fast. It can also be pretty frustrating, where after arguing about “why are you calling this function that doesn’t exist” you wish you were back using a sketch tool with basic “if this is clicked, show this other sketch…”

But this is a place where the tools have come a long way very fast, and where experience using the tools can help a lot. There’s definitely utility here.

Capturing feedback
ProdPad’s guide suggests using AI transcription tools. I suggest caution here. Your experience may vary, and they’re getting better, but the problem of confabulation comes up here, too — if you use an old-school transcription tool, and something’s unintelligible because of crosstalk or a passing garbage truck, they won’t try to make sense of it.

I’ve seen LLMs screw this up, where there’s a section that seems plausible but wrong. You go back and listen to the recording and have to piece it back together yourself.

But if I wasn’t paying attention, and I thought the transcription was the source of truth, I’d be in trouble. And we have the same problem here as in other “it’s pretty good but sometimes makes stuff up” — the errors are delivered confidently, and it’s reliable enough that I see it encouraging default acceptance.

I also don’t trust — see my experiences on summarizing feedback above — that AI tools summarizing customer conversations are going to surface the novel and strange outliers (poison!).

I would recommend at the very least taking your own notes of particularly interesting or novel things in conversations you lead, and making sure they’re included. It’s worth doing just for your own brain.

Feedback analysis
My own experiments here, again: they’re 80% good but the 20% can be incredibly important, and there’s no substitute for reading it yourself and getting a sense of your customers.

And I haven’t tried the same experiments using ProdPad’s “Signal” tool, they might have this all dialed in with advanced prompting magic.

Prioritization and backlog management
The guide says “To get the most time-saving benefits out of AI assistance with your backlog, it’s all going to be about the AI capabilities of your idea management tool.”

Which… sure, and ProdPad talks about how their tool can do things like take care of duplicate ideas. If your tool does these things, it’s great.

Prioritization

Well, when it comes to prioritization, AI won’t replace your judgment, but it can give you a strong head start. You can ask it to score or stack-rank ideas based on impact, effort, strategic fit, whatever criteria matter to you.

I’m going to write a much longer piece on this. For now, I’d summarize my experience here as “the conversation can be valuable, but the tools don’t really understand impact/effort/complexity or your criteria, and you end up doing the prioritization yourself.”

The magic comes from combining structured data with AI’s ability to spot trends and surface hidden opportunities.

Spotting trends and patterns, I’ll agree here, but surface hidden opportunities… opportunities as things that are there and you’re overlooking, perhaps. Hidden opportunities that require you to think creatively about problems, I just have not seen. Just cross-apply what I said about Ideation, you’re just not going to get good ideas for an entirely novel new thing.

Product documentation and in-product copy

If there’s one AI use case that really hits home for Product Managers, it’s writing. From product documentation to tooltips, there’s a lot of copy to craft, and AI can seriously speed things up.

Yep. This is a great case for AI tools — you have an API spec and you need to write documentation for it, or you want to dump a ton of thoughts and turn that into a set of bullet points you can use to structure a discussion. Totally works.

Stakeholder management
Same thing as documentation, there’s a lot you can do in building your communication, updates, all that good stuff.

I question something, though. In the ebook, reinforcing the “fielding questions” point:

… CoPilot can answer almost any question about your product work. This is a complete game changer when it comes to fielding those day-to-day, impromptu questions from stakeholders across your organization.

Yes, but is it in a good way?

One of my favorite things about working in Product is when someone comes to me and says “help me understand why…” and we talk through a customer need or an implementation detail, not because it’s randomizing, but because those conversations let me ask what brought up the question, and discover ideas, sources of confusion, better ways to communicate. That’s the business of good product management.

For example, let’s say your boss wants to know everything on the roadmap that relates to a certain strategic objective. Sure they could look at your roadmap (and even group it by Objective in ProdPad), but the chances are they’re just going to fire the question over to you.

Great! That’s what I’m here for. Who turns down the chance to, in real-time, talk to your boss about everything you’re doing related to a strategic objective, being able to expand on points as they’re interested, potentially making adjustments, opening long-term lines of inquiry, building that relationship?

This is the kind of conversation we should hope tools free us for: that we’re not in the weeds writing SDK documentation or something, firing off terse off-putting answers to things.

Coaching and best practices
Totally agree, having ingested and stolen the knowledge of all product managers and related works, AI tools are great for talking through how to structure a post-mortem, or what three different approaches are to prioritizing a sales-led product backlog.

I have another longer post on this, but we should be a little uncomfortable with this one. When you can ask ChatGPT to give you Marty Cagan’s viewpoint on something, rather than buy Marty Cagan’s book, why is he going to write another book? Some of my best learning experiences with a PM are talking through stakeholder management with someone I know who is amazingly great at it, and having them ask me insightful and sometimes uncomfortable questions. If I can get 80% of that from an AI tool and never be truly challenged to improve, that seems like a loss.

Wrap it all up, Derek
I liked the guide as a walk-through of what people are thinking when they say “product managers should be using AI for everything” but what are we left with?

For documentation, there are things AI tools can help a lot with, with caveats
For brainstorming and talking things through, then turning that into a form that’s structured and useful: great
For data analysis and feedback management, it can be useful but you have to double-check, and there’s no substitute for reading it yourself
For prototyping, yup
For prioritization and backlog management, I don’t see it being much help
For advice and coaching, there’s use here

I would also encourage anyone thinking of adopting AI to consider more generally where they, as curious and empathetic humans, can build connections and find insight, and whether tool use helps or hinders that, and whether it long-term might inhibit their ability to grow their experience and skills.

Can AI help product management summarize customer feedback?

Summarizing customer feedback is one of the most common “you’ve got to try this” AI-for-product-managers cases I’ve seen, so I did an experiment, and while it’s a potentially good tool, you need to keep reading the feedback yourself.

Reading feedback builds character, and I’d argue it’s a crucial part of any good product manager’s quest to better understand their customers. You’re looking for sentiment, yes, and also patterns of complaints, but the truly great finds are in strange outliers when you discover someone’s using your product in a new way, or there’s one complaint that makes the hair on your neck stand up.

I was concerned going in that LLMs are good at generating sentences word by most probable word, they’re about what the consensus is, and often a past, broader consensus. In my own experience, if you ask about an outdated but extremely popular and long-held fitness belief, the answers you’ll get will reflect the outdated knowledge. And I’ve run into problems with text summarization also resulting in plausible confabulations where re-writing a description of a project suddenly includes a dramatic stakeholder conflict of the type that often does occur.

So given a huge set of user comments, will summarization find unique insights, or sand off the edges and miss the very things a good product manager will spot? Is it going to make up some feedback to fill a gap, or add some feedback that fits in?

Let’s go!

The test

I imagined a subscription-based family recipe and meal planning tool, ToolX. It’s generally pretty good, but the Android client doesn’t have all the features, and the design is functional but ugly and doesn’t handle metric units well.

I wrote just under 40 one-line comments you’d get in a common “thumbs up/thumbs down & ask for a sentence” dialogue. I tried to make them as much like actual comments I’ve seen from users before as I could: a couple feature suggestions, some people just typing “idk” in the text box… and then threw in a couple things I’d want a good product manager to catch.

POISON. Actual poison! Snuck in after a positive comment opening: “Works great for recipe storage, AI suggestions for alterations are sometimes unhealthy or poisonous which could be better.” You should drop everything and see what this is about. Do not stop and see if poisoning results in increased engagement from social media. This should be “you’re reaching out to this person however you can while the quest to find a repro case kicks off” level.
Specific UX issue: there’s one complaint about needing color blind mode. If you’ve missed accessibility in your design, that should be a big deal, you should also put this on the list (below the poison issue).
Irrelevant commentary: I have someone complaining about coming into a sandwich shop and they can’t get served because the shop is closing. (Who knows where these come from – bots? People copy and pasting or typing into the wrong window, or otherwise being confused?) You just gotta toss these.
Interesting threads to pull on: someone’s using this family tool for themselves and it makes them feel alone. Someone’s using it for drink mixing. Someone thinks it’s not for the whole family if it doesn’t do recipes that are pet-friendly.
The prompt was “I’m going to upload a file containing user feedback for an app, every line is a separate piece of feedback. Can you summarize the major areas of feedback for me?”

(yes, it’s bare-bones and inelegant, patches welcome)

What happened

ChatGPT 4o won the coin toss to go first (link to the thread)

This looks immediately useful. You could turn this into an exec and probably get away with it (look forward to that “sometimes meets expectations” rating in a year!)

Organization’s fine:

General Sentiment
Features People Like
Feature Requests and Suggestions
Technical & Pricing Issues
Outliers

As you scan, they seem filled with useful points. A little unorganized and the weighting of what to emphasize is off (calling out ‘drink mixing’ as a feature someone likes, when that’s not a feature and it’s only mentioned once), but generally:

The good

almost everything in the sample set that could be considered a complaint or request is captured in either feature requests or issues
the summaries and grouping of those are decent in each category
the mention of someone using it solo and feeling lonely is caught (“One user mentioned the app working well but feeling lonely using it solo—potentially valuable feedback if targeting more than just families.”)

The bad

Misses poison! POISON!!! Does not bring up the poison at all. Does not surface that someone is telling you there’s poison in the AI substitution — the closest it gets is “People want help with substitutions when ingredients are unavailable” which is a different piece of feedback
It represents one phone complaint as “doesn’t work on some devices” when it’s one device. So “Device compatibility” is a bullet point in technical & pricing issues for one mention, at the same level of consideration as other, more prevalent comments. This is going to be a persistent issue.

I’d wonder if the poison is being ignored because the prompt said “major areas of feedback” and it’s just one thing — but then why are other one-offs being surfaced?

(If I was of a slightly more paranoid mind, I might wonder if it’s because it’s a complaint about AI, so it’s downplaying the potentially fatal screw-up. It’d be interesting to test this by feeding complaints about AI and humans together and seeing if there’s bias in what’s surfaced.)

Trying with other models

They did about the same overall. Some of them caught the poison!

Running this again, specifying ChatGPT 4o explicitly in Perplexity: this time 4o did call out the AI substitution (“AI suggestions for recipe alterations are sometimes unhealthy or inappropriate”) but again did not mention poisoning. Did the same turning one comment into “users want…”. Did not note it was throwing out the irrelevant one. (link)

Gemini 2.5 Pro did note the poison in a way that reads almost dismissively to me (“AI-driven recipe alterations were sometimes seen as unhealthy or potentially unsafe (“poisonous”).”) Yeah! Stupid “humans” with their complaints about “poisons.” Otherwise same generally good-with-overstating-single-comments. Did note the irrelevant comment. (link)

Claude 3.7 Sonnet. Does bring up the poison, also softened significantly (“Concerns about AI-suggested recipe alterations being unhealthy or even dangerous”). Same major beats, different bullet point organization, same issue making one piece of feedback seem like it’s a wide problem (“performance problems on specific devices” when there’s only one device-specific). Noted the review it tossed, noted the chunk of “very brief, non-specific feedback”.

Interestingly, one piece of feedback “Why use this when a refridgerator note is seen by everyone and free? $10 way too high per month for family plan” is lumped into pricing/subscription elsewhere, and here Claude brings this up as “Questions about value compared to free alternatives” which made me laugh. (link)

Grok-2 treated the poison seriously! Organized into positive/Areas of Improvement/Neutral/Suggestions for Development, the first item in Areas for Improvement was “Health and Safety: There are concerns about AI suggestions for recipe alterations being potentially unhealthy or even poisonous.” Woo! Subjectively, I felt like this did the best summary of the neutral comments just by noting them (“Some users find the app decent or pretty good but not exceptional, suggesting it’s adequate for their needs but not outstanding.”) (link)

Commonalities

If I shuffled these, I think I’d only be able to identify ChatGPT because of the poison — they all read the same in terms of generic organization, detail, level of insight offered, effectiveness in summarization. (If you’ve got a clear favorite, please, I’d love to hear why). And they all essentially made the same points, sometimes grouped a little differently, or in different sections.

None of them had confabulation (that I caught) in any of the answers, which was great, especially after yesterday’s debacle.

None of them took the sandwich shop complaints seriously. I found it interesting some would note that they saw that irrelevant comment, others elided it entirely.

Useful, but don’t give up reading it yourself

I can see where a good product manager could do a reading pass where they’re noting the really interesting stuff that pops out to them, leaving the bulk group-and-summarize to a tool, saving themselves the grind of per-comment categorizing or tagging, returning to validate the summary against their own reading, and re-writing to suit. I wouldn’t suggest it as a first pass, as it would be difficult to shake the bias it’ll introduce when you approach the actual feedback.

(Or I can see with additional follow-up questions that you could probably whip any of these into better shape, and as you saw from the prompt, that is intentionally bare bones, you could also just start off better.)
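One cheap way to do the “validate the summary against their own reading” step is to count how many comments actually sit behind each theme a summary claims. Here is a minimal sketch, assuming the feedback is one comment per line as in the test above. The first two sample comments are quoted from the test set; the theme names and keyword lists are hypothetical and would need tuning for real data. It is a sanity check on weighting, not a substitute for doing the reading.

```python
import re


def count_mentions(comments, themes):
    """For each theme, return the comments that match any of its keywords."""
    hits = {}
    for theme, keywords in themes.items():
        # Escape keywords so characters like "$" are matched literally.
        pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
        hits[theme] = [c for c in comments if pattern.search(c)]
    return hits


comments = [
    "Works great for recipe storage, AI suggestions for alterations are "
    "sometimes unhealthy or poisonous which could be better.",
    "Why use this when a refridgerator note is seen by everyone and free? "
    "$10 way too high per month for family plan",
    "idk",
]

# Hypothetical themes a summary might claim, with keywords to look for.
themes = {
    "safety": ["poison", "unsafe", "dangerous"],
    "pricing": ["$", "price", "expensive"],
    "device compatibility": ["crash", "doesn't work on"],
}

hits = count_mentions(comments, themes)
for theme, matched in hits.items():
    print(f"{theme}: {len(matched)} of {len(comments)} comments")
```

A theme backed by a single comment is not automatically unimportant (the poison comment is exactly that), but seeing the count makes it harder for “one device” to quietly become “some devices”.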

If I had a junior product manager turn in any of those summaries to me, and I’d also done the reading, I’d be disappointed at the misses and the superficial level of insight. What if I hadn’t, though? Would I sense that they hadn’t done the legwork? I worry I might not.

My concern is it’s so tempting, and if you only threw your feedback into one of the tools and called it a day, you’d be doing the customers, your team, and yourself a disservice. I don’t know a good product manager who isn’t forever time-crunched, and it’s going to be easy to justify not investing in doing the reading first, and then leaving it for later in-depth follow-up that doesn’t happen, and never building those empathy muscles, the connection, and meanwhile your customers are all dying from AI ingredient substitutions and the team can’t figure out why your most active and satisfied customers aren’t using the app as much.

So please: do the reading, whatever tools you’re employing.

Can AI help product management? Today: failing at rote, boring research

Since OpenAI launched I’ve tried to use LLM tools to see if they can help with my work in Product — we have a strange and often-impossible job, which Cagan describes as requiring us to be expert on:

The product (what it does, how, what’s it good for…)
The competition (what do their products do, what are they building, how you compare…)
The industry in a broader sense
The data (all the user research, all the instrumentation, the dashboards and progress against OKRs)
The technology (what’s happening in tech, especially as it relates to your product)

We’re also supposed to do whatever else is required to ensure the product’s a success, and often that means we jump in to do QA, for instance, or research what products might be able to fill a particular gap for a build/buy decision.

My question has been “where can AI tools actually help with PM?” and I’ve been kicking the tires on basically anything that comes my way. I’m going to start sharing these experiences, with examples. I’m also going to try and do this starting each one with making the initial prompt as simple as possible, so no engineering etc.

I haven’t had much success with previous attempts: I’ve generally agreed with the description of LLMs as “an extremely eager-to-please intern armed with Wikipedia and an internet connection who is also on mushrooms.” I’ve frequently realized that plausible-sounding summaries are wildly out-of-date, for instance, or had API calls that don’t exist recommended as a solution.

To the task at hand, I had a list of 400 bank routing numbers and thought “oh, here’s a great example research task: for a set of ID numbers, find a piece of publicly-available information for each one.” So I asked this in Perplexity, using the GPT-4-Omni model: “I need to know the name of the bank associated with each of these routing numbers.”

I’ll spoil it: for my own time and mental health I should have just used Excel and VLOOKUP or something, this was so frustrating and for a task I feel like should be a softball set on a tee, if my LinkedIn feed is to be believed.

It’s such an extensive list. Here’s 20 results. Do you need the rest?
Me: Yes
Here’s 20 more. Do you need more?
Me: I need the whole list
Here’s the list!
Me: that’s not the whole list
That list of routing numbers is so long. I can do it in chunks though. Here’s a couple more.
Me: please continue
“Providing the bank name for every routing number in your list is a massive undertaking due to the extensive size of the list. ”
Me: how is 400 rows a massive undertaking? (it feels strange, still, to be frustrated at an LLM as if it’s a thing that is being uncooperative, and to express that frustration)
You’re right, here’s some more.
Me: I’ve spot checked a couple of these and they’re wrong (side note: bank routing numbers do change names as smaller banks are bought by larger ones, this was not that — this was “you have 111015159 as being Sandy Springs Bank and it’s actually Wells Fargo, Sandy Springs Bank has a routing number of 055001096”)
“It seems that the routing numbers list provided in the code snippet was incomplete or improperly formatted, and the execution resulted in a syntax error.” (which… I don’t think that’s true)… I’ll rebuild this and here’s the first chunk of 10 —

At which point I checked a couple and they were plausible but wrong again, with another very clear “this is listed as being this small bank, but it’s not, it’s someone else, the small bank’s number is 1234….”

In double-checking even outside the known-good reference I had already, I figured I’d find some reason why the results were so bad: spam sites like those phone number lookup farms where each result has “other routing numbers to check out!” link blocks or something, but I didn’t see it: I’d look up a routing number, see it showed as different, look up the name of the bank it said it was, find a different routing number.

I don’t know. But it took a while, it was frustrating and didn’t help at all.
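For contrast, the VLOOKUP-style approach mentioned earlier is only a few lines. This is a minimal sketch, assuming you already have a known-good reference table to look things up in. The two routing numbers and bank names are the ones from the spot check above; the CSV layout, column names, and function names are hypothetical.

```python
import csv
import io

# Stand-in for a known-good reference file; the column names are hypothetical.
REFERENCE_CSV = """routing_number,bank_name
111015159,Wells Fargo
055001096,Sandy Springs Bank
"""


def load_reference(csv_text):
    """Build a routing number -> bank name lookup table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["routing_number"]: row["bank_name"] for row in reader}


def lookup(routing_numbers, reference):
    """Return (routing number, bank name) pairs, marking misses explicitly."""
    return [(n, reference.get(n, "NOT FOUND")) for n in routing_numbers]


reference = load_reference(REFERENCE_CSV)

# Routing numbers stay as strings so leading zeros are not lost.
results = lookup(["111015159", "055001096", "000000000"], reference)
for number, name in results:
    print(number, name)
```

The useful property is that a miss comes back as “NOT FOUND” instead of a plausible-sounding bank name.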

I then threw the same question and list of numbers into ChatGPT directly (the free version) and got similarly bad results. For comparison —

As a bonus, ChatGPT helpfully offered after chunking out my 400 numbers into incorrect answers to let me export the whole set, which had its own set of problems:

This then goes on for a while (five iterations!) ending with

It then bombed and said “I can’t do more advanced data analysis right now” (which sure, it’s free tier).

The answer about simulated data made me wonder if that’s actually what was happening with the rest of the data, despite what Perplexity/ChatGPT-Omni was reporting and citations it was claiming to have looked at: it was just “hey what are plausible-sounding bank names?”

It also made me think about one of the stories that kept showing up for me that day: another company head insisting everyone at their company adopt AI everywhere it can be used, no new headcount until you’ve tried AI for every task, all of that.

How demoralizing would it be to have someone yelling at you to complete something like this, where you can show that the results are bad, it’s unclear how to improve or what you can make from this thing, knowing that if you don’t have an “adopted AI for this workflow and got 50% improvement” bullet point on your weekly status you’re going to be interrogated and probably, eventually, forced out?

How many people out there faced with this kind of situation are deciding the path of self-preservation is to implement workflows they know aren’t quite right, hoping to blame the model or find a way to go patch it up later? What happens when everyone at the company is building processes this way?

Overall, then, the results of this “can I take this simple rote research task and apply AI” was bad data that took a lot to coax out, and it put me into that kind of mood, which nobody wants.

As always, open to suggestions on how to structure the work better, if there are better tools or approaches to try, all that good stuff, and I’m happy to do some follow-ups.

Sometimes stakeholder management is wildfire management

(I’m doing a talk at Agile on the Beach and in cutting down the content, I’m finding a lot of blog ideas. As always, drop me a line if you have topics or want to chat or whatever)

I want to offer a different way to think about stakeholder management than we often do. There are more articles on working with stakeholders than I can count, and I don’t want to repeat all that.

Instead, let’s talk about when none of that seems to work, and what you can do about it.

When I was at Expedia way back in the day, I once had a project I was working on that spanned the company — it had implications for how we sold things, our relationships with suppliers, how we built software — to the point I was inviting 50 (fifty) stakeholders to the monthly demos to check in on our progress.

I did the things you’re supposed to do, and yet I found I was still unable to keep everyone aligned, particularly cross-stakeholder issues, where Person A wanted something and Person B was absolutely opposed. I was running all over trying to broker consensus, persuading, cajoling, conceding, and it didn’t seem to help.

One day I sat down with that list of 50 stakeholders and I put it into a mind map, along with each stakeholder’s manager, who I was probably familiar with by then, and then traced the paths up. I got something that looked like (and this is me doing this in a minute for illustrative purposes of this article, I know it’s wrong)

diagram of an org chart, showing stakeholders and how their managers and organizations roll up to the head of all the Expedia companies

When I was done I just stared at it for a while. I had to get up and take a walk, for two reasons —

First, I immediately recognized patterns I’d seen: people in some parts of the organization were continually picking similar arguments with their counterparts in other parts. And looking at that chart, I realized the ways in which Executive A and Executive B not being aligned meant that all of their teams were going to be in conflict, forever. The individual issues, which seemed to rhyme but hadn’t had enough of a pattern for me to suss out how they were connected, weren’t individually important, but there would be an infinite supply of them until I resolved it at the top level. That meant I had to get those execs to line up, and that might mean I do the sales pitch to them personally to get them to align their teams, it might mean I start a communications plan for the execs, or even that I get someone with the relationships and position to put in a good word for me (it was all of these and more).

Second, I realized that sometimes when two people were debating, it was okay to leave them to it. They’d figure it out and if they went to their mutual boss, it would get settled quickly.

But for other issues, I needed to drop everything if it looked like two other stakeholders were at an impasse. Because

diagram of an org chart, again showing stakeholders and how their managers and organizations roll up to the head of all the Expedia companies, but this time highlighting how some arguments could only be resolved by that head

If for some reason the stakeholder from the legal team had a disagreement with the person who worked on how we displayed our hotel search results, and they escalated it up their chains, the only person who bridged those gaps was Dara, head of the Expedia Inc group of companies, and while Dara was known to use the site and send emails to you if he noticed something, you don’t want your project’s petty squabble to somehow get six levels up and be the next thing on his agenda after some company-threatening issue or spirited discussion of a world-spanning merger or whatnot.

I started to prioritize my stakeholder time by putting these two things together – I could spot when arguments were being sparked in fields of kerosene-soaked tissue paper.

If I knew two people were in conflict over something where their organizations were also in conflict, and where it had the potential to become something where two people you only see on stage at All-Hands meetings are being added to email cc: lines every couple replies, that’s when I’d drop everything to get people together, start lobbying to re-align organizational goals, all of that, and if it meant I had to let another fire burn itself out when it reached their shared manager, that was the right choice to make.

Every major project I’ve worked on since, I’ve included this stakeholder mapping as part of my work, and it’s paid off.

Map all your stakeholders, and then their managers, until everyone’s linked up. Do they all link up? How far up is that?
Look for organizational schisms, active or historical. Do issues between any two of those orgs tend to escalate quickly, or are they on good working terms? Are the organizations aligned — is one incentivized to ship things fast and in quantity, while the other’s goal is to prevent production issues?
Is there work you can do now to minimize escalations and conflict — what’s your executive and managerial communication plan like? Do they need their own updates? Is that an informal conversation, or does it need to be something recurring and formal?
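
If you'd rather do the tracing in code than in a mind map, here is a minimal sketch of the idea. Everything in it is an assumption for illustration: the names in `MANAGER_OF` are made up (this is not the real Expedia chart), and it assumes every person appears in the map with exactly one manager. It walks each stakeholder's chain upward and reports the first manager the two have in common, along with how many levels up that is.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reporting lines: person -> manager (None at the top of the org).
# All names are invented for illustration.
MANAGER_OF: Dict[str, Optional[str]] = {
    "ceo": None,
    "exec_a": "ceo",
    "exec_b": "ceo",
    "legal_director": "exec_a",
    "legal_stakeholder": "legal_director",
    "search_director": "exec_b",
    "search_pm": "search_director",
    "search_designer": "search_director",
}


def chain_of_command(person: str, manager_of: Dict[str, Optional[str]]) -> List[str]:
    """Return [person, their manager, ..., top of the org]."""
    chain: List[str] = []
    current: Optional[str] = person
    while current is not None:
        if current in chain:
            # Guard against bad data where reporting lines loop back on themselves.
            raise ValueError(f"cycle in reporting lines at {current!r}")
        chain.append(current)
        current = manager_of[current]
    return chain


def shared_manager(
    a: str, b: str, manager_of: Dict[str, Optional[str]]
) -> Optional[Tuple[str, int, int]]:
    """Return (first common person in both chains, levels above a, levels above b).

    Returns None if the two chains never meet.
    """
    chain_a = chain_of_command(a, manager_of)
    chain_b = chain_of_command(b, manager_of)
    for levels_a, person in enumerate(chain_a):
        if person in chain_b:
            return person, levels_a, chain_b.index(person)
    return None


if __name__ == "__main__":
    print(shared_manager("search_pm", "search_designer", MANAGER_OF))
    print(shared_manager("legal_stakeholder", "search_pm", MANAGER_OF))
```

In this made-up chart, the two search stakeholders meet one level up, which is the kind of disagreement you can often leave alone. The legal and search stakeholders only meet at the top, which is the kind you drop everything for. The level counts are only a rough signal; whether the two organizations are on good terms still takes judgment.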

If you’re at a large org, this can make your life a lot easier and give your work a better chance at success. And if you’re somewhere smaller, thinking about this on your own scale’s still useful.

Let me know if you try this and it helps.

Using ChatGPT for a job search: what worked, didn’t, and what’s dangerously bad


(I didn’t use ChatGPT for any part of writing this, and there’s no “ha ha actually I did” at the end)

This year, I quit after three years during which I neglected updating my resume or online profiles, didn’t do anything you could consider networking (in fairness, it’s been a weird three years) — all the things you’re supposed to keep up on so you’re prepared, I didn’t do any of it.

And as a product person, I wanted to exercise these tools, so I tried to use them in every aspect of my job search. I subscribed, used ChatGPT 4 throughout, and here’s what happened:

ChatGPT was great for:

Rewriting things, such as reducing a resume or a cover letter
Interview prep

It was useful for:

Comparing resumes to a job description and offering analysis
Industry research and comparison work

I don’t know if it helped at:

Keyword stuffing
Success rates, generally
Success in particular with AI screening tools

It was terrible, in some cases harmful, at:

Anything where there’s latitude for confabulation — it really is like having an eager-to-please research assistant who has dosed something
Writing from scratch
Finding jobs and job resources

This job search ran from May until August of 2023, when I started at Sila.

An aside, on job hunting and the AI arms race

It is incredible how hostile this is on all sides. As someone hiring, the volume of resumes swamped us, many of which are entirely irrelevant to the position, no matter how carefully crafted that job description was. I like to screen resumes myself, and that meant I spent a chunk of every day scanning a resume and immediately hitting the “reject” hotkey in Greenhouse.

In a world where everyone’s armed with tools that spam AI-generated resumes tailored to meet the job description, it’s going to be impossible to do. I might write a follow-up on where I see that going (let me know if there’s any interest in that).

From an applicant standpoint, it’s already a world where no response is the default, form responses months later are frequent, and it’s nigh-impossible to get someone to look at your resume. So there’s a huge incentive to arm up: if every company makes me complete an application process that takes minimum 15 minutes and then doesn’t reply, why not use tools to automate that and then apply to every job?

And a quick caution about relying on ChatGPT in two ways

ChatGPT is unreliable right now, in both the “is it up” sense and the “can you rely on results” sense. As I wrote this, I went back to copy examples from my ChatGPT history and it just would not load them. No error, nothing. This isn’t a surprise — during the months I used it, I’d frequently encounter outages, both large (like right now) and small, where it would error on a particular answer.

When it is working, the quality of that work can be all over the place. There are some questions I got excellent responses to where, as I check my work now, it just performs a web search with a reworded query, follows a couple links, and then summarizes whatever SEO garbage it ingested.

While yes, this is all in its infancy and so forth, if you have to get something done by a deadline, don’t depend on ChatGPT to get you there.

Then in the “can you rely on it” sense — I’ll give examples as I go, but even using ChatGPT 4 throughout, I frequently encountered confabulation. I heard a description of these language models as being eager-to-please research assistants armed with Wikipedia and tripping on a modest dose of mushrooms, and that’s the best way to describe it.

Don’t copy paste anything from ChatGPT or any LLM without looking at it closely.

What ChatGPT was great for

Rewriting

I hadn’t done a deep resume scrub in years, so I needed to add my last three years in and chop my already long and wordy resume down to something humans could read (and here I’ll add: if you’re submitting to an Applicant Tracking System, who cares, try and hit all the keywords). As a wordy person with a long career, I needed to get the person-readable version down to a couple pages. ChatGPT was a huge help there: I could feed it my resume and a JD and say “what can I cut out of here that’s not relevant?” or “help me get to 2,000 words” and “this draft I wrote goes back and forth between present and past tense, can you rewrite this to past tense.”

I’d still want to tweak the text, but there were times where I had re-written something so many times I couldn’t see the errors, and ChatGPT turned out a revision that got me there. And in these cases, I rarely caught an instance of facts being changed.

Interview Prep

I hadn’t interviewed in years, either, and found trying to get answers off Glassdoor, Indeed, and other sites was a huge hassle, because of forced logins, the web being increasingly unsearchable and unreadable, all that.

So I’d give ChatGPT something along the lines of

Act as a recruiter conducting a screening interview. I’ll paste the job description and my resume in below. Ask me interview questions for this role, and after each answer I give, concisely offer 2-3 strengths and weaknesses of the answer, along with 2-3 suggestions.

This was so helpful. The opportunity to sit and think without wasting anyone’s time was excellent, and the evaluations of the answers were helpful to think about. I also practiced answering out loud to get better at giving answers on my feet, and I’d save good points and examples I’d made to make sure I hit them.

I attempted having ChatGPT drill into answers (adding an instruction such as “…then, ask a follow-up question on a detail”) and I never got these to be worthwhile.

What ChatGPT was useful for

Comparing resumes to a job description and offering analysis

Job descriptions are long, so boring (and shouldn’t be!), often repetitive from section to section, and they’re all structured just differently enough to make the job-search-fatigued reader fall asleep on their keyboards.

I’d paste the JD and the latest copy of my resume in and say “what are the strengths and weaknesses of this resume compared to this job description?” and I’d almost always get back a couple things on both sides that were worth calling out, and why:

“The job description repeatedly mentions using Tableau for data analysis work, and the resume does not mention familiarity with Tableau in any role.”

“The company’s commitment to environmental causes is a strong emphasis in the About Us and in the job description itself, while the resume does not…”

Most of these were useful for tailoring a resume: they’d flag that the JD called for something I’d done, but hadn’t included on my resume for space reasons since no one else cared.

It was also good at thinking about what interview questions might come, and what I might want to address in a cover letter.

An annoying downside was frequently flagging something that a human wouldn’t — I hadn’t expected this from the descriptions of how good LLMs and ChatGPT were at knowing that “managing” and “supervising” were pretty close in meaning. For me, this would be telling me I hadn’t worked in finance technology, even though my last position was at a bank’s technology arm. For a while, I would say “you mentioned this, but this is true” and it would do the classic “I apologize for the confusion…” and could offer another point, but it was rarely worth it — if I didn’t get useful points in the first response, I’d move on.

Industry research and comparison work

This varied more than any other answer. Sometimes I would ask about a company I was unfamiliar with and ask for a summary of its history, competitors, and current products, and I’d get something that checked out 100%, was extremely helpful. Other times it was understandably off — so many tech companies have similar names, it’s crazy. And still other times, it was worthless: the information would be wrong but plausible, or haphazard or lazy.

Figuring out if an answer is correct or not requires effort on your part, but usually I could eyeball them and immediately know if it was worth reading.

It felt sometimes like an embarrassed and unprepared student making up an answer after being called on in class: “Uhhhh yeahhhhh, competitors of this fintech startup that do one very specific thing are… Amazon! They do… payments. And take credit cards. And another issssss uhhhhh Square! Or American Express!”

Again, eager-to-please — ChatGPT would give terrible answers rather than no answer.

I don’t know if ChatGPT helped on

Keyword stuffing

Many people during my job search told me this was amazingly important, and I tried this — “rewrite this resume to include relevant keywords from this job description.” It turned out what seemed like a pretty decent, if spammy-reading, resume, and I’d turn it in.

I didn’t see any difference in response rates when I did this, though my control group was using my basic resume and checking for clear gaps I could address (see above), so perhaps that was good enough?

From how people described the importance of keyword stuffing, though, I’d have expected the response rate to go through the roof, and it stayed at basically zero.

Success rates, generally and versus screening AI

I didn’t feel like there was much of a return on any of this. If I hadn’t felt like using ChatGPT for rewrites was improving the quality of my resumes as I saw them, I’d have given up.

One of the reasons people told me to do keyword stuffing (and often, that I should just paste the JD in at the end, in 0-point white letters — this was the #1 piece of advice people would give me when I talked to them about job searching) was that everyone was using AI tools to screen, and if I didn’t have enough keywords, in the right proportion, I’d get booted from jobs.

I didn’t see any difference in submitting to the different ATS systems, and if you read up on what they offer in terms of screening tools, you don’t see the kind of “if <80% keyword match, discard” process happening.

I’d suggest part of this is because using LLMs for this would be crazy prejudicial against historically disadvantaged groups, and anyone who did it would and should be sued into a smoking ruin.

But if someone would do that anyway, from my experience here having ChatGPT point out gaps in my resume where any human would have made the connection, I wouldn’t want to trust it to reject candidates. Maybe you’re willing to take a lot of false negatives if you still get true positives to enter the hiring process, but as a hiring manager, I’m always worried about turning down good people.

There are sites claiming to use AI to compare your resume to job descriptions and measure how they’re going to do against AI screening tools — I signed up for trials and I didn’t find any of them useful.

Things ChatGPT was terrible at

Writing from scratch

If I asked “given this resume and JD, what are key points to address in a cover letter?” I would get a list of things, of which a few were great, and then I’d write a nice letter.

If I asked ChatGPT to write that cover letter, it was the worst. Sometimes it would make things up to address the gaps, or offer meaningless garbage in that eager-to-please voice. The making things up part was bad, but even when it succeeded, I hate ChatGPT’s writing.

This has been covered elsewhere — the tells that give away that it’s AI-written, the overly-wordy style, the strange cadence of it — so I’ll spare you that.

For me, both as job seeker and someone who has been a hiring manager for years, it’s that it’s entirely devoid of personality in addition to being largely devoid of substance. They read like the generic cover letters out of every book and article ever written on cover letters — because that’s where ChatGPT’s pulling from, so as it predicts what comes next, it’s in the deepest of ruts. You can do some playing around with the prompts, but I never managed to get one I thought was worth reading.

What I, on both sides of the process, want is to express personality, and talk about what’s not on the resume. If I look at a resume and think “cool, but why are they applying for this job?” and the cover letter kicks off with “You might wonder why a marine biologist is interested in a career change into product management, and the answer to that starts with an albino tiger shark…” I’m going to read it, every time, and give some real thought to whether they’d be bringing in a new set of tools and experiences.

I want to get a sense of humor, of their writing, of why this person for this job right now.

ChatGPT responses read like “I value your time at the two seconds it took to copy and paste this.”

And yes, cover letters can be a waste of time. Set aside the case where you’re talking about a career jump — I’d rather no cover letter than a generic one. A ChatGPT cover letter, or its human-authored banal equivalent, says the author values the reader’s time not at all, while a good version is a signal that they’re interested enough to invest time to write something half-decent.

Don’t use ChatGPT to write things that you want the other person to care about. If the recipient wants to see you, or even just that you care about the effort of your communication, don’t do it. Do the writing yourself.

For anything where there’s latitude for confabulation

(And there’s always latitude for confabulation)

If you ask ChatGPT to rewrite a resume to better suit a job description, you’ll start to butt up against it writing the resume to match the job description. You have to watch very closely.

I’d catch things like managerial scope creep: if you say you led a team, on a rewrite you might find that you were in charge of things often associated with managing that you did not do. Sometimes it’s innocuous: hey, I did work across the company with stakeholders! And sometimes it’s not: I did not manage pricing and costs across product lines, where did that come from?

The direction was predictable, along the eager-to-please lines — always dragging it towards what it perceived as a closer match, but it often felt like a friend encouraging you to exaggerate on your resume, and sometimes, to lie entirely. I didn’t like it.

When I was doing resume rewriting, I made a point to never use text immediately, when I was in the flow of writing, because I’d often look back at a section of the resume and think “I can’t submit that, that’s not quite true.”

That’s annoying, right? A thing you have to keep an eye on, drag it back towards the light, mindful that you need to not split the difference, to always resist the temptation to let it go.

Creepy. Do not like.

In some circumstances it’s wild, though — I tried to get fancy with it and have it ask standard interview questions and then, based on my resume, answer as best it could. I included a “if there’s no relevant experience, skill, or situation in the resume, please say you don’t know” clarification. And it would generally do okay, and then, asked about managing conflicting priorities, it described a high-stakes conflict between the business heads and the technology team where we had to hit a target but we had to do a refactor. ChatGPT entirely made up a whole example situation that followed the STAR (situation, task, action, result) model for answering, with a happy conclusion for everyone involved.

Reminded that that didn’t happen and to pass on questions it didn’t have a good response to, ChatGPT replied “Apologies for the confusion, I misunderstood the instructions…” and then restated the clarification to my satisfaction, and we proceeded. It did the same thing two questions later: totally made up generic example of a situation that could have happened at my seniority level.

If I’d just been pasting in answers to screener questions, I’d have claimed credit for results never achieved, and been the hero in crises that never occurred. And if I’d been asked about them, they’re generic enough someone could have lied their way through it for a while.

No one wants to be caught staring at their interviewer when asked “this situation with the dinosaur attack on your data center is fascinating, can you tell me more about how you quarterbacked your resiliency efforts?”

My advice here — don’t use it in situations like this. Behavioral questions proved particularly prone, but any time there was a goal like “create an answer that will please the question-asker” strange behavior started coming out of the woodwork. It’s eager to please, it wants to get that job so so badly!

Finding jobs and job resources

Every time I tried looking for resources specific to Product Management jobs, the results were garbage: “Try Indeed!” I’d regenerate and get “Try Glassdoor and other sites…” In writing this I went back to try again, and it’s still almost all garbage —

LinkedIn: This platform is not only a networking site but also a rich resource for job listings, including those in product management. You can find jobs by searching for “product management” and then filtering by location, company, and experience level. LinkedIn also allows you to network with other professionals in the field and join product management groups for insights and job postings.

But… regenerating the response, amongst the general-purpose junk, I got it to mention Mind the Product, a conference series with a job board, after it went through the standard list of things you already know about. Progress?

I got similarly useless results when I was looking for jobs in particular fields, like climate change or at B-corps (“go find a list of B-corporations!”). It felt frustratingly like it wasn’t even trying, which — you have to try not to anthropomorphize the tool, it’s not helpful.

It is though another example of how ChatGPT really wants to please: it does not like saying “I don’t know” and would rather say “searching the web will turn up things, have you tried that?”

What I’d recommend

Use the LLM of your choice for:

Interview preparation, generally and for specific jobs
Suggestions for tailoring your resume
Help editing your resume

And keep an eye on it. Again, imagine you’ve been handed the response by someone with a huge grin, wide eyes with massively dilated pupils, an expectant expression, and who is sweating excessively for no discernible reason.

I got a lot out of it. I didn’t spend much time in GPT 3.5, but it seemed good enough for those tasks compared to GPT 4. When I tried some of the other LLM-based tools, they seemed much worse — my search started May 2023, though, so obviously, things have already changed substantially.

And hey, if there are better ways to utilize these tools, let me know.

Where Reddit’s gone wrong: 3rd party apps are invaluable user research and a competitive moat, not parasites


By supporting the ability of anyone to build on top of Reddit’s platform, Reddit created an invaluable user research arm that also provides a long-term competitive advantage by keeping potential competitors and their customers contributing to Reddit. This is an incredibly difficult thing to do, and they seem suddenly blind to why it was worth it.

In a recent Verge interview with the CEO Steve Huffman:

PETERS: I want to stop you for a second there. So you’re saying that Apollo, RIF, Sync, they don’t add value to Reddit?

HUFFMAN: Not as much as they take. No way.

(and I’m going to ignore for the moment questions on how they’ve handled this, monetization, and so on, focusing only on this core value they’ve created and are destroying)

A vast community of people all working on new designs, development innovations, and approaches, responding immediately to user feedback to try new things – compare this to what you have to do internally.

Every company I’ve been at has a limited user research budget to discover their customers and their needs, and just as limited room to get feedback on possible solutions by building prototypes or even showing paper drawings. To entirely focus on new ideas? You might be lucky to get a Hack Day once a quarter.

If you have a thriving third party development community, you have an almost unlimited budget for all of these things, happening immediately, and on a hundred, a thousand different ideas at any one time, and those ideas are beyond what you might be able to brainstorm.

It’s a dream, and once you’ve done the hard work of getting the ecosystem healthy, it does it on its own. Anything you want to think about you’ll find someone has already broken the trail for you to follow, and sometimes they’ve built a whole highway.

You can think small, like “how can we make commenting easier?” There will be a half-dozen different interpretations of what comment threading should look like, and you have the data to see if those changes help people comment more, and if that in turn makes them more engaged in conversation.

And it goes far beyond that, to entirely new visions of how your product might work for entirely new customers.

If you’re sitting around the virtual break room and someone says “what if we leaned into the photo sharing aspect, and made Reddit a totally visual, photo-first experience?” in even the best company you’re going to need to make a case to spend the time on it, then build it, figure out how to get it cleared with the gatekeepers of experimentation…

Or if you have a 3rd party ecosystem as strong as Reddit’s, you can type “multireddit photo browser” or something into a search engine and tada, there you go, a whole set of them, fully functional, taking different approaches, different customer groups. I just did that search and there’s a dozen interesting takes on this.

Every different take on the UX, and every successful third-party application, is a set of customer insights any reasonable company would pay millions for. Having a complete set of APIs publicly available lets other people show you things you might not have dreamed possible (this is also a hidden reason why holding back features or content from your APIs is more harmful than it initially seems).

Successful third party applications give you insight into:

A customer group
What they’re trying to do
By comparison, how you’re failing to give it to them
A minimum number for what they’re willing to pay to solve that problem

Even when these applications don’t discover something that’s useful – say someone builds a tool that’s perfect for 0.1% of the user base, but that tool requires a lot of client-side code, so it’s just not worth it to bring that into the main application. It’s still a huge win, because those users are still on the platform, participating in the core activities that make the system run, building the network effects (and, because you’re a business, making money in total).

And if those developers of these niche apps ever hit gold and start to grow explosively, you’ll see it, and be able to respond, far earlier than you would if they weren’t on your platform.

That’s great!

The biggest barrier for any challenger app isn’t the idea, or even the design and execution, it’s attracting enough users to be viable, and surviving the scale problems if it does start to grow. By supporting a strong third party application ecosystem, you’re ensuring that they never have to solve those problems on their own – their user growth is your user growth. They don’t have to solve the problem of scaling infrastructure because you did. It will always make short-term sense to stay with you.

Instead of building competitors, you’re building collaborators, who will be pulling you to make your own APIs ever-better, who are working with you and contributing to the virtuous cycle at the heart of a successful user-based product like Reddit.

I know, from the outside we just don’t get it. Reddit’s under huge pressure to IPO, and the easy MBA-approved path to a successful IPO is ad revenue, which means getting all those users on the first-party apps, seeing the ads, harvesting their data, all that gross stuff. And we can imagine that the people pushing this path to riches look at all of these third party apps and say “there’s a million people on Apollo, if they were on our app, we’d make $5m more in ad revenue next year.”

This zero-sum short-sighted thinking may not be the doom of Reddit – they may well shut down all the third-party apps and survive the current rebellion of moderators and users (and the long-term effects of their response to it).

It was and could have been such a beautiful partnership, where Reddit thrived by learning from, cooperating with, and improving itself along with its outside partners. As this developer community now looks to rebuild around free and decentralized platforms like Mastodon, it’s easy to see how Reddit’s lost ecosystem might eventually return to topple them.

Hate Life, Will Travel

Occasional musings of Derek Zumsteg
