Short one today. I tried the “give the tool both versions of a document and have it summarize the changes” experiment, and I’d say the results were a total failure.
I had two versions of an API specification: the first from when we began talking to a provider, and the second from months later, after significant but not sweeping changes. I gave both to AI tools.
Both PDFs were a little over 3 megs.
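In hindsight, the first thing to check is how many tokens 3 megs of PDF actually turns into, because my guess is that’s where this falls over. A quick sketch of that check in Python, assuming `pypdf` and `tiktoken` are installed (the file name is a placeholder, not my actual spec):

```python
from pypdf import PdfReader
import tiktoken

# Placeholder file name for one of the spec PDFs.
reader = PdfReader("api-spec-v1.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# cl100k_base is a common tokenizer; exact counts vary by model.
enc = tiktoken.get_encoding("cl100k_base")
print(f"{len(text):,} characters, ~{len(enc.encode(text)):,} tokens")
```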
Here’s what happened:
1. I’m going to give you two versions of an API in PDF form. Can you compare them and summarize the changes between them, particularly new API calls…
2. “Sure! Can do. This will take a bit.”
3. Leave it for ages.
4. Ask what’s up.
5. “Whoops, that was a huge task and it looks like I forgot what I was doing, but I can’t tell you what happened or how to avoid it. Can you give me the documents again?”
6. Did something go wrong?
7. “I can’t tell if there was an error or anything.”
8. Sure, but how long should this take? When should I check back? (Note: I know LLMs are awful with time, so this was probably pointless to ask.)
9. (The LLM chews on this for a second; examining the thread, it’s running searches on “how long do LLMs take to compare PDFs.”)
10. “30-60 minutes! I’ll let you know if I encounter any errors.”
11. Leave it for ages.
12. Go to step four.
Eventually it asks for excerpts or specific areas to focus on, and that doesn’t work either. Then it tries to process just the headers and produces bad lists like:
- API call one
- API call two
- (blank)
- (blank)
- (blank)
There’s more strange output along those lines, and I give up.
I may give this another shot with different PDFs, or with the API docs as plain text, but the short version is that this was a frustrating and unproductive experiment.
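If I do try the plain-text route, the plan is roughly: extract the text from both PDFs locally and hand the model (or just my own eyes) a unified diff instead of two 3-meg binaries. A minimal sketch, again assuming `pypdf` and reasonably clean text extraction (file names are placeholders):

```python
from difflib import unified_diff
from pypdf import PdfReader

def pdf_to_lines(path: str) -> list[str]:
    """Extract a PDF's text as a list of lines (newlines kept for difflib)."""
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return text.splitlines(keepends=True)

# Placeholder file names for the two versions of the spec.
old = pdf_to_lines("api-spec-v1.pdf")
new = pdf_to_lines("api-spec-v2.pdf")

# The unified diff is far smaller than the PDFs and is easy to scan
# for added API calls, with or without a model's help.
for line in unified_diff(old, new, fromfile="v1", tofile="v2"):
    print(line, end="")
```

Even if a model still chokes on it, the diff alone would probably get me most of the way to the “new API calls” answer.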