ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study

@[email protected] · 1 year ago

ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study

@[email protected] · edit-2 1 year ago

I wouldn’t trust an LLM to produce any kind of programming answer. If you’re skilled enough to know it’s wrong, then you should do it yourself, if you’re not, then you shouldn’t be using it.

I’ve seen plenty of examples of specific, clear, simple prompts that an LLM absolutely butchered by using libraries, functions, classes, and APIs that don’t exist. Likewise with code analysis where it invented bugs that literally did not exist in the actual code.

LLMs don’t have a holistic understanding of anything—they’re your non-programming, but over-confident, friend that’s trying to convey the results of a Google search on low-level memory management in C++.

@[email protected] · edit-2 1 year ago

If you’re skilled enough to know it’s wrong, then you should do it yourself, if you’re not, then you shouldn’t be using it.

Oh I strongly disagree. I’ve been building software for 30 years. I use copilot in vscode and it writes so much of the tedious code and comments for me. Really saves me a lot of time, allowing me to spend more time on the complicated bits.

@[email protected] · edit-2 1 year ago

I’m closing in on 30 years too, started just around '95, and I have yet to see an LLM spit out anything useful that I would actually feel comfortable committing to a project. Usually you end up having to spend as much time—if not more—double-checking and correcting the LLM’s output as you would writing the code yourself. (Full disclosure: I haven’t tried Copilot, so it’s possible that it’s different from Bard/Gemini, ChatGPT and what-have-you, but I’d be surprised if it was that different.)

Here’s a good example of how an LLM doesn’t really understand code in context and thus finds a “bug” that’s literally mitigated in the line before the one where it spots the potential bug: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/ (see “Exhibit B”, which links to: https://hackerone.com/reports/2298307, which is the actual HackerOne report).

LLMs don’t understand code. It’s literally your “helpful”, non-programmer friend—on stereoids—cobbling together bits and pieces from searches on SO, Reddit, DevShed, etc. and hoping the answer will make you impressed with him. Reading the study from TFA (https://dl.acm.org/doi/pdf/10.1145/3613904.3642596, §§5.1-5.2 in particular) only cements this position further for me.

And that’s not even touching upon the other issues (like copyright, licensing, etc.) with LLM-generated code that led to NetBSD simply forbidding it in their commit guidelines: https://mastodon.sdf.org/@netbsd/112446618914747900

Edit: Spelling

@[email protected] · edit-2 1 year ago

I’m very familiar with what LLMs do.

You’re misunderstanding what copilot does. It just completes a line or section of code. It doesn’t answer questions - it just continues a pattern. Sometimes quite intelligently.

Shoot me a message on discord and I’ll do a screenshare for you. #locuester

It has improved my quality and speed significantly. More so than any other feature since intellisense was introduced (which many back then also frowned upon).

@[email protected] · 1 year ago

Fair enough, and thanks for the offer. I found a demo on YouTube. It does indeed look a lot more reasonable than having an LLM actually write the code.

I’m one of the people that don’t use IntelliSense, so it’s probably not for me, but I can definitely see why people find that particular implementation useful. Thanks for catching and correcting my misunderstanding. :)

@[email protected] · edit-2 1 year ago

APIs that don’t exist

I had that. I got a bunch of ok code for an AWS API, but then it decided to hallucinate a method. I tried all kind of prompt to instruct it that the method didn’t exist and not to use it, but it always came back telling me it was the right way to do it.

Anyway, still faster than reading the doc for a one off script I just wanted thrown together quickly and never to be reused again.