• polyploy@lemmy.dbzer0.com · 111 points · 6 days ago

    God damn this is bleak.

    Mitch says the first signs of a deepening reliance on AI came when the company’s CEO was found to be rewriting parts of their app so that it would be easier for AI models to understand and help with. “Then”, Mitch says, “I had a meeting with the CEO where he told me he noticed I wasn’t using the Chat GPT account the company had given me. I wasn’t really aware the company was tracking that”.

    “Anyway, he told me that I would need to start using Chat GPT to speed up my development process. Furthermore, he said I should start using Claude, another AI tool, to just wholesale create new features for the app. He walked me through setting up the accounts and had me write one with Claude while I was on call with him. I’m still not entirely sure why he did that, but I think it may have been him trying to convince himself that it would work.”

    Mitch describes this increasing reliance on AI to be not just “incredibly boring”, but ultimately pointless. “Sure, it was faster, but it had a completely different development rhythm”, they say. “In terms of software quality, I would say the code created by the AI was worse than code written by a human–though not drastically so–and was difficult to work with since most of it hadn’t been written by the people whose job it was to oversee it”.

    “One thing to note is that just the thought of using AI to generate code was so demotivating that I think it would counteract any of the speed gains that the tool would provide, and on top of that would produce worse code that I didn’t understand. And that’s not even mentioning the ethical concerns of a tool built on plagiarism.”

    • Pennomi@lemmy.world · 52 points · 6 days ago

      Code written by AI is really poorly written. A couple smells I’ve noticed:

      • Instead of fixing error cases, it overly relies on try/catch structures, making critical bugs invisible but still present. Dangerous shit. (A small sketch of this follows the list.)
      • It doesn’t reuse code that already exists in the project. You have to do a lot of extra work letting it know that your helper functions or CSS utility classes exist.
      • It writes things in a very “brute force” way. If it finds any solution, it charges forward with the implementation even if there is a much simpler way. It never thinks “but is there a better way?”
      • Likewise, it rarely uses the actual documentation for your library. It goes entirely off of gut instinct. Half the time, if you paste in a documentation page, it finally shapes up and builds the code right. That should be the default behavior.
      • It has a strong tendency to undo manual changes I have made, because it doesn’t know the reasons why I did them.
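
      Concretely, that first smell often looks something like this. A hypothetical Python sketch (the settings-loading example is made up, not from any real project):

      ```python
      import json

      def load_user_settings(path):
          # What the AI tends to write: any failure vanishes and the caller gets
          # None, so a corrupt settings file looks exactly like "no settings yet".
          try:
              with open(path) as f:
                  return json.load(f)
          except Exception:
              return None

      def load_user_settings_strict(path):
          # Failing fast instead: let FileNotFoundError / json.JSONDecodeError
          # propagate so the bug is visible where it happens, and catch only the
          # specific cases you can actually handle.
          with open(path) as f:
              return json.load(f)
      ```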

      On the other hand, if you’re in a green field project and need to throw up some simple, dirty CSS/HTML for a quick marketing page, sure, let the AI bang it out. Some projects don’t need to be done well, they just need to be done fast.

      And the autocomplete features can be a time saver in some cases regardless.

      • svtdragon@lemmy.world · 26 points · 6 days ago (edited)

        I just spent about a month using Claude 3.7 to write a new feature for a big OSS product. The change ended up being about 6k loc with about 14k of tests added to an existing codebase with an existing test framework for reference.

        For context I’m a principal-level dev with ~15 years experience.

        The key to making it work for me was treating it like a junior dev. That includes priming it (“accuracy is key here; we can’t swallow errors, we need to fail fast where anything could compromise it”) as well as making it explain itself, show architecture diagrams, and reason based on the results.

        After every change there’s always a pass of “okay but you’re violating the layered architecture here; let’s refactor that; now tell me what the difference is between these two functions, and shouldn’t we just make the one call the other instead of duplicating? This class is doing too much, we need to decompose this interface.” I also started a new session, set its context with the code it just wrote, and had it tell me about assumptions the code base was making, and what failure modes existed. That turned out to be pretty helpful too.
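
        A minimal sketch of that priming step done programmatically, assuming the Anthropic Python SDK with an API key in the environment; the model name, prompts, and feature description are only illustrative, not the actual project:

        ```python
        import anthropic

        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

        SYSTEM = (
            "You are a junior developer on this codebase. Accuracy is key: never "
            "swallow errors, fail fast wherever correctness could be compromised, "
            "and explain your reasoning and the architecture before writing code."
        )

        response = client.messages.create(
            model="claude-3-7-sonnet-latest",  # illustrative; use whatever model you have
            max_tokens=4096,
            system=SYSTEM,
            messages=[{"role": "user", "content": "Here is the feature spec and the module it touches: ..."}],
        )
        print(response.content[0].text)
        ```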

        In my own personal experience it was actually kinda fun. I’d say it made me about twice as productive.

        I would not have said this a month ago. Up until this project, I only had stupid experiences with AI (Gemini, GPT).

        • Pennomi@lemmy.world · 15 points · 6 days ago

          Agreed. I use it in my daily workflow but you as the senior developer have to understand what can and cannot be delegated, and how to stop it from doing stupid things.

          For instance when I work in computer vision or other math-heavy code, it’s basically useless.

        • FarceOfWill@infosec.pub · 13 points · 6 days ago

          Typically, working with a junior on a project is slower than not working with them at all. It’s a little odd that you compare this to working with a junior and yet also say it made you faster.

          • Quik@infosec.pub · 12 points · 6 days ago

            I don’t think it’s odd, because LLMs are just way faster than any junior (or senior) dev. So it’s more like working with four junior devs, but with the benefit of having the tasks done sequentially, without the overhead of handing tasks to individual juniors and context-switching to review their changes.

            (Obviously, there are a whole lot of new pitfalls, but there are real benefits in some circumstances.)

          • svtdragon@lemmy.world · 3 points · 6 days ago

            The PR isn’t public yet (it’s in my fork) but even once I submit it upstream I don’t think I’m ready to out my real identity on Lemmy just yet.

      • AA5B@lemmy.world · 5 points · 5 days ago

        It doesn’t reuse code that already exists in the project

        I had a pissing contest with one of the junior guys over this. He didn’t seem to understand why we should use the existing function, and he had learned so little about the code base that he didn’t know where to find it. He’s gone.

        The more interesting flaw in his AI code was that it hallucinated an entirely different mocking tool for unit tests.
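
        For illustration only (none of these names are from our code base), the gap looked roughly like this: the suite’s existing style on one side, the invented tool on the other.

        ```python
        # What the generated test reached for (a mockito-style API the project never used):
        #     from mockito import when
        #     when(payment_gateway).charge(42).thenReturn(True)

        # What the existing suite actually uses: unittest.mock from the standard library.
        from types import SimpleNamespace
        from unittest.mock import patch

        # Stand-ins for the real code under test, just enough to make the test run.
        payment_gateway = SimpleNamespace(charge=lambda order_id: False)

        def process_order(order_id):
            return payment_gateway.charge(order_id)

        def test_charge_succeeds():
            with patch(f"{__name__}.payment_gateway") as gateway:
                gateway.charge.return_value = True
                assert process_order(42)
                gateway.charge.assert_called_once_with(42)
        ```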

      • BeigeAgenda@lemmy.ca · 11 points · 6 days ago

        Sounds about right. I had a positive experience when I told my local LLM to refactor a function and add a single argument.

        I would not dare let it loose on a whole source file, because it changes random things, giving you more code to review.

        In my view, current LLMs do an acceptable job with the following (a small sketch after the list):

        • Adding comments
        • Writing docstrings
        • Writing git commit messages
        • Simple tasks on small pieces of code
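
        A hypothetical sketch of that kind of change, along the lines of the refactor above (the helper and names are made up): one added argument plus a docstring, a diff small enough to review line by line.

        ```python
        import time

        # Before: a small helper the LLM is pointed at.
        def retry(func, attempts):
            last_err = None
            for _ in range(attempts):
                try:
                    return func()
                except Exception as err:
                    last_err = err
            raise last_err

        # After the LLM's pass: one new `delay` argument and a docstring.
        def retry_with_delay(func, attempts, delay=0.0):
            """Call func up to attempts times, sleeping delay seconds between failed tries."""
            last_err = None
            for _ in range(attempts):
                try:
                    return func()
                except Exception as err:
                    last_err = err
                    time.sleep(delay)
            raise last_err
        ```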
      • morrowind@lemmy.ml · 10 points · 6 days ago

        Yeah, likewise. I think it shows that the primary weakness of LLMs right now is not skill or understanding, but context.

        It can’t use functions or docs that it doesn’t know about. Neither can I. RAG systems are supposed to solve this, but in practice they don’t seem to work that great.
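
        The idea RAG is supposed to deliver, as a rough sketch (naive keyword overlap stands in for a real embedding index; every name here is made up): pull the project’s own docs and helpers into the prompt so the model can actually use them.

        ```python
        def retrieve(query: str, documents: dict[str, str], k: int = 3) -> list[str]:
            """Rank project docs/helper snippets by crude keyword overlap with the query."""
            terms = set(query.lower().split())
            ranked = sorted(
                documents.items(),
                key=lambda item: len(terms & set(item[1].lower().split())),
                reverse=True,
            )
            return [text for _, text in ranked[:k]]

        def build_prompt(task: str, documents: dict[str, str]) -> str:
            """Prepend the most relevant snippets so the model sees existing helpers and docs."""
            context = "\n\n".join(retrieve(task, documents))
            return f"Project context:\n{context}\n\nTask:\n{task}"
        ```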