Big day for people who run AI locally. According to benchmarks, this is a big step forward for free, small LLMs.

  • just another devA
    2 months ago

    Yeah, there’s a massive negative circlejerk going on, but it’s mostly parroted arguments. Being able to run a model with this kind of context locally is huge. Can’t wait for the finetunes that will come out of this (*cough* NeverSleep’s *-maid models come to mind).

    • @[email protected]
      2 months ago

      I am looking into finetuning the 12B myself, not so much for RP as for novel-style prose.

      I am thinking literature + a fanfic dump as a dataset?

      • just another devA
        1 month ago

        Ah, that’s a wonderful use case. One of my favourite models has a storytelling LoRA applied to it; maybe that would be useful to you too?

        At any rate, if you end up publishing your model, I’d love to hear about it.
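For readers unfamiliar with the term: a LoRA is usually described as a small low-rank update added to a model's frozen weight matrices, W' = W + (alpha / r) · B·A, with the alpha / r scaling following the original LoRA formulation. The sketch below merges such an update into one toy matrix in plain Python. `matmul` and `merge_lora` are hypothetical helpers for illustration only, and the thread doesn't say whether the storytelling LoRA mentioned above is merged ahead of time or applied at load time.

```python
# Toy illustration of merging a LoRA update into a weight matrix:
#   W' = W + (alpha / r) * (B @ A)
# W is d_out x d_in, A is r x d_in, B is d_out x r (hypothetical helpers).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(w, a, b, alpha):
    """Return W plus the scaled low-rank update B @ A."""
    r = len(a)              # rank = number of rows in A
    scale = alpha / r       # LoRA scaling factor
    delta = matmul(b, a)    # d_out x d_in update of rank <= r
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

if __name__ == "__main__":
    w = [[1, 0], [0, 1]]    # 2x2 "frozen" weights
    a = [[1, 2]]            # rank-1 A (1x2)
    b = [[1], [0]]          # B (2x1)
    print(merge_lora(w, a, b, alpha=2))
```

Real models apply this per targeted layer; the toy example only shows the arithmetic.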

          • just another devA
            1 month ago

            Oof, not on my 12 GB 3060 it doesn’t :/ Even at 48k context and Q4_K quantization, ollama is doing a lot of offloading to the CPU. What kind of hardware are you running it on?

            • @[email protected]
              1 month ago

              A 3090.

              But it should be fine on a 3060, with zero offloading.

              Dump ollama for long context. Grab a 5-6 bpw exl2 quantization and load it with a Q4 or Q6 KV cache, depending on how much context you want. I personally use EXUI, but text-gen-webui and tabbyapi (with some other frontend) will also load them.
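The cache advice comes down to arithmetic: quantized weights take roughly params × bpw / 8 bytes, and the KV cache grows linearly with context length and cache precision. A rough sketch, assuming a hypothetical 12B model with 40 layers, 8 KV heads and a head dimension of 128 (placeholders, not from the thread; read the real values from the model's config). It ignores activation buffers and the small overhead of quantization scales, so treat the results as lower bounds.

```python
# Rough VRAM estimate: quantized weights + KV cache.
# Architecture numbers in __main__ are hypothetical placeholders.

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the quantized weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

def kv_cache_gib(context: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, cache_bits: float) -> float:
    """Memory for the K and V caches across all layers, in GiB."""
    # 2 tensors (K and V) per layer, each context x n_kv_heads x head_dim
    elems = 2 * n_layers * context * n_kv_heads * head_dim
    return elems * cache_bits / 8 / 2**30

if __name__ == "__main__":
    # Assumed model shape (hypothetical)
    N_PARAMS, N_LAYERS, N_KV_HEADS, HEAD_DIM = 12e9, 40, 8, 128
    for bpw in (5.0, 6.0):
        for cache_bits, label in ((16, "FP16"), (6, "Q6"), (4, "Q4")):
            for ctx in (48 * 1024, 128 * 1024):
                total = weight_gib(N_PARAMS, bpw) + kv_cache_gib(
                    ctx, N_LAYERS, N_KV_HEADS, HEAD_DIM, cache_bits)
                print(f"{bpw} bpw, {label} cache, {ctx // 1024}k ctx: ~{total:.1f} GiB")
```

Under these assumed numbers, the cache at 128k context drops from about 20 GiB at FP16 to about 5 GiB at Q4, which is why the cache setting matters as much as the weight quant for long context.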