I’m currently shopping around for something a bit faster than Ollama, partly because I could not get it to use a different context and output length, which seems to be a known and long-ignored issue. Everything I’ve tried so far has been missing one or more critical features, such as:

  • “Hot” model replacement, so loading and unloading models on demand
  • Function calling
  • Support for most models
  • OpenAI API compatibility (to work well with Open WebUI)

I’d appreciate any recommendations!

  • hendrik@palaver.p3x.de
    8 hours ago

    I’m also aware of LocalAI, which offers automatic model swapping and an OpenAI-compatible API.

    But unless I’m mistaken, they all use ggml behind the scenes? So if you want a completely different backend, you might want to look for something built on vLLM or ExLlama or the like.

    • Daughter3546@lemmy.world
      3 hours ago

      I would not recommend LocalAI. Their documentation is somewhat lacking, and it’s an all-in-one utility with many moving parts. Those parts also tend to break quite often.

  • theunknownmuncher@lemmy.world
    14 hours ago

    Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
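
    If you’re going through the HTTP API rather than the interactive prompt, the same two settings can be passed per request in the `options` field of Ollama’s `/api/generate` endpoint. A minimal sketch, assuming Ollama is listening on its default `localhost:11434`; the model name `llama3`, the prompt, and the helper names `build_payload` and `generate` are just placeholders for illustration:

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, num_ctx, num_predict):
    """Build a non-streaming /api/generate body with per-request options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,          # context window size in tokens
            "num_predict": num_predict,  # maximum number of tokens to generate
        },
    }


def generate(payload, url=OLLAMA_URL):
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # "llama3" is a placeholder; use whichever model you have pulled.
    payload = build_payload("llama3", "Why is the sky blue?", num_ctx=8192, num_predict=512)
    print(generate(payload))
```

    Whether the larger context is actually usable still depends on the model supporting it and on having enough memory.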

    • hendrik@palaver.p3x.de
      8 hours ago

      Btw, Ollama is software for running AI models. DeepSeek is just a company, or a model file, or a service. But that’s not what OP is looking for. They want to run a model, and that needs software like Ollama.