Organizations want to infuse LLMs into every part of their planning and decision-making process. This translates to thousands, if not millions, of API calls a day. While GPT-4 is great, it is also prohibitively expensive at scale.
Luckily, the open-source LLM community has been extraordinarily prolific and 70B Llama2 models can be easily customized for specific enterprise AI tasks. When done well, these fine-tunes can almost match GPT-4 performance for a small fraction of the cost.
The open-source LLMOps stack has 3 key components and is targeted at DIY developers and folks who want to build custom LLM use-cases within their organization.
Open Source LLM APIs – Much like GPT-4 and GPT-3.5 Turbo, these are simple APIs to OSS LLM models. We host the 70B Llama 2 and Llama 2 32k (Abacus Giraffe) models for you. These models are 60-120x cheaper than GPT-4 and 3-4x cheaper than GPT-3.5 Turbo. The advantage of using our model APIs is that you don’t have to spend time hosting the models yourself – you can simply switch your OpenAI key with the Abacus AI one and get going! The 32K-context instruct-tuned model – Abacus Giraffe – works very well for enterprise AI use-cases and is used by our large enterprise customers. The larger context allows you to pass longer chat histories and data from your custom knowledge base.
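Because the API follows the OpenAI request shape, the swap is mostly a matter of changing the base URL and key. Here is a minimal sketch of how such a drop-in request might be assembled; the endpoint URL and model name below are placeholders for illustration, not documented Abacus AI values – check the docs for the real ones.

```python
import json

# Hypothetical endpoint and model name -- assumptions for illustration,
# not documented values.
ABACUS_BASE_URL = "https://api.example-abacus-host.ai/v1"
MODEL = "llama2-70b-chat"

def build_chat_request(messages, model=MODEL, temperature=0.2):
    """Build an OpenAI-style chat-completion payload.

    Since the API is OpenAI-compatible, the request body is the same
    shape you would send to GPT-4; only the base URL and key change.
    """
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }

payload = build_chat_request(
    [{"role": "user", "content": "Summarize our Q3 sales notes."}]
)
body = json.dumps(payload)  # this is what gets POSTed to the endpoint
```

Existing code written against the OpenAI client should keep working unchanged, since the payload it emits is identical.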
Specifically, we recommend these models for summarization, entity extraction, and creating simple custom chatbots. A large number of common enterprise use-cases – contract terms extraction, PDF to structured data, chat on internal knowledge bases, customer support, and tech support – can be solved this way. Sign up here
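The entity-extraction pattern mentioned above usually boils down to two pieces: a prompt that pins the model to a fixed JSON shape, and a parser that validates the reply. A minimal sketch (the prompt wording and field names are our own illustration, not a prescribed format):

```python
import json

def extraction_prompt(document: str, fields: list) -> str:
    """Ask the model to return only a JSON object with the named fields."""
    return (
        "Extract the following fields from the contract below and respond "
        "with a JSON object containing exactly these keys: "
        f"{', '.join(fields)}.\n\n---\n{document}"
    )

def parse_extraction(raw_reply: str) -> dict:
    """Parse the model's JSON reply; raise ValueError on malformed output."""
    try:
        return json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc

prompt = extraction_prompt(
    "Term: 24 months. Renewal: automatic.", ["term", "renewal"]
)
# Example of what a well-behaved model reply would look like:
reply = '{"term": "24 months", "renewal": "automatic"}'
extracted = parse_extraction(reply)
```

The same skeleton works for PDF-to-structured-data tasks once the PDF text has been extracted upstream.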
Fine-tuned versions of Open-Source models – Oftentimes, you have a specific task and labeled examples for it. Abacus AI offers custom fine-tunes based on your labeled data, plus a few out-of-the-box fine-tunes for domain-specific tasks like complex PDF extraction. Fine-tunes tend to be about 40-50x cheaper than GPT-4 and perform just as well. Fine-tunes are great when you have a few high-quality examples of how you want the output structured. Fine-tuning also helps if the non-fine-tuned model isn’t performant enough. Sign up here
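Labeled data for fine-tuning is commonly shipped as JSONL: one input/output pair per line, where the output shows the exact structure you want back. The field names below follow a generic prompt/completion convention, not the Abacus AI schema – consult their docs for the required format.

```python
import json

# Hypothetical labeled examples for an invoice-total extraction fine-tune.
examples = [
    {"prompt": "Extract the invoice total: 'Total due: $1,240.50'",
     "completion": '{"total": 1240.50, "currency": "USD"}'},
    {"prompt": "Extract the invoice total: 'Amount payable EUR 310'",
     "completion": '{"total": 310.0, "currency": "EUR"}'},
]

# Serialize to JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(e) for e in examples)

# Round-trip check: every line parses back into a labeled example.
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

A few dozen consistent examples like these are often enough to lock a 70B model onto your output format.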
Vector Store API for RAG – The last piece of the puzzle is a scalable vector store API where you can store and retrieve your embeddings. The Abacus AI vector store can hold billions of embeddings, has millisecond latencies, and can deliver 100% recall. By combining our Vector Store APIs with the LLM APIs, you can quickly string together a ChatGPT for your own data. You don’t have to host your own vector store – you can simply use ours and get going for pennies! Sign up here
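The RAG loop itself is simple: embed the query, retrieve the nearest stored chunks, and splice them into the prompt you send to the LLM API. Below is a minimal sketch using a tiny in-memory cosine-similarity store as a stand-in for the hosted vector store; the class and its methods are illustrative, not the Abacus AI client API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class TinyVectorStore:
    """In-memory stand-in for a hosted vector store (illustrative only)."""

    def __init__(self):
        self.items = []  # list of (embedding, text) pairs

    def add(self, embedding, text):
        self.items.append((embedding, text))

    def top_k(self, query_embedding, k=2):
        """Return the k stored texts most similar to the query embedding."""
        ranked = sorted(
            self.items,
            key=lambda item: cosine(item[0], query_embedding),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
# Toy 2-d embeddings; a real pipeline would use an embedding model.
store.add([1.0, 0.0], "Refunds are processed within 5 business days.")
store.add([0.0, 1.0], "Our office is closed on public holidays.")

# Retrieve context for the user's question, then build the LLM prompt.
context = store.top_k([0.9, 0.1], k=1)
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + "\n\nQ: How long do refunds take?"
)
```

Swapping the in-memory class for the hosted vector store changes the storage and retrieval calls, not the shape of the loop.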
These LLMOps APIs are for developers who want a cheap, reliable and scalable alternative to closed models and want to just use stand-alone / individual components.
You can also, optionally, just use our state-of-the-art LLMOps platform, which offers all of this and a lot more, including orchestration, UX customization, code execution, and data transformers. With our standard LLMOps platform, building a custom chatbot takes hours. Several of our enterprise AI customers are now using this platform to put dozens of use-cases into production.
- Open Source LLMs, Fine-Tunes and RAG Based Vector Store APIs - October 19, 2023