Using Ollama and Open WebUI to experience open-source large models
It has been more than a year since I started working with large models. From the early ChatGPT 3.5/4 to today's domestic Chinese models, open-source large models keep getting more capable, with Llama, Stable Diffusion (SD), and others leading the way. Today I will introduce two tools I have used for a long time: Ollama and Open WebUI.
Ollama
Ollama is an open source tool designed for easy deployment and operation of large language models (LLMs) on a local machine. It bundles model weights, configuration, and runtime into a single package, so pulling, running, and managing a model comes down to a single command.
Project address: https://github.com/ollama/ollama
Main features of ollama:
Local deployment and operation: one of Ollama's main goals is to simplify running large language models locally, including inside Docker containers, so that non-specialist users can easily manage and run these complex models (see the docker run example after this list).
Lightweight and scalable: as a lightweight framework, Ollama keeps a small resource footprint while scaling well, letting users adjust its configuration to the project's size and the available hardware.
API support: Ollama provides a simple API that enables developers to easily create, run, and manage large language model instances. This lowers the technical threshold for interacting with models.
Pre-built model library: Ollama ships with a library of pre-trained large language models that users can pull and apply directly, without training from scratch or hunting down model sources themselves.
Model import and customization: (1) import from GGUF: supports importing existing large language models in the GGUF file format. (2) Import from PyTorch or Safetensors: compatible with these two formats, so models trained in those ecosystems can be brought into Ollama. (3) Custom prompts: users can add or modify a model's prompts to steer it toward a specific type or style of output.
Cross-platform support: Provides installation guides for macOS, Windows (preview version), Linux, and Docker to ensure that users can successfully deploy and use Ollama in multiple operating system environments.
Command line tools and environment variables: command line startup: the Ollama service can be started with the command ollama serve (or its alias ollama start).
Environment variable configuration: for example OLLAMA_HOST, which specifies the host address and port the service binds to; users can change it as needed.
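For instance, to expose the API on all interfaces, a minimal sketch (OLLAMA_HOST is the documented variable; 11434 is Ollama's default port, and the bind address shown is just an example):

```bash
# bind the API server to all interfaces on the default port, then start it
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
```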
In addition, Ollama also provides rich API interfaces, community and documentation support, making it a powerful tool for developers and individual users to run and manage large language models locally.
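As a concrete example of the containerized deployment mentioned above, this is the standard docker run invocation from the Ollama README (CPU-only; with the NVIDIA Container Toolkit installed you can add --gpus=all):

```bash
# run the Ollama server in a container, persisting models in a named volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```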
ollama quick installation
Linux Ollama installation command (the official one-line installer):

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
ollama basic commands
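A few everyday commands first (all standard Ollama CLI subcommands; the model tag is an example):

```bash
ollama pull llama3   # download a model from the library
ollama list          # list models available locally
ollama ps            # show currently loaded models
ollama rm llama3     # remove a local model
```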
Talk to Llama 3 (for the Chinese version, substitute a Chinese fine-tune tag from the model library):

```bash
# pulls the model on first use, then starts an interactive chat; the tag is
# an example — pick a Chinese llama3 fine-tune from https://ollama.com/library
ollama run llama3
```
Using Ollama to adjust a large model's prompts and parameters
Here, based on Alibaba's open-source Qwen (Tongyi Qianwen) model, create a Modelfile. What follows is a minimal sketch: the base tag, parameters, and system prompt are placeholders to adapt to your needs.
```
# Modelfile — the base tag and prompt text are examples
FROM qwen:7b

# sampling parameters (documented Modelfile PARAMETER keys)
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# system prompt that fixes the assistant's persona and style
SYSTEM """You are a concise, helpful Chinese-language assistant."""
```

Then build and run the customized model:

```bash
# create a named model from the Modelfile, then chat with it
ollama create qwen-custom -f Modelfile
ollama run qwen-custom
```
REST API request
Generate response:
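Ollama listens on port 11434 by default; the documented /api/generate endpoint returns a completion (the model name here is just an example):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
```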
Model dialogue:
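The documented /api/chat endpoint handles multi-turn conversation (again, the model name is an example):

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}'
```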
Ollama supports many large models. To explore more of them, head to: https://ollama.com/library
Open-webui
Open WebUI is an extensible, feature-rich, user-friendly self-hosted web UI built with the Svelte front-end framework and designed to run fully offline. It supports various LLM runners, including Ollama- and OpenAI-compatible APIs.
Previously you had to drive Ollama from the command line; with Open WebUI you can do everything from the browser.
open-webui features
Easy installation: installs seamlessly via Docker or Kubernetes (kubectl, kustomize, or helm), with support for the :ollama and :cuda image tags, for a worry-free experience.
Ollama/OpenAI API integration: easily integrate OpenAI-compatible APIs for versatile conversations alongside Ollama models. The OpenAI API URL can be customized to point at LMStudio, GroqCloud, Mistral, OpenRouter, and more.
Pipelines, Open WebUI plugin support: seamlessly integrate custom logic and Python libraries into Open WebUI with the Pipelines plugin framework. Launch a pipeline instance, set the OpenAI URL to the pipeline URL, and explore endless possibilities. Examples include function calling, user rate limiting to control access, usage monitoring with tools such as Langfuse, real-time translation with multi-language support using LibreTranslate, toxic message filtering, and more.
Responsive Design: Enjoy a seamless experience on desktop, laptop, and mobile.
Mobile Progressive Web App (PWA): Enjoy a native app-like experience on your mobile device with our PWA, providing offline access on localhost and a seamless user interface.
Full Markdown and LaTeX Support: Elevate your LLM experience with comprehensive Markdown and LaTeX capabilities to enrich interactions.
Model Builder: easily create Ollama models through the web UI. Create and add custom characters/agents, customize chat elements, and import models via the Open WebUI Community integration.
Native RAG Integration: Dive into the future of chat interactions with groundbreaking Retrieval Augmented Generation (RAG) support. This feature seamlessly integrates document interactions into your chat experience. You can load documents directly into the chat or add files to the document library to easily access them using the # command before querying.
Web Search for RAG: perform web searches using providers like SearXNG, Google PSE, Brave Search, Serpstack, and Serper, and inject the results directly into your chat experience.
Web Browsing: Seamlessly integrate websites into your chat experience using the # command followed by a URL. This feature allows you to incorporate web content directly into your conversations, enhancing the richness and depth of your interactions.
Image Generation Integration: Seamlessly incorporate image generation capabilities using the AUTOMATIC1111 API or options like ComfyUI (local) and OpenAI’s DALL-E (external) to enrich your chat experience with dynamic visual content.
Multi-Model Conversations: effortlessly engage several models at once, leveraging their unique strengths in parallel to get the best responses.
Role-Based Access Control (RBAC): ensure secure access with restricted permissions; only authorized individuals can reach your Ollama instance, and model creation/pulling rights are reserved for administrators.
Multi-language support: experience Open WebUI in your preferred language with internationalization (i18n) support. Join us in expanding the languages we support! We are actively looking for contributors!
Continuous updates: we are committed to improving Open WebUI with regular updates, fixes, and new features.
Deploy open-webui
To pair it with Ollama, here we use docker-compose to deploy Open WebUI quickly:
You can refer to: https://github.com/valiantlynx/ollama-docker
Since I have an NVIDIA Tesla T4 locally, I go straight to the GPU configuration (the NVIDIA driver must be installed in advance; that step is skipped here). The compose file below, docker-compose-ollama-gpu.yaml, is a sketch in the spirit of the repo above:
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            # hand one NVIDIA GPU to the Ollama container
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "8080:8080"
    environment:
      # point the UI at the ollama service on the compose network
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - open-webui:/app/backend/data
    restart: unless-stopped

volumes:
  ollama:
  open-webui:
```
Start the services:

```bash
docker compose -f docker-compose-ollama-gpu.yaml up -d
```
Visit http://xxx.xxx.xxx.xxx:8080/ and register an account to start using it (the first account registered becomes the administrator).
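If the UI cannot see any models, it helps to first confirm the Ollama backend is reachable; the documented /api/tags endpoint lists locally available models:

```bash
curl http://localhost:11434/api/tags
```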
Summary
The combination of Ollama and Open WebUI brings many advantages. First, the intuitive web UI lets even non-specialists easily run large language models locally and chat with them. Second, Open WebUI layers rich functionality on top of Ollama, such as conversation management, document-based RAG, and multi-model chat, so users can get far more out of their local models. Finally, because both Ollama and Open WebUI are open source, users can customize and extend them to fit their own needs.