Modern tools allow companies to deploy a large language model (LLM) locally in just a few hours. This gives complete control over data and infrastructure, eliminating dependence on external cloud services and minimizing the risk of information leakage. This solution is especially relevant for organizations working with sensitive project documentation or confidential business data.
Depending on the tasks and available resources, different deployment scenarios are possible, from out-of-the-box solutions to more flexible and scalable architectures. One of the simplest tools is Ollama, which lets you run a language model with a single command and no deep technical knowledge. A quick start with Ollama:
- Download the distribution for your operating system (Windows / Linux / macOS) from the official website: ollama.com
- Download and run a model via the command line (the first run automatically pulls the model weights). For example, for the Mistral model:
ollama run mistral
After launch, the model is ready to use: you can send text queries through the terminal or integrate it into other tools. Run the model and execute a query:
ollama run mistral "How do I create an estimate with all the resources needed to install a 100 mm plasterboard partition wall?"
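Beyond the terminal, Ollama also exposes a local HTTP API (by default at http://localhost:11434), which makes it easy to call the model from in-house scripts. A minimal Python sketch, assuming the Mistral model from the example above has already been pulled; the prompt text is purely illustrative:

import requests

# Send one prompt to the local Ollama server and print the complete answer.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "List the resources needed to install a 100 mm plasterboard partition wall.",
        "stream": False,  # request a single JSON reply instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])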
For those who prefer to work in a familiar visual environment, there is LM Studio, a free desktop application with a chat interface reminiscent of ChatGPT:
- Install LM Studio by downloading the installer from the official website: lmstudio.ai
- Select a model through the built-in catalog (e.g. Falcon or GPT-NeoX) and download it
- Work with the model through the intuitive chat interface, with everything running locally on your machine
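LM Studio can also serve the loaded model through a local OpenAI-compatible API (enabled from the app's server view, by default on port 1234). A minimal sketch under that assumption; the model name and prompt below are placeholders:

import requests

# Query LM Studio's local OpenAI-compatible chat endpoint.
response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Placeholder id: the server answers with whichever model is loaded
        "model": "local-model",
        "messages": [
            {"role": "user", "content": "Summarize the main steps of installing a plasterboard partition."}
        ],
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])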

The choice of model depends on the requirements for speed and accuracy and on the available hardware (Fig. 3.3-4). Small models such as Mistral 7B and Baichuan 7B are suitable for lightweight tasks and mobile devices, while powerful models such as DeepSeek-V3 require significant computational resources but provide high performance and support for multiple languages. In the coming years the LLM market will grow rapidly, and we will see more and more lightweight, specialized models. Instead of general-purpose LLMs covering all human content, models trained on narrow domain expertise will emerge: for example, models designed solely for engineering calculations, construction estimates, or CAD-formatted data. Such specialized models will be faster, more accurate, and safer to use, especially in professional environments where high reliability and subject-matter depth are important.
Once the local LLM has been launched, it can be adapted to the company's specific tasks. For this purpose, fine-tuning is used: the model is further trained on internal documents, technical instructions, contract templates, or project documentation.
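A common lightweight way to do this is parameter-efficient fine-tuning with LoRA adapters, which trains a small set of extra weights while the base model stays frozen. A minimal sketch using the Hugging Face transformers, peft, and datasets libraries; the model id and the training file company_docs.txt are illustrative placeholders, and fine-tuning a 7B model still requires a GPU with substantial memory:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small trainable LoRA adapters; the base weights stay frozen.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Internal documents as plain text, one training sample per line (placeholder file).
data = load_dataset("text", data_files="company_docs.txt")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512),
                remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()

The result is a small adapter rather than a full copy of the model's weights, so several task-specific adapters can share one base model at inference time.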