Today, most companies face a paradox: about 80% of their daily processes still rely on classic structured data – familiar Excel spreadsheets and relational databases (RDBMS) (M. Shacklett, “Structured and unstructured data: Key differences,” 2024). At the same time, however, 80% of the new information entering companies’ digital ecosystems is unstructured or loosely structured (Fig. 3.2-3) (“Structured and unstructured data: What’s the Difference?,” 2024). This includes text, graphics, geometry, images, CAD models, PDF documentation, audio and video recordings, electronic correspondence, and much more.
Moreover, the volume of unstructured data continues to grow rapidly – the annual growth rate is estimated at 55-65% (K. Woolard, “Making sense of the growth of unstructured data,” 2024). Such dynamics create serious difficulties in integrating new information into existing business processes. Ignoring this flow of multiformat data leads to information gaps and reduces the manageability of the company’s entire digital environment.

Ignoring complex unstructured and loosely structured data in automation processes can lead to significant gaps in a company’s information landscape. With information now moving in an uncontrolled, avalanche-like manner, companies need to adopt a hybrid approach to data management that incorporates effective methods for handling all types of data.
The key to effective data management lies in organizing, structuring, and classifying the “Babel” of data types (including unstructured, textual, and geometric formats) into structured or loosely structured form. This process transforms chaotic data sets into organized structures ready for integration into systems, thereby enabling decision-making based on them (Fig. 3.2-4).
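As a minimal illustration of this first sorting step, the Python sketch below assigns incoming files to one of three rough categories by file extension. The extension-to-category mapping and the file names are assumptions made purely for the example; a real pipeline would also inspect content, metadata, and encoding rather than rely on extensions alone.

```python
from pathlib import Path

# Hypothetical mapping of file extensions to data categories (illustrative only).
CATEGORIES = {
    "structured": {".csv", ".xlsx", ".sql", ".parquet"},
    "loosely_structured": {".json", ".xml", ".ifc", ".html"},
    "unstructured": {".pdf", ".docx", ".jpg", ".png", ".mp4"},
}

def classify(path: str) -> str:
    """Return the assumed category of a file based on its extension."""
    ext = Path(path).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "unknown"

# Example: a mixed batch of incoming files is sorted before further processing.
incoming = ["report.pdf", "sensors.csv", "model.ifc", "site_photo.jpg"]
for f in incoming:
    print(f, "->", classify(f))
```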

One of the key obstacles to such unification remains the low level of interoperability between different digital platforms – the “silos” we discussed in the previous chapters.
In its 2004 report, the National Institute of Standards and Technology (NIST, USA) emphasizes that poor data compatibility between different building platforms leads to loss of information and significant additional costs (M. P. Gallaher et al., “Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry,” 2004). In 2002 alone, software interoperability problems cost US capital construction an estimated 15.8 billion dollars, with roughly two-thirds of these losses borne by building owners and operators, especially during operation and maintenance. The study also notes that standardization of data formats can reduce these losses and improve efficiency throughout all phases of the facility life cycle.
According to the 2016 CrowdFlower study (CrowdFlower, “Data Science Report 2016,” 2016), which covered 16,000 data scientists around the world, the main problem remains “dirty” and multiformat data. The study found that the most valuable resource is not the final databases or machine learning models, but the time spent preparing information.
Cleaning, formatting, and organizing take up to 60 percent of an analyst’s and data manager’s time. Nearly one-fifth is spent finding and collecting the right data sets, which are often hidden in closed storage (“silos”) and inaccessible for analysis. Only about 9 percent of the time goes directly to modeling, analytics, making predictions, and testing hypotheses. The rest is spent communicating, visualizing, reporting, and researching supporting information sources.
On average, a data manager’s work is distributed as follows (Fig. 3.2-5):
- Cleaning and organizing data (60%): clean and structured data can significantly reduce an analyst’s working time and speed up task completion.
- Data collection (19%): a major challenge for data science professionals is finding relevant datasets. Company data is often scattered across chaotically organized “silos,” making it difficult to access the needed information.
- Modeling/machine learning (9%): often complicated by a lack of clarity in the customer’s business objectives. The absence of a clear mission statement can negate the potential of even the best model.
- Other tasks (5%): in addition to processing data, analysts have to conduct research, explore data from different angles, communicate results through visualizations and reports, and recommend process and strategy optimizations.

These estimates are supported by other studies as well. According to an Xplenty study published in BizReport in 2015 (BizReport, “Report: Data scientists spend bulk of time cleaning up,” 6 July 2015), between 50% and 90% of business intelligence (BI) professionals’ time is spent preparing data for analysis.
Cleaning, validating, and organizing data form a critical foundation for all downstream data and analytics processes, taking up to 90% of data scientists’ time.
This painstaking work, invisible to the end user, is critical. Errors in raw data inevitably distort analysis results, mislead decision-makers, and can lead to costly management mistakes. That is why data cleaning and standardization processes – from eliminating duplicates and filling in gaps to harmonizing units of measure and aligning data to a common model – are becoming a cornerstone of today’s digital strategy.
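A minimal sketch of what these cleaning steps can look like in practice is shown below in Python with pandas. The table, its column names, and the millimetre-to-metre conversion are assumptions chosen purely for illustration, not a reference to any specific project data.

```python
import pandas as pd

# Hypothetical raw table with a duplicate row, a missing value, and mixed units.
raw = pd.DataFrame({
    "element_id": ["W-01", "W-01", "W-02", "W-03"],
    "length":     [3200.0, 3200.0, None, 4.5],
    "unit":       ["mm", "mm", "mm", "m"],
})

clean = raw.drop_duplicates().copy()                 # eliminate duplicate records

# Harmonize units of measure: convert millimetre values to metres.
is_mm = clean["unit"] == "mm"
clean.loc[is_mm, "length"] = clean.loc[is_mm, "length"] / 1000
clean["unit"] = "m"

# Fill gaps: here the missing length is replaced with the column median.
clean["length"] = clean["length"].fillna(clean["length"].median())

print(clean)
```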
Thus, thorough transformation, cleaning, and standardization of data not only take up the majority of specialists’ time (up to 80% of work with data), but also determine whether the data can be used effectively within modern business processes. However, organizing and cleaning data alone do not exhaust the task of optimally managing a company’s information flows. At the organization and structuring stage, a key decision is the choice of a suitable data model, which directly affects how convenient and efficient it is to work with the information in subsequent processing stages.
Since data and business objectives differ, it is important to understand the characteristics of data models and to be able to select or create the right structure. Depending on the degree of structuring and the way relationships between elements are described, there are three main models: structured, loosely structured, and graph models. Each is suited to different tasks and has its own strengths and weaknesses.
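To make the contrast tangible, the short Python sketch below expresses the same simple fact (a wall located on a storey and made of a material) in each of the three models. The identifiers, field names, and relationship labels are illustrative assumptions, not part of any particular standard or product.

```python
# The same fact – "Wall W-01 is on Level 2 and made of concrete" – in three models.

# 1. Structured (relational): fixed columns, one row per record.
relational_row = ("W-01", "Level 2", "concrete")  # (id, level, material)

# 2. Loosely structured (document/JSON): nested fields, schema may vary per record.
document = {
    "id": "W-01",
    "location": {"level": "Level 2"},
    "properties": {"material": "concrete", "fire_rating": "EI60"},
}

# 3. Graph: nodes and explicitly named relationships (edges) between them.
nodes = {"W-01": "Wall", "Level 2": "Storey", "concrete": "Material"}
edges = [
    ("W-01", "LOCATED_ON", "Level 2"),
    ("W-01", "MADE_OF", "concrete"),
]

print(relational_row, document, edges, sep="\n")
```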