In the era of the data-driven economy, data is becoming the basis for decision-making rather than an obstacle. Instead of constantly adapting information to each new system and its formats, companies are increasingly striving to form a single structured data model that serves as a universal source of truth for all processes. Modern information systems are designed not around formats and interfaces, but around the meaning of data – because the structure may change, but the meaning of information remains the same for much longer.
The key to working effectively with data lies not in its endless conversion and transformation, but in organizing it correctly from the start: creating a universal structure capable of providing transparency, automation and integration at all stages of the project lifecycle.
The traditional approach forces manual adjustments with each new platform implementation: migrating data, changing attribute names, and adjusting formats. These steps do not improve the quality of the data themselves, but only mask problems, creating a vicious cycle of endless transformations. As a result, companies become dependent on specific software solutions, and digital transformation slows down.
In the following chapters, we will first look at how to structure data properly, and then at how to create universal models, minimize platform dependency, and focus on what matters most: data as a strategic resource around which sustainable processes are built.
In construction projects, the vast majority of information exists in unstructured form: technical documents, statements of work, drawings, specifications, schedules, and protocols. Their diversity – both in format and content – complicates integration and automation.
Transforming data from unstructured to structured or semi-structured form is both an art and a science. The process varies depending on the type of input data and the purpose of the analysis, and it often takes up a significant portion of the work of data engineers (Fig. 3.2-5) and analysts, whose goal is to produce a clean, organized data set.

Turning documents, PDF files, images, and texts into a structured format (Fig. 4.1-1) is a step-by-step process consisting of the following stages:
- Data extraction (Extract): In this step, a source document or image containing unstructured data is loaded. This can be, for example, a PDF document, a photo, a drawing, or a schematic.
- Data conversion (Transform): Next, the unstructured data is converted into a structured format. For example, this may involve recognizing and interpreting text in images using optical character recognition (OCR) or other processing methods.
- Loading and saving data (Load): The last step involves saving the processed data in a format such as CSV, XLSX, XML, or JSON for further work; the choice of format depends on the specific requirements and preferences.
This process, known as ETL (Extract, Transform, Load), plays a key role in automated data processing and will be discussed in more detail in the chapter “ETL and Pipeline: Extract, Transform, Load”. Next, we will look at examples of how documents of different formats are transformed into structured data.
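As a rough illustration, the three ETL steps described above can be sketched in Python using only the standard library. The sketch assumes that text has already been pulled out of the source file (for instance by an OCR tool, which is not shown). The sample text, the line pattern, and the field names are hypothetical; real documents usually need far more robust parsing.

```python
import csv
import io
import json
import re

# Hypothetical raw text, standing in for what an OCR tool might return
# from a scanned specification sheet (the OCR step itself is not shown).
RAW_TEXT = """\
Item 1: Concrete C30/37 - 12.5 m3
Item 2: Rebar B500B - 840 kg
Notes: deliver before 08:00
Item 3: Formwork panels - 46 m2
"""

# Assumed line pattern: "Item <n>: <description> - <quantity> <unit>"
LINE_PATTERN = re.compile(
    r"^Item\s+(?P<item>\d+):\s*(?P<description>.+?)\s+-\s+"
    r"(?P<quantity>\d+(?:\.\d+)?)\s*(?P<unit>\S+)$"
)

FIELDS = ["item", "description", "quantity", "unit"]


def extract(source: str) -> list[str]:
    """Extract: split the raw document text into non-empty, trimmed lines."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def transform(lines: list[str]) -> list[dict]:
    """Transform: keep lines matching the assumed pattern and type the fields."""
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # free-text lines such as notes are skipped
        records.append({
            "item": int(match["item"]),
            "description": match["description"],
            "quantity": float(match["quantity"]),
            "unit": match["unit"],
        })
    return records


def load(records: list[dict]) -> tuple[str, str]:
    """Load: serialize the records as CSV and JSON strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue(), json.dumps(records, indent=2)


if __name__ == "__main__":
    rows = transform(extract(RAW_TEXT))
    csv_text, json_text = load(rows)
    print(csv_text)
    print(json_text)
```

In this sketch, lines that do not fit the assumed pattern, such as the note line, are simply dropped; in practice they might instead be logged for manual review. Writing the result to a file rather than a string only requires passing an open file object to `csv.DictWriter` or `json.dump`.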