Part 8 explores modern data storage and management technologies for the construction industry. It analyzes efficient formats for handling large amounts of information, from simple CSV and XLSX to the more efficient Apache Parquet and ORC, with a detailed comparison of their capabilities and limitations. It discusses the concepts of data warehouses (DWH), Data Lakes, and their hybrid, the Data Lakehouse, as well as the principles of Data Governance and Data Minimalism. The problem of the Data Swamp and strategies for preventing chaos in information systems are covered in detail. New approaches to working with data are presented, including vector databases and their application in construction through the concept of the Bounding Box. This part also touches on the DataOps and VectorOps methodologies as new standards for organizing data workflows.
134 Data atoms: the foundation of effective information management
Everything in the Universe consists of the smallest building blocks – atoms and molecules, and over time all living and non-living things inevitably return to this initial state. In nature, this process occurs with astonishing...
135 Information storage: files or data
Data warehouses allow companies to collect and combine information from different systems, creating a single center for subsequent analytics. Collected historical data enables not only deeper analysis of processes, but also the identification of patterns...
136 Big Data Storage: Analyzing Popular Formats and Their Effectiveness
Storage formats play a key role in the scalability, reliability, and performance of analytics infrastructure. For data analysis and processing – such as filtering, grouping, and aggregation – our examples used Pandas DataFrame – a...
137 Optimize storage with Apache Parquet
One of the popular formats for storing and processing big data is Apache Parquet. This format is designed specifically for columnar storage (similar to Pandas), which allows you to significantly reduce memory footprint and increase...
138 DWH (Data Warehouse): data warehouses
Just as the Parquet format is optimized for efficient storage of large amounts of information, the Data Warehouse is optimized for integrating and structuring data to support analytics, forecasting and management decision making. In today’s...
139 Data Lake – evolution of ETL to ELT: from traditional cleaning to flexible processing
Classic data warehouses (DWH), designed to store structured data in a format optimized for analytical queries, have faced limitations in scalability and in handling unstructured data. In response to these challenges, Data Lakes have emerged,...
140 Data Lakehouse architecture: synergy of warehouses and data lakes
To combine the best features of DWH (structured, manageable, high-performance analytics) and Data Lake (scalability, handling heterogeneous data), the Data Lakehouse approach was developed. This architecture combines the flexibility of data lakes with the...
141 CDE, PMIS, ERP or DWH and Data Lake
Some construction and engineering companies already use the Common Data Environment (CDE) concept defined in ISO 19650. In essence, the CDE performs the same functions as a data warehouse (DWH) in other industries:...
142 Vector Databases and the Bounding Box
Vector databases are a new class of repositories that do not just store data, but allow searching by meaning, comparing objects by semantic proximity, and creating intelligent systems: from recommendations to automatic analysis and context...
143 Data Governance, Data Minimalism and Data Swamp
Understanding and implementing the concepts of Data Governance and Data Minimalism, and preventing a Data Swamp, are key to successfully managing data warehouses and delivering business value (Fig. 8.2-3). According to a study by Gartner (2017), 85%...
144 DataOps and VectorOps: new data standards
While Data Governance is responsible for controlling and organizing data, DataOps helps ensure its accuracy, consistency and smooth flow within the company. This is especially critical for a number of business cases in construction, where...
145 Next steps: from chaotic storage to structured storage
Traditional approaches to data storage often result in disparate “silos of information” where important insights are inaccessible for analysis and decision making. Modern storage concepts, such as Data Warehouse, Data Lake...