AI (Artificial Intelligence) – the ability of computer systems to perform tasks that normally require human intelligence, such as pattern recognition, learning, and decision making.
Apache Airflow is an open source workflow orchestration platform for programmatically creating, scheduling, and monitoring workflows and ETL pipelines as DAGs (directed acyclic graphs).
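A minimal sketch of an Airflow DAG in Python (task names and schedule are illustrative; assumes Airflow 2.x with the `schedule` argument):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract data from a source system")

def load():
    print("load data into the warehouse")

# Two tasks linked into a directed acyclic graph: extract must finish before load
with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load
```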
Apache NiFi is a tool for automating data flows between systems, specializing in data routing and transformation.
Apache Parquet is an efficient file format for columnar data storage optimized for use in big data analytics systems. It provides significant compression and fast processing.
API (Application Programming Interface) – a formalized interface that allows one program to interact with another without access to the source code, exchanging data and functionality through standardized requests and responses.
Attribute – a characteristic or property of an object that describes its features (e.g., area, volume, cost, material).
Databases are organized structures for storing, managing and accessing information, used for efficient data retrieval and processing.
BEP (BIM Execution Plan) – A building information modeling implementation plan that defines the goals, methods, and processes for implementing BIM in a project.
Big Data – datasets of significant volume, variety and velocity (rate of update) that require special technologies for processing and analysis.
BI (Business Intelligence) – the processes, technologies and tools used to transform data into meaningful information for decision making.
BIM (Building Information Modeling) – the process of creating and managing digital representations of the physical and functional characteristics of construction projects, including not only 3D models but also information about characteristics, materials, time and cost.
BlackBox/WhiteBox – two approaches to understanding a system: in the first case the internal logic is hidden and only inputs and outputs are visible; in the second case the processing is transparent and available for analysis.
Bounding Box is a geometric construct that describes the boundaries of an object in three-dimensional space through the minimum and maximum coordinates on the X, Y and Z axes, creating a “box” around the object.
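A bounding box can be computed directly from an object's vertex coordinates; a minimal sketch with NumPy (the point array is hypothetical):

```python
import numpy as np

# Hypothetical vertices of an object as (x, y, z) rows
points = np.array([[1.0, 2.0, 0.5],
                   [3.5, 1.0, 2.0],
                   [2.0, 4.0, 1.5]])

bbox_min = points.min(axis=0)   # minimum X, Y, Z
bbox_max = points.max(axis=0)   # maximum X, Y, Z
size = bbox_max - bbox_min      # dimensions of the "box" along each axis
print(bbox_min, bbox_max, size)
```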
BREP (Boundary Representation) is a geometric representation of objects that defines them through the boundaries of surfaces.
CAD (Computer-Aided Design) is a computer-aided design system used to create, edit and analyze accurate drawings and 3D models in architecture, construction, engineering and other industries.
CAFM (Computer-Aided Facility Management) is real estate and infrastructure management software that includes space planning, asset management, maintenance and cost monitoring.
CDE (Common Data Environment) – a centralized digital space for managing, storing and sharing project information and collaborating on it at all stages of the facility life cycle.
Center of Excellence (CoE) – a specialized structure in an organization responsible for a specific area of knowledge: developing standards and best practices, training staff and supporting the introduction of innovations.
CoClass is a modern, third-generation building element classification system.
A conceptual data model is a high-level representation of the main entities and their relationships, without attribute-level detail, used in the initial stages of database design.
CRM (Customer Relationship Management) is a customer relationship management system used to automate sales and service processes.
DAG (Directed Acyclic Graph) is a directed acyclic graph used in data orchestration systems (Airflow, NiFi) to determine task sequences and dependencies.
Dash is a Python framework for creating interactive web-based data visualizations.
Dashboard – an information panel that visually presents key performance indicators and metrics in real time.
The Data-Centric approach is a methodology that prioritizes data over applications or software code, making data the central asset of the organization.
Data Governance – a set of practices, processes and policies that ensure the appropriate and effective use of data in an organization, including access, quality and security controls.
Data Lake is a repository designed to store large amounts of raw data in its original format until it is needed.
Data Lakehouse is an architectural approach that combines the flexibility and scalability of data lakes (Data Lake) with the manageability and performance of data warehouses (DWH).
Data-Driven Construction is a strategic approach in which every stage of the facility lifecycle – from design to operations – is supported by automated, interconnected systems. This approach provides continuous, fact-based learning, reduces uncertainty, and enables companies to achieve sustainable industry leadership.
Data-Driven integrator – a company specializing in combining data from disparate sources and analyzing it to make management decisions.
Data-Driven approach – A methodology where data is viewed as a strategic asset and decisions are made based on objective analysis of information rather than subjective opinions.
Data Minimalism – an approach to reducing data to the most valuable and meaningful, allowing for simplified processing and analysis of information.
Data Swamp – A scattered array of unstructured data that occurs when information is collected and stored in an uncontrolled manner without proper organization.
DataOps is a methodology that applies DevOps principles to data and analytics, focused on improving collaboration, integration and automation of data flows.
Information digitalization is the process of converting all aspects of construction activities into a digital form suitable for analysis, interpretation and automation.
DataFrame – A two-dimensional tabular data structure in the Pandas library, where rows represent individual records or objects and columns represent their characteristics or attributes.
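A minimal sketch of a DataFrame with illustrative construction data (column names are hypothetical):

```python
import pandas as pd

# Rows are individual elements, columns are their attributes
df = pd.DataFrame({
    "element": ["Wall", "Wall", "Slab"],
    "material": ["Concrete", "Brick", "Concrete"],
    "volume_m3": [12.5, 8.0, 30.2],
})

# Filter by attribute value and aggregate: total volume of concrete elements
print(df[df["material"] == "Concrete"]["volume_m3"].sum())
```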
Descriptive Analytics – Analyzing historical data to understand what happened in the past.
Diagnostic Analytics – Analysis of data to determine why something happened.
A Gantt chart is a project planning tool that represents tasks as horizontal bars on a timeline, allowing you to visualize the sequence and duration of work.
DWH (Data Warehouse) is a centralized data warehouse system that aggregates information from multiple sources, structures it and makes it available for analytics and reporting.
ESG (Environmental, Social, Governance) – a set of criteria for assessing the environmental, social and governance impacts of a company or project.
ELT (Extract, Load, Transform) is a process where data is first extracted from sources and loaded into a repository and then transformed for analytical purposes.
ETL (Extract, Transform, Load) is the process of extracting data from various sources, transforming it into the desired format and loading it into the target storage for analysis.
ER-diagram (Entity-Relationship) – a visual diagram showing entities, their attributes and the relationships between them, used in data modeling.
ERP (Enterprise Resource Planning) is a comprehensive modular enterprise resource planning system used to manage and optimize various aspects of the construction process.
Features – In machine learning, independent variables or attributes used as inputs to a model.
Physical data model – a detailed representation of the database structure, including tables, columns, data types, keys and indexes, optimized for a particular DBMS.
FPDF is a Python library for creating PDF documents.
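A minimal sketch of generating a one-page PDF (assumes the fpdf2 package, which keeps the `fpdf` import name; the file name is illustrative):

```python
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=12)
pdf.cell(0, 10, "Project report")   # a single text cell
pdf.output("report.pdf")            # write the document to disk
```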
Geometry Core is a software component that provides basic algorithms for creating, editing and analyzing geometric objects in CAD, BIM and other engineering applications.
HiPPO (Highest Paid Person’s Opinion) is an approach to decision making based on the opinion of the highest paid person in the organization rather than objective data.
IDE (Integrated Development Environment) – integrated development environment, a comprehensive tool for writing, testing and debugging code (e.g. PyCharm, VS Code, Jupyter Notebook).
IDS (Information Delivery Specification) – a specification that defines the data requirements to be delivered at different stages of a project.
IFC (Industry Foundation Classes) is a BIM data exchange format that provides interoperability between different software solutions.
Industry 5.0 is a concept for industrial development that combines the capabilities of digitalization, automation and artificial intelligence with human potential and environmental sustainability.
Data integration is the process of combining data from different sources into a single, coherent system to provide a unified view of information.
Information silos are isolated data storage systems that do not share information with other systems, creating barriers to efficient data utilization.
IoT (Internet of Things) is the concept of connecting physical objects to the internet to collect, process and transmit data.
k-NN (k-Nearest Neighbors) is a machine learning algorithm that classifies objects based on similarity to the nearest neighbors in the training sample.
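A minimal sketch with scikit-learn (assumed to be available; the toy features and labels are hypothetical):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training sample: two numeric features per object and a known class label
X_train = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [7.5, 8.5]]
y_train = ["A", "A", "B", "B"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The new object is assigned the majority class among its 3 nearest neighbours
print(model.predict([[7.8, 8.8]]))
```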
Kaggle is a platform for data analytics and machine learning competitions.
Calculation – determining the cost of construction works or processes per unit of measurement (e.g. 1 m² of plasterboard wall, 1 m³ of concrete).
KPIs (Key Performance Indicators) – quantifiable metrics used to evaluate the success of a company or a specific project.
Labels – In machine learning, the target variables or attributes that the model should predict.
Learning Algorithm – The process of finding the best hypothesis in a model corresponding to a target function using a set of training data.
Linear Regression – A statistical method for modeling the relationship between a dependent variable and one or more independent variables.
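A minimal sketch with scikit-learn (assumed; the area-to-cost data is purely illustrative):

```python
from sklearn.linear_model import LinearRegression

# Independent variable: floor area in m²; dependent variable: construction cost
X = [[50], [80], [120], [200]]
y = [110_000, 170_000, 250_000, 410_000]

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line
print(model.predict([[150]]))          # predicted cost for a 150 m² object
```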
LLM (Large Language Model) – an artificial intelligence model trained on huge data sets to understand and generate text, capable of analyzing context and writing program code.
LOD (Level of Detail/Development) – the level of detail of the model that determines the degree of geometric accuracy and information content.
A logical data model is a detailed description of entities, attributes, keys, and relationships that reflects business information and rules, an intermediate stage between the conceptual and physical models.
Machine Learning – A class of artificial intelligence techniques that allow computer systems to learn and make predictions from data without explicit programming.
MasterFormat is a first-generation classification system used to structure construction specifications by section and discipline.
MEP (Mechanical, Electrical, Plumbing) – Building engineering systems that include mechanical, electrical, and plumbing components.
Mesh is a mesh representation of 3D objects consisting of vertices, edges and faces.
Model – In machine learning, a set of candidate hypotheses from which the learning algorithm selects one that approximates the target function to be predicted.
Data modeling is the process of creating a structured representation of data and their relationships for implementation in information systems, including conceptual, logical and physical levels.
n8n is an open source tool for automating workflows and integrating applications through a low-code approach.
Normalization – in machine learning, the process of bringing different numerical data to a common scale to facilitate their processing and analysis.
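A minimal sketch of min-max normalization, one common way of bringing values to a common scale (the values are illustrative):

```python
import numpy as np

values = np.array([120.0, 450.0, 800.0, 330.0])

# Min-max normalization: rescale every value to the range [0, 1]
normalized = (values - values.min()) / (values.max() - values.min())
print(normalized)
```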
Reverse engineering – the process of studying the design, functioning and manufacturing technology of an object by analyzing its structure, functions and operation. In the context of data, it means extracting information from proprietary formats for use in open systems.
OCR (Optical Character Recognition) is an optical character recognition technology that converts text images (scanned documents, photos) into a machine-readable text format.
OmniClass is a second-generation international classification standard for construction information management.
Ontology – A system of interrelationships of concepts that formalizes a particular field of knowledge.
Open Source – a model for developing and distributing software whose source code is openly available for free use, study and modification.
Open BIM – an approach to building information modeling based on the use of open standards and formats for data exchange between different software solutions.
Open standards are publicly available specifications for accomplishing a specific task that allow different systems to interoperate and exchange data.
Pandas is an open source Python library for data processing and analysis, providing DataFrame and Series data structures for efficient handling of tabular information.
The open data paradigm is an approach to data processing in which information is made freely available for use, reuse and dissemination by anyone.
Parametric method is a construction project estimation method that uses statistical models to estimate cost based on project parameters.
PIMS (Project Information Model) is a digital system designed to organize, store and share all project information.
Pipeline – A sequence of data processing processes, from extraction and transformation to analysis and visualization.
PMIS (Project Information Management System) is a project management system designed for detailed control of tasks at the level of an individual construction project.
Predictive Analytics – A section of analytics that uses statistical methods and machine learning to predict future outcomes based on historical data.
Prescriptive Analytics – A section of analytics that not only predicts future outcomes, but also suggests optimal actions to achieve the desired results.
Proprietary formats are closed data formats controlled by a particular company that limit the ability to share information and increase dependence on specific software.
QTO (Quantity Take-Off) is the process of extracting quantities of elements from the design documents to calculate the quantities of materials required for the project.
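In a data-driven workflow the same extraction can be expressed as a simple aggregation over an element table; a sketch with Pandas (column names are hypothetical):

```python
import pandas as pd

# Hypothetical element list exported from a model
elements = pd.DataFrame({
    "material": ["Concrete", "Concrete", "Brick"],
    "volume_m3": [12.5, 30.2, 8.0],
})

# Quantity take-off: total volume per material
qto = elements.groupby("material")["volume_m3"].sum()
print(qto)
```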
Quality Management System – a formalized set of policies and procedures that ensures that processes and results meet established requirements.
RAG (Retrieval-Augmented Generation) is a method that combines the generative capabilities of language models with the extraction of relevant information from corporate databases, improving the accuracy and relevance of answers.
RDBMS (Relational Database Management System) is a relational database management system that organizes information in the form of interrelated tables.
RegEx (Regular Expressions) is a formalized language for searching and processing strings, allowing you to define patterns for checking text data against certain criteria.
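A minimal sketch with Python's built-in `re` module (the element name and pattern are illustrative):

```python
import re

# Extract a wall thickness in millimetres from an element name
name = "Basic Wall: Concrete 200mm"
match = re.search(r"(\d+)\s*mm", name)
if match:
    print(int(match.group(1)))   # -> 200
```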
Regression is a statistical method of analyzing the relationship between variables.
CO₂ calculations are a method of estimating carbon dioxide emissions associated with the production and use of construction materials and processes.
Resource method – a method of making estimates based on a detailed analysis of all necessary resources (materials, labor, equipment) to perform construction work.
RFID (Radio Frequency Identification) is a technology for automatically identifying objects using radio signals, used for tracking materials, machinery and personnel.
ROI (Return on Investment) – an indicator reflecting the ratio between profit and invested funds, used to assess the effectiveness of investments.
SaaS (Software as a Service) is a model of software as a service where applications are hosted by a provider and made available to users over the Internet.
SCM (Supply Chain Management) – supply chain management, which includes the coordination and optimization of all processes from the purchase of materials to the delivery of finished products.
Data silos are isolated stores of information in an organization that are not integrated with other systems, making it difficult to share data and inefficient.
SQL (Structured Query Language) is a structured query language used to work with relational databases.
SQLite is a lightweight, embeddable, cross-platform DBMS that does not require a separate server and supports basic SQL functions, widely used in mobile applications and embedded systems.
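A minimal sketch of SQL queries against an in-memory SQLite database (table and column names are hypothetical):

```python
import sqlite3

# In-memory database: no separate server process is required
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (name TEXT, volume_m3 REAL)")
conn.executemany("INSERT INTO elements VALUES (?, ?)",
                 [("Wall", 12.5), ("Slab", 30.2)])

# A structured query: select elements above a volume threshold
for row in conn.execute("SELECT name, volume_m3 FROM elements WHERE volume_m3 > 20"):
    print(row)
conn.close()
```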
Structured data – information organized in a specific format with a clear structure, such as in relational databases or tables.
Semi-structured data – information with partial organization and a flexible structure, such as JSON or XML, where different elements may contain different sets of attributes.
An entity is a concrete or abstract object of the real world that can be uniquely identified, described and represented in the form of data.
Supervised Learning – A type of machine learning in which an algorithm is trained on labeled data where the desired outcome is known for each example.
Taxonomy – A hierarchical classification system used to systematically categorize elements based on common features.
Titanic Dataset is a popular educational dataset of Titanic passenger data, widely used for training and testing machine learning models (typically predicting passenger survival).
Training – The process in which a machine learning algorithm analyzes data to identify patterns and form a model.
Transfer learning is a machine learning technique in which a model trained for one task is used as a starting point for another task.
Data Transformation – The process of changing the format, structure, or content of data for later use.
Data requirements are formalized criteria that define the structure, format, completeness and quality of information needed to support business processes.
Uberization of the construction industry is the process of transformation of traditional business models in construction under the influence of digital platforms that provide direct interaction between customers and contractors without intermediaries.
Uniclass is a second and third generation building element classification system widely used in the UK.
USD (Universal Scene Description) is a data format originally developed for computer graphics that has found application in engineering systems due to its simple structure and independence from geometry cores.
Data validation is the process of checking information against established criteria and requirements to ensure accuracy, completeness and consistency of data.
Vector Database – A specialized type of database that stores data as multidimensional vectors for efficient semantic search and object comparison.
Vector representation (embedding) is a method of transforming data into multidimensional numerical vectors that allows machine algorithms to efficiently process and analyze information.
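A minimal sketch of comparing two embeddings with cosine similarity (the vectors are illustrative; real embeddings usually have hundreds of dimensions):

```python
import numpy as np

# Hypothetical embeddings of two texts
vec_a = np.array([0.12, 0.80, 0.35])
vec_b = np.array([0.10, 0.75, 0.40])

# Cosine similarity: values close to 1 indicate semantically similar content
similarity = vec_a @ vec_b / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
print(similarity)
```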
VectorOps is a methodology focused on processing, storing and analyzing multidimensional vector data, especially relevant in areas such as digital twins and semantic search.
Visualization – Graphical representation of data for better perception and analysis of information.
Terms are categorized alphabetically by their English names.