10 June 2025

159 Project cost and time predictions using the K-nearest neighbor algorithm (k-NN)

We use the k-nearest neighbors (k-NN) algorithm as an additional predictor to estimate the cost and duration of a new project. k-NN is a supervised machine learning method used for both classification and regression. We discussed the algorithm earlier in the context of vector database search (Fig. 8.2-2), where it is used to find the closest vectors (e.g., texts, images, or technical descriptions). In this approach, each project is represented as a point in a multidimensional space, where each dimension corresponds to a specific attribute of the project.

In our case, each project has three attributes, so we represent the projects as points in a three-dimensional space (Fig. 9.3-5). Our upcoming project X is located in this space at the coordinates (x=4, y=4, z=7). In real conditions, both the number of points and the dimensionality of the space may be orders of magnitude larger.

The k-NN algorithm works by measuring the distance between the new project X and each project in the training dataset. By comparing these distances, the algorithm identifies the projects that lie closest to X.
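As a minimal sketch of this distance measurement, the snippet below computes the straight-line (Euclidean) distance between project X and the second project, using the coordinates quoted in this section. The choice of Euclidean distance and the function name `euclidean_distance` are assumptions made for illustration; the source does not state which metric was used for Fig. 9.3-5.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

project_x = (4, 4, 7)       # new project X, coordinates from the text
second_project = (8, 9, 6)  # second project, coordinates from the text

distance = euclidean_distance(project_x, second_project)
print(round(distance, 2))  # sqrt(16 + 25 + 1) = sqrt(42), about 6.48
```

The same function works unchanged for any number of attributes, as long as both points have the same dimension.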

For example, if the second project (x=8, y=9, z=6) from our original dataset is much farther from X (Fig. 9.3-5) than the other projects, it can be excluded from further analysis. As a result, only the two nearest projects (k=2) are used for the calculation, and the prediction is the average of their values.

This neighborhood search lets us assess how similar projects are to one another, which in turn helps us estimate the likely cost and duration of a new project from similar projects completed in the past.

Fig. 9.3-5 In the K-NN algorithm, projects are represented as points in a multidimensional space, and nearby projects are selected based on distances to evaluate similarity and make predictions.

The work of k-NN involves several key steps:

  • Data preparation: the training and test data sets are loaded first. The training data is the set of known examples that the algorithm searches for neighbors, and the test data is used to check its accuracy.
  • Selecting the parameter K: a number K is selected, which indicates how many nearest neighbors (data points) the algorithm should consider. The value of K is important because it directly affects the result.
  • Classification or regression for the test data:
    • Calculating distances: for each element from the test data, the distance to each element from the training data is calculated (Fig. 9.3-5). Different distance measurement methods can be used for this, such as Euclidean distance (the most common method), Manhattan distance or Hamming distance.
    • Sorting and selecting K nearest neighbors: after calculating the distances, they are sorted and K nearest points to the test point are selected.
    • Determining the class or value of a test point: if it is a classification task, the class of the test point is determined based on the most frequent class among K selected neighbors. If it is a regression task, the mean (or other measure of central tendency) of the K neighbors’ values is calculated.
  • Completion of the process: once all test data has been classified or predictions have been made for it, the process is complete.
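The steps above can be sketched in a few lines of plain Python for the regression case. This is an illustrative sketch, not the implementation behind the figures: the function names are hypothetical, and the training data is invented for the example (only the second point and the query point X use coordinates quoted in this section). Target values are in arbitrary units.

```python
import math

def euclidean(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_points, train_values, query, k, distance=euclidean):
    """Average the target values of the k training points closest to query."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Calculating distances: from the query to every training point.
    distances = [(distance(p, query), i) for i, p in enumerate(train_points)]
    # Sorting and selecting the K nearest neighbors.
    nearest = sorted(distances)[:k]
    # Regression: the prediction is the mean of the neighbors' values.
    return sum(train_values[i] for _, i in nearest) / k

# Hypothetical training set (attributes and target values are made up).
train_points = [(3, 4, 6), (8, 9, 6), (5, 3, 8), (1, 2, 3)]
train_values = [100.0, 300.0, 120.0, 40.0]
project_x = (4, 4, 7)

prediction = knn_predict(train_points, train_values, project_x, k=2)
print(prediction)  # mean of the two nearest values: (100 + 120) / 2 = 110.0
```

Passing `distance=manhattan` swaps the metric without touching the rest of the procedure. For a classification task, the final averaging step would be replaced by a majority vote among the K neighbors.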

The k-nearest neighbors (k-NN) algorithm is effective in many practical applications and is a standard tool for machine learning specialists. It is popular because it is simple and its results are easy to interpret, especially in tasks where the relationships between data points are intuitive.

In our example, the k-nearest neighbors algorithm identified the two projects (from our small sample) with the shortest distance to project X (Fig. 9.3-5). It then averaged the price and construction duration of these two projects. From this analysis (Fig. 9.3-6), the algorithm concludes that project X will cost approximately $3,800,000 and take about 250 days to complete.

Fig. 9.3-6 The K-nearest neighbors algorithm determines the cost and schedule of project X by analyzing the two closest projects in the sample.

The k-Nearest Neighbors (k-NN) algorithm is particularly popular in classification and regression tasks, such as recommendation systems, where it is used to suggest products or content based on preferences similar to the interests of a particular user. In addition, k-NN is widely used in medical diagnostics to classify types of diseases based on patient symptoms, in pattern recognition, and in the financial sector to assess the creditworthiness of customers.

Even with limited data, machine learning models can provide useful predictions and significantly enhance the analytical component of construction project management. As historical data is expanded and cleaned up, it is possible to move to more complex models – for example, taking into account the type of construction, location, season of construction start and other factors.

Our simplified problem used three attributes so that the projects could be visualized in three-dimensional space, but real projects typically include hundreds or thousands of attributes (see the dataset from the chapter “Example of CAD (BIM) based big data”), which greatly increases the dimensionality of the space and the complexity of representing projects as vectors (Fig. 9.3-7).

Fig. 9.3-7 In the simplified example, three attributes were used for 3D visualization, while real projects have more.

Applying different algorithms to the same data set for project X, which has 40 apartments, 4 floors, and a complexity level of 7, yielded different predicted values. The linear regression algorithm predicted a completion time of 238 days and a cost of $3,042,338 (Fig. 9.3-4), while the k-NN algorithm predicted 250 days and $3,882,000 (Fig. 9.3-6).
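To put the gap between the two models in perspective, the short calculation below compares the predictions quoted above. The helper `relative_gap` is a hypothetical name, and measuring the gap relative to the mean of the two predictions is one possible convention chosen for this sketch, not something prescribed by the source.

```python
# Predictions for project X as quoted in the text.
linear_regression = {"days": 238, "cost": 3_042_338}
knn = {"days": 250, "cost": 3_882_000}

def relative_gap(a, b):
    """Absolute difference as a share of the mean of the two values."""
    return abs(a - b) / ((a + b) / 2)

for key in ("days", "cost"):
    difference = abs(knn[key] - linear_regression[key])
    gap = relative_gap(linear_regression[key], knn[key])
    print(f"{key}: difference {difference:,} ({gap:.1%})")
```

The two models agree closely on duration (a gap of about 5%) but differ much more on cost (about 24%), which is a reminder that predictions from a very small sample should be treated as rough estimates.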

The accuracy of predictions obtained with machine learning models depends directly on the volume and quality of the input data. The more projects are used in training, and the more completely and accurately their characteristics (attributes) and results (labels) are recorded, the higher the probability of obtaining reliable predictions with small errors.

Data preprocessing techniques play an important role in this process, including:

  • Normalization, which brings features to a common scale;
  • Outlier detection and removal, which prevents distortion of the model;
  • Encoding of categorical attributes, which makes textual data usable by the model;
  • Filling in missing values, which increases the stability of the model.
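The four preprocessing techniques listed above can be sketched with the Python standard library. The function names, the sample values, and the specific choices (mean imputation, min-max scaling, one-hot encoding, and a median-absolute-deviation outlier rule) are all assumptions made for illustration; the source does not prescribe particular methods.

```python
from statistics import mean, median

def fill_missing(values):
    """Replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]

def one_hot(labels):
    """Encode text categories as 0/1 columns, in alphabetical order."""
    categories = sorted(set(labels))
    return [[int(label == c) for c in categories] for label in labels]

def flag_outliers(values, threshold=3.0):
    """Flag values far from the median, measured in median absolute deviations."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(v - center) > threshold * mad for v in values]

# Hypothetical sample data.
apartments = [40, 55, None, 60, 45]
filled = fill_missing(apartments)
scaled = min_max_scale(filled)
encoded = one_hot(["residential", "office", "residential"])
outliers = flag_outliers([100, 110, 95, 105, 900])

print(filled)    # the missing entry becomes the mean of the others
print(scaled)
print(encoded)
print(outliers)  # only the last value is flagged
```

Normalization matters particularly for k-NN: without it, an attribute measured in large units dominates the distance calculation and effectively hides the other attributes.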

In addition, cross-validation methods are used to assess how well the model generalizes to new data, which helps detect overfitting and improves the reliability of the predictions.
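One simple form of cross-validation for small samples is leave-one-out: each known project is predicted from all the others, and the errors are averaged. The sketch below uses this to compare candidate values of K for k-NN regression. The data set is invented for the example, the function names are hypothetical, and leave-one-out with mean absolute error is only one of several possible validation schemes.

```python
import math

def knn_predict(points, values, query, k):
    """Mean target value of the k points nearest to query (Euclidean distance)."""
    ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return sum(values[i] for i in ranked[:k]) / k

def leave_one_out_mae(points, values, k):
    """Predict each project from all the others and average the absolute errors."""
    errors = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predicted = knn_predict(rest_points, rest_values, points[i], k)
        errors.append(abs(predicted - values[i]))
    return sum(errors) / len(errors)

# Hypothetical data set: two attributes per project, target in arbitrary units.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

candidates = [1, 2, 3]
scores = {k: leave_one_out_mae(points, values, k) for k in candidates}
best_k = min(candidates, key=lambda k: scores[k])
print(scores)
print(best_k)  # K with the lowest leave-one-out error on this toy data
```

A value of K that scores well here is less likely to be fitted to the quirks of a single train/test split, which is the practical purpose of cross-validation.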

Chaos is an order to be deciphered (J. Saramago, “Quotable Quote”).

– José Saramago, “The Double”

Even if the chaos of your tasks seems impossible to describe formally, keep in mind that events in the world, and construction processes in particular, follow mathematical regularities. Where strict formulas are not available, these regularities can still be estimated with the help of statistics and historical data.

Both traditional estimates performed by estimating departments and machine learning models inevitably face uncertainty and potential sources of error. However, when sufficient quality data is available, machine learning models can demonstrate comparable and sometimes even higher prediction accuracy than expert estimates.

Machine learning is likely to become a reliable complementary tool for analysis, making it possible to refine calculations, propose alternative scenarios, and identify hidden dependencies between project parameters. Such models will not claim to be universal, but they will soon occupy an important place in calculations and project decision-making processes. Machine learning technologies will not exclude the participation of engineers, estimators and analysts, but, on the contrary, will expand their capabilities by offering an additional point of view based on historical data.

If properly integrated into the business processes of construction companies, machine learning has the potential to become an important element in the management decision support system – not as a replacement for humans, but as an extension of their professional intuition and engineering logic.
