110 Emergence of LLM in design CAD data processing processes

10 June 2025

112 Next steps moving from closed formats to open data

10 June 2025

111 Automated analysis of DWG files with LLM and Pandas

Extracting data from DWG files has always been a complex task because the information in them is unstructured: it required specialized software and, often, manual analysis. With the development of artificial intelligence and LLM tools, many stages of this still largely manual process can now be automated. Consider a real Pipeline of requests to an LLM (in this example, ChatGPT) for working with DWG drawings. It allows you to:

Filter DWG data by layer, ID and coordinates
Visualize the geometry of the elements
Automatically annotate drawings based on parameters
Unfold wall polylines onto a horizontal plane
Create interactive 3D visualizations of planar data
Structure and analyze construction data without complex CAD tools

In our case, building the Pipeline starts with sequential code generation through the LLM. First, a query describing the task is written. ChatGPT generates Python code, executes it, and shows the result inside the chat. If the result is not as expected, the query is corrected and the process is repeated.

A Pipeline is a sequence of automated steps performed to process and analyze data. Each step takes data as input, transforms it, and passes the result to the next step.
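As a minimal sketch of this idea (not the author's code), a Pipeline can be modelled as a list of functions applied in order. The step names and the sample layer names below are hypothetical and serve only as an illustration.

```python
from functools import reduce


def run_pipeline(data, steps):
    """Pass data through each step in order; each step's output feeds the next."""
    return reduce(lambda current, step: step(current), steps, data)


# Hypothetical steps operating on a list of layer names
def normalize(layers):
    return [name.strip().lower() for name in layers]


def keep_walls(layers):
    return [name for name in layers if "wall" in name]


def count(layers):
    return len(layers)


result = run_pipeline([" Wall_Ext ", "DOOR", "wall_int"], [normalize, keep_walls, count])
print(result)
```

Because every step has the same shape (data in, data out), steps can be added, removed or reordered without touching the others.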

After obtaining the desired result, the code is copied from the LLM chat and pasted as blocks into any convenient IDE, in our case on the Kaggle.com platform. The resulting code fragments are combined into a single Pipeline, which automates the entire process, from loading the data to its final analysis. This approach allows rapid development and scaling of analytical processes without deep programming expertise. The full code of all the fragments below, along with sample queries, can be found on the Kaggle.com platform by searching for “DWG Analyse with ChatGPT | DataDrivenConstruction” (A. Boiko, “DWG Analyse with ChatGPT | DataDrivenConstruction,” 5 Mar 2024).

Let’s start working with the DWG data, after its conversion to a structured view (Fig. 4.1-13), with a classic step: grouping and filtering the drawing data to keep only the wall elements needed for our task. Specifically, we select the polylines (the ‘ParentID’ parameter groups individual lines into polylines) whose “Layer” parameter (a dataframe column) contains the string “wall”, matched as a regular expression (RegEx).

To get the code for such a task, and the result as a picture, write the following query to the LLM:
First, check if the dataframe obtained from DWG contains the defined columns: ‘Layer’, ‘ID’, ‘ParentID’ and ‘Point’. Then filter out the IDs from the ‘Layer’ column that contain the string ‘wall’. Find the items in the ‘ParentID’ column that match these identifiers. Define a function to clean and split the data in the ‘Point’ column. This includes removing brackets and splitting the values into ‘x’, ‘y’ and ‘z’ coordinates. Plot the data using matplotlib. For each unique ‘ParentID’, draw a separate polyline connecting the ‘Point’ coordinates. Make sure the first and last points are connected if possible. Set the appropriate labels and headers, ensuring that the x and y axes are equally scaled.
The LLM’s answer will be a ready-made picture, behind which is the Python code that generated it:

Fig. 6.4-8 The LLM-generated code extracted all the lines of the “wall” layer from the DWG file, cleaned their coordinates, and constructed the polylines using one of the Python libraries.
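The code behind such a picture might look roughly like the sketch below. It is not the code from the Kaggle notebook: the sample rows, the ID values, the "(x, y, z)" text format of the ‘Point’ column and the millimetre units are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of a DWG file converted to a table:
# polyline rows carry 'Layer', vertex rows point to their polyline via 'ParentID'.
rows = [
    ("P1", None, "A-WALL", None),
    ("P2", None, "A-DOOR", None),
    ("P3", None, "A-WALL-INT", None),
    ("V1", "P1", None, "(0, 0, 0)"),
    ("V2", "P1", None, "(4000, 0, 0)"),
    ("V3", "P1", None, "(4000, 3000, 0)"),
    ("V4", "P1", None, "(0, 3000, 0)"),
    ("V5", "P2", None, "(500, 0, 0)"),
    ("V6", "P2", None, "(1400, 0, 0)"),
    ("V7", "P3", None, "(2000, 0, 0)"),
    ("V8", "P3", None, "(2000, 3000, 0)"),
]
df = pd.DataFrame(rows, columns=["ID", "ParentID", "Layer", "Point"])

# 1. Check that the expected columns exist
missing = {"Layer", "ID", "ParentID", "Point"} - set(df.columns)
if missing:
    raise KeyError(f"Missing columns: {sorted(missing)}")

# 2. IDs of polylines whose layer name contains "wall", then their child rows
wall_ids = df.loc[df["Layer"].str.contains("wall", case=False, na=False), "ID"]
segments = df[df["ParentID"].isin(wall_ids)].copy()


# 3. Clean the 'Point' text and split it into numeric coordinates
def parse_point(text):
    """Turn a string such as '(4000, 0, 0)' into a tuple of three floats."""
    x, y, z = (float(part) for part in text.strip("()[] ").split(","))
    return x, y, z


coords = pd.DataFrame(
    segments["Point"].map(parse_point).tolist(),
    columns=["x", "y", "z"],
    index=segments.index,
)
segments = segments.join(coords)

# 4. Draw one polyline per ParentID, closing it when it has more than two points
fig, ax = plt.subplots()
for parent_id, group in segments.groupby("ParentID"):
    xs, ys = group["x"].tolist(), group["y"].tolist()
    if len(xs) > 2:
        xs.append(xs[0])
        ys.append(ys[0])
    ax.plot(xs, ys, label=str(parent_id))
ax.set_aspect("equal")  # equal scaling of the x and y axes
ax.set_xlabel("x, mm (assumed unit)")
ax.set_ylabel("y, mm (assumed unit)")
ax.set_title("Polylines on layers containing 'wall'")
ax.legend()
fig.savefig("wall_polylines.png")
```

In a notebook, `plt.show()` can be used instead of `fig.savefig(...)` to display the figure inline.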

Now let’s label each polyline with its area, a parameter stored in the polyline’s properties (in one of the dataframe columns):
Now get just one “ParentID” from each polyline – find that ID in the “ID” column, take the “Area” value, divide by 1,000,000 and add that value to the graph

The LLM response will show a new graph in which each polyline is captioned with its area:

Fig. 6.4-9 The LLM has added code that takes the area value of each polyline and adds it to the line visualization.
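The lookup described in the prompt can be sketched as follows. The sample values are hypothetical, and the assumption that ‘Area’ is stored in square millimetres (so that dividing by 1,000,000 gives square metres) is ours, inferred from the divisor in the prompt.

```python
import pandas as pd

# Hypothetical polyline rows; 'Area' is assumed to be in square millimetres
df = pd.DataFrame(
    {
        "ID": ["P1", "P2", "V1", "V2"],
        "ParentID": [None, None, "P1", "P2"],
        "Layer": ["A-WALL", "A-WALL-INT", None, None],
        "Area": [12_000_000.0, 4_500_000.0, None, None],
    }
)

# Keep the polyline rows on wall layers and convert mm² to m²
wall_rows = df[df["Layer"].str.contains("wall", case=False, na=False)]
area_m2 = wall_rows.set_index("ID")["Area"] / 1_000_000

# One caption per polyline, keyed by the ID that its segments use as ParentID
labels = {polyline_id: f"{area:.2f} m²" for polyline_id, area in area_m2.items()}
print(labels)
```

Each caption can then be placed on the plot with Matplotlib’s `ax.annotate(label, (x, y))`, for example at the mean of the polyline’s coordinates.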

Next, we unfold each polyline into a horizontal line, add a parallel line at a height of 3000 mm, and connect the two into a single plane, showing in this way the surface layout of the wall elements:
You need to take all the elements from the “Layer” column with the value “wall”. Take these IDs as a list from the “ID” column and find these IDs from the whole dataframe in the “ParentID” column. All elements are lines that are combined into a single polyline. Each line has a different x, y geometry of the first point in the “Point” column. You must take each polyline in turn and, starting from the point 0,0, draw the length of each segment of the polyline horizontally, so that all segments are laid end to end in one line. Then draw exactly the same lines only 3000 higher, and connect all points into one plane.
Now let’s move from the 2D projection to a 3D model of the walls, built from the flat lines by connecting the upper and lower layers of polylines:
Visualize wall elements in 3D, connecting polylines at heights z = 0 and z = 3000 mm to create a closed geometry representing the walls of the building. Use the Matplotlib 3D plotting tools.

The LLM response will output code that plots the unfolded wall layout in the plane:

Fig. 6.4-10 Using prompts, we turn each polyline into a layout that visualizes the wall planes directly in the LLM chat.
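The unfolding step can be sketched in plain Python as shown below. The function name `unfold_polyline`, its parameters and the sample outline are hypothetical; the 3000 mm height comes from the prompt.

```python
import math


def unfold_polyline(points, height=3000.0, closed=True):
    """Lay the polyline's segments end to end from (0, 0).

    Returns the vertices of the bottom line (y = 0) and of the parallel
    top line (y = height); consecutive pairs form the unfolded wall faces.
    """
    pts = list(points)
    if closed and len(pts) > 2:
        pts.append(pts[0])  # include the closing segment
    stations = [0.0]  # running distance along the unfolded line
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        stations.append(stations[-1] + math.hypot(x1 - x0, y1 - y0))
    bottom = [(s, 0.0) for s in stations]
    top = [(s, height) for s in stations]
    return bottom, top


# Hypothetical 4000 x 3000 mm rectangular wall outline
bottom, top = unfold_polyline([(0, 0), (4000, 0), (4000, 3000), (0, 3000)])
print(bottom)
print(top)
```

Plotting `bottom`, `top` and the vertical lines between matching vertices gives the flat layout of the wall surfaces; the last station equals the polyline's perimeter.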

The LLM will generate an interactive 3D graph in which each polyline is represented as a set of planes. After copying the code from the chat to the IDE, the user can move freely between elements with the mouse and explore the model in 3D:

Fig. 6.4-11 The LLM helped build code [129] that turns flat drawing lines into a 3D view, which can be explored in the 3D viewer inside the IDE.
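A 3D view of this kind can be sketched with Matplotlib’s `mplot3d` toolkit, as the prompt requests. The helper `wall_faces` and the sample outline are hypothetical and are not taken from the author’s notebook.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection


def wall_faces(points, height=3000.0, closed=True):
    """Build one vertical quad per polyline segment, from z = 0 up to z = height."""
    pts = list(points)
    if closed and len(pts) > 2:
        pts.append(pts[0])  # include the closing segment
    return [
        [(x0, y0, 0.0), (x1, y1, 0.0), (x1, y1, height), (x0, y0, height)]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    ]


# Hypothetical 4000 x 3000 mm rectangular wall outline
outline = [(0, 0), (4000, 0), (4000, 3000), (0, 3000)]
faces = wall_faces(outline)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.add_collection3d(Poly3DCollection(faces, alpha=0.4, edgecolor="black"))
# Collections do not rescale the axes automatically, so set the limits explicitly
ax.set_xlim(0, 4000)
ax.set_ylim(0, 4000)
ax.set_zlim(0, 4000)
ax.set_xlabel("x, mm")
ax.set_ylabel("y, mm")
ax.set_zlabel("z, mm")
fig.savefig("walls_3d.png")
```

When the script is run in an IDE with an interactive Matplotlib backend, replacing `fig.savefig(...)` with `plt.show()` opens a window in which the model can be rotated with the mouse.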

To build a logical and reproducible Pipeline, from the initial conversion and loading of the DWG file to the final result, it is recommended to copy each LLM-generated block of code to the IDE after each step. In this way, you not only check the result in the chat but also run it in your development environment immediately. This allows you to build the process step by step, debugging and adapting it as needed.

You can find the complete Pipeline code of all fragments (Fig. 6.4-8 to Fig. 6.4-11), along with sample queries, on the Kaggle.com platform by searching for “DWG Analyse with ChatGPT | DataDrivenConstruction” (A. Boiko, “DWG Analyse with ChatGPT | DataDrivenConstruction,” 5 Mar 2024). On Kaggle you can not only view the code and the prompts used, but also copy and test the entire Pipeline with the original DWG dataframes in the cloud, for free, without installing any additional software or the IDE itself.

The approach presented in this chapter allows you to fully automate the checking, processing and generation of documents based on DWG projects. The resulting Pipeline is suitable both for processing individual drawings and for batch processing of dozens, hundreds or thousands of DWG files, with automatic generation of the necessary reports and visualizations for each project.

The process can be organized sequentially and transparently: first the data from the CAD file is automatically converted into XLSX format, then loaded into a dataframe, followed by grouping, checking and generation of results. All of this is implemented in a single Jupyter notebook or Python script, in any popular IDE. If necessary, the process can easily be extended through integration with project documentation management systems: CAD files can be retrieved automatically according to specified criteria, results can be returned to the storage system, and users can be notified by email or messenger when the results are ready.
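A batch version of these steps might look like the sketch below. It assumes the DWG files have already been converted to XLSX by a separate tool, and that each workbook has ‘Layer’ and ‘Area’ columns with areas in square millimetres; the function names, the folder name and the report fields are hypothetical.

```python
from pathlib import Path

import pandas as pd


def summarize_drawing(df, layer_pattern="wall"):
    """Group and check one converted drawing; return a small report row."""
    is_wall = df["Layer"].str.contains(layer_pattern, case=False, na=False)
    return {
        "rows": len(df),
        "wall_polylines": int(is_wall.sum()),
        # 'Area' is assumed to be in mm², so divide by 1,000,000 to get m²
        "wall_area_m2": float(df.loc[is_wall, "Area"].sum()) / 1_000_000,
    }


def run_batch(folder, loader=pd.read_excel):
    """Load every converted XLSX file in a folder and collect one row per file."""
    report = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        report.append({"file": path.name, **summarize_drawing(loader(path))})
    return pd.DataFrame(report)


# Hypothetical content of one converted drawing
sample = pd.DataFrame(
    {
        "Layer": ["A-WALL", "A-DOOR", "A-WALL-INT", None],
        "Area": [12_000_000.0, 2_000_000.0, 4_500_000.0, None],
    }
)
summary = summarize_drawing(sample)
print(summary)
```

Calling `run_batch("converted_drawings")` on a folder of converted files would return one report row per drawing; reading XLSX with `pd.read_excel` requires an Excel engine such as openpyxl to be installed.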

Using LLM chats and agents to work with design data reduces dependence on specialized CAD programs. It allows you to analyze and visualize architectural designs without manual interaction with the interface: no mouse clicks and no need to remember complex menu navigation.

With each passing day, the construction industry will hear more and more about LLMs, granular structured data, DataFrames and columnar databases. Unified two-dimensional DataFrames, formed from various databases and CAD formats, will be the ideal fuel for modern analytical tools that specialists in other industries already use actively.

The automation process itself will be significantly simplified. Instead of studying the APIs of closed niche products and writing complex scripts to analyze or transform parameters, it will be enough to formulate the task as a set of individual text commands. These commands are then assembled into the required Pipeline or workflow in the chosen programming language, which runs for free on almost any device. There will be no more waiting for new products, formats, plug-ins or updates from CAD (BIM) tool vendors. Engineers and builders will be able to work with data independently, using simple, free and easy-to-understand tools, assisted by LLM chats and agents.

