
Open Data, Pandas, DataFrame and ChatGPT

The transition from unmanaged data flow to its effective integration into business processes starts with converting data from closed formats to open formats.

In scientific research, the principle of sharing open data accelerates discovery and facilitates international collaboration among scientists. In medicine, sharing information between institutions leads to more effective diagnosis and treatment. In information technology, open-source applications allow developers around the world to collaboratively improve software.

 

A major benefit of open data is its ability to remove the dependence of application developers on specific platforms to access data.

The choice between open and closed data is an obvious one, as is the preference for structured data in automation, data processing and data warehousing. Structured data is used by default in most systems because it is easy to process and unambiguous to interpret, making it the preferred type for communication and collaboration at the requirements and business-process level.

In the context of the construction industry, open structured data enables smooth and coordinated business processes where teams can focus on optimizing projects rather than struggling with incompatible data formats, platforms and systems.

To transform data into a structured format, a wide range of tools is available; one of the most popular is Pandas, a library for the Python language.

 

Due to its flexibility and wide functionality, Pandas has become an indispensable tool for data scientists, automation and analytics professionals, facilitating the process of turning raw data into valuable information. We will use the Pandas library in conjunction with the ChatGPT tool in practical examples in the following chapters of this book, so let's take a closer look at these tools.

Pandas Python

The Pandas library occupies a special place in the arsenal of tools for working with data, having become one of the most popular and in-demand libraries in this area.

In the world of analytics and structured data management, Pandas stands out for its simplicity, speed and power, providing users with a wide range of tools to effectively analyze and process information.

The Pandas library for the Python programming language allows you not only to perform basic operations such as reading and writing tables, but also to carry out more complex tasks, including merging data, grouping data, and performing complex analytical calculations. Pandas can be compared to a Swiss Army knife for data analysts and data engineers.
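
Below is a minimal sketch of what such operations can look like in Pandas. The file names and column names here are made up for illustration; in practice they would come from your own project data.

```python
import pandas as pd

# Read two hypothetical tables (file and column names are assumptions)
elements = pd.read_excel("elements.xlsx")   # e.g. ElementId, Category, Volume
prices = pd.read_csv("unit_prices.csv")     # e.g. Category, UnitPrice

# Merge the tables on a shared key, similar to a SQL JOIN
merged = elements.merge(prices, on="Category", how="left")

# Add a calculated column and aggregate by category
merged["Cost"] = merged["Volume"] * merged["UnitPrice"]
summary = merged.groupby("Category")[["Volume", "Cost"]].sum()

# Write the result back to an open format
summary.to_csv("cost_summary.csv")
print(summary)
```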

As of January 2024, the number of downloads of the Pandas library is about 4.3 million per day.

The query language in the Pandas library is similar in its functionality to the SQL query language we discussed in the chapter "Relational Databases and SQL Query Language".

Both tools offer powerful data manipulation capabilities, including sampling, filtering, sorting and grouping data. Pandas is often preferred in scientific research, process automation, pipeline creation, and data manipulation in Python, while SQL is the standard in database management and is often used in enterprise environments to work with large amounts of data.
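
To illustrate the similarity, here is a small sketch of how a typical SQL query maps onto Pandas methods; the table and column names are invented for the example.

```python
import pandas as pd

# A small, made-up table of building elements
df = pd.DataFrame({
    "Category": ["Walls", "Walls", "Floors", "Doors"],
    "Level":    ["L1",    "L2",    "L1",     "L1"],
    "Volume":   [12.5,     8.2,    30.1,      0.9],
})

# SQL:  SELECT Category, SUM(Volume) AS TotalVolume
#       FROM elements
#       WHERE Level = 'L1'
#       GROUP BY Category
#       ORDER BY TotalVolume DESC;

# The Pandas equivalent of the query above
result = (
    df[df["Level"] == "L1"]
      .groupby("Category", as_index=False)["Volume"].sum()
      .rename(columns={"Volume": "TotalVolume"})
      .sort_values("TotalVolume", ascending=False)
)
print(result)
```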

Using Pandas, it is possible to work efficiently with large amounts of data, much larger than what Excel can handle. Even when millions of rows are involved, Pandas handles such tables with ease, providing powerful tools for analyzing, visualizing, and gaining valuable insights from the data. In addition, Pandas has strong community support: millions of developers and analysts around the world use it daily, online (Kaggle.com, Google Colab, Microsoft Azure Notebooks, Amazon SageMaker) or offline, providing a large number of out-of-the-box solutions for almost any business need.
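
When a file is too large to load comfortably in one piece, Pandas can also process it in chunks. The following is a minimal sketch under the assumption of a hypothetical large CSV export with Category and Volume columns.

```python
import pandas as pd

# Process a large CSV one million rows at a time instead of loading it at once
# ("big_export.csv" and its column names are assumptions for illustration)
partials = []
for chunk in pd.read_csv("big_export.csv", chunksize=1_000_000):
    partials.append(chunk.groupby("Category")["Volume"].sum())

# Combine the partial sums into a single summary
summary = pd.concat(partials).groupby(level=0).sum()
print(summary)
```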

DataFrame

DataFrame in the Pandas library is the name of a two-dimensional data table with a flexible structure. A DataFrame is organized as a table where each column contains data of the same type (e.g., numbers, strings, dates) and each row represents a separate record.

A DataFrame is a way of organizing data into a table very similar to the one you might see in Excel. In this table, the rows are individual records or entities, and the columns are the various characteristics or attributes of these entities.

For example, if we have a table with information about a construction project, the rows can represent the individual elements of the project, and the columns can represent their attributes: categories, parameters, positions or the BoundingBox coordinates of the elements.
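
As a rough illustration, such a table might look like the following sketch; the element IDs, categories and coordinates are made up for the example.

```python
import pandas as pd

# Hypothetical project elements: rows are elements, columns are their attributes
df = pd.DataFrame({
    "ElementId": [201, 202, 203, 204],
    "Category":  ["Walls", "Walls", "Doors", "Slabs"],
    "Level":     ["L1", "L2", "L1", "L1"],
    "Volume":    [12.4, 9.8, 0.6, 48.0],
    # Simplified BoundingBox coordinates (one corner) of each element
    "BBoxX":     [0.0, 0.0, 3.5, 0.0],
    "BBoxY":     [0.0, 0.0, 0.0, 0.0],
    "BBoxZ":     [0.0, 3.0, 0.0, -0.3],
})

print(df[df["Category"] == "Walls"])        # all wall elements
print(df.groupby("Level")["Volume"].sum())  # total volume per level
```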

Let's list some of the key features and functionality of DataFrame in Pandas:

  • Columns: in a DataFrame, data is organized in columns, each with a unique name. Columns can contain data of different types, similar to columns in database tables or spreadsheets.
  • Rows: rows in a DataFrame can be indexed with unique values known as the DataFrame index. This index makes it possible to quickly access and manipulate the data in specific rows.
  • Index: indexing rows means assigning each row a unique identifier or label, known as the DataFrame index. By default, when a DataFrame is created, Pandas assigns an index from 0 to N-1 to each row (where N is the number of rows in the DataFrame); however, the index can be changed to contain specific labels such as dates or unique identifiers.
  • Data Types: DataFrame supports a variety of data types, including `int`, `float`, `bool`, `datetime64` and `object` for text data. Each DataFrame column has its own data type, which defines what operations can be performed on its contents.
  • Data operations: DataFrame supports a wide range of operations for data processing, including aggregation (`groupby`), merge (`merge` and `join`), concatenation (`concat`), split-apply-combine, and many other methods for manipulating and transforming data.
  • Size Manipulation: DataFrame allows you to add and remove columns and rows, making it a dynamic structure that can be modified according to data analysis needs.
  • Data Visualization: using the built-in plotting methods or interfacing with popular data visualization libraries such as Matplotlib or Seaborn, a DataFrame can easily be turned into graphs and charts to present data graphically.
  • Data input and output: Pandas provides functions to import and export data in various file formats such as CSV, Excel, JSON, HTML and SQL, making DataFrame a central hub for data collection and distribution (see the sketch after this list).
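
The short sketch below touches several of the features listed above: custom indexing, data types, adding and removing columns, and export to open formats. The table contents are invented for illustration.

```python
import pandas as pd

# A small, made-up table of project elements
df = pd.DataFrame({
    "ElementId": [101, 102, 103],
    "Category":  ["Walls", "Columns", "Slabs"],
    "Volume":    [14.2, 0.8, 52.6],
})

# Index: use ElementId as the row index instead of the default 0..N-1
df = df.set_index("ElementId")

# Data types: each column has its own dtype
print(df.dtypes)

# Size manipulation: add a derived column and drop one that is no longer needed
df["IsLarge"] = df["Volume"] > 20
df = df.drop(columns=["Category"])

# Data input and output: export to open formats
df.to_csv("elements.csv")
df.to_json("elements.json", orient="records")
```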

 

These are just the main features and capabilities of DataFrame, but they already make it an indispensable tool for importing, organizing, analyzing, validating, processing, and exporting multi-format and multi-structured data. We will talk more about other formats (Parquet, Apache ORC, JSON, Feather, HDF5) and data warehouses in the chapter "Modern data technologies in the construction industry".

The Pandas library and the DataFrame format, due to their popularity and ease of use, became the primary tools for data processing and automation in the ChatGPT model (in 2023-2024). ChatGPT often defaults to Python and Pandas when handling queries related to data validation, analysis, and processing.

ChatGPT and LLM

ChatGPT and other tools based on large language models (LLMs) greatly simplify data collection, analysis, and automation. These tools allow users to formulate data queries without hiring programmers or having to learn programming languages and various frameworks on their own.

ChatGPT, developed by OpenAI, is an artificial intelligence that processes natural language and uses extensive data from the Internet to answer queries.

In the past, data analysis required knowledge of the Python programming language and specialized libraries such as Pandas, Polars and DuckDB. By 2023, however, the process had become much simpler thanks to ChatGPT's ability to process text queries and provide accurate results without manual coding. This textual communication capability has made code creation easier and data processing more accessible to a wider audience, becoming a significant breakthrough in usability.

Just as, at a certain point, users no longer needed to understand how the internet works in order to use it or even to create online applications and pages (with CMSs such as WordPress, Joomla or Drupal), specialists and engineers in construction companies without deep programming knowledge are now using tools like ChatGPT and LLaMA to automate process logic and take over the functions of individual specialists or entire departments.

 

LLM chats such as ChatGPT and LLaMA allow professionals without deep programming knowledge to contribute to automating and improving a company's business processes.

Once we have familiarized ourselves with the main data types and tools for processing them, we are ready to move on to the first stage of working with data: opening closed formats and converting information from different formats into structured forms.
