Empowers construction companies with process automation, utilizing open source code blocks and solutions borrowed from diverse industries
The reasons why I work more and more with pipelines are simple
CEO & Co-Founder
A few years ago the Data-Driven Construction team showed me Jupyter Notebooks, and I fell in love.
Since then I do simple data manipulations faster with a few lines of code instead of clicking manually in Excel. As always, it was a gradual process: first I automated repetitive tasks, and as my confidence and know-how grew, I did more and more.
Moreover, with the advent of ChatGPT it has become even easier: by describing the desired outcome I get code snippets and only need to adapt them.
It saves me a lot of time. For example, I have a list of new sign-ups and I have my company CRM. Once a month I check whether all the sign-ups are in the CRM. Before, this was a manual comparison of two different lists. Now it's the press of a button, and I get the list of new sign-ups in the right format to import back into the CRM. By investing one hour of programming, I save 12 hours of work per year.
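A comparison like this can be sketched in a few lines of Pandas. The file names and the `email` column below are illustrative assumptions, not the author's actual data:

```python
import pandas as pd

# Hypothetical data standing in for the sign-up list and the CRM export
signups = pd.DataFrame({'email': ['a@example.com', 'b@example.com', 'c@example.com']})
crm = pd.DataFrame({'email': ['a@example.com', 'b@example.com']})

# Keep only sign-ups whose e-mail address is not yet in the CRM
new_contacts = signups[~signups['email'].isin(crm['email'])]

# Export in a format ready to import back into the CRM
new_contacts.to_csv('new_signups_for_crm.csv', index=False)
```

In practice the two inputs would be read with `pd.read_excel` or `pd.read_csv`, but the core of the automation is the single `isin` comparison.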
I have reduced failures dramatically. Because I did not like comparing two lists, I did not do it as regularly as I should have. Moreover, I'm not good at comparing data: every phone call disrupted my work, and I would forget to change a data entry. The consequence was that not everybody who should have received my newsletter got it.
I feel like a god when I program. The more complex work becomes, the less direct control I have over the results. But when programming, I'm in absolute control of the result. I write a few lines of code and the computer executes them exactly as I tell it. When the result is not what I expected, it's usually because of a logical shortcut on my side; the computer does exactly what I tell it.
CHATGPT WITH REVIT AND IFC
📤 The data in Revit and IFC models is created by one specialist, but dozens of other people work with that data, so the first priority is to optimize the flow of data throughout the organization, which LLMs such as ChatGPT can significantly help with.
⚡️ The DataDrivenConstruction converter and LLMs such as ChatGPT pave the way for efficient process automation when working with data from Revit and IFC projects, eliminating the need for tedious manual data entry and analysis.
WHAT DOES THE PROCESS PIPELINE LOOK LIKE?
⚡️ Generate charts from Excel
An example of process automation
Manual process ~5 minutes
Pipeline runtime: ~5 seconds
1. Data opening
# Importing the necessary libraries
# for data manipulation and plotting
import pandas as pd
import matplotlib.pyplot as plt

# Reading the Excel file into a DataFrame (df)
df = pd.read_excel(r'C:\DDC\Construction-Budget.xlsx')

# Show the DataFrame table
df
2. Grouping and visualization
# Grouping by 'Category', summing the 'Amount',
# plotting the results, and adjusting the layout
ax = df.groupby('Category')['Amount'].sum().plot(
    kind='bar', figsize=(10, 5), color='skyblue',
    title='Expenses by Category', ylabel='Amount',
    rot=45, grid=True
).get_figure()

# Specifying the path for saving the figure
# and saving the plot as a PNG file
file_path = r'C:\DDC\expenses_by_category.png'
plt.savefig(file_path)
⚡️ PDF from Revit or IFC project
An example of process automation
Manual process ~20 minutes
Pipeline runtime: ~20 seconds
1. Data opening
import converter as ddc
import pandas as pd

# Standalone conversion to flat formats,
# without opening Revit or using the API or Forge
project_csv = ddc.revit(r'C:\rme_basic_sample.rvt')

# Importing the converted Revit/IFC data
df = pd.read_csv(r'C:\rme_basic_sample.csv')
df
2. Grouping and visualization
# Grouping a Revit or IFC project by parameters
dftable = df.groupby('Type')['Volume'].sum()
dftable

# Displaying the table as a graph
graph = dftable.plot(kind='barh', figsize=(20, 12))

# Save the graph as a PNG file
graphtopng = graph.get_figure()
graphtopng.savefig(r'C:\DDC_samplegraph_type.png', bbox_inches='tight')
# Installing the library for generating PDF documents
!pip install fpdf
from fpdf import FPDF

# Creating a PDF document based on the parameters found
pdf = FPDF()
pdf.add_page()
pdf.set_font('Arial', 'B', 16)
pdf.cell(190, 8, 'Grouping of the entire project by Type parameter', 2, 1, 'C')
pdf.image(r'C:\DDC_samplegraph_type.png', w=180, link='')

# Saving the document in PDF format
pdf.output('Report_DataDrivenConstruction.pdf', 'F')
DataFrames, a core component of many data analysis libraries and the workhorse of pipelines, offer significant advantages for working with data
Efficient Data Management: DataFrames are optimized for handling large datasets, providing faster data manipulation.
Support for Heterogeneous Data: They can store different data types (such as integers, strings, and floats) in different columns, which is ideal for real-world data.
Built-in Operations: DataFrames come equipped with numerous built-in methods for data filtering, sorting, and aggregating, simplifying complex data operations.
Ease of Data Exploration: Their tabular structure makes it easy to explore, analyze, and visualize data, aiding in quick data inspection and analysis.
Compatibility with Data Analysis Tools: They seamlessly integrate with various data analysis and visualization libraries, enhancing productivity in data science tasks.
Every month fresh solutions and news in our social channels
Don't miss the new solutions
Relying solely on manual labor in the era of automation presents significant challenges
Human interventions can lead to varied outcomes, often influenced by fatigue, oversight, or misunderstanding
Manual processes can’t easily scale to handle large volumes of work or complex tasks without proportionally increasing resources or time
Manual tasks, especially repetitive ones, can be significantly slower than automated processes, leading to inefficiencies and delays
Utilizing a pipeline in data processing provides substantial benefits
Streamline data operations for faster and optimized execution
Ensure uniform results across datasets with reproducible outcomes
Structured code flow for easy understanding, debugging, and maintenance
Frequently Asked Questions (FAQ)
Pandas is an open-source data analysis library in Python that offers data structures and operations for manipulating numerical tables and time series
A pipeline in Pandas is a sequence of data processing steps, where each step is applied to the input data in order. It allows for a streamlined and organized execution of multiple operations, optimizing performance and readability
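A minimal sketch of such a pipeline using Pandas' `pipe()` method; the step functions and the sample data are illustrative assumptions:

```python
import pandas as pd

def drop_empty(df):
    # Step 1: remove rows with missing values
    return df.dropna()

def add_total(df):
    # Step 2: derive a new column from existing ones
    return df.assign(Total=df['Amount'] * df['Quantity'])

raw = pd.DataFrame({'Amount': [10.0, None, 5.0], 'Quantity': [2, 3, 4]})

# Each step receives the output of the previous one, in order
result = raw.pipe(drop_empty).pipe(add_total)
```

Because each step is a named function, the sequence reads like a recipe and individual steps can be tested, reordered, or reused in other pipelines.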
Pandas pipelines offer efficiency, consistency, and clarity in data processing. They reduce the risk of errors, provide standardized operations, and make the code easier to read and maintain
Absolutely! Pandas pipelines are designed to be flexible. You can add, modify, or remove specific steps to tailor the pipeline to your unique data processing requirements
Yes, pipelines can optimize the sequence of operations, often reducing overhead and improving execution speed. By eliminating unnecessary intermediate steps, pipelines often lead to faster data processing
Start by identifying the sequence of operations you wish to perform on your data. Then, using Pandas’ built-in methods and functions, you can chain these operations together to form a pipeline. There are also many tutorials and resources available online to guide you through the process
Yes, you can incorporate data retrieval steps from various sources, including databases, web APIs, and other external systems, into your pipeline. Once the data is retrieved, it can be processed seamlessly within the pipeline
Certainly! There are many experts and communities dedicated to Pandas and data processing. You can also reach out to specialized service providers who offer tailored solutions and support for building and maintaining customized pipelines