Creating a database using ChatGPT
24 February 2024
Data validation and validation results
24 February 2024

Structured requirements and RegEx

As the number of systems in companies keeps growing, the managers responsible for these systems move between departments. Unable to keep up with the growing volume of information, they ask specialists to create data in a form that other systems and applications can use.

Engineers and specialists who create data often do not know and do not understand where the data they create will be used later.

Therefore, the main challenge for managers of companies in the construction industry is to make the process transparent and understandable for all participants.

To develop precise data requirements for a construction project, it's essential to start with a thorough understanding of the processes and applications involved. Construction projects vary greatly in complexity and often involve dozens to thousands of applications and systems, each requiring accurate, up-to-date information and continuous monitoring.

To write effective data requirements for existing tables and databases, one must first understand the specifics of the processes and applications for which the data is being collected.

In the following example, we will look at a scenario in which technicians maintaining various systems, tables, and databases in a company are populating a project with data to fulfil a client's request to add a new window to the current project.

Analyzing and gathering process requirements starts with identifying all stakeholders. Let's imagine a company has a project where the client makes a new request: "add an additional window on the north side of the building".

The small process of "client request to add a new window to the current project" involves the architect, client, CAD (BIM) specialist, construction manager, logistics manager, ERP analyst, quality control engineer, safety engineer, control manager and real estate manager.

Even a small process can involve dozens to hundreds of different specialists. Each process participant needs to understand the requirements of the specialists to whom they are linked at the data level.

At the textual level, the communication between the client and the specialists is as follows:

Client: "We have decided to add an additional window on the north side for better lighting. Can this be done?"

Architect: "Of course, I will revise the design to include the new window and send updated CAD (BIM) plans."

CAD (BIM) specialist: "Received the new design. I am updating the CAD (BIM) model with the additional window and, after coordinating with the structural engineer, will provide the exact location and dimensions of the new window."

Construction Manager: "New design received. We will adjust the 4D installation schedule and inform all relevant organisations."

Facilities Engineer (CAFM): "I will enter 6D data on the new window into the CAFM system for future facility management and maintenance planning."

Logistics Manager: "I need the dimensions and weight of the new window to organize the delivery of the window to the site."

ERP Analyst: "Updating the 5D budget in our ERP system to reflect the cost of the new window in the overall project estimate."

Quality Control Engineer: "Once the window specifications are ready, I will make sure they meet our quality and material standards."

Safety Engineer: "I will assess the safety aspects of the new window, focusing on compliance and 8D evacuation."

Controls Manager: "Let's update our 4D timeline to reflect the new window installation and save the new data into the project's content management system."

Worker (installer): "I need instructions on installation, assembly and lead times. Also, are there any special safety protocols I need to follow?"

Property Manager: "Once installed, I will document warranty and maintenance information for long-term management."

Asset Manager: "Equipment Engineer, please send the final data for asset tracking and lifecycle management."

Client: "Wait, maybe I was too hasty and the window won't be needed. Perhaps it would be better to add a balcony."

In this scenario, communication between different professionals, including the client and the architect, occurs predominantly through textual data such as emails, dialogs, calls and meetings.

In such a text-based communication system for a construction project, a system for legally confirming and recording every data exchange transaction and every decision is essential. It ensures that each decision, instruction and change is legally valid and traceable, which reduces the risk of future misunderstandings.

The lack of legal control and confirmation of decisions in the relevant systems of a construction project can lead to serious problems for all involved. Every decision, instruction or change made without proper documentation and confirmation can lead to disputes and legal proceedings.

The lack of confirmation for text-based transactions not only delays the project but can also lead to additional financial losses and strained relationships between project participants.

Legally recording all decisions made in text communication is possible only through a large number of signed documents. This burden falls on management, which is obliged to record every transaction. Text-based communication also requires each specialist either to read the full correspondence or to attend all meetings regularly in order to understand the current status of the project.

In order to move from textual communication and textual records of "decision-making operations" to the system level, methods for converting textual operations into a more structured and usable format are needed. As in data modelling (Figure 2.5-2), we moved from the contextual-idea level to the conceptual level, adding the systems and tools used by participants and the links between them.

The first step in systematising requirements and relationships is to visualise all links and relationships with flowcharts. The conceptual level makes it easier for all process participants to understand the entire process chain. It also shows why and for whom data is needed at each process step.

Block diagrams of processes and the effectiveness of conceptual frameworks

During the rapid growth of digitised documents, specialists only captured and stored data in tables and databases. In that period, requirements for the data, the collection processes and the databases or tables themselves were rarely formalised.

To bridge the gap between traditional data management practices and today's digital requirements, it is necessary to recognize the evolution of data handling from simple storage to sophisticated analysis and automation.

 

This transformation requires a shift to a structured and conceptualized approach to data management. By focusing on creating conceptual frameworks and visual representations such as flowcharts, organisations can better understand both the features and intricacies of their own processes.

If processes need to analyse or automate data, and not just store it, it is necessary to start building a conceptual, visual level of requirements.

As we move toward more complex data ecosystems, implementing conceptual and visual tools becomes critical to ensure that data processes are not only efficient, but also aligned with the organization's strategic goals. Data consistency starts with a thorough understanding of use cases, which lays the foundation for defining the minimum requirements for data collection and analysis.

Adopting minimalism in data practices naturally leads to improved data governance. With a clear framework based on minimum requirements, organisations can implement effective data governance strategies that ensure the quality, security, and compliance of data throughout its lifecycle.

In our example, each specialist can be part of not only a small team, but also a larger department including up to a dozen experts under the control of a general manager. Each department uses a specialized application database (e.g. ERP, CAD, MEP, CDE, ECM, CPM) that is regularly updated with incoming information needed to create documents, record the legal status of decisions and manage processes.

This is similar to the work of managers 4,000 years ago, when clay tablets and papyrus were used to legally confirm decisions. The difference is that modern systems add one more step to what their clay and paper predecessors did. They convert textual information into digital form for further processing in other systems and tools.

Creating a visualization of the process in the form of conceptual flowcharts will help to describe each step and the interaction between different roles, making a complex workflow clearer and simpler.

Consider the dialogue between the participants when the client first requests a new window. In the flowchart, specialists become users of systems or databases. Their messages and requests are replaced by directed arrows that show the interaction between the different systems.
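A flowchart of this kind is essentially a directed graph. The following is a minimal sketch of one, with participants as nodes and arrows as edges. The specific edges and payload labels are hypothetical and were read loosely off the window dialogue above. They illustrate the structure, not a prescribed process. The `downstream` helper answers a practical question: whom is affected if a given participant changes something?

```python
from collections import defaultdict

# Hypothetical arrows of the flowchart: (sender, receiver, what flows along the arrow).
# The routing below is an illustrative assumption based on the dialogue.
FLOWS = [
    ("Client", "Architect", "change request"),
    ("Architect", "CAD (BIM) specialist", "revised design"),
    ("CAD (BIM) specialist", "Construction Manager", "window position and dimensions"),
    ("CAD (BIM) specialist", "Logistics Manager", "window dimensions and weight"),
    ("CAD (BIM) specialist", "ERP Analyst", "window specification for 5D cost"),
    ("CAD (BIM) specialist", "Quality Control Engineer", "window specification"),
    ("CAD (BIM) specialist", "Safety Engineer", "window specification"),
    ("CAD (BIM) specialist", "Facilities Engineer (CAFM)", "6D window data"),
    ("Construction Manager", "Worker (installer)", "installation instructions"),
]


def build_graph(flows):
    """Adjacency list: sender -> list of (receiver, payload)."""
    graph = defaultdict(list)
    for sender, receiver, payload in flows:
        graph[sender].append((receiver, payload))
    return dict(graph)


def downstream(graph, start):
    """All participants reachable from `start` by following the arrows (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for receiver, _payload in graph.get(node, []):
            if receiver not in seen:
                seen.add(receiver)
                stack.append(receiver)
    return seen


graph = build_graph(FLOWS)
print(sorted(downstream(graph, "Client")))
```

When the client changes their mind (for example, a balcony instead of a window), `downstream(graph, "Client")` lists everyone whose data must be revisited.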

Process visualization ensures that the logic of the entire process is transparent and accessible to all team members.

Unfortunately, when specialists and managers present processes as flowcharts with standard documents attached, they often stop at the "conceptual level" of process visualisation. They assume that project participants will understand their functions from the flowchart and will handle requirements and data quality checks on their own.

At a conceptual level, it is difficult for specialists to understand the requirements of other systems and applications in the company used in different departments. This difficulty arises because process participants often do not understand what their own data requirements are and cannot determine from conceptual flowcharts what data requirements are needed by their colleagues with whom they interact in the overall process.

As a result, describing a process in detail at the conceptual level does not necessarily make it more efficient. Often, visualisation simply makes managers' work easier. With a step-by-step reporting system in place, they can request information from colleagues more conveniently and track the process manually using flowcharts.

To fully translate this process to the data level, we need to go a level deeper and translate the conceptual visualisation of the process to the logical and physical level of data, required attributes and their boundary values.

Structured requirements and RegEx (regular expressions)

Most data in companies (around 80%) is created in unstructured formats, which slows its flow between systems or often stops it altogether. That is why we convert textual, unstructured and semi-structured data into a structured form to make data processing more efficient.

Specialists often do not know how to convert multi-format data into structured formats. In the same way, they often do not know how to structure their requirements and wishes, so these stay as text throughout the process.

We have already converted data from unstructured text into structured form. In the same way, our requirements process will convert textual requirements into a structured format at the "logical and physical level".

In tabular form, we will describe the requirements for each system in the form of attributes and their boundary values.

In our example, let's take a closer look at the needs of a quality control engineer who uses a construction quality management system (CQMS) to ensure that the standards and requirements of a building product, in this case window systems, are met.

As an example, consider some important requirements for attributes of window system type entities in CQMS: energy efficiency, acoustic performance and warranty period. Each category includes certain standards and specifications that must be considered when designing and installing window systems.

The data requirements that the QA engineer sets up in the form of a table have the following boundary values:

  • The window energy efficiency class attribute ranges from "A++", denoting the highest efficiency, to "B", considered the minimum acceptable level. These classes are represented by the list ["A++", "A+", "A", "B"].
  • The acoustic insulation attribute is measured in decibels and shows a window's ability to reduce street noise. It is defined by the regular expression \d{2}dB.
  • The warranty period attribute for the window type entity starts at five years, which is the minimum allowed when selecting a product. The permitted warranty period values are also specified, e.g. "5 years", "10 years" and so on.

Under these requirements, energy efficiency classes below "B", such as "C" or "D", will fail the check when new data arrives. Acoustic insulation values in data or documents that reach the Quality Control Engineer must be written as a two-digit number followed by the suffix "dB", such as "35dB" or "40dB". Values outside this format, such as "9dB" or "100dB", are not acceptable. The warranty period must be at least "5 years". Shorter periods such as "3 years" or "4 years" do not meet the requirements that the quality engineer described in the table.
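The three boundary values above can be turned into executable checks. The following is a minimal sketch, assuming each incoming window record arrives as a Python dict. The attribute keys (`energy_class`, `acoustic_insulation`, `warranty_period`) and the `"N years"` text format are hypothetical choices made for illustration. A missing attribute is treated as a failed check.

```python
import re

# Boundary values from the quality control engineer's requirements table.
ENERGY_CLASSES = ["A++", "A+", "A", "B"]        # allowed classes, best to worst
ACOUSTIC_PATTERN = re.compile(r"\d{2}dB")        # two digits followed by "dB"
WARRANTY_PATTERN = re.compile(r"(\d+) years?")   # assumed text format, e.g. "5 years"
MIN_WARRANTY_YEARS = 5


def check_energy_class(value: str) -> bool:
    # Only values from the allowed list pass; "C" or "D" are rejected.
    return value in ENERGY_CLASSES


def check_acoustic(value: str) -> bool:
    # fullmatch: the whole value must fit the pattern, so "100dB" is rejected.
    return ACOUSTIC_PATTERN.fullmatch(value) is not None


def check_warranty(value: str) -> bool:
    # Parse the number of years and compare it with the minimum.
    match = WARRANTY_PATTERN.fullmatch(value)
    return match is not None and int(match.group(1)) >= MIN_WARRANTY_YEARS


# Hypothetical attribute keys mapped to their checks.
CHECKS = {
    "energy_class": check_energy_class,
    "acoustic_insulation": check_acoustic,
    "warranty_period": check_warranty,
}


def validate(record: dict) -> dict:
    """Return {attribute: passed?}; a missing attribute counts as failed."""
    return {name: name in record and check(record[name]) for name, check in CHECKS.items()}


print(validate({"energy_class": "A+", "acoustic_insulation": "35dB", "warranty_period": "10 years"}))
print(validate({"energy_class": "C", "acoustic_insulation": "100dB", "warranty_period": "3 years"}))
```

The first record passes all three checks and the second fails all three, mirroring the examples in the text. Keeping the rules in one `CHECKS` mapping means the requirements table and the validator stay in the same place.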

 

During validation, we check values against the boundary values from the requirements. For this we use regular expressions (as with the acoustic insulation of windows), which check data consistency and integrity against predefined rules.
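A validation rule such as \d{2}dB only enforces the boundary if it is applied to the whole value. The following sketch uses Python's standard `re` module to compare `re.search`, which accepts a match anywhere in the string, with `re.fullmatch`, which requires the entire string to fit. With `re.search`, "100dB" slips through because its substring "00dB" matches. The two helper functions are hypothetical names introduced for this illustration.

```python
import re

ACOUSTIC_PATTERN = r"\d{2}dB"  # boundary rule for the acoustic insulation attribute


def passes_search(value: str) -> bool:
    # re.search accepts the value if the pattern occurs anywhere inside it.
    return re.search(ACOUSTIC_PATTERN, value) is not None


def passes_fullmatch(value: str) -> bool:
    # re.fullmatch accepts the value only if the entire string fits the pattern.
    return re.fullmatch(ACOUSTIC_PATTERN, value) is not None


for value in ["35dB", "40dB", "9dB", "100dB", "35 dB"]:
    print(value, passes_search(value), passes_fullmatch(value))
```

For validation against boundary values, an anchored check (`re.fullmatch`, or a pattern wrapped in `^...$` used with `re.match`) is usually what the requirement intends.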

 

Regular expressions (RegEx) are used in many programming languages, including Python (the re module), to find and manipulate strings. RegEx is like a detective in the world of strings, able to identify patterns in text with precision.

In regular expressions, letters are written directly as the corresponding characters of the alphabet. Digits can be represented by the special sequence \d, which matches any digit from 0 to 9. Square brackets define a character class, e.g. [a-z] for any lowercase letter of the Latin alphabet, or [0-9], which for ASCII text is equivalent to \d. The uppercase sequences match the opposite: \D matches any non-digit character, and \W matches any non-word character, i.e. anything other than a letter, a digit or an underscore.
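The following short sketch shows these building blocks with Python's `re.findall`. The sample string is invented for illustration. Note that [a-z] is case-sensitive, so uppercase letters such as "W", "B" and "A" are skipped.

```python
import re

text = "Window W-12: 35dB, class A+, 2 units"  # invented sample text

print(re.findall(r"\d", text))      # every single digit
print(re.findall(r"[0-9]+", text))  # runs of digits; [0-9] behaves like \d for ASCII text
print(re.findall(r"[a-z]+", text))  # runs of lowercase Latin letters (uppercase is not matched)
print(re.findall(r"\D+", text))     # runs of non-digit characters
print(re.findall(r"\W", text))      # non-word characters: not a letter, digit or underscore
```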

Popular RegEx use cases:

  • Checking an email address: to check whether a string is a valid email address, the template ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ can be used.

 

  • Date Extraction: the template \b\d{2}\.\d{2}\.\d{4}\b can be used to extract dates in DD.MM.YYYY format from text.

 

  • Phone Number Verification: to verify phone numbers in the format +49(000)000-0000, the pattern looks like \+\d{2}\(\d{3}\)\d{3}-\d{4}.
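The three use cases above can be tried directly with Python's `re` module. The following is a minimal sketch in which the sample addresses, dates and phone numbers are invented. The email pattern is a simplified format check, not a full implementation of the email address standard. The date pattern checks only the DD.MM.YYYY shape and would also accept an impossible date such as 99.99.9999.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
DATE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")          # DD.MM.YYYY, format only
PHONE = re.compile(r"\+\d{2}\(\d{3}\)\d{3}-\d{4}")     # +49(000)000-0000

# Email: the domain must contain a dot followed by at least two letters.
print(bool(EMAIL.match("qc.engineer@example.com")))
print(bool(EMAIL.match("qc.engineer@example")))

# Date extraction: find every DD.MM.YYYY date in free text.
print(DATE.findall("Delivery on 15.03.2024, installation on 22.03.2024"))

# Phone verification: the whole string must follow the format.
print(bool(PHONE.fullmatch("+49(000)000-0000")))
print(bool(PHONE.fullmatch("+49 000 000 0000")))
```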

We translated the QA engineer's requirements into attributes and their boundary values. This turned them from their original text format into an organised, structured table that makes future validation and analysis of incoming data easier. Once the requirements are in place, data that fails validation is not accepted, while validated data can be automatically transferred to systems for further processing.

Next, we will convert the requirements of every specialist in our new window installation process into organised lists of attributes. Then we will add these lists of required attributes to our flowchart for each specialist.

By adding all attributes to one common process table, we transform the information previously presented in the form of text and dialogue into a structured and systematized form of tables linked by means of a block diagram.
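One way to picture the common process table is as a flat list of rows (role, system, attribute), built from each specialist's list of required attributes. The following sketch uses hypothetical attribute and system names. They are loosely derived from the dialogue: the logistics manager's dimensions and weight, the ERP analyst's cost, and the construction manager's schedule. A shared `window_id` key is assumed so that rows for the same window entity can be linked across systems.

```python
# Hypothetical requirements per (role, system); all names are illustrative only.
REQUIREMENTS = {
    ("Quality Control Engineer", "CQMS"): ["window_id", "energy_class", "acoustic_insulation", "warranty_period"],
    ("Logistics Manager", "Logistics system"): ["window_id", "width_mm", "height_mm", "weight_kg"],
    ("ERP Analyst", "ERP"): ["window_id", "unit_cost", "quantity"],
    ("Construction Manager", "4D schedule"): ["window_id", "installation_date", "installation_duration_h"],
}


def to_process_table(requirements):
    """Flatten per-role attribute lists into rows of one common table: (role, system, attribute)."""
    return [
        (role, system, attribute)
        for (role, system), attributes in requirements.items()
        for attribute in attributes
    ]


def consumers_of(table, attribute):
    """Which (role, system) pairs depend on a given attribute?"""
    return sorted({(role, system) for role, system, attr in table if attr == attribute})


table = to_process_table(REQUIREMENTS)
for row in table:
    print(row)
print(consumers_of(table, "weight_kg"))
```

With the table in this form, questions such as "who needs the window weight?" become a simple filter instead of a search through correspondence.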

The data requirements are now clearly structured and the next step is to collect the data and prepare it for the validation process. The requirements for each system should be communicated to the specialists who create the data for those specific systems. Only when the requirements are in hand can the data creation phase begin.

 

Checking that the required attributes and their values are present in the created data ensures that the information has reached the required level of quality. It also confirms that the data is ready for the use cases that matter to the professionals at a given stage of adding a new window entity to the project. If the data meets the requirements, it gets a green light and can be routed automatically to the people and systems for which it was intended.
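The presence check and the "green light" routing can be sketched as follows. The sketch assumes that each target system declares a set of required attributes and that a record is a dict. The system names, attribute keys and sample window record are hypothetical. An attribute counts as missing if it is absent, `None`, or an empty string. A real pipeline would also apply the boundary-value checks to each value before forwarding it.

```python
# Hypothetical required attributes per target system (illustrative names only).
REQUIRED = {
    "CQMS": {"energy_class", "acoustic_insulation", "warranty_period"},
    "Logistics system": {"width_mm", "height_mm", "weight_kg"},
    "ERP": {"unit_cost", "quantity"},
}


def missing_attributes(record: dict, required: set) -> set:
    """Attributes that are absent, None or empty in the record."""
    return {attr for attr in required if record.get(attr) in (None, "")}


def route(record: dict) -> dict:
    """Split target systems into those that can receive the record and those blocked by gaps."""
    ready, blocked = [], {}
    for system, required in REQUIRED.items():
        missing = missing_attributes(record, required)
        if missing:
            blocked[system] = sorted(missing)  # report what must be filled in first
        else:
            ready.append(system)               # green light: data can be forwarded
    return {"ready": ready, "blocked": blocked}


window = {
    "energy_class": "A+", "acoustic_insulation": "38dB", "warranty_period": "10 years",
    "width_mm": 1200, "height_mm": 1500, "weight_kg": None,
    "unit_cost": 850.0, "quantity": 1,
}
print(route(window))
```

In this sample record the weight is missing, so the data is forwarded to CQMS and ERP, while the logistics system stays blocked until `weight_kg` is supplied.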
