19 February 2024
Converting documents and images into a structured format can be achieved using relatively simple tools based on taxonomy and categorization.
Categorization of elements is also a key part of working with project data from CAD (BIM) programs. Structuring and categorising CAD (BIM) data is harder, however, because data exported from CAD (BIM) databases is almost always in closed formats or in formats that mix geometric data elements (semi-structured data) with meta-information elements (structured data).
In these formats, information about elements is organised in a hierarchical classification system, in which the entities with the relevant properties sit, like fruit on a fruit tree, at the outermost nodes (the leaves) of the CAD (BIM) classification branches.
To get such data out of the CAD (BIM) classification tree, you have to climb it either with an axe, by clicking through the buttons of the user interface, or with an API chainsaw, until you reach the required branch, cut it off and transform it into a structured table for use in other systems.
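The idea of cutting a branch out of the classification tree and turning it into a table can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the API of any CAD (BIM) tool: the nested dictionary `project_tree`, its category and element names, and the helper functions `is_leaf`, `flatten` and `to_csv` are all hypothetical and stand in for whatever hierarchy a real export would provide.

```python
import csv
import io

# Hypothetical classification tree: branches are categories, leaves are
# elements carrying their properties. All names and values are illustrative.
project_tree = {
    "Architecture": {
        "Walls": {
            "Wall-001": {"material": "Concrete", "length_m": 5.0},
            "Wall-002": {"material": "Brick", "length_m": 3.2},
        },
        "Doors": {
            "Door-001": {"material": "Wood", "width_m": 0.9},
        },
    },
}


def is_leaf(node):
    """Treat a node as an element (leaf) if none of its values is a dict."""
    return not any(isinstance(value, dict) for value in node.values())


def flatten(tree, path=()):
    """Walk the tree and yield one flat row per leaf element."""
    for name, node in tree.items():
        if is_leaf(node):
            # Keep the branch the element came from as a plain text column.
            yield {"path": "/".join(path), "element": name, **node}
        else:
            yield from flatten(node, path + (name,))


def to_csv(rows):
    """Serialise rows to CSV text; properties an element lacks stay empty."""
    property_names = sorted({key for row in rows for key in row} - {"path", "element"})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["path", "element", *property_names], restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = list(flatten(project_tree))
table = to_csv(rows)
print(table)
```

Each row keeps the path of the branch it was cut from, so the original position in the hierarchy is not lost when the data is used as a flat table in other systems.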
To retrieve data tables from CAD (BIM) projects, we can use various tools from the CAD (BIM) vendor ecosystem, such as Pandamo (Pandas + Dynamo), Dynamo, pyRevit for Revit® and Forge, or open source solutions such as IfcOpenShell for the IFC format. Such tools allow you to separate the data stored in a CAD (BIM) format into geometric information and meta-information, as well as the attribute information of design elements.
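Whichever tool performs the export, the separation of geometry from meta-information can be pictured with a small Python sketch. It assumes, purely for illustration, that the mixed export has already been loaded as a list of dictionaries with an `id` field and a nested `geometry` field. The record layout, the sample values and the function `split_elements` are hypothetical and do not correspond to the output of any specific tool named above.

```python
# Hypothetical mixed export: each record carries both attributes
# (structured data) and geometry (semi-structured data).
mixed_elements = [
    {
        "id": "W1",
        "category": "Wall",
        "material": "Concrete",
        "geometry": {"vertices": [[0, 0, 0], [5, 0, 0], [5, 0, 3], [0, 0, 3]]},
    },
    {
        "id": "D1",
        "category": "Door",
        "material": "Wood",
        "geometry": {"vertices": [[1, 0, 0], [1.9, 0, 0], [1.9, 0, 2.1], [1, 0, 2.1]]},
    },
]


def split_elements(elements, geometry_key="geometry"):
    """Split mixed records into an attribute table and a geometry store.

    Both outputs share the element id, so they can be joined again later.
    """
    attributes = []  # one flat row per element, suitable for a table
    geometry = {}    # element id -> geometry payload
    for element in elements:
        attributes.append(
            {key: value for key, value in element.items() if key != geometry_key}
        )
        if geometry_key in element:
            geometry[element["id"]] = element[geometry_key]
    return attributes, geometry


attributes, geometry = split_elements(mixed_elements)

for row in attributes:
    print(row)
print(sorted(geometry))
```

Keeping the shared `id` in both outputs is the design choice that matters here: the attribute rows can be analysed as an ordinary table, while the geometry can be stored or converted separately and matched back to its element when needed.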
Proprietary CAD (BIM) database formats are closed and protected, and for a long time full access to the data in such databases was available only through specialized programs from CAD (BIM) vendors or through additional API layers that provide limited access to the CAD (BIM) database.
With the development of reverse engineering technologies and the advent of software development kits (SDKs), accessing and converting data from closed CAD (BIM) formats has become much easier.
Reverse engineering tools allow legitimate and efficient conversion of data from closed proprietary formats to structured formats. They break the information in a mixed CAD (BIM) format down into the data types and formats that the user needs, which makes the data easier to process and analyse.
This allows practitioners to move away from mixed format processing of CAD (BIM) models, which focuses on working with data in specialised software, to a data-centric approach, which focuses primarily on open data.
Since 2002 for DWG (AutoCAD®) format, since 2008 for DGN (MicroStation®) format and since 2018 for RVT (Revit® BOM-BIM), it has been possible to convert data into structured formats conveniently and efficiently using reverse engineering tools.
Today, almost all major CAD (BIM) vendors and large engineering companies in the world themselves use reverse engineering SDK tools to extract data from the closed CAD (BIM) formats of other vendors.
Converting data from closed, proprietary formats to more publicly available formats or splitting mixed CAD (BIM) formats into geometric and meta-information data simplifies the process of working with them, making them more accessible for analysis, manipulation and integration with other systems. We will talk more about the conversion process and code for automating the acquisition of CAD (BIM) data in the chapter "Pipeline-ETL data validation process with ChatGPT".
In modern work with CAD (BIM) data, we have reached a stage where it is not necessary to request permission from CAD (BIM) vendors to access the data.
Having explored the different types of data and tools for transforming it from a wide range of formats into a structured form, we can now move on to one of the most difficult steps in working with data - modelling the data and checking its quality.