Structuring and categorizing CAD (BIM) data is more challenging because the data stored in CAD (BIM) databases is almost always in closed or complex parametric formats, which often combine geometric data elements (semi-structured) and meta-information elements (semi-structured or structured data) at the same time.
Native data formats in CAD (BIM) systems are usually protected and cannot be used directly without specialized software or the API interfaces provided by the vendor itself (Fig. 4.1-10). Such data isolation creates closed storage silos that limit the free exchange of information and inhibit the creation of end-to-end digital processes in the company.

In special CAD (BIM) formats, information about the characteristics and attributes of project elements is organized in a hierarchical classification system, where entities with their properties sit, like fruit on a tree, at the outermost nodes of the classification branches (Fig. 4.1-11).
Data can be extracted from such hierarchies in two ways. The first is manual: clicking through each node, as if working on a tree with an axe and cutting off selected branches of categories and types. The alternative is to use application programming interfaces (APIs), a more efficient, automated approach that retrieves and groups the data and ultimately transforms it into a structured table for use in other systems.
Various tools, such as Dynamo, pyRvt, Pandamo (Pandas + Dynamo), and ACC, or open source solutions such as IfcOpSh or IFCjs for the IFC format, can be used to extract structured data tables from CAD (BIM) projects.
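The automated route can be illustrated with a minimal sketch in plain Python. It does not call any of the tools named above; the nested dictionary `model_tree`, its category and type names, and the helper functions `flatten` and `to_table` are all hypothetical stand-ins for whatever a real API would return. The point is only to show how a category → type → element hierarchy collapses into one flat row per element.

```python
# Hypothetical, hand-made hierarchy: category -> type -> elements with properties.
# Real CAD (BIM) exports will use different names and usually deeper nesting.
model_tree = {
    "Walls": {
        "Basic Wall 200mm": [
            {"id": "W-001", "material": "Concrete", "volume_m3": 4.2},
            {"id": "W-002", "material": "Concrete", "volume_m3": 3.1},
        ],
    },
    "Doors": {
        "Single Flush 900": [
            {"id": "D-001", "material": "Wood"},
        ],
    },
}


def flatten(tree):
    """Walk category -> type -> element and return one flat row per element."""
    rows = []
    for category, types in tree.items():
        for type_name, elements in types.items():
            for element in elements:
                # Keep the path through the hierarchy as ordinary columns.
                rows.append({"category": category, "type": type_name, **element})
    return rows


def to_table(rows):
    """Give every row the same columns, filling missing properties with None."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns, [[row.get(col) for col in columns] for row in rows]


columns, table = to_table(flatten(model_tree))
print(columns)
for line in table:
    print(line)
```

Elements that lack a property (here the door has no volume) still get a cell in the table, which is what makes the result usable in spreadsheet-like tools.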
Modern data export and conversion tools simplify data processing and preparation by dividing the content of CAD models into two key components (Fig. 4.1-13): geometry information and attribute data, that is, the meta-information describing the properties of design elements (Fig. 3.1-16). These two layers of data remain linked through unique identifiers, which make it possible to map each element's geometry description (parametric or polygonal) precisely to its attributes: name, material, stage of completion, cost, and so on. This approach preserves the integrity of the model and allows the data to be used flexibly, both for visualization (geometric model data) and for analytical or management tasks (structured or loosely structured attribute data), with the two types of data handled separately or in parallel.
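The separation into two layers joined by an identifier can be sketched as follows. This is an illustration under stated assumptions, not the output format of any particular converter: the records in `elements`, the field names, and the choice of `vertices` as the only geometry field are all hypothetical.

```python
# Hypothetical mixed records, as a converter might emit them:
# geometry (vertices) and attributes live together in one record.
elements = [
    {"id": "W-001", "name": "Wall A", "material": "Concrete", "cost": 1200.0,
     "vertices": [(0, 0, 0), (4, 0, 0), (4, 0, 3), (0, 0, 3)]},
    {"id": "D-001", "name": "Door 1", "material": "Wood", "cost": 350.0,
     "vertices": [(1, 0, 0), (1.9, 0, 0), (1.9, 0, 2.1), (1, 0, 2.1)]},
]

# Assumption: these keys describe shape; everything else is an attribute.
GEOMETRY_KEYS = {"vertices"}


def split_layers(records):
    """Split mixed records into a geometry layer and an attribute layer,
    both keyed by the element's unique identifier."""
    geometry, attributes = {}, {}
    for record in records:
        element_id = record["id"]
        geometry[element_id] = {
            key: value for key, value in record.items() if key in GEOMETRY_KEYS
        }
        attributes[element_id] = {
            key: value for key, value in record.items()
            if key not in GEOMETRY_KEYS and key != "id"
        }
    return geometry, attributes


def lookup(element_id, geometry, attributes):
    """Rejoin both layers for one element through the shared identifier."""
    return {"id": element_id, **attributes[element_id], **geometry[element_id]}


geometry, attributes = split_layers(elements)

# Analytical task: uses the attribute layer only, no geometry needed.
total_cost = sum(item["cost"] for item in attributes.values())
print(total_cost)

# Visualization task: uses the geometry layer, attributes fetched on demand.
print(lookup("W-001", geometry, attributes))
```

Because both dictionaries share the same keys, either layer can be processed on its own and the full element can still be reassembled whenever it is needed.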

With the development of reverse engineering technologies and the advent of software development kits (SDKs) for CAD data conversion, accessing and converting data from closed CAD (BIM) program formats has become much easier. It is now possible to legally and safely convert data from closed formats into universal formats suitable for analysis and use in other systems. The history of the first reverse engineering tools (“Open DWG”) and the struggle for dominance over CAD vendors’ formats was discussed in the chapter “Structured data: the foundation of digital transformation”.
Reverse engineering tools allow legitimate retrieval of data from closed proprietary formats, breaking down information from the mixed CAD (BIM) format into the data types and formats the user needs, which makes the data easier to process and analyze.
Reverse engineering and direct access to CAD databases make the information accessible: it becomes possible to work with open data and open tools, analyze the data with standard tools, build reports and visualizations, and integrate with other digital systems (Fig. 4.1-12).
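Once element data is available as a flat table, ordinary general-purpose tooling is enough for a first analysis. The sketch below uses only the Python standard library; the sample `rows`, the column names, and the helpers `volume_by` and `to_csv` are hypothetical and stand in for data that a converter would actually produce.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, assumed to have been extracted from a CAD (BIM) model.
rows = [
    {"id": "W-001", "category": "Walls", "material": "Concrete", "volume_m3": 4.0},
    {"id": "W-002", "category": "Walls", "material": "Brick", "volume_m3": 2.5},
    {"id": "S-001", "category": "Slabs", "material": "Concrete", "volume_m3": 10.0},
]


def volume_by(rows, key):
    """Sum element volumes grouped by the given column (a simple report)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["volume_m3"]
    return dict(totals)


def to_csv(rows):
    """Serialize the rows as CSV text so other systems can read them."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


by_material = volume_by(rows, "material")
report = to_csv(rows)
print(by_material)
print(report)
```

The same rows could just as well be loaded into a spreadsheet, a database, or a dataframe library; CSV is used here only because it needs no extra dependencies.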

Reverse engineering tools have made it possible to convert initially closed CAD data formats into any other format, including structured formats, conveniently and efficiently: since 1996 for DWG, since 2008 for DGN, and since 2018 for RVT (Fig. 4.1-13). Today, almost all major CAD (BIM) companies and large engineering companies in the world use SDKs, that is, reverse engineering tools, to extract data from closed CAD (BIM) vendor formats (“Members: Founders and corporate members,” 2024).

Converting data from closed, proprietary formats to open formats and separating mixed CAD (BIM) formats into geometric data and meta-information attribute data simplifies working with the data, making it available for analysis, manipulation, and integration with other systems (Fig. 4.1-14).
In today’s work with CAD (BIM) data, we have reached the point where we no longer need to request permission from CAD (BIM) vendors to access information stored in CAD formats.

Current trends in CAD design data processing continue to be shaped by the key market players, the CAD vendors, who are working to strengthen their position in the data world and to create new formats and concepts.