Semi-structured data contains some level of organization, but does not have a strict schema or structure. Although such information includes structured elements (e.g. dates, employee names and lists of tasks completed), the format of presentation may vary considerably from project to project or even from one employee to another. Examples of such data are time logs, progress reports and schedules, which can be presented in a variety of formats.
Semi-structured data is easier to analyze than unstructured data, but requires additional processing to integrate into standardized project management systems.
Working with semi-structured data is challenging because its structure changes from source to source. This variability means that each source of semi-structured data needs its own approach to processing and analysis.
Still, whereas unstructured data takes considerable effort to handle, semi-structured data can usually be processed with relatively simple methods and tools.
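As an illustration of such simple methods, here is a minimal Python sketch that brings two time-log entries with different field names and date formats into one common form. The field aliases, the date formats and the sample entries are assumptions made for this example; they do not describe any particular project management system.

```python
from datetime import date, datetime

# Hypothetical aliases: each source may name the same field differently.
FIELD_ALIASES = {
    "employee": ("employee", "worker", "name"),
    "work_date": ("date", "day", "work_date"),
    "hours": ("hours", "hrs", "duration_h"),
}

# Date formats assumed to occur in the sources (illustrative only).
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize_entry(raw: dict) -> dict:
    """Map one raw time-log entry onto the common field names."""
    lowered = {str(key).strip().lower(): value for key, value in raw.items()}
    result = {}
    for target, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in lowered:
                result[target] = lowered[alias]
                break
        else:
            raise KeyError(f"missing field: {target}")
    result["work_date"] = parse_date(str(result["work_date"]))
    # Accept both "7.5" and "7,5" as decimal notation.
    result["hours"] = float(str(result["hours"]).replace(",", "."))
    return result


# Two entries describing the same kind of record in different formats.
logs = [
    {"Employee": "A. Smith", "Date": "2024-03-05", "Hours": 8},
    {"worker": "B. Jones", "day": "05.03.2024", "hrs": "7,5"},
]
normalized = [normalize_entry(entry) for entry in logs]
```

Every new source still needs its aliases and date formats added to the tables at the top, which is the per-source effort described above, but the processing logic itself stays small.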
Loosely structured data is a more general term that describes data with minimal or incomplete structure. It most often takes the form of text documents, chats and emails, where some metadata (e.g. date, sender) is present but most of the information has no consistent organization.
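The split between metadata and free-form content can be seen in an email. The sketch below uses Python's standard `email` module to separate the structured headers from the unstructured body. The message itself is invented for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# An invented message: the headers are structured, the body is free text.
raw = """From: site.manager@example.com
To: office@example.com
Date: Tue, 05 Mar 2024 14:30:00 +0000
Subject: Pour delayed

Concrete pour on level 3 moved to Thursday, pump truck is late.
"""

msg = message_from_string(raw)

record = {
    # Structured part: metadata that can be read directly.
    "sender": msg["From"],
    "sent_at": parsedate_to_datetime(msg["Date"]),
    "subject": msg["Subject"],
    # Unstructured part: free text that needs further interpretation.
    "body": msg.get_payload().strip(),
}
```

The sender, date and subject can be stored and queried right away, while the facts in the body (which level, which day, what caused the delay) remain locked in free text.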
In construction, loosely structured data appears in a variety of processes. Examples include:
- Estimates and quotations – tables with material, volume and cost data, but without a uniform format.
- Drawings and engineering schematics – PDF or DWG files that contain text annotations and metadata but have no strictly fixed structure.
- Work schedules – data from MS Project, Primavera P6 or other systems, whose export structure may differ from one system to another.
- CAD (BIM) models – contain structured elements, but the way the data is represented depends on the software and the project standard.
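To illustrate the schedule case from the list above, the following Python sketch reads two CSV exports with different column names and delimiters into one common layout. The source labels, column headers and sample rows are hypothetical; actual export headers depend on the tool and its export settings.

```python
import csv
import io

# Hypothetical per-source column mappings (export header -> common name).
COLUMN_MAPS = {
    "source_a": {"Task Name": "task", "Start": "start", "Finish": "finish"},
    "source_b": {"activity_name": "task", "start_date": "start", "end_date": "finish"},
}


def read_schedule(csv_text: str, source: str, delimiter: str = ",") -> list:
    """Read one export and rename its columns to the common names."""
    mapping = COLUMN_MAPS[source]
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        {target: row[column] for column, target in mapping.items()}
        for row in reader
    ]


# Two invented exports describing tasks with different layouts.
export_a = "Task Name,Start,Finish\nExcavation,2024-03-04,2024-03-08\n"
export_b = "activity_name;start_date;end_date\nFormwork;2024-03-11;2024-03-15\n"

tasks = read_schedule(export_a, "source_a") + read_schedule(
    export_b, "source_b", delimiter=";"
)
```

Keeping the mappings in a table rather than in the reading logic means a new export format is handled by adding one entry, without changing the function.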
Geometric data produced by CAD systems can also be classified as semi-structured data. However, we will treat geometric CAD (BIM) data as a separate data type because, like text data, it is often handled separately in company processes.