Automated BCF Data Extraction For BIM QC Communication

Antonio J. Romero Requejo

Bachelor’s Thesis

Civil and Construction Engineering,

Raasepori 2019

BACHELOR’S THESIS

Author: Antonio J. Romero Requejo
Degree Programme: Civil and Construction Engineering, Raasepori
Specialization: Structural Engineering
Supervisors: Mats Lindholm, Max Levander

Title: Automated BCF Data Extraction for BIM QC Communication

Date: 28.10.2019
Number of pages: 53
Appendices: 3

Abstract

According to multiple studies, communication in the AEC industry is a large, evident problem that should be addressed in order to minimize errors and maximize overall quality. Simultaneously, the AEC industry is taking a disruptive step by deeply integrating information technologies and automation into its workflows to increase efficiency and provide better suited solutions. How industry members adapt and change to integrate this new work approach will define how the industry develops and who will emerge as leaders in the coming decades.

BIM coordination is now an essential part of the modern construction process, both consuming and generating large amounts of information that result in the digital model that will be used to physically build and maintain the object. These large amounts of data result in an “information overload” situation, leading to dense fragmented data, lack of accountability and failure to address problems, among others. This problem is particularly acute as multiple disciplines join the model.

This thesis tries to solve this problem by developing the implementation of an automated issue (topic) tracking and quality control dashboard reporting system that would, if not necessarily solve, help to mitigate this problem by turning the current issue tracking through single files into a more manageable and integrated way of storing, sharing and presenting BCF information.

Language: English
Key words: BIM coordination, Issue (topic) tracking, BCF, Automation, QA, QC

EXAMENSARBETE

Författare: Antonio J. Romero Requejo Utbildning och ort: Ingenjör (YH), byggnads- och samhällsteknik, Raseborg. Inriktningsalternativ/Fördjupning: Projektering och byggnadskonstruktion Handledare: Mats Lindholm, Max Levander

Titel: Automatiserad BCF-datautvinning för BIM QC-kommunikation ______Datum 28.10.2019 Sidantal 53 Bilagor 3 ______Abstrakt Enligt flera studier är kommunikation inom AEC-industrin ett stort uppenbart problem som bör lösas för att minimera fel och maximera den totala kvaliteten. Samtidigt tar AEC- industrin ett upplösande steg genom att i hög grad integrera informationsteknologier och automatisering i sina arbetsflöden för att öka effektiviteten och bjuda på mer lämpliga lösningar. Hur företag i branschen anpassar sig och förändras för att integrera denna nya arbetsmetod, kommer att definiera hur branschen kommer att utvecklas och vem som kommer att leda under de kommande decennierna.

BIM-samordning är en väsentlig del av den moderna byggprocessen. Det både förbrukar och producerar stora mängder information som resulterar i den digitala modellen som kommer att användas för att ”fysiskt” bygga och underhålla objektet. Dessa stora mängder data resulterar i "informationsöverbelastning", vilket leder bland annat, till täta fragmenterade data, otydligt ansvar och misslyckanden med att lösa problem. Detta problem är särskilt akut när flera discipliner ansluter sig till modellen.

Det här examensarbetet försöker lösa detta problem genom att utveckla ett automatiskt spårnings- och kvalitetskontrollrapportsystem som skulle, om inte nödvändigtvis lösa, åtminstone hjälpa till att minimera detta problem. Detta genom att göra det aktuella problemet att spåra enskilda filer till ett mer hanterbart och integrerat sätt att lagra, dela och presentera BCF-information. ______Språk: Engelska Nyckelord: BIM-samordning, Ämne spårning, BCF, Automatisering, QA, QC ______

Acknowledgements

I would like to thank Max Levander and Ramboll for the opportunity to write this thesis for the BIM and Digi Center. I also thank my colleagues at Ramboll for all their encouragement and collaboration.

Likewise, I would like to thank Novia UAS and its faculty members, and staff for the support and encouragement received during all the years...Tack!

To my parents, sisters and extended family

To Jennie, Filip and Julian…for everything.

Abbreviations

• AEC: Architecture, Engineering and Construction
• BIM: Building Information Modelling. Building Information Model.
• IFC, .ifc: Industry Foundation Classes. File format extension for IFC files.
• BCF: BIM Collaboration Format. File format for BCF files.
• CDE: Common Data Environment
• dB, DB: Database
• URI: Uniform Resource Identifier
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
• RFI: Request for Information
• XML: Extensible Markup Language
• JSON: JavaScript Object Notation
• GUID: Globally Unique Identifier
• DAX: Data Analysis Expressions language


Table of Contents

1 Introduction
  1.1 Background
  1.2 The BIM Coordination Case
  1.3 Thesis Objectives
  1.4 Research Constraints
  1.5 Research approach
2 Regarding BIM
  2.1 Collaboration in BIM Coordination
  2.2 The BIM Coordination process
  2.3 Communication in the AEC industry
  2.4 The need for a BIM information manager
  2.5 BIM coordination data extraction
    2.5.1 BIM coordination data flow
3 File Formats and Tools
  3.1 The BCF File Format
  3.2 The JSON file format
  3.3 MS Azure and Cosmo DB
    3.3.1 Data separation and security
    3.3.2 Creating a service
  3.4 FME
    3.4.1 Uploading to the server
    3.4.2 Downloading from the server
    3.4.3 Current situation with Cosmo DB and FME
  3.5 PowerBI
    3.5.1 Reading Cosmo DB data
    3.5.2 Data presentation and analysis
    3.5.3 Accessing BCF data directly from PowerBI
4 Technical Solutions
  4.1 The OpenBIM initiative
  4.2 Parsing and data extraction
  4.3 Data wrangling and code
5 Presentation of Results
  5.1 Data Mining BCF Files
  5.2 Summary of general results
6 Conclusion and Further Steps
7 Bibliography


1 Introduction

1.1 Background

The modern Architecture, Engineering and Construction (AEC) industry can no longer be understood without the benefits that Building Information Modelling (BIM) has brought to the building sector. Increased profits, reduced errors and omissions, and faster and shorter workflows (Stephen A. Jones, Harvey M. Bernstein, 2012), among other reported benefits, have pushed the industry into a new realm of effectiveness and productivity. Nonetheless, BIM and BIM managers face plenty of well-studied challenges daily. Andrew Criminale and Sandeep Langar from the University of Southern Mississippi, in their “Challenges with BIM Implementation: A Review of Literature” (Langar & Criminale, 2017), identify up to thirty-six individual problems, of which at least one third can be traced back to, or can cause further down the line, communication problems and errors. Delays, errors and misunderstandings are well documented as some of the main factors leading to problems in the construction industry (Pellinen, 2016).

1.2 The BIM Coordination Case

In the words of Max Levander, head of the BIM and Digi Center for Ramboll Finland, “BIM coordination is about assessing and cross disciplinary coordinating design using BIM”. This is what current BIM coordination at Ramboll for the Finnish market is at its core. In more general terms, BIM coordination can be understood as the process of constructing a virtual building before any work is done on site, allowing the team to identify schedule, cost, design and constructability issues (topics), among others. It is nowadays an integral part of any medium to large project. The BIM coordinator’s duties consist, among others, of reviewing the physical coordination of all design disciplines and systems as a group, and the coordinator is ultimately responsible for determining the clashes and problems in the building model. Collaboration in the coordination effort is difficult and problematic. In other words, communication between the parties becomes a key issue (topic) and is thus a problem in itself. BIM coordination suffers from an “information overload” problem. This is particularly acute when multiple design disciplines are aggregated into the Building Information Model, exponentially increasing the number and versions of issues (topics) in said model and resulting in fragmentation, lack of accountability and failure to address problems, among others. Part of this problem originates in the nature of the BIM Collaboration Format (BCF), the default file format used for communication in BIM coordination: its file-based nature, the amount of information it contains and how that information is presented all contribute. Other file types, such as spreadsheets and Portable Document Format (PDF) files, are also commonly used to share information, but they are not an object of study in this thesis.

1.3 Thesis Objectives

This thesis tries to solve the above-mentioned problem by developing and implementing a model for an automated tracking and quality control and assurance reporting system. Such a system would, if not necessarily solve, help to mitigate this problem by turning the current issue (topic) tracking through single BCF files into a more manageable and integrated way of storing and sharing issue (topic) information, presenting data in a simplified manner and storing it in a single centralized source of truth.

This model for the implementation of the automated system would turn the current BCF issue (topic) tracking reports produced with Solibri¹ into a cloud-centric database that would allow project managers, heads of departments and/or executives to follow project-specific issue (topic) evolution, as well as other aggregated information, by means of a dashboard-like frontend.

¹ Please note that Solibri-generated reports are used as the source for BCF files, but since this is an open format, the approach can be adapted to other software vendors’ solutions.

The implementation of such issue (topic) tracking would result in stronger, more effective project roles, as keeping track of issues (topics) is central to leading the project. Better control results in better project leadership, which in turn results in better (quality) projects. The main idea behind this postulate is that an always accessible, cloud-centric database, acting as a software-vendor-independent central issue (topic) repository and coupled with a rich, specific “project intelligence” dashboard that provides at-a-glance insight and project information without requiring access to the raw information or specialized training, would give project members a better overview of the project.

In other words, it would act as a telemetry-like system for BIM coordination, providing decision makers with quick, overarching information on project evolution².

² It is of interest to note that this strengthens the idea that “… A BIM-based quality assurance process, including checking and analysis of the BIM file, provides a better overview of the building information at an earlier stage. The mere visual examination of the BIM file will make it easier to form an overall view of the project, not to mention the more detailed analyses that can be performed.”, as described in COBIM Series 6: 1.1 Quality Assurance; Client View (Solibri, 2012).

1.4 Research Constraints

Although this research could be applied to a broad range of BIM-related use cases, BIM coordination as defined in 1.2 is the central topic of interest in this thesis. Authoring software, file formats and other aspects have been limited in scope to the following:

• Typical (in the broad sense) BIM coordination as done by Ramboll Finland.

• Parsing of BCF v.2.1 files as exported by Solibri. Solibri Model Checker 9.8 is used as the reference version. Nonetheless, any software capable of exporting BCF v.2.1 files should follow the standard and thus be suitable for study.

• Implementation of the automation has been achieved using available software tools at Ramboll. The following have been used:

o Feature Manipulation Engine (FME) by Safe Software: A data integration platform with support for various file formats, flow-based data manipulation and third-party service integration.

o Microsoft Azure: Cloud computing service for building, testing, deploying and managing applications and services.

o Microsoft Power BI: Business analytics service capable of providing interactive visualizations and business intelligence.

o Solibri: BIM quality assurance and control software capable of producing rule-based issue (topic) reports of Building Information Models.

o Python: High-level, interpreted, general purpose programming language.

• Data security and data protection are not part of this thesis in any way or form. Further exploration is recommended for mission-critical projects.

1.5 Research approach

This thesis started by studying BIM coordination work methods: needs, normal procedures and workflows. This was followed by a study of the BCF file format, its contents and structure. Relevant information was extracted by parsing the file using the above-mentioned tools. One main point of interest for the author was to produce a workflow that would offer an easy, comfortable solution from a BIM Manager’s perspective. The solution presented tries to be as simple as possible to interact with, eliminating hurdles in the current workflow rather than exchanging them for others. Literature regarding communication problems in the building industry, their origins and possible solutions was studied as an integral part of the problem.

The following steps were taken:

1. Identification of the typical BIM coordination workflow when dealing with issues (topics) and problem communication.
2. Study of communication (in the broad sense), digital communication and digital collaboration in the AEC industry.
3. Identification of the best method to parse the information contained in the BCF file format.
4. Identification of the best solution for a centralized cloud server repository and its necessary requirements.
5. Development of the workflow and of the automation scripts needed to achieve it.


2 Regarding BIM

2.1 Collaboration in BIM Coordination

In a recent study published in the International Journal of Project Management, “eight concepts influencing the development of BIM collaboration” (Liu, et al., 2017) were identified as key issues (topics) highlighting “the importance of collaboration within project teams in BIM project delivery”. These were, as listed by the authors, (1) IT capacity, (2) technology management, (3) attitude and behaviour, (4) role-taking, (5) trust, (6) communication, (7) leadership, and (8) learning and experience. Of these findings, (2), (4), (6) and (7) are directly related to the underlying premise of this thesis: that the assumption that using BIM automatically grants the benefits of BIM is wrong, and that what matters is how you use BIM, how you communicate those pieces of information, and how you monitor, control and deal with issues (topics) such as data loss, communication problems and sub-par efficiency. If BIM coordination efforts are to succeed, the technology aspect of BIM must not hinder the communication aspect and, in general, no part of the People-Process-Technology triangle should outweigh the whole.

Figure 1 Prof. Arto Kiviniemi’s take on People-Process-Technology


BIM coordination must be more than issue (topic) solving; it must act as a communication tool, one that might, and does, suffer from intrinsic problems. BIM coordination should not live in isolation, restricted to the default available tools, where information is not easily understood or shared (or is shared only with those directly dealing with it, i.e. the BIM coordinator and the various design discipline leads). BIM coordination information can provide additional insights and solutions to the project and to the business in general if an integrated information management process that deals with its inherent problems is available. This last point is of great importance, as valuable metadata regarding the companies, clients, specialists and design disciplines involved in the project is siloed in the file and abandoned once the project is complete.

2.2 The BIM Coordination process

As mentioned before, the BIM coordinator reviews and conducts the symphony that all design disciplines involved in the construction project play. It is worth noting that this responsibility lies in the hands of the principal designer, who is legally responsible for coordinating design efforts (Maankäyttö- ja rakennuslaki 5.2.1999/132, 1999); in Finland this would be, in most cases, the architect. This BIM coordination effort is sometimes offloaded to a speciality subcontractor such as Ramboll. In a process known as clash detection, if any error appears (understood as any possible clash, conflict or problem between design disciplines), a report commonly referred to as a “topic” is produced with tracking software, Solibri for example. This issue (topic) is then classified according to its severity and the design disciplines involved, and elevated to “in progress” status. The issue (topic) is then assigned to those design disciplines that must take part in solving the problem. Design discipline leads, together with the design discipline team members, work on providing a solution according to context, experience, cost and design. This solution is forwarded as an updated design discipline model, and the issue status is moved to “resolved” once the BIM coordinator approves the solution provided by the design discipline leads, provided the solution creates no secondary problems. In some cases, issues (topics) might be deemed no longer relevant or otherwise no longer a problem, in which case their status is moved to “closed”. Issue (topic) type (TopicType), status (TopicStatus), discipline (TopicLabel), priority (Priority) and stage (Stage) are defined, modified and used to better deal with the problem and its solution. This definition list is given to all parties before any BIM coordination takes place; agreed updates and modifications to these definitions may occur in subsequent meetings.

This process breakdown summarizes the key points regarding BIM coordination as laid out in an interview with Sakari Tohmo (Tohmo, 2019), BIM Project Manager and BIM Coordinator for Ramboll’s BIM and Digi Center Finland at the Espoo office.
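
The issue (topic) lifecycle described above can be sketched as a small state model. The following Python fragment is a minimal illustration and not part of the Ramboll workflow or of the BCF standard: the initial “open” status, the class and function names, and the exact set of allowed transitions are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"  # assumed initial status, not named in the text
    IN_PROGRESS = "in progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Allowed status moves, following the process described above.
# This table is an assumption for illustration purposes.
ALLOWED = {
    Status.OPEN: {Status.IN_PROGRESS, Status.CLOSED},
    Status.IN_PROGRESS: {Status.RESOLVED, Status.CLOSED},
    Status.RESOLVED: set(),
    Status.CLOSED: set(),
}


@dataclass
class Topic:
    title: str
    topic_type: str
    priority: str
    labels: list = field(default_factory=list)  # design disciplines involved
    assigned_to: str = ""
    status: Status = Status.OPEN
    history: list = field(default_factory=list)  # previous statuses, oldest first

    def move_to(self, new_status):
        """Change the status, rejecting moves the process does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.history.append(self.status)
        self.status = new_status
```

Keeping the previous statuses in `history` is one simple way to retain the traceability that is lost when a single BCF file is overwritten.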

For any building design and construction project, different design discipline specialists, contractors, project owners and project members interact with the BIM coordinator, pushing coordination issues (topics), problems and solutions for the building model back and forth. The coordination process as seen today in the modern AEC industry is therefore a complex stream of data, a collaborative effort to track, publish and solve all issues (topics) regarding the construction of a building. This collaborative effort is broad in scope, resulting in a massive flow of data between all parties. It is the BIM coordinator’s duty and responsibility to encourage and facilitate information sharing and distribution among all parties. It is a daunting task that will result either in a successful project or in chaos and a failed, problem-ridden building. Likewise, the BIM coordinator must be able to convey the importance of the information being shared to all parties, including clients and project owners, who more often than not are not technically trained to understand the nature and importance of said information. This means that shared BIM information must adapt itself to the language of whoever is accessing it, so that all project parties can effectively evaluate and help facilitate the resolution of issues (topics). To this end, periodic coordination meetings are arranged to provide a general overview of project and issue (topic) evolution, and to deal with other matters that might not have been dealt with because of their difficulty or because of the schedule.

This is where the problems mentioned before come to life. There is an overabundance of issues (topics) to deal with, their apparent complexity varies according to the qualifications of each project member, and late communication and data loss can lead to expensive after-the-fact solutions. These are only a few of the difficulties that the BIM coordinator, as well as the other parties involved in the project, must deal with.

2.3 Communication in the AEC industry

According to a recent study by PlanGrid, an Autodesk company, published in 2018 in collaboration with the FMI Corporation, costs of more than $31 billion³ in the USA can be directly attributed to miscommunication and poor project data (Autodesk, 2018).

³ Thousand million

Poor communication is a well-documented problem in the construction industry, as shown by multiple studies (AbdulLateef, et al., 19-22 June 2017), (Hoezen, et al., 2006) and others, and it seems only logical that part of that problem would have its equivalent or origin in the tools and methods common in the AEC industry. The McKinsey management consulting group mentions “digital collaboration and mobility” and “advanced analytics” as two of the five “big ideas poised to disrupt construction” (Mckinsey Global Institute, 2016). The World Economic Forum, in its Shaping the Future of Construction: A Breakthrough in Mindset and Technology (World Economic Forum, The Boston Consulting, 2016), mentions that establishing “industry standards – for communication protocols, for instance – so that automated and interoperable equipment can be applied widely to overcome the fragmented and multi-stake holder nature of construction processes” will lead to increased benefits and significant savings. It also mentions that “insufficient knowledge transfer from project to project”, “weak project monitoring” and “little cross-functional cooperation” have led to diminished productivity and performance in the sector. Communication in the AEC industry is a factual problem.

Figure 2 Adapted from: Shaping the Future of Construction A Breakthrough in Mindset and Technology (World Economic Forum, The Boston Consulting, 2016)

A similar trend is visible in Finland for productivity in the building sector, as seen in a figure by Tilastokeskus published in (Rakennuslehti, 2017).

Figure 3 Adapted from Tilastokeskus by Rakennuslehti: Value added labor productivity by industry

Communication standards exist in the form of BCF and Industry Foundation Classes (IFC), for example, but the mere existence of a communication protocol is not enough: how that collaboration takes place and how that information is presented are critical, as “the impact of BIM on collaboration is understood as a reshaping of an individual’s cognitive determinants, which influence a team member’s framing of event patterns enacted throughout project delivery” (Poirier, et al., 2017).

2.4 The need for a BIM information manager

The volume of information produced in normal BIM projects creates “the need to explicitly manage project information and information systems” (Froese, 2010). Moreover, “...the information subprocess around the BIM managers reveals the importance of information management in BIM projects and how it is necessary to clearly redefine the connections and the interactions between the workflow and the information flow” (Boton & Forgues, 2018). To this end, in the UK, ISO 19650 (superseding PAS 1192) calls for a BIM information manager as a minimum requirement for any BIM Level 2 project (International Organization for Standardization, 2018). This role would, among other things, establish a Common Data Environment (CDE) to collect, manage and disseminate documentation, graphical models and non-graphical data for the project team, facilitating collaboration between project members by enabling integration and coordination of data.

In this way, the BIM information manager, in collaboration with the BIM manager and coordinator, needs to devise a solution that helps the BIM coordinator do its job. “The management of construction projects is a problem of information...” (Winch, 2010) in which not only a lack of information is dangerous but also, a failure to further classify, compile, filter and present that information in a simple, effective way, that makes sense for all parties involved, will lead to increased cost and inefficiencies.

2.5 BIM coordination data extraction

To provide better access to data, improved productivity, increased project information accuracy and a better understanding of that data (as opposed to merely having the data, merely knowing of it or, worst of all, not having it at all), rethinking the process by which BIM coordination data is presented becomes critical. An enhanced BIM coordination data flow must ensure easy access and digital collaboration for all parties involved in the project. It must encourage repeatability and traceability to reduce variability and wasted time and effort, not to mention confusion, misunderstandings and lack of ownership. It should, where possible, enhance current BIM coordination, and both support and be supported by company-wide digital investment efforts; otherwise it risks a lack of traction and integration, leading to lost time, effort and competitive advantage. The McKinsey report (Mckinsey Global Institute, 2016) mentions that the tools that will move the AEC industry into the next paradigm fall into three categories: (1) on-site execution, (2) digital collaboration and (3) back-office integration. The tool presented in this thesis falls under (2) and (3), the latter being perhaps less obvious but probably the one that will produce the more insightful metadata. Data regarding finance, schedules, human resources and management, and resource planning, to name a few, will become apparent. Project-specific information, such as which design disciplines tend to produce larger numbers of clashes, the time required to solve issues (topics) versus issue (topic) severity, or which types of projects require more attention, will be accessible and liberated from its current dormant state, perpetually siloed in the BIM coordination BCF file. Because this metadata is currently hidden from view, it is difficult to foresee what kind of insight it might produce once the data has been collated, provided the process is properly implemented and followed.
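
As a minimal illustration of the kind of project-specific insight mentioned above, the following Python sketch counts issues (topics) per design discipline and computes the mean time to resolution per priority. The records, field names and function names are invented for the example and do not come from any real project; in practice the records would be read from the central database.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up topic records for illustration only.
topics = [
    {"labels": ["HVAC", "Structural"], "priority": "High",
     "created": date(2019, 3, 1), "resolved": date(2019, 3, 11)},
    {"labels": ["HVAC"], "priority": "Low",
     "created": date(2019, 3, 2), "resolved": date(2019, 3, 6)},
    {"labels": ["Electrical"], "priority": "High",
     "created": date(2019, 3, 5), "resolved": date(2019, 3, 25)},
    {"labels": ["Structural"], "priority": "Low",
     "created": date(2019, 3, 7), "resolved": None},  # still unresolved
]


def topics_per_discipline(records):
    """Count how many topics involve each design discipline."""
    counts = Counter()
    for record in records:
        counts.update(record["labels"])
    return counts


def mean_days_to_resolve(records):
    """Average number of days from creation to resolution, per priority."""
    durations = defaultdict(list)
    for record in records:
        if record["resolved"] is not None:
            days = (record["resolved"] - record["created"]).days
            durations[record["priority"]].append(days)
    return {priority: sum(d) / len(d) for priority, d in durations.items()}


print(topics_per_discipline(topics))
print(mean_days_to_resolve(topics))
```

Unresolved topics are skipped in the duration calculation so that open issues do not distort the averages.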

Enhanced BIM coordination data collaboration supports the efforts currently led by the BIM and Digi Center unit in Ramboll Finland. Automated quality control reporting has led to significant improvements in the quality of Building Information Models. This effort has been developed following a LEAN mentality, where new automated workflows deliver comprehensive quality overview of delivered Building Information Models with little interaction and minimal extra workload compared to the benefits they give in return. The tool proposed in this thesis follows that same mentality. The tool can enhance the current quality control system and can act as an independent source for other valuable business information, as once the data contained in the BCF files is parsed and aggregated, extracting further information is a matter of asking the right questions.

2.5.1 BIM coordination data flow

The current data flow is very simple, yet ineffective. Data produced with Solibri, or any other BCF-compliant software, resides in a hopefully unique file that the BIM coordinator shares with all relevant members, who then act on the information contained within, updating or solving issues (topics) in their BIM authoring software of choice. Unfortunately, this flow often results in:

1. Multiple BCF files with multiple versions for every design discipline involved, complicating collaboration and issue (topic) solving and leading to delays, extra work and cost overruns (see figure 5).
2. Issue (topic) information that is easily manipulated, erased, modified or edited. With no central supervision and control, traceability and accountability for the reported issues (topics) are lost.
3. Issues (topics) are tracked, but no other metadata regarding the project is produced.
4. Once the project is finished, the BCF file resides in a project folder “never to be seen again”. Thus, valuable data becomes siloed and no valuable insight into the business is gained.

Figure 4 Current BCF data flow

This means that, because of the static nature of the file, no other points of interaction are produced without heavy time investment. Fewer points of contact with the data mean fewer business opportunities.

Figure 5 Example of the current file exchange

The proposed automation (see figure 6) would have the following data flow. The graph shows how the ingestion, storing and serving of data would occur. Further details of this process can be found in part 4. It would represent the “equivalent” of each BCF file transaction as seen in figure 4, but it would only need to happen once per coordination effort rather than multiple times. It would also allow for multiple points of contact with the data. Figure 7 shows what the proposed file exchange would look like.

Figure 6 Proposed data flow

Please refer to part 4.2 for a technical data flow chart of the above-mentioned process.

In this proposed data flow, Solibri BCF files, or any BCF-compliant files, are read and parsed with FME or by a Python script. The FME flow(s) or Python script(s) (discussed later) would parse the file, extracting all relevant information. This information would be reworked into the JSON format. The customer project database would then be queried for previously existing data, and said data would be stored and updated. Once this data resides in the cloud server, and with the structure of the BCF files understood, it can be reused in many ways. The proposed centralization and automation (see figure 7) would have the following benefits regarding points of contact with the data:

1. Querying the database (dB) to export relevant data in spreadsheet format.
2. Presenting this data through a web platform.
3. Creating a website that would allow for online data management and interaction.
4. Visualizing, analysing and sharing valuable information through business intelligence applications such as PowerBI.
5. Regenerating project data into a BCF file to export and share if needed.
6. Allowing for integration with BIM authoring software (third party or native).
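
The store-and-update step of the proposed data flow can be sketched as follows. This is a simplified illustration under stated assumptions: a plain Python dictionary stands in for the cloud database, topics are assumed to have already been parsed into dictionaries, and the field names (`guid`, `modified`) and function names are hypothetical. Timestamps are assumed to be ISO 8601 strings in the same format and time zone, so that they can be compared as text.

```python
import json


def upsert_topics(store, parsed_topics):
    """Insert new topics and update existing ones, keyed by topic GUID.

    Returns a (created, updated) pair. A topic already in the store is only
    overwritten when the incoming version has a newer 'modified' timestamp.
    """
    created, updated = 0, 0
    for topic in parsed_topics:
        guid = topic["guid"]
        existing = store.get(guid)
        if existing is None:
            store[guid] = dict(topic)
            created += 1
        elif topic["modified"] > existing["modified"]:
            store[guid] = {**existing, **topic}
            updated += 1
    return created, updated


def to_json(store):
    """Serialize all stored topics as a JSON array of documents."""
    return json.dumps(list(store.values()), indent=2, sort_keys=True)


# Usage: two successive coordination rounds written to the same store.
store = {}
round_one = [{"guid": "a1", "title": "Duct clashes with beam",
              "status": "Open", "modified": "2019-03-01T10:00:00Z"}]
round_two = [{"guid": "a1", "title": "Duct clashes with beam",
              "status": "In Progress", "modified": "2019-03-08T09:30:00Z"},
             {"guid": "b2", "title": "Missing fire rating",
              "status": "Open", "modified": "2019-03-08T09:31:00Z"}]
print(upsert_topics(store, round_one))
print(upsert_topics(store, round_two))
print(to_json(store))
```

Because every topic is keyed by its GUID, repeated deliveries of the same BCF content update the existing record instead of creating a new copy, which is the behaviour that the single source of truth depends on.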


Figure 7 Proposed file exchange

Direct benefits of the new interaction are:

As seen by the BIM coordinator:
1. Reduced workload:
   a. Only one coordination effort per IFC model update is needed.
   b. No need for multiple BCF file deliveries, as all parties can access the central repository.
   c. Fewer updates and sources result in fewer last-minute changes.
2. Stronger BIM coordinator role:
   i. Less wasted effort allows the BIM coordinator to concentrate on actual model coordination.
   ii. Higher issue (topic) traceability empowers the coordinator role.
   iii. Higher accountability, thanks to a single source of truth, empowers the coordinator role.

As seen by the BIM design discipline specialists:
1. Reduced fragmentation results in a smaller chance of rework due to old sources.
2. Always accessible source of truth.
   a. Teams can manage their commitments inside their areas of accountability with no need for delays, helping planners in staying on track with the work.
   b. No need for Requests for Information (RFI) as the source is available.

As seen (broadly) by the end user, client or third parties:
1. Better sources of information result in better models.
2. Lower costs due to less rework and fewer mistakes.
3. A better overall workflow smooths all other operations by releasing human resources.
4. More points of information provide more business intelligence.
5. Increased trust and reduced data loss provide better business intelligence.
6. Simplified data results in a better understanding of the whole process.

Note that the end user might not always be the client as seen by the contractor, in this case Ramboll. On many occasions Ramboll works for a third party (i.e. acting as a subcontractor), which might be the principal designer or otherwise. In broad terms, there is much to be gained by all parties involved in having an issue-free design.

As seen by contractors and other third parties: essentially, the benefits that BIM itself brings to the table. Better models result in better value added, coupled with better insights into their business and better planning by liberating valuable human resources.

To recap: more points of contact with the data are created, better insight into historical data is achieved, data extraction is simplified, and data synchronization, tracking and ownership are provided for.


3 File Formats and Tools

3.1 The BCF File Format

The BIM Collaboration Format, or BCF, is a human-readable file format based on an XML (Extensible Markup Language) schema. It was developed by Tekla (now part of Trimble) and Solibri and introduced in 2009, and it is designed to enable the exchange of data between Building Information Models and software tools using an open standard that was later adopted by buildingSMART for the AEC industry. BCF allows for workflow communication and can be connected to IFC files. Initially developed as a file-based implementation, a server-based implementation is also available through the BCF-API, which serves full applications via a RESTful web interface.

The file and folder structure, as well as the file content, can be found in complete form in the BCF-XML definition page (buildingSMART, 2017). Of the content available in the definition, Solibri, as of the current version, exports only a selection of the optional information. The table of supported content is omitted here because of size constraints; please refer to Table 1 in the Appendix for a detailed list of supported elements.

Being an XML-schema compliant format, BCF files have a tree structure that contains markup and content: tags, elements and attributes that build up the keys that encapsulate the information in a human-readable text form. A depiction of the tree structure for BCF Markup and Visualization Information has been placed in the Appendix for clarity and because of size constraints.

All this information regarding issues (topics) is then placed in a folder named after the GUID assigned to the topic. This GUID is unique and identifies the issue (topic) throughout the process, from the moment it is opened until it is closed. Issue (topic) folders are then zipped according to the .BCFZIP encoding guide described in the documentation. This zipped folder constitutes the BCF document that can be shared, read and manipulated as needed by Solibri or any other BCF-compliant software. If unzipped and opened, the files present the structure seen in Figure 8.


Figure 8 BCF folder and content

An example of markup.bcf and visualizationinfo.bcfv contents can be seen in the Appendix.

The structured text-based nature of these files allows for parsing and document information extraction with a moderately low effort. Information contained in the document can then be reformatted to better suit the needs of the workflow. In this case the intent is to upload this information into a database where it would then be saved and served. The available option for storing data is a database in Microsoft’s Azure cloud computing service.

3.2 The JSON file format

JavaScript Object Notation (JSON) is a standardized, human-readable, language-independent data format derived from JavaScript and commonly used for asynchronous browser-server communication. JSON is intended as a data serialization format. It differs from XML in that it does not separate data from metadata and in that it uses a key-value mapping to address information, whereas in XML this addressing happens on nodes.

The reasons for choosing the JSON format are:

1. It is an open standard supported by all available tools.
2. It is easy to read.
3. It is easy to validate.
4. It can represent any kind of data.
5. It is widely supported and is the natural language of the web.
6. It can contain international characters such as the Nordic åöä symbols.
7. JSON has no schema and couples nicely with NoSQL databases.

Parsing the XML to format the JSON document is somewhat trivial once all information elements have been found in the BCF file. The JSON formatting proposed would take the same attributes and parameters in the BCF and convert them one to one to their JSON key-value equivalent. This greatly simplifies the procedure as it follows the same standard definition in the BCF guide and only modifies its structure, that is, how the information is presented, not its definition.
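The one-to-one conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FME workspace used in this thesis: the sample markup is a hypothetical, heavily shortened markup.bcf, the helper names child_text and markup_to_topic are invented for the example, and only a handful of fields are mapped. Elements that are missing from the file are left as null values.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened markup.bcf content used only for illustration.
SAMPLE_MARKUP = """<Markup>
  <Topic Guid="fea569e9-b4ec-4b87-94e0-666465f197f6" TopicType="Error" TopicStatus="Open">
    <Title>Topic Title</Title>
    <Index>1</Index>
    <CreationDate>2019-01-16T10:41:23+02:00</CreationDate>
  </Topic>
  <Comment Guid="e0dfaac1-0999-4454-adf7-ae565c11f0fb">
    <Date>2019-01-16T10:41:59+02:00</Date>
    <Comment>Comment text 1</Comment>
  </Comment>
</Markup>"""


def child_text(element, tag):
    """Return the text of a direct child element, or None when it is absent."""
    child = element.find(tag)
    return child.text if child is not None else None


def markup_to_topic(markup_xml):
    """Map BCF markup elements and attributes one to one onto JSON keys."""
    root = ET.fromstring(markup_xml)
    topic = root.find("Topic")
    return {
        "Labels": [label.text for label in topic.findall("Labels")],
        "xml_topic_guid": topic.get("Guid"),
        "Title": child_text(topic, "Title"),
        "TopicType": topic.get("TopicType"),
        "TopicStatus": topic.get("TopicStatus"),
        "Priority": child_text(topic, "Priority"),  # not in the sample, so None
        "Index": child_text(topic, "Index"),
        "CreationDate": child_text(topic, "CreationDate"),
        "Comments": [
            {
                "Date": child_text(comment, "Date"),
                "Guid": comment.get("Guid"),
                "Comment": child_text(comment, "Comment"),
            }
            for comment in root.findall("Comment")
        ],
    }


topic_document = markup_to_topic(SAMPLE_MARKUP)
print(json.dumps(topic_document, indent=2))
```

Because absent elements become null rather than being dropped, every topic produces the same set of keys, which keeps later tabular use of the data simpler.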

Although the proposed method would only search for and translate text data as exported by Solibri, it would be of interest to develop a template that searches for all BCF information, leaving empty value fields in the JSON document as needed. This would make it possible to transfer any BCF-compliant file produced by any software into a database-ready document. This possibility has been tested successfully with the current proposal, using [Labels], [Priority] and [ReplyToComment.Guid] as a means of enhancing BCF data.

Two problems arise with this proposal due to how Solibri is exporting data.

1. As seen in Table 1, project.bcfp is not currently exported. This creates a vacuum of information that would be needed when tying topic information to a project.
2. Client information has no place inside the BCF file, therefore it cannot be linked to the project.

This means that once data is extracted there is no way to identify client or project information thus leaving data “orphaned” in the database.

The current workaround is to obtain the project name from the BCF filename. Provided that the filename is relevant to the project and is not changed during the project lifetime, this suffices to link BCF data to a project when serving it from the web. This is in any case just a patch. The BCF-API does contain a definition for project identification with web services, which the proposed workflow adopts as [project_id] and [name].

A proposal for update for this workaround is discussed under 6 Conclusion and Further Steps.
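The filename workaround can be sketched in a few lines of Python. This is only an illustration: the function name project_from_filename and the sample filename are hypothetical, and reusing the file's base name for both [project_id] and [name] is an assumption that mirrors the use of the file base name as project identifier in the FME template shown in part 4.

```python
from pathlib import Path


def project_from_filename(bcf_path):
    """Build the project-level part of the JSON document from the BCF filename."""
    base_name = Path(bcf_path).stem  # filename without folder and extension
    return {"project_id": base_name, "name": base_name, "Topics": []}


document = project_from_filename("Project Name and Number.bcfzip")
print(document["project_id"])
```

The sketch makes the weakness of the workaround visible: renaming the file changes the project identifier, so previously uploaded topics would no longer be matched.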

An example of the current JSON formatted file follows.

    {
      "project_id" : "6921724f-6fa6-4d0f-ae93-7bc977751521",
      "name" : "Project Name and Number",
      "Topics" : [
        {
          "Labels" : [],
          "xml_topic_guid" : "fea569e9-b4ec-4b87-94e0-666465f197f6",
          "Title" : "Topic Title as given by BIM coordinator",
          "TopicType" : "TopicType",
          "TopicStatus" : "TopicStatus",
          "Priority" : null,
          "Index" : "1",
          "CreationDate" : "2019-01-16T10:41:23+02:00",
          "CreationAuthor" : "[email protected]",
          "ModifiedDate" : "2019-04-29T14:44:53+03:00",
          "ModifiedAuthor" : "[email protected]",
          "AssignedTo" : "Assigned to text",
          "Comments" : [
            {
              "Date" : "2019-01-16T10:41:59+02:00",
              "Author" : "[email protected]",
              "Guid" : "e0dfaac1-0999-4454-adf7-ae565c11f0fb",
              "ReplyToComment.Guid" : null,
              "Comment" : "Comment text 1"
            },
            {
              "Date" : "2019-01-16T17:00:00+02:00",
              "Author" : "[email protected]",
              "Guid" : "e0dfaac1-0999-4454-adf7-ae565c11f0fb",
              "ReplyToComment.Guid" : null,
              "Comment" : "Comment text 2"
            }
          ]
        }
      ]
    }

3.3 MS Azure and Cosmos DB

The available option for data storage is a database in Microsoft’s Azure cloud computing service. Azure provides Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), supporting multiple programming languages, tools, frameworks etc. This solution frees its clients from having to deal with the associated costs and problems of supporting their own platform. Currently this service is integrated companywide, which reduces the time, cost and human resources needed to use it. The server set-up effort was aided by Joonas Kiiskinen, developer at Ramboll.

The database of choice is Cosmos DB, a schema-agnostic NoSQL database. It implements a subset of SQL SELECT on JSON documents, thus providing a simplified container-like system to store information. BCF information would be extracted, parsed and formatted into the JSON file format, and uploaded to Microsoft’s services. This choice allows for good compatibility among file formats and has proper support in the other tools described. The NoSQL nature of Cosmos DB provides more flexibility than relational databases, storing data in a key-value data structure that matches JSON files well. This flexibility regarding data structure allows for easy, straightforward updates of contents if at any time Solibri commences or ceases support for different information elements, or if BCF data from different software tools needs to be mixed. Likewise, no change to the data structure is needed if it becomes necessary to enhance the BCF data with information contained in other sources.

A major problem of this DB is that storing image data, as contained in BCF files, is costly in terms of queries and infrastructure. Ideally this data would reside in a blob storage and would be retrieved simultaneously. This path has not been explored in this thesis, as it would require an important investment of time and effort and is partly out of scope.

3.3.1 Data separation and security

In a multi-project, multi-client enterprise environment it is imperative that a secure way to store and serve this information to relevant project members is devised. None of these are part of the scope for this thesis. It is highly encouraged that this matter is investigated thoroughly before deploying the proposed solution in a “live” environment.

The following measures have been taken for security. BCF information is stored in a project “container” unique to said project. Read and write operations to the database require a passphrase. Only one operation is permitted to each solution.

3.3.2 Creating a service

The steps required to create a virtual server are limited to the IT specialist with enough administrative rights to create said service. They are as follows:

1. Creation of an Azure Cosmos DB service.
2. Filling in Subscription information regarding:
   a. Resource Group.
   b. Instance Details and API.
   c. Geo-redundancy and Multi-region write.
3. Filling in network information.

To create an information container, it is required to:

1. Click “+ New Container”, filling in information as required (see Figure 9).
2. Once the container is created, it will wait for data to be sent or read (see Figure 10).
3. Keys (Read-Write and Read-only) can be fetched from the keys icon (see Figure 11).

Figure 9 Add new Container


Figure 10 BCF data container

Figure 11 Key location. Data was edited for security reasons

3.4 FME

FME (Feature Manipulation Engine) is a data integration software platform that allows for easy data manipulation through a visual programming workflow. It is ideal when working with large data sets and multiple formats.

In FME, nodes are used to read, transform and write data. Data is routed through channels, passing relevant information from one node to another. Data can be collected from several sources and formats, collated, manipulated, enhanced or otherwise written into a third format. FME integrates with Microsoft Azure Cosmos DB and easily reads and writes both XML and JSON, which makes it ideal for the proposed solution. The current solution runs on a standalone installation of FME Desktop.

The proposed data flow reads a BCFZIP, unzipping it and searching for all markup.bcf files. The information contained is then parsed and formatted as in the example shown in 3.2. This data is then saved into a JSON file ready to be uploaded. See figure 12 for the complete workspace required to convert BCF data to JSON. A technical description of the parsing process required for this purpose is detailed in chapter 4.2.

3.4.1 Uploading to the server

Figure 12 Complete BCFZIP to JSON FME workspace

Further information regarding the inner workings of the workspace is provided under part 4 Technical solutions.

The next step once the JSON file has been generated is to upload the data to the server. In order to upload the data to the database, a writer connection must be established.

Data needed to connect to the database is as follows:

1. Cosmos DB Account ID: URI
2. Master Key: Primary Key
3. Database: Database name

Note: The Uniform Resource Identifier (URI) can be found in the server “overview” panel or in the “keys” panel, where the primary (master) key is also found.

These steps are detailed as screenshots of the process in FME in figure 13, with figure 14 showing the complete workspace.

Figure 13 Establishing a Database Connection in FME


Figure 14 Complete Cosmos DB writer FME workspace

In the resulting workspace, the Feature Operation collection parameter should be set to UPSERT in order to ensure proper field updates. [Collection Name] corresponds to the name of the collection container as established in the server.

Running the workspace will result in uploading the data to the server where it will reside until it is deleted or manipulated.

If the workspace is supplied with new data, it will be inserted in the database (see figure 15). Data that was previously contained and that has been modified will be updated. Care should be taken not to set Feature Operation to drop, or all contained data will be deleted and replaced by the new data.
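The difference between the two feature operations can be illustrated with a small in-memory sketch in Python. It is not how FME or the database implements them: the dictionary standing in for the collection, the function names upsert and drop_and_insert, and the use of [project_id] as the matching key are all assumptions made for the example.

```python
def upsert(collection, documents, key="project_id"):
    """Insert new documents and overwrite stored ones that share the same key."""
    for document in documents:
        collection[document[key]] = document
    return collection


def drop_and_insert(collection, documents, key="project_id"):
    """Delete everything stored first, then insert the supplied documents."""
    collection.clear()
    return upsert(collection, documents, key)


# Hypothetical stand-in for the project container.
store = {}
upsert(store, [{"project_id": "A", "Topics": ["t1"]}])
upsert(store, [{"project_id": "B", "Topics": []},
               {"project_id": "A", "Topics": ["t1", "t2"]}])
print(sorted(store))  # both projects are kept, A holds the updated topics

drop_and_insert(store, [{"project_id": "C", "Topics": []}])
print(sorted(store))  # only the newly supplied project remains
```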

Figure 15 Updated data


3.4.2 Downloading from the server

The process to download data from the server works in a similar way. In this case JSON is downloaded and formatted (see figure 16).

Figure 16 Complete Cosmos DB to JSON FME Workspace

Data is reconverted into XML compliant data and then fanned out using [xml_topic_guid] as separation parameter and saved into a BCFZIP file as shown in figure 17.

Figure 17 JSON to XML FME Workspace
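The fan-out step can be sketched in Python as an alternative to the FME workspace. This is a simplified illustration under stated assumptions: the function name topics_to_bcfzip is invented, only a few fields are written back, and the bcf.version file that a complete BCFZIP carries in its root, as well as viewpoints and snapshots, are left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def topics_to_bcfzip(document):
    """Write one <guid>/markup.bcf entry per topic and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for topic in document["Topics"]:
            guid = topic["xml_topic_guid"]  # separation parameter for the fan-out
            markup = ET.Element("Markup")
            topic_element = ET.SubElement(markup, "Topic", {
                "Guid": guid,
                "TopicType": topic.get("TopicType") or "",
                "TopicStatus": topic.get("TopicStatus") or "",
            })
            ET.SubElement(topic_element, "Title").text = topic.get("Title")
            for comment in topic.get("Comments", []):
                comment_element = ET.SubElement(
                    markup, "Comment", {"Guid": comment["Guid"]})
                ET.SubElement(comment_element, "Date").text = comment.get("Date")
                ET.SubElement(comment_element, "Comment").text = comment.get("Comment")
            xml_bytes = ET.tostring(markup, encoding="utf-8", xml_declaration=True)
            archive.writestr(f"{guid}/markup.bcf", xml_bytes)
    return buffer.getvalue()


# Hypothetical document in the JSON layout shown in 3.2.
sample_document = {"Topics": [{
    "xml_topic_guid": "fea569e9-b4ec-4b87-94e0-666465f197f6",
    "Title": "Topic Title",
    "TopicType": "Error",
    "TopicStatus": "Open",
    "Comments": [],
}]}
bcfzip_bytes = topics_to_bcfzip(sample_document)
```

Building the archive in memory mirrors the way the Python proof of concept in part 4 reads the BCFZIP without unpacking it to disk.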

Further information regarding the inner workings of the workspace is provided under part 4 Technical solutions.

The connection to Cosmos DB happens in the same manner as before. FME saves the connection parameters, which may be reused in this situation. There is no need to create a new connection to the database server.

3.4.3 Current situation with Cosmos DB and FME

During the writing of this thesis, Microsoft announced it would drop support for non-partitioned databases in Cosmos DB (see figure 18). More information can be found in Azure’s documentation (Microsoft, 2019). Partitioning in large clustered systems reduces the likelihood of failure. Also, non-partitioned databases do not scale well and should be avoided when possible.

Figure 18 Microsoft's announcement

Without going deeper into document-based partitioning and indexing, this “unfortunate” event means that FME 2018 is no longer capable of read and write operations with Cosmos DB. Support for partition keys has been announced for FME 2019 (SAFE software, 2019). More information regarding this issue can be found in the FME documentation. This plainly means that, as of today, the workspaces described above do not work without access to FME 2019. Please refer to part 6, Conclusion and Further Steps, for further details. In any case this only poses a minor inconvenience, as data could still be uploaded to the server by other means.

3.5 PowerBI

Once data resides in the server, it is possible to serve it to Business Intelligence platforms such as PowerBI, a business analytics platform from Microsoft. PowerBI allows reading data from several sources and easily creating visualizations of said data. More information regarding PowerBI can be found in its documentation pages (Microsoft, 2019).

3.5.1 Reading Cosmos DB data

It is worth noting again that data access and security have not been a part of this thesis, yet they remain an integral part if this is to develop as a business case. PowerBI allows for user access control through the Data Analysis Expressions (DAX) language function USERPRINCIPALNAME() (see figure 19) (Microsoft, 2019). A proper client-to-project reference should be developed if the Cosmos DB server is to hold data for more than one client. Please refer to part 6 Conclusion and Further Steps for more information.

Figure 19 DAX user guide

PowerBI can access data directly from any Cosmos DB by providing the URI string. Unfolding the data in tabular form is a straightforward step (see figure 20). Just by selecting an Azure Cosmos DB as a new data source in PowerBI, it is possible to synchronize the dashboard to the previously extracted BCF data.

Figure 20 PBI Data navigation

Ideally all data provided to PowerBI would be populated and would follow a standard. In this regard there are two problems that would require further attention.

1. Solibri does not populate all data fields by default. This means that data rows will appear as “null” objects. This is far from optimal, especially in rows where populated data would live side by side with null values.

Figure 21 Null data

2. Strings populated by hand tend to show wide variation in values that should otherwise be the same. For example, labels like “Arch”, “ARCH” and “Architecture” are not the same, even if they refer to the same concept. This will cause a problem when analyzing data. Please refer to part 6, Conclusion and Further Steps, for details on how to address this problem.

Figure 22 Example of Collated data
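One simple way to reduce the variation in hand-typed labels before analysis is to map known spellings onto a single agreed value. The Python sketch below illustrates the idea only; the alias table and the function name normalize_label are hypothetical, and a real table would have to come from a naming standard agreed for the project.

```python
# Hypothetical alias table, keyed by the lower-cased label.
LABEL_ALIASES = {
    "arch": "Architecture",
    "architecture": "Architecture",
    "str": "Structural",
    "structural": "Structural",
}


def normalize_label(label):
    """Map a hand-typed label onto its agreed spelling, keeping unknown labels."""
    if label is None:
        return None  # unpopulated fields stay null
    cleaned = label.strip()
    return LABEL_ALIASES.get(cleaned.lower(), cleaned)


labels = ["Arch", "ARCH", "Architecture", " arch "]
print({normalize_label(label) for label in labels})
```

Unknown labels are passed through unchanged rather than discarded, so that new spellings remain visible and can be added to the table.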

3.5.2 Data presentation and analysis

Once data has been properly conditioned and parsed, it can be analysed to provide further valuable insights, both on the project and on the AEC industry.

Note: This example does not contain real data. It is only to be used as an example of the applications of this thesis.

Data produced in PowerBI can also be distributed through digital collaboration channels and on mobile devices, allowing for better, more fluid communication between parties. This is of vital importance, as other research efforts mentioned in the initial parts have shown.

In figures 24 to 27, information regarding the evolution, type and quantity of issues (topics) is presented in a cohesive manner. The information has been structured in a way that allows for much simpler interaction and overview. Compared to the normal Excel spreadsheet that Solibri exports (see figure 23), this method is clearly much more straightforward and intuitive. In this way both non-technical clients and managers can obtain a better understanding of the project dynamics. Solibri’s Excel export still has its place among the design discipline specialists, but there is much to gain in terms of communication for the AEC industry from using this type of solution, as PowerBI allows for information exploration in an intuitive way.

Figure 23 Solibri Excel export (blurred for privacy reasons)

Figure 24 Example of analysed data

Figure 25 Information can be analysed by exploring through the data

Figure 26 Further exploring data

Figure 27 Mobile device data sharing

Note: Figures 24 through 27 provide just an example of how much more comprehensible BCF data could be.

3.5.3 Accessing BCF data directly from PowerBI

Since BCF data is XML contained in a zip file, it was possible to extract the file contents directly in PowerBI by unzipping the file in memory. Nevertheless, the limited capabilities of DAX might not make it the best tool for this problem.

Figure 28 DAX extracted BCF data


A much cleaner option is to extract that data and read the XML contents. PowerBI handles accessing XML data natively. An optimal solution would be to create a PowerBI connector that would read BCF data directly from the server, but that approach has not been pursued in this thesis. This solution could be of use in cases where proper standards for filling in BCF data are followed. Enhancing data could then be achieved by collating information from other sources, assuming proper relations could be formed. An example of how to achieve this in the Python programming language is provided in chapter 4.3.

4 Technical Solutions

4.1 The OpenBIM initiative

Much of this thesis would not be possible without previous efforts led by the OpenBIM initiative. This open collaboration standard, led by buildingSMART, has as its goal better coordination in BIM projects by promoting open collaboration workflows. This thesis does not implement the definitions in the BCF standard or the BCF-API, but uses them to achieve the desired results.

4.2 Parsing and data extraction

The following figure shows how the processes of ingesting, storing and serving data would occur from a machine perspective, as proposed by the author.

Figure 29 Data parsing

Ideally, if the intent were to provide a native client for dealing with BCF data in the BIM authoring software, the complete BCF-API definition should be implemented, allowing for a better, more compact data flow.

4.3 Data wrangling and code

Data contained in the BCF file, as mentioned previously, follows the XML standard and needs to be parsed and collated into JSON before it can be uploaded to the server. This is done with the FME scripts shown above and attached to the thesis. A description follows of how that XML data is combined to produce a unique, self-contained source of information for the project.

The code needed to properly template the source data is as follows:

    ROOT
    {
        "project_id" : fme:get-attribute("fme_basename"),
        "Topics" : [
            {
                fme:process-features("MAIN","fme_feature_type",fme:get-attribute("fme_feature_type"))
            }
        ]
    }

    MAIN
    {
        "Labels" : [
            fme:get-attribute("Labels{0}"),
            fme:get-attribute("Labels{1}"),
            fme:get-attribute("Labels{2}"),
            fme:get-attribute("Labels{3}")
        ],
        "xml_topic_guid" : fme:get-attribute("xml_topic_guid"),
        "Title" : fme:get-attribute("Title"),
        "TopicType" : fme:get-attribute("TopicType"),
        "TopicStatus" : fme:get-attribute("TopicStatus"),
        "Priority" : fme:get-attribute("Priority"),
        "Index" : fme:get-attribute("Index"),
        "CreationDate" : fme:get-attribute("CreationDate"),
        "CreationAuthor" : fme:get-attribute("CreationAuthor"),
        "ModifiedDate" : fme:get-attribute("ModifiedDate"),
        "ModifiedAuthor" : fme:get-attribute("ModifiedAuthor"),
        "AssignedTo" : fme:get-attribute("AssignedTo"),
        "Comments" : [
            {
                fme:process-features("SUB","xml_topic_guid",fme:get-attribute("xml_topic_guid"))
            }
        ]
    }

    SUB
    {
        "Date" : fme:get-attribute("Date"),
        "Author" : fme:get-attribute("Author"),
        "Guid" : fme:get-attribute("Guid"),
        "ReplyToComment.Guid" : fme:get-attribute("ReplyToComment.Guid"),
        "Comment" : fme:get-attribute("Comment")
    }

This same work could be done in any other programming language. A similar approach was taken using Python just to demonstrate its feasibility. The following code unpacks the BCFZIP file in memory and searches for any markup.bcf file. The information is then parsed and exported as JSON, containing all relevant data as an output.

    import fnmatch
    import json
    import zipfile
    import xml.etree.ElementTree as etree
    from pathlib import Path

    filename = Path("C:\\example.bcfzip")    # source BCFZIP archive
    targetfile = Path("C:\\example.json")    # JSON output file

    # Read every markup.bcf in the archive without unpacking it to disk.
    issues = []
    with zipfile.ZipFile(filename, 'r') as zfile:
        for name in zfile.namelist():
            if fnmatch.fnmatch(name, '*markup.bcf'):
                issues.append(zfile.read(name))

    # Collect the attributes of the Topic element of each markup file.
    data = {'issue': []}
    for issue in issues:
        root = etree.fromstring(issue)
        for parent in root:
            if parent.tag == 'Topic':
                data['issue'].append({
                    "Guid": str(parent.get('Guid')),
                    "TopicType": str(parent.get('TopicType')),
                    "TopicStatus": str(parent.get('TopicStatus'))
                })

    # Export the collected data as JSON.
    with open(targetfile, 'w') as outfile:
        json.dump(data, outfile)
    print("done")

Cosmos DB provides bindings that would make it possible to code the upload-to-server functions, as well as to query the database, if there were interest in developing a full-featured application. Developing a full application for this purpose is out of the scope of this thesis and has not been explored further than producing a proof of concept to parse BCF data.

More information regarding Cosmos DB API bindings for Python and other programming languages can be found in the official documentation (Microsoft, 2019).

Likewise, reformatting JSON into XML-compliant BCF data requires templates. The FME expressions used in the root, topic and comment templates are as follows:

    {fme:process-features("HEADER")}
    {fme:process-features("TOPIC")}
    {fme:process-features("COMMENT")}

    {fme:get-attribute("Title")}
    {fme:get-attribute("Index")}
    {fme:get-attribute("TopicStatus")}
    {fme:get-attribute("TopicType")}
    {fme:get-attribute("CreationDate")}
    {fme:get-attribute("CreationAuthor")}
    {fme:get-attribute("ModifiedDate")}
    {fme:get-attribute("ModifiedAuthor")}
    {fme:get-attribute("AssignedTo")}
    {fme:get-attribute("Description")}

    {fme:get-attribute("Date")}
    {fme:get-attribute("Author")}
    {fme:get-attribute("Comment")}
    {fme:get-attribute("Viewpoint.Guid")}
    {fme:get-attribute("VerbalStatus")}
    {fme:get-attribute("Status")}


5 Presentation of Results

5.1 Data Mining BCF Files

As a complementary effort, data obtained from production BCF files was analysed and is presented below. The data shown here has been anonymized for confidentiality and privacy reasons. It was by no means an exhaustive analysis, but it reveals some interesting patterns.

Note: Having a population size of 45 projects a sample size of 41 would be needed to provide a confidence level of 95% with a margin of error of 5%. No better than 18% confidence can be obtained with this sample size. Therefore, this analysis cannot provide statistical significance for the whole BIM coordination effort. Also, the time span for the selected projects is small, barely one year, which throws off any assumption that could be made regarding work evolution or work patterns.
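The required sample size quoted in the note can be reproduced with a short calculation. The thesis does not state which formula was used; the Python sketch below assumes Cochran's formula with a finite population correction, a z-score of 1.96 for the 95% confidence level and the most conservative proportion of 0.5. Under those assumptions the result matches the figure of 41.

```python
import math


def required_sample_size(population, z=1.96, margin=0.05, proportion=0.5):
    """Cochran's sample size with finite population correction, rounded up."""
    # Sample size for an infinitely large population.
    n_infinite = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Correction for a small, finite population.
    n_finite = n_infinite / (1 + (n_infinite - 1) / population)
    return math.ceil(n_finite)


print(required_sample_size(45))  # prints 41
```

With only 45 projects in the population, the correction brings the uncorrected value of about 384 down to nearly the whole population, which is why a sample of five projects cannot support statistical conclusions.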

Information has been classified in three interest groups: Topics, regarding BCF topic information; Comments, regarding BCF comment information; and Disciplines & References, regarding design discipline and linguistic references.

For this, twenty-five BCFZIP files spanning five BIM construction projects, with a total size of 237 MB of data, were analysed. 3,825,722 items of information were parsed, resulting in 835 topics and 671 comments being explored. The project fields were Hospital, Commercial, Industrial and two Infrastructure. They range from full collaboration from project beginning to end to limited late-stage collaboration efforts. All projects come from the backlog of projects completed by the BIM unit’s predecessor in the last two fiscal years before the date of this thesis, but barely span a 12-month period.

Topic and comment information

Of the 825 topics inspected, 66% had “Error” as a topic type. Only one item had “Warning” as topic type. The rest had no information at all, or the information holder was missing for some reason, or it was of another category. The official BCF specification considers at least four possibilities. Regardless of what the holder could contain, this seems a clear sign of the communication problem the AEC industry is riddled with. Using one label exclusively probably does not clearly define the wide variety of problems occurring during BIM coordination. On the other hand, the quick pace associated with projects might cause a tendency to concentrate on the real problems and thus limit communication interaction.

Topic status is, in its vast majority, “Open”. Considering that nearly all projects analysed were closed, this seems to indicate that the BIM coordination workflow never really takes the steps to conclude the information exchange. It might very well be that all those topics that were never labelled as closed or corrected were in fact fixed and no longer present an issue in the model, but if so, it is difficult to say, as there is no actual confirmation of that happening. The BCF definition does contain enough information to represent all the states an issue could be in, but clearly they are not being used to their full potential. It could also be that there are later-stage BCF files containing those fixes and that those files never found their way to the server. Topic descriptions have a median of 99 characters and an average of 109 characters, and at a closer look they show a reasonable use of the description field. Topic titles, on the other hand, are on average longer, at 130 characters, and when looking at the actual information in them it is seen that in some cases title containers have been used to fill in information regarding tagged elements, using up to 290 characters.

Comment information shows a similar trend, with a major proportion of the comments never changing verbal status. Likewise, the status for comments is either defined as “Error” or not specified at all. The median length of comments is 72 characters, and a closer look shows that comment fields are used in a reasonable manner.

There would be a need to define an official standard to be used when filling in this data in order to achieve better results; otherwise data in its current form is difficult to clean, thus complicating any mining effort.
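The kind of figures reported above, shares per topic type and typical text lengths, can be obtained from the extracted JSON with a few lines of Python. The sketch below is an illustration rather than the analysis actually run for this thesis: the function name summarize_topics is invented and the three records are hypothetical, not data from the analysed projects.

```python
from collections import Counter
from statistics import mean, median


def summarize_topics(topics):
    """Return topic-type shares and title-length statistics for a list of topics."""
    if not topics:
        return {"type_share": {}, "median_title_length": 0, "mean_title_length": 0}
    # Unpopulated topic types are counted under a single "unspecified" bucket.
    types = Counter(topic.get("TopicType") or "unspecified" for topic in topics)
    lengths = [len(topic.get("Title") or "") for topic in topics]
    return {
        "type_share": {name: count / len(topics) for name, count in types.items()},
        "median_title_length": median(lengths),
        "mean_title_length": mean(lengths),
    }


# Hypothetical records in the JSON layout shown in 3.2.
sample_topics = [
    {"TopicType": "Error", "Title": "Beam"},
    {"TopicType": "Error", "Title": "Door clash"},
    {"TopicType": None, "Title": "Pipe through slab"},
]
summary = summarize_topics(sample_topics)
print(summary)
```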

Figure 30 Topic information

Figure 31 Comment information

A deeper look into the data showed that a large part of the information was being conveyed using the comment fields. Unfortunately, they were not used to convey status or type information but more practical information regarding the issue (topic).

Disciplines and references

Comment and title fields have been used to convey the bulk of the information, while other flag fields were ignored. In an important part of the topics, the title field was used as a comment field, in some cases ignoring the description field completely. In order to obtain better insights, title and comment fields were parsed to search for keywords that could provide a deeper look into the information.
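A keyword search of this kind can be sketched in Python as follows. The two keyword groups reuse the element names listed in this section; everything else is an assumption made for the example, including the function name element_groups, the restriction to English keywords and the very simple handling of plurals.

```python
import re

# Keyword groups based on the element names mentioned in this section.
ELEMENT_KEYWORDS = {
    "construction": ["beam", "slab", "pillar", "foundation", "roof"],
    "technological": ["cable", "tube", "channel", "window", "door", "booth"],
}


def element_groups(text):
    """Return the keyword groups mentioned in a title or comment text."""
    words = set(re.findall(r"[a-zåäö]+", text.lower()))
    groups = set()
    for group, keywords in ELEMENT_KEYWORDS.items():
        # Match the keyword itself or its simple plural form.
        if any(keyword in words or keyword + "s" in words for keyword in keywords):
            groups.add(group)
    return groups


print(element_groups("Beam clashes with cable tray"))
print(element_groups("Check levels"))
```

Counting how often each group is returned over all titles and comments gives proportions of the kind discussed in this section.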

Approximately two-thirds of the issues are tagged as related to structural or architectural parts of the model. The remaining third corresponds to electrical and plumbing. 55% of the comments relate to construction elements (beams, slabs, pillars, foundations and roofs); the remaining 45% mention technological elements (cables, tubes and channels, windows, doors and booths). A deeper look into topics with unspecified topic status showed that they refer in a higher proportion to electrical (ca. 60%) and plumbing (ca. 70%). It could be that the tools used for BIM coordination in those design disciplines do not facilitate filling in information.

Figure 32 Discipline and language analysis

Figure 33 Mentions with unspecified or other Status

Work effort

Another interesting possibility that arises from being able to mine data that otherwise might get lost is to obtain data regarding work patterns. The information analysed here was not linked to other model information such as model size, construction type etc., nor is it linked to macro-economic data or workforce information, so it is difficult to infer conclusions. Nonetheless, a couple of patterns showed up.

1. Different coordinators work more effectively at different moments.
2. A closer look into project-specific data showed a tendency to flag issues early in the project phase or in the late stages.

No conclusions have been proposed regarding this data as they do not provide enough statistical significance and they are just presented as patterns seen in the limited data and as example of what is possible by mining BIM coordination data.


5.2 Summary of general results

Results are presented as a list of items in no order of importance:

1. BCFZIP files are a zipped archive of folders containing XML information. 2. .bcf files are structured following the buildingSMART BCF definition. 3. Solibri exported BCF files only contain a portion of the defined data in the BCF definition. 4. Solibri BCF data can be enhanced if needed from other sources but it will not be seen in SMC. 5. FME requires of proper schema definition when dealing with varied data. This means that if support would be to be provided to other BCF definitions rather than 2.1. a similar effort needs to be done in order to support it. 6. Likewise, it will probably be problematic to give support for different BCF readers. 7. There is a lack of standardization in how data is filled or a failure to fill in properly data. This vastly complicated data mining. 8. Valuable information can be implied and obtained by mining BCF files.

9. Maintaining complete BCF information, including images, is somewhat problematic with FME and was not achieved during the work for this thesis. Implementation should come in the form of a coded app following Microsoft’s documentation (Microsoft, 2019) or by integrating Microsoft Blob storage into the solution.

10. Otherwise uploading document information and presenting it with the tools provided works seamlessly.
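The first result can be illustrated with a minimal sketch that uses Python's standard zipfile and xml.etree modules instead of the FME workspace used in this thesis. The archive is built in memory as a made-up stand-in for a real export. The folder name topic-0001 and the helper names are hypothetical. In real exports each topic folder is expected to be named after the topic GUID.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_topic_titles(bcfzip_bytes):
    """Map each topic folder name to the Title found in its markup.bcf."""
    titles = {}
    with zipfile.ZipFile(io.BytesIO(bcfzip_bytes)) as archive:
        for name in archive.namelist():
            # Only the markup file of each topic folder is of interest here.
            if not name.endswith("/markup.bcf"):
                continue
            folder = name.split("/")[0]
            root = ET.fromstring(archive.read(name))
            titles[folder] = root.findtext("Topic/Title")
    return titles


def build_demo_archive():
    """Build a tiny in-memory archive; folder name and content are made up."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("bcf.version", '<Version VersionId="2.1"/>')
        archive.writestr(
            "topic-0001/markup.bcf",
            "<Markup><Topic><Title>MEP</Title></Topic></Markup>",
        )
    return buffer.getvalue()


print(read_topic_titles(build_demo_archive()))
```

Reading a file from disk would only require passing its bytes to read_topic_titles. Snapshots and viewpoint files in the same folders are ignored by this sketch.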


6 Conclusion and Further Steps

Parsing through the BCF information is a tedious process that can be hugely simplified by applying automated workflows. This automation needs to handle every single element contained in the files, or it will not work properly as a substitute for a BCF server. Unfortunately, maintaining complete conformance to the BCF definition is more complex than what can easily be achieved with FME for BIM-server-like functionality. It would also mean that third parties would need access to FME. FME Server and FME Cloud would probably be a much better solution in this case, as they would make it easy to share workspaces without sharing FME scripts, but this cannot be asserted, as they are not part of the currently available tools and their capabilities have not been explored.

Considering the problems with images and changes in data, and the effort required to maintain compatibility, it would be of much greater interest to integrate available tools like BIM-server (van Berlo & Krijnen, 2014), an open source effort by the Netherlands Organization for Applied Scientific Research (TNO) and the Eindhoven University of Technology, into the current workflow. This would provide software vendor independence but would require a bigger internal effort to keep up and maintain. No exploration of BIM-server or its capabilities has been done.

Regarding the main objective of this thesis, to automate the extraction of BCF data to enhance QC/QA communication in BIM coordination, it has been shown not only that it is possible, but also that valuable information can be obtained. Extracting and enhancing BCF data would increase information accessibility with minimal effort. It is worth noting that the insight obtained from extracting BIM coordination information will only be as good as the information put in; therefore, the following steps are recommended, in order.

1. To establish the market feasibility of providing this data. To this end a proper, well formatted, sufficiently big project should be used as an example. Automating processes is time consuming and bigger gains would be obtained from “more data” which means bigger BIM projects.

2. To standardize and enforce how data fields are being filled. Engaging in data collection is futile if no data is generated or if the data is of bad quality.

3. If support for cloud-served BCF information is desired in PowerBI dashboards, user access and data security must be explored. Access to FME 2019 is required. If no BCF-server-like compatibility is required, data extraction and enhancement could be greatly simplified.

4. To actively gather and prepare BIM coordination data currently produced and to collate it with other business information for further insight discovery.

5. To explore the possibility of further extracting and collating information from the IFC relevant files for richer insights. Collating SharePoint document information and other economic data would be beneficial.

6. To explore the feasibility of using open BIM-server in the current workflow unless using a commercial provider of such services is found to be the best business solution.

7. To actively seek collaboration with Solibri for better BCF support and to devise a method to enhance BCF information.

Lastly, it is of interest to expand on point number two. Even though the information contained in BCF files relates to technical problems, it is stored and presented as text. In other words, technical information is communicated in language form, not mathematically, so linguistics plays a bigger role here than mathematical and numerical expressions. This presents a problem that should be addressed from the very beginning if communication problems are to be solved when using BCF files. Adopting the priority tag, currently not supported by Solibri, would help to convey important information in a numerical way. A well-defined standard could also fill in the gaps and help to convey better information. This standardization effort should be of the highest priority.
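A minimal sketch of how such a standard could be enforced automatically is given below. It assumes that topics have already been extracted into simple records. The list of required fields and the numerical priority scale are hypothetical examples, since no such standard has been agreed in this thesis.

```python
# Hypothetical project rules; a real standard would be agreed by the project team.
REQUIRED_FIELDS = ("Title", "TopicStatus", "AssignedTo", "Description")
ALLOWED_PRIORITIES = {"1", "2", "3"}


def find_problems(topic):
    """Return a list of human-readable problems for one topic record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Missing keys and whitespace-only values both count as empty.
        if not (topic.get(field) or "").strip():
            problems.append(f"{field} is empty")
    priority = topic.get("Priority")
    if priority is not None and priority not in ALLOWED_PRIORITIES:
        problems.append(f"Priority '{priority}' is not in the agreed scale")
    return problems


# Illustrative record with one empty field and a free-text priority.
topic = {
    "Title": "MEP",
    "TopicStatus": "",
    "AssignedTo": "MEP",
    "Description": "Description text regarding the issue in the model.",
    "Priority": "urgent",
}
print(find_problems(topic))
```

A check of this kind could run before the data is uploaded, so that incomplete topics are returned to the coordinator instead of silently degrading the dashboards.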


7 Bibliography

AbdulLateef, O., Seong, Y. T. & Lee, F. K., 19-22 June 2017. Roles of communication on performance of the construction sector. Primosten, Croatia, Elsevier.

Autodesk, 2018. New Research from PlanGrid and FMI Identifies Factors Costing the Construction Industry More Than $177 Billion Annually. [Online] Available at: https://www.plangrid.com/press/fmi/ [Accessed 01 08 2018].

Boton, C. & Forgues, D., 2018. Practices and Processes in BIM Projects: An Exploratory Case Study. Advances in Civil Engineering, 6 August, Volume 2018.

buildingSMART, 2017. BCF Documentation. [Online] Available at: https://github.com/buildingSMART/BCF-XML/tree/release_2_1/Documentation [Accessed 26 08 2019].

Business Dictionary, n.d. [Online] Available at: http://www.businessdictionary.com/definition/information-silo.html [Accessed 19 08 2019].

Davies, K., Wilkinson, S. & McMeel, D., 2017. A review of specialist role definition in BIM guides and standards. Electronic Journal of Information Technology in Construction, Volume 22, p. 185–203.

Froese, T. M., 2010. The impact of emerging information technology on project management for construction. Automation in Construction, August, 19(5), pp. 531-538.

Hoezen, M., Reymen, I. & Dewulf, G., 2006. The problem of communication in construction, Enschede, The Netherlands: ResearchGate.

International Organization for Standardization, 2018. Organization and digitization of information about buildings and civil engineering works, including building information modelling (BIM). [Online] Available at: https://www.iso.org/standard/68078.html [Accessed 26 08 2019].

Jones, S. a. H. B., 2012. The Business Value of BIM in North America. s.l.:Smart Market.

Langar, S. & Criminale, A., 2017. Challenges with BIM Implementation: A Review of Literature. Seattle, Wa., Associated Schools of Construction.

Liu, Y., Nederveen, S. & Hertogh, M., 2017. Understanding effects of BIM on collaborative design and construction: An empirical study in China. International Journal of Project Management, 35(4), pp. 686-698.

Maankäyttö- ja rakennuslaki 5.2.1999/132, 1999. [Online] Available at: http://finlex.fi/fi/laki/ajantasa/1999/19990132 [Accessed 18 09 2019].

McKinsey Global Institute, 2016. Imagining construction’s digital future. [Online] Available at: https://www.mckinsey.com/industries/capital-projects-and-infrastructure/our-insights/imagining-constructions-digital-future [Accessed 17 08 2019].

Microsoft, 2019. Azure Documentation. [Online] Available at: https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.client.documentclient.createattachmentasync?redirectedfrom=MSDN&view=azure-dotnet#overloads [Accessed 26 08 2019].

Microsoft, 2019. Cosmos DB Documentation. [Online] Available at: https://docs.microsoft.com/en-us/azure/cosmos-db/ [Accessed 26 08 2019].

Microsoft, 2019. DAX Guide. [Online] Available at: https://dax.guide/userprincipalname/ [Accessed 18 06 2019].

Microsoft, 2019. Migrate non-partitioned containers to partitioned containers. [Online] Available at: https://docs.microsoft.com/bs-latn-ba/azure/cosmos-db/migrate-containers-partitioned-to-nonpartitioned [Accessed 19 08 2019].

Microsoft, 2019. PowerBI Documentation. [Online] Available at: https://docs.microsoft.com/en-us/power-bi/ [Accessed 26 08 2019].

Pellinen, P., 2016. Developing design process management in BIM based project involving infrastructure and construction engineering, https://aaltodoc.aalto.fi/handle/123456789/19964: Aalto University.

Poirier, E., Forgues, D. & Staub-French, S., 2017. Understanding the impact of BIM on collaboration: a Canadian case study. Building Research and Information, 45(6), pp. 681-695.

Rakennuslehti, 2017. Rakennusalalla työn tuottavuus ei ole kasvanut 40 vuodessa – onko allianssista tai leanista apua?. [Online] Available at: https://www.rakennuslehti.fi/2017/09/rakennusalalla-tyon-tuottavuus-ei-ole-kasvanut-40-vuodessa-onko-allianssista-tai-leanista-apua/ [Accessed 15 09 2019].

SAFE software, 2019. FME Documentation. [Online] Available at: https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_ReadersWriters/documentdb/format_parameters_w.htm [Accessed 08 2019].

Solibri, I., 2012. COBIM: Common BIM Requirements. s.l.:COBIM Project.

Stephen A. Jones, Harvey M. Bernstein, 2012. Smart Market Report: The Business Value of BIM in North America. Design and Construction Intelligence.

Tohmo, S., 2019. BIM Coordination in Ramboll Finland [Interview] (01 06 2019).

University of Cambridge Dictionary (on-line), n.d. [Online] Available at: https://dictionary.cambridge.org/dictionary/english/information-overload [Accessed 17 08 2019].

van Berlo, L. & Krijnen, T., 2014. Using the BIM Collaboration Format in a server based workflow. s.l., Elsevier.

Winch, G. M., 2010. Managing Construction Projects: An Information Processing Approach. 2nd ed. Hoboken, NJ, USA: Blackwell Publishing.

World Economic Forum, The Boston Consulting, 2016. Shaping the Future of Construction: A breakthrough in Mindset and Technology, s.l.: World Economic Forum.


Appendices

A1 Definitions

Information overload: A situation in which you receive too much information at one time and cannot think about it in a clear way (University of Cambridge Dictionary (on-line), n.d.)

Information silo: Any information management system that is unable to communicate with other information management systems, even if otherwise related or within the same organization. This can be by design or by choice for a variety of reasons, though nowadays generally frowned upon because of the lack of accessibility and implied limitations to productivity. (Business Dictionary, n.d.)

GUID (UUID): 128-bit number used to identify information in computer systems. When generated following the standards, GUIDs are for all practical purposes unique identification tags with a negligible probability of duplication.

UPSERT: Database operation that UPdates information already stored and inSERTs any data not yet stored.
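The last two definitions can be illustrated together with a minimal sketch. A plain Python dictionary stands in for the database here, which is an assumption made for brevity and not the storage used in this thesis. The record fields and status values are illustrative.

```python
import uuid


def upsert(store, record):
    """Update the record with the same Guid, or insert it if it is absent."""
    key = record["Guid"]
    if key in store:
        # UPdate: overwrite only the fields present in the new record.
        store[key].update(record)
        return "updated"
    # inSERT: the Guid has not been seen before.
    store[key] = dict(record)
    return "inserted"


store = {}
guid = str(uuid.uuid4())  # random 128-bit identifier in text form
first = upsert(store, {"Guid": guid, "Title": "MEP", "TopicStatus": "Open"})
second = upsert(store, {"Guid": guid, "TopicStatus": "Closed"})
print(first, second, store[guid]["TopicStatus"], len(store))
```

Because the GUID is the key, uploading the same topic twice changes the stored record instead of duplicating it.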


A2 Tables

Table 1 BCF information as exported by Solibri

Content                                    Status          Information Support
File: bcf.version                          Exported        Yes
File: project.bcfp                         Not exported
  ProjectID                                                No
  Name                                                     No
  ExtensionSchema                                          No
File: markup.bcf                           Exported        Yes
  Header
    IfcProject                                             Yes
    IfcSpatialStructureElement                             No
    isExternal                                             Yes
    Filename                                               Yes
    Date                                                   Yes
    Reference                                              No
  Topic
    Guid                                                   Yes
    TopicType                                              Yes
    TopicStatus                                            Yes
    ReferenceLink                                          No
    Title                                                  Yes
    Priority                                               No
    Index                                                  Yes
    Labels                                                 No
    CreationDate                                           Yes
    CreationAuthor                                         Yes
    ModifiedDate                                           Yes
    ModifiedAuthor                                         Yes
    DueDate                                                No
    AssignedTo                                             Yes
    Description                                            Yes
    Stage                                                  No
  BIMsnippet
    SnippetType                                            No
    IsExternal (optional)                                  No
    Reference                                              No
    ReferenceSchema                                        No
  DocumentReference
    Guid                                                   No
    IsExternal (optional)                                  No
    ReferencedDocument                                     No
    Description                                            No
  RelatedTopic (optional)
    RelatedTopic/GUID                                      No
  Comment
    Date                                                   Yes
    Author                                                 Yes
    Comment                                                Yes
    Viewpoint                                              Yes
    ModifiedDate                                           No
    ModifiedAuthor                                         No
  Viewpoints
    Viewpoint                                              Yes
    Snapshot                                               Yes
    Index                                                  Yes
File: Visualization information (.bcfv)    Exported        Yes
  Components
    IfcGuid                                                Yes
    OriginatingSystem                                      Yes
    AuthoringToolId                                        Yes
  OrthogonalCamera                                         No
  PerspectiveCamera
    CameraViewPoint                                        Yes
    CameraDirection                                        Yes
    CameraUpVector                                         Yes
    ViewToWorldScale                                       Yes
  Lines                                                    Yes
  ClippingPlanes                                           Yes
  Bitmap                                                   Yes


A3 BCF file structure

Markup file formatting and structure example

Edited for brevity. To be used only as an example.

C:\ProjectFolder\IFC\MEP\MEP.ifc 2019-04-24T08:47:39+03:00
C:\ProjectFolder\IFC\HVAC\HVAC.ifc 2019-04-26T16:10:25+03:00
C:\ProjectFolder\IFC\ARCH\ARCH.ifc 2019-04-26T10:03:25+03:00
C:\ProjectFolder\IFC\STR\STR.ifc 2019-03-26T12:27:10+02:00
C:\ProjectFolder\IFC\ELEC\ELEC.ifc 2019-04-08T16:06:15+03:00

MEP
28
2018-12-12T16:40:02+02:00 [email protected]
2019-04-29T14:28:57+03:00 [email protected]
MEP
Description text regarding the issue in the model. 15.3.2019

2019-04-29T14:28:57+03:00 [email protected] Comment regarding closing the issue by the BIM coordinator
2019-04-26T16:48:42+03:00 [email protected] Specialist comment regarding the issue and solution
2019-01-16T10:06:55+02:00 [email protected] Initial comment regarding closing the issue by the BIM coordinator

viewpoint.bcfv snapshot.png 0


Figure 34 Markup.bcf tree structure
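The tree structure can be flattened into simple records with a few lines of code. The sketch below is an illustration under stated assumptions and not the FME workspace used in this thesis. The element names follow Table 1, the XML is a shortened version of the example above, and the author addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative markup; author addresses are placeholders.
MARKUP = """
<Markup>
  <Topic>
    <Title>MEP</Title>
    <Index>28</Index>
    <CreationDate>2018-12-12T16:40:02+02:00</CreationDate>
    <AssignedTo>MEP</AssignedTo>
    <Description>Description text regarding the issue in the model.</Description>
  </Topic>
  <Comment>
    <Date>2019-01-16T10:06:55+02:00</Date>
    <Author>coordinator@example.com</Author>
    <Comment>Initial comment by the BIM coordinator</Comment>
  </Comment>
  <Comment>
    <Date>2019-04-26T16:48:42+03:00</Date>
    <Author>specialist@example.com</Author>
    <Comment>Specialist comment regarding the issue and solution</Comment>
  </Comment>
</Markup>
"""


def flatten_markup(xml_text):
    """Return the topic fields and the list of comments as plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    topic = {child.tag: (child.text or "").strip() for child in root.find("Topic")}
    # findall only matches the Comment elements directly under Markup.
    comments = [
        {child.tag: (child.text or "").strip() for child in comment}
        for comment in root.findall("Comment")
    ]
    return {"topic": topic, "comments": comments}


record = flatten_markup(MARKUP)
print(record["topic"]["Title"], len(record["comments"]))
```

Attributes such as the topic Guid and TopicStatus are not read by this sketch. They would be available through the attrib dictionary of the corresponding element.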

Visualization information file formatting and structure example

Edited for brevity. To be used only as an example.

Autodesk Revit 2018 (ENU) 6530118

43.74260720185285 49.24594138728482 26.980588620097862
0.9776150899404781 -0.20374505078485286 0.0525041922264222
-0.051399786142067076 0.010712244671349801 0.9986207036701428
60.0

36.779159366198634 21.659073784070543 31.469038641480623
36.78830514445061 21.185525743716546 31.135907529698237
42.102494359999994 33.588050611563396 29.92
42.095766615800706 33.742504156223234 30.0746

45.462054261051016 0.8780225697631043 0.0
-0.9998135502620163 -0.019309705136600516 -0.0

PNG 171e63aa-d5af-46f4-b398-b7a8fac5636a/bitmaps-a0a9e2e2-c733-4124-b051-33f003840d9f-0.png
28.209298923616686 31.96636417496406 27.36675
-0.9364350059127844 0.350841103209307 0.0
-0.350841103209307 -0.9364350059127844 -0.0
154.6800000000019


Figure 35 Visualizationinfo.bcfv tree structure
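As a minimal sketch of how the camera information could be read, the snippet below parses a perspective camera and checks that its direction and up vectors are plausible. It assumes that the first three number triples of the example above correspond to CameraViewPoint, CameraDirection and CameraUpVector as named in Table 1, and that each vector is stored as X, Y and Z child elements. Both are assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Illustrative viewpoint; the mapping of values to elements is an assumption.
VISINFO = """
<VisualizationInfo>
  <PerspectiveCamera>
    <CameraViewPoint>
      <X>43.74260720185285</X><Y>49.24594138728482</Y><Z>26.980588620097862</Z>
    </CameraViewPoint>
    <CameraDirection>
      <X>0.9776150899404781</X><Y>-0.20374505078485286</Y><Z>0.0525041922264222</Z>
    </CameraDirection>
    <CameraUpVector>
      <X>-0.051399786142067076</X><Y>0.010712244671349801</Y><Z>0.9986207036701428</Z>
    </CameraUpVector>
  </PerspectiveCamera>
</VisualizationInfo>
"""


def read_vector(parent, name):
    """Read the X, Y and Z children of the named element as floats."""
    node = parent.find(name)
    return tuple(float(node.findtext(axis)) for axis in ("X", "Y", "Z"))


def read_camera(xml_text):
    """Return the three camera vectors of the perspective camera."""
    camera = ET.fromstring(xml_text.strip()).find("PerspectiveCamera")
    return {
        name: read_vector(camera, name)
        for name in ("CameraViewPoint", "CameraDirection", "CameraUpVector")
    }


camera = read_camera(VISINFO)
direction = camera["CameraDirection"]
up = camera["CameraUpVector"]
# A well-formed camera has a unit-length direction perpendicular to the up vector.
length = math.sqrt(sum(c * c for c in direction))
dot = sum(d * u for d, u in zip(direction, up))
print(round(length, 6), round(dot, 6))
```

A check of this kind could be used to detect incomplete or corrupted viewpoints before they are stored.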