Megaproject Front-End Engineering Design strongly influences lifecycle value, yet it often lacks systematic methods to integrate multi-dimensional value drivers, including Environmental, Social, and Governance factors, using advanced analytics. This research aimed to develop foundational knowledge and a methodological framework to address this gap. The study employed a quantitative approach using panel data (circa 2009-2023), merging country-level Environmental, Social, and Governance indicators from the World Bank with house price indices for Organisation for Economic Co-operation and Development countries as an economic performance proxy. Analyses included multicollinearity assessment using the Variance Inflation Factor, panel data regression (Pooled Ordinary Least Squares, Fixed Effects, Random Effects with cluster-robust errors), and the development of a machine learning-based Automated Valuation Model using Random Forest with lagged predictors. Uncertainty quantification for the Automated Valuation Model was performed using Conformal Prediction. The Fixed Effects model (preferred via diagnostic tests; within-country coefficient of determination = 0.59) identified significant within-country correlations between house price indices and specific Environmental, Social, and Governance and economic factors (e.g., coastal protection, literacy rate, economic/social rights performance, energy imports/use, internet adoption, demographics). The Random Forest Automated Valuation Model achieved strong predictive performance on test data (coefficient of determination = 0.87, root mean squared error = 6.88), with lagged indicators contributing significantly. Conformal Prediction reliably generated 90% prediction intervals with 90.8% empirical coverage.
The study demonstrates the feasibility of a quantitative framework integrating diverse Environmental, Social, and Governance and economic factors using panel regression and machine learning with uncertainty quantification for analysis relevant to megaproject Front-End Engineering Design. This provides essential groundwork for developing future automated, data-driven decision support tools to enhance holistic value assessment.
AECO: Architecture, Engineering, Construction, and Operations; AI: Artificial Intelligence; AVM: Automated Valuation Model; BIM: Building Information Modeling; CAPEX: Capital Expenditures; DLT: Distributed Ledger Technology; DSS: Decision Support Systems; ESG: Environmental, Social, and Governance; FE: Fixed Effects; FEED: Front-End Engineering Design; GCPs: Green Construction Practices; GDP: Gross Domestic Product; GPI: Gender Parity Index; HPI: House Price Index; ILO: International Labour Organization; IoT: Internet of Things; LCA: Lifecycle Assessment; LCCA: Lifecycle Cost Analysis; MCDA: Multi-Criteria Decision Analysis; ML: Machine Learning; NLP: Natural Language Processing; OECD: Organisation for Economic Co-operation and Development; OLS: Ordinary Least Squares; OPEX: Operational Expenditures; PPP: Purchasing Power Parity; RE: Random Effects; RMSE: Root Mean Squared Error; SE: Standard Error; UQ: Uncertainty Quantification; VIF: Variance Inflation Factor; VM: Value Management
Megaprojects, which include large-scale construction, manufacturing facilities, and infrastructure systems, are essential to contemporary economic development but present considerable management challenges [1-6]. These initiatives are marked by significant financial investments, extended timelines, technical complexity, and multifaceted stakeholder interactions [7]. As a result, they frequently experience cost overruns and schedule delays, often failing to deliver their expected long-term benefits [1,8]. Thus, effective risk management and robust decision-making processes are crucial for success [9,10].
The Front-End Engineering Design (FEED) phase is particularly influential in shaping a megaproject's lifecycle outcomes [8,11]. Decisions regarding project scope, technology, and design made during this stage profoundly impact subsequent costs, performance, and overall value realization. Errors or inadequate analyses during FEED can lead to significant difficulties and expenses in later phases [1,12].
The understanding of “value” in this context has shifted from conventional metrics focused solely on time, cost, and scope [11]. Modern perspectives advocate for a multi-dimensional approach, requiring a balance between initial capital expenditures (CAPEX), long-term operational costs (OPEX), technical performance, and increasingly, Environmental, Social, and Governance (ESG) factors [13,14]. Growing societal expectations and the rise of sustainable finance have made environmental stewardship, social responsibility, and effective governance critical components influencing project evaluations and stakeholder acceptance [15-17].
Despite this expanded definition of value, notable gaps persist in practical implementation during the FEED phase. There is often an absence of systematic methods to quantify the various factors (value drivers) that shape lifecycle performance and satisfy diverse stakeholder priorities [9]. Additionally, early-stage decisions typically lack the rigorous, data-driven analysis necessary to assess complex trade-offs, such as balancing upfront costs with long-term sustainability benefits [18]. The digital landscape remains fragmented, with tools like Building Information Modeling (BIM) and specialized simulation software available, yet integrated platforms for comprehensive, automated evaluation and design optimization are scarce [19,20]. This fragmentation leads to reliance on manual and subjective assessment methods, limiting the ability to efficiently explore a broad range of design options [12]. Moreover, the potential of advanced technologies such as Artificial Intelligence (AI), Machine Learning (ML), and sophisticated data analytics to enhance decision-making processes in FEED remains largely underutilized in the industry [21-23].
This research addresses the central problem arising from these gaps: the need for an integrated, computationally enabled framework to facilitate multi-objective value assessment and optimization during the critical front-end planning of megaprojects.
The purpose of this study is to develop foundational knowledge and propose a methodological framework for future automated digital tools aimed at improving value-driven decision-making in megaproject FEED. The goal is to clarify how various factors, particularly economic performance indicators and relevant ESG metrics, interact and can be quantitatively assessed, taking into account contextual influences such as regulations and institutional norms. Insights will be drawn from Institutional Theory, the Porter Hypothesis, and Regulatory Capture Theory.
This study aims to answer the following research questions:
What quantifiable economic, social, and environmental factors (available as country-level indicators) significantly correlate with variations in national house price indices over time, while controlling for country-specific fixed effects?
How effectively can machine learning models (serving as Automated Valuation Models - AVMs) predict national house price indices using lagged indicators, and how reliably can the associated prediction uncertainty be quantified using methods like Conformal Prediction?
What are the key data, modeling (e.g., panel regression, ML), and computational considerations (e.g., managing multicollinearity, implementing uncertainty quantification) necessary for developing an integrated evaluation approach?
This research employs a quantitative methodology, analyzing a constructed panel dataset that merges country-level ESG indicators from the World Bank [2] and OECD house price indices [3] (c. 2009-2023). Figure 1 provides a schematic representation of the proposed framework, illustrating the key stages and analytical components involved in this study. Key analytical steps include data preprocessing (imputation checks, lagging), multicollinearity assessment (Variance Inflation Factor), panel data regression (Pooled OLS, Fixed Effects, Random Effects with cluster-robust errors), and the development of a machine learning-based Automated Valuation Model (AVM) using Random Forest with lagged predictors. Uncertainty quantification (UQ) for the AVM was performed using Conformal Prediction. The study emphasizes the decision context of the FEED stage for large-scale construction, manufacturing, and infrastructure projects. While aiming for broader applicability, the empirical analysis utilizes cross-country data, recognizing potential limitations in generalizing findings without considering specific project-level or regional data. This work focuses on establishing the conceptual and methodological foundation of the framework rather than on developing a final software tool.
This study contributes to the academic field by integrating diverse literature streams (project management, value assessment, digital technology, ESG, and relevant theories) and applying advanced panel data and machine learning techniques (including uncertainty quantification) to the megaproject FEED context. It provides empirical insights into the complex relationships between country-level ESG factors and economic performance proxies. Practically, the research lays the groundwork for developing advanced digital decision support tools, which could enhance stakeholder alignment, risk management, and overall value delivery in complex projects. Additionally, it offers insights relevant to policymakers regarding the interplay of regulation, sustainability, and project outcomes.
Understanding value delivery in megaprojects, particularly during the influential front-end planning (FEED) stages where complex trade-offs involving economic, social, environmental, and technical factors are made, necessitates multiple theoretical perspectives. Decisions at this stage are rarely purely technical or economic; they are inherently embedded within complex institutional and regulatory environments and shaped by organizational capabilities and strategic responses [7,24]. This study employs three complementary theoretical lenses, Institutional Theory, the Porter Hypothesis, and the Theory of Regulatory Capture, to structure the analysis. Collectively, these frameworks help explain why certain value drivers gain prominence, how external pressures like regulations might shape project choices and potentially spur innovation, and what political-economic dynamics could influence the de facto impact of these pressures.
Institutional theory provides a robust lens for analyzing how organizations, including those involved in megaprojects, adapt to their environments to secure legitimacy and resources [25]. It posits that both formal institutions (e.g., laws and specific regulations like energy performance standards) and informal institutions (e.g., societal norms, professional ethics, and market expectations) exert powerful influences [26]. Formal regulations, such as environmental impact assessment mandates or safety protocols, directly shape project requirements and influence the weighting of value drivers in decision-making [7,27,28]. Furthermore, the perceived quality and consistent enforcement of these rules (‘rule of law’) can demonstrably impact project risks and performance [12].
Meanwhile, informal institutions like growing societal awareness of climate change and resource depletion create normative pressures favoring sustainable practices, such as green building, often exceeding minimum legal compliance [29]. Investor demands for credible Environmental, Social, and Governance (ESG) reporting and performance represent a potent mimetic and normative force [14,30]. Organizations ostensibly integrate ESG factors to maintain legitimacy, access sustainable finance markets [16], and secure the ‘social license to operate’ from stakeholders, including local communities [13]. Consequently, understanding these institutional pressures is key to explaining the increasing adoption of Green Construction Practices (GCPs) and sustainable materials [31], which may be driven by coercive, mimetic, or normative mechanisms, often varying across different institutional contexts [32,33].
Contrasting with traditional views of regulation as purely burdensome, the Porter Hypothesis suggests that well-designed, stringent environmental regulations can act as catalysts for innovation [34,35]. Theoretically, this “innovation offset” might not only reduce negative environmental impacts but also enhance resource efficiency, foster new technologies, and potentially improve a firm's or project's overall competitiveness and economic performance. The ‘weak’ version postulates that regulation stimulates specific innovations, while the ‘strong’ version posits that this leads to enhanced productivity or competitiveness [36,37].
In the megaproject domain, stringent environmental rules could conceivably push firms towards innovative, sustainable materials [31], energy-efficient technologies [38,39], and construction methods [32]. Such innovations might lead to lifecycle cost savings that offset initial compliance investments. Empirically, recent studies provide nuanced support, indicating environmental regulation can spur innovation and sometimes positively influence productivity or financial performance, though these effects are often heterogeneous depending on the type of regulation, industry context, and innovation measured [15,16,34,35,38,39,40]. This resonates with sustainable finance principles, where projects exceeding regulatory minimums might attract investment due to perceived lower transition risk and potential long-term value creation [16,40].
Conversely, the theory of regulatory capture offers a critical perspective, suggesting that regulatory bodies may become heavily influenced or effectively ‘captured’ by the industries they oversee [41,42]. This capture can arise from information asymmetries, lobbying efforts, or revolving-door personnel exchanges, potentially leading to regulations that favor industry interests over broader public welfare. In the megaproject arena, this could manifest as weakened environmental or safety standards [43], ineffective enforcement regimes, or biased permitting processes [12], thereby undermining both genuine sustainability efforts and the innovation-driving potential suggested by the Porter Hypothesis [35,44]. Consequently, assessing the real-world impact of regulations on project value necessitates considering the potential for capture.
Integrating these three theories provides a multi-layered understanding: Institutional theory sets the broad context of norms and rules defining value; the Porter Hypothesis highlights a potential positive dynamic linking regulation, innovation, and value; while Regulatory Capture introduces a critical lens on how political-economic factors can mediate or undermine these processes. This integrated perspective allows an analysis that moves beyond purely technical assessments to consider how value is socially constructed, institutionally shaped, and potentially politically influenced, guiding the examination of relevant literature.
The literature consistently highlights the unique challenges of megaprojects stemming from their scale, complexity, duration, and stakeholder diversity [1,6,7]. These factors inherently contribute to heightened risk exposure across technical, financial, social, and environmental dimensions [24,45]. Performance issues, particularly cost and schedule overruns, are persistent problems often rooted in the FEED phase [1,8]. Effective FEED, involving robust feasibility studies, clear scope definition, stakeholder alignment, and early risk assessment, is indisputably critical for mitigating these issues and maximizing lifecycle value [11,12], though practical implementation often falls short due to various pressures and biases [1].
Value Management (VM) provides systematic methodologies for function-cost analysis to enhance project value [46]. However, the definition of ‘value’ itself has progressively expanded beyond direct economic returns to encompass lifecycle performance, functionality, maintainability, safety [28], stakeholder satisfaction [47], and ESG criteria [17,48]. Identifying and quantifying these multi-dimensional value drivers remains a significant challenge, requiring diverse metrics and methods, from financial calculations to simulations and qualitative stakeholder input [49]. While Lifecycle Cost Analysis (LCCA) and Lifecycle Assessment (LCA) are established tools [50], achieving truly integrated, quantitative lifecycle value assessment that balances all dimensions remains an area for development.
Decision Support Systems (DSS), particularly those employing Multi-Criteria Decision Analysis (MCDA) methods like AHP, ANP, ELECTRE, EDAS, and BWM, offer structured approaches to navigate complex trade-offs inherent in FEED [9,10,38,49,50,51]. Furthermore, digital technologies are rapidly evolving in the Architecture, Engineering, Construction, and Operations (AECO) sector. Building Information Modeling (BIM) serves as a central data repository and enabler for various analyses [19,52,53]. Advanced simulation tools support performance-based design [18], while the application of Data Analytics, AI, and ML is growing for prediction, optimization, risk assessment, and text mining [20-23,54]. The emergence of PropTech brings many of these innovations (BIM, IoT, AI/ML) under one umbrella relevant to the project lifecycle [38,55]. Simultaneously, intersections with FinTech (e.g., LendTech for finance, InsurTech for risk) and RegTech (compliance automation) are becoming increasingly important [20]. Technologies like Blockchain/DLT are also being explored for enhanced transparency and contract automation [56]. Nonetheless, a key challenge identified in the literature is the fragmentation of these tools and the lack of integrated platforms capable of holistic, automated value assessment and optimization, especially during FEED [20].
The push for sustainability is evident through the adoption of Green Construction Practices (GCPs), the use of sustainable materials, and green building certifications [27,31,32]. Beyond project-specific practices, broader ESG frameworks are increasingly applied to real asset investments, emphasizing comprehensive reporting across environmental, social, and governance dimensions [14]. Moreover, a significant body of research explores the financial materiality of ESG performance. Strong ESG credentials, sometimes linked to regulatory pressure (as suggested by the Porter Hypothesis), are increasingly associated with potential benefits like enhanced financial performance, innovation, reduced risk perception, and improved access to sustainable finance [32,37,40].
Synthesizing the literature reveals considerable progress in understanding megaproject complexities, evolving value concepts, specific digital tools, and ESG integration. However, a critical research niche exists at the intersection of these areas. There is a persistent lack of integrated frameworks and methodologies capable of holistically and quantitatively evaluating multi-dimensional value drivers during FEED, explicitly considering trade-offs and contextual factors. Crucially, the potential for automating this complex evaluation and design optimization process using AI/ML within an integrated digital environment (linking project data with PropTech/FinTech functionalities) remains largely unexplored territory for megaproject FEED. This research directly targets this niche by seeking to develop the necessary knowledge base and methodological framework to bridge these gaps, informed by empirical analysis and grounded in relevant theoretical perspectives.
This study employed a quantitative methodology using secondary panel data. The design involved merging country-level time-series data on Environmental, Social, and Governance (ESG) indicators from the World Bank [2] with national house price indices from OECD countries [3]. Subsequent analyses included data preprocessing, multicollinearity assessment, panel data regression modeling (Pooled OLS, Fixed Effects, Random Effects), and machine learning-based predictive modeling (Automated Valuation Model - AVM) with uncertainty quantification. The proposed methodological framework, illustrating the key stages and analytical components involved in this study, is schematically represented in Figure 1.
Two primary datasets were sourced from the World Bank and OECD databases. Country-level ESG and macroeconomic indicators for the period c. 2008-2023 were obtained from the World Bank Databank’s “Environment, Social and Governance” collection [2]. House price data, namely the “Real house price indices (2015 = 100)” series for OECD member and partner countries covering the period c. 2009-2024, were extracted from the OECD Data Explorer platform (“Analytical house prices indicators” dataset) [3].
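The merging and lagging steps can be illustrated with a minimal pandas sketch. The column names (country, year, gini_index, house_price_index), the toy values, and the one-year lag are assumptions made for illustration only; they are not the study's actual schema or data.

```python
import pandas as pd

# Toy stand-ins for the two sources; names and values are hypothetical.
esg = pd.DataFrame({
    "country": ["AUS", "AUS", "AUS", "BEL", "BEL", "BEL"],
    "year": [2009, 2010, 2011, 2009, 2010, 2011],
    "gini_index": [34.0, 34.4, 34.1, 28.6, 28.4, 28.1],
})
hpi = pd.DataFrame({
    "country": ["AUS", "AUS", "AUS", "BEL", "BEL", "BEL"],
    "year": [2009, 2010, 2011, 2009, 2010, 2011],
    "house_price_index": [80.0, 88.0, 85.0, 95.0, 97.0, 99.0],
})

# Inner join keeps only the country-years present in both sources.
panel = esg.merge(hpi, on=["country", "year"], how="inner")
panel = panel.sort_values(["country", "year"]).reset_index(drop=True)

# Lag within each country so values never leak across countries.
# shift(1) moves by one row, so this assumes consecutive years per country.
lag_cols = ["gini_index", "house_price_index"]
lagged = panel.groupby("country")[lag_cols].shift(1).add_suffix("_lag1")
panel = pd.concat([panel, lagged], axis=1)

# The first year of each country has no lagged value and is dropped.
model_data = panel.dropna(subset=list(lagged.columns)).reset_index(drop=True)
print(model_data)
```

Grouping by country before shifting is the important design choice here: a plain shift on the stacked panel would carry the last year of one country into the first year of the next.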
Multicollinearity among the contemporaneous independent variables intended for OLS-based panel models was assessed using the Variance Inflation Factor (VIF), calculated via the statsmodels.stats.outliers_influence.variance_inflation_factor function after adding a constant (sm.add_constant). An iterative procedure was applied: the predictor with the highest VIF above a threshold of 10.0 was removed, and VIFs were recalculated until all remaining predictors were below this threshold. This resulted in a reduced set of predictors (X_panel_processed) used for subsequent panel regressions.
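The iterative pruning can be sketched as follows. To stay dependency-light, this sketch computes each VIF directly from its definition, VIF_j = 1 / (1 - R_j²), using NumPy least squares instead of the statsmodels function named above; the helper names vif_table and drop_high_vif and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF per column: regress it on the other columns plus an intercept."""
    values = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        values[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(values)

def drop_high_vif(X: pd.DataFrame, threshold: float = 10.0):
    """Remove the worst predictor, recompute, repeat until max VIF <= threshold."""
    X = X.copy()
    dropped = []
    while X.shape[1] > 1:
        vifs = vif_table(X)
        if vifs.max() <= threshold:
            break
        worst = vifs.idxmax()
        dropped.append(worst)
        X = X.drop(columns=worst)
    return X, dropped

# Synthetic example: c is almost exactly a + b, so it is nearly collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.05, size=500)
X_demo = pd.DataFrame({"a": a, "b": b, "c": c})

X_reduced, dropped = drop_high_vif(X_demo, threshold=10.0)
print(dropped, vif_table(X_reduced).round(2).to_dict())
```

Recomputing all VIFs after every removal matters because dropping one collinear predictor usually lowers the VIFs of the predictors it was entangled with.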
All analyses were performed using Python, primarily with the statsmodels, linearmodels [4], and scikit-learn [5] libraries. To analyze the contemporaneous association between ESG/macro factors and house price indices while controlling for country differences, standard panel data models were estimated using the linearmodels library. The dependent variable was the house price index, and the independent variables were the VIF-reduced set of contemporaneous indicators (X_panel_processed). The Pooled OLS model was estimated using PooledOLS; it served as a baseline but ignores the panel structure. The Fixed Effects (Entity) model was estimated using PanelOLS with entity_effects=True; it controls for time-invariant, unobserved heterogeneity across countries by including country-specific intercepts. The Random Effects model was estimated using RandomEffects; it assumes that country-specific effects are random and uncorrelated with the regressors. Cluster-robust standard errors (cov_type='clustered', cluster_entity=True) were calculated for all panel models to account for potential heteroskedasticity and serial correlation within countries. For model selection (FE vs. RE), an F-test for poolability (comparing FE against Pooled OLS) was examined from the FE output, and a Hausman-like test was conducted using linearmodels.panel.compare on the (unclustered) FE and RE models to assess the consistency of the RE estimator. Based on these tests, the Fixed Effects model was identified as the preferred specification for interpreting within-country effects.
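To make the mechanics of the entity fixed-effects estimator concrete, the sketch below implements the within (demeaning) transformation and a cluster-robust sandwich covariance by hand in NumPy. It is an illustration under stated assumptions, not the linearmodels implementation: the function name within_fe and the synthetic panel are hypothetical, and no small-sample correction is applied to the covariance.

```python
import numpy as np
import pandas as pd

def within_fe(df, entity, y_col, x_cols):
    """Entity fixed effects via demeaning, with country-clustered standard errors."""
    cols = [y_col] + list(x_cols)
    # Subtracting each country's own mean absorbs the country-specific intercept.
    demeaned = df[cols] - df.groupby(entity)[cols].transform("mean")
    y = demeaned[y_col].to_numpy(dtype=float)
    X = demeaned[list(x_cols)].to_numpy(dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich "meat": outer products of the score, summed country by country.
    k = X.shape[1]
    meat = np.zeros((k, k))
    codes = df[entity].to_numpy()
    for g in np.unique(codes):
        mask = codes == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv  # no small-sample correction applied
    r2_within = 1.0 - (resid @ resid) / (y @ y)
    table = pd.DataFrame({"beta": beta, "se": np.sqrt(np.diag(cov))}, index=list(x_cols))
    return table, r2_within

# Synthetic panel: the country effect is correlated with x, true slope is 2.0.
rng = np.random.default_rng(42)
n_countries, n_years = 30, 14
alpha = rng.normal(scale=3.0, size=n_countries)
country = np.repeat(np.arange(n_countries), n_years)
x = alpha[country] + rng.normal(size=country.size)
y = alpha[country] + 2.0 * x + rng.normal(scale=0.5, size=country.size)
toy = pd.DataFrame({"country": country, "x": x, "y": y})

table, r2_within = within_fe(toy, "country", "y", ["x"])
pooled_slope = np.polyfit(toy["x"], toy["y"], 1)[0]  # ignores the panel structure
print(table, r2_within, pooled_slope)
```

In this toy setting the pooled slope is biased upward because the country effect is correlated with the regressor, while the within estimator recovers a value close to the true slope. This is the kind of unobserved heterogeneity the Fixed Effects specification is meant to control for.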
A machine learning approach was used for prediction and UQ, employing the scikit-learn [5] and mapie libraries. A Random Forest Regressor served as the base predictive model. The full set of lagged independent variables and the lagged dependent variable were used as predictors. The data were partitioned into training (60%), calibration (20%), and test (20%) sets using train_test_split. Conformal prediction was implemented using mapie.regression.MapieRegressor with method="plus" and cv="split". The model was fit on the training data, with the calibration data (X_calib, Y_calib) supplied during the .fit() step. Point predictions were evaluated using Root Mean Squared Error (RMSE) and R-squared (R²). Prediction intervals generated at α = 0.1 were evaluated based on empirical coverage rate and average interval width on the test set. Feature importances were derived from the base Random Forest Regressor. All data processing and analysis were performed using Python version 3.11 and relevant libraries including pandas, numpy, statsmodels, scikit-learn, linearmodels, and mapie.
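The split conformal procedure can be sketched by hand with scikit-learn alone, which avoids depending on a particular mapie version. The synthetic data, the number of trees, and the random seeds are assumptions for illustration; the 60/20/20 partition and α = 0.1 follow the description above. The sketch uses absolute residuals as the conformity score, which may differ in detail from what mapie computes internally.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the lagged predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=1000)

# 60% training, 20% calibration, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Conformity scores: absolute residuals on data the model has not seen.
alpha = 0.1
scores = np.sort(np.abs(y_calib - model.predict(X_calib)))
n = len(scores)
# Finite-sample rank ceil((n + 1)(1 - alpha)), capped at n.
k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
q_hat = scores[k - 1]

# Symmetric intervals of constant half-width q_hat around each prediction.
pred = model.predict(X_test)
lower, upper = pred - q_hat, pred + q_hat

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = float(r2_score(y_test, pred))
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
avg_width = float(np.mean(upper - lower))
print(f"RMSE={rmse:.2f} R2={r2:.2f} coverage={coverage:.3f} width={avg_width:.2f}")
```

Holding the calibration set out of training is what gives the coverage guarantee: the quantile is taken over residuals from data the forest never saw, so test residuals are exchangeable with them under the usual assumption.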
This study employed a comprehensive panel data regression analysis, Automated Valuation Models (AVM), and an extensive literature review to investigate the global impact of Environmental, Social, and Governance (ESG) factors on the housing price index.
This section presents the empirical findings from the quantitative analyses. Results for the multicollinearity diagnostics, panel data regression models, and model specification testing are presented sequentially, followed by the Automated Valuation Model (AVM) outcomes.
Variance Inflation Factors (VIFs) were calculated for the 34 contemporaneous independent variables to assess multicollinearity before panel regression. An iterative removal process excluded predictors with VIF > 10.00. Life expectancy at birth total years (initial VIF: 32.55) and rule law estimate (subsequent VIF: 30.99) were removed. The maximum VIF among the remaining 32 predictors was 8.57. This VIF-reduced set of predictors was used for the panel regression models described below.
Pooled OLS, Fixed Effects (FE), and Random Effects (RE) models were estimated using the VIF-reduced set of 32 contemporaneous predictors and 650 country-year observations. Cluster-robust standard errors (by country) were applied.
Pooled OLS model: The Pooled OLS estimation yielded an overall R-squared of 0.50 (Table 1). The model's predictors were collectively significant (Robust F (32, 617) = 49.50, p < 0.001).
Table 1: Pooled OLS results (VIF-Reduced, Clustered SE) - Model summary.
| Statistic | Value | Statistic | Value |
| --- | --- | --- | --- |
| Dep. Variable | house_price_index | R-squared | 0.50 |
| Estimator | PooledOLS | R-squared (Between) | 0.41 |
| No. Observations | 650 | R-squared (Within) | 0.53 |
| Cov. Estimator | Clustered | R-squared (Overall) | 0.50 |
| Entities | 47 | Log-likelihood | -2613.40 |
| Time periods | 14 | F-statistic (robust) | 49.50 |
| p-value (F-stat robust) | 0.00 | | |
Table 2 presents the parameter estimates. Variables with statistically significant negative coefficients (p < 0.05) included economic and social rights performance score (β = -2.49), energy use kg oil equivalent per capita (β = -0.00, i.e., negative but below 0.005 in magnitude), gini index (β = -0.26), individuals using the internet population (β = -0.12), and renewable electricity output total electricity output (β = -0.14). Significant positive coefficients (p < 0.05) were found for income share held by lowest 20 (β = 0.90) and ratio female to male labor force participation rate modeled ilo estimate (β = 0.45).
Table 2: Pooled OLS results (VIF-Reduced, Clustered SE) - Parameter estimates.
| Variable | Parameter | Std. Err. | T-stat | p-value | Sig. (0.05) |
| --- | --- | --- | --- | --- | --- |
| Const | 90.47 | 17.49 | 5.17 | 0.00 | *** |
| Coastal protection | -0.01 | 0.02 | -0.44 | 0.66 | |
| Control corruption estimate | -0.82 | 3.48 | -0.24 | 0.81 | |
| Economic and social rights performance score | -2.49 | 0.80 | -3.13 | 0.00 | ** |
| Electricity production from coal sources total | -0.06 | 0.05 | -1.09 | 0.27 | |
| Energy imports net energy use | -0.01 | 0.01 | -1.36 | 0.18 | |
| Energy intensity level primary energy mj 2017 ppp gdp | 0.65 | 0.57 | 1.13 | 0.26 | |
| Energy use kg oil equivalent per capita | -0.00 | 0.00 | -2.42 | 0.02 | * |
| Fertility rate total births per woman | -3.81 | 2.33 | -1.63 | 0.10 | |
| Food production index 2014_2016_100 | 0.08 | 0.08 | 1.04 | 0.30 | |
| Fossil fuel energy consumption total | 0.00 | 0.05 | 0.11 | 0.92 | |
| gdp growth annual | -0.33 | 0.29 | -1.13 | 0.26 | |
| gini index | -0.26 | 0.10 | -2.55 | 0.01 | * |
| Government expenditure on education total government expenditure | -0.22 | 0.22 | -1.03 | 0.31 | |
| Hospital beds per 1000 people | -0.51 | 0.39 | -1.30 | 0.19 | |
| Income share held by lowest 20 | 0.90 | 0.43 | 2.11 | 0.04 | * |
| Individuals using the internet population | -0.12 | 0.06 | -2.15 | 0.03 | * |
| Land surface temperature | 0.10 | 0.13 | 0.75 | 0.45 | |
| Level water stress freshwater withdrawal as a proportion | -0.01 | 0.02 | -0.93 | 0.35 | |
| Literacy rate adult total people ages 15 and above | 0.02 | 0.02 | 1.30 | 0.19 | |
| People using safely managed sanitation services population | -0.02 | 0.08 | -0.28 | 0.78 | |
| Political stability and absence violence terrorism estimate | 4.76 | 3.25 | 1.47 | 0.14 | |
| Population ages 65 and above total population | -0.25 | 0.16 | -1.51 | 0.13 | |
| Population density people per sq km land area | 0.01 | 0.01 | 0.80 | 0.42 | |
| Proportion bodies water with good ambient water quality | 0.01 | 0.02 | 0.41 | 0.68 | |
| Ratio female to male labor force participation rate modeled ilo | 0.45 | 0.20 | 2.22 | 0.03 | * |
| Renewable electricity output total electricity output | -0.14 | 0.06 | -2.42 | 0.02 | * |
| Renewable energy consumption total final energy consumption | -0.04 | 0.09 | -0.51 | 0.61 | |
| Research and development expenditure gdp | -0.57 | 0.97 | -0.59 | 0.55 | |
| School enrollment primary and secondary gross gender parity index gpi | 0.77 | 2.61 | 0.29 | 0.77 | |
| Voice and accountability estimate | -5.02 | 4.03 | -1.25 | 0.21 | |
Significance codes: . (p < 0.1), * (p < 0.05), ** (p < 0.01), *** (p < 0.001).
Fixed Effects (FE) Model: The Fixed Effects (Entity) model, controlling for time-invariant country characteristics, achieved a within-country R-squared of 0.588 (Table 3). This indicates the model explained approximately 58.8% of the temporal variation in house price indices within countries. Diagnostic testing (F-test for Poolability, p < 0.001) supported the inclusion of fixed effects over the Pooled OLS specification. The overall model significance was confirmed (Robust F (32, 571) = 20.720, p < 0.001).
Table 3: Fixed effects (Country) results (VIF-Reduced, Clustered SE) - Model summary.
| Statistic | Value | Statistic | Value |
| --- | --- | --- | --- |
| Dep. Variable | house_price_index | R-squared | 0.588 |
| Estimator | PanelOLS | R-squared (Between) | -0.348 |
| No. Observations | 650 | R-squared (Within) | 0.588 |
| Cov. Estimator | Clustered | R-squared (Overall) | -0.327 |
| Entities | 47 | Log-likelihood | -2479.400 |
| Time periods | 14 | F-statistic (robust) | 20.720 |
| p-value (F-stat robust) | 0.000 | | |
| F-test Poolability | 6.334 | | |
| p-value Poolability | 0.000 | | |
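For readers unfamiliar with the poolability test, its textbook form compares the residual sum of squares (RSS) of the pooled model with that of the fixed effects model. The expression below is a sketch that assumes the software follows this standard (non-robust) form; the symbols N (countries), n (country-year observations), and K (slope coefficients) are introduced here for illustration.

```latex
F = \frac{\left(\mathrm{RSS}_{\text{pooled}} - \mathrm{RSS}_{\text{FE}}\right) / (N - 1)}
         {\mathrm{RSS}_{\text{FE}} / (n - N - K)},
\qquad
N = 47,\; n = 650,\; K = 32
\;\Rightarrow\;
(N - 1,\; n - N - K) = (46,\; 571)
```

The denominator degrees of freedom, 650 - 47 - 32 = 571, match those of the robust F-statistic reported for the Fixed Effects model, F (32, 571). A large F-value indicates that the country-specific intercepts are jointly non-zero, which favors Fixed Effects over Pooled OLS.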
Parameter estimates (Table 4) revealed several statistically significant (p < 0.05) within-country associations. Positive coefficients were found for coastal protection (β = 0.05) and literacy rate adult total people ages 15 and above (β = 0.04). Negative coefficients were found for economic and social rights performance score (β = -2.67), energy imports net energy use (β = -0.03), energy use kg oil equivalent per capita (β = -0.00), individuals using the internet population (β = -0.15), and population ages 65 and above total population (β = -0.70). gdp growth annual (p = 0.079) and level water stress (p = 0.056) were marginally significant (p < 0.10).
Table 4: Fixed effects (Country) results (VIF-Reduced, Clustered SE) - Parameter estimates.
| Variable | Parameter | Std. Err. | T-stat | p-value | Sig. (0.05) |
| --- | --- | --- | --- | --- | --- |
| Coastal protection | 0.05 | 0.03 | 2.09 | 0.04 | * |
| Control corruption estimate | -3.52 | 4.02 | -0.88 | 0.38 | |
| Economic and social rights performance score | -2.67 | 1.32 | -2.01 | 0.04 | * |
| Electricity production from coal sources total | -0.02 | 0.08 | -0.26 | 0.79 | |
| Energy imports net energy use | -0.03 | 0.01 | -2.92 | 0.00 | ** |
| Energy intensity level primary energy mj 2017 ppp gdp | -0.86 | 0.65 | -1.31 | 0.19 | |
| Energy use kg oil equivalent per capita | -0.00 | 0.00 | -2.11 | 0.03 | * |
| Fertility rate total births per woman | -5.80 | 3.84 | -1.51 | 0.13 | |
| Food production index 2014_2016_100 | 0.10 | 0.07 | 1.32 | 0.19 | |
| Fossil fuel energy consumption total | -0.02 | 0.06 | -0.42 | 0.67 | |
| gdp growth annual | -0.53 | 0.30 | -1.76 | 0.08 | . |
| gini index | -0.04 | 0.12 | -0.30 | 0.77 | |
| Government expenditure on education total government expenditure | -0.27 | 0.21 | -1.29 | 0.20 | |
| Hospital beds per 1000 people | -0.11 | 0.63 | -0.17 | 0.87 | |
| Income share held by lowest 20 | 0.02 | 0.63 | 0.03 | 0.98 | |
| Individuals using the internet population | -0.15 | 0.06 | -2.67 | 0.01 | ** |
| Land surface temperature | 0.15 | 0.18 | 0.84 | 0.40 | |
| Level water stress freshwater withdrawal as a proportion | 0.03 | 0.02 | 1.92 | 0.06 | . |
| Literacy rate adult total people ages 15 and above | 0.04 | 0.02 | 2.00 | 0.05 | * |
| Political stability and absence violence terrorism estimate | 5.17 | 5.32 | 0.97 | 0.33 | |
| Population ages 65 and above total population | -0.70 | 0.31 | -2.28 | 0.02 | * |
| Population density people per sq km land area | 0.02 | 0.02 | 1.18 | 0.24 | |
| Proportion bodies water with good ambient water quality | 0.01 | 0.02 | 0.66 | 0.51 | |
| Ratio female to male labor force participation rate modeled ilo | -0.00 | 0.86 | -0.01 | 1.00 | |
| Renewable electricity output total electricity output | -0.13 | 0.08 | -1.56 | 0.12 | |
| Renewable energy consumption total final energy consumption | 0.14 | 0.12 | 1.19 | 0.23 | |
| Research and development expenditure gdp | 1.01 | 1.48 | 0.69 | 0.49 | |
| School enrollment primary and secondary gross gender parity index gpi | 1.43 | 2.73 | 0.53 | 0.60 | |
| Voice and accountability estimate | 3.12 | 5.64 | 0.55 | 0.58 | |
Significance codes: . (p < 0.1), * (p < 0.05), ** (p < 0.01), *** (p < 0.001).
Random Effects (RE) model: The Random Effects model was also estimated (Table 5). It exhibited a high overall R-squared (0.980), largely driven by between-country variation (between R-squared = 0.990). Parameter estimates are shown in Table 6.
Table 5: Random effects results (VIF-Reduced, Clustered SE) - Model summary.
| Statistic | Value | Statistic | Value |
| Dep. Variable | house_price_index | R-squared | 0.959 |
| Estimator | RandomEffects | R-squared (Between) | 0.990 |
| No. Observations | 650 | R-squared (Within) | 0.541 |
| Cov. Estimator | Clustered | R-squared (Overall) | 0.980 |
| Entities | 47 | Log-likelihood | -2594.600 |
| Time periods | 14 | F-statistic (robust) | 1501.900 |
| p-value (F-stat robust) | 0.000 | | |
Table 6: Random effects results (VIF-Reduced, Clustered SE) - Parameter estimates.
| Variable | Parameter | Std. Err. | T-stat | p-value | Sig. (0.05) |
| Const | --- | --- | --- | --- | |
| Coastal protection | 0.01 | 0.02 | 0.64 | 0.52 | |
| Control corruption estimate | -6.26 | 3.36 | -1.86 | 0.06 | . |
| Economic and social rights performance score | -2.26 | 1.23 | -1.84 | 0.07 | . |
| Electricity production from coal sources total | -0.09 | 0.06 | -1.38 | 0.17 | |
| Energy imports net energy use | -0.01 | 0.01 | -1.57 | 0.12 | |
| Energy intensity level primary energy mj 2017 ppp gdp | -0.16 | 0.51 | -0.32 | 0.75 | |
| Energy use kg oil equivalent per capita | -0.00 | 0.00 | -2.43 | 0.02 | * |
| Fertility rate total births per woman | -3.17 | 2.62 | -1.21 | 0.23 | |
| Food production index 2014_2016_100 | 0.21 | 0.08 | 2.76 | 0.01 | ** |
| Fossil fuel energy consumption total | 0.05 | 0.04 | 1.06 | 0.29 | |
| gdp growth annual | -0.34 | 0.30 | -1.12 | 0.26 | |
| gini index | -0.24 | 0.09 | -2.83 | 0.00 | ** |
| Government expenditure on education total government expenditure | -0.34 | 0.23 | -1.48 | 0.14 | |
| Hospital beds per 1000 people | -0.22 | 0.60 | -0.36 | 0.72 | |
| Income share held by lowest 20 | 0.74 | 0.42 | 1.75 | 0.08 | . |
| Individuals using the internet population | -0.14 | 0.06 | -2.58 | 0.01 | * |
| Land surface temperature | 0.14 | 0.12 | 1.14 | 0.26 | |
| Level water stress freshwater withdrawal as a proportion | 0.04 | 0.01 | 3.85 | 0.00 | *** |
| Literacy rate adult total people ages 15 and above | 0.05 | 0.02 | 2.79 | 0.01 | ** |
| People using safely managed sanitation services population | -0.01 | 0.08 | -0.19 | 0.85 | |
| Political stability and absence violence terrorism estimate | 3.69 | 4.25 | 0.87 | 0.39 | |
| Population ages 65 and above total population | -0.63 | 0.24 | -2.69 | 0.01 | ** |
| Population density people per sq km land area | 0.03 | 0.01 | 2.37 | 0.02 | * |
| Proportion bodies water with good ambient water quality | 0.02 | 0.02 | 0.88 | 0.38 | |
| Ratio female to male labor force participation rate modeled ilo | 1.53 | 0.04 | 35.10 | 0.00 | *** |
| Renewable electricity output total electricity output | -0.16 | 0.06 | -2.49 | 0.01 | * |
| Renewable energy consumption total final energy consumption | 0.10 | 0.09 | 1.12 | 0.26 | |
| Research and development expenditure gdp | -1.29 | 1.19 | -1.08 | 0.28 | |
| School enrollment primary and secondary gross gender parity index gpi | 1.93 | 2.65 | 0.73 | 0.47 | |
| Voice and accountability estimate | -1.29 | 4.87 | -0.27 | 0.79 | |
Significance codes: . (p < 0.1), * (p < 0.05), ** (p < 0.01), *** (p < 0.001).
Model specification testing: A Hausman-type comparison between the Fixed Effects and Random Effects models was conducted (Table 7). The observed differences in coefficient estimates and significance levels between the two models, alongside the significant F-test for poolability (Table 3), indicated that the assumptions underlying the Random Effects model were likely violated, favoring the Fixed Effects specification for consistent estimation of within-country effects.
Table 7: Hausman Test comparison (FE vs RE, VIF-Reduced).
| Feature | FE Coeff | RE Coeff | FE T-stat | RE T-stat |
| Coastal protection | 0.05 | 0.01 | 2.82 | 0.72 |
| Control corruption estimate | -3.52 | -6.26 | -1.24 | -3.03 |
| Economic and social rights performance score | -2.67 | -2.26 | -2.81 | -2.18 |
| Electricity production from coal sources total | -0.02 | -0.09 | -0.46 | -1.80 |
| Energy imports net energy use | -0.03 | -0.01 | -3.24 | -1.37 |
| Energy intensity level primary energy mj 2017 ppp gdp | -0.86 | -0.16 | -1.55 | -0.33 |
| Energy use kg oil equivalent per capita | -0.00 | -0.00 | -3.23 | -2.95 |
| Fertility rate total births per woman | -5.80 | -3.17 | -2.54 | -1.43 |
| Food production index 2014_2016_100 | 0.10 | 0.21 | 2.00 | 4.96 |
| Fossil fuel energy consumption total | -0.02 | 0.05 | -0.76 | 1.42 |
| gdp growth annual | -0.53 | -0.34 | -2.72 | -1.59 |
| gini index | -0.04 | -0.24 | -0.37 | -2.78 |
| Government expenditure on education total government expenditure | -0.27 | -0.34 | -1.44 | -1.82 |
| Hospital beds per 1000 people | -0.11 | -0.22 | -0.29 | -0.57 |
| Income share held by lowest 20 | 0.02 | 0.74 | 0.04 | 1.77 |
| Individuals using the internet population | -0.15 | -0.14 | -3.82 | -3.40 |
| Land surface temperature | 0.15 | 0.14 | 1.09 | 1.11 |
| Level water stress freshwater withdrawal as a proportion | 0.03 | 0.04 | 2.83 | 3.29 |
| Literacy rate adult total people ages 15 and above | 0.04 | 0.05 | 2.00 | 2.38 |
| People using safely managed drinking water service | 0.10 | 0.03 | 1.57 | 0.77 |
| People using safely managed sanitation services | -0.05 | -0.01 | -0.79 | -0.29 |
| Political stability and absence violence terrorism estimate | 5.17 | 3.69 | 1.88 | 1.57 |
| Population ages 65 and above total population | -0.70 | -0.63 | -3.41 | -3.08 |
| Population density people per sq km land area | 0.02 | 0.03 | 1.64 | 3.54 |
| Poverty headcount ratio at national poverty line | -0.05 | 0.09 | -0.58 | 1.03 |
| Proportion bodies water with good ambient water | 0.01 | 0.02 | 0.42 | 0.57 |
| Ratio female to male labor force participation rate modeled ilo | -0.00 | 1.53 | -0.01 | 59.93 |
| Renewable electricity output total electricity | -0.13 | -0.16 | -3.21 | -3.63 |
| Renewable energy consumption total final energy | 0.14 | 0.10 | 1.63 | 1.27 |
| Research and development expenditure gdp | 1.01 | -1.29 | 1.00 | -1.37 |
| School enrollment primary and secondary gross gender parity index gpi | 1.43 | 1.93 | 0.57 | 0.70 |
| Voice and accountability estimate | 3.12 | -1.29 | 0.78 | -0.46 |
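For reference, the classical Hausman statistic that formalizes such a comparison is sketched below. Table 7 reports coefficient and t-statistic comparisons rather than this statistic, so the expression is given as background under standard textbook assumptions, with k denoting the number of compared coefficients.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^{\top}
    \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^{2}_{k} \quad \text{under } H_0
```

Under the null hypothesis that the random effects are uncorrelated with the regressors, both estimators are consistent and H should be small; a large H favors the Fixed Effects specification. The classical form assumes that the Random Effects estimator is efficient, which need not hold when cluster-robust covariance estimates are used.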
A Random Forest model was developed as an AVM using the full set of 35 lagged predictors.
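The construction of one-year lagged predictors can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing code of the study: the function `add_lag1`, the row layout, and the toy values are hypothetical. The point it shows is that lags must be taken within each country so that values never leak across country boundaries.

```python
def add_lag1(rows, keys=("house_price_index",)):
    """Return rows augmented with previous-year values within each country.

    rows: list of dicts with 'country', 'year', and the columns in `keys`.
    Observations without a directly preceding year are dropped.
    """
    out = []
    last_seen = {}  # country -> most recent earlier row
    for row in sorted(rows, key=lambda r: (r["country"], r["year"])):
        prev = last_seen.get(row["country"])
        if prev is not None and prev["year"] == row["year"] - 1:
            lagged = dict(row)
            for k in keys:
                lagged[k + "_lag1"] = prev[k]
            out.append(lagged)
        last_seen[row["country"]] = row
    return out


# Hypothetical toy panel (values are illustrative, not from the dataset)
rows = [
    {"country": "A", "year": 2010, "house_price_index": 90.0},
    {"country": "A", "year": 2011, "house_price_index": 95.0},
    {"country": "B", "year": 2010, "house_price_index": 100.0},
    {"country": "B", "year": 2011, "house_price_index": 102.0},
    {"country": "B", "year": 2012, "house_price_index": 105.0},
]
lagged = add_lag1(rows)
```

Each country loses its first observed year, which is the usual cost of a one-period lag in panel data.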
AVM performance and feature importance: The AVM achieved an R-squared of 0.87 and an RMSE of 6.88 on the test set. The lagged house price index was by far the dominant predictor (61.57% importance). Several lagged ESG and economic indicators also contributed to predictive performance. Table 8 details the performance metrics and the top 15 feature importances.
Table 8: Random forest AVM performance and feature importance (lagged features).
| Performance Metric | Value |
| RMSE | 6.88 |
| R2 Score | 0.87 |
Top 15 feature importances
| Rank | Feature | Importance |
| 1 | House price index lag1 | 0.62 |
| 2 | Renewable electricity output total electricity lag1 | 0.10 |
| 3 | Economic and social rights performance score lag1 | 0.06 |
| 4 | Fossil fuel energy consumption total lag1 | 0.04 |
| 5 | Energy use kg oil equivalent per capita lag1 | 0.02 |
| 6 | Rule law estimate lag1 | 0.02 |
| 7 | Political stability and absence violence terrorism estimate lag1 | 0.01 |
| 8 | People using safely managed drinking water services population lag1 | 0.01 |
| 9 | Food production index 2014-2016 100 lag1 | 0.01 |
| 10 | Fertility rate total births per woman lag1 | 0.01 |
| 11 | Ratio female to male labor force participation rate modeled ilo lag1 | 0.01 |
| 12 | gdp growth annual lag1 | 0.01 |
| 13 | Population density people per sq km land area lag1 | 0.01 |
| 14 | Hospital beds per 1000 people lag1 | 0.01 |
| 15 | Research and development expenditure gdp lag1 | 0.01 |
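The two performance metrics reported in Table 8 follow the standard definitions sketched below. The helper names and the toy numbers are hypothetical and serve only to show how RMSE and R-squared are computed from held-out predictions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target variable."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values (not from the study's test set)
toy_actual = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.0, 2.0, 3.0, 5.0]
toy_rmse = rmse(toy_actual, toy_predicted)
toy_r2 = r2_score(toy_actual, toy_predicted)
```

RMSE is expressed in index points, so the reported value of 6.88 can be read directly against the scale of the house price index, whereas R-squared is unitless.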
Uncertainty quantification using conformal prediction: Using MAPIE for conformal prediction (α = 0.10), the generated 90% prediction intervals exhibited an empirical coverage of 90.80% on the test data, close to the nominal target. The mean width of these intervals was 23.998 index points. Table 9 provides sample predictions with intervals and summarizes the UQ performance.
Table 9: Sample conformal prediction intervals and UQ performance: Sample predictions with 90% intervals.
| Country | Year | Actual | Predicted | Lower_90% | Upper_90% | Interval Width |
| Australia | 2012 | 83.10 | 87.43 | 75.43 | 99.43 | 24.00 |
| Australia | 2016 | 104.80 | 103.69 | 91.69 | 115.69 | 24.00 |
| Australia | 2020 | 106.70 | 105.61 | 93.61 | 117.60 | 24.00 |
| Austria | 2020 | 125.20 | 121.18 | 109.18 | 133.18 | 24.00 |
| Belgium | 2012 | 99.60 | 99.74 | 87.74 | 111.73 | 24.00 |
UQ performance summary
| Metric | Value |
| Target Coverage | 90.00% |
| Actual Coverage (Test Set) | 90.80% |
| Average Interval Width | 23.998 |
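The near-constant interval width of about 24 index points in Table 9 is what one would expect from a conformal method based on absolute residuals, where every prediction receives the same half-width. The sketch below illustrates split conformal prediction in plain Python under that assumption; it does not reproduce the MAPIE configuration used in the study. The calibration values are hypothetical and were chosen so that the half-width equals 12.0, while the five test rows are the sample rows from Table 9.

```python
import math


def conformal_half_width(calib_actual, calib_pred, alpha=0.10):
    """Split-conformal quantile of absolute residuals on a calibration set."""
    scores = sorted(abs(a - p) for a, p in zip(calib_actual, calib_pred))
    n = len(scores)
    # Finite-sample corrected rank; capped at n as a simplification
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def interval(pred, q):
    """Symmetric prediction interval around a point prediction."""
    return pred - q, pred + q


def coverage(actual, pred, q):
    """Share of actual values that fall inside their intervals."""
    hits = sum(1 for a, p in zip(actual, pred) if p - q <= a <= p + q)
    return hits / len(actual)


# Hypothetical calibration data (illustrative only)
calib_actual = [100.0, 102.0, 98.0, 110.0, 95.0, 105.0, 101.0, 99.0, 120.0, 90.0]
calib_pred = [101.0, 100.0, 99.0, 104.0, 97.0, 105.0, 100.0, 102.0, 108.0, 93.0]
q = conformal_half_width(calib_actual, calib_pred, alpha=0.10)

# Sample rows from Table 9 (actual and predicted values)
test_actual = [83.10, 104.80, 106.70, 125.20, 99.60]
test_pred = [87.43, 103.69, 105.61, 121.18, 99.74]
lower, upper = interval(test_pred[0], q)
sample_coverage = coverage(test_actual, test_pred, q)
```

With a half-width of 12.0, the first sample row yields an interval of roughly 75.43 to 99.43, in line with Table 9. Coverage on five rows says little by itself; the 90.80% figure in the table refers to the full test set.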
This study aimed to develop foundational knowledge and a methodological framework for improving value-driven decision-making during the Front-End Engineering Design (FEED) phase of megaprojects, focusing on the integration of economic, social, and environmental factors through quantitative analysis and advanced digital techniques. The discussion interprets the empirical findings from the panel data regression and Automated Valuation Model (AVM) analyses in light of the research questions, theoretical underpinnings, and existing literature, while also considering practical implications and limitations.
The research questions guided the empirical investigation and yielded several key findings. RQ1 asked which quantifiable economic, social, and environmental factors (available as country-level indicators) correlate significantly with variations in national house price indices over time, after controlling for country-specific fixed effects. The Fixed Effects (FE) panel regression model (Table 4), preferred on the basis of diagnostic tests (Tables 3 and 7), revealed significant within-country associations. The preference for the Fixed Effects model is particularly important because it isolates within-country temporal variation, directly informing Institutional Theory by showing how changes in specific ESG factors within a given institutional context correlate with house price changes, rather than merely reflecting cross-country differences. This sheds light on the dynamic interplay of formal and informal rules and economic outcomes.
Regarding positive correlations, notably, improved coastal protection (β = 0.05) and higher literacy rate adult total people ages 15 and above (β = 0.04) were positively associated with house price index changes. The positive association with coastal protection suggests that investments in climate adaptation and resilience infrastructure enhance a region's perceived stability and attractiveness, thereby increasing property values. This aligns with a nuanced interpretation of the Porter Hypothesis, where proactive environmental measures, potentially spurred by regulation, lead to long-term economic benefits by reducing climate risks and attracting sustainable investment. The positive correlation with literacy rate indicates that improvements in human capital and educational attainment contribute to overall economic productivity and a higher quality of life, factors that are capitalized into real estate values. This finding resonates with Institutional Theory, highlighting how societal development and institutional investments in human capital foster a more stable, productive, and attractive economic environment, ultimately influencing asset values.
Conversely, factors like a higher economic and social rights performance score (β = −2.67), greater reliance on energy imports net energy use (β = −0.03), higher energy use kg oil equivalent per capita (β = −0.00), increased individuals using the internet population (β = −0.15), and a larger share of population ages 65 and above total population (β = −0.70) showed significant negative associations. The negative correlation with economic and social rights performance score warrants careful interpretation. While strong social rights are often perceived as positive, this observed negative correlation could suggest that, within the dataset's context, countries with higher performance in this area might also experience specific economic pressures or policy choices (e.g., higher taxation, regulatory burdens) that could dampen housing market growth. This finding opens avenues for future research into complex policy trade-offs, potential unintended consequences of certain social policies on market dynamics, or even implicitly hints at mechanisms akin to Regulatory Capture, where over-regulation in certain areas may lead to inefficiencies that negatively impact market dynamism. The negative association with energy imports net energy use and energy use kg oil equivalent per capita suggests that greater energy dependence or inefficiency might indicate underlying economic vulnerabilities, higher operational costs, or a lack of sustainable energy transition, deterring housing market growth. An increase in individuals using the internet population might correlate with housing price moderation if it signifies a shift towards a more distributed workforce, potentially decreasing demand for high-cost urban centers, rather than directly signaling value creation. 
The negative association with population ages 65 and above total population could reflect demographic shifts leading to reduced housing demand, increased healthcare burdens, or a diminished productive workforce, thereby impacting economic vitality and property values. A marginally significant negative association was found for gdp growth annual (p = 0.08), and a marginally significant positive association for level water stress freshwater withdrawal as a proportion (p = 0.06). These findings collectively highlight a complex interplay between environmental adaptation (coastal protection, water stress), social development (literacy, demographics, internet adoption), economic factors (energy use/imports, GDP growth), governance proxies (economic/social rights score), and the macroeconomic indicator (house prices).
RQ2 concerned how effectively machine learning models (AVMs) can predict national house price indices using lagged indicators, and how reliably the associated uncertainty can be quantified. The Random Forest-based AVM demonstrated strong predictive performance on the test set, achieving an R-squared of 0.87 and an RMSE of 6.88 (Table 8). This performance underscores the efficacy of machine learning for forecasting national house price indices relevant to megaproject FEED. While the lagged house price index (house price index lag1) was the dominant predictor (61.57% importance), several lagged ESG and economic indicators, such as renewable electricity output lag1 (10%), economic and social rights performance score lag1 (6%), and fossil fuel energy consumption total lag1 (4%), contributed meaningfully to the prediction. This suggests that societal and environmental sustainability efforts, even when manifested at a macro level, can have a tangible, delayed association with economic performance, providing valuable foresight for long-term project planning and reinforcing a nuanced view of the Porter Hypothesis over time. Furthermore, using Conformal Prediction via the MAPIE library, the study generated 90% prediction intervals with an empirical coverage of 90.80% on the test data (Table 9), closely matching the target coverage. This demonstrates the feasibility of reliably quantifying prediction uncertainty, moving beyond purely deterministic forecasts. For the high-stakes environment of megaprojects, this UQ capability is paramount, enabling decision-makers to understand the confidence level associated with predictions and make more robust, risk-informed choices [11], addressing a key identified gap regarding the lack of rigorous analysis for complex trade-offs in the early stages [20].
RQ3 focused on the key data, modeling, and computational considerations for an integrated evaluation approach. The study underscored the necessity of several key steps for creating a complete and trustworthy evaluation framework. These include: sourcing and merging diverse datasets (e.g., World Bank, OECD); rigorous data preprocessing, including handling missing data (imputation checks) and appropriate lagging of predictors for forecasting; systematic multicollinearity assessment using VIF and iterative feature reduction for robust regression modeling; employing appropriate panel data models (Pooled OLS, FE, RE) and selecting the best fit based on statistical tests (e.g., the Hausman-type comparison favoring FE, Table 7); leveraging machine learning (e.g., Random Forest) for predictive modeling (AVM); and adopting UQ methods (e.g., Conformal Prediction) to evaluate prediction reliability. These considerations are critical in building a computationally enabled framework for integrated value assessment in megaproject FEED.
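The multicollinearity step can be summarized by the standard VIF definition, where R_j^2 is the coefficient of determination obtained by regressing predictor j on all remaining predictors. The iterative rule stated alongside it is a common convention; the threshold τ is an assumption here, since the cut-off used in the study is not restated in this section.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}, \qquad
\text{repeat: drop } j^{*} = \arg\max_j \mathrm{VIF}_j
\ \text{ while } \ \max_j \mathrm{VIF}_j > \tau
```

A predictor that is nearly a linear combination of the others has R_j^2 close to 1 and therefore a very large VIF. For example, R_j^2 = 0.9 gives VIF_j = 10.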
The findings have significant implications for improving value delivery during the critical FEED phase. The complex web of factors identified in the FE model (RQ1) empirically supports the expanded definition of 'value' discussed in the literature [13,14], moving beyond traditional time-cost-scope metrics [11]. It highlights the need for FEED processes to systematically consider and quantitatively assess the trade-offs between economic performance, social impacts (e.g., literacy, demographics), environmental factors (e.g., energy, water stress, coastal protection), and governance quality. The success of the AVM (RQ2) suggests that data-driven forecasting, incorporating macro-level ESG and economic indicators, can provide valuable foresight during FEED, potentially anticipating future performance implications of early design choices. The ability to quantify uncertainty (UQ) is particularly critical in the high-stakes environment of megaprojects, allowing decision-makers to understand the confidence level associated with predictions and make more risk-informed choices [11]. This directly addresses the identified gap regarding the lack of rigorous analysis for complex trade-offs in the early stages [20].
The results offer empirical context and nuanced perspectives on the theoretical frameworks discussed, enriching their application within the megaproject domain. Regarding Institutional Theory, the empirical significance of the economic and social rights performance score in the FE model (Table 4) and the predictive importance of the lagged rule law estimate in the AVM (Table 8) strongly support this theory. These findings underscore how formal and informal rules, and their consistent enforcement, shape economic and social outcomes. The preference for the FE model, which effectively accounts for unobserved, time-invariant country-specific heterogeneity, further underscores the importance of varying institutional contexts and norms in influencing value drivers and project performance [26,32]. This demonstrates how changes within a nation's institutional quality directly influence its economic environment, which is reflected in housing markets, thereby affecting the contextual value for megaprojects.
Pertaining to the Porter Hypothesis, the findings provide nuanced insights rather than straightforward confirmation. While the FE model (Table 4) shows a positive association between coastal protection (potentially reflecting climate adaptation investment) and house price changes, suggesting that proactive environmental measures can yield economic benefits, other environmental indicators like renewable electricity output were not significant drivers of within-country house price changes in this specific contemporaneous specification, and energy use metrics were negatively associated. This complex and sometimes contradictory pattern suggests that the relationship between environmental regulations/performance and economic outcomes (proxied by house prices) is context-dependent and may not always align with the 'strong' version of the Porter Hypothesis [15,34,37] in the short term. However, the significant predictive importance of lagged renewable electricity output total electricity output in the AVM (Table 8) hints at potential delayed or indirect economic effects of environmental sustainability efforts, warranting further investigation into the temporal dynamics of these relationships, indicating that the benefits of green transitions may accrue over a longer horizon.
While not directly tested, the theory of Regulatory Capture is implicitly addressed by the findings. The observed importance of governance-related variables, such as economic and social rights performance score in the FE model (Table 4) and the predictive significance of rule law estimate lag1 in the AVM (Table 8) underscore that the quality, integrity, and effectiveness of the institutional and regulatory environment are crucial factors influencing economic outcomes. This highlights that if regulatory bodies become 'captured' by industry interests [33,41,44], leading to weakened environmental or social standards or biased enforcement, the observed relationships between ESG factors and economic proxies could be distorted, potentially undermining the positive dynamics suggested by the Porter Hypothesis. This emphasizes the need for robust governance frameworks to ensure that sustainability initiatives genuinely contribute to societal value and are not undermined by political-economic factors.
This research directly addresses the identified gap regarding the underutilization of integrated digital technologies in FEED [23]. The successful application of panel regression, ML (AVM), and UQ [20, 23] provides a robust proof-of-concept for a computationally enabled framework. It demonstrates how diverse data streams (macro-economic, ESG) can be integrated and analyzed using sophisticated techniques available through libraries such as statsmodels, linearmodels [4], scikit-learn [5], and MAPIE. The findings lay the methodological groundwork for future automated digital tools, potentially integrating BIM [19,52] as a data hub with advanced analytics. Such platforms could connect project data with broader contextual data (as analyzed in this study) and leverage PropTech/FinTech functionalities for holistic, automated value assessment and optimization during FEED, moving beyond current fragmented solutions [38,55]. The UQ component is vital for building trust in such automated systems by providing explicit measures of prediction reliability.
The study's findings provide practical insights for a variety of parties involved in megaproject development and policy. For project managers and teams, the study highlights the critical need to incorporate a broader set of quantifiable ESG and socio-economic indicators into FEED assessments. Relying solely on traditional time-cost-scope metrics is insufficient for maximizing long-term value and managing complex risks. Adopting data analytics and ML-based predictive tools (with UQ) can significantly enhance decision-making quality and foster greater stakeholder alignment by providing a more comprehensive, forward-looking understanding of potential future outcomes and associated risks related to a project's broader environmental and social context. Understanding how macro-level ESG factors influence property values can inform site selection, long-term market forecasts, and overall project viability assessments during FEED.
For investors and financial institutions, the empirical link (albeit complex) between macro-level ESG factors and economic performance proxies (house prices), and their predictive power in the AVM, reinforces the financial materiality of ESG considerations in megaprojects [32,40]. Integrating such quantitative analyses and UQ can improve risk assessment and alignment with sustainable finance goals, helping to identify projects and locations that are not only financially viable but also resilient to evolving environmental and social pressures. This can guide investment strategies towards more sustainable and valuable assets. Ultimately, for policymakers, the results underscore the profound influence of the broader institutional and regulatory environment on economic outcomes relevant to large investments. Policies strengthening governance (e.g., rule of law, social rights protection) and promoting targeted environmental actions (e.g., climate adaptation like coastal protection) may foster more favorable conditions for sustainable development and housing market stability, indirectly benefiting megaprojects. The nuanced findings on environmental factors suggest that careful design of regulations is needed to achieve desired economic co-benefits, avoiding unintended consequences. Promoting data availability and standardization for ESG metrics would also facilitate better analysis and more effective policy interventions for sustainable urban and infrastructure development.
Several limitations should be acknowledged. Firstly, the dependent variable, the national house price index, serves as an indirect macro-level proxy for the economic dimension of megaproject value or of the context in which megaprojects operate. Findings may not directly map to specific project outcomes, and generalization to specific project-level FEED decisions requires caution. Secondly, the use of country-level aggregate data limits direct applicability to individual project-level analysis and carries the risk of ecological fallacy. Thirdly, the analysis relies on the availability and quality of data from the World Bank and OECD databases, which may have inherent limitations or gaps. Fourthly, the statistical methods identify correlations (panel regression) and predictive associations (AVM), not definitive causal relationships. Fifthly, the findings are specific to the chosen models (VIF-reduced FE, Random Forest); alternative specifications or algorithms might yield different insights. Furthermore, while Random Forest models demonstrate strong predictive power, their "black box" nature can limit the direct interpretability of how individual features combine to influence predictions, beyond overall feature importance. Finally, this study focuses on establishing a methodological foundation rather than developing and validating a ready-to-use software tool.
Building on this work, future research should focus on collecting and analyzing project-level data that includes specific FEED phase decisions, costs, schedules, and multi-dimensional lifecycle value outcomes (economic, social, environmental). Furthermore, future projects should concentrate on developing and validating more comprehensive and direct metrics for megaproject lifecycle value that capture the multi-faceted nature of performance beyond simple proxies. Researchers should explore more sophisticated modeling techniques, including dynamic panel models, causal inference methods (e.g., difference-in-differences if relevant policy changes occur), graph neural networks for stakeholder interactions, and NLP for analyzing textual data from FEED documentation. Additionally, future work should explore designing, building, and validating integrated digital platforms that operationalize the proposed framework, linking BIM, simulation tools, ML/AI analytics, and UQ capabilities for practical FEED decision support. In-depth case studies of megaprojects could explore the decision-making dynamics, institutional pressures, and practical challenges of implementing value-driven FEED, complementing the quantitative findings. Finally, researchers should track megaprojects over their full lifecycle to assess the long-term validity of predictions made using AVMs and the actual impact of FEED decisions informed by holistic value frameworks.
This research tackled the significant challenge of embedding multi-dimensional value considerations into the Front-End Engineering Design (FEED) phase of megaprojects, a stage often hampered by reliance on traditional metrics and a lack of systematic, data-driven evaluation. The study sought to establish a foundation for advanced decision support by quantitatively exploring the links between national-level Environmental, Social, and Governance (ESG) factors and economic performance (proxied by house price indices), and by assessing the feasibility of predictive modeling with uncertainty quantification (UQ). The study's originality lies in its novel conceptual linkage of macro-level ESG and housing market dynamics to megaproject value assessment, providing a crucial interdisciplinary bridge. This is further strengthened by its multi-methodological approach, combining panel econometrics for explanatory power of within-country effects with advanced machine learning and robust uncertainty quantification for reliable prediction.
The most critical conclusion drawn from the empirical analysis using the Fixed Effects Model (Table 4) is that a diverse set of quantifiable environmental, social, governance, and economic factors exhibit statistically significant within-country correlations with national house price index variations, even after controlling for fixed country characteristics. This provides strong empirical validation that factors beyond traditional cost-schedule-scope, such as coastal protection, literacy rate, economic and social rights performance, energy imports/use, and population demographics, are intertwined with macroeconomic outcomes relevant to the environments where megaprojects unfold. This finding directly challenges the adequacy of narrow, traditional project evaluation methods and underscores the necessity of adopting a broader, multi-dimensional value perspective early in the project lifecycle. Furthermore, the successful development of the Automated Valuation Model (AVM) using a Random Forest algorithm demonstrates that machine learning techniques can effectively predict future economic indicator levels, with a high R² = 0.87 on test data (Table 8) using lagged ESG and economic data. Crucially, the reliable quantification of uncertainty associated with these predictions, achieving 90.8% empirical coverage with 90% target intervals via Conformal Prediction (Table 9), represents a significant advancement. This implies that it is feasible to move beyond purely deterministic forecasts in FEED, providing decision-makers with a more realistic understanding of potential outcomes and associated risks, thereby supporting more robust and defensible choices.
Methodologically, the study concludes that developing a rigorous, integrated framework for value assessment requires a systematic process. This process encompasses careful data sourcing and preparation, diligent management of multicollinearity (VIF reduction, Section 3.2), selection of statistical and machine learning models justified by diagnostic testing, such as the Hausman test, which favored FE over RE (Table 7), and the implementation of UQ techniques. These findings extend previous research by providing quantitative, macro-level evidence that supports integrating ESG factors into economic assessments and by demonstrating a practical application of ML with UQ in this context. The study also offers nuanced empirical perspectives on the Porter Hypothesis (showing complex, not always positive, links between environmental factors and the economic proxy) and on Institutional Theory (highlighting the significance of governance-related variables).
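To illustrate why the choice between pooled and Fixed Effects estimation matters, the following minimal sketch applies the within (entity-demeaning) transformation to synthetic panel data. It is a hedged illustration, not the study's estimation code: the panel dimensions (20 countries, 15 years), the single regressor, the true slope of 2.0, and the helper names `ols_slope` and `within_transform` are all assumptions made for the example, and no cluster-robust errors or Hausman test are computed here.

```python
import random
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a one-regressor least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def within_transform(groups, values):
    """Subtract each group's own mean, removing time-invariant group effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Synthetic panel (assumption): an unobserved country effect that is
# correlated with the regressor, which biases the pooled estimate.
rng = random.Random(42)
true_slope = 2.0
countries, x, y = [], [], []
for c in range(20):
    effect = rng.gauss(0, 10)  # time-invariant country characteristic
    for _ in range(15):
        xi = 0.5 * effect + rng.gauss(0, 1)
        yi = effect + true_slope * xi + rng.gauss(0, 0.5)
        countries.append(c)
        x.append(xi)
        y.append(yi)

pooled = ols_slope(x, y)
fixed_effects = ols_slope(within_transform(countries, x),
                          within_transform(countries, y))
print(f"pooled OLS slope: {pooled:.2f}, within (FE) slope: {fixed_effects:.2f}")
```

In this construction the pooled slope absorbs the country effect and overstates the relationship, while the within estimator recovers a value near the true slope. This is the sense in which the Fixed Effects coefficients reported in the study describe within-country associations.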
The implications are substantial: project managers gain a basis for incorporating broader metrics, investors receive further evidence of ESG materiality, and policymakers see how regulatory and social environments may influence economic performance indicators relevant to large investments (as elaborated in Section 5.5). Although the study establishes a valuable methodological proof of concept, its conclusions must be read in light of its limitations (detailed in Section 5.6): the use of a macro-level proxy (the national house price index) rather than direct project value, the aggregate nature of country-level data, and the correlational, not causal, nature of the findings. These limitations point to the future research recommended in Section 5.7: applying similar methodologies to granular, project-specific data; developing more direct and comprehensive lifecycle value metrics for megaprojects; and building and validating integrated digital platforms that translate these methods into practical decision-support tools for FEED.
In summary, this research concludes that adopting more holistic, quantitative, and computationally advanced approaches, specifically those integrating machine learning and uncertainty quantification, is not only feasible but necessary for advancing value-driven decision-making in the critical FEED phase of megaprojects. It provides groundwork for developing next-generation automated systems capable of enhancing stakeholder alignment, improving risk management, and ultimately increasing the likelihood of realizing the intended lifecycle value of complex and costly initiatives. In doing so, it helps connect theory and practice in addressing the major challenges faced by organizations and contemporary society.
The author would like to thank the staff of the Faculty of Economics and Business for their support in the preparation of this manuscript.
To promote transparency and enhance reproducibility, the Python script used for the panel regression analysis and an anonymized version of the dataset, including the extracted panel data regression results and Automated Valuation Model (AVM) outcomes, will be made available as supplementary material on the publisher's website upon publication. This initiative aligns with the journal’s commitment to open science, ensuring ongoing access and supporting future research in this area.
This work is licensed under a Creative Commons Attribution 4.0 International License.