This paper proposes a respiratory pattern recognition method for anxiety disorders based on an SVM-XGBoost-LR stacked ensemble model, enabling objective screening and early monitoring of anxiety disorders through the analysis of respiratory signals. A portable respiratory monitoring system was designed and implemented based on the ESP32 microcontroller and SM9541 high-precision pressure sensor, capable of real-time acquisition of oral-nasal airflow signals and data transmission via WiFi. In the algorithmic processing, Kalman filtering was employed for signal denoising, combined with a three-stage feature selection strategy (VIF-ANOVA-RF) to extract key respiratory features, along with the introduction of a Transformer self-attention mechanism for feature enhancement. The proposed SVM-XGBoost-LR stacked ensemble model achieved an overall classification accuracy of 96% on a dataset containing 600 samples, and improved the F1-score for the minority class (simulated breath-holding) from 60% to 87%, effectively alleviating the issue of missed diagnoses of abnormal patterns caused by class imbalance. Experimental results indicate that this method demonstrates superior performance in identifying anxiety-related respiratory abnormalities, providing a reliable algorithmic foundation and technical pathway for the development of portable mental health monitoring devices.
Anxiety, often described as nervousness, worry, or fear, is a natural reaction that everyone experiences from time to time [1]. Although it can help us focus or be particularly cautious when necessary, excessive or prolonged anxiety can interfere with daily life [2]. Anxiety disorders are the most common mental disorders [3], with a lifetime prevalence of 12.9% [4]. Since many anxiety disorders occur during childhood and adolescence and predict future psychopathology, it is essential to identify high-risk populations and implement early interventions as a key treatment strategy [5].
The progression from daily emotional responses to concerning states of anxiety, ultimately developing into clinically diagnosable anxiety disorders, often occurs gradually. Patients with anxiety disorders frequently exhibit characteristic abnormal breathing patterns, such as HVS and disrupted respiratory rhythms [1,6]. Therefore, breath analysis is a powerful non-invasive technique that can be used to diagnose diseases. Its main advantage is that it allows for unlimited sampling and enables real-time monitoring of disease status; thus, breath analysis is becoming a promising tool for diagnosing and monitoring anxiety disorders [7]. However, individuals with anxiety disorders often struggle to objectively identify and determine whether their anxiety has reached a pathological level, resulting in delays in seeking medical care. Therefore, it is essential to utilize professional tools for initial screening to monitor 'danger signals' and facilitate early intervention. This need for objective screening tools has propelled the advancement of anxiety disorder tracking technologies.
The screening and monitoring techniques for anxiety disorders have evolved from traditional scales to the analysis of multimodal physiological signals. In 2006 [8], Spitzer and colleagues developed a screening tool known as the Generalized Anxiety Disorder 7-item scale (GAD-7) to screen for generalized anxiety disorder in primary care populations [9]. Since then, the GAD-7 has been accepted as a screening tool for other anxiety disorders, including panic disorder, social anxiety disorder, and post-traumatic stress disorder [10]. Rashmi Panda, et al. [11] proposed a novel fuzzy VGG-16 neural network with a customized classification layer, utilizing single-channel ECG signals from wearable sensors for anxiety detection, validating the strong correlation between abnormal autonomic nervous activity and anxiety. In addition, Lin Lu and others designed a voluntary facial expression imitation experiment, collecting data from 323 participants, and developed the SFE-Former model for the identification of depression and anxiety [12]. However, methods for recognizing depression based on facial expressions have limitations, as they do not effectively model the correlations of temporal features, are easily influenced by dominant expression data, and may lose differential information when handling expression heterogeneity. Research indicates that when symptoms such as hyperventilation or palpitations, accompanied by chest tightness and shortness of breath, occur frequently, caution should be exercised regarding the potential for anxiety disorders [13,14]. Individuals with anxiety disorders exhibit characteristic respiratory disturbances, such as hyperventilation and increased variability in breathing rhythm, which can be quantified using high-precision MEMS sensors. In contrast, analysis based on respiratory signals offers a more promising approach to addressing the limitations above. The research conducted by Xie Yougan, et al. [15] has confirmed the reliability of the SM9541 sensor in the high-precision monitoring of respiratory parameters [13,15], providing a basis for hardware selection in this study.
Despite the progress above, current technology still faces critical bottlenecks. The GAD-7 has limited efficacy in screening comorbidities and specific populations, and it cannot differentiate between anxiety and palpitations induced by somatic diseases [1]. The ECG model relies on wearable electrodes, which exhibit poor long-term comfort. Additionally, facial expression recognition is susceptible to interference from environmental lighting and individual expression habits, resulting in inadequate stability [12]. Existing respiratory monitoring devices primarily focus on assessing ventilation functions (such as lung capacity) and lack an end-to-end monitoring system specifically designed to identify and evaluate anxiety-specific breathing patterns. Therefore, the development of a portable anxiety monitoring system based on respiratory signals can not only address the shortcomings of current technologies but also provide objective and reliable physiological indicators for the early screening of anxiety disorders.
To address the shortcomings of existing technologies, this study systematically elucidates the characteristic breathing patterns of patients with anxiety disorders for the first time and innovatively develops a comprehensive respiratory monitoring solution. A high signal-to-noise ratio signal acquisition is realized by creating a lightweight portable monitoring terminal based on ESP32 and SM9541 sensors. At the algorithm level, the proposed SVM-XGBoost-LR stacked ensemble model integrates Kalman filter noise reduction, VIF-ANOVA-random forest three-level feature screening and Transformer self-attention mechanism, so that the model achieves an overall accuracy of 96% on a dataset of 600 samples, and significantly improves the F1-score of minority classes by 64% to 87%, which effectively verifies the clinical application potential of breathing patterns as biomarkers of anxiety disorders. This achievement confirms the reliability of breathing patterns as biomarkers for anxiety disorders and provides new ideas for developing portable clinical diagnostic devices.
The main contributions of this study are summarized as follows:
This research systematically reveals, for the first time, the characteristic breathing patterns of patients with anxiety disorders and achieves their accurate identification, providing an adequate algorithmic assurance for addressing the issue of missed diagnoses of high-risk abnormal patterns in medical AI.
A portable monitoring device based on the ESP32 microcontroller and the SM9541 high-precision pressure sensor has been designed. It collects airflow signals from the nasal passages through a breath airflow collection tube and transmits data via WiFi, achieving low-cost and highly accessible home monitoring.
A stacking ensemble model (SVM-XGBoost-LR) based on Transformer feature enhancement and VIF-ANOVA-RF three-level feature selection has been proposed. This model significantly improves the recognition capability for minority class abnormal respiratory patterns while maintaining high overall accuracy.
This study designed a highly integrated respiratory monitoring device based on the ESP32 [16] main control chip, whose core function is to collect respiratory pressure signals in real-time through a high-precision pressure sensor. The device acquires oral and nasal airflow signals via a dedicated respiratory airflow conduit, and the collected raw respiratory pressure signals are first pre-processed, from which a feature set is extracted using a built-in algorithm. These features will be used for subsequent classification of respiratory patterns.
Figure 1 of the hardware physical diagram shows that the system achieves a highly integrated hardware design based on a single PCB board, incorporating core modules such as central control processing, respiratory signal acquisition, audio acquisition, data storage, wireless communication, and power management. Through optimization of the PCB layout and signal conditioning circuit design, cross-talk between modules has been effectively suppressed, ensuring the stability of system operations.
The hardware system is centered around the ESP32 main control chip, which coordinates the data collection from multiple sensors, implements local data caching, and transmits the data to the upper computer via a dual-mode wireless transmission method using Wi-Fi/Bluetooth. The power module utilizes a lithium battery and a high-efficiency charge and discharge management chip to provide a continuous and stable energy supply, ensuring long-term monitoring requirements. The entire hardware system features a compact structure, low power consumption, and high integration, fully meeting respiratory monitoring applications' real-time and reliability needs.
The SM9541 sensor (Figure 2) is based on MEMS technology and incorporates CMOS mixed-signal processing technology [17], providing fully digital pressure sensing and temperature compensation functions. It features a SOIC-16 standard package with dual vertical pressure columns, covering a pressure measurement range from 10 cmH₂O to 140 cmH₂O (approximately 0.14-2 PSI), capturing pressure changes from normal to abnormal breathing patterns (such as dyspnea or apnea). Throughout the entire measurement range, the accuracy is ±1% FS, maintaining stable output within an ambient temperature range of -5°C to 65°C, effectively suppressing the interference caused by temperature drift. This makes it suitable for applications sensitive to pressure changes. This system utilizes the sensor to precisely monitor the minute pressure fluctuations induced by breathing, providing critical data support for assessing respiratory abnormalities associated with anxiety disorders.
In practical applications, the SM9541 connects the user's nasal cavity and the sensor's air nozzle through the trachea, capturing real-time pressure changes within the nasal cavity during respiration: pressure decreases upon inhalation and increases upon exhalation. The sensor converts these pressure fluctuations into digital signals at a sampling rate 100Hz and transmits them to the ESP32 main control chip. The main control chip performs preliminary processing and buffering of the data, and in conjunction with the respiration detection algorithm, ultimately achieves a comprehensive assessment of the user's respiratory status and provides early warnings for abnormalities.
The system achieves wireless communication between the sleep breathing detection hardware terminal and the host computer through the ESP32-WROOM-32E WiFi module from Espressif Systems [18]. The module has a dual-core processor, supporting a clock frequency of up to 240 MHz. It integrates WiFi, Bluetooth, BLE RF, low-power baseband, and analog and digital interfaces. It complies with WiFi 802.11n and Bluetooth 4.2 standards, supports IEEE 802.11b/g/n/e/i protocols, with a maximum transmission rate of 150 Mb/s and a peak transmission power of 19.5 dBm. It features a built-in TCP/IP protocol for TCP data transmission, with WiFi reception sensitivity reaching -98 dBm and continuous UDP throughput of 135 Mb/s. The module supports three WiFi configurations: Station, AP, and Station AP. This module is scalable and adaptive, powerful in functionality, and versatile in application, making it suitable for low-power sensor networks to meet system requirements.
The lower computer system is based on the ESP32 platform, primarily responsible for tasks such as A/D conversion, timed sampling, and data transmission. After the system is powered on, it communicates with the upper computer by connecting to WiFi. It then initializes the UART serial port to communicate with sensors and configures GPIO pins to control LED indicators that display the operational status. The system reads data from the SM9541 pressure sensor via I2C in the main loop and collects analog signals through the ADC pins. The data collected during each acquisition is encapsulated into data frames and transmitted to the upper computer using the TCP protocol. When the pressure value falls below the set threshold, the system triggers data transmission and indicates the connection status through the LED. In the event of an abnormal connection, it will automatically reconnect to ensure stable communication. The system implements data acquisition and transmission using software logic, without interrupts or DMA, resulting in a simple structure and efficient operation, suitable for remote monitoring of physiological parameters such as respiratory signals. The specific process can be seen in figure 3 below.
This study used a mechanical simulation device to generate respiratory pattern data without the need for human subjects. Using a large-volume syringe connected to a pressure sensor, the piston's movement was precisely controlled to simulate three breathing patterns: normal breathing (18-23 breaths per minute), hyperventilation (24-30 breaths per minute), and breath-holding (pauses of 3-5 seconds, with a frequency of less than 18 breaths per minute). Each pattern was simulated sequentially according to clinical characteristics, with each test lasting one minute as a sample unit.
The system successfully reproduced abnormal respiratory fluctuations, including paroxysmal hyperventilation and breath-holding events, accurately reflecting the respiratory morphology characteristics of anxiety. A dataset of 600 samples was constructed, consisting of 400 normal breathing samples (category 0), 150 hyperventilation samples (category 1), and 50 breath-holding samples (category 2). This class distribution was designed based on clinical prior knowledge and is consistent with the observation that anxiety disorders are more likely to trigger hyperventilation than breath-holding.
A stratified sampling strategy was used to split the training and validation sets into an 80:20 ratio to ensure consistent class distribution and enhance the reliability and generalization of model evaluation. This method provides a reproducible and low-cost data generation solution, while ensuring the high clinical relevance of the data to anxiety disorder research through parameter control and process design.
Before feature selection, the received data undergoes rigorous validation, including column name integrity and validity checks. Subsequently, data cleaning is performed to handle missing and abnormal values, and timestamps are standardized to a standard datetime format, generating a relative time series based on the first recorded entry. Kalman filtering [19] is applied to reduce noise interference and smooth the time series data. As illustrated in the 'Signal Comparison' results of figure 4, the original input signal, which could have exhibited fluctuations or noise, is significantly smoothed after Kalman filtering. The output signal stabilizes around 0.5, effectively filtering out noise while preserving the overall trend of the signal, thereby providing a cleaner and more stable foundation for subsequent feature extraction.
Figure 5 shows this paper draws Kernel Density Estimation (KDE) [20] curves for the first 16 numerical features to reveal their statistical distribution and scale differences. The results show that time domain statistics (such as mean and standard deviation) reflect the baseline level and fluctuation of respiratory pressure; high-order features (such as skewness and kurtosis) reflect the asymmetry and peak characteristics of the signal distribution morphology; frequency domain and time-frequency domain features reveal the respiratory fundamental frequency, spectral complexity, and multi-scale laws. In addition, the rate of change and respiratory event-related features can intuitively reflect the stability of the respiratory rhythm and the frequency of abnormal pauses. This analysis clarifies the significant scale differences between features, highlights the necessity of standardization and transformation, and lays the foundation for subsequent feature selection and classification model construction.
This study employs a multi-dimensional feature selection strategy to systematically evaluate feature importance through three complementary methods (Figure 6). The Variance Inflation Factor (VIF) [21] is used to analyze and quantify the multicollinearity among features, strictly eliminating features with high correlation where VIF > 5, thereby ensuring the independence of the feature set and the stability of the model. Secondly, a univariate feature selection method based on the ANOVA F-value was employed to select features with a significant statistical correlation with the target variable (p < 0.05). Finally, a Random Forest model [22] containing 500 decision trees was utilized, and the contribution of each feature to the classification decision was assessed through Gini importance scores. All features were standardized.
An optimal feature subset is ultimately obtained by integrating the screening results of the three methods through a weighted comprehensive scoring method (Formula 1). The weight allocation of the weighted comprehensive scoring method [23] was determined through experimental grid search. This study attempted various weight combinations, such as (0.5, 0.25, 0.25), (0.3, 0.35, 0.35), and (0.4, 0.3, 0.3), using the average F1-score from 5-fold cross-validation on a simple XGBoost classifier based on a preliminarily screened subset of features (particularly focusing on the F1-score of the minority class) as the evaluation metric. The experimental results indicate that the weight combination of (0.4, 0.3, 0.3) can achieve the optimal overall performance and is therefore adopted.
The feature subset table 1 contains 15 features, effectively mitigating multicollinearity while providing in-depth insights into abnormal breathing patterns from multiple dimensions. The time-domain features (such as Std_Pressure_Change) quantify the intensity of fluctuations in breathing amplitude; the frequency-domain features (such as Spectral_Entropy) reflect the stability of breathing rhythm; wavelet features (including energy and entropy) precisely capture local patterns across different time scales (such as high-frequency wheezing and low-frequency breath-holding); nonlinear entropy features (such as Sample_Entropy) effectively quantify the complexity and chaos of breathing patterns; and autocorrelation features (such as ACF_Lag_1) reveal the inherent dependencies of breathing cycles. This diversified combination of features exhibits robustness and interpretability, forming a quantitative foundation for identifying pathological breathing patterns, thereby providing reliable input for the subsequent construction of high-precision classification models.
| Table 1: Feature explanation table. | ||
| Feature Name | Feature Type | Physiological meaning explanation |
| Wavelet_Entropy_Level_1 | Wavelet entropy characteristics | Reflects the degree of disorder of the high-frequency components of breathing. The higher the value, the more irregular the breathing. |
| Wavelet_Energy_Level_1~4 | Wavelet energy characteristics | Level 1-2: high-frequency breathing fluctuations (panting), Level 3-4: low-frequency breathing rhythm (deep breathing/holding breath) |
| ACF_Lag_1~5 | Autocorrelation characteristics | Reflects the time dependence of respiratory periodicity, Lag1 focuses on adjacent breaths, and Lag5 reflects long-term patterns |
| PACF_Lag_1 | Partial autocorrelation feature | Detect transient changes in breathing and exclude interference from other time steps |
| Sample_Entropy | Nonlinear dynamic characteristics | Lower values indicate more regular breathing; anxious patients often have more complex breathing patterns. |
| Approximate_Entropy | Nonlinear dynamic characteristics | Capturing small variations in breathing patterns |
| Spectral_Entropy | Frequency domain characteristics | The more dispersed the spectrum energy is, the higher the entropy value is, which reflects the stability of the breathing rhythm. |
| Std_Pressure_Change | Time domain statistical characteristics | Reflects the intensity of fluctuations in breathing amplitude |
This study addressed the three-class classification requirement for anxiety disorder respiratory pattern recognition tasks by innovatively constructing a dual-layer stacked ensemble model [24]. This model deeply integrates the stability of traditional machine learning algorithms with the advantages of feature enhancement provided by deep learning. By selecting SVM and XGBoost as complementary base classifiers, we fully leverage their synergistic advantages in feature space modeling; we introduce the Transformer multi-head self-attention mechanism to extract more discriminative higher-order feature representations from the original respiratory features; and we use logistic regression as the meta-model to achieve the optimal weighted fusion of the base model prediction probabilities. This results in a two-layer stacking architecture, as illustrated in figure 7, which significantly enhances the model's accuracy and
Basic principles of SVM: SVM (Support Vector Machine) [25] is a supervised learning model used for classification and regression analysis, widely applied in addressing small sample, nonlinear, and high-dimensional pattern recognition problems. The distance between healthy samples and samples with respiratory abnormalities is maximized by constructing the optimal classification hyperplane in the high-dimensional feature space mapped by the kernel function. As shown in figure 8, the SVM transforms the input features into a high-dimensional space through a nonlinear transformation, converting the original nonlinear classification problem into a linearly separable problem in high-dimensional space [26]. The final decision function is:
In the equation, the nonlinear transformation that maps physiological indicators to a high-dimensional space is represented by the normal vector of the hyperplane in the high-dimensional feature space, and b refers to the offset of the classification hyperplane.
The accuracy of model predictions is closely related to parameter selection. This study determined the optimal parameters through a 5-fold cross-validation grid search, using the RBF as the kernel function. The regularization parameter C was set to 0.5, and the gamma parameter adopted the 'scale' automatic scaling strategy to optimize model complexity and generalization ability, thereby achieving effective three-class classification of respiratory abnormal states.
The fundamental principles of xgboost gradient boosting trees: XGBoost is an ensemble learning algorithm based on gradient boosting. Its core idea is to construct an incremental model by gradually adding decision trees, where each new tree fits the residuals of the previous tree, progressively approaching the optimal solution (Figure 9) [27]. This algorithm is capable of effectively handling temporal features and class imbalance issues in the detection of respiratory anomalies.
The objective function of XGBoost comprises the loss function and the regularization term.
In the formula, is the direct loss function between the predicted value and the actual value , while is the regularization term of the k-th tree, defined as:
In the formula, T represents the number of leaf nodes, and are hyperparameters that control the complexity of the tree structure and weight decay, respectively. Both work together to mitigate overfitting.
In constructing the XGBoost model, the optimal hyperparameter combination was determined through grid search and 5-fold cross-validation: a learning rate of 0.1, a maximum tree depth of 4, and an L1 regularization coefficient of 0.1. These parameters collectively optimize the model's complexity and generalization ability, making it suitable for the classification task of respiratory signals.
The principles of the LR meta-model: In stacked ensemble learning, the core function of the meta-model is to make the final decision based on the outputs of the base learners. This study selects LR [28] as the meta-model. LR outputs well-calibrated probabilities through the Sigmoid function for binary classification or the Softmax function for multi-class classification. These probabilities are not only highly interpretable but also facilitate the computation of evaluation metrics such as F1-score and the setting of classification thresholds, making them particularly suitable for medical analysis scenarios that require confidence assessment. As a low-complexity linear model, LR effectively learns the output combination patterns on top of the powerful feature representation capabilities of established base models, such as SVM and XGBoost, reducing the risk of overfitting and significantly enhancing the overall model's generalization ability. Finally, LR can integrate the advantages of SVM in clearly defined classification boundaries and XGBoost in capturing complex nonlinear relationships by linearly weighting and balancing the predictive perspectives of different models, thereby achieving complementary strengths and collaborative decision-making across heterogeneous models.
In regard to two base models handling a three-class classification problem, each base model outputs a 3-dimensional probability vector for a given sample. The probability outputs from all base models are concatenated to form a meta-feature vector . The LR meta-model then learns a mapping from the meta-feature xmeta to the final class label y.
For the final three-class classification task (K = 3), multinomial logistic regression (Softmax regression) is employed. The model calculates the probability of the sample belonging to each category k as follows:
Among them, is the k-th class weight vector; is the k-th bias term; is the 6-dimensional probability vector output by the base model (3 classes each for SVM and XGBoost). The model learns the parameters and b by minimizing the cross-entropy loss function.
where N represents the sample size; is an indicator function.To prevent overfitting, add L2 regularization term to the loss function.
To mitigate the impact of class imbalance in the dataset on the training of the meta-model, class weights were introduced into the loss function, assigning higher weights to minority classes, thereby granting them a more significant role in the parameter optimization process.
This study innovatively integrates the multi-head self-attention mechanism [19] of the Transformer architecture into a stacked ensemble model, achieving feature optimization through a three-stage computation.
This mechanism is accomplished through the following steps:
Firstly, the input feature X is mapped to Query, Key, and Value matrices, and the relevance weights between features are computed using the scaled dot-product attention mechanism.
In the equation, Q, K, and V represent the query matrix, key matrix, and value matrix respectively; denotes the dimensionality of the key vectors.
Subsequently, the output from the attention mechanism is transformed into the breathing positivity probability through a fully connected layer, which indicates the risk score of the sample:
Among them, σ() is an activation function, wc is the weight matrix of the fully connected layer, and bc is the bias vector.
Finally, the original normalized features are concatenated with the risk probability features to form an enhanced feature matrix:
is expanded a into a vector, where || denotes column concatenation.
Through the feature enhancement mechanism, the original statistical characteristics of physiological indicators are fully retained, while the self-attention mechanism effectively captures high-order feature interaction information. Systematic experimental analysis was conducted to validate the effectiveness of this mechanism.
This study employs a two-level stacked ensemble strategy, where the first layer consists of SVM and XGBoost base models that generate class probability predictions, respectively. The second layer utilizes a logistic regression meta-model to learn the optimal fusion weights, concatenating the predicted probabilities of the base models as input features for the final decision. Regarding model optimization, techniques such as Dropout, weight decay, and early stopping are employed to control overfitting. SMOTE is applied to address class imbalance issues, and hyperparameters are optimized through cross-validated grid search. The innovation of this framework lies in the complementary fusion of multiple algorithms, the effective integration of deep learning feature enhancement with traditional ensemble learning, and the end-to-end optimization design of the entire process. Ultimately, it achieves an overall accuracy rate of 96% on the test set, significantly improving the F1-score for minority class recognition from 0.53 to 0.87, thereby providing a reliable technological solution for accurately identifying anxiety-related respiratory patterns. This study further establishes a unified evaluation standard and comparative experimental framework to systematically quantify the performance enhancement effects of the aforementioned ensemble strategies.
The main function of evaluation metrics is to assess the quality of model performance. In classification tasks, they can assist in determining the accuracy of the model's predictions regarding labels, where the predictions generated by an excellent model are predominantly correct [29]. Table 2 summarizes the most commonly used evaluation metrics and their calculation methods in clinical diagnostics, reflecting the model's discriminative ability from different perspectives.
| Table 2: Core evaluation indicators and significance of medical diagnosis. | ||
| Indicator Name | Official | Clinical significance |
| Accuracy | Comprehensive assessment of the overall prediction accuracy of the model, applicable for performance evaluation of balanced datasets. | |
| Recall | Reflecting the capability of disease detection, a high value indicates a low miss rate. | |
| Specificity | Measuring the identification ability of healthy populations; a high value can avoid misdiagnosis. | |
| Precision | Assessing the reliability of positive predictions; low values may lead to overmedicalization. | |
| F1-score | A comprehensive index for balancing missed diagnoses and misdiagnoses. | |
In the comparative experiments of the anxiety disorder breathing detection model, we selected basic classification algorithms such as KNN, Random Forest, Logistic Regression, Linear Discriminant Analysis, Decision Tree, SVM, and XGBoost as baseline models. Furthermore, to incorporate the latest advancements in the field, we also implemented and evaluated two state-of-the-art methods-the Adaptive Neuro-Fuzzy Inference System (ANFIS) [30] and a Convolutional Recurrent Neural Network (CRNN)[31] -on our dataset. These approaches were inspired by recent successful applications in respiratory disease diagnosis, such as fuzzy logic-based neural networks for acute respiratory failure, achieving 97.7% accuracy, and CRNN models for lung sound classification, achieving up to 98.6% accuracy. We systematically compared our proposed model (i.e., the SVM-XGBoost-LR stacked ensemble model) against all these methods across multiple performance dimensions. The detailed metrics of each model on the test set are presented in table 3.
| Table 3: Comparison results of evaluation metrics for each model. | |||||||||||||||
| Model Type | Performance Evaluation of the Test Set | CV Results | |||||||||||||
| Precision | Recall | F1-score | AUC | Acc | ACC | AUC | |||||||||
| 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 | ||||
| KNN | 0.98 | 0.97 | 0.65 | 0.94 | 0.93 | 0.93 | 0.96 | 0.95 | 0.76 | 0.98 | 0.98 | 0.99 | 0.93 | 0.92 | 0.96 |
| RF | 0.98 | 0.97 | 0.75 | 0.95 | 0.95 | 0.93 | 0.97 | 0.96 | 0.83 | 0.99 | 0.99 | 0.99 | 0.95 | 0.94 | 0.98 |
| LR | 1.00 | 0.95 | 0.53 | 0.90 | 0.95 | 0.93 | 0.94 | 0.95 | 0.68 | 0.99 | 0.99 | 0.98 | 0.91 | 0.93 | 0.98 |
| LDA | 0.99 | 0.97 | 0.51 | 0.89 | 0.91 | 1.00 | 0.93 | 0.94 | 0.68 | 0.99 | 0.97 | 0.98 | 0.90 | 0.92 | 0.96 |
| DT | 0.95 | 0.97 | 0.78 | 0.97 | 0.95 | 0.68 | 0.96 | 0.96 | 0.73 | 0.97 | 0.97 | 0.90 | 0.94 | 0.89 | 0.93 |
| SVM | 1.00 | 0.97 | 0.59 | 0.91 | 0.95 | 1.00 | 0.95 | 0.96 | 0.74 | 0.99 | 0.99 | 0.98 | 0.93 | 0.93 | 0.98 |
| XGBoost | 1.00 | 0.95 | 0.53 | 0.90 | 0.95 | 0.93 | 0.94 | 0.95 | 0.68 | 0.99 | 0.99 | 0.98 | 0.91 | 0.93 | 0.98 |
| ANFIS | 0.97 | 0.97 | 0.5 | 0.92 | 0.96 | 0.75 | 0.94 | 0.97 | 0.60 | 0.98 | 0.99 | 0.68 | 0.91 | 0.87 | 0.84 |
| CRNN | 0.99 | 0.98 | 0.56 | 0.91 | 0.93 | 0.94 | 0.95 | 0.95 | 0.70 | 0.98 | 0.99 | 0.98 | 0.92 | 0.93 | 0.97 |
| Our | 0.98 | 0.95 | 0.87 | 0.98 | 0.95 | 0.87 | 0.98 | 0.95 | 0.87 | 0.99 | 0.99 | 0.99 | 0.96 | 0.94 | 0.98 |
Unequivocally demonstrates the superior classification capability of our proposed model on the test set, achieving a notable overall accuracy of 0.96. A particularly salient observation pertains to its performance on Category 2 (simulated breath-holding instances), where our model attains an F1-score of 0.87. This represents a marginal improvement of 4.8% over the suboptimal RF classifier and a substantial advancement of 24.3% relative to the ANFIS model (F1-score: 0.60), thereby underscoring its exceptional proficiency in recognizing underrepresented classes.
While several benchmark models, including RF, XGBoost, and the CRNN, exhibited strong overall ranking performance as indicated by AUC values predominantly exceeding 0.98, closer inspection of confusion matrices reveals pronounced instability under class-imbalanced conditions. Notably, although the deep learning-based CRNN architecture outperformed conventional machine learning approaches across most metrics, it still exhibited considerable deficiency in discriminating Category 2 samples, with an F1-score of 0.70 compared to the 0.87 achieved by our model.
Conversely, all baseline models demonstrated robust performance on the majority classes (Category 0: normal breathing; Category 1: hyperventilation), with F1-scores consistently surpassing 0.90, suggesting that these categories are inherently more separable. Of particular significance is that our model’s test accuracy (0.96) slightly exceeds its cross-validation performance (mean accuracy: 0.94), an indication of effective generalization and remarkable resilience to overfitting. The markedly subpar performance of the ANFIS model further accentuates the complexity of the classification task at hand. It emphasizes the robustness and architectural efficacy of our proposed methodology.
Based on the visualization results of the confusion matrix in figure 10, it is evident that the diagonal elements are the darkest in the matrix diagram corresponding to the 'Our' model, the number of misclassifications in the non-diagonal regions is the smallest, and the prediction distribution across the three categories is the most uniform. This indicates that the model demonstrates robustness in the majority classes and a high degree of consistency in recognizing minority classes, with no apparent preference bias. A detailed analysis of the misclassifications of other models reveals that XGBoost exhibits a certain number of misclassifications of category 1 from category 2, while RF displays a similar trend of instability. This suggests that although these models possess strong overall discrimination capabilities, there remains uncertainty in the decision boundary when addressing the highly skewed sample distribution. In contrast, the 'Our' model maintains the lowest error count across all categories, notably achieving a significantly lower error rate in category 2 compared to the other models. It further validates the advantages of its stacked ensemble structure and feature enhancement mechanism in handling imbalanced data.
In summary, combining the numerical results of table 3 with the visual analysis of Figure 10, our model significantly outperforms traditional machine learning models in terms of overall classification performance, minority class recognition, and model generalization capability, providing reliable technical support for the screening of respiratory abnormalities in anxiety disorders.
The ablation experiment is based on the principle of controlling variables and aims to evaluate the contribution of each module in the model to the overall performance improvement [32].
Synergistic gain analysis of stacking integration strategy: The experiment compares the performance of Support Vector Machine (SVM), XGBoost, and the SVM–XGBoost–LR stacked ensemble model (OUR) proposed in this paper in the three-classification task. The results are shown in figure 11. The stacked ensemble model employs a two-layer architecture. The base layer comprises a parallel SVM and XGBoost implementation, outputting a class probability vector. The meta-learning layer employs a logistic regression meta-classifier. Its core innovation is concatenating the base layer's probability output with the original features using the passthrough=True setting, thereby synergizing high-level predictions with underlying feature information for decision making. To prevent data leakage, meta-feature generation is performed using strict 5-fold cross-validation. A weighted F1-score is used to balance class imbalance, supplemented by cross-class recall differences to assess the balance of model performance.
Experimental results show that XGBoost performs poorly on category 2 (simulated breath-holding) due to sample imbalance and complex decision boundaries. While SVM offers reasonable stability, its nonlinear modeling capabilities are limited. In contrast, our stacked ensemble model, combining the boundary discrimination strengths of SVM with the feature-fitting capabilities of XGBoost, achieves significant improvement on the key category 2 (F1-score: 0.87) and more balanced overall performance. These results demonstrate the effectiveness of heterogeneous model fusion strategies for anxiety-related breathing pattern recognition and provide an algorithmic foundation for building robust mental health monitoring systems.
Anti-overfitting effect of self-attention mechanism: To explore the impact of the self-attention mechanism on the model's generalization ability, we compared the training process and performance before and after introducing this mechanism (Figures 12,13). Table 4 provides a quantitative comparison of key evaluation metrics.
| Table 4: Comparison of evaluation indicators of self-attention mechanism ablation experiment. | |||||
| Model Type | Performance Evaluation of the Test Set | ||||
| Precision | Recall | F1-score | AUC | Acc | |
| WITH-ATTENTION | 0.87 | 0.96 | 0.91 | 0.99 | 0.96 |
| WITHOUT-ATTENTION | 0.93 | 0.95 | 0.94 | 0.99 | 0.97 |
In contrast, after introducing the self-attention mechanism (Figures 13,14), the model exhibited more robust learning behavior: training and validation losses decreased simultaneously and eventually converged, the validation accuracy (approximately 96%) maintained a reasonable gap with the training accuracy, and the validation set AUC value remained stable at a high level of 0.995. Furthermore, the weighted F1-score also significantly improved (Table 1). These results demonstrate that the self-attention mechanism effectively suppresses overfitting and improves generalization performance by enhancing the model's focus on key features.
In summary, while the self-attention mechanism increases model complexity and training cost, it significantly improves generalization and practical value, making it suitable for applications requiring high reliability.
To evaluate the generalization performance of the proposed SVM-XGBoost-LR stacked ensemble model, this study used the public dataset "Pressure, flow, and dynamic chest and abdominal circumference data of adults under CPAP therapy" [33] from the international authoritative physiological signal database PhysioNet [34] for external validation. This dataset contains respiratory signals from 30 healthy adults aged 19-37, obtained through open recruitment at the University of Canterbury with ethical approval and informed consent. The dataset covers different genders (15 males and 15 females), and includes smokers, e-cigarette users, and asthma patients, showing good diversity and representativeness.
This study applied the trained SVM-XGBoost-LR model to the open source test set without retraining or parameter adjustment to test its cross-dataset recognition ability.
As shown in figure 15, the model demonstrates excellent generalization performance on the external dataset, with all metrics remaining at high levels. The overall accuracy is 0.778, and the macro-average precision, recall, and F1 score are 0.795, 0.790, and 0.792, respectively. These numbers demonstrate that the model has balanced and stable discrimination capabilities across different categories. In particular, the macro-average AUC reaches 0.910, demonstrating the model's strong discriminative power in class differentiation.
Further analysis was conducted using radar plots from two perspectives: overall performance and category specificity (Figures 16,17). Figure 16 shows that the model's performance on multiple key metrics is close to the outer edges of the radar plot, forming a complete and balanced polygonal structure, indicating good overall performance. Figure 17 reveals the model's recognition characteristics for different breathing patterns at the category level: Category 0 (normal breathing) achieved near-perfect performance across all metrics, with the highest recognition reliability. Category 1 (hyperventilation) demonstrated outstanding precision with a low false positive rate. Despite the smaller number of samples in Category 2 (breath holding), the model maintained an acceptable recall rate, demonstrating its ability to capture minority class samples effectively.
The normalized confusion matrix in figure 18 further demonstrates the model's performance in fine-grained classification. The high diagonal element values indicate that the model has good recognition capabilities for all three breathing patterns. Specifically, the classification accuracy for standard breathing patterns reached 0.87, deep breathing patterns 0.72, and hyperventilation patterns 0.79. The off-diagonal elements indicate that the model's primary confusion occurs between regular and deep breathing patterns, with mutual confusion rates of 18% and 13%, respectively. This reflects the proximity of these two patterns and demonstrates that the features learned by the model are physiologically plausible.
The SVM-XGBoost-LR model performed well overall in external validation and maintained good recognition consistency across various respiratory categories. In particular, when processing samples with minor inter-class differences, it still had strong discrimination capabilities, verifying its good generalization and practical value in respiratory pattern recognition tasks.
This study successfully developed a portable respiratory monitoring system using an ESP32 microcontroller and an SM9541 sensor. Incorporating the proposed SVM-XGBoost-LR stacked ensemble model, the system demonstrated excellent performance in identifying breathing patterns associated with anxiety disorders, achieving 96% accuracy on the test set and significantly improving the F1 score of the minority class (simulated breath holding) to 87%. The model effectively enhanced its ability to distinguish different breathing patterns, exceptionally clinically rare but important abnormal patterns, through the collaborative integration of multiple algorithms and Transformer feature enhancement.
Despite the promising results, this study still has some limitations, such as a relatively limited sample size, reliance on simulated data, and high model complexity, which may affect practical deployment efficiency. However, these limitations do not diminish the core value of this research. The results strongly validate the potential of respiratory patterns as an objective screening indicator for anxiety disorders. By combining a low-cost, highly integrated hardware system with a high-performance algorithm model, this study provides key algorithmic support and a clear technical path for developing portable mental health monitoring devices. This study's feature enhancement and ensemble learning model demonstrates excellent generalization and transferability. It is not only applicable to breathing pattern analysis for anxiety disorders, but also provides a viable biomedical engineering paradigm for early warning and health monitoring of various chronic conditions, including diabetes and cardiovascular disease. Future work will focus on expanding clinical sample sizes, lightweighting the model for embedded deployment, and further exploring the system's potential for real-world applications, broadening its scope to a broader range of health monitoring applications.
The author, Yanming Huo, contributed as a consultant for this study and was responsible for overall guidance and project design. As one of the authors, Luyuan Jia mainly participated in manuscript writing, data analysis, and discussion of results. Luyuan Jia co-authored the main text of the manuscript, while Yanming Huo was responsible for the overall framework. Team members Guo Zhang, Jiajing Ma, Congkang Zhang, Xu Guo, Shen’ao Hao, and Yongdong Song participated in experimental design, data collection, data preprocessing, etc. All authors reviewed and approved the final manuscript submission, and Yanming Huo served as the corresponding author and was responsible for communicating with the journal's editorial office. All authors have read and agreed to the published version of the manuscript.
This research received no external funding.
The data collected by you in this study cannot be publicly shared due to privacy restrictions, but it may be obtained from the corresponding author upon reasonable request.
During the preparation of this manuscript, the authors utilized ChatGPT-4o (OpenAI) and DeepSeek for AI-assisted drafting in the following aspects: reference formatting and sorting, as well as language polishing, grammar checking, and logical refinement of the English abstract and selected sections. Following the use of these tools, the authors thoroughly reviewed and edited the content extensively. The authors take full responsibility for the entire content of the published work.
Not applicable.
Not applicable.
The authors declare no conflicts of interest.
SignUp to our
Content alerts.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Are you the author of a recent Preprint? We invite you to submit your manuscript for peer-reviewed publication in our open access journal.
Benefit from fast review, global visibility, and exclusive APC discounts.