1 Background

For those of us who lived through COVID-19, uncertainty was a constant presence. For those graduating during this time, that uncertainty was magnified as they entered the job market amid an economic downturn and struggled to build a future in an increasingly unpredictable world. Unsurprisingly, this heightened uncertainty has been linked to a decline in mental health, and numerous studies have documented the psychological toll of the pandemic on students (Aucejo et al., 2020; Russell et al., 2024).

Researchers at De La Salle University and the University of Hong Kong took this a step further by investigating the interaction between hope and depression among pandemic graduates in the Philippines (Dela Cruz et al., 2023). Their survey assessed multiple dimensions of the Locus of Hope, a framework that captures both internal and external sources of hope (Bernardo, 2010), as well as depression levels using the Patient Health Questionnaire-9 (PHQ-9), a widely used screening tool for measuring the severity of depression (Kroenke et al., 2001).

That study found that hope derived from personal qualities like resilience was the only significant predictor of depressive symptoms in Filipino pandemic graduates, while hope influenced by social support showed no significant link to depression (Dela Cruz et al., 2023). I built upon their work by dividing participants into those who graduated during the peak of the pandemic and those who graduated during its decline, in order to take a closer look at how the timing of graduation influences hope and depression among these Filipino graduates.

2 Statistical Analysis Plan (SAP)

Objective: Examine the psychological impact of the COVID-19 pandemic on graduating students by comparing those who graduated during the pandemic’s peak (2020) to those who graduated during its decline (2021).

Hypothesis: I hypothesize that students graduating during the peak of the COVID-19 pandemic (2020) will exhibit higher levels of uncertainty and psychological distress (in terms of depression and hope) compared to those graduating during the decline (2021), when the situation had somewhat stabilized.

2.1 Data Overview

Survey Data: Data collected from students who graduated either during the pandemic’s peak (2020) or its decline (2021).
Demographic Information: Includes variables such as gender, age, diploma type, and graduation year. The graduation year was transformed into the outcome variable for analysis.
Survey Responses: Questions assessing psychological well-being, social support, hope, and depression during the pandemic.
Psychological Measures:
- PHQ-9 Score: This score is derived from the Patient Health Questionnaire-9, a widely used tool for screening and assessing the severity of depression. A higher score indicates more severe levels of depression.
- EXT.PE: This score measures the extent to which an individual derives hope and motivation from their relationships with peers. A higher score indicates a greater reliance on friends for support and confidence in achieving personal goals.
- EXT.PA: This score measures the extent to which an individual derives hope and motivation from their parents or parental figures. A higher score indicates a greater reliance on parental support, encouragement, and guidance to achieve personal goals.
- EXT.SP: This score measures the extent to which an individual derives hope and motivation from their faith in God or a higher spiritual power. A higher score indicates a greater reliance on spirituality or religious beliefs for support and guidance in achieving personal goals.
- INT: This score measures the extent to which an individual derives hope and motivation from within themselves, emphasizing personal agency and self-determination. A higher score indicates greater reliance on one’s own abilities and internal drive to achieve goals.
Outcome Variable: PandemicPhase, a binary variable indicating whether the student graduated during the pandemic’s peak (2020) or decline (2021), derived from the graduation year.
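
To make the scoring concrete, here is a minimal sketch of how a PHQ-9 total and the PandemicPhase label could be computed. The severity bands follow Kroenke et al. (2001); the function names and the exact recoding of graduation year are assumptions for illustration, since the report does not show its own code.

```python
# Severity bands from Kroenke et al. (2001): upper bound of each band and its label.
PHQ9_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
              (19, "moderately severe"), (27, "severe")]

def phq9_score(items):
    """Sum the nine PHQ-9 item responses (each scored 0-3) into a 0-27 total."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 needs nine responses, each scored 0-3")
    return sum(items)

def phq9_severity(total):
    """Map a PHQ-9 total onto its severity band."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    for upper, label in PHQ9_BANDS:
        if total <= upper:
            return label

def pandemic_phase(graduation_year):
    """Hypothetical recoding of graduation year into the binary outcome."""
    phases = {2020: "Peak", 2021: "Decline"}
    if graduation_year not in phases:
        raise ValueError("only 2020 and 2021 graduates are in the sample")
    return phases[graduation_year]

# Made-up respondent, not taken from the survey data.
responses = [1, 2, 1, 0, 1, 2, 1, 1, 0]
total = phq9_score(responses)
band = phq9_severity(total)
phase = pandemic_phase(2021)
```
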

2.2 Analysis Plan

Principal Component Analysis (PCA):

Conduct PCA to reduce the dimensionality of the survey questions. This will identify the components that explain most of the variance in the survey responses (PCA is unsupervised, so PandemicPhase is not used at this step). Use a cumulative-variance threshold and the elbow method to determine the number of PCs to keep.

Logistic Regression Models:

With PCA: Fit a logistic regression model using the principal components derived from PCA to predict the PandemicPhase.
Survey Responses Only: Fit a logistic regression model using only the survey responses to predict PandemicPhase.
Psychological Measures Only: Fit a logistic regression model using only the psychological measures (PHQ-9, EXT.PA, EXT.PE, EXT.SP, and INT) to predict PandemicPhase.
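
As an illustration of what each of these models does, the sketch below fits a logistic regression by plain batch gradient descent. This is a teaching stand-in and an assumption on my part: it is not the fitting routine of any statistics package, and the single centered predictor and labels are made up rather than taken from the survey.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, row):
    """Intercept w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Minimise the mean log-loss by batch gradient descent.

    Returns [intercept, w1, ..., wp].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = sigmoid(linear(w, row)) - target  # d(loss)/d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(linear(w, row)) >= threshold) for row in X]

# Made-up data: one centered score, 1 = Peak, 0 = Decline (hypothetical coding).
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
preds = predict(w, X)
```

The three models in the plan differ only in which columns go into `X`: principal component scores, raw survey responses, or the five psychological measures.
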

Model Evaluation

Compare the logistic regression models by creating a confusion matrix and calculating accuracy for each model.
Apply the bootstrap method (1000 resamples) to estimate the accuracy of each logistic regression model.
Compute 95% confidence intervals for the accuracy estimates from all models to assess the stability and reliability of their performance.
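
The evaluation steps above can be sketched as follows. This is a simplified version under stated assumptions: it resamples fixed (actual, predicted) pairs rather than refitting a model on each resample, it uses a percentile interval, and the example predictions are hypothetical rather than the confusion matrix of any model in this report.

```python
import random

def confusion_matrix(actual, predicted, labels=("Decline", "Peak")):
    """counts[(a, p)] = number of cases with actual label a and predicted label p."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def accuracy(actual, predicted):
    """Share of cases where the predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def bootstrap_accuracy_ci(actual, predicted, n_resamples=1000, level=0.95, seed=0):
    """Mean bootstrapped accuracy and a percentile CI from case resampling."""
    rng = random.Random(seed)
    n = len(actual)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        stats.append(accuracy([actual[i] for i in idx],
                              [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_resamples)]
    upper = stats[int((1 + level) / 2 * n_resamples) - 1]
    return sum(stats) / n_resamples, lower, upper

# Hypothetical test set of 19 cases (15 Decline, 4 Peak) with 16 correct predictions.
actual = ["Decline"] * 15 + ["Peak"] * 4
predicted = ["Decline"] * 14 + ["Peak"] + ["Peak"] * 2 + ["Decline"] * 2
cm = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
mean_acc, ci_lower, ci_upper = bootstrap_accuracy_ci(actual, predicted)
```
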

3 Data Preprocessing

Because the classes were imbalanced (23 students graduated during the peak of the pandemic in 2020 and 78 graduated as the pandemic was winding down in 2021), I performed a stratified partition to split the data into 80% training and 20% testing. Stratifying keeps the proportion of students from each graduation phase approximately the same in the training and testing sets.

Training Data Distribution
Pandemic Phase	Count	Proportion
Decline	63	0.7682927
Peak	19	0.2317073

Testing Data Distribution
Pandemic Phase	Count	Proportion
Decline	15	0.7894737
Peak	4	0.2105263
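
A stratified 80/20 partition like the one summarized in these tables can be sketched as below. The function is hypothetical and does not reproduce whichever routine was actually used; rounding the training count up within each class is an assumption that happens to give the same counts as the tables (63/15 and 19/4).

```python
import math
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices into train/test while sampling separately within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Round up per class; the inner round() guards against float noise.
        n_train = math.ceil(round(train_frac * len(idx), 9))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)

# Class sizes from the report: 78 Decline and 23 Peak graduates.
labels = ["Decline"] * 78 + ["Peak"] * 23
train_idx, test_idx = stratified_split(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx) for c in ("Decline", "Peak")}
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in ("Decline", "Peak")}
```
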

4 Correlation Matrix

I examined the correlation matrix for all 49 survey questions to see how related they are. Most pairs of questions are only weakly correlated, but some pairs are highly correlated, which suggests that a few questions measure similar things. Given this mix of strong and weak correlations, PCA is a reasonable way to simplify the data and look for underlying themes.
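
A minimal sketch of this kind of check is shown below: it computes Pearson correlations between every pair of columns and lists the pairs above a cutoff. The cutoff of 0.7 and the three toy columns are assumptions for illustration; the report does not state a numeric cutoff.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither column is constant (a constant column would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def high_correlation_pairs(columns, cutoff=0.7):
    """Return (name_a, name_b, r) for every pair with |r| at or above the cutoff."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= cutoff:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Toy responses for three hypothetical questions (not survey data).
columns = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [2, 4, 6, 8, 10],
    "q3": [3, 1, 4, 1, 5],
}
pairs = high_correlation_pairs(columns)
```
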

5 Principal Component Analysis (PCA)

Three methods were used to determine the optimal number of principal components: 80% cumulative variance explained, 90% cumulative variance explained, and the elbow method. The 80% threshold resulted in 14 PCs, while the 90% threshold resulted in 22 PCs. The elbow method resulted in 20 PCs. I will first examine the top loadings for the first two PCs and then evaluate separation of groups using the first two principal components.
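
The cumulative-variance rule can be sketched as follows: keep the smallest number of components whose variances add up to at least the chosen share of the total. The function name and the toy variances are assumptions for illustration; they are not the eigenvalues from this analysis, and the elbow method is not covered here.

```python
from itertools import accumulate

def n_components_for(variances, threshold):
    """Smallest k such that the k largest variances reach the threshold share."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Toy component variances; cumulative shares are 0.5, 0.75, 0.875, 0.9375, 1.0.
toy_variances = [4.0, 2.0, 1.0, 0.5, 0.5]
k80 = n_components_for(toy_variances, 0.80)
k90 = n_components_for(toy_variances, 0.90)
```
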

Top 5 Loadings for PC1
	PC1
God.always.finds.ways.to.help.resolve.my.problems.	0.2092938
There.are.many.ways.around.a.problem..if.one.trusts.in.God.	0.2073451
God.has.made.my.life.successful.	0.2058872
My.family.finds.many.ways.to.help.me.solve.my.problems.	0.1991556
My.family.has.helped.me.meet.the.goals.that.I.have.set.for.myself.	0.1988319

Top 5 Loadings for PC2
	PC2
Feeling.down..depressed..or.hopeless.	0.2833215
Feeling.bad.about.yourself…or.that.you.are.a.failure.or.have.let.yourself.or.your.family.down.	0.2449019
Moving.or.speaking.so.slowly.that.other.people.could.have.noticed..Or.the.opposite…being.so.fidgety.or.restless.that.you.have.been.moving.around.a.lot.more.than.usual.	0.2389954
Thoughts.that.you.would.be.better.off.dead..or.thoughts.of.hurting.yourself.in.some.way.	0.2280291
Poor.appetite.or.overeating.	0.2245673

The top 5 loadings for PC1 all come from external hope items, in which God or family helps with problem-solving, goals, and success, while the top 5 loadings for PC2 are all PHQ-9 items reflecting symptoms of depression. The first two components therefore appear to track hope and depression, respectively. Hope might turn out to be an important factor in differentiating students who graduated during the peak of COVID from those who graduated during its decline, but the loadings alone cannot show this, because PCA does not use the group labels.

The PCA plot didn’t show clear separation between the PandemicPhase groups along PC1 and PC2, meaning the variance captured by these two components doesn’t seem to distinguish the groups. I will therefore use logistic regression on the principal components to model directly how they relate to the groups and see whether that gives better classification results.

6 Logistic Regression Models

Fitting the logistic regression with 22 PCs (90% variance) gave a test accuracy of 78.95%, lower than the 89.47% obtained with 14 PCs (80% variance). This suggests that adding more components may lead to overfitting or include unnecessary information, reducing performance. With only 19 test cases, however, the gap amounts to two observations (15 vs. 17 correct), so it should be read with caution. Using 14 PCs appears to strike a better balance between capturing essential variance and maintaining predictive power, with the elbow method (20 PCs) falling between the two.

Interestingly, the confusion matrix for 20 PCs matches that of the psychological measurements, resulting in the same observed accuracy rate.

6.1 Model Evaluation

Now I will compare the models. Below, you will find a bar plot that visualizes the bootstrapped test accuracy, including confidence intervals. The red dot represents the actual accuracy for each respective model.

Logistic Model Accuracy and Bootstrapped Confidence Intervals
Logistic Model	Actual Accuracy	Bootstrapped Accuracy	CI Lower	CI Upper	CI Width
Psychology	0.842	0.813	0.684	0.895	0.211
14 PCs	0.895	0.777	0.579	0.895	0.316
20 PCs	0.842	0.691	0.474	0.895	0.421
22 PCs	0.789	0.684	0.474	0.895	0.421
Survey	0.421	0.507	0.263	0.789	0.526
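
To put the Actual Accuracy column in context, the sketch below converts each accuracy into a count of correct predictions and compares it with the accuracy of always predicting the majority class. It assumes the accuracies were computed on the 19-case test set (15 Decline, 4 Peak) described in the preprocessing section.

```python
# Test-set composition from the Testing Data Distribution table.
n_decline, n_peak = 15, 4
n_test = n_decline + n_peak

# Observed accuracies copied from the table above.
observed = {
    "Psychology": 0.842,
    "14 PCs": 0.895,
    "20 PCs": 0.842,
    "22 PCs": 0.789,
    "Survey": 0.421,
}

# Number of correctly classified test cases implied by each accuracy.
correct = {model: round(acc * n_test) for model, acc in observed.items()}

# Accuracy of a rule that always predicts the majority class (Decline).
baseline = n_decline / n_test

# Models that classify more test cases correctly than that rule.
beats_baseline = [m for m, c in correct.items() if c > n_decline]
```

Under that assumption, each model's accuracy differs from the majority-class rule by at most a couple of test cases, which is worth keeping in mind when comparing the rows.
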

Due to the small sample size and class imbalance, some bootstrap samples contained only Decline cases, leading to extreme predicted probabilities and convergence warnings (which I suppressed for clarity). Only the psychological model avoided these issues. This occurred because I used case resampling, and according to the professor, fixing this issue is beyond the scope of this course. As a result, the bootstrapped accuracy and confidence intervals, especially for the PC and survey models, may not fully reflect true performance.
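
The sketch below shows why case resampling can produce a single-class sample and what a stratified resample would look like. The stratified version is one possible remedy and was not used in this report. The example size (19 cases, 4 of them Peak) is an assumption chosen to match the test set, since the report does not say which set was resampled.

```python
import random

def prob_no_minority(n, n_minority):
    """Probability that a case resample of size n draws no minority-class case."""
    return ((n - n_minority) / n) ** n

def stratified_resample(labels, rng):
    """Resample indices with replacement within each class.

    Every class keeps its original count, so no resample is single-class.
    """
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    idx = []
    for members in by_class.values():
        idx.extend(rng.choice(members) for _ in members)
    return idx

# Hypothetical set with the same class counts as the test data.
labels = ["Decline"] * 15 + ["Peak"] * 4
p_single_class = prob_no_minority(len(labels), 4)
resample = stratified_resample(labels, random.Random(0))
n_peak_in_resample = sum(labels[i] == "Peak" for i in resample)
```

For a set of this size the probability comes out a little over 1% per resample, so a handful of single-class samples among 1000 resamples would be expected.
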

With that in mind, the graph and table suggest the PC models are less stable, as their bootstrapped accuracy is much lower than the observed accuracy (for example, 0.777 vs. 0.895 for 14 PCs). The psychological model is more consistent, with bootstrapped accuracy close to the observed (0.813 vs. 0.842) and the narrowest confidence interval (width 0.211). The survey model performs worst: it has the lowest observed accuracy (0.421), which is below its bootstrapped accuracy (0.507), and the widest confidence interval (width 0.526).

7 Conclusion

Given that the psychological measures produced the most stable predictions, the results point to a difference in psychological state between the two groups, although the small test set (19 cases) means this conclusion should be treated as tentative. The logistic regression results suggest that students graduating during the decline of the pandemic (2021) showed greater psychological distress than those who graduated during the peak (2020), which runs counter to my original hypothesis. Higher levels of depression (PHQ-9), internal hope, peer support, and spiritual support were all associated with the Decline group, with only spiritual support reaching statistical significance (p = 0.0329). In contrast, only parental support (EXT.PA) was positively associated with the Peak group. This pattern may reflect a delayed psychological toll, where students graduating after the immediate crisis continued to experience or acknowledge distress and turned to various forms of support.