
Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.
Advertisement
Scientific Reports volume 15, Article number: 4591 (2025)
Metrics details
Traditional one-size-fits-all recommendations for student well-being and academic success may not be optimal. Personalized recommendations based on individual data hold promise. This study explores the potential of Large Language Models (LLMs) to generate personalized recommendations for 12 high school students to enhance their well-being and academic performance. We analyzed data from 12 students, including Fitbit data (activity levels, sleep and stress scores), PSQI surveys (sleep quality), and school reports (grades, teacher observations). An LLM model was used to analyze this data and create personalized recommendations for each student. Validator scoring assessed the clarity, actionability, and alignment of recommendations with student data. The LLM generated various recommendations based on different student data profiles (e.g., low activity levels, poor sleep quality). Validation results indicated that the recommendations were generally clear and actionable, with high ratings in both areas, though alignment with student data showed more variability, suggesting areas for improvement. This study demonstrates the potential of LLMs to generate personalized recommendations based on student data, acknowledging the need for further validation with initial validator feedback indicating their value. However, improvements are needed at every stage, including enhancing prompts, refining models, and incorporating advanced data analytics and continuous feedback. Future research, particularly with intervention groups and potentially RCT studies, is crucial to establish causal relationships and validate the recommendations’ impact. As this technology evolves, ensuring ethical considerations and data privacy remains essential.
School life is full of challenges, particularly for high school students aged 13 to 17 years. For many, this period marks their first experience of handling various responsibilities independently, with a level of excitement1. In addition to academic requirements, students must adapt to a structured environment and make daily decisions, such as when to wake up, how to manage homework, and which extracurricular activities to pursue. These numerous sources of pressure can disrupt a student’s life, potentially leading them to make decisions that might not be beneficial for their physical or mental health in the long term2,3.
Traditionally, one-size-fits-all recommendations for student well-being and academic success may not be optimal. Teachers often look out for signs based on observed behavioral patterns or underperformance. However, this approach is highly subjective, and with the increased pressure of administrative work, teachers (particularly those less experienced) can often overlook students who may be suffering. Reports are typically written per term or academic year, containing observations, grades, and general comments about a student.
In this study, we consider well-being as a multidimensional construct encompassing sleep quality, physical activity levels, and academic performance. Our goal is to enhance students’ overall well-being by addressing these specific areas. Wearable devices, such as smartwatches and smart bands, equipped with sensors like photoplethysmography and accelerometers, can monitor vital health metrics such as heart rate, activity levels, and sleep patterns. These devices utilize AI to analyze the collected data and offer personalized insights. Additionally, they can monitor stress by assessing heart rate variability. Examining changes in stress levels using these devices, combined with observations and grades from reports and The Pittsburgh Sleep Quality Index (PSQI) surveys, could offer valuable insights for managing general well-being in students and generating personalized recommendations. Health metrics provide objective data that can complement teacher observations, offering a more holistic view of a student’s well-being. This dual approach allows for more tailored and effective recommendations, as it considers both subjective and objective indicators of student health. Furthermore, health metrics can reveal underlying issues that may not be immediately apparent through behavioral observations alone, thus enhancing the ability to provide timely and relevant support to students.
By leveraging personalized recommendations based on individual data, we can offer tailored advice for students. The personalized recommendations generated by the Large Language Model (LLM) will focus on actionable strategies for improving sleep hygiene, increasing physical activity, and enhancing academic performance. These recommendations may include suggestions for study habits, time management, and stress reduction techniques. Utilizing LLMs and AI to analyze and integrate these diverse data sources can enhance the support provided to each student. This approach not only personalizes and improves the recommendations but also alleviates the pressure and administrative workload on teachers, who often struggle to find the time to tailor advice for each student. The potential of personalized recommendations based on individual student data holds great promise for improving student well-being and academic performance. This study serves as a proof of concept, demonstrating the potential of LLMs to generate personalized recommendations based on student data, while recognizing the need for further validation and refinement.
Studies have looked at personalized learning to tailor educational tracks to individual strengths and weaknesses, leveraging AI and ML for precise content recommendations and curriculum advice based on student data. However, many of these studies primarily focus on refining educational delivery systems4 often overlooking lifestyle parameters critical for personalized recommendations that promote overall well-being. Previous studies have used wearables in school populations to give recommendations on activity levels on and off school days5, and measure stress during fasting6 but these were general and not personalized, and the focus was not on overall well-being and academic performance. Other studies focus on physical activity and stress management7,8. For instance, research in two Qatari high schools found that students did not meet the in moderate-to-vigorous physical activity MVPA or steps/day recommendations. The authors findings highlighted the need for educational strategies and programs to improve physical activity levels in the Qatari high school population and suggest that more extensive studies are required to determine if these trends are consistent across the region5. Healthy sleep, stress management, and physical activity is important when promoting well-being and positive academic outcomes. With regards to sleep: Orchard et al.9, highlight the associations between sleep patterns and mental health outcomes in adolescents. Additionally, Niu et al.10 illustrate how sleep deprivation can impair memory performance and academic outcomes, and Niu et al.11 show the benefits of sleep extension interventions for improving cognitive function and overall well-being.
This was as proof of concept study, describing the LLM’s capability and student data, not measuring intervention effects.
Data were collected from 12 healthy adolescents (mean age 15.7, SD 1.31) aged 13 to 17 at one time point, covering six weeks period. Initially, participants were recruited from two private high schools in Doha, which were selected for their willingness to collaborate. However, due to synchronization and device usage issues, only data from 12 students were suitable for analysis. Synchronization issues included difficulties in connecting the Fitbit devices to the participants’ smartphones, while device usage issues involved participants not wearing the devices consistently or removing them for extended periods. The final sample consisted of 12 healthy adolescents. Demographic information collected included gender, socioeconomic and health status data were not collected as part of this study. Table 2 provides a comprehensive overview of the data for all 12 students; however, the Large Language Model (LLM) was unable to generate sentiment scores for three of these individuals. Notably, one of these three students, along with an additional participant, exhibited extreme values in their daily sleep data recorded by Fitbit. Despite the LLM successfully generating advice for the latter participant, we have opted to exclude their data from Tables 3 and 4 to maintain the integrity of our analysis. Consequently, only eight students are included in the personalized advice generated and validation shown in Tables 3 and 4. Inclusion criteria required good general health, high school age (11–17 years), no known skin allergies to the Fitbit sensor, and possession of a smart mobile phone with internet access. The study followed the Declaration of Helsinki and was approved by the Institutional Review Board (IRB), Weill Cornell Medicine-Qatar (WCM-Q, Study No.: 21–00025). Written assent and informed consent were obtained from all participants and their parents.
The schools enlisted participants who showed a willingness to volunteer for the study. The participants were provided with a Fitbit Charge 5 wristband and were instructed to wear it on either wrist for the entire study period, including during sleep, but they could remove it briefly for charging or other necessary reasons. The Fitbit Charge 5 utilizes various biosensors, including photoplethysmography (PPG) for heart rate monitoring, SpO2 sensors for blood oxygen levels and accelerometers for activity tracking. The wristbands were worn continuously for 8 weeks, but the primary data analysis focused on a core 6-week period. The study was conducted in March 2022. Daily stress score and heart rate variability was collected along with other digital data such as sleep patterns and activity levels. The school provided reports in PDF format for each student. We extracted qualitative data from each PDF report detailing the student’s interests, strengths, and areas for improvement. The Pittsburgh Sleep Quality Index (PSQI) questionnaires (PSQI)12 were also given to the students both at the start and end of the study. PSQI12 is a widely used tool for assessing sleep quality and disturbances over a one-month period, validated in both adults and adolescents. It consists of a series of questions designed to evaluate various aspects of sleep, including sleep duration, sleep latency, sleep efficiency, and sleep disturbances, as well as daytime dysfunction related to sleep. Using data obtained from Fitbit devices as the primary source, information about the durations of sleep was measured. Despite adequate attempts to ensure constant use of fitbit devices, some expected anomalies such as very low sleep may be attributed to device technical errors, missing data, or personal behavioral exceptions. A comparison of Fitbit data with PSQI-reported habitual sleep patterns was done wherever available to assess data reliability.
A higher PSQI score is associated with poorer sleep quality and greater sleep-related problems, providing a comprehensive overview of an individual’s sleep patterns and issues.
We selected the LLaMA (Meta-Llama-3-Instruct) local download version using the LM studio downloadable application13, we used this model due to its state-of-the-art performance in natural language understanding and generation tasks. The model is known for its ability to generate coherent and contextually relevant text, making it suitable for producing personalized advice based on complex and varied input data.
We utilized LM Studio in conjunction with a Jupyter Notebook and python environment to implement our solution. LM Studio provides a robust platform for deploying large language models locally. By running the LLaMA model offline, we ensured that all sensitive student data remained on local storage, addressing privacy concerns and eliminating the risk of data breaches associated with cloud-based solutions. Additionally, this approach circumvents the necessity for Application Programming Interface (API) access, which can be costly and subject to usage limitations. This makes our methodology not only secure but also economically feasible for institutions with budget constraints.
Figure 1 demonstrates the process workflow, which comprises of following:
Data Preparation: A dataset was prepared combining quantitative metrics from wearable devices, PSQI surveys and qualitative data from academic reports. As illustrated in Fig. 1 the following data was extracted from the data sources:
Fitbit: Stress score, Step counts, Sleep mins. Weekly average of all these metrics was considered for analysis.
PSQI: Sleep Quality Score.
Academic Reports: Textual data in form of pdf was collected per participants and was fed into the model.
Prompt Construction: A detailed prompt was constructed for the LLaMA model, first the model was prompted to perform a sentiment analysis on academic reports and output scores between the values of 0–1 for eight categories (Academic Performance, Behavior, Grade, Creativity, Participation, Improvement Areas, Effort, Attitude). The second prompt incorporated all data types (Fitbit, PSQI scores, Academic reports and sentiments) to guide the generation of personalized advice. Sentiment analysis involves using natural language processing techniques to assess the emotional tone of text data, which can provide insights into student well-being. The specific parameters for prompt construction included settings for temperature, max tokens, and top-p sampling, which were left to default values as adjusting them had little to zero effect on output. In constructing the prompt for the LLM, we carefully considered the diverse data types relevant to each student, including Fitbit data (activity levels, sleep, and stress scores), PSQI survey results (sleep quality), and academic performance indicators (grades and teacher observations). The prompt was designed to clearly delineate each data type, allowing the model to effectively understand the individual student’s profile. This structured approach was essential, as LLMs are sensitive to prompt variations; thus, the chosen prompt was specifically tailored to elicit contextually relevant and actionable recommendations based on the unique data provided for each student. The goal was to provide the LLM with a comprehensive context to generate personalized recommendations. Specifically, the prompt included structured input that clearly delineated each data type, allowing the model to understand the individual student’s profile effectively. Furthermore, we employed strategies to manage long context window problems with LLMs, including data summarization and chunking. The prompt engineering process involved crafting specific queries that aligned with the data collected from students. For example, prompts were designed to elicit recommendations based on sleep quality, stress levels, and academic performance metrics.
Model Inference: The prompt was fed into the LLaMA model running within LM Studio. The model processed the input and generated tailored advice addressing various aspects of the student’s well-being and academic performance.
Output Generation: The generated advice was validated using a scoring system by three independent raters (including healthcare workers and academics) who evaluated all recommendations based on criteria of clarity, actionability, and alignment with student data. The scores from these raters were then averaged to provide a comprehensive assessment, enhancing the reliability and validity of our findings. In evaluating the effectiveness of the personalized recommendations generated by the LLM, we selected actionability, clarity and alignment as our primary outcome variables. Actionability was chosen because it ensures that the recommendations provided are practical and can be realistically implemented by students and educators. Clarity is equally important, as it allows users to easily understand and follow the advice given. These criteria are essential for assessing the utility of the recommendations in real-world educational settings, ensuring that they not only provide insights but also facilitate meaningful changes in student behavior and well-being. Furthermore, alignment with student data indicates how well the recommendations correspond to the individual data profiles. The three independent evaluators scores were than averaged to get a final score for each generation. The experts who provided the ratings had diverse backgrounds, including academia, nursing, and data analytics. The inter-rater reliability was assessed by calculating the Intraclass Correlation Coefficient (ICC). The ICC ranges from 0 to 1, with values below 0.5 reflecting poor reliability, 0.5 to 0.75 indicating moderate reliability, 0.75 to 0.9 signifying good reliability, and values above 0.9 representing excellent reliability14.
Illustrates workflow for proof of concept.
Table 1: This table illustrates the types of recommendations generated by the Language Model (LLM) based on different student data profiles such as sleep data, stress levels, academic performance and exercise habits.
Table 1: outlines the types of personalized recommendations generated by the LLM based on various student data profiles. The recommendations are tailored to address specific areas of concern identified through the data input such as low sleep quality, high stress levels, poor academic performance and irregular exercise. For instance, students with low sleep data receive suggestions to improve sleep hygiene and adjust bedtime routines, while those with high stress levels are advised to practice relaxation techniques and manage their workload. This table highlights the LLM’s ability to generate targeted interventions aimed at enhancing student well-being and academic success. Table 1 illustrates examples of recommendations generated by the LLM based on various data inputs, emphasizing the benefit of integrating multiple data sources and sentiment scores to create tailored advice for each student. The LLM synthesizes information on sleep, activity levels, academic performance, and sentiment analysis to provide comprehensive and context-aware recommendations.
Table 2 presents data from Fitbit devices alongside PSQI self-report survey scores for each participant, capturing various dimensions of their well-being. The table details weekly averages for stress scores, step counts, and sleep minutes, complemented by the PSQI scores from May. For instance, participants with high stress scores, such as IDs 6, 8, and 30, also display varying levels of physical activity and sleep, while some participants exhibit high stress scores, their physical activity levels and sleep quality do not necessarily align with the expected patterns of correlation. The ‘weekly average’ column represents the mean minutes asleep per day, calculated across all days of the study period. Similarly, the ‘average daily step count’ column represents the average daily step count across the entire study period. Missing values were not treated as ‘0’ during the calculation of these averages. Instances are noted where participants either recorded such low average sleep durations as ID 36 (28 min/day) or ID 18 (83 min/day). Possible explanations for these anomalies seem to be errors in recording by the wearable devices or individual variation in the sleeping pattern of a person. As a result we did not include these in the sentiment analysis results (Table 3) and validation (Table 4). These anomalies raise challenges in using device-recorded sleep data for research and necessitate rigorous quality assessment of the data. Certain participants, like 1, are among those with very low stress scores that might be reflective of personal differences in perceptions of, or reports of, stress. This situation underscores the fact that perceived stress can occur in a wide variety of ways and may have little relation to parameters besides sleep duration. Notably, some participants, like ID 59, exhibit high stress with low step counts and sleep duration, which may underscore the need for targeted interventions. This data set offers a multifaceted view of the participants’ health, emphasizing the importance of integrating both objective device data and subjective self-reports to inform personalized recommendations, while acknowledging the limitations in data quality and the potential discrepancies observed. There are discrepancies noted for Participants 10 and 59, who reported significantly low average sleep durations (64 minutes and 220 minutes, respectively. However, both participants would receive a PSQI score of 10, resulting in indications of only “very little” sleep quality problems. This could be an example of discrepancies between an individual’s perception of sleep quality and the measurement thereof.
Table 3 presents the results of sentiment analysis across multiple domains related to student performance and behavior, including academic performance, behavior, grades, creativity, participation, improvement areas, effort, and attitude. Academic performance refers to the overall assessment of a student’s capabilities, while grades specifically denote the numerical or letter scores received. Effort reflects the amount of work a student puts into their studies, whereas attitude pertains to their disposition towards learning. Improvement areas indicate specific domains where the student can enhance their performance, such as grades or behavioral aspects. The input for this analysis consisted of qualitative descriptions derived from student reports, while the output reflects the sentiment score indicating the overall positivity or negativity of the feedback. This analysis aims to provide insights into students’ emotional states and attitudes towards their academic performance. The sentiment scores, ranging from 0 to 1, reflect positive sentiment levels in each area for different participants. For instance, participants 10, 12, 30, and 59 exhibit consistently high sentiment scores across most categories, indicating strong overall performance and engagement. Conversely, lower scores in areas such as improvement areas or effort, as seen with participants 6, 8, and 16, may highlight specific aspects requiring further attention. This analysis provides a nuanced understanding of student attitudes and behaviors, aiding in the development of more tailored and supportive educational strategies.
Table 4: presents the scoring of LLM recommendations by validators based on three criteria: clarity, actionability, and alignment with student data. Scores range from 1 (poor) to 5 (excellent). Clarity: How clear and understandable the recommendation is Actionability: How feasible and practical it is for students to implement the recommendation. Alignment with Student Data: How well the recommendation addresses the specific student data profile it was generated for. The alignment with student data was evaluated by validators who reviewed the recommendations generated by the LLM in relation to the specific data profiles of each student. They considered how well the recommendations addressed the individual needs and circumstances reflected in the student data, including sleep patterns, stress levels, academic performance, and other relevant metrics.
The validation results, as summarized in Table 4, show varying levels of clarity, actionability, and alignment with student data across the sampled students. Most students received high scores in clarity, with an average rating above 4 for the majority, except for one outlier (Student 12) who scored significantly lower. Actionability also received favorable ratings, typically ranging from 3.67 to 5, though it was not applicable for Student 12 as the LLM failed to generate anything meaningful. The lack of meaningful results for student ID 12 may be attributed to insufficient data points due to inconsistent device usage or LLM was unable to generate output based on missing information from the report. The alignment with student data showed more variability, with most students scoring between 3.33 and 4, indicating generally good alignment, though some cases reflected moderate to lower alignment. These results suggest that while the recommendations are generally clear and actionable, there is room for improvement in ensuring they consistently align with student data. With regards to inter-rater reliability, the ICCs shows good agreement for the clarity criterion (0.886), moderate agreement for actionability (0.748) and alignment (0.719). These results suggest that raters are more consistent when evaluating clarity than they are when evaluating actionability and alignment. The lower ICCs for actionability and alignment may indicate that these criteria are more subjective or prone to differences in interpretation, which could require clearer guidelines or better-defined metrics for raters to improve consistency.
This study (process outlined in Fig. 1) demonstrates the LLM’s capability to generate personalized recommendations based on student data, while recognizing the variability in alignment with that dataTo avoid the “bit of skew caused by loss of extreme outliers,” participants with very short average sleep durations were retained for sentiment analysis-for example, such as 28 min in total/day for ID 36, but excluded from the personalized validation. However, the rigidity of this value may reflect recording errors in the data or other issues, and in this regard, a sensitivity analysis was conducted evaluating the robustness of results while continuing to exclude those with these extreme average sleep durations. Validator feedback offers initial insights into the potential value of these recommendations. Future research with student and/or parent feedback would provide insights into the perceived value and potential helpfulness of the recommendations. Furthermore, intervention groups are crucial and would validate the recommendations’ effectiveness in improving student well-being and academic performance, with the possibility of a randomized controlled trial (RCT) study to further validate this conceptual workflow. The LLM’s ability to analyze and integrate diverse data sources—including Fitbit metrics, PSQI scores, academic reports, and sentiment analysis—enables the generation of personalized recommendations that are not only contextually relevant but also sensitive to the emotional state of the student. This holistic approach ensures that the advice provided is comprehensive and actionable, thereby enhancing the overall utility of the recommendations.
At every stage of the process in this proof-of-concept study we have outlined, there is significant potential for improvement. The prompts used to generate recommendations (Fig. 1) could better align with specific student needs and contexts, and to address the observed variability in alignment with student data, future iterations of the model could benefit from enhanced prompt refinement and the exploration of more advanced LLMs. This could lead to improved accuracy in generating recommendations that are better tailored to individual student profiles. The model itself can be enhanced by training on more diverse and comprehensive datasets to improve the accuracy and relevance of its outputs, it can also be fine-tuned using known techniques. Exploring different models, including those trained on larger datasets and/or with advanced natural language processing capabilities, could yield even more effective recommendations. Additionally, the risk of model hallucinations—where the LLM generates plausible but incorrect information—poses a challenge. Future iterations of this research will need to incorporate safeguards against harmful recommendations and ensure that outputs are critically evaluated for accuracy and appropriateness.
Enhancing the model with more advanced data analytics techniques for preprocessing and analyzing student data could result in more nuanced and precise recommendations. Implementing continuous feedback loops, where the model adapts based on actual outcomes and user feedback, would further refine its performance. Ethical considerations, along with data privacy and security, remain critical as this technology evolves in educational settings. Our study faced limitations due to data privacy concerns, preventing the use of more powerful models like ChatGPT, which likely would have outperformed the LLaMA model we utilized. However, as more advanced models become available, this limitation can be addressed. By ensuring proper anonymization and obtaining the necessary ethical approvals, future studies could leverage larger, more sophisticated LLMs such as ChatGPT, potentially leading to even more effective outcomes.
Another approach to enhancing the model’s performance in the future could involve experimenting with different prompting strategies. By refining the way prompts are structured and exploring various techniques for querying the LLM, it may be possible to elicit more accurate and contextually relevant recommendations. This iterative process of optimizing prompts can be a straightforward yet effective way to improve the model’s outputs without necessarily requiring more advanced or larger models.
In this study, we prioritized ethical considerations by ensuring that all participant data was anonymized and that informed consent was obtained from both participants and their parents. We adhered to the guidelines set forth by the Institutional Review Board, ensuring that our research practices align with current ethical standards. Finally, we recognize that the small sample size of 12 participants limits the statistical power of our analysis. Future research should aim to include a larger cohort to strengthen the findings and allow for more comprehensive statistical evaluations. Care must be taken to avoid overwhelming students with information, as excessive feedback may lead to anxiety rather than improvement in well-being.
For the stress score values, the observed range of 8–74 reflects the individual variability in stress responses among participants. We acknowledge that this wide range may raise concerns, and we would emphasize the need for caution in interpreting these scores.
The sleep durations reported in this study were derived from data collected by Fitbit devices, which monitor sleep patterns based on movement and heart rate variability. The Fitbit-derived HRV metric used in this study is RMSSD, a well-established measure of short-term parasympathetic activity15. While Fitbit data have demonstrated reasonable accuracy in adults, limited validation exists for adolescents. Previous research reports adolescent RMSSD values ranging from 9 to 350 ms, with median values of 59 ms for boys and 69 ms for girls16. Future studies should validate Fitbit-derived HRV data in adolescents using simultaneous ECG recordings to ensure accuracy. It is important to note that the accuracy of these measurements can be influenced by factors such as the consistency of device usage and proper wear. Participants may not have worn the devices during all sleep periods or may have removed them for charging, which could lead to underreporting of sleep duration. Additionally, the reported sleep durations represent weekly averages, which may obscure individual variations and instances of disrupted sleep. Due to the complexity of the relationships among stress, physical activity, and sleep quality, it is possible that other factors, such as individual differences in coping mechanisms or lifestyle choices, may influence these outcomes. None less than the originality of the input, data quality is what decide the effectiveness of the recommendation generated through an AI algorithm. Inputs like very low duration of sleep could result in absurd recommendations. Future works have to be directed to developing data validation and also the use of further measures concerning data accuracy. We observed some inconsistency raising valid concerns about the reliability of the PSQI scores in accurately reflecting sleep issues, this potential for inaccuracies in self-reported measures highlights the need for further validation of the PSQI scores against objective sleep data. While we believe that the study demonstrates the potential of LLMs in generating personalized recommendations, we also recognize that the results were not universally effective for all participants.
The selection of private schools may introduce bias, as these institutions often have different resources and student demographics compared to public schools. Gender differences may influence stress scores and other metrics. Future studies should consider a more diverse range of school types and consider gender ratios to enhance generalizability. Collapsing weekday and weekend data may obscure crucial patterns and within-week variations in stress scores, step counts, and sleep duration. While our approach aimed to provide a holistic view, future studies should explore temporal patterns by analyzing metrics separately for weekdays and weekends17. Such analyses could uncover unique behavioral trends and support tailored recommendations for balanced health behaviors throughout the week.
While Fitbit devices offer valuable metrics for assessing physical activity, sleep quality, and stress levels, it is essential to approach their data with caution. A systematic review by Feehan et al.18 indicates that the accuracy of Fitbit devices can vary, and discrepancies may arise depending on the context and population being studied. Therefore, while we utilized Fitbit data to inform our personalized recommendations, future studies should consider validating these metrics against more traditional assessment methods to ensure reliability and accuracy.
Our methodology demonstrates the effective use of the LLaMA model within LM Studio to generate personalized educational and mental health advice while maintaining stringent data privacy standards. This approach provides a viable solution for educational institutions seeking to leverage advanced AI capabilities without compromising on data security or incurring high operational costs.
The findings from our study illustrate the LLaMA model’s ability to generate personalized recommendations based on diverse student data profiles, including wearable device metrics and academic performance indicators. The model produced a variety of tailored advice, offering valuable insights into areas such as mental health, stress management, academic strengths, and areas needing improvement. These personalized recommendations have the potential to significantly enhance student well-being and academic performance by providing actionable and specific guidance.
The potential benefits of LLM-based recommendations in education are substantial. Particularly the innovative integration of LLMs with diverse student data to generate personalized recommendations, which has not been extensively explored in prior studies. By delivering individualized advice, schools can better support their students’ mental health and academic journey, ultimately fostering a more supportive and effective learning environment. The integration of advanced AI models like LLaMA can empower educators with tools to address students’ unique needs more precisely and proactively.
However, it is essential to acknowledge the need for future studies to establish the causal relationships between AI-generated recommendations and student outcomes. Future work could include conducting randomized controlled trials (RCTs) where one group of students receives the personalized recommendations, while a control group does not. Monitoring and comparing the progress of these groups over time would provide valuable evidence on the effectiveness of LLM-based interventions in educational settings. Additionally, exploring the long-term impacts of such recommendations on student well-being and academic success would further validate the practical applications of this approach.
In conclusion, while this proof-of-concept study highlights the potential of integrating LLMs with wearable technology for personalized recommendations, it is essential to recognize that this research is in its early stages. Further validation through larger studies, more refined models, and direct feedback from stakeholders will be crucial in establishing the efficacy of these interventions in enhancing student well-being and academic performance.
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.
Moore, G. et al. Socio-Economic Status, Mental Health Difficulties and feelings about transition to secondary school among 10–11 Year Olds in Wales: Multi-level Analysis of a Cross Sectional Survey. Child Indic. Res. 14 (4), 1597–1615 (2021).
Article PubMed PubMed Central MATH Google Scholar
Jindal-Snape, D. et al. Systematic literature review of primary–secondary transitions: international research. Rev. Educ. 8 (2), 526–566 (2020).
Article MATH Google Scholar
Gazelle, H. & Faldowski, R. A. Multiple trajectories in anxious solitary youths: the Middle School Transition as a turning point in Development. J. Abnorm. Child. Psychol. 47 (7), 1135–1152 (2019).
Article PubMed Google Scholar
Maghsudi, S. et al. Personalized education in the Artificial Intelligence Era: what to expect Next. IEEE. Signal. Process. Mag. 38 (3), 37–50 (2021).
Article Google Scholar
Ahmed, A. et al. Wearable Artificial Intelligence for assessing physical activity in High School Children. Sustainability 15 (1), 638 (2023).
Article MATH Google Scholar
Ahmed, A. et al. Wearable AI reveals the impact of intermittent fasting on stress levels in School Children during Ramadan. Stud. Health Technol. Inf. 305, 291–294 (2023).
ADS MATH Google Scholar
Ahmady, S. et al. Relation between stress, time management, and academic achievement in preclinical medical education: a systematic review and meta-analysis. J. Educ. Health Promot. 10, 32 (2021).
Article PubMed PubMed Central MATH Google Scholar
Getu, T. The effect of physical activity on academic performance and Mental Health: systematic review. Am. J. Sci. Eng. Technol. 5 (3), 118–123 (2020).
MATH Google Scholar
Orchard, F. et al. Self-reported sleep patterns and quality amongst adolescents: cross-sectional and prospective associations with anxiety and depression. J. Child. Psychol. Psychiatry. 61 (10), 1126–1137 (2020).
Article MathSciNet PubMed MATH Google Scholar
Niu, X. et al. The Effects of Shared, depression-specific, and anxiety-specific Internalizing Symptoms on Negative and Neutral Episodic Memories Following post-learning Sleep (Cognitive, 2024).
Niu, X., Zhou, S. & Casement, M. D. The feasibility of at-home sleep extension in adolescents and young adults: a meta-analysis and systematic review. Sleep. Med. Rev. 58, 101443 (2021).
Article PubMed Google Scholar
Buysse, D. J. et al. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Res. 28 (2), 193–213 (1989).
Article CAS PubMed MATH Google Scholar
Studio, L. LM Studio – Discover and run LLMs locally, L. Studio, Editor. : (2024). https://lmstudio.ai/
Koo, T. K. & Li, M. Y. A Guideline of selecting and reporting Intraclass correlation coefficients for Reliability Research. J. Chiropr. Med. 15 (2), 155–163 (2016).
Article PubMed PubMed Central MATH Google Scholar
Shaffer, F. & Ginsberg, J. P. An overview of Heart Rate Variability Metrics and norms. Front. Public. Health. 5, 258 (2017).
Article PubMed PubMed Central MATH Google Scholar
Sharma, V. K. et al. Heart Rate Variability in adolescents – normative data stratified by sex and physical activity. J. Clin. Diagn. Res. 9 (10), Cc08–13 (2015).
PubMed PubMed Central Google Scholar
Kim, J. et al. The impact of Weekday-to-Weekend Sleep Differences on Health Outcomes among adolescent students. Child. (Basel), 9(1). (2022).
Feehan, L. M. et al. Accuracy of Fitbit Devices: Systematic Review and Narrative Syntheses of Quantitative Data6p. e10527 (JMIR Mhealth Uhealth, 2018). 8.
Download references
The authors would like to thank the teachers, parents, and children. We would also like to thank the Qatar Computing Research Institute, Doha, QA for allowing us the license to use their SIHA application.
This research received no external funding.
AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha, Qatar
Arfan Ahmed, Sarah Aziz, Alaa Abd-alrazaq, Rawan AlSaad & Javaid Sheikh
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
AA performed the analytics and wrote the draft manuscript. SA cleaned the data and feature engineered the data. RAA advised on the ML approach. AAA helped with manuscript editing. JS helped with manuscript editing.
Correspondence to Arfan Ahmed.
The authors declare no competing interests.
The study followed the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board, Weill Cornell Medicine-Qatar (Study No.: 21–00025).
Written consent was obtained from all the participants and written informed consent was obtained from the parents of all of the participants.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Reprints and permissions
Ahmed, A., Aziz, S., Abd-alrazaq, A. et al. Leveraging LLMs and wearables to provide personalized recommendations for enhancing student well-being and academic performance through a proof of concept. Sci Rep 15, 4591 (2025). https://doi.org/10.1038/s41598-025-89386-2
Download citation
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-89386-2
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
Advertisement
© 2025 Springer Nature Limited
Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

More Stories
Community-Centered Connectivity Initiatives Earn Viddy Awards Recognition
Zombie IXPs: The Four Types of Exchanges That Refuse to Die, but Fail to Live
The Shift in Peering Threatening the Internet’s Foundations