SOTA | State of the Art of Smartphone Sensors Applications in Health, Mobility, and Context Awareness with Privacy Considerations

Published on 19.01.2026

Abstract

The proliferation of mobile end devices and their integrated sensors has fundamentally changed the collection of personal data. They enable applications in areas such as mobile health (mHealth), mobility, and context-sensitive interaction to collect a multitude of important data (Delgado-Santos et al., 2022; Kumar et al., 2021; Mokbel et al., 2024; Mäder et al., 2024). This work gives an insight into current smartphone sensor technology (Apple, 2025) and highlights both passive data collection (Busso et al., 2025; Kumar et al., 2021) and the use of self-reports. This SOTA text also offers an overview of the scope and types of current datasets used in the field of mobile sensing. Additionally, techniques such as anonymization, Differential Privacy and Federated Learning are explained, and the difficulties of their implementation in practice are elucidated. Furthermore, the work highlights generalization problems of machine learning models across cultural boundaries, which are caused by regional behavioral differences. Finally, future research paths are pointed out that can contribute to improving model robustness and standardized data collection (Delgado-Santos et al., 2022; Meegahapola et al., 2023; Mokbel et al., 2024).

Keywords

Behavior Modeling, Context Awareness, Cross-Country Generalization, Differential Privacy, Federated Learning, mHealth, Mobile Sensing, Mobility Data, Multimodal Sensor Data, Privacy-Preserving Techniques, Smartphone Sensors, User Profiling

Preface

The English translation of this text was created with the assistance of the generative AI tool Google Gemini. The tool was used exclusively to support the linguistic translation process; all conceptual content, ideas, and interpretations originate from the author. The translated passages were subsequently reviewed, corrected, and adapted to ensure accuracy, clarity, and consistency with the original meaning. The use of AI is disclosed here in accordance with the institutional guidelines on transparency and the responsible use of generative language models issued by the University of Applied Sciences St. Pölten (USTP).

1 Introduction

Modern mobile devices, especially smartphones and smartwatches, have become ubiquitous through the evolution of mobile technologies. As described by Delgado-Santos et al. (2022), this evolution includes increases in computing power, storage capacity, and the number of integrated sensors. It enables mobile devices to capture personal and sensitive information, which gives them high potential in applications such as mobile health (mHealth), mobility, and context-sensitive systems (Busso et al., 2025; Kumar et al., 2021; Mokbel et al., 2024). For instance, Delgado-Santos et al. (2022) estimate that the number of mobile devices reached nearly 6.8 billion by 2022.

Central application fields are based on the modeling of everyday behavior. In this process, Mäder et al. (2024) highlight that sensor data from up to 26 modalities, including accelerometers, gyroscopes, and GPS, are passively recorded and often combined with self-reports (annotations). Datasets such as DiversityOne, which includes data from eight countries and 26 smartphone sensor modalities, are crucial, as previous datasets were often limited in scope and focused mainly on specific countries in the Global North (Busso et al., 2025). Meegahapola et al. (2023) and Busso et al. (2025) argue that this diversity is necessary to investigate the generalization and robustness problems that models face due to cross-country behavioral variations.

In parallel, Delgado-Santos et al. (2022) warn that the collection of personal and sensitive data poses a risk to privacy. Automated processing (user profiling) can derive sensitive attributes, such as health data, from seemingly harmless sensor data. Given these risks, it is indispensable to apply data protection techniques and ethical protocols that comply with international standards such as the General Data Protection Regulation (GDPR) to protect data and ensure its legal use (Busso et al., 2025; Delgado-Santos et al., 2022).

The goal of this State-of-the-Art paper is to analyze the current capabilities of smartphone sensors in the areas of health, mobility, and context sensitivity. Furthermore, international usage patterns will be compared, and privacy-preserving techniques as well as the challenges associated with their implementation will be examined.

2 Mobile Sensing and Data Collection

2.1 Overview of Sensors

As classified by Delgado-Santos et al. (2022) and Busso et al. (2025), the sensors built into mobile devices can be divided by functionality into hardware sensors (HW) and software sensors (SW). Hardware sensors are physical components that convert physical quantities into electrical signals (e.g., accelerometer, gyroscope). Software sensors use data from hardware sensors or calculate measurements from system logs (e.g., app usage, screen time).

Furthermore, Busso et al. (2025) distinguish sensor modalities based on the type of data collection.

2.1.1 Continuous Sensing.

Here, data is collected continuously and autonomously, mostly without direct user interaction (Hoseini-Tabatabaei et al., 2013). This category includes:

• Motion and Inertial Sensors: These include the accelerometer, the gyroscope, and the magnetometer. They measure acceleration and rotational forces and serve to recognize movement patterns (e.g., walking, running, inactivity) (Apple, 2025; Busso et al., 2025; Delgado-Santos et al., 2022; Hoseini-Tabatabaei et al., 2013).

• Position and Connectivity Sensors: Busso et al. (2025) note that GPS and Wi-Fi are responsible for determining semantic locations and tracking trajectories. These sensors are used for targeted advertising, navigation, and recommendations. Bluetooth and proximity sensors provide information about social contexts and proximity to other devices (Apple, 2025; Delgado-Santos et al., 2022).

• Environmental Sensors: These include the light sensor (Light) for measuring ambient brightness and the barometer for measuring atmospheric pressure (Apple, 2025; Busso et al., 2025).
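
As a rough illustration of how readings from the motion sensors listed above might be turned into coarse movement labels, the following sketch computes the magnitude of three-axis accelerometer samples and thresholds its variability within a window. The function names, the sample values, and both thresholds are illustrative assumptions and are not taken from the cited works.

```python
import math
from statistics import pstdev

def magnitudes(samples):
    """Euclidean norm of each (x, y, z) accelerometer sample in m/s^2."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]

def classify_window(samples, still_threshold=0.3, run_threshold=3.0):
    """Label a window by the spread of its magnitudes (thresholds are hypothetical)."""
    spread = pstdev(magnitudes(samples))
    if spread < still_threshold:
        return "inactive"
    if spread < run_threshold:
        return "walking"
    return "running"

# Made-up sample windows: a phone lying flat and a phone carried while walking.
resting = [(0.0, 0.0, 9.81), (0.02, -0.01, 9.80), (0.01, 0.0, 9.82)]
walking = [(0.5, 0.2, 9.0), (1.0, -0.4, 11.0), (-0.6, 0.3, 8.5), (0.8, 0.1, 10.8)]

print(classify_window(resting), classify_window(walking))
```

Real activity recognition pipelines typically use learned classifiers over many features; the fixed thresholds here only show the principle of deriving a behavioral label from raw inertial data.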

2.1.2 Interaction Sensing.

These sensors capture the user’s interaction with the device and offer insights into engagement, attention, and internal states. Examples are app usage logs, touch events, “screen on/off episodes”, and interactions with notifications. The combination of both modalities provides a comprehensive view of user behavior (Mäder et al., 2024).
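
The idea of a software sensor that derives a measurement from logged interaction events can be sketched as follows. The example pairs screen on/off events into episodes and sums their duration. The event format, the function names, and the timestamps are assumptions made for illustration only.

```python
from datetime import datetime

def screen_episodes(events):
    """Pair each 'on' event with the next 'off' event and return (start, end) tuples."""
    episodes, start = [], None
    for timestamp, state in sorted(events):
        if state == "on" and start is None:
            start = timestamp
        elif state == "off" and start is not None:
            episodes.append((start, timestamp))
            start = None
    return episodes

def total_screen_seconds(events):
    """Total screen time in seconds over all complete episodes."""
    return sum((end - start).total_seconds() for start, end in screen_episodes(events))

# Hypothetical event log: two short screen sessions on one morning.
events = [
    (datetime(2025, 1, 1, 8, 0, 0), "on"),
    (datetime(2025, 1, 1, 8, 5, 0), "off"),
    (datetime(2025, 1, 1, 9, 0, 0), "on"),
    (datetime(2025, 1, 1, 9, 0, 30), "off"),
]

print(len(screen_episodes(events)), total_screen_seconds(events))
```

An unmatched trailing "on" event is ignored in this sketch; a real logger would have to decide how to treat episodes that are still open when the log ends.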

2.2 Passive Collection and Self-Reports

The development of machine learning models intended to predict user behavior (in-the-wild) is based on the creation of labeled datasets. Meegahapola et al. (2023) point out that passive data collection minimizes the burden on the user. To obtain the Ground Truth (truth labels) for model training, passive sensor data is combined with human-provided annotations or self-reports, which ideally confirm the actual states (Meegahapola et al., 2023).

To collect these annotations, longitudinal studies (Intensive Longitudinal Surveys) are used, frequently employing the Experience Sampling Methodology (ESM) or time diaries. In time diaries, participants report in detail on their activities, locations, social contexts, and moods at regular intervals. This allows the user’s mood to be captured directly and promptly (in situ). Kumar et al. (2021) argue that this immediate inquiry minimizes the so-called “Recall Bias”, leading to significantly more accurate behavioral data (Busso et al., 2025; Kumar et al., 2021).

The iLog app tool, for example, was adapted for data collection in the DiversityOne project by Busso et al. (2025) and manages the simultaneous collection of raw sensor data and detailed self-reports via questions about current activity, semantic location, social context, and current mood (valence).

2.3 Extent and Types of Current Datasets

The research area in mobile sensing encompasses a broad spectrum of datasets. Busso et al. (2025) place this under research fields such as activity and context recognition. To illustrate this diversity, public datasets like MDC, StudentLife, ExtraSensory, and ContextLabeler offer different sample sizes, durations, collection locations, and numbers of used sensors.

A common problem was the lack of diverse mobile sensing datasets. Busso et al. (2025) introduced the DiversityOne dataset to fill this gap. It comprises data from 782 college students over a period of four weeks. By combining 26 smartphone sensor modalities and over 350,000 self-reports, DiversityOne is among the largest and most geographically diverse publicly accessible datasets of its kind. The collected data are divided into six thematic bundles: Connectivity, Environment, Motion, Position, App Usage, and Device Usage.

2.4 Advantages and Limitations of Previous Datasets

To analyze everyday behavior, the use of large amounts of data was essential. However, Mokbel et al. (2024) note that the use of these datasets was associated with limitations.

2.4.1 Regional Limitations.

In data collection within the field of mobility, Mokbel et al. (2024) identify regional limitations. Published datasets are mainly small and restricted to their collection environment. For example, published mobility trajectory datasets only include trips from taxis or in public spaces, and only in specific cities like Athens, Beijing, Rio, Rome, and San Francisco. Additionally, due to privacy concerns, most datasets are released in aggregated form, as only a few spatial locations are sufficient to uniquely identify individuals. Such aggregated datasets, for example “Origin-Destination” or “CellPhoneTrace” datasets (which aggregate data to the location of the nearest cell tower), have coarse granularity. This prevents the extraction of detailed insights from mobility data (Mokbel et al., 2024). This regional restriction directly motivates large-scale raw-data initiatives like DiversityOne (described in Section 2.3), which aims to overcome these biases by collecting non-aggregated data across multiple countries (see Table 1) (Busso et al., 2025).
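
To make the loss of granularity through aggregation concrete, the following sketch snaps each point of a trajectory to the nearest cell tower, in the spirit of the cell-tower aggregation described above. The tower names, all coordinates, and the helper functions are hypothetical; the distance is computed with the standard haversine formula.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def snap_to_tower(point, towers):
    """Return the name of the tower closest to the given point."""
    return min(towers, key=lambda name: haversine_km(point, towers[name]))

# Made-up tower positions and a made-up three-point trajectory.
towers = {"tower_a": (48.2082, 16.3738), "tower_b": (48.2500, 16.4500)}
trajectory = [(48.2085, 16.3740), (48.2100, 16.3800), (48.2450, 16.4450)]

aggregated = [snap_to_tower(p, towers) for p in trajectory]
print(aggregated)
```

After aggregation, the first two points become indistinguishable, which protects the individual but also removes the fine-grained movement between them.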

2.4.2 Lack of User Proximity.

Silva et al. (2018) observe that the majority of data taken for analysis is collected by probes in the Radio Access Network or Core Network. The advantage of this is that they are easily accessible for network operators and contain useful mobility information. However, they offer no or only little information about the actual interaction of users with the smartphone. Even when user interactions are collected, data distortion can still occur, as usage duration is influenced by background traffic (Silva et al., 2018).

2.4.3 Technological and Methodological Deficits.

Kumar et al. (2021) criticize that the data collected in mHealth frameworks is not stored in a standardized format. This reduces the reusability of datasets and limits cross-device and cross-study analyses. Apart from that, there is a lack of mechanisms for assessing data quality and of annotations, which are essential for understanding mobility behavior data. Such deficiencies reduce data quality and prevent important insights into actual observations (Kumar et al., 2021; Mokbel et al., 2024).

3 Application Areas in Everyday Life

Advances in the computing and communication capabilities of mobile devices have shown their potential in numerous application fields. By 2022, the number of mobile devices was estimated at almost 6.8 billion, underlining the broad basis for these applications (Delgado-Santos et al., 2022).

3.1 Usage of Sensor Data for mHealth, Mobility, Context-Adaptive Systems

The three main application areas that can be derived from the collected data are:

3.1.1 Mobile Health (mHealth).

As defined by Delgado-Santos et al. (2022), mHealth refers to a sub-area of eHealth that includes medical and public health practices supported by mobile devices. Mobile apps can improve healthcare, monitor patients with chronic diseases, and promote a healthy lifestyle (Delgado-Santos et al., 2022; Hoseini-Tabatabaei et al., 2013; Kumar et al., 2021).

3.1.2 Mobility.

Mobility data is collected through accelerometers, gyroscopes, and GPS. Mokbel et al. (2024) emphasize that the analysis is central to mobility data science. It optimizes traffic management (e.g., route planning) as well as urban planning and enables life-saving interventions in health informatics, for instance, through the movement monitoring of elderly people (Mokbel et al., 2024; Mäder et al., 2024).

3.1.3 Context-Adaptive Systems.

According to Delgado-Santos et al. (2022), these systems use geolocation data (GPS, Wi-Fi, Bluetooth) and other sensors to understand the user’s context and provide relevant information or services. Examples are the automatic adjustment of screen brightness via the light sensor or the adjustment of screen orientation through position sensors to improve the user experience (Delgado-Santos et al., 2022; Hoseini-Tabatabaei et al., 2013).

3.2 Examples from Current Research Works

The variety of sensor data enables complex inference tasks that go beyond basic measurement. Besides deriving demographic characteristics, Delgado-Santos et al. (2022) report high accuracies for various tasks, such as gender classification based on gestural attributes (93.65%), BMI estimation (94.8%), and sleep disorder detection (92.3%), while geolocation patterns provide indications of depressive phases (85%). Sensor data thus allow profound insights into health status. In the mobility sector, sensors also allow vehicle localization to within 200 meters as well as indoor tracking via Wi-Fi with 85.7% accuracy. Even fine interaction patterns are analyzed: micro-movements while typing enabled the reconstruction of a PIN with 43% probability in tests. However, these high accuracy rates are often achieved in controlled or single-country settings. As shown in Section 5, Meegahapola et al. (2023) demonstrate that such performance can drop significantly when models are tested across different cultural contexts, highlighting a gap between theoretical capability and real-world robustness (Delgado-Santos et al., 2022; Hoseini-Tabatabaei et al., 2013; Kumar et al., 2021).

4 Privacy, Ethical Aspects, and Regulation

Due to the numerous sensors and possible applications, smartphones come with risks. Delgado-Santos et al. (2022) emphasize that data collection raises questions of data protection and privacy.

4.1 Privacy Techniques (Anonymization, Pseudonymization, Local Processing)

To minimize these risks, there are various privacy methods. Privacy methods aim to modify and de-identify data to avoid reidentification. Nevertheless, the utility of the data for analysis should be maximized simultaneously (Delgado-Santos et al., 2022).

4.1.1 Anonymization Metrics.

Traditional approaches use metrics like k-anonymity. K-anonymity ensures that an individual in the dataset is indistinguishable from at least k-1 other individuals. To overcome the limitations of k-anonymity, extensions like l-diversity and t-closeness were developed. However, Delgado-Santos et al. (2022) point out that these methods are primarily designed for structured, low-dimensional data.
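
The k-anonymity criterion described above can be checked with a short sketch: records are grouped by their quasi-identifier values, and k is the size of the smallest group. The column names and records are invented for illustration, and the function name is an assumption.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical, already generalized records (age bands, truncated postal codes).
records = [
    {"age_band": "20-29", "zip3": "310", "mood": "low"},
    {"age_band": "20-29", "zip3": "310", "mood": "high"},
    {"age_band": "30-39", "zip3": "310", "mood": "low"},
    {"age_band": "30-39", "zip3": "310", "mood": "low"},
    {"age_band": "30-39", "zip3": "310", "mood": "high"},
]

k = k_anonymity(records, ["age_band", "zip3"])
print(k)
```

With only two quasi-identifier columns the grouping stays manageable. High-dimensional sensor streams offer far more attribute combinations, which fits the observation that these metrics are designed for structured, low-dimensional data.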

4.1.2 Differential Privacy (DP).

Differential Privacy makes it difficult to attribute data to an individual participant by adding noise to the original data. According to Mokbel et al. (2024), DP can be applied locally on the user’s device before data is sent to an untrusted server, or globally by the service provider.
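
A minimal sketch of the noise-adding idea, assuming the widely used Laplace mechanism in the global setting: a trusted provider releases a noisy mean of clipped values. The clipping bounds, the step counts, and the function names are illustrative assumptions. The sketch further assumes that the number of records is public and that neighbouring datasets differ by replacing one record.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Mean of values clipped to [lower, upper] plus Laplace noise for epsilon-DP."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the clipped mean by at most this amount.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)  # fixed seed only to make the sketch repeatable
daily_steps = [4000, 6000, 8000, 10000]  # made-up values

for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, round(private_mean(daily_steps, 0, 20000, epsilon, rng)))
```

Smaller values of epsilon mean stronger protection and a larger noise scale, so the released mean deviates more from the true value of 7000.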

4.1.3 Local Processing and Federated Learning (FL).

Hoseini-Tabatabaei et al. (2013) note that local processing serves to minimize the risk of storage in the cloud. FL, often used in combination with DP, is a strategy for training models on cross-device datasets while still ensuring sufficient protection.
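
The principle behind FL can be sketched with federated averaging, a common aggregation scheme that is assumed here for illustration and is not named in the cited works. Each simulated device fits a small linear model on its own data, and only the resulting parameters are averaged centrally, weighted by local sample count. The data, learning rate, and function names are hypothetical, and no DP noise is added.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """A few gradient-descent steps of 1-D linear regression on one device's data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Average client parameters, weighted by the number of local samples."""
    total = sum(sizes)
    return tuple(sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(2))

def train(clients, rounds=200):
    """Run federated rounds; the raw (x, y) pairs never leave their client list."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in clients]
        weights = federated_average(updates, [len(d) for d in clients])
    return weights

# Two simulated devices whose made-up data follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

w, b = train(clients)
print(round(w, 3), round(b, 3))
```

The server in this sketch sees only the parameter pairs returned by `local_update`. In practice, these updates can still leak information, which is one reason FL is combined with DP.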

4.2 Challenges in Implementation in Practice

The implementation of privacy measures in mobile sensing is associated with several practical challenges. A central issue is the Privacy-Utility Trade-off: stronger protection of sensitive attributes requires heavier modification of the data. Mokbel et al. (2024) warn that this impairs the usefulness of the data. This trade-off is particularly evident in techniques like Differential Privacy (see Section 4.1.2), where adding too much “noise” to protect the user renders the data useless for fine-grained mobility analysis. At the same time, there is a lack of standardized metric frameworks. Such frameworks could help quantify the degree of data protection and facilitate the setting of privacy parameters. Finally, privacy features like disabling sensors lead to data gaps, but increase user acceptance and thereby enable longer-term data collection (Mokbel et al., 2024).

5 Generalization Problems in Models Across Country Borders

The biggest challenge in mobile sensing, especially in modeling everyday behavior, is the problem of generalization across cultural and geographical boundaries, as highlighted by Meegahapola et al. (2023).

5.1 Behavioral Diversity and Distribution Shift

Human behavior such as eating habits, sleep rhythms, and social interactions is shaped by cultural and social norms. These behavioral patterns differ from country to country. This results in a distribution shift in the sensor data, which Meegahapola et al. (2023) identify as the cause of impaired model performance.

5.2 Model Failure in the Country-Agnostic Approach

Models trained in one region (often in the Global North) show poor performance when applied in another, unseen country. Studies on mood inference confirm this: In the Country-Agnostic Approach, the AUROC values of non-personalized models dropped on average to 0.46-0.55. Even hybrid models (partially personalized) showed reduced performance in this approach (0.66-0.73), compared to results achieved in country-specific settings (0.78-0.98) (Busso et al., 2025; Meegahapola et al., 2023).
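
To help interpret these figures, the sketch below computes AUROC as the share of positive-negative pairs in which the positive example receives the higher score, so that 0.5 corresponds to chance level. It also shows a leave-one-country-out split as one possible way to set up a country-agnostic evaluation. The function names, labels, scores, and country codes are hypothetical and do not reproduce the cited experiments.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leave_one_country_out(rows):
    """Yield (held_out_country, train_rows, test_rows) for every country in the data."""
    for held_out in sorted({r["country"] for r in rows}):
        train = [r for r in rows if r["country"] != held_out]
        test = [r for r in rows if r["country"] == held_out]
        yield held_out, train, test

# Made-up labels and model scores for four samples.
print(auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
print(auroc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))

# Made-up rows from three placeholder countries.
rows = [{"country": c, "label": i % 2} for i, c in enumerate("AABBCC")]
splits = list(leave_one_country_out(rows))
print([held_out for held_out, _, _ in splits])
```

In such a split, the held-out country contributes no training data, which mirrors the situation of deploying a model in an unseen country.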

5.3 Solution Gaps

Although the hybrid approach (partial personalization) represents a practical strategy to improve the relevance and precision of the model, Meegahapola et al. (2023) note that Domain Adaptation (DA) in multimodal mobile sensor data is still a young field of research. DA is intended to improve the generalization ability and robustness of machine learning models, so that models can better adapt to local data and better capture individual behaviors.

6 Conclusion

Through the proliferation of modern devices and integrated sensors, sensing has developed into a key technology for mHealth, mobility, and context awareness, as highlighted by Kumar et al. (2021) and Mokbel et al. (2024). The modeling of everyday behavior usually happens passively and uses a multitude of sensor modalities. Studies with geographically diverse datasets, such as DiversityOne, which contains raw data from 26 modalities from eight countries, show that cultural and regional differences lead to generalization problems. Models trained in one country and applied in an unseen country show reduced performance. Therefore, researchers like Mäder et al. (2024) and Busso et al. (2025) argue that hybrid and partially personalized models are necessary to achieve high accuracy. However, the potential of these applications faces a critical counterweight: privacy. The depth of collected data requires strict adherence to standards like the GDPR. While techniques such as Differential Privacy offer solutions, they create an inherent tension between data utility and user protection. Ultimately, this review shows that the field has matured from simply demonstrating feasibility to addressing the complex challenges of robust, privacy-preserving, and cross-culturally valid sensing (Busso et al., 2025; Delgado-Santos et al., 2022; Kumar et al., 2021; Meegahapola et al., 2023; Mokbel et al., 2024; Mäder et al., 2024).

6.1 Future Research

Future research must focus on the development of methods for domain adaptation in mobile sensor data to improve robustness, a direction proposed by Busso et al. (2025) and Meegahapola et al. (2023). The next steps include the development of a general metric framework that allows a measurable assessment of data protection, as well as the definition of standardized data schemas for collected data. This helps to reduce data imbalance in personalized models (Busso et al., 2025; Delgado-Santos et al., 2022; Meegahapola et al., 2023; Mokbel et al., 2024).

References

Apple. 2025. iPhone 17 Pro Max - Technische Daten - Apple Support (AT). https://support.apple.com/de-at/125091

Matteo Busso, Andrea Bontempelli, Leonardo Javier Malcotti, Lakmal Meegahapola, Peter Kun, Shyam Diwakar, Chaitanya Nutakki, Marcelo Dario Rodas Britez, Hao Xu, Donglei Song, Salvador Ruiz Correa, Andrea-Rebeca Mendoza-Lara, George Gaskell, Sally Stares, Miriam Bidoglia, Amarsanaa Ganbold, Altangerel Chagnaa, Luca Cernuzzi, Alethia Hume, Ronald Chenu-Abente, Roy Alia Asiku, Ivan Kayongo, Daniel Gatica-Perez, Amalia de Götzen, Ivano Bison, and Fausto Giunchiglia. 2025. DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling. 9, 1 (2025), 1:1–1:49. doi:10.1145/3712289

Paula Delgado-Santos, Giuseppe Stragapede, Ruben Tolosana, Richard Guest, Farzin Deravi, and Ruben Vera-Rodriguez. 2022. A Survey of Privacy Vulnerabilities of Mobile Device Sensors. 54, 11 (2022), 224:1–224:30. doi:10.1145/3510579

Seyed Amir Hoseini-Tabatabaei, Alexander Gluhak, and Rahim Tafazolli. 2013. A survey on smartphone-based systems for opportunistic user context recognition. 45, 3 (2013), 27:1–27:51. doi:10.1145/2480741.2480744

Devender Kumar, Steven Jeuris, Jakob E. Bardram, and Nicola Dragoni. 2021. Mobile and Wearable Sensing Frameworks for mHealth Studies and Applications: A Systematic Review. 2, 1 (2021), 1–28. doi:10.1145/3422158

Lakmal Meegahapola, William Droz, Peter Kun, Amalia de Götzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, George Gaskell, Altangerel Chagnaa, Amarsanaa Ganbold, Tsolmon Zundui, Carlo Caprini, Daniele Miorandi, Alethia Hume, Jose Luis Zarza, Luca Cernuzzi, Ivano Bison, Marcelo Rodas Britez, Matteo Busso, Ronald Chenu-Abente, Can Günel, Fausto Giunchiglia, Laura Schelenz, and Daniel Gatica-Perez. 2023. Generalization and Personalization of Mobile Sensing-Based Mood Inference Models: An Analysis of College Students in Eight Countries. 6, 4 (2023), 176:1–176:32. doi:10.1145/3569483

Mohamed Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, Walid Aref, Gennady Andrienko, Natalia Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian S. Jensen, Joon-Seok Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario Nascimento, Siva Ravada, Matthias Renz, Dimitris Sacharidis, Flora Salim, Mohamed Sarwat, Maxime Schoemans, Cyrus Shahabi, Bettina Speckmann, Egemen Tanin, Xu Teng, Yannis Theodoridis, Kristian Torp, Goce Trajcevski, Marc van Kreveld, Carola Wenk, Martin Werner, Raymond Wong, Song Wu, Jianqiu Xu, Moustafa Youssef, Demetris Zeinalipour, Mengxuan Zhang, and Esteban Zimányi. 2024. Mobility Data Science: Perspectives and Challenges. 10, 2 (2024), 10:1–10:35. doi:10.1145/3652158

Aurel Ruben Mäder, Lakmal Meegahapola, and Daniel Gatica-Perez. 2024. Learning About Social Context From Smartphone Data: Generalization Across Countries and Daily Life Moments. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2024-05-11) (CHI ’24). Association for Computing Machinery, 1–18. doi:10.1145/3613904.3642444

Fabrício A. Silva, Augusto C. S. A. Domingues, and Thais R. M. Braga Silva. 2018. Discovering Mobile Application Usage Patterns from a Large-Scale Dataset. 12, 5 (2018), 59:1–59:36. doi:10.1145/3209669
