NUS-SENSE Citation

Farseev, A., and Chua, T.-S. TweetFit: Fusing Sensors And Multiple Social Media For Wellness Profile Learning. Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, California, USA, Feb 4-9, 2017. [PDF], [slides], [poster], [bib]

Farseev, A., and Chua, T.-S. Tweet can be Fit: Integrating Data from Wearable Sensors and Multiple Social Networks for Wellness Profile Learning. ACM Transactions on Information Systems (TOIS), 2017. [PDF], [bib]


Wellness is a widely popular concept that is commonly applied to fitness and self-help products or services. Inference of wellness-related attributes (wellness user profiling), such as body mass index (BMI) or disease tendency, is of crucial importance to various applications in the personal and public wellness domains. At the same time, the emergence of social media platforms and wearable sensors makes it feasible to learn wellness user profiles from multiple perspectives. However, research on wellness user profiling from multiple social sources is relatively sparse, and the joint analysis of social media and sensor data has not yet been comprehensively studied. We introduce a large-scale dataset for the joint analysis of sensor and social media data.

In order to build a comprehensive wellness profile, we harvested data of different modalities from multiple social networks: Twitter micro-posts were used as a textual data source; Instagram pictures and their descriptions (comments) were used as image and textual data sources, respectively; Foursquare check-in records were used as a location data source; and users' workouts were used as a sensor data source and for ground truth construction. The sensor data bridges the gap between social media-based user representations and users' actual physical condition.

Our dataset can be used for both descriptive and prescriptive research. That is to say, we do not intend to constrain future research to user profile learning, since the available ground truth makes it possible to tackle other contemporary problems. Potential research topics that can be pursued with the released dataset are listed below:

  1. Extended multi-source user profile learning. It could be useful to perform further modeling of multi-source multi-modal data. From the data point of view, it is interesting to gain deeper insight into the extraction of visual and sensor data representations. From the data modeling point of view, it is interesting to study the performance of advanced non-linear models on wellness profile learning.
  2. Research on wearable sensors. Wellness and demographic profile learning, activity identification, and recommendation based on data from wearable sensors.
  3. Causality pattern extraction. It is important to discover potential causal relationships between events from multiple data sources. For example, the "flower" image concept could be temporally related to flower shop check-ins or tweets about flowers.
  4. Multi-view user timeline analysis and event detection. In the current study, we focused on user profile learning based on data representations that are aggregated over the whole data gathering period. However, it could be useful to study how the data changes over time and space. It is also important to discover potential causal relationships between multi-source events.
  5. Cross-source user identification. The identification of the same users across multiple social resources could be addressed from both algorithmic and privacy points of view.
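
As an illustration of topic 3, the sketch below pairs events from different sources that share a concept and fall within a fixed time window. It is only a minimal example under stated assumptions: the `Event` record, the `co_occurring_pairs` helper, the six-hour window, and the sample timeline are all hypothetical and do not reflect the actual schema of the dataset. Temporal co-occurrence of this kind is a possible first step towards candidate relationships, not evidence of causality.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical timeline record; not the dataset's real schema."""
    source: str      # e.g. "instagram", "foursquare", "twitter"
    concept: str     # e.g. an image concept, venue category, or tweet topic
    time: datetime


def co_occurring_pairs(
    events: List[Event],
    window: timedelta = timedelta(hours=6),  # assumed window size
) -> List[Tuple[Event, Event]]:
    """Return pairs of same-concept events from different sources
    whose timestamps are at most `window` apart."""
    ordered = sorted(events, key=lambda e: e.time)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Events are time-sorted, so later ones are even further away.
            if second.time - first.time > window:
                break
            if first.source != second.source and first.concept == second.concept:
                pairs.append((first, second))
    return pairs


# Made-up timeline for a single user, for illustration only.
timeline = [
    Event("instagram", "flower", datetime(2016, 5, 1, 10, 0)),
    Event("foursquare", "flower", datetime(2016, 5, 1, 11, 30)),
    Event("twitter", "flower", datetime(2016, 5, 3, 9, 0)),
    Event("twitter", "coffee", datetime(2016, 5, 1, 10, 15)),
]

pairs = co_occurring_pairs(timeline)
for first, second in pairs:
    print(first.source, "->", second.source, "on", first.concept)
```

With the sample timeline, only the Instagram picture and the Foursquare check-in fall within the six-hour window; the tweet about flowers two days later is excluded. Widening the window trades precision for recall, so the window size would need to be chosen per application.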


To get the anonymized timeline data, please contact us.


For any questions regarding the SENSE dataset, please contact us.