Temporal dynamics of user activities: deep learning strategies and mathematical modeling for long-term and short-term … – Nature.com

Our framework has two main axes: classifying the user's activities and constructing his dynamic profile. The following subsections clarify each axis.

A weighted-based user profile is a representation in which the profile is expressed as a keyword or a set of keywords, either directly provided by the system or automatically extracted from web pages or documents. Keywords are associated with numerical weights that represent the user's interests in different topics or categories.

In our previous research15, we considered a user \(u\) inside a social media group, with a static profile \(P_{u}\), discussing \(N\) topics. We used a weighted-based user profile to represent the dynamic profile of the user, \(D_{u}(t)\), which reflects the position \(x_{u}\) (\(m\) dimensions) of the user inside the topic sphere, such that \(x_{u}(t_{i}) = (d_{u}^{c_{1}}(t_{i}), d_{u}^{c_{2}}(t_{i}), \ldots, d_{u}^{c_{m}}(t_{i}))\), where \(d_{u}^{c_{j}}(t_{i})\) is the distance between the user and the \(j\)th topic after the \(i\)th iteration.

Our model is based on the following assumptions about the connection between the user and topics:

The topics the user is interested in represent 100% of his mind.

The total similarity between the user and each topic depends on the user's static profile \(sim_{u}^{c_{j}}(t_{0})\), the user's activities \(A\_sim_{u}^{c_{j}}(t)\), and the user's following list \(F\_sim_{u}^{c_{j}}(t)\).

The user's interests found in his static profile are used to calculate the initial similarity between the user and each topic \(c_{j}\).

User activities like posts \(P\), shares \(S\), or likes \(L\) have different significance weights.

The similarity between the user and a topic increases as the distance between them decreases.

The distance between the user and each topic changes after each activity.

Consider bloggers who use social media to display their daily activities and are not interested in wars or disasters. One day, a catastrophe occurred in their country, so they used their social accounts to express their feelings, to support the victims, etc. Their user profiles should reflect the unusual reaction to the crisis as a short-term interest, and entertainment and their other long-standing interests as long-term ones.

In this paper, we will introduce how to use our model to accommodate the short-term and long-term profiles.

(Temporal user profile) The temporal profile \(D_{u}(time)\) of user \(u\) is the position \(x_{u}\) of the user inside the topic sphere based on specific timespans.

$$ x_{u}(time) = \left( d_{u}^{c_{1}}(time),\; d_{u}^{c_{2}}(time),\; \ldots,\; d_{u}^{c_{m}}(time) \right), $$

(1)

where \(d_{u}^{c_{j}}(time)\) is the distance between the user and the \(j\)th topic category at the end of the given period. For the long-term profile, the period starts at the creation of the profile and extends to the current moment; accordingly, the initial values are determined as mentioned in the 3rd point, using the user's static profile. For the short-term profile, on the other hand, the period begins at the start of the specified timespan; hence, the starting values of \(d_{u}^{c_{j}}\) are taken from the user's dynamic profile at the beginning of that timespan. Using the temporal-based profile, we can explore how the user profile evolves over time; for example, we could investigate whether there are any variations between a user's profile generated on weekends and his profile on weekdays.

In order to measure the difference between the two profiles, we apply the Manhattan distance (also known as L1-distance) in vector representation:

$$ L_{1}\left( x_{u}(time_{y}),\, x_{u}(time_{z}) \right) = \sum_{i} \left| d_{u}^{c_{i}}(time_{y}) - d_{u}^{c_{i}}(time_{z}) \right|, \qquad L_{1} \in [0, 2] $$

(2)
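Equation (2) can be computed in a few lines. The profile vectors below are hypothetical; each entry stands for one user-topic distance \(d_{u}^{c_{i}}\):

```python
def manhattan_distance(profile_y, profile_z):
    """L1 distance between two temporal profile vectors; each entry
    is the user-topic distance for one category."""
    return sum(abs(dy - dz) for dy, dz in zip(profile_y, profile_z))

# Hypothetical 3-topic profiles for two timespans (e.g. weekday vs weekend).
weekday = [0.2, 0.7, 0.9]
weekend = [0.6, 0.7, 0.5]
distance = manhattan_distance(weekday, weekend)  # approx. 0.8
```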

The higher the \(L_{1}\) value, the larger the disparity between the two profiles, and vice versa. The Manhattan distance provides an overall measure of similarity or dissimilarity between the two profiles: since it sums the absolute differences of the coordinates, it is relatively robust to outliers and to variations in individual dimensions, but it does not indicate which interests contribute more or less to the overall distance. To analyze the user's behavior and detect any unexpected change in it, we therefore calculate the squared differences, which give more detailed information about the difference between each pair of corresponding distances in the two profiles.

$$ \text{squared difference for } d_{u}^{c_{i}} = \left( d_{u}^{c_{i}}(time_{y}) - d_{u}^{c_{i}}(time_{z}) \right)^{2} $$

(3)

The squared difference computes the squared value of the difference between the corresponding coordinates of two points in a multidimensional space. It is useful when assessing the magnitude of change within specific categories, as it amplifies differences between values. Because the squared difference is sensitive to outliers and can overemphasize large differences, it is typically used at the category level rather than for overall profile changes. By setting specific thresholds or criteria, we can define significant differences in user behavior or discover unusual changes in user interests. For example, we might consider categories with squared differences above a certain threshold to reflect a significant change. Criteria such as when a user first becomes interested in a topic, and for how long he remains interested in it, could indicate whether the change is temporary or lasting.
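The thresholding idea can be sketched as follows. The threshold value here is illustrative, not a value from the paper:

```python
def flag_interest_shifts(profile_y, profile_z, threshold=0.09):
    """Indices of topic categories whose squared difference (Eq. 3)
    exceeds the threshold -- candidates for a significant change.
    The default threshold is illustrative only."""
    return [i for i, (dy, dz) in enumerate(zip(profile_y, profile_z))
            if (dy - dz) ** 2 > threshold]
```

For instance, a user whose distance to category 0 jumps from 0.9 to 0.2 between two timespans would have that category flagged, while small fluctuations stay below the threshold.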

Classifying a user's activities is a key task in creating his dynamic profile. Since deep learning models have consistently proven their effectiveness in resolving numerous text classification challenges, we used them to classify text into specific topics. Figure 1 shows an overview of the proposed models.

The architecture of the proposed topic-classification models.

We applied the models to two sets of tweets. The first is the tweet dataset collected by16, which consists of 22,424 manually labeled tweets divided into 11 topic categories: (C1) business/finance, (C2) crisis [disaster/war], (C3) entertainment, (C4) politics, (C5) health/medical, (C6) law/crime, (C7) weather, (C8) life/society, (C9) sports, (C10) technology/internet, and (C11) others, distributed as shown in Table 2. We observed that the dataset is imbalanced: there is a substantial disparity in the number of tweets between classes, which could affect the performance of the classifiers.

In order to handle this problem, we modified the dataset so that each class contains 3500 tweets. For classes with fewer than 3500 tweets, we collected additional relevant tweets using the Twitter API to reach the specified number; classes with more than 3500 tweets were reduced by randomly removing redundant tweets. After eliminating the others class (C11), the final dataset consists of 35,000 tweets distributed equally among 10 categories.
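The rebalancing step can be sketched roughly as below, assuming tweets are grouped by class label; `collect_more` is a hypothetical stand-in for the Twitter API collection, which is not shown:

```python
import random

def balance(classes, collect_more, target=3500):
    """Equalize class sizes: over-represented classes are trimmed by
    random removal; under-represented ones are topped up through
    `collect_more`, a stand-in for the Twitter API collection step.
    The 'others' class (C11) is dropped entirely."""
    balanced = {}
    for label, tweets in classes.items():
        if label == "others":
            continue
        if len(tweets) > target:
            balanced[label] = random.sample(tweets, target)
        else:
            balanced[label] = tweets + collect_more(label, target - len(tweets))
    return balanced
```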

Preprocessing steps are applied to ensure that the tweets are clean and suitable for the classification process. We lowercase all tweets to eliminate case-related variations. Special characters (except $ and %), punctuation, URLs, mentions, and hashtags are removed. After that, we apply tweet tokenization using the tokenizer in the NLTK package.
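A minimal sketch of these cleaning steps, using only the standard library; the paper uses NLTK's tweet tokenizer, for which a plain whitespace split stands in here:

```python
import re

def clean_tweet(text):
    """Lowercase and strip URLs, mentions, hashtags, and all special
    characters except $ and %, as described above."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # mentions, hashtags
    text = re.sub(r"[^a-z0-9$%\s]", " ", text)          # punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    # The paper uses nltk.tokenize.TweetTokenizer; a plain split
    # stands in here to keep the sketch dependency-free.
    return clean_tweet(text).split()
```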

After tokenization, the tweet text is represented as vectors (numerical values) using an embedding model. Word embeddings are a type of distributed representation in an n-dimensional space designed to capture the semantic meanings of words. We used two pre-trained word embedding models, GloVe17 and FastText18, to capture the semantic meaning of words in a sequence of text. GloVe focuses on capturing global co-occurrence statistics of words in large text corpora, aiming to represent words based on their contextual relationships; in our model, we used GloVe embeddings trained on a large corpus with 300-dimensional vectors. FastText, an algorithm developed by Facebook, treats each word as a combination of character n-grams, allowing it to represent out-of-vocabulary words and morphological variations effectively, which gives it more flexibility and robustness across a wide range of languages and text types. We used FastText and GloVe separately and compared the results to study which one has a better impact on classification accuracy.
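The lookup from words to pretrained vectors can be illustrated as follows. The toy 4-dimensional table stands in for the real 300-dimensional GloVe or FastText embeddings:

```python
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim):
    """Map each vocabulary word to its pretrained vector (GloVe- or
    FastText-style table). Unknown words get a zero vector here,
    whereas FastText proper would compose them from character
    n-grams."""
    matrix = np.zeros((len(vocab), dim))
    for row, word in enumerate(vocab):
        if word in pretrained:
            matrix[row] = pretrained[word]
    return matrix

# Toy 4-d vectors stand in for the 300-d pretrained embeddings.
table = {"storm": [0.1, 0.2, 0.3, 0.4], "flood": [0.5, 0.6, 0.7, 0.8]}
emb = build_embedding_matrix(["storm", "flood", "quake"], table, 4)
```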

Embedding vectors produced by embedding models are fed into the deep-learning classification model. We applied two kinds of classification models in this paper:

Recurrent Neural Networks (RNNs): These are a type of neural network designed for processing sequential data. They have a unique ability to maintain an internal memory or hidden state that allows them to capture dependencies over time. However, traditional RNNs suffer from the vanishing gradient problem during training, making it challenging to capture long-term dependencies effectively. To address these issues, several modifications and variants of RNNs have been developed. Long Short-Term Memory (LSTM) networks19 introduce sophisticated gating mechanisms to control the flow of information, enabling them to capture long-range dependencies. Bidirectional LSTM (Bi-LSTM)20 processes data in both forward and backward directions, enhancing context understanding. The Gated Recurrent Unit (GRU)21 is another RNN variant known for its efficiency and simplicity. These models are effective at capturing sequential patterns and have been widely employed in natural language processing tasks such as text classification and time-series prediction, offering a balance between computational efficiency and modeling capability.
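The gating idea can be made concrete with one step of the standard GRU update. This is the textbook formulation with caller-supplied toy weights (biases omitted), not the trained classifier used in the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One step of a standard GRU cell: the update gate z blends the
    previous hidden state with a candidate state, and the reset gate
    r controls how much history feeds that candidate."""
    z = sigmoid(Wz @ x + Uz @ h_prev)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_cand
```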

BERT Model: BERT22 is a transformer-based model that could be fine-tuned to solve a wide range of real-world NLP tasks. Fine-tuning BERT to classify text typically involves feeding labeled data to BERT and updating its parameters through backpropagation. This process allows BERT to leverage its pre-trained knowledge of language and semantics to excel in the classification task, often achieving state-of-the-art results with relatively little training data. In our experiments, we used a compact version of BERT called DistilBERT23 that is designed to be smaller and faster while maintaining much of BERT's language understanding capabilities. It achieves this by employing knowledge distillation techniques during training, where it learns from a larger pre-trained BERT model. The key distinctions lie in the reduced size and efficiency of DistilBERT, making it more suitable for applications with limited computational resources or a need for faster inference.

The first layer of the DistilBERT model involves the initial preprocessing and transformation of raw tweet text data into a structured format that can be fed into the DistilBERT model for further processing and classification. It encompasses tokenization, padding, truncation, the addition of special tokens to create input tensors, and creating attention masks. DistilBERT takes the tokenized tweet text as input and generates contextualized embeddings for each token in the text. These embeddings capture semantic and contextual information.
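The tensor-preparation step (special tokens, truncation, padding, attention masks) can be illustrated with a toy encoder. The token ids below are illustrative; real code would call the Hugging Face DistilBERT tokenizer:

```python
def encode(token_ids, max_len, cls_id=101, sep_id=102, pad_id=0):
    """Mimic the input-preparation step: add [CLS]/[SEP] special
    tokens, truncate to max_len, pad, and build the attention mask
    (1 = real token, 0 = padding)."""
    ids = [cls_id] + token_ids[: max_len - 2] + [sep_id]
    mask = [1] * len(ids)
    ids += [pad_id] * (max_len - len(ids))
    mask += [0] * (max_len - len(mask))
    return ids, mask
```

A short tweet gets padded out to the fixed length, while an over-long one is truncated before the [SEP] token is appended.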

The model variant used for classification is DistilBERT-base-uncased. This variant is based on the DistilBERT architecture and is case-insensitive (lowercase). It is a smaller and more efficient version of the original BERT model. DistilBERT models typically consist of 6 layers of transformer encoder blocks, 768 hidden dimensions, and 12 attention heads in each multi-head self-attention mechanism. The vocabulary size of DistilBERT is typically 30,000. This means that the model can tokenize and work with a vocabulary of 30,000 unique sub-word pieces.

The performance metrics used to evaluate our models are accuracy, precision, recall, and F1-score. Accuracy measures the overall correctness of the model's predictions by calculating the ratio of correctly classified instances to the total number of instances.

$$ Accuracy = \frac{\text{Number of correct topic predictions}}{\text{Total number of predictions}} $$

(4)

Precision evaluates the model's ability to make accurate positive predictions within each class, indicating the fraction of correctly predicted positive instances among all instances predicted as positive.

$$ Precision = \frac{\text{Number of correct predictions of the topic } (TP)}{\text{Total number of instances predicted as that topic } (TP + FP)} $$

(5)

Recall, on the other hand, gauges the model's ability to capture all positive instances within each class, measuring the fraction of correctly predicted positive instances among all actual positive instances.

$$ Recall = \frac{\text{Number of correct predictions of the topic } (TP)}{\text{Total number of instances actually in that topic } (TP + FN)} $$

(6)

The F1-score is a balanced measure that combines precision and recall, providing a single value that reflects the model's overall performance across all classes.

$$ F1\text{-}score = 2 \times \frac{precision \times recall}{precision + recall} $$

(7)

Weighted average (WA) and macro average (MA) are two approaches for aggregating precision, recall, and F1-score metrics. Weighted average takes into account the class imbalance by assigning weights based on class proportions, giving more importance to the majority classes. This is useful when optimizing the model's performance with respect to class distribution. In contrast, macro average treats all classes equally, providing an unbiased assessment of the model's ability to perform across all classes, regardless of size or imbalance.
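The difference between the two averaging schemes can be shown for recall with a few lines of plain Python; in practice, scikit-learn's `precision_recall_fscore_support` with `average='macro'` or `average='weighted'` computes all three metrics:

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class, plus the class supports."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}, support

def macro_recall(y_true, y_pred):
    """MA: unweighted mean over classes -- every class counts equally."""
    recalls, _ = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def weighted_recall(y_true, y_pred):
    """WA: mean weighted by class support -- majority classes dominate."""
    recalls, support = per_class_recall(y_true, y_pred)
    n = len(y_true)
    return sum(recalls[c] * support[c] / n for c in support)
```

On a 9:1 class split, a classifier that always predicts the majority class reaches a weighted recall of 0.9 but a macro recall of only 0.5, which is exactly why the macro average exposes weak minority-class performance.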
