Category Archives: Machine Learning
Airbnb using machine learning technology to prevent parties – KYW
PHILADELPHIA (KYW Newsradio) With the help of machine learning technology, Airbnb says it will be cracking down on parties this summer.
"It's really important that those spaces are respected and treated with care, and that, you know, people are not showing up and taking advantage of that," said Airbnb's Global Director of Corporate and Policy Communications Christopher Nulty.
"The best part about staying in an Airbnb is often that you're staying in a neighborhood, and the only way to continue staying in a neighborhood is to be a good neighbor."
Nulty says the company will be using the technology to prevent any disruptive parties, paying close attention to bookings on Memorial Day, Fourth of July and Labor Day. It looks at how long guests are staying, past rental ratings, distance from home, and the number of guests.
So far, it has resulted in a 50% reduction in unauthorized parties. In 2023, more than 67,000 people across the U.S., including 950 in Philadelphia, were deterred from booking entire home listings over those weekends.
Those who are flagged but aren't actually planning on throwing a party can call Airbnb's customer service line.
Predicting multi-label emojis, emotions, and sentiments in code-mixed texts using an emojifying sentiments framework … – Nature.com
An extensive discussion of the experiments, results, and analysis on our introduced dataset for the proposed method and the existing state-of-the-art baselines is presented below.
The following baseline methods are compared to our proposed approach.
XLMR^{[FT+LS+RF]} [86]: In this method, a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned (FT) to perform sentiment analysis. To reduce overfitting, the authors incorporated label smoothing (LS) and rule-based features (RF) such as negation handling and sentiment shift detection. This model is used for the emoji, sentiment, and emotion analysis tasks.
Multilingual BERT (mBERT) [87]: The authors utilized a transformer-based language model called mBERT to learn contextual embeddings for words in multiple languages. mBERT was pre-trained on large amounts of monolingual and multilingual text data and fine-tuned on the SentiMix code-mixed dataset for sentiment detection and emotion recognition.
XLMR^{MTL} [87]: The authors used XLM-R, a cross-lingual language model based on a transformer architecture that was pre-trained on a larger dataset including code-mixed text. XLM-R can encode and decode text in multiple languages and has achieved state-of-the-art results on various NLP tasks, including sentiment analysis and emotion recognition. They fine-tuned XLM-R on the SentiMix code-mixed dataset for sentiment detection and emotion recognition.
TL-XLMR^{[LS]} [6]: To detect sentiment and recognize emotions in the SentiMix code-mixed dataset, the authors employed an end-to-end multitask framework based on a transformer architecture. They fine-tuned XLM-RoBERTa (XLMR), a pre-trained cross-lingual embedding model, with task-specific data to improve model efficiency through transfer learning.
TL-mBERT^{[LS]} [6]: In this ablation experiment, the authors replaced the XLMR module with mBERT to investigate the significance of the sentence encoder in TL-XLMR^{[LS]}. The model was fine-tuned on the SentiMix code-mixed dataset to perform sentiment detection and emotion recognition.
Our proposed model is implemented in PyTorch, a widely used Python deep-learning framework. We employ the F1-score (F1) as the evaluation metric for both emotion and sentiment prediction; for emoji prediction we use the Jaccard Index (JI) and the macro F1-score. We use the Adam optimizer [88] and perform a grid search over 200 epochs to tune the model. The Transformer Encoder has two layers, and the embedding size is 300, which we found empirically (we tested 100, 150, 200, and 300). The dropout rate is set to 0.5 and the learning rate to 0.05. The auto-encoder's latent dimension was found empirically to be 2048. The discriminator, \(\mathcal{D}\), is composed of two fully connected layers and a ReLU activation; its learning rate is set to 1e-3, with a weight decay of 1e-4 and momentum of 0.3. The efficacy of our strategy is assessed by contrasting the F1 and accuracy scores with the different baselines. In the CM-RFT, the kernel is dynamically computed from the input using a fully connected layer. The kernel sizes are [3, 5, 7, 31*3], and each module has 4 heads (half the number of heads in the transformer base model).
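To make the discriminator settings above concrete, here is a minimal PyTorch sketch. The 2048-dimensional input (matching the auto-encoder latent size), the hidden width, the single-logit output, and the choice of SGD (implied by the stated momentum) are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Two fully connected layers with a ReLU, as described above.

    The input size (assumed to match the 2048-dim auto-encoder latent),
    hidden width, and binary output are illustrative assumptions.
    """
    def __init__(self, in_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # single real/fake (or domain) logit
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

discriminator = Discriminator()

# The stated learning rate (1e-3), weight decay (1e-4), and momentum (0.3)
# suggest an SGD-style optimizer for the discriminator; this is an assumption.
optimizer_d = torch.optim.SGD(
    discriminator.parameters(), lr=1e-3, weight_decay=1e-4, momentum=0.3
)
```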
For the emoji detection tasks, we consider the Jaccard Index (JI) [89] and Hamming loss (HL) [90] metrics to evaluate the performance of our proposed system. Additionally, we also report the micro-averaged F1 [91] score and accuracy values for the same (as shown in Table 8). JI, HL, and micro-averaged F1 are popular choices for evaluating multi-label classification tasks. For the sentiment and emotion detection tasks (as shown in Tables 9 and 10), we report the macro-averaged F1 score [91] and accuracy values for our proposed model.
Micro-averaged F1 score: For multi-label classification tasks, the micro-averaged F1 score is a commonly used metric that computes the F1 score globally by counting the true positives (TP), false negatives (FN), and false positives (FP) across all labels. The formula for the micro-averaged F1 score is: \(F1_{micro} = \frac{2\sum_{i=1}^n TP_i}{2\sum_{i=1}^n TP_i + \sum_{i=1}^n FP_i + \sum_{i=1}^n FN_i}\)
Macro-averaged F1 score: The macro-averaged F1 score is another commonly used metric for multi-label classification tasks. It computes the F1 score for each label and then takes the average of these F1 scores. The formula for the macro-averaged F1 score is: \(F1_{macro} = \frac{1}{n} \sum_{i=1}^n \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}\)
Accuracy: Accuracy is a metric that measures the proportion of correctly classified labels to the total number of labels. The formula for accuracy is: \(A = \frac{\sum_{i=1}^n TP_i}{\sum_{i=1}^n TP_i + \sum_{i=1}^n FP_i}\)
Hamming Loss: The Hamming loss measures the proportion of misclassified labels to the total number of labels. The formula for Hamming loss is: \(HL = \frac{1}{n} \sum_{i=1}^n \frac{\mathrm{XOR}(Y_i, \hat{Y}_i)}{m}\), where n is the number of instances, m is the number of labels, \(Y_i\) is the true label vector for instance i, \(\hat{Y}_i\) is the predicted label vector for instance i, and XOR is the element-wise logical XOR operator.
Jaccard Index: The Jaccard Index measures the similarity between two sets by computing the ratio of the size of their intersection to the size of their union, and it is used to measure the similarity between the predicted and true label sets in multi-label classification. The formula for the Jaccard Index is: \(JI = \frac{1}{n} \sum_{i=1}^n \frac{|Y_i \cap \hat{Y}_i|}{|Y_i \cup \hat{Y}_i|}\), where n is the number of instances, \(Y_i\) is the true label set for instance i, and \(\hat{Y}_i\) is the predicted label set for instance i. The resulting score ranges from 0 to 1, with 1 representing perfect similarity between the predicted and true label sets.
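As a quick worked example of the metrics defined above, the following snippet computes them for a toy multi-label prediction with scikit-learn; the arrays are fabricated purely for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, jaccard_score

# Toy multi-label ground truth and predictions (rows = instances, cols = emoji labels).
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 1],
                   [1, 1, 0, 1]])

print("Micro F1 :", f1_score(Y_true, Y_pred, average="micro"))
print("Macro F1 :", f1_score(Y_true, Y_pred, average="macro"))
print("Hamming  :", hamming_loss(Y_true, Y_pred))                      # fraction of misclassified labels
print("Jaccard  :", jaccard_score(Y_true, Y_pred, average="samples"))  # per-instance average, as in JI above
```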
Tables 8, 9, and 10 present the performance of the CM-T, CM-FT, and CM-RFT models for the emoji, sentiment, and emotion tasks in the UTL, DTL, and TTL setups. These setups investigate the effectiveness of multi-task learning in improving overall system performance compared to single-task learning.
The results reported in Table 8 are the performance metrics of three different models (CM-T, CM-FT, CM-RFT) trained in three different setups (uni-task learning, dual-task learning, and tri-task learning) for the task of emoji detection.
In the uni-task learning setup, where each task is solved individually, the performance of the CM-RFT model improves as more features are added. Specifically, the performance improves as we go from using only character embeddings to character embeddings + Elmo embeddings + TF-IDF. The F1 score increases from 0.59 to 0.64 and the accuracy score from 0.62 to 0.67, while the Hamming loss decreases from 0.15 to 0.13 and the Jaccard index increases from 0.52 to 0.56. These results suggest that using multiple features can improve the performance of the emoji detection task.
In the dual-task learning setup, where the emoji task is jointly learned with the sentiment or emotion task, the performance of the CM-RFT model further improves compared to the uni-task learning setup. The improvement is more evident when the model is trained on character embeddings + Elmo embeddings + TF-IDF features. The F1 score increases from 0.64 to 0.68 and the accuracy score from 0.67 to 0.71, while the Hamming loss decreases from 0.13 to 0.07 and the Jaccard index increases from 0.56 to 0.61. These results suggest that training the model on multiple tasks can lead to further improvements in the performance of the emoji detection task.
In the tri-task learning setup, where the sentiment, emotion, and emoji detection tasks are jointly learned, the performance of the CM-RFT model improves even further compared to the dual-task learning setup. The F1 score increases from 0.68 to 0.73 and the accuracy score from 0.71 to 0.75, while the Hamming loss decreases from 0.07 to 0.054 and the Jaccard index increases from 0.61 to 0.69. These results suggest that joint learning of multiple tasks leads to significant improvements in the performance of the emoji detection task.
Overall, the results suggest that the performance of the emoji detection task can be improved by using multiple features and by training the model on multiple tasks. Additionally, the results suggest that sentiment and emotion have a significant impact on the performance of the emoji detection task as joint learning of these tasks leads to significant improvements in performance.
The sentiment classification task results are presented in Table 9 for the joint learning of the emotion and emoji tasks. In the uni-task setup, where each task is performed independently, the CM-RFT model achieves the highest performance for the sentiment task with an F1 score of 72.65 and accuracy of 75.19. This suggests that including extra features, such as Elmo embeddings and TF-IDF features, can enhance sentiment detection performance across all models compared to those utilizing only character embedding features.
In the dual-task setup, when sentiment and emoji tasks are jointly learned, the F1 score and accuracy score of the sentiment detection task improve from 72.65 and 75.19, respectively, in the uni-task setup to 78.22 and 79.21, respectively, when using character embeddings, Elmo embeddings, and TF-IDF features. Similarly, when sentiment and emotion tasks are jointly learned, the F1 score and accuracy score of the sentiment detection task improve from 72.65 and 75.19, respectively, in the uni-task setup to 74.64 and 77.31, respectively, when using character embeddings, Elmo embeddings, and TF-IDF features.
In the tri-task setup, where sentiment, emotion, and emoji detection tasks are solved jointly, the CM-RFT model achieves the best performance for the sentiment task with an F1 score of 82.35 and accuracy of 83.14, followed by the CM-FT model with an F1 score of 75.42 and accuracy of 79.26. This again confirms that multitask learning helps to improve sentiment detection performance when it is learned jointly with other tasks.
The findings indicate that integrating emotion and emoji detection tasks into the sentiment classification task can enhance the model's performance. The tri-task learning setup demonstrated the highest performance for the sentiment task, implying that incorporating these extra tasks can improve the model's comprehension of the sentiment expressed in text. The enhanced performance is likely due to the additional contextual information that emotions and emojis provide, particularly in cases where the sentiment is complicated or sarcastic. Therefore, incorporating emotion and emoji detection tasks could be a useful technique for enhancing the performance of sentiment classification models. Moreover, incorporating additional features, such as Elmo embeddings and TF-IDF features, can also improve sentiment detection performance.
According to the results presented in Table 10, we can observe that the performance of the emotion task increases as we transition from single-task learning to dual-task and eventually to tri-task learning. In the single-task setup, the CM-RFT model outperforms the CM-T and CM-FT models across all three feature combinations, indicating that incorporating sentiment and emoji information can enhance the emotion detection task's performance. In the dual-task setup with emoji, the performance of all models is considerably lower than in the single-task setup. However, the performance improves as more features are incorporated, and the CM-RFT model achieves the best results with all three features. This suggests that utilizing various feature types can benefit joint learning of emoji and emotion detection, and the tri-task setup may provide further improvement. In the dual-task setup with sentiment, the performance is better than with emoji. The addition of Elmo embeddings and TF-IDF features leads to consistent performance improvement, with the CM-RFT model again achieving the best results. This implies that joint learning of sentiment and emotion detection can also benefit from the use of multiple feature types.
The presence of sentiment and emoji information appears to enhance the emotion task's performance, as suggested by the results. The best performance for the emotion task was obtained in the tri-task learning setup, which involved jointly learning the sentiment, emotion, and emoji detection tasks. The improvement in performance can be attributed to the fact that sentiment and emoji provide additional contextual information that can help in better disambiguation of emotions.
The results also suggest that multitask learning is more effective than single-task learning, especially when the tasks are related, such as emotion, sentiment, and emoji detection. The emotion task's performance improved consistently as we progressed from single-task to dual-task and finally to tri-task learning. This indicates that joint learning of related tasks can better utilize the available information and improve the overall performance of the system.
The results presented in Table 11 indicate that the CM-RFT model proposed in this study performs better than the state-of-the-art models for both sentiment and emoji detection tasks. In the single-task scenario, mBERT achieved the highest accuracy of 63.77% and an F1 score of 61.54% for the emoji detection task. However, in the multi-task setting, the proposed CM-RFT model surpasses all other models, achieving an accuracy of 75.81% and an F1 score of 73.25%. This shows that the proposed model effectively uses multi-task learning to improve the performance of both tasks. Moreover, the model also shows promising results for the unsupervised emotion detection task, with an F1 score of 60.53% and an accuracy of 63.73%. This demonstrates that the zero-shot approach utilized in the proposed model is effective in detecting emotions from text even without labeled data.
When focusing on the emoji prediction task, the proposed CM-RFT model significantly outperforms both single-task and multi-task models. The model achieves an accuracy of 75.81%, which is approximately 12% higher than the accuracy of the best-performing single-task model (mBERT) and approximately 9% higher than the accuracy of the best-performing multi-task model (TL-XLMR^{[LS]}). Moreover, the model's F1 score is 73.25%, which is approximately 12% higher than the F1 score of the best-performing single-task model (mBERT) and approximately 8% higher than the F1 score of the best-performing multi-task model (TL-XLMR^{[LS]}).
We conducted additional experiments with our proposed model to compare it fairly with the single- and multi-task baselines discussed earlier. As none of the baseline models addressed unsupervised classification, they couldn't generate scores for the emotion task, unlike our proposed CM-RFT model, which solves sentiment and multi-label emoji detection in a supervised setting and emotion detection in an unsupervised setting using a zero-shot approach. Therefore, we trained two versions of the CM-RFT model: one in a single-task setting (CM-RFT^{STL}_{[-Emo]}) for all tasks and another in a multitask setting (CM-RFT^{MTL}_{[-Emo]}) without the emotion task. The results are presented in Table 11.
Comparing the performance of CM-RFT^{STL}_{[-Emo]} with the single-task models XLMR, XLMR^{[FT+LS+RF]}, and mBERT, we observe that it outperforms all of these models in terms of accuracy and F1 scores for the emoji and sentiment tasks. For example, the accuracy of CM-RFT^{STL}_{[-Emo]} is 67.30% for the emoji task, while the highest accuracy achieved by the single-task models is 63.77%, by mBERT. Similarly, CM-RFT^{STL}_{[-Emo]} achieves an F1 score of 74.64% for sentiment detection, while the highest F1 score achieved by the single-task models is 70.32%, by mBERT. These results indicate that the inclusion of the unsupervised emotion task has indeed helped the model perform better on the supervised tasks.
Comparing the performance of CM-RFT^{MTL}_{[-Emo]} with the multi-task models MT-XLMR, TL-XLMR^{[LS]}, and TL-mBERT^{[LS]}, we observe that CM-RFT^{MTL}_{[-Emo]} outperforms all of these models in terms of accuracy and F1 scores for both the emoji and sentiment tasks. For example, the accuracy of CM-RFT^{MTL}_{[-Emo]} is 71.68% for the emoji task, while the highest accuracy achieved by the multi-task models is 66.83%, by TL-XLMR^{[LS]}. Similarly, CM-RFT^{MTL}_{[-Emo]} achieves an F1 score of 78.22% for sentiment detection, while the highest F1 score achieved by the multi-task models is 72.58%, by MT-XLMR. These results indicate that the inclusion of the unsupervised emotion task has indeed helped the model perform better in both single-task and multi-task settings.
We evaluate the performance of the Llama model on the emotion recognition task by fine-tuning it for three epochs. Our model yielded an F1 score of 60.53 for emotion recognition, which is close to the Llama model's score of 61.11. These results underscore the effectiveness of our proposed approach in tackling emotion recognition tasks, indicating its potential for practical applications in natural language processing.
To sum up, the CM-RFT model we proposed outperforms the current state-of-the-art models in both sentiment and emoji detection tasks. Our results indicate that taking advantage of multi-task learning and utilizing a zero-shot approach for unsupervised emotion detection can lead to substantial improvements in task performance. For the emoji prediction task, our proposed model achieves a remarkable improvement over the best-performing single-task and multi-task models, demonstrating the efficacy of our approach.
To assess the effectiveness of our model, we conducted comparisons with several papers and their corresponding models.
Comparison Study 1: Emotion Detection in Code-Mixed Roman Urdu - English Text [51]. Models: We compared our model with BERT and XLM-RoBERTa. Dataset used: the Code-Mixed Roman Urdu - English Text dataset. The results, as shown in Table 12, indicate that our model outperforms both BERT and XLM-RoBERTa with an F1 score of 0.69, demonstrating its effectiveness in detecting emotions in code-mixed text.
Comparison Study 2: A self-attention hybrid emoji prediction model for code-mixed language [92]. Models: We compared our model with BARF. Dataset used: the Hinglish Emoji Prediction (HEP) dataset. The results, as presented in Table 13, indicate that our model achieves a higher F1 score of 0.64 compared to BARF, demonstrating its superior performance in predicting emojis in code-mixed language.
Comparison Study 3: Multitasking of sentiment detection and emotion recognition in code-mixed Hinglish data [6]. Models: We compared our model with TL-XLMR^{MTL}_{LS}. Dataset used: the SemEval-2020 Task 9 dataset [93]. Table 14 displays the results, showing that our model achieves higher F1 scores for both emotion detection (76.22) and sentiment analysis (70.31) compared to TL-XLMR^{MTL}_{LS}, indicating its effectiveness in multitasking for sentiment and emotion recognition in code-mixed Hinglish data.
Table 15 shows the results of four ablation experiments aimed at evaluating the contribution of different components in the proposed CM-RFT framework. The four components examined are the GLU module, the auto-encoder and ANP module, the self-attention mechanism, and the collective combination of the GLU, self-attention, ANP, and AE modules.
The results indicate that each component contributes to the overall performance of the CM-RFT framework. Removing any of these components leads to a significant decline in F1 scores for all three tasks, especially when all four modules are removed (row 4). This suggests that the proposed framework is well-designed, and each module plays a critical role in its success. Specifically, the GLU module seems to be a crucial part of the framework (row 1). The removal of this component leads to a significant decrease in performance across all three tasks, highlighting the importance of non-linear transformations in the text encoder. Similarly, removing the auto-encoder and ANP module leads to a drop in performance (row 2), indicating the importance of these unsupervised pre-training methods in learning useful feature representations. Moreover, the self-attention mechanism appears to be more effective than linear concatenation in fusing the output features of the GLU and Trans Encoder modules (row 3). This result confirms the superior performance of self-attention in capturing long-range dependencies and modeling interactions among input tokens. Finally, the collective combination of GLU, SA, ANP, and AE modules is a highly effective feature learning mechanism (row 4), but it also leads to higher computational costs. The result suggests that one can still achieve decent performance with a simpler linear concatenation mechanism, albeit at the cost of reduced model capacity and expressive power.
In summary, the ablation experiments demonstrate the importance of each module in the proposed CM-RFT framework for multi-label emoji prediction. The findings can guide the design of future models and shed light on the underlying mechanisms that contribute to their success.
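For readers unfamiliar with the gating mechanism the ablation credits, here is a minimal PyTorch sketch of a GLU-style block; the hidden size and its placement in the text encoder are assumptions, since the excerpt does not specify them.

```python
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    """Gated linear unit: GLU(x) = (x W + b) * sigmoid(x V + c).

    A generic sketch of the kind of gated non-linear transformation the
    ablation study highlights; the dimension of 300 (the reported embedding
    size) is an illustrative assumption.
    """
    def __init__(self, d_model: int = 300):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * torch.sigmoid(self.gate(x))

# Example: gate a batch of token embeddings of shape (batch, seq_len, d_model).
x = torch.randn(2, 16, 300)
print(GLUBlock()(x).shape)  # torch.Size([2, 16, 300])
```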
Table 16 shows the results of four ablation experiments, where each experiment is compared to the proposed CM-RFT containing all three loss functions (\(\mathcal{L}_{ad}\), \(\mathcal{L}_{re}\), and \(\mathcal{L}_{al}\)) for the emoji, emotion, and sentiment tasks.
The F1 scores for all three tasks consistently decrease in each ablation experiment when any of the loss functions is removed. The largest decrease in performance is observed when all three loss functions are removed, indicating that each loss function plays an important role in the model's performance. Specifically, removing the \(\mathcal{L}_{ad}\) and \(\mathcal{L}_{re}\) loss functions together has a greater negative impact on the model's performance than removing only one of them. This suggests that these loss functions contribute significantly to the model's ability to capture relevant features for both adversarial training and reconstruction of the input data.
In terms of the contributions of the individual loss functions, the adversarial loss (\(\mathcal{L}_{ad}\)) appears to have a slightly larger impact on performance than the alignment loss (\(\mathcal{L}_{al}\)) and the reconstruction loss (\(\mathcal{L}_{re}\)), especially for the emoji and emotion detection tasks. This indicates that the adversarial loss plays an important role in the model's ability to distinguish between different classes for these tasks. On the other hand, the alignment loss and reconstruction loss appear to be more important for sentiment detection.
Overall, these results demonstrate the importance of the proposed loss functions for effective training of the multitask emoji, emotion, and sentiment detection system. These findings can be used to guide the development of more effective training strategies for multitask learning models in the future. For example, incorporating additional loss functions or modifying the weighting of existing loss functions may improve the model's performance. Additionally, these results suggest that the importance of different loss functions may vary depending on the specific tasks being performed and the data being used, highlighting the importance of careful analysis and selection of loss functions in the design of multitask learning models.
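As a worked illustration of how such loss ablations are typically run, the following sketch combines the three objectives as a weighted sum; the coefficients and the idea of zeroing a weight to ablate a term are assumptions, not the paper's exact formulation.

```python
import torch

# Minimal sketch: the total training objective as a weighted sum of the
# adversarial, reconstruction, and alignment losses discussed above.
# The lambda_* weights and the individual loss values are illustrative.
def total_loss(l_ad: torch.Tensor, l_re: torch.Tensor, l_al: torch.Tensor,
               lambda_ad: float = 1.0, lambda_re: float = 1.0,
               lambda_al: float = 1.0) -> torch.Tensor:
    return lambda_ad * l_ad + lambda_re * l_re + lambda_al * l_al

l_ad, l_re, l_al = torch.tensor(0.7), torch.tensor(0.4), torch.tensor(0.2)
print(total_loss(l_ad, l_re, l_al))                     # full objective
print(total_loss(l_ad, l_re, l_al, lambda_ad=0.0))      # ablate L_ad, as in Table 16-style experiments
```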
In this section, we provide a qualitative analysis of our proposed multitask framework, which takes into account the relationship between emoji, sentiment, and emotion, as previously mentioned. To illustrate the impact of these tasks on one another, we have selected several examples from the SENTIMOJI dataset and present them in Table 17.
Observation 1: In the first sentence, the model correctly predicts a heart emoji, positive sentiment, and joy as the emotion. The model seems to have picked up on the positive sentiment and joy from the words "too good" and "dont know," respectively, and predicted the heart emoji to match the positive sentiment. Moreover, the word "bhai" (brother) may imply a friendly or affectionate tone, leading to the identification of the heart emoji. Finally, the presence of the word "joy" or similar words in the training data might have helped the model to identify the emotion accurately.
Observation 2: In the second sentence, the model correctly predicts the negative sentiment, but the predicted emoji is wrong. The model predicted a pouting face instead of an angry face, which could be because the pouting face emoji can also indicate dissatisfaction or annoyance, which might be related to pride. Additionally, the emotion is misclassified as disgust instead of anger, which could be because of the strong negative sentiment and the use of words like "failure" and "cant do this."
Observation 3: In the third sentence, the model correctly predicts the Face With Open Mouth, Throwing Up emoji, indicating disgust, along with the negative sentiment. The sentence contains words like "missing," which suggests a negative sentiment, and the use of the Face With Open Mouth, Throwing Up emoji and the disgust emotion can be related to the revulsion expressed in the sentence.
Observation 4: In the first multi-label sentence, the model correctly predicts the negative sentiment and joy as the emotion, but only partially predicts the emojis. The use of "hardik subhkamnaye" and "Congratulations sir ji" in the sentence indicates a positive sentiment, and the use of "Dobara pm banee" suggests a sense of achievement, which could explain the use of the heart and sparkles emojis. The misclassification of the smiling face emoji could be due to the lack of contextual information or insufficient training data.
Observation 5: In the second multi-label sentence, the model correctly predicts the negative sentiment but misclassifies the emotion as disgust instead of anger. For the emojis, the model predicted the pouting face, crying face, and disappointed face, but the original annotations have the pouting face, angry face, and Face With Open Mouth, Throwing Up. This could be because the model picked up on the negative sentiment and the use of words like "respect," "anything," and "woman," which might have led to the prediction of the pouting face emoji, while the crying face and disappointed face emojis could be related to the negative sentiment.
Observation 6: In the third multi-label sentence, the model correctly identifies the sentiment as negative but wrongly predicts the emotion as anger instead of sadness. The model also only partially predicts the emojis, which may be due to the presence of multiple emotions in the sentence. To improve the prediction, the model could be trained on more data that contains similar phrases and words to better distinguish between different negative emotions and emojis.
The analysis of the incorrect predictions revealed several common error patterns, which are summarized below:
Ambiguity in Emoji Interpretation: The model often struggles with emojis that have multiple interpretations depending on the context. For example, the face-with-tears-of-joy emoji can represent both laughter and tears of joy, leading to misclassifications.
Negation and Sarcasm: Negation and sarcasm in text can lead to misinterpretations by the model, especially in sentiment analysis. For instance, the phrase "not bad" may be interpreted as positive by the model, leading to misclassification.
Lack of Context: The model sometimes fails to capture the context of a sentence, leading to errors in sentiment and emotion classification. For example, short or contextually ambiguous sentences may be misclassified.
Data Imbalance: Imbalance in the distribution of classes can lead to biases in the model's predictions, especially for minority classes. This is particularly evident in emotion classification, where some classes have fewer examples than others.
Out-of-Vocabulary Words: The presence of out-of-vocabulary words in the text can lead to errors, especially when the model is unable to capture their semantics. This is more common in emoji and sentiment analysis tasks.
These error patterns highlight the challenges faced by the proposed CM-RFT model in understanding and interpreting text across different tasks. Addressing these challenges requires further research into more robust modeling techniques, better handling of context and ambiguity, and mitigation of biases in the data.
The joint learning of sentiment and emotion tasks with the emoji prediction task may have benefited the performance of the emoji task. This is because emotions and sentiments can provide additional context for the model to predict the appropriate emojis. For example, in the first correct prediction sample, the model was able to correctly predict the heart emoji, which may have been influenced by the positive sentiment and joyful emotion predicted for the sentence. Similarly, in the second incorrect prediction sample, the model correctly predicted the negative sentiment but misclassified the emotion and emoji, suggesting that it may not have fully captured the nuances of the text.
Single-label emojis can be a risk in multi-label emoji prediction because emojis can have different meanings in different contexts, and a single emoji may not be able to capture all the nuances of the text. For example, the pouting face emoji can be used to express anger, disappointment, or sadness, and without additional context, it can be difficult to determine the exact emotion being conveyed. We observe in the incorrect prediction samples that the model has predicted some of the emojis correctly while missing others. This is better than having fully incorrect predictions because it shows that the model has some understanding of the context and can predict the relevant emojis to some extent. However, there is still room for improvement in the model's performance.
To improve the models predictions, we can consider the following steps:
Increase the training data: The model might benefit from additional training data to capture the various nuances of language and emotions.
Incorporate context: The model might benefit from incorporating the context of the sentence to better identify the sentiment, emoji, and emotion.
Use pre-trained language models: The model might benefit from using pre-trained language models that can capture the semantic meaning of words and phrases.
Regularize the model: The model might benefit from regularization techniques to prevent overfitting and improve generalization.
Analyze and correct errors: Analyzing the model's errors and correcting them might help improve its performance over time.
We perform a study using ChatGPT (https://chat.openai.com/) to demonstrate the effectiveness of our proposed framework. We notice that CM-RFT has an overwhelming performance advantage over ChatGPT. A few sample predictions from ChatGPT on these tasks are shown below:
Prompt: Read these hinglish utterances and find the suitable emojis, emotion, and sentiment:
tere liye chand nhi la sakta baby actually tu bhaad mein ja
Tere ghamand k karan hi aaj congress k ye halat hai ... failure hai tu Bhai .. Tujhse na ho payega
Congress ki sarker mai cylinder he gayab ho gaya tha
Human Annotators:
Emoji Label: , ,
Emotion Label: Anger, Anger, Disgust.
Sentiment Label: Negative, Negative, Negative
Proposed_MODEL:
Emoji Label: , ,
Emotion Label: Anger, Disgust, Disgust.
Sentiment Label: Negative, Negative, Negative
ChatGPT:
Emoji Label: , ,
Emotion Label: Dismissive, Anger, Confusion.
Sentiment Label: Negative, Negative, Neutral (depending on the context, it could be interpreted as negative)
In our analysis, it is evident that our model yields results comparable to ChatGPT's. While ChatGPT is renowned for its high performance, our model demonstrates proficiency, particularly in handling code-mixed sentences.
While our proposed CM-RFT model demonstrates strong performance across multiple tasks, there are several limitations and potential biases that need to be addressed:
Data Bias: The performance of the model heavily relies on the quality and representativeness of the training data. Biases present in the training data, such as underrepresentation of certain demographics or topics, can lead to biased predictions by the model.
Language Bias: The model's performance may vary across different languages due to differences in linguistic structures, cultural nuances, and availability of training data. It may perform better on languages that are well-represented in the training data compared to those that are not.
Context Sensitivity: The model's performance is influenced by the context in which the text is presented. It may struggle with contextually ambiguous or sarcastic text, leading to misinterpretations.
Generalization: The model's ability to generalize to unseen data or domains is limited by the diversity and representativeness of the training data. It may perform well on data similar to the training data but struggle with out-of-domain or adversarial examples.
Interpretability: The complex architecture of the proposed CM-RFT model may hinder its interpretability, making it challenging to understand how and why certain predictions are made. This lack of interpretability can limit the model's usefulness in real-world applications where transparency and accountability are important.
Addressing these limitations and biases requires careful consideration of model design, training data, evaluation metrics, and ethical considerations. Future research should focus on developing more robust and fair AI models that are capable of handling diverse languages, cultures, and contexts while ensuring transparency, interpretability, and accountability. Additionally, efforts should be made to collect more diverse and representative training data and to develop evaluation metrics that account for biases and fairness concerns. By addressing these challenges, we can build AI models that are more reliable, equitable, and trustworthy for real-world applications.
Slack is training its machine learning on your chat behavior unless you opt out via email – TechRadar
Slack has been using customer data to power its machine learning functions, including search result relevance and ranking, leading to the company being criticized over confusing policy updates that led many to believe that their data was being used to train its AI models.
According to the company's policy, those wishing to opt out must do so through their organization's Slack admin, who must email the company to put a stop to data use.
Slack has confirmed in correspondence to TechRadar Pro that the information it uses to power its ML not its AI is de-identified and does not access message content.
An extract from the company's privacy principles page reads:
To develop non-generative AI/ML models for features such as emoji and channel recommendations, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information) as defined in our Privacy Policy and in your customer agreement.
Another passage reads: To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com
The company does not provide a timeframe for processing such requests.
In response to uproar among the community, the company posted a separate blog post to address the concerns raised, adding: "We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce any customer data of any kind."
Slack confirmed that user data is not shared with third-party LLM providers for training purposes.
The company added in its correspondence to TechRadar Pro that its "intelligent features (not Slack AI) analyze metadata like user behavior data surrounding messages, content and files but they don't access message content."
US Pharma and Biotech Summit 2024: Artificial Intelligence and Machine Learning Through the Eyes of the FDA Part II – Pharmaceutical Executive
PE: Do you see the FDA placing any restrictions on the use of AI and machine learning as time goes on? What may prompt such actions?
Fakhouri: Like I mentioned during the keynote interview, we get asked, does the FDA regulate large language models? Are you going to ban generative AI use? My response is that we typically don't regulate linear regression. We look at the data and the information that any modeling technique is producing, and we want to make sure that the information is trustworthy. So, I wouldn't say that we would be banning or prohibiting a certain AI or machine learning type of algorithm; what we're actually interested in is how robust, how accurate, and how credible the information from these models is.
PE: What do you think the future may hold for AI and machine learning in pharma R&D in both the short- and long-term?
Fakhouri: We're actually very excited about AI use. I think we're seeing that it's increasing efficiencies in different parts of the drug development process. If you think about things such as discovery or protein folding, which again is outside of what we normally look at, it could potentially cut the development time by years. This is all very exciting, because it could translate into faster, safe, and effective drugs coming into the market. It can also fill in certain gaps for rare diseases, for example, where we can see a lot of potential use for AI to accelerate the development of drugs. That's what I would say for the long term. In the short term, I think what we're all doing, whether it's industry, regulators, or academia, is going through this adoption curve. You need to train your staff, you need to bring in the right expertise, and you need to develop the right tools to solve the right problems. That's going to take some time, and that's why I think the short-term uses of AI are going to be mostly low-hanging fruit where you're increasing operational efficiency, but then that will translate into the development of safe and effective drugs faster.
Machine Learning Researcher Links OpenAI to Drug-Fueled Sex Parties – Futurism
A machine learning researcher is claiming to have knowledge of kinky drug-fueled orgies in Silicon Valley's storied hacker houses and appears to be linking those parties, and the culture surrounding them, to OpenAI.
"The thing about being active in the hacker house scene is you are accidentally signing up for a career as a shadow politician in the Silicon Valley startup scene," begins the lengthy X-formerly-Twitter post by Sonia Joseph, a former Princeton ML researcher who's now affiliated with the deep learning institute Mila Quebec.
What follows is a vague and anecdotal diatribe about the "dark side" of startup culture made particularly explosive by Joseph's reference to so-called "consensual non-consent" sex parties that she says took place within the artificial general intelligence (AGI) enthusiast community in the valley.
The jumping off point, as far as we can tell, stems from a thread announcing that OpenAI superalignment chief Jan Leike was leaving the company as it dissolved his team that was meant to prevent advanced AI from going rogue.
At the end of his X thread, Leike encouraged remaining employees to "feel the AGI," a phrase that was also ascribed to newly-exited OpenAI cofounder Ilya Sutskever during seemingly cultish rituals revealed in an Atlantic exposé last year, but nothing in that piece, nor the superalignment chief's tweets, suggests anything having to do with sex, drugs, or kink.
Still, Joseph addressed her second viral memo-length tweet "to the journalists contacting me about the AGI consensual non-consensual (cnc) sex parties." And in the post, said she'd witnessed "some troubling things" in Silicon Valley's "community house scene" when she was in her early 20s and new to the tech industry.
"It is not my place to speak as to why Jan Leike and the superalignment team resigned. I have no idea why and cannot make any claims," wrote the researcher, who is not affiliated with OpenAI. "However, I do believe my cultural observations of the SF AI scene are more broadly relevant to the AI industry."
"I don't think events like the consensual non-consensual (cnc) sex parties and heavy LSD use of some elite AI researchers have been good for women," Joseph continued. "They create a climate that can be very bad for female AI researchers... I believe they are somewhat emblematic of broader problems: a coercive climate that normalizes recklessness and crossing boundaries, which we are seeing playing out more broadly in the industry today. Move fast and break things, applied to people."
While she said she doesn't think there's anything generally wrong with "sex parties and heavy LSD use," she also charged that the culture surrounding these alleged parties "leads to some of the most coercive and fucked up social dynamics that I have ever seen."
"I have seen people repeatedly get shut down for pointing out these problems," Joseph wrote. "Once, when trying to point out these problems, I had three OpenAI and Anthropic researchers debate whether I was mentally ill on a Google document. I have no history of mental illness; and this incident stuck with me as an example of blindspots/groupthink."
"It's likely these problems are not really on OpenAI but symptomatic of a much deeper rot in the Valley," she added. "I wish I could say more, but probably shouldn't."
Overall, it's hard to make heads or tails of these claims. We've reached out to Joseph and OpenAI for more info.
"I'm not under an NDA. I never worked for OpenAI," Joseph wrote. "I just observed the surrounding AI culture through the community house scene in SF, as a fly-on-the-wall, hearing insider information and backroom deals, befriending dozens of women and allies and well-meaning parties, and watching many of them get burned."
Machine Learning Stocks to Buy That Are Millionaire-Makers: May – InvestorPlace
The next phase of technology has been established: machine learning and AI will revolutionize the world for the better. Although it might seem like these stocks are trading in a bubble, investors need to keep a discerning, long-term vision for these disruptive, emerging technologies. One way or another, AI will grow into a secular movement that nearly every industry, if not every company, in the world will incorporate to increase productivity and efficiency.
Of course, anxiety about an AI bubble is not unwarranted. Preparing a well-diversified portfolio of the right stocks is crucial to avoid major drawdowns. Just because a company mentions AI doesn't mean it instantly becomes a good investment. We've already seen this with pullbacks in industries like EVs and fintech. So, if you want to gain machine learning exposure in your portfolio, consider these three machine learning stocks to buy, and thank us in the coming five or ten years.
Palantir (NYSE:PLTR) went from a meme stock to a legitimate business, earning hundreds of millions each year in profits. The stock is trading right at the average analyst price target of $21.45 and has a street-high price target of $35.00. This high-end target represents a more than 60% upside from the current price.
This stock has been polarizing on Wall Street since its direct listing debut in September 2020. While the first few years were a roller coaster ride for investors, the stock is earning legitimate backing through its machine-learning-integrated production deployment infrastructure. Additionally, the hype doesn't get any more legit than Stanley Druckenmiller, who disclosed that he bought nearly 770,000 shares in the recent quarter! For those who don't know him, Druckenmiller has long supported the ML revolution, with NVIDIA (NASDAQ:NVDA) being his most recent win during its massive rally over the past year.
The problem with Palantir has always been its valuation. Currently, shares trade at 21x sales and 65x forward earnings. Nonetheless, growth prospects are looking strong now, with revenue growing at a five-year compound annual growth rate (CAGR) of 12% and a three-year CAGR of 21%. As multiples begin to compress, investors should consider Palantir to be a legitimate money-making contender in the ML space.
Baidu (NASDAQ:BIDU) is a Chinese technology company that recently amassed over 200 million users on its new Ernie AI chatbot. This year, the stock is down by about 4.0% as Chinese stocks have lagged the broader rally in US equities. Nonetheless, Wall Street has maintained an average analyst price target of $153.36, about 40% higher than the current price.
Baidu recently made headlines after reporting it was interested in partnering with Tesla (NASDAQ:TSLA) to use its robotaxis in China. As China looks to get its hands on some for immediate rollout, investors should keep their eyes peeled for the unveiling of the CyberCabs in America this August. Not only will this potentially be one of the strongest new channels for revenue growth for both of these companies, but Baidu's race to get first-mover advantage could solidify it as a leader in the Chinese automobile space.
As with many Chinese ADR stocks, the multiples for BIDU are low. For example, its P/E ratio of 9.79x is sitting 25% lower than its sector's median! On top of such a discounted valuation, Baidu has maintained a strong 10-year revenue CAGR of 14%. Baidu looks like a bargain for investors who can tolerate the risk that comes with Chinese stocks.
Micron Technology (NASDAQ:MU) is an American chip maker seeing a major surge in demand due to AI and machine learning technology. Analysts are bullish on MU, with 28 of 31 recommendations coming in May as a Buy or Strong Buy rating. The average analyst price target is $145.52, nearly 15% higher than the current price.
This chip maker has already hit new all-time highs this month and is seeing revitalized product demand. This growth potential has largely been attributed to Micron being one of three companies in the world that make DRAM memory chips. These chips allow for storing massive amounts of data, which will help accelerate the training of AI and machine learning technologies. DRAM chips account for 71% of Micron's revenue as of Q2 2024, which bodes well for the stock's upward momentum.
Usually, when a stock trades at all-time highs, its valuations also stretch. That's not exactly true for Micron, as shares are trading at just 7.5x sales and 17x forward earnings. As revenue growth accelerates, Micron sticks out as one of the more under-the-radar ways to gain exposure to AI and potentially join the million-dollar club.
On the date of publication, Ian Hartana and Vayun Chugh did not hold (either directly or indirectly) any positions in the securities mentioned in this article. The opinions expressed in this article are those of the writer, subject to the InvestorPlace.com Publishing Guidelines.
Chandler Capital is the work of Ian Hartana and Vayun Chugh. Ian Hartana and Vayun Chugh are both self-taught investors whose work has been featured in Seeking Alpha. Their research primarily revolves around GARP stocks with a long-term investment perspective encompassing diverse sectors such as technology, energy, and healthcare.
EU AI Act Clears Final Hurdle to Become Global Landmark – InformationWeek
The European Union (EU) on Tuesday passed the AI Act, a landmark legislative effort that marks the first comprehensive regulations to create guardrails for artificial intelligence.
EU members' final approval means the act will enter into force next month. The law, first drafted in 2021, was put on a fast track in recent months as global leaders race to adopt safeguards to keep pace with the explosive growth in generative AI (GenAI) adoption.
"This landmark law, the first of its kind in the world, addresses a global technological challenge that also creates opportunities for our societies and economies," Belgian Digitalization Minister Mathieu Michel said in a statement. "With the AI Act, Europe emphasizes the importance of trust, transparency and accountability when dealing with new technologies, while at the same time ensuring this fast-changing technology can flourish and boost European innovation."
But US companies will certainly take notice as the rules will apply to any company doing business in Europe. And the cost of running afoul of the rules could be substantial, even for multibillion-dollar US firms.
Rules for general-purpose AI models will impact companies after 12 months, while rules for AI systems embedded into products will take effect in 36 months. Bans on AI in predictive policing and on untargeted scraping of facial images from video will come into play in six months. Fines will range from $8.2 million or 1.5% of global turnover to $37.9 million or 7% of turnover, depending on the violation.
"The EU AI Act clearing its final hurdle today marks a significant milestone in the regulatory landscape of AI globally," Manoj Saxena, InformationWeek Insight Circle member and founder of the Responsible AI Institute, tells us via email. "Although it may not directly affect US-based AI developers like OpenAI, Microsoft, Google, and Meta until 2025, its implications are profound."
US companies are already bracing for change, Saxena tells InformationWeek. "We are already seeing an uptick in consultations as our member companies prepare for a future where compliance will not only be mandatory but will also serve as a competitive differentiator in the global marketplace."
Companies, he says, should not take the act lightly. "This act is setting a precedent that will likely influence AI regulation and development not just in the world, but across the US."
US legislators on both sides of the aisle have signaled concern about the EU's growing influence on US tech interests. A Biden administration executive order sought to establish some US-based rules, but an administration change could see that order easily canceled.
"We're glad to see that the EU is taking on the regulation of frontier AI models," Daniel Colson, executive director of the AI Policy Institute, tells InformationWeek in an email. "But the American people are clear that they don't want Europe to take the lead on AI regulation, and want us to craft our own policies."
He noted that a poll conducted by the AI Policy Institute showed that the majority of Americans, regardless of partisan leanings, want to see the US pave its own way for AI regulation.
"There's a lot of work to do to improve on the European model of this tiering system as regulation is passed in the US," he says. "But fundamentally, its approach is sound and on the right track. US regulation has the opportunity to focus even more on reducing the dangers of these most powerful models while broadly supporting responsible innovation."
Create a multimodal assistant with advanced RAG and Amazon Bedrock | Amazon Web Services – AWS Blog
Retrieval Augmented Generation (RAG) models have emerged as a promising approach to enhance the capabilities of language models by incorporating external knowledge from large text corpora. However, despite their impressive performance in various natural language processing tasks, RAG models still face several limitations that need to be addressed.
Naive RAG models face limitations such as missing content, reasoning mismatch, and challenges in handling multimodal data. Although they can retrieve relevant information, they may struggle to generate complete and coherent responses when required information is absent, leading to incomplete or inaccurate outputs. Additionally, even with relevant information retrieved, the models may have difficulty correctly interpreting and reasoning over the content, resulting in inconsistencies or logical errors. Furthermore, effectively understanding and reasoning over multimodal data remains a significant challenge for these primarily text-based models.
In this post, we present a new approach named multimodal RAG (mmRAG) to tackle those existing limitations in greater detail. The solution intends to address these limitations for practical generative artificial intelligence (AI) assistant use cases. Additionally, we examine potential solutions to enhance the capabilities of large language models (LLMs) and visual language models (VLMs) with advanced LangChain capabilities, enabling them to generate more comprehensive, coherent, and accurate outputs while effectively handling multimodal data. The solution uses Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, providing a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
The mmRAG solution is based on a straightforward concept: to extract different data types separately, you generate text summarization using a VLM from different data types, embed text summaries along with raw data accordingly to a vector database, and store raw unstructured data in a document store. The query will prompt the LLM to retrieve relevant vectors from both the vector database and document store and generate meaningful and accurate answers.
The following diagram illustrates the solution architecture.
The architecture diagram depicts the mmRAG architecture that integrates advanced reasoning and retrieval mechanisms. It combines text, table, and image (including chart) data into a unified vector representation, enabling cross-modal understanding and retrieval. The process begins with diverse data extractions from various sources such as URLs and PDF files by parsing and preprocessing text, table, and image data types separately, while table data is converted into raw text and image data into captions.
These parsed data streams are then fed into a multimodal embedding model, which encodes the various data types into uniform, high dimensional vectors. The resulting vectors, representing the semantic content regardless of original format, are indexed in a vector database for efficient approximate similarity searches. When a query is received, the reasoning and retrieval component performs similarity searches across this vector space to retrieve the most relevant information from the vast integrated knowledge base.
The retrieved multimodal representations are then used by the generation component to produce outputs such as text, images, or other modalities. The VLM component generates vector representations specifically for textual data, further enhancing the system's language understanding capabilities. Overall, this architecture facilitates advanced cross-modal reasoning, retrieval, and generation by unifying different data modalities into a common semantic space.
Developers can access the mmRAG source code in the GitHub repo.
You start by configuring Amazon Bedrock to integrate with various components from the LangChain Community library. This allows you to work with the core FMs. You use the BedrockEmbeddings class to create two different embedding models: one for text (embedding_bedrock_text) and one for images (embeddings_bedrock_image). These embeddings represent textual and visual data in a numerical format, which is essential for various natural language processing (NLP) tasks.
Additionally, you use the LangChain Bedrock and BedrockChat classes to create a VLM model instance (llm_bedrock_claude3_haiku) from Anthropic Claude 3 Haiku and a chat instance based on a different model, Sonnet (chat_bedrock_claude3_sonnet). These instances are used for advanced query reasoning, argumentation, and retrieval tasks. See the following code snippet:
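The original snippet is not reproduced here; the following is a minimal sketch of that setup, assuming the BedrockEmbeddings and BedrockChat classes from the LangChain Community library and the public Bedrock model IDs for the Titan embedding models and Claude 3 Haiku and Sonnet (the exact code in the post and repo may differ):

```python
import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.chat_models import BedrockChat

# Shared Bedrock runtime client (region is an assumption)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Embedding models for text and images (Amazon Titan)
embedding_bedrock_text = BedrockEmbeddings(
    client=bedrock_runtime, model_id="amazon.titan-embed-text-v1"
)
embeddings_bedrock_image = BedrockEmbeddings(
    client=bedrock_runtime, model_id="amazon.titan-embed-image-v1"
)

# VLM instance (Claude 3 Haiku) and chat instance (Claude 3 Sonnet)
llm_bedrock_claude3_haiku = BedrockChat(
    client=bedrock_runtime, model_id="anthropic.claude-3-haiku-20240307-v1:0"
)
chat_bedrock_claude3_sonnet = BedrockChat(
    client=bedrock_runtime, model_id="anthropic.claude-3-sonnet-20240229-v1:0"
)
```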
In this section, we explore how to harness the power of Python to parse text, tables, and images from URLs and PDFs efficiently, using two powerful packages: Beautiful Soup and PyMuPDF. Beautiful Soup, a library designed for web scraping, makes it straightforward to sift through HTML and XML content, allowing you to extract the desired data from web pages. PyMuPDF offers an extensive set of functionalities for interacting with PDF files, enabling you to extract not just text but also tables and images with ease. See the following code:
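As an illustration only (the helper names below are not the post's exact code), a simplified parsing sketch using requests, Beautiful Soup, and PyMuPDF might look like this:

```python
import requests
import fitz  # PyMuPDF
from bs4 import BeautifulSoup

def parse_url_text(url: str) -> str:
    """Extract the visible text of a web page with Beautiful Soup."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n", strip=True)

def parse_pdf(path: str):
    """Extract page text and embedded images from a PDF with PyMuPDF.
    Table regions are kept as raw text here; the post converts them into
    text summaries before embedding."""
    doc = fitz.open(path)
    texts, images = [], []
    for page in doc:
        texts.append(page.get_text("text"))
        for img in page.get_images(full=True):
            pix = fitz.Pixmap(doc, img[0])
            if pix.n - pix.alpha >= 4:          # convert CMYK to RGB before export
                pix = fitz.Pixmap(fitz.csRGB, pix)
            images.append(pix.tobytes("png"))   # raw PNG bytes for captioning
    return texts, images
```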
The following code snippets demonstrate how to generate image captions using Anthropic Claude 3 by invoking the bedrock_get_img_description utility function. Additionally, they showcase how to embed image pixels along with the image captions using the Amazon Titan image embedding model amazon.titan-embed-image-v1 by calling the get_text_embedding function.
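The utility functions themselves live in the GitHub repo; a hedged sketch of what they might look like, calling the Bedrock runtime API directly (prompt text, defaults, and signatures are assumptions), is:

```python
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def bedrock_get_img_description(image_bytes: bytes,
                                model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
    """Ask Claude 3 to caption an image supplied as raw PNG bytes."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": "Describe this image or chart in detail."},
            ],
        }],
    }
    response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]

def get_text_embedding(text: str = None, image_bytes: bytes = None,
                       model_id: str = "amazon.titan-embed-image-v1") -> list:
    """Embed a caption, an image, or both with the Titan multimodal embedding model."""
    body = {}
    if text:
        body["inputText"] = text
    if image_bytes:
        body["inputImage"] = base64.b64encode(image_bytes).decode()
    response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(response["body"].read())["embedding"]
```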
You can harness the capabilities of the newly released Anthropic Claude 3 Sonnet and Haiku models on Amazon Bedrock, combined with the Amazon Titan image embedding model and LangChain, to generate comprehensive text captions for tables and images and integrate them seamlessly into your content. You can then store the resulting vectors, objects, raw image file names, and source documents in an Amazon OpenSearch Serverless vector store and object store.
You can consult the provided code examples for more information on how to embed multimodal data and insert vector documents into the OpenSearch Serverless vector store. For more information about data access, refer to Data access control for Amazon OpenSearch Serverless.
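For orientation, a minimal sketch of writing the embedded documents into an OpenSearch Serverless index through LangChain follows; the collection endpoint, index name, and the image_captions and image_s3_uris lists are placeholders, and embedding_bedrock_text comes from the earlier setup sketch:

```python
import boto3
from opensearchpy import AWSV4SignerAuth, RequestsHttpConnection
from langchain_community.vectorstores import OpenSearchVectorSearch

credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")  # "aoss" = OpenSearch Serverless

vector_store = OpenSearchVectorSearch(
    opensearch_url="https://<collection-id>.us-east-1.aoss.amazonaws.com",  # placeholder endpoint
    index_name="mmrag-multimodal-index",                                    # placeholder index
    embedding_function=embedding_bedrock_text,   # from the earlier Bedrock setup sketch
    http_auth=auth,
    connection_class=RequestsHttpConnection,
    timeout=300,
)

# Index the generated captions, keeping metadata that points back to the raw
# objects stored in Amazon S3 so the retriever can surface the source image.
vector_store.add_texts(
    texts=image_captions,                                                   # hypothetical list of captions
    metadatas=[{"s3_uri": uri, "type": "image"} for uri in image_s3_uris],  # hypothetical S3 locations
)
```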
Fusion in RAG presents an innovative search strategy designed to transcend the limitations of conventional search techniques, aligning more closely with the complex nature of human inquiries. This initiative elevates the search experience by integrating multi-faceted query generation and using Reciprocal Rank Fusion for an enhanced re-ranking of search outcomes. This approach offers a more nuanced and effective way to navigate the vast expanse of available information, catering to the intricate and varied demands of users' searches.
The following diagram illustrates this workflow.
We use the Anthropic Claude 3 Sonnet and Haiku models, which can process both visual and language data, enabling them to handle the query decomposition (Haiku) and answer fusion (Sonnet) stages effectively. The following code snippet demonstrates how to create a retriever using OpenSearch Serverless:
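The original snippet is not shown here; a minimal sketch, assuming the vector_store object from the OpenSearch Serverless sketch above and an arbitrary k, would be:

```python
# Turn the OpenSearch Serverless vector store into a LangChain retriever.
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},   # number of candidates passed to decomposition/fusion
)

# Example usage: fetch candidate documents for a user query
docs = retriever.get_relevant_documents("How do the costs in the two chart scenarios compare?")
```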
The combination of decomposition and fusion is intended to address the limitations of the chain-of-thought (CoT) method in language models. It involves breaking down complex problems into simpler, sequential sub-problems, where each sub-problem builds upon the solution of the previous one. This technique significantly enhances the problem-solving abilities of language models in areas such as symbolic manipulation, compositional generalization, and mathematical reasoning.
The RAG-decomposition approach, which uses the decomposition step (see the following code), underscores the potential of a technique called least-to-most prompting. This technique not only improves upon existing methods but also paves the way for more advanced, interactive learning frameworks for language models. The ultimate goal is to move towards a future where language models can learn from bidirectional conversations, enabling more effective reasoning and problem-solving capabilities.
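As a hedged illustration of the decomposition step (the prompt wording and chain structure below are assumptions, not the post's code), a least-to-most style sub-question generator with Claude 3 Haiku could look like this:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# llm_bedrock_claude3_haiku comes from the earlier Bedrock setup sketch.
decompose_prompt = ChatPromptTemplate.from_template(
    "Break the following question into two to four simpler sub-questions, "
    "ordered so that each one builds on the previous answer. "
    "Return one sub-question per line.\n\nQuestion: {question}"
)

decompose_chain = decompose_prompt | llm_bedrock_claude3_haiku | StrOutputParser()

sub_questions = [
    line.strip()
    for line in decompose_chain.invoke(
        {"question": "Which deployment option is cheaper overall, and by how much?"}
    ).splitlines()
    if line.strip()
]
```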
The RAG process is further enhanced by integrating a reciprocal re-ranker, which uses sophisticated NLP techniques. This ensures the retrieved results are relevant and also semantically aligned with the user's intended query. This multimodal retrieval approach seamlessly operates across vector databases and object stores, marking a significant advancement in the quest for more efficient, accurate, and contextually aware search mechanisms.
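Reciprocal Rank Fusion itself is compact enough to sketch in plain Python (this is the generic algorithm, not the post's implementation); each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k: int = 60):
    """Merge several ranked lists of document IDs; higher fused score = more relevant."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example: fuse the retrieval results of three decomposed sub-queries
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_c", "doc_b", "doc_a"],
])
```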
The mmRAG architecture enables the system to understand and process multimodal queries, retrieve relevant information from various sources, and generate multimodal answers by combining textual, tabular, and visual information in a unified manner. The following diagram highlights the data flows from queries to answers by using an advanced RAG and a multimodal retrieval engine powered by a multimodal embedding model (amazon.titan-embed-image-v1), an object store (Amazon S3), and a vector database (OpenSearch Serverless). For tables, the system retrieves relevant table locations and metadata, and computes the cosine similarity between the multimodal embedding and the vectors representing the table and its summary. Similarly, for images, the system retrieves relevant image locations and metadata, and computes the cosine similarity between the multimodal embedding and the vectors representing the image and its caption.
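The similarity computation at the heart of this retrieval step is ordinary cosine similarity between the query embedding and the stored table or image vectors; a short NumPy sketch (variable names illustrative):

```python
import numpy as np

def cosine_similarity(query_vec, doc_vec) -> float:
    q, d = np.asarray(query_vec, dtype=float), np.asarray(doc_vec, dtype=float)
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

# e.g. score a table summary vector against the multimodal query embedding:
# score = cosine_similarity(query_embedding, table_summary_embedding)
```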
The following screenshot illustrates the improved accuracy and comprehensive understanding of the user's query with multimodality capability. The mmRAG approach is capable of grasping the intent behind the query, extracting relevant information from the provided chart, and estimating the overall costs, including the estimated output token size. Furthermore, it can perform mathematical calculations to determine the cost difference. The output includes the source chart and a link to its original location.
Amazon Bedrock offers a comprehensive set of generative AI models for enhancing content comprehension across various modalities. By using the latest advancements in VLMs, such as Anthropic Claude 3 Sonnet and Haiku, as well as the Amazon Titan image embedding model, Amazon Bedrock enables you to expand your document understanding beyond text to include tables, charts, and images. The integration of OpenSearch Serverless provides enterprise-grade vector storage and approximate k-NN search capabilities, enabling efficient retrieval of relevant information. With advanced LangChain decomposition and fusion techniques, you can use multi-step querying across different LLMs to improve accuracy and gain deeper insights. This powerful combination of cutting-edge technologies allows you to unlock the full potential of multimodal content comprehension, enabling you to make informed decisions and drive innovation across various data sources.
The reliance on visual language models and image embedding models for comprehensive and accurate image captions has its limitations. Although these models excel at understanding visual and textual data, the multi-step query decomposition, reciprocal ranking, and fusion processes involved can lead to increased inference latency. This makes such solutions less suitable for real-time applications or scenarios that demand instantaneous responses. However, these solutions can be highly beneficial in use cases where higher accuracy and less time-sensitive responses are required, allowing for more detailed and accurate analysis of complex visual and textual data.
In this post, we discussed how you can use multimodal RAG to address limitations in multimodal generative AI assistants. We invite you to explore mmRAG and take advantage of the advanced features of Amazon Bedrock. These powerful tools can assist your business in gaining deeper insights, making well-informed decisions, and fostering innovation driven by more accurate data. Ongoing research efforts are focused on developing an agentic and graph-based pipeline to streamline the processes of parsing, injection, and retrieval. These approaches hold the promise of enhancing the reliability and reusability of the mmRAG system.
The authors would like to express their sincere gratitude to Nausheen Sayed, Karen Twelves, Li Zhang, Sophia Shramko, Mani Khanuja, Santhosh Kuriakose, and Theresa Perkins for their comprehensive reviews.
Alfred Shen is a Senior AI/ML Specialist at AWS. He has been working in Silicon Valley, holding technical and managerial positions in diverse sectors including healthcare, finance, and high-tech. He is a dedicated applied AI/ML researcher, concentrating on CV, NLP, and multimodality. His work has been showcased in publications such as EMNLP, ICLR, and Public Health.
Changsha Ma is a generative AI Specialist at AWS. She is a technologist with a PhD in Computer Science, a master's degree in Education Psychology, and years of experience in data science and independent consulting in AI/ML. She is passionate about researching methodological approaches for machine and human intelligence. Outside of work, she loves hiking, cooking, hunting for food, mentoring college students for entrepreneurship, and spending time with friends and family.
Julianna Delua is a Principal Specialist for AI/ML and generative AI. She serves financial services industry customers, including those in capital markets, fintech, and payments. Julianna enjoys helping businesses turn new ideas into solutions and transform their organizations with AI-powered solutions.
Continue reading here:
Create a multimodal assistant with advanced RAG and Amazon Bedrock | Amazon Web Services - AWS Blog
Using blood routine indicators to establish a machine learning model for predicting liver fibrosis in patients with … – Nature.com
Study population
The study population consisted of patients diagnosed with Schistosoma japonicum infection in Yueyang, Hunan Province, China. The city has historically been a highly schistosomiasis-endemic area because it lies near Dongting Lake, in the middle and lower reaches of the Yangtze River, where the intermediate host Oncomelania hupensis breeds in large numbers.
Schistosoma japonicum infection was diagnosed according to the definition of Zhou et al.26, using the following diagnostic criteria: a life history in schistosomiasis-endemic areas, contact with infected water, specific Schistosoma serology testing, color ultrasound, and microscopic examination of excreta (feces, urine). Schistosomiasis infection was considered when schistosome ova were visualized in stool or urine, or when the Schistosoma serology was positive.
Liver fibrosis was determined by ultrasound according to the World Health Organization diagnostic criteria for Schistosoma japonicum infection27,28. An experienced ultrasound expert divided the patients into two groups according to the ultrasound results: fibrosis group (with mesh-like changes and uneven hepatic echotexture); no-fibrosis group (without mesh-like changes, smooth and uniform hepatic echotexture). The diagnosis was double-checked by another experienced schistosomiasis specialist.
A retrospective medical record review was conducted from June 2019 to June 2022 at Xiangyue Hospital, Yueyang City, Hunan Province, China. All patients underwent blood tests and ultrasound evaluation at admission. All variables were extracted from the hospital's electronic medical record system. The data included patient demographic characteristics, blood routine indicators, and other variables. The KNN imputation method was used to fill in missing data: for each missing value, the k samples that are most similar or closest in feature space are identified by a distance metric, and these k samples are then used to estimate the missing data point. The percentage of missing data points is presented in Supplementary Table 5. The LassoCV method was used to screen out key variables. Data entry was performed by a full-time research physician or medical student. This study was approved by the Ethics Committee of the Third Xiangya Hospital of Central South University (No. 21149) and was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments. All methods were performed in accordance with the relevant guidelines and regulations. The need for informed consent was waived by the Ethics Committee of the Third Xiangya Hospital of Central South University due to the retrospective nature of the study. The privacy of all participants was fully protected.
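For illustration, a minimal sketch of this imputation step with scikit-learn's KNNImputer (the file name, column handling, and k value are assumptions, not the study's code):

```python
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.read_csv("blood_routine_indicators.csv")        # hypothetical file name
numeric_cols = df.select_dtypes(include="number").columns

# For each missing value, the k nearest samples (by distance) supply the estimate.
imputer = KNNImputer(n_neighbors=5)
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
```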
Patients were divided into hepatic fibrosis and non-hepatic fibrosis groups according to their color Doppler ultrasound results. Patients with hepatitis B virus (hepatitis B surface antigen seropositive), hepatitis C virus (HCV antibody seropositive), human immunodeficiency virus (HIV antibody seropositive), alcoholic or non-alcoholic fatty liver disease (ultrasound scanning and alcohol consumption above 30 g daily), decompensated liver disease or liver cancer (ultrasound and liver function tests), and organ transplantation (self-reported) were excluded. Key variables were selected with the LassoCV method for subsequent modeling.
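A hedged sketch of the LassoCV screening follows; candidate_features and the outcome column name are placeholders, and variables whose coefficients shrink to zero are dropped, as described in the text:

```python
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

candidate_features = [c for c in numeric_cols if c != "fibrosis"]   # hypothetical column names
X_scaled = StandardScaler().fit_transform(df[candidate_features])
lasso = LassoCV(cv=5, random_state=42).fit(X_scaled, df["fibrosis"])

selected_features = [f for f, coef in zip(candidate_features, lasso.coef_) if coef != 0]
print("Retained variables:", selected_features)
```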
First, the classification task was completed using six machine learning algorithms: XGB Classifier, Logistic Regression, LightGBM Classifier, Random Forest Classifier, Support Vector Classification, and K Neighbors Classifier. A fivefold cross-validation method was used for validation. Each model was evaluated using AUC, clinical decision curve plots, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The ROC diagram and the forest diagram show the ROC results of each model for the prediction of hepatic fibrosis.
After the best algorithm was identified through the multi-algorithm comparison, it was used to build the final model. Unlike in the multi-model comparison, when modeling with the best-performing algorithm we randomly selected 15% of the total samples as the test set, and the remaining samples were used as the training set for fivefold cross-validation.
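An illustrative sketch of this two-stage protocol (column names, hyperparameters, and the random seed are placeholders) could look like the following:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = df[selected_features], df["fibrosis"]            # from the LassoCV screening sketch

models = {
    "XGBoost": XGBClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LightGBM": LGBMClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVC": SVC(probability=True),
    "K Neighbors": KNeighborsClassifier(),
}

# Stage 1: compare the six algorithms with fivefold cross-validated AUC
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")

# Stage 2: refit the best-performing algorithm with a held-out 15% test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)
best_model = XGBClassifier().fit(X_train, y_train)
```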
The SHAP package in Python can interpret the output of machine learning models, treating all features as contributors. For each prediction sample, the model generates a prediction value, and SHAP's biggest advantage is that it can reflect the influence of the features in each sample and show their positive and negative effects. This study used the SHAP package to interpret the model. SHAP value plots were used to show the contribution of each variable in the model. Model variable importance plots were used to show the importance ranking of each variable. Force diagrams were used to illustrate how each variable affects the predicted outcome for each sample, with two examples.
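A minimal sketch of these SHAP steps, assuming the tree-based best_model and training split from the previous sketch:

```python
import shap

explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_train)

shap.summary_plot(shap_values, X_train)                      # SHAP value plot
shap.summary_plot(shap_values, X_train, plot_type="bar")     # variable importance ranking
shap.force_plot(explainer.expected_value, shap_values[0],    # force diagram for one sample
                X_train.iloc[0], matplotlib=True)
```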
Python version 3.7 was used in this study. The statsmodels 0.11.1 package in Python was used to test whether each variable differed between the two groups. The analysis method was selected according to the distribution of the samples, homogeneity of variance, and sample size. The chi-square test was used for categorical variables. Student's t-test or the Mann-Whitney U-test was used for quantitative variables.
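For orientation only, the same comparisons can be expressed with SciPy (the study names statsmodels; SciPy is used here for brevity, and the column names are hypothetical):

```python
import pandas as pd
from scipy import stats

# Categorical variable: chi-square test on a contingency table
chi2, p_cat, _, _ = stats.chi2_contingency(pd.crosstab(df["sex"], df["fibrosis"]))

# Quantitative variable: Student's t-test if roughly normal with equal variances,
# otherwise the Mann-Whitney U-test
group0 = df.loc[df["fibrosis"] == 0, "platelet_count"]
group1 = df.loc[df["fibrosis"] == 1, "platelet_count"]
t_stat, p_t = stats.ttest_ind(group0, group1, equal_var=True)
u_stat, p_u = stats.mannwhitneyu(group0, group1)
```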
In this study, LassoCV was used to screen key variables, and factors with a coefficient of 0 were automatically eliminated (sklearn 0.22.1 package in Python). Lasso obtains a more refined model by constructing a penalty function that compresses some regression coefficients, that is, it forces the sum of the absolute values of the coefficients to be less than a fixed value while setting some regression coefficients to zero. The advantage of subset shrinkage is therefore preserved, and it provides a biased estimate suited to data with multicollinearity. In the multi-model and best-model modeling process, the xgboost 1.2.1 package in Python was used for XGBoost modeling, the lightgbm 3.2.1 package was used for LightGBM modeling, and the sklearn 0.22.1 package was used to build the other models. The shap 0.39.0 package in Python was used to demonstrate the interpretability of the model.
Ethics approval was obtained from the Ethics Committee of the third Xiangya Hospital of Central South University.
Slack has been using data from your chats to train its machine learning models – Engadget
Slack trains machine-learning models on user messages, files and other content without explicit permission. The training is opt-out, meaning your private data will be leeched by default. Making matters worse, you'll have to ask your organization's Slack admin (human resources, IT, etc.) to email the company to ask it to stop. (You can't do it yourself.) Welcome to the dark side of the new AI training data gold rush.
Corey Quinn, an executive at DuckBill Group, spotted the policy in a blurb in Slack's Privacy Principles and posted about it on X (via PCMag). The section reads (emphasis ours), "To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information) as defined in our Privacy Policy and in your customer agreement."
In response to concerns over the practice, Slack published a blog post on Friday evening to clarify how its customers' data is used. According to the company, customer data is not used to train any of Slack's generative AI products (it relies on third-party LLMs for those), but it is fed to its machine learning models for products like channel and emoji recommendations and search results. For those applications, the post says, Slack's traditional ML models use de-identified, aggregate data and do not access message content in DMs, private channels, or public channels. That data may include things like message timestamps and the number of interactions between users.
A Salesforce spokesperson reiterated this in a statement to Engadget, also saying that "we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce customer data."
I'm sorry Slack, you're doing fucking WHAT with user DMs, messages, files, etc? I'm positive I'm not reading this correctly. pic.twitter.com/6ORZNS2RxC
Corey Quinn (@QuinnyPig) May 16, 2024
The opt-out process requires you to do all the work to protect your data. According to the privacy notice, "To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line 'Slack Global model opt-out request.' We will process your request and respond once the opt out has been completed."
The company replied to Quinn's message on X: "To clarify, Slack has platform-level machine-learning models for things like channel and emoji recommendations and search results. And yes, customers can exclude their data from helping train those (non-generative) ML models."
How long ago the Salesforce-owned company snuck the tidbit into its terms is unclear. It's misleading, at best, to say customers can opt out when "customers" doesn't include employees working within an organization. They have to ask whoever handles Slack access at their business to do that, and I hope they will oblige.
Inconsistencies in Slack's privacy policies add to the confusion. One section states, "When developing AI/ML models or otherwise analyzing Customer Data, Slack can't access the underlying content. We have various technical measures preventing this from occurring." However, the machine-learning model training policy seemingly contradicts this statement, leaving plenty of room for confusion.
In addition, Slack's webpage marketing its premium generative AI tools reads, "Work without worry. Your data is your data. We don't use it to train Slack AI. Everything runs on Slack's secure infrastructure, meeting the same compliance standards as Slack itself."
In this case, the company is speaking of its premium generative AI tools, separate from the machine learning models it's training on customer data without explicit permission. However, as PCMag notes, implying that all of your data is safe from AI training is, at best, a highly misleading statement when the company apparently gets to pick and choose which AI models that statement covers.
Update, May 18 2024, 3:24 PM ET: This story has been updated to include new information from Slack, which published a blog post explaining its practices in response to the community's concerns.
Update, May 19 2024, 12:41 PM ET: This story and headline have been updated to reflect additional context provided by Slack about how it uses customer data.
Read the rest here:
Slack has been using data from your chats to train its machine learning models - Engadget