Category Archives: Machine Learning
Bringing generative artificial intelligence to space – SpaceNews
TAMPA, Fla. – Amazon Web Services is busy positioning its cloud infrastructure business to capitalize on the promise of generative artificial intelligence for transforming space and other industries.
More than 60% of the company's space and aerospace customers are already using some form of AI in their businesses, according to AWS director of aerospace and satellite Clint Crosier, up from single digits around three years ago.
Crosier predicts similar growth over the next few years in space for generative AI, which uses deep-learning models to answer questions or create content based on patterns detected in massive datasets, marking a major step up from traditional machine-learning algorithms.
Mathematical advances, an explosion in the amount of available data and cheaper and more efficient chips for processing it are a "perfect storm" for the rise of generative AI, he told SpaceNews in an interview, helping drive greater adoption of cloud-based applications.
"In the last year, AWS has fundamentally reorganized itself internally so that we could put the right teams [and] organizational structure in place so that we can really double down on generative AI," he said.
He said AWS has created a "generative AI for space" cell of a handful of people to engage with cloud customers to help develop next-generation capabilities.
These efforts include a generative AI laboratory for customers to experiment with new ways of using these emerging capabilities.
Crosier sees three main areas for using generative AI in space: geospatial analytics, spacecraft design and constellation management.
Earth observation satellite operators such as BlackSky and Capella Space already use AI extensively to gain more insights into their geospatial data, but have not yet bridged into generative AI.
It's also early days in the manufacturing sector, but Crosier said engineers are experimenting with how a generative AI model fed with design parameters could produce new concepts by drawing from potentially overlooked data, such as from the automotive industry.
"Whether you're designing a satellite, rocket or spacecraft, you're letting the generative AI go out and do that exploratory work around the globe with decades of data," he said, "and then it will come back and bring you novel design concepts that nobody has envisioned before for your team to use as a baseline to start refining."
He said generative AI also has the potential to help operators manage increasingly crowded orbits by helping to simulate testing scenarios.
"If I have a constellation of 600 satellites, I want to model how that constellation will behave under various design parameters," he said.
"Well, I can get a model of two concepts, which leaves me woefully inadequate but it costs time and money to model them, or I can model an infinite number. Gen AI will tell me what are the top 25 cases I should model for my modeling simulation capability that will give me the best design optimization, and so we're seeing it used that way."
AWS efforts to accelerate the adoption of these emerging computing capabilities also include scholarships and a commitment announced in November to provide free AI training for two million people worldwide before the end of 2025.
This article was updated May 28 to clarify that BlackSky and Capella Space have yet to integrate generative AI into their business, although they use AI extensively.
Predicting multi-label emojis, emotions, and sentiments in code-mixed texts using an emojifying sentiments framework … – Nature.com
Extensive discussion of experiments, results, and analysis on our introduced dataset for the proposed method and existing state-of-the-art baselines are presented below.
The following baseline methods are compared to our proposed approach.
XLMR\(^{[FT+LS+RF]}\) [86]: In this method, a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned (FT) to perform sentiment analysis. To reduce overfitting, the authors incorporated label smoothing (LS) and rule-based features (RF) such as negation handling and sentiment shift detection. This model is used for emoji, sentiment, and emotion analysis tasks.
Multilingual BERT (mBERT) [87]: The authors utilized a transformer-based language model called mBERT to learn contextual embeddings for words in multiple languages. mBERT was pre-trained on large amounts of monolingual and multilingual text data and fine-tuned on the SentiMix code-mixed dataset for sentiment detection and emotion recognition.
XLMR\(^{MTL}\) [87]: The authors used XLM-R, a cross-lingual language model based on a transformer architecture that was pre-trained on a larger dataset including code-mixed text. XLM-R can encode and decode text in multiple languages and has achieved state-of-the-art results on various NLP tasks, including sentiment analysis and emotion recognition. They fine-tuned XLM-R on the SentiMix code-mixed dataset for sentiment detection and emotion recognition.
TL-XLMR\(^{[LS]}\) [6]: To detect sentiment and recognize emotions in the SentiMix code-mixed dataset, the authors employed an end-to-end multitask framework based on a transformer architecture. They fine-tuned XLM-RoBERTa (XLMR), a pre-trained cross-lingual embedding model, with task-specific data to improve model efficiency through transfer learning.
TL-mBERT\(^{[LS]}\) [6]: In this ablation experiment, the authors replaced the XLMR module with mBERT to investigate the significance of the sentence encoder in TL-XLMR\(^{[LS]}\). The model was fine-tuned on the SentiMix code-mixed dataset to perform sentiment detection and emotion recognition.
Our proposed model is implemented in PyTorch, a widely used Python deep-learning toolkit. We use the F1-score (F1) as the evaluation metric for emotion and sentiment prediction; for emoji prediction we use the Jaccard Index (JI) and the macro F1-score. We use the Adam optimizer [88] and run a grid search for 200 epochs to tune the model. We use a Transformer encoder with two layers; the embedding size is 300, which we chose empirically (after checking 100, 150, 200 and 300). The dropout rate is set to 0.5 and the learning rate to 0.05. The auto-latent encoder's dimension was set to 2048, also chosen empirically. The discriminator, \(\mathcal{D}\), is composed of two fully connected layers and a ReLU layer. The learning rate is set to 1e-3, with a weight decay of 1e-4 and momentum of 0.3. The efficacy of our approach is assessed by comparing its F1 and accuracy scores against the baselines. In the CM-RFT, the kernel is dynamically computed from the input using a fully connected layer. The kernel sizes are [3, 5, 7, 31*3], and each module has 4 heads (half the number of heads in the transformer base model).
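As a reading aid, the reported settings can be collected in one place. The sketch below is only a plain-Python summary of the numbers quoted above, not the authors' training code. The class name, field names, and helper function are hypothetical, and the text does not say which component the second learning rate (1e-3), weight decay, and momentum apply to, so they are kept as separate, neutrally named fields.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ReportedConfig:
    """Hyperparameters quoted in the text; all field names are illustrative."""
    encoder_layers: int = 2            # Transformer encoder depth
    embedding_size: int = 300          # chosen empirically
    dropout: float = 0.5
    learning_rate: float = 0.05
    latent_dim: int = 2048             # auto-latent encoder dimension
    max_epochs: int = 200
    kernel_sizes: Tuple[int, ...] = (3, 5, 7, 31 * 3)
    heads_per_module: int = 4
    # The text reports these three values without naming the component
    # they belong to, so no component is assumed here.
    second_learning_rate: float = 1e-3
    weight_decay: float = 1e-4
    momentum: float = 0.3


# Embedding sizes the text says were checked before settling on 300.
EMBEDDING_SIZES_CHECKED = (100, 150, 200, 300)


def embedding_size_grid(base: ReportedConfig = ReportedConfig()) -> List[ReportedConfig]:
    """Return one config per embedding size, leaving other settings unchanged."""
    return [replace(base, embedding_size=size) for size in EMBEDDING_SIZES_CHECKED]


if __name__ == "__main__":
    for cfg in embedding_size_grid():
        print(cfg.embedding_size, cfg.dropout, cfg.kernel_sizes)
```

Freezing the dataclass keeps each candidate configuration immutable, so a search loop cannot accidentally modify the baseline settings.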
For the emoji detection tasks, we consider the Jaccard Index (JI) [89] and Hamming loss (HL) [90] metrics to evaluate the performance of our proposed system. Additionally, we also report the micro-averaged F1 [91] score and Accuracy values for the same (as shown in Table 8). JI, HL, and micro-averaged F1 are popular choices to evaluate multi-label classification tasks. For the sentiment and emotion detection tasks (as shown in Tables 9 and 10), we report the macro-averaged F1 score [91] and accuracy values for our proposed model.
Micro-averaged F1 score: For multi-label classification tasks, the micro-averaged F1 score is a commonly used metric that computes the F1 score globally by counting the true positives (TP), false negatives (FN), and false positives (FP) across all labels. The formula for the micro-averaged F1 score is: \(F1_{micro} = \frac{2 \cdot \sum_{i=1}^{n} TP_i}{2 \cdot \sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i + \sum_{i=1}^{n} FN_i}\) where \(TP_i\), \(FP_i\), and \(FN_i\) are the counts for label i and n here is the number of labels.
Macro-averaged F1 score: The macro-averaged F1 score is another commonly used metric for multi-label classification tasks. It computes the F1 score for each label and then takes the average of these F1 scores. The formula for the macro-averaged F1 score is: \(F1_{macro} = \frac{1}{n} \sum_{i=1}^{n} \frac{2 \cdot TP_i}{2 \cdot TP_i + FP_i + FN_i}\) where n is again the number of labels.
Accuracy: Accuracy is a metric that measures the proportion of correctly classified labels to the total number of labels. The formula for accuracy is: \(A = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i}\)
Hamming Loss: The Hamming loss measures the proportion of misclassified labels to the total number of labels. The formula for Hamming loss is: \(HL = \frac{1}{n} \sum_{i=1}^{n} \frac{\operatorname{xor}(Y_i, \hat{Y_i})}{m}\) where n is the number of instances, m is the number of labels, \(Y_i\) is the true label vector for instance i, \(\hat{Y_i}\) is the predicted label vector for instance i, and xor is the logical XOR operator.
Jaccard Index: The Jaccard Index measures the similarity between two sets by computing the ratio of the size of their intersection to the size of their union, and it is used to measure the similarity between the predicted and true label sets in multi-label classification. The formula for the Jaccard Index is: \(JI = \frac{1}{n} \sum_{i=1}^{n} \frac{|Y_i \cap \hat{Y_i}|}{|Y_i \cup \hat{Y_i}|}\) where n is the number of instances, \(Y_i\) is the true label set for instance i, and \(\hat{Y_i}\) is the predicted label set for instance i. The resulting score ranges from 0 to 1, with 1 representing perfect agreement between the predicted and true label sets.
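To make the multi-label metrics concrete, the following minimal sketch computes micro-averaged F1, macro-averaged F1, Hamming loss, and the Jaccard Index on a small made-up example. It is an illustration in plain Python, not the evaluation code used in the study. The function names and the toy label matrices are assumptions, as is the convention of scoring an instance with empty true and predicted sets as 1.0.

```python
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[int]]  # one 0/1 indicator row per instance


def per_label_counts(y_true: Rows, y_pred: Rows) -> Tuple[List[int], List[int], List[int]]:
    """Count true positives, false positives and false negatives per label."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(truth, pred)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    return tp, fp, fn


def micro_f1(y_true: Rows, y_pred: Rows) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    tp, fp, fn = per_label_counts(y_true, y_pred)
    denom = 2 * sum(tp) + sum(fp) + sum(fn)
    return 2 * sum(tp) / denom if denom else 0.0


def macro_f1(y_true: Rows, y_pred: Rows) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    tp, fp, fn = per_label_counts(y_true, y_pred)
    scores = []
    for t, f_pos, f_neg in zip(tp, fp, fn):
        denom = 2 * t + f_pos + f_neg
        scores.append(2 * t / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hamming_loss(y_true: Rows, y_pred: Rows) -> float:
    """Mean fraction of label positions that disagree (XOR) per instance."""
    m = len(y_true[0])
    per_instance = [
        sum(t != p for t, p in zip(truth, pred)) / m
        for truth, pred in zip(y_true, y_pred)
    ]
    return sum(per_instance) / len(per_instance)


def jaccard_index(y_true: Rows, y_pred: Rows) -> float:
    """Mean intersection-over-union of true and predicted label sets."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(truth, pred) if t and p)
        union = sum(1 for t, p in zip(truth, pred) if t or p)
        total += inter / union if union else 1.0  # assumed convention
    return total / len(y_true)


# Toy data: 4 instances, 3 labels (made up for illustration).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]]

if __name__ == "__main__":
    print(f"micro F1     = {micro_f1(y_true, y_pred):.4f}")
    print(f"macro F1     = {macro_f1(y_true, y_pred):.4f}")
    print(f"Hamming loss = {hamming_loss(y_true, y_pred):.4f}")
    print(f"Jaccard      = {jaccard_index(y_true, y_pred):.4f}")
```

On this toy data the pooled counts are TP = 4, FP = 1, FN = 2, so the micro-averaged F1 is 8/11, while the macro average of the per-label scores (1, 2/3, 1/2) is 13/18. The two averages differ because macro averaging gives each label equal weight regardless of how often it occurs.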
Tables 8, 9, and 10 present the performance of CM-T, CM-FT, and CM-RFT models for the emoji, sentiment, and emotion tasks in UTL, DTL, and TTL setups. These setups investigate the effectiveness of multi-task learning in improving overall system performance compared to single-task learning.
The results reported in Table 8 are the performance metrics of three different models (CM-T, CM-FT, CM-RFT) trained on three different setups (uni-task learning, dual-task learning, and tri-task learning) for the task of emoji detection.
In the uni-task learning setup, where each task is solved individually, the performance of the CM-RFT model improves as more features are added. Specifically, the performance improves as we go from using only character embeddings to character embeddings + ELMo embeddings + TF-IDF. The F1 score increases from 0.59 to 0.64 and the accuracy from 0.62 to 0.67, while the Hamming loss decreases from 0.15 to 0.13 and the Jaccard index increases from 0.52 to 0.56. These results suggest that using multiple features can improve the performance of the emoji detection task.
In the dual-task learning setup, where the emoji task is jointly learned with the sentiment or emotion task, the performance of the CM-RFT model further improves compared to the uni-task learning setup. The improvement is more evident when the model is trained on character embeddings + ELMo embeddings + TF-IDF features. The F1 score increases from 0.64 to 0.68 and the accuracy from 0.67 to 0.71, while the Hamming loss decreases from 0.13 to 0.07 and the Jaccard index increases from 0.56 to 0.61. These results suggest that training the model on multiple tasks can lead to further improvements in the performance of the emoji detection task.
In the tri-task learning setup, where sentiment, emotion, and emoji detection tasks are jointly learned, the performance of the CM-RFT model improves even further compared to the dual-task learning setup. The F1 score increases from 0.68 to 0.73 and the accuracy from 0.71 to 0.75, while the Hamming loss decreases from 0.07 to 0.054 and the Jaccard index increases from 0.61 to 0.69. These results suggest that joint learning of multiple tasks leads to significant improvements in the performance of the emoji detection task.
Overall, the results suggest that the performance of the emoji detection task can be improved by using multiple features and by training the model on multiple tasks. Additionally, the results suggest that sentiment and emotion have a significant impact on the performance of the emoji detection task as joint learning of these tasks leads to significant improvements in performance.
The sentiment classification task results are presented in Table 9 for the joint learning of emotion and emoji tasks. In the uni-task setup, where each task is performed independently, the CM-RFT model achieves the highest performance for the sentiment task with an F1 score of 72.65 and accuracy of 75.19. This suggests that including extra features, such as ELMo embeddings and TF-IDF features, can enhance sentiment detection performance across all models compared to those utilizing only character embedding features.
In the dual-task setup, jointly learning sentiment with emoji raises the sentiment F1 score and accuracy from 72.65 and 75.19 in the uni-task setup to 78.22 and 79.21 when using character embeddings, ELMo embeddings, and TF-IDF features. Jointly learning sentiment with emotion yields a smaller gain with the same features, to an F1 score of 74.64 and an accuracy of 77.31.
In the tri-task setup, where sentiment, emotion, and emoji detection tasks are solved jointly, the CM-RFT model achieves the best performance for the sentiment task with an F1 score of 82.35 and accuracy of 83.14, followed by the CM-FT model with an F1 score of 75.42 and accuracy of 79.26. This again confirms that multitask learning helps to improve sentiment detection performance when it is learned jointly with other tasks.
The findings indicate that integrating emotion and emoji detection tasks into the sentiment classification task can enhance the model's performance. The tri-task learning setup demonstrated the highest performance for the sentiment task, implying that incorporating these extra tasks can improve the model's comprehension of the sentiment expressed in text. The enhanced performance is likely due to the additional contextual information that emotions and emojis provide, particularly in cases where the sentiment is complicated or sarcastic. Therefore, incorporating emotion and emoji detection tasks could be a useful technique for enhancing the performance of sentiment classification models. Moreover, incorporating additional features, such as ELMo embeddings and TF-IDF features, can also improve the sentiment detection performance.
According to the results presented in Table 10, we can observe that the performance of the emotion task increases as we transition from single-task learning to dual-task and eventually to tri-task learning. In the single-task setup, the CM-RFT model outperforms the CM-T and CM-FT models across all three feature combinations, indicating that incorporating sentiment and emoji information can enhance the emotion detection task's performance. In the dual-task setup with emoji, the performance of all models is considerably lower than in the single-task setup. However, the performance improves as more features are incorporated, and the CM-RFT model achieves the best results with all three features. This suggests that utilizing various feature types can benefit joint learning of emoji and emotion detection, and the tri-task setup may provide further improvement. In the dual-task setup with sentiment, the performance is better than with emoji. The addition of ELMo embeddings and TF-IDF features leads to consistent performance improvement, with the CM-RFT model again achieving the best results. This implies that joint learning of sentiment and emotion detection can also benefit from the use of multiple feature types.
The presence of sentiment and emoji information appears to enhance the emotion task's performance, as suggested by the results. The best performance for the emotion task was obtained in the tri-task learning setup, which involved jointly learning sentiment, emotion, and emoji detection tasks. The improvement in performance can be attributed to the fact that sentiment and emoji provide additional contextual information that can help in better disambiguation of emotions.
The results also suggest that multitask learning is more effective than single-task learning, especially when the tasks are related, such as emotion, sentiment, and emoji detection. The emotion task's performance improved consistently as we progressed from single-task to dual-task and finally to tri-task learning. This indicates that joint learning of related tasks can better utilize the available information and improve the overall performance of the system.
The results presented in Table 11 indicate that the CM-RFT model proposed in this study performs better than the state-of-the-art models for both sentiment and emoji detection tasks. In the single-task scenario, mBERT achieved the highest accuracy of 63.77% and an F1 score of 61.54% for the emoji detection task. However, in the multi-task setting, the proposed CM-RFT model surpasses all other models, achieving an accuracy of 75.81% and an F1 score of 73.25%. This shows that the proposed model effectively uses multi-task learning to improve the performance of both tasks. Moreover, the model also shows promising results for the unsupervised emotion detection task, with an F1 score of 60.53% and an accuracy of 63.73%. This demonstrates that the zero-shot approach utilized in the proposed model is effective in detecting emotions from the text even without labeled data.
When focusing on the emoji prediction task, the proposed CM-RFT model outperforms both single-task and multi-task models by a wide margin. The model achieves an accuracy of 75.81%, which is approximately 12 percentage points higher than the accuracy of the best-performing single-task model (mBERT, 63.77%) and approximately 9 percentage points higher than the accuracy of the best-performing multi-task model (TL-XLMR\(^{[LS]}\), 66.83%). Moreover, the model's F1 score is 73.25%, which is approximately 12 percentage points higher than the F1 score of the best-performing single-task model (mBERT, 61.54%) and approximately 8 percentage points higher than the F1 score of the best-performing multi-task model (TL-XLMR\(^{[LS]}\)).
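The gaps quoted for the emoji task are differences between percentage scores, which is not the same as a relative improvement. The short sketch below recomputes both from the accuracy and F1 values reported in this section. The helper names are hypothetical, and the relative figures are derived here for illustration rather than taken from the paper.

```python
def point_gap(ours: float, baseline: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    return ours - baseline


def relative_gain(ours: float, baseline: float) -> float:
    """Improvement expressed as a percentage of the baseline score."""
    return 100.0 * (ours - baseline) / baseline


# Scores reported in the text for the emoji task (all in percent).
CM_RFT_ACC, CM_RFT_F1 = 75.81, 73.25
MBERT_ACC, MBERT_F1 = 63.77, 61.54   # best single-task baseline
TL_XLMR_LS_ACC = 66.83               # best multi-task baseline (accuracy)

if __name__ == "__main__":
    print(f"accuracy vs mBERT:   {point_gap(CM_RFT_ACC, MBERT_ACC):.2f} points, "
          f"{relative_gain(CM_RFT_ACC, MBERT_ACC):.1f}% relative")
    print(f"accuracy vs TL-XLMR: {point_gap(CM_RFT_ACC, TL_XLMR_LS_ACC):.2f} points")
    print(f"F1 vs mBERT:         {point_gap(CM_RFT_F1, MBERT_F1):.2f} points")
```

The accuracy gap over mBERT works out to about 12.04 points, which corresponds to a relative gain of roughly 18.9% over the 63.77% baseline.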
We conducted additional experiments with our proposed model to compare it fairly with the single- and multi-task baselines discussed earlier. As none of the baseline models addressed unsupervised classification, they could not generate scores for the emotion task, unlike our proposed CM-RFT model that solves sentiment and multi-label emoji detection in a supervised setting and emotion detection in an unsupervised setting using a zero-shot approach. Therefore, we trained two versions of the CM-RFT model: one in a single-task setting (CM-RFT\(^{STL}_{[-Emo]}\)) for all tasks and another in a multitask setting (CM-RFT\(^{MTL}_{[-Emo]}\)) without the emotion task. The results are presented in Table 11.
Comparing the performance of CM-RFT\(^{STL}_{[-Emo]}\) with single-task models XLMR, XLMR\(^{[FT+LS+RF]}\), and mBERT, we observe that CM-RFT\(^{STL}_{[-Emo]}\) outperforms all these models in terms of accuracy and F1 scores for the emoji and sentiment tasks. For example, the accuracy of CM-RFT\(^{STL}_{[-Emo]}\) is 67.30% for the emoji task, while the highest accuracy achieved by single-task models is 63.77% by mBERT. Similarly, CM-RFT\(^{STL}_{[-Emo]}\) achieves an F1 score of 74.64% for sentiment detection, while the highest F1 score achieved by single-task models is 70.32% by mBERT. These results indicate that the inclusion of the unsupervised emotion task has indeed helped the model perform better on supervised tasks.
Comparing the performance of CM-RFT\(^{MTL}_{[-Emo]}\) with multi-task models MT-XLMR, TL-XLMR\(^{[LS]}\), and TL-mBERT\(^{[LS]}\), we observe that CM-RFT\(^{MTL}_{[-Emo]}\) outperforms all these models in terms of accuracy and F1 scores for both emoji and sentiment tasks. For example, the accuracy of CM-RFT\(^{MTL}_{[-Emo]}\) is 71.68% for the emoji task, while the highest accuracy achieved by multi-task models is 66.83% by TL-XLMR\(^{[LS]}\). Similarly, CM-RFT\(^{MTL}_{[-Emo]}\) achieves an F1 score of 78.22% for sentiment detection, while the highest F1 score achieved by multi-task models is 72.58% by MT-XLMR. These results indicate that the inclusion of the unsupervised emotion task has indeed helped the model perform better in both single-task and multi-task settings.
We evaluate the performance of the Llama model on the emotion recognition task by fine-tuning it for three epochs. Our model yielded an F1 score of 60.53 for emotion recognition, close to the F1 score of 61.11 achieved by the fine-tuned Llama model. These results underscore the effectiveness of our proposed approach in tackling emotion recognition tasks, indicating its potential for practical applications in natural language processing.
To sum up, the CM-RFT model we proposed outperforms the current state-of-the-art models in both sentiment and emoji detection tasks. Our results indicate that taking advantage of multi-task learning and utilizing a zero-shot approach for unsupervised emotion detection can lead to substantial improvements in task performance. For the emoji prediction task, our proposed model achieves a remarkable improvement over the best-performing single-task and multi-task models, demonstrating the efficacy of our approach.
To assess the effectiveness of our model, we conducted comparisons with several papers and their corresponding models.
Comparison Study 1: Emotion Detection in Code-Mixed Roman Urdu - English Text [51]. Models: We compared our model with BERT and XLM-RoBERTa. Dataset Used: We used the Code-Mixed Roman Urdu - English Text dataset. The results, as shown in Table 12, indicate that our model outperforms both BERT and XLM-RoBERTa with an F1 score of 0.69, demonstrating its effectiveness in detecting emotions in code-mixed text.
Comparison Study 2: A self-attention hybrid emoji prediction model for code-mixed language [92]. Models: We compared our model with BARF. Dataset Used: We used the Hinglish Emoji Prediction (HEP) dataset. The results, as presented in Table 13, indicate that our model achieves a higher F1 score of 0.64 compared to BARF, demonstrating its superior performance in predicting emojis in code-mixed language.
Comparison Study 3: Multitasking of sentiment detection and emotion recognition in code-mixed Hinglish data [6]. Models: We compared our model with TL-XLMR\(^{MTL}_{LS}\). Dataset Used: We used the SemEval-2020 Task 9 dataset [93]. Table 14 displays the results, showing that our model achieves higher F1 scores for both emotion detection (76.22) and sentiment analysis (70.31) compared to TL-XLMR\(^{MTL}_{LS}\), indicating its effectiveness in multitasking for sentiment and emotion recognition in code-mixed Hinglish data.
Table 15 shows the results of four ablation experiments aimed at evaluating the contribution of different components in the proposed CM-RFT framework. The four components examined are the GLU module, the auto-encoder and ANP module, the self-attention mechanism, and the collective combination of GLU, self-attention, ANP, and AE modules.
The results indicate that each component contributes to the overall performance of the CM-RFT framework. Removing any of these components leads to a significant decline in F1 scores for all three tasks, especially when all four modules are removed (row 4). This suggests that the proposed framework is well-designed, and each module plays a critical role in its success. Specifically, the GLU module seems to be a crucial part of the framework (row 1). The removal of this component leads to a significant decrease in performance across all three tasks, highlighting the importance of non-linear transformations in the text encoder. Similarly, removing the auto-encoder and ANP module leads to a drop in performance (row 2), indicating the importance of these unsupervised pre-training methods in learning useful feature representations. Moreover, the self-attention mechanism appears to be more effective than linear concatenation in fusing the output features of the GLU and Trans Encoder modules (row 3). This result confirms the superior performance of self-attention in capturing long-range dependencies and modeling interactions among input tokens. Finally, the collective combination of GLU, SA, ANP, and AE modules is a highly effective feature learning mechanism (row 4), but it also leads to higher computational costs. The result suggests that one can still achieve decent performance with a simpler linear concatenation mechanism, albeit at the cost of reduced model capacity and expressive power.
In summary, the ablation experiments demonstrate the importance of each module in the proposed CM-RFT framework for multi-label emoji prediction. The findings can guide the design of future models and shed light on the underlying mechanisms that contribute to their success.
Table 16 shows the results of four ablation experiments where each experiment is compared to the proposed CM-RFT containing all three loss functions (\(\mathcal{L}_{ad}\), \(\mathcal{L}_{re}\), and \(\mathcal{L}_{al}\)) for the emoji, emotion, and sentiment tasks.
The F1 scores for all three tasks consistently decrease in each ablation experiment when any of the loss functions are removed. The largest decrease in performance is observed when all three loss functions are removed, indicating that each loss function plays an important role in the model's performance. Specifically, removing both the \(\mathcal{L}_{ad}\) and \(\mathcal{L}_{re}\) loss functions has a greater negative impact on the model's performance than removing only one of these loss functions. This suggests that these loss functions contribute significantly to the model's ability to capture relevant features for both the adversarial training and reconstruction of the input data.
In terms of the contributions of the individual loss functions, the adversarial loss (\(\mathcal{L}_{ad}\)) appears to have a slightly larger impact on performance compared to the alignment loss (\(\mathcal{L}_{al}\)) and reconstruction loss (\(\mathcal{L}_{re}\)), especially for the emoji and emotion detection tasks. This indicates that adversarial loss plays an important role in the model's ability to distinguish between different classes for these tasks. On the other hand, the alignment loss and reconstruction loss appear to be more important for sentiment detection.
Overall, these results demonstrate the importance of the proposed loss functions for effective training of the multitask emoji, emotion, and sentiment detection system. These findings can be used to guide the development of more effective training strategies for multitask learning models in the future. For example, incorporating additional loss functions or modifying the weighting of existing loss functions may improve the model's performance. Additionally, these results suggest that the importance of different loss functions may vary depending on the specific tasks being performed and the data being used, highlighting the importance of careful analysis and selection of loss functions in the design of multitask learning models.
In this section, we provide a qualitative analysis of our proposed multitask framework, which takes into account the relationship between emoji, sentiment, and emotion, as we previously mentioned. To illustrate the impact of these tasks on one another, we have selected several examples from the SENTIMOJI dataset and present them in Table 17.
Observation 1: In the first sentence, the model correctly predicts a heart emoji, positive sentiment, and joy as the emotion. The model seems to have picked up on the positive sentiment and joy from the words "too good" and "don't know" respectively, and predicted the heart emoji to match the positive sentiment. Moreover, the word "bhai" (brother) may imply a friendly or affectionate tone, leading to the identification of the heart emoji. Finally, the presence of the word "joy" or similar words in the training data might have helped the model to identify the emotion accurately.
Observation 2: In the second sentence, the model correctly predicts the negative sentiment, but the predicted emoji is wrong. The model predicted a pouting face instead of an angry face, which could be because the pouting face emoji can also indicate dissatisfaction or annoyance, which might be related to pride. Additionally, the emotion is misclassified as disgust instead of anger, which could be because of the strong negative sentiment and the use of words like "failure" and "can't do this".
Observation 3: In the third sentence, the model correctly predicts the Face With Open Mouth, Throwing Up emoji, indicating disgust, along with the negative sentiment. The sentence contains words like "missing", which suggests a negative sentiment, and the use of the Face With Open Mouth, Throwing Up emoji, and disgust emotion can be related to the revulsion expressed in the sentence.
Observation 4: In the first multi-label sentence, the model correctly predicts the negative sentiment and joy as the emotion, but only partially predicts the emojis. The use of "hardik subhkamnaye" and "Congratulations sir ji" in the sentence indicates a positive sentiment and the use of "Dobara pm banee" suggests a sense of achievement, which could explain the use of the heart and sparkles emojis. The misclassification of the smiling face emoji could be due to the lack of contextual information or insufficient training data.
Observation 5: In the second multi-label sentence, the model correctly predicts the negative sentiment but misclassifies the emotion as disgust instead of anger. For the emojis, the model predicted pouting face, crying face, and disappointed face, but the original annotations have pouting face, angry face, and Face With Open Mouth, Throwing Up. This could be because the model picked up on the negative sentiment and the use of words like "respect", "anything", and "woman", which might have led to the prediction of the pouting face emoji, while the crying face and disappointed face emojis could be related to the negative sentiment.
Observation 6: In the third multi-label sentence, the model correctly identifies the sentiment as negative but wrongly predicts the emotion as anger instead of sadness. The model also partially predicts the emojis, which may be due to the presence of multiple emotions in the sentence. To improve the prediction, the model could be trained on more data that contains similar phrases and words to better distinguish between different negative emotions and emojis.
The analysis of the incorrect predictions revealed several common error patterns, which are summarized below:
Ambiguity in Emoji Interpretation: The model often struggles with emojis that have multiple interpretations depending on the context. For example, the emoji can represent both laughter and tears of joy, leading to misclassifications.
Negation and Sarcasm: Negation and sarcasm in text can lead to misinterpretations by the model, especially in sentiment analysis. For instance, the phrase "not bad" may be interpreted as positive by the model, leading to misclassification.
Lack of Context: The model sometimes fails to capture the context of a sentence, leading to errors in sentiment and emotion classification. For example, short or contextually ambiguous sentences may be misclassified.
Data Imbalance: Imbalance in the distribution of classes can lead to biases in the model's predictions, especially for minority classes. This is particularly evident in emotion classification, where some classes have fewer examples than others.
Out-of-Vocabulary Words: The presence of out-of-vocabulary words in the text can lead to errors, especially when the model is unable to capture their semantics. This is more common in emoji and sentiment analysis tasks.
These error patterns highlight the challenges faced by the proposed CM-RFT model in understanding and interpreting text across different tasks. Addressing these challenges requires further research into more robust modeling techniques, better handling of context and ambiguity, and mitigation of biases in the data.
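One common way to soften the data-imbalance pattern listed above is to weight each class inversely to its frequency during training. The sketch below is only an illustration of that general idea, not the method used for CM-RFT; the label counts and the helper name `balanced_class_weights` are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: total / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical emotion labels with a deliberately skewed distribution.
labels = ["anger"] * 6 + ["joy"] * 3 + ["disgust"] * 1
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

Weights of this kind would typically be passed to a weighted loss function so that mistakes on minority emotions cost more than mistakes on majority ones.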
The joint learning of sentiment and emotion tasks with the emoji prediction task may have benefited the performance of the emoji task. This is because emotions and sentiments can provide additional context for the model to predict the appropriate emojis. For example, in the first correct prediction sample, the model was able to correctly predict the heart emoji, which may have been influenced by the positive sentiment and joyful emotion predicted for the sentence. Similarly, in the second incorrect prediction sample, the model correctly predicted the negative sentiment but misclassified the emotion and emoji, suggesting that it may not have fully captured the nuances of the text.
Single-label emojis can be a risk in multilabel emoji prediction because emojis can have different meanings in different contexts, and a single emoji may not capture all the nuances of the text. For example, the pouting face emoji can be used to express anger, disappointment, or sadness, and without additional context it can be difficult to determine the exact emotion being conveyed. In the incorrect prediction samples, we observe that the model predicted some of the emojis correctly while missing others. This is better than fully incorrect predictions because it shows that the model has some understanding of the context and can predict the relevant emojis to some extent. However, there is still room for improvement in the model's performance.
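To make "partially correct" concrete, one option is to score the overlap between the predicted and annotated emoji sets. The sketch below uses the Jaccard index on the labels from Observation 5; the choice of Jaccard is an assumption for illustration, since the text here does not state which overlap measure the authors used.

```python
def jaccard_score(predicted: set, gold: set) -> float:
    """Overlap between two label sets: 0.0 means disjoint, 1.0 means identical."""
    if not predicted and not gold:
        return 1.0  # two empty sets agree trivially
    return len(predicted & gold) / len(predicted | gold)


# Emoji labels taken from Observation 5 (lowercased for comparison).
predicted = {"pouting face", "crying face", "disappointed face"}
gold = {"pouting face", "angry face", "face with open mouth, throwing up"}

score = jaccard_score(predicted, gold)
print(f"shared labels: {sorted(predicted & gold)}")
print(f"jaccard overlap: {score:.2f}")
```

Here one label out of five distinct labels is shared, so the prediction receives partial credit rather than being counted as a complete miss.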
To improve the model's predictions, we can consider the following steps:
Increase the training data: The model might benefit from additional training data to capture the various nuances of language and emotions.
Incorporate context: The model might benefit from incorporating the context of the sentence to better identify the sentiment, emoji, and emotion.
Use pre-trained language models: The model might benefit from using pre-trained language models that can capture the semantic meaning of words and phrases.
Regularize the model: The model might benefit from regularization techniques to prevent overfitting and improve generalization.
Analyze and correct errors: Analyzing the model's errors and correcting them might help improve the model's performance over time.
We perform a study using ChatGPT (https://chat.openai.com/) to demonstrate the effectiveness of our proposed framework. We notice that CM-RFT has an overwhelming performance advantage over ChatGPT. A few sample predictions from ChatGPT on these tasks are shown below:
Prompt: Read these hinglish utterances and find the suitable emojis, emotion, and sentiment:
tere liye chand nhi la sakta baby actually tu bhaad mein ja
Tere ghamand k karan hi aaj congress k ye halat hai ... failure hai tu Bhai .. Tujhse na ho payega
Congress ki sarker mai cylinder he gayab ho gaya tha
Human Annotators:
Emoji Label: , ,
Emotion Label: Anger, Anger, Disgust.
Sentiment Label: Negative, Negative, Negative
Proposed_MODEL:
Emoji Label: , ,
Emotion Label: Anger, Disgust, Disgust.
Sentiment Label: Negative, Negative, Negative
ChatGPT:
Emoji Label: , ,
Emotion Label: Dismissive, Anger, Confusion.
Sentiment Label: Negative, Negative, Neutral (depending on the context, it could be interpreted as negative)
In our analysis, it is evident that our model yields results akin to ChatGPT. While ChatGPT is renowned for its high performance, our model demonstrates proficiency, particularly in handling code-mixed sentences.
While our proposed CM-RFT model demonstrates strong performance across multiple tasks, there are several limitations and potential biases that need to be addressed:
Data Bias: The performance of the model heavily relies on the quality and representativeness of the training data. Biases present in the training data, such as underrepresentation of certain demographics or topics, can lead to biased predictions by the model.
Language Bias: The model's performance may vary across different languages due to differences in linguistic structures, cultural nuances, and availability of training data. It may perform better on languages that are well-represented in the training data compared to those that are not.
Context Sensitivity: The model's performance is influenced by the context in which the text is presented. It may struggle with contextually ambiguous or sarcastic text, leading to misinterpretations.
Generalization: The model's ability to generalize to unseen data or domains is limited by the diversity and representativeness of the training data. It may perform well on data similar to the training data but struggle with out-of-domain or adversarial examples.
Interpretability: The complex architecture of the proposed CM-RFT model may hinder its interpretability, making it challenging to understand how and why certain predictions are made. This lack of interpretability can limit the model's usefulness in real-world applications where transparency and accountability are important.
Addressing these limitations and biases requires careful consideration of model design, training data, evaluation metrics, and ethical considerations. Future research should focus on developing more robust and fair AI models that are capable of handling diverse languages, cultures, and contexts while ensuring transparency, interpretability, and accountability. Additionally, efforts should be made to collect more diverse and representative training data and to develop evaluation metrics that account for biases and fairness concerns. By addressing these challenges, we can build AI models that are more reliable, equitable, and trustworthy for real-world applications.
Continued here:
Predicting multi-label emojis, emotions, and sentiments in code-mixed texts using an emojifying sentiments framework ... - Nature.com
Machine Learning Stocks to Buy That Are Millionaire-Makers: May – InvestorPlace
The next phase of technology has been established: machine learning and AI will revolutionize the world for the better. Although it might seem like these stocks are trading in a bubble, investors need to keep a discerning and keen long-term vision for these disruptive, emerging technologies. Some way or another, AI will grow to become a secular movement that nearly every industry, if not every company in the world, will incorporate to increase productivity and efficiency.
Of course, anxiousness about the AI bubble is not unwarranted. Preparing a well-diversified portfolio of the right stocks is crucial to avoid such major drawdowns. Just because a company mentions AI doesn't mean it instantly becomes a good investment. We've already seen this with pullbacks in industries like EVs and fintech. So, if you want to gain machine learning exposure in your portfolio, consider these three machine learning stocks to buy and thank us in the coming five or ten years.
Palantir (NYSE:PLTR) went from a meme stock to a legitimate business, earning hundreds of millions each year in profits. The stock is trading right at the average analyst price target of $21.45 and has a street-high price target of $35.00. This high-end target represents a more than 60% upside from the current price.
This stock has been polarizing on Wall Street since its direct listing debut in September 2020. While the first few years were a roller coaster ride for investors, the stock is earning legitimate backing through its machine-learning integrated production deployment infrastructure. Additionally, the hype doesn't get any more legit than Stanley Druckenmiller, who disclosed that he bought nearly 770,000 shares in the recent quarter! For those who don't know him, Druckenmiller has long supported the ML revolution, with NVIDIA (NASDAQ:NVDA) being his most recent win during its massive rally over the past year.
The problem with Palantir has always been its valuation. Currently, shares trade at 21x sales and 65x forward earnings. Nonetheless, growth prospects are looking strong now, with revenue growing at a five-year compound annual growth rate (CAGR) of 12% and a three-year CAGR of 21%. As multiples begin to compress, investors should consider Palantir to be a legitimate money-making contender in the ML space.
Baidu (NASDAQ:BIDU) is a Chinese technology company that recently amassed over 200 million users on its new Ernie AI chatbot. This year, the stock is down by about 4.0% as Chinese stocks have lagged the broader rally in US equities. Nonetheless, Wall Street has maintained an average analyst price target of $153.36, about 40% higher than the current price.
Baidu recently made headlines after reporting it was interested in partnering with Tesla (NASDAQ:TSLA) to use its robotaxis in China. As China looks to get its hands on some for immediate rollout, investors should keep their eyes peeled for the unveiling of the CyberCabs in America this August. Not only will this potentially be one of the strongest new channels for revenue growth for both these companies, but Baidu's race to get first-mover advantage could solidify it as a leader in the Chinese automobile space.
As with many Chinese ADR stocks, the multiples for BIDU are low. For example, its P/E ratio of 9.79x is sitting 25% lower than its sector's median! On top of such a discounted valuation, Baidu has maintained a strong 10-year revenue CAGR of 14%. Baidu looks like a bargain for investors who can tolerate the risk that comes with Chinese stocks.
Micron Technology (NASDAQ:MU) is an American chip maker seeing a major surge in demand due to AI and machine learning technology. Analysts are bullish on MU, with 28 of 31 recommendations coming in May as a Buy or Strong Buy rating. The average analyst price target is $145.52, nearly 15% higher than the current price.
This chip maker has already hit new all-time highs this month and is seeing revitalized product demand. This growth potential has largely been attributed to Micron being one of three companies in the world that make DRAM memory chips. These chips allow for storing massive amounts of data, which will help accelerate the training of AI and machine learning technologies. These DRAM chips account for 71% of Micron's revenue as of Q2 2024, which bodes well for the stock's upward momentum.
Usually, when a stock trades at all-time highs, its valuations also stretch. That's not exactly true for Micron, as shares are trading at just 7.5x sales and 17x forward earnings. As revenue growth accelerates, Micron sticks out as one of the more under-the-radar ways to gain exposure to AI and potentially join the million-dollar club.
On the date of publication, Ian Hartana and Vayun Chugh did not hold (either directly or indirectly) any positions in the securities mentioned in this article. The opinions expressed in this article are those of the writer, subject to the InvestorPlace.com Publishing Guidelines.
Chandler Capital is the work of Ian Hartana and Vayun Chugh. Ian Hartana and Vayun Chugh are both self-taught investors whose work has been featured in Seeking Alpha. Their research primarily revolves around GARP stocks with a long-term investment perspective encompassing diverse sectors such as technology, energy, and healthcare.
Read the rest here:
Machine Learning Stocks to Buy That Are Millionaire-Makers: May - InvestorPlace
Slack is training its machine learning on your chat behavior unless you opt out via email – TechRadar
Slack has been using customer data to power its machine learning functions, including search result relevance and ranking, leading to the company being criticized over confusing policy updates that led many to believe that their data was being used to train its AI models.
According to the company's policy, those wishing to opt out must do so through their organization's Slack admin, who must email the company to put a stop to data use.
Slack has confirmed in correspondence to TechRadar Pro that the information it uses to power its ML, not its AI, is de-identified and that it does not access message content.
An extract from the company's privacy principles page reads:
"To develop non-generative AI/ML models for features such as emoji and channel recommendations, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information) as defined in our Privacy Policy and in your customer agreement."
Another passage reads: "To opt out, please have your org, workspace owners or primary owner contact our Customer Experience team at feedback@slack.com"
The company does not provide a timeframe for processing such requests.
In response to uproar among the community, the company posted a separate blog post to address the concerns, adding: "We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce any customer data of any kind."
Slack confirmed that user data is not shared with third-party LLM providers for training purposes.
The company added in its correspondence to TechRadar Pro that its "intelligent features (not Slack AI) analyze metadata like user behavior data surrounding messages, content and files but they don't access message content."
See the original post:
Slack is training its machine learning on your chat behavior unless you opt out via email - TechRadar
US Pharma and Biotech Summit 2024: Artificial Intelligence and Machine Learning Through the Eyes of the FDA Part II – Pharmaceutical Executive
PE: Do you see the FDA placing any restrictions on the use of AI and machine learning as time goes on? What may prompt such actions?
Fakhouri: Like I mentioned during the keynote interview, we get asked, does FDA regulate large language models? Are you going to ban generative AI use? My response is that we typically don't regulate linear regression. We look at the data and the information that any modeling technique is producing, and we want to make sure that the information is trustworthy. So, I wouldn't say that we would be banning or prohibiting a certain AI or machine learning type of algorithm. What we're actually interested in is how robust, how accurate, and how credible the information from these models is.
PE: What do you think the future may hold for AI and machine learning in pharma R&D in both the short- and long-term?
Fakhouri: We're actually very excited about AI use. I think we're seeing that it's increasing efficiencies in different parts of the drug development process. If you think about things such as discovery or protein folding, which again, is outside of what we normally look at, it could potentially cut the development time by years. This is all very exciting, because it could translate into faster, safe and effective drugs coming into the market. It can also fill in certain gaps for rare diseases, for example, where we can see a lot of potential use for AI to accelerate the development of drugs. In this type of situation, that's what I would say would be the long term. With the short term, I think what we're all doing, whether it's industry, the regulators, or academia, is going through this adoption curve. You need to train your staff, you need to bring in the right expertise, and you need to develop the right tools to solve the right problems. That's going to take some time, and that's why I think the short-term uses of AI are going to be mostly low-hanging fruit where you're increasing operational efficiency, but then that will translate into the development of safe and effective drugs faster.
Machine Learning Researcher Links OpenAI to Drug-Fueled Sex Parties – Futurism
A machine learning researcher is claiming to have knowledge of kinky drug-fueled orgies in Silicon Valley's storied hacker houses and appears to be linking those parties, and the culture surrounding them, to OpenAI.
"The thing about being active in the hacker house scene is you are accidentally signing up for a career as a shadow politician in the Silicon Valley startup scene," begins the lengthy X-formerly-Twitter post by Sonia Joseph, a former Princeton ML researcher who's now affiliated with the deep learning institute Mila Quebec.
What follows is a vague and anecdotal diatribe about the "dark side" of startup culture made particularly explosive by Joseph's reference to so-called "consensual non-consent" sex parties that she says took place within the artificial general intelligence (AGI) enthusiast community in the valley.
The jumping off point, as far as we can tell, stems from a thread announcing that OpenAI superalignment chief Jan Leike was leaving the company as it dissolved his team that was meant to prevent advanced AI from going rogue.
At the end of his X thread, Leike encouraged remaining employees to "feel the AGI," a phrase that was also ascribed to newly-exited OpenAI cofounder Ilya Sutskever during seemingly cultish rituals revealed in an Atlantic exposé last year, but nothing in that piece, nor the superalignment chief's tweets, suggests anything having to do with sex, drugs, or kink.
Still, Joseph addressed her second viral memo-length tweet "to the journalists contacting me about the AGI consensual non-consensual (cnc) sex parties." And in the post, she said she'd witnessed "some troubling things" in Silicon Valley's "community house scene" when she was in her early 20s and new to the tech industry.
"It is not my place to speak as to why Jan Leike and the superalignment team resigned. I have no idea why and cannot make any claims," wrote the researcher, who is not affiliated with OpenAI. "However, I do believe my cultural observations of the SF AI scene are more broadly relevant to the AI industry."
"I don't think events like the consensual non-consensual (cnc) sex parties and heavy LSD use of some elite AI researchers have been good for women," Joseph continued. "They create a climate that can be very bad for female AI researchers... I believe they are somewhat emblematic of broader problems: a coercive climate that normalizes recklessness and crossing boundaries, which we are seeing playing out more broadly in the industry today. Move fast and break things, applied to people."
While she said she doesn't think there's anything generally wrong with "sex parties and heavy LSD use," she also charged that the culture surrounding these alleged parties "leads to some of the most coercive and fucked up social dynamics that I have ever seen."
"I have seen people repeatedly get shut down for pointing out these problems," Joseph wrote. "Once, when trying to point out these problems, I had three OpenAI and Anthropic researchers debate whether I was mentally ill on a Google document. I have no history of mental illness; and this incident stuck with me as an example of blindspots/groupthink."
"It's likely these problems are not really on OpenAI but symptomatic of a much deeper rot in the Valley," she added. "I wish I could say more, but probably shouldn't."
Overall, it's hard to make heads or tails of these claims. We've reached out to Joseph and OpenAI for more info.
"I'm not under an NDA. I never worked for OpenAI," Joseph wrote. "I just observed the surrounding AI culture through the community house scene in SF, as a fly-on-the-wall, hearing insider information and backroom deals, befriending dozens of women and allies and well-meaning parties, and watching many them get burned."
See the rest here:
Machine Learning Researcher Links OpenAI to Drug-Fueled Sex Parties - Futurism
Create a multimodal assistant with advanced RAG and Amazon Bedrock | Amazon Web Services – AWS Blog
Retrieval Augmented Generation (RAG) models have emerged as a promising approach to enhance the capabilities of language models by incorporating external knowledge from large text corpora. However, despite their impressive performance in various natural language processing tasks, RAG models still face several limitations that need to be addressed.
Naive RAG models face limitations such as missing content, reasoning mismatch, and challenges in handling multimodal data. Although they can retrieve relevant information, they may struggle to generate complete and coherent responses when required information is absent, leading to incomplete or inaccurate outputs. Additionally, even with relevant information retrieved, the models may have difficulty correctly interpreting and reasoning over the content, resulting in inconsistencies or logical errors. Furthermore, effectively understanding and reasoning over multimodal data remains a significant challenge for these primarily text-based models.
In this post, we present a new approach named multimodal RAG (mmRAG) to tackle those existing limitations in greater detail. The solution intends to address these limitations for practical generative artificial intelligence (AI) assistant use cases. Additionally, we examine potential solutions to enhance the capabilities of large language models (LLMs) and visual language models (VLMs) with advanced LangChain capabilities, enabling them to generate more comprehensive, coherent, and accurate outputs while effectively handling multimodal data. The solution uses Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, providing a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
The mmRAG solution is based on a straightforward concept: extract each data type separately, use a VLM to generate a text summary for each type, embed the text summaries along with the raw data into a vector database, and store the raw unstructured data in a document store. A query then prompts the LLM to retrieve relevant vectors from both the vector database and the document store and to generate meaningful and accurate answers.
The following diagram illustrates the solution architecture.
The architecture diagram depicts the mmRAG architecture that integrates advanced reasoning and retrieval mechanisms. It combines text, table, and image (including chart) data into a unified vector representation, enabling cross-modal understanding and retrieval. The process begins with diverse data extractions from various sources such as URLs and PDF files by parsing and preprocessing text, table, and image data types separately, while table data is converted into raw text and image data into captions.
These parsed data streams are then fed into a multimodal embedding model, which encodes the various data types into uniform, high dimensional vectors. The resulting vectors, representing the semantic content regardless of original format, are indexed in a vector database for efficient approximate similarity searches. When a query is received, the reasoning and retrieval component performs similarity searches across this vector space to retrieve the most relevant information from the vast integrated knowledge base.
The retrieved multimodal representations are then used by the generation component to produce outputs such as text, images, or other modalities. The VLM component generates vector representations specifically for textual data, further enhancing the system's language understanding capabilities. Overall, this architecture facilitates advanced cross-modal reasoning, retrieval, and generation by unifying different data modalities into a common semantic space.
Developers can access the mmRAG source code on the GitHub repo.
You start by configuring Amazon Bedrock to integrate with various components from the LangChain Community library. This allows you to work with the core FMs. You use the BedrockEmbeddings class to create two different embedding models: one for text (embedding_bedrock_text) and one for images (embeddings_bedrock_image). These embeddings represent textual and visual data in a numerical format, which is essential for various natural language processing (NLP) tasks.
Additionally, you use the LangChain Bedrock and BedrockChat classes to create a VLM model instance (llm_bedrock_claude3_haiku) from Anthropic Claude 3 Haiku and a chat instance based on a different model, Sonnet (chat_bedrock_claude3_sonnet). These instances are used for advanced query reasoning, argumentation, and retrieval tasks. See the following code snippet:
In this section, we explore how to harness the power of Python to parse text, tables, and images from URLs and PDFs efficiently, using two powerful packages: Beautiful Soup and PyMuPDF. Beautiful Soup, a library designed for web scraping, makes it straightforward to sift through HTML and XML content, allowing you to extract the desired data from web pages. PyMuPDF offers an extensive set of functionalities for interacting with PDF files, enabling you to extract not just text but also tables and images with ease. See the following code:
The following code snippets demonstrate how to generate image captions using Anthropic Claude 3 by invoking the bedrock_get_img_description utility function. Additionally, they showcase how to embed image pixels along with image captions using the Amazon Titan image embedding model amazon.titan-embed-image-v1 by calling the get_text_embedding function.
You can harness the capabilities of the newly released Anthropic Claude 3 Sonnet and Haiku on Amazon Bedrock, combined with the Amazon Titan image embedding model and LangChain. This powerful combination allows you to generate comprehensive text captions for tables and images, seamlessly integrating them into your content. Additionally, you can store vectors, objects, raw image file names, and source documents in an Amazon OpenSearch Serverless vector store and object store. Use the following code snippets to create image captions by invoking the utility function bedrock_get_img_description. Embed image pixels along with image captions using the Amazon Titan image embedding model amazon.titan-embed-image-v1 by calling the get_text_embedding function.
You can consult the provided code examples for more information on how to embed multimodal data and insert vector documents into the OpenSearch Serverless vector store. For more information about data access, refer to Data access control for Amazon OpenSearch Serverless.
Fusion in RAG presents an innovative search strategy designed to transcend the limitations of conventional search techniques, aligning more closely with the complex nature of human inquiries. This initiative elevates the search experience by integrating multi-faceted query generation and using Reciprocal Rank Fusion for an enhanced re-ranking of search outcomes. This approach offers a more nuanced and effective way to navigate the vast expanse of available information, catering to the intricate and varied demands of users' searches.
The following diagram illustrates this workflow.
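As a rough illustration of the Reciprocal Rank Fusion step, the sketch below merges several ranked result lists (one per generated query) by summing 1 / (k + rank) for each document. The document IDs and the function name are hypothetical, and k = 60 is a commonly used default rather than a value taken from this post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result of a list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for three rewritten versions of one user query.
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Documents that rank well across many query variants rise to the top, even if no single query placed them first.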
We use the Anthropic Claude 3 Sonnet and Haiku models, which possess the capability to process visual and language data, which enables them to handle the query decomposition (Haiku) and answer fusion (Sonnet) stages effectively. The following code snippet demonstrates how to create a retriever using OpenSearch Serverless:
The combination of decomposition and fusion intends to address the limitations of the chain-of-thought (CoT) method in language models. It involves breaking down complex problems into simpler, sequential sub-problems, where each sub-problem builds upon the solution of the previous one. This technique significantly enhances the problem-solving abilities of language models in areas such as symbolic manipulation, compositional generalization, and mathematical reasoning.
The RAG-decomposition approach, which uses the decomposition step (see the following code), underscores the potential of a technique called least-to-most prompting. This technique not only improves upon existing methods but also paves the way for more advanced, interactive learning frameworks for language models. The ultimate goal is to move towards a future where language models can learn from bidirectional conversations, enabling more effective reasoning and problem-solving capabilities.
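The control flow of least-to-most prompting can be sketched without any model dependency: decompose the question, answer each sub-question in order, and pass earlier answers along as context. In the sketch below, `decompose` and `answer` are hypothetical callables standing in for LLM calls (for example, Haiku for decomposition and Sonnet for answering); the stub implementations exist only to make the example runnable and are not the post's actual code.

```python
def least_to_most(question, decompose, answer):
    """Answer sub-questions sequentially, feeding earlier answers forward."""
    solved = []  # list of (sub_question, sub_answer) pairs, in order
    for sub_question in decompose(question):
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
        solved.append((sub_question, answer(sub_question, context)))
    # The original question is answered last, with all sub-answers as context.
    final_context = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
    return answer(question, final_context), solved


# Stub callables standing in for LLM calls (assumptions for illustration).
def fake_decompose(question):
    return ["What is the input token price?", "What is the output token price?"]


def fake_answer(question, context):
    return f"[{context.count('Q:')} prior answers] {question}"


final, steps = least_to_most("What is the total cost?", fake_decompose, fake_answer)
print(final)
```

Swapping the stubs for real model calls keeps the loop unchanged, which is the main appeal of separating decomposition from answering.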
The RAG process is further enhanced by integrating a reciprocal re-ranker, which uses sophisticated NLP techniques. This makes sure the retrieved results are relevant and also semantically aligned with the user's intended query. This multimodal retrieval approach seamlessly operates across vector databases and object stores, marking a significant advancement in the quest for more efficient, accurate, and contextually aware search mechanisms.
The mmRAG architecture enables the system to understand and process multimodal queries, retrieve relevant information from various sources, and generate multimodal answers by combining textual, tabular, and visual information in a unified manner. The following diagram highlights the data flows from queries to answers by using an advanced RAG and a multimodal retrieval engine powered by a multimodal embedding model (amazon.titan-embed-image-v1), an object store (Amazon S3), and a vector database (OpenSearch Serverless). For tables, the system retrieves relevant table locations and metadata, and computes the cosine similarity between the multimodal embedding and the vectors representing the table and its summary. Similarly, for images, the system retrieves relevant image locations and metadata, and computes the cosine similarity between the multimodal embedding and the vectors representing the image and its caption.
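The cosine-similarity comparison described above can be sketched in plain Python. The three-dimensional vectors and the index keys below are toy values chosen for illustration; real embeddings from a multimodal model have far more dimensions, and a vector database would typically use approximate search rather than this exhaustive scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the k index keys whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Toy index: each key points to a hypothetical stored embedding.
index = {
    "table_summary": [0.9, 0.1, 0.0],
    "image_caption": [0.1, 0.9, 0.1],
    "text_chunk": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]
best = top_k(query_vector, index, k=2)
print(best)
```

In the architecture described here, each returned key would carry the location and metadata of the matching table or image so that the raw object can be fetched from the object store.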
The following screenshot illustrates the improved accuracy and comprehensive understanding of the user's query with multimodality capability. The mmRAG approach is capable of grasping the intent behind the query, extracting relevant information from the provided chart, and estimating the overall costs, including the estimated output token size. Furthermore, it can perform mathematical calculations to determine the cost difference. The output includes the source chart and a link to its original location.
Amazon Bedrock offers a comprehensive set of generative AI models for enhancing content comprehension across various modalities. By using the latest advancements in VLMs, such as Anthropic Claude 3 Sonnet and Haiku, as well as the Amazon Titan image embedding model, Amazon Bedrock enables you to expand your document understanding beyond text to include tables, charts, and images. The integration of OpenSearch Serverless provides enterprise-grade vector storage and approximate k-NN search capabilities, enabling efficient retrieval of relevant information. With advanced LangChain decomposition and fusion techniques, you can use multi-step querying across different LLMs to improve accuracy and gain deeper insights. This powerful combination of cutting-edge technologies allows you to unlock the full potential of multimodal content comprehension, enabling you to make informed decisions and drive innovation across various data sources.
The reliance on visual language models and image embedding models for comprehensive and accurate image captions has its limitations. Although these models excel at understanding visual and textual data, the multi-step query decomposition, reciprocal ranking, and fusion processes involved can lead to increased inference latency. This makes such solutions less suitable for real-time applications or scenarios that demand instantaneous responses. However, these solutions can be highly beneficial in use cases where higher accuracy and less time-sensitive responses are required, allowing for more detailed and accurate analysis of complex visual and textual data.
In this post, we discussed how you can use multimodal RAG to address limitations in multimodal generative AI assistants. We invite you to explore mmRAG and take advantage of the advanced features of Amazon Bedrock. These powerful tools can assist your business in gaining deeper insights, making well-informed decisions, and fostering innovation driven by more accurate data. Ongoing research efforts are focused on developing an agentic and graph-based pipeline to streamline the processes of parsing, ingestion, and retrieval. These approaches hold the promise of enhancing the reliability and reusability of the mmRAG system.
The authors would like to express their sincere gratitude to Nausheen Sayed, Karen Twelves, Li Zhang, Sophia Shramko, Mani Khanuja, Santhosh Kuriakose, and Theresa Perkins for their comprehensive reviews.
Alfred Shen is a Senior AI/ML Specialist at AWS. He has been working in Silicon Valley, holding technical and managerial positions in diverse sectors including healthcare, finance, and high-tech. He is a dedicated applied AI/ML researcher, concentrating on CV, NLP, and multimodality. His work has been showcased in publications such as EMNLP, ICLR, and Public Health.
Changsha Ma is a generative AI Specialist at AWS. She is a technologist with a PhD in Computer Science, a master's degree in Education Psychology, and years of experience in data science and independent consulting in AI/ML. She is passionate about researching methodological approaches for machine and human intelligence. Outside of work, she loves hiking, cooking, hunting food, mentoring college students for entrepreneurship, and spending time with friends and family.
Julianna Delua is a Principal Specialist for AI/ML and generative AI. She serves the financial services industry customers including those in Capital Markets, Fintech and Payments. Julianna enjoys helping businesses turn new ideas into solutions and transform the organizations with AI-powered solutions.
Continue reading here:
Create a multimodal assistant with advanced RAG and Amazon Bedrock | Amazon Web Services - AWS Blog
EU AI Act Clears Final Hurdle to Become Global Landmark – InformationWeek
The European Union (EU) on Tuesday passed the AI Act, a landmark legislative effort that marks the first comprehensive regulations to create guardrails for artificial intelligence.
EU members' final approval means the act will enter into force next month. The law, first drafted in 2021, was put on a fast track in recent months as global leaders race to adopt safeguards to keep pace with the explosive growth in generative AI (GenAI) adoption.
"This landmark law, the first of its kind in the world, addresses a global technological challenge that also creates opportunities for our societies and economies," Belgian Digitalization Minister Mathieu Michel said in a statement. "With the AI Act, Europe emphasizes the importance of trust, transparency and accountability when dealing with new technologies while at the same time ensuring this fast-changing technology can flourish and boost European innovation."
But US companies will certainly take notice as the rules will apply to any company doing business in Europe. And the cost of running afoul of the rules could be substantial, even for multibillion-dollar US firms.
Rules for general purpose AI models will impact companies after 12 months, while rules for AI systems embedded into products will take effect in 36 months. Bans on AI in predictive policing and on untargeted scraping of facial images from video will come into play in six months. Fines will range from $8.2 million or 1.5% of global turnover to $37.9 million or 7% of turnover, depending on the violation.
"The EU AI Act clearing its final hurdle today marks a significant milestone in the regulatory landscape of AI globally," Manoj Saxena, InformationWeek Insight Circle member and founder of the Responsible AI Institute, tells us via email. "Although it may not directly affect US-based AI developers like OpenAI, Microsoft, Google, and Meta until 2025, its implications are profound."
US companies are already bracing for change, Saxena tells InformationWeek. "We are already seeing an uptick in consultations as our member companies prepare for a future where compliance will not only be mandatory but will also serve as a competitive differentiator in the global marketplace."
Companies, he says, should not take the act lightly. "This act is setting a precedent that will likely influence AI regulation and development not just in the world, but across the US."
US legislators on both sides of the aisle have signaled concern about the EU's growing influence on US tech interests. A Biden administration executive order sought to establish some US-based rules, but an administration change could see that order easily canceled.
"We're glad to see that the EU is taking on the regulation of frontier AI models," Daniel Colson, executive director of the AI Policy Institute, tells InformationWeek in an email. "But the American people are clear that they don't want Europe to take the lead on AI regulation, and want us to craft our own policies."
He noted that a poll conducted by the AI Policy Institute showed that the majority of Americans, regardless of partisan leanings, want to see the US pave its own way for AI regulation.
"There's a lot of work to do to improve on the European model of this tiering system as regulation is passed in the US," he says. "But fundamentally, its approach is sound and on the right track. US regulation has the opportunity to focus even more on reducing the dangers of these most powerful models while broadly supporting responsible innovation."
View post:
EU AI Act Clears Final Hurdle to Become Global Landmark - InformationWeek
Using blood routine indicators to establish a machine learning model for predicting liver fibrosis in patients with … – Nature.com
Study population
The study population consisted of patients diagnosed with Schistosoma japonicum in Yueyang, Hunan Province, China. This city has historically been a high schistosomiasis epidemic area because it is located near Dongting Lake in the middle and lower reaches of the Yangtze River, where the intermediate host Oncomelania hupensis breeds in large numbers.
Schistosoma japonicum infection was diagnosed according to the definition of Zhou et al. [26], which includes the following diagnostic criteria: life history in schistosomiasis-endemic areas, contact with infected water, specific Schistosoma serology testing, color ultrasound, and microscopic examination of excreta (feces, urine). Schistosomiasis infection was considered present when schistosome ova were visualized in stool or urine, or when the Schistosoma serology was positive.
Liver fibrosis was determined by ultrasound according to the World Health Organization diagnostic criteria for Schistosoma japonicum infection [27,28]. An experienced ultrasound expert divided the patients into two groups according to the ultrasound results: a fibrosis group (with mesh-like changes and uneven hepatic echotexture) and a no-fibrosis group (without mesh-like changes, with smooth and uniform hepatic echotexture). The diagnosis was double-checked by another experienced schistosomiasis specialist.
A retrospective medical record review was conducted from June 2019 to June 2022 at Xiangyue Hospital, Yueyang City, Hunan Province, China. All patients underwent blood tests and ultrasound evaluation at admission. All variables were extracted from the hospital's electronic medical record system. The data included patient demographic characteristics, blood routine indicators and other variables. Missing data were filled in with the KNN imputation method, which uses a distance measure to identify the k samples in the data set that are closest to the sample with a missing value and then uses those k samples to estimate the missing value. The percentage of missing data points is presented in Supplementary Table 5. The LassoCV method was used to screen out key variables. Data entry was performed by a full-time research physician or medical student. This study was approved by the Ethics Committee of the Third Xiangya Hospital of Central South University (No: 21149) and was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments. All methods were performed in accordance with the relevant guidelines and regulations. The need for informed consent was waived by the Ethics Committee of the Third Xiangya Hospital of Central South University due to the retrospective nature of the study. The privacy of all participants is fully protected.
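The KNN filling step described above can be illustrated with a minimal pure-Python sketch. The paper does not state which implementation or which value of k was used, so the function name `knn_impute`, the choice k=2, the distance rescaling for partially observed rows, and the sample values are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows (hypothetical helper).

    Distance is Euclidean over the features both rows have observed,
    rescaled by the share of features that could be compared.
    """
    n_features = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row with feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [f for f in range(n_features)
                          if row[f] is not None and other[f] is not None]
                if not shared:
                    continue
                sq = sum((row[f] - other[f]) ** 2 for f in shared)
                dist = math.sqrt(sq * n_features / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Made-up two-feature records; the last one is missing its second value.
sample = [
    [1.0, 10.0],
    [1.1, 12.0],
    [5.0, 50.0],
    [1.05, None],
]
completed = knn_impute(sample, k=2)
```

In this toy data the two closest records to the incomplete one are the first two, so the missing value is estimated from them and the distant third record is ignored.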
Patients were divided into hepatic fibrosis and non-hepatic fibrosis groups according to their color Doppler ultrasound results. Patients with hepatitis B virus (hepatitis B surface antigen seropositive), hepatitis C virus (HCV antibody seropositive), human immunodeficiency virus (HIV antibody seropositive), alcoholic and non-alcoholic fatty liver disease (ultrasound scanning and alcohol consumption above 30 g daily), decompensated liver disease or liver cancer (ultrasound and liver function tests), and organ transplantation (self-reported) were excluded. Key variables were selected with the LassoCV method for subsequent modeling.
First, the classification task was completed using six machine learning algorithms: XGB Classifier, Logistic Regression, LightGBM Classifier, Random Forest Classifier, Support Vector Classification, and K Neighbors Classifier. Fivefold cross-validation was used for validation. Each model was evaluated using AUC, clinical decision curve plot, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The ROC diagram and the forest diagram show the ROC results of each model for the prediction of hepatic fibrosis.
After the best algorithm was selected through multi-algorithm model comparison, it was used to build the model again. Unlike in the multi-model comparison, when modeling with the best-performing algorithm, 15% of the total samples were randomly selected as the test set, and the remaining samples were used as the training set for fivefold cross-validation.
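Most of the threshold-based measures listed above follow directly from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction group).
    """
    sensitivity = tp / (tp + fn)          # recall for the fibrosis class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

AUC and the clinical decision curve are not covered here because they depend on the predicted probabilities across all thresholds rather than on a single confusion matrix.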
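The split described above (a 15% held-out test set, then fivefold cross-validation on the remainder) can be sketched on sample indices alone. The paper does not say whether the split was stratified or which random seed was used; this sketch is unstratified, and the function name, seed, and sample count are assumptions.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.15, n_folds=5, seed=0):
    """Return (test indices, list of (training, validation) index pairs)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Hold out the test set first; it is never used during cross-validation.
    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    # Deal the remaining indices into n_folds roughly equal folds.
    folds = [train_idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return test_idx, splits


test_idx, splits = holdout_then_kfold(100)
```

Keeping the test indices out of every fold is the point of the design: the cross-validated scores guide model tuning, while the held-out 15% gives a separate estimate of performance.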
The SHAP package in Python can interpret the output of machine learning models, considering all features as contributors. For each sample, the model generates a prediction value, and the biggest advantage of SHAP is that it can reflect the influence of each feature in each sample and show whether that influence is positive or negative. This study used the SHAP package to interpret the model. SHAP value plots were used to show the contribution of each variable in the model. Model variable importance plots were used to show the importance ranking of each variable. Force diagrams were used to illustrate, with two examples, how each variable affects the predicted outcome for a given sample.
Python version 3.7 was used in this study. The statsmodels 0.11.1 package in Python was used to test whether each variable differed between the two groups. The analysis method was selected according to the distribution of samples, homogeneity of variance, and sample size. The chi-square test was used for categorical variables. Student's t-test or the Mann-Whitney U-test was used for quantitative variables.
In this study, LassoCV was used to screen key variables, and factors with a coefficient of 0 were automatically eliminated (sklearn 0.22.1 package in Python). Lasso obtains a more refined model by constructing a penalty function that compresses some regression coefficients, that is, it forces the sum of the absolute values of the coefficients to be less than a certain fixed value and, at the same time, sets some regression coefficients to zero. Therefore, the advantage of subset shrinkage is preserved, and it is a biased estimate for dealing with data with multicollinearity. In the multi-model and best-model modeling process, the xgboost 1.2.1 package of Python was used for XGBoost algorithm modeling, the lightgbm 3.2.1 package of Python was used for LightGBM algorithm modeling, and the sklearn 0.22.1 package of Python was used to build the other models. The shap 0.39.0 package in Python was used to demonstrate the interpretability of the model.
Ethics approval was obtained from the Ethics Committee of the Third Xiangya Hospital of Central South University.
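The per-feature contributions that SHAP reports are based on Shapley values. The sketch below computes them by brute force for a made-up three-feature model; it is not the SHAP package's API or its faster model-specific algorithms, and the toy model, sample, and baseline are assumptions chosen so the result can be checked by hand.

```python
from itertools import permutations


def shapley_values(predict, sample, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet 'switched on' keep their baseline value. The cost grows
    factorially with the number of features, so this is only for tiny examples.
    """
    n = len(sample)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = sample[feature]
            value = predict(current)
            totals[feature] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    # Hypothetical model with one interaction term between features 0 and 2.
    return 2 * x[0] - x[1] + x[0] * x[2]


contributions = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

The contributions carry a sign, which is what lets a force diagram show features pushing the prediction up or down, and they sum to the difference between the prediction for the sample and the prediction for the baseline.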
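How the Lasso penalty drives some coefficients to exactly zero can be seen in a small coordinate-descent sketch. This is not the scikit-learn implementation: it omits the intercept and the cross-validated choice of the penalty strength that LassoCV performs, and the function names, the fixed alpha, and the data are assumptions for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=50):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept.

    Assumes no column of X is all zeros.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Made-up data: y depends strongly on feature 0, weakly on 1, not at all on 2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.1, 2.9, -2.9, -3.1]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

With this penalty the strong coefficient is shrunk but kept, while the weak and irrelevant ones are set to exactly zero, which is the behavior the variable screening relies on.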
Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker | Amazon Web Services – AWS Blog
In the rapidly evolving landscape of artificial intelligence (AI), the rise of generative AI models has ushered in a new era of personalized and intelligent experiences. Organizations are increasingly using the power of these language models to drive innovation and enhance their services, from natural language processing to content generation and beyond.
Using generative AI models in the enterprise environment, however, requires taming their intrinsic power and enhancing their skills to address specific customer needs. In cases where an out-of-the-box model is missing knowledge of domain- or organization-specific terminologies, a custom fine-tuned model, also called a domain-specific large language model (LLM), might be an option for performing standard tasks in that domain or micro-domain. BloombergGPT is an example of an LLM that was trained from scratch to have a better understanding of highly specialized vocabulary found in the financial domain. In the same sense, domain specificity can be addressed through fine-tuning at a smaller scale. Customers are fine-tuning generative AI models based on domains including finance, sales, marketing, travel, IT, HR, procurement, healthcare and life sciences, customer service, and many more. Additionally, independent software vendors (ISVs) are building secure, managed, multi-tenant, end-to-end generative AI platforms with models that are customized and personalized based on their customers' datasets and domains. For example, Forethought introduced SupportGPT, a generative AI platform for customer support.
As the demands for personalized and specialized AI solutions grow, businesses often find themselves grappling with the challenge of efficiently managing and serving a multitude of fine-tuned models across diverse use cases and customer segments. With the need to serve a wide range of AI-powered use cases, from resume parsing and job skill matching to domain-specific email generation and natural language understanding, these businesses are often left with the daunting task of managing hundreds of fine-tuned models, each tailored to specific customer needs or use cases. The complexities of this challenge are compounded by the inherent scalability and cost-effectiveness concerns that come with deploying and maintaining such a diverse model ecosystem. Traditional approaches to model serving can quickly become unwieldy and resource intensive, leading to increased infrastructure costs, operational overhead, and potential performance bottlenecks.
Fine-tuning enormous language models is prohibitively expensive in terms of the hardware required and the storage and switching cost for hosting independent instances for different tasks. LoRA (Low-Rank Adaptation) is an efficient adaptation strategy that neither introduces inference latency nor reduces input sequence length while retaining high model quality. Importantly, it allows for quick task switching when deployed as a service by sharing the vast majority of the model parameters.
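To make the parameter sharing concrete, the following pure-Python sketch shows a LoRA-style forward pass, in which a frozen weight matrix W is combined with a low-rank update B·A, together with the parameter counts involved. The matrix sizes, the rank, the `scale` argument, and all values are illustrative assumptions and do not come from any particular model.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is shared, only A and B are per-adapter."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def lora_parameter_counts(d_out, d_in, rank):
    """Parameters in the full matrix versus in one rank-r adapter for it."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter


# Tiny hand-checkable example: 2x2 base weights with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # rank x d_in
B = [[1.0], [0.0]]    # d_out x rank
output = lora_forward(W, A, B, [2.0, 3.0])

# Illustrative sizes: one 4096 x 4096 projection with a rank-8 adapter.
full_params, adapter_params = lora_parameter_counts(4096, 4096, 8)
```

For these illustrative sizes, one adapter holds 65,536 parameters against 16,777,216 in the base matrix, under half a percent, which is why many adapters can share one copy of the base model and be switched per request.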
In this post, we explore a solution that addresses these challenges head-on using LoRA serving with Amazon SageMaker. By using the new performance optimizations of LoRA techniques in SageMaker large model inference (LMI) containers along with inference components, we demonstrate how organizations can efficiently manage and serve their growing portfolio of fine-tuned models, while optimizing costs and providing seamless performance for their customers.
The latest SageMaker LMI container offers unmerged-LoRA inference, sped up with our LMI-Dist inference engine and OpenAI style chat schema. To learn more about LMI, refer to LMI Starting Guide, LMI handlers Inference API Schema, and Chat Completions API Schema.
There are two kinds of LoRA that can be put onto various engines:
The new LMI container offers out-of-the-box integration and abstraction with SageMaker for hosting multiple unmerged LoRA adapters with higher performance (low latency and high throughput). The LMI container offers two backends for serving LoRA adapters: the LMI-Dist backend (recommended) and the vLLM backend. Both backends are based on the open source vLLM library, which in turn uses S-LoRA and Punica, but the LMI-Dist backend provides an additional optimized continuous (rolling) batching implementation, so we recommend you start with it. You are not required to configure these libraries separately; the LMI container provides the higher-level abstraction through the vLLM and LMI-Dist backends.
S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes unified paging. Unified paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead.
Punica is designed to efficiently serve multiple LoRA models on a shared GPU cluster. It achieves this by following three design guidelines:
Punica uses a new CUDA kernel design called Segmented Gather Matrix-Vector Multiplication (SGMV) to batch GPU operations for concurrent runs of multiple LoRA models, significantly improving GPU efficiency in terms of memory and computation. Punica also implements a scheduler that routes requests to active GPUs and migrates requests for consolidation, optimizing GPU resource allocation. Overall, Punica achieves high throughput and low latency in serving multi-tenant LoRA models on a shared GPU cluster. For more information, read the Punica whitepaper.
The following figure shows the multi LoRA adapter serving stack of the LMI container on SageMaker.
As shown in the preceding figure, the LMI container provides the higher-level abstraction through the vLLM and LMI-Dist backends to serve LoRA adapters at scale on SageMaker. As a result, you're not required to configure the underlying libraries (S-LoRA, Punica, or vLLM) separately. However, there might be cases where you want to control some of the performance driving parameters depending on your use case and application performance requirements. The following are the common configuration options the LMI container provides to tune LoRA serving. For more details on configuration options specific to each backend, refer to vLLM Engine User Guide and LMI-Dist Engine User Guide.
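As a rough sketch of what such a configuration can look like, the helper below assembles container environment variables for LoRA serving. The option names reflect our understanding of the LMI engine user guides and should be verified against them for the container version you use; the helper name `lmi_lora_env` and every value shown are assumptions for illustration, not recommended settings.

```python
def lmi_lora_env(model_id, max_loras=4, max_lora_rank=64, max_cpu_loras=8,
                 tensor_parallel_degree=1):
    """Build an environment dict for an LMI container with LoRA enabled.

    Hypothetical helper: option names are assumed from the LMI user guides,
    and all default values are placeholders to tune for your workload.
    """
    return {
        "HF_MODEL_ID": model_id,
        "OPTION_ROLLING_BATCH": "lmi-dist",        # recommended backend
        "OPTION_TENSOR_PARALLEL_DEGREE": str(tensor_parallel_degree),
        "OPTION_ENABLE_LORA": "true",
        "OPTION_MAX_LORAS": str(max_loras),          # adapters batched together on GPU
        "OPTION_MAX_LORA_RANK": str(max_lora_rank),  # largest adapter rank allowed
        "OPTION_MAX_CPU_LORAS": str(max_cpu_loras),  # adapters cached in CPU memory
    }


env = lmi_lora_env("TheBloke/Llama-2-7B-Chat-fp16")
```

Environment variables must be strings, which is why the numeric limits are converted before being passed to the container definition.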
Enterprises grappling with the complexities of managing generative AI models often encounter scenarios where a robust and flexible design pattern is crucial. One common use case involves a single base model with multiple LoRA adapters, each tailored to specific customer needs or use cases. This approach allows organizations to use a foundational language model while maintaining the agility to fine-tune and deploy customized versions for their diverse customer base.
An enterprise offering a resume parsing and job skill matching service may use a single high-performance base model, such as Mistral 7B. The Mistral 7B base model is particularly well-suited for job-related content generation tasks, such as creating personalized job descriptions and tailored email communications. Mistral's strong performance in natural language generation and its ability to capture industry-specific terminology and writing styles make it a valuable asset for such an enterprise's customers in the HR and recruitment space. By fine-tuning Mistral 7B with LoRA adapters, enterprises can make sure the generated content aligns with the unique branding, tone, and requirements of each customer, delivering a highly personalized experience.
On the other hand, the same enterprise may use the Llama 3 base model for more general natural language processing tasks, such as resume parsing, skills extraction, and candidate matching. Llama 3's broad knowledge base and robust language understanding capabilities enable it to handle a wide range of documents and formats, making sure their services can effectively process and analyze candidate information, regardless of the source. By fine-tuning Llama 3 with LoRA adapters, such enterprises can tailor the model's performance to specific customer requirements, such as regional dialects, industry-specific terminology, or unique data formats. By employing a multi-base model, multi-adapter design pattern, enterprises can take advantage of the unique strengths of each language model to deliver a comprehensive and highly personalized service that matches job profiles to candidate resumes. This approach allows enterprises to cater to the diverse needs of their customers, making sure each client receives tailored AI-powered solutions that enhance their recruitment and talent management processes.
Effectively implementing and managing these design patterns, where multiple base models are coupled with numerous LoRA adapters, is a key challenge that enterprises must address to unlock the full potential of their generative AI investments. A well-designed and scalable approach to model serving is crucial in delivering cost-effective, high-performance, and personalized experiences to customers.
The following sections outline the coding steps to deploy a base LLM, TheBloke/Llama-2-7B-Chat-fp16, with LoRA adapters on SageMaker. It involves preparing a compressed archive with the base model files and LoRA adapter files, uploading it to Amazon Simple Storage Service (Amazon S3), selecting and configuring the SageMaker LMI container to enable LoRA support, creating a SageMaker endpoint configuration and endpoint, defining an inference component for the model, and sending inference requests specifying different LoRA adapters like Spanish (es) and French (fr) in the request payload to use those fine-tuned language capabilities. For more information on deploying models using SageMaker inference components, see Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.
To showcase multi-base models with their LoRA adapters, we add another base model, mistralai/Mistral-7B-v0.1, and its LoRA adapter to the same SageMaker endpoint, as shown in the following diagram.
You need to complete some prerequisites before you can run the notebook:
To prepare the LoRA adapters, create an adapters.tar.gz compressed archive containing the LoRA adapters directory. The adapters directory should contain subdirectories for each of the LoRA adapters, with each adapter subdirectory containing the adapter_model.bin file (the adapter weights) and the adapter_config.json file (the adapter configuration). We typically obtain these adapter files by using the PeftModel.save_pretrained() method from the Peft library. After you assemble the adapters directory with the adapter files, you compress it into an adapters.tar.gz archive and upload it to an S3 bucket for deployment or sharing. We include the LoRA adapters in the adapters directory as follows:
Download LoRA adapters, compress them, and upload the compressed file to Amazon S3:
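The packaging step can be sketched with the Python standard library alone. The helper below checks that each adapter subdirectory holds the two files described above and then writes the archive; the function name `package_adapters` is hypothetical, and the upload is left as a comment because the bucket and key are placeholders you would supply.

```python
import tarfile
from pathlib import Path

# Expected layout (adapter names "es" and "fr" as used later in this post):
#   adapters/es/adapter_config.json
#   adapters/es/adapter_model.bin
#   adapters/fr/adapter_config.json
#   adapters/fr/adapter_model.bin
REQUIRED_FILES = ("adapter_config.json", "adapter_model.bin")


def package_adapters(adapters_dir, archive_path):
    """Validate the adapters directory and compress it into a .tar.gz archive."""
    adapters_dir = Path(adapters_dir)
    for adapter in sorted(p for p in adapters_dir.iterdir() if p.is_dir()):
        for required in REQUIRED_FILES:
            if not (adapter / required).is_file():
                raise FileNotFoundError(f"{adapter.name} is missing {required}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store everything under a top-level "adapters" directory.
        tar.add(adapters_dir, arcname="adapters")
    return archive_path


# Uploading is a single call once the archive exists, for example:
#   boto3.client("s3").upload_file("adapters.tar.gz", "<your-bucket>", "<your-key>")
```

Validating before compressing catches a missing weights or configuration file locally, rather than when the container tries to load the adapter at startup.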
SageMaker provides optimized containers for LMI that support different frameworks for model parallelism, allowing the deployment of LLMs across multiple GPUs. For this post, we employ the DeepSpeed container, which encompasses frameworks such as DeepSpeed and vLLM, among others. See the following code:
Create an endpoint configuration using the appropriate instance type. Set ContainerStartupHealthCheckTimeoutInSeconds to account for the time taken to download the LLM weights from Amazon S3 or the model hub, and the time taken to load the model on the GPUs:
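A minimal sketch of this step follows. It builds the request as a plain dictionary and passes it to a SageMaker client that you create yourself (for example with `boto3.client("sagemaker")`); the helper names, the variant name, and the timeout value are assumptions for illustration.

```python
def endpoint_config_request(config_name, role_arn, instance_type,
                            variant_name="AllTraffic", startup_timeout_s=900):
    """Build keyword arguments for create_endpoint_config (hypothetical helper).

    For inference components, the variant names no model; models are attached
    later as inference components, so the execution role is set here instead.
    """
    return {
        "EndpointConfigName": config_name,
        "ExecutionRoleArn": role_arn,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                # Allow time to download weights and load them onto the GPUs.
                "ModelDataDownloadTimeoutInSeconds": startup_timeout_s,
                "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout_s,
            }
        ],
    }


def create_endpoint_config(sm_client, **kwargs):
    """Send the request using a caller-supplied SageMaker client."""
    return sm_client.create_endpoint_config(**endpoint_config_request(**kwargs))
```

Separating the request builder from the API call makes it easy to inspect or log the configuration before anything is created in your account.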
Create a SageMaker endpoint based on the endpoint configuration defined in the previous step. You use this endpoint to host the inference component (model) and to make invocations.
Now that you have created a SageMaker endpoint, let's create our model as an inference component. The SageMaker inference component enables you to deploy one or more foundation models (FMs) on the same SageMaker endpoint and control how many accelerators and how much memory is reserved for each FM. See the following code:
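The request for an inference component can be sketched as follows. The helper names and the resource numbers in the usage note are placeholders; the SageMaker model referenced by `model_name` is assumed to exist already, and the client is one you create yourself.

```python
def inference_component_request(name, endpoint_name, model_name, accelerators,
                                min_memory_mb, variant_name="AllTraffic", copies=1):
    """Build keyword arguments for create_inference_component (hypothetical helper)."""
    return {
        "InferenceComponentName": name,
        "EndpointName": endpoint_name,
        "VariantName": variant_name,
        "Specification": {
            "ModelName": model_name,
            # Resources reserved for each copy of this model on the endpoint.
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": accelerators,
                "MinMemoryRequiredInMb": min_memory_mb,
            },
        },
        "RuntimeConfig": {"CopyCount": copies},
    }


def create_inference_component(sm_client, **kwargs):
    """Send the request using a caller-supplied SageMaker client."""
    return sm_client.create_inference_component(**inference_component_request(**kwargs))
```

A call such as `create_inference_component(sm_client, name="llama-ic", endpoint_name="lora-endpoint", model_name="llama-model", accelerators=1, min_memory_mb=1024)` would reserve one accelerator for the base model and its adapters, leaving the rest of the instance free for a second base model.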
With the endpoint and inference model ready, you can now send requests to the endpoint using the LoRA adapters you fine-tuned for Spanish and French languages. The specific LoRA adapter is specified in the request payload under the "adapters" field. We use "es" for the Spanish language adapter and "fr" for the French language adapter, as shown in the following code:
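A sketch of such a request is shown below. The "adapters" field comes from the description above; supplying one adapter name per prompt, the `parameters` block, and the helper names are assumptions, and the runtime client is one you create yourself (for example with `boto3.client("sagemaker-runtime")`).

```python
import json


def adapter_payload(prompts, adapter, max_new_tokens=64):
    """Serialize a request body that routes every prompt to one LoRA adapter."""
    return json.dumps({
        "inputs": prompts,
        "adapters": [adapter] * len(prompts),  # assumed: one adapter name per prompt
        "parameters": {"max_new_tokens": max_new_tokens},
    })


def invoke_with_adapter(runtime_client, endpoint_name, component_name, prompts, adapter):
    """Invoke one inference component on the endpoint and return the raw response text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        InferenceComponentName=component_name,
        ContentType="application/json",
        Body=adapter_payload(prompts, adapter),
    )
    return response["Body"].read().decode("utf-8")
```

Switching from the Spanish adapter to the French one changes only the `adapter` argument ("es" versus "fr"); the endpoint, the inference component, and the base model stay the same.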
Let's add another base model and its LoRA adapter to the same SageMaker endpoint for multi-base models with multiple fine-tuned LoRA adapters. The code is very similar to the previous code for creating the Llama base model and its LoRA adapter.
Configure the SageMaker LMI container to host the base model (mistralai/Mistral-7B-v0.1) and its LoRA adapter (mistral-lora-multi-adapter/adapters/fr):
Create a new SageMaker model and inference component for the base model (mistralai/Mistral-7B-v0.1) and its LoRA adapter (mistral-lora-multi-adapter/adapters/fr):
Invoke the same SageMaker endpoint for the newly created inference component for the base model (mistralai/Mistral-7B-v0.1) and its LoRA adapter (mistral-lora-multi-adapter/adapters/fr):
Delete the SageMaker inference components, models, endpoint configuration, and endpoint to avoid incurring unnecessary costs:
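The cleanup can be sketched as a single function that removes resources in dependency order. The function name and the resource names passed to it are placeholders. Inference component deletion may take some time to complete, so in practice you might need to wait for it to finish before the endpoint can be deleted; that wait is not shown here.

```python
def delete_resources(sm_client, component_names, model_names, endpoint_name, config_name):
    """Delete inference components first, then the endpoint, its config, and the models."""
    for name in component_names:
        sm_client.delete_inference_component(InferenceComponentName=name)
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
```

Passing both the Llama and Mistral inference components and models in one call keeps the teardown of the multi-base-model endpoint in one place.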
The ability to efficiently manage and serve a diverse portfolio of fine-tuned generative AI models is paramount if you want your organization to deliver personalized and intelligent experiences at scale in today's rapidly evolving AI landscape. With the inference capabilities of SageMaker LMI coupled with the performance optimizations of LoRA techniques, you can overcome the challenges of multi-tenant fine-tuned LLM serving. This solution enables you to consolidate AI workloads, batch operations across multiple models, and optimize resource utilization for cost-effective, high-performance delivery of tailored AI solutions to your customers. As demand for specialized AI experiences continues to grow, we've shown how the scalable infrastructure and cutting-edge model serving techniques of SageMaker position AWS as a powerful platform for unlocking generative AI's full potential. To start exploring the benefits of this solution for yourself, we encourage you to use the code example and resources we've provided in this post.
Michael Nguyen is a Senior Startup Solutions Architect at AWS, specializing in leveraging AI/ML to drive innovation and develop business solutions on AWS. Michael holds 12 AWS certifications and has a BS/MS in Electrical/Computer Engineering and an MBA from Penn State University, Binghamton University, and the University of Delaware.
Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.
Vivek Gangasani is an AI/ML Startup Solutions Architect for Generative AI startups at AWS. He helps emerging GenAI startups build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.
Qing Lan is a Software Development Engineer at AWS. He has been working on several challenging products at Amazon, including high-performance ML inference solutions and a high-performance logging system. Qing's team successfully launched the first billion-parameter model in Amazon Advertising under very low latency requirements. Qing has in-depth knowledge of infrastructure optimization and deep learning acceleration.
See more here:
Efficient and cost-effective multi-tenant LoRA serving with Amazon SageMaker | Amazon Web Services - AWS Blog