
Navigating the AI Landscape of 2024: Trends, Predictions, and Possibilities – Towards Data Science

The marketing domain, traditionally commanding a lion's share of enterprise budgets, is now navigating a transformative landscape. The catalyst? The rise of chat-based tools like ChatGPT. These innovations are potentially driving a noticeable decline in traditional search volume, fundamentally altering how consumers engage with information.

In this evolving scenario, marketers find themselves at a crossroads. The ability to influence or monitor brand mentions in these AI-driven dialogues is still in its nascent stages. Consequently, there's a growing trend towards adapting marketing strategies for a generative AI world. This adaptation involves a strategic reliance on traditional media in the short term, leveraging its reach and impact to build and sustain brand presence.

Simultaneously, we are witnessing a significant shift in the technological landscape. The move from browser-based tools to on-device applications is gaining momentum. Leading this charge are innovations like Microsoft Copilot, Google Bard on devices such as Android, and the anticipated launch of Apple's own large language model (LLM) sometime in 2024. This transition indicates a paradigm shift from web-centric interactions to a more integrated, device-based AI experience.

This shift extends beyond mere convenience; it represents a fundamental change in user interaction paradigms. As AI becomes more seamlessly integrated into devices, the distinction between online and offline interactions becomes increasingly blurred. Users are likely to interact with AI in more personal, context-aware environments, leading to a more organic and engaging user experience. For tech giants like Google, Microsoft, and Apple, already entrenched in the marketing services world, this represents an opportunity to redefine their offerings.

We can anticipate the emergence of new answer analytics platforms and operating models in marketing to support answer engine optimisation. These tools will likely focus on understanding and leveraging the nuances of AI-driven interactions, and potentially on mining the training data to understand how a given brand or product might be portrayed in generated answers.

Digital marketers will start to think more deeply about how they are represented in these training datasets, just as they once did with search engine indexes.

Moreover, the potential launch of ad-sponsored results or media measurement tools by platforms like OpenAI could introduce a new dimension in digital advertising. This development would not only offer new avenues for brand promotion but also challenge existing digital marketing strategies, prompting a reevaluation of metrics and ROI assessment methodologies.

As LLMs migrate into devices, moving away from traditional web interfaces, the marketing landscape is poised for significant changes. Marketers must adapt to these shifts, leveraging both traditional media and emerging AI technologies, to effectively engage with their audiences in this new digital era. This dual approach, combining the impact of traditional media with the precision of AI-driven analytics, could very well be the key to success in the rapidly evolving marketing landscape of 2024.

See the article here:

Navigating the AI Landscape of 2024: Trends, Predictions, and Possibilities - Towards Data Science

Read More..

9 Effective Techniques To Boost Retrieval Augmented Generation (RAG) Systems – Towards Data Science

2023 was, by far, the most prolific year in the history of NLP. This period saw the emergence of ChatGPT alongside numerous other Large Language Models, both open-source and proprietary.

At the same time, fine-tuning LLMs became much easier, and the competition among cloud providers over GenAI offerings intensified significantly.

Interestingly, the demand for personalized and fully operational RAGs also skyrocketed across various industries, with each client eager to have their own tailored solution.

Speaking of this last point, in today's post we will discuss a paper that reviews the current state of the art in building fully functioning RAG systems.

Without further ado, let's have a look.

If you're interested in ML content, detailed tutorials, and practical tips from the industry, follow my newsletter. It's called The Tech Buffet.

I started reading this piece during my vacation, and it's a must-read.

It covers everything you need to know about the RAG framework and its limitations. It also lists modern techniques to boost its performance in retrieval, augmentation, and generation.

The ultimate goal behind these techniques is to make this framework ready for scalability and production use, especially for use cases and industries where answer quality matters *a lot*.

I won't discuss everything in this paper, but here are the key ideas that, in my opinion, would make your RAG more efficient.

Continue reading here:

9 Effective Techniques To Boost Retrieval Augmented Generation (RAG) Systems - Towards Data Science

Read More..

Data Science in 2024: An Evolving Frontier for Analytics and Insights. – Medium

Data science has exploded in recent years as a field that extracts valuable insights from data to solve complex business problems. According to a 2021 report from Gartner, demand for data and analytics capabilities is set to increase fivefold by 2024. As businesses become more data-driven and the volume of data continues growing exponentially, data science will only increase in importance over the next few years. 2024 is positioned to mark a notable milestone in which advances in modeling, algorithms, and infrastructure could propel data science into an even more vital strategic function impacting a variety of industries.

Given the surge of big data in motion from IoT sensors, social platforms, mobile devices, and other sources, AI and ML will be integral to efficient advanced analysis. Augmented analytics leverages autonomous techniques so people can shift from repetitive tasks to the higher cognitive skills of critical thinking, insight discovery, and decision evaluation. As Gartner predicts, by 2025 over 50% of analytics queries will be generated using search or NLP-driven interactions rather than code-based authoring, and visual-based exploration tools will empower business users without technical skills to access, interpret, and interact easily with data. This democratization will facilitate data literacy as analytics permeates the organization. Metrics and dashboard customization will adapt to various personas and workflows through auto-generated content and recommendations. Predictive modeling will also progress toward more accessible and transparent self-service offerings as oversight requirements for trust and fairness increase.

While augmented analytics offloads the drudgery, it aims to elevate human judgment and creativity. The symbiotic combination of AI with people who relate context to numbers will lead to the most impactful, nuanced analysis ultimately guiding business strategy. Machines cannot wholly replace human emotion, acumen and domain expertise. As analytics becomes pervasive across enterprises, data science skills to discern high-value problems

Read more:

Data Science in 2024: An Evolving Frontier for Analytics and Insights. - Medium

Read More..

A Data Science Course Project About Crop Yield and Price Prediction I’m Still Not Ashamed Of – Towards Data Science

Hello, dear reader! During these Christmas holidays, I experienced a feeling of nostalgia for my student years. That's why I decided to write a post about a student project that was done almost four years ago for the course Methods and Models for Multivariate Data Analysis during my Master's degree at ITMO University.

Disclaimer: I decided to write this post for two reasons:

The article mentions, in tip format, good practices that I've been able to apply during the course project.

So, at the beginning of the course, we were informed that students could form teams of 2-3 people on their own and propose a course project that we would present at the end of the course. During the learning process (about 5 months), we would make intermediate presentations to our lecturers. This way, the professors could see how the progress was (or was not) going.

After that, I immediately teamed up with my dudes: Egor and Camilo (just because we knew how to have fun together), and we started thinking about the topic

I suggested choosing

So, it was

Camilo also wanted to try to make dashboards with visualisations (using PowerBI), but pretty much any task would be suitable for this desire.

Tip 1: Choose a topic that you (and your colleagues) will be passionate about. It may not be the coolest project on a topic that is not very popular, but you will enjoy spending your evenings working on it

The course consisted of a large number of topics, each of which covers a set of methods for statistical analysis. We decided that we would try to forecast yield and crop price in as many different ways as possible and then ensemble the forecasts using some statistical method. This allowed us to try most of the methods discussed in the course in practice.

Also, the spatio-temporal data was truly multidimensional, which related pretty well to the main theme of the course.

Spoiler: we all got a score of 5 out of 5

We started with a literature review to understand exactly how crop yield and crop price are predicted. We also wanted to understand what kind of forecast error could be considered satisfactory.

I will not cite in this post the thesis resulting from this review. I will simply mention that we decided to use the following metric and threshold to evaluate the quality of the solution (for both crop yield and crop price):

Acceptable performance: Mean absolute percentage error (MAPE) for a reasonably good forecast should not exceed 10%
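As a quick reference, MAPE is simple to compute yourself; here is a minimal sketch (not code from the original project):

import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(mape([3.2, 4.1, 2.8], [3.0, 4.5, 2.9]))   # ~6.5%, i.e. within the 10% threshold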

Tip 2: Start your project (no matter whether at work or during your studies) with a review of contemporary solutions. Maybe the problem you are looking at has already been solved.

Tip 3: Before starting development, determine what metric you will use to evaluate the solution. Remember, you can't improve what you can't measure.

Going back to the research, we identified the following data sources (links are up to date as of 28 December 2023):

Why these sources? We have assumed that the price of a crop will depend on the amount of product produced. And in agriculture, the quantity produced depends on weather conditions.

The model was implemented for:

So, we started with an assumption: wheat, rice, maize, and barley yields depend on weather conditions in the first half of the year (until 30 June) (Figure 2)

The source archives obtained from the European Space Agency website contain netCDF files. The files have daily fields for the following parameters:

Based on the initial fields, the following parameters for the first half of each year were calculated:

Thus we obtained matrices for the whole territory of Europe with calculated features for the future model(s). The reader may notice that I calculate a parameter called the sum of active temperatures above 10 degrees Celsius. This is a popular parameter in ecology and botany that helps to determine the temperature optimums for different species of organisms (mainly plants; see, for example, The sum of active temperatures as a method of determining the optimum harvest date of Sampion and Ligol apple cultivars)

Tip 4: If you have expertise in a domain (one not related to Data Science), make sure you use it in the project. Show that you are not only doing fit-predict but also adapting and improving domain-specific approaches

The next step is aggregation of information by country. Values from the meteorological parameter matrices were extracted for each country separately (Figure 4).

I would note that this strategy made sense (Figure 5): For example, the picture shows that for Spain, wheat yields are almost unaffected by the sum of active temperatures. However, for the Czech Republic, a hotter first half of the year is more likely to result in lower yields. It is therefore a good idea to model yields separately for each country.

Not all of a country's territory is suitable for agriculture. Therefore, it was necessary to aggregate information only from certain pixels. In order to account for the location of agricultural land, the following matrix was prepared (Figure 6).

So, we've got the data ready. However, agriculture is a very complex industry that has improved markedly year by year, decade by decade. It may make sense to limit the training sample for the model. For this purpose, we used the cumulative sum method (Figure 7):

Cumulative sum method: each value in the sample is added to the running total of all preceding values. That is, if the sample includes only three years (1950, 1951, and 1952), the value for 1950 is plotted on the Y-axis for 1950, the point for 1951 shows the sum of the 1950 and 1951 values, and so on.

- If the shape of the line is close to a straight line and there are no fractures, the sample is homogeneous

- If the shape of the line has fractures, the sample is divided into two parts at the fracture

If a fracture was detected, we compared the two samples to check whether they belong to the same general population (using the Kolmogorov-Smirnov statistic). If the samples were statistically significantly different, we used the second part to train the model for prediction. If not, we used the entire sample.
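The post does not include code for this step, so here is a minimal sketch of the idea on synthetic data (the split index and significance level below are assumptions):

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Synthetic yearly yields: a flatter early period followed by a steeper modern one
yields = np.concatenate([rng.normal(2.0, 0.3, 40), rng.normal(4.5, 0.5, 30)])

cumsum = np.cumsum(yields)   # plotted against the years, a kink marks the candidate fracture
split = 40                   # fracture index, chosen by inspecting the cumulative-sum plot

# Two-sample Kolmogorov-Smirnov test: do the two parts come from the same distribution?
stat, p_value = ks_2samp(yields[:split], yields[split:])
train_sample = yields[split:] if p_value < 0.05 else yields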

Tip 5: Don't be afraid to combine approaches to statistical analysis (it is a course project!). For example, the lecture did not cover the cumulative sum method; the topic was about comparing distributions. However, I had previously used this approach to compare trends in ice conditions while processing ice maps, and it seemed to me that it could be useful here as well.

I should note here that we assumed the process is ergodic, which is why we decided to compare the samples in this way.

So, after the preparation, we are ready to start building statistical models. Let's take a look at the most interesting part!

The following features were included in the model:

Target variables: Yield of wheat, rice, maize, and barley

Validation years: 2008-2018 for each country

Let's move on to the visualisations to make it a little clearer.

And here is Figure 9, showing the residuals (residual = observed value minus estimated (predicted) value) from the linear model for France and Italy:

It can be seen from the graphs that the metric is satisfactory, but the error distribution is biased away from zero, which means the model has a systematic error. We tried to correct this in the new models below.

Validation sample MAPE metric value: 10.42%

Tip 6: Start with the simplest models (e.g. linear regression). This will give you a baseline against which you can compare improved versions of the model. The simpler the model, the better, as long as it shows a satisfactory metric.

We've turned the material from this lecture into a model: distribution analysis. The assumption was simple: we analysed the distributions of climatic parameters for each past year and for the current year, found an analogue year for the current one, and predicted the yield to be exactly the same as that of the analogue year known from the past (Figure 10).

Idea: Yields for years with similar weather conditions will be similar

The approach: pairwise comparison of temperature, precipitation, and pressure distributions. The prediction is the yield of the year most similar to the one under consideration.

Distributions used:

For the comparison of distributions, we used the Kruskal-Wallis test. To adjust the p-values, a multiple testing correction was introduced: the Bonferroni correction.
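The post does not show this step in code; the toy sketch below (synthetic data, assumed significance level) illustrates the mechanics of ranking past years by distributional similarity with a Bonferroni-adjusted Kruskal-Wallis test:

import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(1)
params = ('temperature', 'precipitation', 'pressure')
current = {p: rng.normal(size=180) for p in params}
past_years = {year: {p: rng.normal(size=180) for p in params} for year in range(2000, 2008)}

n_tests = len(past_years) * len(params)
alpha = 0.05 / n_tests   # Bonferroni correction for multiple testing

def similarity(year_data):
    # Count parameters whose distribution is NOT significantly different from the current year
    return sum(kruskal(current[p], year_data[p]).pvalue >= alpha for p in params)

analogue_year = max(past_years, key=lambda y: similarity(past_years[y]))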

Validation sample MAPE metric value: 13.80%

Tip 7: If you are doing multiple statistical tests, don't forget to include a correction (for example, the Bonferroni correction).

One of the lectures focused on Bayesian networks. Therefore, we decided to adapt the approach for yield prediction. We considered that each year is described by a group of variables A, B, C, etc., where A is a set of categories describing crop yield, B, for example, describes the sum-of-active-temperatures conditions, and so on. A, for example, could take only three values: high crop yield, medium crop yield, low crop yield. The same applies to B, C, and the others. Thus, if we categorise the conditions and the target variable, we obtain the following description of each year:

The algorithm was designed to predict a yield category based on a combination of three other categories:

How can we define these categories? By using a clustering algorithm! For example, the following three clusters were identified for wheat yields

The final forecast of this model is the average yield of the predicted cluster.
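The clustering step is not shown in the post either; here is a toy sketch of categorising yields and returning the cluster-mean forecast, assuming KMeans from scikit-learn and synthetic data:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
wheat_yields = rng.normal(4.0, 1.2, size=60).reshape(-1, 1)   # synthetic yearly yields

# Categorise yields into three clusters (low / medium / high)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(wheat_yields)
cluster_means = {c: wheat_yields[kmeans.labels_ == c].mean() for c in range(3)}

# Once the category for a new year is predicted (a placeholder here),
# the forecast is simply that cluster's average yield
predicted_cluster = 1
forecast = cluster_means[predicted_cluster]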

Validation sample MAPE metric value: 14.55%

Tip 8: Do experiment! Bayesian networks with clustering for time series forecasting? Sure! Pairwise analysis of distributions? Why not? Sometimes the boldest approaches lead to significant improvements.

Of course, we can forecast the target variable as a time series. Our task here was to understand how classical forecasting methods work in theory and practice.

Putting this method into practice proved to be the easiest. In Python, there are several libraries that allow you to customise and apply an ARIMA model, for example pmdarima.
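For instance, a minimal sketch with pmdarima on a synthetic yearly series (not the project's data):

import numpy as np
import pmdarima as pm

rng = np.random.default_rng(3)
yields = np.cumsum(rng.normal(0.05, 0.3, size=60)) + 3.0   # synthetic yearly yield series

train, test = yields[:-10], yields[-10:]
model = pm.auto_arima(train, seasonal=False, suppress_warnings=True)
forecast = model.predict(n_periods=len(test))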

Validation sample MAPE metric value: 10.41%

Tip 9: Don't forget the comparison with classical approaches. An abstract metric will not tell your colleagues much about how good your model is, but a comparison with well-known baselines will show the true level of performance.

After all the models were built, we explored exactly where each model makes mistakes (remember the residual plots for the linear regression model, see Figure 9):

None of the presented algorithms managed to beat the 10% threshold (according to MAPE).

The Kalman filter was used to improve the quality of the forecast (to ensemble the models). Satisfactory results were achieved for some countries (Figure 15).
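The post does not describe how the filter was configured, so the sketch below is only one possible way to do it: treat the yearly yield as a scalar latent state following a random walk and each model's forecast as a noisy observation of it, with assumed noise variances.

import numpy as np

def kalman_ensemble(forecasts, obs_var, process_var=0.05, p0=1.0):
    # forecasts: array of shape (n_years, n_models); obs_var: assumed variance per model
    x = forecasts[0].mean()   # state: current yield estimate
    p = p0                    # state variance
    fused = []
    for year_obs in forecasts:
        p += process_var                      # predict step (random-walk state model)
        for z, r in zip(year_obs, obs_var):   # update with each model's forecast
            k = p / (p + r)                   # Kalman gain
            x += k * (z - x)
            p *= (1.0 - k)
        fused.append(x)
    return np.array(fused)

# Example: stack the four models' forecasts (synthetic here) and fuse them
preds = np.column_stack([np.linspace(3.0, 4.0, 11)] * 4)
preds += np.random.default_rng(4).normal(0, 0.2, preds.shape)
ensemble = kalman_ensemble(preds, obs_var=np.array([0.04, 0.08, 0.09, 0.04]))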

Validation sample MAPE metric value: 9.89%

Tip 10: If I were asked to integrate the developed model into a production service, I would integrate either ARIMA or linear regression, even though the ensemble's metric is better. Metrics in business problems are sometimes not the key: a standalone model is sometimes better than an ensemble because it is simpler and more reliable (even if the error metric is slightly higher).

And the final part: a model (lasso regression) that used the predicted yield values and futures-based features to estimate possible price values (Figure 16):
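The price model's code is not shown in the post; here is a minimal sketch with scikit-learn's Lasso, where the predicted yield and futures price are entirely synthetic stand-ins for the real features:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
predicted_yield = rng.normal(4.0, 0.5, size=80)
futures_price = rng.normal(180, 20, size=80)
X = np.column_stack([predicted_yield, futures_price])
y = 220 - 15 * predicted_yield + 0.4 * futures_price + rng.normal(0, 5, size=80)   # synthetic crop price

price_model = Lasso(alpha=0.1).fit(X[:60], y[:60])
price_forecast = price_model.predict(X[60:])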

Validation sample MAPE metric value: 6.61%

So that's the end of the story. Some tips were posted above, and in this last section I want to summarise why I am satisfied with that project. Here are the three main items:

Well, we also got great marks on the exam XD

I hope your projects at university and elsewhere will be as exciting for you. Happy New Year!

Sincerely yours, Mikhail Sarafanov

Continue reading here:

A Data Science Course Project About Crop Yield and Price Prediction I'm Still Not Ashamed Of - Towards Data Science

Read More..

Deriving a Score to Show Relative Socio-Economic Advantage and Disadvantage of a Geographic Area – Towards Data Science

There exist publicly accessible data which describe the socio-economic characteristics of a geographic location. In Australia, where I reside, the Government, through the Australian Bureau of Statistics (ABS), collects and publishes individual and household data on a regular basis in respect of income, occupation, education, employment and housing at an area level. Some examples of the published data points include:

Whilst these data points appear to focus heavily on individual people, they reflect people's access to material and social resources, and their ability to participate in society in a particular geographic area, ultimately informing the socio-economic advantage and disadvantage of this area.

Given these data points, is there a way to derive a score which ranks geographic areas from the most to the least advantaged?

The goal of deriving a score may suggest formulating this as a regression problem, where each data point or feature is used to predict a target variable, in this scenario a numerical score. This requires the target variable to be available for at least some instances to train the predictive model.

However, as we don't have a target variable to start with, we need to approach this problem another way. For instance, under the assumption that each geographic area is different from a socio-economic standpoint, can we aim to understand which data points help explain the most variation, thereby deriving a score based on a numerical combination of these data points?

We can do exactly that using a technique called Principal Component Analysis (PCA), and this article demonstrates how!

ABS publishes data points indicating the socio-economic characteristics of a geographic area in the Data Download section of this webpage, under the Standardised Variable Proportions data cube[1]. These data points are published at the Statistical Area 1 (SA1) level, which is a digital boundary segregating Australia into areas with populations of approximately 200-800 people. This is a much more granular digital boundary compared to the Postcode (Zipcode) or State digital boundaries.

For the purpose of demonstration in this article, I'll be deriving a socio-economic score based on 14 out of the 44 published data points provided in Table 1 of the data source above (I'll explain why I selected this subset later on). These are:

In this section, I'll be stepping through the Python code for deriving a socio-economic score for an SA1 region in Australia using PCA.

I'll start by loading the required Python packages and the data.

### For dataframe operations
import numpy as np
import pandas as pd

### For PCA
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

### For Visualization
import matplotlib.pyplot as plt
import seaborn as sns

### For Validation
from scipy.stats import pearsonr

file1 = 'data/standardised_variables_seifa_2021.xlsx'

### Reading from Table 1, from row 5 onwards, for columns A to AT
data1 = pd.read_excel(file1, sheet_name = 'Table 1', header = 5, usecols = 'A:AT')

data1_dropna = data1.dropna()

An important cleaning step before performing PCA is to standardise each of the 14 data points (features) to a mean of 0 and standard deviation of 1. This is primarily to ensure the loadings assigned to each feature by PCA (think of them as indicators of how important a feature is) are comparable across features. Otherwise, more emphasis, or higher loading, may be given to a feature which is actually not significant or vice versa.

Note that the ABS data source quoted above already has the features standardised. That said, for an unstandardised data source:

### Take all but the first column, which is merely a location indicator
data_final = data1_dropna.iloc[:, 1:]

### Perform standardisation of data
sc = StandardScaler()
sc.fit(data_final)

### Standardised data
data_final = sc.transform(data_final)

With the standardised data, PCA can be performed in just a few lines of code:

pca = PCA()
pca.fit_transform(data_final)

PCA aims to represent the underlying data by Principal Components (PC). The number of PCs provided in a PCA is equal to the number of standardised features in the data. In this instance, 14 PCs are returned.

Each PC is a linear combination of all the standardised features, differing only in the loadings it assigns to each standardised feature. For example, the image below shows the loadings assigned to the first and second PCs (PC1 and PC2) by feature.
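Image 1 is not reproduced here, but the loadings table behind it (the df_plot used for the heatmap further below) can be built from pca.components_; a sketch, assuming the feature names are the column headers of the original dataframe:

### Loadings of each standardised feature on PC1 and PC2
### (pca.components_ has shape [n_components, n_features])
feature_names = data1_dropna.columns[1:]
df_plot = pd.DataFrame(pca.components_[:2].T, index = feature_names, columns = ['PC1', 'PC2'])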

With 14 PCs, the code below provides a visualization of how much variation each PC explains:

exp_var_pca = pca.explained_variance_ratio_
plt.bar(range(1, len(exp_var_pca) + 1), exp_var_pca, alpha = 0.7, label = '% of Variation Explained', color = 'darkseagreen')

plt.ylabel('Explained Variation')
plt.xlabel('Principal Component')
plt.legend(loc = 'best')
plt.show()

As illustrated in the output visualization below, Principal Component 1 (PC1) accounts for the largest proportion of variance in the original dataset, with each following PC explaining less of the variance. To be specific, PC1 explains circa 35% of the variation within the data.

For the purpose of demonstration in this article, PC1 is chosen as the only PC for deriving the socio-economic score, for the following reasons:

### Using df_plot dataframe per Image 1

sns.heatmap(df_plot, annot = False, fmt = ".1f", cmap = 'summer')
plt.show()

To obtain a score for each SA1, we simply multiply the standardised proportion of each feature by its PC1 loading and sum the results. This can be achieved by:

### Perform sum product of standardised feature and PC1 loading
pca.fit_transform(data_final)

### Reverse the sign of the sum product above to make output more interpretable
pca_data_transformed = -1.0 * pca.fit_transform(data_final)

### Convert to Pandas dataframe, and join raw score with SA1 column
pca1 = pd.DataFrame(pca_data_transformed[:, 0], columns = ['Score_Raw'])
score_SA1 = pd.concat([data1_dropna['SA1_2021'].reset_index(drop = True), pca1], axis = 1)

### Inspect the raw score
score_SA1.head()

The higher the score, the more advantaged an SA1 is in terms of its access to socio-economic resources.

How do we know the score we derived above was even remotely correct?

For context, the ABS actually publishes a socio-economic score called the Index of Economic Resources (IER), defined on the ABS website as:

The Index of Economic Resources (IER) focuses on the financial aspects of relative socio-economic advantage and disadvantage, by summarising variables related to income and housing. IER excludes education and occupation variables as they are not direct measures of economic resources. It also excludes assets such as savings or equities which, although relevant, cannot be included as they are not collected in the Census.

Without disclosing the detailed steps, the ABS stated in their Technical Paper that the IER was derived using the same features (14) and methodology (PCA, PC1 only) as what we performed above. That is, if we did derive the correct scores, they should be comparable against the IER scores published here (Statistical Area Level 1, Indexes, SEIFA 2021.xlsx, Table 4).

As the published score is standardised to a mean of 1,000 and a standard deviation of 100, we start the validation by standardising the raw score in the same way:

score_SA1['IER_recreated'] = (score_SA1['Score_Raw']/score_SA1['Score_Raw'].std())*100 + 1000

For comparison, we read in the published IER scores by SA1:

file2 = 'data/Statistical Area Level 1, Indexes, SEIFA 2021.xlsx'

data2 = pd.read_excel(file2, sheet_name = 'Table 4', header = 5, usecols = 'A:C')

data2.rename(columns = {'2021 Statistical Area Level 1 (SA1)': 'SA1_2021', 'Score': 'IER_2021'}, inplace = True)

col_select = ['SA1_2021', 'IER_2021']
data2 = data2[col_select]

ABS_IER_dropna = data2.dropna().reset_index(drop = True)

Validation 1: PC1 Loadings

As shown in the image below, comparing the PC1 loadings derived above against the PC1 loadings published by the ABS suggests that they differ by a constant factor of -45%. As this is merely a scaling difference, it doesn't impact the derived scores, which are standardised (to a mean of 1,000 and standard deviation of 100).

(You should be able to verify the Derived (A) column with the PC1 loadings in Image 1).

Validation 2: Distribution of Scores

The code below creates a histogram for both scores, whose shapes look to be almost identical.

score_SA1.hist(column = 'IER_recreated', bins = 100, color = 'darkseagreen')
plt.title('Distribution of recreated IER scores')

ABS_IER_dropna.hist(column = 'IER_2021', bins = 100, color = 'lightskyblue')
plt.title('Distribution of ABS IER scores')

plt.show()

Validation 3: IER Score by SA1

As the ultimate validation, let's compare the IER scores by SA1:
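The join that produces the IER_join dataframe used below is not shown in the excerpt; a minimal sketch assuming an inner merge on the SA1 identifier:

### Join recreated and published scores on the SA1 identifier
IER_join = pd.merge(score_SA1, ABS_IER_dropna, how = 'inner', on = 'SA1_2021')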

## Plot scores on x-y axis.
## If scores are identical, it should show a straight line.

plt.scatter('IER_recreated', 'IER_2021', data = IER_join, color = 'darkseagreen')
plt.title('Comparison of recreated and ABS IER scores')
plt.xlabel('Recreated IER score')
plt.ylabel('ABS IER score')

plt.show()

A diagonal straight line as shown in the output image below supports that the two scores are largely identical.

To add to this, the code below shows the two scores have a correlation close to 1:
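That code is missing from the excerpt; a minimal sketch using the pearsonr import from earlier:

corr, _ = pearsonr(IER_join['IER_recreated'], IER_join['IER_2021'])
print(f'Pearson correlation: {corr:.4f}')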

The demonstration in this article effectively replicates how the ABS calibrates the IER, one of the four socio-economic indexes it publishes, which can be used to rank the socio-economic status of a geographic area.

Taking a step back, what we've achieved in essence is a reduction in the dimension of the data from 14 to 1, at the cost of some of the information conveyed by the data.

Dimensionality reduction techniques such as PCA are also commonly used to reduce high-dimensional spaces, such as text embeddings, to 2-3 (visualizable) Principal Components.

Visit link:

Deriving a Score to Show Relative Socio-Economic Advantage and Disadvantage of a Geographic Area - Towards Data Science

Read More..

Understand Naive Bayes Algorithm | NB Classifier – Towards Data Science

Photo by Google DeepMind on Unsplash

This year, my resolution is to go back to the basics of data science. I work with data every day, but it's easy to forget how some of the core algorithms function if you're completing repetitive tasks. I'm aiming to do a deep dive into a data algorithm each week here on Towards Data Science. This week, I'm going to cover Naive Bayes.

Just to get this out of the way, you can learn how to pronounce Naive Bayes here.

Now that we know how to say it, let's look at what it means.

This probabilistic classifier is based on Bayes' theorem, which can be summarized as follows:

The conditional probability of event A, given that a second event B has already occurred, is the probability of B given A multiplied by the probability of A, divided by the probability of B.

P(A|B) = P(B|A)P(A) / P(B)

A common misconception is that Bayes' Theorem and conditional probability are synonymous.

However, there is a distinction: Bayes' Theorem uses the definition of conditional probability to find what is known as the reverse probability, or the inverse probability.

Said another way, conditional probability is the probability of A given B. Bayes' Theorem takes that and finds the probability of B given A.

A notable feature of the Naive Bayes algorithm is its use of sequential events. Put simply, by acquiring additional information later, the initial probability is adjusted. We will call these the prior (or marginal) probability and the posterior probability. The main takeaway is that by knowing another condition's outcome, the initial probability changes.

A good example of this is medical testing. For example, if a patient is dealing with gastrointestinal issues, the doctor might suspect inflammatory bowel disease (IBD). The initial probability of having this condition is about 1.3%.
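To make the prior-to-posterior update concrete, here is a small sketch of Bayes' theorem using the 1.3% prior above together with hypothetical test sensitivity and specificity (the article's own numbers are not in this excerpt):

# P(disease | positive test) = P(positive | disease) * P(disease) / P(positive)
prior = 0.013          # P(IBD), the initial probability quoted above
sensitivity = 0.90     # P(positive | IBD), hypothetical
specificity = 0.95     # P(negative | no IBD), hypothetical

p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
posterior = sensitivity * prior / p_positive
print(f'P(IBD | positive test) = {posterior:.3f}')   # roughly 0.19 with these numbers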

Follow this link:

Understand Naive Bayes Algorithm | NB Classifier - Towards Data Science

Read More..

How data science is changing the world for the better in 2023 – Mastercard

Year in review December 27, 2023 | By Caroline Morris

With political, social and climate crises mounting throughout the world, it's been a tough year for optimism. Yet that is what Michael Miebach offered at the 2023 United Nations General Assembly this September. In a speech before the U.N. Security Council, Mastercard's CEO laid out how well-planned public-private partnerships can help advance humanitarian causes. He also emphasized that one of the most valuable contributions private entities can make is not necessarily money, but their expertise.

For Mastercard, that expertise is the technology and tools to analyze data, and the company is using it to put issues of sustainability, equity and inclusion into a broader context and zero in on individuals who need the most help.

When responsibly used, data can be an enormous force for good in the world. Here are a few examples of how that evolved over the past year.

Creating safe, happy homes for those in need

After the war broke out in Ukraine in 2022, refugees were pouring across the border into Poland. Among the displaced was Alona, below, who traveled with her young son. But even in asylum, they faced significant challenges as they tried to adapt to a new country in which many cities offered scarce work and housing options.

Alona found Where to Settle, Mastercard's platform to help Ukrainian refugees figure out the ideal place to live by presenting users with the estimated cost of living, potential job opportunities and housing offers in different locations. Now, in their new hometown of Sochaczew, Alona and her son are thriving.

And the platform is expanding beyond refugees. Polish students are taking advantage of the app as they move for university. The idea is that Where to Settle, which Fortune magazine recognized in its annual Change the World list of companies mobilizing the creative tools of capitalism to help solve social problems, will eventually be available wherever Mastercard is available.

Empowering informal workers

In impoverished countries like Mozambique, informal workers, those whose sources of income are not recorded by the government, struggle to make ends meet, and they do not possess the resources or know-how to grow their businesses. However, Data for Workforce Nurturing, or D4WN, is empowering these workers to earn a living wage. Pulling data outputs from Com-Hector, a virtual assistant that helps low-income workers with advertising and business management, and Biscate, a digital job board for the informal sector where clients can find workers, D4WN provides business insights to users so they can build their businesses and gain financial resilience and independence.

D4WN was one of nine awardees around the world to win Data.org's $10 million Inclusive Growth and Recovery Challenge, which sought innovative examples of data for social impact, with support from the Mastercard Center for Inclusive Growth and the Rockefeller Foundation.

Keeping humans at the center of AI

This year, Mastercard hosted its second annual Impact Data Summit, bringing together leaders in the world of AI, tech and data science to talk about how data can help solve some of humanity's biggest challenges, including climate action, gender equality and inclusive economic growth.

Participants acknowledged that, for all of its promise, AI can misconstrue or misrepresent data to exacerbate existing social issues or even spark new problems. The key to leveraging technology is keeping people at the center, from using inclusive data to developing impactful public-private partnerships to making sure that AI solutions serve human good above all else.

See the original post here:

How data science is changing the world for the better in 2023 - Mastercard

Read More..

AI and Data Science How to Coexist and Thrive in 2024 – Analytics Insight

Embark on a journey into the future of technology as we delve into the dynamic synergy between Artificial Intelligence (AI) and Data Science. From the foundational understanding of AI and Data Science to the pivotal trends shaping the upcoming year, this exploration aims to provide insights into the coexistence and potential of these cutting-edge technologies.

Artificial Intelligence (AI), a dynamic field in computer science, seeks to equip machines with human-like capabilities, revolutionizing various sectors such as medicine, finance, transportation, entertainment, and education. Utilizing search algorithms and optimization techniques, AI promises profound changes in our daily lives and work environments.

Complementing AI, Data Science acts as the alchemy of insights in data-driven decision-making. Integrating scientific methods and algorithms, Data Science extracts valuable insights from diverse data sets, informing decision-making, predicting future events, and addressing complex issues.

The collaboration between AI and Data Science emerges as a transformative force, unlocking the potential for intelligent insights and shaping a future where machines leverage data-driven understanding for meaningful actions.

AI (Artificial Intelligence) and Data Science have revolutionized how businesses function, make decisions, and extract insights from data. The domains of Data Science and Artificial Intelligence (AI) are intricately connected, and their mutually beneficial evolution is expected to continue in 2024.

According to a Gartner report, a significant shift is expected by 2024, with 60% of data for AI being synthetic. This synthetic data aims to simulate reality, anticipate future scenarios, and mitigate risks associated with AI, marking a substantial increase from the mere 1% recorded in 2021.

As we look ahead, several noteworthy trends are poised to shape the landscape of AI and Data Science in 2024:

Generative AI for Synthetic Data: The emergence of generative AI is anticipated to revolutionize data creation, enabling the generation of synthetic data for training models. This not only expands the availability of diverse datasets but also addresses privacy concerns associated with real-world data.

AI-Powered Chips for Enhanced Processing: The integration of AI-powered chips enables faster and more efficient data processing. These specialized chips are designed to handle the intricate computations inherent in AI algorithms, contributing to improved performance and reduced processing times.

Explainable AI for Transparency: The demand for transparency and accountability in AI decision-making is driving the development of explainable AI. In 2024, there is a growing emphasis on creating models and algorithms that provide clear insights into their decision processes, fostering trust and understanding.

AI-Powered Healthcare: The healthcare sector is poised for transformation through AI applications that enhance patient outcomes and reduce costs. From predictive diagnostics to personalized treatment plans, AI is becoming an integral part of revolutionizing healthcare delivery.

AI-Powered Cybersecurity: AI-powered cybersecurity tools are expected to play a crucial role in detecting and preventing cyber-attacks, offering proactive defense mechanisms against evolving security challenges.

AI-Powered Chatbots for Customer Service: The adoption of AI-powered chatbots is set to rise, enhancing customer service and engagement. These intelligent bots can efficiently handle queries, provide information, and improve the overall customer experience.

AI-Powered Personalization: The trend of AI-powered personalization continues to gain momentum, offering tailored experiences to customers. From content recommendations to personalized marketing strategies, AI is driving a shift towards more individualized interactions.

Cloud Adoption Surges: More businesses are anticipated to embrace cloud computing, with a rising trend of companies transferring their data to cloud platforms.

Expansion of Predictive Analytics: The utilization of predictive analytics is poised for growth, as an increasing number of companies leverage machine learning algorithms to forecast future trends.

Upsurge in Cloud-Native Solutions: The development of applications specifically tailored for cloud environments is projected to escalate, with a growing number of companies adopting cloud-native solutions.

Rise in Augmented Consumer Interfaces: The utilization of augmented reality (AR) and virtual reality (VR) is expected to surge, as an increasing number of companies integrate these technologies to craft immersive experiences for their customers.

Heightened Focus on Data Regulation: The significance of data regulation is expected to intensify, with a growing number of companies directing their attention towards ensuring data privacy and security.

In conclusion, the dynamic evolution of Data Science and Artificial Intelligence is not only intriguing but also holds immense promise for shaping various facets of our lives. The ongoing developments and trends in these fields underscore their potential to create groundbreaking innovations and transformative impacts across diverse industries.


Excerpt from:

AI and Data Science How to Coexist and Thrive in 2024 - Analytics Insight

Read More..

Optimizing Retrieval-Augmented Generation (RAG) by Selective Knowledge Graph Conditioning – Towards Data Science

How SURGE substantially improves knowledge relevance through targeted augmentation while retaining language fluency

Generative pre-trained models have shown impressive fluency and coherence when used for dialogue agents. However, a key limitation they suffer from is the lack of grounding in external knowledge. Left to their pre-trained parameters alone, these models often generate plausible-sounding but factually incorrect responses, also known as hallucinations.

Prior approaches to mitigate this have involved augmenting the dialogue context with entire knowledge graphs associated with entities mentioned in the chat. However, this indiscriminate conditioning on large knowledge graphs brings its own problems:

Limitations of Naive Knowledge Graph Augmentation:

To overcome this, Kang et al. 2023 propose the SUbgraph Retrieval-augmented GEneration (SURGE) framework, with three key innovations:

This allows providing precisely the requisite factual context to the dialogue without dilution from irrelevant facts or model limitations. Experiments show SURGE reduces hallucination and improves grounding.
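SURGE's actual retriever is a trained graph encoder, so the following is only a toy illustration of the general idea of selective conditioning: score candidate triples against the dialogue context with an off-the-shelf sentence embedder and keep just the top-k, rather than feeding the whole knowledge graph to the generator.

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer('all-MiniLM-L6-v2')

dialogue = "Who directed Inception, and what else did they make?"
triples = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "directed", "Interstellar"),
    ("Inception", "release_year", "2010"),
    ("Leonardo DiCaprio", "starred_in", "Titanic"),
]

# Score each candidate triple against the dialogue context and keep the top-k
triple_texts = [f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples]
scores = util.cos_sim(encoder.encode(dialogue), encoder.encode(triple_texts))[0]
top_k = [triples[int(i)] for i in scores.argsort(descending=True)[:2]]

# top_k (not the whole graph) is what gets prepended to the generator's prompt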

Follow this link:

Optimizing Retrieval-Augmented Generation (RAG) by Selective Knowledge Graph Conditioning - Towards Data Science

Read More..

Embracing Data-driven Resolutions: A Tech-savvy Start to the New Year – Medium

As we bid farewell to another year and usher in the promises of a new one, it's the perfect time for reflection, renewal, and setting new goals. For tech enthusiasts and data scientists, what better way to kick off the New Year than by leveraging our skills to make data-driven resolutions? Let's explore how we can merge the world of technology and data science to enhance our personal and professional lives in 2024.

1. Reflect on Your Data: As data scientists, we thrive on insights gained from analyzing information. Apply this principle to your personal life by reflecting on your past year's experiences. Utilize data visualization tools to create a visual representation of your achievements, challenges, and areas for improvement. This self-analysis can provide a clear roadmap for the year ahead.

2. Set SMART Goals: In the tech world, we often emphasize the importance of setting Specific, Measurable, Achievable, Relevant, and Time-bound (SMART) goals. Extend this methodology to your personal and professional objectives. Whether it's learning a new programming language, completing a certification, or launching a personal data project, make your goals SMART to ensure they are well-defined and achievable.

3. Optimize Your Routine with Data: Data science is all about optimization. Apply this mindset to your daily routine. Analyze your habits, identify inefficiencies, and optimize your workflow. Leverage productivity tools and apps to streamline tasks, prioritize responsibilities, and enhance your overall efficiency.

4. Learn and Stay Curious: The tech landscape is ever-evolving, and as data scientists, our success is rooted in continuous learning. Commit to expanding your skill set in the New Year. Explore emerging technologies, enroll in online courses, and stay updated on the latest trends in data science. Embrace a curious mindset that fuels both personal and professional growth.

5. Collaborate and Share Knowledge: One of the strengths of the tech community is its collaborative spirit. In 2024, make a resolution to actively engage with your peers. Join online forums, participate in tech communities, and contribute your expertise. Whether it's sharing insights from a recent project or seeking advice, collaboration enhances the collective knowledge of the community.

6. Use Tech for Wellness: In the fast-paced world of tech, it's crucial to prioritize wellness. Leverage technology and data to monitor and improve your well-being. From fitness trackers to mindfulness apps, integrate tech solutions into your routine that promote a healthy work-life balance.

Conclusion: As we step into the New Year, let's harness the power of technology and data science to shape a future that is not only innovative but also personally fulfilling. By setting data-driven resolutions, we can navigate the challenges ahead with clarity, curiosity, and a commitment to continuous improvement. Here's to a tech-savvy and data-driven journey in 2024! Happy New Year!

Go here to read the rest:

Embracing Data-driven Resolutions: A Tech-savvy Start to the New Year - Medium

Read More..