Category Archives: Data Science

Crunching numbers isn’t enough; you also have to explain results – University of Colorado Boulder

CU Boulder researcher Eric Vance recently won the W.J. Dixon Award for Excellence in Statistical Consulting, in recognition of his work to help statisticians and data scientists become better communicators

The skills of statistics and data science are broad and varied, requiring those who use them not only to ask the right questions and capture the right data, but to process and analyze it and then convey what they discovered.

"Students of statistics and data science are taught methods and modeling; they're taught to code and to troubleshoot. But how do we teach students in statistics and data science to become more effective collaborators?" asks Eric Vance, a University of Colorado Boulder associate professor of applied mathematics.

"The thing about modern statistics is that almost anybody can upload an Excel spreadsheet to a statistical software program, do some stuff and get answers. You can have people who understand data, who understand methods and the appropriate conditions to use those methods. But what we want is to grow the number of well-trained data scientists who understand that the context of data matters and who also have that drive to see their work put into action for the benefit of society and know how to collaborate to make that happen."

Eric Vance (center), a CU Boulder associate professor of applied mathematics, is a Fulbright fellow in Indonesia for the 2023-24 academic year. He's working with colleagues at IPB University to develop a course in effective statistics and data science collaboration.

For most of his career, Vance has recognized that it's not enough to be good at statistics and data science: students entering these fields must also learn communication and project-management skills to become effective collaborators. He has designed curricula and academic programs that promote this goal, work that recently was recognized with the American Statistical Association's W.J. Dixon Award for Excellence in Statistical Consulting.

The award recognizes individuals who have demonstrated excellence in statistical consulting or developed and contributed new methods, software or ways of thinking that improve statistical practice in general.

As the youngest winner by at least 15 years, Vance is in the middle rather than at the close of his career, "which is good because there's still a lot I want to do to translate my framework for collaboration into different languages and cultures, and to build it up across disciplines."

Doing good with data

Since the beginning of Vance's academic career, which started as director of the Laboratory for Interdisciplinary Statistical Analysis at Virginia Tech, "I noticed that my students were really good in statistical methods, but only some of them were really good in the non-technical skills, the communication skills," he says.

"Part of my job was also to teach statistical consulting, so I started to think about what the key aspects are that a student needs to know, and can learn, to become an effective, collaborative statistician."

Good data scientists have a deep store of quantitative skills, he says, and many enter the field because they want to work with real data and pursue projects that help society and benefit humanity. Plus, in this hyper-plugged-in world, data are everywhere: powerful data in huge datasets with the potential to have sweeping effects. The demand for people who can analyze data properly and leverage them appropriately is growing.

"But what I noticed is kind of holding statisticians and scientists back is not technical skills (it's not that they don't know the latest analysis technique) but that they don't have the communication skills," Vance says. "That became my focus: What is it that a student or a data scientist needs to know to effectively unlock the technical skills to do the most good?"

At CU Boulder, Vance established and directs the Laboratory for Interdisciplinary Statistical Analysis (LISA), housed in the Department of Applied Mathematics, to teach students to become effective interdisciplinary collaborators who can apply statistical analysis and data science to enable and accelerate research on campus and to support data-driven business decisions and policy interventions in the community.

Vance explains that statisticians and data scientists are often not the ones collecting the data they analyze: "If we want to develop new methods, we need to have data, and who has data? Everybody else. Domain experts are everywhere around the world, so statistics and data science should be collaborative disciplines, and students should learn to work with a chemist or a biologist or an English professor or an elected official to help them think about what kind of data they have, help them collect high-quality data and transform it into policy and action."

More than just good with data

Vance and his colleagues have built LISA into the center of the LISA 2020 Global Network of statistics labs, which aim to strengthen local capacity in statistical analysis and data science and to transform academic evidence into action for development.

You can't just be good with data anymore; you have to be able to communicate why it matters.

The LISA 2020 Global Network comprises 35 statistics labs in 10 countries, including Nigeria, Brazil and Pakistan. Vance is now a Fulbright fellow in Indonesia, where he's working with colleagues at IPB University to develop a course in effective statistics and data science collaboration and establish a new statistics and data science collaboration center.

Several years ago, Vance and research colleague Heather Smith developed the ASCCR framework, which stands for attitude, structure, content, communication and relationship, to support this model of statistics and data science education that incorporates collaboration skills. Vance's work in Indonesia is also exploring how to adapt ASCCR within different cultural contexts.

"We want statistics and data science students around the world to have the skills to collaborate and communicate with domain experts," Vance says. "Maybe it's a researcher around campus, maybe a local policy maker, maybe a local businessperson: anybody who has data and wants to be able to do something with the data, make a decision based on the data or come to some conclusion."

"We want students to become people who can talk with a domain expert to understand what the problem is, what the data are, how they were collected, the provenance of the data, and then figure out what the domain expert actually wants to do with the data. That means understanding the workflow of collaboration before actually analyzing the data and coming up with some statistical results. Then they need to translate those results to answer the original research question or come up with a conclusion and recommendations for action. You can't just be good with data anymore; you have to be able to communicate why it matters."


User Churn Prediction. Modern data warehousing and Machine | by Mike Shakhomirov | Dec, 2023 – Towards Data Science

Modern data warehousing and Machine Learning

No doubt, user retention is a crucial performance metric for many companies and online apps. We will discuss how we can use built-in data warehouse machine learning capabilities to run propensity models on user behaviour data and determine the likelihood of user churn. In this story, I would like to focus on dataset preparation and model training using standard SQL; modern data warehouses allow this. Indeed, retention is an important business metric that helps us understand the mechanics of user behaviour. It provides a high-level overview of how successful our Application is at retaining users by answering one simple question: Is our App good enough at retaining users? It is a well-known fact that it's cheaper to retain an existing user than to acquire a new one.

In one of my previous articles, I wrote about modern data warehousing [1].

A modern DWH has a lot of useful features and components that differentiate it from other data platform types [2].

ML model support seems to be the foundational DWH component when dealing with big data.

In this story, I will use binary logistic regression, one of the fastest models to train, and demonstrate how we can use it to predict a user's propensity to churn. Indeed, we don't need to know every machine-learning model.
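To make the idea concrete, here is a minimal local sketch of a binary logistic regression churn model in Python with scikit-learn. Note that the article itself trains the model inside the data warehouse with SQL; this sketch only illustrates the same modelling idea, and every feature name and data point below is hypothetical.

```python
# A minimal, hypothetical sketch of a churn propensity model with binary
# logistic regression (the article trains its model in the data warehouse
# with SQL; this is only a local illustration with made-up features).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1_000

# Hypothetical per-user behavioural features aggregated from raw events.
users = pd.DataFrame({
    "sessions_last_30d": rng.poisson(10, n),
    "days_since_last_visit": rng.integers(0, 30, n),
    "avg_session_minutes": rng.gamma(2.0, 5.0, n),
})
# Hypothetical churn label: 1 = user did not return in the following period.
churned = (users["days_since_last_visit"] > 20).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    users, churned, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1_000)
model.fit(X_train, y_train)

# predict_proba gives each user's propensity to churn.
propensity = model.predict_proba(X_test)[:, 1]
print("accuracy:", model.score(X_test, y_test))
print("example churn propensities:", propensity[:5].round(3))
```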

We can't compete with cloud service providers such as Amazon and Google in machine learning and data science, but we need to know how to use these capabilities.

I previously wrote about it in my article here [3]:

In this tutorial, we will learn how to transform raw event data to create a training dataset for our ML


5 Questions Every Data Scientist Should Hardcode into Their Brain – Towards Data Science


Despite all the math and programming, data science is more than just analyzing data and building models. When you boil it down, the key objective of data science is to solve problems.

The trouble, however, is that at the outset of most data science projects, we rarely have a well-defined problem. In these situations, the role of the data scientist isn't to have all the answers but to ask the right questions.

In this article, I'll break down 5 questions every data scientist should hardcode into their brain to make problem discovery second nature.

When I began my data science journey in grad school, I had a naive view of the discipline. Namely, I was hyper-focused on learning tools and technologies (e.g. LSTM, SHAP, VAE, SOM, SQL, etc.).

While a technical foundation is necessary to be a successful data scientist, focusing too much on tools creates the Hammer Problem (i.e. when you have a really nice hammer, everything looks like a nail).

This often leads to projects which are intellectually stimulating yet practically useless.

My perspective didn't fully mature until I graduated and joined the data science team at a large enterprise, where I was able to learn from those who were years (if not decades) ahead of me.

The key lesson was the importance of focusing on problems rather than technologies. What this means is gaining a (sufficiently) deep understanding of the business problem before writing a single line of code.

Since, as data scientists, we typically don't solve our own problems, we gain this understanding through conversations with clients and stakeholders. Getting this right is important because, if you don't, you can end up spending a lot of time (and money) solving the wrong problem. This is where problem-discovery questions come in.

6 months ago, I left my corporate data science job to become an independent AI consultant (to fund my entrepreneurial ventures).


Data Science Career Paths, Skills, and Special Projects: Our Best Reads of 2023 – Towards Data Science

2023 may have been the year of the LLM (we highlighted our most popular articles on ChatGPT and related topics last week), but data science and machine learning are far too vast for us to reduce them to a single phenomenon (as inescapable as it might be).

Every day, TDS authors publish excellent work on a staggering range of topics, from the latest tools of the trade to career insights and project walkthroughs. For our final Variable edition of the year, we decided to highlight some of the most memorable and widely read posts we've shared around three themes: programming for data scientists, career growth, and creative projects and opinion pieces. They do a fantastic job of showing just how vibrant, diverse, and dynamic this field, and our community, is.

We hope you enjoy our selection, and thank you once again for all your support over the past year.


7 Reasons Why Youre Struggling to Land a Data Science Job – KDnuggets

Tired of applying to data science roles and not hearing back from companies? Perhaps you managed to land a couple of interviews but weren't able to convert them into offers? Well, you're not alone.

The job market is brutally competitive now, so just because it's difficult doesn't mean you're not good enough. That said, it's both important and helpful to take a step back and see how and where you can improve. And that's exactly what this guide will help you with.

We'll go over common reasons why aspiring data professionals like you struggle to make the cut, and how you can improve your chances of landing interviews and getting that job you want!

It's a hard truth. So let's face it.

Say you've applied to a bunch of data science roles at companies that you're interested in, and have been shortlisted for interviews.

Congratulations! You're on the right track. The next goal is to convert the interview opportunity into a job offer, and the first step is to crack that coding interview.

You'll first have a round of timed coding interviews, testing your problem-solving skills, followed by an SQL coding round.

But coding interviews are difficult to crack, even for experienced professionals. Consistent practice and spaced repetition, however, can help you successfully crack these interviews.

Regularly practice coding interview questions on platforms like Leetcode and Hackerrank.

If you are looking for resources, check out:

Once you clear the coding interviews, focus on and prepare for the technical rounds. Brush up on your machine learning fundamentals. Also review your projects so you can explain their impact with confidence.

It is true that recruiters spend only a few seconds reviewing your resume before deciding whether it proceeds to the next phase or goes to the reject pile.

So you should put in a conscientious effort when drafting your resume. Be sure to tailor your resume based on the job specifications.

Here are a few resume tips:

I'll also suggest using a simple single-column layout that's easier to parse than complicated and fancy layouts.

When you're applying to jobs, your resume and LinkedIn profile should be consistent, without any conflicting details. And they should also be aligned with the experience and skill set that the role demands.

There are a couple of pitfalls you should avoid, though.

Suppose you're interested in medical imaging and computer vision, so almost all your projects are in computer vision. Such a profile may be a great fit for a computer vision engineer or a computer vision researcher role.

But what if you're applying to a data scientist role at a FinTech company? Clearly, you don't stand out as a strong candidate.

If you are an aspiring data scientist with strong SQL skills and experience building machine learning models, you can apply for the roles of data analyst and machine learning engineer as well.

But you don't want to make your resume/candidate profile look like you're someone who wants to be a data analyst, a machine learning engineer, and a data scientist, all at once.

If you're interested in all of these roles, have separate resumes for each.

It's important to find a sweet middle ground that allows you to showcase your expertise and stand out as a potential candidate with a broad skill set that is aligned with the job's requirements.

Your projects help you gain a competitive edge over other candidates. So choose them wisely.

Some aspiring data professionals include certain projects in their resume and portfolio that they shouldn't. Yes, there are some beginner projects that are good for learning, but you should AVOID showcasing them in your portfolio.

Here are a few:

Just to name a few. These projects are too generic and basic to land you an interview (let alone job offers).

So what are some interesting projects, especially if you are a beginner looking to break into this field?

Here are some beginner-level projects that would help you showcase your skills and emerge as a stronger candidate:

Use real-world datasets to build your projects. This way you can showcase a lot of important skills: data collection, data cleaning, and exploratory data analysis besides model building.

Also include projects that are inspired by your interests. As I'd suggested in a previous pandas guide, try turning data from your interests and hobbies into interesting projects that will help you leave an impression on the interviewer.

Another common roadblock aspiring data professionals face is their educational background. Breaking into data science can be especially difficult if you have majored in a field such as sociology, psychology, or the like.

While your skills, both hard and soft, matter eventually, you should remember that you are competing with those who have an undergraduate or advanced degree in a related field.

So what can you do about this?

Look for ways to constantly upskill yourself. Remember, once you land your first data role, you can leverage your experience going forward.

Look for ways to work on relevant projects within your company. If your company has a dedicated data team, try to take on a small side project.

Learning in public is super important, especially when you are trying to land your first job (and even after that, honestly).

I started writing online in late 2020. Since then, I've landed most of my opportunities through the work (tutorials and technical deep dives) that I published online.

So how and where do you start? Leverage social media platforms like LinkedIn and Twitter (X) to share your work with the community:

What you code on your laptop stays on your laptop. So be ready to put yourself out there and share what you build and learn.

Building a strong portfolio and online presence can be immensely helpful in the job search process. Because you never know which project or article might interest your future employer.

Because of how competitive the job market is right now, you have to go beyond just applying to jobs and start being more proactive.

Here are a few simple steps that can help you make the difference:

Joining data science communities online can also be super helpful!

And that's a wrap. Here's a quick review of what we've discussed:

Good luck on your job search journey. I hope you land your data science role soon. What else would you add? Let us know in the comments.

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more.


Inside GPT II. The core mechanics of prompt engineering | by Fatih Demirci | Dec, 2023 | Medium – Towards Data Science

As you can see above, with the greedy strategy we append the token with the highest probability to the input sequence and predict the next token.

Using this strategy, let's generate a longer text of 128 next tokens using greedy-search decoding.
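As a rough sketch of how this could be run (the excerpt does not show the article's exact code, so the use of the Hugging Face transformers library, the model size and the generation arguments below are assumptions), greedy decoding with GPT-2 and the prompt from the article might look like this:

```python
# A minimal sketch of greedy-search decoding with Hugging Face transformers.
# The article uses GPT-2 and the prompt "Germany is known for its"; the exact
# model checkpoint and settings here are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Germany is known for its", return_tensors="pt")

# Greedy search: do_sample=False with num_beams=1 picks the single most
# probable token at every step.
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    num_beams=1,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```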

As we can see from the generated text, although it is the simplest logic, the drawback of this approach is the repetitive sequences it produces. It fails to capture the probabilities of sequences, meaning that the overall probability of several words coming one after another is overlooked. Greedy search predicts and considers the probability only one step at a time.

Repetitive text is a problem. We would like our generated output to be concise; how can we achieve that?

Instead of choosing the token that has the highest probability at each step, we consider the next x steps, calculate the joint probability (simply the multiplication of consecutive probabilities) and choose the sequence of next tokens that is most probable. Here x refers to the number of beams, the candidate sequences we keep while looking ahead into future steps. This strategy is known as beam search.

Let's go back to our example from GPT-2 and explore beam- vs. greedy-search scenarios.

Given the prompt, let's look at the two tokens with the highest probability and their continuations (4 beams) in a tree diagram.

Let's calculate the joint probabilities of the green sequences above.

Germany is known for its -> high-quality beer

with the joint probability 3.30% * 24.24% * 31.26% * 6.54% ≈ 0.00016354,

whereas the lower path, with the sequence

Germany is known for its -> strong tradition of life

has 2.28% * 2.54% * 87.02% * 38.26% ≈ 0.00019281.

The bottom sequence resulted in the higher overall joint probability, although the first next-token prediction step in the top sequence has a higher probability.
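A quick check of the arithmetic in Python, multiplying the step probabilities quoted above:

```python
# Joint probabilities of the two candidate sequences, computed by multiplying
# the step-by-step probabilities quoted in the text.
p_beer      = 0.0330 * 0.2424 * 0.3126 * 0.0654  # "... high-quality beer"
p_tradition = 0.0228 * 0.0254 * 0.8702 * 0.3826  # "... strong tradition of life"

print(f"high-quality beer:        {p_beer:.8f}")       # ≈ 0.00016354
print(f"strong tradition of life: {p_tradition:.8f}")  # ≈ 0.00019281
```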

While greedy search prioritises the maximum probability at each prediction step, it neglects the probabilities of token sequences. Beam-search decoding enables us to look deeper into sequences and decode text in a more extensive fashion. So is beam search the ultimate solution?

Let's explore further and decode the next 128 tokens with 5 beams.

Here are 10 of the most beautiful places in Germany.

1. Lake Constance

Lake Constance is one of the largest lakes in Germany.

It is located in the state of North Rhine-Westphalia and is the second largest lake in Germany after Lake Constance in Bavaria.

Lake Constance is located in the state of North Rhine-Westphalia and is the second largest lake in Germany after Lake Constance in Bavaria.

"""

Although comparatively less so than greedy search, beam search suffers from repetitive output too. However, with beam-search decoding we can address this problem by penalising repeated pairs of word sequences. In other words, the probability of a token sequence is assigned zero if the sequence has already been decoded before. This penalisation of repeated token sequences is also known as an n-gram penalty.

While n signifies the length of the sequence, gram is a term that refers to a unit in computational linguistics, which corresponds to the term token in our case.

The reasoning behind it is to discourage the generation of sequences that contain consecutive repeating n-grams: the decoding algorithm will penalise generated sequences that contain repeating pairs of words in the output.

Knowing this, let's apply an n-gram penalty with n = 2.
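A hedged sketch of what this could look like with the transformers generate API, again assuming the same model and prompt as in the greedy sketch above; num_beams=5 and no_repeat_ngram_size=2 mirror the settings described in the text:

```python
# Beam-search decoding with an n-gram penalty (5 beams, 128 new tokens,
# n = 2, as described in the text); model choice is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Germany is known for its", return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=5,             # keep the 5 most probable sequences at each step
    no_repeat_ngram_size=2,  # assign zero probability to any repeated 2-gram
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```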

The country's capital, Berlin, is the largest city in Europe, with a population of more than 8.5 million people.

The city is located in the former East Germany, which was divided into East and West Germany after World War II.

Today, Germany is a member of both the European Union and NATO, as well as the World Trade Organization and the Organization for Economic Cooperation and Development (OECD).<|endoftext|>

"""

This is the best completion of the input prompt we have extracted from the model so far in terms of coherence and compactness. Through n-gram penalisation, the output decoded with beam search became more human-like.

When should we use beam search and when greedy search? Where factualness is paramount, as in solving a math problem, key information extraction, summarisation or translation, greedy search should be preferred. However, when we want to achieve creative output and factuality is not our priority (as can be the case in story generation), beam search is often the better-suited approach.

Why exactly does your prompt matter? Because every word you choose, the sentence structure, and the layout of your instructions activate a different series of parameters in the deep layers of the large language model, and the probabilities will be formed differently for each different prompt. In essence, text generation is an expression of probability conditional on your prompt.

There are also alternative methods to prevent repetitions and influence the factuality/creativity of the generated text, such as truncating the distribution of the vocabulary or sampling methods. If you are interested in a more in-depth exploration of the subject, I'd highly recommend the article by Patrick von Platen on the HuggingFace blog.

The next and last article of this series will explore fine-tuning and reinforcement learning from human feedback, which played an important role in why pre-trained models succeeded in surpassing SOTA models on several benchmarks. I hope that in this blog post I was able to help you understand the reasoning behind prompt engineering better. Many thanks for reading. Until next time.


I Survived 3 Mass Layoffs This Year, Here’s What I Learned – Towards Data Science

Look at you, oh you handsome captain, steering the wheels of your life. Image by Author (DALL·E)

Imagine finally landing your dream job after years of hard work. You're at the top of the world, living the life, and feeling secure. Then, out of nowhere, layoffs hit.

What would you do? How would you feel in that moment?

This isn't just a hypothetical scenario; it's the harsh reality in today's tech world, affecting hundreds of thousands.

And this year, I got to experience it firsthand, not once, but three times!

Just two months into my dream job at Spotify, 600 people were suddenly laid off. Then, six months later, boom, another wave struck, taking more people with it.

I wasn't laid off, but these events still hit me like a big wake-up slap.

I realized I needed to take control of my own career, or else someone else would be doing it for me and I'd always be at the mercy of the corporate world's unpredictable slaps.

And I was right.

As I write this, we're riding a third wave, the largest one yet, because this time 17% of the workforce, around 1,500 people, are being let go.

It's a reality no one is safe from, which is why I want to share what I learned from this transformative experience.

Think of this as an exciting adventure, like you're a pirate setting sail. By the end of this story, you'll have gathered the essential insights you need to contemplate your own course. You'll seize command of your ship toward a path that can set you free in your professional odyssey.

Last January, I witnessed many individuals shattered by layoffs, haunted by the question of why they were chosen for this unwelcome fate.

The reality is that companies rarely share the criteria they use to decide who will leave and who will stay. You might never get that closure, for reasons that remain unknown.


Data management implications of the AI Act – DataScienceCentral.com – Data Science Central


Members of the European Parliament and the Council reached a provisional agreement on the Artificial Intelligence Act on December 9th, 2023, after years of debate and discussion. The AI Act is broad in scope and is intended to protect public welfare, digital rights, democracy, and the rule of law from the dangers of AI. In this sense, the Act underscores the need to ensure and protect the data sovereignty of both individuals and organizations.

On the data sovereignty regulation front, Europe's approach is comparable to California's on the vehicle emissions regulation front: carmakers design to the California emissions requirement, and by doing so make sure they're compliant elsewhere. "Much like the GDPR [the EU's General Data Protection Regulation, which went into effect in 2018], the AI Act could become a global standard. Companies elsewhere that want to do business in the world's second-largest economy will have to comply with the law," pointed out Melissa Heikkilä in a December 11, 2023 piece in the MIT Technology Review.

In November 2023, the Organisation for Economic Co-operation and Development's (OECD's) Council updated its definition of artificial intelligence. The European Parliament then adopted the OECD's definition, which is as follows:

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

Note the key phrase in that definition: AI systems infer how to generate outputs from the input they receive. In other words, AI systems are entirely dependent on the quality of their data input.

We can talk all we want to about trustworthy models, but when it comes to statistical models being trustworthy, inputs rule. High data quality is a prerequisite. When the input is garbage, the output will be garbage also.

Most of the time, data scientists grapple with the input before training their models, so the output they end up with often seems reasonable. But the output, despite their efforts, can be problematic in ways that aren't straightforward. How to solve that problem? Make sure the data quality is high to begin with, before it gets to the data scientist. And then make sure the data scientists preserve that quality by preserving context throughout the rest of the process.

The best way to think about ensuring data quality up front is domain by domain. Each business domain needs to produce relevant, contextualized data specific to that domain. Then at a higher level of abstraction, the organization needs to knit that context together to be able to scale data management.

What results is an input model of the business, described as consumable data, that accompanies the rest of the data when fed to machines.

With specific context articulated in the input data, the data becomes explicit enough for machines to associate the data supplied as input with a given context. Explicit relationships stated as facts in domain-specific data are what help to create sufficient context. They're what distinguishes tennis matches from kitchen matches.
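As an illustration of what "explicit relationships stated as facts" can look like in practice, here is a minimal sketch in Python using the rdflib library. The article does not prescribe any particular tool, and all of the entity names and URIs below are hypothetical; the point is simply that the two senses of "match" are distinguished by explicit, machine-readable relationships rather than by the surface word alone.

```python
# A minimal, hypothetical sketch of making relationships explicit as facts
# (triples) with rdflib; names and URIs are illustrative only.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/domain/")

g = Graph()
g.bind("ex", EX)

# Two entities that share the surface word "match" are disambiguated
# by stating their types and relationships explicitly.
g.add((EX.Wimbledon2023Final, RDF.type, EX.TennisMatch))
g.add((EX.Wimbledon2023Final, EX.playedAt, EX.CentreCourt))

g.add((EX.StrikeAnywhereMatch, RDF.type, EX.KitchenMatch))
g.add((EX.KitchenMatch, RDFS.subClassOf, EX.FireStarter))

# Serialize the contextualized facts so downstream systems receive explicit
# relationships rather than bare strings.
print(g.serialize(format="turtle"))
```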

Organizations need to spell things out for machines by feeding them contextualized facts about their businesses. Volumes and volumes of text, systematically accumulated, can deliver bits and pieces of context. But still, a good portion of that context will be missing from the input data. How to solve that problem? Those responsible for each domain's data can make each context explicit by making the relationships between entities explicit.

Once those relationships are explicit, each organization can connect the contexts for each domain together with a simplified model of the business as a whole, what's called an upper ontology.

Most organizations have been siloing data and trapping relationship information separately in applications because that's what existing data and software architecture mandates.

Knowledge graphs provide a place to bring the siloed data and the necessary relationship information for context together. These graphs, which can harness the power of automation in various ways, also provide a means of organization-wide access to a unified, relationship-rich whole. Instead of each app holding the relationship information for itself, the graph becomes the resource for that information too. That way, instance data and relationship data can evolve together.

Graphs facilitate the creation, storage and reuse of fully articulated, any-to-any relationships. The graph paradigm itself encourages data connections and reuse, in contrast to the data siloing and code sprawl of older data management techniques.

Intelligent data is data that describes itself so that machines don't have to guess what it means. That self-describing data in true knowledge graphs gives machines sufficient context so that they can produce accurately contextualized output. This addition of context is what makes the difference when it comes to AI accuracy. The larger, logically interconnected context, moreover, can become an organic, reusable resource for the entire business.


Reltio’s Vidhi Chugh Named "AI & Data Science Leader of the Year" – Yahoo Finance

Chugh honored at Women in Tech Global Awards for her industry leadership, contributions, innovation, and influence

REDWOOD SHORES, Calif., December 19, 2023--(BUSINESS WIRE)--Vidhi Chugh, Head of Product Management for AI/ML Solutions at Reltio, was named AI & Data Science Leader of the Year at this year's Women in Tech Global Awards. Ms. Chugh earned the distinction earlier this month for her "substantial contributions to the AI & Data Science field," and for "showcasing extraordinary leadership, innovative thinking, and notable influence" in both her company and the industry.

Nearly 1,200 Women in Tech Global Awards nominations were reviewed by a jury of industry leaders and experts from companies including Amazon Web Services, Microsoft, Palo Alto Networks, Airbnb, Amazon, and others.

"I feel incredibly humbled and honored to be chosen for this recognition by my peers," Ms. Chugh said. "I am so proud to be a member of the Women in Tech Network, which is doing great work around the world by providing inspiration, education, mentoring, networking and more in helping people build and grow their careers in technology. At Reltio, Im excited about our continued AI/ML innovation. We are transforming the realm of data unification and management as we continue to integrate advanced machine learning models and AI-driven capabilities into our solutions. "

The award won by Ms. Chugh was open to women currently working in AI- and data science-related roles and those who have demonstrated innovation in AI and data science, led successful projects, mentored others, engaged with the community, published research or articles, and spoken at industry events.

Ms. Chugh has co-authored 11 U.S. patents and has received a variety of awards during her career, including being named to the World's Top 200 Business & Technology Innovators and winning the prestigious Global AI Inclusion award. She is a highly regarded speaker at conferences worldwide, known as a strong advocate of awareness for ethical AI practices. She has been named to the "Women in AI Ethics" global directory, among other accomplishments.


The Women in Tech Network has over 80,000 active members in 181 countries. Previous Women in Tech winners have represented companies including IBM, DoorDash, Shell, Okta, Telefonica Germany, IGT, YouTube, Accenture UK, Zendesk, and others.

At Reltio, Ms. Chugh leads the company's artificial intelligence and machine learning (AI/ML) roadmap, focusing on enhancing capabilities, including entity resolution, using pre-trained machine learning models. She previously held similar positions at Walmart, Blue Yonder, Yatra, and All About Scale.

"Reltio" is a registered trademark of Reltio, Inc. All other trademarks are property of their respective owners. All Rights Reserved.

About Reltio

At Reltio, we believe data should fuel business success. Reltio's AI-powered data unification and management capabilities, encompassing entity resolution, multi-domain master data management (MDM), and data products, transform siloed data from disparate sources into unified, trusted, and interoperable data. The Reltio Connected Data Platform delivers interoperable data where and when it's needed, empowering data and analytics leaders with unparalleled business responsiveness. Leading enterprise brands across multiple industries around the globe rely on our award-winning data unification and cloud-native MDM capabilities to improve efficiency, manage risk and drive growth.

Visit us at Reltio.com.

View source version on businesswire.com: https://www.businesswire.com/news/home/20231219191117/en/

Contacts

Alan Ryan, Allison Worldwide for Reltio, Reltio@allisonworldwide.com


FAIR Skies Ahead for Biomedical Data Project Looking to Benefit Research Community – Datanami

Dec. 22, 2023 – The San Diego Supercomputer Center at UC San Diego, along with the GO FAIR Foundation, the National Center for Atmospheric Research, the Ronin Institute and other partners, will conduct data landscaping work funded by the Frederick National Laboratory for Cancer Research, operated by Leidos Biomedical Research, Inc., on behalf of the National Institute of Allergy and Infectious Diseases (NIAID). SDSC's Research Data Services Director Christine Kirkpatrick leads the GO FAIR U.S. Office at SDSC and serves as PI for the new project.

The NIAID Data Landscaping and FAIRification project seeks to benefit biomedical researchers and the broader community that generate and analyze infectious, allergic and immunological data. Using the FAIR Principles as a guide, the project team, which offers a broad background in ensuring that metadata (a set of data that describes and gives information about other data) for biomedical research is findable, accessible, interoperable and reusable (FAIR), will provide guidance on approaches to enhance the quality of metadata within NIAID- and NIH-supported repositories and resources that harbor data and metadata.

Structured trainings and guidance will be offered to support stakeholders, including components from the model pioneered by GO FAIR leveraging established M4M workshops and adopting FAIR Implementation Profiles (FIPs). This work will be underpinned by interviews with stakeholders and an assessment to explore the relationship between FAIR resources and scientific impact. The initial period of the federally funded contract, which runs from Sept. 20, 2023 to Sept. 30, 2024, is valued at $1.3 million.

Highlights of the team's expertise include co-authoring the FAIR Guiding Principles, facilitating metadata for machines (M4M) workshops, developing the FAIR Implementation Profile approach, and contributing to improvements in data policy and metadata practices and standards.

"Our team is elated to be working with our NIAID project sponsors at the Office of Data Science and Emerging Technologies (ODSET) through Leidos Biomedical Research," remarked Kirkpatrick, PI of the landscaping project. "NIAID is renowned for its significant data resources and impactful scientific research. Having the chance to apply our collective expertise in research data management in support of the NIAID mission areas of infectious disease, allergy and immunology will be both impactful to the FAIR ecosystem and meaningful work for our team. Further, I believe this work will become more common in the future as organizations begin to see data as a strategic asset, rather than focus on the cost of storing it."

The project follows alongside another key project in the Leidos Biomedical Research portfolio, the NIAID Data Ecosystem Discovery Portal, led by The Scripps Research Institute. The project team will work hand in hand with the Scripps team to ensure repository improvements maximize the Discovery Portal's ability to search across the wide array of data assets produced by NIAID-funded research.

The project team includes co-authors of the 2016 FAIR Principles paper (Barend Mons and Erik Schultes), leaders in research data consortia, scholars in informatics and biomedical research, and pioneers in FAIR training, interoperability practices and methodology for assessing scientific impact. Team members are Chris Erdmann, Doug Fils, John Graybeal, Nancy Hoebelheinrich, Kathryn Knight, Natalie Meyers, Bert Meerman, Barbara Magagna, Keith Maull and Matthew Mayernik. These experts are complemented by world-class systems integrators and project managers from SDSC: Alyssa Arce, Julie Christopher and Kevin Coakley.

Source: Christine Kirkpatrick and Julie Christopher, SDSC Research Data Services
