
DeepMind releases massive database of 3D protein structures – STAT

With the advent of cheap genetic sequencing, the world of biology has been flooded with 2D data. Now, artificial intelligence is pushing the field into three dimensions.

On Thursday, Alphabet-owned AI outfit DeepMind announced it has used its highly accurate deep learning model AlphaFold2 to predict the 3D structure of 350,000 proteins, including nearly every protein expressed in the human body, from their amino acid sequences. Those predictions, reported in Nature and released to the public in the AlphaFold Protein Structure Database, are a powerful tool to unravel the molecular mechanisms of the human body and deploy them in medical innovations.

"This resource we're making available, starting at about twice as many predictions as there are structures in the Protein Data Bank, is just the beginning," said John Jumper, lead AlphaFold researcher at DeepMind, in a press call. The company intends to continue adding predicted structures to the database.


"When we reach the scale of 100 million predictions that cover all proteins, we're really starting to talk about transformative uses," he said.

One of those transformations may come in the database's application to drug discovery. In an uncommon move, DeepMind has chosen to make the database, released in partnership with the European Molecular Biology Laboratory, completely open source for any use.


"So we hope, actually, that drug discovery and pharma will use it," DeepMind CEO Demis Hassabis said during the call.

DeepMind's predictions could be of interest to AI-driven drug companies looking to hone their models, biotech startups hoping to expand their list of target proteins, and even companies engineering new designer enzymes.

"Whenever there's a breakthrough, I think rising tides lift all boats. And this opens up a super exciting era in structure-driven drug design," said Abraham Heifets, CEO of AI-driven drug discovery company Atomwise, which uses its own library of computationally inferred protein structures to find molecules that selectively bind with proteins involved in disease. "Having better information on the shape of a protein is how you design a molecule that fits into that protein really well, to shut down or arrest that disease process."

DeepMind had committed to opening up its work in November, after AlphaFold2 took home the top prize in the protein-folding prediction contest CASP, in what was hailed as a solution to the long-standing protein folding problem. But in the seven months since then, structural biologists got antsy waiting for the groundbreaking work to go public. As STAT reported last week, DeepMind raced to publish its open source code and methods in Nature, just as a group at the University of Washington published their own attempt at replicating AlphaFold's approach in Science.

With the database adding so many new structural predictions, researchers, from drug developers to basic scientists, will have a lot of new material to work with. "We'll look through it very quickly to see if there are proteins we're interested in that are suddenly enabled by this new dataset," said Heifets.

Jumper thinks the new tool will remove a difficult choice that plagues some biologists: If a protein structure isn't available, they could spend lots of time and money on physical experiments to figure it out (which still might not pan out), or they could simply go without and focus on functional studies. "Suddenly, the access to structures is going to increase dramatically," he told STAT. "I think that's really going to change how scientists approach these biological questions."

Still, these aren't plug-and-play structures: They're predictions, and they come with caveats that scientists will have to consider.

"Me as a biochemist, I'd like to understand: Is this a good model or not? What about this algorithm is confident or not?" said Frank von Delft, who leads protein crystallography at the University of Oxford's Centre for Medicines Discovery. "I think that will be the key. Can you tell me, 'Yeah, I kind of nailed it, and this one I'm struggling to nail, but this one is easy to get right'?"

To answer that question, DeepMind built measures into its predictions to help researchers determine whether to rely on the structures for their work. "Preparing the predictions has actually only been a small part of this work," DeepMind's Kathryn Tunyasuvunakool, lead author on the paper, said in the call. "Perhaps even more effort has gone into providing both local and global confidence metrics."

Across the board, AlphaFold2 predicted 58% of amino acids in the human proteome (all the proteins expressed by the human body) with confidence, and 35.7% with a very high degree of confidence. At that level, the model could nail not just the backbone of the protein, but the orientation of its side chains. The degree of confidence required will depend on how scientists are using the prediction. "If you were looking at, say, the active site of an enzyme, you would want the residues involved to be in that highest confidence bracket," said Tunyasuvunakool, "but actually there's an awful lot of utility even in the next-highest confidence bracket."
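As a purely illustrative sketch of how a researcher might act on those confidence brackets: the AlphaFold database stores the per-residue confidence score (pLDDT, on a 0-100 scale) in the B-factor column of each downloadable PDB file, so a few lines of Python with Biopython can count how many residues of a prediction clear the "very high" (90 and above) and "confident" (70 and above) thresholds. The file name below is a placeholder, not a specific prediction discussed in the article.

```python
# Minimal sketch, assuming Biopython is installed and a prediction has been
# downloaded from the AlphaFold DB, which stores per-residue pLDDT confidence
# in the B-factor column of the PDB file.
from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
structure = parser.get_structure("prediction", "AF-XXXXXX-F1-model_v1.pdb")  # placeholder file name

plddt = []
for residue in structure.get_residues():
    atoms = list(residue.get_atoms())
    if atoms:
        plddt.append(atoms[0].get_bfactor())  # same pLDDT value on every atom of a residue

very_high = sum(1 for s in plddt if s >= 90)  # "very high" confidence bracket
confident = sum(1 for s in plddt if s >= 70)  # "confident" bracket
print(f"{very_high}/{len(plddt)} residues at pLDDT >= 90, "
      f"{confident}/{len(plddt)} residues at pLDDT >= 70")
```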

"It is kind of overwhelming what they can do," said Arne Elofsson, a bioinformatician at Stockholm University.

The AlphaFold database doesn't spell doom for experimental biologists, those who painstakingly determine the physical structure of proteins using methods like X-ray crystallography, cryo-electron microscopy, and nuclear magnetic resonance spectroscopy. "For many applications, there will be a need to validate the structures proposed by these models," said Elofsson.

But as predicted structures become more accepted, the AlphaFold database could change the way structural biology prioritizes its work and even what it considers its gold standard.

"Normally in CASP we assume the experiment is the gold standard, and if you disagree you're wrong," said John Moult, a computational biologist at the University of Maryland who founded the contest. "And with DeepMind some of the time that's true, but quite a lot of the time not true." In other words, there's room for error in the physical experiments used to determine protein structure, and with a highly accurate prediction model, a computer could in some cases do the job better. "So I think that there's a lot to sort out there: When is a detail actually computationally better than the corresponding experimental result?"

That will be a philosophical question for the field to confront over time, especially as AlphaFold's approach continues to develop. DeepMind made massive gains between its first entry in 2018's CASP competition, with AlphaFold1, and AlphaFold2 in 2020. "This is sort of v2.1 in a way, and we expect there will be more improvements over time," said Hassabis, adding that DeepMind may update the database as more experimental protein structures are solved or as the computational model continues to be developed.

As the database expands, so too could the set of structures that could be applied to drug discovery. "A thing that people don't really know or think about is that there's 20,000 human genes, but only 4% of those have ever had a drug approved by the FDA," said Heifets. "So we have many more protein targets that we could go after than we've ever had medicine brought to bear against." DeepMind has established a partnership with the Drugs for Neglected Diseases Initiative to develop approaches for Chagas disease and leishmaniasis.

But there are also uses for the database that are as yet unseen. "AlphaFold is a paradigm change in the level of accuracy which biologists can now expect, which will unlock other applications," Pushmeet Kohli, DeepMind's head of AI for science, told STAT. "Which is why we wanted to make AlphaFold broadly accessible: So the community would not just leverage it for existing applications, like in drug discovery, but other applications they might not even have been thinking about until now."


DeepMind's AI uncovers structure of 98.5 per cent of human proteins – News Nation USA

By Matthew Sparkes

Determining the delicate folds of proteins traditionally takes ages, but DeepMind's AI speeds that up

DeepMind

It took decades of painstaking research to map the structure of just 17 per cent of the proteins used within the human body, but less than a year for UK-based AI company DeepMind to raise that figure to 98.5 per cent. The company is making all this data freely available, which could lead to rapid advances in the development of new drugs.

Determining the complex, crumpled shape of proteins based on the sequence of amino acids that make them has been a huge scientific hurdle. Some amino acids are attracted to others, some are repelled by water, and the chains form intricate shapes that are hard to calculate accurately. Understanding these structures enables new, highly targeted drugs to be designed that bind to specific parts of proteins.

Genetic research had long provided the ability to determine the sequence of a protein, but an efficient way of finding the shape, crucial to understanding its properties, has proven elusive. Although supercomputers and distributed computing projects have been put to work on the problem, they have failed to make significant progress.

DeepMind published research last year that proved that AI can solve the problem quickly. Its AlphaFold neural network was trained on sections of previously solved protein shapes and learned to deduce the structure of new sequences, which were then checked against experimental data.

Since then, the company has been applying and refining the technology to thousands of proteins, beginning with the human proteome, proteins relevant to covid-19, and others that will most benefit immediate research. It is now releasing the results in a database created in partnership with the European Molecular Biology Laboratory.

DeepMind has mapped the structure of 98.5 per cent of the 20,000 or so proteins in the human body. For 35.7 per cent of these, the algorithm gave a confidence of over 90 per cent accuracy in predicting its shape.

The company has released more than 350,000 protein structure predictions in total, including those for 20 additional model organisms that are important for biological research, from Escherichia coli to yeast. The team hopes that within months it can add almost every sequenced protein known to science, more than 100 million structures.

John Moult at the University of Maryland says the rise of AI in the area of protein folding had been a profound surprise.

"It's revolutionary in a sense that's hard to get your head around," he says. "If you're working on some rare disease and you never had a structure, now you'll be able to go and look at structural information which was basically very, very hard or impossible to get before."

Demis Hassabis, chief executive and founder of DeepMind, says that AlphaFold, which is composed of around 32 separate algorithms and has been made open source, is now solving protein shapes in minutes or, in some cases, seconds, using hardware no more sophisticated than a standard graphics card.

"It takes one [graphics processing unit] a few minutes to fold one protein, which of course would have taken years of experimental work," he says. "We're just going to put this treasure trove of data out there. It's a little bit mind blowing in a way because going from the breakthrough of creating a system that can do that to actually producing all the data has only been a matter of months. We hope it's going to become a sort of standard tool that all biologists around the world use."

The team also added a confidence measure to all structure predictions, which Hassabis says he felt was vital given that the results will be the basis for research efforts. Hassabis believes that, for some portion of human proteins, lower confidence scores on the predicted structures could be down to errors in the sequence or perhaps something intrinsic about the biology, such as proteins that are inherently disordered or unpredictable. The remaining 1.5 per cent of the human proteome, for which no structure has been published, consists of proteins with sequences longer than 2,700 segments, which were excluded for the time being to minimise runtime.

Journal reference: Nature, DOI: https://www.nature.com/articles/s41586-021-03828-1


Good Morning, News: $600,000 Settlement for 2017 PPB Killing, Deep Pockets Try Influencing MultCo DA, and Everything is GREAT at the Olympics! – The…

The Mercury provides news and fun every single day, but your help is essential. If you believe Portland benefits from smart, local journalism and arts coverage, please consider making a small monthly contribution, because without you, there is no us. Thanks for your support!


Multnomah County District Attorney Mike Schmidt (Nathan Howard / Getty Images)

Good morning, Portland! Reminder: There's only one week left in July, meaning there's only one week left to enjoy boozy $5 slushies. Please act accordingly!!!

Here are the headlines.

You'll want to read this story about powerful Portland businessmen trying to convince District Attorney Mike Schmidt to prosecute more protesters arrested on bullshit charges. It's frustrating as hell, but ultimately gratifying:

Here's another environmental phenomenon to worry about: Oregon is experiencing a "hypoxia season" (when oxygen levels drop to low levels in the ocean off the Oregon coast) that's much earlier than usual. That could mean trouble for both crabs and bottom-dwelling fish off the coast.

Portland City Council unanimously approved a $600,000 settlement agreement Wednesday to the family of Terrell Johnson, a 24-year-old man killed by a Portland police officer in 2017. Johnson died on May 10, 2017, after being chased on foot by former PPB officer Samson Ajir from the SE Flavel MAX platform.

A Portland police officer shot and wounded a member of the public Tuesday evening at a convenience store in Northwest Portland, the fourth shooting by PPB this year. New information is still coming out, but Alex Zielinski has more details on the shooting.

With limited fire-fighting resources, some Oregonians are forced to take matters into their own hands:

Disturbing headline of the day, courtesy of NBC News: "As GOP supporters die of Covid, the party remains split in its vaccination message."

NPR has a report out about a new trend with the United States Supreme Court: Last month, the Court twice ruled in favor of giving the President more power over federal regulatory agencies, such as the United States Patent and Trademark Office or the Federal Housing Finance Agency. This means that the agencies, which are meant to simply enforce the rules, could become more overtly political, depending on the whims of whoever happens to be President at the time.

Looking forward to a few months from now, when I can sit back and let an AI bot write this column:

Let's check in on the Tokyo Olympics, where everything is going great, the athletes are happy and healthy, and the world is coming together to enjoy some sports! Oh, what's that? The opening ceremony director was fired for making Holocaust jokes? Yeah, okay, sounds about right.

And finally, let's all sit in awe of this fast-acting teen for a moment:


Data Science is Here to Spearhead Organizations Through Tough Competition – Analytics Insight

Data science, as a field, started getting recognition in the early 2000s. But it took an entire pandemic to create the demand it now has. Organizations that were reluctant to embrace digital transformation and use modern technologies like data science are now accelerating their adoption of this analytical technology. It won't be wrong to say that every business across industries, be it manufacturing, automotive, retail, or pharmaceutical, is leveraging the capabilities of data science to gain a competitive edge. This increasing demand is resulting in a flood of data science jobs.

Those who have some knowledge of the field are familiar with the fact that data science professionals are of utmost importance to organizations. Data engineer, data analyst, and data scientist are the roles flooding job portals. As technology develops each year, the skills required of data science professionals evolve with it. For future generations who will be part of dynamic workforces, keeping up with the latest tech trends, in this case data science, is crucial.

From an organization's point of view, data science brings many advantages to the table. Firstly, it helps businesses make better decisions using data-driven approaches. It's a data professional's responsibility to be a trusted advisor to the organization's top management and present the data and metrics that will help teams make informed decisions. Beyond that, data science capabilities also help businesses predict favorable outcomes and forecast potential growth opportunities.

At the end of the day, the main goal of any organization is to earn profits. A data scientist puts his or her skills to use to explore the available data, analyze which business processes work and which don't, and prescribe strategies that improve overall performance and customer engagement and result in greater ROI. A data professional will also help employees understand their tasks, improve on them, and devote their efforts to work that makes a substantial difference.

For every company that sells products and services, it is crucial to ensure those solutions reach the right audience. Instead of relying on assumptions, data science helps companies identify the right target audience. With a thorough analysis of the company's data sources and in-depth knowledge of the company and its goals, data science skills help teams target the right audience and refine existing strategies for better sales. A data professional's knowledge of the dynamic market, gained through data analysis, can also drive product innovation.

Before everything else, efficient and skilled employees make or break an organization. Data scientists also help recruiters source the right profiles from the available talent. Working across social media, corporate databases, and job portals, data professionals should possess the skills to sift through the data points and identify the right candidates for the right roles.

With these advantages and many more, data science is an invaluable asset for organizations. Hence, this field is a lucrative career option that the future generation must prepare for, if they want to make their place in the tech industry. In this magazine edition, Analytics Insight is putting a spotlight on the most prominent analytics and data science institutes that are guiding young tech leaders with the right skills to ace the field of data science. With digital transformation becoming an essential part of every established and upcoming business, the demand for data science professionals is only going to grow.


Behind the scenes: A day in the life of a data scientist – TechRepublic

Helping others use data is "like giving them a superpower," says the senior data scientist at an ag-tech startup, Plenty.

Data Scientist Dana Seidel at work.

Image: Dana Seidel

Dana Seidel was "traipsing around rural Alberta, following herds of elk," trying to figure out their movement patterns, what they ate, what brought them back to the same spot, when she had an epiphany: Data could help answer these questions.

SEE: Snowflake data warehouse platform: A cheat sheet (free PDF) (TechRepublic)

At the time, enrolled in a master's program at the University of Alberta, she was interested in tracking the movement of deer and elk and other central foragers. Seidel realized that she could use her math and ecology background at Cornell University to help evaluate a model that could answer these questions. She continued her studies, earning a Ph.D. at University of California Berkeley related to animal movement and the spread of diseases, which she monitored, in part, by collecting data from collars. Kind of like a Fitbit, Seidel explained, "tracking wherever you go throughout the day," yielding GPS data points that could connect to land data, such as satellite images, offering a window into the movement of this wildlife.

Seidel, 31, has since transitioned from academia to the startup world, working as the lead data scientist at Plenty, an indoor vertical farming company. Or, as she would call herself, a "data scientist who is interested in spatial-temporal time series data."

SEE: Behind the scenes: A day in the life of a freelance JavaScript developer (TechRepublic)

Seidel was born in Tennessee, but grew up in Kansas. She's 31, which she said is "old" for the startup world. As someone who spent her twenties "investing in one career path and then switching over," she doesn't necessarily have the same industry experience as her colleagues. So while she is grateful for her experience, a degree is not a necessity, she said.

"I'm not sure that my Ph.D. helps me in my current job," she said. One area where it did help her, however, was by giving her access to internshipsat Google Maps, in Quantitative Analysts and RStudiowhere she gained experience in software development.

"But I don't think writing more papers about anthrax and zebras really convinced anybody that I was a data scientist," she said.

Seidel learned the programming language R, which she loved, in college, and in her master's program started building databases. She said she "generally taught myself alongside these courses to use the tools." The biggest skill of being a data scientist "may very well just be knowing how to Google things," she said. "That's all coding really is, creative problem-solving."

SEE: Job description: Chief data officer (TechRepublic Premium)

The field of data science is about a decade old, Seidel said; previously, it was statistics. "The idea of having somebody who has a statistics background or understands inferential modeling or machine learning has existed for a lot longer than we've called it a data scientist," she said, and a master's in data science didn't exist until the last year of her Ph.D.

Additionally, "data scientist" is very broad. Among data scientists, many different jobs can exist. "There are data scientists that focus very much on advanced analytics. Some data scientists only do natural language processing," she said. And the work emcompasses many diverse skills, she said, including "project management skills, data skills, analysis skills, critical thinking skills."

Seidel has mentored others interested in getting into the field, starting with a weekly Women in Machine Learning and Data Science coffee hour at Berkeley. The first piece of advice? "I would tell them: 'You have skills,'" Seidel said. Many young students, especially women, don't realize how much they already know. "I don't think we communicate often to ourselves in a positive way, all of the things we know how to do, and how that might translate," she said.

For those interested in transitioning from academia to industry, she also advises getting experience in software development and best practices, which may have been missing from formal education. "If you understand things like standard industry practices, like version control and git and bash scripting a little bit so that you have some of that language, some of that knowledge, you can be a more effective collaborator." Seidel also recommends learning SQL, one of the easiest languages in her opinion, which she calls "the lingua franca of data analytics and data science. Even though I think it's something you can absolutely learn on the job, it's going to be the main way you access data if you're working in an industry data science team. They're going to have large databases with data and you need a way to communicate that," she said. She also recommends building skills, through things like the 25-day Advent of Code, and other ways to demonstrate a clean coding style. "It takes a good amount of legwork, and until you have your industry job it's unpaid legwork, but it can really help make you stand out," she said.
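As a purely illustrative example of the kind of query Seidel describes (the database file, table, and columns below are invented for the sketch), pulling data out of a warehouse often comes down to a few lines of SQL issued from Python:

```python
# Illustrative only: a hypothetical sensor_readings table queried through
# Python's built-in sqlite3 module; a production warehouse would use a
# different driver, but the SQL itself reads much the same.
import sqlite3

conn = sqlite3.connect("farm.db")  # hypothetical local database file
query = """
    SELECT room_id, AVG(temperature_c) AS avg_temp
    FROM sensor_readings
    WHERE recorded_at >= DATE('now', '-7 day')
    GROUP BY room_id
    ORDER BY avg_temp DESC;
"""
for room_id, avg_temp in conn.execute(query):
    print(room_id, round(avg_temp, 1))
conn.close()
```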

SEE: Top 5 things you need to know about data science (TechRepublic)

On a typical morning at her current job, working from home, Seidel is drinking coffee and answering Slack messages in her home office/quilting studio. She checks to see if there are questions about the data, something wrong with the dashboard, or a question about plant health. Software engineers working on the data may also have questions, she said. There's often a scrum meeting in the morning, and they operate with sprint teams (meeting every two weeks) and agile workflows.

"I have a pretty unique position where I can float between various data scrums we do, we have a farm performance scrum versus a perception team or a data infrastructure team," Seidel explained. "I can decide: What am I going to contribute to in this sprint?" Twice a week there's a leadership meeting, where she is on the software and data leads, and she can listen in on what else is being worked on, and what's coming up ahead, which she said is one of the most important meetings for her, since she can hear directly "when a change is happening on the software side or there's a new requirement coming out of ops for a software or for software or for data that's coming."

In the afternoon, she has a good block of development time, "to dig into whatever issue I'm working on that sprint," she said.

SEE: How to become a data scientist: A cheat sheet (TechRepublic)

Seidel manages the data warehouse and ensures data streams are "being surfaced to end users in core data models." Last week, she worked on the farm performance scrum, "validating measurements that are coming out of the farm, thinking ahead about the new measurements we need to be collecting, and thinking about the measurements that we have in our South San Francisco farm, measurements streaming in from a couple of thousand devices." She needs to ensure accurate measurement streams, covering everything from temperature to irrigation, to ensure plant health and answer questions like: "Why did last week's arugula do better than this week's arugula?"

The primary task is to know if they're measuring the right thing, and to push back and say, "Oh, OK, what is it that you want that data to be explaining? What is the question you're asking?" She needs to stay a few steps ahead, she said, and ask: "What are all the new data sources that I need to be aware of that we need to be supporting?"

The toughest part of the job? "I really hate not having the answer. I hate having to say, 'No, we don't measure that thing yet,' or, 'We'll have that in the next sprint.'" Balancing giving people the answers with giving them tools to access the answers themselves is a daily challenge, she said, with the ultimate goal of making data accessible.

And saying, "Oh, yes, that data is there and it's this simple query," or, "Oh, have you seen this tool I built a year ago that can solve this problem?" is really gratifying.

"Helping someone learn how to ask and answer questions from data is like giving them a superpower," Seidel said.



The Biggest Data Science News Items During the First Half of 2021 – Solutions Review

Our editors curated this list of the biggest data science news items during the first half of 2021, as highlighted on Solutions Review.

Data science is one of the fastest-growing fields in America. Organizations are employing data scientists at a rapid rate to help them analyze increasingly large and complex data volumes. The proliferation of big data and the need to make sense of it all has created a vortex where all of these things exist together. As a result, new techniques, technologies, and theories are continually being developed to run advanced analysis, and they all require development and programming to ensure a path forward.

Part of Solutions Review's ongoing analysis of the big data marketplace includes covering the biggest data science news stories which have the greatest impact on enterprise technologists. This is a curated list of the most important data science news stories from the first half of 2021. For more on the space, including the newest product releases, funding rounds, and mergers and acquisitions, follow our popular news section.

Databricks raised $1 billion in Series G funding in response to the rapid adoption of its unified data platform, according to a press release. The capital injection, which follows a raise of $400 million in October 2019, puts Databricks at a $28 billion valuation. The round was led by new investor Franklin Templeton with participation from Amazon Web Services, CapitalG, and Salesforce Ventures. The funding will enable Databricks to move ahead with additional product innovations and scale support for the lakehouse data architecture.

In a media statement, Databricks co-founder and CEO Ali Ghodsi said: "We see this investment and our continued rapid growth as further validation of our vision for a simple, open and unified data platform that can support all data-driven use cases, from BI to AI. Built on a modern lakehouse architecture in the cloud, Databricks helps organizations eliminate the cost and complexity that is inherent in legacy data architectures so that data teams can collaborate and innovate faster. This lakehouse paradigm is what's fueling our growth, and it's great to see how excited our investors are to be a part of it."

OmniSci recently announced the launch of OmniSci Free, a full-featured version of its analytics platform available for use at no cost. OmniSci Free will enable users to utilize the full power of the OmniSci Analytics Platform, which includes OmniSciDB, OmniSci Render Engine, OmniSci Immerse, and the OmniSci Data Science Toolkit. The solution can be deployed on Linux-based servers and is generally adequate for datasets of up to 500 million records. Three concurrent users are permitted.

In a media statement on the news, OmniSci co-founder and CEO Todd Mostak said: "Our mission from the beginning has been to make analytics instant, powerful, and effortless for everyone, and the launch of OmniSci Free is our latest step towards making our platform accessible to an even broader audience. While our open source database has delivered significant value to the community as an ultra-fast OLAP SQL engine, it has become increasingly clear that many use cases heavily benefit from access to the capabilities of our full platform, including its massively scalable visualization and data science capabilities."

DataRobot recently announced the release of DataRobot 7, the latest version of its flagship AI and machine learning platform. The release is highlighted by MLOps remote model challengers, which allow customers to challenge production models no matter where they are running and regardless of the framework or language in which they were built. Additionally, DataRobot 7 also offers a choose-your-own-forecast-baseline capability that lets users compare the output of their forecasting models with predictions from DataRobot Automated Time Series.

In a media statement, DataRobot SVP of Product Nenshad Bardoliwalla said: "Through ongoing engagement with our customers, we've developed an intimate understanding of the challenges they face, as well as the opportunities they have, with AI. Our latest platform release has been specifically designed to help them seize the transformative power of AI and advance on their journeys to becoming AI-driven enterprises."

Tableau announced the release of Tableau 2021.1, the latest version of the company's flagship business intelligence and data analytics offering. The release is highlighted by the introduction of business science, a new class of AI-powered analytics that enables business users to take advantage of data science techniques. Business science is delivered via Einstein Discovery. Other key additions aim to simplify analytics at scale and expand the Tableau ecosystem to help different user personas understand their environment.

In a media statement about the news, Tableau Chief Product Officer Francois Ajenstat said: "Data science has always been able to solve big problems, but too often that power is limited to a few select people within an organization. To build truly data-driven organizations, we need to unlock the power of data for as many people as possible. Democratizing data science will help more people make smarter decisions faster."

Dataiku recently announced the release of Dataiku 9, the latest version of the company's flagship data science and machine learning platform. The release is highlighted by best practice guardrails to prevent common pitfalls, model assertions to capture and test known use cases, what-if analysis to interactively test model sensitivity, and a new model fairness report to augment existing bias detection methods when building responsible AI models. Dataiku raised $100 million in Series D funding last summer.

The release notes add that, for business analysts engaged in data preparation tasks, the highly requested fuzzy join recipe makes it easy to join close-but-not-equal columns, an updated formula editor requires less time to learn, and updated date functions simplify time and date preparation. The notes also tout support for the Dash application framework.

Domino Data Lab recently announced a series of new integrated solutions and product enhancements with NVIDIA, according to a press release. The technologies were unveiled at the NVIDIA GTC Conference. Domino's latest release is highlighted by its availability for the NetApp ONTAP AI Integrated Solution, which upgrades data science productivity with software that streamlines the workflow while maximizing infrastructure utilization. As such, Domino has been tested and validated to run on the packaged offering and is available via the NVIDIA Partner Network.

The new platform automatically creates and manages multi-node clusters and releases them when training is done. Domino currently supports ephemeral clusters using Apache Spark and Ray, and will be adding support for Dask in a product release later in the year. Administrators can also divide a single NVIDIA DGX A100 GPU into multiple instances or partitions to support a variety of users with Domino's support. According to the announcement, this allows 7x the number of data scientists to run a Jupyter notebook attached to a single GPU versus without MIG.

Explorium recently announced that it has secured $75 million in Series C funding, according to a press release on the company's website. The funding is Explorium's second round in the last nine months and brings the company's total capital raised to more than $125 million since its founding in 2017. Explorium doubled its customer base during the last 16 months.

In a media statement on the news, Explorium CEO Maor Shlomo said: "As we saw last year, machine learning models and tools for advanced analytics are only as good as the data behind them. And often that data is not sufficient. We're addressing a business-critical need, guiding data scientists and business leaders to the signals that will help them make better predictions and achieve better business outcomes."

Alteryx recently announced product enhancements across its product line of data science and analytics tools, as well as the release of Alteryx Machine Learning. The company broke the news at Alteryx Inspire Virtual, its annual user conference. Currently available in early access, Alteryx Machine Learning provides guided, explainable, and fully automated machine learning (AutoML). Key features include feature engineering and deep feature synthesis, automated insight generation, and an Education Mode that offers data science best practices.

In a media statement on the news, Alteryx Chief Product Officer Suresh Vittal said: "We are investing deeply in analytics and data science automation in the cloud, starting with Designer Cloud, Alteryx Machine Learning and AI introduced today. We remain focused on being the best at democratizing analytics so millions of people can leverage the power of data."


Scaling AI and data science: 10 smart ways to move from pilot to production – VentureBeat

Presented by Intel

"Fantastic! How fast can we scale?" Perhaps you've been fortunate enough to hear or ask that question about a new AI project in your organization. Or maybe an initial AI initiative has already reached production, but others are needed quickly.

At this key early stage of AI growth, enterprises and the industry face a bigger, related question: How do we scale our organizational ability to develop and deploy AI? Business and technology leaders must ask: What's needed to advance AI (and by extension, data science) beyond the craft stage, to large-scale production that is fast, reliable, and economical?

The answers are crucial to realizing ROI, delivering on the vision of AI everywhere, and helping the technology mature and propagate over the next five years.

Unfortunately, scaling AI is not a new challenge. Three years ago, Gartner estimated that less than 50% of AI models make it to production. The latest message was depressingly similar. Launching pilots is deceptively easy, analysts noted, but deploying them into production is notoriously challenging. A McKinsey global survey agreed, concluding: "Achieving (AI) impact at scale is still very elusive for many companies."

Clearly, a more effective approach is needed to extract value from the $327.5 billion that organizations are forecast to invest in AI this year.

As the scale and diversity of data continues to grow exponentially, data science and data scientists are increasingly pivotal to manage and interpret that data. However, the diversity of AI workflows means that the data scientists need expertise across a wide variety of tools, languages, and frameworks that focus on data management, analytics modeling and deployment, and business analysis. There is also increased variety in the best hardware architectures to process the different types of data.

Intel helps data scientists and developers operate in this wild wild West landscape of diverse hardware architectures, software tools, and workflow combinations. The company believes the keys to scaling AI and data science are an end-to-end AI software ecosystem built on the foundation of the open, standards-based, interoperable oneAPI programming model, coupled with an extensible, heterogeneous AI compute infrastructure.

"AI is not isolated," says Heidi Pan, senior director of data analytics software at Intel. "To get to market quickly, you need to grow AI with your application and data infrastructure. You need the right software to harness all of your compute."

She continues, "Right now, however, there are lots of silos of software out there, and very little interoperability, very little plug and play. So users have to spend a lot of their time cobbling multiple things together. For example, looking across the data pipeline, there are many different data formats, libraries that don't work with each other, and workflows that can't operate across multiple devices. With the right compute, software stack, and data integration, everything can work seamlessly together for exponential growth."

Creation of an end-to-end AI production infrastructure is an ongoing, long-term effort. But here are 10 things enterprises can do right now that can deliver immediate benefits. Most importantly, theyll help unclog bottlenecks with data scientists and data, while laying the foundations for stable, repeatable AI operations.

Consider the following from Rise Labs at UC Berkeley. Data scientists, they note, prefer familiar tools in the Python data stack: pandas, scikit-learn, NumPy, PyTorch, etc. However, these tools are often unsuited to parallel processing or terabytes of data. So should you adopt new tools to make the software stack and APIs scalable? "Definitely not!" says Rise. They calculate that it would take up to 200 years to recoup the upfront cost of learning a new tool, even if it performs 10x faster.

These astronomical estimates illustrate why modernizing and adapting familiar tools are much smarter ways to solve data scientists' critical AI scaling problems. Intel's work through the Python Data API Consortium, the modernizing of Python via numba's parallel compilation and Modin's scalable data frames, the Intel Distribution of Python, and the upstreaming of optimizations into popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet and gradient boosting frameworks such as xgboost and catboost are all examples of Intel helping data scientists get productivity gains while maintaining familiar workflows.
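For instance, Modin is designed as a drop-in replacement for pandas, so existing dataframe code keeps its familiar shape while operations are parallelized behind the scenes. A minimal sketch (the CSV file and column names are illustrative):

```python
# Minimal sketch: swapping the pandas import for Modin parallelizes dataframe
# operations across available cores without changing the rest of the code.
import modin.pandas as pd  # instead of: import pandas as pd

df = pd.read_csv("large_dataset.csv")             # illustrative file
summary = df.groupby("category")["value"].mean()  # illustrative columns
print(summary.head())
```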

Hardware AI accelerators such as GPUs and specialized ASICs can deliver impressive performance improvements. But software ultimately determines the real-world performance of computing platforms. Software AI accelerators, performance improvements that can be achieved through software optimizations for the same hardware configuration, can enable large performance gains for AI across deep learning, classical machine learning, and graph analytics. This orders of magnitude software AI acceleration is crucial to fielding AI applications with adequate accuracy and acceptable latency and is key to enabling AI Everywhere.

Intel optimizations can deliver drop-in 10-to-100x performance improvements for popular frameworks and libraries in deep learning, machine learning, and big data analytics. These gains translate into meeting real-time inference latency requirements, running more experimentation to yield better accuracy, cost-effective training with commodity hardware, and a variety of other benefits.

Below are example training and inference speedups with the Intel Extension for Scikit-learn, which accelerates the most widely used package for data science and machine learning. Note that accelerations ranging up to 322x for training and 4,859x for inference are possible just by adding a couple of lines of code, as shown in the sketch after the figures.

Figure 1. Training speedup with Intel Extension for Scikit-learn over the original package

Figure 2. Inference speedup with Intel Extension for Scikit-learn over the original package
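The "couple of lines of code" mentioned above refers to the patching call exposed by the Intel Extension for Scikit-learn (the sklearnex package). A minimal sketch of typical usage, with an illustrative clustering workload:

```python
# Minimal sketch: patch_sklearn() swaps in Intel-optimized implementations,
# after which ordinary scikit-learn code runs unchanged.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, random_state=0)  # illustrative synthetic data
model = KMeans(n_clusters=8, random_state=0).fit(X)
print(model.inertia_)
```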

Data scientists spend a lot of time trying to cull and downsize data sets for feature engineering and models in order to get started quickly despite the constraints of local compute. But not only do the features and models not always hold up with data scaling, they also introduce a potential source of human ad hoc selection bias and probable explainability issues.

New cost-effective persistent memory makes it possible to work on huge, terabyte-sized data sets and bring them quickly into production. This helps with speed, explainability, and accuracy that come from being able to refer back to a rigorous training process with the entire data set.

While CPUs and the vast applicability of their general-purpose computing capabilities are central to any AI strategy, a strategic mix of XPUs (GPUs, FPGAs, and other specialized accelerators) can meet the specific processing needs of todays diverse AI workloads.

"The AI hardware space is changing very rapidly," Pan says, "with different architectures running increasingly specialized algorithms. If you look at computer vision versus a recommendation system versus natural language processing, the ideal mix of compute is different, which means that what it needs from software and hardware is going to be different."

While using a heterogeneous mix of architectures has its benefits, you'll want to eliminate the need to work with separate code bases, multiple programming languages, and different tools and workflows. According to Pan, the ability to reuse code across multiple heterogeneous platforms is crucial in today's dynamic AI landscape.

Central to this is oneAPI, a cross-industry unified programming model that delivers a common developer experience across diverse hardware architectures. Intel's data science and AI tools, such as the Intel oneAPI AI Analytics Toolkit and the Intel Distribution of OpenVINO toolkit, are built on the foundation of oneAPI and deliver hardware and software interoperability across the end-to-end data pipeline.

Figure 3. Intel AI Software Tools

The ubiquitous nature of laptops and desktops make them a vast untapped data analytics resource. When you make it fast enough and easy enough to instantaneously iterate on large data sets, you can bring that data directly to the domain experts and decision makers without having to go indirectly through multiple teams.

OmniSci and Intel have partnered on an accelerated analytics platform that uses the untapped power of CPUs to process and render massive volumes of data at millisecond speeds. This allows data scientists and others to analyze and visualize complex data records at scale using just their laptops or desktops. This kind of direct, real-time decision making can cut down time to insight from weeks to days, according to Pan, further speeding production.

AI development often starts with prototyping on a local machine but invariably needs to be scaled out to a production data pipeline on the data center or cloud due to expanding scope. This scale out process is typically a huge and complex undertaking, and can often lead to code rewrites, data duplication, fragmented workflow, and poor scalability in the real world.

The Intel AI software stack lets teams scale out their development and deployment seamlessly from edge and IoT devices to workstations and servers to supercomputers and the cloud. Explains Pan: "You make your software that's traditionally run on small machines and small data sets run on multiple machines and big data sets, and replicate your entire pipeline environments remotely." Open source tools such as Analytics Zoo and Modin can move AI from experimentation on laptops to scaled-out production.
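As a hedged sketch of what that laptop-to-cluster path can look like with Modin, the execution engine is selected through configuration rather than code changes; the backend choice and file below are illustrative assumptions, not details from the article:

```python
# Minimal sketch: Modin reads its execution backend (e.g. Dask or Ray) from the
# MODIN_ENGINE environment variable, so the same pandas-style script written on
# a laptop can later run against a distributed cluster without a rewrite.
import os
os.environ["MODIN_ENGINE"] = "dask"  # illustrative backend choice

import modin.pandas as pd

df = pd.read_csv("production_data.csv")  # illustrative file
print(df.describe())
```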

Throwing bodies at the production problem is not an option. The U.S. Bureau of Labor Statistics predicts that roughly 11.5 million new data science jobs will be created by 2026, a 28% increase, with a mean annual wage of $103,000. While many training programs are full, competition for talent remains fierce. As the Rise Institute notes: "Trading human time for machine time is the most effective way to ensure that data scientists are not productive." In other words, it's smarter to drive AI production with cheaper computers rather than expensive people.

Intel's suite of AI tools places a premium on developer productivity while also providing resources for seamless scaling with extra machines.

For some enterprises, growing AI capabilities out of their existing data infrastructure is a smart way to go. Doing so can be the easiest way to build out AI because it takes advantage of data governance and other systems already in place.

Intel has worked with partners such as Oracle to provide the plumbing to help enterprises incorporate AI into their data workflow. Oracle Cloud Infrastructure Data Science environment, which includes and supports several Intel optimizations, helps data scientists rapidly build, train, deploy, and manage machine learning models.

Intel's Pan points to Burger King as a great example of leveraging existing big data infrastructure to quickly scale AI. The fast food chain recently collaborated with Intel to create an end-to-end, unified analytics/AI recommendation pipeline and rolled out a new AI-based touchscreen menu system across 1,000 pilot locations. A key enabler: Analytics Zoo, a unified big data analytics platform that allows seamless scaling of AI models to big data clusters with thousands of nodes for distributed training or inference.

It can take a lot of time and resources to create AI from scratch. Opting for the fast-growing number of turnkey or customized vertical solutions on your current infrastructure makes it possible to unleash valuable insights faster and at lower cost than before.

The Intel Solutions Marketplace and Intel AI Builders program offer a rich catalog of over 200 turnkey and customized AI solutions and services that span from edge to cloud. They deliver optimized performance, accelerate time to solution, and lower costs.

The District of Columbia Water and Sewer Authority (DC Water) worked with Intel partner Wipro to develop Pipe Sleuth, an AI solution that uses deep learning-based computer vision to automate real-time analysis of video footage of pipes. Pipe Sleuth was optimized for the Intel Distribution of OpenVINO toolkit and Intel Core i5, Intel Core i7, and Intel Xeon Scalable processors, and provided DC Water with a highly efficient and accurate way to inspect its underground pipes for possible damage.

Open and interoperable standards are essential to deal with the ever-growing number of data sources and models. Different organizations and business groups will bring their own data, and data scientists solving for disparate business objectives will need to bring their own models. Therefore, no single closed software ecosystem can ever be broad enough or future-proof enough to be the right choice.

As a founding member of the Python Data API consortium, Intel works closely with the community to establish standard data types that interoperate across the data pipeline and heterogeneous hardware, and foundational APIs that span across use cases, frameworks, and compute.

An open, interoperable, and extensible AI compute platform helps solve today's bottlenecks in talent and data while laying the foundation for the ecosystem of tomorrow. As AI continues to pervade across domains and workloads, and new frontiers emerge, the need for end-to-end data science and AI pipelines that work well with external workflows and components is immense. Industry and community partnerships that build open, interoperable compute and software infrastructures are crucial to a brighter, scalable AI future for everyone.

Learn More: Intel AI, Intel AI on Medium

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.


Top Data Science Jobs to Apply for this Weekend – Analytics Insight

Analytics Insight has selected the top data science jobs for applying this weekend.

Data science is an essential part of any industry today, given the massive amounts of data that are produced. Data science is one of the most debated topics in the industry these days. Its popularity has grown over the years, and companies have started implementing data science techniques to grow their business and increase customer satisfaction.

Location: Bengaluru, Karnataka

Human-led and tech-empowered since 2002, Walmart Global Tech delivers innovative solutions to the biggest retailer in the world, Walmart. By leveraging emerging technologies, the team creates omnichannel shopping experiences for customers across the globe and helps them save money and live better. The company is looking for an IN4 Data Scientist for Ad-tech. The position requires skills in building data science models for online advertising.

Apply here.

Location: India

Position Objective

The purpose of this role is to partner with the regional and global BI customers within RSR (who can include, but are not limited to, data engineering, BI support teams, operational teams, internal teams, and RSR clients) and provide business solutions through data. The position has operational and technical responsibility for reporting, analytics, and visualization dashboards across all operating companies within RSR, and will develop processes and strategies to consolidate, automate, and improve reporting and dashboards for external clients and internal stakeholders. As a Business Intelligence Partner, you will be responsible for overseeing the end-to-end delivery of regional and global account and client BI reporting. This will include working with data engineering to provide usable datasets, creating dashboards with meaningful insights and visualizations within our BI solution (DOMO), and ongoing communication and partnering with the BI consumers. Key to this is that, as a Business Intelligence Partner, you will have commercial and operational expertise and the ability to translate data into insights. You will use this to mitigate risk, find operational and revenue-generating opportunities, and provide business solutions.

Apply here.

Location: Hyderabad, Telangana

In the Data Scientist role within the Global Shared Services and Office of Transformation group at Salesforce, you will work cross-functionally with business stakeholders throughout the organization to drive data-driven decisions. This individual must excel in data and statistical analysis, predictive modeling, process optimization, building relationships in business and IT functions, problem-solving, and communication. They must act independently, own the implementation and impact of assigned projects, and demonstrate the ability to succeed in an unstructured, team-oriented environment. The ideal candidate will have experience working with large, complex data sets, experience in the technology industry, exceptional analytical skills, and experience in developing technical solutions.

Responsibilities

Partner with Shared Services Stakeholder organizations to understand their business needs and utilize advanced analytics to derive actionable insights

Find creative solutions to challenging problems using a blend of business context, statistical and ML techniques

Understand data infrastructure and validate data is cleansed and accurate for reporting requirements.

Work closely with the Business Intelligence team to derive data patterns/trends and create statistical models for predictive and scenario analytics

Communicate insights utilizing Salesforce data visualization tools (Tableau CRM and Tableau) and make business recommendations (cost-benefit, invest-divest, forecasting, impact analysis) with effective presentations of findings at multiple levels of stakeholders through visual displays of quantitative information

Partner cross-functionally with other business application owners on streamlining and automating reporting methods for Shared Services management and stakeholders.

Support the global business intelligence agenda and processes to make sure we provide consistent and accurate data across the organization

Collaborate with cross-functional stakeholders to understand their business needs, formulate a roadmap of project activity that leads to measurable improvement in business performance metrics/key performance indicators (KPIs) over time.

Apply here.

Job Description:

Gathers data, analyses it, and reports findings. Gathers data using existing formats and will suggest changes to these formats. Resolves disputes and acts as an SME (first escalation level).

Conducts analyses to resolve recurring or patterned information and data queries/problems.

Works within a variety of well-defined procedures and practices. Progress and results are supervised; informs management about analysis outcomes. Works autonomously within this scope, with regular steer required, e.g. on project scope and prioritization.

Supports stakeholders in understanding analyses/outcomes and using them on topics related to their own areas of expertise. Interaction with others requires tactful influencing and persuasion to explain and advise on the analyses performed.

The job holder identifies shortcomings in current processes, systems, and procedures within the assigned unit and suggests improvements. Analyses, proposes, and (where possible) implements alternatives.

Apply here.

Responsibilities

Develop analytical models to estimate annual, monthly, and daily platform returns and other key metrics, and track weekly AOP versus actual returns performance.

Monitor key OKR metrics for all departments and work closely with BI teams to maintain OKR dashboards across the organization.

Work with business teams (revenue, marketing, category, etc.) on preliminary hypothesis evaluation of returns leakages/inefficiencies in the system (category, revenue and pricing constructs, etc.).

Conduct regular analysis and experimentation to find areas of improvement in returns, maintaining a highly data-backed approach. Maintain monthly reporting and tracking of the SNOP process.

Influence various teams/stakeholders within the organization to meet goals & planning timelines.

Qualifications & Experience

B Tech/BE in Computer Science or equivalent from a tier 1 college with 1-3 years of experience.

Problem-solving skills: the ability to break a problem down into smaller parts and develop a solution approach, with an appreciation for math and business.

Strong analytical bent of mind with strong communication/persuasion skills.

Demonstrated ability to work independently in a highly demanding and ambiguous environment.

Strong attention to detail and exceptional organizational skills.

Strong knowledge of SQL, advanced Excel, and R

Apply here.

The rest is here:

Top Data Science Jobs to Apply for this Weekend - Analytics Insight

Read More..

Beware the 1% view of data science – ComputerWeekly.com

This is a guest blogpost by Shaun McGirr, AI Evangelist, Dataiku

As data science and AI become more widely used, two separate avenues of innovation are becoming clear. One avenue, written about and discussed publicly by individuals working at Google, Facebook and peer companies, depends on access to effectively infinite resources.

This generates a problem for further democratisation of AI: success stories told by the top echelon of data companies drown out the second avenue of innovation. There, smaller-scale data teams deliver stellar work in their own right, without the benefit of unlimited resources, and also need a share of the glory.

One thing is certain: a whole class of legacy IT issues don't plague global technology companies at anywhere near the scale of traditional enterprises. Some even staff entire data engineering teams to deliver ready-for-machine-learning data to data scientists, which is enough to make the other 99% of data scientists in the world salivate with envy.

Access to the right data, in a reasonable time frame, is still a top barrier to success for most data scientists in traditional companies, and so the 1% served by dedicated data engineering teams might as well be from another planet!

"Proudly analogue companies need to go on their own data journey on their own terms," said Henrik Göthberg, Founder and CEO of Dairdux, on the AI After Dark podcast. This highlights that what is right and good for the 1% of data scientists working at internet giants is unlikely to work for those having to innovate from the ground up, with limited resources. This 99% of data scientists must extract data, experiment, iterate and productionise all by themselves, often with inadequate tooling they must stitch together themselves based on the research projects of the 1%.

For example, one European retailer spent many months developing machine learning models written in Python (.py files) and run on the data scientists' local machines. But eventually, the organisation needed a way to prevent interruptions or failure of the machine learning deployments.

As a first solution, they moved these .py files to Google Cloud Platform (GCP), and the outcome was well received by the business and technical teams in the organisation. However, once the number of models in production went from one to three and more, the team quickly realized the burden involved in maintaining models. There were too many disconnected datasets and Python files running on the virtual machine, and the team had no way to check or stop the machine learning pipeline.
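The pain in that scenario is mostly about visibility: many standalone scripts, and no single record of what ran or failed. As a rough illustration only (the script names, status file and scheduler are hypothetical, not details from the retailer's case), a minimal wrapper like the sketch below gives operators one entry point that records start time, exit code and recent errors for each model run, which is roughly the kind of check the team was missing.

```python
# Minimal sketch, not the retailer's actual code: wrap standalone model scripts
# in one runner that records what ran, when, and whether it failed, so the
# pipeline can at least be checked. Script names and file paths are made up.
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

STATUS_FILE = Path("pipeline_status.json")                 # hypothetical location
MODEL_SCRIPTS = ["churn_model.py", "demand_forecast.py"]   # hypothetical scripts


def run_job(script: str) -> dict:
    """Run one model script as a subprocess and capture its outcome."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run([sys.executable, script], capture_output=True, text=True)
    return {
        "script": script,
        "started": started,
        "finished": datetime.now(timezone.utc).isoformat(),
        "return_code": result.returncode,
        "stderr_tail": result.stderr[-500:],   # keep the last lines for triage
    }


def main() -> None:
    runs = [run_job(script) for script in MODEL_SCRIPTS]
    STATUS_FILE.write_text(json.dumps(runs, indent=2))
    # A non-zero exit tells the scheduler (cron, Cloud Scheduler, etc.)
    # that at least one model run failed and needs attention.
    if any(run["return_code"] != 0 for run in runs):
        sys.exit(1)


if __name__ == "__main__":
    main()
```

Even a stopgap like this makes a later move to a proper orchestration or MLOps platform easier, because each model already has a defined entry point and a recorded outcome.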

Beyond these data scientists doing the hard yards to create value in traditional organisations, there is also the latent data population, capable but hidden away, who have real-world problems to solve but who are even further from being able to directly leverage the latest innovations. If these people can be empowered to create even a fraction of the value of the 1% of data scientists, their sheer number would mean the total value created for organisations and society would massively outweigh the latest technical innovations.

Achieving this massive scale, across many smaller victories, is the real value of data science to almost every individual and company.

Organisations don't need to be a Facebook to get started on an innovative and advanced data science or AI project. There is still a whole chunk of the data science world (and its respective innovations) that is going unseen, and it's time to give this second avenue of innovation its due.

Go here to read the rest:

Beware the 1% view of data science - ComputerWeekly.com

Read More..

Thickness and structure of the martian crust from InSight seismic data – Science Magazine

Single seismometer structure

Because of the lack of direct seismic observations, the interior structure of Mars has been a mystery. Khan et al., Knapmeyer-Endrun et al., and Stähler et al. used recently detected marsquakes from the seismometer deployed during the InSight mission to map the interior of Mars (see the Perspective by Cottaar and Koelemeijer). Mars likely has a 24- to 72-kilometer-thick crust with a very deep lithosphere close to 500 kilometers. Similar to the Earth, a low-velocity layer probably exists beneath the lithosphere. The crust of Mars is likely highly enriched in radioactive elements that help to heat this layer at the expense of the interior. The core of Mars is liquid and large, about 1,830 kilometers in radius, which means that the mantle has only one rocky layer rather than two like the Earth has. These results provide a preliminary structure of Mars that helps to constrain the different theories explaining the chemistry and internal dynamics of the planet.

Science, abf2966, abf8966, abi7730, this issue p. 434, p. 438, p. 443; see also abj8914, p. 388

A planet's crust bears witness to the history of planetary formation and evolution, but for Mars, no absolute measurement of crustal thickness has been available. Here, we determine the structure of the crust beneath the InSight landing site on Mars using both marsquake recordings and the ambient wavefield. By analyzing seismic phases that are reflected and converted at subsurface interfaces, we find that the observations are consistent with models with at least two and possibly three interfaces. If the second interface is the boundary of the crust, the thickness is 20 ± 5 kilometers, whereas if the third interface is the boundary, the thickness is 39 ± 8 kilometers. Global maps of gravity and topography allow extrapolation of this point measurement to the whole planet, showing that the average thickness of the martian crust lies between 24 and 72 kilometers. Independent bulk composition and geodynamic constraints show that the thicker model is consistent with the abundances of crustal heat-producing elements observed for the shallow surface, whereas the thinner model requires greater concentration at depth.

Read the original post:

Thickness and structure of the martian crust from InSight seismic data - Science Magazine

Read More..