
The Poisson Process and Poisson Distribution, Explained (With Meteors!) – Built In

Do you know the real tragedy of statistics education in most schools? It's boring! Teachers spend hours wading through derivations, equations, and theorems. Then, when you finally get to the best part (applying concepts to actual numbers) it's with irrelevant, unimaginative examples like rolling dice. It's a shame, because stats can be engaging if you skip the derivations (which you'll likely never need) and focus on using the concepts to solve interesting problems.

So let's look at Poisson processes and the Poisson distribution, two important probability concepts. After highlighting the relevant theory, we'll work through a real-world example.

A Poisson process is a model for a series of discrete events where the average time between events is known, but the exact timing of events is random. The arrival of an event is independent of the event before (waiting time between events is memoryless). For example, suppose we own a website that our content delivery network (CDN) tells us goes down on average once per 60 days, but one failure doesn't affect the probability of the next. All we know is the average time between failures. The failures are a Poisson process.

We know the average time between events, but the events are randomly spaced in time (stochastic). We might have back-to-back failures, but we could also go years between failures because the process is stochastic.

A Poisson process meets the following criteria (in reality, many phenomena modeled as Poisson processes don't precisely match these but can be approximated as such): events are independent of each other, the average rate of events per time period is constant, and two events cannot occur at the same time.

The last point (events are not simultaneous) means we can think of each sub-interval in a Poisson process as a Bernoulli trial, that is, either a success or a failure. With our website, the entire interval in consideration is 60 days, but within each sub-interval (one day) our website either goes down or it doesn't.

Common examples of Poisson processes are customers calling a help center, visitors to a website, radioactive decay in atoms, photons arriving at a space telescope, and movements in a stock price. Poisson processes are generally associated with time, but they don't have to be. In the case of stock prices, we might know the average movements per day (events per time), but we could also have a Poisson process for the number of trees in an acre (events per area).

One example of a Poisson process we often see is bus arrivals (or trains). However, this isn't a proper Poisson process because the arrivals aren't independent of one another. Even for bus systems that run on time, a late arrival from one bus can impact the next bus's arrival time. Jake VanderPlas has a great article on applying a Poisson process to bus arrival times, which works better with made-up data than real-world data.


The Poisson process is the model we use for describing randomly occurring events and, by itself, isn't that useful. We need the Poisson distribution to do interesting things like find the probability of a given number of events in a time period or find the probability of waiting some time until the next event.

The Poisson distribution probability mass function (pmf) gives the probability of observing k events in a time period given the length of the period and the average events per time:
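$$P(k \text{ events in interval}) = \frac{\left(\frac{\text{events}}{\text{time}} \cdot \text{time period}\right)^{k} \, e^{-\frac{\text{events}}{\text{time}} \cdot \text{time period}}}{k!}$$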

The pmf is a little convoluted, and we can simplify events/time * time period into a single parameter, lambda (λ), the rate parameter. With this substitution, the Poisson distribution probability function now has one parameter:
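$$P(k \text{ events in interval}) = \frac{\lambda^{k} \, e^{-\lambda}}{k!}$$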

We can think of lambda as the expected number of events in the interval. (We'll switch to calling this an interval because, remember, the Poisson process doesn't always use a time period). I like to write out lambda to remind myself the rate parameter is a function of both the average events per time and the length of the time period, but you'll most commonly see it as above. (The discrete nature of the Poisson distribution is why this is a probability mass function and not a density function.)

As we change the rate parameter, λ, we change the probability of seeing different numbers of events in one interval. The graph below is the probability mass function of the Poisson distribution and shows the probability (y-axis) of a number of events (x-axis) occurring in one interval with different rate parameters.

The most likely number of events in one interval for each curve is the curve's rate parameter. This makes sense because the rate parameter is the expected number of events in one interval. Therefore, the rate parameter represents the number of events with the greatest probability when the rate parameter is an integer. When the rate parameter is not an integer, the highest probability number of events will be the nearest integer to the rate parameter. (The rate parameter is also the mean and variance of the distribution, which don't need to be integers.)

We can use the Poisson distribution pmf to find the probability of observing a number of events over an interval generated by a Poisson process. Another use of the mass function equation (as we'll see later) is to find the probability of waiting a given amount of time between events.


We could continue with website failures to illustrate a problem solvable with a Poisson distribution, but I propose something grander. When I was a child, my father would sometimes take me into our yard to observe (or try to observe) meteor showers. We weren't space geeks, but watching objects from outer space burn up in the sky was enough to get us outside, even though meteor showers always seemed to occur in the coldest months.

We can model the number of meteors seen as a Poisson distribution because the meteors are independent, the average number of meteors per hour is constant (in the short term), and (this is an approximation) meteors don't occur at the same time.

All we need to characterize the Poisson distribution is the rate parameter: the number of events per time multiplied by the length of the interval. In a typical meteor shower, we can expect five meteors per hour on average, or one every 12 minutes. Due to the limited patience of a young child (especially on a freezing night), we never stayed out more than 60 minutes, so we'll use that as the time period. From these values, we get:
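$$\lambda = \frac{5 \text{ meteors}}{\text{hour}} \times 1 \text{ hour} = 5 \text{ meteors expected}$$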

An expected five meteors means that five is the most likely number of meteors we'd observe in an hour. According to my pessimistic dad, that meant we'd see three meteors in an hour, tops. To test his prediction against the model, we can use the Poisson pmf to find the probability of seeing exactly three meteors in one hour:
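Plugging k = 3 and λ = 5 into the pmf:

$$P(X = 3) = \frac{5^{3} \, e^{-5}}{3!} \approx 0.14$$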

We get 14 percent, or about 1/7. If we went outside and observed for one hour every night for a week, then we could expect my dad to be right once! We can use other values in the equation to get the probability of different numbers of events and construct the pmf distribution. Doing this by hand is tedious, so we'll use Python for calculation and visualization (which you can see in this Jupyter Notebook).
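A minimal sketch of that calculation (this is not the article's notebook; it assumes NumPy, SciPy, and Matplotlib are installed):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 5  # rate parameter: 5 meteors expected per hour

# Probability of seeing exactly 3 meteors in one hour (~14 percent)
print(poisson.pmf(3, lam))

# Probability mass function for 0 through 15 meteors in one hour
k = np.arange(16)
plt.bar(k, poisson.pmf(k, lam))
plt.xlabel('Meteors observed in one hour')
plt.ylabel('Probability')
plt.show()
```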

The graph below shows the probability mass function for the number of meteors in an hour when the average time between meteors is 12 minutes (which is the same as saying the rate parameter is five meteors expected in an hour).

The most likely number of meteors is five, the rate parameter of the distribution. (Due to a quirk of the numbers, four and five have the same probability, 18 percent). As with any distribution, there is one most likely value, but there is also a wide range of possible values. For example, we could see zero meteors or see more than 10 in one hour. To find the probabilities of these events, we use the same equation but, this time, calculate sums of probabilities (see notebook for details).

We already calculated the chance of seeing precisely three meteors as about 14 percent. The chance of seeing three or fewer meteors in one hour is 27 percent, which means the probability of seeing more than three is 73 percent. Likewise, the probability of more than five meteors is 38.4 percent, while we could expect to see five or fewer meteors in 61.6 percent of hours. Although it's small, there is a 1.4 percent chance of observing more than ten meteors in an hour!
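These sums come straight from the cumulative distribution function. A sketch of how they might be computed (again assuming SciPy):

```python
from scipy.stats import poisson

lam = 5  # 5 meteors expected per hour

print(poisson.cdf(3, lam))       # P(3 or fewer)   ~0.27
print(1 - poisson.cdf(3, lam))   # P(more than 3)  ~0.73
print(1 - poisson.cdf(5, lam))   # P(more than 5)  ~0.384
print(1 - poisson.cdf(10, lam))  # P(more than 10) ~0.014
```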

To visualize these possible scenarios, we can run an experiment by having our sister record the number of meteors she sees every hour for 10,000 hours. The results are in the histogram below:
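A sketch of that simulated experiment (hypothetical code, not the article's notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# 10,000 simulated hours, each with an expected 5 meteors
counts = rng.poisson(lam=5, size=10_000)

plt.hist(counts, bins=np.arange(counts.max() + 2) - 0.5, rwidth=0.9)
plt.xlabel('Meteors observed in one hour')
plt.ylabel('Number of hours')
plt.show()
```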

(This is just a simulation. No sisters were harmed in the making of this article.)

On a few lucky nights, we'd see 10 or more meteors in an hour, although more often, we'd see four or five meteors.

The rate parameter, λ, is the only number we need to define the Poisson distribution. However, since it's a product of two parts (events/time * interval length), there are two ways to change it: we can increase or decrease the events/time, and we can increase or decrease the interval length.

First, let's change the rate parameter by increasing or decreasing the number of meteors per hour to see how those shifts affect the distribution. For this graph, we're keeping the time period constant at 60 minutes.

In each case, the most likely number of meteors in one hour is the expected number of meteors, the rate parameter. For example, at 12 meteors per hour (MPH), our rate parameter is 12, and there's an 11 percent chance of observing exactly 12 meteors in one hour. If our rate parameter increases, we should expect to see more meteors per hour.

Another option is to increase or decrease the interval length. Here's the same plot, but this time we're keeping the number of meteors per hour constant at five and changing the length of time we observe.

It's no surprise that we expect to see more meteors the longer we stay out.


An intriguing part of a Poisson process involves figuring out how long we have to wait until the next event (sometimes called the interarrival time). Consider the situation: meteors appear once every 12 minutes on average. How long can we expect to wait to see the next meteor if we arrive at a random time? My dad always (this time optimistically) claimed we only had to wait six minutes for the first meteor, which agrees with our intuition. Let's use statistics to see if our intuition is correct.

I won't go into the derivation (it comes from the probability mass function equation), but the time we can expect to wait between events is a decaying exponential. The probability of waiting a given amount of time between successive events decreases exponentially as time increases. The following equation shows the probability of waiting more than a specified time.
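$$P(T > t) = e^{-\frac{\text{events}}{\text{time}} \cdot t}$$

Here T is the waiting time until the next event, and events/time is the average rate (for the meteors, one per 12 minutes).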

With our example, we have one event per 12 minutes, and if we plug in the numbers, we get a 60.65 percent chance of waiting more than six minutes. So much for my dad's guess! We can expect to wait more than 30 minutes about 8.2 percent of the time. (Note this is the time between each successive pair of events. The waiting times between events are memoryless, so the time between two events has no effect on the time between any other events. This memorylessness is also known as the Markov property.)
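Working through those two numbers:

$$P(T > 6) = e^{-\frac{6}{12}} \approx 0.6065, \qquad P(T > 30) = e^{-\frac{30}{12}} \approx 0.082$$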

A graph helps us to visualize the exponentially decaying probability of waiting time:

There is a 100 percent chance of waiting more than zero minutes, which drops off to a near-zero percent chance of waiting more than 80 minutes. Again, as this is a distribution, there's a wide range of possible interarrival times.

Rearranging the equation, we can use it to find the probability of waiting less than or equal to a time:
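$$P(T \leq t) = 1 - e^{-\frac{\text{events}}{\text{time}} \cdot t}$$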

We can expect to wait six minutes or less to see a meteor 39.4 percent of the time. We can also find the probability of waiting between two lengths of time: there's a 57.72 percent probability of waiting between 5 and 30 minutes to see the next meteor.

To visualize the distribution of waiting times, we can once again run a (simulated) experiment. We simulate watching for 100,000 minutes with an average rate of one meteor per 12 minutes. Then we find the waiting time between each meteor we see and plot the distribution.
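A sketch of that waiting-time simulation (hypothetical code; inter-arrival times in a Poisson process follow an exponential distribution, here with a mean of 12 minutes):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Draw inter-arrival times (exponential, mean 12 minutes), then keep only the
# meteors that arrive within the 100,000-minute observation window.
waits = rng.exponential(scale=12, size=20_000)
arrivals = np.cumsum(waits)
waits = waits[arrivals <= 100_000]

plt.hist(waits, bins=60)
plt.xlabel('Minutes between successive meteors')
plt.ylabel('Count')
plt.show()

print(waits.mean())  # close to the 12-minute average
```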

The most likely waiting time is one minute, but that's distinct from the average waiting time. Let's try to answer the question: On average, how long can we expect to wait between meteor observations?

To answer the average waiting time question, we'll run 10,000 separate trials, each time watching the sky for 100,000 minutes, and record the time between each meteor. The graph below shows the distribution of the average waiting time between meteors from these trials:

The average of the 10,000 runs is 12.003 minutes. Surprisingly, this average is also the average waiting time to see the first meteor if we arrive at a random time. At first, this may seem counterintuitive: if events occur on average every 12 minutes, then why do we have to wait the entire 12 minutes before seeing one event? The answer is we are calculating an average waiting time, taking into account all possible situations.

If the meteors came precisely every 12 minutes with no randomness in arrivals, then the average time we'd have to wait to see the first one would be six minutes. However, because the waiting time follows an exponential distribution, sometimes we show up and have to wait an hour, which outweighs the more frequent times when we wait fewer than 12 minutes. The average time to see the first meteor, averaged over all occurrences, will be the same as the average time between events. This result, that the average wait for the first event in a Poisson process equals the average time between events, is known as the Waiting Time Paradox.

As a final visualization, let's do a random simulation of one hour of observation.

Well, this time we got precisely the result we expected: five meteors. We had to wait 15 minutes for the first one, then 12 minutes for the next. In this case, it'd be worth going out of the house for celestial observation!

The next time you find yourself losing focus in statistics, you have my permission to stop paying attention to the teacher. Instead, find an interesting problem and solve it using the statistics you're trying to learn. Applying technical concepts helps you learn the material and better appreciate how stats help us understand the world. Above all, stay curious: There are many amazing phenomena in the world, and data science is an excellent tool for exploring them.

This article was originally published on Towards Data Science.

More:

The Poisson Process and Poisson Distribution, Explained (With Meteors!) - Built In

Read More..

Understanding the Role and Attributes of Data Access Governance in Data Science & Analytics – Analytics Insight

Understanding the Role and Attributes of Data Access Governance in Data Science & Analytics

Data scientists and business analysts need not only to find answers to their questions by querying data in various repositories, but also to transform that data in order to build sophisticated analyses and models. Read and write operations are at the heart of the data science process and are essential to quick, highly informed decision-making. They are also an imperative capability for data infrastructure teams tasked with democratizing data while complying with privacy and industry regulations.

Meeting the needs of both groups requires a data governance platform capable of accelerating the data sharing process to satisfy the unique requirements of data consumers, while ensuring the organization as a whole remains in compliance with regulations such as GDPR, CCPA, LGPD, and HIPAA.

Data is the raw material for any type of analytics, whether it is the historical analysis presented in reports and dashboards by business analysts, or the predictive analysis in which data scientists build models that anticipate an event or behavior that has not yet occurred. To be truly useful, the raw information that forms the basis of reports and dashboards must be converted into data ready for consumption, so business analysts can create reports, dashboards, and visualizations to paint a picture of the overall health of the organization.

Data scientists, too, can benefit from converted data, leveraging it to build and train statistical models using techniques such as linear regression, logistic regression, clustering, and time series analysis. The output of these models can then be used to automate decision-making using sophisticated techniques such as machine learning.

But this task is becoming increasingly difficult due to the rise in compliance regulations such as GDPR, CCPA, LGPD, and HIPAA and the need for organizations to secure sensitive data across multiple cloud services. In fact, according to Gartner's Hype Cycle for Privacy, 2021 report[1], "By year-end 2023, 75% of the world's population will have its personal data covered under modern privacy regulations, up from 25% today," and "before year-end 2023, more than 80% of companies worldwide will be facing at least one privacy-focused data protection regulation."

Because data analytics is an exploratory exercise, it requires data consumers such as business analysts and data scientists to analyze large bodies of data to reveal patterns, behaviors, or insights that inform some decision-making process. Machine learning, on the other hand, specifically attempts to understand the features with the biggest influence on the target variable. This requires access to a large amount of data that may contain sensitive elements, such as personally identifiable information (PII): a person's age, social security number, address, and so on.

In many instances, this data is owned by different business units and is subject to strict data sharing agreements, presenting infrastructure teams with unique challenges, such as balancing the need to provide data consumers with access to enterprise data at the required granularity while complying with privacy regulations and requirements set by the data owners themselves. Another major challenge for the data infrastructure team is to support the rapid demand for data by the data science team for their analytics and innovation projects.

Data science requires not only reading data but also updating it in the above-mentioned preprocessing steps. Put simply, data science is by nature a read- and write-intensive activity. To address this, data infrastructure teams usually create sandbox instances for these data consumers whenever they start a new project. However, these too require robust data access governance so as not to expose any sensitive or confidential data during data exploration.

According to the previously mentioned Gartner Hype Cycle for Privacy, 2021 report, through 2024, privacy-driven spending on data protection and compliance technology will break through to more than $15 billion worldwide. To support the growing data science activities in a company, data infrastructure teams need to implement a unified data access governance platform that has four important attributes:

Enterprises can only thrive in this economy if data can flow to the far reaches of the organization to help make decisions that improve the company's profitability and competitive position. However, every company must share data with proper guardrails in place so that only authorized personnel can access the required data. This is mandated by an ever-increasing list of privacy regulations, as well as by the need to foster the trust that customers have placed in the company. A data governance solution that lets companies securely extract insights from their data must support both read and write operations, as well as automate the process of identifying and classifying sensitive data, taking action on it by encrypting it, and providing visibility into the company's data ecosystem.

Balaji Ganesan is CEO and co-founder of both Privacera, the cloud data governance and security leader, and XA Secure, which was acquired by Hortonworks. He is an Apache Ranger committer and member of its project management committee (PMC). To learn more visit http://www.privacera.com or follow the company on Twitter.



Analytics Insight is an influential platform dedicated to insights, trends, and opinions from the world of data-driven technologies. It monitors developments, recognition, and achievements made by Artificial Intelligence, Big Data and Analytics companies across the globe.

The rest is here:

Understanding the Role and Attributes of Data Access Governance in Data Science & Analytics - Analytics Insight

Read More..

Groundbreakers: U of T’s Data Sciences Institute to help researchers find answers to their biggest questions – News@UofT

When University of Toronto astronomer Bryan Gaensler looks up at the night sky, he doesn't just see stars; he sees data. Big data.

So big, in fact, that his current research tracking the baffling fast radio bursts (FRBs) that bombard Earth from across the universe requires the capture of more data per second than all of Canada's internet traffic.

"This is probably the most exciting thing in astronomy right now, and it's a complete mystery," says Gaensler, director of U of T's Dunlap Institute for Astronomy & Astrophysics and Canada Research Chair in Radio Astronomy. "Randomly, maybe once a minute, there's this incredibly bright flash of radio waves, like a one-millisecond burst of static, from random directions all over the sky."

"We now know that they're from very large distances, up to billions of light-years, so they must be incredibly powerful to be able to be seen this far away."

U of T is a world leader in finding FRBs, using the multi-university CHIME radio telescope in British Columbia's Okanagan region and a U of T supercomputer. Yet, despite the impressive technology, many daunting challenges remain.

"It's a massive computational and processing problem that is holding us back," he says. "We are recording more than the entire internet of Canada, every day, every second. And because there's no hard drive big enough or fast enough to actually save that data, we end up throwing most of it away. We would obviously like to better handle the data, so that needs better equipment and better algorithms and just better ways of thinking about the data."

With the creation of U of T's Data Sciences Institute (DSI), Gaensler and his colleagues now have a new place to turn to for help. The institute, which is holding a launch event tomorrow, is designed to help the university's wealth of academic experts in a variety of disciplines team up with statisticians, computer scientists, data engineers and other digital experts to create powerful research results that can solve a wide range of problems, from shedding light on interstellar mysteries to finding life-saving genetic therapies.

"The way forward is to bring together new teams of astronomers, computer scientists, artificial intelligence experts and statisticians who can come up with fresh approaches optimized to answer specific scientific questions that we currently don't know how to address," Gaensler says.

The Data Sciences Institute is just one of nearly two dozen Institutional Strategic Initiatives (ISI) launched by U of T to address complex, real-world challenges that cut across fields of expertise. Each initiative brings together a flexible, multidisciplinary team of researchers, students and partners from industry, government and the community to take on a grand challenge.

"We're bringing together individuals at the intersection of traditional disciplinary fields and computational and data sciences," says Lisa Strug, director of the Data Sciences Institute and a professor in the departments of statistical sciences and computer science in the Faculty of Arts & Science, and a senior scientist at the Hospital for Sick Children research institute.

She notes that U of T boasts world-leading experts in fields such as medicine, health, social sciences, astrophysics and the arts, and some of the top departments in the world in the cognate areas of data science like statistics, mathematics, computer science and engineering.

Data science techniques can be brought to bear on a near-infinite variety of academic questions from climate change to transportation, planning to art history. In literature, Strug says, many works from previous centuries are now being digitized, allowing data-based analysis right down to, say, sentence structure.

"New fields of data science are emerging every day," says Strug, who oversees data-intensive genomics research in complex diseases such as cystic fibrosis that has led to the promise of new drugs to treat the debilitating lung disease. "We have so much computational disciplinary strength we can leverage to define and advance these new fields."

"We want to make sure that faculty have access to the cutting-edge tools and methodology that enable them to push the frontiers of their field forward. They may be answering questions they wouldn't have been able to ask before, without that data and without those tools."

A key function of the DSI is the creation and funding of Collaborative Research Teams (CRTs) of professors and students from a variety of disciplines who can work together on important projects with stable support.

Gaensler, who already has statisticians on his team, says he's looking to the CRTs to greatly expand the scope of his work.

"We have just done the low-hanging fruit," he says. "There are many deeper problems that we haven't even started on."

Similarly, Laura Rosella, an associate professor at the Dalla Lana School of Public Health, says the collaborative teams will be a major asset for the university.

"We're going to dedicate funding to these multi-disciplinary trainees and post-docs so we can start building a critical mass of people that can actually translate between these disciplines," she says. "To solve problems, you need this connecting expertise."

Rosella played a key role in how Ontario dealt with COVID-19 in the early part of 2021. By analyzing anonymous cellphone data along with health information, she and her interdisciplinary team were able to see where people were moving and congregating, and then predict in advance likely clusters of the disease that would appear up to two weeks later. Her work helped support the province's highly successful strategy of targeting so-called hotspots.

"We've been able to work with diverse data sources in order to generate insights that are used for high-level pandemic preparedness and planning, in ways that weren't possible before," says Rosella, who sits on Ontario's COVID-19 Modelling Consensus Table. "And we've also brought in new angles to the data around the social determinants of health that have shone a light on the policy measures that are needed to truly address disparities in COVID rates."

Rosella's population risk tools also include one for diabetes, which health systems can use to estimate the future burden of the disease and guide future planning. This includes inputs about the built environment. For example, if people can walk to a new transit stop, Rosella says, the increased exercise may have an impact on diabetes or other diseases. Potentially, even satellite imaging data could be brought into the prediction mix, she says.

In addition to advancing research in a given field, the Data Sciences Institute is also seeking to advance equity.

That includes tackling societal inequalities uncovered by data research (including how socio-economic factors can determine who is more likely to get COVID-19) and the way the research itself is being conducted.

For example, Strug says most genomics studies have focused on participants of European origin, even though the genetic risk factors for various diseases can differ between different ethnicities.

"We must make sure we develop and implement the models, tools and research designs and bring diverse sources of data together to ensure our understanding of disease risk is applicable to all," Strug says.

Many algorithms, or the data they use to make predictions, contain unconscious bias that may skew results, which is why Strug says transparency is vital both to support equity and to ensure studies can be reproduced properly.

Gaensler says it's critical to ensure diversity among researchers, too.

"My department looks very different from the faces that I see on the subway," he says. "It's not a random sampling of Canadian society; it's very male, white and old, and that's a problem we need to work on."

Strug hopes the Data Sciences Institute will ultimately become a nucleus for researchers across the university and beyond.

"There's never been one entrance to the university to guide people, so it's so important for us to be that front door," she says.

"We will make every effort to stay abreast of the different fantastic things that are happening in data sciences and be able to direct people to the right place, as well as provide an inclusive, welcoming and inspiring academic home."

Link:

Groundbreakers: U of T's Data Sciences Institute to help researchers find answers to their biggest questions - News@UofT

Read More..

Top 15 Tools Every Data Scientist Should Bring to Work – Analytics Insight

The job market for data science and data scientists is constantly evolving. Every year, there are so many new things to learn. While some tools rise and others fall into oblivion, it is essential for a data scientist to keep up with the trends and have the knowledge and skills to use all the tools that make their job easier.

Here are the top 15 tools that every data scientist should bring to work to become more effective at their job.

For a data scientist, their mind is one of the best tools for staying a step ahead of the competition, because data science is a field where you deal with roadblocks, bugs, and unexpected issues every day. Without problem-solving skills, it becomes difficult to continue with your work.

Programming languages allow data scientists to communicate easily with computers and machines. They don't need to be the best developers ever, but data scientists should be strong programmers. Python, R, Julia, SQL, and more are among the programming languages widely used by data scientists.

This convenient data science tool is an enterprise-grade platform that addresses a wide range of AI and machine learning needs. With DataRobot, data scientists can get started with just a few clicks and support their organizations with components such as automated machine learning, time-series modeling, machine learning operations, and more.

TensorFlow is crucial if you are interested in artificial intelligence, deep learning, and machine learning. Built by Google, TensorFlow is essentially a library that helps data scientists build and train models.

With the help of Knime, data scientists can integrate elements like machine learning or data mining into data sets and create visual data pipelines, models, and interactive views. They can also perform the extraction, transformation, and loading of data with the intuitive GUI.

In data science, statistics and probability are crucial. They help data analysts understand the data they are working with and guide their exploration in the right direction. A solid grasp of statistics also ensures that an analysis is valid and free of logical errors.

Companies always give priority to data scientists who know machine learning. AI and machine learning give data scientists the power to analyze large volumes of data using data-driven models and algorithms aided by automation.

Data science involves a lot of precise communication, so the ability to tell a detailed story with data is very important. Data visualization may therefore be essential to your work, as analysts depend on graphs and charts to make their theories or findings easier to understand.

RapidMiner supports the full modeling workflow, from the initial preparation of data to the very last steps, such as analyzing the deployed model. Being an end-to-end data science package, RapidMiner offers massive help in areas like text mining, predictive analytics, deep learning, and machine learning.

Python is one of the most powerful programming languages for data science because of its vast collection of libraries, like Matplotlib, and its integration with other languages. Matplotlib's simple interface allows data scientists to create attractive data visualizations. Thanks to multiple export options, data scientists can easily take their custom graphs to the platform of their choice.

D3.js gives data scientists the functionality to create data analytics and dynamic visualizations inside browsers, complete with animated transitions. By combining D3.js with CSS, a data scientist can create beautiful animated visualizations that assist in implementing customized graphs on web pages.

For simulating fuzzy logic and neural networks, many data scientists make use of MATLAB. It is a multi-paradigm numerical computing environment that assists in processing mathematical information. MATLAB is a closed-source program that makes it easier to carry out tasks like algorithmic implementation and statistical modeling of data or matrix functions.

Excel is probably the most widely used data analysis tool because MS Excel not only comes in handy for spreadsheet calculations but also data processing, visualization, and carrying out complex calculations. For data scientists, Excel is one of the most powerful analytical tools.

Nowadays, organizations that focus on software development widely use SAS. It comes with many statistical libraries and tools that can be used for modeling and organizing data. SAS is a highly reliable language with strong support from the developers.

Apache Spark is one of the most used data science tools today. It was designed to handle both batch and stream processing. It offers data scientists numerous APIs that support repeated access to data for machine learning workloads, SQL queries, and more. It is certainly an enormous improvement over Hadoop, and it can perform many times faster than MapReduce.


Read more from the original source:

Top 15 Tools Every Data Scientist Should Bring to Work - Analytics Insight

Read More..

Cisco : data scientists work with nonprofit partner Replate to improve food recovery and delivery to communities in need – Marketscreener.com

The Transformational Tech series highlights Cisco's nonprofit grant recipient that uses technology to help transform the lives of individuals and communities.

Artificial Intelligence (AI) and Machine Learning (ML) are utilized in many different industries. AI and ML create more efficient virtual healthcare visits and more intuitive online education platforms. They enhance agriculture through IoT devices to monitor soil health, and devise new ways for people to access banking and other financial services.

This type of technology can also be used to improve services that nonprofits provide to local communities. At Cisco, we have a proven track record of supporting nonprofits through our strategic social impact grants along with a strong culture of giving back. Cisco's AI for Good program brings these values together by connecting Cisco data science talent to nonprofits that do not have the resources to use AI/ML to meet their goals.

Cisco AI product manager and former data scientist Arya Taylor leads the AI for Good program. Arya shared, 'AI for Good is specifically dedicated to the data science community at Cisco. We heard that a lot of data scientists want to apply their skills to a problem for good.'

The AI for Good team constantly works to grow its network of nonprofit partners by engaging with the team that manages Cisco's social impact grants and by reaching out to nonprofits directly. One of the organizations that AI for Good volunteers support is Cisco nonprofit partner Replate. Based out of Oakland, California, Replate reduces food waste through a digital platform that makes it easy for companies to schedule on-demand pickups for their surplus food. Replate's food rescuers bring donated food to nonprofit partners who distribute it to people of all ages and backgrounds who are experiencing food insecurity.

Cisco data scientists use ML to forecast food supply and optimize Replate's operations

Cisco's AI for Good team spent six months working with Replate to develop a model that can forecast food supply to maximize food recovery and optimize their operations. Replate's staff met with Cisco's AI for Good team via WebEx to share more about their method of food recovery. Cisco data scientists first assessed the scope of Replate's needs and learned how they could best apply their skills in ML to make an impact.

This Cisco AI for Good project was led by data scientist Aarthi Janakiraman, who also served as cause champion, which means she led the project from start to finish to ensure the project's success. Other members of the project included data scientist Idris Kuti and ML operations expert David Meyer. The team looked at how Cisco's machine learning models would allow Replate to predict surplus food supply within their donor network.

Because Replate offers a variety of donor plans to their partners, it can be challenging to calculate availability and capacity. As a result, Cisco's data scientists developed an ML model that could predict the total pounds of food each donor would contribute on any given day. This more accurate prediction helps Replate's food rescuers, who deliver the food, as well as the nonprofit organizations that rely on meal delivery.

'Before our project started,' Arya explained, 'Replate was using a rules-based model with different thresholds that would determine the estimated amount. But there's no single threshold in machine learning that you can apply to every single donor; it just becomes more personalized to that donor and evolves as more data is collected. So, it works more like our brain, rather than a static generalization.'

Aarthi gave an example: 'Let's say there is a donor for Replate, and they tell us that next Friday they will be able to provide 60 trays of food. This number is often a skewed estimate; donations are typically from grocery stores, corporate cafeterias, or farmer's markets, who may not be able to provide an exact prediction due to the variability of consumption. Our model will take in different information about the donor and estimate a more accurate donation amount in pounds. That estimation will go into Replate's algorithm and match the food rescue task to the correct driver.'

By incorporating machine learning models, Replate can also predict donation volume for existing and new partners. The volume of a new donor's first pickup will be predicted based on data from donors in similar regions or industries. 'Such forecasting will make a significant difference in our operations and allow us to better fulfill our mission,' said Mehran Navabi, senior data scientist at Replate. 'Replate will implement these models into our codebase and integrate them within our existing routing algorithm. The algorithms will coalesce to automate driver-dispatches for each donor's pickup.'

Cisco and Replate: Working together to create lasting change

Replate's team met with Cisco data scientists for biweekly progress reports throughout the project lifecycle and discussed how they could advance their platform's technological capacities. The models that the AI for Good team created will enable smarter dispatching, which will allow a greater volume of food to be recovered and delivered to communities in need.

According to Mehran, one challenge for Replate is meeting the different needs and expectations of their nonprofit partners who serve diverse populations with varying capacities for food storage and meal distribution. Having a model to forecast food supply can reduce waste and help Replate connect food delivery tasks to the correct drivers to ensure as much food as possible will be given to those in need. The project may even increase the amount of surplus food that can be recovered by giving Replate the information needed to make smarter, predictive dispatching decisions.

Now, Cisco's AI for Good team is handing over the project to Replate and will leave them with a maintenance plan which will allow them to retrain the model on Google Cloud Platform. They also built out a service that will track the model's accuracy, so any adjustments can be made as time goes on.

'Working with Cisco's AI for Good team was incredible,' said Mehran. 'Their team was professional and knowledgeable. And overall, their communication was excellent. The partnership enabled Replate to build a fruitful and beneficial connection with the Cisco team and foster new approaches to the way we collect and interpret data.'

Share:

See original here:

Cisco : data scientists work with nonprofit partner Replate to improve food recovery and delivery to communities in need - Marketscreener.com

Read More..

Cogitativo Releases Visin, A First-Of-Its-Kind Machine Learning Tool Built to Tackle the Growing Deferred Care Crisis – PRNewswire

Machine Learning to assist in addressing deferred care crisis

"Millions of Americans have gone without critical screenings and treatment for 18 months, creating a deferred care crisis that requires immediate and proven solutions to support those in need," said Gary Velasquez, CEO of Cogitativo. "We believe Visin will play a vital role in preventing acute medical events for vulnerable individuals and enabling health care organizations to mitigate many of the challenges that are on the horizon."

Cogitativo's new solution comes as health care payors and providers are reporting a rise in medical needs among individuals who were unable to receive care during the pandemic, including those with chronic conditions like cardiovascular disease, chronic kidney disease, diabetes, HIV, and mental health challenges. In addition, many providers are also struggling to manage a surge in patient visits, with the virus continuing to spread at the same time that people are returning to medical facilities for appointments, screenings, and treatment.

Visin analyzes patient health records through the lens of peer-reviewed literature on disease progression, social determinants of health, climate change, and other relevant data sources to predict elevated risk for an acute clinical event. These temporal predictions will enable healthcare payors and providers to identify members and patients most likely to require greater medical attention in the months ahead. This information will, in turn, help health care payors and providers proactively conduct outreach, render prophylactic care to at-risk beneficiaries, and offer individualized recommendations on preventive care.

A version of Cogitativo's new machine learning platform was used by a host of health care leaders and public health officials during the pandemic. For example, Blue Shield of California used it to deliver personalized care and support to vulnerable beneficiaries; it helped guide mobile vaccination efforts in the City of Compton, California; and it provided insights to the U.S. Department of Health and Human Services.

"Visin is the field-tested machine learning tool that so many health care payors have been waiting for, and it cannot come soon enough for those managing the fallout from the deferred care crisis," said Dr. Terry Gilliland, Chief Science Officer at Cogitativo and former Executive Vice President of Health Care Quality and Affordability at Blue Shield of California.

"Cogitativo's new machine learning tool can help physicians throughout the country identify their highest-risk patients and conduct proactive outreach, providing those patients the critical care and attention they need while also preventing unpredictable waves of patient visits that create capacity problems," said Dr. Hector Flores, Director of the Family Care Specialists Medical Group. Dr. Flores used a version of Visin during the pandemic to support his most vulnerable patients.

About Cogitativo Inc. Cogitativo is a Berkeley-based data science company founded in 2015 with a mission to create and implement innovative, scalable solutions to the most complex challenges facing the healthcare system. Leveraging machine learning, proprietary data sets, and expertise from leaders with decades of experience working with public health agencies, Cogitativo can deliver actionable insights and save lives. To date, Cogitativo has successfully applied data science solutions to more than 200 unique operational challenges to significantly improve the efficiency of our healthcare systems and protect vulnerable patients and communities. Visit www.cogitativo.com for more information.

Media Contact: Joshua Rosen, [emailprotected], Phone: (610) 2473482

Company Contact: Amy Domangue, [emailprotected], Phone: (225) 337-6402

SOURCE Cogitativo, Inc.

http://www.cogitativo.com

Excerpt from:
Cogitativo Releases Visin, A First-Of-Its-Kind Machine Learning Tool Built to Tackle the Growing Deferred Care Crisis - PRNewswire

Read More..

Dive Deep Into Machine Learning With Over 75 Hours Of Expert Led Training – IGN SOUTH EAST ASIA

As we push forward into the future, it seems more and more certain that artificial intelligence and machine learning are going to be massive pieces of our collective future. Continuously producing countless breakthroughs, new technologies, and industry-changing developments, the world of AI and machine learning is rife with potential for new minds to help build and shape tomorrow. If you'd like to join the party, there's a lot to learn.

One way to dive deep into these pivotal technologies is to take advantage of this deal on The Premium Machine Learning Artificial Intelligence Super Bundle, which is on sale for $36.99 (reg. $2,388). This nearly 80-hour collection of courses and lessons breaks down fundamental lessons on deep learning, machine learning, Python, and other development tools used to help grow these sections of the tech industry.

Upon subscribing and taking advantage of this incredible deal, you'll begin with Machine Learning with Python, a course that teaches you the fundamentals of machine learning with Python. In this practical, hands-on course you'll get foundational lessons and examples on approaching data processing, linear regression, logistic regression, decision trees, and more. This course is taught by Juan E. Galvan, who is a top instructor, digital entrepreneur, and recipient of a 4.4/5 star instructor rating.

These are some of the other courses included in the attractive Premium Machine Learning Artificial Intelligence Super Bundle: The Machine Learning and Data Science Developer Certification Program, The Complete Machine Learning & Data Science with Python A-Z, and Deep Learning with Python. Each of these well-reviewed and well-curated courses will help you on your path to becoming an informed player in the growing world of AI and machine learning.

Don't miss your chance to grab The Premium Machine Learning Artificial Intelligence Super Bundle for only $36.99 (reg. $2,388).

Go here to see the original:
Dive Deep Into Machine Learning With Over 75 Hours Of Expert Led Training - IGN SOUTH EAST ASIA

Read More..

Golden Gate University and MetricStream Bring Together Machine Learning and Edge Computing to Assess and Mitigate Risk in Enterprise Business…

SAN FRANCISCO, Sept. 16, 2021 /PRNewswire/ -- MetricStream, the industry leader in supporting the Governance, Risk, and Compliance (GRC) space, and Golden Gate University, announced the successful completion of the first phase of their "DeepEdge" project, using emerging technologies to bring innovation to business solutions.

The project started in 2019 with the goal of letting GGU faculty and graduate students in the MS in Business Analytics and MS in Information Technologies programs partner with MetricStream employees to develop new risk management solutions. The teams set out to use emerging model-based AI augmented with Machine Learning, Elastic Edge Computing, Agile methodologies maturing to DevOps, and Zero-touch Self-Managing service orchestration. The teams have successfully implemented the first application to assess and mitigate risk in enterprise contract management process, resulting in MetricStream adopting it as a part of their product suite.

"Contracts are legally binding agreements," said Vidya Phalke, Chief Technology Evangelist at MetricStream. "Knowing the obligations for every contract, monitoring and assuring compliance is labor-intensive process and error-prone. The DeepEdge project uses model-based AI, machine learning and automation of extracting the knowledge of the obligations for every contract. It integrates with processes already in place and improves monitoring and contract obligation fulfillment at scale. This type of industry-academia collaboration is what is needed to power what is next in the post-pandemic world."

Judith Lee, Business Innovation & Technology department chair, said the project "allowed graduate students and GGU faculty to work jointly with MetricStream to push the boundaries of machine learning and edge computing technologies."

"We chose edge computing for security and data privacy reasons, and the deployment was facilitated by a zero-touch operations environment supported by Platina Systems," said Ross Millerick, program director, MS/IT Management. "It allowed us to remotely access the infrastructure at MetricStream during the Covid pandemic, when our laboratory on campus was not available."

"Bringing together thought leadership in AI that goes beyond deep learning and edge computing allows us to teach our students how to push the boundaries with federated AI and edge computing" said Rao Mikkilineni, distinguished adjunct professor.

The project spanned five terms and a succession of students. The students completed their capstone obligation with the project output with support from MetricStream. The project will continue to drive innovation in various enterprise business processes. Its vision is to build a long-term mutually beneficial partnership between the GGU business school and MetricStream, to inform the surrounding business community about the importance of GRC, and to provide an ongoing local forum for dialogue and education.

Leveraging the power of AI, MetricStream is the global market leader in Governance, Risk, and Compliance and Integrated Risk Management solutions, providing the most comprehensive solutions for Enterprise and Operational Risk, Regulatory Compliance, Internal Audit, IT and Cyber Risk and Third-Party Risk Management on one single integrated platform.

Golden Gate University, a private nonprofit, has been helping adults achieve their professional goals by providing undergraduate and graduate education in accounting, law, taxation, business and related areas since 1901. Programs offer maximum flexibility with evening, weekend and online options. GGU is accredited by the American Bar Association (ABA) and the WASC Senior College and University Commission.

Media Contacts: For MetricStream Amy Rhodes, [emailprotected]; For GGU: Judith Lee [emailprotected],edu,Michael Bazeley, [emailprotected]

SOURCE Golden Gate University


Go here to see the original:
Golden Gate University and MetricStream Bring Together Machine Learning and Edge Computing to Assess and Mitigate Risk in Enterprise Business...

Read More..

Artificial Intelligence in Medicine | IBM

Artificial intelligence in medicine is the use of machine learning models to search medical data and uncover insights to help improve health outcomes and patient experiences. Thanks to recent advances in computer science and informatics, artificial intelligence (AI) is quickly becoming an integral part of modern healthcare. AI algorithms and other applications powered by AI are being used to support medical professionals in clinical settings and in ongoing research.

Currently, the most common roles for AI in medical settings are clinical decision support and imaging analysis. Clinical decision support tools help providers make decisions about treatments, medications, mental health and other patient needs by providing them with quick access to information or research that's relevant to their patient. In medical imaging, AI tools are being used to analyze CT scans, x-rays, MRIs and other images for lesions or other findings that a human radiologist might miss.

The challenges that the COVID-19 pandemic created for many health systems also led many healthcare organizations around the world to start field-testing new AI-supported technologies, such as algorithms designed to help monitor patients and AI-powered tools to screen COVID-19 patients.

The research and results of these tests are still being gathered, and the overall standards for the use of AI in medicine are still being defined. Yet opportunities for AI to benefit clinicians, researchers and the patients they serve are steadily increasing. At this point, there is little doubt that AI will become a core part of the digital health systems that shape and support modern medicine.

Read the original post:
Artificial Intelligence in Medicine | IBM

Read More..

Artificial Intelligence: Implications for Business Strategy

This online program from the MIT Sloan School of Management and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) challenges common misconceptions surrounding AI and will equip and encourage you to embrace AI as part of a transformative toolkit. With a focus on the organizational and managerial implications of these technologies, rather than on their technical aspects, you'll leave this course armed with the knowledge and confidence you need to pioneer its successful integration in business.

What is artificial intelligence (AI)? What does it mean for business? And how can your company take advantage of it? This online program, designed by the MIT Sloan School of Management and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), will help you answer these questions.

Through an engaging mix of introductions to key technologies, business insights, case examples, and your own business-focused project, your learning journey will bring into sharp focus the reality of central AI technologies today and how they can be harnessed to support your business needs.

Focusing on key AI technologies, such as machine learning, natural language processing, and robotics, the course will help you understand the implications of these new technologies for business strategy, as well as the economic and societal issues they raise. MIT expert instructors examine how artificial intelligence will complement and strengthen our workforce rather than just eliminate jobs. Additionally, the program will emphasize how the collective intelligence of people and computers together can solve business problems that not long ago were considered impossible.

You will receive a certificate of course completion at the conclusion of this course. You may also be interested in our Executive Certificates which are designed around a central themed track and consist of several courses. Learn more.

Learn more about the GetSmarter course experience.

Learn more about GetSmarter technical requirements.

See the rest here:
Artificial Intelligence: Implications for Business Strategy

Read More..