
Attract Data Science Talent with These 3 Tips – InformationWeek

Organizations are at a crossroads. The demand for data science talent has surged in recent years alongside widespread artificial intelligence and machine learning adoption, technology advancements, and businesses seeking to scale with data. As the race for innovation shows no signs of stopping, filling these open positions requires organizations to go beyond salary and traditional workplace benefits to attract and retain data science and software engineering talent. By rethinking their approaches to recruiting, hiring, and managing employees, organizations can better identify the intangible aspects of holding a job, like culture and workplace autonomy, that employees have come to value.

Here are three ways organizations can attract data science and software engineering candidates today:

Amid the Great Resignation and Great Reshuffle, millions of Americans have quit their jobs to find professional opportunities that better align with a diverse list of desired options, one of which is growth opportunity. Understanding how they can grow in their roles has always been important to employees who want clarity regarding their career trajectories. This requires a commitment from employers to support their employees' professional development and provide transparency around their potential growth paths within the company.

Leaders should create clear career mapping tracks for individual contributors and managers, allowing each to see the steps necessary to reach their goals. If career trajectories are communicated from the start, organizations can welcome the best data and software practitioners worldwide while embracing a culture of trust and transparency.

Encourage data scientists and engineers to join community forums and attend industry events for deeper professional development opportunities. Opportunities like these allow companies to promote individual growth and network with potential candidates within the community.

Salary and insurance offerings are two of the most basic employee benefits companies default to. While these are essential elements for attracting and retaining talent, organizations should think outside of the box for other creative perks desired by today's candidates.

Increasingly, organizations that prioritize employees' mental, physical, and emotional well-being are at a competitive advantage over others. For example, with burnout becoming a common concern across industries, companies should consider company-wide designated days off to relax and recharge. Work perks like quarterly wellness stipends, no-meeting days, or subscriptions to companies like Talkspace or Calm all show employees how much the organization cares for them. It's also important to provide the time and space for employees to do work they are most passionate about. Organizations should consider having consistent hackathon days or Open-Source Fridays, where data scientists and engineers can contribute to open-source projects they care about.

Even when it comes to more traditional aspects of compensation, like salary, strive to be open to innovation and in tune with the needs of today's professionals. Offer salary transparency for each employee, with benchmarks in the industry for their title, where they live, and how their salary compares to other jobs with the same title. Structure salary increases based on the industry benchmarks of a specific role within a particular job market, and then share those benchmarks for transparency.

As the last two years have shown, we need to be ready to continually refine workplace cultures and employee experiences in response to a changing world. To that end, it's also important to stay responsive to employee feedback and suggestions as change is navigated together.

The best data scientists follow what their data tells them -- if there is no open dialogue between employees and senior company leaders, organizations ignore a crucial source of data in the decision-making surrounding company culture. Some organizations may succeed by introducing senior leadership office hours, anonymous monthly "Ask Me Anything" sessions, bi-annual employee engagement surveys, and recurring pulse surveys.

These efforts ensure organizations do not remain stagnant and distant from what matters most to employees, but rather take an empathetic and responsive approach to meet the needs of their people.

With an abundance of open positions and a seemingly limited number of candidates, employees' needs should be at the forefront of attracting and retaining the best talent possible.

As data scientists search for personally and professionally fulfilling roles, it's essential for organizations to be explicit in how they prioritize employee growth, the benefits offered beyond traditional mainstays, and how company culture is nurtured.

See original here:

Attract Data Science Talent with These 3 Tips - InformationWeek

Read More..

DataRobot Positioned as a Leader in the 2022 SPARK Matrix for the Data Science and Machine Learning Platform Market by Quadrant Knowledge Solutions -…

Quadrant Knowledge Solutions announced today that it has named DataRobot as a 2022 technology leader in the SPARK Matrix analysis of the Data Science and Machine Learning (DSML) platform market.

MIDDLETON, Mass., May 11, 2022 /PRNewswire/ -- Quadrant Knowledge Solutions conducted an in-depth analysis of the major Data Science and Machine Learning platform vendors by evaluating their product portfolio, market presence, and customer-value proposition. The Quadrant Knowledge Solutions SPARK Matrix includes a detailed analysis of the global market dynamics, major trends, vendor landscape, and competitive positioning. The study also provides strategic information for users to evaluate different vendor capabilities, competitive differentiation, and market positions.

The technological trends driving the DSML platform market include demand for intuitive user-experience dashboards that support deep interoperability with analytics tools. Users are also demanding predictive and prescriptive models in both basic and advanced DSML platforms, especially platforms built for professional users like data scientists, analysts, machine learning leaders, and other professionals with technical and business knowledge. These platforms must draw analytical insights from a wide variety of data and track the models built on it. DSML platforms help organizations achieve digitalization and automation to gain real-time insights and reduce the complexities that come with big data. The platforms also include a variety of analytics and machine learning tools that empower data experts to reveal insights from the data. Thus, these platforms help organizations streamline business processes and enhance customer experience.

Sofia Ali, Analyst at Quadrant Knowledge Solutions, states, "DataRobot offers a comprehensive set of capabilities to provide a seamless experience from data to value, which includes Data Engineering, Machine Learning, MLOps, Decision Intelligence, and Trusted AI through the DataRobot AI Cloud platform." Sofia adds, "DataRobot's DSML solution leverages a code-first platform for data science experts, including composable ML, enhanced cloud-hosted notebooks, and code-centric data pipelines. The company has received strong ratings across the parameters of technology excellence and customer impact and has been positioned amongst technology leaders in the 2022 SPARK Matrix for the Data Science and Machine Learning (DSML) platform market."

"DataRobot AI Cloud is designed to keep our customers ahead of the trends driving the DSML platform market, offering powerful data-driven insights and predictions in a unified environment across the AI lifecycle," said Nenshad Bardoliwalla, Chief Product Officer, DataRobot. "We are committed to continuing to deliver easy-to-build AI applications that are accessible to everyone from the most novice business users to the most advanced data scientists, optimizing critical business decisions across the organization."


About DataRobot

DataRobot AI Cloud is the next generation of AI. DataRobot's AI Cloud vision is to bring together all data types, all users, and all environments to deliver critical business insights for every organization. DataRobot is trusted by global customers across industries and verticals, including a third of the Fortune 50. For more information, visit http://www.datarobot.com/.

About Quadrant Knowledge Solutions

Quadrant Knowledge Solutions is a global advisory and consulting firm focused on helping clients achieve business transformation goals with Strategic Business and Growth advisory services. At Quadrant Knowledge Solutions, our vision is to become an integral part of our client's business as a strategic knowledge partner. Our research and consulting deliverables are designed to provide comprehensive information and strategic insights for helping clients formulate growth strategies to survive and thrive in ever-changing business environments. For more available research, please visit https://quadrant-solutions.com/market-research/

Media Contacts

Stephanie Rogers, [email protected], 1-617-765-4500

Riya Mehar, Quadrant Knowledge Solutions, [email protected]

SOURCE Quadrant Knowledge Solutions

See the rest here:

DataRobot Positioned as a Leader in the 2022 SPARK Matrix for the Data Science and Machine Learning Platform Market by Quadrant Knowledge Solutions -...

Read More..

University of Chicago’s Data Science Institute comes forward with years of research on internet equity; the reality isn’t good for those living on…

From COVID-19 vaccines to the agriculture industry, from mental health wellness to the city of Chicago's Year of Healing, equity is a term at the forefront of many societal conversations. And for the past two years, the University of Chicago Data Science Institute (DSI) has been focusing on internet equity in the hope of better understanding how to fix the digital divide laid bare in communities across the state during the pandemic.

Researchers from the university's Crown Family School of Social Work, Policy and Practice and the Department of Computer Science have been collaborating for the past two years, gathering newer, focused internet data on Chicago's 77 neighborhoods under the Internet Equity Initiative. At Monday's Data Science Institute summit on UChicago's campus, Nick Feamster, faculty director of research at the Data Science Institute, and Nicole Marwell, associate professor in the Crown Family School of Social Work, Policy and Practice, both principal investigators of the initiative, revealed a 32-point difference between the most connected neighborhoods in the Loop and Near North Side (where more than 94% of households are connected to the internet) and the Far South Side neighborhoods of Burnside and West Englewood, where fewer than 62% of households are connected.

"We've known for a while that federal data on this basically collects paper forms from internet service providers at a pretty coarse granularity, like a census tract level, and if one home gets covered, they're like, OK, it's fine," Feamster said. "I knew that was suspect, but it hit home for me when I moved to Hyde Park almost three years ago. If you look at that map, Hyde Park purportedly gets gigabit internet access and has multiple ISPs serving it. But I had a heck of a time signing up for service on my block. That lit a fire for me. I was like, Wow, if it's this bad in Hyde Park, in the city of Chicago, it's got to be even worse elsewhere where we're not even looking."

Nicole Marwell, left, and Nick Feamster speak at the inaugural Data Science Institute Summit at the University of Chicago on May 9, 2022. (Terrence Antonio James / Chicago Tribune)

The disparities in connectivity between neighborhoods can be seen in DSI's data portal, which combines public and private data from 20 cities in the nation, including Chicago. UChicago undergraduate students analyzed pre-pandemic information from the U.S. Census, the American Community Survey, the Federal Communications Commission and the portal for a more localized look at internet connectivity in Chicago. From July through August 2021, researchers measured internet performance in a house in Hyde Park and one in South Shore; both households were paying for gigabit internet service from Xfinity (Comcast). The Hyde Park household experienced higher-quality internet than the South Shore household. Portal data also revealed connectivity strongly correlates with income, unemployment and race/ethnicity.

Per the portal data, in portions of Roseland, broadband access is as low as 49%; in an area of Chicago Heights, it's less than that, and in an area in East Garfield Park, connectivity is lower than 46%. The Loop, Lincoln Park and Beverly neighborhoods show over 90% connectivity. The results emphasize the need for continued, targeted intervention to improve connectivity in sections of the city, and the reason for DSI's ongoing study. With the $65 billion in federal funding that was authorized in 2021 under the Infrastructure Investment and Jobs Act to help expand broadband, Feamster and Marwell hope the initiative's work helps Illinois secure its fair share of money under the act and aids stakeholders interested in working on solutions to narrow the digital divide.

The initiative is working with local community organizations and residents to help in this effort by collecting different measurements of internet performance in households across Chicago. Volunteers from across Chicago have installed small devices on their routers, which allows researchers to measure internet performance as data travels to and from the household. Researchers are continuing to recruit volunteers to conduct comparisons between neighborhoods. Feamster said the institute's team welcomes many forms of involvement from community residents, from slicing and dicing the data to thinking about solutions.

"The point of collecting the data is to understand the nature of the problem, which can then inform the folks who are working to actually develop solutions to the problem," Marwell said. "And those can be a lot of different folks: ISPs, utility companies, it could be community groups that are putting together public Wi-Fi, it could be landlords who are trying to add Wi-Fi into their building services, rather than having people connect to an internet service provider on their own."

Marwell said connectivity is more than just an affordability issue. The initiative's study is really driving at the quality and reality of people's lived internet experience on the ground, something more than the one-time snapshots captured by internet speed tests. By measuring more of the lived experience continuously and in real time, researchers can determine over time whether something big is going on in a certain neighborhood or whether an area is just having a bad day or hour.

"It may seem like internet is one solution fits all, but the more we learn about the nature of the problem, we see that what the building is made out of makes a difference, what the trees and other topography are like makes a difference," Marwell said. "What's possible within the sort of managerial orientation of a multiunit building makes a difference, what community institutions might be available to site an antenna for community Wi-Fi, all those things are part of this process. We can't really be thinking about solving the internet problem just as give everybody a subsidy to buy their own service."

"I think to the extent that these efforts are successful at achieving the goal of greater connectivity, that's going to be really important proof of concept for continuing to roll money in subsequent years through subsequent infrastructure investments at both the federal and the state level, to continue the work and try and reach everybody," Marwell said.

drockett@chicagotribune.com

Follow this link:

University of Chicago's Data Science Institute comes forward with years of research on internet equity; the reality isn't good for those living on...

Read More..

It's a Numbers Game: Why Businesses Need More Women in Data and Analytics – insideBIGDATA

In this special guest feature, Peggy Pranschke, VP Global Business Analytics with Vista, discusses how the barriers to success for women working as data scientists can be overcome. Peggy began her career in the federal government as a data analyst in 2005, where she worked in a variety of roles on various projects. In 2017, she left the government to lead AI and Data Science in the private sector for Advance Auto Parts, a US-based Fortune 500 after-market auto parts retailer. At Advance, Peggy built a world-class data science team and created and implemented an AI strategy unique to the automotive industry. Peggy is also a passionate mentor and a member of Chief, a female executive networking group supported by Serena Williams and Madeleine Albright.

As organizations go through digital transformation, they generate ever more valuable data that needs analyzing and interpreting. If done right, understanding these insights helps companies in every industry create customer value and gain a real competitive advantage.

As the amount of data increases and the demand for specialists grows, ensuring diversity within data teams should be a priority. Diversified staff can bring different perspectives, creativity, and unbiased problem-solving to the discipline. Yet, research shows that women are still significantly outnumbered in data-related fields. In fact, according to Boston Consulting Group, only 15% of data scientists today are women. There is clearly still a lot that needs to be done to increase the number of women in data careers, and it is a shift that won't happen overnight. But there are several ways companies can ensure they're helping drive this change.

Diversity-first environment

Building an effective team of data engineers and analysts requires true commitment to inclusion. Groups of employees with very similar backgrounds wont represent the spectrum of users that the products or services seek to target.

In practice, a lack of diversity within a data team can lead to issues including representation bias and the use of skewed benchmarks when developing algorithms. Ensuring greater balance in the ratio of male and female data specialists increases the chances that a potential bias problem will be quickly identified and flagged.

Another important step in making the workplace more gender-diverse is ensuring existing policies are helping to cultivate an environment of trust and safety. Some examples of inclusive policies include the possibility of async or flexible work, paid maternity/paternity leave, and anti-harassment policies.

Nurturing Talent

With the ongoing data skills shortage, attracting more women to AI and analytics roles has never been more pressing. One way of addressing this challenge is investing in the development of young women enrolled in STEM degrees. In fact, encouraging future female graduates while they're still at university might be a decisive moment for recruiters; it's been reported that only a small portion of the group will opt for a career in data. To improve the chances of more women choosing data and analytics careers, businesses could look into launching partnerships with universities and non-profits, offering women access to placements, insights and communities. At Vista, we have partnered with The Mom Project, a talent community that connects professionally accomplished women with world-class companies.

Once you have an individual who's determined to work within the space, we need to ensure that the job descriptions they are seeing appeal to a wider variety of groups. That is why closing the gender gap requires internal processes, including recruitment tools, to be as neutral and inclusive as possible. Otherwise, it can not only put female applicants at risk of unfair treatment but also make companies miss out on the right candidate. One way that employers can develop fairer recruitment processes is by using tools such as Textio to check job descriptions for inclusive language.

Companies often want to consider only professionals who are fully equipped for data and analytics roles. To broaden the talent pool and increase the number of female candidates for these positions, we all should be looking for transferable skills such as critical thinking and problem-solving.

Networking opportunities

Networking events are a great opportunity for women working in different tech fields to connect and share experiences. From my personal perspective, being part of Vista's Women in Technology (WIT) resource group and Chief, a women's executive networking group, has helped me immensely as a professional. Being part of WIT here at Vista has given me an amazing group of colleagues with shared experiences to learn and grow from. I've learned so much from listening to how fellow parents tackle current challenges like working remotely while kids are in virtual school and balancing the two.

Although these events can be a valuable way of developing great ideas and enabling a better flow of information, I would argue that the format needs to be considered. Late night drinks and dinners are increasingly difficult to commit to. Changing up the format, for example, a breakfast or lunch event, could help to encourage more involvement.

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 (https://twitter.com/InsideBigData1)

Read more from the original source:

It's a Numbers Game: Why Businesses Need More Women in Data and Analytics - insideBIGDATA

Read More..

Pure Storage and Snowflake team up to offer rapid data analytics in the cloud – SiliconANGLE News

Pure Storage Inc. and Snowflake Inc. said today they're working together to enable joint customers to take data that's stored on-premises and analyze it in the cloud.

Under the new initiative, companies that use Pure Storage's arrays to store sensitive, on-premises data within their own corporate data centers will soon be able to connect those assets to the Snowflake Data Cloud.

It's a partnership that promises some interesting benefits for companies. Snowflake Data Cloud is a cloud-native data warehouse that enterprises can use to share data securely across their entire organization. They can unify datasets from various cloud services and software applications and make them accessible to any user. The advantage is that this combined data can be used for analytics, data science, application development and more.

Snowflake Data Cloud provides other advantages too, as any information stored within it can be augmented with third-party data from providers such as FactSet Research Systems Inc., Safegraph Inc., Zillow Inc. and Weather Source LLC.

The implication is that the cloud is far better suited for many kinds of data analysis, explaining why Pure Storage is looking to partner with Snowflake. Pure is a leading provider of high-performance, all-flash storage arrays used by organizations to host critical data that must be constantly analyzed. However, it understands that its customers may need to analyze this information at greater scale.

Once today's partnership comes to fruition, Snowflake customers will be able to add data from Pure Storage's FlashBlade object storage to the Snowflake Data Cloud, where it can be analyzed while still adhering to regulatory rules that require sensitive information to be stored locally.

The partnership is very similar to the deal between Snowflake and Dell Technologies Inc. that was announced last week at Dell Technologies World. That initiative was billed as a first of its kind for the industry, but if Dell is indeed the first to get to market, it won't be the last.

Wikibon analyst Dave Vellante said he believes Snowflake will make many more deals of this type in order to get its teeth into the vast amount of data that lives on on-premises servers. In an analysis on SiliconANGLE, he explained that cloud-native independent service providers such as Snowflake are realizing that, despite their cloud-only dogma, they have to grit their teeth and deal with on-premises data or risk being shut out of evolving data architectures.

"Software companies want to partner with leading hardware platforms and vice versa," Vellante said. "Hybrid dynamics are forming."

Snowflake said partnering with Pure Storage will give enterprises better control over their data and accelerated time to outcomes, eliminating the time and resources they spend on migrating data to the cloud beforehand so it can be analyzed there. Other benefits include reduced cloud costs, Snowflake said, as companies will be able to analyze data directly on FlashBlade object storage, thus removing the need to create separate copies of that data anywhere else. Furthermore, Snowflake said, it will make it easier for Pure Storage's customers to share data.

Pure Storage Chief Technology Officer Rob Lee said his company is working with Snowflake to build a modern platform for hybrid cloud data analytics. "We're excited to continue delivering on our vision of simplifying how organizations interact with and extract the most value out of their data," he added.

Pure Storage said its first joint offering with Snowflake will be available in public preview in the second half of the year, a similar time frame to that offered by Dell.

Continued here:

Pure Storage and Snowflake team up to offer rapid data analytics in the cloud - SiliconANGLE News

Read More..

Python Profiling Tools: A Tutorial – Built In

Profiling is a software engineering task in which software bottlenecks are analyzed programmatically. This process includes analyzing memory usage, the number of function calls and the runtime of those calls. Such analysis is important because it provides a rigorous way to detect parts of a software program that may be slow or resource inefficient, ultimately allowing for the optimization of software programs.

Profiling has use cases across almost every type of software program, including those used for data science and machine learning tasks. This includes extraction, transformation and loading (ETL) and machine learning model development. You can use the Pandas library in Python to conduct profiling on ETL, including profiling Pandas operations like reading in data, merging data frames, performing groupby operations, typecasting and missing value imputation.

Identifying bottlenecks in machine learning software is an important part of our work as data scientists. For instance, consider a Python script that reads in data and performs several operations on it for model training and prediction. Suppose the steps in the machine learning pipeline are reading in data, performing a groupby, splitting the data for training and testing, fitting three types of machine learning models, making predictions for each model type on the test data, and evaluating model performance. For the first deployed version, the runtime might be a few minutes.

After a data refresh, however, imagine that the script's runtime increases to several hours. How do we know which step in the ML pipeline is causing the problem? Software profiling allows us to detect which part of the code is responsible so we can fix it.

Another example relates to memory. Consider the memory usage of the first version of a deployed machine learning pipeline. This script may run for an hour each month and use 100 GB of memory. In the future, an updated version of the model, trained on a larger data set, may run for five hours each month and require 500 GB of memory. This increase in resource usage is to be expected with an increase in data set size. Detecting such an increase may help data scientists and machine learning engineers decide if they would like to optimize the memory usage of the code in some way. Optimization can help prevent companies from wasting money on unnecessary memory resources.

Python provides useful tools for profiling software in terms of runtime and memory. One of the most basic and widely used is the timeit method, which offers an easy way to measure the execution times of software programs. The Python memory_profiler module allows you to measure the memory usage of lines of code in your Python script. You can easily implement both of these methods with just a few lines of code.

We will work with the credit card fraud data set and build a machine learning model that predicts whether or not a transaction is fraudulent. We will construct a simple machine learning pipeline and use Python profiling tools to measure runtime and memory usage. This data has an Open Database License and is free to share, modify and use.

More From Sadrach Pierre: How to Find Outliers (With Examples)

To start, let's import the Pandas library and read our data into a Pandas data frame. A minimal sketch, assuming the csv is saved as creditcard.csv:
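```python
import pandas as pd

# Read the credit card fraud data set into a data frame
# (file name assumed; adjust the path to wherever the csv is saved)
df = pd.read_csv('creditcard.csv')
```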

Next, let's relax the display limits for columns and rows using the Pandas method set_option():
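```python
# Show all columns and rows when printing the data frame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```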

Next, let's display the first five rows of data using the head() method:
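```python
print(df.head())
```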

Next, to get an idea of how big this data set is, we can use the len() method to see how many rows there are, and apply it to the columns attribute of our Pandas data frame object to count the number of columns:
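```python
print("Number of rows:", len(df))
print("Number of columns:", len(df.columns))
```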

We can see that this data set is relatively large: 284,807 rows and 31 columns. Further, it takes up 150 MB of space. To demonstrate the benefits of profiling in Python, we'll start with a small subsample of this data on which we'll perform ETL and train a classification model.

Let's proceed by generating a small subsample data set. Let's take a random sample of 10,000 records from our data. We will also pass a value for random_state, which will guarantee that we select the same set of records every time we run the script. We can do this using the sample() method on our Pandas data frame:
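```python
# Subsample 10,000 records; random_state makes the sample reproducible
# (the seed value itself is an arbitrary choice)
df_subsample = df.sample(10000, random_state=42)
```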

Next, we can write the subsample of our data to a new csv file:
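```python
# File name for the subsample assumed for illustration
df_subsample.to_csv('creditcard_subsample.csv', index=False)
```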

Now we can start building out the logic for data preparation and model training. Let's define a method that reads in our csv file, stores it in a data frame and returns it (the function names in the sketches that follow are illustrative):
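```python
def read_data(filename):
    """Read a csv file and return its contents as a data frame."""
    return pd.read_csv(filename)
```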

Next, let's define a function that selects a subset of columns in the data. The function will take a data frame and a list of columns as inputs and return a new one with the selected columns:
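```python
def select_columns(df, columns):
    """Return a new data frame containing only the selected columns."""
    return df[columns].copy()
```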

Next, let's define a method that defines the model inputs and output and returns these values:
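```python
def define_inputs_output(df, input_columns, output_column):
    """Split a data frame into model inputs X and output y."""
    X = df[input_columns]
    y = df[output_column]
    return X, y
```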

We can then define a method used for splitting data for training and testing. First, at the top of our script, let's import the train_test_split method from the model_selection module in Scikit-learn:
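```python
from sklearn.model_selection import train_test_split
```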

Now we can define our method for splitting our data:
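```python
def split_data(X, y, test_size=0.2, random_state=42):
    """Split inputs and output into training and test sets.

    The test_size and random_state defaults are assumptions;
    the original split ratio isn't specified.
    """
    return train_test_split(X, y, test_size=test_size,
                            random_state=random_state)
```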

Next, we can define a method that will fit a model of our choice to our training data. Let's start with a simple logistic regression model. We can import the logistic regression class from the linear models module in Scikit-learn:
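```python
from sklearn.linear_model import LogisticRegression
```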

We will then define a method that takes our training data and an input that specifies the model type. We will use the model type parameter to define and train a more complex model later on:
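```python
def train_model(X_train, y_train, model_type='logistic_regression'):
    """Fit the specified model type to the training data."""
    if model_type == 'logistic_regression':
        model = LogisticRegression()
    else:
        raise ValueError(f"Unknown model type: {model_type}")
    model.fit(X_train, y_train)
    return model
```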

Next, we can define a method that takes our trained model and test data as inputs and returns predictions:
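```python
def get_predictions(model, X_test):
    """Return model predictions on the test data."""
    return model.predict(X_test)
```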

Finally, let's define a method that evaluates our predictions. We'll use average precision, which is a useful performance metric for imbalanced classification problems. An imbalanced classification problem is one where one of the targets has significantly fewer examples than the other target(s). In this case, most of the transaction data correspond to legitimate transactions, whereas a small minority of transactions are fraudulent:
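```python
from sklearn.metrics import average_precision_score

def evaluate_model(y_test, predictions):
    """Report average precision, a useful metric for imbalanced classes."""
    ap = average_precision_score(y_test, predictions)
    print("Average precision:", ap)
    return ap
```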

Now we have all of the logic in place for our simple ML pipeline, and we can execute it for our small subsample of data. Let's define a main function that reads in our subsampled data; selects the columns V1, V2, V3, Amount and Class; uses V1, V2, V3 and Amount as inputs with Class as the output; splits our data for training and testing; fits our model; makes predictions; and evaluates those predictions. We can then execute the main function, as in the sketch below:
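```python
def main():
    # Read in the subsampled data
    df = read_data('creditcard_subsample.csv')

    # Select the columns we will work with
    df = select_columns(df, ['V1', 'V2', 'V3', 'Amount', 'Class'])

    # Define inputs and output
    X, y = define_inputs_output(df, ['V1', 'V2', 'V3', 'Amount'], 'Class')

    # Split our data for training and testing
    X_train, X_test, y_train, y_test = split_data(X, y)

    # Fit our model
    model = train_model(X_train, y_train, model_type='logistic_regression')

    # Make predictions
    predictions = get_predictions(model, X_test)

    # Evaluate model predictions
    evaluate_model(y_test, predictions)


if __name__ == '__main__':
    main()
```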

Running the script prints the average precision of our logistic regression model on the test data.

Now we can use some profiling tools to monitor memory usage and runtime.

Let's start by monitoring runtime. Let's import the default_timer from the timeit module in Python:
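```python
from timeit import default_timer as timer
```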

Next, let's see how long it takes to read our data into a Pandas data frame. We define start and end time variables and print the difference to see how much time has elapsed:
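```python
start = timer()
df = read_data('creditcard_subsample.csv')
end = timer()
print("Reading in the data took:", end - start, "seconds")
```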

If we run our script, we see that it takes 0.06 seconds to read in our data.

Let's do the same for each step in the ML pipeline. We'll calculate the runtime for each step and store the results in a dictionary:
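```python
# Time each pipeline step and record the elapsed seconds
# (shown inline for brevity; in practice this lives inside main())
runtimes = {}

start = timer()
df = read_data('creditcard_subsample.csv')
runtimes['read_data'] = timer() - start

start = timer()
df = select_columns(df, ['V1', 'V2', 'V3', 'Amount', 'Class'])
runtimes['select_columns'] = timer() - start

start = timer()
X, y = define_inputs_output(df, ['V1', 'V2', 'V3', 'Amount'], 'Class')
runtimes['define_inputs_output'] = timer() - start

start = timer()
X_train, X_test, y_train, y_test = split_data(X, y)
runtimes['split_data'] = timer() - start

start = timer()
model = train_model(X_train, y_train, model_type='logistic_regression')
runtimes['train_model'] = timer() - start

start = timer()
predictions = get_predictions(model, X_test)
runtimes['get_predictions'] = timer() - start

start = timer()
evaluate_model(y_test, predictions)
runtimes['evaluate_model'] = timer() - start

print(runtimes)
```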

Executing the script prints the elapsed time for each step.

We see that reading in the data and fitting the model are the most time-consuming operations. Let's rerun this with the large data set. At the top of our main function, we change the file name to this:
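```python
# Read the full data set instead of the subsample
df = read_data('creditcard.csv')
```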

And now, let's rerun our script.

We see that, when we use the full data set, reading the data into a data frame takes 1.6 seconds, compared to the 0.07 seconds it took for the smaller data set. Identifying that it was the step where we read in the data that led to the increase in runtime is important for resource management. Understanding bottleneck sources like these can prevent companies from wasting resources like compute time.

Next, let's modify our model training method so that CatBoost is a model option. A sketch, assuming the catboost package's CatBoostClassifier with default parameters:
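```python
from catboost import CatBoostClassifier

def train_model(X_train, y_train, model_type='logistic_regression'):
    """Fit the specified model type to the training data."""
    if model_type == 'logistic_regression':
        model = LogisticRegression()
    elif model_type == 'catboost':
        model = CatBoostClassifier()  # default parameters
    else:
        raise ValueError(f"Unknown model type: {model_type}")
    model.fit(X_train, y_train)
    return model
```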

Let's rerun our script, but now specifying a CatBoost model:
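```python
model = train_model(X_train, y_train, model_type='catboost')
```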

We see that by using a CatBoost model instead of logistic regression, we increased our runtime from roughly two seconds to roughly 22 seconds, which is more than a tenfold increase in runtime from changing one line of code. Imagine if this increase in runtime happened for a script that originally took 10 hours: runtime would increase to over 100 hours just by switching the model type.

Another important resource to keep track of is memory. We can use the memory_profiler module to monitor memory usage line by line in our code. First, let's install the memory-profiler package in the terminal using pip:
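```
pip install memory-profiler
```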

We can then simply add the @profile decorator from memory_profiler before each function definition. For example:
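```python
from memory_profiler import profile

@profile
def read_data(filename):
    """Read a csv file and return its contents as a data frame."""
    return pd.read_csv(filename)
```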

And so on.

Now, let's run our script using the logistic regression model type and look at the step where we fit the model. We see that memory usage for fitting our logistic regression model is around 4.4 MB (line 61 of the profiler output).

Now, let's rerun this for CatBoost.

We see that memory usage for fitting our CatBoost model is 13.3 MB (line 64 of the profiler output). This corresponds to a threefold increase in memory usage. For our simple example, this isn't a huge deal, but if a company deploys a newer version of a production model and it goes from using 100 GB of memory to 300 GB, this can be significant in terms of resource cost. Further, having tools like this that can point to where the increase in memory usage is occurring is useful.

The code used in this post is available on GitHub.

More in Data Science: Use Lux and Python to Automatically Create EDA Visualizations

Monitoring resource usage is an important part of software, data and machine learning engineering. Understanding runtime dependencies in your scripts, regardless of the application, is important in virtually all industries that rely on software development and maintenance. In the case of a newly deployed machine learning model, an increase in runtime can have a negative impact on business. A significant increase in production runtime can result in a diminished experience for a user of an application that serves real-time machine learning predictions.

For example, if the UX requirements are such that a user shouldn't have to wait more than a few seconds for a prediction result and this suddenly increases to minutes, the result can be frustrated customers who may eventually seek out a better or faster tool.

Understanding memory usage is also crucial because instances may occur in which excessive memory usage isn't necessary. This usage can translate to thousands of dollars being wasted on memory resources that aren't needed. Consider our example of switching the logistic regression model for the CatBoost model. What mainly contributed to the increased memory usage was the CatBoost package's default parameters. These default parameters may result in unnecessary calculations being done by the package.

By understanding this dynamic, the researcher can modify the parameters of the CatBoost class. If this is done well, the researcher can retain the model accuracy while decreasing the memory requirements for fitting the model. Being able to quickly identify bottlenecks for memory and runtime using these profiling tools is an essential skill for engineers and data scientists building production-ready software.

Read the original here:

Python Profiling Tools: A Tutorial - Built In

Read More..

The Top Five Key Data Visualization Techniques to Utilize Right Now – Solutions Review

The editors at Solutions Review highlight five key data visualization techniques to utilize right now to enhance data storytelling.

Data visualization is the graphical or visual representation of a large amount of data in the form of charts, graphs, and maps, which helps analyze the data by identifying the patterns and trends within it. In this article, we will discuss data science visualization, the applications and benefits of data visualization, and the different types of analysis, such as univariate, bivariate, and multivariate data visualization. We will also discuss the top data visualization techniques, including line charts, histograms, pie charts, area plots, and scatter plots, that can help you understand your data better.

So before going on to the techniques of data visualization, let us look at what data science visualization is and its benefits.

Data science visualization refers to the graphical representation of data using various graphics such as charts, plots, maps, graphs, and infographics. Data visualization makes it easier for human beings to understand the data by analyzing patterns and trends, which helps in generating valuable business insights. Data visualization can even help identify critical relationships in various charts and plots that may prove fruitful for businesses.

Data scientists analyze, interpret and visualize various large datasets regularly with the help of multiple data visualization tools such as Tableau, Sisense, Microsoft Power BI, Microsoft Excel, Looker, and Zoho Analytics.

Today, data visualization has applications in various fields such as healthcare, finance, marketing, data science, the military, e-commerce, and education, as it helps organize data in ways that are not possible through traditional techniques and thus enables faster data processing.

Data visualization is a technique of graphical representation of data that helps us process data faster and make improved business decisions. Some of its benefits are easier understanding of large amounts of data, quicker identification of patterns, trends, and relationships, and the generation of valuable business insights.

The three different types of analysis for data visualization are univariate, bivariate, and multivariate, depending on whether one, two, or more than two variables are examined at a time.

Data visualization techniques involve generating the graphical or visual representation of the data to identify various patterns, trends, correlations, and dependencies to gain valuable insights. Let us have a look at the top five data visualization techniques that are used commonly:

Line Chart: A line chart or line plot displays information as a series of data points connected by straight line segments. A line chart displays the relationship between two variables on the X and Y axes. It is most commonly used to compare several variables and analyze trends.

Histogram: A histogram is the graphical representation of a set of numerical data in the form of connected rectangular blocks. A histogram represents only quantitative data, unlike the bar chart. It is used to figure out any unusual observations or gaps present in a large dataset.

Pie Chart: A pie chart represents data in a circular statistical graphic form. It records data in the form of numbers, percentages, or degrees. It is the most common form of graphical representation used in business presentations to depict data related to orders, sales, revenue, profit, loss, etc. Pie charts are divided into sectors that represent a percentage of the whole.

Area Plot: An area plot is a special form of line chart in which, to highlight the distance between different variables, the region below the line is filled with color rather than the data simply being connected with a continuous line. It helps to show the rise and fall of data, changes over time, categorical breakdowns, and more.

Scatter Plot: A scatter plot is a graphical representation used to observe and display the relationship between two variables. It uses dots to illustrate values for two different horizontal and vertical axis variables. Scatter plots are used to monitor the relationship between variables.
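To make these five techniques concrete, here is a minimal sketch; the use of matplotlib and the synthetic data are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(12)
y = rng.integers(10, 100, size=12)

fig, axes = plt.subplots(1, 5, figsize=(20, 3))

# Line chart: trend of a variable over x
axes[0].plot(x, y)
axes[0].set_title('Line chart')

# Histogram: distribution of a numerical sample
axes[1].hist(rng.normal(size=1000), bins=30)
axes[1].set_title('Histogram')

# Pie chart: percentage breakdown of a whole
axes[2].pie([40, 30, 20, 10], labels=['A', 'B', 'C', 'D'], autopct='%1.0f%%')
axes[2].set_title('Pie chart')

# Area plot: line chart with the region below the line filled in
axes[3].fill_between(x, y, alpha=0.5)
axes[3].plot(x, y)
axes[3].set_title('Area plot')

# Scatter plot: relationship between two variables
axes[4].scatter(rng.normal(size=100), rng.normal(size=100))
axes[4].set_title('Scatter plot')

plt.tight_layout()
plt.show()
```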

Data science visualization is an essential tool, as it helps you analyze hidden patterns and relationships between variables through the graphical presentation of data. It is a must-have skill for data scientists and data analysts seeking to derive insights from complex business data. In this article, we discussed various data visualization techniques like line charts, histograms, scatter plots, and others.

Tim is Solutions Review's Editorial Director and leads coverage on big data, business intelligence, and data analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in data management and data integration, Tim is a recognized influencer and thought leader in enterprise business software. Reach him via tking at solutionsreview dot com.

Continued here:

The Top Five Key Data Visualization Techniques to Utilize Right Now - Solutions Review

Read More..

EDU introduces MSc in Data Analytics for the first time in the country – The Business Standard

East Delta University (EDU) started a specialised master's program titled MSc in Data Analytics and Design Thinking for Business.

EDU, a pioneer in offering contemporary degree programmes, is the first and only university in Bangladesh to introduce a degree in Data Analytics and Design Thinking, reads a press release.

The MSc in Data Analytics and Design Thinking for Business degree blends the data analytics overview with the necessary skills to design creative and successful business models.

This degree allows the learners to focus on data-driven business designs, marketing and human resources management with the appropriate use of data science.

Alongside the analytics module, students will study strategic marketing in this digital era and creative HRM practices with extensive use of data science, reads the statement.

The architect of the programme, EDU Founder and Vice-chairman Sayeed Al Noman, expressed his philosophy regarding this one-of-a-kind programme: "The innovative business design makes for substantive and emotional distinction which can certainly lead towards making a lasting effect in the human psyche."

"In this era, the design has become profoundly crucial in business spheres, and companies are consistently seeking to recognise and optimise the strategic edge that creative design can bring."

"To help companies solve today's business issues in new ways, business executives aim to improve their innovative and operational thinking skills. Such a program was not offered before in our country and to relinquish this void, EDU came forward with an internationally acclaimed faculty pool and a curriculum that resonates with the global requirements too," he said.

There are four modules, including the Data Analytics module, Design Thinking for Business module, Creative Design for Marketing and HRM module, and Application of Data Analytics for Business module.

Participants will learn to understand and analyse data from the core, including data design, data handling and decision-making, and data visualization and interpretation. The programme will include a range of business courses linked to business designs for creative thinking and practices.

The MSc in Data Analytics and Design Thinking for Business program will prepare students for careers that apply and manage modern data science to solve critical business challenges.

This degree aims to turn big data into actionable intelligence. To that end, business analysts use a variety of statistical and quantitative methods, computational tools, and predictive models, as well as their knowledge of business, marketing, HRM, the corporate world, and the economy, to make data-driven decisions and apply design thinking for modern business, reads the statement.

University Vice-chancellor Professor Sikandar Khan stated that "this crafted programme may open a new avenue to materialise the dream of the present government to build a Digital Bangladesh. The university got motivated to offer this one of a kind master's programme considering the dream of the government of Bangladesh. To transform the country from traditional to digital requires a group of talented and technology-centric young people who can combine the ideas of business with the latest technology and data science."

EDU authorities declared that applicants and graduates from any background are eligible for this program. The unique curriculum mapping of this unconventional programme allows applicants with any undergraduate degree to apply for admissions.

The graduates from non-relevant disciplines will also be equipped with necessary techniques from the tailor-made course.

The total programme cost is Tk419,000, but the university authority is offering a special 70% waiver on tuition fees. A special waiver on admission fees will also be available.

After all these waivers, the programme will cost Tk167,000. In addition, students can avail up to a 100% scholarship based on academic merit and work experience.

Continued here:

EDU introduces MSc in Data Analytics for the first time in the country - The Business Standard

Read More..

Hospital IQ Wins Best Predictive Analytics Solution in 2022 MedTech Breakthrough Awards – Yahoo Finance

Recognition marks third consecutive year Hospital IQ has been named MedTech Breakthrough winner

NEWTON, Mass., May 12, 2022--(BUSINESS WIRE)--Hospital IQ, a leading intelligent workflow automation provider for hospital operations, announced today it has been named the winner of the MedTech Breakthrough Award for Best Predictive Analytics Solution. This recognition marks the third consecutive year Hospital IQ has been selected as a MedTech Breakthrough Award winner, previously winning the Health Administration Innovation Award in both 2021 and 2020.

Hospital IQ earned this accolade for its latest innovation in predictive analytics, which accurately predicts patient demand, automates workflows and drives cross-functional team action to improve hospital operational efficiencies and optimize capacity to provide patient care. The Hospital IQ solution leverages data and predictive analytics to provide health system leaders with enterprise-wide visibility into variables across all areas of hospital operations in real-time, including admissions, discharges, staff scheduling and more. Driven by intelligent data analytics, the system identifies potential future challenges in census, boarding, and staffing levels throughout the health system, and turns these predictions into actionable notifications. When a future scenario in which sub-optimal conditions will occur is identified, the Hospital IQ platform alerts the proper stakeholders and provides recommended actions to mitigate problems before they arise.

"We're honored to receive our third consecutive MedTech Breakthrough award win, and we're proud to be recognized as the leading innovator in predictive analytics for our clients and the patients they serve," said Rich Krueger, CEO of Hospital IQ. "Success in the post-pandemic future of healthcare will require systematic improvements regarding how healthcare organizations manage processes and people, and core to that improvement is the move from reactive to proactive processes. Predictive analytics allow hospitals and health systems to operate more effectively and strategically, utilizing data science to predict what's coming along with real-time insight and transparent communication across the enterprise, allowing them to see more patients, align staffing to demand, provide greater care, and support staff satisfaction."

The Hospital IQ solution is also equipped to promote team success through its coordinated communication platform and automated notifications, keeping everyone informed as patient throughput and capacity utilization goals are achieved and sustained, which eliminates confusion, keeps everyone on the same page and helps with staff morale and retention. The platform can be deployed for use cases across the most impactful areas of the hospital including inpatient, perioperative and infusion, enabling health systems to customize the toolset to their unique needs and further enhancing enterprise-wide performance.


The MedTech Breakthrough Awards honor excellence and recognize the innovation, hard work and success in a range of health and medical technology categories, including Clinical Administration, Telehealth, Patient Engagement, Electronic Health Records (EHR), mHealth, Medical Devices, Medical Data and many more. This year, nearly 4,000 nominations from around the world were submitted, representing the most competitive program to date.

About Hospital IQ

Hospital IQ provides an intelligent workflow automation solution for hospital operations that uses artificial intelligence to anticipate and direct actions, enabling health systems to achieve and sustain peak operational performance to improve patient access, clinical outcomes, and financial performance. Hospital IQ's cloud-based software platform combines advanced data analytics, machine learning, and simulation technology with an easy-to-use, intuitive user interface to deliver optimized surgical resource alignment, patient flow, and staff scheduling capabilities. Hundreds of leading hospitals and health systems rely on Hospital IQ to help them make the right operational decisions the first time, every time. To learn more, visit http://www.hospiq.com.

View source version on businesswire.com: https://www.businesswire.com/news/home/20220512005230/en/

Contacts

Laura Bastardi, Matter for Hospital IQ, Hospital_IQ@matternow.com

Read this article:

Hospital IQ Wins Best Predictive Analytics Solution in 2022 MedTech Breakthrough Awards - Yahoo Finance

Read More..

Analytics and Data Science News for the Week of May 6; Updates from Domino Data Lab, Gartner, Starburst, and More – Solutions Review

The editors at Solutions Review have curated this list of the most noteworthy analytics and data science news items for the week of May 6, 2022.

Keeping tabs on all the most relevant data management news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last month in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy data science and analytics news items.

New capabilities will recommend the optimal size for a development environment, thereby improving the model development experience for data science teams. Integrated workflows in Domino 5.2 automate model deployment to Snowflake's Data Cloud and enable the power of in-database computation, as well as model monitoring and continuous identification of new production data to update data drift and model quality calculations that drive better business decisions.

Read on for more.

Analyst house Gartner, Inc. has released its newest research highlighting four emerging solution providers that data and analytics leaders should consider as complements to their existing architectures. The 2022 Cool Vendors in Analytics and Data Science report features information on startups that offer some disruptive capability or opportunity not common to the marketplace.

Read on for more.

An integration plugin now ships with every Starburst Enterprise instance and features include: scalable attribute-based access control (ABAC), sensitive data discovery and classification, data policy enforcement and advanced policy building, and dynamic data masking auditing.

Read on for more.

For consideration in future data analytics news roundups, send your announcements to tking@solutionsreview.com.


View original post here:

Analytics and Data Science News for the Week of May 6; Updates from Domino Data Lab, Gartner, Starburst, and More - Solutions Review

Read More..