Category Archives: Data Science

What Does a Data Analyst Do? | SNHU – Southern New Hampshire University

In today's technology-driven world, data is collected, analyzed and interpreted to solve a wide range of business problems. With a career as a data analyst, you could play a decisive role in the growth and success of an organization.

A data analyst is a lot more than a number cruncher. Analysts review data to determine how to solve problems using that data, uncover critical insights about a business's customers and find ways to boost profits. Analysts also communicate this information to key stakeholders, including company leadership.

"Ultimately, the work of a data analyst provides insights to the organization that can transform how the business moves forward and grows successfully," said Dr. Susan McKenzie, senior associate dean of STEM programsand faculty at Southern New Hampshire University (SNHU).

McKenzie earned a Doctor of Education (EdD) in Educational Leadership from SNHU, where she also serves as a member of the Sustainability Community of Practice. Throughout her career, she's focused on reducing the barriers that inhibit the learning and application of math and science.

If you're interested in becoming a data analyst, it's essential to understand the day-to-day work of an analytics professional and how to prepare for a successful career in this growing field.*

Data analysts play a vital role in a modern company, helping to reflect on its work and customer base, determining how these factors have affected profits and advising leadership on ways to move forward to grow the business.

According to McKenzie, successful data analysts have strong mathematical and statistical skills, as well as:

McKenzie said that data analysts also require a strong foundation of business knowledge and professional skills, from decision-making and problem-solving to communication and time management. In addition, attention to detail is one of the essential data analyst skills, as it ensures that data is analyzed efficiently and effectively while minimizing errors.

As a data analyst, you can collect data using software, surveys and other data collection tools, perform statistical analyses on data and interpret the information gathered to inform critical business decisions, McKenzie said. For example, a data analyst might review the demographics of visitors who clicked on a specific advertising campaign on a company's website. This data could then be used to check whether the campaign is reaching its target audience, how well the campaign is working and whether money should be spent on this type of advertising again.

Where can a data analyst work? Large amounts of data are becoming increasingly accessible to even small businesses, putting analysts in high demand across a wide variety of industries. For example, according to the U.S. Bureau of Labor Statistics (BLS), operations research analysts, a category that includes data analysts, held 109,900 jobs as of 2022, with an additional 24,700 jobs expected by 2032.* That's a projected growth of 23% for this role.*

Many organizations have even created information analyst teams, with data-focused roles including database administrators, data scientists, data architects, database managers, data engineers, and, of course, data analysts, McKenzie said.

No matter what your specific interests are in the data analytics world, you're going to need a bachelor's degree to get started in the field. While many people begin a data analytics career with a degree in mathematics, statistics or economics, data analytics degrees are becoming more common and can help set you apart in this growing field, according to McKenzie.*

"With the increase in the amount of data available and advanced technical skills, obtaining a university degree specifically in data analytics provides the ability to master the necessary skills for the current marketplace," McKenzie said.

An associate degree in data analytics is a great way to get your foot in the door. You'll learn what data analytics is and fundamentals such as identifying organizational problems and using data analytics to respond to them. Associate degrees are typically 60 credits, and all 60 of those credits can be applied toward a bachelor's in data analytics.

You may be able to transfer earned credits into a degree program, too. Jason Greenwood '21 transferred 36 credits from schooling that he did over 30 years ago into a bachelor's in data analytics program.

"I was really happy that as many credits transferred as they did, as it helped decrease what I needed to take from a course perspective," he said.

Greenwood earned his data analytics degree from SNHU to pair with his experience working in the Information Technology (IT) field.

"Even though I had a successful IT career, I had managed it without a degree, and that 'gap' in my foundations always bothered me," Greenwood said.

According to Greenwood, he was always interested in data, and his career focused increasingly on data movement and storage over the last decade. "The chance to learn about the analysis of that data felt like 'completing the journey' for me," he said.

In a data analytics bachelor's degree program, you may explore business, information technology and mathematics while also focusing on data mining, simulation and optimization. You can also learn to identify and define data challenges across industries, gain hands-on practice collecting and organizing information from many sources and explore how to examine data to determine relevant information.

Pursuing a degree in data analytics can prepare you to use statistical analysis, simulation and optimization to analyze and apply data to real-world situations and use the data to inform decision-makers in an organization.

Some universities also offer concentrations to help make your degree more specialized. At SNHU, for example, you can earn a concentration in project management for STEM to help you develop skills that may be useful for managing analytical projects and teams effectively.

A master's in data analytics can further develop your skills, exploring how to use data to make predictions and how data relates to risk management. In addition, you'll dive deeper into data-driven decision-making, explore project management and develop communication and leadership skills.

Finding an internship during your studies can give you essential hands-on experience that stands out when applying for data analyst jobs, McKenzie said, while joining industry associations for data analytics, statistics and operations research can provide key networking opportunities that may help grow your career.

Data analysts play a unique role among the many data-focused jobs often found in today's businesses. Although the terms data analyst and data scientist are often used interchangeably, the roles differ significantly.

So, what's the difference between data science and data analytics?

While a data analyst gathers and analyzes data, a data scientist develops statistical models and uses the scientific method to explain the data and make predictions, according to McKenzie. She used an example of weather indicators. While a data analyst might gather temperature, barometric pressure and humidity, a data scientist could use that data to predict whether a hurricane might be forming.

"They're looking at the data to identify patterns and to decide scientifically what the result is," she said. "The data analyst works on a subset of what the data scientist does."

McKenzie said that data scientists generally have to earn a master's degree, while data analysts typically need a bachelor's degree for that role.

A degree in data analytics could position you to enter a growing field and get started on a fulfilling career path.*

As technology advances and more of our lives are spent online, higher-quality data is getting easier to collect, encouraging more organizations to get on board with data analytics.

According to BLS, demand for mathematicians and statisticians is projected to grow by 30%, and job opportunities for database administrators are expected to grow by 7% through 2032.*

With career opportunities across nearly every industry, you can take your data analytics degree wherever your interests lie.

"Data analysts are in high demand across many industries and fields as data has become a very large component of every business," said McKenzie. "The undergraduate degree in data analytics provides an entry place into many of these careers depending on the skills of the individual."

*Cited job growth projections may not reflect local and/or short-term economic or job conditions and do not guarantee actual job growth. Actual salaries and/or earning potential may be the result of a combination of factors including, but not limited to: years of experience, industry of employment, geographic location, and worker skill.

Danielle Gagnon is a freelance writer focused on higher education. She started her career working as an education reporter for a daily newspaper in New Hampshire, where she reported on local schools and education policy. Gagnon served as the communications manager for a private school in Boston, MA before later starting her freelance writing career. Today, she continues to share her passion for education as a writer for Southern New Hampshire University. Connect with her on LinkedIn.

Go here to see the original:

What Does a Data Analyst Do? | SNHU - Southern New Hampshire University

Analytics and Data Science News for the Week of May 3; Updates from Databricks, DataRobot, MicroStrategy & More – Solutions Review

Solutions Review Executive Editor Tim King curated this list of notable analytics and data science news for the week of May 3, 2024.

Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week, in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.

FlightStream can capture subsonic to supersonic flows, including compressible effects and a unique surface vorticity capability. It leverages the strengths of panel method flow solvers and enhances them with modern computational techniques to provide a fast solver capable of handling complex aerodynamic phenomena.

Read on for more

By supporting any JDK distribution including those from Azul, Oracle, Amazon, Eclipse, Microsoft, Red Hat and others, Azul Intelligence Cloud delivers key benefits across an enterprise's entire Java fleet.

Read on for more

This authorization builds on Azure Databricks' FedRAMP High and IL5 authorizations and further demonstrates Databricks' commitment to meeting the federal government's requirements for handling highly sensitive and mission-critical defense Controlled Unclassified Information (CUI) across a wide variety of data analytics and AI use cases.

Read on for more

In the last 12 months, DataRobot introduced industry-first generative AI functionality, launched the DataRobot Generative AI Catalyst Program to jumpstart high-priority use cases, and announced expanded collaborations with NVIDIA and Google Cloud to supercharge AI solutions with world-class performance and security.

Read on for more

The platform enables data scientists and CISO teams to gain valuable understanding and insights into AI systems' risks and challenges, alongside comprehensive protection and alerts. DeepKeep is already deployed by leading global enterprises in the finance, security, and AI computing sectors.

Read on for more

MicroStrategy AI, introduced in 2023, is now in its third GA release, which includes enhanced AI explainability, automated workflows, and several other features designed to increase convenience, reliability, and flexibility for customers. The company also launched MicroStrategy Auto Express, allowing anyone to create and share AI-powered bots and dashboards free for 30 days.

Read on for more

The Predactiv platform emerged from the success of ShareThis, a leading data and programmatic advertising provider, recognized for engineering expertise and pioneering the use of AI in audience and insight creation.

Read on for more

Salesforce Inc. rolled out data visualization and AI infrastructure improvements for its Tableau Software platform today that increase the usability of its product for data analysts and expand its scalability for artificial intelligence.

Read on for more

Sigma is building the AI Toolkit for Business, which will bring powerful AI and ML innovations from the data ecosystem into an intuitive interface that anyone can use. With forms and actions, a customer's data app is constantly up to date because it's directly connected to the data in their warehouse.

Read on for more

Watch this space each week as our editors will share upcoming events, new thought leadership, and the best resources from Insight Jam, Solutions Review's enterprise tech community for business software pros. The goal? To help you gain a forward-thinking analysis and remain on-trend through expert advice, best practices, predictions, and vendor-neutral software evaluation tools.

With the next Solutions Spotlight event, the team at Solutions Review has partnered with leading developer tools provider Infragistics. In this presentation, we're bringing together two of the biggest trends in the market today: low-code app development and embedded BI.

Register free on LinkedIn

With the next Expert Roundtable event, the team at Solutions Review has partnered with Databricks and ZoomInfo to cover why bad data is the problem, how Databricks and ZoomInfo help companies build a unified data foundation that fixes the bad data problem, and how this foundation can be leveraged to easily scale and use data + AI for GenAI.

Register free on LinkedIn

This summit is designed for leaders looking to modernize their organization's data integration capabilities across analytics, operations and artificial intelligence use cases. Dive deep into the core use cases of data integration as it pertains to analytics, operations, and artificial intelligence. Learn how integrating data can drive operational excellence, enhance analytical capabilities, and fuel AI innovations.

Register free

Getting data from diverse data producers to data consumers to meet business needs is a complicated and time-consuming task that often traverses many products. The incoming data elements are enriched, correlated, and integrated so that the consumption-ready data products are meaningful, timely and trustworthy.

Read on for more

Debbie describes herself as a very social person who enjoys working with people. She cares about any problems colleagues experience and has a natural leaning towards wanting to help. On top of that, Debbie is also a musician, which lends itself to wanting order, structure, systems, and rules. This all adds up to make Debbie very good at her most recent role of Data Governance Manager for Solent University.

Read on for more

For consideration in future analytics and data science news roundups, send your announcements to the editor: tking@solutionsreview.com.

Read more from the original source:

Analytics and Data Science News for the Week of May 3; Updates from Databricks, DataRobot, MicroStrategy & More - Solutions Review

How CS professor and team discovered that LLM agents can hack websites – Illinois Computer Science News

Daniel Kang

The launch of ChatGPT in late 2022 inspired considerable chatter. Much of it revolved around fears of large language models (LLMs) and generative AI replacing writers or enabling plagiarism.

Computer science professor Daniel Kang from The Grainger College of Engineering and his collaborators at the University of Illinois have discovered that ChatGPT can do far worse than helping students cheat on term papers. Under certain conditions, the generative AI program's developer agent can write personalized phishing emails, sidestep safety measures to assist terrorists in creating weaponry, or even hack into websites without prompting.

Kang has been researching how to make analytics with machine learning (ML) easy for scientists and analysts to use. He said, "I started to work on the broad intersection of computer security and AI. I've been working on AI systems for a long time, but it became apparent when ChatGPT came out in its first iteration that this will be a big deal for nonexperts, and that's what prompted me to start looking into this."

This suggested what Kang calls the "problem choice" for further research.

What Kang and co-investigators Richard Fang, Rohan Bindu, Akul Gupta, and Qiusi Zhan discovered, in research funded partly by Open Philanthropy, they summarized succinctly: LLM agents can autonomously hack websites.

This research into the potential for harm in LLM agents has been covered extensively, notably by New Scientist. Kang said the media exposure is partially due to luck. He observed that "people on Twitter with a large following stumbled across my work and then liked and retweeted it. This problem is incredibly important, and as far as I'm aware, what we showed is the first of a kind that LLM agents can do this autonomous hacking."

In a December 2023 article, New Scientist covered Kang's research into how the ChatGPT developer tool can evade chatbot controls and provide weapons blueprints. A March 2023 article detailed the potential for ChatGPT to create cheap, personalized phishing and scam emails. Then, there was this story in February of this year: "GPT-4 developer tool can hack websites without human help."

Nine LLM tools were used by the research team, with ChatGPT being the most effective. The team gave the open source GPT-4 developer tool access to six documents on hacking from the internet and the Assistants API used by OpenAI, the company developing ChatGPT, to give the agent planning ability. Confining their tests to secure, sandboxed websites, the research team reported that LLM agents "can autonomously hack websites, performing complex tasks without prior knowledge of the vulnerability. For example, these agents can perform complex SQL union attacks, which involve a multi-step process of extracting a database schema, extracting information from the database based on this schema, and performing the final hack. Our most capable agent can hack 73.3% of the vulnerabilities we tested, showing the capabilities of these agents. Importantly, our LLM agent is capable of finding vulnerabilities in real-world websites." The tests also demonstrated that the agents could search for vulnerabilities and hack websites more quickly and cheaply than human developers can.

A follow-up paper in April 2024 was covered by The Register in the article "OpenAI's GPT-4 can exploit real vulnerabilities by reading security advisories." An April 18 article in Dark Reading said that Kang's research reveals that "Existing AI technology can allow hackers to automate exploits for public vulnerabilities in minutes flat. Very soon, diligent patching will no longer be optional." An April 17 article from Tom's Hardware stated that "With the huge implications of past vulnerabilities, such as Spectre and Meltdown, still looming in the tech world's mind, this is a sobering thought." Mashable wrote, "The implications of such capabilities are significant, with the potential to democratize the tools of cybercrime, making them accessible to less skilled individuals." On April 16, an Axios story noted that "Some IT teams can take as long as one month to patch their systems after learning of a new critical security flaw."

Kang noted, "We were the first to show the possibility of LLM agents and their capabilities in the context of cyber security." The inquiry into the potential for malevolent use of LLM agents has drawn the federal government's attention. Kang said, "I've already spoken to some policymakers and congressional staffers about these upcoming issues, and it looks like they are thinking about this. NIST (the National Institute of Standards and Technology) is also thinking about this. I hope my work helps inform some of these decision-making processes."

Kang and the team passed along their results to OpenAI. An OpenAI spokesperson told The Register, "We don't want our tools to be used for malicious purposes, and we are always working on how to make our systems more robust against this type of abuse. We thank the researchers for sharing their work with us."

Kang told the Dark Reading newsletter that "GPT-4 doesn't unlock new capabilities an expert human couldn't do. As such, I think it's important for organizations to apply security best practices to avoid getting hacked, as these AI agents start to be used in more malicious ways."

Kang suggested a two-tiered approach that would present the public with a limited developer model that cannot perform the problematic tasks his research revealed. A parallel model would be somewhat less censored but more tightly restricted, available only to developers authorized to use it.

Kang has accomplished much since arriving at the University of Illinois Urbana-Champaign in August 2023. He said of the Illinois Grainger Engineering Department of Computer Science, "The folks in the CS department are incredibly friendly and helpful. It's been amazing working with everyone in the department, even though many people are super busy. I want to highlight CS professor Tandy Warnow. She has so much on her plate (she's helping the school, doing a ton of service, and still doing research) but she still has time to respond to my emails, and it's just been incredible to have that support from the department."

See more here:

How CS professor and team discovered that LLM agents can hack websites - Illinois Computer Science News

Starting ML Product Initiatives on the Right Foot – Towards Data Science

Picture by Snapwire, on Pexels

This blog post is an updated version of part of a conference talk I gave on GOTO Amsterdam last year. The talk is also available to watch online.

As a Machine Learning Product Manager, I am fascinated by the intersection of Machine Learning and Product Management, particularly when it comes to creating solutions that provide value and positive impact on the product, company, and users. However, managing to provide this value and positive impact is not an easy job. One of the main reasons for this complexity is the fact that, in Machine Learning initiatives developed for digital products, two sources of uncertainty intersect.

From a Product Management perspective, the field is uncertain by definition. It is hard to know the impact a solution will have on the product, how users will react to it, and whether it will improve product and business metrics or not. Having to work with this uncertainty is what makes Product Managers potentially different from other roles like Project Managers or Product Owners. Product strategy, product discovery, sizing of opportunities, prioritization, agile, and fast experimentation are some strategies to overcome this uncertainty.

The field of Machine Learning also has a strong link to uncertainty. I always like to say, "With predictive models, the goal is to predict things you don't know are predictable." This translates into projects that are hard to scope and manage, not being able to commit beforehand to a quality deliverable (good model performance), and many initiatives staying forever as offline POCs. Defining well the problem to solve, initial data analysis and exploration, starting small, and staying close to the product and business are actions that can help tackle the ML uncertainty in projects.

Mitigating this uncertainty risk from the beginning is key to developing initiatives that end up providing value to the product, company, and users. In this blog post, I'll deep-dive into my top 3 lessons learned when starting ML Product initiatives to manage this uncertainty from the beginning. These learnings are mainly based on my experience, first as a Data Scientist and now as an ML Product Manager, and are helpful to improve the likelihood that an ML solution will reach production and achieve a positive impact. Get ready to explore:

I have to admit, I have learned this the hard way. I've been involved in projects where, once the model was developed and prediction performance was determined to be good enough, the model's predictions weren't really usable for any specific use case, or were not useful to help solve any problem.

There are many reasons this can happen, but the ones I've found most frequently are:

To start an ML initiative on the right foot, it is key to start with a good problem to solve. This is foundational in Product Management, and recurrently reinforced by product leaders like Marty Cagan and Melissa Perri. It includes product discovery (through user interviews, market research, data analysis) and sizing and prioritization of opportunities (by taking into account quantitative and qualitative data).

Once opportunities are identified, the second step is to explore potential solutions for the problem, which should include Machine Learning and GenAI techniques, if they can help solve the problem.

If it is decided to try out a solution that includes the use of predictive models, the third step would be to do an end-to-end definition and design of the solution or system. This way, we can ensure that the requirements on how the system will use the predictions influence the design and implementation of the predictive piece (what to predict, data to be used, real-time vs. batch, technical feasibility checks).

However, I'd like to add that there might be a notable exception on this topic. Starting from GenAI solutions, instead of from the problem, can make sense if this technology ends up truly revolutionizing your sector or the world as we know it. There are a lot of discussions about this, but I'd say it is not clear yet whether that will happen or not. Up until now, we have seen this revolution in very specific sectors (customer support, marketing, design) and related to people's efficiency when performing certain tasks (coding, writing, creating). For most companies though, unless it's considered R&D work, delivering short/mid-term value should still mean focusing on problems, and considering GenAI just as any other potential solution to them.

Tough experiences led to this learning as well. Those experiences had in common a big ML project defined in a waterfall manner: the kind of project that is set to take 6 months and follow the ML lifecycle phase by phase.

What could go wrong, right? Let me remind you of my previous quote: "With predictive models, the goal is to predict things you don't know are predictable!" In a situation like this, it can happen that you arrive at month 5 of the project, and during the model evaluation realize there is no way the model is able to predict whatever it needs to predict with good enough quality. Or worse, you arrive at month 6, with a super model deployed in production, and realize it is not bringing any value.

This risk combines with the uncertainties related to Product, and makes it mandatory to avoid big, waterfall initiatives if possible. This is not something new or related only to ML initiatives, so there is a lot we can learn from traditional software development, Agile, Lean, and other methodologies and mindsets. By starting small, validating assumptions soon and continuously, and iteratively experimenting and scaling, we can effectively mitigate this risk, adapt to insights and be more cost-efficient.

While these principles are well-established in traditional software and product development, their application to ML initiatives is a bit more complex, as it is not easy to define "small" for an ML model and deployment. There are some approaches, though, that can help start small in ML initiatives.

Rule-based approaches: simplifying a predictive model into a decision tree. This way, predictions can be easily implemented as if-else statements in production as part of the functionality or system, without the need to deploy a model.

Proofs of Concept (POCs): a way to validate offline the predictive feasibility of the ML solution, and hint at the potential (or not) of the predictive step once in production.

Minimum Viable Products (MVPs): first focus on essential features, functionalities, or user segments, and expand the solution only if the value has been proven. For an ML model this can mean, for example, using only the most straightforward, priority input features, or predicting only for a segment of data points.

Buy instead of build: leverage existing ML solutions or platforms to help reduce development time and initial costs. Only once value is proven and costs grow too high might it be the right time to develop the ML solution in-house.

Using GenAI as an MVP: for some use cases (especially those involving text or images), GenAI APIs can be used as a first approach to solve the prediction step of the system. This works well for tasks like text classification, sentiment analysis, or image detection, where GenAI models deliver impressive results. When the value is validated and if costs increase too much, the team can decide to build a specific traditional ML model in-house.

Note that using GenAI models for image or text classification, while possible and fast, means using a model that is way too big and complex (expensive, lacking control, prone to hallucinations) for something that could be predicted with a much simpler and more controllable one. A fun analogy is delivering a pizza with a truck: it is feasible, but why not just use a bike?

Data is THE recurring problem Data Science and ML teams encounter when starting ML initiatives. How many times have you been surprised by data with duplicates, errors, missing batches, weird values... And how different that is from the toy datasets you find in online courses!

It can also happen that the data you need is simply not there: the tracking of the specific event was never implemented, or collection and proper ETLs were implemented only recently. I have experienced how this translates into having to wait some months to be able to start a project with enough historical and volume data.

All this relates to the adage "Garbage in, garbage out": ML models are only as good as the data they're trained on. Many times, solutions have a bigger potential to be improved by improving the data than by improving the models (Data-Centric AI). Data needs to be sufficient in volume, history (data generated over years can bring more value than the same volume generated in just a week), and quality. To achieve that, mature data governance, collection, cleaning, and preprocessing are critical.

From the ethical AI point of view, data is also a primary source of bias and discrimination, so acknowledging that and taking action to mitigate these risks is paramount. Considering data governance principles, privacy and regulatory compliance (e.g. the EU's GDPR) is also key to ensure a responsible use of data (especially when dealing with personal data).

With GenAI models this is shifting: huge volumes of data have already been used to train them. When using these types of models, we might not need volume and quality data for training, but we might need it for fine-tuning (see "Good Data = Good GenAI"), or to construct the prompts (nurturing the context, few-shot learning, Retrieval Augmented Generation; I explained all these concepts in a previous post!).

It is important to note that by using these models we are losing control of the data used to train them, and we can suffer from the lack of quality or the type of data used there: there are many known examples of bias and discrimination in GenAI outputs that can negatively impact our solution. A good example was Bloomberg's article on how ChatGPT is a recruiter's dream tool, with tests showing there's racial bias. LLM leaderboards testing for biases, or LLMs specifically trained to avoid these biases, can be useful in this sense.

We started this blogpost discussing what makes ML Product initiatives especially tricky: the combination of the uncertainty related to developing solutions in digital products, with the uncertainty related to trying to predict things through the use of ML models.

It is comforting to know there are actionable steps and strategies available to mitigate these risks. Yet perhaps the best ones are related to starting the initiatives off on the right foot! To do so, it can really help to start with the right problem and an end-to-end design of the solution, reduce the initial scope, and prioritize data quality, volume, and history.

I hope this post was useful and that it will help you challenge how you start working in future new initiatives related to ML Products!

More:

Starting ML Product Initiatives on the Right Foot - Towards Data Science

Understand SQL Window Functions Once and For All – Towards Data Science

Photo by Yasmina H on Unsplash

Window functions are key to writing SQL code that is both efficient and easy to understand. Knowing how they work and when to use them will unlock new ways of solving your reporting problems.

The objective of this article is to explain window functions in SQL step by step in an understandable way so that you don't need to rely only on memorizing the syntax.

Here is what we will cover:

Our dataset is simple, six rows of revenue data for two regions in the year 2023.
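The exact figures don't matter for what follows, so here is a hypothetical stand-in for that dataset (table and column names are assumptions used in the example queries below, not necessarily the article's originals):

```sql
CREATE TABLE sales (
    id       INTEGER,
    region   TEXT,
    quarter  TEXT,
    month    TEXT,
    revenue  NUMERIC
);

INSERT INTO sales VALUES
    (1, 'North', 'Q1', 'January',  1000),
    (2, 'North', 'Q2', 'April',    1500),
    (3, 'North', 'Q3', 'July',     1200),
    (4, 'South', 'Q1', 'February', 2000),
    (5, 'South', 'Q1', 'March',    1800),
    (6, 'South', 'Q4', 'November', 2200);
```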

If we took this dataset and ran a GROUP BY sum on the revenue of each region, it would be clear what happens, right? It would result in only two remaining rows, one for each region, and then the sum of the revenues:
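On the hypothetical sales table above, that aggregation would be:

```sql
SELECT region, SUM(revenue) AS total_revenue
FROM sales
GROUP BY region;
```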

The way I want you to view window functions is very similar to this but, instead of reducing the number of rows, the aggregation will run in the background and the values will be added to our existing rows.

First, an example:
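A query along these lines (a sketch against the hypothetical sales table, not necessarily the article's exact query) produces the behavior described next:

```sql
SELECT
    *,
    SUM(revenue) OVER () AS total_revenue
FROM sales;
```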

Notice that we don't have any GROUP BY and our dataset is left intact, and yet we were able to get the sum of all revenues. Before going more in depth into how this worked, let's quickly talk about the full syntax so we can start building up our knowledge.

The syntax goes like this:
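In general form, with square brackets marking optional clauses:

```sql
aggregate_or_ranking_function(expression) OVER (
    [PARTITION BY column_list]
    [ORDER BY column_list]
    [ROWS BETWEEN frame_start AND frame_end]
)
```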

Picking apart each section, this is what we have:

Don't stress over what each of these means yet, as it will become clear when we go over the examples. For now, just know that to define a window function we will use the OVER keyword. And as we saw in the first example, that's the only requirement.

Moving to something actually useful, we will now apply a group in our function. The initial calculation will be kept to show you that we can run more than one window function at once, which means we can do different aggregations in the same query, without requiring sub-queries.

As said, we use PARTITION BY to define the groups (windows) that are used by our aggregation function! So, keeping our dataset intact, we've got:
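A sketch of that query, keeping the earlier overall sum next to the new partitioned one:

```sql
SELECT
    *,
    SUM(revenue) OVER ()                    AS total_revenue,
    SUM(revenue) OVER (PARTITION BY region) AS region_revenue
FROM sales;
```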

We're also not restricted to a single grouping column. Similar to GROUP BY, we can partition our data on Region and Quarter, for example:
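A sketch of that variation:

```sql
SELECT
    *,
    SUM(revenue) OVER (PARTITION BY region, quarter) AS region_quarter_revenue
FROM sales;
```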

In the result we see that the only two data points for the same region and quarter got grouped together!

At this point I hope it's clear how we can view this as doing a GROUP BY, but in place, without reducing the number of rows in our dataset. Of course, we don't always want that, but it's not that uncommon to see queries where someone groups data and then joins it back into the original dataset, complicating what could be a single window function.

Moving on to the ORDER BY keyword. This one defines a running window function. You've probably heard of a Running Sum once in your life, but if not, we should start with an example to make everything clear.
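A sketch of a running sum on the hypothetical sales table, ordered by id:

```sql
SELECT
    *,
    SUM(revenue) OVER (ORDER BY id) AS running_total
FROM sales;
```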

What happens here is that we went, row by row, summing the revenue with all previous values. This was done following the order of the id column, but it could've been any other column.

This specific example is not particularly useful because we're summing across random months and two regions, but using what we've learned we can now find the cumulative revenue per region. We do that by applying the running sum within each group.
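A sketch of that per-region running sum:

```sql
SELECT
    *,
    SUM(revenue) OVER (PARTITION BY region ORDER BY id) AS running_region_total
FROM sales;
```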

Take the time to make sure you understand what happened here:

It's quite interesting to notice here that when we're writing these running functions we have the context of other rows. What I mean is that to get the running sum at one point, we must know the previous values for the previous rows. This becomes more obvious when we learn that we can manually choose how many rows before/after we want to aggregate on.
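A sketch of such a query with a custom frame:

```sql
SELECT
    *,
    SUM(revenue) OVER (
        ORDER BY id
        ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING
    ) AS windowed_sum
FROM sales;
```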

For this query we specified that for each row we wanted to look at one row behind and two rows ahead, so that means we get the sum of that range! Depending on the problem you're solving, this can be extremely powerful as it gives you complete control over how you're grouping your data.

Finally, one last function I want to mention before we move into a harder example is the RANK function. This gets asked a lot in interviews, and the logic behind it is the same as everything we've learned so far.
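A sketch of a RANK query producing the two calculations described below (ranking by revenue is an assumption; any ordering column would work):

```sql
SELECT
    *,
    RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rank_in_region,
    RANK() OVER (ORDER BY revenue DESC)                     AS overall_rank
FROM sales;
```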

Just as before, we used ORDER BY to specify the order in which we will walk, row by row, and PARTITION BY to specify our sub-groups.

The first column ranks each row within each region, meaning that we will have multiple rank ones in the dataset. The second calculation is the rank across all rows in the dataset.

This is a problem that shows up every now and then, and solving it in SQL takes heavy use of window functions. To explain this concept we will use a different dataset containing timestamps and temperature measurements. Our goal is to fill in the rows missing temperature measurements with the last measured value.

Here is what we expect to have at the end:

Before we start, I just want to mention that if you're using Pandas you can solve this problem simply by running df.ffill(), but if you're on SQL the problem gets a bit trickier.

The first step to solve this is to, somehow, group the NULLs with the previous non-null value. It might not be clear how we do this, but I hope it's clear that this will require a running function, meaning a function that will walk row by row, knowing when we hit a null value and when we hit a non-null value.

The solution is to use COUNT and, more specifically, count the values of temperature measurements. In the following query I run both a normal running count and also a count over the temperature values.
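A sketch of that query, against a hypothetical readings table (the sample values are invented so that the second reading, 15.0, is followed by NULLs, matching the group discussed below):

```sql
CREATE TABLE readings (ts TIMESTAMP, temperature NUMERIC);

INSERT INTO readings VALUES
    ('2023-01-01 00:00', 12.0),
    ('2023-01-01 01:00', 15.0),
    ('2023-01-01 02:00', NULL),
    ('2023-01-01 03:00', NULL),
    ('2023-01-01 04:00', 14.0),
    ('2023-01-01 05:00', NULL);

SELECT
    *,
    COUNT(*)           OVER (ORDER BY ts) AS normal_count,
    COUNT(temperature) OVER (ORDER BY ts) AS group_count
FROM readings;
```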

The normal_count column is useless for us; I just wanted to show what a running COUNT looks like. Our second calculation, though, the group_count, moves us closer to solving our problem!

Notice that this way of counting makes sure that the first value, just before the NULLs start, is counted and then, every time the function sees a null, nothing happens. This makes sure that we're tagging every subsequent null with the same count we had when we stopped having measurements.

Moving on, we now need to copy over the first value that got tagged into all the other rows within that same group, meaning that group 2 needs to be filled entirely with the value 15.0.

Can you think of a function we can use here? There is more than one answer for this but, again, I hope that at least it's clear that we're now looking at a simple window aggregation with PARTITION BY.

We can use either FIRST_VALUE or MAX to achieve what we want. The only goal is to get the first non-null value. Since we know that each group contains one non-null value and a bunch of null values, both of these functions work!
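A sketch of the full fill, reusing the group_count trick from above (MAX is used here; FIRST_VALUE ordered by ts would work just as well):

```sql
WITH grouped AS (
    SELECT
        *,
        COUNT(temperature) OVER (ORDER BY ts) AS group_count
    FROM readings
)
SELECT
    ts,
    MAX(temperature) OVER (PARTITION BY group_count) AS temperature_filled
FROM grouped;
```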

This example is a great way to practice window functions. If you want a similar challenge try to add two sensors and then forward fill the values with the previous reading of that sensor. Something similar to this:

Could you do it? It doesn't use anything that we haven't learned here so far.

By now we know everything that we need about how window functions work in SQL, so let's just do a quick recap!

This is what we've learned:

Originally posted here:

Understand SQL Window Functions Once and For All - Towards Data Science

Breaking down State-of-the-Art PPO Implementations in JAX – Towards Data Science

Since its publication in a 2017 paper by OpenAI, Proximal Policy Optimization (PPO) has been widely regarded as one of the state-of-the-art algorithms in Reinforcement Learning. Indeed, PPO has demonstrated remarkable performance across various tasks, from attaining superhuman performance in Dota 2 to solving a Rubik's cube with a single robotic hand, while maintaining three main advantages: simplicity, stability, and sample efficiency.

However, implementing RL algorithms from scratch is notoriously difficult and error-prone, given the numerous error sources and implementation details to be aware of.

In this article, we'll focus on breaking down the clever tricks and programming concepts used in a popular implementation of PPO in JAX. Specifically, we'll focus on the implementation featured in the PureJaxRL library, developed by Chris Lu.

Disclaimer: Rather than diving too deep into theory, this article covers the practical implementation details and (numerous) tricks used in popular versions of PPO. Should you require any reminders about PPO's theory, please refer to the references section at the end of this article. Additionally, all the code (minus the added comments) is copied directly from PureJaxRL for pedagogical purposes.

Proximal Policy Optimization is categorized within the policy gradient family of algorithms, a subset of which includes actor-critic methods. The designation actor-critic reflects the dual components of the model:

Additionally, this implementation pays particular attention to weight initialization in dense layers. Indeed, all dense layers are initialized by orthogonal matrices with specific coefficients. This initialization strategy has been shown to preserve the gradient norms (i.e. scale) during forward passes and backpropagation, leading to smoother convergence and limiting the risks of vanishing or exploding gradients[1].

Orthogonal initialization is used in conjunction with specific scaling coefficients:
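As a rough illustration (this is not the PureJaxRL code; the layer size and the coefficients mentioned in the comment are typical placeholder values), a Flax dense layer with orthogonal initialization can be declared like this:

```python
import numpy as np
import flax.linen as nn

# Hypothetical hidden layer: orthogonal kernel scaled by sqrt(2) and zero bias.
# Output heads typically use smaller scales (e.g. 0.01 for the policy head,
# 1.0 for the value head).
hidden_layer = nn.Dense(
    features=64,
    kernel_init=nn.initializers.orthogonal(np.sqrt(2)),
    bias_init=nn.initializers.constant(0.0),
)
```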

The training loop is divided into 3 main blocks that share similar coding patterns, taking advantage of JAX's functionalities:

Before going through each block in detail, here's a quick reminder about the jax.lax.scan function that will show up multiple times throughout the code:

A common programming pattern in JAX consists of defining a function that acts on a single sample and using jax.lax.scan to iteratively apply it to elements of a sequence or an array, while carrying along some state. For instance, we'll apply it to the step function to step our environment N consecutive times while carrying the new state of the environment through each iteration.

In pure Python, we could proceed as follows:
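As a toy stand-in (a simple counter plays the role of the environment step here; the real step function is sketched in the next section):

```python
def toy_step(carry, x):
    # Placeholder for the real step function: update the carried state
    # and emit one output per iteration.
    new_carry = carry + x
    output = new_carry * 2
    return new_carry, output

carry = 0
outputs = []
for x in range(5):
    carry, out = toy_step(carry, x)
    outputs.append(out)
```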

However, we avoid writing such loops in JAX for performance reasons (as pure Python loops are incompatible with JIT compilation). The alternative is jax.lax.scan which is equivalent to:
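Using the same toy_step function from the sketch above:

```python
import jax
import jax.numpy as jnp

# scan applies toy_step to each element of jnp.arange(5), threading the carry.
final_carry, outputs = jax.lax.scan(toy_step, jnp.array(0), jnp.arange(5))
```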

Using jax.lax.scan is more efficient than a Python loop because it allows the transformation to be optimized and executed as a single compiled operation rather than interpreting each loop iteration at runtime.

We can see that the scan function takes multiple arguments: the function to apply at each step, the initial carry, and the sequence to scan over (plus optional parameters such as the number of iterations and whether to scan in reverse).

Additionally, scan returns: the final carry after the last iteration, and the stacked outputs of every iteration.

Finally, scan can be used in combination with vmap to scan a function over multiple dimensions in parallel. As we'll see in the next section, this allows us to interact with several environments in parallel to collect trajectories rapidly.

As mentioned in the previous section, the trajectory collection block consists of a step function scanned across N iterations. This step function successively:

Scanning this function returns the latest runner_state and traj_batch, an array of transition tuples. In practice, transitions are collected from multiple environments in parallel for efficiency, as indicated by the use of jax.vmap(env.step, ...) (for more details about vectorized environments and vmap, refer to my previous article).

After collecting trajectories, we need to compute the advantage function, a crucial component of PPOs loss function. The advantage function measures how much better a specific action is compared to the average action in a given state:
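$$A(s_t, a_t) = G_t - V(s_t)$$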

Where G_t is the return at time t and V(s_t) is the value of the state at time t.

As the return is generally unknown, we have to approximate the advantage function. A popular solution is generalized advantage estimation[3], defined as follows:
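$$\hat{A}_t^{GAE(\gamma, \lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^l \, \delta_{t+l}$$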

With γ the discount factor, λ a parameter that controls the trade-off between bias and variance in the estimate, and δ_t the temporal difference error at time t:
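$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$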

As we can see, the value of the GAE at time t depends on the GAE at future timesteps. Therefore, we compute it backward, starting from the end of a trajectory. For example, for a trajectory of 3 transitions, we would have:
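$$\hat{A}_2 = \delta_2$$
$$\hat{A}_1 = \delta_1 + \gamma\lambda\,\delta_2$$
$$\hat{A}_0 = \delta_0 + \gamma\lambda\,\delta_1 + (\gamma\lambda)^2\,\delta_2$$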

Which is equivalent to the following recursive form:
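$$\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}$$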

Once again, we use jax.lax.scan on the trajectory batch (this time in reverse order) to iteratively compute the GAE.
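A self-contained sketch of that reverse scan (simplified relative to the PureJaxRL version; argument names and default hyperparameters are illustrative, and dones is assumed to be an array of 0/1 flags):

```python
import jax
import jax.numpy as jnp

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Compute GAE by scanning the trajectory in reverse order.

    Returns (advantages, targets), where targets = advantages + values.
    """
    def gae_step(carry, transition):
        gae, next_value = carry
        reward, value, done = transition
        # Temporal difference error, masked when the episode ended.
        delta = reward + gamma * next_value * (1.0 - done) - value
        # Recursive GAE update: A_t = delta_t + gamma * lambda * A_{t+1}.
        gae = delta + gamma * gae_lambda * (1.0 - done) * gae
        return (gae, value), gae

    _, advantages = jax.lax.scan(
        gae_step,
        (jnp.zeros_like(last_value), last_value),
        (rewards, values, dones),
        reverse=True,
    )
    return advantages, advantages + values
```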

Note that the function returns advantages + traj_batch.value as a second output, which is equivalent to the return according to the first equation of this section.

The final block of the training loop defines the loss function, computes its gradient, and performs gradient descent on minibatches. Similarly to previous sections, the update step is an arrangement of several functions in a hierarchical order:

Let's break them down one by one, starting from the innermost function of the update step.

This function aims to define and compute the PPO loss, originally defined as:
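$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big]$$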

Where r_t(θ) is the probability ratio between the new and old policies, Â_t is the estimated advantage, and ε is the clipping coefficient.

However, the PureJaxRL implementation features some tricks and differences compared to the original PPO paper[4]:

Here's the complete loss function:
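What follows is not the PureJaxRL code itself but a simplified, self-contained sketch of such a clipped loss in JAX; the advantage normalization and clipped value loss shown here are common tricks, and all argument names and coefficients are illustrative:

```python
import jax.numpy as jnp

def ppo_loss(new_log_prob, old_log_prob, advantages, new_value, old_value,
             returns, entropy, clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    # Normalize advantages (a common stabilization trick).
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Clipped policy (actor) loss.
    ratio = jnp.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantages
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    actor_loss = -jnp.minimum(unclipped, clipped).mean()

    # Clipped value (critic) loss.
    value_clipped = old_value + jnp.clip(new_value - old_value, -clip_eps, clip_eps)
    value_loss = 0.5 * jnp.maximum(
        jnp.square(new_value - returns),
        jnp.square(value_clipped - returns),
    ).mean()

    # Entropy bonus encourages exploration.
    return actor_loss + vf_coef * value_loss - ent_coef * entropy.mean()
```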

The update_minibatch function is essentially a wrapper around loss_fn used to compute its gradient over the trajectory batch and update the model parameters stored in train_state.

Finally, update_epoch wraps update_minibatch and applies it on minibatches. Once again, jax.lax.scan is used to apply the update function on all minibatches iteratively.

From there, we can wrap all of the previous functions in an update_step function and use scan one last time for N steps to complete the training loop.

A global view of the training loop would look like this:

We can now run a fully compiled training loop using jax.jit(train(rng)) or even train multiple agents in parallel using jax.vmap(train(rng)).

There we have it! We covered the essential building blocks of the PPO training loop as well as common programming patterns in JAX.

To go further, I highly recommend reading the full training script in detail and running example notebooks on the PureJaxRL repository.

Thank you very much for your support, until next time

Full training script, PureJaxRL, Chris Lu, 2023

[1] Explaining and illustrating orthogonal initialization for recurrent neural networks, Smerity, 2016

[2] Initializing neural networks, DeepLearning.ai

[3] Generalized Advantage Estimation in Reinforcement Learning, Siwei Causevic, Towards Data Science, 2023

[4] Proximal Policy Optimization Algorithms, Schulman et Al., OpenAI, 2017

See more here:

Breaking down State-of-the-Art PPO Implementations in JAX - Towards Data Science

UVA’s new School of Data Science has grand opening – – CBS19 News

Inside there is also a space to bring in researchers and guest speakers, as well as host annual events, including the School's annual Datapalooza and Women in Data Science events.

The building also features a giant data sculpture. It is an interactive art piece that displays different data.

"The genome related to Alzheimer's disease is loaded up on there, we've got the number of hours it took to build this building by day is loaded up on there. The idea is you select a data set, and you are rewarded by seeing the data fill the sculpture with light," said Burgess.

Burgess says the growth of the School of Data Science shows how much the university believes data science will change the world.

"The fact that we have university leadership here saying, 'This is a place that brings us together.' We have the governor here saying, 'This is important for the Commonwealth, this is important for higher education.' We really believe that and we think that what we've seen is that other people believe that as well," she said.

Link:

UVA's new School of Data Science has grand opening - - CBS19 News

Deloitte Broadens Biosecurity, Public Health Capabilities With Gryphon Scientific Acquisition; Beth Meagher Quoted – GovCon Wire

Deloitte has acquired Gryphon Scientific for an undisclosed sum as part of efforts to broaden its biosafety, biosecurity, public health and emergency preparedness and response capabilities.

Through the transaction, Deloitte said Monday it will absorb Gryphon's scientists, programmers, data analysts, public health specialists and planning and policy professionals with experience in scientific communications, modeling, data science and risk assessment.

The two companies will also develop artificial intelligence applications through Deloitte's Federal Health AI Accelerator to improve public health and safety and help customers prepare for chemical threats, biothreats and other biological emergencies.

Beth Meagher, U.S. federal health sector leader and principal at Deloitte Consulting, said the addition of Gryphon's professionals and leadership reflects an enhancement to the consulting firm's data analytics and advanced tech capabilities.

"Our federal health practice is excited to lead the way for U.S. government and public services (GPS) to push the boundaries and bring our clients to the forefront of AI-enabled, mission-driven work," Meagher noted. "This enhances the types of data-driven technology and scientific experience that we can offer to federal agencies and strengthens our ability to support government leaders in their efforts to safeguard the security of our nation and the health and safety of our people."

See original here:

Deloitte Broadens Biosecurity, Public Health Capabilities With Gryphon Scientific Acquisition; Beth Meagher Quoted - GovCon Wire

Gilbane completes University of Virginia’s School of Data Science building – World Construction Network

Gilbane Building Company has completed the construction of the University of Virginia's (UVA) School of Data Science building in the US.

The project was supported via a $120m donation from the Quantitative Foundation.

The design process, which commenced in January 2020, was a collaborative effort involving Hopkins Architects, VMDO, and the university's Office of the Architect.

Ground-breaking on the project took place in October 2021, with construction led by Gilbane.

Gilbane Richmond, Virginia business leader Maggie Reed said: "We are delighted to continue our partnership with UVA on this cutting-edge facility that creates opportunity for collaborative, open, and responsible data science research and education, as the school's goals state."

"While we helped build a physical building to house the programme, we are excited about what this 'school without walls' will create in the future."


The UVA School of Data Science now features a new four-storey, 61,000-square-foot facility.

The building, situated at the east entrance of the Emmet-Ivy Corridor, is part of a larger expansion project by the university.

It has been designed with open hallways, an atrium, stairs, adaptive classrooms, and research and meeting areas, aiming to create an interactive and collaborative learning environment.

UVA president Jim Ryan said: "This beautiful and unique new space gives students, faculty, and staff a home base for research and teaching that intersects with schools and disciplines across grounds and beyond."

The School of Data Science building has achieved Leadership in Energy and Environmental Design (LEED) Gold certification for its green design.

It incorporates solar panels, daylighting strategies, and shading systems to minimise the need for artificial lighting.

The building is expected to derive 15% of its power from solar energy, using four arrays of solar photovoltaic panels on its roof.

This aligns with UVA's Sustainability Plan, which targets carbon neutrality by 2030 and a fossil fuel-free status by 2050.


See more here:

Gilbane completes University of Virginia's School of Data Science building - World Construction Network

Is Data Science a Bubble Waiting to Burst? – KDnuggets

Image by Author

I once spoke with a guy who bragged that, armed only with some free LinkedIn courses and an outdated college Intro to SQL course, he'd managed to bag a six-figure job in data science. Nowadays, most people struggling to get a good data science job will agree that's unlikely to happen. Does that mean the data science job category is a popped bubble, or worse, that it hasn't yet burst but is about to?

In short, no. What's happened is that data science used to be an undersaturated field, easy to get into if you used the right keywords on your resume. Nowadays, employers are a little more discerning and often have specific skill sets in mind that they're looking for.

The bootcamps, free courses, and "Hello World" projects don't cut it anymore. You need to prove specific expertise and nail your data science interview, not just drop buzzwords. Not only that, but the shine of "data scientist" has worn off a little. For a long time, it was the sexiest job out there. Now? Other fields, like AI and machine learning, are just a bit sexier.

That all being said, there are still more openings in data science than there are applicants, and reliable indicators say the field is growing, not shrinking.

Not convinced? Let's look at the data.

Over the course of this article, I'll drill down into multiple graphs, charts, figures, and percentages. But let's start with just one percentage from one outstandingly reputable source: the Bureau of Labor Statistics.

The BLS projects 35 percent employment growth for data scientists from 2022 to 2032. In short, in 2032 there will be about a third more data science jobs than there were in 2022. For comparison, the average growth rate across all occupations is 3 percent. Keep that number in mind as you go through the rest of this article.

The BLS does not think that data science is a bubble waiting to burst.

Now we can start getting into the nitty gritty. The first sign people point to as evidence of a popped, or soon-to-pop, bubble is the mass layoffs in tech, data science included.

It's true that the numbers don't look good. Starting in 2022 and continuing through 2024, the tech sector as a whole experienced 430k layoffs. It's difficult to tease out data science-specific figures from those numbers, but the best guesses are that around 30 percent of those roles were in data science and engineering.

Source: https://techcrunch.com/2024/04/05/tech-layoffs-2023-list/

However, that's not a burst data science bubble. It's something smaller in scope: a pandemic bubble popping. In 2020, as more people stayed home, profits rose, and money was cheap, FAANG and FAANG-adjacent companies scooped up record numbers of tech workers, only to lay many of them off just a few years later.

If you zoom out and look at the broader picture of hirings and layoffs, you'll see that the post-pandemic slump is a dip in an overall rising line, one that is even now beginning to recover:

Source: https://www.statista.com/

You can clearly see the huge dip in tech layoffs during 2020 as the market tightened, and then the huge spike starting in Q1 of 2022 as layoffs began. Now, in 2024, the number of layoffs is smaller than in 2023.

Another scary stat often touted is that FAANG companies cut their job openings by 90% or more. Again, this is mostly a correction from the unusually high number of job openings during the pandemic.

That being said, job openings in the tech sector are still lower than they were pre-pandemic. Below, you can see an adjusted chart showing demand for tech jobs relative to February 2020. It's clear that the tech sector took a blow it's not recovering from any time soon.

Source: https://www.hiringlab.org/2024/02/20/labor-market-update-tech-jobs-below-pre-pandemic-levels/

However, let's look a little closer at some real numbers. Looking at the chart below, while job openings are indubitably down from their 2022 peak, the overall number of openings is actually increasing, up 32.4% from the lowest point.

Source: https://www.trueup.io/job-trend

If you look at labor and news reports online, you'll see there's a bit of an anti-remote, anti-tech backlash happening at the moment. Meta, Google, and other FAANG companies, spooked by the bargaining power that employees enjoyed during the pandemic heights, are now pushing for return-to-office mandates (data science jobs and other tech jobs are often remote) and laying off large numbers of employees somewhat unnecessarily, judging by their revenue and profit reports.

To give just one example, Google's parent company Alphabet laid off over 12,000 employees over the course of 2023, despite growth across its ad, cloud, and services divisions.

This is just one lens through which to examine the data, but part of the reason companies are making these layoffs has more to do with keeping the board happy than with any decreased need for data scientists.

I find that the people who believe we're in a data science bubble are most often those who don't really know what data scientists do. Think back to that BLS stat and ask yourself: why does this well-informed government agency believe there's strong growth in this sector?

It's because the need for data scientists isn't going away. While the job titles might change (AI expert or ML cloud specialist rather than data scientist), the skills and tasks that data scientists perform can't be outsourced, dropped, decreased, or automated.

For example, predictive models are essential for businesses to forecast sales, predict customer behavior, manage inventory, and anticipate market trends. This enables companies to make informed decisions, plan strategically for the future, and maintain competitive advantages.
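To make that concrete, here is a minimal forecasting sketch in Python. The monthly sales figures, the single trend feature, and the three-month horizon are all invented for illustration; a real pipeline would add seasonality, holdout evaluation, and many more features.

```python
# Illustrative only: fit a linear trend to two years of made-up monthly
# sales and extrapolate it three months ahead.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly sales (units sold), oldest to newest.
sales = np.array([210, 225, 240, 238, 260, 274, 290, 281, 305, 322, 340, 335,
                  350, 368, 372, 390, 401, 420, 433, 440, 455, 470, 482, 495])
months = np.arange(len(sales)).reshape(-1, 1)  # month index is the only feature

model = LinearRegression().fit(months, sales)

# Forecast the next quarter by extending the fitted trend.
future = np.arange(len(sales), len(sales) + 3).reshape(-1, 1)
print(model.predict(future).round(1))
```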

In the financial sector, data science plays a crucial role in identifying suspicious activities, preventing fraud, and mitigating risks. Advanced algorithms analyze transaction patterns to detect anomalies that may indicate fraud, helping protect businesses and consumers alike.
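As a rough sketch of that fraud-detection idea, the snippet below flags transactions whose amount and time of day look unlike the rest. The simulated data and the two features are made up for the example; production systems use far richer signals and labeled history.

```python
# Toy anomaly screen: learn what "normal" transactions look like, then
# flag the ones the model considers unusual.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated normal transactions: modest amounts, mostly daytime hours.
normal = np.column_stack([rng.normal(60, 20, 500),   # amount in dollars
                          rng.normal(14, 3, 500)])   # hour of day

# A few injected outliers: large amounts in the middle of the night.
suspicious = np.array([[2400.0, 3.0], [1800.0, 2.0], [3100.0, 4.0]])

X = np.vstack([normal, suspicious])
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)

# predict() returns -1 for points the model treats as anomalous.
print(clf.predict(suspicious))
```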

Natural language processing (NLP) enables machines to understand and interpret human language, powering applications like chatbots, sentiment analysis, and language translation services. This is critical for improving customer service, analyzing social media sentiment, and facilitating global communication.
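Here is a bare-bones sentiment example on a handful of invented reviews, just to show the shape of an NLP pipeline (vectorize the text, fit a classifier, predict). Anything real would train on thousands of labeled examples or a pretrained language model.

```python
# Tiny sentiment classifier on made-up reviews; illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["great product, works perfectly",
           "terrible support, total waste of money",
           "love it, would buy again",
           "broke after two days, very disappointed",
           "excellent value and fast shipping",
           "awful experience, never again"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reviews, labels)

print(clf.predict(["fast shipping and great value",
                   "total waste of money, very disappointed"]))
```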

I could list dozens more examples demonstrating that data science is not a fad, and data scientists will always be in demand.

Revisiting my anecdote from earlier, part of the reason it feels like we're in a bubble that is either popping or about to pop is the perception of data science as a career.

Back in 2012, Harvard Business Review famously called it "the sexiest job of the 21st century." In the intervening years, companies hired more data scientists than they knew what to do with, often unsure about what data scientists actually did.

Now, more than a decade later, the field is a little wiser. Employers understand that data science is a broad field and are more interested in hiring machine learning specialists, data pipeline engineers, cloud engineers, statisticians, and other roles that broadly fall under the data science hat but are more specialized.

This also helps explain why walking into a six-figure job straight out of a bachelor's degree used to be possible, since employers didn't know better, but no longer is. The lack of easy data science jobs makes it feel like the market is tighter. It's not; the data shows job openings are still high and demand still outstrips the supply of graduates with appropriate degrees. But employers are more discerning and unwilling to take a chance on untried college grads with no demonstrated experience.

Finally, you can take a look at the tasks that data scientists do and ask yourself what companies would do without those tasks getting done.

If you don't know much about data science, you might guess that companies can simply automate this work, or even go without. But if you know anything about the actual tasks data scientists do, you understand that the job is, currently, irreplaceable.

Think of how things were in the 2010s: that guy I talked about, with just a basic understanding of data tools, catapulted himself into a lucrative career. Things aren't like that anymore, but this recalibration isn't a sign of a bursting bubble, as some believe. Instead, it's the field of data science maturing. The entry-level data science field may be oversaturated, but for those with specialized skills, deep knowledge, and practical experience, the field is wide open.

Furthermore, the bubble narrative is fueled by a misunderstanding of what a bubble actually is. A bubble occurs when the value of something (in this case, a career sector) is driven by speculation rather than actual intrinsic worth. However, as we've covered, the value proposition of data science is tangible and measurable. Companies need data scientists, plain and simple. There's no speculation there.

There's also a lot of media sensationalism surrounding the layoffs in big tech. While these layoffs are significant, they reflect broader market forces rather than a fundamental flaw in the data science discipline. Don't get caught up in the headlines.

Finally, it's also worth noting that the perception of a bubble may stem from how data science itself is changing. As the field matures, the differentiation between roles becomes more pronounced. Job titles like data engineer, data analyst, business intelligence analyst, machine learning engineer, and data scientist are more specific and require more niche skill sets. This evolution can make the data science job market appear more volatile than it is, but in reality, companies just have a better understanding of their data science needs and can recruit for those specialties.

If you want a job in data science, go for it. There's very little chance we're actually in a bubble. The best thing you can do is, as I've indicated, pick your specialty and develop your skills in that area. Data science is a broad field, spilling over into different industries, languages, job titles, responsibilities, and seniority levels. Select a specialty, train the skills, prep for the interview, and secure the job.

Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.

Read more here:

Is Data Science a Bubble Waiting to Burst? - KDnuggets