Category Archives: Data Science
Continual Learning: A Deep Dive Into Elastic Weight Consolidation Loss – Towards Data Science
One of the most significant challenges in training artificial neural networks is catastrophic forgetting. This problem arises when a neural network trained on one task (Task A) subsequently learns a new task (Task B) and, in the process, forgets how to perform the original task. In this article, we will explore a method to address this issue known as Elastic Weight Consolidation (EWC). EWC offers a promising approach to mitigating catastrophic forgetting, enabling neural networks to retain knowledge of previously learned tasks while acquiring new skills.
All figures in this article are by the author unless otherwise specified.
It has been shown that there exist many configurations of optimal parameters with a desired low error on a task (the gray and yellow regions for tasks A and B, respectively, in the figure above). Assuming we found one such configuration θ*_A for task A, when we continue training the model from that configuration on a new task B, there are three different scenarios:
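Whichever scenario applies, the mechanism EWC uses is the same: a quadratic penalty that anchors each parameter to its task-A optimum in proportion to how important that parameter was for task A (measured by the diagonal of the Fisher information matrix). A minimal sketch in pure Python, with illustrative parameter values (the names `lam`, `fisher`, and `theta_star` are my own, not from the article):

```python
# Minimal sketch of the EWC penalty, assuming the standard formulation:
#   L(theta) = L_B(theta) + sum_i (lam / 2) * F_i * (theta_i - theta*_A,i)^2
# theta_star holds the optimal task-A parameters, fisher the diagonal
# Fisher information estimated on task A, and lam the penalty strength.

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic penalty (lam/2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )

def total_loss(task_b_loss, theta, theta_star, fisher, lam=1.0):
    """Task-B loss plus the EWC penalty that protects task-A knowledge."""
    return task_b_loss + ewc_penalty(theta, theta_star, fisher, lam)

# A parameter with high Fisher information (4.0) is strongly pulled back
# toward theta*, while an unimportant one (0.1) is free to move for task B.
penalty = ewc_penalty(theta=[1.0, 2.0], theta_star=[0.0, 2.0],
                      fisher=[4.0, 0.1], lam=1.0)
print(penalty)  # 0.5 * (4.0 * 1.0 + 0.1 * 0.0) = 2.0
```

In a real training loop the penalty would be added to the task-B loss before each gradient step; the sketch only shows the arithmetic.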
Read more:
Continual Learning A Deep Dive Into Elastic Weight Consolidation Loss - Towards Data Science
Managing ML Projects: The CRISP-DM Process | by Mohsen Nabil | Jul, 2024 – DataDrivenInvestor
In this article, we will talk about solving problems using data science and machine learning. Before we go into the technical details, let's answer an important question: why should we follow a process?
As engineers, we love solving problems and often want to jump straight into finding solutions. However, if we don't properly define the problems we're working on, we can waste a lot of time and money on the wrong issues. Additionally, if we don't follow a structured approach and perform the right tasks in the right order, we risk inefficiency and failure. For instance, if we start modeling before cleaning and processing our data, we might end up with a poor-quality model, regardless of our efforts. The old saying "garbage in, garbage out" is particularly relevant here.
A systematic process is essential for organizing work, distributing responsibilities among team members, and ensuring that each step is completed properly. Various processes are available for data science projects, but in this article, we will focus on the most common one: CRISP-DM (Cross Industry Standard Process for Data Mining).
CRISP-DM was developed in 1996 by a group of European companies from different industries. It is a flexible, industry-neutral approach to data mining and machine learning projects. Even though it is over 25 years old, it remains the most widely used method in data science. Major corporations, including IBM, use CRISP-DM or versions of it.
See the article here:
Managing ML Projects: The CRISP-DM Process | by Mohsen Nabil | Jul, 2024 - DataDrivenInvestor
Modeling the Extinction of the Catalan Language | by Pol Marin | Jun, 2024 – Towards Data Science
Applying existing literature to a practical case (photo by Brett Jordan on Unsplash)
Can we predict the extinction of a language? It doesn't sound easy, and it indeed shouldn't be, but that shouldn't stop us from trying to model it.
I was recently interested in this topic and started reviewing some of the existing literature. I came across one article[1] that I enjoyed and thought of sharing.
So, in this post, I'll be sharing the insights of that paper, translated into (hopefully) a simple read and applied to a practical case so we can see data science and mathematical modeling in action.
I am Catalan and, for those who don't know, Catalan is a co-official language in Catalonia, the Valencian Community, and the Balearic Islands (Spain), along with Spanish. It's also the official language of Andorra, and it is spoken in the south of France and even in Alghero (Italy).
We often see on local TV or in the media that the Catalan language is at risk of extinction. Focusing only on Catalonia, we can easily dig deeper into the case because the government studies the use of the language through what's called the survey of linguistic uses of the population (Enquesta d'usos lingüístics de la població)[2].
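The kind of model this literature works with can be sketched numerically. Assuming the paper builds on the classic two-language competition model of Abrams and Strogatz (an assumption on my part; the excerpt does not name the model), the fraction x of speakers of one language evolves as dx/dt = (1 − x)·s·x^a − x·(1 − s)·(1 − x)^a, where s is the language's relative prestige and a ≈ 1.31 a fitted exponent. A simple Euler integration, with illustrative parameter values:

```python
# Hedged sketch: Euler integration of the Abrams-Strogatz two-language
# competition model. x is the fraction of speakers of the language of
# interest, s its relative prestige (0 < s < 1), a a fitted exponent.
# All parameter values below are illustrative, not from the article.

def simulate(x0, s, a=1.31, dt=0.1, steps=1000):
    x = x0
    for _ in range(steps):
        dx = (1 - x) * s * x**a - x * (1 - s) * (1 - x)**a
        x += dt * dx
    return x

# With prestige below 0.5 and a minority of speakers, the model predicts
# a drift toward extinction; with the advantage reversed, toward dominance.
print(round(simulate(x0=0.4, s=0.4), 3))
print(round(simulate(x0=0.6, s=0.6), 3))
```

The model's well-known limitation is that its only stable equilibria are extinction of one language or the other, which is exactly why survey data like the Enquesta matters for calibrating whether a coexistence-friendly refinement fits better.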
See more here:
Modeling the Extinction of the Catalan Language | by Pol Marin | Jun, 2024 - Towards Data Science
AI Innovations and Their Lasting Impact on Data Science – TechiExpert.com
We live in an era of a fast-paced technological landscape, and artificial intelligence (AI) is the transformative force in this scenario, particularly in the data science sector. The synergy between the two has revolutionized data analysis and, at the same time, opened up new horizons for innovative applications.
Automated Machine Learning (AutoML) is one innovation resulting from this combination. It democratizes access to machine learning capabilities by automating complex tasks such as data transformation, algorithm selection, parameter tuning, and results interpretation. It saves time in data analysis and makes advanced analytical tools accessible to a broader audience.
Machine learning has also enhanced predictive analytics by incorporating techniques such as deep learning and neural networks. These technologies continuously improve in accuracy as they learn from vast datasets. AI-driven predictive analytics can forecast disease outbreaks, and it can also forecast health risks for specific patients.
Natural Language Processing (NLP) has revolutionized how data scientists interact with data. It enables meaningful information extraction from text sources like social media posts, emails and documents. It has led to the development of various applications. It also bridges the gap between human language and computer understanding.
AI has also greatly improved data visualization techniques, making them more interactive and insightful. It can help identify patterns and correlations through data analysis, and the resulting visualizations are clearer and more compelling. This helps business executives and stakeholders grasp complex information quickly, which in turn facilitates better decision-making and strategic planning.
One of the most important areas is ensuring that AI practice is ethical. AI systems are only as unbiased as the data they are trained on. Hence, the focus should be on developing algorithms that prevent and eliminate bias.
Go here to see the original:
AI Innovations and Their Lasting Impact on Data Science - TechiExpert.com
The best Python classes, certificates and bootcamps in NYC – Time Out
There are many programming languages and classes that you could take, so why should you take a course in Python? For starters, Python is one of the fastest-growing programming languages, with around 8.2 million users. This vast user community means you'll have plenty of company, and a deep network of online forums, local meetups, and the open-source community where you'll be able to learn and share. If you are a developer, there's a good chance you'll work on projects where Python skills are needed. Python is also one of the easiest languages to learn and use. It was designed to be concise and easy to read, with an efficient syntax that uses fewer lines of code than other languages.
Python has a large collection of frameworks and libraries that speed up your work, and it is compatible across operating systems. Python is extremely versatile and is used for diverse applications, like web development, data science, machine learning, artificial intelligence, and scientific computing. This versatility makes Python an in-demand skill, and it means you'll be able to choose from a wide variety of career options. In addition, Python is an open-source language, making it license-free and open to contributions from diverse user groups.
Learning Python helps you advance your career as a developer or enter the field for the first time. If you already have a quantitative background or analytical skills, adding a specialized data science skill set can lead to a high-paying job in fields like data science, advanced analytics, or business intelligence. Employment for software developers is strong and still growing, so many opportunities are waiting for people with Python skills.
Python is such a popular programming language that there are many resources available. You can find step-by-step lessons online that introduce the basics or explain specific tasks. This is a great way to become familiar with the capabilities of this robust language. You can also join communities online and on social media where people share what they know and answer questions.
If you want a comprehensive education in Python and a solid set of skills that will help you launch a programming career, Python courses are a great option. They systematically cover the skills you'll need with guidance from an instructor who's an expert in the industry. When you're learning Python, it's essential to put your budding skills to use and get plenty of practice. The more you practice, the better you'll become, and building projects is a great way to improve your understanding of the language. Look for classes that incorporate real-world projects, give you hands-on practice, and help you build a portfolio of your skills.
Choosing the best Python course may seem daunting. Some courses last only a few hours, and others extend for weeks. Courses may specialize in certain aspects of the language or focus on a particular application, like data science or machine learning. Choosing the right course for you depends on whether you are interested in learning Python as a hobby or for your career, and which field you're trying to enter.
If you're new to Python, look for a course that's designed for beginners. You'll want a basic orientation to the language and the logic behind it so you can start to think like a programmer. For more advanced skills, many classes will teach you to use libraries like scikit-learn, NumPy, and Matplotlib. Advanced courses may focus on Python skills for particular fields like data science or finance. Look for a course with a live instructor who can answer your questions and provide feedback, and check whether you'll be able to work on real-world projects that give you practice with Python and help you begin to build a portfolio. Python private tutoring is a good option if you'd like one-on-one sessions where you can learn at your own pace and focus on the topics that meet your goals.
What you will learn in a Python course depends on your skill level and the way that you want to use Python. If you're completely new to Python, you'll start by learning about the development environment and Python syntax and structure. Then, you'll learn to work with different types of data and variables and write control structures like conditional statements and loops. You'll gain an understanding of blocks of code called functions and how to organize data into structures. Once you understand how to use Python, you can explore the way that libraries expand its capabilities.
Object-oriented programming is useful to know because many other programming languages use it in addition to Python. OOP allows you to write concise, legible code and create secure and reliable software. You can interact with databases using libraries like SQLite or SQLAlchemy, and you can manipulate and visualize data using libraries like pandas, Matplotlib, or Seaborn, making your data more meaningful and shareable. If you are using Python for web development, you'll explore web frameworks like Flask or Django and learn to create simple web applications. You can also automate tasks using Python scripts and libraries like Selenium. Last but not least, you'll learn to test your programs and use debugging techniques and tools.
By the end of a comprehensive Python course, you should be able to write Python scripts, use Python's powerful libraries and frameworks, and develop simple applications or perform data analysis tasks.
Python is widely used in NYC across various industries and by numerous companies, making it a valuable skill to learn. Financial firms like Goldman Sachs and JP Morgan Chase use Python for developing trading algorithms, risk management systems, and quantitative analysis, and they use Python's libraries to process and visualize large datasets that guide investment decisions. NYC's thriving ecosystem of AI startups relies on Python libraries like TensorFlow, Keras, and scikit-learn to build machine learning models and AI-driven applications. Python frameworks like Django and Flask allow them to develop robust and scalable applications quickly.
At media companies like Spotify and The New York Times, Python's versatility with big data makes it ideal for analyzing user data and improving content recommendations and ad targeting. E-commerce companies like Etsy and Warby Parker also use Python to develop recommendation systems for consumers. Python is invaluable for analyzing sales data, managing inventory, and predicting consumer trends using data analytics.
Python's powerful data analysis capability is also widely used in the booming field of life sciences, where it powers bioinformatics and healthcare research by companies like Pfizer and Memorial Sloan Kettering Cancer Center, and it helps to analyze patient data and improve healthcare. Python's simplicity and readability make it a popular choice for educational software development at platforms like Khan Academy and Coursera. Real estate firms like Zillow and Compass use Python to analyze market trends and property values, and it excels at automating tasks like property listings and data collection.
A Python bootcamp is worthwhile if you want to build your skills in this versatile and widely used programming language. When you complete a Python bootcamp, you'll learn essentials like basic syntax and data structures and get hands-on experience building programs. A bootcamp can help you leap forward from basic programming skills to specialties like data science and machine learning, and it often includes mentoring, career coaching, and networking opportunities. A Python bootcamp will give you the confidence to tackle real-world projects and boost your job prospects in tech-savvy cities like NYC. A Python bootcamp can be a great investment in your tech career.
Enrolling in a career-focused Python bootcamp can help you prepare for a future career in one of several different industries. Python is heavily utilized in the field of data science, so if you are interested in becoming a data scientist or a data analyst, learning the skills you'll pick up in a data bootcamp will greatly improve your chances of finding work. Since data has become such an essential part of virtually every industry, these skills are highly marketable, especially in a city like NYC where so many finance and investing firms are located (FinTech is a huge part of data science, after all). In NYC, data scientists can expect to earn over $100,000 a year.
Beyond working in data science, Python is an important part of the emerging technological revolutions in machine learning and artificial intelligence. Python is used to write the algorithms that allow LLMs, chatbots, and other artificial intelligence applications to operate, and it is an important part of the learning process, allowing machines to read and interpret large amounts of data with the help of a human operator. It is difficult to tell what the future holds for this emerging technology, but it is certainly having an impact across a range of industries (including finance, commerce, and advertising, all of which are key parts of the NYC economy). Learning how to program these algorithms is an important aspect of leveraging this technology, and businesses are paying a premium for skilled Python developers who can help them take advantage of these automated systems.
Even if you aren't aiming for a career in Python, learning how to manipulate, collect, and query data is useful for any aspiring professional looking to grow their brand, get attention for their start-up, or otherwise work with investors to get a project off the ground. Learning basic Python programming skills and techniques will ensure that you aren't leaving valuable information on the table, particularly as data analysis becomes increasingly important for anyone trying to get an edge in the market. You don't want to get left behind, and taking a Python bootcamp can help ensure that you are able to leverage data to suit your needs.
See the rest here:
The best Python classes, certificates and bootcamps in NYC - Time Out
Using OpenAI and PandasAI for Series Operations | by Michael B Walker | Jun, 2024 – Towards Data Science
Incorporate natural language queries and operations into your Python data cleaning workflow. (Red panda drawing donated by the artist, Karen Walker.)
Many of the Series operations we need to do in our pandas data cleaning projects can be assisted by AI tools, including PandasAI. PandasAI takes advantage of large language models, such as those from OpenAI, to enable natural language queries and operations on data columns. In this post, we examine how to use PandasAI to query Series values, create new Series, set Series values conditionally, and reshape our data.
You can install PandasAI by entering pip install pandasai into a terminal or into Windows PowerShell. You will also need to get a token from openai.com to send requests to the OpenAI API.
As the PandasAI library is developing rapidly, you can anticipate different results depending on the versions of PandasAI and pandas you are using. In this article, I use version 1.4.8 of PandasAI and version 1.5.3 of pandas.
We will work with data from the National Longitudinal Study of Youth (NLS) conducted by the United States Bureau of Labor Statistics. The NLS has surveyed the same cohort of high school students for over 25 years, and has useful data items on educational outcomes and weeks worked for each of those years, among many other variables. It is available for public use at nlsinfo.org. (The NLS public releases are covered by the United States government Open Data Policy, which permits both non-commercial and commercial use.)
We will also work with COVID-19 data provided by Our World in Data. That dataset has one row per country per day with number of new cases and new deaths. This dataset is available for download at ourworldindata.org/covid-cases, with a Creative Commons CC BY 4.0 license. You can also download all code and data used in this post from GitHub.
We start by importing the OpenAI and SmartDataframe modules from PandasAI. We also have to instantiate an llm object:
Next, we load the DataFrames we will be using and create a SmartDataframe object from the NLS pandas DataFrame:
Now we are ready to generate summary statistics on Series from our SmartDataframe. We can ask for the average for a single Series, or for multiple Series:
We can also summarize Series values by another Series, usually one that is categorical:
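Behind the scenes, a request like this maps onto a standard pandas groupby. As a point of comparison, here is the plain-pandas equivalent of the "Show satmath average by gender" request discussed later in the article, run on a small synthetic stand-in for the NLS data (the values are made up; only the column names come from the article):

```python
import pandas as pd

# Synthetic stand-in for the NLS extract -- only the column names
# (gender, satmath) come from the article; the values are invented.
nls = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male"],
    "satmath": [560, 540, 620, 500],
})

# The traditional pandas spelling of "Show satmath average by gender".
avg_by_gender = nls.groupby("gender")["satmath"].mean()
print(avg_by_gender)
# Female    590.0
# Male      520.0
```

The chat call saves you from remembering the groupby/aggregation syntax, but it is useful to know what the generated code looks like when you need to verify a result.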
We can also create a new Series with the chat method of SmartDataframe. We do not need to use the actual column names. For example, PandasAI will figure out that we want the childathome Series when we write "child at home":
We can use the chat method to create Series values conditionally:
PandasAI is quite flexible regarding the language you might use here. For example, the following provides the same results:
We can do calculations across a number of similarly named columns:
This will calculate the average of all weeksworked00-weeksworked22 columns and assign that to a new column called weeksworked.
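For reference, the traditional pandas version of that calculation selects the similarly named columns with filter and averages across them row-wise. A minimal sketch on synthetic data (two of the twenty-three weeksworked columns stand in for the full range):

```python
import pandas as pd

# Synthetic stand-in: two of the weeksworked00-weeksworked22 columns.
df = pd.DataFrame({
    "weeksworked00": [52, 40],
    "weeksworked01": [48, 44],
    "other": [1, 2],
})

# Select every column whose name contains "weeksworked" and take the
# row-wise mean, assigning the result to a new weeksworked column.
df["weeksworked"] = df.filter(like="weeksworked").mean(axis=1)
print(df["weeksworked"].tolist())  # [50.0, 42.0]
```

The filter(like=...) selection is what makes this robust to the number of year columns, which is presumably what PandasAI generates under the hood.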
We can easily impute values where they are missing based on summary statistics:
We can also use PandasAI to do some reshaping. Recall that the COVID-19 case data has new cases for each day for each country. Lets say we only want the first row of data for each country. We can do that the traditional way with drop_duplicates:
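The traditional approach referred to here can be sketched as follows, on a toy stand-in for the Our World in Data file (country names are real, values invented):

```python
import pandas as pd

# Toy stand-in for the COVID-19 file: one row per country per day.
covid = pd.DataFrame({
    "location": ["Andorra", "Andorra", "Albania", "Albania"],
    "casedate": ["2020-03-02", "2020-03-03", "2020-03-09", "2020-03-10"],
    "new_cases": [1, 0, 2, 4],
})

# Sort so the earliest date comes first within each country, then keep
# only the first row per country.
firstcase = (covid.sort_values(["location", "casedate"])
                  .drop_duplicates("location")
                  .set_index("location"))
print(firstcase)
```

drop_duplicates keeps the first occurrence by default, so the sort order determines which row survives for each country.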
We can get the same results by creating a SmartDataframe and using the chat method. The natural language I use here is remarkably straightforward, Show first casedate and location and other values for each country:
Notice that PandasAI makes smart choices about the columns to get. We get the columns we need rather than all of them. We could have also just passed the names of the columns we wanted to chat. (PandasAI sorted the rows by iso_code, rather than by location, which is why the first row is different.)
Much of the work when using PandasAI is really just importing the relevant libraries and instantiating large language model and SmartDataframe objects. Once that's done, simple sentences sent to the chat method of the SmartDataframe are sufficient to summarize Series values and create new Series.
PandasAI excels at generating simple statistics from Series. We don't even need to remember the Series name exactly. Often the natural language we might use can be more intuitive than traditional pandas methods like groupby. The "Show satmath average by gender" value passed to chat is a good example of that.
Operations on Series, including the creation of a new Series, are also quite straightforward. We created a total-number-of-children Series (childnum) by instructing the SmartDataframe to add the number of children living at home to the number of children not living at home. We didn't even provide the literal Series names, childathome and childnotathome, respectively. PandasAI figured out what we meant.
Since we are passing natural language instructions to chat for our Series operations, there is no one right way to get what we want. For example, we get the same result when we passed "evermarried is No when maritalstatus is Never-married, else Yes" to chat as we did with "if maritalstatus is Never-married set evermarried2 to No, otherwise Yes".
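Both phrasings presumably compile down to the same conditional assignment in pandas. A plain-pandas sketch of that operation, on synthetic data (only the column names and category values come from the article):

```python
import pandas as pd

# Synthetic stand-in: the maritalstatus column name and its
# "Never-married" value come from the article; the rows are invented.
nls = pd.DataFrame({
    "maritalstatus": ["Married", "Never-married", "Divorced"],
})

# Set evermarried to "No" when maritalstatus is "Never-married",
# else "Yes" -- the conditional the chat instructions describe.
nls["evermarried"] = nls["maritalstatus"].eq("Never-married").map(
    {True: "No", False: "Yes"}
)
print(nls["evermarried"].tolist())  # ['Yes', 'No', 'Yes']
```

Seeing the generated equivalent is a good habit: it lets you confirm that two differently phrased chat instructions really did produce the same logic.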
We can also do fairly extensive DataFrame reshaping with simple natural language instructions, as in the last command we provided. We add and other values to the instructions to get columns other than casedate. PandasAI also figures out that location makes sense as the index.
You can read more about how to use PandasAI and SmartDataframes here:
Or in the second edition of my book, Python Data Cleaning Cookbook:
Good luck with your data cleaning and I would love to hear how things are going!
Read the original post:
Moving Brands’ identity for Eppo sets data science apart with evolving graphic systems and shifting forms – It’s Nice That
When software company Eppo reached out to Moving Brands, the team quickly realised that the platform needed an identity and website that reflected its difference in the data market: "Eppo is an experimentation platform that was founded by a data scientist in a landscape dominated by engineering-founded competitors," says Jordan Heber, strategy director at Moving Brands. The A/B testing platform, which allows companies to compare two versions of a web page or app against each other to determine which one performs better, gives companies the chance to grow by using their own data as a tool for business. Moving Brands aimed to reimagine a visual identity for Eppo's model that celebrates transformation, creating new ideas from disparate inputs, and the brand's dynamic, athletic nature, all whilst leaving room for flexibility so the brand can expand in future.
Delivered by a multidisciplinary team across London and the US, the independent design studio reimagined Eppo's brand identity to "constantly evolve, with a system of modular visuals that allow for endless iterations and expressions," says Joel Smith, design director at Moving Brands. The data company "challenged us to lean into the weird," he adds, in order to help them differentiate their unique background in data science and showcase their experimental spirit. From a palette of bold and unusual colour pairings designed to evoke the discomfort of embracing a culture of experimentation, to a modular new Eppo logo that shapes and rearranges itself ("it becomes more experimental as it scales up, just like the best tech companies do," says Joel), all aspects of the user experience are constantly in motion, with an experimental feel across print and digital.
Go here to see the original:
South River president on reverse mortgage business, data science and priorities – HousingWire
Chris Clow/RMD: How has reverse business this year been going, and how do you foresee business progressing by the time we get to December?
Tyler Plack: We just finished out May, and I thought the May results were very good. I'm excited about where we're headed. Some of our corporate goals have been to increase the size of our sales floor and to focus on bringing on some of these remote loan officers in the "feet in the street" model. The reception we've received from some of those loan officers has been very warm and exciting.
What we're seeing in terms of our in-person and call-center model is that it continues to grow as well. The results have been increasing month over month for every month this year. Our expectation is that this trend will continue through the end of the year. To say that I am bullish on reverse mortgages is probably an understatement.
Plack: Yeah, we are very focused on our direct-mail model. We've done a lot of work in the data science department. This is such a crucial corporate goal that I probably spend about half of my day focused on data science. It is that important to what we do. Some of the increase in results you're seeing is due to gains we're making in the statistics world.
What that is, exactly, is a statistical model, but it's done well and continues to improve. We're really excited about that and expect to continue growing originations, largely through our own marketing and this new external sales model that we're building.
Plack: I think there's a box of consumers who want the product but don't qualify for it, and there's a box of consumers who qualify but don't want the product. What we really want to focus on is the consumers who want the product and qualify for it. That's where the data modeling comes in.
We're looking at the overall credit picture and the overall property picture, applying a ton of math and statistical models to find who we can help the most. This results in us helping more people at a lower cost and making sure that the people we're talking to can actually be helped, rather than people who would love our help but unfortunately will never qualify for it. So, it's about the intersection of both the level of interest of the consumer and our projections on their qualification.
Plack: It's a mixture of everything. There's some data we take from other sources, and there's a ton of data we have internally that we're using as well. We throw all of that into a model that helps us decide, on a week-to-week basis, who is going to have the highest propensity to take the loan, close, and fund.
Plack: I think there are certainly gains to be made through better modeling, but I don't know that it's the key for everyone. I know it's worked well for me due to our analytically minded team, but for the average loan officer or sales manager, I wouldn't focus on the data. I would focus on making connections with financial advisers and planners, and on using traditional methods. With scale, it makes sense the way we do it, but there is more than one way to originate loans.
Any company should take a multifaceted, multimodal approach. We might be really good at direct mail, but we could be missing out on referral networks and that style of origination. We're working to build into that. For those who don't have exposure to direct-mail marketing, perhaps they should spend some time on it. Just like an investment portfolio, your marketing strategy should be diversified.
Look for more from Tyler Plack and South River on RMD soon.
Excerpt from:
South River president on reverse mortgage business, data science and priorities - HousingWire
Analytics and Data Science News for the Week of June 14; Updates from Databricks, Power BI, Qlik & More – Solutions Review
Solutions Review Executive Editor Tim King curated this list of notable analytics and data science news for the week of June 14, 2024.
Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week, in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.
Amplitude also launched a new integration portal to make developing integrations easier and faster. Technology partners can now access a range of tools and resources, including documentation, code samples, and best practices, to guide teams through necessary integration steps.
Read on for more
Databricks AI/BI features a pair of complementary experiences: Dashboards, an AI-powered, low-code interface for creating and distributing fast, interactive dashboards; and Genie, a conversational interface for addressing ad-hoc and follow-up questions through natural language.
Read on for more
The integration allows enterprises to access advanced Databricks Mosaic AI functionalities without extensive infrastructure changes or specialized training, facilitated by the advanced features of the newly launched Qlik Talend Cloud and empowering companies to unlock new levels of efficiency and innovation.
Read on for more
Now ready for deployment, Quantexa customers will be able to operationalize Gen AI for transformative gains without additional investment in infrastructure, tooling, and any additional skilled resources.
Read on for more
This new report format offers source control-friendly file structures, facilitating co-development and improving development efficiency for Power BI reports. Together with TMDL for the semantic model, Power BI Projects now offer a great source control experience for both the report and the semantic model.
Read on for more
The technology potentially allows Snowflake SQL users to get more projects into production faster, accelerate time-to-value, and generate more accurate business insights for better decision-making.
Read on for more
Parsable's AI-Powered Analytics provides customizable, real-time data visualization tools that enable frontline operators to make informed decisions instantly, addressing issues as they arise and improving overall efficiency and cost-effectiveness.
Read on for more
Watch this space each week as our editors will share upcoming events, new thought leadership, and the best resources from Insight Jam, Solutions Review's enterprise tech community for business software pros. The goal? To help you gain a forward-thinking analysis and remain on-trend through expert advice, best practices, predictions, and vendor-neutral software evaluation tools.
Join Doug Atkinson and David Loshin as they break down recent forfeiture orders by the FCC involving location data violations by major telecom companies. They discuss the complexities of data sharing, the importance of governance, and the implications of information misuse.
Watch on YouTube
This month on The Jam Session, host Bob Eve is joined by Robert Seiner, Juan Sequeda, and Austin Kronz to tackle this pressing question. The panel discusses the evolving roles of generative AI and data catalogs, exploring their complementary strengths.
Watch on YouTube
Hear from the Field CTO at Amplitude and gain a better understanding of how Amplitude can help you increase the ROI of existing investments and open up options to replace tools you're looking to sunset. Product demo and Q&A included!
Register free on LinkedIn
My work with companies all over the world has given me a lot of insight into where businesses get lost in the mire of tech and shiny-new-toy syndrome. In the race to stay ahead of the pack, ignoring the power of data is a surefire way to doom your business. Guess what: with so few benefits and outcomes, it's not working!
Read on Solutions Review
For consideration in future analytics and data science news roundups, send your announcements to the editor: tking@solutionsreview.com.
See the article here:
Top AI Execs to Watch in 2024: Redhorse Corp.s Brian Sacash – WashingtonExec
Brian Sacash, Director of Data Science, Redhorse Corp.
Brian Sacash's biggest recent achievement was engaging with technology partners to bring the most cutting-edge AI capabilities to the mission space. But rather than simply implementing the latest off-the-shelf solutions, his team is pioneering custom approaches that seamlessly blend this advanced technology to redefine the role of AI as an enabler, not a replacement, for human potential.
"While Brian has the technical skills expected of a senior engineer, what sets him apart is his ability to make an impact in other ways: mentoring junior staff, presenting and discussing technical topics, and continually looking for ways to improve Redhorse," said Matt Teschke, Redhorse chief technology officer.
Why Watch
In 2024, Sacash is heavily focused on working with the Redhorse technical teams to help customers intelligently adopt transformative technologies. By combining traditional AI/ML and cutting-edge generative AI, they empower people and organizations to access new potential and excel at what they do, with greater speed and efficiency.
"Generative AI has opened doors to new worlds of possibilities," he said. "While we are still exploring and navigating these new areas, we must move forward, understanding that generative AI is here to stay, bringing both challenges and opportunities."
Fun fact: Sacash is passionate about writing and enjoys combining his background in physics and expertise in technology with the science of crafting stories.
Read more from the original source:
Top AI Execs to Watch in 2024: Redhorse Corp.s Brian Sacash - WashingtonExec