Category Archives: Data Science

DragGAN: Everything you need to know about this AI – DataScientest

Like other popular tools such as ChatGPT, Midjourney and Stable Diffusion, DragGAN uses generative artificial intelligence to automate creative tasks.

In this case, it's photo editing that becomes child's play, since the AI seems almost to guess the user's intention and make the changes for them.

More traditional, long-established software such as Photoshop has no choice but to embrace innovation or risk becoming obsolete. Indeed, Adobe has already launched its own Firefly AI to bring its tools into the new era.

Over the next few years, advances in artificial intelligence will continue to open up new possibilities in image editing. These include automatic object recognition, real-time retouching and video editing.

Despite DragGAN's ease of use, exploiting its full potential requires a thorough understanding of artificial intelligence.

Human supervision is needed to improve the quality of the results produced by the AI, which can still make mistakes. To acquire this expertise, you can choose DataScientest.

Our training courses enable you to learn all the techniques and tools required to work in the Data Science profession, as an analyst, data scientist or data engineer.

In particular, you'll learn about Machine Learning and Deep Learning, neural networks, GANs, and specialized tools like Keras, TensorFlow or PyTorch. This will enable you to understand how software like DragGAN works, and even create your own models!

As you progress through the other modules of our training courses, you'll also become an expert in data analysis, business intelligence, dataviz, programming and databases.

By the end of the course, you'll have acquired all the skills you need to become a Data Science professional. You'll also receive a state-recognized diploma and certification from our cloud partners AWS or Azure.

All our training courses are delivered entirely online and are eligible for funding options. Don't waste another moment and discover DataScientest!


WiDS Livermore Conference: Attendees Share Research Insights – Mirage News

Lawrence Livermore National Laboratory (LLNL) recently hosted its 7th annual Women in Data Science (WiDS) conference for data scientists, industry professionals, recent graduates and others interested in the field. As an independent satellite of the global WiDS conference celebrating International Women's Day, the Livermore hybrid event was held to highlight the work and careers of LLNL and regional data-science professionals.

Hosted at the University of California Livermore Collaboration Center, the all-day event included technical talks, panel discussions, speed mentoring, a poster session and networking opportunities. Keynote speaker and LLNL Distinguished Member of Technical Staff Carol Woodward spoke about her unconventional career path. She described her experience as one of few women in male-dominated classes at Louisiana State University, where she earned her bachelor's degree in mathematics, and how she discovered the field of applied mathematics after nearly becoming a microbiologist. Throughout her talk, Woodward gave credit to those who mentored her at every step of her journey.

"[With] the power of the right cohort and engaged mentors, the right environment - it's amazing what you can accomplish," she said.


MD Anderson’s Institute for Data Science in Oncology announces appointment of inaugural IDSO Affiliates – MD Anderson Cancer Center

Affiliates bring diverse expertise to facilitate engagement, advance the work of the institute and grow the data science ecosystem at MD Anderson

MD Anderson News Release March 22, 2024

The Institute for Data Science in Oncology (IDSO) at The University of Texas MD Anderson Cancer Center today announced the appointment of its inaugural cohort of IDSO Affiliates. These 33 talented scientists, clinicians and staff bring diverse expertise to help IDSO leadership and focus area co-leads advance collaborative data science projects and align the institute's efforts with MD Anderson's mission to end cancer.

"We are proud to welcome these exceptional individuals to the growing IDSO community, and we look forward to the collaborative work ahead of us," said David Jaffray, Ph.D., director of IDSO and chief technology and digital officer at MD Anderson. "By engaging diverse expertise across all of our mission areas, we will enhance the rich and productive data science ecosystem at MD Anderson to deliver transformational impact for patients."

IDSO was launched to integrate the most advanced computational and data science approaches with MD Anderson's leading scientific and clinical enterprise, significantly improving patients' lives by transforming cancer care and research.

The affiliates were identified based on their existing contributions to IDSO or were recruited to MD Anderson specifically for their data science expertise. Affiliates are approved for a two-year term based on their qualifications, alignment with, and commitment to IDSO projects and focus areas. The inaugural IDSO Affiliates include:

"Our affiliates bring expertise, perspectives and commitment from across the institution to foster impactful data science in order to tackle the most urgent needs of our patients and their families," said Caroline Chung, M.D., director of Data Science Development and Implementation for IDSO and chief data officer at MD Anderson. "People and community are at the heart of our efforts, and establishing the IDSO Affiliates is an exciting step in growing the most impactful ecosystem for data science in the world."


CDOs, data science heads to fill Chief AI Officer positions in India – CIO

"We are already seeing this (combination of the AI roles) happening now in India," Addagada said, giving the examples of HDFC Bank, Axis Bank, ICICI Bank, and Bandhan Bank.

The refactoring of C-level technology roles across Indian enterprises, according to CK Birla Hospitals CIO Mitali Biswas, can be chalked up to the dearth of talent or skills presently available to take on the responsibilities for the role or create an efficient team under that position.

"While larger enterprises may still want to create a new position and a team around it, small and medium businesses will look up to their existing technology leaders, such as the CIO or the CTO or the CDO to take up the CAIO mantle," Biswas explained, adding that maturity and pervasiveness of the CAIO role, at least in the Indian healthcare sector, is two to three years away.

Santanu Ganguly, who is the CEO of advisory firm StrategINK, said he believes that other industry sectors, including healthcare, will see the role of CAIO being adopted in the next one to three years, driven by the boards' and CEOs' agenda of shaping the future of customer-centricity, offering innovation, enhanced productivity and efficient operations.

Along the same lines, Gaurav Kataria, vice president of digital manufacturing and CDIO at PSPD, ITC Limited said that the evolution of the CAIO role is already happening in India.

"Mostly all enterprises are setting up AI centers of excellence and the persons leading those centers are already doing what is expected of a CAIO. While the CAIO is not an official CxO position, this role rolls into the CDO who helps drive strategy, governance, and connect to the board," Kataria explained.


Data Science Career Challenges-and How to Overcome Them – Towards Data Science

On a very basic level, most work-related challenges come from similar sources, regardless of field or industry: having to navigate professional relationships and communicate with people who might not always be on the same page as you. And you have to do that within the constraints of goals, available resources, and limited time, and on top of everything else you might need to deal with in your life.

If we take a closer look, though, we can see different patterns emerge not just across professions and workplace types, but even within well-defined roles and disciplines. That certainly appears to be the case for data and ML professionals, who, despite a very broad range of skills and responsibilities, often have to resolve similar issues.

This week, we're highlighting recent articles that focus on some of these common data science work and career challenges we see pop up again and again; they're grounded in the authors' personal experiences, but offer insights that can likely help a wide swath of our community. Enjoy!


Discovering My Path: Data Science and Beyond at Syracuse University – iSchool | Syracuse University – Syracuse University

From my earliest memories in Miami, I was intrigued by everything from the unfolding of a story in a book to the evolving theories about our universe. Now, as a third-year student at Syracuse University, I'm living my dream, delving into the world of Applied Data Analytics with minors in Innovation, Design, and Startups, and Computer Engineering.

Being the first in my family to attend university, I had to navigate this new world largely on my own. But that's the thing about Syracuse: it's more than just a university. It's a community where you're encouraged to explore, make mistakes, and grow.

In my classes, whether it's Data in Society or Intro to Networks & Cloud, I find myself constantly challenged and intrigued. It's not just about learning the theories; it's about seeing how these concepts come alive in the real world. And speaking of the real world, my job at the ITS Service Center and the Digital Scholarship Space (DSS) at Syracuse University has been nothing short of eye-opening. There, I'm not just a student; I'm a problem-solver, a tech wizard in training, helping others make sense of the digital world.

But perhaps what really gets my heart racing is the work I do as a NEXIS Researcher in Data Science. It's here that I get to explore the frontiers of technology: imagine being part of discussions on the next big thing in tech! This, coupled with my ongoing project, Pr0-Tech, is where I see my dreams converging with reality. It's a thrilling journey of creating a blockchain-based solution that could redefine data security and privacy.

Looking ahead, I can't wait for my summer internship at GE Aerospace. It feels like the next big leap towards my goal of being at the forefront of technology and innovation. I'm eager to dive into projects, to test my skills in a real-world setting, and to see how far my passion for data and technology can take me.

And let's not forget my role as an iSchool Ambassador. Sharing my story, guiding prospective students, and being a part of their decision-making process is not just a responsibility; it's a privilege. It's my way of giving back, of showing others that their dreams are valid and achievable.

As I reflect on my time at Syracuse, I realize how each experience has been a stepping stone towards a future I once only dreamed of. This journey has been about finding my place in the world of technology and data, about pushing boundaries, and about discovering who I am and who I want to be.

When I think about how I ended up here, a big part of the story is the Posse Foundation Full-Tuition Leadership Scholarship. It wasn't just a scholarship; it felt like a vote of confidence in a kid who loved the idea of being a forever learner and dreamed of making a difference. More than that, Posse connected me with a network of fellow scholars, individuals who have become more than peers: they are motivators, inspirers, and my closest friends. Together, we've shared experiences and challenges that have shaped me into a better person, deepening my commitment to learning and growing. Each of them, in their own unique way, has contributed to my journey, encouraging me to strive for excellence not just in academia, but in all facets of life. This heartfelt community of scholars that I've been able to meet at Syracuse has been instrumental in my development, constantly pushing me to explore new horizons and to be the best version of myself.

Adding to this enriching journey, I was recently honored with the Scholarship in Action award from the iSchool at Syracuse University. This recognition is not just an accolade; it is a testament to the hard work, dedication, and passion I've invested in my academic and extracurricular endeavors. Additionally, being an Our Time Has Come Scholar has brought another layer of enrichment to my university experience. These acknowledgments validate my efforts and reinforce my belief in the power of education and community. They serve as reminders of the responsibility I carry to not only excel academically but to also make a meaningful impact within my community and beyond. With these accolades, I feel even more empowered to pursue my goals and continue making a positive difference in the world.

For anyone considering Syracuse, especially the iSchool, know that it's a place where dreams are given the space to grow. It's where your passion for technology and innovation will find a nurturing home, and where your academic journey will be as exciting as it is enlightening.


Point Cloud Classification with PointNet and PyTorch3D | by Mason McGough | Mar, 2024 – Towards Data Science


Follow along with this post using the Google Colab notebook.

In today's rapidly evolving technological landscape, 3D technology is becoming indispensable. Prototyping, virtual try-on, virtual and augmented reality experiences, digital twins, surveying, medical prosthetics, and the film and gaming industries are just the tip of the 3D iceberg. LinkedIn estimates that the worldwide demand for 3D content will surpass $3 billion by 2028, showing no signs of deceleration. From Frozen to Fortnite, it's safe to say that 3D models are becoming the new photographs.

As demand for 3D data grows, so does the need for effective methods to classify and understand 3D data. PointNet, invented in 2016 by Stanford researchers, is a fossil in the fast-paced world of ML, and yet it has withstood the test of time. As recently as 2023, researchers have published variants on the PointNet architecture for tasks as diverse as:

Designed specifically to grapple with the complexities inherent in 3D point cloud data, PointNet offers a robust and versatile solution in an era where the utilization of 3D data is more prevalent than ever before.

To aid us on our PointNet journey, we will use PyTorch3D. PyTorch3D, from Facebook AI Research (FAIR), is a flexible and efficient framework for 3D deep learning tasks that empowers researchers and practitioners to delve into the intricacies of 3D machine learning. With its rich toolset, we can visualize and manipulate 3D data to build a 3D object classification model with PointNet. If you seek more information on PointNet, PyTorch3D, and 3D machine learning, you may be interested in my new Educative course, 3D Machine Learning with PyTorch3D.

Since its creation, PointNet has become a cornerstone in 3D deep learning for efficient point cloud data processing. But exactly what is a point cloud?


Analytics and Data Science News for the Week of March 22; Updates from Databricks, NVIDIA, Power BI & More – Solutions Review

Solutions Review Executive Editor Tim King curated this list of notable analytics and data science news for the week of March 22, 2024.

Keeping tabs on all the most relevant analytics and data science news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week, in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy analytics and data science news items.

The transaction was previously announced on December 18, 2023 and approved by Alteryx stockholders on March 13, 2024. Alteryx is now a privately held company and its common stock has ceased trading on the New York Stock Exchange.

Read on for more.

Together, Databricks and NVIDIA will optimize data and AI workloads on the Databricks Data Intelligence Platform. The collaboration builds on NVIDIA's recent participation in Databricks' Series I funding round.

Read on for more.

DataRobot announced that its enterprise-ready AI solutions will be supercharged with NVIDIA technology to offer world-class performance, security and efficiency across the full AI lifecycle. This new collaboration with NVIDIA accelerates AI use case delivery.

Read on for more.

This latest offering from Dell, a fully integrated data platform built on Dell hardware with a full-service software suite, helps modernize an organization's data platform and operations. By utilizing Starburst's query engine, data processes are streamlined.

Read on for more.

In partnership with Vanson Bourne, an independent research firm, Exasol surveyed 800 senior decision-makers as well as data scientists and analysts across the U.S., U.K., and Germany to assess enterprises' data and analytics initiatives, including their top challenges and how they are planning to address those challenges in the short term (within two years).

Read on for more.

Peter Wang was named Chief AI & Innovation Officer and will lead Anaconda's new AI Incubator. The AI Incubator will serve as an internal research and development group dedicated to advancing Python performance in AI workloads and supporting the company's competitive advantage.

Read on for more.

Java 22 (Oracle JDK 22) delivers thousands of performance, stability, and security improvements to help developers increase productivity, drive innovation, and accelerate growth across their organizations. These include enhancements to the Java language, its APIs and performance, and the tools included in the Java Development Kit (JDK).

Read on for more.

This update brings the on-premises data gateway up to date with the March 2024 release of Power BI Desktop. This version of the gateway will ensure that the reports that you publish to the Power BI Service and refresh via the gateway will go through the same query execution logic/run-time as in the March version of Power BI Desktop.

Read on for more.

Watch this space each week as our editors will share upcoming events, new thought leadership, and the best resources from Insight Jam, Solutions Review's enterprise tech community for business software pros. The goal? To help you gain a forward-thinking analysis and remain on-trend through expert advice, best practices, predictions, and vendor-neutral software evaluation tools.

In this episode, David examines the reputation and potential regulatory risks businesses face when using predictive analysis and gets into the ethics of using customers' data to improve marketing techniques. What is the value proposition for predictive analysis? How can companies better articulate their goals in a mutually beneficial way?

Watch on YouTube

With the next Spotlight event, the team at Solutions Review has partnered with Amplitude to learn how to apply insights into product-led growth workflows and how they interlace with marketing efforts.

Read on for more.

The Thought Leader Project is a single initiative that brings the full power of Solutions Review's authority, reach, and distribution to help technology vendors build brands, enhance their reputation, and ultimately, reach their target market.

Read on for more.

For consideration in future analytics and data science news roundups, send your announcements to the editor: tking@solutionsreview.com.


How to perform anomaly detection with LOF – Towards Data Science

An introduction to performing outlier detection with the Local Outlier Factor (LOF) algorithm.

Anomaly detection, although useful, is a topic that often gets skipped in machine learning classes. There are many applications of anomaly detection, especially in areas such as fraud detection and system monitoring.

If you've followed my blog for some time, you'll remember that I previously wrote an article about using Isolation Forests for anomaly detection.

Aside from Isolation Forests, there is another anomaly detection algorithm, known as the Local Outlier Factor (LOF), that also performs well in practice. In this article, I will briefly go over the LOF algorithm and demonstrate how you can use it for anomaly detection in Python.

The LOF algorithm is an unsupervised algorithm for anomaly detection. It borrows concepts from the K-nearest neighbors algorithm and produces an anomaly score based on how isolated a point is from its local neighbors. The basic hypothesis of this algorithm is that outliers or anomalies will have a lower local density (more distant nearest neighbors) than other points.
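The scoring idea described above can be sketched in pure Python. This is a minimal illustration rather than the article's own code: it assumes Euclidean distance, exactly k neighbors per point (ties broken arbitrarily) and no duplicate points, and the function name `lof_scores` and the toy data are made up for the example. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` would normally be used instead.

```python
import math


def lof_scores(points, k=2):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Simplified sketch: each point uses exactly k neighbors, and duplicate
    points (which would give a zero reachability sum) are not handled.
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_distance[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (hypothetical): four points on a unit square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

The four square corners sit in an equally dense neighborhood, so their scores stay at 1, while the isolated point has a much lower density than its neighbors and therefore a much higher score.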

To fully explain how this algorithm computes anomaly scores, we need to understand four concepts in the following order:

The k-distance is the distance between a point and its k-th nearest neighbor. The value we select for k is a hyperparameter for the LOF algorithm that we can experiment with to produce different results. Consider the diagram below where the second-closest point (or second-nearest neighbor) to point A is point B so the k-distance with k=2 is


Understanding the Sparse Mixture of Experts (SMoE) Layer in Mixtral – Towards Data Science

Let's begin with the idea of an "expert" in this context. Experts are feed-forward neural networks. We then connect them to our main model via gates that will route the signal to specific experts. You can imagine our neural network thinks of these experts as simply more complex neurons within a layer.

The problem with a naive implementation of the gates is that you have significantly increased the computational complexity of your neural network, potentially making your training costs enormous (especially for LLMs). So how do you get around this?

The problem here is that a neural network is required to calculate the value of a neuron so long as there is any signal going to it, so even the faintest amount of information sent to an expert triggers the whole expert network to be computed. The authors of the paper get around this by creating a function, G(x), that forces most low-value signals to compute to zero.

In the paper's output equation, y = sum over i of G(x)_i * E_i(x), G(x) is our gating function and E_i(x) is the output of the i-th expert. As any number times zero is zero, this logic prevents us from having to run an expert network when we are given a zero for it by our gating function. So how does the gating function determine which experts to compute?
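The zero-skipping logic can be sketched as follows. This is a minimal illustration with scalar toy "experts" rather than real feed-forward networks; the names `moe_output` and `make_expert` and the gate values are hypothetical and only serve to show that an expert with a zero gate is never evaluated.

```python
def moe_output(x, gates, experts):
    """Weighted sum of expert outputs that skips experts with a zero gate."""
    total = 0.0
    for gate, expert in zip(gates, experts):
        if gate == 0.0:
            continue  # sparse path: this expert is never computed
        total += gate * expert(x)
    return total


calls = []  # records which experts actually ran


def make_expert(name, scale):
    """Build a toy expert (a scaled identity) that logs each call."""
    def expert(x):
        calls.append(name)
        return scale * x
    return expert


experts = [make_expert("e0", 1.0), make_expert("e1", 2.0), make_expert("e2", 3.0)]
y = moe_output(2.0, [0.0, 0.75, 0.25], experts)
print(y, calls)
```

Only the two experts with non-zero gates are called, which is where the compute savings come from.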

The gating function itself is a rather ingenious way to focus only on the experts that you want. Let's look at equations 3 to 5 from the paper and then I'll dive into how they all work.

(3) G(x) = Softmax(KeepTopK(H(x), k))
(4) H(x)_i = (x · W_g)_i + StandardNormal() · Softplus((x · W_noise)_i)
(5) KeepTopK(v, k)_i = v_i if v_i is in the top k elements of v, and -infinity otherwise

Going from bottom to top, equation 5 is simply a step function. If the input is not within a certain range (here the top k elements of the list v), it will return -infinity, thus ensuring a perfect 0 when plugged into Softmax. If the value is not -infinity, then a signal is passed through. This k parameter allows us to decide how many experts we'd like to hear from (k=1 would only route to 1 expert, k=2 would only route to 2 experts, etc.).

Equation 4 is how we determine what is in the list that we select the top k values from. We begin by multiplying the input to the gate (the signal x) by some weight W_g. This W_g is what will be trained in each successive round for the neural network. Note that the weight associated with each expert likely has a distinct value. Now, to help prevent the same expert from being chosen every single time, we add in some statistical noise via the second half of our equation. The authors propose distributing this noise along a normal distribution, but the key idea is to add in some randomness to help with expert selection.

Equation 3 simply combines the two equations and puts them into a Softmax function so that we can be sure that -infinity gets us 0, and any other value will send a signal through to the expert.
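The three steps just described (noisy scores, keep the top k, then Softmax) can be sketched in pure Python. This is an illustrative sketch, not the paper's or Mixtral's implementation: the function names, the nested-list weight layout and the toy weights are assumptions, and the Softplus-scaled noise weight `w_noise` follows the noisy top-k gating described in the Shazeer et al. paper rather than anything spelled out in the text above.

```python
import math
import random


def softplus(z):
    """Smooth, always-positive scale for the noise term."""
    return math.log1p(math.exp(z))


def keep_top_k(values, k):
    """Keep the k largest entries and set all others to -infinity."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(ranked[:k])
    return [v if i in top else -math.inf for i, v in enumerate(values)]


def softmax(values):
    """Softmax in which -infinity inputs map to exactly 0."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def noisy_top_k_gate(x, w_gate, w_noise, k, rng=random):
    """x: list of d floats; w_gate and w_noise: d x n_experts nested lists."""
    d, n_experts = len(x), len(w_gate[0])
    h = []
    for j in range(n_experts):
        clean = sum(x[i] * w_gate[i][j] for i in range(d))
        scale = softplus(sum(x[i] * w_noise[i][j] for i in range(d)))
        h.append(clean + rng.gauss(0.0, 1.0) * scale)  # noisy score per expert
    return softmax(keep_top_k(h, k))


# Toy input and weights (hypothetical values): 2 features, 4 experts.
x = [1.0, 2.0]
w_gate = [[0.1, 0.4, -0.3, 0.2], [0.0, 0.5, 0.1, -0.2]]
w_noise = [[0.0] * 4, [0.0] * 4]
gates = noisy_top_k_gate(x, w_gate, w_noise, k=2, rng=random.Random(0))
print(gates)
```

Whatever the noise draws are, exactly k gate values are non-zero and they sum to 1, so only k experts need to be evaluated for this input.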

The "sparse" part of the title comes from sparse matrices, or matrices where most of the values are zero, as this is what we effectively create with our gating function.

While our noise injection is valuable to reduce expert concentration, the authors found it was not enough to fully overcome the issue. To incentivize the model to use the experts nearly equally, they adjusted the loss function.

Equation 6 shows how they define importance in terms of the gate function; this makes sense, as the gate function is ultimately the decider of which expert gets used. The importance of an expert is the sum of its gate values over a batch of inputs. They define their loss function as the square of the coefficient of variation of the set of importance values, scaled by w_importance. Put simply, this means we are finding a value that represents just how unevenly the experts are used, where a select few experts being used creates a big value and all of them being used equally creates a small value. The w_importance scaling factor is a hyperparameter that can push the model to use more of the experts.
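As a rough sketch of this balancing term, the snippet below sums gate values per expert over a batch and returns the squared coefficient of variation of those sums, scaled by a weight. It assumes a population standard deviation and a placeholder weight of 0.1; both choices, along with the function name `importance_loss` and the toy batches, are illustrative assumptions rather than values taken from the paper.

```python
import statistics


def importance_loss(gate_batch, w_importance=0.1):
    """gate_batch: one list of gate values (one per expert) for each input."""
    n_experts = len(gate_batch[0])
    # Importance of an expert: sum of its gate values over the batch.
    importance = [sum(g[j] for g in gate_batch) for j in range(n_experts)]
    mean = statistics.fmean(importance)
    cv = statistics.pstdev(importance) / mean  # coefficient of variation
    return w_importance * cv ** 2


# Two toy batches (hypothetical): evenly spread routing vs. one dominant expert.
balanced = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
concentrated = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
balanced_loss = importance_loss(balanced)
concentrated_loss = importance_loss(concentrated)
print(balanced_loss, concentrated_loss)
```

The balanced batch gives every expert the same total gate value and so incurs no penalty, while routing everything to one expert produces a clearly positive loss.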

Another training challenge the paper calls out involves getting enough data to each of the experts. As a result of our gating function, the amount of data each expert sees is only a fraction of what a comparatively dense neural network would see. Put differently, because each expert will only see a part of the training data, it is effectively like we have taken our training data and hidden most of it from these experts. This makes us more susceptible to overfitting or underfitting.

This is not an easy problem to solve, so the authors suggest the following: leveraging data parallelism, leaning into convolutionality, and applying Mixture of Experts recurrently (rather than convolutionally). These are dense topics, so to prevent this blog post from getting too long I will go into these in later posts if there is interest.

The "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" paper was published in 2017, the same year that the seminal "Attention Is All You Need" paper came out. Just as it took some years before the self-attention architecture reached the mainstream, it took a few years before we had any models that could successfully implement this sparse architecture.

When Mistral released their Mixtral model in 2024, they showed the world just how powerful this setup can be. With the first production-grade LLM with this architecture, we can look at how it's using its experts for further study. One of the most fascinating pieces here is that we don't really understand why specialization at the token level is so effective. If you look at the graph below for Mixtral, it is clear that, with the exception of mathematics, no one expert is the go-to for any one high-level subject.

Consequently, we are left with an intriguing situation where this new architectural layer is a marked improvement yet nobody can explain exactly why this is so.

More major players have been following this architecture as well. Following the open release of Grok-1, we now know that Grok is a Sparse Mixture of Experts model with 314 billion parameters. Clearly, this is an architecture people are willing to invest large amounts of capital into, and so it will likely be a part of the next wave of foundation models. Major players in the space are moving quickly to push this architecture to new limits.

The "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" paper ends by suggesting that experts created via a recurrent neural network are the natural next step, as recurrent neural networks tend to be even more powerful than feed-forward ones. If this is the case, then the next frontier of foundation models may not be networks with more parameters, but rather models with more complex experts.

In closing, I think this paper highlights two critical questions for future sparse mixture of experts studies to focus on. First, what scaling effects do we see now that we have added more complex nodes into our neural network? Second, does the complexity of an expert have good returns on cost? In other words, what scaling relationship do we see within the expert network? What are the limits on how complex it should be?

As this architecture is pushed to its limits, it will surely bring in many fantastic areas of research as we add in complexity for better results.

[1] N. Shazeer, et al., Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017), arXiv

[2] A. Jiang, et al., Mixtral of Experts (2024), arXiv

[3] A. Vaswani, et al., Attention Is All You Need (2017), arXiv

[4] xAI, Open Release of Grok-1 (2024), xAI website
