Category Archives: Data Science

MBA in business analytics: Big data is a big deal – Fortune

Business analytics is no longer a priority for tech companies alone. In 2022, McKinsey estimated that it could create approximately $1.2 trillion a year in value for government entities. A recent EY study found that 93% of companies said they plan to increase investments in the area of data and analytics.

"People realize that's the way the world is going," says Daniel Guetta, an associate professor and director of the Business Analytics Initiative at Columbia Business School, and if professionals want to succeed in their jobs, they need to have at least a basic comprehension of how data functions and how it can be leveraged in decision-making.

With demand for this expertise growing across sectors and the technology surrounding it becoming more complex every day, there's incredible value in having a degree in business analytics. Companies are also paying top dollar for this expertise, with Glassdoor reporting that business analysts with MBAs make on average around $104,000 annually.

Here's what an MBA in business analytics entails, and why it might be a degree worth considering.

Business analytics is the use of technical tools and frameworks to mine data for insights that can drive value for an organization. Guetta likes to think of the field as a spectrum.

On one end, you have data analytics, data science, and business intelligence, which, while highly dependent on the company and context, generally involve collecting, storing, and then sorting through data to make informed decisions around marketing, sales, operations, or other areas of the business. "I have a big data set. It's too big for Excel, or maybe it's not too big for Excel, but it's kind of complex, and I just want to ask very simple questions out of it," Guetta says.

On the other end, you might have more complex concepts like predictive analytics and machine learning that focus less on comprehending the present and more on forecasting and preempting what's to come. "All points in that spectrum are incredibly useful, incredibly important in different ways," he adds.

Big data, a large, diverse data set that's often difficult to process with simple computational tools, plays a role in many of these categories, particularly in emerging tech like AI that can't operate without a complicated array of inputs.

While some analytics skills can be taught on the job, a more formal education will provide you with the credibility and confidence to incorporate data into your workflows. And it's not just for math whizzes and engineers: "I cannot think of someone today in 2024 who can't at least benefit in some way from knowing a little bit about coding," Guetta says, adding that his students range from aspiring product managers to venture capitalists.

"I can think of at least three or four students in the last two years who work in museums who've decided they wanted to take this class and just understand how this works and what's happening and where it's going."

Peter Fader, a professor of marketing at The Wharton School of the University of Pennsylvania who's taught the topic for more than three decades, agrees that anyone could benefit from an MBA in business analytics. "You've been in an organization, two, or three, and realized all the dysfunction and opportunity to improve; a well-taken MBA can be immensely helpful," he says.

Specializations like business analytics used to be reserved mainly for master's or PhD programs. But business schools are beginning to recognize that they must incorporate data and engineering concepts into their curricula if they want their students to be competitive in today's job market.

The difference between a master's in business analytics and an MBA in business analytics comes down to what's taught and how deep the lesson plans go. "The MS programs, depending on where and how they're taught, are going to be a whole level way beyond the analytics courses we see in the MBA," Fader says.

In a business analytics MBA program, you might take some courses on statistics, data analysis, and emerging tech like AI, but you'll also spend as much time, if not more, on business foundations such as accounting, people management, and supply chain. An MS in business analytics, meanwhile, may include classes where you get really technical and mathematical.

At Columbia Business School, Guetta says, the business analytics specialty can be broken up into three buckets. The first is fundamentals, where you learn what's going on under the hood of specific technologies and ideas. The second is tools, where you learn how to deploy software and programming languages like SQL. And the third is applications, where you're taught to apply business analytics knowledge and tools to marketing, real estate, sports, and other areas.

Master's degrees also tend to be a better fit for recent college graduates or young professionals, while an MBA may make more sense for someone who's already fairly established in their career and/or is looking to make a major switch. "Most MBA programs are going to require a few years of experience after your undergrad," Guetta adds.

Another option is a dual MS/MBA, which Columbia, along with other legacy institutions like Harvard and Emory, offers. "You get the best of all worlds: you get the very rigorous engineering training, and you also get the MBA training," Guetta says.

If an MBA in business analytics is piquing your interest, consider the following (not at all exhaustive) list of institutions with in-person, virtual, and/or hybrid offerings. Note, too, that some business schools call it a master's instead of an MBA, while others provide concentrations and certifications in business analytics through their MBA programs.

Whether you're an entrepreneur, a finance professional, or a nonprofit worker, having a business analytics background can help you make smart, data-informed choices for your company or team. It will not only make you a more marketable candidate for job openings or promotions but also ensure you're staying on top of the latest innovations and tech. An MBA program is an ideal place for those who want to mix business acumen with engineering, while a master's degree will help you further hone your technical skill set.

Whatever route you decide, Fader recommends researching and asking around about a school's job placement success rates and services. "Having that kind of transparency and putting the conversation in the hands of the students and alums themselves, and not letting the institution filter it, would be really, really helpful," he says. Guetta suggests finding an MBA cohort that provides a broad range of business analytics courses, or even partners closely with an engineering school. Lastly, exercise muscles that help you learn quicker. Business analytics, he says, "has changed unrecognizably in the last two, three years, and it's going to change unrecognizably in the next two, three years. The ability to pick up things as they're thrown at you, to pick up new topics, to pick up new trends, is going to be invaluable in the field."

Continued here:

MBA in business analytics: Big data is a big deal - Fortune

Uncovering the Trends in Data Analytics & Technology – Data Science Central

In today's digitized environment, data analytics is a paramount tool for informed decision-making and strategic planning. From large corporations to parts of the healthcare industry, the ability to parse large quantities of data has become essential for competitiveness and growth. From machine learning algorithms to sophisticated visualization tools, the field of data analytics is evolving quickly, with new trends and innovations emerging all the time.

In this article, we will highlight the latest updates and trends reshaping big data analytics technology and explain why these changes matter in today's world.

Organizations demand quick ways to extract meaningful insights from vast datasets, automate processes, and enhance their decision-making capabilities. AI algorithms answer that demand: they enable advanced pattern recognition and predictive analytics, allowing businesses to forecast trends, customer behavior, and market shifts with unprecedented accuracy.

According to Gartner, by 2024, 75% of enterprises will shift from piloting to operationalizing AI, driving a 5x increase in streaming data and analytics infrastructures. Machine learning models automate complex data analysis tasks, speeding up decision-making processes and enhancing operational efficiency.

This synergy is not only improving traditional analytics but also paving the way for personalized recommendations in e-commerce, where AI for personal finance assists consumers in managing their budgets, optimizing their savings, and making more informed financial decisions.

Real-time data analytics has become essential in today's fast-paced business environment, where timely decisions can make or break opportunities. By processing data as it is generated, organizations can swiftly respond to customer needs, monitor campaign effectiveness instantly, and detect anomalies in operational processes in real time. For instance, according to a study by IDC, organizations that use real-time data can achieve a 26% increase in revenue.

This capability is particularly valuable in sectors like online retail, where understanding customer behavior in the moment can drive personalized marketing strategies and optimize inventory management. Real-time analytics also supports dynamic pricing strategies and enhances the overall customer experience through immediate feedback mechanisms.

Cloud-driven data analytics has democratized access to powerful analytical tools and computing resources, previously only available to large enterprises. Cloud platforms for data analytics technology offer scalability, flexibility, and cost-effectiveness, allowing businesses to scale their analytics infrastructure according to fluctuating demands. With features such as data integration across disparate sources and robust security protocols, cloud-based solutions enable seamless collaboration and data sharing across teams and geographies.

This shift has accelerated the adoption of analytics-driven decision-making, enabling industries from healthcare to financial services to innovate faster and stay competitive in rapidly evolving markets. Services from ThingsFromMars.com, for example, show how platforms can deliver scalable analytics solutions that grow with demand.

As data volumes grow and processes become more and more complex, data governance, and especially data privacy, have become critical strategic priorities for large companies. To mitigate this risk and bolster customer confidence, companies are putting data governance policies and privacy controls in place.

Modern NLP has unlocked new possibilities for converting vast amounts of unstructured data into meaningful insights. Organizations can now analyze social media posts, customer emails, and product reviews to understand sentiment, identify key entities, and discover emerging topics.

NLP-powered tools can automate the extraction of actionable information from text, enabling businesses to enhance customer engagement, improve service quality, and tailor marketing strategies based on real-time feedback. These advancements also facilitate more accurate trend analysis and competitive benchmarking to provide a deeper understanding of market dynamics and consumer preferences.

Edge computing decentralizes analytics, moving it to where real-time insight is important and bandwidth is constrained. The payoffs are faster processing, less time spent transferring data, and better protection of the processed data. By analyzing data at the edge, businesses can optimize data transfer costs, enhance data security, and explore further IoT use cases. Gartner predicts that by 2025, 75% of enterprise-generated data will be created and processed outside a traditional centralized data center or cloud.

A Windows or Linux-based server can also be beneficial for analytics, since data can be manipulated directly on the server. Better still, you are not restricted in your choice of automation software: you can install and use any utility.

Augmented analytics is democratizing data science by making advanced analytical tools accessible to non-technical users rather than only to seasoned IT professionals and data scientists. Through features like natural language processing, automated model generation, and intuitive data visualization, augmented analytics enables business users to perform complex data analyses without needing deep technical expertise. This self-service approach accelerates decision-making processes and fosters a data-driven culture across organizations. By empowering a broader range of employees to engage with data, companies can uncover new insights, drive innovation, and respond more swiftly to market changes.

Quantum computing holds the promise of transforming data analytics by tackling problems that are currently beyond the reach of classical computers. With its ability to process complex calculations at unprecedented speeds, quantum computing can revolutionize fields such as optimization, data clustering, and molecular modeling. This technology can enhance the precision of simulations, improve financial modeling, and provide breakthroughs in artificial intelligence research. As quantum computing continues to evolve, it is expected to open new frontiers in data analytics, enabling businesses to solve previously intractable problems and uncover deeper insights from their data. Moreover, according to a report by P&S Intelligence, the quantum computing market is expected to reach $64.98 billion by 2030.

As the various domains of data analytics technology advance alongside evolving consumer trends, businesses are witnessing a transformative era in the power of data. From groundbreaking technologies in big data analytics such as AI and real-time analytics to edge computing and augmented analytics, the landscape is continuously evolving. These advancements enable businesses to derive actionable insights, optimize operations, and drive innovation more effectively. Emerging techniques are revolutionizing traditional approaches, allowing for more precise and dynamic decision-making processes.

Consequently, with the ever-increasing availability of data, organizations must actively track new advancements and embrace cutting-edge technologies. This proactive approach will ensure they remain competitive in a fast-growing, data-driven world and achieve a significant advantage in the market.

Read more here:

Uncovering the Trends in Data Analytics & Technology - Data Science Central

Kai Analytics, a consulting firm that utilizes data science to solve social issues, establishes a Japanese subsidiary in Tokyo | 2024 – Events &…

While data is a key factor in solving issues facing companies, organizations, and society, the reality is that many people have trouble collecting, analyzing, and visualizing such data.

Kai Analytics provides support for companies and organizations with problems in statistical research, data analysis, or data visualization. The company not only organizes survey results, but also adds expert insights to tell the "story" derived from the results and analyses. In addition, it has a track record of having provided support in the fields of international cooperation and development, as well as a network of experts in healthcare and DEI (Diversity, Equity, and Inclusion), and has strength in data analysis for solving social issues.

The company established Kai Analytics K.K. in Tokyo in July 2023 to strengthen its business in Japan in the future after experiencing collaborations with Japanese startups, where it helped them implement generative AI solutions to enhance their existing processes.

JETRO's Investment Business Support Center in Japan (IBSC) provided consultation (registration, visa, and tax) support for the company's entry into the Japanese market.

Link:

Kai Analytics, a consulting firm that utilizes data science to solve social issues, establishes a Japanese subsidiary in Tokyo | 2024 - Events &...

You Don't Need an LLM Agent. The Why & The Alternative | by Louis Chan | Jul, 2024 – Towards Data Science

Opinion: And what you might actually want instead

People anchor their eyes on the next shiny thing. FOMO is part of human nature. And that applies to businesses, too. Like how Data Science became a craze for every business's analytics function, Agentic Architecture is the red-hot target on most AI radars.

Have you ever considered whether you actually need it, though?

Hi, I am Louis Chan, Tech Lead in KPMG's Global Lighthouse. It has been over a year since I co-founded KPMG's Enterprise GenAI-as-a-Service Platform, KPMG AVA, serving over 18,000 users globally on all their GenAI needs. It is a fascinating time seeing how businesses and clients catch up to the latest catchphrases, from AI, GenAI, Chat, RAG, Agent, to now Agentic Architecture.

This is an emerging field, and most of us (including me) do not know half of what we talk about. The general optimism that LLMs can miraculously increase business efficiency, and the fear of the impact and liability of cutting-edge technology, drive the desire to gun for the next as-told, better, and more complex system.

The reality is that you don't need an Agentic Architecture, and you will be better off not using one in your business.

Excerpt from:

You Don't Need an LLM Agent. The Why & The Alternative | by Louis Chan | Jul, 2024 - Towards Data Science

Briefly Bio raises $1.2M to build the GitHub of science experiments – VentureBeat

Briefly Bio, a startup based in London, has announced a small but meaningful $1.2 million round from Compound VC and others with the goal of making scientific experiments and data more reproducible.

In fact, so many thousands of scientific results haven't yet been, or possibly can't be, reproduced that some researchers have deemed this a "reproducibility" or "replication" crisis.

In an effort to help solve this problem, Briefly Bio's platform uses large language models (LLMs), like the kind powering leading AI products such as ChatGPT, to turn complex lab documentation into a consistent, structured format. The idea is that anyone can use and build on the original documents in their own lab more easily.

The company has already roped in early customers for the paid version of the offering and has plans in place to create a public version of the platform for open sharing and collaboration on scientific data, one that it says will be similar to GitHub's repository for sharing open-source and permissively licensed software.

"As GitHub helped software engineers collaborate and build on each other's code, we think Briefly can help scientists and engineers do the same with their experiments," Harry Rickerby, the CEO and co-founder of Briefly Bio, said in a statement.

When scientists try to solve a complex biological problem, they take different approaches. Some methods work, some partially do the job, and some don't at all, but in all cases, the protocol for the lab work (the plan for research, covering objectives, design, methodology, and statistics) and the details of the experiment itself are documented thoroughly.

The idea behind collating this data is to give other scientific teams a base of sorts to continue the research or solve any other closely related problem. However, this is also where the problem of reproducibility begins.

Essentially, every scientist has their own way of documenting their work, which in many cases leads to ambiguity and the loss of details crucial for shared understanding.

For instance, some researchers may go into extensive detail when describing their approach to gene editing in their own words, while others may just scratch the surface, assuming other teams have similar knowledge. This can easily lead to inefficient collaboration and failed attempts at reproducing experiments, costing the industry over $50 billion each year.

Rickerby told VentureBeat that he and his colleagues at LabGenius, Katya Putintseva and Staffan Piledahl, saw the problem first-hand at different levels.

"Katya worked as a scientist in academia and faced the challenges of re-using and adapting work from the published literature. She then moved into data science, where she needed to understand precisely how the data was generated to analyze and model it. On the other hand, Staffan worked as an automation engineer and needed complete definitions of a lab workflow to transfer them to robots. After leaving, we realized that many struggles across our careers shared a common root cause: there wasn't consistent documentation of how lab work was being run," he said.

To address this, the trio came together and launched Briefly Bio. At the core, the company provides scientists with a platform that can convert any scientific protocol documented in natural language into a consistent, structured format containing step-by-step information. All the user has to do is provide the blob of text from the original author, and the tool comes up with a structured output detailing the method for reproducing or building on the experiment.

Briefly's tool is powered by generative AI, which helps structure plain-text descriptions of procedural knowledge and convert them into a hierarchical representation. The large language models under the hood automatically extract the key pieces of information and categorize them into different processes, actions, explanations, and parameters. "This structured representation is then transformed into a visual representation that is clearer and easier to digest than a wall of text," Rickerby explained.
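Briefly Bio has not published its models or prompts, so the following is only a hedged sketch of the general pattern Rickerby describes: prompting an LLM to map a free-text protocol into a fixed schema. The model name, prompt wording, and schema fields here are invented for illustration.

```python
# Illustrative only: ask an LLM to structure free-text lab instructions.
import json
from openai import OpenAI

client = OpenAI()

def structure_protocol(protocol_text: str) -> list[dict]:
    """Turn a plain-text protocol into a list of structured steps."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": 'Extract each step of the lab protocol as JSON: '
                        '{"steps": [{"action": "...", "parameters": "...", '
                        '"explanation": "..."}]}'},
            {"role": "user", "content": protocol_text},
        ],
    )
    return json.loads(response.choices[0].message.content)["steps"]
```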

The offering not only creates a shared language for data understanding but also paints a clear picture showcasing how scientific methods change and evolve, in a way that just hasn't been possible with traditional text descriptions.

More importantly, in addition to converting existing scientific descriptions into a structured format, Briefly Bio also includes an AI copilot, which can be triggered via natural language to spot errors and surface as many parameters as it can in connection with the lab work being done. The AI generates missing parameters in a matter of seconds, enriching the hierarchical representation of the method for reuse in a lab experiment.

The CEO did not share the exact details of the models powering the whole experience but said they are building on top of existing models, enriching them with additional experimental context to improve the lab work understanding.

For reusing the generated data in experiments, teams can launch Briefly Bio's workspace. It copies the enriched, structured method as is while allowing users and their team members to mark each step as complete or incomplete, with associated calculations, text, and sample layout painting a picture of what's in each well of the user's plate, layer by layer.

While Briefly Bio is still in its early stages, the company claims it has started booking revenue from its first customers on a per-user SaaS model.

"Our users are typically wet lab scientists working in early-stage research and development, whether this is in academia or in biotech and pharma, who are looking for a clearer way to document and share their work. We've also found a lot of interest from those working in laboratory automation, using Briefly as a way to collaborate with scientists to properly describe their workflows before they program the robots," Rickerby noted.

In the long run, the company wants to build on this work and open up a public version of the platform for sharing experiments and protocols. This will allow scientists "to discover complete, reproducible methodology that they can easily adapt and use in their own labs, just as GitHub did for open-source software development," the CEO added.

The rest is here:

Briefly Bio raises $1.2M to build the GitHub of science experiments - VentureBeat

Advanced Retrieval Techniques in a World of 2M Token Context Windows, Part 1 | by Meghan Heintz | Jul, 2024 – Towards Data Science

Image from the Visualising AI project launched by Google DeepMind, via Unsplash.

Gemini Pro can handle an astonishing 2M token context, compared to the paltry 15k we were amazed by when GPT-3.5 landed. Does that mean we no longer care about retrieval or RAG systems? Based on Needle-in-a-Haystack benchmarks, the answer is that while the need is diminishing, especially for Gemini models, advanced retrieval techniques still significantly improve performance for most LLMs. Benchmarking results show that long context models perform well at surfacing specific insights. However, they struggle when a citation is required. That makes retrieval techniques especially important for use cases where citation quality matters (think law, journalism, and medical applications, among others). These tend to be higher-value applications where lacking a citation makes the initial insight much less useful. Additionally, while the cost of long context models will likely decrease, augmenting shorter context window models with retrievers can be a cost-effective and lower latency path to serve the same use cases. It's safe to say that RAG and retrieval will stick around a while longer, but maybe you won't get much bang for your buck implementing a naive RAG system.

Advanced RAG covers a range of techniques, but broadly they fall under the umbrella of pre-retrieval query rewriting and post-retrieval re-ranking. Let's dive in and learn something about each of them.

Q: What is the meaning of life?

A: 42

Question and answer asymmetry is a huge issue in RAG systems. A typical approach in simpler RAG systems is to compare the cosine similarity of the query and document embeddings. This works when the question is nearly restated in the answer ("What's Meghan's favorite animal?", "Meghan's favorite animal is the giraffe."), but we are rarely that lucky.
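In code, that naive approach amounts to ranking stored documents by cosine similarity against the query embedding. A minimal sketch (the embedding model is left as a placeholder; the article doesn't name one):

```python
# Naive RAG retrieval: rank documents by cosine similarity to the query.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, doc_vecs: list[np.ndarray], k: int = 3) -> list[int]:
    """Return the indices of the k documents most similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```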

Here are a few techniques that can overcome this:

The nomenclature Rewrite-Retrieve-Read originated from a paper from the Microsoft Azure team in 2023 (although, given how intuitive the technique is, it had been used for a while). In this study, an LLM would rewrite a user query into a search-engine-optimized query before fetching relevant context to answer the question.

The key example was how the query "What profession do Nicholas Ray and Elia Kazan have in common?" should be broken down into two queries, "Nicholas Ray profession" and "Elia Kazan profession". This allows for better results because it's unlikely that a single document would contain the answer to both questions. By splitting the query in two, the retriever can more effectively retrieve relevant documents.

Rewriting can also help overcome issues that arise from "distracted prompting": instances where the user query mixes concepts in the prompt, such that taking an embedding directly would result in nonsense. For example, "Great, thanks for telling me who the Prime Minister of the UK is. Now tell me who the President of France is" would be rewritten as "current French president". This can help make your application more robust to a wider range of users, as some will think a lot about how to optimally word their prompts, while others might have different norms.
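As a hedged sketch of what such a rewriting step might look like in practice (the model name and prompt are illustrative assumptions, not taken from the Microsoft paper):

```python
# Rewrite-Retrieve-Read, step one: rewrite the raw user query into one or
# more search-optimized queries before hitting the retriever.
from openai import OpenAI

client = OpenAI()

def rewrite_query(user_query: str) -> list[str]:
    """Ask an LLM to rewrite/decompose a query for a search engine."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Rewrite the user's question as one or more short "
                        "search-engine queries, one per line."},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content.splitlines()

# Ideally, "What profession do Nicholas Ray and Elia Kazan have in common?"
# comes back as ["Nicholas Ray profession", "Elia Kazan profession"].
```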

In query expansion with LLMs, the initial query can be rewritten into multiple reworded questions or decomposed into subquestions. Ideally, by expanding the query into several options, the chances of lexical overlap increase between the initial query and the correct document in your storage component.

Query expansion is a concept that predates the widespread usage of LLMs. Pseudo Relevance Feedback (PRF) is a technique that inspired some LLM researchers. In PRF, the top-ranked documents from an initial search are used to identify and weight new query terms. With LLMs, we rely on the creative and generative capabilities of the model to find new query terms. This is beneficial because LLMs are not restricted to the initial set of documents and can generate expansion terms not covered by traditional methods.

Corpus-Steered Query Expansion (CSQE) is a method that marries the traditional PRF approach with the LLM's generative capabilities. The initially retrieved documents are fed back to the LLM to generate new query terms for the search. This technique can be especially performant for queries about which LLMs lack subject knowledge.
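A rough sketch of the CSQE loop, under the same caveats (illustrative model and prompt; the first-pass retrieval is assumed to exist elsewhere):

```python
# Corpus-Steered Query Expansion: mine extra query terms from the
# top-ranked documents of an initial search, then search again.
from openai import OpenAI

client = OpenAI()

def expand_query(query: str, initial_docs: list[str]) -> str:
    """Generate PRF-style expansion terms from first-pass results via an LLM."""
    passages = "\n\n".join(initial_docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Given a query and retrieved passages, list additional "
                        "search terms (space-separated) that would help find "
                        "relevant documents."},
            {"role": "user", "content": f"Query: {query}\n\nPassages:\n{passages}"},
        ],
    )
    # Append the generated terms and run retrieval a second time.
    return f"{query} {response.choices[0].message.content}"
```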

There are limitations to both LLM-based query expansion and its predecessors like PRF. The most glaring is the assumption that the LLM-generated terms, or the top-ranked results, are relevant. God forbid I am trying to find information about the Australian journalist Harry Potter instead of the famous boy wizard. Both techniques would pull my query further away from the less popular query subject toward the more popular one, making edge-case queries less effective.

Another way to reduce the asymmetry between questions and documents is to index documents with a set of LLM-generated hypothetical questions. For a given document, the LLM can generate questions that could be answered by the document. Then, during the retrieval step, the user's query embedding is compared to the hypothetical question embeddings rather than the document embeddings.

This means that we don't need to embed the original document chunk; instead, we can assign the chunk a document ID and store that as metadata on the hypothetical question document. Generating a document ID means there is much less overhead when mapping many questions to one document.
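Mechanically, the indexing and lookup might look like this sketch, where generate_questions, embed, and the vector index are placeholders for whatever LLM, embedding model, and vector store you use (no specific library is implied):

```python
# Index LLM-generated hypothetical questions instead of the chunk itself,
# keeping the chunk's document ID as metadata on each question embedding.
def index_chunk(doc_id: str, chunk: str, generate_questions, embed, index) -> None:
    """Store one embedding per hypothetical question, pointing back to the chunk."""
    for question in generate_questions(chunk):
        index.add(vector=embed(question), metadata={"doc_id": doc_id})

def retrieve_chunks(query: str, embed, index, chunks: dict[str, str], k: int = 5):
    """Match the query against question embeddings, then resolve back to chunks."""
    hits = index.search(embed(query), k=k)
    return [chunks[hit.metadata["doc_id"]] for hit in hits]
```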

The clear downside to this approach is that your system will be limited by the creativity and volume of the questions you store.

HyDE is the opposite of Hypothetical Query Indexes. Instead of generating hypothetical questions, the LLM is asked to generate a hypothetical document that could answer the question, and the embedding of that generated document is used to search against the real documents. The real document is then used to generate the response. This method showed strong improvements over other contemporary retriever methods when it was first introduced in 2022.
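A minimal HyDE sketch (illustrative model, placeholder embed and index helpers; this is not the exact setup from the 2022 paper):

```python
# HyDE: generate a hypothetical answer document, embed it, and use that
# embedding to search the real document collection.
from openai import OpenAI

client = OpenAI()

def hyde_search(question: str, embed, index, k: int = 5):
    """Search real documents using the embedding of a hypothetical answer."""
    fake_doc = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user",
                   "content": f"Write a short passage that plausibly answers: {question}"}],
    ).choices[0].message.content
    # The real documents retrieved here then ground the final response.
    return index.search(embed(fake_doc), k=k)
```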

We use this concept at Dune for our natural language to SQL product. By rewriting user prompts as a possible caption or title for a chart that would answer the question, we are better able to retrieve SQL queries that can serve as context for the LLM to write a new query.

See the original post here:

Advanced Retrieval Techniques in a World of 2M Token Context Windows, Part 1 | by Meghan Heintz | Jul, 2024 - Towards Data Science

AI and data science – AGInfo Ag Information Network

John Bourne, VP of Marketing at Ceres Imaging, discusses the top trends shaping the future of AI in agriculture.

Computer vision and deep-learning algorithms are being leveraged by ag tech companies to process data captured from aerial imagery, allowing growers to monitor and manage crop and soil health. Over time, these AI/machine learning models develop to track and predict various environmental impacts like water stress, disease and pest infiltration. Growers will increasingly look to use this historical data to predict yield in season and analyze the ROI on various potential fixes or investments they can make to improve yield.

Artificial intelligence and data science are important because they will empower growers to make better decisions in increasingly complex times. In a world that will be challenged by rapidly increasing food demand, climate change, a continued decline in agricultural workers, and long-term sustainability concerns, farmers will look to adopt smarter tools that give them an edge, and the confidence to solve problems with certainty.

See the original post:

AI and data science - AGInfo Ag Information Network

Chaining Pandas Operations: Strengths and Limitations | by Marcin Kozak | Jul, 2024 – Towards Data Science

PYTHON PROGRAMMING: Learn when it's worth chaining Pandas operations in pipes.

The title of this article stresses the strengths and limitations of chaining Pandas operations, but to be honest, I will write about fun.

Why fun? Is it at all important when we have data to analyze?

I don't know what works for you, but for me, fun in work is important. During my 20+ years of experience in data science, I've found that the more enjoyment I derive from coding, the more satisfied I am with completing the task. And I do mean the process of pursuing the task, not just completing it. Of course, achieving results matters, probably the most. But trust me, if you dislike the tools you're using, all you'll want is to finish the job as quickly as possible. This can lead to mistakes, as you might work hastily and overlook important details in the data. And that's something you want to avoid.

I transitioned to Python from R, and analyzing data with R is a lot of fun thanks to the dplyr syntax. I've always enjoyed it, and I still do. However, when I switched to Python, I found myself preferring it over R. I've never really enjoyed programming in R (note the distinction between analyzing data and programming), while
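For readers who haven't seen the style under discussion, here is a small self-contained example of chained Pandas operations; the data and column names are invented:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "sales": [120, 90, 200, 150],
})

result = (
    df
    .query("sales > 100")                         # filter rows
    .assign(sales_k=lambda d: d["sales"] / 1000)  # derive a column
    .groupby("city", as_index=False)["sales_k"]   # aggregate per city
    .sum()
    .sort_values("sales_k", ascending=False)
)
print(result)
```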

See original here:

Chaining Pandas Operations: Strengths and Limitations | by Marcin Kozak | Jul, 2024 - Towards Data Science

Lessons Learned as a Data Science Manager and Why I'm Moving Back to an Individual Contributor Role – Towards Data Science

Photo by Robert Ruggiero on Unsplash

As you gain experience and progress through your career in data science, at some point you might start hearing or asking yourself this question:

Do you want to move into management?

My boss asked me this question two years ago. At the time, I was a senior data scientist. I was successfully leading projects and mentoring junior team members on top of other duties. Since I was already doing a lot of the work of a manager (or so I thought), I didn't think moving to a management position would be a big change. Plus, I thought a fancier title and a pay bump for (mostly) the same work sounded like a good deal, so, even though I never planned on trying to become a manager, I eventually agreed to move from an individual contributor role (senior data scientist) to a people leader role (manager, data science).

Now, after nearly two years in management, I decided to move back to an individual contributor role. I will share what motivated me to do this later on in this post, but in short, I found that an individual contributor role aligns with my interests

Continue reading here:

Lessons Learned as a Data Science Manager and Why I'm Moving Back to an Individual Contributor Role - Towards Data Science

Running Local LLMs is More Useful and Easier Than You Think – Towards Data Science

Image generated by AI by Author

ChatGPT is great, no doubt about that, but it comes with a significant drawback: everything you write or upload is stored on OpenAI's servers. Although this may be fine in many cases, when dealing with sensitive data it might become a problem.

For this reason, I started exploring open-source LLMs, which can be run locally on personal computers. As it turns out, there are actually many more reasons why they are great.

1. Data Privacy: your information stays on your machine.

2. Cost-Effective: no subscription fees or API costs, they are free to use.

3. Customization: models can be fine-tuned with your specific system prompts or datasets.

4. Offline Functionality: no internet connection is required.

5. Unrestricted Use: free from limitations imposed by external APIs.

Now, setting up a local LLM is surprisingly straightforward. This article provides a step-by-step guide to help you install and run an open-source model on your
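The excerpt ends before the guide itself, but as one hedged illustration of how simple a local setup can be (llama-cpp-python is an assumed tool choice, not necessarily the article's, and the model path is a placeholder):

```python
# Run a local, open-source LLM entirely offline with llama-cpp-python.
from llama_cpp import Llama

# Path to a locally downloaded GGUF model file (placeholder).
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf")

output = llm(
    "Q: Why run a language model locally? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```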

See the article here:

Running Local LLMs is More Useful and Easier Than You Think - Towards Data Science