
W&M, industry partnership leverages AI to support patients with chronic conditions – William & Mary

William & Mary data scientist Haipeng Chen believes in AI for social good. So, he is using his expertise to help deliver personalized and more accessible health care to patients with chronic conditions.

Chen, an assistant professor of data science, is leading a partnership between William & Mary and the health care technology company Generated Health. He and his team will develop synthetic patient data to help train a more autonomous version of the company's digital nurse, Florence.

"If we can have an AI system that can deliver automated, personalized management of patients, then we will relieve some of the growing pressures created by the accelerating prevalence of chronic conditions and workforce shortages," said Chen.

The contract with Generated Health, starting July 1 this year, will also cover the stipend of a graduate research assistant from Chen's lab.

This partnership is part of a growing portfolio of externally funded data science research at William & Mary. The data science program attracted over $2 million in research funding last year and is now extending its scope with projects supported by federal agencies and the private sector, as well as pursuing technology transfer opportunities.

"As the disciplinary home of AI on campus, the data science unit is particularly interested in studying AI solutions as they impact the world," said Professor Anthony Stefanidis, data science program director. He described the research program as particularly focused on the intersections of data science and AI with location, health, information generation and dissemination, and large-scale experiments and simulations.

The data science program will be part of a proposed new school at William & Mary, which will expand, among other things, the university's focus on data fluency and data-intensive research by building on the strengths of existing programs.

According to a Generated Health press release, the digital nurse Florence has already managed over 25 million clinical conversations with 200,000 patients in three countries, delivering a better patient experience and improved clinical outcomes.

Chen said that Florence has been used to help chronic disease patients monitor and control their conditions.

"In many cases, patients can't get an appointment soon enough to get to know their condition better," said Chen. "Using AI, we can have an automated way to accelerate and augment the current health care system."

Chen and his team will be developing an AI diffusion model simulating real patient behavior, which will be used to train the nurse model combining generative AI and reinforcement learning.

The goal is to develop a next-generation digital nurse that can make effective decisions by learning from its interaction with an environment, within a set of clinical rules and protocols that eliminate the risk of hallucination (that is, incorrect information presented as factual).

Chen's interest in health care is not new. While a postdoctoral fellow at Harvard, he started working on AI in the public health domain. At William & Mary, he and Associate Professor of Kinesiology Carrie Dolan are developing a project using data science to get timely vaccinations to rural communities in Kenya.

"I believe that AI should be used for the good: It's a kind of philosophical belief," said Chen. "Many people mostly care about the fancy techniques, but then at the end of the day what really makes AI useful is its application to domains related to society."

According to Chen, one advantage of applying AI to the medical domain is freeing up clinicians' time, helping alleviate the impact of workforce shortages in health care across the nation, currently estimated at 200,000 among nurses and 124,000 among physicians by the 2030s. Also, he sees AI as a support tool for auxiliary health care workers, helping remove barriers and create job opportunities.

"This collaboration is a very important piece of my general vision," he said. "I would be excited to see this system benefiting tens of thousands or even millions of patients around the world because that's one of the end goals for researchers in AI for social good."

Antonella Di Marzio, Senior Research Writer


Speculative Decoding for Faster Inference with Mixtral-8x7B and Gemma – Towards Data Science


Larger language models typically deliver superior performance but at the cost of reduced inference speed. For example, Llama 2 70B significantly outperforms Llama 2 7B in downstream tasks, but its inference speed is approximately 10 times slower.

Many techniques and adjustments of decoding hyperparameters can speed up inference for very large LLMs. Speculative decoding, in particular, can be very effective in many use cases.

Speculative decoding uses a small LLM to generate the tokens which are then validated, or corrected if needed, by a much better and larger LLM. If the small LLM is accurate enough, speculative decoding can dramatically speed up inference.

In this article, I first explain how speculative decoding works. Then, I show how to run speculative decoding with different pairs of models involving Gemma, Mixtral-8x7B, Llama 2, and Pythia, all quantized. I benchmarked the inference throughput and memory consumption to highlight what configurations work the best.

Speculative decoding was presented by Google Research in this paper:

Fast Inference from Transformers via Speculative Decoding

It is a very simple and intuitive method. However, as we will see in detail in the next section, it is also difficult to make it work.

Speculative decoding runs two models during inference: the main model we want to use and a draft model. This draft model suggests the tokens during inference. Then, the main model checks the suggested tokens and corrects them if necessary. In the end, the output of speculative decoding is the same as what the main model alone would have generated.
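To make the draft-then-verify loop concrete, here is a minimal sketch of the greedy variant. It is not the implementation from the paper or from any library: `target_next` and `draft_next` are hypothetical toy "models" that map a token list to the next token, and the draft length `k` is an arbitrary choice. A real system would verify all drafted positions in one forward pass of the main model; the loop below checks them one by one only for readability.

```python
def target_next(tokens):
    # Hypothetical "main" model: next token is the sum of the last two, mod 7.
    return (tokens[-1] + tokens[-2]) % 7


def draft_next(tokens):
    # Hypothetical draft model: agrees with the main model except after a 3.
    nxt = (tokens[-1] + tokens[-2]) % 7
    return (nxt + 1) % 7 if tokens[-1] == 3 else nxt


def greedy_decode(next_fn, prompt, n_new):
    # Plain greedy decoding with a single model, used as the reference output.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_fn(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    limit = len(prompt) + n_new
    while len(tokens) < limit:
        # Step 1: the draft model suggests k tokens.
        draft_context = list(tokens)
        proposed = []
        for _ in range(k):
            nxt = draft_next(draft_context)
            proposed.append(nxt)
            draft_context.append(nxt)

        # Step 2: the main model checks each suggestion in order.
        all_accepted = True
        for tok in proposed:
            if len(tokens) >= limit:
                all_accepted = False
                break
            expected = target_next(tokens)
            if expected == tok:
                tokens.append(tok)       # suggestion accepted
            else:
                tokens.append(expected)  # suggestion corrected; drop the rest
                all_accepted = False
                break

        # Step 3: if every suggestion was accepted, the main model adds one more token.
        if all_accepted and len(tokens) < limit:
            tokens.append(target_next(tokens))
    return tokens


speculative_output = speculative_decode([1, 2], n_new=20)
reference_output = greedy_decode(target_next, [1, 2], n_new=20)
```

In this toy setup the two outputs are identical, which mirrors the property stated above: the draft model only changes how fast tokens are produced, not which tokens are produced.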

Here is an illustration of speculative decoding by Google Research:

This method can dramatically accelerate inference if:


Forian Inc. to Announce Fourth Quarter and Full Year 2023 Results on March 28, 2024 – TradingView

NEWTOWN, PA, March 08, 2024 (GLOBE NEWSWIRE) via NewMediaWire: Forian Inc. (FORA), a provider of data science driven information and analytics solutions to the healthcare and life sciences industries, will announce its fourth quarter and full year 2023 financial results on Thursday, March 28, 2024, after the close of the market. The Company will host a conference call and webcast at 4:30 p.m. (ET) on March 28, 2024, to discuss the results.

To register for the conference call, click here. The webcast will be available live at https://edge.media-server.com/mmc/p/w4vcipvu. This information is also available on our website at www.forian.com/investors. The earnings release, along with a replay of the call promptly following its conclusion, will be available at the same site.

About Forian

Forian provides a unique suite of data management capabilities and proprietary information and analytics solutions to optimize and measure operational, clinical and financial performance for customers within the traditional and emerging life sciences and healthcare payer and provider segments. Forian has industry leading expertise in acquiring, integrating, normalizing and commercializing large scale healthcare data assets. Forian's information products overlay sophisticated data management and data science capabilities on top of a comprehensive clinical data lake to identify unique relationships, create distinctive information assets and generate proprietary insights. For more information, please visit the Company's website at www.forian.com.

Cautionary Statements Regarding Forward-Looking Statements

This release contains forward-looking statements within the meaning of the federal securities laws, including Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. In this context, forward-looking statements often address expected future business and financial performance and financial condition, which may include GAAP and non-GAAP financial measures, and often contain words such as "expect," "anticipate," "intend," "plan," "believe," "seek," "see," "will," "would," "target," similar expressions and variations or negatives of these words. Forward-looking statements by their nature address matters that involve risks and uncertainties, many of which are beyond our control and are not guarantees of future results, such as statements about future financial and operating results, company strategy and intended product offerings and market positioning. These and other forward-looking statements are not guarantees of future results and are subject to risks, uncertainties and assumptions that could cause actual results to differ materially from those expressed in any forward-looking statements. Accordingly, there are or will be important factors that could cause actual results to differ materially from those indicated in such statements and, therefore, you should not place undue reliance on any such statements and caution must be exercised in relying on forward-looking statements. Factors that could cause actual results to differ include, but are not limited to, those risks and uncertainties associated with operations, strategy and goals, our ability to execute on our strategy and the additional risks and uncertainties set forth more fully under the caption "Risk Factors" in Forian's Annual Report on Form 10-K for the year ended December 31, 2022, as filed with the United States Securities and Exchange Commission ("SEC") on March 30, 2023, and elsewhere in Forian's filings and reports with the SEC.
Forward-looking statements contained in this release are made as of the date hereof, and we undertake no duty to publicly update or revise any forward-looking statements, whether as a result of new information, future events or otherwise, except as may be required under applicable law.

Media and Investor Contact: forian.com/investors, ir@forian.com

267-225-6263

SOURCE Forian Inc.


Use Rust’s Speed to Install Python Libraries Up to 100 Times Faster – Towards Data Science


For data scientists and Python programmers, pip needs no introduction. As a package manager, it is either the go-to solution or the starting point in our search for the best solution out there.

Trivia alert: While pip doesn't require an introduction, I have only just learned that it actually stands for "Pip Installs Packages" or "Preferred Installer Program".

pip is not the only package manager available. Below, you can find the most popular tools on the market:

And as this XKCD comic illustrates, there is always room for a new joiner!

Today, we'll look into uv, which boasts being over 100 times faster than pip. uv functions as a Python package installer, virtual environment creator, and resolver. To achieve its blazing speed, it was built in Rust. Additionally, it was designed as a drop-in replacement for pip and pip-tools workflows.

That sounds promising! Before putting it to the test, let's briefly touch upon the difference between an installer and a resolver:

Before we dive into testing, it is worth mentioning that I am conducting the tests on an M1 Mac Mini. And a general disclaimer


How to Generate Instruction Datasets from Any Documents for LLM Fine-Tuning – Towards Data Science

Generate high-quality synthetic datasets economically using lightweight libraries

Large Language Models (LLMs) are capable and general-purpose tools, but often they lack domain-specific knowledge, which is frequently stored in enterprise repositories.

Fine-tuning a custom LLM with your own data can bridge this gap, and data preparation is the first step in this process. It is also a crucial step that can significantly influence your fine-tuned model's performance.

However, manually creating datasets can be expensive and time-consuming. Another approach is leveraging an LLM to generate synthetic datasets, often using high-performance models such as GPT-4, which can turn out to be very costly.

In this article, I aim to bring to your attention a cost-efficient alternative for automating the creation of instruction datasets from various documents. This solution involves utilizing a lightweight open-source library called Bonito.

Before we dive into the Bonito library and how it works, we first need to understand what an instruction is.

An instruction is a text or prompt given to an LLM, such as Llama, GPT-4, etc. It directs the model to produce a specific kind of answer. Through instructions, people can guide the discussion, ensuring that the model's replies are relevant, helpful, and in line with what the user wants. Creating clear and precise instructions is important to achieve the desired outcome.

Bonito is an open-source model designed for conditional task generation. It can be used to create synthetic instruction tuning datasets to adapt large language models to users' specialized, private data.
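To illustrate what a single row of an instruction tuning dataset might look like, here is a small sketch. The field names (`context`, `instruction`, `response`) and the prompt layout are assumptions chosen for illustration only; they are not Bonito's actual output schema, and the helper functions are hypothetical.

```python
import json


def make_instruction_record(context, instruction, response):
    # Illustrative field names; not a schema defined by Bonito.
    return {"context": context, "instruction": instruction, "response": response}


def to_prompt(record):
    # One possible way to turn a record into the text shown to the model.
    return f"{record['instruction']}\n\nContext: {record['context']}\n\nAnswer:"


record = make_instruction_record(
    context="Bonito is an open-source model designed for conditional task generation.",
    instruction="What is Bonito designed for?",
    response="Conditional task generation.",
)

# Instruction datasets are often stored as one JSON object per line (JSONL).
jsonl_line = json.dumps(record)
prompt_text = to_prompt(record)
```

During fine-tuning, the model would see `prompt_text` as input and be trained to produce the `response` field.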


Data and Analytics Working Group to host third annual Data Engagement Conference March 20-21 – Clemson News

March 6, 2024

The third annual Data Engagement Conference, hosted by the Clemson University Data and Analytics Working Group (DAWG), will take place on March 20-21, 2024, from 8:15 a.m. to 4:30 p.m. at the Watt Family Innovation Center. This conference is open to all Clemson University employees and students interested in data science.

Keynote speaker Ellen Granberg, president of George Washington University, will offer reflections on her experiences with data-informed leadership and institutional strategy as a department chair and senior associate provost at Clemson before her roles as a provost and president. Attendees can expect a series of sessions delving into best practices, campus partnerships, and the overarching theme of leveraging data effectively to improve operations and advance the University mission.

Sponsored by the Watt Family Innovation Center and the Office of the Provost, the event promises engaging presentations tailored to address pertinent challenges and opportunities in data analytics at Clemson University.

Attendees will also have the opportunity to participate in the annual data and analytics awards presentation ceremony. Lunch will be provided for in-person participants, and a Zoom link will be available for virtual attendees.

For those interested in participating, whether in-person or virtually, the preferred due date for registration is March 13.


Free Online Data Science courses By Top firms and Universities – Analytics Insight

In the fast-paced world of technology, data science has emerged as a critical field driving innovation, decision-making, and problem-solving across industries. Recognizing the increasing demand for skilled data scientists, several top companies and universities are offering free online courses, democratizing access to valuable knowledge and skills. Whether you're a beginner looking to enter the field or a seasoned professional seeking to enhance your expertise, these courses provide a gateway to the vast and dynamic realm of data science.

The Data Science Boom: As data continues to grow exponentially, organizations worldwide are grappling with the challenge of extracting meaningful insights from this wealth of information. Data science, at the intersection of statistics, programming, and domain expertise, has become the linchpin for transforming raw data into actionable intelligence. With the rising importance of data-driven decision-making, the demand for skilled data scientists has reached unprecedented levels.

Leading Institutions Open Their Digital Doors: Major universities and tech giants are recognizing the need to nurture a global community of data scientists. To bridge the skills gap, institutions such as Stanford University, Massachusetts Institute of Technology (MIT), and Harvard University are offering free online courses, covering fundamental concepts to advanced topics. These courses are not only accessible but also structured to accommodate learners at various proficiency levels.

Google's Data Science with Google Cloud: Google, a powerhouse in the tech industry, offers the Data Science with Google Cloud specialization on Coursera. This program provides hands-on experience with Google Cloud Platform tools and technologies. From data exploration and visualization to machine learning and deployment, learners gain practical skills directly applicable to real-world scenarios. The self-paced format allows flexibility for professionals juggling work and learning commitments.

Microsoft's Data Science and Machine Learning Essentials: Microsoft's commitment to empowering learners is evident in its Data Science and Machine Learning Essentials course on edX. Designed for beginners, this course covers the basics of data manipulation, visualization, and machine learning using Microsoft Azure tools. The hands-on labs provide a practical understanding of data science workflows, making it an ideal starting point for those new to the field.

IBM's Data Science Professional Certificate: IBM, a pioneer in the tech industry, offers the Data Science Professional Certificate on Coursera. This comprehensive program covers the entire data science lifecycle, including data exploration, feature engineering, and machine learning. Learners work with real-world datasets and gain proficiency in tools like Jupyter notebooks and Apache Spark. Completing this certificate can be a stepping stone towards a career in data science.

Harvard's Data Science MicroMasters Program: Harvard University's Data Science MicroMasters Program on edX provides learners with an in-depth understanding of data science concepts and methodologies. With courses covering R programming, statistical concepts, machine learning, and more, this program is tailored for individuals seeking a rigorous and comprehensive education in data science. Completing the MicroMasters can also serve as a pathway to a full master's degree at Harvard.

Democratizing Data Science Education: The availability of free online data science courses from top institutions reflects a collective effort to democratize education and bridge the global skills gap. Aspiring data scientists, regardless of their geographical location or financial constraints, can access high-quality resources to build a strong foundation or advance their existing skills. This democratization not only benefits individuals but also contributes to a more inclusive and diverse data science community.


TARNet and Dragonnet: Causal Inference Between S- And T-Learners – Towards Data Science

Learn how to build neural networks for direct causal inference

Building machine learning models is fairly easy nowadays, but often, making good predictions is not enough. On top of that, we want to make causal statements about interventions. Knowing with high accuracy that a customer will leave our company is good, but knowing what to do about it (for example, sending a coupon) is much better. This is a bit more involved, and I explained the basics in my other article.

I recommend reading this article before you continue. I showed you how you can easily come to causal statements whenever your features form a sufficient adjustment set, which I will also assume for the rest of the article.

The estimation works using so-called meta-learners. Among them, there are the S- and the T-learners, each with their own set of disadvantages. In this article, I will show you another approach that can be seen as a tradeoff between these two meta-learners that can give you better results.

Let us assume that you have a dataset (X, t, y), where X denotes some features, t is a distinct binary treatment, and y is the outcome. Let us briefly recap how the S- and T-learners work and when they don't perform well.

If you use an S-learner, you fix a model M and train it on the dataset such that M(X, t) ≈ y. Then, you compute

Treatment Effects = M(X, 1) - M(X, 0)

and that's it.
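The recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: the model M is an ordinary least-squares linear model on a single feature x plus the treatment t, the data are synthetic with a true effect of 5, and the helper names (`solve_linear_system`, `fit_s_learner`) are made up for this example. Any regressor that takes (X, t) as input could play the role of M.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_s_learner(xs, ts, ys):
    # One single model sees the features AND the treatment: y ~ w0 + w1*x + w2*t.
    rows = [[1.0, x, float(t)] for x, t in zip(xs, ts)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    w = solve_linear_system(xtx, xty)

    def model(x, t):
        return w[0] + w[1] * x + w[2] * t

    return model


# Synthetic data (assumption): y = 2 + 3*x + 5*t, so the true treatment effect is 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ts = [0, 1, 0, 1, 1, 0]
ys = [2.0 + 3.0 * x + 5.0 * t for x, t in zip(xs, ts)]

M = fit_s_learner(xs, ts, ys)

# Treatment Effects = M(X, 1) - M(X, 0), evaluated for every row.
treatment_effects = [M(x, 1) - M(x, 0) for x in xs]
```

With this linear M the estimated effect is the same for every row, namely the coefficient on t. A more flexible M could produce effects that vary with x.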

The problem with this approach is that the model could choose to ignore the feature t completely. This typically happens if you already have hundreds of features in X, and t drowns in this noise. If this


Future of LLM application development – impact of Gemini 1.5 Pro with a 1M context window, – DataScienceCentral.com – Data Science Central

Image source https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

Current LLM applications are mostly based on LangChain or LlamaIndex. LangChain and LlamaIndex are frameworks designed for LLM development. They each cater to different use cases with unique features.

LangChain is a framework ideal for creating data-aware and agent-based applications. It offers high-level APIs for easy integration with various large language model (LLM) providers, supporting a broad range of capabilities and tool integration.

LlamaIndex focuses on indexing and retrieval of data, making it highly suitable for applications that require smart search and deep exploration of data. It features a lightweight interface for data loading and transfer and offers a list index feature for composing an index from other indexes. This functionality is useful for searching and summarizing data from heterogeneous sources, making LlamaIndex a good choice for projects centered around data retrieval and search capabilities.

The choice between LangChain and LlamaIndex depends on the specific needs of your project. If you require a broader, more versatile framework for developing complex language model applications with multiple tool integrations, LangChain might be the right choice. Conversely, if your application's core functionality revolves around efficient data search and retrieval, LlamaIndex could offer more targeted benefits.

A typical use case for LangChain involves building intelligent agents capable of integrating with various language models and external data sources. For example, you could create an application that uses natural language to interact with databases, such as a chatbot that can query a SQL database in plain English and provide users with answers based on the database's data.

A typical use case for LlamaIndex might be a custom knowledge management system where private or domain-specific documents are ingested and indexed, allowing users to perform natural language searches to find precise information within those documents. LlamaIndex, with its focus on efficient indexing and retrieval, is ideal for applications that need smart search capabilities across large volumes of data.

Now, the question is: How does this LLM application development status quo (e.g., RAG) change with the Gemini 1.5 Pro LLM and its 1M token context window?

Google recently released Gemini 1.5 Pro with a 1M context window. The ability to process 1 million tokens in one go implies the capability to process vast amounts of information, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.
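As a back-of-the-envelope check, the figures above imply roughly 0.7 words per token (700,000 words in 1,000,000 tokens). The sketch below uses that ratio to estimate whether a document would fit in the window. The ratio is only a rough approximation derived from the numbers quoted here, real tokenizers vary by language and content, and the function names and the `reserved_for_answer` parameter are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000


def estimated_tokens(word_count):
    # Rough ratio implied by the figures above: 700,000 words ~ 1,000,000 tokens,
    # i.e. about 10 tokens for every 7 words.
    return math.ceil(word_count * 10 / 7)


def fits_in_context(word_count, reserved_for_answer=0):
    # reserved_for_answer is a hypothetical budget kept free for the model's output.
    return estimated_tokens(word_count) + reserved_for_answer <= CONTEXT_WINDOW_TOKENS


small_corpus_fits = fits_in_context(250_000)
large_corpus_fits = fits_in_context(2_000_000)
```

A corpus that does not pass this check would still need some form of chunking or retrieval, which is where the RAG question discussed below comes in.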

It also means the LLM can reason over such vast amounts of information. For example, when given the 402-page transcripts from Apollo 11's mission to the moon, it can reason about conversations, events, and details found across the document.

Also, 1.5 Pro can perform highly sophisticated understanding and reasoning tasks for different modalities, including video. The same abilities can also extend to problem-solving with longer blocks of code. In this scenario, what happens to RAG?

LlamaIndex creator Jerry Liu shares his vision after working with Gemini 1.5 Pro. He believes that as tokens get cheaper, we will see a new wave of large-context LLMs in the future. While long-context LLMs will simplify certain parts of the RAG pipeline (e.g., chunking), new RAG architectures will need to evolve to cater for the new use cases arising from long-context LLMs. These could include QA over semi-structured data, over complex documents, and agentic reasoning in a multi-doc setting.

References:

https://www.llamaindex.ai/blog/towards-long-context-rag
https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/


New programs create new opportunities for Bulldogs – University of Redlands

Redlands offers more than 45 undergraduate programs and encourages students to take the time to discover and develop their academic goals. This fall, Bulldogs will have even more choices with the addition of four new majors: data science, Geographical Information Systems (GIS) B.A. and B.S., and kinesiology.

What are the main pollutants affecting air quality? Have reading levels improved in elementary school students since new standards were implemented? Are there patterns in patients that can help predict early signs of heart disease? These are just a few questions data scientists can help answer through analyzing and interpreting complex data sets.

Data science is a multidisciplinary field that utilizes scientific methods, algorithms, and statistics to understand patterns and trends. Redlands' new program offers a unique opportunity for students to bridge liberal arts with science, technology, engineering, and mathematics (STEM). Unlike other university data science programs, students are not expected to have a formal background in mathematics or computer science.

The program aims to provide students of different backgrounds and interests with a solid foundation that not only advances technical skills but also fosters ethical responsibility, diversity, and collaboration. The program utilizes tools from mathematics, statistics, and computer science to investigate questions in a wide variety of disciplines, even areas that have not traditionally been seen as data focused. Students in the program will gain experience in Python, R, and SQL (three of the top programming skills required by many employers) and acquire a myriad of skills, such as data cleaning, visualization, statistical modeling, and machine learning.

Professor Joanna Bieri, the data science program director, believes that these technical skills must be complemented by non-technical skills, such as critical thinking, effective communication, and creative problem-solving. "Data science is not just about crunching numbers but about asking meaningful questions and finding solutions within specific domains. The Redlands liberal arts environment is the ideal place to develop both the technical and non-technical skills that employers are looking for," she said.

To learn more about the Data Science program, click here.

As described by Esri, the world's leading producer of Geographical Information Systems (GIS) software, GIS is a system that creates, manages, analyzes, and maps all types of data. GIS connects data to a map, integrating location data (where things are) with all types of descriptive information (what things are like there). The U of R will offer two new GIS programs: the Bachelor of Arts in GIS (BAGIS) program and the Bachelor of Science in GIS (BSGIS). Both programs aim to create leaders in the field of GIS, but the BSGIS program differs from the BAGIS program in that the former offers students more advanced skills in GIS without the same exposure to interdisciplinary applications contained in the latter.

John Glover, a history and GIS professor, said, "We inhabit a world saturated with spatial information, from maps and charts to mobile devices that employ location-based services. It is also increasingly clear that the pressing issues confronting society, from urban planning to energy needs, natural hazards, climate change, and human health, are fundamentally spatial in nature. Linking locational and descriptive information makes it possible to visualize data in the form of maps and charts to conduct location-based analyses, revealing patterns that can inform decision making."

Continue reading to learn more about the BAGIS and BSGIS programs.

The BAGIS program is an applied, interdisciplinary curriculum that empowers students to utilize GIS. BAGIS distinguishes students by enabling them to gain a strong foundation in the application of GIS through one-on-one mentorship, real-world problem-solving, and first-hand experience in internships. BAGIS has three major points of emphasis: interdisciplinary learning, applied career-oriented spatial skills, and a complementary nature for a potential second major.

Students in the program will take elective courses to further emphasize diverse case studies and interdisciplinary application of spatial tools. Career-oriented classes showcase real-world problem solving (flood control, crime deterrence, fire management, and more), and the senior seminar helps students to identify and compete for GIS internships and jobs. Due to the streamlined and applied design, students are encouraged to complete a second major, so that students of all interests and backgrounds can be leaders in the field of GIS.

To learn more about the B.A. in GIS, click here.

The BSGIS program equips students with skills needed to proficiently identify, propose, design, and develop solutions to spatial problems, emphasizing four main skills: spatial data acquisition, data management, spatial analysis, and information presentation.

Students will strengthen their understanding of acquiring spatial data by studying real-world applications, such as image sourcing, GPS data, and coordinate calculations from field measurements. Effectively organizing substantial volumes of data is a pivotal skill in GIS, and students will actively learn how to upload, create, and retrieve various types of spatial information. Courses like Introduction to Python Programming and Applied Data Analysis with Python within the curriculum will establish a strong foundation for students to excel in creating compelling visuals for effectively communicating information and solutions to spatial problems.

To learn more about the B.S. in GIS, click here.

Physical therapy, occupational therapy, and athletic training are just a few of the growing fields in allied health. The new kinesiology program will provide students the opportunity to excel in an academically rigorous program that provides a pathway to graduate programs and career opportunities within the allied health industry.

Tom Whittemore, chair of physical education, said, "Transitioning our PE minor into a kinesiology major is something we have talked about within the PE and Athletics Department for several years. As I talked to current U of R students, it became more and more apparent that there is great interest in kinesiology and careers in the allied health professions.

"I am confident that this new program will be a great draw for prospective students with an eye toward careers in physical therapy, occupational therapy, athletic training, teaching, coaching, and strength and fitness. The timing is perfect for us to launch the kinesiology major, and we have received tremendous support from administration, faculty, and students."

Students are encouraged to have a strong background in, or a willingness to learn, biology and chemistry. Paired with various elective courses, such as biomechanics and exercise physiology, students will gain a deep understanding of the fundamentals of human movement and the applications of kinesiology to human health.

To learn more about the kinesiology program, click here.
