
Quantum Quandary: Navigating the Path to Unbreakable Encryption – Security Boulevard

The rise of quantum computing presents a profound challenge to data security. The point at which quantum computers could break existing encryption algorithms, termed Q-Day, looms on the horizon. This quantum threat, now considered imminent rather than distant, necessitates a strategic shift towards quantum-safe solutions.

Quantum computing, with its potential to unravel established encryption standards, poses a dual dilemma. Grover's algorithm weakens symmetric key cryptosystems, while Shor's algorithm jeopardizes public key cryptosystems. The quantum threat extends beyond theoretical frameworks, as threat actors already stockpile encrypted data in "harvest now, decrypt later" attacks, anticipating future quantum decryption.
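To make the dual threat concrete, here is a small illustrative Python snippet (not from the article) showing why Grover's quadratic speedup roughly halves the effective strength of a symmetric key, while Shor's algorithm leaves no safe key size for RSA or elliptic-curve schemes:

```python
# Illustrative only: rough security margins under known quantum attacks.
# Grover's algorithm gives a quadratic speedup against symmetric ciphers,
# so an n-bit key offers roughly n/2 bits of effective strength.
symmetric_keys = {"AES-128": 128, "AES-192": 192, "AES-256": 256}

for name, bits in symmetric_keys.items():
    print(f"{name}: ~{bits // 2}-bit effective strength against Grover")

# Public-key schemes based on factoring or discrete logarithms (RSA, ECDH,
# ECDSA) are broken outright by Shor's algorithm on a large enough quantum
# computer, regardless of key size; the mitigation is migration to PQC.
```

This is the reasoning behind the common guidance to prefer AES-256 for symmetric use while migrating public-key cryptography to PQC algorithms.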

Governments and the tech industry collaborate on initiatives to usher in post-quantum cryptography (PQC), a quantum-resistant alternative. PQC algorithms, grounded in lattice problems and codes, offer resilience against quantum attacks. The US National Institute of Standards and Technology (NIST) plays a pivotal role, outlining draft PQC standards expected to become global benchmarks in 2024.

Preparing for the quantum threat demands a proactive approach. Large organizations are urged to identify priority systems, assess their susceptibility, and initiate an incremental transition to PQC for key exchange. The deployment of cryptographic agility, incorporating both traditional and PQC algorithms, emerges as a pragmatic strategy. However, transitioning to full PQC requires meticulous planning, considering security risks and potential business continuity disruptions.
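As a rough illustration of the cryptographic-agility idea (not code from the article), the sketch below derives a session key from two shared secrets at once: one from a traditional key exchange and one from a PQC key-encapsulation mechanism. The secret values, the `hkdf_extract_expand` helper, and the label string are placeholders; a real deployment would use vetted library implementations of both primitives and of HKDF.

```python
# Minimal sketch of "hybrid" key derivation for cryptographic agility:
# the session key depends on BOTH a classical and a PQC shared secret,
# so it stays secure as long as either primitive holds up.
import hashlib
import hmac

def hkdf_extract_expand(secret: bytes, info: bytes, length: int = 32) -> bytes:
    """Tiny HKDF-style (RFC 5869) derivation using SHA-256, for illustration."""
    prk = hmac.new(b"\x00" * 32, secret, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                        # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

classical_secret = b"...shared secret from e.g. ECDH..."   # placeholder
pqc_secret = b"...shared secret from a PQC KEM..."         # placeholder

# Concatenating both inputs means the derived key is safe if either survives.
session_key = hkdf_extract_expand(classical_secret + pqc_secret, b"hybrid-kex demo")
print(session_key.hex())
```

The design point is that an attacker must break both the classical exchange and the PQC exchange to recover the derived key, which is why hybrid deployments are a pragmatic first step before full PQC.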

NIST and the UK National Cyber Security Centre (NCSC) guide the way with evolving standards for quantum-safe cryptography. The UK emphasizes a wait-and-watch approach, endorsing quantum-safe solutions once standardized and interoperable algorithms emerge. Legislation and guidelines further reinforce the urgency of preparing for a quantum-safe future.

Key players like SSH Communications Security contribute to the quantum-safe journey. Their NQX solution and accompanying tools facilitate a seamless transition, ensuring critical data remains protected during the shift to quantum-safe cryptography.

As the quantum threat inches closer, businesses stand at a crossroads. Adapting to post-quantum cryptography is both a technological upgrade and a strategic imperative. The roadmap involves embracing PQC, understanding global standards, and integrating quantum-safe technologies. Safeguarding the digital future hinges on navigating the quantum threat landscape with foresight and agility, ensuring data security endures the quantum revolution.

The post Quantum Quandary: Navigating the Path to Unbreakable Encryption appeared first on Centraleyes.

*** This is a Security Bloggers Network syndicated blog from Centraleyes authored by Rebecca Kappel. Read the original post at: https://www.centraleyes.com/quantum-quandary-navigating-the-path-to-unbreakable-encryption/

See the article here:
Quantum Quandary: Navigating the Path to Unbreakable Encryption - Security Boulevard

Read More..

Fighting European Threats to Encryption: 2023 Year in Review – EFF

Private communication is a fundamental human right. In the online world, the best tool we have to defend this right is end-to-end encryption. Yet throughout 2023, politicians across Europe attempted to undermine encryption, seeking to access and scan our private messages and pictures.

But we pushed back in the EU, and so far, we've succeeded. EFF spent this year fighting hard against an EU proposal (text) that, if it became law, would have been a disaster for online privacy in the EU and throughout the world. In the name of fighting online child abuse, the European Commission, the EU's executive body, put forward a draft bill that would allow EU authorities to compel online services to scan user data and check it against law enforcement databases. The proposal would have pressured online services to abandon end-to-end encryption. The Commission even suggested using AI to rifle through people's text messages, leading some opponents to call the proposal "chat control."

EFF has been opposed to this proposal since it was unveiled last year. We joined together with EU allies and urged people to sign the "Don't Scan Me" petition. We lobbied EU lawmakers and urged them to protect their constituents' human right to have a private conversation, backed up by strong encryption.

Our message broke through. In November, a key EU committee adopted a position that bars mass scanning of messages and protects end-to-end encryption. It also bars mandatory age verification, which would have amounted to a mandate to show ID before you get online; age verification can erode a free and anonymous internet for both kids and adults.

We'll continue to monitor the EU proposal as attention shifts to the Council of the EU, the second decision-making body of the EU. Despite several Member States still supporting widespread surveillance of citizens, there are promising signs that such a measure won't get majority support in the Council.

Make no mistake: the hard-fought compromise in the European Parliament is a big victory for EFF and our supporters. The governments of the world should understand clearly: mass scanning of people's messages is wrong, and at odds with human rights.

EFF also opposed the U.K.'s Online Safety Bill (OSB), which passed and became the Online Safety Act (OSA) this October, after more than four years on the British legislative agenda. The stated goal of the OSB was to make the U.K. "the world's safest place to use the internet," but the bill's more than 260 pages actually outline a variety of ways to undermine our privacy and speech.

The OSA requires platforms to take action to prevent individuals from encountering certain illegal content, which will likely mandate the use of intrusive scanning systems. Even worse, it empowers the British government, in certain situations, to demand that online platforms use government-approved software to scan for illegal content. The U.K. government said that content will only be scanned to check for specific categories of content. In one of the final OSB debates, a representative of the government noted that orders to scan user files can be issued only where technically feasible, as determined by the U.K. communications regulator, Ofcom.

But as we've said many times, there is no middle ground to content scanning and no safe backdoor if the internet is to remain free and private. Either all content is scanned and all actors, including authoritarian governments and rogue criminals, have access, or no one does.

Despite our opposition, in which we worked closely with civil society groups in the UK, the bill passed in September with anti-encryption measures intact. But the story doesn't end here. The OSA remains vague about what exactly it requires of platforms and users alike. Ofcom must now take the OSA and, over the coming year, draft regulations to operationalize the legislation.

The public understands better than ever that government efforts to "scan it all" will always undermine encryption and prevent us from having a safe and secure internet. EFF will monitor Ofcom's drafting of the regulations, and we will continue to hold the UK government accountable to the international and European human rights protections it is a signatory to.

Go here to see the original:
Fighting European Threats to Encryption: 2023 Year in Review - EFF

Read More..

SandboxAQ Partners with Carahsoft to Expand Distribution of Cybersecurity And AI-Enabled Quantum Solutions in The … – The Quantum Insider

By Jen Sovada, President of Global Public Sector

Robert E. Williams, Head of Global Channels

Today, we're announcing a partnership with Carahsoft Technology Corp., "The Trusted Government IT Solutions Provider." As part of the agreement, Carahsoft will provide our modern cryptography management platform, Security Suite, and other AI and Quantum technology (AQ) solutions to the Public Sector via its existing contracts and network of resellers, integrators, and consultants. This builds on our early success with the U.S. Air Force, the Defense Information Systems Agency and the U.S. Dept. of Health & Human Services and will enable us to expand distribution of our solutions to help government agencies achieve their missions of today and the future.

We're starting with cybersecurity due to the predicted ability of fault-tolerant quantum computers to break today's public-key encryption protocols, which will put the world's sensitive data, communications and financial transactions at risk. Adversaries have already begun acquiring and storing encrypted data for decryption by quantum computers using Store Now, Decrypt Later (SNDL) attacks. These attacks prompted President Biden to issue an Executive Order and two National Security Memoranda (NSM-8 and NSM-10) and sign the Quantum Computing Cybersecurity Preparedness Act (H.R.7535) into law.

Given the rapidly evolving cyber and quantum threats facing public sector entities, and the President's sweeping mandates to implement post-quantum cryptography, our partnership with Carahsoft will ensure that all federal, state and local agencies have access to a trusted, modern cryptography management solution to protect our country's sensitive data, critical infrastructure, and national interests.

In addition to Security Suite, Carahsoft will also provide SandboxAQ's Simulation & Optimization solutions, which can be used to discover and develop advanced new materials, such as more effective EV batteries or lighter, stronger metal alloys; and its quantum sensing solutions, which can be used for a broad range of biomagnetic, geophysical and materials sensing applications. Currently, the U.S. Air Force is testing SandboxAQ's geomagnetic navigation system as a potential Assured Positioning, Navigation, and Timing (APNT) solution to augment the Global Positioning System (GPS).

SandboxAQ and Carahsoft will co-market these solutions via joint webinars, white papers and events, and will establish government training and upskilling courses on AI, quantum and all SandboxAQ solutions.

"Carahsoft is excited to deliver SandboxAQ's portfolio of cybersecurity and AI-enabled quantum solutions through our reseller partners to government, intelligence and defense agency contracts throughout the Public Sector," said Craig P. Abod, Carahsoft President. "Given the size and complexity of government IT infrastructure, Security Suite's ability to identify and remediate cryptographic vulnerabilities will enable the Public Sector to protect its critical systems, data and infrastructure against ever-evolving cyber threats. At the same time, SandboxAQ's quantum-based solutions have tremendous potential to positively impact a broad range of government, intelligence, law enforcement, defense and health-related agencies."

Contract information for our Simulation and Sensing solutions will be available soon.

For more information, contact the Carahsoft team at (844) 445-5688 or [emailprotected]. To learn more about SandboxAQ, visit http://www.sandboxaq.com/partners

About the Authors:

Jen Sovada, President of Global Public Sector

Colonel (Ret.) Jen Sovada is the President of SandboxAQ's Global Public Sector focused on Government issues at the nexus of quantum and AI. Prior to her position at SandboxAQ, she was the Chief Futures Officer and Senior Vice President / General Manager for the Intelligence Community (IC) start-up MissionTech Solutions. Jen's Air Force career spanned 25 years in intelligence focused on higher-end technological capabilities where she held various positions in operational test, systems interoperability, and requirements definition. She commanded the Air Force Technical Applications Center, the DoD's sole organization responsible for nuclear treaty monitoring.

Robert E. Williams, Head of Global Channels

Robert is responsible for building out partnerships globally at SandboxAQ to increase technology adoption and service levels for its existing customers, and reach new customers. Robert has spent the last 15 years in global go-to-market leadership roles in cybersecurity, cloud, and telecom. Prior to SandboxAQ, Robert was a business development and channel executive at Palo Alto Networks, Amazon Web Services, and AT&T; his most recent role was VP of Public Sector Channels & Strategic Partnerships.

Read the original:
SandboxAQ Partners with Carahsoft to Expand Distribution of Cybersecurity And AI-Enabled Quantum Solutions in The ... - The Quantum Insider

Read More..

iStorage Datashur PRO+C Encrypted Flash Drive review – protecting your personal data – The Gadgeteer

We use affiliate links. If you buy something through the links on this page, we may earn a commission at no cost to you. Learn more.

REVIEW – We all have data that we need to protect, data like passwords, tax information, financial investments, social security and passport numbers, security clearance information, medical and dental records, and so on. It's data that we don't want anyone else to see, and we really don't want it stolen and used for identity theft. We need that data to be stored external to our computer so that it can't be stolen via malware or held hostage with ransomware. We need this data encrypted, but we don't want to figure out the details of encryption ourselves. We need it to be stored on a gadget small enough to toss into a home safe when we're out of town or into a purse if we take it with us. For this kind of data, an encrypted flash drive like the iStorage Datashur PRO+C is exactly what we need.

The iStorage Datashur PRO+C is an encrypted flash drive from iStorage. It uses USB-C to connect to the computer, a built-in keypad to enter the PIN, and automatic hardware-based encryption to protect the data. iStorage is a company that develops innovative ultra-secure portable data storage devices.

The iStorage Datashur PRO+C is a svelte little gadget. It's only slightly bigger than a normal USB stick and stands out with a bright blue color. The housing and its case are both made of aluminum, which protects from water, dust, and physical damage. The top of the drive has a numeric keypad and three small LEDs. One end has the USB-C port, and the other has a wire ring for clipping on a lanyard or keychain. The entire drive has a design that says, "I mean business."

The PRO+C comes with a quick start guide that, as the name suggests, provides just enough information to get up and going. Before using the drive for the first time, I had to set up an 8- to 15-digit PIN. After inserting the gadget into a USB port (the PRO+C comes with a dongle that allows me to use a USB-A port if I need to), I followed the instructions, which include entering this PIN twice, and it was ready to go.

Using the keypad is a bit of a pain on this device. I have to press buttons while it's plugged in and do so gently so that I don't bend the port. For a personal computer like my Mac mini, this is virtually impossible, because the ports are all around back. Even for my work computer, a MacBook Pro, this is still difficult, because it sits back to one side behind my monitor. This device is easiest to use in a computer that sits near the front of your desk and has a front-facing USB port.

To make things easier for my testing, I found a USB extension cable that I had lying around; this allowed me to bring it out front and to hold the drive in my hand while entering the PIN. If you get this drive, I highly recommend spending a few extra bucks and picking up an extension cable, whether a USB-A or a USB-C.

Once I added the cable, I found that using this drive could not be any simpler. I simply connect it to the USB port of my computer, press the key button, enter my PIN, and then press the key button again. The LEDs flash and eventually, the green one turns on, letting me know that it's ready to go. At this point, the PRO+C functions just like any other flash drive. It shows up as DATASHURPPC in the sidebar of Finder, and I can copy, edit, and delete all the files that I want. When I am done, I click the eject icon in Finder, wait for the blue light to finish flashing, and then remove the drive. The LED blinks red to let me know everything is secure. Easy peasy.

For most home users, this is all you will ever need to know: how to set up a PIN and how to unlock and use the drive. There are a few other features that might be handy to know about, and so iStorage includes a digital copy of the user manual on the drive itself and on their website. A few examples include:

I tested all of these features and found they worked just as the manual describes. There is one very important detail: All three of these changes should be made when the drive is NOT connected to the computer. The PRO+C has an internal battery that provides the power necessary to make these changes.

For those who work in an office, it's possible to set two different PINs: a normal PIN for an employee and an admin PIN for administrators. This would allow an administrator to always have access to company data on the drive, even if the employee forgot the PIN, left the company, or whatever.

The buttons on the keypad are very small, approximately 5mm square. Those with larger fingers may find them a challenge to use.

The largest size PRO+C drive is 512 GB and costs $324. By comparison, a vanilla 512 GB drive is only $49.99 on Amazon, making the PRO+C drive several times more expensive. This kind of comparison, however, is not really fair, because a vanilla drive doesn't offer the same protection and capabilities. Protecting your most valuable data is worth the extra cost.

We live in a world where bad actors are constantly trying to exploit us through the misuse of our data. We need a gadget that provides great protection while being easy enough for anyone to use and highly portable. The Datashur PRO+C encrypted flash drive from iStorage is the perfect device for the average person. If your data is valuable to you and worth spending a little extra money to protect, then I commend this drive to you. If this drive isn't exactly what you're looking for, then take a look at the many other drives that iStorage sells.

Price: $129 to $324
Where to buy: iStorage online store and Amazon
Source: The sample for this review was provided by iStorage.

Excerpt from:
iStorage Datashur PRO+C Encrypted Flash Drive review protecting your personal data - The Gadgeteer

Read More..

Beyond Predictions: Uplift Modeling & the Science of Influence (Part I) – Towards Data Science

Illustration by the author
Hands-On Approach to Uplift with Tree-Based Models

Predictive analytics has long been a cornerstone of decision-making, but what if we told you there's an alternative beyond forecasting? What if you could strategically influence the outcomes instead?

Uplift modeling holds this promise. It adds an interesting dynamic layer to traditional predictions by identifying individuals whose behavior can be influenced positively if they receive special treatments.

The application use cases are endless. In medicine, it would help identify patients for whom a medical treatment could improve their health. In retail, such a model allows for better targeting of customers for whom a promotion or personalized offering would be effective in retention.

This article is the first part of a series that explores the transformative potential of uplift modeling, shedding light on how it can reshape strategies in marketing, healthcare, and beyond. It focuses on uplift models based on decision trees and uses, as a case study, the prediction of customer conversion with the application of promotional offers.

After reading this article, you will understand:

No prior knowledge is required to understand the article. The experiments described in the article were carried out using the libraries scikit-uplift, causalml and plotly. You can find the code here on GitHub.

The best way to understand the benefit of using uplift models is through an example. Imagine a scenario where a telecommunications company aims to reduce customer churn.

A traditional ML-based approach would consist of using a model trained on historical data to predict the likelihood that current customers will churn. This would help identify customers at risk.
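The excerpt ends here, but as a hedged illustration of how uplift differs from that traditional prediction model, the sketch below (not the article's GitHub code) trains a simple two-model, or T-learner, estimator with scikit-learn on synthetic data: one model for treated customers, one for untreated, with the difference in predicted conversion probabilities serving as the uplift score. Column names such as `treatment`, `converted`, `tenure`, and `monthly_spend` are invented for the example.

```python
# Minimal two-model ("T-learner") uplift sketch on synthetic data.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "tenure": rng.integers(1, 60, n),
    "monthly_spend": rng.normal(50, 15, n),
    "treatment": rng.integers(0, 2, n),        # 1 = received the offer
})
# Synthetic outcome: the offer helps high-spend customers more.
p = 0.1 + 0.05 * df["treatment"] * (df["monthly_spend"] > 55)
df["converted"] = rng.random(n) < p

features = ["tenure", "monthly_spend"]
treated, control = df[df.treatment == 1], df[df.treatment == 0]

model_t = GradientBoostingClassifier().fit(treated[features], treated["converted"])
model_c = GradientBoostingClassifier().fit(control[features], control["converted"])

# Uplift = P(convert | treated) - P(convert | not treated), per customer.
df["uplift"] = (model_t.predict_proba(df[features])[:, 1]
                - model_c.predict_proba(df[features])[:, 1])
print(df.sort_values("uplift", ascending=False).head())
```

Customers with the highest uplift scores are the "persuadables" the article describes: the ones whose behavior the treatment is most likely to change.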

Read the rest here:

Beyond Predictions: Uplift Modeling & the Science of Influence (Part I) - Towards Data Science

Read More..

Get on Track for a Data Science Career With $15 Off This Bundle – PCMag

A career in data science can be lucrative, but it takes more than a few advanced math classes. Those who excel in this field might be expected to work with and even create the next generation's AI algorithms, so they need to know their way around software like Excel and Python. The Complete Excel, VBA and Data Science Certification Training Bundle is a great resume builder when it comes to big data, and it's now on sale for $34.97 for a limited time.

In this bundle, you'll find more than 50 hours of training on the software and code that data scientists use the most. That includes code-free platforms like Amazon Honeycode as well as Python and Excel, and beginners will find intro courses on them all that will get them ready for the headier concepts in later, more advanced lessons. At the end of each, a certification from Mammoth Interactive will let you show your newfound knowledge to potential employers.

Get the Complete Excel, VBA and Data Science Certification Training Bundle for $34.97 (reg. $49.99) through Jan. 1.

Prices subject to change. PCMag editors select and review products independently. If you buy through StackSocial affiliate links, we may earn commissions, which help support our testing.

Sign up for our expertly curated Daily Deals newsletter for the best bargains you'll find anywhere.

This newsletter may contain advertising, deals, or affiliate links. Subscribing to a newsletter indicates your consent to our Terms of Use and Privacy Policy. You may unsubscribe from the newsletters at any time.

Read more:

Get on Track for a Data Science Career With $15 Off This Bundle - PCMag

Read More..

Geospatial Indexing Explained: A Comparison of Geohash, S2, and H3 – Towards Data Science

Geospatial indexing, or Geocoding, is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data.

Though the first popular geospatial indexing technique, Geohash, was invented as recently as 2008, indexing latitude-longitude pairs to manageable subdivisions of space is hardly a new concept. Governments have been breaking up their land into states, provinces, counties, and postal codes for centuries for all sorts of applications, such as taking censuses and aggregating votes for elections.

Rather than using the manual techniques used by governments, we data scientists use modern computational techniques to execute such spatial subdividing, and we do so for our own purposes: analytics, feature-engineering, granular AB testing by geographic subdivision, indexing geospatial databases, and more.

Geospatial indexing is a thoroughly developed area of computer science, and geospatial indexing tools can bring a lot of power and richness to our models and analyses. What makes geospatial indexing techniques further exciting, is that a look under their proverbial hoods reveals eclectic amalgams of other mathematical tools, such as space-filling curves, map projections, tessellations, and more!

This post will explore three of today's most popular geospatial indexing tools: where they come from, how they work, what makes them different from one another, and how you can get started using them. In chronological order, and from least to greatest complexity, we'll look at:

It will conclude by comparing these tools, and recommending when you might want to use one over another.

Before getting started, note that these tools include much functionality beyond basic geospatial indexing: polygon intersection, polygon containment checks, line containment checks, generating cell-coverings of geographical spaces, retrieval of geospatially indexed cells' neighbors, and more. This post, however, focuses strictly on geospatial indexing functionality.

Geohash, invented in 2008 by Gustavo Niemeyer, is the earliest created geospatial indexing tool [1]. It enables its users to map
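The excerpt is cut off here, but the basic Geohash encoding it introduces (mapping a latitude-longitude pair to a short base-32 string by alternately bisecting longitude and latitude and interleaving the resulting bits) can be sketched in a few lines. This is illustrative only; production code should use an established Geohash library.

```python
# Bare-bones Geohash encoder: bisect lon/lat alternately, interleave the
# bits, and emit one base-32 character per 5 bits.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_interval, lon_interval = [-90.0, 90.0], [-180.0, 180.0]
    use_lon, bits, bit_count, result = True, 0, 0, ""
    while len(result) < precision:
        interval = lon_interval if use_lon else lat_interval
        value = lon if use_lon else lat
        mid = (interval[0] + interval[1]) / 2
        bits <<= 1
        if value >= mid:
            bits |= 1
            interval[0] = mid        # keep the upper half
        else:
            interval[1] = mid        # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # every 5 bits map to one base-32 character
            result += _BASE32[bits]
            bits, bit_count = 0, 0
    return result

print(geohash_encode(48.8583, 2.2945))   # central Paris -> starts with "u09"
```

Nearby points share long string prefixes, which is what makes Geohash useful as a database index key.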

Continue reading here:

Geospatial Indexing Explained: A Comparison of Geohash, S2, and H3 - Towards Data Science

Read More..

Ted Kaouk, John Coughlan Take on Leadership Roles at CFTC Division of Data – Executive Gov

The Commodity Futures Trading Commission has expanded the leadership team of its Division of Data, or DOD, with the appointment of Ted Kaouk as chief data officer and director of DOD and John Coughlan as chief data scientist.

CFTC said Kaouk will oversee the DOD's data integration efforts and facilitate collaboration among offices and divisions to help the CFTC leadership make informed policy decisions.

He was chief data officer at the Office of Personnel Management and oversaw the development of OPM's inaugural human capital data strategy.

Kaouk also held the same position at the Department of Agriculture and served as the first chair of the Federal Chief Data Officers Council.

Meanwhile, Coughlan has held data science and analytical roles at CFTC for eight years. Before joining the DOD, he was a market analyst within the Division of Market Oversight's Market Intelligence Branch.

In his new role, Coughlan will help advance the DOD's data science capabilities and drive the adoption of artificial intelligence-powered tools across the agency.

Read more here:

Ted Kaouk, John Coughlan Take on Leadership Roles at CFTC Division of Data - Executive Gov

Read More..

18 Data Science Tools to Consider Using in 2024 – TechTarget

The increasing volume and complexity of enterprise data, and its central role in decision-making and strategic planning, are driving organizations to invest in the people, processes and technologies they need to gain valuable business insights from their data assets. That includes a variety of tools commonly used in data science applications.

In an annual survey conducted by consulting firm Wavestone's NewVantage Partners unit, 87.8% of chief data officers and other IT and business executives from 116 large organizations said their investments in data and analytics initiatives, such as data science programs, increased during 2022. Looking ahead, 83.9% expect further increases this year despite the current economic conditions, according to a report on the Data and Analytics Leadership Executive Survey that was published in January 2023.

The survey also found that 91.9% of the responding organizations got measurable business value from their data and analytics investments in 2022 and that 98.2% expect their planned 2023 spending to pay off. Many strategic analytics goals remain aspirational, though: Only 40.8% of the respondents said they're competing on data and analytics, and just 23.9% have created a data-driven organization.

As data science teams build their portfolios of enabling technologies to help achieve those analytics goals, they can choose from a wide selection of tools and platforms. Here's a rundown of 18 top data science tools that may be able to aid you in the analytics process, listed in alphabetical order with details on their features and capabilities -- and some potential limitations.

Apache Spark is an open source data processing and analytics engine that can handle large amounts of data -- upward of several petabytes, according to proponents. Spark's ability to rapidly process data has fueled significant growth in the use of the platform since it was created in 2009, helping to make the Spark project one of the largest open source communities among big data technologies.

Due to its speed, Spark is well suited for continuous intelligence applications powered by near-real-time processing of streaming data. However, as a general-purpose distributed processing engine, Spark is equally suited for extract, transform and load uses and other SQL batch jobs. In fact, Spark initially was touted as a faster alternative to the MapReduce engine for batch processing in Hadoop clusters.

Spark is still often used with Hadoop but can also run standalone against other file systems and data stores. It features an extensive set of developer libraries and APIs, including a machine learning library and support for key programming languages, making it easier for data scientists to quickly put the platform to work.
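As a hedged, minimal example of the batch-style aggregation work described above, the sketch below assumes a local PySpark installation and a hypothetical events.csv file with event_date and user_id columns:

```python
# Minimal PySpark aggregation sketch; file path and column names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-demo").getOrCreate()

df = spark.read.csv("events.csv", header=True, inferSchema=True)

daily = (df.groupBy("event_date")
           .agg(F.count(F.lit(1)).alias("events"),
                F.countDistinct("user_id").alias("users")))
daily.orderBy("event_date").show(10)

spark.stop()
```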

Another open source tool, D3.js is a JavaScript library for creating custom data visualizations in a web browser. Commonly known as D3, which stands for Data-Driven Documents, it uses web standards, such as HTML, Scalable Vector Graphics and CSS, instead of its own graphical vocabulary. D3's developers describe it as a dynamic and flexible tool that requires a minimum amount of effort to generate visual representations of data.

D3.js lets visualization designers bind data to documents via the Document Object Model and then use DOM manipulation methods to make data-driven transformations to the documents. First released in 2011, it can be used to design various types of data visualizations and supports features such as interaction, animation, annotation and quantitative analysis.

D3 includes more than 30 modules and 1,000 visualization methods, making it complicated to learn. In addition, many data scientists don't have JavaScript skills. As a result, they may be more comfortable with a commercial visualization tool, like Tableau, leaving D3 to be used more by data visualization developers and specialists who are also members of data science teams.

IBM SPSS is a family of software for managing and analyzing complex statistical data. It includes two primary products: SPSS Statistics, a statistical analysis, data visualization and reporting tool, and SPSS Modeler, a data science and predictive analytics platform with a drag-and-drop UI and machine learning capabilities.

SPSS Statistics covers every step of the analytics process, from planning to model deployment, and enables users to clarify relationships between variables, create clusters of data points, identify trends and make predictions, among other capabilities. It can access common structured data types and offers a combination of a menu-driven UI, its own command syntax and the ability to integrate R and Python extensions, plus features for automating procedures and import-export ties to SPSS Modeler.

Created by SPSS Inc. in 1968, initially with the name Statistical Package for the Social Sciences, the statistical analysis software was acquired by IBM in 2009, along with the predictive modeling platform, which SPSS had previously bought. While the product family is officially called IBM SPSS, the software is still usually known simply as SPSS.

Julia is an open source programming language used for numerical computing, as well as machine learning and other kinds of data science applications. In a 2012 blog post announcing Julia, its four creators said they set out to design one language that addressed all of their needs. A big goal was to avoid having to write programs in one language and convert them to another for execution.

To that end, Julia combines the convenience of a high-level dynamic language with performance that's comparable to statically typed languages, such as C and Java. Users don't have to define data types in programs, but an option allows them to do so. The use of a multiple dispatch approach at runtime also helps to boost execution speed.

Julia 1.0 became available in 2018, nine years after work began on the language; the latest version is 1.9.4, with a 1.10 update now available for release candidate testing. The documentation for Julia notes that, because its compiler differs from the interpreters in data science languages like Python and R, new users "may find that Julia's performance is unintuitive at first." But, it claims, "once you understand how Julia works, it's easy to write code that's nearly as fast as C."

An open source web application, Jupyter Notebook enables interactive collaboration among data scientists, data engineers, mathematicians, researchers and other users. It's a computational notebook tool that can be used to create, edit and share code, as well as explanatory text, images and other information. For example, Jupyter users can add software code, computations, comments, data visualizations and rich media representations of computation results to a single document, known as a notebook, which can then be shared with and revised by colleagues.

As a result, notebooks "can serve as a complete computational record" of interactive sessions among the members of data science teams, according to Jupyter Notebook's documentation. The notebook documents are JSON files that have version control capabilities. In addition, a Notebook Viewer service enables them to be rendered as static webpages for viewing by users who don't have Jupyter installed on their systems.

Jupyter Notebook's roots are in the programming language Python -- it originally was part of the IPython interactive toolkit open source project before being split off in 2014. The loose combination of Julia, Python and R gave Jupyter its name; along with supporting those three languages, Jupyter has modular kernels for dozens of others. The open source project also includes JupyterLab, a newer web-based UI that's more flexible and extensible than the original one.

Keras is a programming interface that enables data scientists to more easily access and use the TensorFlow machine learning platform. It's an open source deep learning API and framework written in Python that runs on top of TensorFlow and is now integrated into that platform. Keras previously supported multiple back ends but was tied exclusively to TensorFlow starting with its 2.4.0 release in June 2020.

As a high-level API, Keras was designed to drive easy and fast experimentation that requires less coding than other deep learning options. The goal is to accelerate the implementation of machine learning models -- in particular, deep learning neural networks -- through a development process with "high iteration velocity," as the Keras documentation puts it.

The Keras framework includes a sequential interface for creating relatively simple linear stacks of layers with inputs and outputs, as well as a functional API for building more complex graphs of layers or writing deep learning models from scratch. Keras models can run on CPUs or GPUs and be deployed across multiple platforms, including web browsers and Android and iOS mobile devices.
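A minimal sketch of the sequential interface described above, using synthetic data and placeholder layer sizes:

```python
# Small Keras Sequential model on synthetic data (shapes are placeholders).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("int32")   # synthetic binary labels

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))     # [loss, accuracy]
```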

Developed and sold by software vendor MathWorks since 1984, Matlab is a high-level programming language and analytics environment for numerical computing, mathematical modeling and data visualization. It's primarily used by conventional engineers and scientists to analyze data, design algorithms and develop embedded systems for wireless communications, industrial control, signal processing and other applications, often in concert with a companion Simulink tool that offers model-based design and simulation capabilities.

While Matlab isn't as widely used in data science applications as languages like Python, R and Julia, it does support machine learning and deep learning, predictive modeling, big data analytics, computer vision and other work done by data scientists. Data types and high-level functions built into the platform are designed to speed up exploratory data analysis and data preparation in analytics applications.

Considered relatively easy to learn and use, Matlab -- which is short for matrix laboratory -- includes prebuilt applications but also enables users to build their own. It also has a library of add-on toolboxes with discipline-specific software and hundreds of built-in functions, including the ability to visualize data in 2D and 3D plots.

Matplotlib is an open source Python plotting library that's used to read, import and visualize data in analytics applications. Data scientists and other users can create static, animated and interactive data visualizations with Matplotlib, using it in Python scripts, the Python and IPython shells, Jupyter Notebook, web application servers and various GUI toolkits.

The library's large code base can be challenging to master, but it's organized in a hierarchical structure that's designed to enable users to build visualizations mostly with high-level commands. The top component in the hierarchy is pyplot, a module that provides a "state-machine environment" and a set of simple plotting functions similar to the ones in Matlab.

First released in 2003, Matplotlib also includes an object-oriented interface that can be used together with pyplot or on its own; it supports low-level commands for more complex data plotting. The library is primarily focused on creating 2D visualizations but offers an add-on toolkit with 3D plotting features.
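A short example in the pyplot "state-machine" style described above; the data and output file name are placeholders:

```python
# Quick pyplot example: two line plots with labels, legend and saved output.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)", linestyle="--")
plt.title("pyplot quick start")
plt.xlabel("x")
plt.ylabel("value")
plt.legend()
plt.savefig("quickstart.png", dpi=150)   # or plt.show() in an interactive session
```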

Short for Numerical Python, NumPy is an open source Python library that's used widely in scientific computing, engineering, and data science and machine learning applications. The library consists of multidimensional array objects and routines for processing those arrays to enable various mathematical and logic functions. It also supports linear algebra, random number generation and other operations.

One of NumPy's core components is the N-dimensional array, or ndarray, which represents a collection of items that are the same type and size. An associated data-type object describes the format of the data elements in an array. The same data can be shared by multiple ndarrays, and data changes made in one can be viewed in another.

NumPy was created in 2006 by combining and modifying elements of two earlier libraries. The NumPy website touts it as "the universal standard for working with numerical data in Python," and it is generally considered one of the most useful libraries for Python because of its numerous built-in functions. It's also known for its speed, partly resulting from the use of optimized C code at its core. In addition, various other Python libraries are built on top of NumPy.
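A brief sketch of the ndarray behaviors described above, including broadcasting and the shared memory between an array and a view of it:

```python
# ndarray basics: fixed-type N-dimensional data with vectorized operations.
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)   # 3x4 array
print(a.shape, a.dtype)                             # (3, 4) float64

b = a.T @ a                     # linear algebra: 4x4 matrix product
centered = a - a.mean(axis=0)   # broadcasting subtracts column means

view = a[0]                     # a view shares memory with `a`...
view[:] = 0                     # ...so this change is visible in `a`
print(a[0], b.shape, centered.std())
```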

Another popular open source Python library, pandas typically is used for data analysis and manipulation. Built on top of NumPy, it features two primary data structures: the Series one-dimensional array and the DataFrame, a two-dimensional structure for data manipulation with integrated indexing. Both can accept data from NumPy ndarrays and other inputs; a DataFrame can also incorporate multiple Series objects.

Created in 2008, pandas has built-in data visualization capabilities, exploratory data analysis functions and support for file formats and languages that include CSV, SQL, HTML and JSON. Additionally, it provides features such as intelligent data alignment, integrated handling of missing data, flexible reshaping and pivoting of data sets, data aggregation and transformation, and the ability to quickly merge and join data sets, according to the pandas website.

The developers of pandas say their goal is to make it "the fundamental high-level building block for doing practical, real-world data analysis in Python." Key code paths in pandas are written in C or the Cython superset of Python to optimize its performance, and the library can be used with various kinds of analytical and statistical data, including tabular, time series and labeled matrix data sets.
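A small sketch of the Series/DataFrame workflow described above, using an invented sales table:

```python
# DataFrame basics: construction, aggregation, reshaping, missing data.
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "sales": [120, 90, 150, 80],
})

print(df.dtypes)                                   # column types
print(df.groupby("city")["sales"].sum())           # aggregation
pivot = df.pivot_table(index="city", columns="month", values="sales")
print(pivot.fillna(0))                             # handle missing combinations
```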

Python is the most widely used programming language for data science and machine learning and one of the most popular languages overall. The Python open source project's website describes it as "an interpreted, object-oriented, high-level programming language with dynamic semantics," as well as built-in data structures and dynamic typing and binding capabilities. The site also touts Python's simple syntax, saying it's easy to learn and its emphasis on readability reduces the cost of program maintenance.

The multipurpose language can be used for a wide range of tasks, including data analysis, data visualization, AI, natural language processing and robotic process automation. Developers can create web, mobile and desktop applications in Python, too. In addition to object-oriented programming, it supports procedural, functional and other types, plus extensions written in C or C++.

Python is used not only by data scientists, programmers and network engineers, but also by workers outside of computing disciplines, from accountants to mathematicians and scientists, who often are drawn to its user-friendly nature. Python 2.x and 3.x are both production-ready versions of the language, although support for the 2.x line ended in 2020.

An open source framework used to build and train deep learning models based on neural networks, PyTorch is touted by its proponents for supporting fast and flexible experimentation and a seamless transition to production deployment. The Python library was designed to be easier to use than Torch, a precursor machine learning framework that's based on the Lua programming language. PyTorch also provides more flexibility and speed than Torch, according to its creators.

First released publicly in 2017, PyTorch uses arraylike tensors to encode model inputs, outputs and parameters. Its tensors are similar to the multidimensional arrays supported by NumPy, but PyTorch adds built-in support for running models on GPUs. NumPy arrays can be converted into tensors for processing in PyTorch, and vice versa.

The library includes various functions and techniques, including an automatic differentiation package called torch.autograd and a module for building neural networks, plus a TorchServe tool for deploying PyTorch models and deployment support for iOS and Android devices. In addition to the primary Python API, PyTorch offers a C++ one that can be used as a separate front-end interface or to create extensions to Python applications.
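A compact sketch of tensors plus torch.autograd, fitting a linear model on synthetic data; all values here are placeholders:

```python
# Tensors and autograd: a hand-rolled gradient-descent loop.
import torch

x = torch.randn(64, 3)                               # synthetic inputs
y = x @ torch.tensor([[2.0], [-1.0], [0.5]]) + 0.1 * torch.randn(64, 1)

w = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(100):
    pred = x @ w + b
    loss = ((pred - y) ** 2).mean()
    loss.backward()                                  # autograd fills w.grad, b.grad
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(loss.item())
```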

The R programming language is an open source environment designed for statistical computing and graphics applications, as well as data manipulation, analysis and visualization. Many data scientists, academic researchers and statisticians use R to retrieve, cleanse, analyze and present data, making it one of the most popular languages for data science and advanced analytics.

The open source project is supported by The R Foundation, and thousands of user-created packages with libraries of code that enhance R's functionality are available -- for example, ggplot2, a well-known package for creating graphics that's part of a collection of R-based data science tools called tidyverse. In addition, multiple vendors offer integrated development environments and commercial code libraries for R.

R is an interpreted language, like Python, and has a reputation for being relatively intuitive. It was created in the 1990s as an alternative version of S, a statistical programming language that was developed in the 1970s; R's name is both a play on S and a reference to the first letter of the names of its two creators.

SAS is an integrated software suite for statistical analysis, advanced analytics, BI and data management. Developed and sold by software vendor SAS Institute Inc., the platform enables users to integrate, cleanse, prepare and manipulate data; then they can analyze it using different statistical and data science techniques. SAS can be used for various tasks, from basic BI and data visualization to risk management, operational analytics, data mining, predictive analytics and machine learning.

The development of SAS started in 1966 at North Carolina State University; use of the technology began to grow in the early 1970s, and SAS Institute was founded in 1976 as an independent company. The software was initially built for use by statisticians -- SAS was short for Statistical Analysis System. But, over time, it was expanded to include a broad set of functionality and became one of the most widely used analytics suites in both commercial enterprises and academia.

Development and marketing are now focused primarily on SAS Viya, a cloud-based version of the platform that was launched in 2016 and redesigned to be cloud-native in 2020.

Scikit-learn is an open source machine learning library for Python that's built on the SciPy and NumPy scientific computing libraries, plus Matplotlib for plotting data. It supports both supervised and unsupervised machine learning and includes numerous algorithms and models, called estimators in scikit-learn parlance. Additionally, it provides functionality for model fitting, selection and evaluation, and data preprocessing and transformation.

Initially called scikits.learn, the library started as a Google Summer of Code project in 2007, and the first public release became available in 2010. The first part of its name is short for SciPy toolkit and is also used by other SciPy add-on packages. Scikit-learn primarily works on numeric data that's stored in NumPy arrays or SciPy sparse matrices.

The library's suite of tools also enables various other tasks, such as data set loading and the creation of workflow pipelines that combine data transformer objects and estimators. But scikit-learn has some limits due to design constraints. For example, it doesn't support deep learning, reinforcement learning or GPUs, and the library's website says its developers "only consider well-established algorithms for inclusion."
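A minimal example of the estimator pattern and a preprocessing-plus-model pipeline, using the iris dataset bundled with the library:

```python
# fit/predict with a scikit-learn pipeline combining a transformer and a model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)              # every estimator exposes fit()
print(clf.score(X_test, y_test))       # accuracy on held-out data
```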

SciPy is another open source Python library that supports scientific computing uses. Short for Scientific Python, it features a set of mathematical algorithms and high-level commands and classes for data manipulation and visualization. It includes more than a dozen subpackages that contain algorithms and utilities for functions such as data optimization, integration and interpolation, as well as algebraic equations, differential equations, image processing and statistics.

The SciPy library is built on top of NumPy and can operate on NumPy arrays. But SciPy delivers additional array computing tools and provides specialized data structures, including sparse matrices and k-dimensional trees, to extend beyond NumPy's capabilities.

SciPy actually predated NumPy: It was created in 2001 by combining different add-on modules built for the Numeric library that was one of NumPy's predecessors. Like NumPy, SciPy uses compiled code to optimize performance; in its case, most of the performance-critical parts of the library are written in C, C++ or Fortran.
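A short sketch touching two of the subpackages mentioned above, optimization and sparse matrices:

```python
# scipy.optimize and scipy.sparse in a few lines.
import numpy as np
from scipy import optimize, sparse

# Minimize the Rosenbrock test function from a rough starting guess.
result = optimize.minimize(optimize.rosen, x0=np.array([1.3, 0.7, 0.8]))
print(result.x)          # close to [1, 1, 1]

# A sparse matrix stores only the nonzero entries.
m = sparse.csr_matrix(np.eye(1000) * 2.0)
print(m.nnz, m.shape)    # 1000 nonzeros instead of 1,000,000 dense values
```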

TensorFlow is an open source machine learning platform developed by Google that's particularly popular for implementing deep learning neural networks. The platform takes inputs in the form of tensors that are akin to NumPy multidimensional arrays and then uses a graph structure to flow the data through a list of computational operations specified by developers. It also offers an eager execution programming environment that runs operations individually without graphs, which provides more flexibility for research and debugging machine learning models.

Google made TensorFlow open source in 2015, and Release 1.0.0 became available in 2017. TensorFlow uses Python as its core programming language and now incorporates the Keras high-level API for building and training models. Alternatively, a TensorFlow.js library enables model development in JavaScript, and custom operations -- or ops, for short -- can be built in C++.

The platform also includes a TensorFlow Extended module for end-to-end deployment of production machine learning pipelines, plus a TensorFlow Lite one for mobile and IoT devices. TensorFlow models can be trained and run on CPUs, GPUs and Google's special-purpose Tensor Processing Units.
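A brief sketch of eager execution and automatic differentiation with GradientTape; the values are synthetic:

```python
# Eager TensorFlow: tensors, a variable, and one gradient-descent step.
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.random.normal((2, 1)))

with tf.GradientTape() as tape:          # ops run immediately in eager mode
    y = tf.matmul(x, w)
    loss = tf.reduce_mean(tf.square(y - 1.0))

grad = tape.gradient(loss, w)            # automatic differentiation
w.assign_sub(0.1 * grad)                 # one update step
print(loss.numpy(), w.numpy().ravel())
```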

Weka is an open source workbench that provides a collection of machine learning algorithms for use in data mining tasks. Weka's algorithms, called classifiers, can be applied directly to data sets without any programming via a GUI or a command-line interface that offers additional functionality; they can also be implemented through a Java API.

The workbench can be used for classification, clustering, regression, and association rule mining applications and also includes a set of data preprocessing and visualization tools. In addition, Weka supports integration with R, Python, Spark and other libraries like scikit-learn. For deep learning uses, an add-on package combines it with the Eclipse Deeplearning4j library.

Weka is free software licensed under the GNU General Public License. It was developed at the University of Waikato in New Zealand starting in 1992; an initial version was rewritten in Java to create the current workbench, which was first released in 1999. Weka stands for the Waikato Environment for Knowledge Analysis and is also the name of a flightless bird native to New Zealand that the technology's developers say has "an inquisitive nature."

Commercially licensed platforms that provide integrated functionality for machine learning, AI and other data science applications are also available from numerous software vendors. The product offerings are diverse -- they include machine learning operations hubs, automated machine learning platforms and full-function analytics suites, with some combining MLOps, AutoML and analytics capabilities. Many platforms incorporate some of the data science tools listed above.

Matlab and SAS can also be counted among the data science platforms. Other prominent platform options for data science teams include the following technologies:

Some platforms are also available in free open source or community editions -- examples include Dataiku and H2O. Knime combines an open source analytics platform with a commercial Knime Hub software package that supports team-based collaboration and workflow automation, deployment and management.

Mary K. Pratt is an award-winning freelance journalist with a focus on covering enterprise IT and cybersecurity management.

Editor's note: This unranked list of data science tools is based on web research from sources such as Capterra, G2 and Gartner as well as vendor websites.

Excerpt from:

18 Data Science Tools to Consider Using in 2024 - TechTarget

Read More..

Growing Significance of Data Science in the Logistics Industry – CIO Applications

Data science is helping logistics professionals streamline operations and pave the way for a more resilient, customer-focused future in the field of logistics.

FREMONT, CA: Technological advancements and rapid digital transformation have transformed industries across the board, which are now harnessing the power of data to optimize operations and gain competitive advantages. The logistics sector stands out as a prime beneficiary of the burgeoning field of Data Science. The need for data-driven decision-making has never been greater as supply chains become more complex and globalized. The mounting importance of data science in logistics is transforming the movement, storage, and management of goods.

The applications of Data Science in logistics are vast, from demand forecasting and route optimization to risk management and customer-centric solutions. Traditionally, logistics management relied heavily on experience and intuition. With the advent of Data Science, this paradigm has shifted towards a more analytical and data-centric approach. Advanced algorithms and machine learning models are now employed to process vast amounts of data generated by various facets of the supply chain, including transportation, inventory management, demand forecasting, and route optimization.

Enhanced demand forecasting

Effective logistics management requires accurate demand forecasting. By leveraging historical data, market trends, and other relevant variables, Data Science empowers logistics professionals to make precise predictions about future demand patterns. It enables them to optimize inventory levels, reduce excess stock, and meet customer expectations with greater precision.

Optimal route planning and fleet management

Efficient transportation is fundamental to the success of any logistics operation. Data Science plays a pivotal role in this aspect by optimizing route planning and fleet management. Algorithms can optimize routes based on traffic conditions, weather forecasts, and real-time updates. Predictive maintenance models help prevent unexpected breakdowns, ensuring that fleets operate at peak efficiency.

Inventory optimization

It is a delicate balance to maintain the right level of inventory. Too much can lead to excessive carrying costs, while too little can result in stockouts and missed opportunities. Data Science algorithms analyze historical sales data, supplier lead times, and seasonal trends to determine the optimal inventory levels for each product. It reduces holding costs and ensures that products are readily available when customers demand them.
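As a hedged illustration of the kind of calculation such inventory algorithms automate, the snippet below computes a classic reorder point with safety stock; all demand and lead-time figures are assumed for the example and are not from the article:

```python
# Simple reorder point with safety stock (all inputs are assumed values).
import math

avg_daily_demand = 40      # units per day, estimated from historical sales
demand_std = 12            # standard deviation of daily demand
lead_time_days = 5         # supplier lead time
z = 1.65                   # service-level factor (~95% service level)

safety_stock = z * demand_std * math.sqrt(lead_time_days)
reorder_point = avg_daily_demand * lead_time_days + safety_stock
print(round(safety_stock), round(reorder_point))   # when on-hand stock hits this, reorder
```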

Warehouse efficiency and layout design

The layout and operations of warehouses directly affect order fulfillment speed and accuracy. Data-driven insights are employed to design efficient warehouse layouts, minimizing travel distances and maximizing storage capacity. Predictive analytics can anticipate spikes in demand, enabling proactive adjustments to staffing levels and workflows.

Risk management and resilience

The logistics industry is no stranger to disruptions, whether they be natural disasters, geopolitical events, or global health crises. Data Science equips logistics professionals with the tools to assess and mitigate risks effectively. Businesses can ensure business continuity despite adversity by analyzing historical data and employing predictive modeling.

Customer-centric solutions

Data Science enables logistics providers to offer personalized and responsive services. By analyzing customer preferences, order histories, and feedback, businesses can tailor their services accordingly. It enhances customer satisfaction, fosters brand loyalty, and generates positive word-of-mouth.

Original post:

Growing Significance of Data Science in the Logistics Industry - CIO Applications

Read More..