
AI Algorithm Essentials

Explore essential concepts from algorithm selection to performance, drawing insights from key texts in data science and machine learning.

Continuing from our previous article, this piece offers further insights into algorithmic essentials.

Algorithm Selection Cheat Sheet

A cheat sheet will guide you through the process of selecting the best algorithm for your specific needs. 

For readers unfamiliar with the term, let’s first clarify what a cheat sheet is.

A cheat sheet, or crib sheet, is a quick rundown of notes for lightning-fast reference.

For algorithms, such a sheet typically covers big O notation, arrays, linked lists, hash tables, search and sorting algorithms, and other algorithms and data structures.
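As a small illustration of why big O notation earns its place on such a sheet, the sketch below counts comparisons for a linear search, which is O(n), and a binary search, which is O(log n), over the same sorted list. The function names and the test data are assumptions made for this example only.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent. O(n)."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """Return (index, loop iterations) for a sorted list. O(log n)."""
    low, high = 0, len(sorted_items) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_items[mid] == target:
            return mid, comparisons
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons


data = list(range(1_000_000))          # already sorted
print(linear_search(data, 999_999))    # scans the whole list
print(binary_search(data, 999_999))    # needs at most about 20 halvings
```

On a million items, the linear scan touches every element in the worst case, while the binary search halves the remaining range at each step.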

1-Understanding Your Problem

In Data Science for Business, renowned data scientists Foster Provost and Tom Fawcett teach the principles of “data-analytic thinking” to help you extract useful knowledge from data and use data analysis to drive business success.

This book provides a comprehensive explanation of modern data-mining methods.

Data Science for Business illustrates these concepts with real-world business problems, drawing on an MBA course Provost has taught at New York University.

You’ll gain valuable insights on enhancing stakeholder-data scientist communication and effectively engaging in your company’s data science initiatives.

You’ll gain a solid understanding of data-analytic thinking and of how data science can improve business decision-making.

Master the art of leveraging data science to gain a competitive edge in your organization.

Treat data as a business asset that requires careful investment to realize its value. Approach business problems data-analytically, using data mining techniques to extract useful knowledge. Apply data science principles when interviewing data science job candidates.

2-Data Characteristics

In “Hidden Technical Debt in Machine Learning Systems,” D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, and Dan Dennison discuss how machine learning can help rapidly develop complex prediction systems.

The paper warns that it is dangerous to think of these quick wins as coming for free.

Using the software engineering framework of technical debt, the authors find that real-world ML systems commonly incur substantial ongoing maintenance costs.

System design should address several machine-learning-specific risks.

These risks include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.

The machine learning (ML) community has seen a disturbing trend: creating and deploying ML systems is quick and cheap but sustaining them is difficult and costly.

Ward Cunningham’s 1992 metaphor for technical debt, which illustrates the long-term effects of rapid software engineering advancement, illuminates this contradiction.

Technical debt, like fiscal debt, can be taken on for sound strategic reasons.

Not all debt is bad, but all debt needs to be serviced.

Technical debt can be paid down by refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation.

The goal is not to add new functionality but to enable future improvements, reduce errors, and improve maintainability.

Deferring these payments makes costs compound. Hidden debt is especially dangerous because it accumulates silently.

ML systems are especially prone to technical debt because they have all the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt is hard to detect because it exists at the system level rather than the code level.

Because data influences the behaviour of an ML system, it can silently erode traditional abstractions and boundaries. The usual code-level remedies for technical debt are therefore not enough to address ML-specific debt at the system level.

Rather than introducing new ML techniques, the paper aims to raise awareness of the difficult trade-offs that must be weighed in practice over the long term.

ML technical debt can accumulate quickly in system-level interactions and interfaces. ML models may silently erode abstraction boundaries. Reusing or chaining input signals is tempting, but it can unintentionally couple otherwise separate systems.

ML packages may be treated as black boxes, leading to large amounts of glue code or calibration layers that can lock in assumptions.

Changes in the external world may affect system behaviour in unintended ways. Even monitoring the behaviour of an ML system can be difficult without careful design.
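One way to make such monitoring concrete is to compare live input data against a baseline recorded at training time. The sketch below is not taken from the paper; it is an illustrative check, with a hypothetical `drifted` function, made-up feature values, and an arbitrary threshold of three standard errors.

```python
from statistics import mean, stdev


def drifted(baseline, live, threshold=3.0):
    """Flag drift when the live mean sits more than `threshold`
    standard errors away from the baseline mean (a rough heuristic)."""
    base_mean = mean(baseline)
    std_err = stdev(baseline) / len(live) ** 0.5
    return abs(mean(live) - base_mean) > threshold * std_err


# Hypothetical values of one input feature, invented for illustration.
baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
live_stable = [10.1, 9.9, 10.3, 9.7]
live_shifted = [12.0, 12.5, 11.8, 12.2]

print(drifted(baseline, live_stable))   # False: live data resembles the baseline
print(drifted(baseline, live_shifted))  # True: the feature has moved upward
```

A production system would track many features and use more robust statistics, but even a simple check like this turns a silent change in the external world into a visible alert.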

3-Algorithm Type

According to Ethem Alpaydin’s “Introduction to Machine Learning” (4th ed., MIT Press, 2020), we are currently in the era of “big data.”

There was a time when only companies had data.

That data was stored and processed in dedicated computer centres. With the arrival of personal computers and, later, wireless communication, we have all become producers of data.

We generate data every time we make a purchase, rent a movie, browse a website, write a blog, share on social media, or even just go for a walk or drive.

Each of us is not only a generator but also a consumer of data. We want products and services tailored to us, and we want our needs and interests to be predicted accurately.

Imagine a grocery chain that sells thousands of products to millions of consumers through hundreds of outlets nationwide or online.

Transaction information typically includes the date, client ID, products purchased, quantity, total money spent, and other relevant details.

This produces a substantial amount of data every day. The grocery chain wants to predict which customers are likely to buy which products in order to maximize sales and profit. Every customer, in turn, wants to find the products that best meet their needs.

This task is not straightforward. We do not know exactly which people are likely to buy a particular ice cream flavour or an author’s next book, watch a new movie, visit a city, or click on a link.

Customer behaviour changes with time and location, so understanding it means recognizing the factors that influence it.

We do know, however, that it is not a matter of chance. People don’t shop at supermarkets at random. They buy chips with beer, ice cream in summer, and Glühwein spices in winter.
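Co-purchase regularities like these can be surfaced by counting how often products appear in the same basket. The following is a minimal sketch under stated assumptions: the baskets are invented for illustration, and `pair_counts` is a hypothetical helper, not a method from the book.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"ice cream", "cones"},
    {"beer", "chips", "ice cream"},
    {"bread", "milk"},
]


def pair_counts(transactions):
    """Count how often each unordered pair of products is bought together."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('beer', 'chips') 3
```

Real association-rule mining also normalizes these counts by how popular each product is on its own, but the raw counts already expose the chips-with-beer pattern.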

The data reveal patterns. To solve a problem on a computer, we need an algorithm: a precise sequence of instructions that transforms input into output.

For example, one can devise an algorithm for sorting: the input is a set of numbers and the output is their ordered list.

For the same task there may be several algorithms, and we look for the most efficient one: the one that requires the fewest instructions, the least memory, or both.
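The sorting task described above can be sketched with insertion sort, one of the simplest sorting algorithms. It is chosen here only for readability; it needs on the order of n² steps in the worst case, whereas Python's built-in `sorted` runs in O(n log n).

```python
def insertion_sort(numbers):
    """Return a new list with the numbers in ascending order."""
    ordered = list(numbers)  # copy so the input is left untouched
    for i in range(1, len(ordered)):
        current = ordered[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room.
        while j >= 0 and ordered[j] > current:
            ordered[j + 1] = ordered[j]
            j -= 1
        ordered[j + 1] = current
    return ordered


print(insertion_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```

Both approaches produce the same output; they differ in how many instructions they execute, which is exactly the efficiency trade-off at stake when choosing an algorithm.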

In supervised learning, the aim is to learn a mapping from input to output, where the correct output values are provided by a supervisor.
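A minimal sketch of this idea is a least-squares line fit: the labelled pairs play the role of the supervisor, and the fitted line is the learned mapping. The data and the `fit_line` helper are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled examples: each input x comes with its correct output y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

slope, intercept = fit_line(xs, ys)
print(slope, intercept)           # close to 2.0 and 1.0
print(slope * 5.0 + intercept)    # prediction for an unseen input, close to 11.0
```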

In unsupervised learning there is no supervisor, only input data, and the aim is to uncover regularities in that input. The input space has a structure in which some patterns occur more often than others, and we want to determine what generally happens and what does not.
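Clustering is one common way to find such structure without labels. The sketch below runs k-means (Lloyd's algorithm) on one-dimensional data; the data values, the deterministic initialization, and the `kmeans_1d` name are assumptions chosen to keep the example short and repeatable.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm on 1-D data; returns the sorted cluster centres."""
    ordered = sorted(values)
    # Deterministic start: pick centres at evenly spaced positions in the data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            sum(members) / len(members) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
    return sorted(centres)


# Hypothetical unlabelled measurements forming two visible groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans_1d(data))  # two centres, near 1.0 and 9.0
```

No output labels were supplied; the two groups emerge from the input values alone.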

In some applications, the output of the system is a sequence of actions. In such a case, a single action is not what matters; what matters is the policy, that is, the sequence of correct actions that reaches the goal.

There is no such thing as the best action in any intermediate state; an action is good if it is part of a good policy.

To generate a policy, the machine learning program must be able to assess how good policies are and learn from past good action sequences. Such learning methods are called reinforcement learning algorithms.
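To make the idea of learning a policy from action sequences concrete, here is a minimal sketch of tabular Q-learning, one well-known reinforcement learning algorithm. The five-state corridor, the reward of 1.0 at the goal, and the values of `ALPHA` and `GAMMA` are all assumptions invented for this example.

```python
import random

GOAL = 4                 # states are 0..4; reaching state 4 ends an episode
ACTIONS = (-1, +1)       # step left or step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (arbitrary choices)


def step(state, action):
    """Deterministic corridor: reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # explore at random (off-policy)
            next_state, reward = step(state, action)
            if next_state == GOAL:
                future = 0.0              # no value beyond the terminal state
            else:
                future = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward reward + discounted future.
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: step right in every state
```

No single step is rewarded until the goal is reached, yet the learned values make every intermediate "step right" action look good because it is part of a good policy.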

4-Performance Metrics

The authors of “An Introduction to Statistical Learning,” James, Witten, Hastie, & Tibshirani (2013), provide a comprehensive overview of statistical learning, which encompasses a wide range of data analysis techniques.

These tools can be classified as supervised or unsupervised. Supervised statistical learning involves building a statistical model that predicts, or estimates, an output based on one or more inputs.

Problems of this kind arise in fields as diverse as business, medicine, astrophysics, and public policy. In unsupervised statistical learning there are inputs but no supervising output; even so, we can learn relationships and structure from such data.
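How well a supervised model estimates its output can be quantified with a performance metric. The sketch below shows two common choices, mean squared error for numeric outputs and accuracy for class labels; the example values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical regression outputs: errors of 0.5, 0.0 and 1.0.
print(mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))

# Hypothetical classification outputs: three of four labels are correct.
print(accuracy(["spam", "ham", "spam", "ham"],
               ["spam", "spam", "spam", "ham"]))  # 0.75
```

Lower is better for mean squared error and higher is better for accuracy, so the metric has to match the type of output being predicted.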

The book opens with a brief discussion of three real-world data sets that illustrate practical applications of statistical learning.

Our next article will continue from this piece with more on the essentials of AI algorithms and ML.
