Artificial Intelligence Explained
Artificial intelligence (AI) refers to the theory and development of computer systems that simulate human intelligence to make decisions and perform tasks.
Artificial Intelligence (AI) or machine intelligence (MI) is defined by Techopedia as “a branch of computer science that focuses on building and managing technology that can learn to autonomously make decisions and carry out actions on behalf of a human being”.1 However, this definition is too general to serve as a blanket definition of what AI technology encompasses. AI isn’t one type of technology; it's a broad term that can be applied to a myriad of hardware or software technologies, which are often leveraged in support of machine learning (ML), natural language processing (NLP), natural language understanding (NLU), and computer vision (CV).
Oracle Cloud Infrastructure states that “In the simplest terms, AI [refers] to systems or machines that mimic human intelligence to perform tasks and can iteratively improve themselves based on the information they collect”2 and that “AI manifests in a number of forms”.2
Compared to Techopedia’s and Oracle’s broad and general definitions, IBM’s definition of AI is more specific. IBM states that AI “leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind”.3 Focusing on these ideas of “creating intelligence” and of “machines understanding human intelligence”, IBM continues its definition of AI by citing John McCarthy’s article What is Artificial Intelligence?. McCarthy states that “[AI] is [the] science and engineering of making intelligent machines, especially intelligent computer programs. [AI is similar to] using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable”.4
Reaching back about 55 years before McCarthy’s paper, IBM expands its exploration of AI by referencing Alan Turing, the mathematician, cryptanalyst, and iconic “father of computer science”. Turing devised the eponymous Turing test, which is meant to distinguish between a human and a computer, and his 1950 paper Computing Machinery and Intelligence helped lay the foundation for modern computer science and AI. It’s arguably the first paper to pose the question “Can machines think?”4
SAS notes that “[the 1950’s initial] AI research [explored] problem solving and symbolic methods [and the 1960’s research explored] training computers to mimic basic human reasoning”.6 This early work paved the way for the automation and formal reasoning that we see in computers today, including decision support systems and smart search systems that can be designed to complement and augment human abilities.
In comparison to Turing, IBM presents Stuart Russell and Peter Norvig’s textbook Artificial Intelligence: A Modern Approach as a more recent analog to Turing’s AI philosophy. Originally published in 1994, with its most recent edition published in April 2020, Artificial Intelligence: A Modern Approach has become one of the most popular resources for studying AI.
In this textbook, Russell and Norvig explore the different versions of AI, stating that “[Some versions] have defined intelligence in terms of fidelity to human performance, while others prefer an abstract, formal definition of intelligence called rationality” 7 and defining rationality as “doing the right thing”.7 Continuing their exploration of AI, the authors note that intelligence could also “be a property of internal thought processes and reasoning, [or] intelligent behavior".7
Russell and Norvig break these lines of thought into the two dimensions of “human vs. rational and thought vs. behavior”.7 With four potential combinations for exploring AI, IBM presents these approaches as the following:
Human Approach:
Ideal Approach:
Alan Turing asked, “Can machines think?”5 IBM argues that Turing’s definition of AI would likely use the Human Approach of “systems that act like humans”.4
Russell and Norvig argue that from the two dimensions of “human vs. rational and thought vs. behavior [that] there are four possible combinations [and] there have been adherents and research programs for all four”.7
Techopedia speculates that the next generation of AI is “expected to inspire new types of brain-inspired circuits and architectures that can make data-driven decisions faster and more accurately than a human being can”.1
Long before Alan Turing, most examples of AI were found in myth or works of literature. Greek mythology contains what is probably the foremost example of AI. In the book Machines Who Think, author Pamela McCorduck notes that the crippled Hephaestus, the god of fire and the forge, “[fashioned] attendants to help him walk and assist in his forge”.8 For a description of Hephaestus’ automatons, McCorduck cites The Iliad, where Homer described the automatons as follows:
“These are golden, and in appearance like living young women. There
is intelligence in their hearts, and there is speech in them and strength,
and from the immortal gods they have learned how to do things”9.
Perhaps the most well-known automaton in Greek myth was Talos. A giant bronze automaton that’s sometimes depicted as a man and sometimes as a “bronze bull or a man with a bull’s head”,10 Talos guarded the island of Crete from invaders. Similar to Achilles’ weakness, Talos’ vulnerability was found in his ankle.
If Turing is considered “the father of computer science,” then Charles Babbage is “the father of the computer”.11 Babbage invented the Difference Engine and its successor, the Analytical Engine. Whereas the Difference Engine was designed to calculate tables of numbers, including logarithms, the Analytical Engine contained “a central processing unit and memory and would have been programmed with punch cards”12, making it the progenitor of general-purpose computers.
Arguably the next most iconic automaton comes from Mary Shelley’s 1818 genre-defining science fiction and gothic novel, Frankenstein. In the article The Link Between Mary Shelley’s Frankenstein and AI, author Charlotte Mckee writes that “There is an indisputable link between Victor Frankenstein’s creation and Artificial Intelligence”.12 Mckee notes that “The questions Shelley raises about a man-made being are relevant in the creation of AI [and explore] the possibilities of Artificial General Intelligence [and the] many ethical concerns that link Frankenstein and AI too”.13
Frankenstein’s creation isn’t created to be good or evil. Throughout the novel, it demonstrates a keen mind and articulation, teaching itself to both read and speak, and experiences the human emotions of happiness, anger, and desire. Mckee argues that it “is the very embodiment of machine intelligence. He learns like an algorithm throughout the novel”.13
At the turn of the 20th century, the concept of AI, or at least of rogue machines, spread from the stage and the silver screen. Outside of early adaptations of Frankenstein, two of the most well-known and influential instances of AI in media were a play and a silent film:
It is impossible to overstate the influence that Alan Turing has had on science, let alone computer science or AI. Turing’s work as a codebreaker striving against Enigma, the German military's cipher machine, was instrumental in decrypting Nazi Germany’s encrypted communications. The BBC estimates that Turing helped bring WWII to an end faster and saved the lives of between “14 to 21 million people”.16
The Turing test is arguably one of the pillars of AI. Initially referred to as the Imitation Game5 in Computing Machinery and Intelligence, the Turing test is a means for determining if a computer (or any machine) is intelligent and can think.
Turing argued that a human interviewer could evaluate the conversation between a human and a machine, such as a computer, attempting to make human-like conversation. Knowing that one of the subjects was a machine imitating human speech, the human interviewer would separate the two subjects from each other then ask each subject questions and record their answers. Each subject’s response to the interviewer would be written out or typed, so the conversation wouldn’t rely on how well the machine could articulate words as human speech.
If the interviewer can’t reliably determine the human from the machine, then the machine would pass the test. The machine’s answers wouldn’t have to be correct answers per se, only answers that a human might give.5
It’s not uncommon for people today to experience a reverse Turing test, a test where the subjects must prove that they are human and not a computer. The Completely Automated Public Turing Test To Tell Computers And Humans Apart (CAPTCHA) is probably the easiest to recognize of any reverse Turing test.17 Common CAPTCHA examples include the following:
Despite all modern technological advancement, as of June 2022 (the time of writing), no AI has successfully passed the Turing test.19 Even so, this failure has not become an argument against the idea of thinking, intelligent machines or against the possibility that machines could one day pass the test.
Noel Sharkey, professor of artificial intelligence and robotics at the University of Sheffield, argues that “Despite the failure of machines to deceive us into believing they are human, Turing would be excited by the remarkable progress of AI”.19 Prof. Sharkey continues, imagining that Turing “would have danced for joy when Deep Blue defeated world champion Gary Kasparov at chess [or when IBM’s] Watson beat the two best human opponents in the history of the American game [Jeopardy]”.20 Prof. Sharkey concludes that “the Turing Test remains a useful way to chart the progress of AI and I believe that humans will be discussing it for centuries to come”.20
AI is generally broken down into categories based on “the degree [that] an AI system can replicate human capabilities”.21 While no AI has yet passed the Turing test, how proficient an AI is at performing human functions helps with its classification and creates a juxtaposition between the simpler and “less-evolved type[s]”21 of AI and the more evolved types capable of demonstrating human-like functions and proficiency.
Forbes states that there are two broad classifications or “types” for AI based on their capabilities and functionality, specifically “classifying AI and AI-enabled machines based on their likeness to the human mind, and their ability to ‘think’ and perhaps even ‘feel’ like humans”.21 Java T Point states that there are seven types of AI, divided into two groups: a Type 1 group and a Type 2 group.
Techopedia notes that with the rising demands for lightning-fast information processing, today’s “digital processing hardware cannot keep [pace]”.1 To keep up with the needs of tomorrow, researchers and developers are “taking inspiration from the brain and considering alternative architectures [of] artificial neurons and synapses [processing] information with high speed and adaptive learning capabilities in an energy-efficient, scalable manner”.1 The Type 1 group consists of “evolving stages of AI”1 sorted together by their intelligence capabilities and includes the following examples:
The Type 2 group consists of AI sorted together by their functionality and includes the following examples:
The evolving stages of AI move from one stage to the next based on the demand for faster, smarter, and more efficient information processing. The philosopher John R. Searle is considered to have coined the terms “weak vs. strong AI” in his 1980 article, Minds, Brains, and Programs where he stated “I find it useful to distinguish what I will call "strong" AI from "weak" or "cautious" AI”.23
It’s easy to differentiate the “weak to strong” performance levels of narrow AI from general AI and super AI. Great Learning notes that weak AIs are “narrower applications with limited scopes [that are only] good at specific tasks [and that use] supervised and unsupervised learning to process data”.24 Conversely, strong AIs use “a wider application with a [broader] scope, [have] an incredible human-level intelligence, [and] uses clustering and association to process data”.24
The first stage of evolving AI and easily the most common and available category of capability-based AI, artificial narrow intelligence (ANI) represents all AI that exists today and that has ever existed. Apple’s Siri and Amazon’s Alexa are both examples of narrow AI that you might see or even use as you go about your day-to-day life. Common pre-determined functions for narrow AI include “speech recognition, natural language processing, computer vision, machine learning, and expert systems”.1
Outside of simply being an AI that exists today, the requirements for narrow AI are simple: perform a specific task using human-like intelligence. The “narrow” in narrow AI refers to the AI’s limited, usually pre-defined range of capabilities. These AI are often created with a single dedicated task in mind and are unable to perform tasks outside of their limitations or programming.
IBM’s Watson and other supercomputer AIs are still considered narrow AI. Despite or because of how they use expert systems approaches, ML, and natural language processing, “these systems correspond to all the reactive and limited memory AI”.21 Forbes goes on to state that “Even the most complex AI that uses machine learning and deep learning to teach itself falls under ANI”.21
The second stage of evolving AI, artificial general intelligence (AGI) refers to an AI that could “learn, perceive, understand, and function completely like a human being”.21 An AGI system could independently build different competencies and develop connections that span domains. This ability would “[reduce the] time needed for training [and] make AI systems just as capable as humans by replicating our multi-functional capabilities”.21 A general AI is the dream that one day a computer could be as smart as a human, be capable of performing the same intellectual tasks, and have the “[equivalent of] the human mind’s ability to function autonomously according to a wide set of stimuli”.1
At the time of writing, there are no known general AI systems, although it’s possible that some are hidden in development.
The third and final stage of evolving AI, artificial super intelligence (ASI) refers to the hypothetical concept of an AI that exceeds human intelligence. A super AI would contain many of the “key characteristics of strong AI, [including the] capability to think [and] to reason, [to] make judgments, plan, learn, and communicate [independently]”.22 True to its name, an ASI could exceed human intelligence and yield an intellect that’s greater than the best human minds in virtually every field.
The first of the functionality-specific types of AI, reactive AI uses real-time data to make decisions. Forbes describes it as “One of the oldest forms of AI systems [with an] extremely limited capacity [reactive AI lacks] memory-based functionality [and can’t] use [their] experiences [or memory] to inform their present actions”.21 This AI lacks the capability to learn the same way that a person does and can only respond to previously defined inputs or conditions. “IBM’s Deep Blue, a machine that beat chess Grandmaster Garry Kasparov in 1997”21 is a popular example of a reactive AI.
The second of the functionality-specific types of AI, limited memory AI leverages data stored from past experiences for decision-making. This AI features the capabilities of reactive machines and is often capable of storing data for a limited timeframe and learning from historical data.
Limited memory AI is present in most AI systems and apps today, particularly those that use deep learning and are “trained by large volumes of training data that they store in their memory to form a reference model for solving future problems”.21
Self-driving cars are a popular example of limited memory AI. Edureka notes that today’s self-driving cars “use sensors to identify civilians crossing the road, steep roads, traffic signals [and similar road navigation information] to make better driving decisions [and leveraging its learning experience] helps to prevent any future accidents”.25
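The difference between the two types can be shown with a toy sketch. The rule table, the `LimitedMemoryAgent` class, and the braking logic below are hypothetical illustrations, not how any real vehicle works: the reactive agent maps the current input straight to an action, while the limited memory agent also consults a short window of recent observations.

```python
from collections import deque

def reactive_agent(observation, rules):
    """React only to the current observation using predefined rules."""
    return rules.get(observation, "wait")

class LimitedMemoryAgent:
    """Keeps a short window of recent observations to inform its next action."""

    def __init__(self, window=3):
        # Older readings fall out of the deque automatically.
        self.recent_speeds = deque(maxlen=window)

    def act(self, speed_of_car_ahead):
        self.recent_speeds.append(speed_of_car_ahead)
        # Brake if the car ahead is slower now than at the start of the window.
        if len(self.recent_speeds) >= 2 and self.recent_speeds[-1] < self.recent_speeds[0]:
            return "brake"
        return "hold speed"

rules = {"red light": "stop", "green light": "go"}
agent = LimitedMemoryAgent()
actions = [agent.act(speed) for speed in (30, 30, 25)]
```

The reactive agent gives the same answer for the same input every time, whereas the limited memory agent's answer depends on what it saw over the last few steps.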
The third of the functionality-specific types of AI, theory of mind AI incorporates user intent and similar subjective elements into its decision making. Unlike reactive AI and limited memory AI, theory of mind AI is still in the conceptual phase or possibly in early development. Forbes describes theory of mind AI as “the next level of AI systems”21 that’s capable of understanding human emotions, beliefs, social cues, and thought processes and “discerning [an entity’s] needs”.21 Forbes expands on theory of mind AI, noting that “to truly understand human needs, AI machines [must] perceive humans as individuals whose minds can be shaped by multiple factors”.21
The fourth and final of the functionality-specific types of AI, self-aware AI features a consciousness similar to a human mind and the ability to create goals and make data-driven decisions. As of the time of writing, self-aware AI is only a hypothetical idea, a concept that is potentially the final goal of AI research. Forbes notes that a self-aware AI would “be able to understand and evoke emotions in [others and] also have emotions, needs, beliefs, and potentially desires of its own”.21 Hypothetical or otherwise, self-aware AI can be readily found throughout much of science fiction and similar popular culture.
Techopedia offers up a fun example for distinguishing the different functionality-specific types of AI, prompting the reader to visualize each AI player in a poker game.
Generative AI can be broadly described as an AI that can be used to create new text, video, images, audio, synthetic data or code. The concept of generative AI is often associated with applications such as ChatGPT and Midjourney, and with deep fakes.
When it comes to discussing AI, deep learning and ML are often confused and conflated, and it’s not hard to see why. Both are subsets of AI that focus on completing tasks or goals. Examples of both deep learning and ML can be easily found today, from self-driving cars to facial recognition software. Although the terms are often used interchangeably, there is much that distinguishes deep learning from ML, and vice versa.
Techopedia defines deep learning, a subfield of machine learning, as “an iterative approach to artificial intelligence (AI) that stacks ML algorithms in a hierarchy of increasing complexity and abstraction”26 and notes “Each deep learning level is created with knowledge gained from the preceding layer of the hierarchy”.26
Author Michael Middleton on the Flatiron School blog post Deep Learning vs. Machine Learning — What’s the Difference? states that “Deep learning models introduce an extremely sophisticated approach to ML and are set to [perform some complex tasks] because they've been specifically modeled after the human brain”.27 Middleton continues the human brain comparison noting that “Complex, multi-layered ‘deep neural networks’ are built to allow data to be passed between nodes (like neurons) in highly connected ways [resulting in] a non-linear transformation of the data that is increasingly abstract”.27
IBM notes that the word “deep” in deep learning refers to the depth of the network: “a neural network comprised of more than three layers [including] the inputs and the output can be considered a deep learning algorithm”.3 Middleton notes that although “it takes tremendous volumes of data to ‘feed and build’ [a deep neural network], it can begin to generate immediate results, and there is relatively little need for human intervention once the programs are in place”.27
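To make the idea of stacked layers concrete, here is a minimal sketch of data passing through a small deep network. The layer sizes, the random weights, and the `tanh` activation are assumptions chosen for illustration; a real model would learn its weights from training data rather than draw them at random.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create random weights and zero biases for one layer (illustrative values only)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense_layer(inputs, weights, biases):
    """Apply one fully connected layer followed by a non-linear tanh activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs

def forward(inputs, layers):
    """Feed the data through every layer; each layer consumes the previous layer's output."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

rng = random.Random(0)
# Input layer of 4 nodes, two hidden layers of 8 nodes, output layer of 2 nodes.
layer_sizes = [4, 8, 8, 2]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
output = forward([0.5, -0.2, 0.1, 0.9], layers)
```

With four layers of nodes counting the input and output, this toy network meets the "more than three layers" threshold quoted above.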
On the identically titled Levity blog post Deep Learning vs. Machine Learning – What’s The Difference?, author Arne Wolfewicz states that in addition to analyzing data like a human mind, deep learning algorithms can perform their analysis “through supervised and unsupervised learning [and] use a layered structure of algorithms called an artificial neural network (ANN)”.28 This ANN is “inspired by the biological neural network of the human brain, leading to a process of learning that’s far more capable than that of standard machine learning models”.28
In the also identically titled Zendesk blog post Deep learning vs. machine learning: What’s the difference?, author Patrick Grieve addresses the difficulty of training deep learning models and their potential for incorrect conclusions. Grieve states that “like other examples of AI, [a deep learning model] requires lots of training to get the learning processes correct. But when it works as it’s intended, functional deep learning is often received as a scientific marvel that many consider to be the backbone of true artificial intelligence”.29
Avijeet Biswal, author of the Simplilearn article Top 10 Deep Learning Algorithms You Should Know in 2022, notes that while deep learning algorithms often use self-learning features, “[these algorithms] depend upon ANNs that mirror the way the brain computes information”.30 Biswal continues that “during [their] training process, algorithms use unknown elements in the input distribution to extract features, group objects, and discover useful data patterns [and] this [training process] occurs at multiple levels, using the algorithms to build the models”.30
The idea that deep learning is the most complex AI that’s widely used today isn’t hyperbole. Deep learning employs a variety of algorithms, each suited to certain tasks. Avijeet Biswal states that the following are the top 10 most popular deep learning algorithms:
Convolutional neural networks (CNNs) consist of multiple layers “that process and extract features from data”30 and are described as an algorithm that “can assign weights and biases to different objects in an image and differentiate one object in the image from another”.26 Their uses include “[identifying] satellite images, [processing] medical images, [forecasting] time series, and [detecting] anomalies”.30 Biswal notes that “Yann LeCun developed the first CNN in 1988 when it was called LeNet [and it] was used for recognizing characters like ZIP codes and digits”.30
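As a rough illustration of how a CNN extracts features from an image, the sketch below applies a convolution, a ReLU activation, and max pooling to a tiny 5x5 image. The image, the hand-picked edge-detecting kernel, and the function names are assumptions for illustration; a trained CNN learns its kernels from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products.
    As in most deep learning libraries, the kernel is not flipped."""
    k = len(kernel)
    result = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        result.append(row)
    return result

def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# A dark region on the left and a bright region on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
features = convolve2d(image, kernel)   # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
pooled = max_pool(relu(features))      # [[3]]
```

The feature map is strongest where the kernel overlaps the edge and zero where the image is uniform, which is the sense in which a convolutional layer "extracts features".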
The following are the four layers that CNNs leverage when they process and extract features from data:
Long short-term memory networks (LSTMs) are a type of recurrent neural network (RNN) designed around learning and retaining information and then recalling that previously learned information. They are capable of learning long-term dependencies, which gives them a form of long-term memory retention. Techopedia notes that LSTMs “can learn order dependence in sequence prediction problems”26 and that they’re often “used in machine translation and language modeling”.26 Biswal notes that LSTMs “are useful in time-series [predictions] because they remember previous inputs [and they] are typically used for speech recognition, music composition, and pharmaceutical development”.30
LSTMs work the way they do because of a linking structure where three layers or gates communicate. Rian Dolphin, author of the article LSTM Networks | A Detailed Explanation, notes that these gates are the “forget gate, input gate and output gate”.31
Here is an example of an LSTM workflow:
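The interaction of the three gates can be sketched for a single time step. This is a simplified scalar version of the commonly used LSTM equations, not taken from the cited article; the parameter values and the `lstm_step` helper are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.
    params maps each name to (input weight, recurrent weight, bias)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    forget = gate("forget", sigmoid)           # how much of the old cell state to keep
    write = gate("input", sigmoid)             # how much new information to store
    expose = gate("output", sigmoid)           # how much of the cell state to output
    candidate = gate("candidate", math.tanh)   # proposed new content

    c = forget * c_prev + write * candidate    # updated long-term cell state
    h = expose * math.tanh(c)                  # updated short-term hidden state
    return h, c

# Hypothetical parameters; a trained network would learn these.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is how the cell can hold information across many steps.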
Recurrent neural networks (RNNs) are algorithms capable of remembering sequential data. They feature “connections that form directed cycles [that] allow the outputs from the LSTM to be fed as inputs to the current phase [and capable of memorizing] previous inputs due to its internal memory”.31 RNNs are often used for “speech recognition, voice recognition, time series prediction [and analysis] and natural language processing”26 and “image captioning, handwriting recognition, and machine translation”.30
Generative adversarial networks (GANs) are composed of two algorithms that compete against one another to produce new data. Biswal expands on this description, stating that each “GAN has two components: a generator model [that] learns to generate fake data, and a discriminator model [that] learns from that false information”.30
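The "internal memory" of an RNN can be sketched as a hidden state that is fed back in at every step. The scalar weights below are hypothetical, and the plain `tanh` recurrence is the simplest textbook form rather than any specific cited model.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.8, w_h=0.5, b=0.0):
    """Process a sequence one item at a time, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# The two sequences differ only in their first element.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
```

Although both sequences end with the same two inputs, the final hidden state of `states_a` is still non-zero because it retains a fading trace of the first input.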
GANs are often used in “digital photo restoration and deepfake video”26 and other programs to “help generate realistic images and cartoon characters, create photographs of human faces, and render 3D objects”30. GANs help video game developers to “upscale low-resolution, 2D textures in old video games by recreating them in 4K or higher resolutions via image training”.30
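The competition between the two models can be expressed through their training objectives. The sketch below uses the widely used binary cross-entropy formulation, which is an assumption here since the source does not name a loss; `d_real` and `d_fake` are hypothetical discriminator outputs, meaning the probability it assigns to a sample being real.

```python
import math

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real samples scored near 1 and fake samples near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """The generator wants its fake samples scored near 1 (non-saturating form)."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Early in training: the discriminator easily spots the fakes.
early_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
early_g = generator_loss(d_fake=[0.1, 0.2])

# Later: the fakes are convincing enough that the discriminator is guessing.
late_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
late_g = generator_loss(d_fake=[0.5, 0.5])
```

As the generator improves, its loss falls while the discriminator's loss rises, which reflects the adversarial relationship between the two components.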
Here is an example of a GAN workflow:
Techopedia describes radial basis function networks (RBFNs) as “a type of supervised [ANN] that uses supervised machine learning to function as a nonlinear classifier, [a nonlinear function that uses] sophisticated functions to go further in analysis than simple linear classifiers that work on lower-dimensional vectors”.32 Biswal describes RBFNs a little differently, stating they are “special types of feedforward neural networks that use radial basis functions as activation functions [and include] an input layer, a hidden layer, and an output layer and are mostly used for classification, regression, and time-series prediction”.30
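A minimal sketch of the three-layer structure Biswal describes follows. The Gaussian basis function, the two centers, the width, and the output weights are all assumptions chosen for illustration; in practice they would be fitted to training data.

```python
import math

def gaussian_rbf(x, center, width):
    """Activation that peaks at 1 when x equals the center and decays with distance."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

def rbfn_predict(x, centers, width, weights, bias):
    """Input layer -> hidden layer of radial basis units -> linear output layer."""
    hidden = [gaussian_rbf(x, center, width) for center in centers]
    return sum(w * h for w, h in zip(weights, hidden)) + bias

# Hypothetical network: one hidden unit per class prototype.
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [1.0, -1.0]
score_near_first = rbfn_predict((0.1, 0.0), centers, 0.5, weights, 0.0)
score_near_second = rbfn_predict((0.9, 1.0), centers, 0.5, weights, 0.0)
```

The sign of the score indicates which center the input is closer to, so this toy network acts as a simple nonlinear classifier.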
RBFNs work because of the following flows and features:
Techopedia describes multilayer perceptrons (MLPs) as “a feedforward [ANN] that generates a set of outputs from a set of inputs [and] is characterized by several layers of input nodes connected as a directed graph between the input and output layers”.33 Biswal notes that MLPs “have the same number of input and output layers but may have multiple hidden layers”.30 Techopedia adds that the directed graph means “that the signal path through the nodes only goes one way. Each node, apart from the input nodes, has a nonlinear activation function”.33
MLPs leverage “backpropagation as a supervised learning technique for training the network [and are] widely used for solving problems that require supervised learning [and] research into computational neuroscience and parallel distributed processing”.33 MLPs are often used in applications for speech recognition, image recognition and machine translation.
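Backpropagation can be sketched with a small MLP that has one hidden layer. The XOR data set, the sigmoid activation, the hidden layer size, and the learning rate are assumptions for illustration, and the training loop is a bare-bones version of gradient descent rather than a production implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_mlp(data, n_hidden=4, epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each neuron stores its input weights followed by a bias as the last entry.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1]) for ws in w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
        return hidden, out

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial_loss = mean_squared_error()
    for _ in range(epochs):
        for x, target in data:
            hidden, out = forward(x)
            # Backward pass: push the output error back through the network.
            delta_out = (out - target) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient descent: nudge every weight against its error gradient.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
            w_out[-1] -= lr * delta_out
    return initial_loss, mean_squared_error(), lambda x: forward(x)[1]

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
initial_loss, final_loss, predict = train_mlp(xor_data)
```

Comparing `initial_loss` with `final_loss` shows whether training reduced the error; `predict((0, 1))` returns the network's output for a single input.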
Here's an example of an MLP workflow:
Invented by professor Teuvo Kohonen and sometimes referred to as self-organizing feature maps (SOFMs) or Kohonen maps, self-organizing maps (SOMs) are described by Techopedia as “a type of [ANN] that uses unsupervised learning to build a two-dimensional map of a problem space, [where] the problem space can be anything from votes in U.S. Congress, maps of colors and even links between Wikipedia articles.”34
Phrased another way, SOMs “enable data visualization to reduce the dimensions of data through self-organizing [ANNs]”.30 SOMs leverage data visualization by “[generating] a visual representation of data on a hexagonal or rectangular grid”34. This is principally done “to solve the problem that humans cannot easily visualize high-dimensional data [and] to help users understand this high-dimensional information”.30
Specifically, SOMs try to “mirror the way the visual cortex in the human brain sees objects using signals generated by the optic nerves, [making] all the nodes in the network respond differently to different inputs”.34
While MLPs use backpropagation for supervised learning, SOMs leverage “competitive learning where the nodes eventually specialize rather than error-correction learning, such as backpropagation with gradient descent”.34 SOMs differ from “supervised learning or error-correction learning, but without using error or reward signals to train an algorithm, [making them] a kind of unsupervised learning”.34
Techopedia notes that when SOMs are fed input data, they compute “the Euclidean distance or the straight-line distance between the nodes, which are given a weight”.34 The best matching unit (BMU) refers to the network’s node that’s most similar to the input data.
As a SOM “[advances] through the problem set, the weights start to look more like the actual data [and the SOM] has trained itself to see patterns in the data [similar to how] a human [perceives these patterns]”.34
SOMs are commonly used for applications involving “meteorology, oceanography, project prioritization, and oil and gas exploration”.34
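The search for a best matching unit and the competitive update that follows can be sketched as below. For brevity the nodes sit on a one-dimensional grid rather than the two-dimensional map described above, and the fixed learning rate and neighbourhood radius are assumptions; practical SOMs usually shrink both over time.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_matching_unit(nodes, sample):
    """Return the index of the node whose weights are closest to the sample."""
    return min(range(len(nodes)), key=lambda i: euclidean(nodes[i], sample))

def som_update(nodes, sample, lr=0.5, radius=1):
    """Pull the BMU and its grid neighbours part of the way toward the sample."""
    bmu = best_matching_unit(nodes, sample)
    for i, weights in enumerate(nodes):
        if abs(i - bmu) <= radius:
            nodes[i] = [w + lr * (s - w) for w, s in zip(weights, sample)]
    return bmu

# Hypothetical node weights and a single input sample.
nodes = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [0.2, 0.9]]
sample = [0.9, 0.8]
bmu = som_update(nodes, sample)   # node 2 is closest, so nodes 1, 2 and 3 move
```

Repeating this update over many samples is what makes the weights "start to look more like the actual data", while nodes far from each BMU on the grid are left alone.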
Here's an example of an SOM workflow:
In 1985, Geoffrey Hinton co-created the RBM with David Ackley and Terry Sejnowski.35 Techopedia describes restricted Boltzmann machines (RBMs) as a “type of generative network”36 “commonly used for [collaborative] filtering, feature learning and classification that leverages types of dimensionality reduction to help tackle complicated inputs”.36 Biswal adds that RBMs are also used for “dimensionality reduction, regression, and topic modeling”30 and that “RBMs constitute the building blocks of DBNs”,30 meaning that deep belief networks (DBNs) and similarly sophisticated models are created by stacking individual RBMs together.
RBMs get their name due to there being “no communication between layers in the model, which is the ‘restriction’ of the model”36 and that an RBM’s nodes “make ‘stochastic’ [or random] decisions”.36 Because of this random process, RBMs are sometimes labeled as “stochastic neural networks”.30
Biswal notes that RBMs include two layers: one with visible units and another with hidden units and that “each visible unit is connected to all hidden units [and] RBMs have a bias unit that is connected to all the visible units and the hidden units, and they have no output nodes”.30
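The two-layer structure and the stochastic decisions of the units can be sketched as follows. This follows the standard textbook formulation of an RBM with binary units, which is an assumption since the source does not give equations; the weights and biases are hypothetical, and the shared bias unit is modelled as one bias value per unit.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(visible, weights, hidden_bias, rng):
    """weights[i][j] connects visible unit i to hidden unit j."""
    probs = []
    for j in range(len(hidden_bias)):
        activation = hidden_bias[j] + sum(visible[i] * weights[i][j] for i in range(len(visible)))
        probs.append(sigmoid(activation))
    # Each unit switches on at random with its computed probability.
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states

def sample_visible(hidden, weights, visible_bias, rng):
    """Use the same weights in the opposite direction to reconstruct the visible layer."""
    probs = []
    for i in range(len(visible_bias)):
        activation = visible_bias[i] + sum(hidden[j] * weights[i][j] for j in range(len(hidden)))
        probs.append(sigmoid(activation))
    states = [1 if rng.random() < p else 0 for p in probs]
    return probs, states

rng = random.Random(0)
weights = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]   # 3 visible units, 2 hidden units
visible = [1, 0, 1]
hidden_probs, hidden_states = sample_hidden(visible, weights, [0.0, 0.0], rng)
recon_probs, recon_states = sample_visible(hidden_states, weights, [0.0, 0.0, 0.0], rng)
```

Every visible unit feeds every hidden unit, but no unit is connected to another unit in its own layer, which is why each layer can be sampled in a single sweep.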
An RBM’s workflow involves two phases: “forward pass and backward pass”.30
In the forward pass phase, the RBM...
In the backward pass phase, the RBM...
Techopedia defines deep belief networks (DBNs), a complex type of generative neural network (GNN), as “an unsupervised deep learning algorithm [where] each layer has two purposes: it functions as a hidden layer for what came before and a visible layer for what comes next”.26
Biswal notes that DBNs are “generative models that consist of multiple layers of stochastic, latent variables [that] have binary values and are often called hidden units”.30
Biswal and Techopedia both describe DBNs as a group of RBMs. Techopedia notes that DBNs “are composed of various smaller unsupervised neural networks”37 and have “connections between layers [and] each RBM layer communicates with both the previous and subsequent layers”.37 While these layers are connected, “the network does not include connections between [units] in a single layer”.37
Per Techopedia, ML and neural network design pioneer Geoffrey Hinton “characterizes stacked RBMs as providing a system that can be trained in a "greedy" manner and describes deep belief networks as models ‘that extract a deep hierarchical representation of training data’”.37 Specifically, “the greedy learning algorithm uses a layer-by-layer approach for learning the top-down, generative weights”.30 Biswal notes that “DBNs learn that the values of the latent variables in every layer can be inferred by a single, bottom-up pass”.30
This greedy unsupervised ML model displays “how engineers can pursue less structured, more rugged systems where there is [less] data labeling and the technology has to assemble results based on random inputs and iterative processes”.37
As part of their process, DBNs “run the steps of Gibbs sampling on the top two hidden layers, [drawing] a sample from the RBM defined by the top two hidden layers [then drawing] a sample from the visible units using a single pass of ancestral sampling through the rest of the model”.30
DBNs are commonly used for applications involving “image-recognition, video-recognition, and motion-capture data”30 and by “healthcare sectors for cancer and other disease detection”26.
Also known as autoassociators and Diabolo networks, autoencoders (AEs) are an “unsupervised [ANN] that provides compression and other functionality, [leveraging] a feedforward approach to reconstitute an output from an input”38 and where “the input and output are identical”.30
Developed by Geoffrey Hinton39 to solve unsupervised learning problems, AEs work by first compressing the input and then sending it to be decompressed as an output. This decompressed output is frequently similar to the original input. Techopedia notes that this process exemplifies “the nature of an autoencoder – that the similar inputs and outputs get measured and compared for execution results”.38
Autoencoders have two main parts: an encoder and a decoder. The encoder maps the input into code and the decoder maps the code to a reconstruction of the input. The code is sometimes considered a third part as “the original data goes into a coded result, and the subsequent layers of the network expand it into a finished output”.38
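As a rough illustration of the encoder, code, and decoder described above, the following sketch trains a tiny linear autoencoder that squeezes 2-D inputs through a 1-D code and then reconstructs them. Everything here is an assumption made for the example: the toy data set `POINTS`, the function names, the learning rate, and the use of plain stochastic gradient descent. Real autoencoders typically use nonlinear layers and a deep learning library.

```python
import random


def encode(x, enc):
    """Encoder: map a 2-D input to a single number (the code)."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(code, dec):
    """Decoder: map the code back to a 2-D reconstruction."""
    return [dec[0] * code, dec[1] * code]


def train_autoencoder(data, lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    enc = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    dec = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    history = []  # mean squared reconstruction error per epoch
    for _ in range(epochs):
        total = 0.0
        for x in data:
            code = encode(x, enc)
            recon = decode(code, dec)
            err = [recon[k] - x[k] for k in range(2)]
            total += err[0] ** 2 + err[1] ** 2
            # Gradient of half the squared error with respect to the code,
            # computed before any weight is updated.
            grad_code = err[0] * dec[0] + err[1] * dec[1]
            for k in range(2):
                dec[k] -= lr * err[k] * code
                enc[k] -= lr * grad_code * x[k]
        history.append(total / len(data))
    return enc, dec, history


# Toy inputs that all lie on the line y = 2x, so one number can describe each.
POINTS = [(-1.0, -2.0), (-0.5, -1.0), (0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]

enc, dec, history = train_autoencoder(POINTS)
reconstruction = decode(encode(POINTS[-1], enc), dec)
print(history[0], history[-1], reconstruction)
```

The training target is the input itself, which is why no labels are needed: the network is only rewarded for rebuilding what it was given.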
A “denoising” AE is a useful tool for understanding AEs, with Techopedia stating that a denoising AE “uses original inputs along with a noisy input, to refine the output and rebuild something representing the original set of inputs”.39
AEs are commonly used for applications involving image processing as well as “pharmaceutical discovery [and] popularity prediction”.30
Here's an example of an AE workflow:
Techopedia describes ML as a sub-topic of AI “that focuses on building algorithmic models that can identify patterns and relationships in data, [contextualizing] the word machine [as] a synonym for computer program and the word learning [for] how ML algorithms will automatically become more accurate as they receive additional data”.40
Patrick Grieve defines machine learning as “An application of [AI] that includes algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions”.29 Arne Wolfewicz defines machine learning more simply as “the general term for when computers learn from data”28 and as “the intersect of computer science and statistics where algorithms are used to perform a specific task without being explicitly programmed [and] instead, they recognize patterns in the data and make predictions once new data arrives”.28
The term “machine learning” was coined in 1959 by American IBMer and AI and computer gaming pioneer Arthur Samuel in his paper Some Studies in Machine Learning Using the Game of Checkers.41 As with AI, the idea of ML was not new. Techopedia notes that ML’s “practical application in business was not financially feasible until the advent of the internet and recent advances in big data analytics and cloud computing [and] because training an ML algorithm to find patterns in data requires extremely large data sets”.40
Wolfewicz notes that “the learning process of these algorithms can either be supervised or unsupervised, depending on the data being used to feed the algorithms”.28 He elaborates with this example of machine learning:
“A traditional machine learning algorithm can be something as simple as linear regression, [so] imagine you want to predict your income given your years of higher education. [First], you have to define a function, e.g. income = y + x * years of education. Then, give your algorithm a set of training data [such as] a simple table with data on some people’s years of higher education and their associated income. Next, let your algorithm draw the line, e.g. through an ordinary least squares (OLS) regression. Now, you can give the algorithm some test data, e.g. your personal years of higher education, and let it predict your income”.28
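Wolfewicz's example translates almost directly into code. The sketch below fits the line income = y + x * years of education with the closed-form OLS solution, where y is the intercept and x is the slope. The training table is made up for illustration (income in thousands), and `fit_ols` and `predict_income` are names chosen for this example.

```python
from statistics import fmean


def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up training data: years of higher education and income (in thousands).
YEARS = [0, 2, 4, 6, 8]
INCOME = [30, 38, 52, 58, 72]

# "Let your algorithm draw the line."
intercept, slope = fit_ols(YEARS, INCOME)


def predict_income(years):
    return intercept + slope * years


# "Give the algorithm some test data" and let it predict.
prediction = predict_income(5)
print(intercept, slope, prediction)
```

Nothing in the code states how income depends on education. The intercept and slope are derived entirely from the training table.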
Wolfewicz argues that “the driving force behind machine learning is ordinary statistics [and that] the algorithm learned to make a prediction without being explicitly programmed, only based on patterns and inference”.28
Grieve notes that while ML is complicated, “at the end of the day, [ML] serves the same mechanical function that a flashlight, car, or computer screen does”29 and that ML can be interpreted as meaning “[a device continually] performs a function with the data given to it and gets progressively better over time”.29
ML is often leveraged by enterprises today for predictive analytics, such as risk analysis, fraud detection, and voice and image recognition. Grieve notes that ML powers a variety of “automated tasks that span across multiple industries, from data security firms that hunt down malware to finance professionals who want alerts for favorable trades [with] AI algorithms [that] are programmed to constantly learn in a way that simulates a virtual personal assistant”.29 Techopedia adds that “predictive analytics and other similar ML projects frequently require [computer scientists,] data scientists, and machine learning engineers”.40
The three primary learning algorithms used for training in ML consist of the following examples:
Middleton describes the data scientist role as one where you “compose the models and algorithms needed to pursue [your] industry’s goals [and you] oversee the processing and analysis of data generated by the computers”.27 This role requires coding expertise, including languages like Python and Java, along with “a strong understanding of the business and strategic goals of a company or industry”.27
Middleton describes the machine learning engineer role as one where you “implement the data scientists’ models and integrate them into the complex data and technological ecosystems of the firm [and where you are] at the helm for the implementation [and] programming of automated controls or robots that take actions based on incoming data”.27
Techopedia notes that machine learning operations (MLOps) are the primary focus of an ML engineer’s job. MLOps is “an approach to managing the entire lifecycle of a machine learning model”,40 from training through daily use up to retirement. ML engineers tend to have knowledge of “mathematics and statistics, in addition to data modeling, feature engineering and programming”.40
It's likely that at an enterprise, data scientists and ML engineers will work together on a variety of AI-based projects. They may work on “deciding which type of learning algorithm will work best to solve a particular business problem [or on] deciding what data should be used for training and how machine learning model outcomes will be validated”.40
Techopedia defines an ML model as “the output of an ML algorithm that’s been run on data”.40 When it comes to differentiating between models and algorithms, the Finance Train article Difference Between Model and Algorithm notes that “an algorithm is a set of rules to follow to solve a problem, [that] it will have a set of rules that need to be followed in the right order [to] solve the problem. A model is what you build by using the algorithm”.42
Here's an example of a workflow for creating an ML model:
In the BMC Blogs post Bias & Variance in Machine Learning: Concepts & Tutorials, author Shanika Wickramasinghe notes that “With larger data sets, various implementations, algorithms, and learning requirements, it has become even more complex to create and evaluate ML models since all those factors directly impact the overall accuracy and learning outcome of the model”.43 She continues, stating that “any issues in the algorithm or polluted data set can negatively impact the ML model”.43
Techopedia states that despite the desire for transparent and explainable ML algorithms, “algorithmic transparency for machine learning can be more complicated than just sharing which algorithm was used to make a particular prediction”.40 While many of the popular algorithms today are freely available, the training data, where bias is often rooted, is proprietary and harder to access.
What is bias in the context of machine learning algorithms? Wickramasinghe describes bias as the following:
Wickramasinghe adds that characteristics of a high bias model include the following features:
Much like deep learning and ML, bias and variance in ML are often confused and conflated. Wickramasinghe describes variance as the following:
Wickramasinghe adds that characteristics of a high variance model include the following features:
Underfitting and overfitting are terms for “how a model fails to match data”.43 Wickramasinghe notes that “the fitting of a model directly correlates to whether it will return accurate predictions from a given data set”.43
Techopedia describes underfitting as “a condition that occurs when the ML model is so simple that no learning can take place”.44 The author adds that “if a predictive model performs poorly on training data, underfitting is the most likely reason”.44
Conversely, overfitting is described as “a condition that occurs when a machine learning or deep neural network model performs significantly better for training data than it does for new data”.44 The author notes that when an ML model can't make accurate predictions about new data because it can't distinguish extraneous (noisy) data from essential data that forms a pattern,44 overfitting is likely the reason.
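Both failure modes can be reproduced on a small synthetic data set. In the sketch below, which is an illustration under assumptions rather than anything taken from Techopedia or Wickramasinghe, the "too simple" model always predicts the average training target, while the "too flexible" model memorizes the training set and returns the target of the nearest training point. The ground truth y = x², the noise level, and all names are hypothetical.

```python
import random
from statistics import fmean


def make_data(xs, rng):
    """Noisy samples of y = x**2 (an assumed ground truth for this demo)."""
    return [(x, x * x + rng.gauss(0, 0.3)) for x in xs]


def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return fmean((predict(x) - y) ** 2 for x, y in data)


rng = random.Random(42)
train = make_data([i / 2 for i in range(-6, 7)], rng)        # x = -3.0 ... 3.0
test = make_data([i / 2 + 0.25 for i in range(-6, 6)], rng)  # unseen x values

mean_y = fmean(y for _, y in train)


def underfit(x):
    # Too simple: ignores x and always predicts the mean training target.
    return mean_y


def overfit(x):
    # Memorizes the training set, noise included: returns the target of
    # the nearest training point.
    return min(train, key=lambda point: abs(point[0] - x))[1]


results = {
    "underfit": (mse(underfit, train), mse(underfit, test)),
    "overfit": (mse(overfit, train), mse(overfit, test)),
}
for name, (train_error, test_error) in results.items():
    print(name, "train:", train_error, "test:", test_error)
```

The constant model performs poorly even on the training data, which matches the description of underfitting. The memorizing model has zero training error but a nonzero error on the unseen points, which matches the description of overfitting.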
Wickramasinghe states that “bias and variance are inversely connected. It is impossible to have an ML model with a low bias and a low variance”.43 Therefore, if an ML algorithm is adjusted to fit a given data set more closely, there’s a high probability that bias will be reduced while variance increases, along with the odds of inaccurate predictions on new data. Conversely, crafting a more general model that fits the data set less closely lowers the variance but raises the bias.
This is a constant balancing act between variance and bias that data engineers must maintain. Wickramasinghe notes that “having a higher variance does not indicate a bad ML algorithm. Machine learning algorithms should be able to handle some variance”.43
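For squared-error loss, this balancing act is commonly written as the standard bias-variance decomposition. The notation below is an assumption introduced here, not Wickramasinghe's: the data follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set, with expectations taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Expected prediction error is the sum of the three terms, so reducing one of the first two only helps if the other does not grow by more.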
She argues that data engineers can approach the trade-off between bias and variance using the following methods:
Wickramasinghe presents a table listing common algorithms and their expected behavior with respect to bias and variance:43
Algorithm | Bias | Variance |
Linear Regression | High Bias | Less Variance |
Decision Tree | Low Bias | High Variance |
Bagging | Low Bias | High Variance (Less than Decision Tree) |
Random Forest | Low Bias | High Variance (Less than Decision Tree and Bagging) |
Wolfewicz defines AI, ML and deep learning as the following:
Middleton argues that the key differences between machine learning and deep learning involve the following elements:
So much of the technology that we use today relies on machine learning and deep learning algorithms that we take for granted. Grieve notes that in customer service, today’s AI apps “are used to drive self-service, increase agent productivity, and make workflows more reliable”.27
Waves of customers’ queries are fed into these algorithms, which aggregate and process the information before producing answers for the customers. Grieve notes that ML and deep learning both help to “power [NLP and help] computers to comprehend text and speech [while] Amazon Alexa and Apple’s Siri are two good examples of ‘virtual agents’ that can use speech recognition to answer a consumer’s questions”.27
Chatbots are another example of AI-infused technology used to respond to customers. Grieve notes that “Zendesk’s AI chatbot, Answer Bot, [incorporates] a deep learning model to understand the context of a support ticket and learn which help articles it should suggest to a customer”.27