What is machine learning Everything you need to know || ARLearners

What is machine learning Everything you need to know

2019-08-20

Machine learning is enabling computers to tackle tasks that have, until now, only been carried out by people.

From driving cars to translating speech, machine learning is driving an explosion in the capabilities of artificial intelligence -- helping software make sense of the messy and unpredictable real world.

But what exactly is machine learning and what is making the current boom in machine learning possible?

WHAT IS MACHINE LEARNING?

At a very high level, machine learning is the process of teaching a computer system how to make accurate predictions when fed data.

Those predictions could be answering whether a piece of fruit in a photo is a banana or an apple, spotting people crossing the road in front of a self-driving car, whether the use of the word book in a sentence relates to a paperback or a hotel reservation, whether an email is spam, or recognizing speech accurately enough to generate captions for a YouTube video.

The key difference from traditional computer software is that a human developer hasn't written code that instructs the system how to tell the difference between the banana and the apple.

Instead a machine-learning model has been taught how to reliably discriminate between the fruits by being trained on a large amount of data, in this instance likely a huge number of images labelled as containing a banana or an apple.

Data, and lots of it, is the key to making machine learning possible.

WHAT IS THE DIFFERENCE BETWEEN AI AND MACHINE LEARNING?

Machine learning may have enjoyed enormous success of late, but it is just one method for achieving artificial intelligence.

At the birth of the field of AI in the 1950s, AI was defined as any machine capable of performing a task that would typically require human intelligence.

AI systems will generally demonstrate at least some of the following traits: planning, learning, reasoning, problem solving, knowledge representation, perception, motion, and manipulation and, to a lesser extent, social intelligence and creativity.

Alongside machine learning, there are various other approaches used to build AI systems, including evolutionary computation, where algorithms undergo random mutations and combinations between generations in an attempt to "evolve" optimal solutions, and expert systems, where computers are programmed with rules that allow them to mimic the behavior of a human expert in a specific domain, for example an autopilot system flying a plane.

WHAT ARE THE MAIN TYPES OF MACHINE LEARNING?

Machine learning is generally split into two main categories: supervised and unsupervised learning.

WHAT IS SUPERVISED LEARNING?

This approach basically teaches machines by example.

During training for supervised learning, systems are exposed to large amounts of labelled data, for example images of handwritten figures annotated to indicate which number they correspond to. Given sufficient examples, a supervised-learning system would learn to recognize the clusters of pixels and shapes associated with each number and eventually be able to recognize handwritten numbers, able to reliably distinguish between the numbers 9 and 4 or 6 and 8.

However, training these systems typically requires huge amounts of labelled data, with some systems needing to be exposed to millions of examples to master a task.

As a result, the datasets used to train these systems can be vast, with Google's Open Images Dataset having about nine million images, its labeled video repository YouTube-8M linking to seven million labeled videos and ImageNet, one of the early databases of this kind, having more than 14 million categorized images. The size of training datasets continues to grow, with Facebook recently announcing it had compiled 3.5 billion images publicly available on Instagram, using hashtags attached to each image as labels. Using one billion of these photos to train an image-recognition system yielded record levels of accuracy -- of 85.4 percent -- on ImageNet's benchmark.

The laborious process of labeling the datasets used in training is often carried out using crowdworking services, such as Amazon Mechanical Turk, which provides access to a large pool of low-cost labor spread across the globe. For instance, ImageNet was put together over two years by nearly 50,000 people, mainly recruited through Amazon Mechanical Turk. However, Facebook's approach of using publicly available data to train systems could provide an alternative way of training systems using billion-strong datasets without the overhead of manual labeling.

WHAT IS UNSUPERVISED LEARNING?

In contrast, unsupervised learning tasks algorithms with identifying patterns in data, trying to spot similarities that split that data into categories.

An example might be Airbnb clustering together houses available to rent by neighborhood, or Google News grouping together stories on similar topics each day.

The algorithm isn't designed to single out specific types of data, it simply looks for data that can be grouped by its similarities, or for anomalies that stand out.

WHAT IS SEMI-SUPERVISED LEARNING?

The importance of huge sets of labelled data for training machine-learning systems may diminish over time, due to the rise of semi-supervised learning.

As the name suggests, the approach mixes supervised and unsupervised learning. The technique relies upon using a small amount of labelled data and a large amount of unlabelled data to train systems. The labelled data is used to partially train a machine-learning model, and then that partially trained model is used to label the unlabelled data, a process called pseudo-labelling. The model is then trained on the resulting mix of the labelled and pseudo-labelled data.

The viability of semi-supervised learning has been boosted recently by Generative Adversarial Networks ( GANs), machine-learning systems that can use labelled data to generate completely new data, for example creating new images of Pokemon from existing images, which in turn can be used to help train a machine-learning model.

Were semi-supervised learning to become as effective as supervised learning, then access to huge amounts of computing power may end up being more important for successfully training machine-learning systems than access to large, labelled datasets.

WHAT IS REINFORCEMENT LEARNING?

A way to understand reinforcement learning is to think about how someone might learn to play an old school computer game for the first time, when they aren't familiar with the rules or how to control the game. While they may be a complete novice, eventually, by looking at the relationship between the buttons they press, what happens on screen and their in-game score, their performance will get better and better.

An example of reinforcement learning is Google DeepMind's Deep Q-network, which has beaten humans in a wide range of vintage video games. The system is fed pixels from each game and determines various information about the state of the game, such as the distance between objects on screen. It then considers how the state of the game and the actions it performs in game relate to the score it achieves.

Over the process of many cycles of playing the game, eventually the system builds a model of which actions will maximize the score in which circumstance, for instance, in the case of the video game Breakout, where the paddle should be moved to in order to intercept the ball.

HOW DOES SUPERVISED MACHINE LEARNING WORK?

Everything begins with training a machine-learning model, a mathematical function capable of repeatedly modifying how it operates until it can make accurate predictions when given fresh data.

Before training begins, you first have to choose which data to gather and decide which features of the data are important.

A hugely simplified example of what data features are is given in this explainer by Google, where a machine learning model is trained to recognize the difference between beer and wine, based on two features, the drinks' color and their alcoholic volume (ABV).

Each drink is labelled as a beer or a wine, and then the relevant data is collected, using a spectrometer to measure their color and hydrometer to measure their alcohol content.

An important point to note is that the data has to be balanced, in this instance to have a roughly equal number of examples of beer and wine.

The gathered data is then split, into a larger proportion for training, say about 70 percent, and a smaller proportion for evaluation, say the remaining 30 percent. This evaluation data allows the trained model to be tested to see how well it is likely to perform on real-world data.

Before training gets underway there will generally also be a data-preparation step, during which processes such as deduplication, normalization and error correction will be carried out.

The next step will be choosing an appropriate machine-learning model from the wide variety available. Each have strengths and weaknesses depending on the type of data, for example some are suited to handling images, some to text, and some to purely numerical data.

HOW DOES SUPERVISED MACHINE-LEARNING TRAINING WORK?

Basically, the training process involves the machine-learning model automatically tweaking how it functions until it can make accurate predictions from data, in the Google example, correctly labeling a drink as beer or wine when the model is given a drink's color and ABV.

A good way to explain the training process is to consider an example using a simple machine-learning model, known as linear regression with gradient descent. In the following example, the model is used to estimate how many ice creams will be sold based on the outside temperature.

Imagine taking past data showing ice cream sales and outside temperature, and plotting that data against each other on a scatter graph -- basically creating a scattering of discrete points.

To predict how many ice creams will be sold in future based on the outdoor temperature, you can draw a line that passes through the middle of all these points, similar to the illustration below.

Once this is done, ice cream sales can be predicted at any temperature by finding the point at which the line passes through a particular temperature and reading off the corresponding sales at that point.

Bringing it back to training a machine-learning model, in this instance training a linear regression model would involve adjusting the vertical position and slope of the line until it lies in the middle of all of the points on the scatter graph.

At each step of the training process, the vertical distance of each of these points from the line is measured. If a change in slope or position of the line results in the distance to these points increasing, then the slope or position of the line is changed in the opposite direction, and a new measurement is taken.

In this way, via many tiny adjustments to the slope and the position of the line, the line will keep moving until it eventually settles in a position which is a good fit for the distribution of all these points, as seen in the video below. Once this training process is complete, the line can be used to make accurate predictions for how temperature will affect ice cream sales, and the machine-learning model can be said to have been trained.

While training for more complex machine-learning models such as neural networks differs in several respects, it is similar in that it also uses a "gradient descent" approach, where the value of "weights" that modify input data are repeatedly tweaked until the output values produced by the model are as close as possible to what is desired.

HOW TO EVALUATE MACHINE-LEARNING MODELS?

Once training of the model is complete, the model is evaluated using the remaining data that wasn't used during training, helping to gauge its real-world performance.

To further improve performance, training parameters can be tuned. An example might be altering the extent to which the "weights" are altered at each step in the training process.

WHAT ARE NEURAL NETWORKS AND HOW ARE THEY TRAINED?

A very important group of algorithms for both supervised and unsupervised machine learning are neural networks. These underlie much of machine learning, and while simple models like linear regression used can be used to make predictions based on a small number of data features, as in the Google example with beer and wine, neural networks are useful when dealing with large sets of data with many features.

Neural networks, whose structure is loosely inspired by that of the brain, are interconnected layers of algorithms, called neurons, which feed data into each other, with the output of the preceding layer being the input of the subsequent layer.

Each layer can be thought of as recognizing different features of the overall data. For instance, consider the example of using machine learning to recognize handwritten numbers between 0 and 9. The first layer in the neural network might measure the color of the individual pixels in the image, the second layer could spot shapes, such as lines and curves, the next layer might look for larger components of the written number -- for example, the rounded loop at the base of the number 6. This carries on all the way through to the final layer, which will output the probability that a given handwritten figure is a number between 0 and 9.

WHAT IS DEEP LEARNING AND WHAT ARE DEEP NEURAL NETWORKS?

A subset of machine learning is deep learning, where neural networks are expanded into sprawling networks with a huge number of layers that are trained using massive amounts of data. It is these deep neural networks that have fueled the current leap forward in the ability of computers to carry out task like speech recognition and computer vision.

There are various types of neural networks, with different strengths and weaknesses. Recurrent neural networks are a type of neural net particularly well suited to language processing and speech recognition, while convolutional neural networks are more commonly used in image recognition. The design of neural networks is also evolving, with researchers recently devising a more efficient design for an effective type of deep neural network called long short-term memory or LSTM, allowing it to operate fast enough to be used in on-demand systems like Google Translate.

The AI technique of evolutionary algorithms is even being used to optimize neural networks, thanks to a process called neuroevolution. The approach was recently showcased by Uber AI Labs, which released papers on using genetic algorithms to train deep neural networks for reinforcement learning problems.

WHY IS MACHINE LEARNING SO SUCCESSFUL?

While machine learning is not a new technique, interest in the field has exploded in recent years.

This resurgence comes on the back of a series of breakthroughs, with deep learning setting new records for accuracy in areas such as speech and language recognition, and computer vision.

What's made these successes possible are primarily two factors, one being the vast quantities of images, speech, video and text that is accessible to researchers looking to train machine-learning systems.

But even more important is the availability of vast amounts of parallel-processing power, courtesy of modern graphics processing units (GPUs), which can be linked together into clusters to form machine-learning powerhouses.

Today anyone with an internet connection can use these clusters to train machine-learning models, via cloud services provided by firms like Amazon, Google and Microsoft.

As the use of machine-learning has taken off, so companies are now creating specialized hardware tailored to running and training machine-learning models. An example of one of these custom chips is Google's Tensor Processing Unit (TPU), the latest version of which accelerates the rate at which machine-learning models built using Google's TensorFlow software library can infer information from data, as well as the rate at which they can be trained.

These chips are not just used to train models for Google DeepMind and Google Brain, but also the models that underpin Google Translate and the image recognition in Google Photo, as well as services that allow the public to build machine learning models using Google's TensorFlow Research Cloud. The second generation of these chips was unveiled at Google's I/O conference in May last year, with an array of these new TPUs able to train a Google machine-learning model used for translation in half the time it would take an array of the top-end GPUs, and the recently announced third-generation TPUs able to accelerate training and inference even further.

As hardware becomes increasingly specialized and machine-learning software frameworks are refined, it's becoming increasingly common for ML tasks to be carried out on consumer-grade phones and computers, rather than in cloud datacenters. In the summer of 2018, Google took a step towards offering the same quality of automated translation on phones that are offline as is available online, by rolling out local neural machine translation for 59 languages to the Google Translate app for iOS and Android.

WHAT IS ALPHAGO?

Perhaps the most famous demonstration of the efficacy of machine-learning systems was the 2016 triumph of the Google DeepMind AlphaGo AI over a human grandmaster in Go, a feat that wasn't expected until 2026. Go is an ancient Chinese game whose complexity bamboozled computers for decades. Go has about 200 moves per turn, compared to about 20 in Chess. Over the course of a game of Go, there are so many possible moves that searching through each of them in advance to identify the best play is too costly from a computational standpoint. Instead, AlphaGo was trained how to play the game by taking moves played by human experts in 30 million Go games and feeding them into deep-learning neural networks.

Training the deep-learning networks needed can take a very long time, requiring vast amounts of data to be ingested and iterated over as the system gradually refines its model in order to achieve the best outcome.

However, more recently Google refined the training process with AlphaGo Zero, a system that played "completely random" games against itself, and then learnt from the results. At last year's prestigious Neural Information Processing Systems (NIPS) conference, Google DeepMind CEO Demis Hassabis revealed AlphaGo had also mastered the games of chess and shogi.

DeepMind continue to break new ground in the field of machine learning. In July 2018, DeepMind reported that its AI agents had taught themselves how to play the 1999 multiplayer 3D first-person shooter Quake III Arena, well enough to beat teams of human players. These agents learned how to play the game using no more information than the human players, with their only input being the pixels on the screen as they tried out random actions in game, and feedback on their performance during each game.

More recently DeepMind demonstrated an AI agent capable of superhuman performance across multiple classic Atari games, an improvement over earlier approaches where each AI agent could only perform well at a single game. DeepMind researchers say these general capabilities will be important if AI research is to tackle more complex real-world domains.

WHAT IS MACHINE LEARNING USED FOR?

Machine learning systems are used all around us, and are a cornerstone of the modern internet.

Machine-learning systems are used to recommend which product you might want to buy next on Amazon or video you want to may want to watch on Netflix.

Every Google search uses multiple machine-learning systems, to understand the language in your query through to personalizing your results, so fishing enthusiasts searching for "bass" aren't inundated with results about guitars. Similarly Gmail's spam and phishing-recognition systems use machine-learning trained models to keep your inbox clear of rogue messages.

One of the most obvious demonstrations of the power of machine learning are virtual assistants, such as Apple's Siri, Amazon's Alexa, the Google Assistant, and Microsoft Cortana.