Today I gave a presentation at ISDA’10 about the recently published paper “Intelligent OLCBP Agent Model for RTS Games”. Below is the presentation!
My goal in this post is to simplify machine learning as much as possible. I have condensed the essentials into question/answer form so that interested people can get the big picture quickly.
What is Machine Learning?
Machine learning is the study of computer algorithms that enable computers to learn. The learning is based on examples, direct experience, or instruction. In general, machine learning is about learning to do better in the future based on what was experienced in the past.
What’s its relation to Artificial Intelligence?
Machine learning is a core subarea of Artificial Intelligence (AI) because:
- It is very unlikely that we will be able to build any kind of intelligent system capable of any of the facilities that we associate with intelligence, such as language or vision, without using learning to get there. These tasks are otherwise simply too difficult to solve.
- We would not consider a system to be truly intelligent if it were incapable of learning since learning is at the core of intelligence.
Although a subarea of AI, machine learning also intersects broadly with other fields, especially statistics, but also mathematics, physics, theoretical computer science and more.
Examples of machine learning applications?
Optical Character Recognition (OCR) – Face Detection – Spam Filtering – Topic Spotting – Spoken Language Understanding – Medical Diagnosis – Customer Segmentation – Fraud Detection – Weather Prediction
What are the general approaches to machine learning?
Supervised learning simply makes the computer learn by showing it examples. An example of this could be telling the computer that a specific handwritten “Z” is really the letter “Z”. Afterward, when the computer is asked whether a letter is a “Z” or not, it can answer.
A supervised learning problem can be either classification or regression. In classification, we want to categorize objects into fixed categories. In regression, on the other hand, we are trying to predict a real value. For instance, we may wish to predict how much it will rain tomorrow.
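To make the classification setting concrete, here is a minimal sketch of a one-nearest-neighbour classifier, which labels a “letter” by its closest training example. All data and feature names here are invented for illustration; they are not from any real OCR system.

```python
import math

# Toy training set: each example is ((width_cm, height_cm), label).
# The feature values are made up purely for illustration.
training = [
    ((2.0, 5.0), "Z"),
    ((2.2, 5.1), "Z"),
    ((4.0, 1.0), "O"),
    ((4.1, 1.2), "O"),
]

def classify(features):
    """1-nearest-neighbour: return the label of the closest training example."""
    nearest = min(training, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

print(classify((2.1, 4.9)))  # → Z (closest to the "Z" examples)
```

The same skeleton turns into regression if, instead of returning a category label, the function returns a real value (for example, the average of the nearest neighbours’ numeric targets).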
Unsupervised learning simply makes the computer divide – on its own – a set of objects into a number of groups based on the differences between them. For example, if a group of fruits (cucumbers and tomatoes) is the set of objects introduced, the computer – based on differences in color, size and smell – will determine that certain objects (which it doesn’t know are named cucumbers) belong to one group, and other objects (which it doesn’t know are named tomatoes) belong to another group.
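The fruit example can be sketched as a tiny k-means clustering. Here I assume each object is described by just two invented features, (length_cm, redness), and no labels are given:

```python
# Unlabeled objects, each described by (length_cm, redness).
# Feature values are invented for illustration.
objects = [(15.0, 0.1), (16.0, 0.2), (14.0, 0.1),   # cucumber-like
           (6.0, 0.9), (7.0, 0.8), (6.5, 0.95)]     # tomato-like

def kmeans(points, k=2, iters=10):
    """Bare-bones k-means: repeatedly assign each point to the nearest
    centroid, then move each centroid to the mean of its group."""
    centroids = list(points[:k])          # naive init: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            groups[i].append(p)
        centroids = [(sum(p[0] for p in g) / len(g),
                      sum(p[1] for p in g) / len(g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return groups

groups = kmeans(objects)  # the long objects end up in one group, the red ones in the other
```

The algorithm never sees the words “cucumber” or “tomato”; it only discovers that the objects fall into two groups.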
Sometimes it’s not a single action (such as identifying the type of a fruit) that is important; what is important is the policy, that is, the sequence of actions that reaches the goal. There is no such thing as the best action in isolation; an action is good if it’s part of a good policy. A good example is game playing, where a single move by itself is not that important; it is the sequence of right moves that is good. This third setting is called reinforcement learning.
A simple example of a machine learning problem?
In Figure 1, supervised learning is demonstrated; notice that it consists of two phases (which could even be done at the same time):
- Training phase: the computer learns what the right answers are. As you can see, the computer learns by example that bats, leopards, zebras and mice are land mammals (+ve sign), while ants, dolphins, sea lions, sharks and chickens are not (−ve sign).
- Testing phase: the computer’s learning is evaluated. It is asked to state whether the tiger, tuna and platypus are land mammals or not.
Basic Definitions for a supervised learning classification problem
- An example (sometimes also called an instance) is the object that is being classified. For instance, in OCR, the images are the examples.
- An example is described by a set of attributes, also known as features or variables. For instance, in medical diagnosis, a patient might be described by attributes such as gender, age, weight, blood pressure, body temperature, etc.
- The label is the category that we are trying to predict. For instance, in OCR, the labels are the possible letters or digits being represented. During training, the learning algorithm is supplied with labeled examples, while during testing, only unlabeled examples are provided.
- The rule used for mapping from an example to a label is called a concept.
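One way to picture these definitions in code; the attribute names and the toy “fever rule” below are purely illustrative (and certainly not medical advice):

```python
# An example = its attributes; a labeled example = (attributes, label).
patient = {"gender": "F", "age": 54, "body_temp_c": 38.2}

# Training data is a list of labeled examples ...
training_set = [
    ({"gender": "F", "age": 54, "body_temp_c": 38.2}, "flu"),
    ({"gender": "M", "age": 30, "body_temp_c": 36.8}, "healthy"),
]

# ... while test data contains unlabeled examples only.
test_set = [
    {"gender": "M", "age": 41, "body_temp_c": 37.9},
]

# A "concept" is then any rule mapping an example to a label,
# e.g. this trivial threshold on body temperature:
def fever_rule(example):
    return "flu" if example["body_temp_c"] >= 38.0 else "healthy"
```

Learning, in these terms, is searching for a concept that agrees well with the labeled training examples.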
3 conditions for learning to succeed
There are 3 conditions that must be met for learning to succeed.
- We need enough data
- We need to find a rule (concept) that makes a low number of mistakes on the training data.
- We need that rule to be as simple as possible.
Note that the last two requirements are typically in conflict with one another: we sometimes can only find a rule that makes a low number of mistakes by choosing a rule that is more complex, and conversely, choosing a simple rule can sometimes come at the cost of allowing more mistakes on the training data. Finding the right balance is perhaps the most central problem of machine learning. The notion that simple rules should be preferred is often referred to as “Occam’s razor.”
References:
- Rob Schapire, COS 511: Theoretical Machine Learning, Lecture 1.
- Ethem Alpaydin, Introduction to Machine Learning, 2004.
Introduction to the introduction
Greetings! Today I talk about the AI topics I’m most concerned with nowadays: RL, CBR, hybrid CBR/RL techniques, and the comparison of CBR and RL techniques. As far as I know, the hybridization of CBR with RL is a relatively recent research topic (since 2005). However, I have never seen any document comparing RL to CBR (I’d love to see one).
I will give a slight introduction today.
Case-based reasoning
Case-based reasoning (CBR) is a machine learning technique that aims to solve a new problem (case) using a database (case-base) of old problems (cases). So, it depends on past experience.
- NB: for readers unfamiliar with the above two techniques, please read more about them before proceeding.
Can we compare CBR to RL?
I think we can. Some people say: “RL is a problem, but CBR is a solution to a problem, so how can you compare them?” I answer: “When I say RL, I implicitly mean RL techniques such as TD-learning and Q-learning.” CBR solves the learning problem by depending on past experience, while RL solves it by trial and error.
How are CBR and RL similar anyway?
- In CBR we have the case-base; in RL we have the state-action space. Each case consists of a problem and its solution; in RL, the action can likewise be considered the solution to the current state.
- In CBR there can be a value for each case that measures that case’s performance; in RL, each state-action pair has its value.
- In RL, rewards from the environment are the way the state-action pairs’ values are updated. In CBR there are no rewards, but after a case is applied and tested, a revision process updates its performance value.
- Retrieving a case in CBR corresponds to what RL calls the policy for choosing actions from the action space.
- In CBR, adaptation is performed after retrieval; in RL no adaptation is performed, because the action space already contains ALL the possible solutions.
There are more parallels, but those are enough.
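The parallels above can be sketched side by side. All names and values below are invented for illustration; they are not taken from any cited paper:

```python
# CBR side: a case-base maps problems to (solution, performance-value) pairs.
case_base = {
    "enemy_rush":  ("build_defense", 0.8),
    "low_economy": ("expand_base",   0.6),
}

def retrieve(problem):
    """CBR retrieval: trivially an exact-match lookup here; real systems
    use a similarity measure over problem features."""
    return case_base[problem][0]

# RL side: a Q-table maps (state, action) pairs to values.
q_table = {
    ("enemy_rush", "build_defense"): 0.8,
    ("enemy_rush", "attack"):        0.2,
}

def greedy_policy(state, actions):
    """The RL counterpart of retrieval: a greedy policy picks the
    action with the highest value for the current state."""
    return max(actions, key=lambda a: q_table[(state, a)])
```

The case-base and the Q-table play structurally similar roles: both map a situation to a preferred response plus a measure of how well that response has worked.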
So are CBR and RL techniques the same thing?
Of course not; there are many differences between them. Most fundamentally, CBR solves the learning problem by depending on past experience, while RL solves it by trial and error.
What is the significance of hybrid CBR/RL Techniques?
It’s as if we make the computer learn both by trial and error and from past experience. Of course, this often leads to better results; we can say the two complement each other.
How is CBR hybridized with RL and vice versa?
- CBR needs RL techniques in the revision phase. Q-learning is used in the paper “Transfer Learning in Real-Time Strategy Games Using Hybrid CBR/RL” (2007); another example is the master’s thesis “A CBR/RL system for learning micromanagement in real-time strategy games” (2009).
- RL needs CBR for function approximation; an example is the paper “CBR for State Value Function Approximation in Reinforcement Learning” (2005).
- RL needs CBR for learning a continuous action model (also a form of approximation); an example is “Learning Continuous Action Models in a Real-Time Strategy Environment” (2008).
- Heuristically accelerated RL: one of the heuristics here is the case-base; an example is “Improving Reinforcement Learning by using Case Based Heuristics” (2009).
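As a rough sketch of the first hybridization above (RL-style revision of a case’s performance value), one can imagine a Q-learning-like update rule; all names and numbers here are invented for illustration:

```python
alpha = 0.3  # learning rate (illustrative value)

# A one-case case-base: each case stores its solution and a performance value.
case_base = {"enemy_rush": {"solution": "build_defense", "value": 0.5}}

def revise(case_name, observed_reward):
    """RL-style revision: nudge the case's performance value toward the
    reward observed after applying its solution."""
    case = case_base[case_name]
    case["value"] += alpha * (observed_reward - case["value"])

revise("enemy_rush", 1.0)   # the solution worked well this time
```

After the call, the case’s value has moved from 0.5 toward 1.0, so a well-performing case becomes more likely to be retrieved again; repeated disappointing outcomes would push it down instead.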
So the questions here are …
- When to use Reinforcement Learning and when to use Case-Based Reasoning?
- Are there cases where applying one of them is completely ruled out from a technical point of view?
- If we are dealing with an infinite number of state-action pairs, what is the preferred solution? Could both techniques serve well separately?
- How can we make the best use of Hybrid CBR/RL approach to tackle the problem?
I’ll do my best to answer them in future post(s). However, I’d be delighted if an expert could answer now.
Bootstrapping: an expression I find extremely interesting. Bootstrapping originally refers to the impossible act of lifting oneself by one’s own bootstraps (the one shown in the figure). Bootstrapping – also named booting – has been used for a wide range of terms in science in general, and computer science especially.
Examples of applying bootstrapping:
In Business, Bootstrapping is starting a business without external help or capital.
In Statistics, Bootstrapping is a resampling technique used to obtain estimates of summary statistics.
In Computing in general, Bootstrapping is the process by which a simple computer system activates a more complicated one.
In Compilers, Bootstrapping is writing a compiler for a programming language using that same language to code the compiler.
In Networks, A Bootstrapping Node is a network node that helps newly joining nodes successfully join a P2P network.
In linguistics, Bootstrapping is a theory of language acquisition.
Bootstrapping and AI …
Bootstrapping in AI is using a weak learning method to provide the starting information for a stronger learning method. For example, consider a classifier that classifies a set of samples. It uses clustering (“weak” unsupervised learning) to estimate the cluster of each sample, then treats each sample’s estimated cluster as its REAL class in the next, “stronger” supervised learning step, which can finally achieve high performance.
So as you see, the classifier has built estimates based on its OWN estimates; it has predicted new things based on its OWN previous predictions.
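A minimal sketch of this bootstrapping loop, with invented one-dimensional data: a weak unsupervised split produces pseudo-labels, and a supervised threshold classifier is then trained on them as if they were real labels.

```python
# Unlabeled samples forming two obvious groups (values invented).
samples = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]

# Step 1: "weak" clustering -- split the samples around the overall mean.
mean = sum(samples) / len(samples)
pseudo_labels = [0 if x < mean else 1 for x in samples]

# Step 2: "stronger" supervised learning -- fit a threshold classifier,
# treating the pseudo-labels as if they were ground truth.
lo = [x for x, y in zip(samples, pseudo_labels) if y == 0]
hi = [x for x, y in zip(samples, pseudo_labels) if y == 1]
threshold = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2

def classify(x):
    """The final classifier, trained entirely on the model's own guesses."""
    return 0 if x < threshold else 1
```

The final classifier was trained without a single true label; every “label” it saw was one of its own earlier estimates.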
Bootstrapping and Reinforcement Learning
Since reinforcement learning is based on dynamic programming, bootstrapping plays an important role here too.
Reinforcement learning methods are mainly classified into two practical families of techniques: Monte Carlo methods and temporal-difference methods.
In Monte Carlo methods, the agent doesn’t receive any reward for its actions until after its goal is achieved. So no bootstrapping occurs here, because the reward is a REAL, assured reward.
In temporal-difference methods, on the contrary, the agent receives a reward signal after every action it takes, by estimating whether that action has brought the goal closer or not. These “estimated” rewards affect its future actions. Thus an agent using temporal-difference learning bootstraps all the way until it achieves its desired goal.
In The End …
Bootstrapping is vital to Machine Learning because it increases the speed of learning and makes machine learning resemble human learning.
This was just an introduction; maybe we’ll talk about it more later. But for now we have more interesting stuff to talk about 😉