While winning all the time is the most desirable state, it is not necessarily practical, or even preferable, especially when we are trying to learn something new.
In any new learning endeavour, there are bound to be things we will not fully understand at first. There will be many hidden nuances of the topic that are not visible on our initial perusal of the learning material. If a grading system accompanies the learning material, then after completing the material and taking the test we may find that we did not get a perfect score. While this is distasteful, it is actually a good thing, as we will see below.
Positive learning is the kind of learning we do when we get all the facts right on our graded tests. This gives us the illusion that we know all that there is to know about what we have been studying. Emotionally it is pleasant and encourages us to want to learn more because of the dopamine release we get in our brains from a rewarding experience.
We so enjoy positive learning that we have designed our AI systems to make most of their decisions by learning from positive examples. If we want to teach our AI systems about cats, we show them a lot of cats and hope that the training system will be able to learn every possible (or positive) thing about cats. While this is very desirable and easy on our minds when designing these systems, problems arise when the systems are exposed to adversarial examples, or when they misidentify some object even after being trained on lots of data and properly tested.
The same applies to humans. If we never fail enough to get a very solid taste of failure, we may still go out into the world and be great successes. But when we face adversarial situations for which our positive learning did not prepare us, our failures can be catastrophic, sometimes with very little hope of recovery.
When learning something new with a well-designed grading system, failure can result in a massive improvement in our learning outcomes if we understand how important failure is. When we select an option in a graded test that turns out to be wrong, there might be a small churn in our stomachs as our emotional systems react to the unpleasant scenario. But if we understand that the pain we feel when we fail is actually a very powerful emotional force that can be channelled, we can use the failure to learn the material very well and imprint the knowledge much more strongly in our brains.
Our brains are wired to remember negative experiences more than positive experiences because of our difficult evolutionary past on earth. It makes more sense to remember which part of the forest has a lot of sabre-toothed tigers than which part has more berries, because we can survive without berries for a long time, but encountering a sabre-toothed tiger may be the end of our existence.
Since negative emotions are stronger than positive ones, and our brains are biased toward remembering negative experiences, failing to select the correct option in a test might make us memorise the correct answer much better, if we can stomach repeating the test a few more times.
It is better to fail as much as possible in private so that we fail less in public. That is why heavily repeated practice is so essential to learning.
When it comes to designing an AI system, relying mostly on positive examples during training will lead to systems that are fragile and not very resilient to catastrophe. One might say that reinforcement learning involves penalizing systems for making wrong decisions, so that the systems eventually learn to avoid them. This doesn't fully map to the kind of negative learning I am talking about.
We humans learn by reinforcement too: we are rewarded for good actions and punished for bad ones. But this kind of reinforcement breeds a resistance towards failure. I don't know whether the machine equivalent, reinforcement learning, actually develops any real resistance towards wrong actions, or whether the system just becomes biased towards positive actions and can still fail catastrophically when presented with a new, unanticipated environment.
What I am trying to emphasize in this post is that if we train systems, human or AI, by exposing them to many negative examples, it becomes easier for them to start naturally recognizing the correct solutions.
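To make the idea a little more concrete, here is a minimal sketch, not a real AI system. It uses a classic perceptron, a learner that only changes its weights when it gets an answer wrong, so every update is literally driven by a failure. The data points, the "cat" / "not cat" labels and all the function names are hypothetical and chosen only for illustration. The sketch trains one model on positive examples alone and another on both positive and negative examples, then checks how each handles the non-cats.

```python
# Hypothetical toy data: 2-D points, label +1 = "cat", -1 = "not cat".
POSITIVES = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1)]
NEGATIVES = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((0.0, 1.0), -1)]


def score(w, b, x):
    """Linear score of point x under weights w and bias b."""
    return w[0] * x[0] + w[1] * x[1] + b


def predict(w, b, x):
    """Return +1 ("cat") for a positive score, otherwise -1 ("not cat")."""
    return 1 if score(w, b, x) > 0 else -1


def train_perceptron(examples, max_epochs=100):
    """Classic perceptron: the weights change only after a wrong answer."""
    w, b = [0.0, 0.0], 0.0
    mistakes_per_epoch = []
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if y * score(w, b, x) <= 0:  # a failure: learn from it
                w[0] += y * x[0]
                w[1] += y * x[1]
                b += y
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a full pass with no failures: stop
            break
    return w, b, mistakes_per_epoch


def errors(w, b, examples):
    """Count how many examples the model gets wrong."""
    return sum(1 for x, y in examples if predict(w, b, x) != y)


# Model A sees only cats; model B sees cats and non-cats.
w_pos, b_pos, _ = train_perceptron(POSITIVES)
w_all, b_all, history = train_perceptron(POSITIVES + NEGATIVES)

print("positive-only model, errors on non-cats:", errors(w_pos, b_pos, NEGATIVES))
print("model trained on both, errors on non-cats:", errors(w_all, b_all, NEGATIVES))
print("mistakes per epoch when training on both:", history)
```

On this toy data the positive-only model calls all three non-cats "cat", because nothing ever told it what a cat is not. The model that also saw negative examples fails four times in its first pass and then makes no mistakes in the second. This is only a toy on six hand-picked points, so it says nothing about how large real systems behave.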
For humans, we could encourage people to embrace failure and make our larger societies more tolerant of it. Encouraging people to fail fast, as the tech companies do, enables them to make their mistakes early and so avoid failure on the larger stages of life. Although there might be a cost to failing at first, after a while that cost drops rapidly, because through negative examples you eventually learn what to avoid, and then it becomes easier to make positive progress.
A society based on harsh punishment for failure will make people scared, and scared people avoid taking action for fear of failing, which can result in stagnation.
Failing fast in the early part of a project will lead to less failure in the future, as we rapidly learn what not to do and gain a deeper understanding of the nuances of the systems we are dealing with.
Avoiding failure will only shift it into the future. It is better to set out early to develop a very strong stomach for failure if you wish to succeed in major life endeavours.
Learning through negative examples in an AI system is something I plan to explore deeply, and when I do I will be sure to update everyone on my progress.