Combining positive and negative reinforcement
Can I combine positive reinforcement with negative reinforcement?
By Catherine Bell, The Equine Independent
“Positive reinforcement (+R), particularly when used in conjunction with clicker training, is commonly combined with the use of negative reinforcement (-R) and/or punishment. Typically the aversive stimuli (i.e. the pressure applied) in these cases will be mild and the combined approach is used to clarify and/or hasten the training. Is there anything wrong with this? Are those of us who would say “yes” just being dogmatic and purist in our approach to positive reinforcement? Or do we all need to take a step back and think more carefully about just how positive our positive training actually is?
Firstly, I still don’t know of anyone who uses only +R all the time with all their horses, and I don’t believe it is possible (or useful). But I do believe it is possible, and extremely valuable in some cases, to have discrete sessions in which only +R is used – i.e. free shaping. For some horses, in some stages of their lives, I would say free shaping should make up most of the interaction they have with humans. But that depends on the horse and the stage it is at. More generally, outside those specific free-shaping sessions, the vast majority of emotionally well-placed horses will suffer no ill consequence from the occasional mild aversive stimulus. A gentle pull on the reins to stop or to raise the horse’s head from the grass will not cause psychological trauma to the well-adjusted individual.
But if you are going to use – within the same session and/or to achieve the same behaviour – a combination of +R and -R then various things can happen. This isn’t only because of bad training but also because of what is going on in the horse’s brain at the time.
The first reason is practical – if the horse is experiencing two different reinforcers pretty much simultaneously then the horse is going to be reinforced more by one of them than the other. This is known as “saliency” and is effectively the relative value of the reinforcers from the perspective of the horse. Does he find more value in the release of pressure or the reward? They are unlikely to be identical in value. The presence of the click and treat may well help the horse’s understanding along and confirm to him that he is performing the correct behaviour, but that is not the same thing as true positive reinforcement. The horse may well still be changing his behaviour because he is searching for the release of pressure, not because he is actively trying to earn a reward. The presence of rewards does not make your training positive; it is all down to the horse’s perception of the training and the reasons why he chooses to change his behaviour.
Another objection I have to the combination of positive and negative reinforcement is the issue of what Karen Pryor termed “The Poisoned Cue”. Due to classical (i.e. Pavlovian) conditioning, if you are using pressure then the level of pressure the horse feels in its training will become associated with you and your training equipment/environment. It’s a bit like receiving a phone call from someone you don’t want to speak to: you start dreading the phone ringing. So if you combine the pressure with some form of positive reinforcement, the positive reinforcement will be diminished in value (like getting a pay cheque, knowing that it’s all going to go straight out again on bills), possibly to the point of being irrelevant. While you could argue that some +R is better than nothing (in fact I *did* use to argue that), I have also seen a demonstration by someone combining clicker training (CT) with a well-known pressure-based training method and it was really, really awful. More on that in a moment…
If an animal is experiencing genuine positive reinforcement then it is believed from neuroscience studies that a particular region of the brain is activated and dopamine is released. This is the neurotransmitter associated with feeling good when something good happens. Over time, this dopamine release can take place even in the absence of an actual reward. So if we do lots of reward-based training and trigger dopamine, then even just our arrival at the field can do the same, whether or not we have treats. It’s not just about the horse wanting us for our treats. We make the horse feel good. This is the neurological basis for the Pavlov’s dogs result. We feel genuinely pleased when our payslip arrives, because of what it represents, even though it’s only actually a worthless piece of paper.
If we do pressure-based training or even just “neutral” training then there is no dopamine released, even when you release the pressure. A different brain circuit is stimulated and, depending on how much pressure you use, there may be an adrenalin release, i.e. a stress response.
If we mix the two whilst training the same behaviour then the dopamine response is likely to be over-ridden by the adrenalin. Even if you normally do -R (depending on the degree of pressure – either physical or emotional) and decide to have an occasional pure +R session, you may still not be getting the dopamine release because of what you normally represent to your horse. So the best-case scenario may well be that you are not positively reinforcing your horse at all. You might be giving it treats but that is not the same thing as the horse FEELING positively reinforced. That’s not to say this is necessarily bad, and it may help your training along a bit if your timing is good, but it makes sense to be doing what you think you are doing and not complicating the session with red herrings.
The use of +R can encourage a horse to offer behaviours in the attempt to earn a reward and this puts the horse in a very emotionally vulnerable position (which is why a proper +R free-shaping session will reassure the horse that it is ok and that there is no negative consequence for a wrong answer). If pressure is likely to be used as well when the horse gets the wrong behaviour then it can create a major conflict in the horse’s mind, increasing the stress yet further. If a lot of pressure is being used then the best thing for the horse to do is just do as he’s told so as to avoid the pressure. If he is being encouraged to offer behaviours spontaneously as well then it puts the horse in a very difficult position. It’s like when you’re at school and you have to summon up the courage to speak in front of the class and then the teacher tells you you’re stupid. This isn’t just “bad training”; it can also be technically good training done in a very unempathic way, and it is something I have seen from various trainers who (perhaps inadvertently) prioritise the achievement of certain behaviours above the feelings of the horse. The horse that stands out in particular was being trained with a combination of a Natural Horsemanship method and CT. The pressure was all at a relatively low level but that didn’t stop the horse being very stressed about what he was being expected to do. He clearly knew the cost of getting a wrong answer but was unable to just switch off and respond to cues because the CT element demanded that he offer behaviours. The attitude of a horse under this sort of conflict and that of a horse having a true free-shaping session are such worlds apart that it’s very hard to do justice to the difference on a keyboard…
There is nothing wrong with doing low-pressure or neutral work; no-one is living in a state of constant dopamine fix! But if you never receive it you are unlikely to be in a very emotionally developed place. In humans we call it “depression”. The horse is not likely to be making psychologically healthy choices and enjoying his work, merely responding to cues and trying to keep out of trouble. The ideal is that the horse is engaging his brain and thinking “how about if I try a step backwards”, rather than “I need to move away from pressure” – free-shaping is often very much about “brain exercises” rather than physical training. There are, of course, caveats to these generalisations that can be made in individual cases. When I clicker trained my horse to walk backwards, I did start the training by “cheating” and using a light hand pressure on his chest, and so negative reinforcement was involved to help him understand the behaviour I wanted. But once he understood the right behaviour, he started to offer it spontaneously and any residual association with the pressure was clearly counter-conditioned by the on-going purely positive free-shaping. It is better if you can avoid this sort of short-cut by correct shaping, but if the alternative is a horse who is likely to become frustrated by not understanding the right behaviour then it may be appropriate – feel and judgement are always crucial.
My personal preference for a horse in an emotionally “good” place is to have some pure +R free-shaping sessions interspersed with just “normal” -R. For dealing with specific problems I would take a step back and devise a shaping plan with tiny steps so that each step gives the opportunity for reward and positive associations with the task. For horses in an emotionally difficult place, I would say many more free-shaping sessions are necessary before the horse is ready for -R, and these sessions may need to be spread out over a long period of time. It is time well spent and will create the foundations for a much more successful horse-human relationship.”