Moving onto Variable Schedules of Reinforcement
Following on from the previous post on ‘Maintaining Behaviours’, I am delighted to be able to share a brilliant post from Katie Bartlett, a very experienced US clicker trainer, which she posted on Alexandra Kurland’s The Click That Teaches group outlining some solutions to try if we find ourselves stuck in a Continuous Reinforcement pattern.
Katherine (Katie) Bartlett has been involved with horses for 40 years, starting in the traditional world of hunter/jumper, eventing and then moving on to dressage. She attended Cornell University where she studied biology and had an interest in animal behavior, evolution, and ecology. After school she spent time working in biology and computers and then took time off to have a family. While staying home with her kids and horses, she discovered clicker training by reading Alexandra Kurland’s first book, Clicker Training for Your Horse. She thought it sounded interesting, started using clicker training with her horses, and got hooked.
In 2001 she attended her first clinic with Alexandra Kurland and has been a clinic regular ever since. Since that first clinic she has clicker trained all her own horses (8), done some local teaching, and become a regular on the internet groups discussing clicker training. Her main riding horse is a Dutch warmblood mare named Rosie who came to her with aggression issues and is now learning dressage. Katie continues to work with Alexandra Kurland, but is also working with local classical dressage trainers and integrating the clicker and balance work into a more traditional approach to dressage. Her website is www.equineclickertraining.com which is an educational site to promote and provide information about clicker training.
(Published with Katie Bartlett’s kind permission)
This is a good question and to be honest, it’s one that I don’t find is addressed often enough. I read plenty of things that say you just click and treat less, or you don’t need to click and treat once the behavior is learned. But I don’t find much actual information on the process of taking a finished behavior from a continuous reinforcement schedule to one where the behavior is maintained with fewer clicks and treats.
Using a variable reinforcement schedule comes up a lot and is used as an example of why you can maintain behavior with less reinforcement, but I clearly remember someone (I think at Clicker Expo) saying that while variable reinforcement can make behavior more resistant to extinction, it is not necessarily pleasant for the learner. People often use slot machines as an example of why variable reinforcement works, but this speaker pointed out that if you wander around in a casino, you see a lot of people on variable reinforcement schedules who are not happy.
I’ve thought about this a lot recently because one of the areas where my training could use improvement is how to get more behavior for each click and treat or how to maintain everyday behaviors without clicking and treating each time. Like you, for a long time I was happy to click and treat for lots of things and I didn’t mind clicking and treating for known behaviors even after the horse clearly knew them. But for various reasons, I am now looking at ways that I can minimize the clicking and treating for some behaviors.
I was thinking about this when I redid my website this winter and I added an article that outlines the steps of clicker training a behavior from start to finish. If you want to read that, you can find it on the articles page. There is a little bit at the end about how to maintain behaviors, but I really should write a longer piece on it once I have played around with it a bit more.
As of now, if someone asks about this, I would agree with you that it is a very individual thing. On the most basic level, if you want to maintain a behavior long term, there has to be some kind of reinforcement. And we have to remember that there are always competing reinforcers in the environment. As a trainer, my goal is to make sure there is enough reinforcement for the behavior I want, so that the horse doesn’t decide it is more reinforcing to do something else. I also have to decide how important the behavior is, and how precisely I need to maintain it.
I’ve done some experimenting and here are some bits and pieces about what I have done. I tend to be a “plan” person so I usually work through things in a systematic way, but I do know people who just decided to click and treat less and the horse figures it out and everything is ok. I find it is hard to make myself do that, as I seem to be either a softie or not have as much self-control about clicking less as I would like. In my barn, it is sometimes debatable who is training who, but that’s a whole other subject!
Option 1: Add variability by moving the click around
I have a behavior that I have shaped and I no longer want to click and treat each effort, but it is a “one time” behavior meaning I only do it once in each session so I can’t just click and treat some tries and not others. Haltering would fall into this category. I found it was easy to end up clicking and treating each time I haltered the horse even after the horse was good at it. And, as you said, the horses got so they learned to expect a click and treat every time they were haltered.
Once I fall into a pattern like this, I have found that the best way to change the expectation is to just add some element of variability back in. I can move the click around a bit, so I don’t always click at the same step. If I was clicking for nose in the noseband loop, I might now click for the halter going over the ears. Or I might click for standing for a moment after the halter is on, or add in another step like ask for head down after the halter is on. I just want to break the pattern up so there is still reinforcement, but it is not always at the same place. I find that once I move the click around enough, I then have more options about when and how to reinforce the behavior.
If I am haltering the horse to lead it somewhere, I have had good luck haltering and asking for one step and then clicking. The horses are always surprised at first, but it is really no big deal since they know the behavior that comes next (walking with me) and I click and treat right after they move, so they adjust to this quickly. Then I can play with moving that click around until they are not getting clicked and reinforced until I ask for something else. I might insert a stop or some other simple behavior that I can click and treat. If I am bringing them in or out and it’s something they want to do, often I can just stop clicking and treating entirely. That works because they are still getting reinforced for the behavior since they want to go where I am taking them and they know putting the halter on is the first step.
I think there are a lot of behaviors where you can decrease how much you click and treat by using strategies like this, if you want to. I do want to point out that there are some “one time” behaviors that I do always click and treat. Bridling, standing at the mounting block, and coming when I call are some of them.
Option 2: Add variability by changing how much you ask for
This is similar to option 1 in that I am going to start changing the pattern so that I can remove the expectation of the click at a specific point in time, but I do it a bit differently. The idea here is to make it so that I can ask for varying amounts of behavior for each click, and to shift the horse in the direction of expecting to do more. Then if I ask for much less and don’t click and treat, the horse is ok with it because his reinforcement is that he had to do less behavior.
It is easier to explain with an example. I taught my horses to let me pick up their feet and clean them out with clicker training. I got to the point where I could pick up and clean out each foot and I would click and treat at the end of each one. But then I got to the point where I wanted to only click at the end, or only click random feet, or some variation. I have lots of options for working through this, but first I have to decide how much and when I want to reinforce. What I decided was that I really shouldn’t need to click and treat for picking up each foot, especially if there was nothing in it. I don’t mind clicking and treating some, but not every foot every time.
So I played around with how much I expected of the horse when I picked up a foot. Instead of just picking it up and cleaning it out, I would hold it for longer, or move it around or brush it off. The idea was to make it so the horse expected that when I picked up his foot, I would be holding it for a while and doing different things. I got the horses used to me holding each foot up for longer periods of time and then I would sometimes throw in an “easy” one where I just picked it up and put it back down. When I did that, the horses got verbal praise only for their reinforcement. I have conditioned verbal praise as a reinforcer, but it is lower value than food. But they were totally ok with it, because in their minds (ok I’m guessing here), they had been expecting to have to do more and they got to put their foot down early.
Option 3: Create chains
You can be really systematic about getting more behavior by creating chains. A chain is a series of behaviors you do in a predictable order. You can build formal chains by asking the horse to do a number of behaviors in a certain order before clicking and treating. The horse learns that each behavior gets him closer to the final behavior, which is then rewarded.
Chains can be built forward or backward, whichever makes more sense for the situation, but the general idea is to get the horse used to doing a few behaviors before each click. I find that people fall into habits as easily as horses, so I have to make a conscious effort to not click at times. If I am not paying attention it is easier to just keep clicking and treating in the same way without moving the training along, so if you have to keep reminding yourself to ask for more before you click, you are not alone.
Remember that when you build chains, your cues should act as reinforcers so that the horse is getting reinforced for each behavior by the chance to do the next one, just not with the click and treat directly. Chains can fall apart for lots of different reasons but the first thing to check is the strength of each link. Every behavior has to be strong before you put the chain together.
Creating chains is another way to handle foot care. I always pick out my horses’ feet in the same order so I can chain (or backchain) those behaviors and only reinforce at the end of the chain. If you are not familiar with chains or backchaining, there are articles on my website about chains and loopy training (which uses many of the same ideas).
Option 4: Create other reinforcers to add variety in your training
The options above have similar components in that they are about changing the horse’s expectations and playing around with reinforcement, but within the structure of the training session. When I talk about creating other reinforcers here, I am referring specifically to setting aside some time to condition other reinforcers that you can then mix into your training.
Just to review, in option 1, I said you can often reinforce haltering by taking the horse some place it wants to go. So a horse that wants to come in can learn that the reinforcement for haltering is going in. In option 2, I made the behavior harder (more complicated) so that an easier repetition was a reward in itself because it required less effort than other repetitions. If you are prepared to run a mile and you get to stop after 0.25 miles for running well, running well is being reinforced by getting a break or less work (however you care to think about it.) In option 3, I showed how to use chains to get more behavior. Each behavior in the chain is reinforced by the chance to do the next behavior which gets the horse closer to the reinforcement at the end of the chain.
In addition to these options, I can specifically condition secondary reinforcers. I can teach my horse to accept other things for reinforcement (scratching, praise, etc…) by pairing them with food (classical conditioning) and then use them to reinforce known behaviors. Once a horse knows a behavior and it is easy for them, they will often accept other reinforcers some of the time.
Sometimes I can take advantage of reinforcers that happen to be present in certain situations. In the spring, we have a lot of gnats on our property and the horses’ ears get very itchy. My horses have learned that if they stand quietly for haltering, I will rub the insides of their ears. I might start out by itching their ears and then offering a treat too, but during gnat season the scratching is often more reinforcing than food so I can fade the food out quite quickly.
I am always on the lookout for itchy spots or indications that my horses find some behaviors reinforcing, which I can then use as reinforcement. I do want to point out that it’s important to be observant about how your horse responds to these alternate reinforcers because what a horse finds reinforcing can change from day to day. However, by mixing in some of them, I can add some variety to our routine and this makes the horses more flexible about reinforcement in general.
Elverson, Pa., USA
Please do not copy or duplicate without specific permissions from Katie Bartlett