Slot Machines Use A Variable Ratio Because ____

Variable-Ratio (The Slot Machine)
A variable-ratio schedule rewards a particular behavior, but does so unpredictably. The reinforcement may come after the 1st lever press or the 15th, and may then follow immediately on the next press or not arrive for another 10 presses.

Roulette and slots cost the player more, with house advantages of 5.3% for double-zero roulette and 5% to 10% for slots, while the wheel of fortune feeds the casino nearly 20% of the wagers, and keno is a veritable casino cash cow with an average house advantage close to 30%.
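The 5.3% figure for double-zero roulette can be checked with a little arithmetic. The sketch below assumes a straight-up bet on a single number, a 38-pocket wheel, and a 35-to-1 payout; those specifics are standard for American roulette but are not stated in the text above, and the function name is purely illustrative.

```python
from fractions import Fraction


def house_edge(pockets: int, payout_to_one: int) -> Fraction:
    """House edge on a single-number bet, as a fraction of the amount wagered."""
    p_win = Fraction(1, pockets)
    # Player's expected return per unit staked:
    # win `payout_to_one` units with probability p_win, lose 1 unit otherwise.
    expected_return = p_win * payout_to_one - (1 - p_win)
    return -expected_return


# Assumed: 38 pockets (1-36, 0, 00) and a 35-to-1 payout.
edge = house_edge(pockets=38, payout_to_one=35)
print(edge, f"{float(edge):.1%}")  # 1/19 5.3%
```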

Positive reinforcement, using food rewards to increase the likelihood a dog will repeat a desirable behavior, is widely regarded as the most reliable method for teaching commands. While the basic concepts of reward-based training are easy to understand, people sometimes inadvertently inhibit progress by using too many (or too few) treats.

Let’s say you’re in Las Vegas playing a slot machine, but every time you deposit a quarter and pull the arm, you get your one quarter in return. This wouldn’t keep your attention for long, and you’d probably opt for a different machine.

Now, what if you started feeding your hard-earned quarters into the next machine, but for hours on end got none back? Chances are you’d become equally frustrated and end your short gambling career.

Applied to dog training, both of these extremes—continuous reinforcement or none at all—can lead to lower command compliance.

“My dog will only sit if I have a treat.” Over the years, I have heard this refrain many times, and it almost always indicates that the dog was rewarded with treats for sitting on cue too often and for too long. Essentially, the dog had learned two things had to be true for him to comply: the sit cue plus a treat. If either were not true, he’d find something more interesting to do.

When initially teaching a new command, “Continuous Reinforcement” (CR in the geeky learning-theory world) is the most effective approach. For instance, when first teaching a puppy to sit, rewarding each successful completion (or “trial”) makes sense because your focus is on clearly pairing the verbal cue and hand gesture with the behavior: put the quarter in (your puppy sits on cue) and the reward appears (treat!).

But acting as your puppy’s loose slot machine for too long causes him to stop working so hard. Why bother sitting quickly, or at all, when a treat invariably appears? CR for too long also causes the dog to become dependent on the food reward: he will refuse to work unless food is presented. Before you get to that point, usually within a few days of teaching a new cue, it’s time to move to a less predictable reinforcement schedule.

Back to the gambling analogy. Once you’re sure your dog has a grasp on what you’re teaching him, it’s time to become a fair and honest slot machine, dispensing small food rewards less frequently for successful trials. (This is also a good time to find soft treats that won’t easily crumble to bits, and to always have a few hidden in your pocket.)

The psychology behind slots, which entice folks to pump coins into machines for hours on end, is that the probability of winning remains constant, even though the number of plays it takes to recoup your money or, better yet, hit the jackpot changes. The unpredictability makes doing the same mundane activity, over and over, interesting and exciting. You can take advantage of this same psychology to train your dog faster.

When teaching your dog a new command, once you’ve determined that he knows what you’re expecting from him, begin randomly rewarding successful trials using “Variable Ratio” (VR) reinforcement. Start with a low ratio, rewarding roughly one out of every three trials, then increase the ratio over the course of several training sessions.

For example, when teaching your puppy to sit, provide a small treat for (successful) trials 2, 7, 9, 15, 18, 19, 20, 23 and 25. Notice that during 25 trials, sometimes he gets three rewards in a row, but sometimes, there’s a longer lag between treats. The idea is to keep him guessing—and working!
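If you would rather not invent the pattern in your head, a few lines of Python can pick the rewarded trials for you. This is only a sketch: the function name and the seed are made up for illustration, and it treats each trial as an independent one-in-three chance (strictly a random-ratio approximation of a variable-ratio schedule), so the number of treats in any one session will vary.

```python
import random


def variable_ratio_schedule(trials, reward_ratio, seed=None):
    """Return the 1-based trial numbers that earn a treat.

    Each trial is rewarded independently with probability `reward_ratio`,
    so streaks and dry spells both happen, as in the example above.
    """
    rng = random.Random(seed)  # a seed makes the session plan repeatable
    return [t for t in range(1, trials + 1) if rng.random() < reward_ratio]


# Roughly one reward for every three successful trials, over 25 trials.
rewarded = variable_ratio_schedule(trials=25, reward_ratio=1 / 3, seed=7)
print(rewarded)
```

Run it again without a seed before each session so the dog never meets the same pattern twice.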

Over the course of twice-daily training sessions (two to five minutes each), increase the ratio until he is rewarded for roughly one out of every ten successful trials. The behavior should become a happy habit by then, although, to keep commands fresh, continue to occasionally reward your dog for life. In other words, don’t become the slot machine that never pays a jackpot!
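The move from one-in-three to one-in-ten can be planned in advance. The sketch below assumes eight sessions and an even, linear increase; neither assumption comes from the advice above, so adjust both to your dog's progress. The function name is hypothetical.

```python
def thinning_plan(sessions, start=3, end=10):
    """Return a 'reward one in N' value per session, rising from start to end."""
    if sessions < 2:
        return [end]
    step = (end - start) / (sessions - 1)  # even increase per session (assumed)
    return [round(start + i * step) for i in range(sessions)]


# Assumed: eight sessions, i.e. four days of twice-daily training.
print(thinning_plan(8))  # [3, 4, 5, 6, 7, 8, 9, 10]
```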

There are other types of reinforcement schedules too involved for our purposes here, but one to take advantage of is “Differential Reinforcement of Excellent Behavior” or DRE. This is just a fancy way of saying “better performance earns bigger rewards.” Once you’ve worked through Continuous Reinforcement (treating every time to teach the command) and Variable Ratio (treating randomly to hone the behavior), you can polish the command by handsomely rewarding only the best trials.

Let’s think about DRE in terms of teaching recalls. Once your dog is largely responding to your “come” command, and you’ve worked through Variable Ratio reinforcement by sometimes treating and sometimes not, start rewarding with higher-value treats, or more of what you have, only when your dog immediately and enthusiastically answers your call. If he stops and smells the roses (or whatever that was) en route, no reward is given.

Advancing through these levels is not rigid, and you may combine aspects of more than one as you progress. Be ready to back up a step if you’ve moved too fast—your dog will let you know!

Regulating opponents’ behavior
What follows is a special preview from Jeff’s book, Advanced Pot-Limit Omaha Volume II: LAG Play

Once you get to a certain point in your development as a poker player — you’ve learned hand valuations and acquired the necessary technical skills to play the game — the next big step to opening up your game is figuring out how to regulate your opponents’ behavior in such a way as to make them easier to play against. That is, the next step is founded in large part on psychology.

Enter variable-ratio reinforcement.

Variable-ratio reinforcement is generally defined as delivering reinforcement after a target behavior is exhibited a random number of times. Let’s take a slot machine, for example. A gambler sits down at a slot machine and bets $1 a pull. As you would expect, most of the time the gambler will bet $1 and lose, which of course is great for the casino. But if all the gambler does is bet $1 and lose every time, eventually he will quit or go broke, and never want to play again. So, every few spins, the slot machine will reward the gambler with a payoff: $1 here, $1 there; $5 here, $1 there.

Then, every once in a long while, the machine will reward the gambler with a big payoff in the form of a jackpot.

Now, none of this quite adds up, which is how the house wins in the long run. But the promise of the big payoff, along with the intermittent rewards, is generally enough for the casino to reinforce the target behavior, which is to have the gambler keep betting $1 a pull.

That brings us to our next topic, which is the reinforcement schedule.

Reinforcement Schedules:
Variable vs. Fixed
Two basic types of reinforcement schedules matter here: variable-ratio reinforcement schedules, and fixed-ratio reinforcement schedules.

Let’s start with the latter, which is the more basic of the two. A fixed-ratio reinforcement schedule is one in which reinforcement is delivered after a fixed number of responses. Let’s say, for example, that you are casino management, and you want the slot machine to pay out 20 percent of the time, or every fifth spin. So, the gambler will lose $1 four times in a row and get a payout on the fifth spin, every time.

The reinforcement schedule would look something like this:
Slot Machine: Fixed-Ratio Reinforcement Schedule
Lose Lose Lose Lose Win
Lose Lose Lose Lose Win
Lose Lose Lose Lose Win
Lose Lose Lose Lose Win
Lose Lose Lose Lose Win

Adjusted for payouts, the schedule might look more like this:
Slot Machine: Fixed-Ratio Reinforcement Schedule With Payouts
-$1 -$1 -$1 -$1 +$2
-$1 -$1 -$1 -$1 +$10
-$1 -$1 -$1 -$1 +$1
-$1 -$1 -$1 -$1 +$4
-$1 -$1 -$1 -$1 +$1

In this scenario, for every 25 spins, the gambler would win $18 on the five winning spins and lose $20 on the rest, for a net loss of $2. For the house, this represents a payout of 92 percent and a house edge of 8 percent.
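For readers who want to check the arithmetic, here is the same calculation as a short Python sketch. The payout amounts come straight from the table above; the variable names are just for illustration.

```python
payouts = [2, 10, 1, 4, 1]  # winnings on the five winning spins, from the table
spins = 25
stake = 1                   # $1 per pull

total_wagered = spins * stake                 # $25 put at risk
total_won = sum(payouts)                      # $18
total_lost = (spins - len(payouts)) * stake   # $20 on the losing spins
net = total_won - total_lost                  # gambler's net result

house_edge = -net / total_wagered             # share of wagers the house keeps
payout_rate = 1 - house_edge

print(net, f"{house_edge:.0%}", f"{payout_rate:.0%}")  # -2 8% 92%
```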

Now, all of this sounds great, but there is a major problem: Nobody would ever play a game with a payout (reinforcement) schedule like this one!

OK, “nobody” and “ever” might be a little strong, but the point remains: it wouldn’t take long for the gambler to figure out that this slot machine pays out on every fifth spin, and only on every fifth spin. As a result, he would quit playing.

Using a variable-ratio reinforcement schedule is the fix for this problem.

Variable-Ratio Reinforcement Schedule
A variable-ratio reinforcement schedule uses a predetermined ratio while delivering the reinforcement randomly. Going back to the slot machine, let’s say that you once again are casino management and want the slot machine to pay out 20 percent of the time, or every fifth time on average.

Now, your reinforcement schedule may look something like this:
Slot Machine: Variable-Ratio Reinforcement Schedule
Lose Lose Lose Lose Win
Lose Win Lose Lose Lose
Lose Lose Win Lose Lose
Win Lose Lose Lose Lose
Lose Lose Lose Win Lose

And adjusted for payouts, the schedule would look like this:
Slot Machine: Variable-Ratio Reinforcement Schedule With Payouts
-$1 -$1 -$1 -$1 +$2
-$1 +$10 -$1 -$1 -$1
-$1 -$1 +$1 -$1 -$1
+$4 -$1 -$1 -$1 -$1
-$1 -$1 -$1 +$1 -$1

In aggregate, the expectation is the same: Over 25 spins, the gambler will still realize a net $2 loss, for a 92 percent payout and an 8 percent house advantage for the casino. But in reality, this scenario is far more likely to achieve the desired result, which is to have the gambler keep playing. In contrast to the fixed-ratio schedule, a variable-ratio schedule with a 20 percent reinforcement ratio produces clusters of payouts (for example, back-to-back wins) and leaves no spins (or blocks of spins) on which the gambler can say for certain that he will lose, which is what would prompt him to quit.

This is because the variable-ratio reinforcement schedule does not specify when the payouts occur, but only how often they occur on average.
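A quick simulation makes the point concrete. The sketch below is an illustration under stated assumptions, not how any real slot machine is programmed: it treats every spin as an independent 20 percent chance of a win, and the function name and seed are made up.

```python
import random


def gaps_between_wins(spins, win_prob, seed=None):
    """Simulate spins and return how many spins it took to reach each win."""
    rng = random.Random(seed)
    gaps, since_last = [], 0
    for _ in range(spins):
        since_last += 1
        if rng.random() < win_prob:  # assumed: each spin is independent
            gaps.append(since_last)
            since_last = 0
    return gaps


gaps = gaps_between_wins(spins=10_000, win_prob=0.20, seed=1)
print(sum(gaps) / len(gaps))  # average gap; should land near 5
print(min(gaps), max(gaps))   # shortest and longest wait for a win
```

The average wait comes out close to the every-fifth-spin ratio, while individual waits range from back-to-back wins to long dry spells, so the gambler can never mark a given spin as a sure loser.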

That said, in regard to pot-limit Omaha, there is one major application for variable-ratio reinforcement that I will discuss another time. That application is the continuation-bet (c-bet).