Correct Answer

Solution of the value determination equations for the initial policy R1:

  g(R1) = -2.67
  v0(R1) = 2.667
  v1(R1) = 0

Policy improvement:

State 0:
  Decision 1: -2 + 0.75(2.667) + 0.25(0) - (2.667) = -2.67
  Decision 2: -1 + 0.50(2.667) + 0.50(0) - (2.667) = -2.33

State 1:
  Decision 1: -4 + 0.50(2.667) + 0.50(0) - (0) = -2.67
  Decision 2: -3 + 0.25(2.667) + 0.75(0) - (0) = -2.33

The minimum for both states is achieved by decision 1 (don't advertise). Since this policy is identical to the preceding policy (the initial policy), it must be an optimal policy.

Optimal policy:
  d0(R2) = 1
  d1(R2) = 1
  with g(R1) = -2.67, v0(R1) = 2.667, v1(R1) = 0