1- Yazd University
2- University of Kerman
                    
                    
Abstract:
                    
                    
In this paper, we present a novel continuous reinforcement learning approach. The proposed approach, called Fuzzy Least Squares Policy Iteration (FLSPI), combines Least Squares Policy Iteration (LSPI) with a zero-order Takagi-Sugeno fuzzy system. We define state-action basis functions based on the fuzzy system so that the conditions required by LSPI are satisfied. We prove that the difference between the exact state-action value function and the approximate state-action value function obtained by FLSPI is bounded. Simulation results show that FLSPI achieves higher learning speed and operation quality than two previous critic-only fuzzy reinforcement learning approaches, namely fuzzy Q-learning and fuzzy Sarsa learning. A further advantage of the approach is that it does not require a learning rate to be chosen.
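The abstract does not spell out how the state-action basis functions are built, so the following is only a minimal sketch of one plausible construction, not the authors' definition. It assumes a scalar state, triangular membership functions, a small discrete action set, and a block layout in which the normalized rule firing strengths are copied into the block of the chosen action. All function names and parameters (`centers`, `n_actions`, `gamma`) are hypothetical. The last function shows the standard LSTD-Q accumulation used inside LSPI, which involves no learning rate.

```python
# Illustrative sketch only: membership shapes, block layout, and all names
# are assumptions, not the construction from the paper.

def triangular_memberships(s, centers):
    """Membership of scalar state s in triangular fuzzy sets peaked at the
    sorted `centers`; the outermost sets have flat shoulders."""
    degrees = []
    for i, c in enumerate(centers):
        left = centers[i - 1] if i > 0 else None
        right = centers[i + 1] if i < len(centers) - 1 else None
        if s == c:
            mu = 1.0
        elif s < c:
            mu = 1.0 if left is None else max(0.0, (s - left) / (c - left))
        else:
            mu = 1.0 if right is None else max(0.0, (right - s) / (right - c))
        degrees.append(mu)
    return degrees


def normalized_firing(s, centers):
    """Normalized rule firing strengths (they sum to 1)."""
    mu = triangular_memberships(s, centers)
    total = sum(mu)
    return [m / total for m in mu]


def state_action_basis(s, a, centers, n_actions):
    """Basis vector phi(s, a): firing strengths placed in the block of action a."""
    n_rules = len(centers)
    phi = [0.0] * (n_rules * n_actions)
    for i, v in enumerate(normalized_firing(s, centers)):
        phi[a * n_rules + i] = v
    return phi


def q_value(w, s, a, centers, n_actions):
    """Linear approximation Q(s, a) = w . phi(s, a)."""
    phi = state_action_basis(s, a, centers, n_actions)
    return sum(wi * pi for wi, pi in zip(w, phi))


def greedy_action(w, s, centers, n_actions):
    """Action with the largest approximate Q-value (ties go to the lowest index)."""
    return max(range(n_actions), key=lambda a: q_value(w, s, a, centers, n_actions))


def lstdq_accumulate(A, b, sample, w, centers, n_actions, gamma):
    """Add one sample (s, a, r, s_next) to the LSTD-Q statistics in place:
    A += phi (phi - gamma * phi_next)^T and b += phi * r."""
    s, a, r, s_next = sample
    phi = state_action_basis(s, a, centers, n_actions)
    a_next = greedy_action(w, s_next, centers, n_actions)
    phi_next = state_action_basis(s_next, a_next, centers, n_actions)
    for i in range(len(phi)):
        for j in range(len(phi)):
            A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
        b[i] += phi[i] * r
```

In a full LSPI loop, the statistics would be accumulated over a batch of samples, the weight vector obtained by solving the linear system A w = b, and the two steps repeated until the weights stop changing. The linear solve is omitted here to keep the sketch short.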
                    
                    
                    
                    
                    
Type of Article: Research paper | Subject: Special | Received: 2014/05/3 | Accepted: 2014/08/30 | Published: 2014/12/11