Reinforcement Learning: A review

Moussaoui, Hanae; El akkad, Nabil; Benslimane, Mohamed

doi:http://dx.doi.org/10.12785/ijcds/1301118

dc.contributor.author	Moussaoui, Hanae
dc.contributor.author	El akkad, Nabil
dc.contributor.author	Benslimane, Mohamed
dc.date.accessioned	2023-05-07T08:51:33Z
dc.date.available	2023-05-07T08:51:33Z
dc.date.issued	2023-05-07
dc.identifier.issn	2210-142X
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/4943
dc.description.abstract	Reinforcement learning is a branch of machine learning in which an agent learns to solve problems by trial and error. The agent interacts with an environment and attempts to achieve a multi-step goal within it. Consider a self-driving car on real roads whose goal is to take its owner from point A to point B while avoiding obstacles. The environment is characterized by a state that the agent observes; here the state might include the car's location, the condition of the road, and the positions of other vehicles. The agent's actions in turn change the state of the environment. As the agent moves closer to its goal, it receives reward signals, which it uses to determine which actions were successful and which were not. This state-action-reward loop is repeated until the agent learns to operate effectively within the environment. The agent's main objective is to learn to choose, in any state of the environment, the action that leads it closer to its goal. In this paper, we review the methods used in the literature to solve reinforcement learning problems, among them multi-armed bandits, the Markov decision process, dynamic programming, Monte Carlo methods, and temporal-difference learning.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	Reinforcement learning; multi-armed bandits; Markov decision process; dynamic programming; Monte Carlo methods; Deep reinforcement learning	en_US
dc.title	Reinforcement Learning: A review	en_US
dc.identifier.doi	http://dx.doi.org/10.12785/ijcds/1301118	en
dc.volume	13	en_US
dc.issue	1	en_US
dc.pagestart	1	en_US
dc.pageend	1	en_US
dc.contributor.authorcountry	Morocco	en_US
dc.contributor.authoraffiliation	Sidi Mohamed Ben Abdellah University Fez - ENSA	en_US
dc.contributor.authoraffiliation	ENSA of Fez, Sidi Mohamed Ben Abdellah University	en_US
dc.contributor.authoraffiliation	EST of Fez, Sidi Mohamed Ben Abdellah University	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US
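
The state-action-reward loop described in the abstract can be illustrated with a minimal sketch. It uses tabular Q-learning, one temporal-difference method, chosen here only for illustration; the abstract does not single out this algorithm. The corridor environment, the function names train_corridor and greedy_policy, and all parameter values are assumptions made for the example and do not come from the paper.

```python
import random


def train_corridor(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. The agent starts in state 0 and the goal is
    the last state. Actions: 0 = step left, 1 = step right. The reward is
    1.0 on the step that reaches the goal and 0.0 otherwise.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[state][action] holds the current estimate of the action value.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: explore with probability epsilon (or when the
            # two estimates are tied), otherwise take the greedy action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The environment changes state in response to the action.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Temporal-difference update toward reward + discounted best
            # value of the next state (the goal row stays at zero).
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-goal state (ties go right)."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    q_table = train_corridor()
    print(greedy_policy(q_table))
```

Each pass through the inner loop is one round of observing a state, acting, receiving a reward, and updating the estimate. In this toy setting the learned values come to favor stepping right in every state, which is the action that moves the agent closer to the goal.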