Collaborative Multi-Agent Reinforcement Learning in Dynamic Environments using Knowledge Transfer for Herding Problem

Nikanjam, Amin; Abdoos, Monireh; Mahdavi Moghadam, Mahnoush

doi:10.52547/joc.14.4.55

Journal of Control, Volume 14, Issue 4 (Winter 1399), pages 55-66

DOR: 20.1001.1.20088345.1399.14.4.6.0


Nikanjam A, Abdoos M, Mahdavi Moghadam M. Collaborative Multi-Agent Reinforcement Learning in Dynamic Environments using Knowledge Transfer for Herding Problem. JoC 2021; 14 (4) :55-66
URL: http://joc.kntu.ac.ir/article-1-642-fa.html


Amin Nikanjam*¹, Monireh Abdoos², Mahnoush Mahdavi Moghadam¹

1- Artificial Intelligence Group, Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
2- Department of Artificial Intelligence, Robotics and Cognitive Computing, Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

Abstract:

Today, cooperative multi-agent systems, in which a group of agents work together toward a common goal, are used to solve many problems. Cooperation among agents brings benefits such as lower operating costs, high scalability, and considerable adaptability. Reinforcement learning is used to train these agents to reach an optimal policy. Learning in cooperative multi-agent environments that are dynamic, uncertain, and have a large state space has become a major challenge in practical applications. The challenges include the effect of state-space size on learning time, inefficient cooperation among agents, and a lack of proper coordination in the agents' decision-making. Reinforcement learning algorithms add difficulties of their own, such as the difficulty of defining a suitable learning goal and the long convergence time caused by trial-and-error learning. This paper introduces a communication framework for cooperative multi-agent systems that attempts to alleviate these challenges to some extent. To address the convergence problems, knowledge transfer is applied, which can significantly improve the efficiency of reinforcement learning algorithms. Cooperation among agents is carried out through a head (group leader) agent, and coordination among them is handled by a coordinator agent. The proposed framework is applied to the herding problem, and experimental results show an improvement in the agents' performance.
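The abstract does not specify the learning algorithm or the form of the transferred knowledge, so the following is only a minimal sketch of one common way knowledge transfer can shorten trial-and-error learning: warm-starting a tabular Q-learning agent with Q-values learned on a smaller source task. Everything here is an assumption made for illustration and is not the authors' method: the 1-D corridor stands in for a herding environment, and the names `step`, `q_learning`, and `q_init` are hypothetical. The head agent and coordinator agent described above are not modeled.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 1)  # move left / right in a 1-D corridor (toy stand-in for herding)


def step(state, action, goal):
    """Hypothetical toy dynamics: move one cell, clipped to [0, goal]."""
    nxt = min(max(state + action, 0), goal)
    done = nxt == goal
    # Reward 1 on reaching the goal cell, a small step cost otherwise.
    return nxt, (1.0 if done else -0.01), done


def q_learning(goal, episodes, q_init=None, alpha=0.5, gamma=0.95,
               epsilon=0.1, seed=0, max_steps=200):
    """Tabular Q-learning; q_init optionally warm-starts the table."""
    rng = random.Random(seed)
    q = defaultdict(float)
    if q_init is not None:
        q.update(q_init)  # knowledge transfer (assumed form): copy source Q-values
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = 0, 0, False
        while not done and steps < max_steps:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action, goal)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return dict(q), steps_per_episode


# Source task: short corridor. Target task: longer corridor.
q_source, _ = q_learning(goal=6, episodes=200)
_, scratch = q_learning(goal=8, episodes=50, seed=1)
_, transferred = q_learning(goal=8, episodes=50, q_init=q_source, seed=1)
print("steps in first episode, scratch vs. transferred:", scratch[0], transferred[0])
```

Comparing the per-episode step counts of the two target runs gives a rough sense of whether the warm start helps in this toy setting. It says nothing about the gains reported in the paper.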

Keywords: Cooperative multi-agent systems, Reinforcement learning, Knowledge transfer, Herding problem


Type of study: Research | Subject: Special
Received: 1397/10/30 | Accepted: 1398/10/5 | Published electronically ahead of print: 1399/7/14 | Published: 1399/12/1



Rights and permissions
	This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
