- Authors:
- Mustafa O. Karabag, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, TX, USA
- Melkior Ornik, Department of Aerospace Engineering, University of Illinois Urbana–Champaign, IL, USA
- Ufuk Topcu, Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, TX, USA
Automatica (Journal of IFAC), Volume 161, Issue C, March 2024
https://doi.org/10.1016/j.automatica.2023.111482
Published: 16 May 2024
Abstract
In an adversarial environment, a hostile player performing a task may behave like a non-hostile one in order not to reveal its identity to an opponent. To model such a scenario, we define identity concealment games: zero-sum stochastic reachability games with a zero-sum objective of identity concealment. To measure the identity concealment of the player, we introduce the notion of an average player. The average player’s policy represents the expected behavior of a non-hostile player. We show that there exists an equilibrium policy pair for every identity concealment game and give the optimality equations to synthesize an equilibrium policy pair. If the player’s opponent follows a non-equilibrium policy, the player can hide its identity better. For this reason, we study how the hostile player may learn the opponent’s policy. Since learning via exploration policies would quickly reveal the hostile player’s identity to the opponent, we consider the problem of learning a near-optimal policy for the hostile player using the game runs collected under the average player’s policy. Consequently, we propose an algorithm that provably learns a near-optimal policy and give an upper bound on the number of sample runs to be collected.
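The optimality equations mentioned in the abstract can be illustrated with a minimal value-iteration sketch. The paper considers concurrent stochastic games; the sketch below simplifies to a turn-based reachability game, where pure memoryless equilibrium policies exist and each state's update is a plain max or min. The example game itself (states `s0`, `s1`, actions, and transition probabilities) is an illustrative assumption, not taken from the paper.

```python
# Hedged sketch: value iteration for the reachability value of a
# turn-based zero-sum stochastic game. The paper treats concurrent
# games; turn-based play is a simplification under which pure
# memoryless equilibria exist. The example game below is illustrative.

GOAL, TRAP = "goal", "trap"

# state -> (owner, {action: [(prob, successor), ...]})
# "max" tries to reach GOAL, "min" tries to prevent it.
GAME = {
    "s0": ("max", {"a": [(0.8, "s1"), (0.2, TRAP)],
                   "b": [(1.0, "s1")]}),
    "s1": ("min", {"c": [(0.5, GOAL), (0.5, "s0")],
                   "d": [(0.9, GOAL), (0.1, TRAP)]}),
}

def value_iteration(game, iters=200):
    """Fixed-point iteration of the optimality equations:
    V(s) = max_a / min_a  sum_{s'} P(s' | s, a) V(s')."""
    v = {s: 0.0 for s in game}
    v[GOAL], v[TRAP] = 1.0, 0.0  # absorbing states
    for _ in range(iters):
        for s, (owner, actions) in game.items():
            vals = [sum(p * v[t] for p, t in outs)
                    for outs in actions.values()]
            v[s] = max(vals) if owner == "max" else min(vals)
    return v

values = value_iteration(GAME)
print(values["s0"], values["s1"])  # both converge to 0.9
```

In the concurrent setting of the paper, both players act simultaneously, so each state's update solves a matrix game (a small linear program yielding possibly mixed strategies) rather than a plain max or min.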
References
- Baier and Katoen, 2008 Baier C., Katoen J.-P., Principles of model checking, MIT Press, 2008.Google ScholarDigital Library
- Başar and Olsder, 1998 Başar T., Olsder G.J., Dynamic noncooperative game theory, SIAM, 1998.Google Scholar
- Boularias et al., 2011 Boularias,A., Kober,J., & Peters,J. (2011). Relative entropy inverse reinforcement learning. In Proceedings of the 14th international conference on artificial intelligence and statistics (pp. 182–189).Google Scholar
- Chatterjee et al., 2008 Chatterjee K., Henzinger T.A., Piterman N., Algorithms for Büchi games, 2008, arXiv preprint arXiv:0805.2620.Google Scholar
- Chen et al., 2013 Chen T., Forejt V., Kwiatkowska M., Simaitis A., Wiltsche C.,
On stochastic games with multiple objectives , in: Proceedings of the 38th international symposium on mathematical foundations of computer science, Springer, 2013, pp. 266–277.Google Scholar - Cover and Thomas, 2012 Cover T.M., Thomas J.A., Elements of information theory, John Wiley & Sons, 2012.Google Scholar
- Farajtabar et al., 2018 Farajtabar M., Chow Y., Ghavamzadeh M.,
More robust doubly robust off-policy evaluation , in: Proceedings of the 35th international conference on machine learning, PMLR, 2018, pp. 1447–1456.Google Scholar - Fiechter, 1994 Fiechter C.-N.,
Efficient reinforcement learning , in: Proceedings of the 7th annual conference on computational learning theory, ACM, 1994, pp. 88–97.Google Scholar - Fox et al., 2016 Fox R., Pakman A., Tishby N.,
Taming the noise in reinforcement learning via soft updates , in: Proceedings of the 32nd conference on uncertainty in artificial intelligence, AUAI Press, 2016, pp. 202–211.Google Scholar - Grau-Moya et al., 2018 Grau-Moya J., Leibfried F., Bou-Ammar H.,
Balancing two-player stochastic games with soft Q-learning , in: Proceedings of the 27th international joint conference on artificial intelligence, AAAI Press, 2018, pp. 268–274.Google Scholar - Hecker, 2018 Hecker C., SpyParty, 2018, http://www.spyparty.com/. Online.Google Scholar
- Hogg et al., 1977 Hogg R.V., Tanis E.A., Zimmerman D.L., Probability and statistical inference, vol. 993, Macmillan New York, 1977.Google Scholar
- Karabag et al., 2021a Karabag M.O., Ornik M., Topcu U., Deception in supervisory control, IEEE Transactions on Automatic Control 67 (2) (2021) 738–753.Google Scholar
- Karabag et al., 2021b Karabag M.O., Ornik M., Topcu U., Identity concealment games: how I learned to stop revealing and love the coincidences, 2021, arXiv preprint arXiv:2105.05377.Google Scholar
- Karabag and Topcu, 2018 Karabag M.O., Topcu U.,
On the sample complexity of vanilla model-based offline reinforcement learning with dependent samples , in: Proceedings of the 37th AAAI conference on artificial intelligence, AAAI Press, 2018, pp. 8195–8202.Google Scholar - Kardes and Hall, 2005 Kardes E., Hall R., Survey of literature on strategic decision making in the presence of adversaries, 2005.Google Scholar
- Kearns and Singh, 2002 Kearns M., Singh S., Near-optimal reinforcement learning in polynomial time, Machine Learning 49 (2–3) (2002) 209–232.Google Scholar
- Keren et al., 2016 Keren,S., Gal,A., & Karpas,E. (2016). Privacy Preserving Plans in Partially Observable Environments.. In Proceedings of the 25th international joint conference on artificial intelligence (pp. 3170–3176).Google Scholar
- Kidambi et al., 2020 Kidambi R., Rajeswaran A., Netrapalli P., Joachims T.,
Morel: Model-based offline reinforcement learning ,Advances in neural information processing systems , vol. 33, 2020, pp. 21810–21823.Google Scholar - Kulkarni et al., 2019 Kulkarni,A., Srivastava,S., & Kambhampati,S. (2019). A unified framework for planning in adversarial and cooperative environments. In Proceedings of the 33rd AAAI conference on artificial intelligence (pp. 2479–2487).Google Scholar
- Levine et al., 2020 Levine S., Kumar A., Tucker G., Fu J., Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020, arXiv preprint arXiv:2005.01643.Google Scholar
- Macintyre, 2018 Macintyre B., The spy and the traitor: The greatest espionage story of the cold war, Viking, 2018.Google Scholar
- Newman, 2015 Newman G., Garry’s Mod Guess Who, 2015, https://gmod.facepunch.com/. Online.Google Scholar
- Patek and Bertsekas, 1999 Patek S.D., Bertsekas D.P., Stochastic shortest path games, SIAM Journal on Control and Optimization 37 (3) (1999) 804–824.Google Scholar
- Peters et al., 2010 Peters,J., Mulling,K., & Altun,Y. (2010). Relative entropy policy search. In Proceedings of the 24th AAAI conference on artificial intelligence.Google Scholar
- Precup et al., 2000 Precup,D., Sutton,R. S., & Singh,S. P. (2000). Eligibility Traces for Off-Policy Policy Evaluation. In Proceedings of the 17th international conference on machine learning (pp. 759–766).Google Scholar
- Puterman, 2014 Puterman M.L., Markov decision processes: Discrete stochastic dynamic programming, John Wiley & Sons, 2014.Google Scholar
- Ross and Bagnell, 2012 Ross S., Bagnell D.,
Agnostic system identification for model-based reinforcement learning , in: Proceedings of the 29th international conference on machine learning, PMLR, 2012, pp. 1905–1912.Google Scholar - Schulman et al., 2015 Schulman J., Levine S., Abbeel P., Jordan M., Moritz P.,
Trust region policy optimization , in: Proceedings of the 31st international conference on machine learning, PMLR, 2015, pp. 1889–1897.Google Scholar - Strehl et al., 2009 Strehl A.L., Li L., Littman M.L., Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research 10 (Nov) (2009) 2413–2444.Google Scholar
- Strehl and Littman, 2008 Strehl A.L., Littman M.L., An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences 74 (8) (2008) 1309–1331.Google Scholar
- Sutton and Barto, 2018 Sutton R.S., Barto A.G., Reinforcement learning: An introduction, MIT Press, 2018.Google ScholarDigital Library
- Uehara and Sun, 2021 Uehara M., Sun W., Pessimistic model-based offline reinforcement learning under partial coverage, 2021, arXiv preprint arXiv:2107.06226.Google Scholar
- Weissman et al., 2003 Weissman T., Ordentlich E., Seroussi G., Verdu S., Weinberger M.J., Inequalities for the L1 deviation of the empirical distribution, Hewlett-Packard Labs, 2003.Google Scholar
- Yu et al., 2020 Yu T., Thomas G., Yu L., Ermon S., Zou J.Y., Levine S., Finn C., Ma T., Mopo: Model-based offline policy optimization, Advances in Neural Information Processing Systems 33 (2020) 14129–14142.Google Scholar
Published in
Automatica (Journal of IFAC), Volume 161, Issue C, March 2024
329 pages, ISSN: 0005-1098, Elsevier Ltd
Publisher: Pergamon Press, Inc., United States
Author Tags
- Identity concealment
- Deception
- Game theory
- Markov models
- Offline learning
Qualifiers
- research-article