- Authors:
- Mustafa O. Karabag, Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, TX, USA
- Melkior Ornik, Department of Aerospace Engineering, University of Illinois Urbana–Champaign, IL, USA
- Ufuk Topcu, Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, TX, USA
Automatica (Journal of IFAC), Volume 161, Issue C, March 2024
https://doi.org/10.1016/j.automatica.2023.111482
Published: 16 May 2024
Abstract
In an adversarial environment, a hostile player performing a task may behave like a non-hostile one in order not to reveal its identity to an opponent. To model such a scenario, we define identity concealment games: zero-sum stochastic reachability games with a zero-sum objective of identity concealment. To measure the identity concealment of the player, we introduce the notion of an average player. The average player’s policy represents the expected behavior of a non-hostile player. We show that there exists an equilibrium policy pair for every identity concealment game and give the optimality equations to synthesize an equilibrium policy pair. If the player’s opponent follows a non-equilibrium policy, the player can hide its identity better. For this reason, we study how the hostile player may learn the opponent’s policy. Since learning via exploration policies would quickly reveal the hostile player’s identity to the opponent, we consider the problem of learning a near-optimal policy for the hostile player using the game runs collected under the average player’s policy. Consequently, we propose an algorithm that provably learns a near-optimal policy and give an upper bound on the number of sample runs to be collected.
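The optimality equations mentioned in the abstract can be illustrated with a minimal value-iteration sketch. The paper considers concurrent stochastic games; the sketch below simplifies to a turn-based reachability game, where pure memoryless equilibrium policies exist and each state's update is a plain max or min. The example game itself (states `s0`, `s1`, actions, and transition probabilities) is an illustrative assumption, not taken from the paper.

```python
# Hedged sketch: value iteration for the reachability value of a
# turn-based zero-sum stochastic game. The paper treats concurrent
# games; turn-based play is a simplification under which pure
# memoryless equilibria exist. The example game below is illustrative.

GOAL, TRAP = "goal", "trap"

# state -> (owner, {action: [(prob, successor), ...]})
# "max" tries to reach GOAL, "min" tries to prevent it.
GAME = {
    "s0": ("max", {"a": [(0.8, "s1"), (0.2, TRAP)],
                   "b": [(1.0, "s1")]}),
    "s1": ("min", {"c": [(0.5, GOAL), (0.5, "s0")],
                   "d": [(0.9, GOAL), (0.1, TRAP)]}),
}

def value_iteration(game, iters=200):
    """Fixed-point iteration of the optimality equations:
    V(s) = max_a / min_a  sum_{s'} P(s' | s, a) V(s')."""
    v = {s: 0.0 for s in game}
    v[GOAL], v[TRAP] = 1.0, 0.0  # absorbing states
    for _ in range(iters):
        for s, (owner, actions) in game.items():
            vals = [sum(p * v[t] for p, t in outs)
                    for outs in actions.values()]
            v[s] = max(vals) if owner == "max" else min(vals)
    return v

values = value_iteration(GAME)
print(values["s0"], values["s1"])  # both converge to 0.9
```

In the concurrent setting of the paper, both players act simultaneously, so each state's update solves a matrix game (a small linear program yielding possibly mixed strategies) rather than a plain max or min.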
References
- Baier and Katoen, 2008 Baier C., Katoen J.-P., Principles of model checking, MIT Press, 2008.Google ScholarDigital Library
- Başar and Olsder, 1998 Başar T., Olsder G.J., Dynamic noncooperative game theory, SIAM, 1998.Google Scholar
- Boularias et al., 2011 Boularias,A., Kober,J., & Peters,J. (2011). Relative entropy inverse reinforcement learning. In Proceedings of the 14th international conference on artificial intelligence and statistics (pp. 182–189).Google Scholar
- Chatterjee et al., 2008 Chatterjee K., Henzinger T.A., Piterman N., Algorithms for Büchi games, 2008, arXiv preprint arXiv:0805.2620.Google Scholar
- Chen et al., 2013 Chen T., Forejt V., Kwiatkowska M., Simaitis A., Wiltsche C.,
On stochastic games with multiple objectives , in: Proceedings of the 38th international symposium on mathematical foundations of computer science, Springer, 2013, pp. 266–277.Google Scholar - Cover and Thomas, 2012 Cover T.M., Thomas J.A., Elements of information theory, John Wiley & Sons, 2012.Google Scholar
- Farajtabar et al., 2018 Farajtabar M., Chow Y., Ghavamzadeh M.,
More robust doubly robust off-policy evaluation , in: Proceedings of the 35th international conference on machine learning, PMLR, 2018, pp. 1447–1456.Google Scholar - Fiechter, 1994 Fiechter C.-N.,
Efficient reinforcement learning , in: Proceedings of the 7th annual conference on computational learning theory, ACM, 1994, pp. 88–97.Google Scholar - Fox et al., 2016 Fox R., Pakman A., Tishby N.,
Taming the noise in reinforcement learning via soft updates , in: Proceedings of the 32nd conference on uncertainty in artificial intelligence, AUAI Press, 2016, pp. 202–211.Google Scholar - Grau-Moya et al., 2018 Grau-Moya J., Leibfried F., Bou-Ammar H.,
Balancing two-player stochastic games with soft Q-learning , in: Proceedings of the 27th international joint conference on artificial intelligence, AAAI Press, 2018, pp. 268–274.Google Scholar - Hecker, 2018 Hecker C., SpyParty, 2018, http://www.spyparty.com/. Online.Google Scholar
- Hogg et al., 1977 Hogg R.V., Tanis E.A., Zimmerman D.L., Probability and statistical inference, vol. 993, Macmillan New York, 1977.Google Scholar
- Karabag et al., 2021a Karabag M.O., Ornik M., Topcu U., Deception in supervisory control, IEEE Transactions on Automatic Control 67 (2) (2021) 738–753.Google Scholar
- Karabag et al., 2021b Karabag M.O., Ornik M., Topcu U., Identity concealment games: how I learned to stop revealing and love the coincidences, 2021, arXiv preprint arXiv:2105.05377.Google Scholar
- Karabag and Topcu, 2018 Karabag M.O., Topcu U.,
On the sample complexity of vanilla model-based offline reinforcement learning with dependent samples , in: Proceedings of the 37th AAAI conference on artificial intelligence, AAAI Press, 2018, pp. 8195–8202.Google Scholar - Kardes and Hall, 2005 Kardes E., Hall R., Survey of literature on strategic decision making in the presence of adversaries, 2005.Google Scholar
- Kearns and Singh, 2002 Kearns M., Singh S., Near-optimal reinforcement learning in polynomial time, Machine Learning 49 (2–3) (2002) 209–232.Google Scholar
- Keren et al., 2016 Keren,S., Gal,A., & Karpas,E. (2016). Privacy Preserving Plans in Partially Observable Environments.. In Proceedings of the 25th international joint conference on artificial intelligence (pp. 3170–3176).Google Scholar
- Kidambi et al., 2020 Kidambi R., Rajeswaran A., Netrapalli P., Joachims T.,
Morel: Model-based offline reinforcement learning ,Advances in neural information processing systems , vol. 33, 2020, pp. 21810–21823.Google Scholar - Kulkarni et al., 2019 Kulkarni,A., Srivastava,S., & Kambhampati,S. (2019). A unified framework for planning in adversarial and cooperative environments. In Proceedings of the 33rd AAAI conference on artificial intelligence (pp. 2479–2487).Google Scholar
- Levine et al., 2020 Levine S., Kumar A., Tucker G., Fu J., Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020, arXiv preprint arXiv:2005.01643.Google Scholar
- Macintyre, 2018 Macintyre B., The spy and the traitor: The greatest espionage story of the cold war, Viking, 2018.Google Scholar
- Newman, 2015 Newman G., Garry’s Mod Guess Who, 2015, https://gmod.facepunch.com/. Online.Google Scholar
- Patek and Bertsekas, 1999 Patek S.D., Bertsekas D.P., Stochastic shortest path games, SIAM Journal on Control and Optimization 37 (3) (1999) 804–824.Google Scholar
- Peters et al., 2010 Peters,J., Mulling,K., & Altun,Y. (2010). Relative entropy policy search. In Proceedings of the 24th AAAI conference on artificial intelligence.Google Scholar
- Precup et al., 2000 Precup,D., Sutton,R. S., & Singh,S. P. (2000). Eligibility Traces for Off-Policy Policy Evaluation. In Proceedings of the 17th international conference on machine learning (pp. 759–766).Google Scholar
- Puterman, 2014 Puterman M.L., Markov decision processes: Discrete stochastic dynamic programming, John Wiley & Sons, 2014.Google Scholar
- Ross and Bagnell, 2012 Ross S., Bagnell D.,
Agnostic system identification for model-based reinforcement learning , in: Proceedings of the 29th international conference on machine learning, PMLR, 2012, pp. 1905–1912.Google Scholar - Schulman et al., 2015 Schulman J., Levine S., Abbeel P., Jordan M., Moritz P.,
Trust region policy optimization , in: Proceedings of the 31st international conference on machine learning, PMLR, 2015, pp. 1889–1897.Google Scholar - Strehl et al., 2009 Strehl A.L., Li L., Littman M.L., Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research 10 (Nov) (2009) 2413–2444.Google Scholar
- Strehl and Littman, 2008 Strehl A.L., Littman M.L., An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences 74 (8) (2008) 1309–1331.Google Scholar
- Sutton and Barto, 2018 Sutton R.S., Barto A.G., Reinforcement learning: An introduction, MIT Press, 2018.Google ScholarDigital Library
- Uehara and Sun, 2021 Uehara M., Sun W., Pessimistic model-based offline reinforcement learning under partial coverage, 2021, arXiv preprint arXiv:2107.06226.Google Scholar
- Weissman et al., 2003 Weissman T., Ordentlich E., Seroussi G., Verdu S., Weinberger M.J., Inequalities for the L1 deviation of the empirical distribution, Hewlett-Packard Labs, 2003.Google Scholar
- Yu et al., 2020 Yu T., Thomas G., Yu L., Ermon S., Zou J.Y., Levine S., Finn C., Ma T., Mopo: Model-based offline policy optimization, Advances in Neural Information Processing Systems 33 (2020) 14129–14142.Google Scholar
Published in
Automatica (Journal of IFAC), Volume 161, Issue C, March 2024
329 pages, ISSN: 0005-1098, Elsevier Ltd
Publisher: Pergamon Press, Inc., United States
Author Tags
- Identity concealment
- Deception
- Game theory
- Markov models
- Offline learning
Qualifiers
- research-article