Counterfactual Explanations Based on Synthetic Data Generation
Abstract
A counterfactual explanation is the generation, for a given instance, of a set of objects that belong to the opposite class yet lie as close as possible in feature space to the factual being explained. Existing algorithms for this task typically rely on complex models that demand large training sets and substantial computation. This article proposes a two-stage method. In the first stage, a synthetic set of candidate counterfactuals is generated by simple statistical models (a Gaussian copula, a sequential model built on conditional distributions, a Bayesian network, etc.); in the second stage, objects satisfying plausibility, proximity, diversity, and other constraints are selected from this set. This design makes the process transparent and controllable and allows the generation models to be reused. Experiments on three publicly available datasets show that the proposed method achieves results at least comparable to those of well-known counterfactual explanation algorithms and in some cases exceeds them, especially on small datasets. The Bayesian network proved to be the most effective generation model.
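To make the two-stage scheme concrete, below is a minimal Python sketch. It assumes the open-source SDV library (version 1.x) for the Gaussian-copula generator and a fitted scikit-learn-style classifier; the function name, the sample size, and the L2 proximity ranking are illustrative assumptions rather than the authors' exact implementation, and only the validity and proximity constraints of the selection stage are shown.

```python
# Minimal sketch of the two-stage scheme (illustrative, not the authors' code):
# (1) fit a simple statistical generator on the training data and sample
#     synthetic candidates;
# (2) keep only candidates that flip the classifier's decision and rank
#     them by proximity to the factual instance.
# Assumes the SDV library (>= 1.0), numeric features, and a fitted
# scikit-learn classifier `clf`.
import numpy as np
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer


def counterfactuals_via_synthesis(train_df, clf, factual, n_samples=2000, k=5):
    """Return up to k synthetic counterfactuals for one factual instance.

    train_df : pd.DataFrame of feature columns only (no target).
    clf      : fitted classifier with a .predict method.
    factual  : pd.Series holding the instance being explained.
    """
    # Stage 1: generate candidate objects with a Gaussian-copula model.
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data=train_df)
    generator = GaussianCopulaSynthesizer(metadata)
    generator.fit(train_df)
    candidates = generator.sample(num_rows=n_samples)

    # Stage 2a: validity constraint -- keep only candidates whose
    # predicted class differs from that of the factual.
    factual_class = clf.predict(factual.to_frame().T)[0]
    flipped = candidates[clf.predict(candidates) != factual_class]

    # Stage 2b: proximity constraint -- rank the surviving candidates
    # by L2 distance to the factual and return the k closest.
    dists = np.linalg.norm(
        flipped.to_numpy(dtype=float) - factual.to_numpy(dtype=float), axis=1
    )
    return flipped.iloc[np.argsort(dists)[:k]]
```

The other constraints mentioned in the abstract would be applied at the same selection stage: a diversity filter could, for example, greedily pick candidates that are mutually distant, and plausibility could be screened with a density-based score such as LOF.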