SketchXAI: A First Look at Explainability for Human Sketches
Zhiyu Qu1,3, Yulia Gryaditskaya1, Ke Li1,2, Kaiyue Pang1, Tao Xiang1,3, Yi-Zhe Song1,3
1SketchX, CVSSP, University of Surrey
2School of Artificial Intelligence, Beijing University of Posts and Telecommunications
3iFlyTek-Surrey Joint Research Centre on Artificial Intelligence

Accepted by CVPR 2023

Explainability, but for human sketches. We demonstrate a new methodology for explaining AI decisions on human sketch data. Instead of the single static explanation per instance offered by existing works, our method can generate infinitely many explanation paths, each dynamically showcasing the inner workings of an AI classifier. This variety gives humans a wider view of how the AI functions, and therefore a better basis for scrutinising it.
This paper, for the very first time, introduces human sketches to the landscape of XAI. We argue that sketch, as a "human-centred" data form, represents a natural interface for studying explainability. We focus on cultivating sketch-specific explainability designs. This starts by identifying strokes as a unique building block that offers a degree of flexibility in object construction and manipulation impossible in photos. Following this, we design a simple explainability-friendly sketch encoder that accommodates the intrinsic properties of strokes: shape, location, and order. We then move on to define the first-ever XAI task for sketch, that of stroke location inversion (SLI). Just as we have heat maps for photos and correlation matrices for text, SLI offers an explainability angle on sketch by asking a network how well it can recover the stroke locations of an unseen sketch. We offer qualitative results for readers to interpret, in the form of snapshots of the SLI process in the paper and in the videos on this page. A minor but interesting note is that, thanks to its sketch-specific design, our sketch encoder also yields the best sketch recognition accuracy to date, while having the smallest number of parameters.
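To make the SLI idea concrete, below is a minimal, hypothetical PyTorch sketch of the inversion loop: the trained classifier is frozen, only the per-stroke locations are treated as free parameters, and they are updated by gradient descent until the classifier assigns the sketch to a target class. The classifier interface `(shape_feat, locations, order_ids) -> logits`, the toy linear classifier, and all hyperparameters are illustrative assumptions for this demo, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def stroke_location_inversion(classifier, shape_feat, locations, order_ids,
                              target_class, steps=100, lr=0.1):
    # Freeze everything except the stroke locations, which become
    # the optimisation variables (the dashed gradient path in the figure).
    loc = locations.clone().requires_grad_(True)
    opt = torch.optim.Adam([loc], lr=lr)
    for _ in range(steps):
        logits = classifier(shape_feat, loc, order_ids)
        loss = F.cross_entropy(logits.unsqueeze(0),
                               torch.tensor([target_class]))
        opt.zero_grad()
        loss.backward()
        opt.step()  # snapshots of `loc` over iterations form one explanation path
    return loc.detach()

# Toy frozen "classifier" (assumption, demo only): logits depend
# linearly on the mean stroke location.
torch.manual_seed(0)
W = torch.randn(10, 2)
def toy_classifier(shape_feat, loc, order_ids):
    return W @ loc.mean(dim=0)

locs0 = torch.randn(5, 2)                     # 5 strokes, scattered at random
locs1 = stroke_location_inversion(toy_classifier, None, locs0, None,
                                  target_class=3)
```

Sampling different starting locations, or different target classes, yields the "infinitely many explanation paths" described above: each run traces a different trajectory of strokes moving under the classifier's gradients.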

We build a sketch classifier upon stroke vectors rather than raster pixels. Each stroke is decomposed into three parts: shape, location and order. A bidirectional LSTM encodes stroke shape, a linear model encodes stroke location, and a learnable time embedding matrix encodes stroke order. The dashed line indicates the gradient flow into the location parameters when we generate explanations via SLI with a trained classifier.
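The three-part stroke encoding above can be sketched as follows. This is an illustrative PyTorch module, not the paper's code: the layer sizes, the summation of the three embeddings, and the input shapes are all assumptions made for the demo.

```python
import torch
import torch.nn as nn

class StrokeEmbedder(nn.Module):
    """Hypothetical stroke encoder: shape + location + order embeddings."""

    def __init__(self, d_model=128, max_strokes=64):
        super().__init__()
        # Shape: a bidirectional LSTM over each stroke's (dx, dy) point sequence.
        self.shape_lstm = nn.LSTM(2, d_model // 2, batch_first=True,
                                  bidirectional=True)
        # Location: a linear model on the stroke's absolute (x, y) position.
        self.loc_proj = nn.Linear(2, d_model)
        # Order: a learnable time embedding matrix indexed by stroke order.
        self.order_emb = nn.Embedding(max_strokes, d_model)

    def forward(self, points, locations, order_ids):
        # points:    (S, P, 2) per-stroke point sequences
        # locations: (S, 2)    per-stroke positions
        # order_ids: (S,)      stroke indices 0..S-1
        _, (h, _) = self.shape_lstm(points)       # h: (2, S, d_model // 2)
        shape = torch.cat([h[0], h[1]], dim=-1)   # (S, d_model)
        # One token per stroke, combining all three intrinsic properties.
        return shape + self.loc_proj(locations) + self.order_emb(order_ids)

emb = StrokeEmbedder()
tokens = emb(torch.randn(5, 16, 2), torch.randn(5, 2), torch.arange(5))
```

Because location enters through its own differentiable branch, gradients from a classification loss can flow back into the location inputs alone, which is exactly what SLI exploits.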

@inproceedings{qu2023sketchxai,
  title={SketchXAI: A First Look at Explainability for Human Sketches},
  author={Qu, Zhiyu and Gryaditskaya, Yulia and Li, Ke and Pang, Kaiyue and Xiang, Tao and Song, Yi-Zhe},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
NOTE: animations play at 0.05x speed for the first few iterations for better visualisation.

Recovery of SLI

Transfer of SLI

chair -> broom

sun -> apple

bicycle -> camera

car -> bicycle

tree -> cloud

airplane -> bed

book -> pants

sun -> spider

bicycle -> eyeglasses

car -> face

sun -> spider

apple -> clock

sun -> spider

cell_phone -> book

spider -> bicycle

bicycle -> spider

face -> clock

bicycle -> cell_phone

sun -> spider

flower -> cloud