SketchXAI: A First Look at Explainability for Human Sketches
Zhiyu Qu1,3, Yulia Gryaditskaya1, Ke Li1,2, Kaiyue Pang1, Tao Xiang1,3, Yi-Zhe Song1,3
1SketchX, CVSSP, University of Surrey
2School of Artificial Intelligence, Beijing University of Posts and Telecommunications
3iFlyTek-Surrey Joint Research Centre on Artificial Intelligence

Accepted by CVPR 2023

Explainability, but for human sketches. We demonstrate a new methodology for explaining AI decisions on human sketch data. Instead of the single static explanation per instance offered by existing works, our method can generate infinitely many explanation paths, each dynamically showcasing the inner workings of an AI classifier. This variety gives humans a wider view of how the AI functions, and therefore a better basis for scrutinising it.
This paper, for the very first time, introduces human sketches to the landscape of XAI. We argue that sketch, as a "human-centred" data form, represents a natural interface for studying explainability. We focus on cultivating sketch-specific explainability designs. This starts by identifying strokes as a unique building block that offers a degree of flexibility in object construction and manipulation impossible in photos. Following this, we design a simple explainability-friendly sketch encoder that accommodates the intrinsic properties of strokes: shape, location, and order. We then move on to define the first-ever XAI task for sketch, that of stroke location inversion (SLI). Just as we have heat maps for photos and correlation matrices for text, SLI offers an explainability angle on sketch by asking a network how well it can recover the stroke locations of an unseen sketch. We offer qualitative results for readers to interpret, in the form of snapshots of the SLI process in the paper and in the videos on this page. A minor but interesting note is that, thanks to its sketch-specific design, our sketch encoder also yields the best sketch recognition accuracy to date, while having the smallest number of parameters.
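To make the SLI idea concrete, below is a minimal, hypothetical PyTorch sketch of the inversion loop: the trained classifier is frozen, only the per-stroke locations are treated as free parameters, and they are updated by gradient descent until the classifier assigns the sketch to a target class. The classifier interface `(shape_feat, locations, order_ids) -> logits`, the toy linear classifier, and all hyperparameters are illustrative assumptions for this demo, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def stroke_location_inversion(classifier, shape_feat, locations, order_ids,
                              target_class, steps=100, lr=0.1):
    # Freeze everything except the stroke locations, which become
    # the optimisation variables (the dashed gradient path in the figure).
    loc = locations.clone().requires_grad_(True)
    opt = torch.optim.Adam([loc], lr=lr)
    for _ in range(steps):
        logits = classifier(shape_feat, loc, order_ids)
        loss = F.cross_entropy(logits.unsqueeze(0),
                               torch.tensor([target_class]))
        opt.zero_grad()
        loss.backward()
        opt.step()  # snapshots of `loc` over iterations form one explanation path
    return loc.detach()

# Toy frozen "classifier" (assumption, demo only): logits depend
# linearly on the mean stroke location.
torch.manual_seed(0)
W = torch.randn(10, 2)
def toy_classifier(shape_feat, loc, order_ids):
    return W @ loc.mean(dim=0)

locs0 = torch.randn(5, 2)                     # 5 strokes, scattered at random
locs1 = stroke_location_inversion(toy_classifier, None, locs0, None,
                                  target_class=3)
```

Sampling different starting locations, or different target classes, yields the "infinitely many explanation paths" described above: each run traces a different trajectory of strokes moving under the classifier's gradients.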

We build a sketch classifier upon stroke vectors rather than raster pixels. Each stroke is decomposed into three parts: shape, location and order. A bidirectional LSTM encodes stroke shape, a linear model encodes stroke location, and a learnable time embedding matrix encodes stroke order. The dashed line indicates the gradient flow into the location parameters when we generate explanations via SLI with a trained classifier.
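The three-part stroke encoding above can be sketched as follows. This is an illustrative PyTorch module, not the paper's code: the layer sizes, the summation of the three embeddings, and the input shapes are all assumptions made for the demo.

```python
import torch
import torch.nn as nn

class StrokeEmbedder(nn.Module):
    """Hypothetical stroke encoder: shape + location + order embeddings."""

    def __init__(self, d_model=128, max_strokes=64):
        super().__init__()
        # Shape: a bidirectional LSTM over each stroke's (dx, dy) point sequence.
        self.shape_lstm = nn.LSTM(2, d_model // 2, batch_first=True,
                                  bidirectional=True)
        # Location: a linear model on the stroke's absolute (x, y) position.
        self.loc_proj = nn.Linear(2, d_model)
        # Order: a learnable time embedding matrix indexed by stroke order.
        self.order_emb = nn.Embedding(max_strokes, d_model)

    def forward(self, points, locations, order_ids):
        # points:    (S, P, 2) per-stroke point sequences
        # locations: (S, 2)    per-stroke positions
        # order_ids: (S,)      stroke indices 0..S-1
        _, (h, _) = self.shape_lstm(points)       # h: (2, S, d_model // 2)
        shape = torch.cat([h[0], h[1]], dim=-1)   # (S, d_model)
        # One token per stroke, combining all three intrinsic properties.
        return shape + self.loc_proj(locations) + self.order_emb(order_ids)

emb = StrokeEmbedder()
tokens = emb(torch.randn(5, 16, 2), torch.randn(5, 2), torch.arange(5))
```

Because location enters through its own differentiable branch, gradients from a classification loss can flow back into the location inputs alone, which is exactly what SLI exploits.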

@inproceedings{qu2023sketchxai,
  title={SketchXAI: A First Look at Explainability for Human Sketches},
  author={Qu, Zhiyu and Gryaditskaya, Yulia and Li, Ke and Pang, Kaiyue and Xiang, Tao and Song, Yi-Zhe},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
NOTE: animations play at 0.05x speed for the first few iterations for better visualisation.

Recovery of SLI

Transfer of SLI

chair -> broom

sun -> apple

bicycle -> camera

car -> bicycle

tree -> cloud

airplane -> bed

book -> pants

sun -> spider

bicycle -> eyeglasses

car -> face

sun -> spider

apple -> clock

sun -> spider

cell_phone -> book

spider -> bicycle

bicycle -> spider

face -> clock

bicycle -> cell_phone

sun -> spider

flower -> cloud