Model robustness in the face of adversaries

Adversarial examples are really, really weird: pictures of penguins that machine learning algorithms classify with high confidence as drum sets, or random noise labeled as pandas, or any one of an endless number of labeling mistakes that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms so much as the features in the datasets that hold the clue. This week’s episode goes through several papers that push our collective understanding of adversarial examples and hint at what makes these counterintuitive cases possible.

Relevant links:

Robustness may be at odds with accuracy
Adversarial examples are not bugs, they are features
Distill.pub: A discussion of “Adversarial examples are not bugs, they are features”
How can we fool LIME and SHAP? Adversarial attacks on post hoc explanation methods