Warning: This post contains abstract depictions of nudity and may be unsuitable for the workplace
Yahoo's recently open-sourced neural network, open_nsfw, is a fine-tuned Residual Network which scores images on a scale of 0 to 1 on their suitability for use in the workplace. In the documentation, Yahoo notes:
Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another.
What makes an image NSFW, according to Yahoo? I explore this question with a clever new visualization technique by Nguyen et al. Like Google's Deep Dream, this visualization trick works by maximally activating certain neurons of the classifier. Unlike Deep Dream, we optimize these activations by performing descent on a parameterization of the manifold of natural images. This parameterization takes the form of a generative network, G, trained adversarially on an unrelated dataset of natural images.
The "space of natural images", according to G, looks mostly like abstract art. Unsurprisingly, these random pictures, lacking any kind of semantics, receive low scores from the classifier.
Following Nguyen et al., we perform projected gradient descent on the following problem

$$\max_{y}\; D(G(y))$$

to obtain the maximal activation for D. Not surprisingly, the results of the optimization are clearly pornographic.
[Grid of generated images, each scoring D(x) = 1]
On the other end of the spectrum, optimizing for SFW images seems redundant, as you might expect SFW-ness simply to be the absence of NSFW content. If this were the case, one would expect most random images to score close to 0. This is only partly true: random images generally receive small scores, but not exactly 0. We will try to push this down further by solving

$$\min_{y}\; D(G(y))$$

in the exact same way as above.
The images which minimize this score all have a distinct pastoral quality - depictions of hills, streams, and generally pleasant scenery. This is likely an artifact of the negative examples used in the training set.
Let's take this even further by stripping a layer off the network. The final score, D, is in fact calculated from the relative strength of two independent neurons: an "sfw" neuron and an "nsfw" neuron. This explains the phenomenon above, as the sfw neuron gets excited at the sight of rolling hills and running brooks, and the excitations of nsfw correlate with, well, pornography. The classifier takes in both these expert opinions, and combines them democratically by the softmax,

$$D(x) = \frac{e^{\mathrm{nsfw}(x)}}{e^{\mathrm{sfw}(x)} + e^{\mathrm{nsfw}(x)}}$$

to get the final score. Since most pornography does not take place with a Thomas Kinkade painting in the background, this is a fair heuristic for most real-world problems. But what happens if we try to excite both neurons simultaneously? This amounts to minimizing

$$-\mathrm{sfw}(G(y)) - \mathrm{nsfw}(G(y))$$
Surprisingly, in my experiments, the relative strength of the nsfw neuron still dominates. However, there is enough of a contribution from the sfw neuron to produce images of a very different flavor.
[Grid of generated images; most score D(x) = 1, with two scoring D(x) = 0.009 and D(x) = 0.03]
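To see why exciting both neurons is different from maximizing the final score, it helps to compute the softmax by hand. A quick sketch (the logit values here are invented for illustration):

```python
import numpy as np

def nsfw_score(sfw, nsfw):
    # The final score is the softmax weight of the nsfw neuron.
    e = np.exp([sfw, nsfw])
    return e[1] / e.sum()

print(nsfw_score(5.0, 1.0))  # sfw dominates: low score
print(nsfw_score(1.0, 5.0))  # nsfw dominates: high score
print(nsfw_score(5.0, 5.0))  # both strongly excited: the softmax sees 0.5
```

The softmax only measures the *relative* strength of the two neurons, so an image can excite both of them strongly while the final score stays moderate - exactly the regime the joint objective explores.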
Spurred on by the success above, I explore the possibility of generating images whose activations span two different networks. Nguyen et al. have achieved great results on the MIT scene recognition model, places-CNN. What happens when we maximize neurons of places-CNN and open_nsfw together?
We will refer to the places-CNN classifier's belief that an image belongs to category $i$ as $P_i(x)$. These categories are one of 205 possible labels, such as "marketplace" or "abbey". We perform descent on this linear combination of the two objectives:

$$\max_{y}\; D(G(y)) + \lambda\, P_i(G(y))$$
(The above equation isn't strictly correct, and needs one more tweak for this to work. For details of the optimization, I refer you to the code)
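A sketch of what such a joint objective looks like in code, again with toy stand-ins for G, D, and places-CNN rather than the real Caffe models (the weight lam, the dimensions, and the finite-difference gradient are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins (assumptions): G is a generator, D the NSFW score,
# and P a softmax over scene categories standing in for places-CNN.
W = rng.standard_normal((16, 8))
w = rng.standard_normal(16)
C = rng.standard_normal((5, 16))   # 5 toy scene categories

def G(y):
    return np.tanh(W @ y)

def D(x):
    return 1.0 / (1.0 + np.exp(-w @ x))

def P(x):
    z = C @ x
    e = np.exp(z - z.max())        # stable softmax over categories
    return e / e.sum()

def objective(y, i, lam=1.0):
    x = G(y)
    return D(x) + lam * P(x)[i]

def ascend(y0, i, steps=300, lr=0.1, eps=1e-4):
    # Finite-difference gradient ascent on the combined objective.
    y = y0.copy()
    for _ in range(steps):
        g = np.zeros_like(y)
        for k in range(len(y)):
            d = np.zeros_like(y)
            d[k] = eps
            g[k] = (objective(y + d, i) - objective(y - d, i)) / (2 * eps)
        y += lr * g
    return y

y0 = rng.standard_normal(8)
y = ascend(y0, i=2)
print(objective(y0, 2), objective(y, 2))
```

The single weight lam trades off the two networks' opinions; in practice one would tune it until neither term swamps the other.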
This program produces the most remarkable results. The images generated range from the garishly explicit to the subtle, and the subtle images are the most fascinating: to my surprise, they are only seemingly innocent. These are not adversarial examples per se. The NSFW elements are all present, just hidden in plain sight. Once you see the true nature of these images, something clicks and it becomes impossible to unsee. I've picked a few of my favorite results to show here.
The generative capacity of convolutional neural nets is, quite simply, remarkable.
If you liked this project, say hi here. And you can view my badly commented code for the second part of this project here. You will need this library, and of course, open_nsfw to run it. I trust you'll figure the rest out.
If you really want to, you can follow me on Twitter.
This is my website