To address the issue, we make three key assumptions:
- Assumption 1: Class 0 denotes the distribution of real images, and Class 1 denotes the distribution of AI-generated images.
- Assumption 2: The score output by the network is passed through a sigmoid activation before the final decision.
- Assumption 3: The feature vector extracted by the network is passed through a ReLU activation before the final linear classification layer.
These three assumptions are easily satisfied by popular modern networks such as ResNet-50, which are widely used for binary classification and already incorporate these mechanisms by design.
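To make this concrete, here is a minimal sketch of a detector satisfying all three assumptions, assuming a PyTorch setup with a torchvision ResNet-50 backbone (the class name and wiring are ours, for illustration):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BinaryDetector(nn.Module):
    """Sketch of a detector meeting Assumptions 1-3: label 1 = fake,
    sigmoid on the score, ReLU-activated features before the linear head."""

    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        feat_dim = backbone.fc.in_features   # 2048 for ResNet-50
        backbone.fc = nn.Identity()          # expose the pooled feature vector
        self.backbone = backbone
        # ResNet-50's pooled features are already post-ReLU, so this extra
        # ReLU is a no-op there; it is kept explicit to match Assumption 3.
        self.relu = nn.ReLU()
        self.fc = nn.Linear(feat_dim, 1)     # final linear classification layer

    def forward(self, x):
        f = self.relu(self.backbone(x))      # f >= 0 elementwise (Assumption 3)
        s = self.fc(f)                       # score s = w . f + b
        return torch.sigmoid(s)              # p(fake) = sigma(s) (Assumptions 1-2)
```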
The first assumption fixes the class labels: class 0 for real images and class 1 for AI-generated ones, the standard setup for binary classification. Next, per Assumption 2, the score is passed through the sigmoid, which is monotonically increasing: a higher score can only mean a higher predicted probability of the fake class, as made precise below.
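Spelled out (with notation introduced here for illustration): let f be the ReLU feature vector, w the final-layer weights, and b the bias. The detector computes

s = w · f + b,   p(fake) = σ(s) = 1 / (1 + e^(−s)).

Since σ is strictly increasing, p(fake) > 0.5 exactly when s > 0, so whatever raises the score raises the probability assigned to the fake class.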
In addition, Assumption 3 guarantees that the input to the linear classifier is a vector of non-negative values. Each dimension of this vector represents a specific feature or pattern that influences the network's decision.
The network assigns a weight to each feature. If the weight is positive, the presence of that feature increases the score, making the image more likely to be classified as fake; if the weight is negative, its presence decreases the score, making a classification as real more likely.
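A tiny numeric example (feature values and weights made up for illustration) shows how the sign of each weight steers the decision:

```python
import torch

f = torch.tensor([0.0, 1.5, 2.0])   # ReLU features: non-negative by construction
w = torch.tensor([0.8, 0.5, -1.2])  # hypothetical weights for three features
b = torch.tensor(0.1)

score = w @ f + b                   # 0.8*0.0 + 0.5*1.5 - 1.2*2.0 + 0.1 = -1.55
prob_fake = torch.sigmoid(score)    # ~0.175: the negative weight on the large
                                    # third feature pulls the image toward "real"
print(prob_fake.item())
```

Because the features are non-negative, the sign of each weight alone determines whether that feature can push the decision toward fake (positive) or toward real (negative).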
The key insight is that the presence of a feature should never increase the likelihood of an image being real: features may count as evidence of fakeness, but realness should be signaled by the absence of such evidence, not by the presence of some other feature. To enforce this, we retrain only the final classification layer, keeping the rest of the network frozen, and constrain its weights to "stay positive" (i.e., non-negative), so that patterns associated with real images cannot pull the decision toward the real class.
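A minimal sketch of this constrained retraining, assuming a PyTorch model like the one above and a placeholder DataLoader named `loader`: freeze the backbone, optimize only the final layer, and project its weights back onto the non-negative orthant after every step. Clamping after each optimizer step is one simple way to enforce the constraint; other projection schedules would work as well.

```python
import torch

# `model` is assumed to be a BinaryDetector as sketched above; `loader` is a
# placeholder DataLoader yielding (images, labels) with 0 = real, 1 = fake.
for p in model.backbone.parameters():
    p.requires_grad = False                  # keep the feature extractor frozen
model.backbone.eval()                        # also freeze BatchNorm running stats

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = torch.nn.BCELoss()               # the model already outputs probabilities

for images, labels in loader:
    optimizer.zero_grad()
    probs = model(images).squeeze(1)         # p(fake) per image
    loss = criterion(probs, labels.float())
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        model.fc.weight.clamp_(min=0.0)      # project weights back onto w >= 0
```

Only the weights are clamped; the bias merely shifts the decision threshold uniformly and carries no per-feature meaning, so it is left unconstrained.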