Liminal Bugs

Introduction

Ironically, one of the hardest questions you can ask a security practitioner is “What is a bug?” It’s one of those concepts that has lots of clear positive examples (the program crashing when it is not supposed to) and lots of clear negative examples (the program returning the correct result), but a vast, murky grey area in between.

Unfortunately, as a PhD student researching automatic bug-finding techniques, I’m forced to grapple with this question on a daily basis as my research prototype stomps around in this murky swamp looking for bugs.

In doing so, I’ve uncovered an interesting class of bugs: liminal bugs. These bugs show themselves only when components of a system are combined, yet cannot be observed when either component is examined in isolation. They live in the liminal space between the two components.

This post is a brief exploration of a particularly scary liminal bug I found in libjpeg-turbo.

A real life liminal bug (perhaps) · source

Bugs in Library APIs

My current research is about finding bugs in library APIs, specifically C and C++ libraries.

The premise of the research seems straightforward: we want to find bugs that can be triggered through valid usage of the public API (particularly those that could be security-relevant).

Traditionally, fuzzing has focused on just those public APIs that are extremely fuzzable (i.e. take some attacker-controlled input). For example, here is the transform.cc fuzz harness from libjpeg-turbo:

The majority of this harness is boilerplate that sets up the necessary data structures. The actual fuzzable data is then passed directly to a few select functions like tj3Transform and tj3DecompressHeader (highlighted lines). (In my opinion this harness is actually not super well constructed, since it reuses the same data multiple times, but that’s beside the point.)

The nice thing is that we know that this harness is using the API correctly (because it’s been manually written and vetted by maintainers), so if we find some data that causes memory corruption or a crash, we are very confident that it’s a real bug.

Improving Coverage

Unfortunately, fuzzers are only as good as their fuzz harnesses. If the fuzz harnesses miss some entrypoint functions, we may be blocked from finding any bugs via those entrypoints. According to fuzz-introspector, there are 10 functions with 0% coverage across all 30 of libjpeg-turbo’s fuzz harnesses.

Furthermore, these harnesses might lock in certain flag/argument combinations that prevent us from actually exercising certain code paths, even if we are already hitting the right entrypoint.

So it would be great if we could automatically generate more fuzz harnesses and expand code coverage.
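Short of generating whole new harnesses, one common way to loosen a locked-in flag combination by hand is to let the fuzzer choose the flags itself. The following is a minimal sketch of that idea, not code from libjpeg-turbo's harnesses: the `OPT_*` bits, the `harness_input` struct, and `split_fuzz_input` are all hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option bits standing in for a library's real flags. */
enum {
    OPT_FAST      = 1 << 0,
    OPT_STRICT    = 1 << 1,
    OPT_NOREALLOC = 1 << 2,
    OPT_ALL       = OPT_FAST | OPT_STRICT | OPT_NOREALLOC
};

/* What the harness passes on to the API under test. */
typedef struct {
    int flags;              /* chosen by the fuzzer, not hard-coded */
    const uint8_t *payload; /* remaining bytes, e.g. the image data */
    size_t payload_size;
} harness_input;

/* Use the first input byte to select flags; the rest is the payload. */
harness_input split_fuzz_input(const uint8_t *data, size_t size)
{
    harness_input in = { 0, data, size };
    if (size == 0)
        return in; /* nothing to split: default flags, empty payload */
    in.flags = data[0] & OPT_ALL; /* mask off bits that have no meaning */
    in.payload = data + 1;
    in.payload_size = size - 1;
    return in;
}
```

A harness entry point would call `split_fuzz_input` first and then hand `in.flags` and `in.payload` to the API under test. The trade-off is that every flag combination the fuzzer can now reach has to count as valid usage, otherwise its crashes may be false positives.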

Enter LLMs

Assuming you’re not afflicted with a phobia of unsound stochastic processes, it may seem reasonable to try to use LLMs to generate more fuzz harnesses. In fact, people have been trying to do this (oss-fuzz-gen, PromptFuzz, PromeFuzz, …).

Of course, the problem now is that we introduce the risk that the fuzz harness itself is using the API incorrectly. In these cases, “bugs” found by the fuzzer may in fact be false positives.

In fact, the bulk of my current research is basically trying to address this problem – figuring out how to maximize the flexibility of the fuzzer without introducing false positives through invalid usage, and fixing it if we accidentally do. (Full details and a preprint coming later this month!)

The hard part is that “valid usage” is never formally defined, but rather a vague concept roughly supported by sporadic documentation, examples, and intuition. Thus in practice, it introduces the potential for liminal bugs where library code looks correct on its own, client code looks correct on its own, but when combined, a liminal bug manifests.

Case Study: NOREALLOCosa Brasiliensis (libjpeg-turbo)

Photo by Marcelo Casacuberta · CC BY-SA 3.0

Our bug of interest, NOREALLOCosa Brasiliensis, emerges in libjpeg-turbo’s (deprecated, but still supported) tjTransform function when it is used with the flag TJFLAG_NOREALLOC.

For context, libjpeg-turbo is a library for manipulating JPEG images. It includes a function called tjTransform (legacy API) that allows you to apply various lossless transformations to a JPEG image.

This testcase sets up a transform context, including a destination buffer (dstBufs[0]) and then invokes tjTransform with a no-op transformation (TJXOP_NONE). Importantly, the destination buffer is too small (100 bytes) to hold the transformed image and tjTransform is invoked with the flag TJFLAG_NOREALLOC.

☠️

The symptoms? Deadly. In the call to tjTransform, the destination buffer is cleanly overflowed with data from the source JPEG image. If this kind of code is deployed somewhere attackers can control the image, the bug could lie dormant as long as the source image is small enough, and then suddenly result in a very controlled heap overflow.

Documentation

We turn to the documentation (which has since been updated after the issue was reported) to understand whether this usage is valid. We see that in general, the way we construct the handle and prepare the transformation structures is valid (not shown). The main question lies in how we construct the destination buffer and what the flag TJFLAG_NOREALLOC is supposed to do.

Specifically, TJFLAG_NOREALLOC is described as follows (emphasis is my own):

📖

Disable JPEG buffer (re)allocation. If passed to one of the JPEG compression or transform functions, this flag will cause those functions to generate an error if the JPEG destination buffer is invalid or too small, rather than attempt to allocate or reallocate that buffer.

Further, looking at the documentation for tjTransform, specifically the part about the destination buffer, we see:

📖

(dstBufs is a) pointer to an array of n byte buffers. dstBufs[i] will receive a JPEG image that has been transformed using the parameters in transforms[i]. TurboJPEG has the ability to reallocate the JPEG destination buffer to accommodate the size of the transformed JPEG image. Thus, you can choose to:

1. pre-allocate the JPEG destination buffer with an arbitrary size using tjAlloc() and let TurboJPEG grow the buffer as needed,
2. set dstBufs[i] to NULL to tell TurboJPEG to allocate the buffer for you, or
3. pre-allocate the buffer to a “worst case” size determined by calling tjBufSize() with the transformed or cropped width and height and the level of subsampling used in the destination image (taking into account grayscale conversion and transposition of the width and height.) Under normal circumstances, this should ensure that the buffer never has to be re-allocated. (Setting TJFLAG_NOREALLOC guarantees that it won’t be.) Note, however, that there are some rare cases (such as transforming images with a large amount of embedded Exif or ICC profile data) in which the transformed JPEG image will be larger than the worst-case size, and TJFLAG_NOREALLOC cannot be used in those cases unless the embedded data is discarded using TJXOPT_COPYNONE.

If you choose option 1, then dstSizes[i] should be set to the size of your pre-allocated buffer. In any case, unless you have set TJFLAG_NOREALLOC, you should always check dstBufs[i] upon return from this function, as it may have changed.

Consumer Perspective

As a consumer of the library, we could look at the documentation above and conclude that our usage is valid. Our argument would go something like this:

Providing an undersized destination buffer is something that is allowed (and normal!) according to option 1.
We are following the rule about setting dstSizes to the size of the pre-allocated buffer (so the library knows how big the buffer actually is).
Normally, tjTransform will reallocate the buffer if it is too small. However, the documentation says that TJFLAG_NOREALLOC will prevent the (normal) reallocation of the buffer if it is too small. Instead, it will generate an error.
The comment “In any case, unless you have set TJFLAG_NOREALLOC, you should always check dstBufs…” implies that TJFLAG_NOREALLOC is not restricted to case 3, but also applies to the other two cases.

Based on this, we might expect the following to happen:

Buffer is big enough: the data is copied into the destination buffer and the function returns successfully.
Buffer is too small: the function returns an error (because we are using the flag TJFLAG_NOREALLOC).

No bugs here!

Library Perspective

As a library developer, we could also look at the documentation above and conclude that this usage is not valid according to the documentation. Since this usage is not valid, the library is allowed to do anything it wants (undefined behavior).

Specifically, our argument would go something like this:

There are explicitly three documented ways to use the destination buffer.
Using an undersized buffer and preventing reallocation is not allowed according to option 1 – it states the buffer can be undersized AND we need to let TurboJPEG grow the buffer as needed.
Using TJFLAG_NOREALLOC prevents this reallocation, thus we are violating option 1.

Based on this, we could assume that any consumer of this API which follows the documentation would never create this kind of usage, thus there is no need to support it and we are free to do whatever we want.

No bugs here!

Liminal Manifestation

It is specifically this clash of perspectives that gives rise to the liminal bug.

This liminal bug is particularly scary because it’s possible to write code that works correctly in most cases and seems to respect the libjpeg-turbo documentation, but when an attacker provides a large enough image, it results in a controlled heap overflow.
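The two readings can be written side by side as a toy decision model. This is only a sketch of the behavior described in this post, not libjpeg-turbo's implementation: the `outcome` enum and the functions `consumer_reading`, `library_reading`, and `readings_disagree` are hypothetical names, and `needed` stands for the size of the transformed image.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OUTCOME_OK,          /* output fits, the call succeeds */
    OUTCOME_REALLOCATED, /* the library grows the buffer */
    OUTCOME_ERROR,       /* the call fails cleanly */
    OUTCOME_OVERFLOW     /* data is written past the end of the buffer */
} outcome;

/* Consumer's reading: the flag turns "grow the buffer" into "return an error". */
outcome consumer_reading(size_t capacity, size_t needed, bool norealloc)
{
    if (needed <= capacity)
        return OUTCOME_OK;
    return norealloc ? OUTCOME_ERROR : OUTCOME_REALLOCATED;
}

/* Library's reading: the flag means the caller promised a worst-case-sized
 * buffer, so an undersized buffer is invalid usage and is not checked for. */
outcome library_reading(size_t capacity, size_t needed, bool norealloc)
{
    if (needed <= capacity)
        return OUTCOME_OK;
    return norealloc ? OUTCOME_OVERFLOW : OUTCOME_REALLOCATED;
}

/* The liminal region: inputs on which the two readings give different outcomes. */
bool readings_disagree(size_t capacity, size_t needed, bool norealloc)
{
    return consumer_reading(capacity, needed, norealloc)
        != library_reading(capacity, needed, norealloc);
}
```

In this model the readings agree whenever the output fits or the flag is unset. They only diverge for an undersized buffer combined with the flag, which is exactly the input an attacker supplies by providing a large enough image.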

When I reported this bug to the maintainers, it was originally closed quickly as a non-issue. Only after some further discussion to sort out the intended behavior did we arrive at the conclusion that the library was implemented correctly with respect to how TJFLAG_NOREALLOC was originally designed to work, but the documentation around it was arguably too ambiguous.

The maintainer explained that TJFLAG_NOREALLOC was actually originally introduced for the specific case of supporting the TurboJPEG Java API:

TJFLAG_NOREALLOC is a legacy feature that has only survived this long because it was needed by the TurboJPEG Java API (due to the fact that JNI can’t deal with buffers allocated on the C heap.)

And for this particular application, it was assumed that client applications would always pre-allocate the buffer to the worst-case size (i.e. option 3 in the docs):

It was assumed that, if TJFLAG_NOREALLOC was specified, then the destination buffer was allocated based on tjBufSize(), as required by the TurboJPEG v1.0 and v1.1 APIs.

Unfortunately, it was not possible to patch or safeguard the actual API, since doing so could break backwards compatibility with existing programs using the legacy API in the intended way. Thus, the best course of action was just to update the wording in the documentation to clarify the actual behavior.
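Since the check cannot live in the library, a caller who wants to keep using TJFLAG_NOREALLOC with tjTransform could state the maintainer's assumption as an explicit precondition of their own. The helper below is a hypothetical, deliberately conservative sketch and not part of the TurboJPEG API. It assumes the caller has already computed `worst_case_size` with tjBufSize() for the output image, and that `extra_markers_discarded` records whether the transform was configured with TJXOPT_COPYNONE.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical caller-side guard for the documented "option 3" usage.
 * Returns true only when it looks safe to disable reallocation:
 *  - the destination buffer is at least the worst-case size, and
 *  - embedded Exif/ICC data is discarded, so the output cannot exceed it.
 * This is conservative: it also rejects inputs that carry no extra data.
 */
bool norealloc_precondition_holds(size_t dst_capacity,
                                  size_t worst_case_size,
                                  bool extra_markers_discarded)
{
    if (dst_capacity < worst_case_size)
        return false; /* undersized buffer: the liminal case from this post */
    return extra_markers_discarded;
}
```

If the predicate is false, the caller can either allocate a larger buffer or drop the flag and let the library reallocate, remembering to re-read dstBufs[i] afterwards as the documentation requires.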

Does it matter?

So my LLM-powered fuzzer found this weird edge case in the documentation for a function. But ultimately, does it matter?

Well, I was curious whether other developers had run into the same semantic misunderstanding as in this example. I set up a GitHub query to search for people using the TJFLAG_NOREALLOC keyword (tjCompress, tjDecompress, and tjTransform can all exhibit the same liminal bug), while trying to filter out actual libjpeg-turbo source code. It turns out that a lot of people are making the same mistake, assuming that TJFLAG_NOREALLOC makes it safe to use a potentially undersized buffer.

For example:

This C/Java repo: odnoklassniki/one-webp
This C++ repo: ailab-mayfestival2016/phenox
Even this Rust wrapper repo: sportsball-ai/jpeg-turbo-rs

Note that these are all legacy 7+ year old repositories with just a few stars, so I don’t believe there is any negative impact from listing them here.

Takeaways

The main thing I’ve learned from this experience is that the notion of a bug is extremely context-dependent. While some bugs can exist and be observed in isolation, there is a whole class of bugs, liminal bugs, that require a nuanced understanding of the whole system in order to be observed.

One interesting ramification is that the notion of testing a library API independently from its client code may itself be a limitation, preventing us from effectively exposing these tricky bugs.

Another takeaway is the extent to which documentation itself can be the source of a potentially serious security issue. As an avid CTF player and ex-vulnerability researcher, I am much more comfortable with the concept that a bug pairs one-to-one with a code patch that can fix it. But this is not always the case.

In the world of software engineering, our systems are composed of many moving parts with interfaces that have been designed and documented by humans. Imprecision and miscommunication in the interfaces themselves can propagate into actual execution and create these liminal bugs.

Timeline

November 4, 2025: Started fuzzing libjpeg-turbo with STITCH, found the testcase
November 6, 2025: Reported the bug via GitHub Security Advisory
November 9, 2025: Maintainer responds and closes as “wont-fix”
November 9-11, 2025: Back-and-forth about the issue and how best to address it
November 11, 2025: Maintainer updates documentation to clarify expected usage
