First assumptions about the cause of the Xerox scan errors

The following information has also been appended to the original article, so that new readers find everything on a single page.

It seems that the earlier suspicion of excessive image compression was not far off. Several emails I received suggest that the Xerox machines use JBIG2 for compression. This algorithm builds a dictionary of image patches it considers “similar”. Those patches are then reused in place of the original image data, as long as the error they introduce is not “too high”. Makes sense.

This would also explain why the error occurs when scanning letters or numbers at low resolution (still readable, though). In this case, the letter size is close to the patch size of JBIG2, and whole “similar” letters or even blocks of letters get replaced by each other.
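
To make the suspected mechanism concrete, here is a toy sketch in Python. It is not JBIG2 and says nothing about what the Xerox firmware actually does; the names `hamming`, `compress`, `decompress` and `max_error`, the 3x5 glyph bitmaps, and the pixel-difference error measure are all assumptions chosen purely for illustration. It only shows how a patch dictionary with an error tolerance can swap one readable character for another once a whole character fits into a single patch.

```python
# Toy glyphs, 3 pixels wide and 5 high ("1" = black pixel). Hypothetical data.
SIX = ("111",
       "100",
       "111",
       "101",
       "111")
EIGHT = ("111",
         "101",
         "111",
         "101",
         "111")


def hamming(a, b):
    """Count the pixels in which two equally sized patches differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def compress(patches, max_error):
    """Replace each patch by the index of a 'similar' dictionary entry.

    A patch is stored as a new dictionary entry only if no existing
    entry is within max_error differing pixels of it.
    """
    dictionary, indices = [], []
    for patch in patches:
        for i, symbol in enumerate(dictionary):
            if hamming(patch, symbol) <= max_error:
                indices.append(i)  # reuse the similar patch
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary, indices):
    """Rebuild the patch sequence from the dictionary."""
    return [dictionary[i] for i in indices]


scanned = [SIX, EIGHT, SIX]

# Tolerating a single wrong pixel silently turns the 8 into a 6.
lossy_dict, lossy_idx = compress(scanned, max_error=1)
print(decompress(lossy_dict, lossy_idx) == [SIX, SIX, SIX])

# With no tolerance the sequence survives unchanged.
exact_dict, exact_idx = compress(scanned, max_error=0)
print(decompress(exact_dict, exact_idx) == scanned)
```

In this sketch the two glyphs differ in exactly one pixel, so any tolerance of one pixel or more merges them. At higher resolution the same two characters would differ in many more pixels, which fits the observation that the substitutions show up at low resolution.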

Of course, if Xerox had chosen the patch size so that whole, readable letters fit into a single patch, this would be grossly negligent. It would also shed light on how these machines are tested: with a patch-based compression algorithm, testing it on low-resolution but still readable letters is the obvious thing to do.

I am curious how Xerox is going to react and what will come of it. Until then, thanks for spreading the word, please keep doing so – and of course, I am looking forward to further helpful emails!

Edit: Just got this by eMail – Thanks, Boris! :-)