News

All three compression modes mangling numbers on Xerox WorkCentre 7545, reader reports also on 7655

As all of you probably know by now, the Xerox devices have three compression settings: normal, higher and highest. Xerox claimed in their questions and answers sheet that normal would be “the JBIG 2 setting” that mangles numbers, and that the other two would be safe at resolutions of 200 dpi and above.

A few days ago, I found that on a Xerox WorkCentre 7545 in my hometown Bonn, the “higher” compression mode was mangling numbers in addition to the normal mode, even at a generous resolution of 300 dpi. I reported my findings to Xerox, and after they tested it on one of their 7545s in the US and saw mangled numbers as well, I wrote a detailed blog post.

This was already in stark contrast to their on-screen notifications, their manuals (at least the ones I know) and even their current press statements, as all of these issue character substitution warnings only for the “normal” compression mode, or even explicitly tell the user that the other compression modes are safe.

Xerox investigating latest mangling test findings

In the second Xerox press statement, Rick Dastin, Vice President at Xerox Corporation, stated: “You will not see a character substitution issue when scanning with the factory default settings.” In a question and answers sheet on the topic, Xerox defined the factory default settings to be the “higher” compression mode, with a resolution of at least 200 dpi. For the “higher” (and for the “highest”) compression mode, there is also no “character substitution” warning, either in the manuals or in the admin panel, at least not in those I know.

Earlier today, I was able to replicate mangled numbers on a Xerox WorkCentre 7545, using the “higher” compression mode and an even more generous resolution of 300 dpi. As all of you can imagine, this might be a serious problem now. Not only does the problem occur with the default settings; their statements may also not have been totally accurate, albeit well-meant.

As I did not intend to do any harm to Xerox, I had to have these findings verified and make sure I was not wrong. So I did not publish the findings right away, but first informed Francis Tse, Imaging System Architect at Xerox Corporation. As a result, we have been in close contact over the last few hours, and I sent Mr. Tse

  • precise documentation of all the settings I made on the WorkCentre 7545, as well as the whole process by which I was able to replicate the problem
  • my testing document in order to replicate the problem (you know it; it's the TIFF from the original article)
  • and an image with a non-exhaustive bunch of false eights, marked yellow for Mr. Tse's convenience and clearly discernible by their characteristic dent in the middle:

Mr. Tse confirmed to me that I had set all the attributes exactly as intended by them. As a result, he acknowledged seeing mangling even with the factory settings. We also agreed that whole numbers seem to be copied across the paper pixel by pixel. Look at the following image; it is an enlarged version of the rectangle marked blue above:

The groups of digits marked red seem to be identical pixel by pixel, which one would expect to be highly unlikely, as the scanned paper already contained some small artifacts (it had been scanned and printed before). One would expect small differences across digits of the same value, as is the case across the digit groups marked green.
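The pixel-by-pixel comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the glyph bitmaps are taken to be already cut out of the scan and given as rows of 0/1 pixels, and the function name and the labels are made up for this example.

```python
from collections import defaultdict

def group_identical_glyphs(glyphs):
    """Return groups of labels whose bitmaps match pixel for pixel.

    `glyphs` maps a (hypothetical) label, e.g. a position on the page,
    to a bitmap given as a sequence of rows of 0/1 pixels.
    """
    groups = defaultdict(list)
    for label, bitmap in glyphs.items():
        # Make the bitmap hashable so identical bitmaps share one key.
        key = tuple(tuple(row) for row in bitmap)
        groups[key].append(label)
    # Only groups with more than one member are exact duplicates.
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]

# Toy 3x3 bitmaps: "a" and "b" are identical, "c" differs in one pixel.
glyphs = {
    "a": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "b": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "c": [[1, 1, 1], [1, 0, 0], [1, 1, 1]],
}
duplicates = group_identical_glyphs(glyphs)
```

On a real scan of printed paper, noise should make exact duplicates like the pair "a" and "b" rare, which is why the identical red groups are suspicious.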

Edit: A reader coded this interactive visualisation, where you can see how symbols are reused across the first page of the PDF. Move the mouse over a digit to see its siblings painted red! Wow! Thanks!

All of this absolutely blew both of our minds, and over the last few hours Xerox has been working on replicating the problem on their own devices. Unfortunately, it turned out that they were able to see mangling, too.

It may be, for example, that the small artifacts created by first printing and then scanning the already scanned TIFF help to cause the number mangling, even though the 6s that were replaced by 8s were still perfectly readable on the scanned paper. However, this cannot be said for sure at this time.

I am writing this article while closely collaborating with Xerox. They want to understand this issue like I do. They're listening, they're investigating and they will be releasing a statement soon.

Edit: Here is their statement, by Vice President Rick Dastin.

Could it have been that easy, Xerox?

I keep getting comments and emails stating a very interesting thing. Could it be that the compression algorithm in question, JBIG2, includes a “lossless” flag that one just has to turn on within the implementation, and everything is in order? This press announcement (thanks, Flavio) suggests so. Money quote:

supports traditional “lossless” compression, but also a new “lossy” type of image compression, whereby the compression factor is increased on average by a factor of about 3 to 10, without noticeable visual differences compared with the lossless mode.

The “lossy” mode is what this is all about. The important thing to note from that quote seems to be that the 3-10x factor applies to lossy JBIG2, not lossless. Lossless JBIG2 will generally create files that are a bit smaller than other forms of compression; lossy will create files that are massively smaller.
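To make the difference tangible, here is a toy sketch of symbol matching with a tolerance. It is not the Xerox implementation and not the JBIG2 bitstream format; the function names, the `max_diff` parameter and the 3x5 glyphs are assumptions made up for illustration. With a tolerance of zero, only identical glyphs share a dictionary entry and decoding is exact. With a looser tolerance, a glyph that is merely similar gets replaced by a stored one.

```python
def hamming(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode_symbols(glyphs, max_diff):
    """Store each glyph once and reference it for every 'close enough' match.

    max_diff = 0 reuses only pixel-identical glyphs (no information lost);
    max_diff > 0 also reuses similar glyphs (lossy, hypothetical threshold).
    """
    dictionary, refs = [], []
    for glyph in glyphs:
        for index, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_diff:
                refs.append(index)  # reuse the stored symbol
                break
        else:
            dictionary.append(glyph)  # no match: store a new symbol
            refs.append(len(dictionary) - 1)
    return dictionary, refs

def decode_symbols(dictionary, refs):
    """Rebuild the glyph sequence from dictionary references."""
    return [dictionary[i] for i in refs]

# Toy 3x5 glyphs that differ in a single pixel.
EIGHT = ("111", "101", "111", "101", "111")
SIX = ("111", "100", "111", "101", "111")

exact = decode_symbols(*encode_symbols([EIGHT, SIX], max_diff=0))
lossy = decode_symbols(*encode_symbols([EIGHT, SIX], max_diff=1))
```

Here `exact` still contains the six, while `lossy` contains two pixel-identical eights: the file got smaller because only one symbol is stored, and the six is gone.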