The day after

… seems not to be today.

I was a bit of anticipating the Xerox number mangling concern to lose momentum today. This would have been okay, since it is not my goal to inflict any damage to Xerox. I really appreciated how they got in touch and listened, and this is why I tried to help as far as I could.

Only – the internet doesn't forget. Read the sometimes harsh comments below the Xerox statement! People do not seem to agree that some small notice in the web interface only shown when changing the compression level to “normal” (sic!) would make up for possible annual productions of subtly incorrect documents at what may easily be thousands of enterprises world-wide. As a result, and as a result of the mass media kicking in additionally, this web site runs on up to 160 hits a minute the whole day. A friend of mine condensed the issue in a wonderful way I do not want to withhold from you:

Conference call with Xerox

This evening, I had like half an hour conference call with

  • Rick Dastin, Corporate Vice President Office and Solutions, and
  • Francis Tse, Imaging System Architect at Xerox Corporation.

First, I'd like to point out that the atmosphere of the call was very relaxed and easy-going. Above all, both sides were listening to each other, at least this is what I feel (Mr. Dastin, Mr. Tse, feel free to object ;-) ). I highly appreciate the way, Xerox deals with the issue, as not all enterprises would do it in this friendly way. We all know the stories of enterprises shooting at the messenger for such a blog post.

Facts first:

  1. The suggested workaround is indeed a workaround, as it switches off JBIG2
  2. The main problem was, respectively is, a support problem, which would not have ocurred, if Xerox support would have known their machines.

Now for the finer granularity facts. The Xerox design in scanning modes contains three levels. Two standard levels (high and higher) and one that gives us small file sizes, but deliberately neglects data integrity (named normal). Now, the “normal” setting uses JBIG2 (as suggested) and therefore may indeed mangle characters. The “higher” and “high” levels use another compression, which also explains, why the image quality may actually decrease when switching from “higher” to “high” – another counter-intuitive thing, as we also discussed.

If one needs a data integrity neglecting compression level in scanners, can be argued about. You need to make your own opinion with respect to this. The double key phrase concecning JBIG2 from the conference call was the following (from my memory minutes, but I think these were the words):

David Kriesel: “If you give me a document encoded with JBIG2, and I claim it's incorrect, you can't prove me wrong.”

Francis Tse: “Yes, you're right, it's a probablistic thing.”

To be most fair: The “normal” setting is not the default setting (in particular, in the company where the error occured to me first, somebody must have set scanning to “normal”) and there exists an (albeit small) warning message in the web interface, see also the screen shot in the workaround blog post.

Only, I don't think Xerox is off the hook with this. Personally, I would never ever implement patch based image compression algorithms for text data that might possibly need legal certainty, which I also stated during the call. We however agreed that the Names of the compression levels may be misleading. Somebody might always think “hey, the normal setting is enough for me”, neglect the small warning in the web interface telling about character substitutions, and go on with business (in fact, exactly this is what lots and lots of people seemingly did, but more on this later, when I also propose the only two solutions I can think of).

Possible workaround for character substitutions in xerox machines

It might be, that a reader of this blog found the magic bullet, or at least a workaround to the issue. It seems to be a scan parameter within the machines, whose tendency to create character substitutions was even known to Xerox. This raises the question why several xerox support guys weren't able to help at all, even though they visited the machines several times since the end of July. But, first things first, here is the work around. Yesterday, I was made aware of a scanning parameter in the Xerox ColorCube 9203, which seems to at least diminish the issue. Have a look at the following screen shot:

It is made of the scanning parameters within the web management of the Xerox ColorQube 9203 (those machines, as the WorkCentres, feature an integrated HTTP-Server for management purposes). The reader raised the quality from “normal” to a higher setting, which – counter-intuitively – reduced the readability of the scanned document, however, reduced the number of mangled numbers drastically (maybe even to zero). The image quality was reduced in a way that even made some characters unreadable, but in this case, the user at least sees that something is not okay and rescans the document. The employer of the reader now will not use the “normal” setting any more.

Now, straight to the point: I can confirm the impact of the setting as described above for the Xerox WorkCentre 7535. There exists an analogue setting, with the same “character substitution” text, and if the quality is raised, the reading quality is actually decreased, but the number of errors is reduced drastically. Nice one!

As there is still no statement by Xerox, I can only hope, this information works out for you. The possible consequences as described in the original article (lots and lots of documents may be scanned in a subtle incorrect way around the world) unfortunately stay the same.