29.07.2010

A Classic Lotus Domino Web Development Mystery

>>Author:   Arnd Koch
>>Location: Schwentinental (Kiel)
        
URL: http://www.assono.de/blog/d6plinks/Domino-Web-Field-Name-Encoding

Category: Lotus Notes, Web Development, Development

In classic Lotus Domino Web development (without XPages), using non-7-bit-ASCII characters in field names can lead to problems, because Domino encodes such names into cryptic strings inside the HTML page. In some cases, e.g. client-side validation of fields using JavaScript in the browser, it is nearly impossible to recover the original field name from the encoded one. The easiest way would be to pre-calculate the encoded names in the Domino back-end and generate the needed JavaScript code before the HTML is sent to the browser.
But there are two problems:
The first problem is that no function is available to encode data the same way Domino does when it generates the HTML. The only way to decode such a string is the "@URLDecode" function, which isn't available in JavaScript in the browser.
The other problem is that the encoding algorithm does not seem to be documented at all, which makes it quite hard to implement an appropriate encoding function yourself. Still, there is light at the end of the tunnel: I did a little research on this, and the following analysis shows how the encoding works:

When using the field name "Straße" (English: street; the "ß" is a sharp "s" in German), the Domino server encodes it as "_gadq74of1ck_". Removing the underscores and trying various examples suggests some kind of Base32 encoding, but the results of the common Base32 variants don't match the Domino one:
"Straße" -> Domino    -> "gadq74of1ck"
"Straße" -> Base32    -> "kn2heyo7mu"
"Straße" -> zBase32   -> "un7rzo75fd"
"Straße" -> Base32hex -> "adq74oevck"

On closer inspection, there is a partial match with the Base32hex (RFC 4648 (1)) encoding, but there are some differences:
Notes:     gadq74of1ck
Base32hex:  adq74oevck

The Domino-encoded value has an extra leading character and doesn't match the Base32hex result in every position. To find the cause of these differences, we decode the Domino-generated string step by step. For now, we skip the first character; it is explained later on:
1: encoded field name
2: translation into decimal values according to the Base32hex table
3: binary values, 5 bits per character
4: regrouping of the bits
5: groups of 8 bits per character
6: conversion into characters

1:    a     d     q     7     4     o     f     1     c     k
2:   10    13    26     7     4    24    15     1    12    20
3: 01010 01101 11010 00111 00100 11000 01111 00001 01100 10100
4: 01010011011101000111001001100001111000010110010100
5: 01010011 01110100 01110010 01100001 11100001 01100101
6:        S        t        r        a        á        e
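
The six steps can be reproduced with a few lines of Python. This is only a sketch of the manual decoding shown in the table; the function name `decode_steps` is made up for illustration, and step 6 assumes ISO-8859-1 for reading the bytes, which is what produces the "á" in the last row.

```python
# Base32hex alphabet from RFC 4648, in lower case as it appears in the Domino output.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"

def decode_steps(encoded: str):
    """Return (values, bit string, bytes) for an unpadded Base32hex string."""
    values = [ALPHABET.index(ch) for ch in encoded]        # step 2
    bits = "".join(format(v, "05b") for v in values)       # steps 3 and 4
    usable = len(bits) - len(bits) % 8                     # ignore trailing padding bits
    octets = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))  # step 5
    return values, bits, octets

# "adq74of1ck" is the Domino string without underscores and without the first character.
values, bits, octets = decode_steps("adq74of1ck")
print(values)                    # [10, 13, 26, 7, 4, 24, 15, 1, 12, 20]
print(list(octets))              # [83, 116, 114, 97, 225, 101]
print(octets.decode("latin-1"))  # step 6, read as ISO-8859-1: Straáe
```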

The difference between "f1" and "ev" results from a different byte value being used for the "ß": the decoded byte is 225, which is "á" in ISO-8859-1, where "ß" would be 223. The conclusion is that Domino uses another character set to encode the field names. A little research on common charsets turns up MS-DOS code page 850 (2); in this charset and its relatives, "ß" indeed has the decimal value 225. This does not make things easier, however: the code page used most likely varies with the system's configuration, so developing a fully compatible encoding function might be a challenge.
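
The effect of the character set can be checked by encoding the same field name twice. The following is a minimal sketch, assuming Python's built-in "cp850" and "latin-1" codecs stand in for the two charsets; `base32hex` is a hand-written helper for illustration, not a Domino function.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # RFC 4648 Base32hex, lower case

def base32hex(data: bytes) -> str:
    """Unpadded, lower-case Base32hex of a byte string."""
    bits = "".join(format(b, "08b") for b in data)
    bits += "0" * (-len(bits) % 5)  # zero-fill to a multiple of 5 bits
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))

for charset in ("latin-1", "cp850"):
    raw = "Straße".encode(charset)
    # raw[4] is the byte value used for the sharp s in this charset
    print(charset, raw[4], base32hex(raw))
# latin-1 223 adq74oevck
# cp850 225 adq74of1ck
```

Only the code page 850 bytes reproduce the Domino string from this article; whether other servers use the same byte values is not something this example can show.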

Now for the first character of the Domino encoding mentioned above, which is a little tricky, too. First we decode it using Base32hex again:
     g
    16
 10000

The additional 5 bits are parity bits (even parity) and are used to validate the encoding. To make this clearer, the results are displayed in a different order: each row shows one bit position of every character, with the parity bit taken from "g" in the first column:
g adq74of1ck
1 0010010001
0 1110011010
0 0101101011
0 1011001000
0 0101001100
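
Even parity for each of the five bit positions is the same as XOR-ing the 5-bit values of all characters, so the leading character can be computed directly. The sketch below puts the pieces together; the function names are hypothetical, the default charset "cp850" is an assumption taken from the analysis above, and the result has only been checked against the single "Straße" example in this article.

```python
from functools import reduce
from operator import xor

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # RFC 4648 Base32hex, lower case

def base32hex(data: bytes) -> str:
    """Unpadded, lower-case Base32hex of a byte string."""
    bits = "".join(format(b, "08b") for b in data)
    bits += "0" * (-len(bits) % 5)  # zero-fill to a multiple of 5 bits
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))

def parity_char(payload: str) -> str:
    """XOR of all 5-bit values, i.e. even parity for each bit position."""
    return ALPHABET[reduce(xor, (ALPHABET.index(ch) for ch in payload), 0)]

def encode_field_name(name: str, charset: str = "cp850") -> str:
    """Rebuild the encoded name: underscore, parity character, payload, underscore."""
    payload = base32hex(name.encode(charset))
    return "_" + parity_char(payload) + payload + "_"

def is_valid(encoded: str) -> bool:
    """With the parity character included, every bit position must XOR to zero."""
    body = encoded.strip("_")
    return reduce(xor, (ALPHABET.index(ch) for ch in body), 0) == 0

print(encode_field_name("Straße"))  # _gadq74of1ck_
print(is_valid("_gadq74of1ck_"))    # True
```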
 
Well, that's it: one less mystery out there.

(1) http://www.rfc-editor.org/rfc/rfc4648.txt
(2) ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT

Comments

#1 Interesting.... I saw this years ago with non-aliased combobox values that had non-ascii characters in them. I worked around it by simply adding aliases that were identical to the UI values and it worked. Glad to know what was really going on.
#2 Hmmm. From a hacking/security point of view, this information is of vital importance. Thank you very much. Just one more weapon in my arsenal of security measures.
#3 It is unlikely that the encoding is going to be dependent on the language settings of the server. 225 is the LMBCS value of the sharp-s character. Here's a link to a handy cross-reference between Unicode and LMBCS that I built: { Link }
