HOWTO Use Encoding

The EncodingTool class in POJava handles encodings that represent binary data in a more portable format. It currently supports Base64 and Hexadecimal encoding and decoding. Hexadecimal is familiar even to new programmers, and Base64 is only a little less so. Both encodings use a fixed set of characters to represent a sequence of bits. The main difference between the two is that each Hexadecimal character represents 4 bits, while each Base64 character represents 6 bits.

Encoding and decoding data in either format using POJava is simple. Here is an example.

// Encoding
String str = "This is the data I wish to encode.";
char[] encoded = EncodingTool.base64Encode(str.getBytes());

// Decoding (new String(...) assumes base64Decode returns the decoded bytes)
String decoded = new String(EncodingTool.base64Decode(encoded));
assertEquals(str, decoded);

If your encoded Base64 content contains spaces and carriage returns, you can pass it as a String to the base64Decode method, and it will ignore the whitespace in the String. Passing in an array rather than a String prompts a stricter interpretation of the allowed characters.

// Decoding from a String, which tolerates embedded whitespace
String encodedString = new String(encoded);
String decodedFromString = new String(EncodingTool.base64Decode(encodedString));
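
The same lenient-versus-strict distinction exists in the JDK's own java.util.Base64, which makes it easy to try out without POJava on the classpath. The sketch below is only an analogy using the JDK, not POJava's implementation; the class and method names (WhitespaceTolerantDecode, wrap, decodeLenient, decodeStrict) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class WhitespaceTolerantDecode {

    /** Encodes text as Base64 broken into 16-character lines separated by CR LF. */
    static String wrap(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getMimeEncoder(16, new byte[] {'\r', '\n'}).encodeToString(raw);
    }

    /** Lenient: the MIME decoder skips line breaks and other non-alphabet characters. */
    static byte[] decodeLenient(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    /** Strict: the basic decoder throws IllegalArgumentException on stray characters. */
    static byte[] decodeStrict(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        String wrapped = wrap("This is the data I wish to encode.");
        System.out.println(new String(decodeLenient(wrapped), StandardCharsets.UTF_8));
        try {
            decodeStrict(wrapped);
        } catch (IllegalArgumentException e) {
            System.out.println("Strict decoder rejected the line breaks");
        }
    }
}
```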

Reversibility

Both Base64 and Hexadecimal encodings are reversible without any loss of data. Both represent an exact sequence of ones and zeros.
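
One way to convince yourself of this is to push every possible byte value through an encoder and back. This is a minimal sketch using the JDK's java.util.Base64 rather than POJava; RoundTripCheck and its methods are hypothetical names. The same kind of check applies to a hexadecimal encoder.

```java
import java.util.Arrays;
import java.util.Base64;

final class RoundTripCheck {

    /** Builds an array holding each of the 256 possible byte values once. */
    static byte[] allByteValues() {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    /** Returns true when decoding the encoded form yields the original bytes. */
    static boolean survivesBase64(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(data, decoded);
    }

    public static void main(String[] args) {
        System.out.println(survivesBase64(allByteValues())); // prints true
    }
}
```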

Portability

The notion of portability implies that there are places where pure binary data can't go without some sort of conversion. Where might this happen? I've seen Hexadecimal encoding used most frequently on a computer screen. Some byte values of a character set represent unviewable values like null, backspace, clear or down, so displaying a memory or packet dump of raw data often uses hexadecimal instead. Base64 is frequently associated with printable documents, like MIME-encoded e-mails or with binary data stored in an XML or HTML document.

Hexadecimal Encoding

Hexadecimal encoding takes a series of bytes, and represents each sequence of 4 bits with an individual character. The ASCII characters '0' through '9' map to the first ten bit combinations, and the characters 'A' - 'F' represent the last six, adding up to all 16 possible combinations.

Bits  Dec  Hex      Bits  Dec  Hex
0000    0  0        1000    8  8
0001    1  1        1001    9  9
0010    2  2        1010   10  A
0011    3  3        1011   11  B
0100    4  4        1100   12  C
0101    5  5        1101   13  D
0110    6  6        1110   14  E
0111    7  7        1111   15  F
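
The mapping above can be sketched by hand in a few lines. This is an illustration of the technique, not POJava's code; HexSketch, encode and decode are hypothetical names. Each byte is split into a high and a low group of four bits, and each group selects one character.

```java
final class HexSketch {

    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    /** Writes two hex characters per byte: high four bits first, then low four bits. */
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(data.length * 2);
        for (byte b : data) {
            out.append(DIGITS[(b >> 4) & 0x0F]);
            out.append(DIGITS[b & 0x0F]);
        }
        return out.toString();
    }

    /** Rebuilds each byte from a pair of hex characters (upper or lower case). */
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit at pair " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x7F, (byte) 0xFF, 0x4D};
        System.out.println(encode(data)); // prints 007FFF4D
    }
}
```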

Base64 Encoding

Base64 converts a sequence of bytes into a sequence of characters, each representing six bits. Each group of three unencoded bytes maps neatly onto a group of four encoded characters, since both hold the same twenty-four bits.
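
To make the three-bytes-to-four-characters step concrete, here is a hand-rolled sketch of encoding a single full group. It is an illustration only, not POJava's implementation; Base64GroupSketch and encodeGroup are hypothetical names, and the alphabet is the standard Base64 one.

```java
final class Base64GroupSketch {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Encodes exactly three bytes (24 bits) as four characters of 6 bits each. */
    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Pack the three bytes into the low 24 bits of an int.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        char[] out = new char[4];
        for (int i = 0; i < 4; i++) {
            // Take 6 bits at a time, starting from the most significant end.
            int index = (bits >> (18 - 6 * i)) & 0x3F;
            out[i] = ALPHABET.charAt(index);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // 'M', 'a', 'n' split into the 6-bit values 19, 22, 5, 46.
        System.out.println(encodeGroup((byte) 'M', (byte) 'a', (byte) 'n')); // prints TWFu
    }
}
```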

If the sequence of bytes to encode doesn't fall evenly into three bytes at a time, the remaining bytes are still sequenced into four encoded characters. Equals signs are used to represent the filler portion that will not be decoded into the original data. Depending on the length of the original data, the encoded characters may end in zero, one or two equals signs.

Bits    Dec  Base64   Bits    Dec  Base64   Bits    Dec  Base64   Bits    Dec  Base64
000000    0  A        010000   16  Q        100000   32  g        110000   48  w
000001    1  B        010001   17  R        100001   33  h        110001   49  x
000010    2  C        010010   18  S        100010   34  i        110010   50  y
000011    3  D        010011   19  T        100011   35  j        110011   51  z
000100    4  E        010100   20  U        100100   36  k        110100   52  0
000101    5  F        010101   21  V        100101   37  l        110101   53  1
000110    6  G        010110   22  W        100110   38  m        110110   54  2
000111    7  H        010111   23  X        100111   39  n        110111   55  3
001000    8  I        011000   24  Y        101000   40  o        111000   56  4
001001    9  J        011001   25  Z        101001   41  p        111001   57  5
001010   10  K        011010   26  a        101010   42  q        111010   58  6
001011   11  L        011011   27  b        101011   43  r        111011   59  7
001100   12  M        011100   28  c        101100   44  s        111100   60  8
001101   13  N        011101   29  d        101101   45  t        111101   61  9
001110   14  O        011110   30  e        101110   46  u        111110   62  +
001111   15  P        011111   31  f        101111   47  v        111111   63  /
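
The padding rule is easiest to see by encoding inputs of one, two and three bytes. This sketch uses the JDK's java.util.Base64 instead of POJava; PaddingDemo, encode and paddingFor are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PaddingDemo {

    /** Encodes ASCII text with the standard, padded Base64 encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    /** Number of trailing equals signs for an input of the given byte length. */
    static int paddingFor(int byteCount) {
        return (3 - byteCount % 3) % 3;
    }

    public static void main(String[] args) {
        System.out.println(encode("M"));   // prints TQ==  (one byte, two equals signs)
        System.out.println(encode("Ma"));  // prints TWE=  (two bytes, one equals sign)
        System.out.println(encode("Man")); // prints TWFu  (three bytes, no padding)
    }
}
```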

Encoding is not Encryption

Even though a person cannot easily interpret Base64 encoded content without the aid of a computer, one should not confuse it with encryption. Base64 offers no privacy. It is not weak encryption; it is simply a representation of data in a more portable format.
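
A short sketch shows why: decoding needs no key or password, only a standard decoder that anyone has. The example uses the JDK's java.util.Base64; NotEncryption, hide and reveal are hypothetical names chosen to make the point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class NotEncryption {

    /** "Hides" text by encoding it. No secret is involved at any point. */
    static String hide(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Anyone can reverse the encoding with a stock decoder and no key. */
    static String reveal(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = hide("secret");
        System.out.println(encoded);         // prints c2VjcmV0
        System.out.println(reveal(encoded)); // prints secret
    }
}
```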