blob: 33f9189a03afb2cd93eefabe7ef0b8f2f0d9ad24 [file] [log] [blame]
To test that utf8 characters in pagenames are being properly migrated:
1. Go to http://babelfish.yahoo.com/ or something translation page
2. translate 'testing' or something like that into Korean characters
3. Create a file and copy the korean chars into the filename somewhere. End the filename with the extension '.uwc'.
4. Run the uwc on this file with the moinmoin converter (which does not request url decoding, but needs the .uwc extension).
5. Examine the resulting Confluence page and verify that the korean characters were preserved in the page title
To test that the encoding framework works,
1. cp converter.testencoding.properties to target/uwc/conf/
2. edit it, and set the value of the encoding property to Cp1252.
3. Run the UWC on encoding/euro-cp1252.txt with upload turned on
4. Examine the page in confluence. A Euro symbol should be visible.
5. edit the properties file, and set the value of the encoding property to iso-8859-1.
6. Run the UWC on encoding/fraction-8859-1.txt with upload turned on
7. Examine the page in confluence. A fraction symbol should be visible.
8. Edit the properties file, and set the value of the encoding property to iso-8859-15.
9. Run the UWC on encoding/euro-8859-15.txt with upload turned on.
10. Examine the page in confluence. A Euro symbol should be visible.
These are the most likely encodings to have problems, and these tests are of characters that do not overlap with utf-8.
11. Finally, as a caution, the file deriv-oracle-cesu8.txt uses the broken character encoding cesu-8, which is not going to be fixable for several reasons. We have this test file as an example of what happens when the UWC tries to handle a broken character encoding (which is to say, odd errors from the CRJW that cuase upload to fail).