blob: 5b8ce5ffe31f3882f55f1223dbb1cb7d5719db49 [file] [log] [blame]
# German special characters are replaced:
häufig haufig
# here the stemmer works okay, it maps related words to the same stem:
abschließen abschliess
abschließender abschliess
abschließendes abschliess
abschließenden abschliess
Tisch tisch
Tische tisch
Tischen tisch
Haus hau
Hauses hau
Häuser hau
Häusern hau
# here's a case where overstemming occurs, i.e. a word is
# mapped to the same stem as unrelated words:
hauen hau
# here's a case where understemming occurs, i.e. two related words
# are not mapped to the same stem. This is the case with basically
# all irregular forms:
Drama drama
Dramen dram
# replace "ß" with 'ss':
Ausmaß ausmass
# fake words to test if suffixes are cut off:
xxxxxe xxxxx
xxxxxs xxxxx
xxxxxn xxxxx
xxxxxt xxxxx
xxxxxem xxxxx
xxxxxer xxxxx
xxxxxnd xxxxx
# the suffixes are also removed when combined:
xxxxxetende xxxxx
# words that are shorter than four charcters are not changed:
xxe xxe
# -em and -er are not removed from words shorter than five characters:
xxem xxem
xxer xxer
# -nd is not removed from words shorter than six characters:
xxxnd xxxnd