blob: c61ef98b301efcc04237727c8d6b986a400ed395 [file] [log] [blame]
Some fairly large changes happened between Lucene 2.x and 3.x, particularly the addition of Generics and Enum types to Java.
Due to some of the major differences between C#'s generics and Java's generics, there are some areas of lucene that differ
greatly in design.
The AttributeFactory in AttributeSource is a good example of this. Java has the Class<E> type, which would
be .NET's Type<T>, if it existed. Since .NET doesn't have a generic type, the compile time checking Java has for attributes
(being constrained to typeof(Attribute)) had to be done in a different way. The factory methods for AddAttribute and GetAttribute
now take no parameters, and instead use generic type arguments (AddAttribute<TypeAttribute>(); instead of
AddAttribute(typeof(TypeAttribute));) this change should be documented.
Another example is in Enum types. Lucene has converted its Enum types from Util.Parameter classes into proper Enums. this
is good improvement, since they are more lightweight and performant than a class. However, Java's enums are closer to classes than
in .NET. The enumerations in Field (ie, Field.Index) have methods that help to determine the properties of that field.
Right now, they are put in a static class as extension methods. That allows us to use methods like IsStored(), WithOffsets(),
WithPositions(), etc on the actual enum type without having to use a static class, but since the extension methods can only be used
on instances of the type, the functions that create the enums, ie ToIndex(), ToTermVector(), are static methods on a static
class.
Also, more unit tests fail intermittantly in Release mode. We notice this mostly with TestIndexWriter.TestExceptionsDuringCommit, but
now we're seeing it on a others as well (I think one in Store and others). It has to do with the file system, we'll get
AccessViolationExceptions, and seem to be caused by the pure speed that we're trying to access the file. I think we're trying to
access the file after it's been written, but before the kernel has finished writing to the file, since its buffered like that.
It passes if you run in release with the debugger attached. I can also get them to pass if I run them in release where they would
normally fail, but with Process Monitor on in the background, monitoring the file requests. - cc
TODO: Confirm HashMap emulates java properly
TODO: Tests need to be written for WeakDictionary
TODO: Comments need to be written for WeakDictionary
TODO: Tests need to be written for IdentityDictionary -> Verify behavior
PriorityQueue in InsertWithOverflow, java returns null, I set it to return default(T). I don't think it's an issue. We should, at least, document
that is may have unexpected results if used with a non-nullable type.
BooleanClause.java - Can't override ToString on Enum or replace with Extension Method. Leave type-safe, override with extension method, or create static class?
ParallelReader - extra data types, using SortedDictionary in place of TreeMap. Confirm compatibility. Looks okay, .NET uses a r/b tree just like Java, and it
seems to perform/behave just about the same.
FieldValueHitQueue.Entry had to be made public for accessibility.
FieldCacheRangeFilter & (NumericRangeFilter/Query) - Expects nullable primitives for the anonymous range filters<T> -> replaced with Nullable<T>
-> Could FieldCacheRangeFilter and NumericRangeFilter/Query be converted to use normal primitives, and define no lower/upper bounds as being
Type.MaxValue instead of null?
FuzzyQuery - uses java.util.PriorityQueue<T>, which .net does not have. Using SortedList<TKey, TVal> in it's place, which works, but a) isn't a perfect replacement
(a SortedList<TKey, TVal> doesn't allow duplicate keys, which is what is sorted, where a PriorityQueue does) and b) it's likely slower than a PriorityQueue<T>
I can't tell if the PriorityQueue that is defined in Lucene.Net.Util would work in its place.
Java LinkedList behavior compared to C#. Used extensively in Attributes, filters and the like
SegmentInfos inherits from java.util.Vector which is threadsafe. Closest equiv is SynchronizedCollection<T>, which is in System.ServiceModel.dll
so, we'd have a dependency on that DLL for the one collection, which I'm not sure is worth it. We could probably synchronize it a different way.
ThreadInterruptedException.java was not ported, because it only exists in the java because the built-in one is a checked exception
-> Anywhere in .NET code that catches a ThreadInterruptedException and re-throws it, should just be removed, as it's redundant.
-> Example places include (FSDirectory, ConcurrentMergeScheduler,
Dispose needs to be implemented properly around the entire library. IMO, that means that Close should be Obsoleted and the code in Close() moved to Dispose().
Constants.cs - LUCENE_MAIN_VERSION, and static constructor differs quite a bit from Java. It may be that way by design, I'm guessing differences in how
java packages work versus .NET. Either way, the tests for versioning passes, so it's probably not an issue?
ParallelMultiSearcher -> Successfully ported, but in Java the threads are named, in .NET, I ported it without named threads
(also without NamedThreadFactory from java's util)
FieldSelectorResult -> uses kludgy workaround due to Enums not being able to be null. It's only used in the MapFieldSelector class, when
deciding to include a field or not.
ConcurrentMergeScheduler/IndexWriter -> Tries to assert the current thread holds a lock. this isn't possible in .NET
SegmentInfos.cs -> 3 places need to return a readonly HashMap.
There are a good amount of methods that have been changed from protected internal to public, seemingly for use with NUnit. I've added Lucene.Net.Test
as a friend assembly that can access internals. We can change these accessibility modifiers back to how they are in java, and still have it be testable.
We can also get rid of the properties and such that are "fields_forNUnit" or like it. It just doesn't look good.
TODO: NamedThreadFactory.java - Is this needed? What is it for, just for debugging?
TODO: DummyConcurrentLock.java - Not Needed?
TODO: LockStressTest.java - Not yet ported.
TODO: MMapDirectory.java - Port Issues
TODO: NIOFSDirectory.java - Port Issues