commit | 7943b7ad1c51ed894ec4e89f97a29c4d185c955b | |
---|---|---|
author | Shubham Chaudhary <36742242+shubhamvishu@users.noreply.github.com> | Mon Oct 30 19:22:28 2023 +0530 |
committer | GitHub <noreply@github.com> | Mon Oct 30 09:52:28 2023 -0400 |
tree | 13c10067f2b8b740dd3375420b3e25c587674aa2 | |
parent | 2a8d187a99e98897e1263ee2eca7912f51198cfb | |
Return the same input vector if it's a unit vector in VectorUtil#l2normalize (#12726)

### Description

While going through the [VectorUtil](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/VectorUtil.java) class, I observed that we don't have a check for a unit vector in `VectorUtil#l2normalize`, so passing a unit vector goes through the whole L2 normalization. That work is not required, and the method should exit early. I confirmed this by trying out a simple example, `VectorUtil.l2normalize(VectorUtil.l2normalize(nonUnitVector))`, which performed every calculation twice.

We could argue that the user should not call this method on a unit vector, but I believe there are cases where the user simply wants to perform the L2 normalization without checking the vector first, or where there are some overflowing values.

TL;DR: We should exit early in `VectorUtil#l2normalize`, returning the same input vector if it's a unit vector.

The redundant work is easily avoided with a light check on whether the squared sum of the input vector (the value this PR refers to as `l1norm`) is equal to 1.0. In practice the check is `Math.abs(l1norm - 1.0d) <= 1e-5` (as in this PR), because the dot product of a unit vector with itself (`v x v`) is not exactly 1.0 but a value such as `0.9999999403953552`. With a delta of `1e-5`, we assume that a vector `v` with `v x v` >= `0.99999` (and, by the same check, <= `1.00001`) is a unit vector, that is, already L2 normalized. This seems fine because the delta is really small, and the check itself is not a heavy one.
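The early exit described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual Lucene source: the class name `L2NormalizeSketch`, the helper `squaredSum`, the `EPSILON` constant name, and the zero-vector handling are assumptions made for the example. Only the `Math.abs(... - 1.0d) <= 1e-5` check is taken from the commit message.

```java
// Hypothetical standalone sketch; names here are illustrative, not Lucene's API.
final class L2NormalizeSketch {
  // Tolerance from the commit message: treat |v.v - 1| <= 1e-5 as "already unit length".
  private static final double EPSILON = 1e-5;

  /** Plain scalar squared sum (v . v), accumulated in double precision. */
  static double squaredSum(float[] v) {
    double sum = 0.0;
    for (float x : v) {
      sum += (double) x * x;
    }
    return sum;
  }

  /** Normalizes v in place to unit length and returns it. */
  static float[] l2normalize(float[] v) {
    double squaredSum = squaredSum(v);
    if (squaredSum == 0.0) {
      // Assumption for this sketch: reject zero vectors outright.
      throw new IllegalArgumentException("Cannot normalize a zero-length vector");
    }
    // Early exit: the vector is already (approximately) a unit vector,
    // so skip the square root and the per-element division.
    if (Math.abs(squaredSum - 1.0d) <= EPSILON) {
      return v;
    }
    double length = Math.sqrt(squaredSum);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] / length);
    }
    return v;
  }

  public static void main(String[] args) {
    float[] v = {3.0f, 4.0f};
    l2normalize(v); // full normalization
    l2normalize(v); // second call takes the early-exit path
    System.out.println(v[0] + " " + v[1]);
  }
}
```

Because the sketch normalizes in place, the second call in `main` returns after the squared-sum check without touching the elements, which is the double-normalization case the description mentions.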
Apache Lucene is a high-performance, full-featured text search engine library written in Java.
This README file only contains basic setup instructions. For more comprehensive documentation, visit:
(`gradlew`). We'll assume that you know how to get and set up the JDK. If you don't, we suggest starting at https://jdk.java.net/ and learning more about Java before returning to this README.
See Contributing Guide for details.
Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.
#lucene and #lucene-dev on freenode.net