| Apache Any23 2.4 |
| Release Notes |
| 20/09/2020 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-146] - CEN Metalex Vocabulary |
| [ANY23-149] - Expand SCHEMAORG Vocab |
| [ANY23-150] - Implement all vocab.sindice.net Vocabularies |
| [ANY23-269] - Support auto.schema.org |
| [ANY23-270] - Support bib.schema.org |
| |
| Bug |
| |
| [ANY23-427] - http://semanticweb.org/ down causes tests to fail |
| [ANY23-428] - RDFa parse issue if vocab not defined with trailing slash |
| [ANY23-430] - Microdata and HTML's attribute case |
| [ANY23-441] - TikaEncodingDetector: guessEncoding may throw an ArrayIndexOutOfBoundsException |
| [ANY23-446] - Fix bugs in Jsoup |
| [ANY23-449] - Fix the online microdata test failure |
| [ANY23-453] - Upgrade jsonld-java to 0.13.1 |
| |
| New Feature |
| |
| [ANY23-5] - Add support for archive input. |
| [ANY23-6] - Integrate MetaX support |
| |
| Improvement |
| |
| [ANY23-51] - Full support for rel-tag's |
| [ANY23-178] - Add fully annotated Javadoc to o.a.any23.source.* |
| [ANY23-183] - Address javac warning's in Any23 code base |
| [ANY23-202] - Add analytics on any23.org landing page |
| [ANY23-254] - Demo frontend should provide interactive CLI usage examples |
| [ANY23-281] - Build Policeman's Forbidden API Checker into Maven config |
| [ANY23-426] - Address Javadoc WARNING's |
| [ANY23-439] - Replace commons-lang with commons-lang3 |
| [ANY23-440] - any23 configuration documentation has a wrong property name |
| [ANY23-442] - Move HTML preprocessing logic from BaseRDFExtractor to semargl Extractors |
| [ANY23-443] - Improve efficiency of RDFa Extractor |
| [ANY23-444] - Update all dependencies and plugins |
| [ANY23-450] - Update Maven deps and plugin versions |
| |
| Wish |
| |
| [ANY23-225] - Fix Javadoc WARNING's in Any23 codebase |
| |
| Task |
| |
| [ANY23-72] - Evaluate the introduction of Aether to improve the Any23 plugin management system |
| [ANY23-429] - Website Build Fails due to Javadoc issues |
| [ANY23-431] - Upgrade jsoup to v1.12.1 |
| [ANY23-432] - Upgrade owlapi to v5.1.11 |
| [ANY23-433] - Upgrade rdf4j to v3.0.0 |
| [ANY23-434] - Upgrade tika to v1.22 |
| [ANY23-435] - Upgrade httpclient to v4.5.10 |
| [ANY23-436] - Upgrade commons-csv to v1.7 |
| [ANY23-437] - Upgrade snakeyaml to v1.25 |
| [ANY23-438] - Upgrade slf4j-api to v1.7.28 |
| [ANY23-448] - Move service and plugins out of core |
| |
| Apache Any23 2.3 |
| Release Notes |
| 10/02/2019 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-184] - Update Javadoc in o.a.a.extractor.microdata.* |
| [ANY23-356] - Update dependencies |
| [ANY23-357] - Resolve mockito deprecation warnings |
| [ANY23-358] - Resolve junit.framework deprecation warnings & RDFa11Parser deprecation warnings |
| [ANY23-359] - Resolve org.apache.commons.io.IOUtils deprecation warning |
| [ANY23-360] - Resolve Xerces deprecation warnings |
| [ANY23-361] - Resolve Tika deprecation warning |
| [ANY23-362] - Resolve rdf4j deprecation warnings |
| [ANY23-363] - Update httpclient/httpcore to version 4.5.6/4.4.10 |
| [ANY23-364] - Resolve POI deprecation warnings |
| [ANY23-365] - Resolve additional warnings |
| [ANY23-366] - Resolve additional warnings in build |
| [ANY23-369] - Resolve overlapping classes |
| [ANY23-388] - It should be possible to configure the NTriplesWriter to use unicode points |
| [ANY23-404] - Make MicrodataExtractor compliant with default registry |
| [ANY23-405] - Parse microdata property values correctly |
| [ANY23-407] - Allow microdata itemids to be created from relative URLs |
| [ANY23-408] - Use document IRI as default namespace in microdata strict mode |
| [ANY23-409] - Allow multiple microdata itemtype values |
| [ANY23-410] - Fix microdata itemrefs |
| |
| Bug |
| |
| [ANY23-13] - Verify why the maven-changelog-plugin doesn't work properly |
| [ANY23-16] - Property URI generation for Microdata/schema.org |
| [ANY23-17] - problem detecting media type for turtle content with comment at the top |
| [ANY23-55] - any23 is not following the redirection |
| [ANY23-67] - Microdata extraction using obsolete RDF conversion scheme |
| [ANY23-154] - Not able to extract microdata in few test cases |
| [ANY23-167] - Microdata itemscope properties incorrectly attached |
| [ANY23-169] - Incorrect interpretation of relative and absolute paths with Microdata |
| [ANY23-188] - NPE when ICBMExtractor#getDescription()#getExtractorLabel() called |
| [ANY23-237] - Fix RDFa test 0087: stylesheet reserved word is stripped out |
| [ANY23-245] - Infinite loop on some malformed markup |
| [ANY23-322] - Any23 embedded service is broken |
| [ANY23-329] - master branch broken with pom.xml any23 version |
| [ANY23-331] - Tool service implementations declared in wrong module? |
| [ANY23-334] - SingleDocumentExtraction.createExtractionContext() uses UUID as defaultLanguage |
| [ANY23-336] - Parsing json-ld content takes prohibitively long time |
| [ANY23-337] - BenchmarkTripleHandler does not report accurate extraction interval times |
| [ANY23-338] - Json-ld comment parsing fails in rare cases |
| [ANY23-339] - Microdata extractor can sometime merge two different itemscopes into one |
| [ANY23-340] - Any23 extraction does not pass Nutch plugin test |
| [ANY23-344] - MicrodataExtractor not resolving urls correctly |
| [ANY23-345] - MicrodataExtractorTest has a duplicated test |
| [ANY23-346] - rdf4j versions 2.3.0, 2.3.1 contain a regression: we need to switch back to version 2.2.4 |
| [ANY23-347] - RDFParseException: the prefix "pw" is not bound |
| [ANY23-348] - IllegalArgumentException in MicrodataExtractor |
| [ANY23-349] - MicrodataExtractor errors for links that are telephone numbers |
| [ANY23-350] - RDFParseException: "icon" must be followed by ' = ' character |
| [ANY23-351] - NullPointerException in HCardExtractor |
| [ANY23-353] - RDFParseException: datatype rdf:langString requires a language tag |
| [ANY23-367] - latest.stable.released property is never used and out of date |
| [ANY23-368] - Jenkins builds are failing after running out of disk space |
| [ANY23-372] - LGPL-licensed transitive dependency |
| [ANY23-373] - Web page /install.html: software version variable was not decoded. |
| [ANY23-376] - IllegalArgumentException: invalid property name '' |
| [ANY23-377] - Microdata extractor replaces empty strings with "Null" |
| [ANY23-378] - JsonParseException caused by trailing commas in JSON-LD |
| [ANY23-379] - RDFa SAXParseException: invalid XML character |
| [ANY23-380] - RDFa SAXParseException: attribute was already specified |
| [ANY23-381] - JsonParseException: Illegal unquoted character |
| [ANY23-382] - Distinguish between fatal and recoverable json-ld parsing errors |
| [ANY23-383] - JsonParseException: Unexpected character 0x2028 |
| [ANY23-386] - Item's properties are in the wrong item since the 2.2 |
| [ANY23-387] - Possible OutOfMemoryError with bad deeply nested HTML |
| [ANY23-389] - RDFa extraction breaks when base element uses relative href |
| [ANY23-391] - ICAL vocab uses class "vcalendar" instead of "Vcalendar" |
| [ANY23-392] - Launching maven-jetty-plugin: Problem accessing /apache-any23-service/resources/form.html |
| [ANY23-395] - any23.org 500 Internal Server Error |
| [ANY23-406] - Cannot suppress Tika warnings |
| [ANY23-411] - Use Content-Type to help determine encoding |
| [ANY23-415] - NTriplesExtractor tries all text/plain files, causing numerous fatal issues |
| [ANY23-416] - NTriplesExtractor does not recognize "application/n-triples" mimetype |
| [ANY23-420] - Handle Json+ld extraction failure |
| [ANY23-425] - iCal, jCal, xCal extractors aren't listed in META-INF/services |
| |
| New Feature |
| |
| [ANY23-81] - Interactive web service |
| |
| Improvement |
| |
| [ANY23-38] - Use a single logging tool: slf4j |
| [ANY23-190] - any23.org homepage busted on IE11 |
| [ANY23-212] - Improve naming convention for service output files |
| [ANY23-215] - Forward slashes in URL's should not be escaped in RDF output |
| [ANY23-231] - Make JSON Reporting output pretty print |
| [ANY23-240] - Option to process html tags as spaces in Microdata |
| [ANY23-323] - Update Eclipse RDF4J version to 2.3 |
| [ANY23-332] - Plugin-specific properties shouldn't be declared in default-configuration.properties |
| [ANY23-341] - Remove dependency on defunct commons-httpclient 3.1 |
| [ANY23-343] - Upgrade to jsonld-java v 0.12.0 |
| [ANY23-352] - Update to rdf4j version 2.3.2 |
| [ANY23-354] - Clean up dependencies |
| [ANY23-355] - Deprecate RDFa11Parser since Rio implementations are used instead |
| [ANY23-374] - Invalid nested item takes out everything |
| [ANY23-385] - Improve charset detection for (x)html documents |
| [ANY23-390] - Implement ICal, JCal, XCal extractors |
| [ANY23-393] - Any23 master to build under JDK 10.X |
| [ANY23-394] - JSON-LD Extractions Flag Errors in Google's Structured Data Tooling |
| [ANY23-396] - Overhaul WriterFactory API |
| [ANY23-399] - Upgrade Apache parent POM to version 21 |
| [ANY23-401] - Upgrade to Tika 1.19.1 |
| [ANY23-402] - Deprecate JSONWriter, JSONWriterFactory |
| [ANY23-403] - Upgrade to RDF4J 2.4.0 |
| [ANY23-414] - Support reverse itemprops in microdata |
| [ANY23-418] - Take another look at encoding detection |
| [ANY23-419] - Add J2EE dependencies such that service runs under JDK11 |
| [ANY23-424] - Update dependencies |
| |
| Test |
| |
| [ANY23-422] - Error message when any23 cli tool used |
| |
| Task |
| |
| [ANY23-333] - Augment use of Any23PluginManager in How to Register a Plugin documentation |
| [ANY23-423] - Update POM for the move to gitbox. |
| |
| Apache Any23 2.2 |
| Release Notes |
| 25/01/2018 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-155] - Test failure: testRunOnHTTPResource(org.apache.any23.cli.MicrodataParserTest) |
| [ANY23-267] - Entire extractions fail due to "The element type 'meta' must be terminated by the matching end-tag </meta>" |
| [ANY23-268] - Entire extraction task fails due to "Element type "t.length" must be followed by either attribute specifications, ">" or "/>" |
| |
| Bug |
| |
| [ANY23-12] - character are wrongly encoded in rdfxml output |
| [ANY23-131] - Nested Microdata are not extracted |
| [ANY23-140] - Revise Any23 tests to remove fetching of web content |
| [ANY23-166] - Parsing crashes with attributes that don't use quotes |
| [ANY23-201] - Service Regularly Times Out on DBPedia Queries |
| [ANY23-227] - not extracting opengraph rdfa |
| [ANY23-228] - Invalid URI |
| [ANY23-230] - any23.org redirects to single slash URI |
| [ANY23-256] - MicrodataParserTest failing locally but not on Jenkins |
| [ANY23-260] - Get Any23 listed as an Application capable of using DBPedia |
| [ANY23-266] - Fix Issues with Failing WebService Examples |
| [ANY23-271] - Address "...The entity "raquo" was referenced, but not declared" SAXParseException |
| [ANY23-273] - The content of elements must consist of well-formed character data or markup - no bogus comments |
| [ANY23-303] - JsonLdError: loading remote context failed: http://schema.org/ |
| [ANY23-306] - Absent binaries for version 2.0 |
| [ANY23-312] - Triple sub-pred-null should not be added into outcome. Change traversing method. |
| [ANY23-314] - Service fails to return extraction in case of extraction error |
| [ANY23-316] - Yaml parser does not handle intentional null value |
| [ANY23-317] - Any23 fails when dealing with JavaScript |
| [ANY23-318] - ExtractionException handling in BaseRDFExtractor.java kills entire extraction |
| [ANY23-326] - parsing unclosed meta and input tags fails |
| |
| New Feature |
| |
| [ANY23-8] - Write a separate tool for RDFa/microformat detection tool usable in crawlers |
| [ANY23-233] - Add local extraction cache to Any23 service |
| |
| Improvement |
| |
| [ANY23-106] - Gracefully shut down Any23 service |
| [ANY23-213] - Implement JSON reporting for the Any23 service |
| [ANY23-214] - ë (e-umlaut or diaeresis) not decoded in RDF output |
| [ANY23-249] - Update all W3C and other Standards Compliance within Any23 |
| [ANY23-280] - Refactor ContentExtractor to improve extraction flexibility |
| [ANY23-291] - JSON-LD should be looked up in entire HTML document, not just in <head> |
| [ANY23-298] - Revisit the OGP.java vocabulary and update it |
| [ANY23-309] - "Scraper" misspelled as "Scarper" on Downloads webpage |
| [ANY23-319] - Upgrade jsonld-java dependency to 0.11.1 |
| [ANY23-324] - Replace net.sourceforge.nekohtml with jsoup |
| [ANY23-325] - Any23 incompatible with http://rdfa.info/test-suite/# |
| |
| Test |
| |
| [ANY23-320] - Address @Ignore tests in Any23 |
| |
| Wish |
| |
| [ANY23-210] - Address 1.0 Release Review Discrepancies |
| |
| Task |
| |
| [ANY23-40] - Complete Documentation for Plugin Management system |
| |
| |
| Apache Any23 2.1 |
| Release Notes |
| 14/09/2017 (dd/mm/yyyy) |
| |
| Bug |
| |
| [ANY23-244] - Broken Links on Web-Site |
| [ANY23-282] - Replacement for all Sindice namespaces and URI's |
| [ANY23-304] - Add extractor for OpenIE |
| [ANY23-305] - Missing appender in command line tool |
| [ANY23-308] - Adding option "-d" to yaml file parsing gives error |
| [ANY23-310] - Rover displays wrong statistical values |
| |
| Improvement |
| |
| [ANY23-206] - Overhaul Any23 site documentation |
| [ANY23-301] - Forward all logs into STDERR stream |
| |
| New Feature |
| |
| [ANY23-257] - Support OWL as an input format |
| |
| Task |
| |
| [ANY23-283] - access to analysis.apache.org |
| |
| Apache Any23 2.0 |
| Release Notes |
| 03/02/2017 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-243] - Overhaul and update README.txt |
| |
| Bug |
| |
| [ANY23-79] - No execute permissions in command line tool |
| [ANY23-92] - NQuadsParser does not require whitespace between elements |
| [ANY23-99] - NQuadsWriter should force ASCII in OutputStream constructor |
| [ANY23-153] - Automatically Generate EARL reports for Any23 RDF Parsers |
| [ANY23-176] - DOC: Apache Any23 Installation Guide |
| [ANY23-200] - Build revision is not correctly defined |
| [ANY23-219] - rover does not work with -f nquads option |
| [ANY23-235] - NQuads links broken on Supported Formats Page |
| [ANY23-236] - Port Any23 site to Apache CMS |
| [ANY23-248] - NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14 |
| [ANY23-252] - JSON-LD format MIME type is not detected |
| [ANY23-253] - JSON-LD cannot be processed by Rover |
| [ANY23-255] - apache-any23-quads dependency should not be <scope> test in core pom.xml |
| [ANY23-265] - ThreadSafety issue in ItemPropValue |
| [ANY23-272] - Service fails to start with any23server.bat |
| [ANY23-277] - Any23 master branch will not build due to lacking maven-assembly-plugin |
| [ANY23-279] - Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() implementation |
| [ANY23-296] - Tar complains about groupid value being too big |
| [ANY23-302] - rover JSON output is not valid |
| |
| Improvement |
| |
| [ANY23-80] - Split out command line tools into a separate module |
| [ANY23-163] - VocabPrinter tool broken with No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq) |
| [ANY23-185] - Add missing <meta> element attributes to HTMLMetaExtractor |
| [ANY23-207] - Implement Microformats2 |
| [ANY23-246] - Add Open Graph Protocol and Facebook prefixes to popular.prefixes |
| [ANY23-247] - FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character. |
| [ANY23-250] - Upgrade to Tika 1.7 |
| [ANY23-261] - Tiny typo in Data Extraction documentation source example |
| [ANY23-263] - Upgrade to Tika 1.14 |
| [ANY23-274] - Change any23.microdata.ns.default configuration value to http://schema.org |
| [ANY23-276] - Upgrade sesame dependencies to RDF4J |
| [ANY23-278] - Upgrade all Maven plugin versions in parent pom.xml |
| [ANY23-293] - Package log4j configuration with core appassembler |
| [ANY23-297] - Any23 doesn't build under JDK1.8 |
| [ANY23-299] - Missing YAML to RDF parser |
| [ANY23-300] - Ignore NetBeans configuration files |
| |
| Task |
| |
| [ANY23-141] - Upgrade OpenRDF Sesame to 2.7.0 |
| [ANY23-242] - Address issues with 1.1 #1 RC |
| |
| Wish |
| |
| [ANY23-19] - Abstract away any specific RDF APIs |
| [ANY23-226] - Extract JSON-LD embedded in HTML |
| |
| Apache Any23 1.1 |
| Release Notes |
| 15/10/2014 (dd/mm/yyyy) |
| |
| Bug |
| |
| [ANY23-205] - Remove xrefs from Any23 site and replace with Git(hub) links |
| [ANY23-220] - Run crawler plugin on Apache Any23 site |
| [ANY23-234] - No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq) |
| |
| Improvement |
| |
| [ANY23-157] - Update Any23 site to accommodate move to Git. |
| [ANY23-197] - Extract embedded json-ld from html documents |
| [ANY23-204] - fix url encoding problem : PR#3 |
| [ANY23-209] - Bug in site generation |
| [ANY23-221] - Enable JSON-LD as an input format for the WebService at any23.org |
| [ANY23-238] - Fix generation of BNode name for microdata when 'itemid' is given without a value. |
| |
| New Feature |
| |
| [ANY23-7] - Performance test suite |
| [ANY23-160] - [SECURITY] Frame injection vulnerability in published Javadoc |
| |
| Task |
| |
| [ANY23-222] - Push 1.1-SNAPSHOT artifacts to the Any23 website |
| |
| |
| Apache Any23 1.0 |
| Release Notes |
| 09/05/2014 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-148] - Programmes Ontology |
| |
| Bug |
| |
| [ANY23-100] - Issue with RDFa extractor while processing nested properties |
| [ANY23-135] - Any23 RDFa Extractor ignores multiple prefix and property statements |
| [ANY23-136] - Some RDFa tests have incorrect expected results |
| [ANY23-168] - RDFa properties in <meta> elements not picked up |
| [ANY23-170] - Dependency error org.apache.commons:commons-csv:1.0-SNAPSHOT-rev1148315 |
| [ANY23-172] - Fix minor issues with Any23 0.9.0 RC |
| [ANY23-173] - Please delete old releases from mirroring system |
| [ANY23-174] - Incorrect RDFa extractions |
| [ANY23-203] - Update version revisions from 0.9.1 to 1.0 |
| |
| Improvement |
| |
| [ANY23-65] - Update to RDFa extraction stylesheet |
| [ANY23-128] - html-rdfa11 extractor fails on mailto: anchors |
| [ANY23-130] - Improve aesthetics of the output format when straying from default java.io.PrintStream |
| [ANY23-137] - RDFa parser implementation proposal |
| [ANY23-179] - Improve Javadoc and throwing of IllegalArgumentException in Any23#createDocumentSource |
| [ANY23-180] - Create an Apache hosted jail running an Any23 service instance |
| [ANY23-181] - Upgrade NekoHTML to 1.9.20 |
| |
| New Feature |
| |
| [ANY23-134] - Create o.a.a.extractor.tika Parser and Extractor implementations |
| [ANY23-177] - Add support for JSON-LD |
| |
| Task |
| |
| [ANY23-162] - Add package.java for all LKIFCore classes |
| |
| Apache Any23 0.9.0 |
| Release Notes |
| 28/10/2013 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-142] - LKIF-Core Vocabulary |
| [ANY23-143] - LRICore Vocabulary |
| |
| Bug |
| |
| [ANY23-111] - Any23 raises an unmanaged exception from the Microdata parser |
| [ANY23-115] - Empty spans seem to break ANY23 |
| [ANY23-161] - Fix service file generation |
| [ANY23-165] - "Invalid content" error if TITLE precedes encoding declaration in the document |
| [ANY23-171] - form.html not in correct location in service. |
| |
| Improvement |
| |
| [ANY23-47] - Migrate basic-crawler classes to org.apache.nutch |
| [ANY23-164] - office-scraper ExcelExtractorFactory.java to accept application/x-tika-ooxml and application/x-tika-msoffice formats |
| |
| New Feature |
| |
| [ANY23-120] - Split CLI tools out into a new module |
| |
| Task |
| |
| [ANY23-122] - Cleanup Distribution Mirrors |
| |
| Apache Any23 0.8.0 |
| Release Notes |
| 01/05/2013 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-109] - Missing tika-config.xml in o.a.a.mime |
| [ANY23-110] - DOAP Vocabulary |
| |
| Bug |
| |
| [ANY23-44] - error when parsing a document from http://www.afdsi.org/docs/test/html/RDFa/_food-stream_.htm |
| [ANY23-78] - Download page links are broken |
| [ANY23-108] - Broken schema.org microdata extraction |
| [ANY23-112] - Fix incubation disclaimer |
| [ANY23-113] - Remove dependencies from parent pom.xml file |
| [ANY23-116] - Empty values are skipped when reading tab separated CSV. |
| [ANY23-156] - Add logging dependencies to plugins and service |
| |
| Improvement |
| |
| [ANY23-2] - Add support for hreview-aggregate microformat. |
| [ANY23-26] - Upgrade dependency to Apache Tika 1.2 |
| [ANY23-46] - Update Any23 web service |
| [ANY23-83] - Remove hardcoded formats throughout Any23 to make it useful as a library |
| [ANY23-101] - Use RDFFormat.NQUADS in nquads module |
| [ANY23-139] - Simplify site deploy plugging the maven-scm-publish-plugin |
| [ANY23-144] - Implement comprehensive naming of o.a.a.api.vocab classes |
| |
| New Feature |
| |
| [ANY23-4] - Integrate W3C's RDFa test suite and pass all tests |
| [ANY23-85] - Split NQuads out into its own module |
| [ANY23-96] - Add user agent string to basic-crawler |
| [ANY23-117] - Split Mime type detection out into its own module |
| [ANY23-118] - Split Encoding detection out into its own module |
| |
| Task |
| |
| [ANY23-41] - Write basic-crawler plugin documentation |
| [ANY23-125] - Drop the Incubating DISCLAIMER |
| |
| |
| Apache Any23 0.7.0-incubating |
| Release Notes |
| 25/06/2012 |
| |
| Sub-task |
| |
| [ANY23-25] - Update all Maven POM's in trunk |
| [ANY23-31] - Move any23 site documentation out of trunk and into its own SVN directory |
| [ANY23-53] - Bad Web Service documentation |
| |
| Bug |
| |
| [ANY23-14] - Add support for Extractor sub results |
| [ANY23-20] - The Any23 PluginManager fails handling resource paths containing spaces. |
| [ANY23-34] - Plugin Integration Test Fails |
| [ANY23-37] - LGPL'ed components cannot be included in distribution packages |
| [ANY23-42] - Fix issue where RDFa11Parser.java is not resolving relative URIs correctly |
| [ANY23-49] - N3/NQ parsers ignoring stopAtFirstError flag |
| [ANY23-58] - HCardExtractor infinite loop and memory exhaustion |
| [ANY23-62] - ExtractionResultImpl loses all issues generated by sub extractions |
| [ANY23-73] - The ToolRunner CLI driver -p (--plugins-dir) option doesn't work because parsed after the Tool list loading |
| [ANY23-77] - Facing an infinite loop problem in version 0.6.1 - Verify |
| [ANY23-78] - Download page links are broken |
| [ANY23-87] - Bogus argument in o.a.a.cli.CrawlerTest |
| [ANY23-88] - any23 script -v or --version option doesn't display actual version |
| [ANY23-94] - The Microdata CLI tool doesn't work anymore |
| [ANY23-95] - Activate the IgnoreAccidentalRDFa filter for the Any23 Service instance |
| [ANY23-97] - The test suite was not running all tests, minor regressions occurred |
| |
| Improvement |
| |
| [ANY23-18] - Add a new extractor for RDFa using java-rdfa |
| [ANY23-28] - Document munging of Any23 history to CHANGES.txt |
| [ANY23-32] - Replace hardcoded bash script with one generated via appassembler |
| [ANY23-33] - Replace proprietary SUN imports from Any23 classes. |
| [ANY23-45] - Improve issue verification support in Extractor tests |
| [ANY23-50] - Simplify plugin loading avoiding the classpath scanning |
| [ANY23-56] - Change repo-ext to Any23 SVN mirror repo. |
| [ANY23-63] - The Any23 web service doesn't return the Issue Report generated by activated Extractors, hiding major metadata issues |
| [ANY23-64] - Improve CLI usage aesthetics |
| [ANY23-70] - Establish searchable list archives |
| [ANY23-71] - improve the current CLI engine |
| [ANY23-74] - Disable domain triple generation in default configuration |
| [ANY23-75] - Improve runtime of the Microdata extractor on documents with many relations. |
| [ANY23-76] - Improve runtime of the Microformat extractor on documents with many relations. |
| [ANY23-82] - Don't use explicit reference to Log4j classes |
| [ANY23-86] - Better logging in SiteCrawlerTest |
| |
| New Feature |
| |
| [ANY23-9] - Prepare a dedicated homepage for Any23 |
| [ANY23-29] - Migrate code base to ASF infrastructure |
| [ANY23-57] - Create Any23 History documentation and add to site |
| [ANY23-59] - Create KEYS file for Any23 |
| [ANY23-68] - Create Powered By documentation/page |
| [ANY23-102] - Any23 DOAP file |
| |
| Task |
| |
| [ANY23-21] - Migrate all packages and classes to ORG.APACHE.ANY23 |
| [ANY23-27] - Import revisions r1547 to r1607 from Google Code SVN to ASF SVN |
| [ANY23-36] - Merge GCode specific CHANGES.txt report in main changes.xml |
| [ANY23-39] - Write Down Overall Architecture Document to help new developers maintaining the Any23 core |
| [ANY23-48] - Update Documentation (Site + READMEs) to reflect changes in shell script usage |
| [ANY23-52] - Remove non ASF logos from Any23 Service page |
| [ANY23-66] - Fix Javadoc |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.1 |
| Release Notes |
| |
| Fixes |
| |
| * Improved MIMEType detection for CSV input. [172, 176] |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0 |
| Release Notes |
| |
| Fixes |
| |
| * Fixed several bugs. [151, 153, 154, 155, 156, 164, 168] |
| * Removed unused Apache Any23 dependencies. [162] |
| * Introduced parent POM dependencyManagement. [163] |
| * Minor code refactoring. [142] |
| * Updated project documentation. [161] |
| |
| Enhancements |
| |
| * Added support for Microdata [114, 141, 144, 145, 152, 157] |
| * Added RDFa 1.1 support for new prefix specification. [143] |
| * Added CSV Extractor (RDFizer). [150, 165] |
| * Added HTML/META Extractor. [148, 149] |
| * Improved Configuration programmatic management. [147] |
| * Added several flags to control metadata triples generation. [146] |
| * Improved nesting relationship explicitation in Microformat extractors. [80] |
| * Major Extractor interface refactoring. [160, 167] |
| * Improved TagSoup Extractor based error reporting. [159] |
| * Added command-line tool to print out the Apache Any23 declared vocabularies. [114] |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0-M2 |
| Release Notes |
| |
| The release 0.6.0-M2 introduces major fixes to the M1 milestone |
| [154, 155, 156] and improves Configuration [147] and Microdata |
| error management [157]. |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0-M1 |
| Release Notes |
| |
| The release 0.6.0-M1 is an early preview of the |
| Microdata support. [114] |
| |
| ========================================================================== |
| |
| Apache Any23 0.5.0 |
| Release Notes |
| |
| Fixes |
| |
| * Fixed wrong conversion of a generic XML file to RDF. [131] |
| * Fixed usage of 'base' tag when resolving relative URIs |
| in RDFa. [75] |
| * Fixed error parsing Turtle data. [87] |
| * Fixed issue with escaping in NQuads parser. [126] |
| * Fixed XML DTD validation attempt. [95] |
| * Fixed concurrent modification exception in |
| ExtractionContentBlocker filter. [86] |
| * Fixed mime type detection of direct input when source |
| contains blank chars. [83, 90] |
| * Fixed reporting when producing no triples. [79] |
| * Fixed any23-service packaging, added profile for excluding |
| embedded dependencies. [113] |
| |
| Enhancements |
| |
| * Improved extraction report: added list of |
| activated extractors. [89] |
| * Improved extraction of HTML link element. [133] |
| * Added XPath HTML extractor. [124] |
| * Added HRecipe Microformat extractor. [103] |
| * Added plugin support for Apache Any23. [111] |
| * Implemented HTML Scraper Plugin. [123] |
| * Upgraded to Sesame 2.4.0. [136] |
| * Upgraded to Jetty 8.0.0 [138] |
| * Upgraded maven-site-plugin. [85] |
| * Added flags to exclude metadata triples [134] |
| * Added removal of CSS related triples. [135] |
| * Improved overall documentation. [130] |
| * Overall POM refactoring. [125] |
| |
| ========================================================================== |
| |
| Apache Any23 0.4.0 |
| Release Notes |
| |
| * The any23-service module has been separated from the any23-core module, |
| the Ant build system has been dropped. [Issue 44] |
| * Added support for HTML metadata (RDFa / Microformats) validation |
| and correction (validator). [Issue 77] |
| * Added flag to disable the nesting relationship property |
| enrichment. [Issue 67] |
| * Improved coverage of Microformats tests. [Issue 65] |
| * Improved documentation. [Issue 44] |
| * Various code consolidation. [Issues 68, 69, 70, 71, 72, 73, 74, 77] |
| |
| ========================================================================== |
| |
| Apache Any23 0.3.0 |
| Release Notes |
| |
| * Added detection and enrichment of nested microformats. [Issue #61] |
| * Added detection and support of N-Quads as input and output format. [Issue #7] |
| * General Improvements in RDFa extraction. [Issue #12, Issue #14] |
| * Added support of Turtle embedded in HTML script tag. [Issue #62] |
| * Improvement in encoding support. [Issue #43] |
| * Improvement in Core API. [Issue #27] |
| * Improved support for Species Microformat. [Issue #63] |
| * General Code prettification. |
| |
| ========================================================================== |
| |
| Apache Any23 0.2.2 |
| Release Notes |
| |
| * Fixed dependency management on Maven. A second level dependency of Xerces |
| introduced a conflict on the java.xml.transform API causing wrong XSLT |
| transformations within RDFa extractor. |
| |
| ========================================================================== |
| |
| Apache Any23 0.2.1 |
| Release Notes |
| |
| * Major fix to Tika configuration management. This fix solves the |
| auto-detection of the main Semantic Web related formats. |
| |
| ========================================================================== |
| |
| Apache Any23 0.2 |
| Release Notes |
| |
| ============ |
| Introduction |
| ============ |
| |
| This release features a redesigned API and incorporates enhancements and |
| bug fixes that have accumulated since the 0.1 release. |
| Apart from some new or changed dependencies on the underlying libraries, |
| this version comes with improved unit test coverage and other features |
| such as automatic charset encoding detection and improved documentation. |
| A Maven build system has been introduced. |
| |
| |
| ================================== |
| Summary of major changes since 0.1 |
| ================================== |
| |
| * Redesigned Java API |
| - Input from string, stream, file, or URI |
| - Allow choosing which extractors to use |
| - Report origin of triples (document/extractor) to client processors |
| - Various processors/serializers for extracted triples |
| * Added flexible command-line tool for easy testing |
| * Vastly improved website and documentation |
| * Media type and encoding detection via Apache Tika |
| * Switched RDF library from Jena to Sesame |
| * Added Maven build |
| * Better RDF extraction from Microformats |
| * Extractors now come with an example file to document typical in- and output |
| * Major refactoring |
| * Lots and lots of bugfixes |
| |
| ================= |
| Supported formats |
| ================= |
| |
| * RDF/XML |
| * Notation3 and Turtle |
| * N-Triples |
| * RDFa |
| |
| Various microformats; see http://sindice.com/developers/microformat for Sindice Microformats support. |
| |
| =================== |
| Dependency Upgrade |
| =================== |
| |
| CyberNeko HTML parser has been upgraded to 1.9.14. |
| |
| Apache Tika 0.3 has been replaced with 0.6, which adds |
| support for automatic encoding detection. |
| |
| EOF |
| |