| Apache Any23 1.1 |
| Release Notes |
| 15/10/2014 (dd/mm/yyyy) |
| |
| Bug |
| |
| [ANY23-205] - Remove xrefs from Any23 site and replace with Git(hub) links |
| [ANY23-220] - Run crawler plugin on Apache Any23 site |
| [ANY23-234] - No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq) |
| |
| Improvement |
| |
| [ANY23-157] - Update Any23 site to accommodate move to Git. |
| [ANY23-197] - Extract embedded json-ld from html documents |
| [ANY23-204] - Fix URL encoding problem: PR#3 |
| [ANY23-209] - Bug in site generation |
| [ANY23-221] - Enable JSON-LD as an input format for the WebService at any23.org |
| [ANY23-238] - Fix generation of BNode name for microdata when 'itemid' is given without a value. |
| |
| New Feature |
| |
| [ANY23-7] - Performance test suite |
| [ANY23-160] - [SECURITY] Frame injection vulnerability in published Javadoc |
| |
| Task |
| |
| [ANY23-222] - Push 1.1-SNAPSHOT artifacts to the Any23 website |
| |
| |
| Apache Any23 1.0 |
| Release Notes |
| 09/05/2014 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-148] - Programmes Ontology |
| |
| Bug |
| |
| [ANY23-100] - Issue with RDFa extractor while processing nested properties |
| [ANY23-135] - Any23 RDFa Extractor ignores multiple prefix and property statements |
| [ANY23-136] - Some RDFa tests have incorrect expected results |
| [ANY23-168] - RDFa properties in <meta> elements not picked up |
| [ANY23-170] - Dependency error org.apache.commons:commons-csv:1.0-SNAPSHOT-rev1148315 |
| [ANY23-172] - Fix minor issues with Any23 0.9.0 RC |
| [ANY23-173] - Please delete old releases from mirroring system |
| [ANY23-174] - Incorrect RDFa extractions |
| [ANY23-203] - Update version revisions from 0.9.1 to 1.0 |
| |
| Improvement |
| |
| [ANY23-65] - Update to RDFa extraction stylesheet |
| [ANY23-128] - html-rdfa11 extractor fails on mailto: anchors |
| [ANY23-130] - Improve aesthetics of the output format when straying from default java.io.PrintStream |
| [ANY23-137] - RDFa parser implementation proposal |
| [ANY23-179] - Improve Javadoc and throwing of IllegalArgumentException in Any23#createDocumentSource |
| [ANY23-180] - Create an Apache hosted jail running an Any23 service instance |
| [ANY23-181] - Upgrade NekoHTML to 1.9.20 |
| |
| New Feature |
| |
| [ANY23-134] - Create o.a.a.extractor.tika Parser and Extractor implementations |
| [ANY23-177] - Add support for JSON-LD |
| |
| Task |
| |
| [ANY23-162] - Add package.java for all LKIFCore classes |
| |
| Apache Any23 0.9.0 |
| Release Notes |
| 28/10/2013 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-142] - LKIF-Core Vocabulary |
| [ANY23-143] - LRICore Vocabulary |
| |
| Bug |
| |
| [ANY23-111] - Any23 raises an unmanaged exception from the Microdata parser |
| [ANY23-115] - Empty spans seem to break ANY23 |
| [ANY23-161] - Fix service file generation |
| [ANY23-165] - "Invalid content" error if TITLE precedes encoding declaration in the document |
| [ANY23-171] - form.html not in correct location in service. |
| |
| Improvement |
| |
| [ANY23-47] - Migrate basic-crawler classes to org.apache.nutch |
| [ANY23-164] - office-scraper ExcelExtractorFactory.java to accept application/x-tika-ooxml and application/x-tika-msoffice formats |
| |
| New Feature |
| |
| [ANY23-120] - Split CLI tools out into a new module |
| |
| Task |
| |
| [ANY23-122] - Cleanup Distribution Mirrors |
| |
| Apache Any23 0.8.0 |
| Release Notes |
| 01/05/2013 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-109] - Missing tika-config.xml in o.a.a.mime |
| [ANY23-110] - DOAP Vocabulary |
| |
| Bug |
| |
| [ANY23-44] - error when parsing a document from http://www.afdsi.org/docs/test/html/RDFa/_food-stream_.htm |
| [ANY23-78] - Download page links are broken |
| [ANY23-108] - Broken schema.org microdata extraction |
| [ANY23-112] - Fix incubation disclaimer |
| [ANY23-113] - Remove dependencies from parent pom.xml file |
| [ANY23-116] - Empty values are skipped when reading tab separated CSV. |
| [ANY23-156] - Add logging dependencies to plugins and service |
| |
| Improvement |
| |
| [ANY23-2] - Add support for hreview-aggregate microformat. |
| [ANY23-26] - Upgrade dependency to Apache Tika 1.2 |
| [ANY23-46] - Update Any23 web service |
| [ANY23-83] - Remove hardcoded formats throughout Any23 to make it useful as a library |
| [ANY23-101] - Use RDFFormat.NQUADS in nquads module |
| [ANY23-139] - Simplify site deploy plugging the maven-scm-publish-plugin |
| [ANY23-144] - Implement comprehensive naming of o.a.a.api.vocab classes |
| |
| New Feature |
| |
| [ANY23-4] - Integrate W3C's RDFa test suite and pass all tests |
| [ANY23-85] - Split NQuads out into its own module |
| [ANY23-96] - Add user agent string to basic-crawler |
| [ANY23-117] - Split Mime type detection out into its own module |
| [ANY23-118] - Split Encoding detection out into its own module |
| |
| Task |
| |
| [ANY23-41] - Write basic-crawler plugin documentation |
| [ANY23-125] - Drop the Incubating DISCLAIMER |
| |
| |
| Apache Any23 0.7.0-incubating |
| Release Notes |
| 25/06/2012 (dd/mm/yyyy) |
| |
| Sub-task |
| |
| [ANY23-25] - Update all Maven POMs in trunk |
| [ANY23-31] - Move any23 site documentation out of trunk and into its own SVN directory |
| [ANY23-53] - Bad Web Service documentation |
| |
| Bug |
| |
| [ANY23-14] - Add support for Extractor sub results |
| [ANY23-20] - The Any23 PluginManager fails handling resource paths containing spaces. |
| [ANY23-34] - Plugin Integration Test Fails |
| [ANY23-37] - LGPL'ed components cannot be included in distribution packages |
| [ANY23-42] - Fix issue in RDFa11Parser.java: relative URIs are not resolved correctly |
| [ANY23-49] - N3/NQ parsers ignoring stopAtFirstError flag |
| [ANY23-58] - HCardExtractor infinite loop and memory exhaustion |
| [ANY23-62] - ExtractionResultImpl loses all issues generated by sub extractions |
| [ANY23-73] - The ToolRunner CLI driver -p (--plugins-dir) option doesn't work because it is parsed after the Tool list is loaded |
| [ANY23-77] - Facing an infinite loop problem in version 0.6.1 - Verify |
| [ANY23-78] - Download page links are broken |
| [ANY23-87] - Bogus argument in o.a.a.cli.CrawlerTest |
| [ANY23-88] - any23 script -v or --version option doesn't display actual version |
| [ANY23-94] - The Microdata CLI tool doesn't work anymore |
| [ANY23-95] - Activate the IgnoreAccidentalRDFa filter for the Any23 Service instance |
| [ANY23-97] - The test suite was not running all tests, minor regressions occurred |
| |
| Improvement |
| |
| [ANY23-18] - Add a new extractor for RDFa using java-rdfa |
| [ANY23-28] - Document munging of Any23 history to CHANGES.txt |
| [ANY23-32] - Replace hardcoded bash script with one generated via appassembler |
| [ANY23-33] - Replace proprietary SUN imports from Any23 classes. |
| [ANY23-45] - Improve issue verification support in Extractor tests |
| [ANY23-50] - Simplify plugin loading avoiding the classpath scanning |
| [ANY23-56] - Change repo-ext to Any23 SVN mirror repo. |
| [ANY23-63] - The Any23 web service doesn't return the Issue Report generated by activated Extractors, hiding major metadata issues |
| [ANY23-64] - Improve CLI usage aesthetics |
| [ANY23-70] - Establish searchable list archives |
| [ANY23-71] - Improve the current CLI engine |
| [ANY23-74] - Disable domain triple generation in default configuration |
| [ANY23-75] - Improve runtime of the Microdata extractor on documents with many relations. |
| [ANY23-76] - Improve runtime of the Microformat extractor on documents with many relations. |
| [ANY23-82] - Don't use explicit reference to Log4j classes |
| [ANY23-86] - Better logging in SiteCrawlerTest |
| |
| New Feature |
| |
| [ANY23-9] - Prepare a dedicated homepage for Any23 |
| [ANY23-29] - Migrate code base to ASF infrastructure |
| [ANY23-57] - Create Any23 History documentation and add to site |
| [ANY23-59] - Create KEYS file for Any23 |
| [ANY23-68] - Create Powered By documentation/page |
| [ANY23-102] - Any23 DOAP file |
| |
| Task |
| |
| [ANY23-21] - Migrate all packages and classes to ORG.APACHE.ANY23 |
| [ANY23-27] - Import revisions r1547 to r1607 from Google Code SVN to ASF SVN |
| [ANY23-36] - Merge GCode specific CHANGES.txt report in main changes.xml |
| [ANY23-39] - Write Down Overall Architecture Document to help new developers maintaining the Any23 core |
| [ANY23-48] - Update Documentation (Site + READMEs) to reflect changes in shell script usage |
| [ANY23-52] - Remove non ASF logos from Any23 Service page |
| [ANY23-66] - Fix Javadoc |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.1 |
| Release Notes |
| |
| Fixes |
| |
| * Improved MIMEType detection for CSV input. [172, 176] |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0 |
| Release Notes |
| |
| Fixes |
| |
| * Fixed several bugs. [151, 153, 154, 155, 156, 164, 168] |
| * Removed unused Apache Any23 dependencies. [162] |
| * Introduced parent POM dependencyManagement. [163] |
| * Minor code refactoring. [142] |
| * Updated project documentation. [161] |
| |
| Enhancements |
| |
| * Added support for Microdata [114, 141, 144, 145, 152, 157] |
| * Added RDFa 1.1 support for new prefix specification. [143] |
| * Added CSV Extractor (RDFizer). [150, 165] |
| * Added HTML/META Extractor. [148, 149] |
| * Improved Configuration programmatic management. [147] |
| * Added several flags to control metadata triples generation. [146] |
| * Improved explicit representation of nesting relationships in Microformat extractors. [80] |
| * Major Extractor interface refactoring. [160, 167] |
| * Improved TagSoup Extractor based error reporting. [159] |
| * Added command-line tool to print out the Apache Any23 declared vocabularies. [114] |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0-M2 |
| Release Notes |
| |
| The release 0.6.0-M2 introduces major fixes to the M1 milestone |
| [154, 155, 156] and improves Configuration [147] and Microdata |
| error management [157]. |
| |
| ========================================================================== |
| |
| Apache Any23 0.6.0-M1 |
| Release Notes |
| |
| The release 0.6.0-M1 is an early preview of the |
| Microdata support. [114] |
| |
| ========================================================================== |
| |
| Apache Any23 0.5.0 |
| Release Notes |
| |
| Fixes |
| |
| * Fixed wrong conversion of a generic XML file to RDF. [131] |
| * Fixed usage of 'base' tag when resolving relative URIs |
| in RDFa. [75] |
| * Fixed error parsing Turtle data. [87] |
| * Fixed issue with escaping in NQuads parser. [126] |
| * Fixed XML DTD validation attempt. [95] |
| * Fixed concurrent modification exception in |
| ExtractionContentBlocker filter. [86] |
| * Fixed mime type detection of direct input when source |
| contains blank chars. [83, 90] |
| * Fixed reporting when producing no triples. [79] |
| * Fixed any23-service packaging, added profile for excluding |
| embedded dependencies. [113] |
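
The 'base' tag fix listed above [75] concerns how relative URIs are resolved. As a minimal sketch of that resolution rule (not Any23's actual code), the snippet below uses only java.net.URI from the JDK. The class name BaseUriSketch, the resolve helper and the example.org URIs are hypothetical names chosen for illustration.

```java
import java.net.URI;

public class BaseUriSketch {

    // Hypothetical helper, not part of the Any23 API: resolve a relative
    // reference found in a document, honouring an optional <base href> value.
    static URI resolve(String documentUri, String baseHref, String relative) {
        URI doc = URI.create(documentUri);
        // The base href may itself be relative to the document URI.
        URI base = (baseHref == null) ? doc : doc.resolve(baseHref);
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        String page = "http://example.org/docs/page.html";
        // Without a base tag the document URI is the base.
        System.out.println(resolve(page, null, "img/logo.png"));
        // With a base tag the declared href takes precedence.
        System.out.println(resolve(page, "http://example.org/other/", "img/logo.png"));
    }
}
```

Under these assumptions the first call resolves against the document location and the second against the declared base, which is the behaviour a 'base' tag is meant to trigger.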
| |
| Enhancements |
| |
| * Improved extraction report: added list of |
| activated extractors. [89] |
| * Improved extraction of HTML link element. [133] |
| * Added XPath HTML extractor. [124] |
| * Added HRecipe Microformat extractor. [103] |
| * Added plugin support for Apache Any23. [111] |
| * Implemented HTML Scraper Plugin. [123] |
| * Upgraded to Sesame 2.4.0. [136] |
| * Upgraded to Jetty 8.0.0 [138] |
| * Upgraded maven-site-plugin. [85] |
| * Added flags to exclude metadata triples [134] |
| * Added removal of CSS related triples. [135] |
| * Improved overall documentation. [130] |
| * Overall POM refactoring. [125] |
| |
| ========================================================================== |
| |
| Apache Any23 0.4.0 |
| Release Notes |
| |
| * The any23-service module has been separated from the any23-core module; |
| the Ant build system has been dropped. [Issue 44] |
| * Added support for HTML metadata (RDFa / Microformats) validation |
| and correction (validator). [Issue 77] |
| * Added flag to disable the nesting relationship property |
| enrichment. [Issue 67] |
| * Improved coverage of Microformats tests. [Issue 65] |
| * Improved documentation. [Issue 44] |
| * Various code consolidation. [Issues 68, 69, 70, 71, 72, 73, 74, 77] |
| |
| ========================================================================== |
| |
| Apache Any23 0.3.0 |
| Release Notes |
| |
| * Added detection and enrichment of nested microformats. [Issue #61] |
| * Added detection and support of N-Quads as input and output format. [Issue #7] |
| * General Improvements in RDFa extraction. [Issue #12, Issue #14] |
| * Added support of Turtle embedded in HTML script tag. [Issue #62] |
| * Improvement in encoding support. [Issue #43] |
| * Improvement in Core API. [Issue #27] |
| * Improved support for Species Microformat. [Issue #63] |
| * General Code prettification. |
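
For readers unfamiliar with the N-Quads format mentioned above: a statement is an N-Triples statement followed by an optional fourth term naming the graph (context) it belongs to. The following is a minimal sketch of that line layout, assuming IRI terms only and no escaping. The class NQuadsSketch and its quad method are hypothetical and are not the Any23 or Sesame writer API.

```java
public class NQuadsSketch {

    // Hypothetical helper: format one statement whose terms are all IRIs.
    // Literals, blank nodes and IRI escaping are deliberately left out.
    static String quad(String subject, String predicate, String object, String graph) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(subject).append("> ");
        sb.append('<').append(predicate).append("> ");
        sb.append('<').append(object).append("> ");
        // The fourth term is optional; without it the line is a plain triple.
        if (graph != null) {
            sb.append('<').append(graph).append("> ");
        }
        return sb.append('.').toString();
    }

    public static void main(String[] args) {
        System.out.println(quad(
                "http://example.org/alice",
                "http://xmlns.com/foaf/0.1/knows",
                "http://example.org/bob",
                "http://example.org/page.html"));
    }
}
```

The graph term is what lets a consumer record which document a triple was extracted from.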
| |
| ========================================================================== |
| |
| Apache Any23 0.2.2 |
| Release Notes |
| |
| * Fixed dependency management on Maven. A second-level dependency of Xerces |
| introduced a conflict on the javax.xml.transform API, causing wrong XSLT |
| transformations within the RDFa extractor. |
| |
| ========================================================================== |
| |
| Apache Any23 0.2.1 |
| Release Notes |
| |
| * Major fix to Tika configuration management. This fix corrects the |
| auto-detection of the main Semantic Web related formats. |
| |
| ========================================================================== |
| |
| Apache Any23 0.2 |
| Release Notes |
| |
| ============ |
| Introduction |
| ============ |
| |
| This release features a redesigned API and incorporates enhancements and |
| bug fixes that have accumulated since the 0.1 release. |
| Apart from some new or changed dependencies on the underlying libraries, |
| this version comes with improved unit test coverage and other features |
| such as automatic charset encoding detection and improved documentation. |
| A Maven build system has been introduced. |
| |
| |
| ================================== |
| Summary of major changes since 0.1 |
| ================================== |
| |
| * Redesigned Java API |
| - Input from string, stream, file, or URI |
| - Allow choosing which extractors to use |
| - Report origin of triples (document/extractor) to client processors |
| - Various processors/serializers for extracted triples |
| * Added flexible command-line tool for easy testing |
| * Vastly improved website and documentation |
| * Media type and encoding detection via Apache Tika |
| * Switched RDF library from Jena to Sesame |
| * Added Maven build |
| * Better RDF extraction from Microformats |
| * Extractors now come with an example file to document typical in- and output |
| * Major refactoring |
| * Lots and lots of bugfixes |
| |
| ================= |
| Supported formats |
| ================= |
| |
| * RDF/XML |
| * Notation3 and Turtle |
| * N-Triples |
| * RDFa |
| |
| Various microformats; see http://sindice.com/developers/microformat for details on Sindice Microformats support. |
| |
| ================== |
| Dependency Upgrade |
| ================== |
| |
| CyberNeko HTML parser has been upgraded to 1.9.14. |
| |
| Apache Tika 0.3 has been replaced with 0.6, which adds |
| support for automatic encoding detection. |
| |
| EOF |
| |