| ------ |
| Apache Any23 - PoweredBy |
| ------ |
| The Apache Software Foundation |
| ------ |
| 2011-2012 |
| |
| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| |
| PoweredBy |
| |
| This page collects projects and web initiatives that use <Apache Any23>. |
| |
| [./images/logo-sindice-90x30.png] |
| |
| Sindice was a platform to build applications on top of structured data. |
| Sindice collected Web Data in many ways, following existing web standards, |
| and offered Search and Querying across this data, updated live every few minutes. |
| |
| [./images/fu-logo-90x25.png][./images/kit-logo-90x40.png] |
| |
| {{{http://webdatacommons.org/}WebDataCommons}} |
| |
| Extracting Structured Data from the Common Web Crawl. |
| More and more websites have started to embed structured data describing products, |
| people, organizations, places, and events in their HTML pages. The Web Data Commons |
| project extracts this data from several billion web pages and provides the extracted |
| data for download. Web Data Commons thus enables you to use the data without needing |
| to crawl the Web yourself. |
| |
| [./images/nutch_logo_tm.png] |
| |
| {{{http://nutch.apache.org}Apache Nutch}} |
| |
| Apache Nutch is a mature, production-ready Web crawler. Nutch offers fine-grained |
| configuration and relies on Apache Hadoop data structures, which are |
| great for batch processing. |
| Nutch uses Any23 within its plugin infrastructure to extract structured data markup |
| from web pages. This data can then be indexed into one of the Nutch-supported storage |
| mechanisms. |
| |