Copy updates from main ponymail repo
diff --git a/source/markdown/docs/API.md b/source/markdown/docs/API.md
index a73213e..1d62f75 100644
--- a/source/markdown/docs/API.md
+++ b/source/markdown/docs/API.md
@@ -33,12 +33,15 @@
"tid": "06b318af97ca96c115e878c14d0814a53407751c31388410421c1751@1441467256@<dev.any23.apache.org>",
"list_raw": "<dev.any23.apache.org>"
}
+
+Note: date and epoch are in UTC
+
~~~
### Fetching list data
Usage:
-`GET /api/stats.lua?list=$list&domain=$domain[&d=$timespan][&q=$query][&header_from=$from][&header_to=$to][&header_subject=$subject][&header_body=$body][&quick][&emailsOnly][&s=$s&e=$e]`
+`GET /api/stats.lua?list=$list&domain=$domain[&d=$timespan][&q=$query][&header_from=$from][&header_to=$to][&header_subject=$subject][&header_body=$body][&quick][&emailsOnly][&s=$s&e=$e][&since=$since][&dfrom=$dfrom&dto=$dto]`
See below for details of [timespan](#Timespans) values
@@ -46,7 +49,7 @@
- $list: The list prefix (e.g. `dev`). Wildcards may be used
- $domain: The list domain (e.g. `httpd.apache.org`). Wildcards may be used
- - $timespan: A timespan value (see below)
+ - $timespan: A [timespan](#Timespans) value
- $s: yyyy-mm start of month (day 1)
- $e: yyyy-mm end of month (last day)
- $query: A search query (may contain wildcards or negations):
@@ -57,6 +60,13 @@
- $to: Optional To: address
- $subject: Optional Subject: line
- $body: Optional body text
+ - $since: a time in seconds since the epoch (UTC); defaults to now.
+   If no emails are newer than $since, returns `{"changed":false}`; otherwise the normal search proceeds
+ - $dfrom: how many days ago the search should start
+ - $dto: the total number of days to match, counting from $dfrom
+
+Options:
+
- quick: send statistics only (exclude participants, threadstruct, word-cloud, emails apart from epoch)
- emailsOnly: return email summaries only (omit thread_struct, top 10 participants and word-cloud)
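The `since`, `dfrom`/`dto` and `quick` parameters above can be combined into a polling request. The following is a minimal Python sketch, assuming a Pony Mail instance at a base URL you supply. The helpers `stats_url` and `has_changes` are hypothetical and not part of Pony Mail. The sketch only builds the query string and interprets the `{"changed":false}` reply. Fetching the URL, including sending the session cookie where one is needed, is left to your HTTP client.

```python
from urllib.parse import urlencode


def stats_url(base, lst, domain, since=None, dfrom=None, dto=None, quick=False):
    # Hypothetical helper: builds a stats.lua query URL; 'base' is your Pony Mail host.
    params = {"list": lst, "domain": domain}
    if since is not None:
        params["since"] = str(int(since))  # seconds since the epoch (UTC)
    # dfrom and dto are sent together, as in the usage line; one alone is ignored here.
    if dfrom is not None and dto is not None:
        params["dfrom"] = str(dfrom)  # start this many days ago
        params["dto"] = str(dto)      # total number of days to match
    query = urlencode(params)
    if quick:
        query += "&quick"  # flag option, sent without a value
    return f"{base}/api/stats.lua?{query}"


def has_changes(reply):
    # Hypothetical helper: 'reply' is the decoded JSON object.
    # A since= request with nothing newer answers {"changed": false}.
    return reply.get("changed") is not False
```

A client can pass the epoch of the newest email it has already shown as `since`, then skip re-rendering whenever `has_changes` returns `False`.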
@@ -94,7 +104,35 @@
"name": "dev",
"cloud": {...},
"hits": 25,
- "thread_struct": {...},
+ "thread_struct":
+ {
+   "nest": 2,
+   "children": [
+     {
+       "children": [
+         {
+           "children": [
+             {
+               "children": [ ],
+               "epoch": ...,
+               "tid": ...,
+               "nest": 1
+             }
+           ],
+           "epoch": ...,
+           "tid": ...,
+           "nest": 2
+         }
+       ],
+       "epoch": 1474883100,
+       "tid": "b1d6446f5cc8f4846454cbabc48ddb08afbb601a77169f8e32e34102@<dev.ponymail.apache.org>",
+       "nest": 2
+     }
+   ],
+   "epoch": ...,
+   "tid": ...,
+   "body": ...
+ },
"max": 5000,
"searchlist": "<dev.ponymail.info>",
"list": "dev@ponymail.info",
@@ -167,9 +205,10 @@
### Fetching notifications for a logged in user
Usage:
-`GET /api/notifications.lua`
+`GET /api/notifications.lua[?seen=$mid]`
-Parameters: `None` (cookie required)
+Parameters: (cookie required)
+ - $mid: optional; the ID of the message to mark as seen
Response example:
@@ -178,6 +217,8 @@
{
"notifications": {...}
}
+or, when ?seen=$mid is supplied:
+{"marked": true}
~~~
### Fetching a month's data as an mbox file
@@ -190,3 +231,21 @@
TBA
~~~
+### Get ATOM data for list or email
+
+Usage:
+`GET /api/atom.lua(?list=$lid|?mid=$mid)`
+
+Parameters: (cookie may be required)
+ - $lid: the list id, e.g. dev@ponymail.apache.org
+ - $mid: The email ID (Permalink)
+
+One of the above is required.
+For a list ID, data for the last month is returned.
+For an email ID, the thread containing that email is returned.
+
+Response example:
+
+~~~
+TBA
+~~~
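The following is a minimal Python sketch of building an ATOM request. As the `(?list=$lid|?mid=$mid)` usage line suggests, it assumes that exactly one of the two parameters is sent. The base URL and the `atom_url` helper are hypothetical. Note that `urlencode` percent-encodes the `@` in a list ID.

```python
from urllib.parse import urlencode


def atom_url(base, lid=None, mid=None):
    # Hypothetical helper: 'base' is your Pony Mail host.
    # Exactly one of lid (list ID) or mid (email ID / Permalink) must be given.
    if (lid is None) == (mid is None):
        raise ValueError("supply exactly one of lid or mid")
    params = {"list": lid} if lid is not None else {"mid": mid}
    return f"{base}/api/atom.lua?{urlencode(params)}"
```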
diff --git a/source/markdown/docs/DESIGN-NOTES.md b/source/markdown/docs/DESIGN-NOTES.md
new file mode 100644
index 0000000..1feee34
--- /dev/null
+++ b/source/markdown/docs/DESIGN-NOTES.md
@@ -0,0 +1,76 @@
+# Design Notes
+
+This file is an attempt to summarise some of the design issues.
+
+## Database
+The project uses the ElasticSearch (ES) database to store the mails as individual documents.
+The database stores each mail to each list as a separate document.
+If the same mail was sent to multiple lists, then it exists as multiple documents in the database.
+
+ES requires that each distinct document has a unique id (MID).
+The MID is used to insert the document in the database, and can be used to fetch it.
+
+### Database design
+The mails are stored in two separate ES indexes:
+* "mbox" - this stores information about the document, plus the parsed content, and is used for searching and summary displays.
+* "mbox_source" - this is used to store the raw content of the document.
+The two versions of the document are linked by using the same MID.
+
+### Requirements for the MID
+As mentioned above, each different document must have a unique id (MID).
+A document may arrive as a single mail message, or be loaded from a collection such as an mbox file.
+
+Duplicate database entries can be avoided by ensuring that the same MID is calculated regardless of the input source.
+[If the same message is then processed more than once, it does not matter, as only the last instance will be stored.]
+The MID format does not have to be transparent; it can be an opaque hash.
+
+### Generation of the MID
+The same message may be sent to multiple lists, so the message data alone is not sufficient to identify it uniquely.
+The same message may potentially be sent more than once to the same list,
+so the combination of message and listname is also not sufficient to identify a message.
+
+Many messages will have a Message-Id header which is intended to be unique to the message.
+However, it is not guaranteed to be unique, and some messages do not have one.
+
+Many mailing list servers allocate a sequence number or similar id to each message they send.
+This should be unique for the list, provided the sequence is never reset.
+
+Where the Message-Id and List Server Id both exist, they can be combined to generate a MID.
+[If the List Server Id is known to be unique, then that can potentially be used alone.]
+
+Where either id is missing, alternative means must be used to generate the MID.
+The data used to do so must be present in all supported message sources.
+
+Algorithms for the generator remain TBA.
+
+### Permalink requirements
+The application provides Permalinks which can later be used to refer to any document in the database.
+Once published, such links must continue to work.
+
+Links should be portable; i.e. if the raw messages are loaded into a new archive, existing
+published Permalinks should continue to resolve.
+
+Multiple links may refer to the same document; however, each link should refer to a single document.
+Ideally the Permalink should be relatively short; however that may conflict with the uniqueness requirement.
+
+It may be useful for the Permalink format to be relatively transparent.
+For example, a current ASF mod_mbox link looks like:
+
+`http://mail-archives.apache.org/mod_mbox/ponymail-commits/201605.mbox/<1f73b4e0fc1a4fbbbfe4d155293c2f1a@git.apache.org>`
+
+This includes:
+- the mailing list name (`ponymail-commits`)
+- the month when the mail was sent (`201605.mbox`)
+- the Message-Id (`<1f73b4e0fc1a4fbbbfe4d155293c2f1a@git.apache.org>`)
+
+This information should be sufficient to find the message in just about any mail-archive.
+
+Vendor-specific links, by contrast, may be much shorter, but are only valid for the particular service.
+For example, the equivalent Markmail link is:
+http://markmail.org/message/oanktcpxlxkmyora
+
+There may be use cases for both styles of link.
+
+### Permalink design
+TBA
+
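The Permalink requirements above describe the mod_mbox link format as transparent. To illustrate this, here is a minimal Python sketch that splits such a link back into its list name, month and Message-Id. The function name is hypothetical. It assumes the `/mod_mbox/<list>/<yyyymm>.mbox/<Message-Id>` layout of the example link. It also percent-decodes the path, because clients often encode the angle brackets as `%3C`/`%3E`.

```python
from urllib.parse import unquote, urlsplit


def parse_mod_mbox_link(url):
    # Hypothetical helper: returns (list name, yyyymm month, Message-Id).
    path = unquote(urlsplit(url).path)
    prefix = "/mod_mbox/"
    if not path.startswith(prefix):
        raise ValueError("not a mod_mbox link")
    # maxsplit=2 keeps any '/' inside the Message-Id intact.
    parts = path[len(prefix):].split("/", 2)
    if len(parts) != 3 or not parts[1].endswith(".mbox"):
        raise ValueError("expected <list>/<yyyymm>.mbox/<Message-Id>")
    listname, mbox, msgid = parts
    return listname, mbox[:-len(".mbox")], msgid
```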
diff --git a/source/markdown/docs/INSTALLING.md b/source/markdown/docs/INSTALLING.md
index de197ef..0ecd8d3 100644
--- a/source/markdown/docs/INSTALLING.md
+++ b/source/markdown/docs/INSTALLING.md
@@ -14,8 +14,9 @@
## Pre-requisites ##
You will need the following software installed on your machine:
-- ElasticSearch >= 2.1
+- ElasticSearch >= 2.1 and < 6.0 (setup.py does not support 6.x or later, although the rest of the code may run on 6.x)
- Python 3.x for the archiver plugin (setup.py will handle dependencies) and importer
+- Python `html2text` package (GPLv3) if you wish to archive HTML-only mails (remember to add the `--html2text` command line arg)
- Apache HTTP Server 2.4.x with mod_lua (see http://modlua.org/gs/installing if you need to build mod_lua manually)
- Lua >=5.1 with the following modules: cjson, luasec, luasocket
(Note: Lua 5.3 is not currently supported by httpd mod_lua or luasocket)
@@ -208,4 +209,11 @@
To enable these headers, set `full_headers` to `true` in the `site/api/lib/config.lua` file.
### Lastly, a note about Message-ID (MID) generators
+The default MID generator is called 'medium' and digests the message
+body, timestamp and list-ID to generate the MID. There is also a 'short'
+generator that digests only the body, and a 'full' generator that uses the
+entire message as a bytestring to generate an ID. 'medium' is recommended
+for most setups (especially clustered setups), while 'full' can be used for
+single-machine setups.
+N.B. At present, all the generators have known issues (see #176, #177, #178).
Please see [this paragraph](archiving.html#usingtherightidgenerator) about document ID generators.
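The following Python sketch only illustrates which inputs each generator style digests, as described above. It is not the archiver's actual code, which, per the N.B. above, currently has issues. The choice of SHA-256 and the length-prefixing of fields are assumptions made for this example. All function names are hypothetical.

```python
import hashlib


def _digest(*parts: bytes) -> str:
    # Hypothetical helper: SHA-256 over length-prefixed fields, so field
    # boundaries are unambiguous (an assumption of this sketch).
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def mid_short(body: bytes) -> str:
    # 'short' style: the message body only
    return _digest(body)


def mid_medium(body: bytes, epoch: int, list_id: str) -> str:
    # 'medium' style: body + timestamp + list-ID
    return _digest(body, str(epoch).encode("ascii"), list_id.encode("utf-8"))


def mid_full(raw_message: bytes) -> str:
    # 'full' style: the entire raw message (headers and body) as one bytestring,
    # so any byte-level difference in the headers changes the ID
    return _digest(raw_message)
```

Mixing the timestamp and list-ID into 'medium' means that the same body posted to two lists gets two distinct IDs. This matches the design requirement that each copy is stored as a separate document.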