rss-filter
==========
- http://spenibus.net
- https://github.com/spenibus/rss-filter-php
- https://gitlab.com/spenibus/rss-filter-php
- https://bitbucket.org/spenibus/rss-filter-php
This script aggregates and filters RSS feeds, then outputs only the desired
items as a single feed.
- Configuration files use the xml extension.
- Configuration files must be placed in `./config/`
- example: `./config/customFeed.xml`
- You can create multiple configuration files
- A configuration file is loaded by passing its filename, without the
extension, as the `config` query parameter
- example: `rss-filter/?config=customFeed`
- A configuration file sample is available at `./example.xml`
- Configuration keywords are case-sensitive; details below.
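
The mapping from the `config` parameter to a file under `./config/` can be sketched as follows. This is only an illustration in Python, not the script's own PHP code: the function name `config_path` is hypothetical, and the character whitelist is an assumption added here to keep the name from escaping the `config` directory.

```python
import re
from pathlib import Path


def config_path(name, base_dir="."):
    """Return the path of the configuration file selected by ?config=<name>.

    Assumption: names are restricted to letters, digits, "_" and "-".
    The README does not state which characters the script accepts.
    """
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError("invalid configuration name: %r" % name)
    # The ".xml" extension is appended here; it is never part of the parameter.
    return Path(base_dir) / "config" / (name + ".xml")


# rss-filter/?config=customFeed  ->  ./config/customFeed.xml
print(config_path("customFeed").as_posix())
```
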
Configuration structure
-----------------------
````
````
Configuration keywords
----------------------
- `config`
- root element
- appears only once in entire document
- `title`
- sets the title of the output feed
- appears only once within `config`
- `ruleSet`
- a configuration block
- can occur multiple times within `config`
- `source`
- a URL pointing to an RSS feed
- can occur multiple times within `ruleSet`
- `timeout`
- how long to wait for a source
- this is any number above zero
- `userAgent`
- adds a user agent to the http request for the source
- `titleDuplicateRemove`
- value: true|false, default: false
- appears only once within `ruleSet`
- when this is set to `true`, if multiple items from sources share the same
title, only the most recent is kept
- `linkDuplicateRemove`
- value: true|false, default: false
- appears only once within `ruleSet`
- when this is set to `true`, if multiple items from sources share the same
link, only the most recent is kept
- `rules`
- a block of rules
- can occur multiple times within `ruleSet`
- `titleMatch`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `titleMatch` matches the title of an item from one of
the sources, the item is kept
- this works like the logical operator `OR`
- `titleMatchNot`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `titleMatchNot` matches the title of an item from one of
the sources, the item is discarded
- this works like the logical operator `AND NOT`
- `titleMatchMust`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `titleMatchMust` doesn't match the title of an item from
one of the sources, the item is discarded
- this works like the logical operator `AND`
- `descriptionMatch`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `descriptionMatch` matches the description of an item from one of
the sources, the item is kept
- this works like the logical operator `OR`
- `descriptionMatchNot`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `descriptionMatchNot` matches the description of an item from one of
the sources, the item is discarded
- this works like the logical operator `AND NOT`
- `descriptionMatchMust`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `descriptionMatchMust` doesn't match the description of an item from
one of the sources, the item is discarded
- this works like the logical operator `AND`
- `categoryMatch`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `categoryMatch` matches any category of an item from one of
the sources, the item is kept
- this works like the logical operator `OR`
- `categoryMatchNot`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `categoryMatchNot` matches any category of an item from one of
the sources, the item is discarded
- this works like the logical operator `AND NOT`
- `categoryMatchMust`
- a regular expression usable by PCRE (preg_*), ex: `/(foo|bar)/siu`
- when at least one `categoryMatchMust` doesn't match any category of an item from
one of the sources, the item is discarded
- this works like the logical operator `AND`
- `before`
- a string representing time that can be parsed by `strtotime()`
- personally recommended format: `2014-12-31 23:59:59 +1200`
- when an item pubDate is more recent than this, the item is discarded
- `after`
- a string representing time that can be parsed by `strtotime()`
- personally recommended format: `2014-12-31 23:59:59 +1200`
- when an item pubDate is older than this, the item is discarded
- `olderThan`
- a string representing a duration using a number followed by an optional quantifier, ex: `7d`
- available quantifiers:
- `s` for seconds
- `m` for minutes
- `h` for hours
- `d` for days
- default quantifier is `s` when omitted
- when an item pubDate is more recent than the current time minus this duration, the item is discarded
- `newerThan`
- a string representing a duration using a number followed by an optional quantifier, ex: `7d`
- available quantifiers:
- `s` for seconds
- `m` for minutes
- `h` for hours
- `d` for days
- default quantifier is `s` when omitted
- when an item pubDate is older than the current time minus this duration, the item is discarded
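
The four time keywords above can be sketched as a single check. This is a minimal Python illustration, not the script's PHP implementation: `parse_duration` and `passes_time_rules` are hypothetical names, and `before` and `after` are assumed to be already parsed into `datetime` objects, since `strtotime()` parsing is not reproduced here.

```python
from datetime import datetime, timedelta, timezone

# Seconds per quantifier, as listed for `olderThan` and `newerThan`.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Turn a string such as "7d" or "90" into a timedelta."""
    text = text.strip()
    unit = "s"  # default quantifier when omitted
    if text and text[-1] in UNIT_SECONDS:
        unit, text = text[-1], text[:-1]
    return timedelta(seconds=float(text) * UNIT_SECONDS[unit])


def passes_time_rules(pub_date, now, before=None, after=None,
                      older_than=None, newer_than=None):
    """Return False as soon as one time keyword discards the item."""
    if before is not None and pub_date > before:
        return False  # more recent than `before`
    if after is not None and pub_date < after:
        return False  # older than `after`
    if older_than is not None and pub_date > now - parse_duration(older_than):
        return False  # not old enough
    if newer_than is not None and pub_date < now - parse_duration(newer_than):
        return False  # too old
    return True


now = datetime(2015, 1, 10, tzinfo=timezone.utc)
two_days_old = now - timedelta(days=2)
print(passes_time_rules(two_days_old, now, newer_than="7d"))  # True
print(passes_time_rules(two_days_old, now, older_than="7d"))  # False
```
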
Notes
-----
- `rules` blocks only apply to items coming from the `source` elements within the
same `ruleSet` block
- keywords within a `rules` block only apply to that block; this is important to
remember when using multiple `rules` blocks, because while one block can exclude
some items, another block can still include them
- multiple identically named keywords can appear in one `rules` block
- multiple `*MatchMust` keywords within one `rules` block are implicitly AND-connected,
i.e. **all matches must be true** in order for a feed item to be added to the output
- multiple `*MatchNot` keywords within one `rules` block are implicitly AND-NOT-connected,
i.e. **all matches must be false** in order for a feed item to be added to the output
- multiple `*Match` keywords within one `rules` block are implicitly OR-connected,
i.e. **at least one match must be true** in order for a feed item to be added to the output
- multiple `ruleSet` elements are implicitly OR-connected,
i.e. **at least one `rules` block must be true** in order for a feed item to be added to the output
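
The connection rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's PHP code: the names `compile_pcre`, `block_accepts` and `item_is_kept` and the dictionary layout are hypothetical, Python's `re` module stands in for PCRE, only `/` delimiters and the `s`, `i`, `m`, `u` modifiers are handled, and a block without any `*Match` keyword is assumed to accept every item it does not reject (the notes do not cover that case).

```python
import re

# PCRE modifiers mapped to Python flags; "u" needs no flag for str patterns.
_FLAGS = {"s": re.DOTALL, "i": re.IGNORECASE, "m": re.MULTILINE, "u": 0}
FIELDS = ("title", "description", "category")


def compile_pcre(pattern):
    """Compile a "/body/modifiers" pattern such as "/(foo|bar)/siu"."""
    if not pattern.startswith("/") or "/" not in pattern[1:]:
        raise ValueError("expected a /body/modifiers pattern: %r" % pattern)
    body, _, modifiers = pattern[1:].rpartition("/")
    flags = 0
    for letter in modifiers:
        flags |= _FLAGS[letter]
    return re.compile(body, flags)


def _hits(item, field, pattern):
    """True when the pattern matches the field (any value for categories)."""
    values = item.get(field, [])
    if isinstance(values, str):
        values = [values]
    regex = compile_pcre(pattern)
    return any(regex.search(value) for value in values)


def block_accepts(item, block):
    """Evaluate one `rules` block against one feed item."""
    any_of, none_of, all_of = [], [], []
    for field in FIELDS:
        any_of += [_hits(item, field, p) for p in block.get(field + "Match", [])]
        none_of += [_hits(item, field, p) for p in block.get(field + "MatchNot", [])]
        all_of += [_hits(item, field, p) for p in block.get(field + "MatchMust", [])]
    if any(none_of):      # AND NOT: a single hit discards the item
        return False
    if not all(all_of):   # AND: every pattern must match
        return False
    # OR: at least one must match. Assumption: no `*Match` means no constraint.
    return any(any_of) if any_of else True


def item_is_kept(item, blocks):
    """Blocks are OR-connected: one accepting block is enough."""
    return any(block_accepts(item, block) for block in blocks)


item = {"title": "red army"}  # hypothetical feed item
# Two blocks: the first accepts the item, so the second cannot remove it.
print(item_is_kept(item, [{"titleMatch": ["/red/"]},
                          {"titleMatchNot": ["/army/"]}]))  # True
# One block holding both keywords: the item is discarded.
print(item_is_kept(item, [{"titleMatch": ["/red/"],
                           "titleMatchNot": ["/army/"]}]))  # False
```

Keeping each block's verdict separate and OR-ing the verdicts at the end is what lets one block include an item that another block has excluded.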
Examples
--------
````
/red//army/
````
The example above will return "red army" because the first `rules` block has
already added the item to the output when the second `rules` block is evaluated.
````
/red//army/
````
The example above will not return "red army".
````
/red//army//.*/
````
The example above will also return "red army" because even though the first
`rules` block has discarded the item, the second one will match it.
It is possible to remove unused keywords, as in the example below:
````
````