Document: ezcDocumentWikiMediawikiTokenizer
Class: ezcDocumentWikiMediawikiTokenizer
Tokenizer for Mediawiki wiki documents.
Mediawiki is probably the most popular wiki, and the driving force behind Wikipedia. The markup has a lot of extensions, but the basics are defined at:
http://www.mediawiki.org/wiki/Markup_spec
Parents
ezcDocumentWikiTokenizer
|
--ezcDocumentWikiMediawikiTokenizer
Constants
NEW_LINE
= '(?:\\r\\n|\\r|\\n)'
|
Regular subexpression to match newlines. |
SPECIAL_CHARS
= '/*^,\'_<>\\\\\\[\\]{}()|='
|
Special characters, which have a special meaning in the markup and thus may not have been matched otherwise. |
TEXT_END_CHARS
= '/*^,\'_<>\\\\\\[\\]{}()|=\\r\\n\\t\\x20'
|
Characters ending a pure text section. |
WHITESPACE_CHARS
= '[\\x20\\t]'
|
Common whitespace characters. The vertical tab is excluded, because it causes strange problems with PCRE. |
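The constants above are regular expression fragments, not complete patterns. The following is a minimal sketch of how such fragments might be combined into anchored PCRE patterns. The constant values are copied from this page; the function names matchPureText and matchLineEnd, the `~` delimiter, and the patterns themselves are assumptions made for illustration and are not taken from the class.

```php
<?php
// Constant values as listed above, declared globally for this sketch.
const NEW_LINE         = '(?:\\r\\n|\\r|\\n)';
const TEXT_END_CHARS   = '/*^,\'_<>\\\\\\[\\]{}()|=\\r\\n\\t\\x20';
const WHITESPACE_CHARS = '[\\x20\\t]';

/**
 * Hypothetical helper: return the leading run of pure text, i.e. everything
 * up to the first character listed in TEXT_END_CHARS, or null if there is none.
 */
function matchPureText( $string )
{
    // TEXT_END_CHARS is written to be dropped into a negated character class.
    $pattern = '~\\A[^' . TEXT_END_CHARS . ']+~S';
    return preg_match( $pattern, $string, $m ) ? $m[0] : null;
}

/**
 * Hypothetical helper: return optional trailing whitespace followed by a
 * line break at the start of the string, or null if there is none.
 */
function matchLineEnd( $string )
{
    $pattern = '~\\A' . WHITESPACE_CHARS . '*' . NEW_LINE . '~S';
    return preg_match( $pattern, $string, $m ) ? $m[0] : null;
}
```

The `~` delimiter is used here because TEXT_END_CHARS contains a literal `/`, which would otherwise terminate a pattern delimited by slashes.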
Inherited Member Variables
From
ezcDocumentWikiTokenizer:
Method Summary
Inherited Methods
From
ezcDocumentWikiTokenizer :
Methods
filterTokens
array filterTokens(
$tokens )
Filter tokens
Method to filter tokens after the input string has been tokenized. The filter should extract additional information from the tokens which is not generally available yet, like the depth of a title, which depends on the title markup.
Parameters
| Name | Type | Description |
| $tokens | array | |
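As an illustration of the kind of post-processing described for filterTokens, the sketch below derives a title depth from the number of `=` characters in the title markup. It assumes, purely for the example, that tokens are plain arrays with 'type' and 'content' keys; the function name filterTitleDepth and the 'level' key are hypothetical and do not reflect the token classes the component actually uses.

```php
<?php
/**
 * Hypothetical filter: annotate title tokens with their depth.
 *
 * Tokens are assumed to be arrays with 'type' and 'content' keys.
 */
function filterTitleDepth( array $tokens )
{
    foreach ( $tokens as $index => $token )
    {
        if ( $token['type'] === 'title' )
        {
            // "==" yields depth 2, "===" yields depth 3, and so on.
            $tokens[$index]['level'] = strlen( trim( $token['content'] ) );
        }
    }

    return $tokens;
}

// Usage: the title token gains a 'level', the text token is left unchanged.
$filtered = filterTitleDepth( array(
    array( 'type' => 'title', 'content' => '== ' ),
    array( 'type' => 'text',  'content' => 'Heading' ),
) );
```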
Redefinition of
__construct
void __construct(
)
Construct tokenizer
Create token array with regular expressions matching the respective tokens.
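The idea of a token array driven by regular expressions can be sketched as follows. This is a simplified illustration under stated assumptions, not the class's actual token table: the four rules, the array layout with 'type' and 'match' keys, and the function tokenizeSketch are all hypothetical.

```php
<?php
// Hypothetical token table: each rule pairs a token type with a regular
// expression anchored at the start of the remaining input.
$tokenTable = array(
    array( 'type' => 'newline',    'match' => '~\\A(?:\\r\\n|\\r|\\n)~S' ),
    array( 'type' => 'whitespace', 'match' => '~\\A[\\x20\\t]+~S' ),
    array( 'type' => 'title',      'match' => '~\\A={1,6}~S' ),
    array( 'type' => 'text',       'match' => '~\\A[^=\\r\\n\\t\\x20]+~S' ),
);

/**
 * Hypothetical tokenizer loop: try each rule in order, consume the matched
 * prefix, and repeat until the input is empty.
 */
function tokenizeSketch( $string, array $tokenTable )
{
    $tokens = array();

    while ( $string !== '' )
    {
        foreach ( $tokenTable as $rule )
        {
            if ( preg_match( $rule['match'], $string, $m ) )
            {
                $tokens[] = array( 'type' => $rule['type'], 'content' => $m[0] );
                // The cast keeps the loop condition valid on old PHP versions,
                // where substr() may return false at the end of the string.
                $string = (string) substr( $string, strlen( $m[0] ) );
                continue 2;
            }
        }

        throw new RuntimeException( 'No token rule matches the remaining input.' );
    }

    return $tokens;
}
```

Because the rules are tried in order, more specific patterns belong before the catch-all text rule.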
Redefinition of
Last updated: Mon, 09 Feb 2009