eZ Components 2008.2.1

Document: ezcDocumentWikiMediawikiTokenizer

Class: ezcDocumentWikiMediawikiTokenizer

Tokenizer for Mediawiki wiki documents.
Mediawiki is probably the most popular wiki, and the driving force behind Wikipedia. The markup has a lot of extensions, but the basics are defined at:
http://www.mediawiki.org/wiki/Markup_spec

Parents

ezcDocumentWikiTokenizer
   |
   --ezcDocumentWikiMediawikiTokenizer

Constants

NEW_LINE = '(?:\\r\\n|\\r|\\n)' Regular sub expression to match newlines.
SPECIAL_CHARS = '/*^,\'_<>\\\\\\[\\]{}()|=' Special characters that carry markup meaning and therefore may not have been matched by any other rule.
TEXT_END_CHARS = '/*^,\'_<>\\\\\\[\\]{}()|=\\r\\n\\t\\x20' Characters ending a pure text section.
WHITESPACE_CHARS = '[\\x20\\t]' Common whitespace characters. The vertical tab is excluded, because it causes strange problems with PCRE.
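
The constants above are regular sub expressions, meant to be embedded in larger PCRE patterns. The following is a minimal sketch of how such fragments can be combined, not the library's own code: the class MediawikiPatternSketch and its two methods are hypothetical, the constant values are copied from the listing above, and the "~" delimiter is an assumption chosen because it does not occur in any of the fragments.

```php
<?php
// Hypothetical stand-in class; the constant values are copied from the
// listing above, everything else is illustrative only.
final class MediawikiPatternSketch
{
    const NEW_LINE       = '(?:\\r\\n|\\r|\\n)';
    const TEXT_END_CHARS = '/*^,\'_<>\\\\\\[\\]{}()|=\\r\\n\\t\\x20';

    // Split text into lines, accepting \r\n, \r and \n as line endings.
    public static function splitLines(string $text): array
    {
        return preg_split('~' . self::NEW_LINE . '~', $text);
    }

    // Return the leading run of pure text, or '' if the input starts
    // with a markup, newline or whitespace character.
    public static function leadingText(string $text): string
    {
        $pattern = '~\\A[^' . self::TEXT_END_CHARS . ']+~';
        return preg_match($pattern, $text, $match) ? $match[0] : '';
    }
}
```

Because TEXT_END_CHARS is a list of characters rather than a complete pattern, it only makes sense inside a character class, here a negated one.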

Inherited Member Variables

From ezcDocumentWikiTokenizer:
protected  ezcDocumentWikiTokenizer::$tokens

Method Summary

protected array filterTokens( $tokens )
Filter tokens
public void __construct( )
Construct tokenizer

Inherited Methods

From ezcDocumentWikiTokenizer :
public abstract void ezcDocumentWikiTokenizer::__construct()
Construct tokenizer
protected void ezcDocumentWikiTokenizer::convertTabs()
Convert tabs to spaces
protected abstract array ezcDocumentWikiTokenizer::filterTokens()
Filter tokens
public array ezcDocumentWikiTokenizer::tokenizeFile()
Tokenize the given file
public array ezcDocumentWikiTokenizer::tokenizeString()
Tokenize the given string

Methods

filterTokens

array filterTokens( $tokens )
Filter tokens
Method to filter tokens after the input string has been tokenized. The filter should extract additional information that is not yet available on the tokens, such as the depth of a title, which depends on the title markup.

Parameters

Name Type Description
$tokens array  

Redefinition of

Method Description
ezcDocumentWikiTokenizer::filterTokens() Filter tokens
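
To illustrate the kind of post-processing described for filterTokens, here is a rough sketch that derives a title depth from the title markup. It is not the library's implementation: the function name, the plain-array token shape and the 'level' key are all assumptions made for this example, and the real tokenizer works on its own token objects.

```php
<?php
// Hypothetical token shape: ['type' => string, 'content' => string].
// Plain arrays are used only to keep the sketch self-contained.
function sketchFilterTitleTokens(array $tokens): array
{
    foreach ($tokens as $index => $token) {
        if ($token['type'] === 'title') {
            // Assumed rule: the depth equals the number of leading "="
            // characters in the title markup, so "==" gives depth 2.
            $tokens[$index]['level'] = strspn(ltrim($token['content']), '=');
        }
    }
    return $tokens;
}
```

Tokens of other types pass through unchanged, so the filter can run over the complete token list in a single pass.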

__construct

void __construct( )
Construct tokenizer
Create token array with the regular expressions matching the respective tokens.

Redefinition of

Method Description
ezcDocumentWikiTokenizer::__construct() Construct tokenizer
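
The idea of a token array, in which each entry pairs a token type with a regular expression, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual table built by the constructor: the function names, the four token types, the array layout and the patterns themselves are hypothetical.

```php
<?php
// Hypothetical token table: each rule is anchored at the start of the
// remaining input and captures the matched text in the group "value".
function sketchBuildTokenTable(): array
{
    return [
        ['type' => 'newline',    'match' => '~\\A(?P<value>(?:\\r\\n|\\r|\\n))~'],
        ['type' => 'whitespace', 'match' => '~\\A(?P<value>[\\x20\\t]+)~'],
        ['type' => 'title',      'match' => '~\\A(?P<value>={1,6})~'],
        ['type' => 'text',       'match' => '~\\A(?P<value>[^=\\r\\n\\t\\x20]+)~'],
    ];
}

// Try the rules in order at the current position, emit a token for the
// first rule that matches, then continue behind the consumed text.
function sketchTokenize(string $input): array
{
    $tokens = [];
    $table  = sketchBuildTokenTable();

    while ($input !== '') {
        foreach ($table as $rule) {
            if (preg_match($rule['match'], $input, $match)) {
                $tokens[] = ['type' => $rule['type'], 'content' => $match['value']];
                $input    = substr($input, strlen($match['value']));
                continue 2;
            }
        }
        throw new RuntimeException('No token pattern matches the remaining input.');
    }

    return $tokens;
}
```

The order of the rules matters in this design: the first matching rule wins, so more specific patterns belong before the catch-all text rule.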

Last updated: Mon, 09 Feb 2009