eZ Components 2009.1

Document: ezcDocumentPdfTokenizer

Class: ezcDocumentPdfTokenizer

Abstract base class for tokenizer implementations.
Tokenizers split a run of text (such as a sentence) into single words and mark the points between them where the renderer may insert a space or wrap the line.

Descendants

Child Class Description
ezcDocumentPdfDefaultTokenizer Tokenizer implementation for common texts, using whitespace as the word separator.

Constants

SPACE = 0 Constant indicating a breaking point, including a rendered space.
WRAP = 1 Constant indicating a possible breaking point without rendering a space character.
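
As an illustration of how a consumer might interpret these two markers, the following minimal sketch rebuilds an unwrapped line from a token stream. The function joinTokens() and the TOKEN_* constants are hypothetical stand-ins and not part of eZ Components; the renderer's actual handling is not described on this page.

```php
<?php
// Hypothetical stand-ins mirroring ezcDocumentPdfTokenizer::SPACE and ::WRAP.
const TOKEN_SPACE = 0;
const TOKEN_WRAP  = 1;

/**
 * Rebuild a single, unwrapped line from a token stream.
 *
 * Assumption: SPACE becomes one space character, WRAP contributes nothing.
 * Words are strings and markers are integers, so strict comparison keeps
 * a word like '0' from being mistaken for a marker.
 */
function joinTokens( array $tokens )
{
    $line = '';
    foreach ( $tokens as $token )
    {
        if ( $token === TOKEN_SPACE )
        {
            $line .= ' ';
        }
        elseif ( $token === TOKEN_WRAP )
        {
            // Possible break point only; nothing is rendered here.
            continue;
        }
        else
        {
            $line .= $token;
        }
    }
    return $line;
}
```

A real renderer would additionally measure word widths and decide at each marker whether to break the line; this sketch covers only the "no break" case.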

Method Summary

public abstract array tokenize( $string )
Split string into words

Methods

tokenize

array tokenize( string $string )
Split string into words
This method takes a string and splits it into words. Two markers indicate possible splitting points in the resulting word stream:
  • self::SPACE: The renderer may render a space at this position.
  • self::WRAP: The renderer may wrap the line at this position, but will not render a space; if the line is not wrapped here, the marker is simply ignored.
A possible splitting of an English sentence might look like:
    array(
        'Hello',
        self::SPACE,
        'world!',
    );
Non-breaking spaces should not split text into multiple words, so no breaking point is emitted for them.
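
The contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of ezcDocumentPdfDefaultTokenizer: the classes SketchTokenizer and SketchWhitespaceTokenizer are hypothetical stand-ins so the example runs without eZ Components installed, and the sketch emits only SPACE markers, never WRAP.

```php
<?php
// Hypothetical stand-in for the documented abstract class.
abstract class SketchTokenizer
{
    const SPACE = 0;
    const WRAP  = 1;

    abstract public function tokenize( $string );
}

// Hypothetical whitespace tokenizer following the documented contract.
class SketchWhitespaceTokenizer extends SketchTokenizer
{
    public function tokenize( $string )
    {
        // Split on ASCII space, tab, CR and LF only. A UTF-8 non-breaking
        // space (bytes 0xC2 0xA0) is not in this class, so it stays inside
        // its word and no breaking point is produced for it.
        $words = preg_split( '/[ \t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY );

        $tokens = array();
        foreach ( $words as $index => $word )
        {
            if ( $index > 0 )
            {
                // Mark a breaking point between two consecutive words.
                $tokens[] = self::SPACE;
            }
            $tokens[] = $word;
        }
        return $tokens;
    }
}

$tokenizer = new SketchWhitespaceTokenizer();
$tokens    = $tokenizer->tokenize( 'Hello world!' );
// $tokens is array( 'Hello', SketchTokenizer::SPACE, 'world!' )
```

Splitting on an explicit character class rather than \s is a deliberate choice in this sketch: it keeps the treatment of non-breaking spaces independent of locale and regex modifiers.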

Parameters

Name Type Description
$string string The string to split into words.

Redefined in descendants as

Method Description
ezcDocumentPdfDefaultTokenizer::tokenize() Split string into words

Last updated: Mon, 29 Jun 2009