Document: ezcDocumentPdfTokenizer
Class: ezcDocumentPdfTokenizer
Abstract base class for tokenizer implementations.
Tokenizers split a series of words (sentences) into single words, which can then be rendered separated by spaces.
Descendants
Constants
| SPACE = 0 | Constant indicating a breaking point, including a rendered space. |
| WRAP = 1 | Constant indicating a possible breaking point without rendering a space character. |
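As a rough illustration of how a renderer could treat the two markers, the sketch below joins a token stream back into a single line. The names SKETCH_SPACE, SKETCH_WRAP and sketchRenderLine() are hypothetical stand-ins, not part of the library; only the values 0 and 1 come from the constants above. No line wrapping is attempted.

```php
<?php
// Hypothetical stand-ins mirroring the documented constant values,
// so the sketch runs without the library.
const SKETCH_SPACE = 0; // breaking point, rendered as a space
const SKETCH_WRAP  = 1; // possible breaking point, nothing rendered

/**
 * Join a token stream into one line of text (assumed helper, no wrapping).
 *
 * @param array $tokens Words (strings) mixed with the integer markers.
 * @return string
 */
function sketchRenderLine(array $tokens)
{
    $line = '';
    foreach ($tokens as $token) {
        if ($token === SKETCH_SPACE) {
            // A SPACE marker becomes a visible space character.
            $line .= ' ';
        } elseif ($token === SKETCH_WRAP) {
            // A WRAP marker only permits a line break; it renders nothing.
            continue;
        } else {
            // Everything else is a word and is appended unchanged.
            $line .= $token;
        }
    }
    return $line;
}

echo sketchRenderLine(array('Hello', SKETCH_SPACE, 'world!')), "\n";
```

The strict comparison (===) keeps a word such as '0' or '1' from being mistaken for a marker, since markers are integers and words are strings.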
Method Summary
| public abstract array tokenize( $string ) | Split string into words |
Methods
tokenize
array tokenize( string $string )
Split string into words
This function takes a string and splits it into words. Two markers indicate possible splitting points in the resulting word stream:
- self::SPACE: The renderer might render a space at this position.
- self::WRAP: The renderer might wrap the line at this position but will not render a space; the marker may also simply be omitted.
A possible splitting of an English sentence might look like:
    array(
        'Hello',
        self::SPACE,
        'world!',
    );
Non-breaking spaces should not split text into multiple words, so no break is applied at them.
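The behaviour described above can be sketched with a minimal whitespace tokenizer. To stay runnable without the library, the example uses a hypothetical stand-in base class, SketchTokenizer, that copies only the documented constants and the tokenize() signature; SketchWhitespaceTokenizer is likewise an assumed name, not a class shipped with the component. It assumes UTF-8 input, emits only SPACE markers, and drops leading and trailing whitespace as a simplification.

```php
<?php
// Hypothetical stand-in for the abstract base class, so the sketch runs
// without the library. Only the constants and the tokenize() signature
// are taken from the documentation.
abstract class SketchTokenizer
{
    const SPACE = 0;
    const WRAP  = 1;

    abstract public function tokenize($string);
}

// Assumed example implementation: splits on ASCII whitespace only.
class SketchWhitespaceTokenizer extends SketchTokenizer
{
    public function tokenize($string)
    {
        $tokens = array();

        // Split on runs of space, tab, CR and LF. The UTF-8 no-break space
        // (bytes 0xC2 0xA0) is not in this class, so it stays inside a word.
        $words = preg_split('/[ \t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

        foreach ($words as $index => $word) {
            if ($index > 0) {
                // Mark the gap between two words as a renderable space.
                $tokens[] = self::SPACE;
            }
            $tokens[] = $word;
        }

        return $tokens;
    }
}

$tokenizer = new SketchWhitespaceTokenizer();
print_r($tokenizer->tokenize('Hello world!'));
```

A real implementation would extend ezcDocumentPdfTokenizer instead of the stand-in and might also emit self::WRAP markers, for example at hyphenation points.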
Parameters
| Name | Type | Description |
| $string | string | |
Redefined in descendants as
Last updated: Mon, 29 Jun 2009