Package xappy :: Module highlight :: Class Highlighter

Class Highlighter

object --+
         |
        Highlighter

Class for highlighting text and creating contextual summaries.

>>> from xappy.highlight import Highlighter
>>> hl = Highlighter("en")
>>> hl.makeSample('Hello world.', ['world'])
'Hello world.'
>>> hl.highlight('Hello world', ['world'], ('<', '>'))
'Hello <world>'


Instance Methods

__init__(self, language_code='en', stemmer=None)
Create a new highlighter for the specified language.

makeSample(self, text, query, maxlen=600, hl=None)
Make a contextual summary from the supplied text.

highlight(self, text, query, hl, strip_tags=False)
Add highlights (string prefix/postfix) to a string.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, language_code='en', stemmer=None)
(Constructor)

Create a new highlighter for the specified language.
Overrides: object.__init__

makeSample(self, text, query, maxlen=600, hl=None)

Make a contextual summary from the supplied text.

It works by splitting the text into phrases, counting the query terms in each phrase, and keeping the phrases with the most matches.

Any markup tags in the text will be stripped.

Parameters:
  text: the source text to summarise.
  query: either a Xapian query object or a list of (unstemmed) term strings.
  maxlen: the maximum length of the generated summary (default 600).
  hl: a pair of strings to insert around highlighted terms, e.g. ('<b>', '</b>').
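The phrase-scoring approach described above can be sketched in plain Python without Xapian. This is a hypothetical approximation, not xappy's implementation: `make_sample` and its regular expressions are illustrative names, phrases are assumed to end at `.`, `!`, `?` or `;`, query terms are assumed to be a list of plain strings matched case-insensitively as whole words (no stemming), and `maxlen` is assumed to count characters before the highlight strings are inserted.

```python
import re

# Illustrative patterns (assumptions, not xappy's own regular expressions).
TAG_RE = re.compile(r"<[^>]+>")             # markup tags such as <b> or </i>
PHRASE_RE = re.compile(r"[^.!?;]+[.!?;]*")  # a phrase ends at . ! ? or ;
WORD_RE = re.compile(r"\w+")


def make_sample(text, terms, maxlen=600, hl=None):
    """Hypothetical contextual summary: keep the phrases richest in query terms.

    `terms` is a list of plain strings matched case-insensitively as whole
    words; no stemming is done, unlike xappy's Highlighter.
    """
    text = TAG_RE.sub("", text)  # strip any markup tags first
    wanted = {t.lower() for t in terms}
    phrases = [p.strip() for p in PHRASE_RE.findall(text) if p.strip()]

    # Score each phrase by the number of query-term occurrences it contains.
    scored = []
    for index, phrase in enumerate(phrases):
        score = sum(1 for w in WORD_RE.findall(phrase) if w.lower() in wanted)
        scored.append((score, index, phrase))

    # Greedily keep the best-scoring phrases that still fit within maxlen
    # (ties broken by position; length is counted before highlighting).
    chosen, length = [], 0
    for score, index, phrase in sorted(scored, key=lambda s: (-s[0], s[1])):
        extra = len(phrase) + (1 if chosen else 0)  # +1 for the joining space
        if length + extra <= maxlen:
            chosen.append((index, phrase))
            length += extra

    # Put the kept phrases back in document order so the summary reads naturally.
    sample = " ".join(phrase for _, phrase in sorted(chosen))

    if hl:
        pre, post = hl

        def wrap(match):
            word = match.group(0)
            return pre + word + post if word.lower() in wanted else word

        sample = WORD_RE.sub(wrap, sample)
    return sample
```

For example, `make_sample('Cats sleep. Dogs bark loudly. A dog and a cat met.', ['dog', 'cat'], maxlen=30, hl=('[', ']'))` keeps only the third phrase and returns `'A [dog] and a [cat] met.'`. "Dogs" is not matched because this sketch does no stemming; a real implementation would compare stems instead.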

highlight(self, text, query, hl, strip_tags=False)

Add highlights (string prefix/postfix) to a string.

Parameters:
  text: the source text to highlight.
  query: either a Xapian query object or a list of (unstemmed) term strings.
  hl: a pair of highlight strings, e.g. ('<i>', '</i>').
  strip_tags: if True, HTML markup is stripped from the text; if False (the default), it is kept.

>>> import xapian
>>> from xappy.highlight import Highlighter
>>> hl = Highlighter()
>>> qp = xapian.QueryParser()
>>> q = qp.parse_query('cat dog')
>>> tags = ('[[', ']]')
>>> hl.highlight('The cat went Dogging; but was <i>dog tired</i>.', q, tags)
'The [[cat]] went [[Dogging]]; but was <i>[[dog]] tired</i>.'
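The prefix/postfix highlighting described above can be approximated without Xapian. The sketch below is hypothetical and is not xappy's code: it splits the text into markup tags, words and other characters, wraps each word whose stem matches a query term's stem, and copies tags through unless `strip_tags` is set. `highlight`, `toy_stem` and the `stem` parameter are illustrative names; `toy_stem` is a deliberately crude stand-in for a real stemmer such as `xapian.Stem`, and query terms are assumed to be a list of strings rather than a Xapian query object.

```python
import re

# One token per match: a markup tag, a word, or a run of other characters.
# A lone "<" that does not start a tag is treated as an ordinary character.
TOKEN_RE = re.compile(r"(?P<tag></?\w[^>]*>)|(?P<word>\w+)|(?P<other>[^<\w]+|<)")


def toy_stem(word):
    """Hypothetical crude stemmer: lower-case, drop '-ing', undouble a final consonant."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    return word


def highlight(text, terms, hl, strip_tags=False, stem=None):
    """Hypothetical highlighter: wrap words whose stem matches a query term's stem.

    `terms` is a list of plain strings; `stem` is any one-argument callable
    (default: plain lower-casing).
    """
    stem = stem or str.lower
    wanted = {stem(t) for t in terms}
    pre, post = hl
    out = []
    for match in TOKEN_RE.finditer(text):
        kind, token = match.lastgroup, match.group()
        if kind == "tag":
            if not strip_tags:  # copy markup through unless asked to strip it
                out.append(token)
        elif kind == "word" and stem(token) in wanted:
            out.append(pre + token + post)
        else:
            out.append(token)
    return "".join(out)
```

Under these assumptions, `highlight('The cat went Dogging; but was <i>dog tired</i>.', ['cat', 'dog'], ('[[', ']]'), stem=toy_stem)` returns the same string as the doctest above. Without a stemmer, "Dogging" would not be highlighted, which is why the real class takes a language code or stemmer.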