[doc/guides/module] use browser2

Alexandre Morignot 2015-06-17 23:45:52 +02:00 committed by Florent
commit 7f10865215


@@ -347,52 +347,59 @@ When your browser locates on a page, an instance of the class related to the
 is created. You can declare methods on your class to allow your browser to
 interact with it.

-The first thing to know is that your instance owns these attributes:
-
-* ``browser`` - your ``ExampleBrowser`` class
-* ``logger`` - context logger
-* ``encoding`` - the encoding of the page
-* ``response`` - the ``Response`` object from ``requests``
-* ``url`` - current url
-* ``doc`` - parsed document with ``lxml``
-
-The most important attribute is ``doc`` you will use to get information from the page. You can call two methods:
-
-* ``xpath`` - xpath expressions
-* ``cssselect`` - CSS selectors
+The first thing to know is that page parsing is done in a descriptive way. You
+don't have to loop on HTML elements to construct the object. Just describe how
+to get correct data to construct it. It is the Browser class work to actually
+construct the object.

 For example::

+    from weboob.browser.filters.html import Attr
+    from weboob.browser.filters.standard import CleanDecimal, CleanText
     from weboob.capabilities.bank import Account

     class ListPage(LoggedPage, HTMLPage):
-        def get_accounts(self):
-            for el in self.doc.xpath('//ul[@id="list"]/li'):
-                account = Account()
-                account.id = el.attrib['id']
-                account.label = el.xpath('./td[@class="name"]').text
-                account.balance = Decimal(el.xpath('./td[@class="balance"]').text)
-                yield account
+        @method
+        class get_accounts(ListElement):
+            item_xpath = '//ul[@id="list"]/li'

-An alternative with ``cssselect``::
-
-    from weboob.capabilities.bank import Account
+            class item(ItemElement):
+                klass = Account()
+                obj_id = Attr('id')
+                obj_label = CleanText('./td[@class="name"]')
+                obj_balance = CleanDecimal('./td[@class="balance"]')

-    class ListPage(LoggedPage, HTMLPage):
-        def get_accounts(self):
-            for el in self.document.getroot().cssselect('ul#list li'):
-                id = el.attrib['id']
-                account = Account()
-                account.id = el.attrib['id']
-                account.label = el.cssselect('td.name').text
-                account.balance = Decimal(el.cssselect('td.balance').text)
-                yield account
+As you see, we first set ``item_xpath`` which is the xpath string used to
+iterate over elements to access data. In a second time we define ``klass`` which
+is the real class of our object. And then we describe how to fill each object's
+attribute using what we call filters.
+
+Some example of filters:
+
+* :class:`Attr <weboob.browser.filters.html.Attr>`: extract a tag attribute
+* :class:`CleanText <weboob.browser.filters.standard.CleanText>`: get a cleaned text from an element
+* :class:`CleanDecimal <weboob.browser.filters.standard.CleanDecimal>`: get a cleaned Decimal value from an element
+* :class:`Date <weboob.browser.filters.standard.Date>`: read common date formats
+* :class:`Link <weboob.browser.filters.html.Link>`: get the link uri of an element
+* :class:`Regexp <weboob.browser.filters.standard.Regexp>`: apply a regex
+* :class:`Time <weboob.browser.filters.standard.Time>`: read common time formats
+* :class:`Type <weboob.browser.filters.standard.Type>`: get a cleaned value of any type from an element text
+
+Filters can be combined. For example::
+
+    obj_id = Link('./a[1]') & Regexp(r'id=(\d+)') & Type(type=int)
+
+This code do several things, in order:
+
+#) extract the href attribute of our item first ``a`` tag child
+#) apply a regex to extract a value
+#) convert this value to int type

 .. note::

    All objects ID must be unique, and useful to get more information later

 Your module is now functional and you can use this command::

     $ boobank -b example list