[doc/guides/module] use browser2
parent
4881bb0be7
commit
7f10865215
1 changed files with 39 additions and 32 deletions
|
|
@@ -347,52 +347,59 @@ When your browser locates on a page, an instance of the class related to the
|
||||||
is created. You can declare methods on your class to allow your browser to
|
is created. You can declare methods on your class to allow your browser to
|
||||||
interact with it.
|
interact with it.
|
||||||
|
|
||||||
The first thing to know is that your instance owns these attributes:
|
The first thing to know is that page parsing is done in a descriptive way. You
|
||||||
|
don't have to loop over HTML elements to construct the object. Just describe how
|
||||||
* ``browser`` - your ``ExampleBrowser`` class
|
to get the correct data to construct it. It is the Browser class's job to actually
|
||||||
* ``logger`` - context logger
|
construct the object.
|
||||||
* ``encoding`` - the encoding of the page
|
|
||||||
* ``response`` - the ``Response`` object from ``requests``
|
|
||||||
* ``url`` - current url
|
|
||||||
* ``doc`` - parsed document with ``lxml``
|
|
||||||
|
|
||||||
The most important attribute is ``doc`` you will use to get information from the page. You can call two methods:
|
|
||||||
|
|
||||||
* ``xpath`` - xpath expressions
|
|
||||||
* ``cssselect`` - CSS selectors
|
|
||||||
|
|
||||||
For example::
|
For example::
|
||||||
|
|
||||||
|
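from weboob.browser.elements import ItemElement, ListElement, method
from weboob.browser.pages import HTMLPage, LoggedPage
from weboob.browser.filters.html import Attr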
|
||||||
|
from weboob.browser.filters.standard import CleanDecimal, CleanText
|
||||||
from weboob.capabilities.bank import Account
|
from weboob.capabilities.bank import Account
|
||||||
|
|
||||||
class ListPage(LoggedPage, HTMLPage):
|
class ListPage(LoggedPage, HTMLPage):
|
||||||
def get_accounts(self):
|
@method
|
||||||
for el in self.doc.xpath('//ul[@id="list"]/li'):
|
class get_accounts(ListElement):
|
||||||
account = Account()
|
item_xpath = '//ul[@id="list"]/li'
|
||||||
account.id = el.attrib['id']
|
|
||||||
account.label = el.xpath('./td[@class="name"]').text
|
|
||||||
account.balance = Decimal(el.xpath('./td[@class="balance"]').text)
|
|
||||||
yield account
|
|
||||||
|
|
||||||
An alternative with ``cssselect``::
|
class item(ItemElement):
|
||||||
|
klass = Account
|
||||||
|
|
||||||
from weboob.capabilities.bank import Account
|
obj_id = Attr('id')
|
||||||
|
obj_label = CleanText('./td[@class="name"]')
|
||||||
|
obj_balance = CleanDecimal('./td[@class="balance"]')
|
||||||
|
|
||||||
class ListPage(LoggedPage, HTMLPage):
|
As you can see, we first set ``item_xpath``, which is the XPath expression used to
|
||||||
def get_accounts(self):
|
iterate over elements to access data. Next, we define ``klass``, which
|
||||||
for el in self.document.getroot().cssselect('ul#list li'):
|
is the actual class of our object. Finally, we describe how to fill each object's
|
||||||
id = el.attrib['id']
|
attribute using what we call filters.
|
||||||
account = Account()
|
|
||||||
account.id = el.attrib['id']
|
Some examples of filters:
|
||||||
account.label = el.cssselect('td.name').text
|
|
||||||
account.balance = Decimal(el.cssselect('td.balance').text)
|
* :class:`Attr <weboob.browser.filters.html.Attr>`: extract a tag attribute
|
||||||
yield account
|
* :class:`CleanText <weboob.browser.filters.standard.CleanText>`: get a cleaned text from an element
|
||||||
|
* :class:`CleanDecimal <weboob.browser.filters.standard.CleanDecimal>`: get a cleaned Decimal value from an element
|
||||||
|
* :class:`Date <weboob.browser.filters.standard.Date>`: read common date formats
|
||||||
|
* :class:`Link <weboob.browser.filters.html.Link>`: get the link uri of an element
|
||||||
|
* :class:`Regexp <weboob.browser.filters.standard.Regexp>`: apply a regex
|
||||||
|
* :class:`Time <weboob.browser.filters.standard.Time>`: read common time formats
|
||||||
|
* :class:`Type <weboob.browser.filters.standard.Type>`: get a cleaned value of any type from an element text
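
To make the declarative idea above more concrete, here is a minimal standard-library sketch of how a description made of ``item_xpath``, ``klass`` and ``obj_*`` filters could be turned into objects. It is not weboob's implementation: ``Account``, ``Attr``, ``CleanText``, ``CleanDecimal``, ``AccountList`` and ``build`` below are hypothetical stand-ins written for this illustration, and the sample page is invented.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal


class Account:
    """Hypothetical stand-in for the real capability class."""


class Attr:
    """Toy filter: read an attribute of the current item."""
    def __init__(self, name):
        self.name = name

    def __call__(self, el):
        return el.attrib[self.name]


class CleanText:
    """Toy filter: text of a child element, with whitespace collapsed."""
    def __init__(self, path):
        self.path = path

    def __call__(self, el):
        node = el.find(self.path)
        return " ".join("".join(node.itertext()).split())


class CleanDecimal(CleanText):
    """Toy filter: keep digits, dot and minus sign, then build a Decimal."""
    def __call__(self, el):
        text = super().__call__(el)
        return Decimal(re.sub(r"[^\d.\-]", "", text))


class AccountList:
    """The description: where the items are and how to fill each attribute."""
    item_xpath = ".//ul[@id='list']/li"
    klass = Account  # the class itself, not an instance
    obj_id = Attr("id")
    obj_label = CleanText("./td[@class='name']")
    obj_balance = CleanDecimal("./td[@class='balance']")


def build(description, root):
    """Generic loop: it is written once and reused by every description."""
    for el in root.findall(description.item_xpath):
        obj = description.klass()
        for name, filt in vars(description).items():
            if name.startswith("obj_"):
                setattr(obj, name[len("obj_"):], filt(el))
        yield obj


# Invented sample page, shaped like the XPath expressions used in this guide.
PAGE = """
<html><body><ul id="list">
  <li id="1234"><td class="name">  Checking   account </td><td class="balance">1 523.40 EUR</td></li>
  <li id="5678"><td class="name">Savings</td><td class="balance">-12.00 EUR</td></li>
</ul></body></html>
""".strip()

accounts = list(build(AccountList, ET.fromstring(PAGE)))
for account in accounts:
    print(account.id, account.label, account.balance)
```

The point of the sketch is the division of labour: the description only states where the data lives, while the loop that instantiates and fills objects sits in one generic place.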
|
||||||
|
|
||||||
|
Filters can be combined. For example::
|
||||||
|
|
||||||
|
obj_id = Link('./a[1]') & Regexp(r'id=(\d+)') & Type(type=int)
|
||||||
|
|
||||||
|
This code does several things, in order:
|
||||||
|
|
||||||
|
#) extract the ``href`` attribute of our item's first ``a`` child tag
|
||||||
|
#) apply a regex to extract a value
|
||||||
|
#) convert this value to int type
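
The three steps above can be spelled out in plain Python to see what the combined filter is expected to produce. This is only a sketch under stated assumptions: the sample item and its URL are invented, and the small ``Step`` class is a hypothetical illustration of chaining with ``&``, not the way weboob filters are implemented.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample item: the first link carries the identifier in its query string.
item = ET.fromstring(
    '<li><a href="/account?id=1234">Checking</a><a href="/help">Help</a></li>'
)

# The same three steps, written out by hand.
href = item.find("./a[1]").attrib["href"]    # 1) href of the first <a> child
match = re.search(r"id=(\d+)", href)         # 2) regex capturing the digits
account_id = int(match.group(1))             # 3) conversion to int


class Step:
    """Hypothetical helper: wraps a function and chains with ``&``."""
    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __and__(self, other):
        # The output of the left-hand step feeds the right-hand step.
        return Step(lambda value: other(self(value)))


link = Step(lambda el: el.find("./a[1]").attrib["href"])
regexp = Step(lambda text: re.search(r"id=(\d+)", text).group(1))
to_int = Step(int)

get_id = link & regexp & to_int
chained_id = get_id(item)
```

Both ``account_id`` and ``chained_id`` end up as the integer ``1234``; the chained form simply names each step once and lets ``&`` do the plumbing.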
|
||||||
|
|
||||||
.. note::
|
.. note::
|
||||||
|
|
||||||
All object IDs must be unique and usable to fetch more information later
|
All object IDs must be unique and usable to fetch more information later
|
||||||
|
|
||||||
|
|
||||||
Your module is now functional and you can use this command::
|
Your module is now functional and you can use this command::
|
||||||
|
|
||||||
$ boobank -b example list
|
$ boobank -b example list
|
||||||
|
|
|
||||||