From a3c4c55fd60a7b9d2caf2840b755c19699136bba Mon Sep 17 00:00:00 2001 From: Romain Bignon Date: Tue, 19 Aug 2014 23:32:59 +0200 Subject: [PATCH] change module documentation to learn browser2 (refs #1451) --- docs/source/guides/module.rst | 391 ++++++++++++++++------------------ 1 file changed, 184 insertions(+), 207 deletions(-) diff --git a/docs/source/guides/module.rst b/docs/source/guides/module.rst index 2be160b4..c60ca524 100644 --- a/docs/source/guides/module.rst +++ b/docs/source/guides/module.rst @@ -8,10 +8,10 @@ Before read it, you should :doc:`setup your development environment ` website provides. A capability is a class derived from :class:`weboob.capabilities.base.CapBase` and with some abstract methods (which raise ``NotImplementedError``). -A capability needs to be as generic as possible to allow a maximum number of modules to implements it. +A capability needs to be as generic as possible to allow a maximum number of modules to implement it. Anyway, if you really need to handle website specificities, you can create more specific sub-capabilities. For example, there is the :class:`CapMessages ` capability, with the associated @@ -47,56 +47,28 @@ The module tree *************** Create a new directory in ``modules/`` with the name of your module. In this example, we assume that we want to create a -module for a forum website which URL is http://www.example.com. So we will call our module **example**, and the selected -capability is :class:`CapMessages `. +module for a bank website whose URL is http://www.example.com. So we will call our module **example**, and the selected +capability is :class:`CapBank `. -So, use this command:: +It is recommended to use the helper tool ``tools/boilerplate.py`` to build your +module tree.
There are several templates available: - $ mkdir modules/example/ +* **base** - create only base files +* **comic** - create a comic module +* **cap** - create a module for a given capability + +For example, use this command:: + + $ tools/boilerplate.py cap example CapBank In a module directory, there are commonly these files: * **__init__.py** - needed in every python modules, it exports your :class:`BaseBackend ` class. * **backend.py** - defines the main class of your module, which derives :class:`BaseBackend `. -* **browser.py** - your browser, derived from :class:`BaseBrowser `, is called by your module to interact with the supported website. +* **browser.py** - your browser, derived from :class:`BaseBrowser `, is called by your module to interact with the supported website. * **pages.py** - all website's pages handled by the browser are defined here * **test.py** - functional tests -* **favicon.png** - a 64x64 PNG icon - -Backend class -************* - -Firstly, create the file ``__init__.py`` and write in:: - - from .backend import ExampleBackend - - __all__ = ['ExampleBackend'] - -Then, you can edit ``backend.py`` and create your :class:`BaseBackend ` class:: - - # -*- coding: utf-8 -*- - - from weboob.capabilities.messages import CapMessages - from weboob.tools.backend import BaseBackend - - __all__ = ['ExampleBackend'] - - class ExampleBackend(BaseBackend, CapMessages): - # The name of module - NAME = 'example' - # Name of maintainer of this backend - MAINTAINER = u'John Smith' - # Email address of the maintainer - EMAIL = 'john.smith@example.com' - # Version of weboob - VERSION = '0.c' - # Description of your module - DESCRIPTION = 'Example forum website' - # License of your module - LICENSE = 'AGPLv3+' - -In the code above, you can see that your ``ExampleBackend`` inherits :class:`CapMessages `, as -we have selected it for the supported website. 
+* **favicon.png** - a 64x64 transparent PNG icon Update modules list ------------------- @@ -111,25 +83,41 @@ To be sure your module is correctly added, use this command:: .------------------------------------------------------------------------------. | Module example | +-----------------.------------------------------------------------------------' - | Version | 201203261420 + | Version | 201405191420 | Maintainer | John Smith | License | AGPLv3+ - | Description | Example forum website - | Capabilities | CapMessages + | Description | Example bank website + | Capabilities | CapBank, CapCollection | Installed | yes | Location | /home/me/src/weboob/modules/example '-----------------' If the last command does not work, check your :doc:`repositories setup `. +Backend class +************* + +Edit ``backend.py``. It contains the main class of the module, derived from the :class:`BaseBackend ` class:: + + class ExampleBackend(BaseBackend, CapBank): + NAME = 'example' # The name of the module + DESCRIPTION = u'Example bank website' # Description of your module + MAINTAINER = u'John Smith' # Name of the maintainer of this module + EMAIL = 'john.smith@example.com' # Email address of the maintainer + LICENSE = 'AGPLv3+' # License of your module + VERSION = '0.i' # Version of weboob + +In the code above, you can see that your ``ExampleBackend`` inherits :class:`CapBank `, as +we have selected it for the supported website. + Configuration ------------- -When a module is instanced as a backend, you probably want to ask parameters to user. It is manager by the ``CONFIG`` class +When a module is instantiated as a backend, you probably want to ask the user for parameters. It is managed by the ``CONFIG`` class attribute. It supports key/values with default values and some other parameters. The :class:`Value ` class is used to define a value.
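As a rough, stand-alone illustration of what such a key/value definition involves when a backend is loaded: a missing key falls back to its default, a required key without a value is an error, and the value can be checked against a regexp and a list of choices. The ``Value`` class and its ``load()`` method below are hypothetical and only mimic the options described in this guide; they are not weboob's implementation, and requiring the whole value to match the regexp (``re.fullmatch``) is an assumption of this sketch.

```python
import re


class Value:
    """Hypothetical stand-in for a backend config value (not weboob's class)."""

    def __init__(self, key, label=None, required=True, default=None,
                 regexp=None, choices=None):
        self.key = key
        self.label = label or key
        self.required = required
        self.default = default
        self.regexp = regexp
        self.choices = choices

    def load(self, config):
        """Return the checked value of self.key from a plain dict."""
        # A missing key falls back to the default value, if any.
        value = config.get(self.key, self.default)
        if value is None:
            if self.required:
                raise ValueError('missing required key %r' % self.key)
            return None
        # Assumption: the whole value must match the regexp.
        if self.regexp is not None and re.fullmatch(self.regexp, value) is None:
            raise ValueError('%s: %r does not match %r' % (self.label, value, self.regexp))
        # The value must be one of the allowed choices, if any are given.
        if self.choices is not None and value not in self.choices:
            raise ValueError('%s: %r is not one of %r' % (self.label, value, self.choices))
        return value


username = Value('username', label='Username', regexp='.+')
print(username.load({'username': 'john'}))  # john
```

With ``regexp='.+'``, an empty username is rejected at load time instead of producing a confusing failure later on the website.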
-Parameters of :class:`Value ` are: +Available parameters of :class:`Value ` are: * **label** - human readable description of a value * **required** - if ``True``, the backend can't loaded if the key isn't found in its configuration @@ -139,8 +127,10 @@ Parameters of :class:`Value ` are: * **regexp** - if specified, on load the specified value is checked against this regexp, and an error is raised if it doesn't match * **choices** - if this parameter is set, the value must be in the list -There is a special class, :class:`ValueBackendPassword `, which is used to manage -private parameters of the config (like passwords or sensible information). +.. note:: + + There is a special class, :class:`ValueBackendPassword `, which is used to manage + private parameters of the config (like passwords or sensitive information). For example:: @@ -148,7 +138,7 @@ For example:: from weboob.tools.backend import BackendConfig # ... - class ExampleBackend(BaseBackend, CapMessages): + class ExampleBackend(BaseBackend, CapBank): # ... CONFIG = BackendConfig(Value('username', label='Username', regexp='.+'), ValueBackendPassword('password', label='Password'), @@ -176,134 +166,134 @@ Implement capabilities You need to implement each method of all of the capabilities your module implements. For example, in our case:: # ... - class ExampleBackend(BaseBackend, CapMessages): + class ExampleBackend(BaseBackend, CapBank): # ... - def iter_threads(self): + def iter_accounts(self): raise NotImplementedError() - def get_thread(self, id): + def get_account(self, id): raise NotImplementedError() - def iter_unread_messages(self): + def iter_history(self, account): raise NotImplementedError() - def set_message_read(self, message): + def iter_coming(self, account): raise NotImplementedError() -Read :class:`documentation of the capability ` to know what are types of arguments, +If you used the ``cap`` template of the ``boilerplate`` script, every method is already present in ``backend.py``, with its documentation.
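To see why each capability method has to be implemented, here is a plain-Python sketch of the pattern, assuming nothing from weboob: the hypothetical ``CapExample`` base declares methods that raise ``NotImplementedError``, and the hypothetical ``ExampleBackend`` overrides only the ones it supports. A method left out still raises, so a caller can tell the feature is unsupported.

```python
class CapExample:
    """Hypothetical capability: each abstract method raises NotImplementedError."""

    def iter_accounts(self):
        raise NotImplementedError()

    def get_account(self, id):
        raise NotImplementedError()

    def iter_history(self, account):
        raise NotImplementedError()


class ExampleBackend(CapExample):
    """Hypothetical module implementing only part of the capability."""

    # Stand-in data; a real module would ask its browser for it.
    ACCOUNTS = {'1': 'Checking', '2': 'Savings'}

    def iter_accounts(self):
        for id in sorted(self.ACCOUNTS):
            yield id, self.ACCOUNTS[id]

    def get_account(self, id):
        return id, self.ACCOUNTS[id]

    # iter_history is not overridden, so it still raises NotImplementedError.


backend = ExampleBackend()
print(list(backend.iter_accounts()))  # [('1', 'Checking'), ('2', 'Savings')]
try:
    backend.iter_history('1')
except NotImplementedError:
    print('iter_history is not supported by this module')
```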
+ +Read :class:`documentation of the capability ` to know what are types of arguments, what are expected returned objects, and what exceptions it may raises. Browser ******* -Most of modules use a class derived from :class:`BaseBrowser ` to interact with a website. +Most modules use a class derived from :class:`PagesBrowser ` or +:class:`LoginBrowser ` (for authenticated websites) to interact with a website. -Edit ``browser.py`` and write in:: +Edit ``browser.py``:: # -*- coding: utf-8 -*- - from weboob.tools.browser import BaseBrowser + from weboob.tools.browser2 import PagesBrowser __all__ = ['ExampleBrowser'] - class ExampleBrowser(BaseBrowser): - DOMAIN = 'example.com' - PROTOCOL = 'https' - ENCODING = 'utf-8' - USER_AGENT = BaseBrowser.USER_AGENTS['desktop_firefox'] - PAGES = {} + class ExampleBrowser(PagesBrowser): + BASEURL = 'https://www.example.com' -There are several attributes: +There are several possible class attributes: -* **DOMAIN** - hostname of the website. -* **PROTOCOL** - what protocol to use to access to website (http or https). -* **ENCODING** - what is the encoding of HTML pages. If you set it to ``None``, it will use the web server one. -* **USER_AGENT** - what *UserAgent* to use to access to website. Sometimes, websites provide different behaviors when you use different user agents. - You can use one of the :class:`predefined user-agents `, or write your - own string. -* **PAGES** - list of handled pages, and the associated :class:`BasePage ` class. +* **BASEURL** - base URL of the website, used to resolve absolute paths given to :func:`open() ` or :func:`location() ` +* **PROFILE** - defines how your browser presents itself to the website.
By default this is Firefox, but you can import other profiles. +* **TIMEOUT** - defines the timeout for requests (defaults to 10 seconds) +* **VERIFY** - SSL certificate verification (when the protocol is **https**) Pages ----- -For each page you want to handle, you have to create an associated class derived from :class:`BasePage `. +For each page you want to handle, you have to create an associated class derived from one of these classes: -Create ``pages.py`` and write in:: +* :class:`HTMLPage ` - an HTML page +* :class:`XMLPage ` - an XML document +* :class:`JsonPage ` - a JSON document + +In the file ``pages.py``, you can write, for example:: # -*- coding: utf-8 -*- - from weboob.tools.browser import BasePage + from weboob.tools.browser2.page import HTMLPage __all__ = ['IndexPage', 'ListPage'] - class IndexPage(BasePage): + class IndexPage(HTMLPage): pass - class ListPage(BasePage): - def iter_threads_list(self): + class ListPage(HTMLPage): + def iter_accounts(self): return iter([]) ``IndexPage`` is the class we will use to get information from the home page of the website, and ``ListPage`` will handle pages -which list forum threads. To associate them to URLs, change the ``ExampleBrowser.PAGES`` dictionary:: +which list accounts. +Then, declare them in your browser with :class:`URL ` objects:: + + from weboob.tools.browser2.page import PagesBrowser, URL from .pages import IndexPage, ListPage # ... - class ExampleBrowser(BaseBrowser): + class ExampleBrowser(PagesBrowser): # ... - PAGES = {'https://example\.com/': IndexPage, - 'https://example\.com/posts': ListPage, - } -Easy, isn't it? The key is a regexp, and the value is your class. Each time you will go on the home page, ``IndexPage`` will be -instanced and set as the ``page`` attribute. + home = URL('/$', IndexPage) + accounts = URL('/accounts$', ListPage) -To check on what page the browser is currently, you can use :func:`is_on_page `. +Easy, isn't it?
The first parameters are regexps of the URLs (if you give only a path, it is resolved against the ``BASEURL`` class attribute), and the last one is the class used to handle the response. -For example, we can now implement the ``home`` method in ``ExampleBrowser``:: +Each time the browser goes to the home page, ``IndexPage`` is instantiated and set as the ``page`` attribute. - class ExampleBrowser(BaseBrowser): +For example, we can now implement some methods in ``ExampleBrowser``:: + + class ExampleBrowser(PagesBrowser): # ... - def home(self): - self.location('/') + def go_home(self): + self.home.go() - assert self.is_on_page(IndexPage) + assert self.home.is_here() - def iter_threads_list(self): - self.location('/posts') + def iter_accounts_list(self): + self.accounts.stay_or_go() - assert self.is_on_page(ListPage) - return self.page.iter_threads_list() + return self.page.iter_accounts() -``home`` is automatically called when an instance of ``ExampleBrowser`` is created. We also have defined ``iter_threads_list`` -to go on the corresponding page and get list of threads. For now, ``ListPage.iter_threads_list`` returns an empty iterator, but -we will implement it later. +When you call the :func:`go() ` method, the browser takes the first URL regexp of the :class:`URL ` object and goes to that page. + +:func:`stay_or_go() ` is used when you want to go to the page only if the browser is not already on it. + +Once the browser is on the ``ListPage``, you can call any method of the ``page`` object. Use it in backend ----------------- -Once you have a functional browser, you can use it in your class ``ExampleBackend`` by defining it with the ``BROWSER`` attribute:: +Now that you have a functional browser, you can use it in your class ``ExampleBackend`` by defining it with the ``BROWSER`` attribute:: from .browser import ExampleBrowser # ... - class ExampleBackend(BaseBackend, CapMessages): + class ExampleBackend(BaseBackend, CapBank): # ...
BROWSER = ExampleBrowser -You can now access it with member ``browser``. The class is instanced at the first call to this attribute. It is often better to use -your browser only in a ``with`` block, to prevent problems when your backend is called in a multi-threading environment. +You can now access it through the ``browser`` attribute. The class is instantiated the first time this attribute is accessed. -For example, we can now implement :func:`CapMessages.iter_threads `:: +For example, we can now implement :func:`CapBank.iter_accounts `:: - class ExampleBackend(BaseBackend, CapMessages): + class ExampleBackend(BaseBackend, CapBank): # ... def create_default_browser(self): return self.create_browser(self.config['username'].get(), self.config['password'].get()) -On the browser side, the important thing to know is that every times you call -:func:`location `, the method -:func:`is_logged ` is called to know if we are logged or not. -It is useful when the browser is launched to automatically login, or when your session has expired on website and you -need to re-login. +On the browser side, you need to inherit from :class:`LoginBrowser ` and implement the method +:func:`do_login `:: -When you are not logged, the method :func:`login ` is called. - -For example:: - - from weboob.tools.browser import BaseBrowser, BrowserIncorrectPassword - - # ... - class ExampleBrowser(BaseBrowser): + class ExampleBrowser(LoginBrowser): + login = URL('/login', LoginPage) # ...
- PAGES = {'https://example\.com/': IndexPage, - 'https://example\.com/login': LoginPage, - 'https://example\.com/posts': ListPage, - } - def is_logged(self): - return self.is_on_page(LoginPage) == False - - def login(self): - if not self.is_on_page(LoginPage): - self.location('/login', no_login=True) + def do_login(self): + self.login.stay_or_go() self.page.login(self.username, self.password) - if not self.is_logged(): - raise BrowserIncorrectPassword() + if self.login_error.is_here(): + raise BrowserIncorrectPassword(self.page.get_error()) -The way to know if we are logged or not is different between websites. In this hypothetical case, we assume the website -isn't accessible if you aren't logged, and you are always redirected to ``login/`` until you are authenticated. +Also, your ``LoginPage`` may look like:: -.. note:: - - The parameter ``no_login`` have to be used in this case to prevent an infinite loop. - -Code of ``LoginPage`` in ``pages.py`` may be something like that:: - - class LoginPage(BasePage): + class LoginPage(HTMLPage): def login(self, username, password): - self.browser.select_form(name='login') - self.browser['login'] = username - self.browser['password'] = password - self.browser.submit() + form = self.get_form(name='auth') + form['username'] = username + form['password'] = password + form.submit() -It selects the form named **login**, fill fields and submit it. You can also simulate the request by hand with:: +Then, each method of your browser which needs the user to be authenticated should be decorated with :func:`need_login `:: - import urllib - class ExampleBrowser(BaseBrowser): - # ...
- def login(self): - if not self.is_on_page(LoginPage): - self.loaction('/login', no_login=True) + class ExampleBrowser(LoginBrowser): + accounts = URL('/accounts$', ListPage) - d = {'login': self.username, - 'password': self.password, - } - self.location('/', urllib.urlencode(d), no_login=True) + @need_login + def iter_accounts(self): + self.accounts.stay_or_go() + return self.page.get_accounts() + +The last thing to know is that :func:`need_login ` checks whether the current page is a logged one by +reading the :attr:`logged ` attribute of the page instance. You can either define it yourself, as a +class boolean attribute or as a property, or make your page class inherit from :class:`LoggedPage `. - if not self.is_logged(): - raise BrowserIncorrectPassword() Parsing of pages ----------------- +**************** -To parse pages in your classes derived from :class:`BasePage `, there are several tools and things to know. +.. note:: + Depending on the base class you use for your page, it will parse HTML, JSON, CSV, etc. In our case, we only deal with HTML documents. -Firstly, your object has these attributes: -* **browser** - your ``ExampleBrowser`` class -* **parser** - parser used to parse the HTML page (by default this is *lxml*) -* **document** - parsed document -* **url** - URL -* **logger** - context logger +When your browser goes to a page, an instance of the class related to the +:class:`URL ` attribute which matches the URL +is created. You can declare methods on your class to allow your browser to +interact with it.
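The URL-to-page dispatch, ``stay_or_go()`` and ``need_login`` behaviour described in this guide can be modelled in plain Python to make the flow easier to follow. This is a toy sketch for understanding only: ``Page``, ``URL``, ``ToyBrowser``, ``need_login``, ``LoginPage``, ``ListPage`` and ``ExampleBrowser`` below are hypothetical re-implementations, not weboob's browser2 code. No real HTTP request is made, and building the target URL by stripping a trailing ``$`` from the first pattern is an assumption of this sketch.

```python
import re
from functools import wraps
from urllib.parse import urljoin


class Page:
    logged = False  # not a "logged" page unless a subclass says so

    def __init__(self, browser, url):
        self.browser, self.url = browser, url


class URL:
    """Hypothetical stand-in for browser2's URL: regexps, then a page class."""

    def __init__(self, *args):
        self.patterns, self.klass = args[:-1], args[-1]
        self.browser = None  # set when a browser binds its URL attributes

    def match(self, url):
        # A bare path is resolved against the browser's BASEURL first.
        return any(re.match(urljoin(self.browser.BASEURL, p), url)
                   for p in self.patterns)

    def is_here(self):
        page = self.browser.page
        return page is not None and self.match(page.url)

    def go(self):
        # Assumption: the first pattern without its trailing '$' is the target.
        path = self.patterns[0].rstrip('$')
        return self.browser.location(urljoin(self.browser.BASEURL, path))

    def stay_or_go(self):
        # Only "request" the page when the browser is not already on it.
        return self.browser.page if self.is_here() else self.go()


class ToyBrowser:
    BASEURL = 'https://www.example.com'

    def __init__(self):
        self.page, self.visited, self.urls = None, [], []
        # Give this instance its own copy of each URL declared on the class.
        for name in dir(type(self)):
            attr = getattr(type(self), name, None)
            if isinstance(attr, URL):
                bound = URL(*attr.patterns, attr.klass)
                bound.browser = self
                setattr(self, name, bound)
                self.urls.append(bound)

    def location(self, url):
        # Simulated request: record it, then build the matching page object.
        self.visited.append(url)
        self.page = None
        for u in self.urls:
            if u.match(url):
                self.page = u.klass(self, url)
                break
        return self.page


def need_login(func):
    """Call do_login() first unless the current page is a logged one."""
    @wraps(func)
    def inner(browser, *args, **kwargs):
        if browser.page is None or not browser.page.logged:
            browser.do_login()
        return func(browser, *args, **kwargs)
    return inner


class LoginPage(Page):
    pass


class ListPage(Page):
    logged = True  # plays the role of inheriting from LoggedPage

    def get_accounts(self):
        return ['Checking', 'Savings']  # hypothetical parsed data


class ExampleBrowser(ToyBrowser):
    login = URL('/login$', LoginPage)
    accounts = URL('/accounts$', ListPage)

    def do_login(self):
        self.login.stay_or_go()  # a real module would submit a form here

    @need_login
    def iter_accounts(self):
        self.accounts.stay_or_go()
        return self.page.get_accounts()


browser = ExampleBrowser()
print(browser.iter_accounts())  # ['Checking', 'Savings']
```

The first call visits the login page and then the accounts page. A second call neither logs in again nor requests the list again, because the current page is logged and already matches ``accounts``.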
-To find an element, there are two methods: +The first thing to know is that your instance owns these attributes: -* **xpath** - xpath expressions -* **cssselect** - CSS selectors +* ``browser`` - your ``ExampleBrowser`` instance +* ``logger`` - context logger +* ``encoding`` - the encoding of the page +* ``response`` - the ``Response`` object from ``requests`` +* ``url`` - current URL +* ``doc`` - document parsed with ``lxml`` + +The most important attribute is ``doc``, which you will use to get information from the page. You can use two methods to find elements: + +* ``xpath`` - xpath expressions +* ``cssselect`` - CSS selectors For example:: - from weboob.capabilities.messages import Thread - class ListPage(BasePage): - def iter_threads_list(self): - for el in self.document.xpath('//ul[@id="list"]/li'): + from weboob.capabilities.bank import Account + + class ListPage(LoggedPage, HTMLPage): + def get_accounts(self): + for el in self.doc.xpath('//table[@id="list"]/tr'): id = el.attrib['id'] - thread = Thread(id) - thread.title = el.xpath('./h3').text - yield thread + account = Account(id) + account.label = el.xpath('./td[@class="name"]')[0].text + account.balance = Decimal(el.xpath('./td[@class="balance"]')[0].text) + yield account An alternative with ``cssselect``:: - from weboob.capabilities.messages import Thread - class ListPage(BasePage): - def iter_threads_list(self): + from weboob.capabilities.bank import Account + + class ListPage(LoggedPage, HTMLPage): + def get_accounts(self): - for el in self.document.getroot().cssselect('ul#list li'): + for el in self.doc.getroot().cssselect('table#list tr'): id = el.attrib['id'] - thread = Thread(id) - thread.title = el.find('h3').text - yield thread + account = Account(id) + account.label = el.cssselect('td.name')[0].text + account.balance = Decimal(el.cssselect('td.balance')[0].text) + yield account ..
note:: @@ -428,7 +404,7 @@ An alternative with ``cssselect``:: Your module is now functional and you can use this command:: - $ boobmsg -b example list + $ boobank -b example list Tests ***** @@ -436,20 +412,20 @@ Tests Every modules must have a tests suite to detect when there are changes on websites, or when a commit breaks the behavior of the module. -Create ``test.py`` and write it, for example:: +Edit ``test.py`` and write, for example:: # -*- coding: utf-8 -*- from weboob.tools.test import BackendTest - __all__ = ['DLFPTest'] + __all__ = ['ExampleTest'] class ExampleTest(BackendTest): BACKEND = 'example' - def test_iter_threads(self): - threads = list(self.backend.iter_threads()) + def test_iter_accounts(self): + accounts = list(self.backend.iter_accounts()) - self.assertTrue(len(threads) > 0) + self.assertTrue(len(accounts) > 0) To try running test of your module, launch:: @@ -476,27 +452,28 @@ uncompleted fields, and call the method associated to the type of the object. To define what objects are supported to be filled, and what method to call, define the ``OBJECTS`` class attribute in your ``ExampleBackend``:: - OBJECTS = {Thread: fill_thread} + class ExampleBackend(BaseBackend, CapVideo): + # ... + + OBJECTS = {Video: fill_video} The prototype of the function might be:: - def func(self, obj, fields) + func(self, obj, fields) Then, the function might, for each requested fields, fetch the right data and fill the object. For example:: - def fill_thread(self, thread, fields): - if 'root' in fields or \ - 'date' in fields: - return self.get_thread(thread) + class ExampleBackend(BaseBackend, CapVideo): + # ... 
- return thread + def fill_video(self, video, fields): + if 'url' in fields: + return self.get_video(video.id) -Here, when the application has got a :class:`Thread ` object with -:func:`iter_threads `, only two fields -are empty (set to ``NotLoaded``): + return video -* **root** - tree of messages in the thread -* **date** - date of thread +Here, when the application gets a :class:`Video ` object from +:func:`search_videos `, in most cases it only contains some metadata, not the direct link to the video media. -As our method :func:`get_thread ` will get all -of the missing data, we just call it with the object as parameter to complete it. +As our method :func:`get_video ` will get all +of the missing information, we just call it with the object's ID to complete it.
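For illustration, here is a plain-Python sketch of the dispatch described above: look up the object's type in ``OBJECTS``, keep the requested fields that are still ``NotLoaded``, and call the associated method. ``NotLoaded``, ``Video``, ``ExampleBackend.fillobj`` and the ``VIDEOS`` data below are hypothetical simplifications, not weboob's implementation. Note that ``OBJECTS`` refers to the ``fill_video`` function object, so it has to appear after that method in the class body.

```python
class _NotLoadedType:
    """Hypothetical sentinel for a field that has not been fetched yet."""

    def __repr__(self):
        return 'NotLoaded'


NotLoaded = _NotLoadedType()


class Video:
    def __init__(self, id, title=NotLoaded, url=NotLoaded):
        self.id = id
        self.title = title
        self.url = url


class ExampleBackend:
    # Hypothetical data standing in for what the website would return.
    VIDEOS = {'42': ('A title', 'https://www.example.com/42.mp4')}

    def get_video(self, id):
        title, url = self.VIDEOS[id]
        return Video(id, title, url)

    def fill_video(self, video, fields):
        if 'url' in fields:
            return self.get_video(video.id)
        return video

    # Must come after fill_video, since it refers to the function object.
    OBJECTS = {Video: fill_video}

    def fillobj(self, obj, fields):
        # Keep only the requested fields that are still NotLoaded.
        missing = [f for f in fields if getattr(obj, f, None) is NotLoaded]
        filler = self.OBJECTS.get(type(obj))
        if not missing or filler is None:
            return obj
        # fill_video is a plain function here, so pass self explicitly.
        return filler(self, obj, missing)


backend = ExampleBackend()
video = Video('42', title='A title')  # e.g. a search result without its URL
print(backend.fillobj(video, ['url']).url)  # https://www.example.com/42.mp4
```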