Write a new module
==================

This guide explains how to write a new module for `Weboob <http://weboob.org>`_.

Before reading it, you should :doc:`set up your development environment </guides/setup>`.

What is a module
****************

A module is an interface between a website and Weboob. It consists of Python code which is stored
in repositories.

Weboob applications need *backends* to interact with websites. A *backend* is an instance of a *module*, usually
with several parameters like your username, password, or other options. You can create multiple *backends*
for a single *module*.

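The relationship between a module and its backends can be pictured with a short standalone sketch. It does not import Weboob: ``SketchModule`` and the backend names are hypothetical stand-ins, used only to show that one module class can yield several independently configured backends.

```python
class SketchModule:
    """Hypothetical stand-in for a module: code shared by every backend."""
    NAME = 'example'

    def __init__(self, backend_name, config):
        # Each backend keeps its own name and its own parameters.
        self.backend_name = backend_name
        self.config = dict(config)


# Two backends created from the same module, with different credentials.
personal = SketchModule('example_personal', {'username': 'alice'})
business = SketchModule('example_business', {'username': 'acme'})

print(personal.NAME == business.NAME)
print(personal.config['username'], business.config['username'])
```

Both objects share the module code, but each one carries its own configuration.
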
Select capabilities
*******************

Each module implements one or more :doc:`capabilities </api/capabilities/index>` to tell what kind of features the
website provides. A capability is a class derived from :class:`weboob.capabilities.base.Capability` with some abstract
methods (which raise ``NotImplementedError``).

A capability needs to be as generic as possible to allow a maximum number of modules to implement it.
However, if you really need to handle website specificities, you can create more specific sub-capabilities.

For example, there is the :class:`CapMessages <weboob.capabilities.messages.CapMessages>` capability, with the associated
:class:`CapMessagesPost <weboob.capabilities.messages.CapMessagesPost>` capability to allow answers to messages.

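As an illustration of the idea, here is a minimal standalone sketch of capabilities as classes with abstract methods. It does not use Weboob: every class name below is a hypothetical stand-in, and the method names are assumptions chosen for the example.

```python
class SketchCapability:
    """Hypothetical stand-in for the capability base class."""


class SketchCapMessages(SketchCapability):
    def iter_messages(self):
        # Abstract method: every implementing module must override it.
        raise NotImplementedError()


class SketchCapMessagesPost(SketchCapability):
    def post_message(self, text):
        # Abstract method of the more specific, associated capability.
        raise NotImplementedError()


class ReadOnlyModule(SketchCapMessages):
    def iter_messages(self):
        return iter(['hello'])


class ForumModule(SketchCapMessages, SketchCapMessagesPost):
    def __init__(self):
        self.sent = []

    def iter_messages(self):
        return iter(['hello'])

    def post_message(self, text):
        self.sent.append(text)


forum = ForumModule()
forum.post_message('hi')
print(isinstance(forum, SketchCapMessagesPost))
print(isinstance(ReadOnlyModule(), SketchCapMessagesPost))
```

An application can then check which capabilities an object provides before calling the corresponding methods.
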
Pick an existing capability
---------------------------

When you want to create a new module, have a look at the existing capabilities to decide which one can be
implemented. This is quite important, because each existing capability is supported by at least one application.
So if your new module implements an existing capability, it will immediately be usable from the existing applications.

Create a new capability
-----------------------

If the website you want to manage provides some extra features which are not covered by any capability,
you can introduce a new capability.

You should read the related guide to know :doc:`how to create a capability </guides/capability>`.

The module tree
***************

Create a new directory in ``modules/`` with the name of your module. In this example, we assume that we want to create a
module for a bank website whose URL is http://www.example.com. So we will call our module **example**, and the selected
capability is :class:`CapBank <weboob.capabilities.bank.CapBank>`.

It is recommended to use the helper tool ``tools/boilerplate.py`` to build your
module tree. There are several templates available:

* **base** - create only base files
* **comic** - create a comic module
* **cap** - create a module for a given capability

For example, use this command::

    $ tools/boilerplate.py cap example CapBank

A module directory commonly contains these files:

* **__init__.py** - needed in every Python package, it exports your :class:`Module <weboob.tools.backend.Module>` class.
* **module.py** - defines the main class of your module, which derives from :class:`Module <weboob.tools.backend.Module>`.
* **browser.py** - your browser, derived from :class:`Browser <weboob.browser.browsers.Browser>`, is called by your module to interact with the supported website.
* **pages.py** - all the website pages handled by the browser are defined here
* **test.py** - functional tests
* **favicon.png** - a 64x64 transparent PNG icon

Update modules list
-------------------

As you are in development mode, to see your new module in ``weboob-config``'s list, you have to update ``modules/modules.list`` with this command::

    $ weboob-config update

To make sure your module has been correctly added, use this command::

    $ weboob-config info example
    .------------------------------------------------------------------------------.
    | Module example                                                               |
    +-----------------.------------------------------------------------------------'
    | Version         | 201405191420
    | Maintainer      | John Smith <john.smith@example.com>
    | License         | AGPLv3+
    | Description     | Example bank website
    | Capabilities    | CapBank, CapCollection
    | Installed       | yes
    | Location        | /home/me/src/weboob/modules/example
    '-----------------'

If the last command does not work, check your :doc:`repositories setup </guides/setup>`.

Module class
*************

Edit ``module.py``. It contains the main class of the module, derived from the :class:`Module <weboob.tools.backend.Module>` class::

    from weboob.capabilities.bank import CapBank
    from weboob.tools.backend import Module


    class ExampleModule(Module, CapBank):
        NAME = 'example'                         # The name of module
        DESCRIPTION = u'Example bank website'    # Description of your module
        MAINTAINER = u'John Smith'               # Name of maintainer of this module
        EMAIL = 'john.smith@example.com'         # Email address of the maintainer
        LICENSE = 'AGPLv3+'                      # License of your module
        VERSION = '0.i'                          # Version of weboob

In the code above, you can see that your ``ExampleModule`` inherits from :class:`CapBank <weboob.capabilities.bank.CapBank>`, as
we have selected it for the supported website.

Configuration
-------------

When a module is instantiated as a backend, you probably want to ask the user for some parameters. This is managed by the ``CONFIG`` class
attribute. It supports key/value pairs with default values and some other parameters. The :class:`Value <weboob.tools.value.Value>`
class is used to define a value.

Available parameters of :class:`Value <weboob.tools.value.Value>` are:

* **label** - human-readable description of a value
* **required** - if ``True``, the backend can't be loaded if the key isn't found in its configuration
* **default** - an optional default value, used when the key is not in the configuration. If there is no default value and the key
  is not found in the configuration, the **required** parameter is implicitly set
* **masked** - if ``True``, the value is masked. It is useful for applications to know if this key is a password
* **regexp** - if specified, the value is checked against this regexp on load, and an error is raised if it doesn't match
* **choices** - if this parameter is set, the value must be in the list

.. note::

    There is a special class, :class:`ValueBackendPassword <weboob.tools.value.ValueBackendPassword>`, which is used to manage
    private parameters of the config (like passwords or sensitive information).

For example::

    from weboob.tools.value import Value, ValueBool, ValueInt, ValueBackendPassword
    from weboob.tools.backend import BackendConfig

    # ...
    class ExampleModule(Module, CapBank):
        # ...
        CONFIG = BackendConfig(Value('username', label='Username', regexp='.+'),
                               ValueBackendPassword('password', label='Password'),
                               ValueBool('get_news', label='Get newspapers', default=True),
                               # The default must be one of the keys of ``choices``.
                               Value('choice', label='Choices', choices={'value1': 'Label 1',
                                                                         'value2': 'Label 2'}, default='value1'),
                               # Raw string, so that ``\d`` reaches the regexp engine untouched.
                               Value('regexp', label='Birthday', regexp=r'^\d+/\d+/\d+$'),
                               ValueInt('integer', label='A number', required=True))

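To make the rules listed above concrete, here is a toy re-implementation of the **default**, **regexp** and **choices** checks. It is only a sketch under the assumptions stated in this guide, not Weboob's actual code: ``SketchValue`` and its ``load`` method are hypothetical names, and the error type is an arbitrary choice.

```python
import re

_NO_DEFAULT = object()


class SketchValue:
    """Toy version of the checks described above (not Weboob's code)."""

    def __init__(self, name, default=_NO_DEFAULT, regexp=None, choices=None):
        self.name = name
        self.default = default
        self.regexp = regexp
        self.choices = choices

    def load(self, config):
        if self.name not in config:
            if self.default is _NO_DEFAULT:
                # No default value: the key is implicitly required.
                raise ValueError('missing required key: %s' % self.name)
            return self.default
        value = config[self.name]
        if self.regexp is not None and not re.match(self.regexp, value):
            raise ValueError('%s does not match %s' % (self.name, self.regexp))
        if self.choices is not None and value not in self.choices:
            raise ValueError('%s is not an allowed choice' % self.name)
        return value


birthday = SketchValue('birthday', regexp=r'^\d+/\d+/\d+$')
choice = SketchValue('choice', default='value1',
                     choices={'value1': 'Label 1', 'value2': 'Label 2'})

print(birthday.load({'birthday': '01/02/1990'}))
print(choice.load({}))
```

Loading ``birthday`` from an empty configuration, or with a value such as ``'yesterday'``, raises an error in this sketch.
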
Implement capabilities
----------------------

You need to implement every method of each capability your module implements. For example, in our case::

    # ...
    class ExampleModule(Module, CapBank):
        # ...

        def iter_accounts(self):
            raise NotImplementedError()

        def get_account(self, id):
            raise NotImplementedError()

        def iter_history(self, account):
            raise NotImplementedError()

        def iter_coming(self, account):
            raise NotImplementedError()

If you ran the ``boilerplate`` script with the ``cap`` template, all the methods are already in ``module.py`` and documented.

Read the :class:`documentation of the capability <weboob.capabilities.bank.CapBank>` to know the types of the arguments,
the expected returned objects, and the exceptions it may raise.

Browser
*******

Most modules use a class derived from :class:`PagesBrowser <weboob.browser.browsers.PagesBrowser>` or
:class:`LoginBrowser <weboob.browser.browsers.LoginBrowser>` (for authenticated websites) to interact with a website.

Edit ``browser.py``::

    # -*- coding: utf-8 -*-

    from weboob.browser import PagesBrowser

    __all__ = ['ExampleBrowser']

    class ExampleBrowser(PagesBrowser):
        BASEURL = 'https://www.example.com'

There are several possible class attributes:

* **BASEURL** - base URL of the website, used for absolute paths given to :func:`open() <weboob.browser.browsers.PagesBrowser.open>` or :func:`location() <weboob.browser.browsers.PagesBrowser.location>`
* **PROFILE** - defines the behavior of your browser against the website. By default this is Firefox, but you can import other profiles
* **TIMEOUT** - defines the timeout for requests (defaults to 10 seconds)
* **VERIFY** - SSL verification (if the protocol used is **https**)

Pages
-----

For each page you want to handle, you have to create an associated class derived from one of these classes:

* :class:`HTMLPage <weboob.browser.pages.HTMLPage>` - an HTML page
* :class:`XMLPage <weboob.browser.pages.XMLPage>` - an XML document
* :class:`JsonPage <weboob.browser.pages.JsonPage>` - a JSON object
* :class:`CsvPage <weboob.browser.pages.CsvPage>` - a CSV table

In the file ``pages.py``, you can write, for example::

    # -*- coding: utf-8 -*-

    from weboob.browser.pages import HTMLPage

    __all__ = ['IndexPage', 'ListPage']

    class IndexPage(HTMLPage):
        pass

    class ListPage(HTMLPage):
        def iter_accounts(self):
            return iter([])

``IndexPage`` is the class we will use to get information from the home page of the website, and ``ListPage`` will handle pages
which list accounts.

Then, you have to declare them in your browser, with the :class:`URL <weboob.browser.url.URL>` object::

    from weboob.browser import PagesBrowser, URL
    from .pages import IndexPage, ListPage

    # ...
    class ExampleBrowser(PagesBrowser):
        # ...

        home = URL('/$', IndexPage)
        accounts = URL('/accounts$', ListPage)

Easy, isn't it? The first parameters are regexps of the URLs (if you give only a path, it uses the ``BASEURL`` class attribute), and the last one is the class used to handle the response.

Each time you go to the home page, ``IndexPage`` is instantiated and set as the ``page`` attribute.

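The matching between a URL and a page class can be pictured with the following standalone sketch. It relies only on the standard ``re`` module and is not how Weboob implements it: ``ROUTES`` and ``page_class_for`` are hypothetical names, and the page classes are empty placeholders.

```python
import re

BASEURL = 'https://www.example.com'


class IndexPage:
    pass


class ListPage:
    pass


# (path regexp, page class) pairs, mirroring the URL declarations above.
ROUTES = [
    (r'/$', IndexPage),
    (r'/accounts$', ListPage),
]


def page_class_for(url):
    """Return the page class whose pattern matches ``url``, or None."""
    for pattern, klass in ROUTES:
        # A pattern which is only a path is anchored on BASEURL.
        if re.match(re.escape(BASEURL) + pattern, url):
            return klass
    return None


print(page_class_for('https://www.example.com/accounts').__name__)
print(page_class_for('https://www.example.com/unknown'))
```

A URL which matches no pattern yields no page class, which is why each page you want to handle needs its own declaration.
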
For example, we can now implement some methods in ``ExampleBrowser``::

    class ExampleBrowser(PagesBrowser):
        # ...
        def go_home(self):
            self.home.go()

            assert self.home.is_here()

        def iter_accounts_list(self):
            self.accounts.stay_or_go()

            return self.page.iter_accounts()

The :func:`go() <weboob.browser.url.URL.go>` method reads the first URL regexp of our :class:`URL <weboob.browser.url.URL>` object and goes to the page.

:func:`stay_or_go() <weboob.browser.url.URL.stay_or_go>` is used when you want to go to the page only if you aren't already on it.

Once we are on the ``ListPage``, we can call any method of the ``page`` object.

Use it in backend
-----------------

Now that you have a functional browser, you can use it in your ``ExampleModule`` class by defining the ``BROWSER`` attribute::

    from .browser import ExampleBrowser

    # ...
    class ExampleModule(Module, CapBank):
        # ...
        BROWSER = ExampleBrowser

You can now access it through the ``browser`` member. The class is instantiated the first time this attribute is accessed.

For example, we can now implement :func:`CapBank.iter_accounts <weboob.capabilities.bank.CapBank.iter_accounts>`::

    def iter_accounts(self):
        return self.browser.iter_accounts_list()

For this method, we simply call ``ExampleBrowser.iter_accounts_list``, as there is nothing else to do.

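The "instantiated on first access" behavior can be sketched with a plain Python property. This is only an illustration of the idea and not Weboob's implementation: ``SketchBrowser``, ``SketchModule`` and the ``instances`` counter are hypothetical.

```python
class SketchBrowser:
    instances = 0

    def __init__(self):
        SketchBrowser.instances += 1


class SketchModule:
    BROWSER = SketchBrowser

    def __init__(self):
        self._browser = None

    @property
    def browser(self):
        # Build the browser only when it is first needed, then reuse it.
        if self._browser is None:
            self._browser = self.BROWSER()
        return self._browser


module = SketchModule()
created_before_access = SketchBrowser.instances
first = module.browser
second = module.browser
print(created_before_access, SketchBrowser.instances, first is second)
```

Creating the module does not create a browser; the first access does, and later accesses return the same object.
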
Login management
----------------

When the website requires authentication, you have to give credentials to the constructor of the browser. You can redefine
the method :func:`create_default_browser <weboob.tools.backend.Module.create_default_browser>`::

    class ExampleModule(Module, CapBank):
        # ...
        def create_default_browser(self):
            return self.create_browser(self.config['username'].get(), self.config['password'].get())

On the browser side, you need to inherit from :class:`LoginBrowser <weboob.browser.browsers.LoginBrowser>` and implement the method
:func:`do_login <weboob.browser.browsers.LoginBrowser.do_login>`::

    from weboob.browser import LoginBrowser, URL
    from weboob.exceptions import BrowserIncorrectPassword

    class ExampleBrowser(LoginBrowser):
        login = URL('/login', LoginPage)
        # ... (``login_error`` is assumed to be another URL attribute declared here)

        def do_login(self):
            self.login.stay_or_go()

            self.page.login(self.username, self.password)

            if self.login_error.is_here():
                raise BrowserIncorrectPassword(self.page.get_error())

Also, your ``LoginPage`` may look like::

    class LoginPage(HTMLPage):
        def login(self, username, password):
            form = self.get_form(name='auth')
            form['username'] = username
            form['password'] = password
            form.submit()

Then, each method of your browser which needs the user to be authenticated may be decorated with :func:`need_login <weboob.browser.browsers.need_login>`::

    from weboob.browser import LoginBrowser, URL, need_login

    class ExampleBrowser(LoginBrowser):
        accounts = URL('/accounts$', ListPage)

        @need_login
        def iter_accounts_list(self):
            self.accounts.stay_or_go()
            return self.page.get_accounts()

The last thing to know is that :func:`need_login <weboob.browser.browsers.need_login>` checks if the current page is a logged one by
reading the attribute :func:`logged <weboob.browser.pages.Page.logged>` of the instance. You can either define it yourself, as a
class boolean attribute or as a property, or inherit your class from :class:`LoggedPage <weboob.browser.pages.LoggedPage>`.

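The check described above can be pictured with a toy decorator. This is a standalone sketch and not Weboob's ``need_login``: ``sketch_need_login``, the page classes and the ``login_count`` counter are hypothetical names introduced for the example.

```python
from functools import wraps


def sketch_need_login(func):
    """Toy decorator: log in first when the current page is not a logged one."""
    @wraps(func)
    def wrapper(browser, *args, **kwargs):
        page = browser.page
        if page is None or not getattr(page, 'logged', False):
            browser.do_login()
        return func(browser, *args, **kwargs)
    return wrapper


class AnonymousPage:
    logged = False


class AccountsPage:
    logged = True   # plays the role of a page marked as logged


class SketchBrowser:
    def __init__(self):
        self.page = AnonymousPage()
        self.login_count = 0

    def do_login(self):
        self.login_count += 1
        self.page = AccountsPage()

    @sketch_need_login
    def iter_accounts_list(self):
        return iter(['checking', 'savings'])


browser = SketchBrowser()
first_call = list(browser.iter_accounts_list())
second_call = list(browser.iter_accounts_list())
print(browser.login_count)
```

In this sketch the login happens once: the second call finds a page whose ``logged`` attribute is true and skips ``do_login``.
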
Parsing of pages
****************

.. note::
    Depending on the base class you use for your page, it will parse HTML, JSON, CSV, etc. In our case, it will be only HTML documents.


When your browser lands on a page, an instance of the class related to the
:class:`URL <weboob.browser.url.URL>` attribute which matches the URL
is created. You can declare methods on your class to allow your browser to
interact with it.

The first thing to know is that page parsing is done in a descriptive way. You
don't have to loop over HTML elements to construct the object. Just describe how
to get the correct data to construct it. It is the job of the Browser class to actually
construct the object.

For example::

    from weboob.browser.elements import ItemElement, ListElement, method
    from weboob.browser.filters.html import Attr
    from weboob.browser.filters.standard import CleanDecimal, CleanText
    from weboob.browser.pages import HTMLPage, LoggedPage
    from weboob.capabilities.bank import Account

    class ListPage(LoggedPage, HTMLPage):
        @method
        class get_accounts(ListElement):
            item_xpath = '//ul[@id="list"]/li'

            class item(ItemElement):
                klass = Account   # the class itself, not an instance

                obj_id = Attr('.', 'id')
                obj_label = CleanText('./td[@class="name"]')
                obj_balance = CleanDecimal('./td[@class="balance"]')

As you can see, we first set ``item_xpath``, which is the XPath string used to
iterate over the elements to access data. Next, we define ``klass``, which
is the real class of our object. Finally, we describe how to fill each attribute
of the object using what we call filters.

Some examples of filters:

* :class:`Attr <weboob.browser.filters.html.Attr>`: extract a tag attribute
* :class:`CleanText <weboob.browser.filters.standard.CleanText>`: get a cleaned text from an element
* :class:`CleanDecimal <weboob.browser.filters.standard.CleanDecimal>`: get a cleaned Decimal value from an element
* :class:`Date <weboob.browser.filters.standard.Date>`: read common date formats
* :class:`Link <weboob.browser.filters.html.Link>`: get the link URI of an element
* :class:`Regexp <weboob.browser.filters.standard.Regexp>`: apply a regex
* :class:`Time <weboob.browser.filters.standard.Time>`: read common time formats
* :class:`Type <weboob.browser.filters.standard.Type>`: get a cleaned value of any type from an element text

Filters can be combined. For example::

    obj_id = Link('./a[1]') & Regexp(r'id=(\d+)') & Type(type=int)

This code does several things, in order:

#) extract the ``href`` attribute of the first ``a`` child tag of our item
#) apply a regex to extract a value
#) convert this value to the ``int`` type

.. note::

    All object IDs must be unique, and useful to get more information later.

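The chaining performed by ``&`` can be pictured with a toy filter class in which the result of the left-hand filter becomes the input of the right-hand one. This is a standalone sketch and not Weboob's implementation: ``SketchFilter``, ``sketch_regexp`` and ``sketch_type`` are hypothetical names, and the link step is replaced by a function that receives the ``href`` string directly.

```python
import re


class SketchFilter:
    """Toy filter: wraps a function and supports chaining with ``&``."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __and__(self, other):
        # The result of the left filter becomes the input of the right one.
        return SketchFilter(lambda value: other(self(value)))


def sketch_regexp(pattern):
    return SketchFilter(lambda text: re.search(pattern, text).group(1))


def sketch_type(type_):
    return SketchFilter(type_)


# Stand-in for the link extraction: the "element" is already the href string.
link = SketchFilter(lambda href: href)

extract_id = link & sketch_regexp(r'id=(\d+)') & sketch_type(int)
print(extract_id('/account?id=42'))
```

Each step only needs to know how to transform its input, which keeps individual filters small and reusable.
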
Your module is now functional and you can use this command::

    $ boobank -b example list

Tests
*****

Every module must have a test suite to detect changes on websites, or commits that
break the behavior of the module.

Edit ``test.py`` and write, for example::

    # -*- coding: utf-8 -*-
    from weboob.tools.test import BackendTest

    __all__ = ['ExampleTest']

    class ExampleTest(BackendTest):
        MODULE = 'example'

        def test_iter_accounts(self):
            accounts = list(self.backend.iter_accounts())

            self.assertTrue(len(accounts) > 0)

To run the tests of your module, launch::

    $ tools/run_tests.sh example

For more information, look at the :doc:`tests` guide.

Advanced topics
***************

Filling objects
---------------

An object returned by a capability method may be only partially filled.

The class :class:`Module <weboob.tools.backend.Module>` provides a method named
:func:`fillobj <weboob.tools.backend.Module.fillobj>`, which can be called by an application to
fill some unloaded fields of a specific object, for example with::

    backend.fillobj(video, ['url', 'author'])

The ``fillobj`` method checks which of the fields given in the list are
not loaded (equal to ``NotLoaded``, which is the default value), reduces the list to the fields
that are actually incomplete, and calls the method associated with the type of the object.

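The reduction of the field list can be sketched as follows. This standalone example does not use Weboob: the ``NotLoaded`` marker, ``SketchVideo`` and ``missing_fields`` are hypothetical stand-ins that only illustrate the filtering step described above.

```python
class _NotLoadedType:
    def __repr__(self):
        return 'NotLoaded'


NotLoaded = _NotLoadedType()   # stand-in for the NotLoaded marker


class SketchVideo:
    def __init__(self, id, title=NotLoaded, url=NotLoaded, author=NotLoaded):
        self.id = id
        self.title = title
        self.url = url
        self.author = author


def missing_fields(obj, fields):
    """Keep only the requested fields which are still NotLoaded."""
    return [name for name in fields if getattr(obj, name) is NotLoaded]


video = SketchVideo('abc', title='Demo', author='John Smith')
print(missing_fields(video, ['url', 'author']))
```

Here ``author`` is already known, so only ``url`` would be passed on to the method in charge of filling the object.
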
To define which objects can be filled, and which method to call, define the ``OBJECTS``
class attribute in your ``ExampleModule``::

    class ExampleModule(Module, CapVideo):
        # ... (``fill_video`` must be defined above this line in the class body)

        OBJECTS = {Video: fill_video}

The prototype of the function should be::

    func(self, obj, fields)

Then, for each requested field, the function should fetch the right data and fill the object. For example::

    class ExampleModule(Module, CapVideo):
        # ...

        def fill_video(self, video, fields):
            if 'url' in fields:
                # We are already inside the module, so ``get_video`` is called on ``self``.
                return self.get_video(video.id)

            return video

Here, when the application has got a :class:`Video <weboob.capabilities.video.BaseVideo>` object with
:func:`search_videos <weboob.capabilities.video.CapVideo.search_videos>`, in most cases it contains only some metadata, but not the direct link to the video media.

As our method :func:`get_video <weboob.capabilities.video.CapVideo.get_video>` fetches all
of the missing data, we simply call it with the ID of the object to complete it.

Storage
-------

The application can provide a storage to let your backend store data. To use it, define the structure of your storage space::

    STORAGE = {'seen': {}}

To store and read data in your storage space, use the ``storage`` attribute of your :class:`Module <weboob.tools.backend.Module>`
object.

It implements the methods of :class:`BackendStorage <weboob.tools.backend.BackendStorage>`.