Commit graph

21 commits

Author SHA1 Message Date
Laurent Bachelier
b8d1a52732 Use simplejson first, and centralize import
simplejson is supposed to be faster:
http://stackoverflow.com/questions/712791/json-and-simplejson-module-differences-in-python
2012-03-16 16:27:22 +01:00
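"Use simplejson first" usually means an import fallback like the sketch below. It prefers the optional third-party simplejson package and falls back to the standard-library json module (available since Python 2.6). Where the project actually centralizes this is not shown in the log. The assumption here is that it lives in one shared module, which the rest of the code imports json from.

```python
try:
    # Prefer the third-party simplejson package when it is installed.
    import simplejson as json
except ImportError:
    # Otherwise fall back to the standard-library module (Python 2.6+).
    import json

# Callers use the same names whichever module was picked.
config = json.loads('{"parsers": ["lxml", "lxmlsoup"]}')
```

Keeping the try/except in one place means a later change of preference only touches a single file.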
Laurent Bachelier
006e97a8be PEP8 style fixes and other small style fixes
I used autopep8 on some files and did carefully check the changes.
I ignored E501,E302,E231,E225,E222,E221,E241,E203 in my search, and at
least E501 on any autopep8 run.

Other style fixes not related to PEP8:
* Only use new-style classes. I don't think the usage of old-style
  classes was voluntary. Old-style classes are removed in Python 3.
* Convert an if/else to a one-liner in mediawiki, change the docstring
  style, and turn into a comment some text that wasn't really appropriate
  for a docstring.
* Remove an unneeded first if condition in meteofrance
2012-03-14 04:51:46 +01:00
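The two non-PEP8 fixes above can be illustrated with a small class. `Page` and its methods are hypothetical, not code from the repository.

- On Python 2, the explicit `object` base makes the class new-style, which is the only kind of class Python 3 has.
- The `label` method shows an if/else collapsed into a conditional expression.

```python
class Page(object):
    """Hypothetical class: the explicit object base makes it new-style on Python 2."""

    def __init__(self, title):
        self.title = title

    def label(self):
        # A one-line conditional expression replacing a four-line if/else block.
        return self.title if self.title else "untitled"
```

On Python 3, `class Page:` and `class Page(object):` are equivalent. Writing the base explicitly only matters for code that must also run on Python 2.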
Romain Bignon
1fa64bf5f1 default parsers are now only lxml and lxmlsoup, to prevent bad behaviors with bad parsers
2012-02-02 10:17:41 +01:00
Romain Bignon
59dfe3083a delete 'remove_html_tags' global function, and create IParser.tocleanstring and IParser.strip abstract methods.
2011-10-25 13:28:43 +02:00
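Turning a global helper into abstract methods forces every parser backend to provide its own version. Below is a sketch using the standard `abc` module. Only the names `IParser`, `tocleanstring` and `strip` come from the commit message. The signatures, docstrings and the `StdlibParser` backend (built on the standard library's `html.parser` rather than lxml) are assumptions for illustration.

```python
import abc
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class IParser(abc.ABC):
    """Hypothetical reconstruction of the interface's two abstract methods."""

    @abc.abstractmethod
    def tocleanstring(self, element):
        """Return element's text with tags removed and whitespace collapsed."""

    @abc.abstractmethod
    def strip(self, data):
        """Return the HTML string data with all tags removed."""


class _TextCollector(HTMLParser):
    """Accumulates the text nodes seen while parsing."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class StdlibParser(IParser):
    """Toy concrete backend using the standard library instead of lxml."""

    def tocleanstring(self, element):
        # Join all text nodes (including tails), then collapse whitespace runs.
        text = "".join(element.itertext())
        return re.sub(r"\s+", " ", text).strip()

    def strip(self, data):
        collector = _TextCollector()
        collector.feed(data)
        collector.close()  # flush any text still buffered by the parser
        return "".join(collector.parts)


el = ET.fromstring("<p>Hello <b>big</b>\n  world</p>")
print(StdlibParser().tocleanstring(el))  # Hello big world
```

Because `IParser` declares both methods abstract, a backend that forgets one of them fails at instantiation time instead of at the first call.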
Romain Bignon
2cc992a8bc new parser 'json'
2011-09-23 10:00:46 +02:00
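A JSON "parser" can sit alongside the HTML ones by exposing the same kind of `parse` and `tostring` entry points, but returning decoded Python objects instead of an element tree. In this sketch, the class name `JsonParser` is hypothetical, and so are the method signatures and the assumption that input arrives as a file-like object.

```python
import io
import json


class JsonParser(object):
    """Hypothetical parser that decodes JSON instead of building an HTML tree."""

    def parse(self, data, encoding="utf-8"):
        # data is assumed to be a file-like object yielding bytes or text.
        raw = data.read()
        if isinstance(raw, bytes):
            raw = raw.decode(encoding)
        return json.loads(raw)

    def tostring(self, element):
        # Serialize a decoded value back to JSON text.
        return json.dumps(element)


doc = JsonParser().parse(io.BytesIO(b'{"parsers": ["lxml", "lxmlsoup"]}'))
```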
Laurent Bachelier
92fc86a033 Add support for xpath in LxmlHtmlParser.select
The returned results are similar to those of the cssselect method
so there wasn't much to do except calling it.
2011-04-12 01:00:28 +02:00
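lxml elements provide both a `cssselect()` method and an `xpath()` method, so supporting XPath in a select helper can amount to a dispatch on the selection method. The `select` signature and its `method` parameter below are assumptions, not the project's confirmed API. Note that `cssselect()` needs the separate cssselect package.

```python
import lxml.html


def select(element, selector, method="cssselect"):
    """Hypothetical helper: return the matches for selector under element."""
    if method == "cssselect":
        # Requires the cssselect package in addition to lxml.
        return element.cssselect(selector)
    if method == "xpath":
        return element.xpath(selector)
    raise ValueError("unknown selection method: %r" % (method,))


doc = lxml.html.fromstring("<div><p class='x'>a</p><p>b</p></div>")
paragraphs = select(doc, ".//p", method="xpath")
```

One difference remains: an XPath expression such as `.//p/text()` or `count(.//p)` returns strings or numbers rather than elements, whereas CSS selectors always yield elements.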
Romain Bignon
9afb301ebe move select() in parser
2011-04-09 11:25:13 +02:00
Romain Bignon
7e2bb91b3b change license to AGPLv3+
2011-04-08 12:48:07 +02:00
Christophe Benz
2ab29ac070 implement tostring for html5lib parser
2010-11-16 15:30:13 +01:00
Romain Bignon
6de583c4ca Revert "do not strip cdata"
This reverts commit 8bd0ebbea2.
2010-11-09 13:39:43 +01:00
Nicolas Duhamel
8bd0ebbea2 do not strip cdata
2010-11-09 12:03:35 +01:00
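The log does not show which code 8bd0ebbea2 and its revert touched. This section concerns lxml parsers, though, and lxml exposes CDATA handling through the `strip_cdata` parser option. Assuming that is the behaviour being toggled, the difference looks like this: the text content is identical, and only serialization changes.

```python
from lxml import etree

XML = b"<root><![CDATA[a < b]]></root>"

# Default (strip_cdata=True): the CDATA section becomes ordinary text.
stripped = etree.fromstring(XML, etree.XMLParser(strip_cdata=True))

# strip_cdata=False keeps the section, so serialization reproduces it.
kept = etree.fromstring(XML, etree.XMLParser(strip_cdata=False))

print(etree.tostring(stripped))  # b'<root>a &lt; b</root>'
print(etree.tostring(kept))      # b'<root><![CDATA[a < b]]></root>'
```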
Christophe Benz
b4c672fa46 new select() helper
2010-07-14 17:14:53 +02:00
Christophe Benz
470f2a9fe2 use real comments for licence header
2010-06-22 16:27:33 +02:00
Romain Bignon
89c11ca4a0 fix pyflakes errors
2010-05-20 10:42:20 +02:00
Christophe Benz
a9c8c93965 add new lxmlsoup parser
2010-05-20 01:33:54 +02:00
Romain Bignon
fbf639993b misc
2010-05-01 14:41:09 +02:00
Romain Bignon
77044dd4be fix typo
2010-04-20 21:11:14 +02:00
Romain Bignon
3f9083df27 documentation
2010-04-16 20:11:36 +02:00
Christophe Benz
f8e2016d59 get_parser returns class instead of object
2010-04-16 19:41:06 +02:00
Romain Bignon
384e3521c7 factorization
2010-04-16 18:44:55 +02:00
Christophe Benz
8638024756 rename parser/parsers module, add get_parsers() with preference_order
2010-04-16 18:11:52 +02:00
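Together, the preference_order commit, the "returns class instead of object" change and the later narrowing of defaults to lxml and lxmlsoup suggest a selection function that tries parsers in order and returns the first importable class. The sketch below makes several assumptions:

- The `PARSERS` registry, its module paths and `ParserNotFound` are hypothetical.
- The `hypothetical_parsers.*` entries are placeholders that will not import.
- The `stdlib` entry exists only so the example runs without extra packages.

```python
import importlib


class ParserNotFound(Exception):
    """Raised when no parser in the preference order can be imported."""


# Hypothetical registry: parser name -> (module path, class name).
PARSERS = {
    "lxml": ("hypothetical_parsers.lxmlparser", "LxmlHtmlParser"),
    "lxmlsoup": ("hypothetical_parsers.lxmlsoupparser", "LxmlSoupParser"),
    "stdlib": ("html.parser", "HTMLParser"),
}


def get_parser(preference_order=("lxml", "lxmlsoup")):
    """Return the first importable parser *class* in preference_order."""
    for name in preference_order:
        module_path, class_name = PARSERS[name]  # unknown names raise KeyError
        try:
            # Import lazily: a missing optional dependency only skips this entry.
            module = importlib.import_module(module_path)
        except ImportError:
            continue
        return getattr(module, class_name)
    raise ParserNotFound("no parser available among %r" % (preference_order,))


# The caller instantiates the returned class itself.
parser_cls = get_parser(("lxml", "stdlib"))
parser = parser_cls()
```

Returning the class rather than an instance leaves construction, and any constructor arguments, to the caller. It also lets the caller compare or subclass the chosen backend without creating an object first.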