19 commits

Author | SHA1 | Message | Date
Romain Bignon | 1fa64bf5f1 | default parsers are now only lxml and lxmlsoup, to prevent bad behaviors with bad parsers | 2012-02-02 10:17:41 +01:00
Romain Bignon | 59dfe3083a | delete 'remove_html_tags' global function, and create IParser.tocleanstring and IParser.strip abstract methods. | 2011-10-25 13:28:43 +02:00
Romain Bignon | 2cc992a8bc | new parser 'json' | 2011-09-23 10:00:46 +02:00
Laurent Bachelier | 92fc86a033 | Add support for xpath in LxmlHtmlParser.select. The returned results are similar to those of the cssselect method so there wasn't much to do except calling it. | 2011-04-12 01:00:28 +02:00
Romain Bignon | 9afb301ebe | move select() in parser | 2011-04-09 11:25:13 +02:00
Romain Bignon | 7e2bb91b3b | change license to AGPLv3+ | 2011-04-08 12:48:07 +02:00
Christophe Benz | 2ab29ac070 | implement tostring for html5lib parser | 2010-11-16 15:30:13 +01:00
Romain Bignon | 6de583c4ca | Revert "do not strip cdata". This reverts commit 8bd0ebbea2. | 2010-11-09 13:39:43 +01:00
Nicolas Duhamel | 8bd0ebbea2 | do not strip cdata | 2010-11-09 12:03:35 +01:00
Christophe Benz | b4c672fa46 | new select() helper | 2010-07-14 17:14:53 +02:00
Christophe Benz | 470f2a9fe2 | use real comments for licence header | 2010-06-22 16:27:33 +02:00
Romain Bignon | 89c11ca4a0 | fix pyflakes errors | 2010-05-20 10:42:20 +02:00
Christophe Benz | a9c8c93965 | add new lxmlsoup parser | 2010-05-20 01:33:54 +02:00
Romain Bignon | fbf639993b | misc | 2010-05-01 14:41:09 +02:00
Romain Bignon | 77044dd4be | fix typo | 2010-04-20 21:11:14 +02:00
Romain Bignon | 3f9083df27 | documentation | 2010-04-16 20:11:36 +02:00
Christophe Benz | f8e2016d59 | get_parser returns class instead of object | 2010-04-16 19:41:06 +02:00
Romain Bignon | 384e3521c7 | factorization | 2010-04-16 18:44:55 +02:00
Christophe Benz | 8638024756 | rename parser/parsers module, add get_parsers() with preference_order | 2010-04-16 18:11:52 +02:00