Liam Healy - May 15th, 2007
10:24 am - Parsing HTML and memory fault from cl-curl
I need to parse HTML in Lisp from time to time. The latest reason is some very specific, uncomplicated HTML that I scrape to get satellite data from a published database. A search online turns up XMLS as a likely candidate for this task. I have used it successfully in the past, but lately I find it won't parse its own example or its own supplied HTML documentation, to say nothing of the real HTML I want it to parse. It either returns NIL (meaning a parse error, even though the HTML is correct) or returns only the first line of the HTML. There is a thread on comp.lang.lisp about how to parse HTML, and many people there recommend cl-html-parse. I was dissuaded at first because the wiki comments imply that it has been superseded by pxmlutils, whose web page in turn implies that it has been superseded by... XMLS! But cl-html-parse works just fine on the web pages I need to scrape.
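For anyone who wants to try the same route, here is a minimal sketch of what parsing with cl-html-parse can look like. It assumes the system is loadable through Quicklisp (which is an assumption about your setup, not something this post relies on), and the sample HTML, `*sample-html*`, `element-tag`, and `collect-cells` are hypothetical names made up for illustration; the real page I scrape is not shown. The only library call used is `parse-html`, which takes a string or stream and returns a list of nested list forms.

```lisp
;; Assumption: Quicklisp is installed and can fetch cl-html-parse.
(ql:quickload "cl-html-parse")

;; Hypothetical stand-in for the page being scraped.
(defparameter *sample-html*
  "<html><body><table><tr><td>ISS</td><td>25544</td></tr></table></body></html>")

;; parse-html returns a list of top-level nodes. An element is a list whose
;; head is either a keyword tag, or a list (tag attr value ...) when the
;; element carries attributes; text appears as plain strings.
(defparameter *tree* (net.html.parser:parse-html *sample-html*))

(defun element-tag (node)
  "Return the keyword tag of NODE, or NIL if NODE is not an element."
  (cond ((keywordp node) node)
        ((and (consp node) (keywordp (first node))) (first node))
        ((and (consp node) (consp (first node))) (first (first node)))))

(defun collect-cells (node)
  "Return a list with one entry per :TD element under NODE; each entry is
the list of strings found directly inside that cell."
  (cond ((atom node) nil)
        ((eq (element-tag node) :td)
         (list (remove-if-not #'stringp (rest node))))
        (t (mapcan #'collect-cells (rest node)))))

;; The parse result is a list of nodes, so walk each top-level node in turn.
(print (mapcan #'collect-cells *tree*))
```

Because the result is ordinary list structure, pulling out the few cells of interest needs nothing more than a short recursive walk like this one, which is about all that simple, regular HTML calls for.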
So, success. But then, I am grabbing the web page with cl-curl, which works most of the time but gives a "memory fault" for one particular query, I think because that query returns a lot of data. And the author/maintainer of cl-curl is... me! D'oh. It would be nice to have cl-curl use CFFI instead of UFFI, maybe based on the pedagogical development given in the CFFI tutorial. My hope is that this would at least solve the problem. Any interest/volunteers/motivators?
