I've written a script to parse the name and price of certain items from Craigslist. The XPath expressions defined in my scraper are working ones. When I scrape the items in the usual way and wrap the lookups in a try/except block, I can avoid the IndexError that occurs when a price is missing. I also tried a custom function, and that worked as well.
However, in the snippet below I would like to use a lambda function to get rid of the IndexError. I tried but could not succeed.
By the way, when I run the code, it neither fetches anything nor throws an error.
import requests
from lxml.html import fromstring

page = requests.get('http://bangalore.craigslist.co.in/search/rea?s=120').text
tree = fromstring(page)

# I wish to fix this function to make a go
get_val = lambda item,path:item.text if item.xpath(path) else ""

for item in tree.xpath('//li[@class="result-row"]'):
    link = get_val(item,'.//a[contains(@class,"hdrlnk")]')
    price = get_val(item,'.//span[@class="result-price"]')
    print(link,price)
1 Answer
First of all, your lambda function get_val returns the text of the item itself if the path matches, not the text of the matched node. This is probably not what you want. If you want to return the text content of the (first) element matching the path, you should write:
get_val = lambda item, path: item.xpath(path)[0].text if item.xpath(path) else ""
Please note that xpath returns a list. I assume here that there is only one element in that list.
The output is something like this:
...
Residential Plot @ Sarjapur Check Post ₨1000
Prestige dolce vita apartments in whitefield, Bangalore
Brigade Golden Triangle, ₨12500000
Nikoo Homes, ₨6900000
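To see why the original lambda appeared to fetch nothing, and what the list returned by xpath looks like when a price is missing, here is a minimal sketch on a hand-written HTML fragment. The markup, the titles, and the helper name first_text are assumptions made for illustration; the real Craigslist page may be structured differently.

```python
from lxml.html import fromstring

# Hypothetical fragment imitating two result rows; the second has no price.
SAMPLE = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="/a.html">Plot A</a>
    <span class="result-price">₨1000</span>
  </li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="/b.html">Flat B</a>
  </li>
</ul>
"""

def first_text(item, path):
    """Return the text of the first node matching path, or "" if none matches."""
    nodes = item.xpath(path)  # always a list, possibly empty
    return nodes[0].text if nodes else ""

rows = fromstring(SAMPLE).xpath('//li[@class="result-row"]')

# A row's own .text is only what precedes its first child element:
# here that is whitespace (or None), which is why item.text looked empty.
row_text_is_blank = (rows[0].text or "").strip() == ""

titles = [first_text(row, './/a[contains(@class, "hdrlnk")]') for row in rows]
prices = [first_text(row, './/span[@class="result-price"]') for row in rows]
print(row_text_is_blank, titles, prices)
```

Storing the result of xpath in a local variable also avoids evaluating the same expression twice per lookup.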
But I think you want a link, not the text. If this is the case, read below.
OK, how do you get a link? When you have an anchor a, you get its href (the link) from its table of attributes: a.attrib["href"].
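As a small illustration of attribute access on an lxml element, the sketch below parses a single made-up anchor (the class and href values are hypothetical) and reads its href in two ways.

```python
from lxml.html import fromstring

# Hypothetical anchor; the href value is made up for the example.
a = fromstring('<a class="result-title hdrlnk" href="/reb/d/example.html">Example</a>')

href = a.attrib["href"]       # dict-style access: raises KeyError if absent
same_href = a.get("href")     # returns None instead when the attribute is absent
missing = a.get("title", "")  # an explicit default for a missing attribute
print(href, a.text, repr(missing))
```

Which form to prefer depends on whether a missing attribute should be treated as an error or silently replaced by a default.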
So, as I understand it, in the case of the price you want the text, but in the case of the anchor you want the value of one specific attribute, href. Here's the real use of lambdas. Rewrite your function like this:
def get_val(item, path, l):
    return l(item.xpath(path)[0]) if item.xpath(path) else ""
The parameter l is a function that is applied to the node. l may return the text of the node, or the href of an anchor:
link = get_val(item,'.//a[contains(@class,"hdrlnk")]', lambda n: n.attrib["href"])
price = get_val(item,'.//span[@class="result-price"]', lambda n: n.text)
Now the output is:
...
https://bangalore.craigslist.co.in/reb/d/residential-plot-sarjapur/6522786441.html ₨1000
https://bangalore.craigslist.co.in/reb/d/prestige-dolce-vita/6522754197.html
https://bangalore.craigslist.co.in/reb/d/brigade-golden-triangle/6522687904.html ₨12500000
https://bangalore.craigslist.co.in/reb/d/nikoo-homes/6522687772.html ₨6900000
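For experimenting without hitting the network, the same idea can be tried on an inline fragment. This is only a sketch: the sample markup, the parameter names extract and default, and the single XPath evaluation are my own variations, not part of the code above.

```python
from lxml.html import fromstring

# Hypothetical markup standing in for the Craigslist result list.
SAMPLE = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="/a.html">Plot A</a>
    <span class="result-price">₨1000</span>
  </li>
  <li class="result-row">
    <a class="result-title hdrlnk" href="/b.html">Flat B</a>
  </li>
</ul>
"""

def get_val(item, path, extract, default=""):
    """Apply extract to the first node matching path, or return default."""
    nodes = item.xpath(path)  # evaluate the XPath only once
    return extract(nodes[0]) if nodes else default

results = []
for item in fromstring(SAMPLE).xpath('//li[@class="result-row"]'):
    link = get_val(item, './/a[contains(@class, "hdrlnk")]', lambda n: n.attrib["href"])
    price = get_val(item, './/span[@class="result-price"]', lambda n: n.text)
    results.append((link, price))
    print(link, price)
```

Rows without a price fall back to the default instead of raising IndexError, which is the behaviour the question asked for.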