
How to yield an item parsed from one link together with items parsed from other links in the same item list

bhawya asked 3 weeks ago

I am iterating over a list of places to scrape the latitude, longitude, and elevation of each one. The problem is that when I get the scraped data back, I have no way to link it to my current df, since the names I iterated over may have been modified or skipped.

I’ve managed to get the name I searched for, but since it is parsed from a different link than the rest of the items, it doesn’t work properly.

import scrapy
import pandas as pd
from ..items import latlonglocItem


df = pd.read_csv('wine_df_final.csv')
df = df[pd.notnull(df.real_place)]
real_place = list(set(df.real_place))


class latlonglocSpider(scrapy.Spider):

    name = 'latlonglocs'
    start_urls = []

    # Build one Google search URL per place name.
    for place in real_place:
        baseurl = place.replace(',', '').replace(' ', '+')
        cleaned_href = f'http://www.google.com/search?q={baseurl}+coordinates+latitude+longitude+distancesto'
        start_urls.append(cleaned_href)

    def parse(self, response):

        items = latlonglocItem()

        items['base_name'] = response.xpath('string(/html/head/title)').get().split(' coordinates')[0]
        for href in response.xpath('//*[@id="ires"]/ol/div/h3/a/@href').getall():
            if href.startswith('/url?q=https://www.distancesto'):
                yield response.follow(href, self.parse_distancesto)
        yield items

    def parse_distancesto(self, response):
        items = latlonglocItem()

        try:
            items['appellation'] = response.xpath('string(/html/body/div[3]/div/div[2]/div[3]/div[2]/p/strong)').get()
            items['latitude'] = response.xpath('string(/html/body/div[3]/div/div[2]/div[3]/div[3]/table/tbody/tr[1]/td)').get()
            items['longitude'] = response.xpath('string(/html/body/div[3]/div/div[2]/div[3]/div[3]/table/tbody/tr[2]/td)').get()
            items['elevation'] = response.xpath('string(/html/body/div[3]/div/div[2]/div[3]/div[3]/table/tbody/tr[10]/td)').get()
            yield items
        except Exception:
            pass
# output
appellation      base_name      elevation  latitude  longitude
                 Chalone, USA
Santa Cruz, USA                 56.81      35        9.23

What happens is that I parse the name I searched for, then the spider follows a link and parses the rest of the information. However, in my dataframe the searched name ends up completely detached from the rest of the items, and even then it is hard to find the match. I want to pass the info to the other function so that it yields all the items together.

1 Answers
Best Answer
Mannu answered 3 weeks ago