python - Download a CSV from a URL and turn it into a DataFrame (Python pandas)

Question

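I'm new to Python, so I need a little help here. I have a DataFrame with a URL column, and each link lets me download a CSV. My goal is to write a loop (or whatever works) so that I can run a single command that downloads and reads the CSV and creates a DataFrame for each row. Any help would be appreciated. I've attached part of the DataFrame below. If the links don't work (they probably won't), you can replace them with a link from "https://finance.yahoo.com/quote/GOOG/history?p=GOOG" (or any other company): navigate to the CSV download and use that link.

DataFrame: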

Symbol         Link
YI             https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH            https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN           https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS           https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E

Thanks again.

score 16 · Accepted Answer

There are several ways to get CSV data from a URL. For your example, Yahoo Finance, you can copy the Historical data link and pass it to pandas:

...
HISTORICAL_URL = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1582781719&period2=1614404119&interval=1d&events=history&includeAdjustedClose=true"

df = pd.read_csv(HISTORICAL_URL)

A general pattern might involve a tool like requests or httpx to make a GET|POST request and then load the response content into an io buffer.

import pandas as pd
import requests
import io

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params ={'period1':1538761929,
         'period2':1541443929,
         'interval':'1d',
         'events':'history',
         'crumb':'v4z6ZpmoP98',
        }

r = requests.post(url,data=params)
if r.ok:
    data = r.content.decode('utf8')
    df = pd.read_csv(io.StringIO(data))

To get the params, I simply followed the link and copied everything after the '?'. Check that they match ;)
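Copying the parameters by hand is easy to get wrong, so here is a minimal sketch that splits a copied link into a base URL and a params dict using the standard library. The helper name `split_download_url` is hypothetical, and the full URL is only assembled for illustration from the values shown in the answer above.

```python
from urllib.parse import parse_qsl, urlsplit


def split_download_url(full_url):
    """Split a URL into its base (no query string) and a dict of query parameters."""
    parts = urlsplit(full_url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl yields (name, value) pairs; every value comes back as a string
    params = dict(parse_qsl(parts.query))
    return base_url, params


# Illustrative link built from the values used in the answer above
base_url, params = split_download_url(
    "https://query1.finance.yahoo.com/v7/finance/download/GOOG"
    "?period1=1538761929&period2=1541443929&interval=1d"
    "&events=history&crumb=v4z6ZpmoP98"
)
print(base_url)
print(params)
```

Note that the parsed values are strings (for example `'1538761929'`), which is fine for sending them back as request parameters.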

Update:

If you can see the raw CSV content directly at the URL, just pass the URL straight to pd.read_csv. Example data read directly from a URL:

data_url ='https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'

df = pd.read_csv(data_url)

score 0 · Accepted Answer

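I often use this procedure:

import io

import pandas as pd
import requests

url = "<URL TO DOWNLOAD.CSV>"
s = requests.get(url).content  # raw bytes of the response body
# read_csv expects a path or a file-like object, so wrap the bytes in a buffer
c = pd.read_csv(io.BytesIO(s))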

score 0 · Accepted Answer

First, break the task into smaller parts. What you need to do is:

Iterate over the DataFrame that holds the links.

for index, row in df.iterrows():
    url= row["Link"]

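Use the Python requests library to download the file from Yahoo Finance. This is probably the hardest part: you need to obtain a cookie before actually downloading the CSV file (more information here, here, and here). Once you have built the correct URL using the cookie, you can download it with: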
```
re = requests.get(URL)
print(re.status_code) #status code 200 for successful download
```
Alternatively, you can save the response to local disk.

Load it with pandas.

import io

df = pd.read_csv(file_name)  # if the file was saved to disk
df = pd.read_csv(io.BytesIO(re.content))  # directly from the response bytes
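The steps above can be put together roughly as follows. This is a minimal sketch, assuming the DataFrame has the `Symbol` and `Link` columns shown in the question and that each link returns plain CSV without extra cookie handling. The function names `fetch_csv_bytes` and `frames_by_symbol` and the `timeout` value are hypothetical choices, not part of the original answer.

```python
import io

import pandas as pd
import requests


def fetch_csv_bytes(url, timeout=30):
    """Download a URL and return the raw response body as bytes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.content


def frames_by_symbol(links, fetch=fetch_csv_bytes):
    """Return {symbol: DataFrame}, one entry per row of `links`.

    `links` is assumed to have "Symbol" and "Link" columns, as in the question.
    `fetch` can be swapped out, for example to add cookies or to test offline.
    """
    frames = {}
    for symbol, url in zip(links["Symbol"], links["Link"]):
        frames[symbol] = pd.read_csv(io.BytesIO(fetch(url)))
    return frames
```

With the question's DataFrame in `df`, `frames = frames_by_symbol(df)` would give one DataFrame per symbol, for example `frames["YI"]`. Keeping the results in a dict keyed by symbol avoids creating a separate variable for every row.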

score 0 · Accepted Answer

If you apply the following to the DataFrame, it will place the contents of each document in a Python list, not in a DataFrame (I'm unsure of how to get there). It does give you access to all the files, though, and then it's only a matter of putting them in a DataFrame.

links = test['Link'].unique()

import requests
a=[]
for x in links:
     url=x
     s=requests.get(url).content
     a.append(s)

a[4] (or np.array(a[4]).tolist()) outputs the entire file, just as raw bytes rather than as a parsed table.
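The remaining step, turning those raw bytes into DataFrames, might look like the sketch below. It assumes every entry in the list holds valid CSV bytes. The helper name `bytes_to_frames` and the sample payload are made up for illustration.

```python
import io

import pandas as pd


def bytes_to_frames(raw_files):
    """Parse each downloaded CSV payload (bytes) into its own DataFrame."""
    return [pd.read_csv(io.BytesIO(raw)) for raw in raw_files]


# Stand-in for the list `a` built above; real entries would come from requests
sample = [b"Date,Close\n2018-11-01,10.0\n2018-11-02,10.5\n"]
frames = bytes_to_frames(sample)
print(frames[0].shape)  # (2, 2)
```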

Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.
