
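I visited the website below, which hosts public information, and tracked the results in Inspect Element.

On that page, I entered the name "MARCONI FERREIRA PERILLO JUNIOR" in the "Nome da Parte" field and clicked the "Consultar" button.

The page then shows the list of lawsuits.

In Inspect Element, under the Network tab, I saw a "Request URL:" of https://pjd.tjgo.jus.br/BuscaProcessoPublica (using the POST method).

Following a friend's suggestion on this question, I tried to write a Python script based on that request to capture the search results as JSON. I tried this: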

import requests

url = "https://pjd.tjgo.jus.br/BuscaProcessoPublica"

header = {
    'Cookie': 'style=null; WIDPJP=.cp06-2:cp06-2; JSESSIONID=h_9l1zwHbaWGL2pPbVTyf8alvfVUREhUbCtqNGxN.cp06:server-cp06-2',
    'Content-Type': 'application/json'
}

r = requests.post(url, headers=header)
r
<Response [200]>

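But it returns only the text of the query screen.

Does anyone know how I can get the correct header information and make the correct call?

I tried to apply the solution pointed out in this question.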

1 Answer

If you look at the Network tab, right-click the network request, and choose "Copy -> Copy as cURL", you get something like this:

curl 'https://pjd.tjgo.jus.br/BuscaProcessoPublica' -H 'Connection: keep-alive' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Origin: https://pjd.tjgo.jus.br' -H 'X-Requested-With: XMLHttpRequest' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded;charset=UTF-8' -H 'Sec-Fetch-Site: same-origin' -H 'Sec-Fetch-Mode: cors' -H 'Referer: https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: en-US,en;q=0.9' -H 'Cookie: WIDPJP=.cp03-1:cp03-1; style=null; JSESSIONID=kMiu0qC-d11BAdWVd6QoJM837YUXuTsWVgTofjLk.cp03:server-cp03-1' --data 'chamadaAjax=true&tabela=1&offset=0&PassoEditar=12&consultaPronta=true&' --compressed

You can then use a tool such as https://curl.trillworks.com/ as an easy way to convert it from cURL format to requests code, which gives you:


import requests

cookies = {
    'WIDPJP': '.cp03-1:cp03-1',
    'style': 'null',
    'JSESSIONID': 'kMiu0qC-d11BAdWVd6QoJM837YUXuTsWVgTofjLk.cp03:server-cp03-1',
}

headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Origin': 'https://pjd.tjgo.jus.br',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Referer': 'https://pjd.tjgo.jus.br/BuscaProcessoPublica?PaginaAtual=2&Passo=7',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
}

data = {
  'chamadaAjax': 'true',
  'tabela': '1',
  'offset': '0',
  'PassoEditar': '12',
  'consultaPronta': 'true',
}

response = requests.post('https://pjd.tjgo.jus.br/BuscaProcessoPublica', headers=headers, cookies=cookies, data=data)
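
The main difference from the attempt in the question is that this request carries a form-encoded body (the `--data` string) and a matching `Content-Type`, whereas the original script posted no body at all. As a minimal sketch, the body can also be turned into a dict with the standard library instead of an online converter. The variable name `raw_body` is only illustrative:

```python
from urllib.parse import parse_qsl

# The --data string from the copied cURL command above.
raw_body = 'chamadaAjax=true&tabela=1&offset=0&PassoEditar=12&consultaPronta=true&'

# parse_qsl splits the string into (key, value) pairs; by default it
# skips blank values, so the trailing '&' adds no extra entry.
data = dict(parse_qsl(raw_body))
print(data)
```

The resulting dict can be passed to `requests.post(..., data=data)`, which form-encodes it again when sending.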

# Edit: to get all the pages of data, read the total row count first,
# then keep requesting with an increasing offset:

response = requests.post('https://pjd.tjgo.jus.br/BuscaProcessoPublica', headers=headers, cookies=cookies, data=data)

TOTAL = response.json()['total']

offset = 0

values = []

while offset < TOTAL:
    data['offset'] = str(offset)
    response = requests.post('https://pjd.tjgo.jus.br/BuscaProcessoPublica', headers=headers, cookies=cookies, data=data)
    rows = response.json()['rows']
    if not rows:  # stop instead of looping forever if a page comes back empty
        break
    offset += len(rows)
    values += rows
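
The paging logic can also be wrapped in a small helper, which makes it easier to reuse and to try out without hitting the site. This is only a sketch: the name `fetch_all_rows` is hypothetical, the `session` argument is assumed to behave like a `requests.Session` (anything with a compatible `post` method works), and the `total` and `rows` keys are taken from the response shape used above.

```python
def fetch_all_rows(session, url, data, **kwargs):
    """Collect every row by advancing 'offset' until 'total' rows are read.

    `session` is assumed to expose post(url, data=..., **kwargs) like
    requests.Session; extra keyword arguments (headers, cookies) are
    passed through unchanged.
    """
    payload = dict(data)  # copy so the caller's dict is not mutated
    values = []
    total = None
    while total is None or len(values) < total:
        payload['offset'] = str(len(values))
        response = session.post(url, data=payload, **kwargs)
        response.raise_for_status()  # fail loudly on HTTP errors
        body = response.json()
        total = body['total']
        rows = body['rows']
        if not rows:  # an empty page means there is nothing more to read
            break
        values.extend(rows)
    return values
```

With requests installed, it could be called as `fetch_all_rows(requests.Session(), 'https://pjd.tjgo.jus.br/BuscaProcessoPublica', data, headers=headers, cookies=cookies)`. Running the search through the same session first might allow dropping the hard-coded `JSESSIONID`, but that depends on how the site ties results to a session and has not been verified here.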

Answered 2019-11-14T21:20:18.550