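I'm not a Python expert, probably not even a Python hobbyist, but I've been learning and editing scripts for a project built around fetching TikTok data. So I'm sure my code is a bit messy, since I'm learning as I go.
OK, I have a script that downloads one TikTok or a group of them, transcribes the speech, and then outputs the data I'm looking for, along with the transcript, as a .CSV. But I've hit a wall when it comes to making it run continuously and updating the transcript for each video.
What I want is for the program to download TikToks one at a time and transcribe them, then use the dictionary the TikTok API provides to write each video's data to the .CSV, along with an additional column holding the transcript text. Right now it does those things, but I can't figure out how to write the transcript of each subsequent video into a column that updates the same way the existing data does. I'd also like to learn how to make the program loop until it is stopped, rather than predefining the number of videos.
Let me know if this all makes sense, or if you have ideas for improvement.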
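To make the "loop until stopped" part concrete, this is roughly the shape I have in mind: keep polling for videos, skip any that were already handled, and stop on Ctrl+C. It is only a minimal sketch. `run_until_stopped`, `fetch_batch`, `handle`, `poll_seconds`, and `max_rounds` are hypothetical names of my own, not part of TikTokApi, and the only thing assumed about a video is that it is a dict with an `'id'` key.

```python
import time


def run_until_stopped(fetch_batch, handle, poll_seconds=60.0, max_rounds=None):
    """Call handle(video) once per new video until Ctrl+C (or max_rounds).

    fetch_batch: hypothetical callable returning a list of video dicts.
    handle:      hypothetical callable doing the per-video work.
    max_rounds:  optional cap on polling rounds, mainly useful for testing.
    """
    seen_ids = set()
    rounds = 0
    try:
        while max_rounds is None or rounds < max_rounds:
            for video in fetch_batch():
                if video["id"] in seen_ids:
                    continue  # already processed in an earlier round
                seen_ids.add(video["id"])
                handle(video)
            rounds += 1
            time.sleep(poll_seconds)  # wait before asking for more videos
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the loop without a traceback
    return seen_ids
```

In my case `fetch_batch` would presumably wrap the `api.by_hashtag(hashtag=hashtag, count=...)` call, and `handle` would do the download, transcription, and CSV write for one video.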
```python
from TikTokApi import TikTokApi
from moviepy.editor import AudioFileClip
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from pandas import DataFrame

verifyFp = "tiktok_fp"
# NOTE: `did` (the custom device id) is not defined in this snippet.
api = TikTokApi.get_instance(custom_verifyFp=verifyFp, custom_device_id=did, use_test_endpoints=True)
hashtag = 'hashtag'
videofun = api.by_hashtag(hashtag=hashtag, count=2)
temp_count = 0
transcript_list = []  # nothing below ever appends to this list

############################################################################################################
for i in range(len(videofun)):
    data = api.get_video_by_tiktok(videofun[i])  # bytes of the video
    # Every video is saved under the same name, so each one overwrites the last.
    with open("downloads/{}.mp4".format(str("video")), 'wb') as output:
        output.write(data)  # saves data to the mp4 file

    # Extract the audio track to a WAV file for transcription.
    transcribed_audio_file_name = "transcribed_speech.wav"
    video_file_name = r"downloads\video.mp4"
    audioclip = AudioFileClip(video_file_name)
    audioclip.write_audiofile(transcribed_audio_file_name)

    # Send the audio to IBM Watson Speech to Text.
    apikey = 'IBM watson api key'
    url = 'IBM url'
    authenticator = IAMAuthenticator(apikey)
    stt = SpeechToTextV1(authenticator=authenticator)
    stt.set_service_url(url)
    with open('transcribed_speech.wav', 'rb') as f:
        res = stt.recognize(audio=f, content_type='audio/wav', model='en-AU_NarrowbandModel').get_result()

    # One line per result: capitalise the first letter and end with a full stop.
    text = [result['alternatives'][0]['transcript'].rstrip() + '.\n' for result in res['results']]
    text = [para[0].title() + para[1:] for para in text]
    transcript = ''.join(text)
    # The transcript file is also overwritten on every iteration.
    with open(r'\transcription.txt', 'w') as out:
        out.writelines(transcript)


def simple_dict(tiktok_dict):
    """Flatten the fields I care about from one TikTok API video dict."""
    to_return = {}
    to_return['user_name'] = tiktok_dict['author']['uniqueId']
    to_return['user_id'] = tiktok_dict['author']['id']
    to_return['video_id'] = tiktok_dict['id']
    to_return['video_desc'] = tiktok_dict['desc']
    to_return['video_time'] = tiktok_dict['createTime']
    to_return['video_length'] = tiktok_dict['video']['duration']
    to_return['video_link'] = 'https://www.tiktok.com/@{}/video/{}?lang=en'.format(to_return['user_name'], to_return['video_id'])
    to_return['n_likes'] = tiktok_dict['stats']['diggCount']
    to_return['n_shares'] = tiktok_dict['stats']['shareCount']
    to_return['n_comments'] = tiktok_dict['stats']['commentCount']
    to_return['n_plays'] = tiktok_dict['stats']['playCount']
    return to_return


# Write one row per video; there is no transcript column yet.
tiktoks = [simple_dict(v) for v in (videofun)]
tiktoks_df = DataFrame.from_dict(tiktoks)
tiktoks_df.to_csv('tiktoks.csv', index=False)
print(transcript_list)
```
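For the transcript column, one possible approach is to add the transcript to each video's dictionary and append that row to the CSV straight away, instead of building the whole DataFrame at the end. The following is a minimal sketch under that assumption, using the standard-library `csv` module rather than pandas. `append_row`, `with_transcript`, and the shortened `FIELDNAMES` list are hypothetical names for illustration; the real column list would be the keys returned by `simple_dict` plus `'transcript'`.

```python
import csv
import os

# Hypothetical, shortened column list for illustration only.
FIELDNAMES = ["video_id", "user_name", "transcript"]


def with_transcript(video_row, transcript):
    """Return a copy of the per-video dict with a 'transcript' column added."""
    row = dict(video_row)
    row["transcript"] = transcript
    return row


def append_row(csv_path, row, fieldnames=FIELDNAMES):
    """Append one video's row to the CSV, writing the header only once."""
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

Inside the per-video loop this might be used as `append_row('tiktoks.csv', with_transcript(simple_dict(videofun[i]), transcript), fieldnames=...)`, with `fieldnames` set to the full list of columns. Because each row is written as soon as its video has been transcribed, the file stays up to date even if the program is stopped partway through.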