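I want to create a Node.js application that scrapes some websites, saves the data in a PostgreSQL database, and then displays visualizations of this data (in D3.js) on a web page. I thought about separating the front-end part (creating and displaying the visualizations) from the back-end part (doing the scraping and updating the database).
The skeleton of the two applications (there are two because I split the task into two applications) is as follows.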
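Back-end application (scraper):
- connect to the database
- create the tables if they do not exist
- scrape the data
- save the data in the database
- disconnect from the database.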
This back-end application only needs to be started a few times a year (on Unix I can configure a CRON file for this).
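Front-end application (viz):
- connect to the database
- start a server that listens on port 3000 (I need it for the visualizations)
- every time the user refreshes the page (onLoad()), the application runs a query (SELECT) to get the data from the database. This way the data is always up to date.
This application is started only once by the programmer (ideally).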
I created this kind of folder structure (I used npm init and Express):
project
|_ scraper
   |_ helpers // contains some useful .js files
      |_ elaborateJson.js
      |_ saveOnDb.js
      |_ utilFunc.js
   |_ node_modules // modules installed using `npm install moduleName --save`
   |_ routes // contains the files that do the scraping
      |_ downloaderHome.js
      |_ downloaderWork.js
   |_ services // contains a file concerning the db
      |_ postgreSQLlib.js
   |_ app.js
   |_ package.json
   |_ package-lock.json
|_ viz
   |_ helpers // contains some useful .js files
      |_ utilFunc.js
   |_ node_modules // modules installed using `npm install moduleName --save`
   |_ public // contains files for visualizations
      |_ index.handlebars
      |_ script.js
      |_ style.css
   |_ services // contains a file concerning the db
      |_ postgreSQLlib.js
   |_ app.js
   |_ package.json
   |_ package-lock.json
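With this structure I already have two problems that I don't know how to solve:
1. The postgreSQLlib.js file (and also utilFunc.js) is identical in scraper and viz. How can I avoid this duplicated code?
2. I have to install some modules (for example express-handlebars and express) twice, in both the scraper and viz folders.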
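On the two problems above: Node resolves a bare require('express') by looking in the node_modules folder of the requiring file's directory and then of every parent directory, so one project/package.json with one project/node_modules can serve both scraper and viz, and shared files such as postgreSQLlib.js can live in a sibling folder (the later edits call it shared_libs) and be required by path. The sketch below re-implements that lookup in simplified form purely as an illustration; nodeModulesLookup and sharedLibPath are hypothetical helpers, not Node APIs.

```javascript
const path = require('path');

// Simplified list of the directories Node searches for require('express')
// from a file located in `fromDir` (global folders such as NODE_PATH omitted).
function nodeModulesLookup(fromDir) {
  const dirs = [];
  let dir = path.resolve(fromDir);
  for (;;) {
    // Node skips directories that are themselves named node_modules.
    if (path.basename(dir) !== 'node_modules') {
      dirs.push(path.join(dir, 'node_modules'));
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}

// Absolute path to a module in the hypothetical shared_libs folder next to the app;
// path.join inserts the separators and resolves the '..' for us.
function sharedLibPath(appDir, ...parts) {
  return path.join(appDir, '..', 'shared_libs', ...parts);
}

// Usage in viz/app.js or scraper/app.js (not executed here):
// const postgreSQLlib = require(sharedLibPath(__dirname, 'services', 'postgreSQLlib.js'));
```

On POSIX, nodeModulesLookup('/home/me/project/viz') lists /home/me/project/viz/node_modules first and /home/me/project/node_modules second, followed by the remaining parents up to /node_modules, which is why a single npm install run in project/ makes the packages visible to both apps.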
This is project/scraper/app.js:
const downloaderHome = require('./routes/downloaderHome.js');
const downloaderWork = require('./routes/downloaderWork.js');
const postgreSQLlib = require('./services/postgreSQLlib.js');
const saveOnDb = require('./helpers/saveOnDb.js');
const utilFunc = require('./helpers/utilFunc.js');
const express = require('express');
const exphbs = require('express-handlebars');
var app = express();
start();
async function start() {
    console.log('\n Connect to db');
    await postgreSQLlib.connect();
    console.log('\n Create tables if they do not exist');
    await postgreSQLlib.createHomeTable();
    await postgreSQLlib.createWorkTable();
    console.log('\n Check if table \'home\' is updated or not');
    if(!await utilFunc.isTableUpdated('home', 6418)) { // 6308
        console.log('\n Download data for home');
        await downloaderHome.download();
        console.log('\n Saving data for home on db');
        await saveOnDb.saveHome();
    }
    console.log('\n Check if table \'work\' is updated or not');
    if(!await utilFunc.isTableUpdated('work', 6804)) {
        console.log('\n Download data for work');
        await downloaderWork.download();
        console.log('\n Saving data for work on db');
        await saveOnDb.saveWork();
    }
    console.log('\n Disconnect from db');
    await postgreSQLlib.disconnect();
}
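Since the scraper is meant to run unattended from cron, it helps if a run that fails halfway still closes the database connection and ends with a non-zero exit status that a wrapper script or monitoring can detect. A minimal sketch of that structure; runScraper is a hypothetical wrapper, db stands for any object with async connect()/disconnect() (such as postgreSQLlib), and steps for the async calls made in start():

```javascript
// Hypothetical wrapper: `db` has async connect()/disconnect(),
// `steps` is an array of async functions (create tables, download, save...).
async function runScraper(db, steps) {
  await db.connect();
  try {
    for (const step of steps) {
      await step(); // each step finishes before the next one starts
    }
  } finally {
    // Runs whether the steps succeeded or threw, so the connection is always closed.
    await db.disconnect();
  }
}

// In scraper/app.js one could then write (not executed here):
// runScraper(postgreSQLlib, [
//   () => postgreSQLlib.createHomeTable(),
//   () => postgreSQLlib.createWorkTable(),
// ]).catch((err) => {
//   console.error(err);
//   process.exitCode = 1; // signal a failed run to whoever launched the script
// });
```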
This is project/viz/app.js:
const postgreSQLlib = require('./services/postgreSQLlib.js');
const utilFunc = require('./helpers/utilFunc.js');
const express = require('express');
const exphbs = require('express-handlebars');
const http = require('http');
var app = express();
var response;
var callback;
start();
async function start() {
    console.log('\n Connect to db');
    await postgreSQLlib.connect();
    // how do I check when page is refreshed?!
    http.get({
        hostname: 'localhost',
        port: 3000,
        path: '/',
        agent: false
    }, callback);
    callback = function(res) {
        response = res;
        console.log(response); // here response will return an object
        console.log('refresh callback');
    }
    console.log(response);
    console.log('refresh');
    ///////////////////////////////////////////////
    // How do I check the disconnection from the db?
    // If I disconnect now, the visualizations no longer work.
    // So when should I disconnect?
    // Does leaving the db connection open cause problems?
    ///////////////////////////////////////////////
    //console.log('\n Disconnect from db');
    //await postgreSQLlib.disconnect();
}
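About the questions in the comments at the end of start(): a web server normally keeps its database connection (ideally a pg Pool) open for as long as it runs and closes it only when the process shuts down, so disconnecting right after start-up is what breaks the visualizations. A minimal sketch under that assumption; makeShutdown is a hypothetical helper, server is the value returned by app.listen(), and pool is anything with an async end() (a pg Pool, or { end: () => postgreSQLlib.disconnect() } for the question's module):

```javascript
// Hypothetical helper: stop accepting HTTP requests first, then close the
// database pool. Calling shutdown() twice reuses the same promise.
function makeShutdown(server, pool) {
  let closing = null;
  return function shutdown() {
    if (!closing) {
      closing = new Promise((resolve, reject) => {
        server.close((err) => (err ? reject(err) : resolve()));
      }).then(() => pool.end());
    }
    return closing;
  };
}

// Usage in viz/app.js (not executed here):
// const server = app.listen(3000);
// const shutdown = makeShutdown(server, pool);
// process.on('SIGINT', () => shutdown().then(() => process.exit(0)));
```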
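The first application (project/scraper/app.js) works fine.
The second application (project/viz/app.js) does not. I would like it to:
- connect to the database [done, it works]
- start a server that listens on port 3000 (I need it for the visualizations) [how do I do this? See below (*)]
- every time the user refreshes the page (onLoad()), run a query (SELECT) to get the data from the database
(*) I had thought of something like this: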
async function start() {
    console.log('\n Connect to db');
    await postgreSQLlib.connect();
    console.log('\n Get data from db');
    var dataHome = await postgreSQLlib.getTableHome();
    var dataWork = await postgreSQLlib.getTableWork();
    //console.log('\n Connect to my server');
    pageLoad(dataHome, dataWork);
}
function pageLoad(dataHome, dataWork) {
    var hbs = exphbs.create({
        helpers: {
            getDataHome: function() {
                return JSON.stringify(dataHome);
            },
            getDataWork: function() {
                return JSON.stringify(dataWork);
            }
        }
    });
    app.engine('handlebars', hbs.engine);
    app.set('view engine', 'handlebars');
    app.get('/', function(req, res, next) {
        res.render('index', { // index is html filename
            showTitle: true,
        });
    });
    console.log('Go to http://localhost:3000/ to see visualizations');
    app.listen(3000);
}
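where dataHome and dataWork are two objects containing the data fetched from the database with a SELECT query.
But this way the data is retrieved only once, not every time the user refreshes the page.
Any help would be appreciated. Thanks!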
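The data is retrieved only once because the SELECTs run inside start(), before the route is registered. Running the queries inside the route handler instead makes Express execute them on every GET /, that is, on every page refresh, with no need for onLoad() or http.get on the server side. A minimal sketch; makeIndexHandler is a hypothetical name, and the two query functions are passed in (assumed to return promises, like getTableHome/getTableWork) so the sketch runs without a database:

```javascript
// Hypothetical factory returning an Express-style route handler that queries
// the database on every request, so each page refresh sees current data.
function makeIndexHandler(getTableHome, getTableWork) {
  return async function fetchFreshData(req, res, next) {
    try {
      // Both SELECTs run again for this request, concurrently.
      const [dataHome, dataWork] = await Promise.all([getTableHome(), getTableWork()]);
      res.render('index', { timestamp: Date.now(), dataHome, dataWork });
    } catch (err) {
      next(err); // hand the error to Express instead of leaving the request hanging
    }
  };
}

// In viz/app.js (not executed here), after configuring the view engine:
// app.get('/', makeIndexHandler(() => postgreSQLlib.getTableHome(),
//                               () => postgreSQLlib.getTableWork()));
// app.listen(3000);
```

The connection is opened once at start-up and app.listen(3000) is still called once; only the queries move into the handler.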
EDIT
Could you be more precise? I tried to do this, but it doesn't work:
project/viz/app.js:
const postgreSQLlib = require('../shared_libs/postgreSQLlib.js');
const express = require('express');
var app = express();
start();
async function start() {
    console.log('Connect to db');
    await postgreSQLlib.connect();
    app.get('/', fetchFreshData);
}
async function fetchFreshData(req, res) {
    // download data from db
    var dataHome = await postgreSQLlib.getTableHome();
    var dataWork = await postgreSQLlib.getTableWork();
    // fill this JSON using the results
    var viewData = {dataHome, dataWork};
    // pass data to view
    res.render('index', viewData);
}
project\viz\view\index.handlebars:
<!DOCTYPE html>
<html lang='en'>
    <head>
        <meta charset='utf-8'>
        <title>Map</title>
        <script src='https://d3js.org/d3.v5.js' charset='utf-8'></script>
        <link rel='stylesheet' type='text/css' href='/style.css' media='screen'/>
    </head>
    <body>
        <div id='example'></div>
    </body>
    <script src='/script.js'></script>
</html>
project\viz\view\script.js:
console.log('viewData:', viewData);
Where am I going wrong?
EDIT 2
OK, I modified the viz/app.js code again:
const postgreSQLlib = require('../shared_libs/postgreSQLlib.js');
const express = require('express');
const exphbs = require('express-handlebars');
var app = express();
start();
async function start() {
    await postgreSQLlib.connect();
    var hbs = Handlebars.registerHelper('json', function(context) {
        return JSON.stringify(context);
    });
    app.engine('handlebars', hbs.engine);
    app.set('view engine', 'handlebars');
    app.get('/', fetchFreshData);
    console.log('Go to http://localhost:3000/ to see data');
    app.listen(3000);
}
async function fetchFreshData(req, res) {
    // download data from db
    var dataHome = await postgreSQLlib.getTableHome();
    var dataWork = await postgreSQLlib.getTableWork();
    // fill this JSON using the results
    var viewData = {};
    viewData.timestamp = Date.now();
    viewData.entries = dataHome;
    // pass data to view
    res.render('index', viewData);
}
When I run the application there are no errors, but if I connect to http://localhost:3000/ the browser tells me the site can't be reached. I feel a bit stupid...
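In the EDIT 2 code, Handlebars is never declared (only exphbs is required), so start() throws a ReferenceError before it reaches app.listen(3000), and nothing ever listens on port 3000. Because start() is async, the exception becomes a rejected promise, which the Node version in the EDIT 3 log reports only as an UnhandledPromiseRejectionWarning rather than a crash. A minimal sketch of making start-up failures fatal and visible; runMain and defaultFatal are hypothetical names, and main stands for the question's start:

```javascript
// Default reaction to a failed start-up: print the error and exit non-zero.
function defaultFatal(err) {
  console.error('Startup failed:', err);
  process.exit(1);
}

// Run an (async) start function; both synchronous throws and rejected
// promises end up in onFatal instead of becoming a silent warning.
function runMain(main, onFatal = defaultFatal) {
  return Promise.resolve().then(main).catch(onFatal);
}

// In viz/app.js, instead of a bare `start();` (not executed here):
// runMain(start);
```

With this wrapper, the EDIT 2 version would print the ReferenceError and exit with status 1 instead of sitting idle.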
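EDIT 3
If I understand your code correctly, there is a (distracting) error in it: in returnOBJ(), instead of res.render('index', viewData); it should be res.render('obj', viewData); (referring to the obj.hbs file). Right?
I changed the index.hbs file like this: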
<html lang='en'>
    <head>
        <meta charset='utf-8'>
        <title>Index</title>
        <script src='https://d3js.org/d3.v5.js' charset='utf-8'></script>
        <link rel='stylesheet' type='text/css' href='/style.css' media='screen'/>
    </head>
    <body>
        <h1>INDEX<small>{{timestamp}}</small></h1>
    </body>
    <script>
        // add global variables in the .hbs file
        window.viewData_dataWork = {{ json entries }}
        console.log(window.viewData);
    </script>
    <script src='/script.js'></script>
</html>
But I get:
(node:207156) UnhandledPromiseRejectionWarning: Error: callback function required
at Function.engine (C:\...\node_modules\express\lib\application.js:295:11)
at start (C:\...\viz\app.js:20:6)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:182:7)
(node:207156) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:207156) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I also don't understand this code:
app.set('view engine', 'hbs');
app.engine('hbs', hbs.__express);
hbs.registerHelper('json', function(context) {
return JSON.stringify(context);
});
app.engine('handlebars', hbs.engine);
app.set('view engine', 'handlebars');
Why do you call app.set('view engine', ...) twice with different values?
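On the EDIT 3 question: Express keeps one value per setting, so a second app.set('view engine', 'handlebars') simply overwrites the first one. Separately, Express's app.engine(ext, fn) throws "callback function required" whenever fn is not a function, which is the error in the log above. A minimal sketch that configures a single engine, with the hbs package as the only template library (its Express renderer is hbs.__express); configureViews is a hypothetical name, and app and hbs are passed in only so the sketch runs without installing anything:

```javascript
// Configure one template engine. Intended call (not executed here):
//   configureViews(express(), require('hbs'), path.join(__dirname, 'views'));
function configureViews(app, hbs, viewsDir) {
  app.set('views', viewsDir);
  // A single 'view engine' value: res.render('index') then looks for views/index.hbs.
  // Calling app.set('view engine', ...) again would replace it.
  app.set('view engine', 'hbs');
  // app.engine(ext, fn) requires a function; for the hbs package it is hbs.__express.
  app.engine('hbs', hbs.__express);
  hbs.registerHelper('json', (context) => JSON.stringify(context));
  return app;
}
```

With 'view engine' set to 'hbs', Express can also find the hbs renderer on its own, so the app.engine line mainly makes the mapping explicit.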
EDIT 4
I simplified the code further:
/viz/app.js:
const postgreSQLlib = require(__dirname + './../shared_libs/services/postgreSQLlib.js');
const express = require('express');
const hbs = require('hbs');
var app = express();
// Server initiator
async function start() {
    await postgreSQLlib.connect();
    // hbs
    app.set('views', '' + __dirname + '/views');
    app.set('view engine', 'hbs');
    app.engine('hbs', hbs.__express);
    hbs.registerHelper('json', function(context) {
        return JSON.stringify(context);
    });
    app.engine('handlebars', hbs.engine);
    app.set('view engine', 'handlebars');
    // router
    app.get('/', testMe);
    console.log('Go to http://localhost:3000/ to see data');
    app.listen(3000);
}
// Your section with fresh data has been populated properly
async function testMe(req, res) {
    console.log('testMe');
    // fill this JSON using the results
    var viewData = {};
    viewData.data = 'this string';
    // pass data to view
    res.render('test', viewData);
}
// start the server
start();
/viz/views/test.hbs:
<html>
    <head>
        <title>Server test</title>
    </head>
    <body>
        {{data}}
    </body>
</html>
Then, at the command prompt, I go to project/viz and type node app.js + Enter. The process starts and waits: no errors. But when I go to http://localhost:3000/ I get "Connection failed".
I'm going crazy.
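"Connection failed" means start() never reached app.listen(3000). One candidate in the EDIT 4 code is the app.engine('handlebars', hbs.engine) line: if hbs.engine is not a function, Express throws "callback function required", and inside the async start() that throw surfaces only as a promise-rejection warning. Another possibility is a step that never finishes, for example a connect() that hangs. To tell the two apart, a timeout wrapper can help; a minimal sketch, where withTimeout is a hypothetical helper and the 5000 ms value is an arbitrary example:

```javascript
// Reject if `promise` has not settled within `ms` milliseconds, so a hanging
// step is reported instead of leaving the process silently idle.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Inside start() (not executed here):
// await withTimeout(postgreSQLlib.connect(), 5000, 'connect');
```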
EDIT 5
The problem was not connect but the function that runs the SELECT, so I simplified the code a bit. Now it almost works!
Here is the code.
viz/app.js:
const postgreSQLlib = require(__dirname + './../shared_libs/services/postgreSQLlib.js');
const express = require('express');
var app = express()
const hbs = require('hbs');
const webapp_opts = {"port":3000};
Initialize();
//.: Setup & Start Server
async function Initialize(){
    await postgreSQLlib.connect();
    console.log("[~] starting ...")
    //:[HBS]:Setup
    app.set('view engine', 'hbs');
    app.engine('hbs', hbs.__express)
    app.set('views', "" + __dirname + "/views")
    //:[HBS]:Helpers
    hbs.registerHelper('json', function(context) {
        return JSON.stringify(context);
    })
    //:[EXPRESS]:Router.Paths
    app.get("/", IndexPathFunction);
    // app.get("/script.js", scriptFile); <-- for script.js file
    //:[EXPRESS]:Start
    app.listen(webapp_opts.port,()=>{
        console.log("[i] ready & listening","\n http://localhost:"+webapp_opts.port+"/")
    })
}
/*async function scriptFile(req, res) { <-- for script.js file
    console.log('\nscriptFile');
    var viewData = {};
    viewData.number = 50;
    console.log('viewData:', viewData);
    res.render('script.js', viewData);
}*/
//.: Router Function : "/"
async function IndexPathFunction(req,res){
    var viewData = {};
    viewData.timestamp = Date.now();
    viewData.exJson = [{color: 'red', year: '1955'}, {color: 'blue', year: '2000'}, {color: 'yellow', year: '2013'}];
    viewData.exString = 'example of string';
    console.log('viewData:', viewData);
    res.render('index', viewData);
}
viz/views/index.hbs:
<html lang='en'>
    <head>
        <meta charset='utf-8'>
        <title>Index</title>
        <script src='https://d3js.org/d3.v5.js' charset='utf-8'></script>
        <link rel='stylesheet' type='text/css' href='/style.css' media='screen'/>
    </head>
    <body>
        <h1>INDEX timestamp: <small>{{timestamp}}</small></h1>
    </body>
    <script>
        viewData = {};
        console.log('viewData:', viewData);
        viewData.exJson = JSON.parse('{{ json exJson }}'.replace(/&quot;/g, '"').replace(/</, ''));
        viewData.timestamp = {{timestamp}}; // doesn't work
        viewData.exString = {{ exString }}; // doesn't work
        console.log('viewData.exJson:', viewData.exJson);
        console.log('viewData.timestamp:', viewData.timestamp);
        console.log('viewData.exString:', viewData.exString);
    </script>
    <!--<script src='/script.js'></script>-->
</html>
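The problem is getting data that is not JSON. For example, when I try to print timestamp and exString, I get errors. Why?
Also, I would like to clean up the code a bit and move the JavaScript part into script.js, loaded from index.hbs with <script src='/script.js'></script>.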
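On the EDIT 5 question: {{ exString }} is written into the script without quotes, so the browser sees viewData.exString = example of string; which is a syntax error, and a syntax error stops the entire inline script from running, including the console.log calls for timestamp. Double braces also HTML-escape their output (turning " into &quot;), which is what the .replace workaround undoes; triple braces {{{ }}} insert the helper's output unescaped. A sketch of a json helper that is safe to use with triple braces inside a script tag; scriptSafeJson and the viewData key are hypothetical names:

```javascript
// Serialize a value as JSON that can be embedded in an inline <script>.
// Escaping "<" as \u003c keeps a string such as "</script>" in the data from
// closing the script element early; U+2028/U+2029 are escaped because older
// JavaScript engines rejected them raw inside string literals.
function scriptSafeJson(value) {
  const json = JSON.stringify(value);
  if (json === undefined) return 'null'; // e.g. an undefined template variable
  return json
    .replace(/</g, '\\u003c')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Registration and template usage (not executed here):
// hbs.registerHelper('json', scriptSafeJson);
// res.render('index', { viewData: { timestamp: Date.now(), exJson, exString } });
// index.hbs:  <script>window.viewData = {{{json viewData}}};</script>
```

Passing everything under one key means strings, numbers and arrays all arrive in the browser with their types intact, with no per-field quoting or .replace calls.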
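EDIT 6
I found this tutorial very useful.
I edited the index.hbs file by adding a CSS file, an image and a script (for now it only contains console.log('here'); but the idea is to put the viewData variable in script.js).
project/viz/views/index.hbs: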
<html lang='en'>
    <head>
        <meta charset='utf-8'>
        <title>Index</title>
        <script src='https://d3js.org/d3.v5.js' charset='utf-8'></script>
        <link href="/css/style.css" rel="stylesheet">
    </head>
    <body>
        <img src="/images/logo.png"/>
        <h1>timestamp: <small>{{timestamp}}</small></h1>
        <h2>Welcome in index.hbs</h2>
    </body>
    <script>
        viewData = {};
        console.log('viewData:', viewData);
        viewData.exJson = JSON.parse('{{json exJson }}'.replace(/&quot;/g, '"').replace(/</, ''));
        viewData.timestamp = {{timestamp}};
        viewData.exString = '{{exString}}';
        console.log('viewData.exJson:', viewData.exJson);
        console.log('viewData.timestamp:', viewData.timestamp);
        console.log('viewData.exString:', viewData.exString);
    </script>
    <link href='/script/script.js' rel='script'>
</html>
My file structure is:
project
|_ node_modules
|_ scraper
|_ shared_libs
|_ viz
   |_ app.js
   |_ public
      |_ css
         |_ style.css
      |_ images
         |_ logo.png
      |_ script
         |_ script.js
   |_ views
      |_ index.hbs
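Now I can see the image and the CSS is applied. But the script doesn't seem to work, because it doesn't print the string "here".
I searched the internet for how to pass variables from a script tag to an external JS file, but I can't find anything that works for me. I have read the Handlebars API documentation, but it didn't help.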
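On the EDIT 6 question: a <link href='/script/script.js' rel='script'> element does not load or run JavaScript; browsers execute external code only from a <script src='...'></script> tag. Since the CSS and image under public/ already load, the static files are evidently being served, so only the tag needs to change. An external file cannot see template variables directly, but it can read data the template wrote into the page before it. One common pattern, sketched under the assumption that index.hbs ends its body with <script id='view-data' type='application/json'>{{{json viewData}}}</script> followed by <script src='/script/script.js'></script>, that the route renders { viewData: {...} }, and that the data contains no "</script>" sequence (or the json helper escapes "<"); the view-data id and readViewData are hypothetical names:

```javascript
// public/script/script.js (runs in the browser, after the data tag in index.hbs).
// Reads the JSON the server embedded in <script id="view-data" type="application/json">.
function readViewData(doc) {
  const el = doc.getElementById('view-data');
  return el ? JSON.parse(el.textContent) : null;
}

// Run the page logic only where a browser document exists.
if (typeof document !== 'undefined') {
  const viewData = readViewData(document);
  console.log('viewData:', viewData);
  // ...hand viewData.exJson to D3 from here
}
```

A type='application/json' script is never executed, so the data block is inert text that script.js parses once, and the page logic stays entirely in script.js.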