Looking at services such as Amazon Redshift, which is designed to store petabytes of data: what kind of data should be stored there? Logs, raw data?
2 Answers
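The question with a data warehouse is not what kind of information you store in it, but how you store it and what you intend to use it for. Any data an organization needs to analyze and compare can go into a data warehouse.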
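Defining a data warehouse is very hard, and you will probably get as many definitions as people you ask. I have seen many different implementations, and nobody can really say that one of them is a data warehouse and another is not. Still, a data warehouse should generally satisfy a few key points: it should be time-variant (i.e. it stores data points over time) and it should be non-volatile (i.e. you never update data in the warehouse, you only insert).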
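Following these rules, you can do the most common kind of data warehouse analysis, which is analyzing data over a period of time, for example comparing this quarter's sales with last quarter's. I am not sure what Amazon Redshift actually does, but if it is a data warehouse, I think it is more a question of how you use it.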
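To make the two properties above concrete, here is a minimal sketch of an insert-only, time-stamped table and a quarter-over-quarter sales total. It uses Python's built-in `sqlite3` purely as a stand-in, not Redshift itself; the table name `sales_facts` and the helper functions are hypothetical and exist only for illustration.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory stand-in for a warehouse fact table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_facts ("
        " sale_date TEXT NOT NULL,"  # time-variant: every row carries its date
        " amount REAL NOT NULL)"
    )
    return conn


def record_sale(conn, sale_date, amount):
    """Non-volatile by convention: rows are only ever inserted, never updated.

    A correction would be recorded as a new compensating row
    (for example a negative amount), not as an UPDATE.
    """
    conn.execute(
        "INSERT INTO sales_facts (sale_date, amount) VALUES (?, ?)",
        (sale_date, amount),
    )


def sales_by_quarter(conn):
    """Return total sales keyed by 'YYYY-Qn'."""
    rows = conn.execute(
        """
        SELECT strftime('%Y', sale_date) AS year,
               (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
               SUM(amount) AS total
        FROM sales_facts
        GROUP BY year, quarter
        ORDER BY year, quarter
        """
    ).fetchall()
    return {f"{year}-Q{quarter}": total for year, quarter, total in rows}


conn = build_warehouse()
record_sale(conn, "2023-02-10", 100.0)
record_sale(conn, "2023-03-05", 50.0)
record_sale(conn, "2023-05-20", 200.0)
totals = sales_by_quarter(conn)
print(totals)
```

Because history is never overwritten, the same query keeps giving the same answer for past quarters, which is what makes "this quarter versus last quarter" comparisons trustworthy.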
From what I understand, Amazon Redshift is a service, NOT a technology. The service is meant to handle ALL of your data warehousing needs while keeping capital expenditure (CAPEX) to a minimum.
Effectively you can use it as the corporate data warehousing solution (store ANY DATA you would have paid money to store and analyse: be it logs, raw unstructured data, structured data - literally ANY DATA); this is what Amazon is aiming at. It is intended to save you the costs of infrastructure, software, setup and even people, hence its nature as a service. Having worked in the data industry for 20 years, I can see the advantage being offered.
I have also noticed that Amazon even offers a certification program, which should make it easier to select the people you will need to hire to run this solution when you are ready to venture into it.
See this very simple video here - it sounds too good to be true. But I would advise you to get someone certified in, or very experienced with, Amazon Cloud Infrastructure deployments (see some partners here), so you get the true ins and outs. I am sure they will offer you a free consultation as part of their pre-sales work.
All the best! Leslie