大学毕业设计仓库管理系统数据库计算机外文参考文献原文及翻译
趣事作文600字-中央美院分数线
河北工程大学毕业论文(设计)
英文参考文献原文复印件及译文
数据仓库
数据仓库为商务运作提供结构与工具,以便系统地组织、理解
和使用数据
进行决策。大量组织机构已经发现,在当今这个充满竞争、快速发展的世界,
数据仓
库是一个有价值的工具。在过去的几年中,许多公司已花费数百万美
元,建立企业范围的数据仓库。许多
人感到,随着工业竞争的加剧,数据仓库
成了必备的最新营销武器——通过更多地了解客户需求而保住客
户的途
径。 “那么”,你可能会充满神秘地问,“到底什么是数据仓库?”
数据
仓库已被多种方式定义,使得很难严格地定义它。宽松地讲,数据仓
库是一个数据库,它与组织机构的操
作数据库分别维护。数据仓库系统允许将
各种应用系统集成在一起,为统一的历史数据分析提供坚实的平
台,对信息处
理提供支持。
按照W. H. Inmon,一位数据仓库系统构造方面的领
头建筑师的说法,“数
据仓库是一个面向主题的、集成的、时变的、非易失的数据集合,支持管理决论文题目:鸿海种业仓库管理系统的
策制定”。这个简短、全面的定义指出了数据仓库的主要特征
。四个关键词,面
向主题的、集成的、时变的、非易失的,将数据仓库与其它数据存储系统
(如
,关系数据库系统、事务处理系统、和文件系统)相区别。让我们进一步
看看这些关键特征。
作者姓名: 石成华
(1)面向主题的:数据仓库围绕一
些主题,如顾客、供应商、产品和销售组
织。数据仓库关注决策者的数据建模与分析,而不是构造组织机
构的日常操作
和事务处理。因此,数据仓库排除对于决策无用的数据,提供特定主题的简明
学号
信息: 100340119
视图。
设计与实现
专业班级: 信管1001
指导老师: 张贵炜
(2)集成的:通常,构造数据仓库是将多个异种数据源,如关系数据库、一
论文日期:
2018.04.10
1 10
般文件和联机事务处理记录
,集成在一起。使用数据清理和数据集成技术,确
保命名约定、编码结构、属性度量的一致性等。
(3)时变的:数据存储从历史的角度(例如,过去5-10
年)提供信息。数据
仓库中的关键结构,隐式或显式地包含时间元素。
(4) 非易失的:
数据仓库总是物理地分离存放数据;这些数据源于操作环境
下的应用数据。由于这种分离,数据仓库不需
要事务处理、恢复和并行控制机
制。通常,它只需要两种数据访问:数据的初始化装入和数据访问。 <
br>概言之,数据仓库是一种语义上一致的数据存储,它充当决策支持数据模
型的物理实现,并存放企
业决策所需信息。数据仓库也常常被看作一种体系结
构,通过将异种数据源中的数据集成在一起而构造,
支持结构化和启发式查
询、分析报告和决策制定。
“好”,你现在问,“那么,什么是建立数据仓库?”
根据上面的讨论,我们把建立数据仓库
看作构造和使用数据仓库的过程。
数据仓库的构造需要数据集成、数据清理、和数据统一。利用数据仓库
常常需
要一些决策支持技术。这使得“知识工人”(例如,经理、分析人员和主管)能
够使用数
据仓库,快捷、方便地得到数据的总体视图,根据数据仓库中的信息
做出准确的决策。有些作者使用术语
“建立数据仓库”表示构造数据仓库的过
程,而用术语“仓库DBMS”表示管理和使用数据仓库。我们
将不区分二者。
“组织机构如何使用数据仓库中的信息?”许多组织机构正在使用这些信息
支持商务决策活动,包括:
(1)、增加顾客关注,包括分析顾客购买模式(如,喜爱买什么、购买时
间、预算周期、消费
习惯);
(2)、根据季度、年、地区的营销情况比较,重新配置产品和管理投资,调
整生产策略;
(3)、分析运作和查找利润源;
(4)、管理顾客关系、进行环境调整、管理合股人的资产开销。
从异种数据库集成的角度看
,数据仓库也是十分有用的。许多组织收集了
形形色色数据,并由多个异种的、自治的、分布的数据源维
护大型数据库。集
成这些数据,并提供简便、有效的访问是非常希望的,并且也是一种挑战。数
据库工业界和研究界都正朝着实现这一目标竭尽全力。
对于异种数据库的集成,传统的数据库做法是:
在多个异种数据库上,建
立一个包装程序和一个集成程序(或仲裁程序)。这方面的例子包括IBM 的
数
据连接程序和Informix的数据刀。当一个查询提交客户站点,首先使用元数据
字典对
查询进行转换,将它转换成相应异种站点上的查询。然后,将这些查询
2 10
映射和发送到局部查询处理器。由不同站点返回的结果被集成为全局回答。这
种查询驱动的方法需要
复杂的信息过滤和集成处理,并且与局部数据源上的处
理竞争资源。这种方法是低效的,并且对于频繁的
查询,特别是需要聚集操作
的查询,开销很大。
对于异种数据库集成的传统方法,数据仓库提
供了一个有趣的替代方案。
数据仓库使用更新驱动的方法,而不是查询驱动的方法。这种方法将来自多个
异种源的信息预先集成,并存储在数据仓库中,供直接查询和分析。与联机事
务处理数据库不同
,数据仓库不包含最近的信息。然而,数据仓库为集成的异
种数据库系统带来了高性能,因为数据被拷贝
、预处理、集成、注释、汇总,
并重新组织到一个语义一致的数据存储中。在数据仓库中进行的查询处理
并不
影响在局部源上进行的处理。此外,数据仓库存储并集成历史信息,支持复杂
的多维查询。
这样,建立数据仓库在工业界已非常流行。
1.操作数据库系统与数据仓库的区别由于大多数人都熟悉
商品
关系数据库系统,将数据仓库与之比较,就容易理解什么是数据仓
库。
联机操作
数据库系统的主要任务是执行联机事务和查询处理。这种系统称
为联机事务处理(OLTP)系统。它们
涵盖了一个组织的大部分日常操作,如购
买、库存、制造、银行、工资、注册、记帐等。另一方面,数据
仓库系统在数
据分析和决策方面为用户或“知识工人”提供服务。这种系统可以用不同的格式
组
织和提供数据,以便满足不同用户的形形色色需求。这种系统称为联机分析
处理(OLAP)系统。
OLTP 和OLAP 的主要区别概述如下。
(1)用户和系统的面向性:OLTP
是面向顾客的,用于办事员、客户、和信
息技术专业人员的事务和查询处理。OLAP
是面向市场的,用于知识工人(包
括经理、主管、和分析人员)的数据分析。
(2)数据内容:OLTP
系统管理当前数据。通常,这种数据太琐碎,难以方
便地用于决策。OLAP 系统管理大量历史数据,
提供汇总和聚集机制,并在不
同的粒度级别上存储和管理信息。这些特点使得数据容易用于见多识广的决
策。
(3)数据库设计:通常,OLTP 系统采用实体-
联系(ER)模型和面向应用的
数据库设计。而OLAP
系统通常采用星形或雪花模型和面向主题的数据库设
计。
(4)视图:OLTP 系统主要关
注一个企业或部门内部的当前数据,而不涉及
历史数据或不同组织的数据。相比之下,由于组织的变化,
OLAP 系统常常跨
3 10
越数据库模式的多个版本。OLAP
系统也处理来自不同组织的信息,由多个数
据存储集成的信息。由于数据量巨大,OLAP
数据也存放在多个存储介质上。
(5)、访问模式:OLTP 系统的访问主要由短的、原子事务组成
。这种系统
需要并行控制和恢复机制。然而,对OLAP系统的访问大部分是只读操作(由
于大
部分数据仓库存放历史数据,而不是当前数据),尽管许多可能是复杂的
查询。 OLTP
和OLAP 的其它区别包括数据库大小、操作的频繁程度、性能
度量等。
2.但是,为什么需要一个分离的数据仓库 “既然操作数据库存放了大量数
据”,你注意到,
“为什么不直接在这种数据库上进行联机分析处理,而是另外
花费时间和资源去构造一个分离的数据仓库
?”分离的主要原因是提高两个系统
的性能。操作数据库是为已知的任务和负载设计的,如使用主关键字
索引和散
列,检索特定的记录,和优化“罐装的”查询。另一方面,数据仓库的查询通常
是复杂
的,涉及大量数据在汇总级的计算,可能需要特殊的数据组织、存取方
法和基于多维视图的实现方法。在
操作数据库上处理OLAP 查询,可能会大大
降低操作任务的性能。
此外,操作数据库
支持多事务的并行处理,需要加锁和日志等并行控制和
恢复机制,以确保一致性和事务的强健性。通常,
OLAP 查询只需要对数据记
录进行只读访问,以进行汇总和聚集。如果将并行控制和恢复机制用于这
OLAP 操作,就会危害并行事务的运行,从而大大降低OLTP 系统的吞吐
量。
最后,数据仓库与操作数据库分离是由于这两种系统中数据的结构、内容
和用法都不相同。决策
支持需要历史数据,而操作数据库一般不维护历史数
据。在这种情况下,操作数据库中的数据尽管很丰富
,但对于决策,常常还是
远远不够的。决策支持需要将来自异种源的数据统一(如,聚集和汇总),产<
br>生高质量的、纯净的和集成的数据。相比之下,操作数据库只维护详细的原始
数据(如事务),这
些数据在进行分析之前需要统一。由于两个系统提供很不
相同的功能,需要不同类型的数据,因此需要维
护分离的数据库。
Data warehousing provides
architectures and tools for business executives to
sy
stematically organize, understand, and use
their data to make strategic decisions. A
lar
ge number of organizations have found that
data warehouse systems are valuable tools
in
today's competitive, fast evolving world. In the
last several years, many firms have
spent
millions of dollars in building
enterprise-
wide data warehouses. Many people
feel that with competition mounting in every
ind
ustry, data warehousing is the latest must-
have marketing weapon ——
4 10
a
way to keep customers by learning more about their
needs.
“So you may ask, full of intrigue,
“what exactly is a data warehouse?
Data
warehouses have been defined in many ways, making
it difficult to formulat
e a rigorous
definition. Loosely speaking, a data warehouse
refers to a database that is
maintained
separately from an organization's operational
databases. Data warehouse s
ystems allow for
the integration of a variety of application
systems. They support info
rmation processing
by providing a solid platform of consolidated,
historical data for a
nalysis.
According
to W. H. Inmon, a leading architect in the
construction of data wareho
use systems, “a
data warehouse is a subject-oriented, integrated,
time-
variant, and nonvolatile collection of
data in support of management's decision
makin
g process.
ata warehouse. The four
keywords, subject-oriented, integrated,
time-
variant, and nonvolatile, distinguish
data warehouses from other data repository
syste
ms, such as relational database systems,
transaction processing systems, and file
syste
ms. Let's take a closer look at each of
these key features.
(1).Subject-
oriented:
A data warehouse is organized around major
subjects, such as customer, ven
dor, product,
and sales. Rather than concentrating on the day-
to-
day operations and transaction processing
of an organization, a data warehouse focuse
s
on the modeling and analysis of data for decision
makers. Hence, data warehouses ty
pically
provide a simple and concise view around
particular subject issues by excludin
g data
that are not useful in the decision support
process.
(2) Integrated: A data warehouse is
usually constructed by integrating multiple
he
terogeneous sources, such as relational
databases, flat files, and on-
line transaction
records. Data cleaning and data integration
techniques are applied to e
nsure consistency
in naming conventions, encoding structures,
attribute measures, and
so on.
(3).Time-
variant: Data are stored to provide
information from a historical perspective (e.g.,
the
past 5-
10 years). Every key structure
in the data warehouse contains, either implicitly
or expl
icitly, an element of time.
(4)Nonvolatile: A data warehouse is always a
physically separate store of data tra
5 10
nsformed from the application data
found in the operational environment. Due to this
separation, a data warehouse does not require
transaction processing, recovery, and
co
ncurrency control mechanisms. It usually
requires only two operations in data
accessi
ng: initial loading of data and access
of data.
In sum, a data warehouse is a
semantically consistent data store that serves as
a p
hysical implementation of a decision
support data model and stores the information
on
which an enterprise needs to make strategic
decisions. A data warehouse is also often
viewed as an architecture, constructed by
integrating data from multiple heterogeneou
s
sources to support structured andor ad hoc
queries, analytical reporting, and decisio
n
making.
“OK you now ask, “what, then, is data
warehousing?
Based on the above, we view data
warehousing as the process of constructing and
using data warehouses. The construction of a data
warehouse requires data integratio
n, data
cleaning, and data consolidation. The utilization
of a data warehouse often nec
essitates a
collection of decision support technologies. This
allows “knowledge worke
rs
veniently obtain
an overview of the data, and to make sound
decisions based on infor
mation in the
warehouse. Some authors use the term “data
warehousing
to the process of data warehouse
construction, while the term warehouse DBMS is
use
d to refer to the management and
utilization of data warehouses. We will not make
thi
s distinction here.
“How are
organizations using the information from data
warehouses?
anizations are using this
information to support business decision making
activities, in
cluding:
(1) increasing
customer focus, which includes the analysis of
customer buying pa
tterns (such as buying
preference, buying time, budget cycles, and
appetites for spendi
ng),
(2)
repositioning products and managing product
portfolios by comparing the per
formance of
sales by quarter, by year, and by geographic
regions, in order to fine-
tune production
strategies,
(3) analyzing operations and
looking for sources of profit,
(4) managing
the customer relationships, making environmental
corrections, and
managing the cost of
corporate assets.
Data warehousing is also
very useful from the point of view of
heterogeneous dat
6 10
abase
integration. Many organizations typically collect
diverse kinds of data and main
tain large
databases from multiple, heterogeneous,
autonomous, and distributed infor
mation
sources. To integrate such data, and provide easy
and efficient access to it is hi
ghly
desirable, yet challenging.
Much effort has
been spent in the database industry and research
community tow
ards achieving this goal.
The traditional database approach to
heterogeneous database integration is to buil
d
wrappers and integrators (or mediators) on top of
multiple, heterogeneous databases
. A variety
of data joiner and data blade products belong to
this category. When a quer
y is posed to a
client site, a metadata dictionary is used to
translate the query into quer
ies appropriate
for the individual heterogeneous sites involved.
These queries are then
mapped and sent to
local query processors. The results returned from
the different sit
es are integrated into a
global answer set. This query-
driven approach
requires complex information filtering and
integration processes, and
competes for
resources with processing at local sources. It is
inefficient and potentiall
y expensive for
frequent queries, especially for queries requiring
aggregations.
Data warehousing provides an
interesting alternative to the traditional
approach o
f heterogeneous database integration
described above. Rather than using a
query-
driven approach, data warehousing
employs an update-
driven approach in which
information from multiple, heterogeneous sources
is integra
ted in advance and stored in a
warehouse for direct querying and analysis. Unlike
on-
line transaction processing databases, data
warehouses do not contain the most current
information. However, a data warehouse brings high
performance to the integrated he
terogeneous
database system since data are copied,
preprocessed, integrated, annotate
d,
summarized, and restructured into one semantic
data store. Furthermore, query proc
essing in
data warehouses does not interfere with the
processing at local sources. Mor
eover, data
warehouses can store and integrate historical
information and support com
plex
multidimensional queries. As a result, data
warehousing has become very popula
r in
industry.
1. Differences between operational
database systems and data warehouses
Since
most people are familiar with commercial
relational database systems, it is
easy to
understand what a data warehouse is by comparing
these two kinds of systems
.
The major
task of on-line operational database systems is to
perform on-
7 10
line transaction
and query processing. These systems are called
on-
line transaction processing (OLTP) systems.
They cover most of the day-to-
day operations
of an organization, such as, purchasing,
inventory, manufacturing, ban
king, payroll,
registration, and accounting. Data warehouse
systems, on the other hand
, serve users or
“knowledge workers
Such systems can organize
and present data in various formats in order to
accommod
ate the diverse needs of the different
users. These systems are known as on-
line
analytical processing (OLAP) systems.
The
major distinguishing features between OLTP and
OLAP are summarized as f
ollows.
(1).
Users and system orientation: An OLTP system is
customer-
oriented and is used for transaction
and query processing by clerks, clients, and
infor
mation technology professionals. An OLAP
system is market-
oriented and is used for data
analysis by knowledge workers, including managers,
exe
cutives, and analysts.
(2). Data
contents: An OLTP system manages current data
that, typically, are too
detailed to be easily
used for decision making. An OLAP system manages
large amou
nts of historical data, provides
facilities for summarization and aggregation, and
stores
and manages information at different
levels of granularity. These features make the
d
ata easier for use in informed decision
making.
(3). Database design: An OLTP system
usually adopts an entity-
relationship (ER)
data model and an application -
oriented
database design. An OLAP system typically adopts
either a star or snowflake
model, and a
subject-oriented database design.
(4). View:
An OLTP system focuses mainly on the current data
within an enterpri
se or department, without
referring to historical data or data in different
organizations.
In contrast, an OLAP system
often spans multiple versions of a database
schema, due
to the evolutionary process of an
organization. OLAP systems also deal with
informat
ion that originates from different
organizations, integrating information from many
da
ta stores. Because of their huge volume,
OLAP data are stored on multiple storage
me
dia.
(5). Access patterns: The access
patterns of an OLTP system consist mainly of
sh
ort, atomic transactions. Such a system
requires concurrency control and recovery
me
chanisms. However, accesses to OLAP systems
are mostly read-
8 10
only
operations (since most data warehouses store
historical rather than up-to-
date
information), although many could be complex
queries.
Other features which distinguish
between OLTP and OLAP systems include data
base
size, frequency of operations, and performance
metrics and so on. 2. But, why ha
ve a separate
data warehouse?
“Since operational databases
store huge amounts of data“why not
perform
on-
line analytical processing directly on such
databases instead of spending additional ti
me
and resources to construct a separate data
warehouse?
A major reason for such a separation
is to help promote the high performance of
both systems. An operational database is
designed and tuned from known tasks and
w
orkloads, such as indexing and hashing using
primary keys, searching for particular
re
cords, and optimizing “canned queries. On
the other hand, data warehouse queries ar
e
often complex. They involve the computation of
large groups of data at summarized
levels, and
may require the use of special data organization,
access, and implementati
on methods based on
multidimensional views. Processing OLAP queries in
operationa
l databases would substantially
degrade the performance of operational tasks.
Moreover, an operational database supports the
concurrent processing of several t
ransactions.
Concurrency control and recovery mechanisms, such
as locking and loggi
ng, are required to ensure
the consistency and robustness of transactions. An
OLAP qu
ery often needs read-
only access of
data records for summarization and aggregation.
Concurrency control a
nd recovery mechanisms,
if applied for such OLAP operations, may
jeopardize the ex
ecution of concurrent
transactions and thus substantially reduce the
throughput of an
OLTP system.
Finally,
the separation of operational databases from data
warehouses is based on
the different
structures, contents, and uses of the data in
these two systems. Decision
support requires
historical data, whereas operational databases do
not typically mainta
in historical data. In
this context, the data in operational databases,
though abundant, i
s usually far from complete
for decision making. Decision support requires
consolidat
ion (such as aggregation and
summarization) of data from heterogeneous sources,
resu
lting in high quality, cleansed and
integrated data. In contrast, operational
databases c
ontain only detailed raw data, such
as transactions, which need to be consolidated
bef
ore analysis. Since the two systems provide
quite different functionalities and require
9
10
different kinds of data, it is
necessary to maintain separate databases.
10 10