Data set

Wangchao Baike · Author unknown · 2009-12-04

Data Set

(From Wikipedia, the free encyclopedia)

A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the data set in question and lists values for each of the variables, such as the height and weight of an object or values of random numbers. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.
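The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration: the variable names (`name`, `height_cm`, `weight_kg`), the values, and the helper `column` are made up for the example.

```python
# Each dict is one row (one member of the data set);
# each key is one column (one variable).
rows = [
    {"name": "A", "height_cm": 172.0, "weight_kg": 68.5},
    {"name": "B", "height_cm": 165.5, "weight_kg": 59.0},
    {"name": "C", "height_cm": 180.0, "weight_kg": 81.2},
]


def column(rows, variable):
    """Return every value (datum) of one variable, in row order."""
    return [row[variable] for row in rows]


heights = column(rows, "height_cm")
n_members = len(rows)  # the number of rows equals the number of members
```

Reading down a column gives all values of one variable; reading across a row gives all values for one member.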

Historically, the term originated in the mainframe field, where it had a well-defined meaning, very close to that of a contemporary computer file. This topic is not covered here.

In the simplest case, there is only one variable, and then the data set consists of a single column of values, often represented as a list. In spite of the name, such a univariate data set is not a set in the usual mathematical sense, since a given value may occur multiple times. Normally the order does not matter, and then the collection of values may be considered to be a multiset rather than an (ordered) list.
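The difference between a set, a multiset, and a list can be made concrete with Python's standard `collections.Counter`, used here as a stand-in for a multiset. The sample values are invented for the illustration.

```python
from collections import Counter

# A univariate data set: a single column of values with repeats.
values = [3, 1, 3, 2, 1, 3]

as_set = set(values)           # a mathematical set: repeats are lost
as_multiset = Counter(values)  # a multiset: repeats are counted, order is ignored

# The same values in a different order form the same multiset...
same_multiset = Counter([1, 1, 2, 3, 3, 3]) == as_multiset
# ...but not the same list, because a list is ordered.
same_list = [1, 1, 2, 3, 3, 3] == values
```

Here `as_set` keeps only three distinct values, while `as_multiset` still records that the value 3 occurs three times.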

The values may be numbers, such as real numbers or integers, for example representing a person's height in centimeters, but may also be nominal data (i.e., not consisting of numerical values), for example representing a person's ethnicity. More generally, values may be of any of the kinds described as a level of measurement. For each variable, the values will normally all be of the same kind. However, there may also be "missing values", which need to be indicated in some way.
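As a minimal sketch of the points above, the following Python fragment holds one numerical variable and one nominal variable, and marks a missing value with `None`. The choice of `None` as the marker, the variable names, and the data are assumptions of this example; other conventions (such as NaN or a sentinel code) are equally common.

```python
from statistics import mean

# Numerical variable; None marks a missing value.
heights_cm = [172.0, None, 165.5, 180.0]
# Nominal variable: labels with no numerical meaning.
eye_colour = ["brown", "blue", "brown", "green"]

# Missing values must be handled explicitly before computing statistics.
present = [v for v in heights_cm if v is not None]
n_missing = len(heights_cm) - len(present)
mean_height = mean(present)

# For nominal data only operations such as counting categories make sense.
categories = sorted(set(eye_colour))
```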

In statistics, data sets usually come from actual observations obtained by sampling a statistical population, and each row corresponds to the observations on one element of that population. Data sets may further be generated by algorithms for the purpose of testing certain kinds of software. Some modern statistical analysis software such as PSPP still present their data in the classical data set fashion.
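Generating a data set by algorithm for software testing might look like the sketch below. The function name `make_test_rows`, the two variables, and the distribution parameters are hypothetical; the fixed seed simply makes the generated rows reproducible from one test run to the next.

```python
import random


def make_test_rows(n, seed=0):
    """Generate n synthetic rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        {
            "height_cm": round(rng.gauss(170.0, 10.0), 1),
            "weight_kg": round(rng.gauss(70.0, 12.0), 1),
        }
        for _ in range(n)
    ]


rows_a = make_test_rows(5, seed=42)
rows_b = make_test_rows(5, seed=42)  # identical to rows_a
```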

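A dataset stores data in a disconnected cache. The structure of a dataset is similar to that of a relational database: it exposes a hierarchical object model of tables, rows, and columns. In addition, it contains the constraints and relationships defined for the dataset.

Datasets can be typed or untyped. A typed dataset is a dataset that is first derived from the base class and then uses the information in an XML Schema file (.xsd file) to generate a new class. The information in the schema (tables, columns, and so on) is generated and compiled into this new dataset class as a set of first-class objects and properties.

Because a typed DataSet class inherits from the base DataSet class, the typed class takes on all of the functionality of the DataSet class and can be used with methods that take an instance of the DataSet class as a parameter.

An untyped dataset, by contrast, has no corresponding built-in schema. Like a typed dataset, an untyped dataset contains tables, columns, and so on, but these are exposed only as collections.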

(Excerpted from WordPress Chinese)
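The typed/untyped distinction described above can be illustrated with a loose Python analogy. This is not the .NET DataSet API: the classes `DataSet` and `CustomersDataSet`, the `customers` table, and the helper `count_rows` are hypothetical, and the typed class is written by hand here rather than generated from an .xsd file.

```python
class DataSet:
    """Untyped: tables are reachable only through a collection, by name."""

    def __init__(self):
        self.tables = {}  # table name -> list of row dicts

    def add_table(self, name):
        self.tables[name] = []
        return self.tables[name]


class CustomersDataSet(DataSet):
    """Typed: a subclass that exposes the schema as named members."""

    def __init__(self):
        super().__init__()
        self.add_table("customers")

    @property
    def customers(self):
        return self.tables["customers"]


def count_rows(ds: DataSet) -> int:
    """Accepts any DataSet, so a typed subclass works here as well."""
    return sum(len(table) for table in ds.tables.values())


untyped = DataSet()
untyped.add_table("customers").append({"id": 1, "name": "Ada"})

typed = CustomersDataSet()
typed.customers.append({"id": 1, "name": "Ada"})  # named access, no string lookup
```

With the typed class a misspelled table name fails as a missing attribute, whereas the untyped form depends on a string key into a collection. Both objects can be passed to `count_rows`, mirroring the point that a typed dataset can be used wherever the base class is expected.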

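Delphi 4 has four kinds of standard dataset components: TTable, TQuery, TStoredProc, and TClientDataSet. All of them descend from a common base class, TDataSet. Of the four, only TClientDataSet descends directly from TDataSet; the immediate ancestor of TTable, TQuery, and TStoredProc is TDBDataSet, whose ancestor is TBDEDataSet, which in turn descends from TDataSet. The inheritance relationships among these classes are shown in Figure 6.1.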
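The inheritance chain described in the paragraph above can be mirrored with empty Python classes. The class names are taken from the text; everything else, including the helper `ancestry`, is an illustrative stand-in and not Delphi code.

```python
# Empty stand-ins that reproduce only the inheritance relationships.
class TDataSet: ...
class TBDEDataSet(TDataSet): ...
class TDBDataSet(TBDEDataSet): ...
class TTable(TDBDataSet): ...
class TQuery(TDBDataSet): ...
class TStoredProc(TDBDataSet): ...
class TClientDataSet(TDataSet): ...


def ancestry(cls):
    """List a class and its ancestors, nearest first, omitting Python's object."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


table_chain = ancestry(TTable)
client_chain = ancestry(TClientDataSet)
```

`table_chain` passes through TDBDataSet and TBDEDataSet before reaching TDataSet, while `client_chain` reaches TDataSet in a single step.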

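TDataSet is the abstract base class of all datasets, and most of its properties and methods are virtual or abstract. A virtual method is one that a descendant class can override. An abstract method is one that is declared but not defined; a descendant class must supply a definition before the method can be called, and different descendants can define it differently.

Because TDataSet contains abstract methods, you cannot create an instance of it directly; doing so causes a run-time error.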
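The distinction between virtual and abstract methods can be sketched with Python's standard `abc` module. The classes `DataSetBase` and `ListDataSet` and the methods `open` and `fetch` are hypothetical names for this illustration, not Delphi's API. Note one difference in behavior: Python rejects the instantiation itself with a `TypeError`.

```python
from abc import ABC, abstractmethod


class DataSetBase(ABC):
    def __init__(self):
        self.active = False

    def open(self):
        """'Virtual': has a default body that a subclass may override."""
        self.active = True

    @abstractmethod
    def fetch(self, index):
        """'Abstract': declared here, defined only in subclasses."""


class ListDataSet(DataSetBase):
    def __init__(self, records):
        super().__init__()
        self._records = list(records)

    def fetch(self, index):
        # Concrete definition; another subclass could define it differently.
        return self._records[index]


# Instantiating the abstract base fails at run time.
try:
    DataSetBase()
    error = None
except TypeError as exc:
    error = exc

ds = ListDataSet(["r0", "r1"])
ds.open()  # inherited default implementation
```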

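By function, the properties and methods of TDataSet fall into the following groups: opening and closing datasets, navigating records, editing data, bookmark management, connection control, field access, record buffer management, filtering, and events.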