Vapnik–Chervonenkis dimension
In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the size (capacity, complexity, expressive power, richness, or flexibility) of a class of sets. The notion can be extended to classes of binary functions. It is defined as the cardinality of the largest set of points that the algorithm can shatter, meaning that for at least one configuration of that many data points, the algorithm can learn a perfect classifier for every possible labeling of them. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.

Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree polynomial: if the polynomial evaluates above zero, the point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so it can fit a given set of training points well. But precisely because it is so wiggly, one can expect the classifier to make errors on other points. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This function may not fit the training set well, because it has a low capacity. This notion of capacity is made rigorous below.
Definitions
VC dimension of a set-family
Let H be a set family (a set of sets) and C a set. Their intersection is defined as the following set family: H\cap C:=\{h\cap C\mid h\in H\}. We say that a set C is shattered by H if H\cap C contains all of the subsets of C, i.e.: |H\cap C|=2^{|C|}. The VC dimension D of H is the cardinality of the largest set that is shattered by H. If arbitrarily large sets can be shattered, the VC dimension is \infty.
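To make this definition concrete, the following is a minimal brute-force sketch in Python (the helper names shatters and vc_dimension are illustrative, not from any standard library): it enumerates subsets of a finite ground set and reports the largest one whose every subset appears as a trace h\cap C.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def shatters(H, C):
    """True iff {h & C : h in H} contains every subset of C."""
    traces = {frozenset(h) & frozenset(C) for h in H}
    return all(sub in traces for sub in powerset(C))

def vc_dimension(H, ground_set):
    """Largest |C|, C a subset of ground_set, that H shatters (finite case only)."""
    best = 0
    for C in powerset(ground_set):
        if len(C) > best and shatters(H, C):
            best = len(C)
    return best

# Example: the "prefix" family over {1,2,3} shatters any single point
# but no pair, so its VC dimension is 1.
H = [set(), {1}, {1, 2}, {1, 2, 3}]
print(vc_dimension(H, {1, 2, 3}))  # -> 1
```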
VC dimension of a classification model
A binary classification model f with some parameter vector \theta is said to shatter a set of generally positioned data points if, for every assignment of labels to those points, there exists a \theta such that the model f makes no errors when evaluating that set of data points. The VC dimension of a model f is the maximum number of points that can be arranged so that f shatters them. More formally, it is the maximum cardinal D such that there exists a generally positioned data point set of cardinality D that can be shattered by f.
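As an illustration of shattering by a parametric model (tying back to the polynomial-thresholding example in the introduction), the sketch below brute-forces all labelings of a small point set on the real line and asks whether the sign of a degree-d polynomial can realize each one. The function name and the use of a least-squares fit are illustrative choices, not part of any standard definition; when the fit fails to reproduce a labeling this is only a heuristic indication, although in the four-point case below the negative answer is also forced by the fact that a quadratic cannot change sign three times.

```python
import itertools
import numpy as np

def shattered_by_poly_threshold(points, degree):
    """Check every +/-1 labeling of `points` against sign(p(x)), where p is the
    degree-`degree` polynomial fitted to those labels (exact interpolation when
    len(points) <= degree + 1, least squares otherwise)."""
    points = np.asarray(points, dtype=float)
    for labels in itertools.product([-1.0, 1.0], repeat=len(points)):
        coeffs = np.polyfit(points, labels, degree)   # fit p to the target labels
        preds = np.sign(np.polyval(coeffs, points))   # classify by sign(p(x))
        if not np.array_equal(preds, np.asarray(labels)):
            return False
    return True

print(shattered_by_poly_threshold([0.0, 1.0, 2.0], degree=2))       # True: 3 points shattered
print(shattered_by_poly_threshold([0.0, 1.0, 2.0, 3.0], degree=2))  # False: 4 points exceed the capacity
```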
Examples
Uses
In statistical learning theory
The VC dimension can predict a probabilistic upper bound on the test error of a classification model. Vapnik proved that, with probability 1-\eta, the test error (i.e., the risk with the 0–1 loss function) of a classifier on data drawn i.i.d. from the same distribution as the training set exceeds the training error by at most \sqrt{\frac{1}{N}\left[D\left(\log\left(\frac{2N}{D}\right)+1\right)-\log\left(\frac{\eta}{4}\right)\right]}, where D is the VC dimension of the classification model, 0<\eta\le 1, and N is the size of the training set (restriction: this formula is valid when D\ll N; when D is larger, the test error may be much higher than the training error, due to overfitting).

The VC dimension also appears in sample-complexity bounds. A space of binary functions with VC dimension D can be learned with N=\Theta\left(\frac{D+\ln\frac{1}{\delta}}{\varepsilon}\right) samples, where \varepsilon is the learning error and \delta is the failure probability. Thus, the sample complexity is a linear function of the VC dimension of the hypothesis space.
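To make these two bounds tangible, here is a small numerical sketch (function names are illustrative) that plugs sample values into the generalization bound and the sample-complexity estimate above; it also illustrates why the first bound is only informative when D\ll N.

```python
import math

def vc_generalization_gap(D, N, eta):
    """sqrt( (D*(log(2N/D)+1) - log(eta/4)) / N ): the amount by which the test
    error can exceed the training error, with probability at least 1 - eta."""
    return math.sqrt((D * (math.log(2 * N / D) + 1) - math.log(eta / 4)) / N)

def pac_sample_complexity(D, eps, delta):
    """Order-of-magnitude sample count (D + ln(1/delta)) / eps; constants omitted."""
    return (D + math.log(1 / delta)) / eps

print(round(vc_generalization_gap(D=10, N=10_000, eta=0.05), 3))     # ~0.095: small gap when D << N
print(round(vc_generalization_gap(D=5_000, N=10_000, eta=0.05), 3))  # ~1.093: gap > 1, bound is vacuous
print(round(pac_sample_complexity(D=10, eps=0.01, delta=0.05)))      # ~1300 samples
```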
In computational geometry
The VC dimension is one of the critical parameters in the size of ε-nets, which determines the complexity of approximation algorithms based on them; range sets without finite VC dimension may not have finite ε-nets at all.
Bounds
Examples of VC Classes
VC dimension of a finite projective plane
A finite projective plane of order n is a collection of n^2+n+1 sets (called "lines") over n^2+n+1 elements (called "points"), for which: each line contains exactly n+1 points; each point is contained in exactly n+1 lines; any two distinct points are contained together in exactly one line; and any two distinct lines intersect in exactly one point. The VC dimension of a finite projective plane is 2. Proof: (a) For each pair of distinct points, there is one line that contains both of them, there are lines that contain exactly one of them, and there are lines that contain neither, so every set of size 2 is shattered. (b) To shatter a set of three distinct points, some line x would have to contain all three and some other line y would have to contain exactly two of them; but then x and y would intersect in two points, which is contrary to the definition of a projective plane. Hence, no set of size 3 is shattered.
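The claim can be checked mechanically for the smallest case, the Fano plane (the projective plane of order n = 2, with 7 points and 7 lines). The brute-force shatters helper below mirrors the set-family sketch given earlier and is redefined here so the snippet runs on its own.

```python
from itertools import combinations

# The seven lines of the Fano plane over points 1..7.
fano_lines = [
    {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
    {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6},
]

def shatters(H, C):
    """True iff the traces {h & C : h in H} include every subset of C."""
    traces = {frozenset(h) & frozenset(C) for h in H}
    return all(frozenset(s) in traces
               for r in range(len(C) + 1) for s in combinations(C, r))

points = set().union(*fano_lines)
print(all(shatters(fano_lines, C) for C in combinations(points, 2)))  # True: every pair is shattered
print(any(shatters(fano_lines, C) for C in combinations(points, 3)))  # False: no triple is shattered
```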
VC dimension of a boosting classifier
Suppose we have a base class B of simple classifiers, whose VC dimension is D. We can construct a more powerful classifier by combining several different classifiers from B; this technique is called boosting. Formally, given T classifiers h_1,\ldots,h_T\in B and a weight vector w\in\mathbb{R}^T, we can define the following classifier: f(x)=\operatorname{sgn}\left(\sum_{t=1}^T w_t h_t(x)\right). The VC dimension of the set of all such classifiers (for all selections of T classifiers from B and a weight vector w\in\mathbb{R}^T), assuming T,D\ge 3, is at most T(D+1)\cdot\left(3\log(T(D+1))+2\right).
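A hedged sketch of this construction (names are illustrative): the combined classifier takes the sign of the weighted vote of T base classifiers, and the quoted upper bound is evaluated for sample values of T and D. The base of the logarithm is taken as natural here, which is an assumption about the convention used in the quoted formula.

```python
import math

def boosted_classifier(base_classifiers, weights):
    """Return f(x) = sign(sum_t w_t * h_t(x)), with each h_t mapping to {-1, +1}."""
    def f(x):
        score = sum(w * h(x) for w, h in zip(weights, base_classifiers))
        return 1 if score >= 0 else -1
    return f

def boosting_vc_upper_bound(T, D):
    """Evaluate T*(D+1)*(3*log(T*(D+1)) + 2), the quoted bound (T, D >= 3)."""
    m = T * (D + 1)
    return m * (3 * math.log(m) + 2)

# Example: three decision stumps on the real line combined with equal weights.
stumps = [lambda x: 1 if x > 0 else -1,
          lambda x: 1 if x > 1 else -1,
          lambda x: 1 if x > 2 else -1]
f = boosted_classifier(stumps, weights=[1.0, 1.0, 1.0])
print([f(x) for x in [-1.0, 0.5, 1.5, 2.5]])       # [-1, -1, 1, 1]
print(round(boosting_vc_upper_bound(T=10, D=3)))   # ~523: grows roughly like T*D*log(T*D)
```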
VC dimension of a neural network
A neural network is described by a directed acyclic graph G(V,E), where: the nodes V are simple computation cells, the edges E are weighted connections between them, the inputs of the network enter at the source nodes of the graph, its output is read at a sink node, and each internal node applies an activation function to the weighted sum of the values on its incoming edges. The VC dimension of a neural network is bounded as follows: if the activation function is the sign function (linear threshold units) and the weights may be arbitrary real numbers, the VC dimension is at most O(|E|\cdot\log(|E|)); if the weights are taken from a finite set (for example, fixed-precision numbers), the VC dimension is at most O(|E|).
Generalizations
The VC dimension is defined for spaces of binary functions (functions to {0,1}). Several generalizations have been suggested for spaces of non-binary functions.
Footnotes