On The Combinatorics of Sample Compression Schemes

Institution: | Faculty of Graduate Studies and Research, University of Regina |
---|---|

Department: | Mathematics |

Degree: | MSc |

Year: | 2014 |

Record ID: | 2029868 |

Full text PDF: | http://hdl.handle.net/10294/5391 |

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematics, University of Regina. vi, 76 p. A sample compression scheme of size k for a concept class C is a pair of functions (f; g) called the compression function and the reconstruction function. The functions have the property that for any sample S consistent with a concept in C, f compresses S to some subset of S, for which g returns a set of domain points, labelled consistently with the original sample S. The sample compression scheme is called labelled if the compression sets are labelled subsets of S and unlabelled if the compression sets are subsets of the instance set of S. M. Warmuth and S. Floyd have shown that if a sample compression scheme of size equal to the VC dimension of a concept class C exists then C can be PAC learned by some learning algorithm. Although it is already known that any concept class of nite VC dimension is PAC learnable, the existence of a sample compression scheme of size equal to the VC dimension improves the sample complexity of learning some concept classes. This leads to an important conjecture, rst proposed by M. Warmuth and S. Floyd: does there always exist a sample compression scheme of size O(d) for a concept class C with VC dimension d. This thesis examines a modi cation of sample compression schemes, speci cally, for a concept class C we de ne a sequence-based sample compression scheme for C as a pair of functions (f ; g ) where the items we compress to are now sequences instead of sets. Here we can di erentiate between labelled and unlabelled sequence-based sam- ple compression schemes in a similar fashion as with standard sample compression schemes. We look at properties of sequence-based sample compression schemes and also discuss a few sequence-based sample compression scheme algorithms and deter- mine how they improve compression bounds over the original set-based compression scheme algorithms. Finally we discuss connections between set and sequence-based sample compression schemes and design theory. ii