# Semi-Supervised Domain Adaptation with Non-Parametric Copulas

###### Abstract

A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

## 1 Introduction

When humans address a new learning problem, they often use knowledge acquired
while learning different but related tasks in the past. For example, when
learning a second language, people rely on grammar rules and word derivations
from their mother tongue. This is called *language transfer*
[19]. However, in machine learning, most of the
traditional methods are not able to exploit similarities between different
learning tasks. These techniques only achieve good performance when the data
distribution is stable between the training and test phases. When this is not the
case, it is necessary to a) collect and label additional data and b) re-run
the learning algorithm. However, these operations are unaffordable in many
practical scenarios.

*Domain adaptation*, *transfer learning* or *multitask learning*
frameworks [17, 2, 5, 13] address these issues by, first, building a notion of
*task relatedness* and, second, providing mechanisms to *transfer
knowledge* between similar tasks. Generally, we are interested in improving
predictive performance on a *target task* by using knowledge obtained when
solving another related *source task*. Domain adaptation methods are
concerned with *what* knowledge can be shared between different tasks,
*how* this knowledge can be transferred, and *when* it should or should not
be transferred, so as to avoid negative effects [4].

In this work, we study semi-supervised domain adaptation for regression tasks. In these problems, the object of interest (the mechanism that maps a set of inputs to a set of outputs) can be stated as a conditional density function. The data available for solving each learning task is assumed to be sampled from modified versions of a common multivariate distribution. Therefore, we are interested in sharing the “common pieces” of this generative model between tasks, and in using the data from each individual task to detect, learn and adapt the varying parts of the model. To do so, we must find a decomposition of multivariate distributions into simpler building blocks that can be studied separately across different domains. The theory of copulas provides such representations [18].

Copulas are statistical tools that factorize multivariate distributions into
the product of its marginals and a function that captures any possible form of
dependence among them. This function is referred to as the copula, and it links
the marginals together into the joint multivariate model. First introduced by
Sklar [22], copulas have been successfully used in a wide range of
applications, including finance, time-series analysis and the modeling of
natural phenomena [12]. Recently, a new family of copulas named
*vines* has gained interest in the statistics literature [1].
These are methods that factorize multivariate densities into a product of
marginal distributions and bivariate copula functions. Each of these factors
corresponds to one of the building blocks that we assume either constant or
varying across different learning domains.

The contributions of this paper are two-fold. First, we propose a non-parametric vine copula model which can be used as a high-dimensional density estimator. Second, by making use of this method, we present a new framework to address semi-supervised domain adaptation problems, whose performance is validated in a series of experiments with real-world data and competing state-of-the-art techniques.

The rest of the paper is organized as follows: Section 2 provides a brief introduction to copulas, and describes a non-parametric estimator for the bivariate case. Section 3 introduces a novel non-parametric vine copula model, which is built from the described bivariate non-parametric copulas. Section 4 describes a new framework to address semi-supervised domain adaptation problems using the proposed vine method. Finally, Section 5 describes a series of experiments that validate the proposed approach on regression problems with real-world data.

## 2 Copulas

When the components of a random vector $\mathbf{x} = (x_1, \ldots, x_d)$ are jointly independent, their density function can be written as

$$p(x_1, \ldots, x_d) = \prod_{i=1}^{d} p(x_i). \tag{1}$$

This equality does not hold when $x_1, \ldots, x_d$ are not independent.
Nevertheless, the differences can be corrected if we multiply the right-hand
side of (1) by a specific function that fully describes any
possible form of dependence between $x_1, \ldots, x_d$. This function is called the
*copula* of $p(x_1, \ldots, x_d)$ [18] and satisfies

$$p(x_1, \ldots, x_d) = \left[\prod_{i=1}^{d} p(x_i)\right] c(P_1(x_1), \ldots, P_d(x_d)). \tag{2}$$

The copula $c(u_1, \ldots, u_d)$ is the joint density of $u_i = P_i(x_i)$, $i = 1, \ldots, d$, where $P_i$ is the marginal cdf of the random variable $x_i$. This density has uniform marginals, since $P(x) \sim \mathcal{U}[0, 1]$ for any random variable $x$ with continuous cdf $P$. That is, when we apply the transformation $P_i$ to $x_i$, we are eliminating all information about the marginal distributions. Therefore, the copula captures any distributional pattern that does not depend on their specific form, or, in other words, all the information regarding the dependencies between $x_1, \ldots, x_d$. When $x_1, \ldots, x_d$ are continuous, the copula is unique [22]. However, infinitely many multivariate models share the same underlying copula function, as illustrated in Figure 1. The main advantage of copulas is that they allow us to model separately the marginal distributions and the dependencies linking them together to produce the multivariate model subject of study.
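The uniformity of $P(x)$ (the probability integral transform) is easy to verify numerically. The following sketch, with an exponential marginal and `scipy` as illustrative choices of our own, maps samples through their own cdf and checks that the result behaves like a uniform variable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # samples with a non-uniform marginal
u = stats.expon(scale=2.0).cdf(x)              # probability integral transform u = P(x)

# u is approximately uniform on [0, 1]: its mean and variance
# should be close to 1/2 and 1/12, respectively.
print(u.mean(), u.var())
```

The transformed sample carries no information about the exponential marginal; only the dependence structure (here trivial, since there is a single variable) would survive such a transformation.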

Given a sample $\{\mathbf{x}_i\}_{i=1}^{n}$ from (2), we can estimate $p(x_1, \ldots, x_d)$ as follows. First, we construct estimates of the marginal pdfs, $\hat p(x_1), \ldots, \hat p(x_d)$, which also provide estimates of the corresponding marginal cdfs, $\hat P_1, \ldots, \hat P_d$. These cdf estimates are used to map the data to the $d$-dimensional unit hyper-cube. The transformed data are then used to obtain an estimate $\hat c$ for the copula of $p(x_1, \ldots, x_d)$. Finally, (2) is approximated as

$$\hat p(x_1, \ldots, x_d) = \left[\prod_{i=1}^{d} \hat p(x_i)\right] \hat c(\hat P_1(x_1), \ldots, \hat P_d(x_d)). \tag{3}$$

The estimation of the marginal pdfs and cdfs can be implemented in a non-parametric manner by using one-dimensional kernel density estimates. By contrast, it is common practice to assume a parametric model for the estimation of the copula function. Some examples of parametric copulas are the Gaussian, Gumbel, Frank, Clayton or Student copulas [18]. Nevertheless, real-world data often exhibit complex dependencies which cannot be correctly described by these parametric copula models. This lack of flexibility of parametric copulas is illustrated in Figure 2. As an alternative, we propose to approximate the copula function in a non-parametric manner. Kernel density estimates can also be used to generate non-parametric approximations of copulas, as described in [8]. The following section reviews this method for the two-dimensional case.

### 2.1 Non-parametric Bivariate Copulas

We now elaborate on how to non-parametrically estimate the copula of a given bivariate density $p(x, y)$. Recall that this density can be factorized as the product of its marginals and its copula,

$$p(x, y) = p(x)\, p(y)\, c(P(x), P(y)). \tag{4}$$

Additionally, given a sample $\{(x_i, y_i)\}_{i=1}^{n}$ from $p(x, y)$, we can obtain a pseudo-sample from its copula by mapping each observation to the unit square using estimates of the marginal cdfs, namely

$$(u_i, v_i) = (\hat P(x_i), \hat P(y_i)), \quad i = 1, \ldots, n. \tag{5}$$

These are approximate observations from the uniformly distributed random variables $u = P(x)$ and $v = P(y)$, whose joint density is the copula function $c(u, v)$. We could try to approximate this density function by placing Gaussian kernels on each observation $(u_i, v_i)$. However, the resulting density estimate would have support on $\mathbb{R}^2$, while the support of $c(u, v)$ is the unit square. A solution is to perform the density estimation in a transformed space. For this, we select some continuous distribution with support on $\mathbb{R}$, strictly positive density $\phi$, cumulative distribution $\Phi$ and quantile function $\Phi^{-1}$. Let $z$ and $w$ be two new random variables given by $z = \Phi^{-1}(u)$ and $w = \Phi^{-1}(v)$. Then, the joint density of $z$ and $w$ is

$$p(z, w) = c(\Phi(z), \Phi(w))\, \phi(z)\, \phi(w). \tag{6}$$

The copula of this new density is identical to the copula of (4), since the performed transformations are marginal-wise. The support of (6) is now $\mathbb{R}^2$; therefore, we can now approximate it with Gaussian kernels. Let $z_i = \Phi^{-1}(u_i)$ and $w_i = \Phi^{-1}(v_i)$, for $i = 1, \ldots, n$. Then,

$$\hat p(z, w) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{N}(z, w \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}), \tag{7}$$

where $\mathcal{N}(z, w \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma})$ is a two-dimensional Gaussian density with mean vector $\boldsymbol{\mu}_i = (z_i, w_i)^\top$ and covariance matrix $\boldsymbol{\Sigma}$. For convenience, we select $\phi$, $\Phi$ and $\Phi^{-1}$ to be the standard Gaussian pdf, cdf and quantile function, respectively. Finally, the copula density is approximated by combining (6) with (7):

$$\hat c(u, v) = \frac{\hat p(\Phi^{-1}(u), \Phi^{-1}(v))}{\phi(\Phi^{-1}(u))\, \phi(\Phi^{-1}(v))}. \tag{8}$$
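The full pipeline of (5)-(8) can be sketched in a few lines. This is an illustrative implementation under two assumptions of our own: the marginal cdfs are estimated with rescaled empirical ranks, and the kernel bandwidth $\boldsymbol{\Sigma}$ is chosen by `scipy`'s default Scott's rule rather than any scheme from the paper:

```python
import numpy as np
from scipy import stats

def fit_copula_kde(x, y):
    """Non-parametric bivariate copula estimate: pseudo-observations (eq. 5),
    Gaussian quantile transform, Gaussian KDE in (z, w) space (eq. 7),
    and back-transformation to the unit square (eq. 8)."""
    n = len(x)
    u = stats.rankdata(x) / (n + 1)                 # eq. (5): approximate P(x_i)
    v = stats.rankdata(y) / (n + 1)
    z, w = stats.norm.ppf(u), stats.norm.ppf(v)     # map to R^2 via Phi^{-1}
    kde = stats.gaussian_kde(np.vstack([z, w]))     # eq. (7), bandwidth by Scott's rule

    def copula_density(u_eval, v_eval):             # eq. (8)
        z_e, w_e = stats.norm.ppf(u_eval), stats.norm.ppf(v_eval)
        return kde(np.vstack([z_e, w_e])) / (stats.norm.pdf(z_e) * stats.norm.pdf(w_e))

    return copula_density

rng = np.random.default_rng(2)
x = rng.normal(size=2000)
y = 0.8 * x + 0.6 * rng.normal(size=2000)           # positively dependent pair
c_hat = fit_copula_kde(x, y)
# Under positive dependence, the copula density at (0.5, 0.5) exceeds 1
# (it would be exactly 1 everywhere under independence).
print(c_hat(np.array([0.5]), np.array([0.5])))
```

Note that since both marginals are replaced by ranks, the estimate depends only on the dependence structure of the sample, as a copula should.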

## 3 Regular Vines

The method described above can be generalized to the estimation of copulas of
more than two random variables. However, although kernel density estimates can
be successful in spaces of one or two dimensions, as the number of variables
increases, these methods become significantly affected by the curse of
dimensionality and tend to overfit the training data. Additionally, for
addressing domain adaptation problems, we are interested in factorizing these
high-dimensional copulas into simpler building blocks transferable across
learning domains. These two drawbacks can be addressed by recent methods in
copula modelling called *vines* [1]. Vines decompose any
high-dimensional copula density as a product of bivariate copula densities that
can be approximated using the non-parametric model described above. These
bivariate copulas (as well as the marginals) correspond to the simple building
blocks that we plan to transfer from one learning domain to another. Different
types of vines have been proposed in the literature. Some examples are
*canonical vines*, *D-vines* or *regular vines*
[16, 1]. In this work, we focus on regular vines (R-vines), since
they are the most general of these models.

An R-vine for a probability density with variable set is formed by a set of undirected trees , each of them with corresponding set of nodes and set of edges , where for . Any edge has associated three sets called the conditioned, conditioning and constraint sets of , respectively. Initially, is inferred from a complete graph with a node associated with each element of ; for any joining nodes and , and . The trees are constructed so that each is formed by joining two edges which share a common node, for . The new edge has conditioned, conditioning and constraint sets given by , , , where is the symmetric difference operator. Figure 3 illustrates this procedure for an R-vine with 4 variables.