---
title: "module util::Sampling"
id: Sampling
slug: /Library/util/Sampling
---

<div class="theme-doc-version-badge badge badge--secondary">rascal-Not specified</div>

Utilities to randomly select smaller datasets from larger datasets.

#### Usage

```rascal
import util::Sampling;
```

#### Dependencies
```rascal
import util::Math;
import Map;
import List;
import Set;
```

#### Description


Sampling is important when analysis algorithms do not scale to the size of
the original corpus, or when you need to train an analysis on a representative
subset without overfitting on the entire corpus. These sampling functions all
assume that a uniformly random selection is required.


## function sample {#util-Sampling-sample}

Reduce the size of a set by selecting a uniformly distributed sample.

```rascal
set[&T] sample(set[&T] corpus, int target)
```


A uniform subset is computed by iterating over the set and keeping each element
with a probability of `1/(size(corpus) / target)`; all other elements are skipped.
This quickly generates a new set with an expected size of `target`, although the
actual size is usually a little smaller or larger.
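
The selection scheme can be sketched as follows. This is only an illustration and not the library's implementation: the name `bernoulliSample` is hypothetical, and the sketch assumes that each element is kept independently with probability `target / size(corpus)`, using `arbReal` and `toReal` from `util::Math`.

```rascal
import util::Math; // arbReal, toReal
import Set;        // size

// Hypothetical helper, not part of util::Sampling: keeps each element
// independently with probability target / size(corpus).
set[&T] bernoulliSample(set[&T] corpus, int target) {
  // Guard against division by zero and non-positive targets.
  if (corpus == {} || target <= 0) {
    return {};
  }
  real keep = toReal(target) / toReal(size(corpus));
  // arbReal() yields a value in [0.0, 1.0), so the condition holds with probability `keep`.
  return { e | e <- corpus, arbReal() < keep };
}
```

Because every element is decided independently, the size of the result varies from call to call around `target`, which is why repeated calls return sets of slightly different sizes.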

#### Examples



```rascal-shell 
rascal>import util::Sampling;
ok
rascal>sample({"a","b","c","e","f","g","h","i","j","k"}, 4)
set[str]: {"b","e","f","g","h"}
rascal>sample({"a","b","c","e","f","g","h","i","j","k"}, 4)
set[str]: {"c","e","f","h","i"}
rascal>sample({"a","b","c","e","f","g","h","i","j","k"}, 4)
set[str]: {"b","c","f","h","k"}
```

## function sample {#util-Sampling-sample}

Reduce the length of a list by selecting a uniformly distributed sample.

```rascal
list[&T] sample(list[&T] corpus, int target)
```


The random selection of elements does not change their initial order in the list.
A uniform sublist is computed by iterating over the list and keeping each element
with a probability of `1/(size(corpus) / target)`; all other elements are skipped.
This quickly generates a new list with an expected length of `target`, although the
actual length is usually a little smaller or larger.
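
The order guarantee can be checked with a small property, sketched below. The test name `sampleKeepsOrder` is hypothetical; the sketch relies only on `sort` and `toSet` from `List` and on the statement above that the relative order of elements is preserved.

```rascal
import util::Sampling;
import List; // sort, toSet

// Hypothetical property, not part of util::Sampling.
test bool sampleKeepsOrder() {
  list[int] corpus = [1..1000];        // ascending list 1, 2, ..., 999
  list[int] picked = sample(corpus, 30);
  // An order-preserving sample of an ascending list is itself ascending,
  // and it contains only elements of the original list.
  return picked == sort(picked) && toSet(picked) <= toSet(corpus);
}
```

The length of `picked` is deliberately not asserted, since it is only expected to be close to 30.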

#### Examples



```rascal-shell 
rascal>import util::Sampling;
ok
rascal>sample([1..1000], 30)
list[int]: [9,10,89,92,123,126,179,219,265,391,462,496,500,513,525,546,623,675,692,738,773,777,809,829,841,855,858,891,948,992]
rascal>sample([1..1000], 30)
list[int]: [9,15,18,24,52,130,133,164,203,233,258,291,294,331,338,373,394,439,457,467,468,495,517,568,635,640,641,649,663,669,785,835,865,924,960]
rascal>sample([1..1000], 30)
list[int]: [69,101,114,130,132,178,192,210,258,327,332,341,369,384,465,479,503,518,588,629,678,680,689,703,712,746,754,755,761,767,791,807,846,848,857,869,876,918,970]
```

## function sample {#util-Sampling-sample}

Reduce the size of a map by selecting a uniformly distributed sample.

```rascal
map[&T,&U] sample(map[&T,&U] corpus, int target)
```


A uniform submap is computed by iterating over the map's keys and keeping each key,
together with its value, with a probability of `1/(size(corpus) / target)`; all other
keys are skipped. This quickly generates a new map with an expected size of `target`,
although the actual size is usually a little smaller or larger.
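
#### Examples

A minimal usage sketch for the map variant, written as a property instead of a shell session because the selected keys differ on every run. The corpus and the test name `sampleIsSubmap` are assumptions made for illustration.

```rascal
import util::Sampling;
import Map; // size

// Hypothetical property, not part of util::Sampling.
test bool sampleIsSubmap() {
  // Assumed corpus: the numbers 1..999 mapped to their squares.
  map[int, int] corpus = (i : i * i | int i <- [1..1000]);
  map[int, int] picked = sample(corpus, 30);
  // Collect the sampled keys whose value differs from the original (none are expected).
  set[int] mismatched = { k | int k <- picked, picked[k] != corpus[k] };
  return size(picked) <= size(corpus) && mismatched == {};
}
```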

