Impute

The impute transform allows you to fill in missing entries in a dataset. As an example, consider the following data, which includes missing values that we filter out of the long-form representation (see Long-form vs. Wide-form Data for more on this):

import numpy as np
import pandas as pd

data = pd.DataFrame({
    't': range(5),
    'x': [2, np.nan, 3, 1, 3],
    'y': [5, 7, 5, np.nan, 4]
}).melt('t').dropna()
data
       t variable  value
    0  0        x    2.0
    2  2        x    3.0
    3  3        x    1.0
    4  4        x    3.0
    5  0        y    5.0
    6  1        y    7.0
    7  2        y    5.0
    9  4        y    4.0

Notice that in the result the x series has no entry at t=1 and the y series has no entry at t=3. If we visualize this data directly with Altair, each line is drawn straight across its missing entry:

import altair as alt

raw = alt.Chart(data).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
raw

This is not always desirable, because (particularly for a line plot with no points) it can imply the existence of data that is not there.
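
Before imputing, it can help to list which combinations of t and variable are absent. The following is a minimal sketch in plain Python, with the rows copied by hand from the table above; the names `records`, `present`, and `missing` are illustrative and not part of Altair:

```python
from itertools import product

# (t, variable, value) rows copied from the long-form table above
records = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

keys = sorted({t for t, _, _ in records})      # all t values seen in any series
groups = sorted({g for _, g, _ in records})    # the series names
present = {(t, g) for t, g, _ in records}

# every (t, variable) combination that has no row
missing = [(t, g) for g, t in product(groups, keys) if (t, g) not in present]
print(missing)  # [(1, 'x'), (3, 'y')]
```

These are the two entries that the imputation methods below fill in.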

Impute via Encodings

To address this, you can use the impute method of the encoding channel. For example, we can impute using a constant value (we’ll show the raw chart lightly in the background for reference):

background = raw.encode(opacity=alt.value(0.2))
chart = alt.Chart(data).mark_line(point=True).encode(
    x='t:Q',
    y=alt.Y('value:Q').impute(value=0),
    color='variable:N'
)
background + chart

Or we can impute using any supported aggregate method (mean, median, max, or min):

chart = alt.Chart(data).mark_line(point=True).encode(
    x='t:Q',
    y=alt.Y('value:Q').impute(method='mean'),
    color='variable:N'
)
background + chart
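
To make the two charts above concrete, here is a rough sketch in plain Python of the values they fill in. It is not how Altair or Vega-Lite implement imputation; it only assumes that each series is completed over every t value seen in the data, and the `impute` helper is hypothetical:

```python
from statistics import mean

# (t, variable, value) rows copied from the long-form table above
records = [
    (0, "x", 2.0), (2, "x", 3.0), (3, "x", 1.0), (4, "x", 3.0),
    (0, "y", 5.0), (1, "y", 7.0), (2, "y", 5.0), (4, "y", 4.0),
]

def impute(rows, method="value", value=None):
    """Fill each group's missing keys with a constant or with the group's mean."""
    keys = sorted({t for t, _, _ in rows})
    by_group = {}
    for t, g, v in rows:
        by_group.setdefault(g, {})[t] = v
    filled = []
    for g, series in by_group.items():
        # the fill value is computed once per group from its observed values
        fill = value if method == "value" else mean(series.values())
        for t in keys:
            filled.append((t, g, series.get(t, fill)))
    return filled

constant = impute(records, method="value", value=0)
averaged = impute(records, method="mean")
print([row for row in averaged if row not in records])
# [(1, 'x', 2.25), (3, 'y', 5.25)]
```

With `method="mean"` the fill value is the mean of all observed values in the same series, which is why x and y receive different values.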

Impute via Transform

As with the Bin and Aggregate transforms, it is also possible to specify the impute transform outside the encoding, as a transform on the chart. For example, here are the equivalents of the two charts above:

chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    value=0,
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart

chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    method='mean',
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart

If you would like to use more localized imputed values, you can specify a frame parameter, similar to that of the Window transform, to control which values are used for the imputation. For example, here we impute missing values using the mean of the neighboring points on either side:

chart = alt.Chart(data).transform_impute(
    impute='value',
    key='t',
    method='mean',
    frame=[-1, 1],
    groupby=['variable']
).mark_line(point=True).encode(
    x='t:Q',
    y='value:Q',
    color='variable:N'
)
background + chart
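
The effect of frame=[-1, 1] can be checked by hand. The sketch below is plain Python, not Altair's implementation; it assumes the frame counts rows in key order within one series and that missing values are ignored when averaging. The helper name `impute_framed_mean` is hypothetical:

```python
from statistics import mean

def impute_framed_mean(series, keys, frame=(-1, 1)):
    """Fill missing keys of one series with the mean of the values in the frame.

    series: dict mapping key -> observed value for a single group
    keys:   sorted list of all key values
    frame:  (rows before, rows after); None means unbounded on that side
    """
    lo, hi = frame
    out = {}
    for i, k in enumerate(keys):
        if k in series:
            out[k] = series[k]  # observed values are left untouched
            continue
        start = 0 if lo is None else max(0, i + lo)
        stop = len(keys) if hi is None else min(len(keys), i + hi + 1)
        window = [series[j] for j in keys[start:stop] if j in series]
        out[k] = mean(window) if window else None
    return out

keys = [0, 1, 2, 3, 4]
x = {0: 2.0, 2: 3.0, 3: 1.0, 4: 3.0}
y = {0: 5.0, 1: 7.0, 2: 5.0, 4: 4.0}

print(impute_framed_mean(x, keys)[1])                # 2.5
print(impute_framed_mean(y, keys)[3])                # 4.5
print(impute_framed_mean(x, keys, (None, None))[1])  # 2.25
```

The last call uses an unbounded frame and gives the whole-series mean, matching the default of [null, null].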

Transform Options

The transform_impute() method is built on the ImputeTransform class, which has the following options:


Property: frame
Type: array([null, number])
Description: A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current object.
Default value: [null, null], indicating that the window includes all objects.

Property: groupby
Type: array(FieldName)
Description: An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis.

Property: impute
Type: FieldName
Description: The data field for which the missing values should be imputed.

Property: key
Type: FieldName
Description: A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed.

Property: keyvals
Type: anyOf(array(any), ImputeSequence)
Description: Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence.
If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. For impute in encoding, the key field is the x-field if the y-field is imputed, or vice versa.
If there is no impute grouping, this property must be specified.

Property: method
Type: ImputeMethod
Description: The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min".
Default value: "value"

Property: value
Type: any
Description: The field value to use when the imputation method is "value".
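
The keyvals option is the least obvious of these, so here is a small plain-Python sketch of how the set of keys to impute could be derived. The `key_domain` helper is hypothetical, and the ImputeSequence is modeled as a plain dict with start, stop, and step. Treating stop as exclusive and defaulting start to 0 and step to 1 are assumptions about the Vega-Lite schema that are worth checking against its documentation:

```python
def key_domain(observed, keyvals=None):
    """Combine the observed key values with explicit keyvals.

    keyvals may be a list of key values or a dict with "stop" and
    optional "start" and "step" (a stand-in for ImputeSequence).
    """
    if keyvals is None:
        extra = []
    elif isinstance(keyvals, dict):
        # assumption: stop is exclusive, start defaults to 0, step to 1
        extra = range(keyvals.get("start", 0), keyvals["stop"], keyvals.get("step", 1))
    else:
        extra = keyvals
    # keyvals are used in addition to the keys observed in the data
    return sorted(set(observed) | set(extra))

observed_t = [0, 2, 3, 4]  # t values present in the x series

print(key_domain(observed_t))                # [0, 2, 3, 4]
print(key_domain(observed_t, [1, 10]))       # [0, 1, 2, 3, 4, 10]
print(key_domain(observed_t, {"stop": 7}))   # [0, 1, 2, 3, 4, 5, 6]
```

Every key in the resulting domain that has no row in a group would then receive an imputed value according to method and value.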