vega-lite-linter

Setup

Prerequisite

vega-lite-linter uses Clingo as its Answer Set Programming solver, so Clingo must be installed first.

For Linux users:

apt-get install -y gringo

For macOS users:

brew install clingo

Or using Conda:

conda install -c potassco clingo

More information for downloading Clingo can be found here.

Installation

Vega-lite-linter is built on Python 3 and can be installed by:

pip install vega-lite-linter
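Before running the linter, it can help to confirm that the solver is reachable. The sketch below only checks whether an executable named clingo is on the PATH. That the linter locates Clingo this way is an assumption, and the helper name clingo_available is hypothetical.

```python
import shutil


def clingo_available(executable="clingo"):
    """Return True if an executable with the given name is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    print("clingo found" if clingo_available() else "clingo not found on PATH")
```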

Sample Code

After installing Clingo and vega-lite-linter, you can use the sample code below to get started.

More detailed examples can be found in Examples.


from vega_lite_linter import Lint

vega_json = {
    "data": {
        "url": "data/cars.json"
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "Horsepower",
            "type": "quantitative"
        },
        "y": {
            "field": "Miles_per_Gallon",
            "type": "quantitative"
        },
        "size": {
            "field": "Cylinders",
            "type": "ordinal"
        }
    }
}

# initialize
lint = Lint(vega_json)

# get the rules that the input Vega-Lite JSON violates
violate_rules = lint.lint()

# get the fixes recommended by vega-lite-linter
fix = lint.fix()

API Reference

Vega-lite-linter provides simple APIs for visualization developers to detect and fix issues in the built visualizations.

Initialization

First, initialize a Lint instance with the target visualization specification:

lint = Lint(vegalite_json)

After initialization, the two functions listed below can be called on the instance object.

`lint()`: Detecting Issues

lint() detects any issues in the given visualization specification. Each detected issue is presented as a Rule object containing:

id: string. The id of the violated rule.
param1: string (optional). A related parameter, usually the encoding channel that the rule concerns.
param2: string (optional). A related parameter, usually the incorrect input that violates the rule.
explain: string. The description of the rule.
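As a rough illustration of consuming these fields, the sketch below formats rule records for display. It assumes each rule can be read like a dictionary with the four keys listed above. The sample record and the helper name describe_rule are made up for illustration and are not output captured from the library.

```python
def describe_rule(rule):
    """Build a one-line summary from a rule record with id/param1/param2/explain."""
    # param1 and param2 are optional, so skip them when missing or empty.
    params = [p for p in (rule.get("param1"), rule.get("param2")) if p]
    suffix = f" ({', '.join(params)})" if params else ""
    return f"{rule['id']}{suffix}: {rule['explain']}"


# Hypothetical record shaped like the fields documented above.
sample_rules = [
    {"id": "size_without_point", "param1": "size", "param2": "",
     "explain": "Use the size channel with the mark 'point'."},
]

for rule in sample_rules:
    print(describe_rule(rule))
```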

`fix()`: Fixing Issues

fix() runs the algorithm to help revise the visualization specification into a correct one. The result of fix() contains:

fixable: boolean. Whether the given visualization specification can be fixed by vega-lite-linter.
optimize_spec: object. The specification after revision.
optimize_actions: Action[]. The recommended set of actions to fix the visualization.
possible_actions: Action[][]. All possible actions to fix the visualization, grouped by issue.
violate_rules: Rule[]. The rules violated by the original specification.
origin_spec: object. The original specification.

An Action object contains:

action: string. The name of the action.
param1: string. The encoding channel on which to perform the action.
param2: string. The parameter with which to perform the action.
rid: string. The id of the rule that the action resolves.
transition: number. The transition cost of the action.
reward: number. The reward of the action.
score: number. The score of the action, calculated from transition and reward.
action_intro: string. The description of the action.
apply: 0 | 1. Whether the action is adopted in optimize_actions.
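The sketch below shows one way to read such a result: it keeps the actions flagged with apply equal to 1 and orders them by score. It assumes the result and its actions behave like dictionaries with the keys documented above. The helper name adopted_actions, the action names, and all numbers in the sample are placeholders, not values produced by the library.

```python
def adopted_actions(result):
    """Return the actions flagged apply == 1, highest score first."""
    if not result.get("fixable"):
        return []
    chosen = [a for a in result.get("optimize_actions", []) if a.get("apply") == 1]
    return sorted(chosen, key=lambda a: a["score"], reverse=True)


# Hypothetical result; action names and numbers are placeholders.
sample_result = {
    "fixable": True,
    "optimize_actions": [
        {"action": "EXAMPLE_ACTION_A", "param1": "size", "param2": "",
         "rid": "size_without_point", "transition": 1.0, "reward": 2.0,
         "score": 0.5, "action_intro": "placeholder", "apply": 1},
        {"action": "EXAMPLE_ACTION_B", "param1": "x", "param2": "",
         "rid": "bar_tick_continuous_x_y", "transition": 1.0, "reward": 3.0,
         "score": 0.9, "action_intro": "placeholder", "apply": 1},
    ],
}

for action in adopted_actions(sample_result):
    print(action["rid"], "->", action["action"], f"(score {action['score']})")
```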

Vega-Lite Properties

The related Vega-Lite properties are listed as follows.

Data

Vega-lite-linter detects some data-related errors by deriving data properties from the raw data, such as the type of each data field and the min/max values of numerical fields. Currently, vega-lite-linter supports this calculation for inline data specified with the values property and for the following built-in datasets of Vega and Vega-Lite:

airports
anscombe
barley
burtin
cars
crimea
driving
iowa-electricity
iris
la-riots
seattle-temps
seattle-weather
sf-temps
stocks
us-employment
wheat
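To make the idea of derived data properties concrete, the sketch below computes a coarse field kind and the min/max of a numeric field from inline values. It is an illustration only, not the linter's own implementation; the helper name summarize_field and the sample rows are hypothetical.

```python
from numbers import Number


def summarize_field(values, field):
    """Derive a coarse kind and, for numeric fields, min/max from inline rows."""
    column = [row[field] for row in values if row.get(field) is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = bool(column) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in column
    )
    summary = {"field": field, "kind": "number" if numeric else "other"}
    if numeric:
        summary["min"], summary["max"] = min(column), max(column)
    return summary


# A specification with inline data given through the values property.
spec = {
    "data": {"values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": -5},
        {"category": "C", "amount": 43},
    ]},
    "mark": "point",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "size": {"field": "amount", "type": "quantitative"},
    },
}

print(summarize_field(spec["data"]["values"], "amount"))
```

A negative minimum such as the one in this sample is the kind of derived property that a rule like size_negative depends on.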

Mark

Property	Value
mark	Required. The mark type of the visualization. Can be one of the following values: area, bar, line, point, and tick.

Encoding

Property	Value
channel	Required. The encoding channel type, which is specified as the key of each encoding. Can be one of the following values: x, y, color, size.
field	The data field encoded by the channel.
type	The type of measurement. Can be one of the following values: quantitative, temporal, ordinal, or nominal.
bin	Binning discretizes numeric values into a set of bins. Can be one of the following values: true, false, or { maxbins: Maximum_number_of_bins (e.g., 10) }.
aggregate	Aggregates summary statistics on the data field. Can be one of the following values: count, mean, median, min, max, stdev, sum, etc.
stack	The type of stacking offset if the field should be stacked. Can be one of the following values: true, zero, normalize, center or false.
scale	Functions that transform a domain of data values.

The scale property includes:

Property	Value
type	The type of scale transformation. Currently, the algorithm detects errors related to log type.
zero	If true, ensure that a zero baseline value is included in the scale domain.

More details about Vega-Lite properties can be found here.
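As a small illustration of how these properties combine, the sketch below writes a histogram-style specification as a Python dictionary: the x channel bins a quantitative field, and the y channel uses a count aggregation without a field. The variable name histogram_spec is hypothetical, and maxbins follows the spelling Vega-Lite uses for this bin parameter.

```python
import json

histogram_spec = {
    "data": {"url": "data/cars.json"},
    "mark": "bar",
    "encoding": {
        # bin on a quantitative field, with at most 10 bins
        "x": {"field": "Horsepower", "type": "quantitative", "bin": {"maxbins": 10}},
        # count aggregation: no field, quantitative type, zero baseline kept
        "y": {"aggregate": "count", "type": "quantitative", "scale": {"zero": True}},
    },
}

print(json.dumps(histogram_spec, indent=2))
```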

Rules

The rules in vega-lite-linter are adapted and refined from Draco. They are grouped into four categories.

Issue Type 1. Incompatibility issues within each encoding channel.

Rule	Meaning
enc_type_valid_1	Verify the consistency of data field and type 'quantitative'.
enc_type_valid_2	Verify the consistency of data field and type 'temporal'.
bin_q_o	Only use bin on quantitative or ordinal data.
zero_q	Only use a zero baseline with quantitative data.
log_discrete	Only use log scale with non-discrete data.
log_zero	A log scale cannot have a zero baseline in the scale domain.
log_non_positive	Use log scale on data that are all positive.
bin_and_aggregate	Do not use both bin and aggregate on the same data at the same time.
aggregate_o_valid	Ordinal data only supports min, max, and median aggregation.
aggregate_t_valid	Temporal data only supports min and max aggregation.
aggregate_nominal	Nominal data cannot be aggregated.
count_q_without_field_1	Use count aggregation or declare a data field of an encoding, instead of doing both of them.
count_q_without_field_2	The encoding with count aggregation has to be 'quantitative' type.
size_nominal	The size channel implies order in the data, so it is not suitable for nominal data.
size_negative	Channel size is not suitable for data with negative values.
encoding_no_field_and_not_count	Declare the data field or use count aggregation in each encoding.
color_with_cardinality_gt_twenty	Use at most 20 categorical colors in the visualization.
stack_without_x_y	Use stack on x or y channels.
stack_discrete	Use stack on continuous data.

Issue Type 2. Incompatibility issues across multiple encoding channels.

Rule	Meaning
repeat_channel	Use each channel only once.
no_encodings	Use at least one encoding. Otherwise, the visualization doesn't show anything.
same_field_x_and_y	Use different fields for x axis and y axis.
count_twice	Use count aggregation once in the visualization.
stack_without_summative_agg	Only use summative aggregation (count, sum, distinct, valid, missing) with stack in the encoding.
stack_without_discrete_color_1	Only use stack with a color channel encoding discrete data in the visualization.
stack_without_discrete_color_2	Only use stack with a color channel encoding discrete data in the visualization.
stack_without_discrete_color_3	Only use stack with a color channel encoding discrete data in the visualization.
stack_with_non_positional_non_agg	When using stack in the visualization, apply aggregation in non-positional continuous channels (color, size).
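Cross-channel rules look at the whole encoding block instead of one channel. The sketch below approximates three of the rules above in plain Python, purely to illustrate their meaning; the helper name check_across_channels is hypothetical and this is not the linter's implementation.

```python
def check_across_channels(encoding):
    """Return ids of a few Issue Type 2 rules violated by an encoding block."""
    violated = []
    # no_encodings: at least one encoding is needed.
    if not encoding:
        violated.append("no_encodings")
    # same_field_x_and_y: x and y should encode different fields.
    x_field = encoding.get("x", {}).get("field")
    y_field = encoding.get("y", {}).get("field")
    if x_field is not None and x_field == y_field:
        violated.append("same_field_x_and_y")
    # count_twice: count aggregation should appear only once.
    counts = sum(1 for enc in encoding.values() if enc.get("aggregate") == "count")
    if counts > 1:
        violated.append("count_twice")
    return violated


print(check_across_channels({
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Horsepower", "type": "quantitative"},
}))
# prints ['same_field_x_and_y']
```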

Issue Type 3. Incompatibility issues between encoding channels and marks.

Rule	Meaning
point_tick_bar_without_x_or_y	Use x or y channel for mark 'point', 'tick', and 'bar'.
line_area_without_x_y	Use x and y channels for mark 'line' and 'area'.
bar_tick_continuous_x_y	Use at most one continuous field across the x and y channels for mark 'bar' and 'tick'.
bar_tick_area_line_without_continuous_x_y	Mark 'bar', 'tick', 'line', 'area' require some continuous variable on x or y.
bar_area_without_zero_1	Mark 'bar' and 'area' require the scale of the x-axis to start at zero, when the x-axis encodes quantitative data.
bar_area_without_zero_2	Mark 'bar' and 'area' require the scale of the y-axis to start at zero, when the y-axis encodes quantitative data.
size_without_point	The size channel works best with the mark 'point'.
stack_without_bar_area	Only use stacking for the mark 'bar' and 'area'.
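The sketch below approximates three of the mark-related rules in plain Python and applies them to a specification shaped like the one in Sample Code. It assumes that "continuous" means quantitative or temporal data that is not binned; that definition and the helper names are assumptions for illustration, and the result is not output from the linter itself.

```python
def is_continuous(enc):
    """Assumed definition: quantitative or temporal data that is not binned."""
    return enc.get("type") in ("quantitative", "temporal") and not enc.get("bin")


def check_mark(spec):
    """Return ids of a few Issue Type 3 rules that a specification violates."""
    mark = spec.get("mark")
    encoding = spec.get("encoding", {})
    violated = []
    # line_area_without_x_y: line and area need both x and y.
    if mark in ("line", "area") and not ("x" in encoding and "y" in encoding):
        violated.append("line_area_without_x_y")
    # bar_tick_continuous_x_y: bar and tick allow at most one continuous x/y.
    if mark in ("bar", "tick") and all(
        ch in encoding and is_continuous(encoding[ch]) for ch in ("x", "y")
    ):
        violated.append("bar_tick_continuous_x_y")
    # size_without_point: size is best used with the point mark.
    if "size" in encoding and mark != "point":
        violated.append("size_without_point")
    return violated


sample = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "size": {"field": "Cylinders", "type": "ordinal"},
    },
}

print(check_mark(sample))
# prints ['bar_tick_continuous_x_y', 'size_without_point']
```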

Issue Type 4. Typo issues.

Rule	Meaning
invalid_mark	Use a valid mark type: 'area', 'bar', 'line', 'point', or 'tick'.
invalid_channel	Use valid channels: x, y, color, or size.
invalid_type	Use valid types: quantitative, nominal, ordinal, or temporal.
invalid_agg	Use a valid aggregation, such as count, mean, median, min, max, stdev, or sum.
invalid_bin	Use a non-negative number for the bin amount (maxbins).
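Typo checks amount to comparing values against the allowed sets listed above. The sketch below illustrates that idea in plain Python; the helper name check_typos and the constant names are hypothetical, and this is not the linter's own code.

```python
VALID_MARKS = {"area", "bar", "line", "point", "tick"}
VALID_CHANNELS = {"x", "y", "color", "size"}
VALID_TYPES = {"quantitative", "nominal", "ordinal", "temporal"}


def check_typos(spec):
    """Return ids of Issue Type 4 rules that a specification violates."""
    violated = []
    mark = spec.get("mark")
    # Vega-Lite also allows a mark definition object; read its type if so.
    if isinstance(mark, dict):
        mark = mark.get("type")
    if mark not in VALID_MARKS:
        violated.append("invalid_mark")
    for channel, enc in spec.get("encoding", {}).items():
        if channel not in VALID_CHANNELS:
            violated.append("invalid_channel")
        if "type" in enc and enc["type"] not in VALID_TYPES:
            violated.append("invalid_type")
        bin_spec = enc.get("bin")
        if isinstance(bin_spec, dict) and bin_spec.get("maxbins", 0) < 0:
            violated.append("invalid_bin")
    return violated


print(check_typos({
    "mark": "bars",
    "encoding": {"z": {"field": "Horsepower", "type": "quantitive"}},
}))
# prints ['invalid_mark', 'invalid_channel', 'invalid_type']
```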

Credits

Vega-lite-linter was invented by the iDV^x Lab together with AntV. Based on this technology, AntV and the iDV^x Lab also developed ChartLinter in JavaScript to support visualization charts beyond Vega-Lite.

Contact Us

If you have any questions, please feel free to open an issue or contact idvx.lab [at] gmail.com.

License

The software is available under the MIT License.

Documentation