Documentation

Setup

Prerequisite

vega-lite-linter uses Clingo as its Answer Set Programming solver, so Clingo must be installed first.

For Linux users:

apt-get install -y gringo

For macOS users:

brew install clingo

Or using Conda:

conda install -c potassco clingo

More information for downloading Clingo can be found here.
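
To confirm that the solver is reachable before going further, a quick check like the following may help. It is a minimal sketch using only the Python standard library; the helper name find_clingo is hypothetical, and it assumes the installation exposes an executable named clingo on PATH, which may not hold for every installation method.

```python
import shutil


def find_clingo(executable="clingo"):
    """Return the full path of the Clingo executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_clingo()
if path is None:
    print("clingo was not found on PATH; install it with one of the commands above")
else:
    print(f"clingo found at {path}")
```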

Installation

Vega-lite-linter is built on Python 3 and can be installed with pip:

pip install vega-lite-linter

Sample Code

After successfully installing Clingo and vega-lite-linter, you can use the sample code below to get started.

More detailed examples can be found in Examples.


from vega_lite_linter import Lint

vega_json = {
    "data": {
        "url": "data/cars.json"
    },
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "Horsepower",
            "type": "quantitative"
        },
        "y": {
            "field": "Miles_per_Gallon",
            "type": "quantitative"
        },
        "size": {
            "field": "Cylinders",
            "type": "ordinal"
        }
    }
}

# initialize
lint = Lint(vega_json)

# detect the rules that the input Vega-Lite JSON violates
violate_rules = lint.lint()

# get the fix recommended by vega-lite-linter
fix = lint.fix()
                                        

API Reference

Vega-lite-linter provides simple APIs that let visualization developers detect and fix issues in their visualizations.

Initialization

First, initialize a Lint instance with the target visualization specification:

lint = Lint(vegalite_json)

After initialization, the two functions listed below can be called on the instance object.

lint(): Detecting Issues

lint() detects issues in the given visualization specification. Each detected issue is presented as a Rule object containing:

  • id: string. The id of the violated rule.
  • param1: string (optional). A related parameter, usually the encoding channel that the rule relates to.
  • param2: string (optional). A related parameter, usually the incorrect input that violates the rule.
  • explain: string. The description of the rule.
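
The Rule fields above can be turned into a readable report. The following is a minimal sketch, not part of the vega-lite-linter API: the helpers rule_field and describe_rule are hypothetical, the sample list is a hand-written stand-in for the result of lint.lint(), and the code accepts either dicts or attribute-style objects because the exact return type is an assumption here.

```python
def rule_field(rule, name, default=None):
    """Read a field from a rule that is either a dict or an attribute-style object."""
    if isinstance(rule, dict):
        return rule.get(name, default)
    return getattr(rule, name, default)


def describe_rule(rule):
    """Format one rule as 'id(param1, param2): explain', skipping empty parameters."""
    params = [p for p in (rule_field(rule, "param1"), rule_field(rule, "param2")) if p]
    head = rule_field(rule, "id")
    if params:
        head += "(" + ", ".join(params) + ")"
    return f"{head}: {rule_field(rule, 'explain')}"


# Hypothetical stand-in for the list returned by lint.lint(); real output may differ.
sample_rules = [
    {"id": "log_zero", "param1": "x", "param2": "",
     "explain": "A log scale cannot have a zero baseline in the scale domain."},
    {"id": "same_field_x_and_y", "param1": "", "param2": "",
     "explain": "Use different fields for x axis and y axis."},
]

for rule in sample_rules:
    print(describe_rule(rule))
```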

fix(): Fixing Issues

fix() runs the algorithm to help revise the visualization specification into a correct one. The result of fix() contains:

  • fixable: boolean. The indicator of whether the given visualization specification can be fixed by vega-lite-linter.
  • optimize_spec: object. The specification after revision.
  • optimize_actions: Action[]. The recommended action set to fix the visualization.
  • possible_actions: Action[][]. All possible actions to fix the visualization, grouped by each issue.
  • violate_rules: Rule[]. The detected rules in the original specification.
  • origin_spec: object. The original specification.

Action object contains:

  • action: string. The name of the action.
  • param1: string. The encoding channel to perform the action.
  • param2: string. The parameter to perform the action.
  • rid: string. The rule id that the action solved.
  • transition: number. The transition cost of the action.
  • reward: number. The reward of the action.
  • score: number. The score of the action, calculated from transition and reward.
  • action_intro: string. The description of the action.
  • apply: 0 | 1. The indicator of whether the action is included in optimize_actions.
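
As an illustration of how the fields above fit together, the sketch below collects the actions marked apply == 1 from possible_actions. It assumes the result of fix() can be indexed like a dict with the keys listed above; the helper adopted_actions, the action names, and the numbers are all hypothetical stand-ins, not real output.

```python
def adopted_actions(fix_result):
    """Return every action flagged apply == 1, in the order the groups list them."""
    return [
        action
        for group in fix_result["possible_actions"]  # one group per detected issue
        for action in group
        if action["apply"] == 1
    ]


# Hypothetical stand-in for the result of lint.fix(); names and scores are made up.
sample_result = {
    "fixable": True,
    "possible_actions": [
        [
            {"action": "hypothetical_action_a", "rid": "log_zero", "score": 0.4, "apply": 0},
            {"action": "hypothetical_action_b", "rid": "log_zero", "score": 0.9, "apply": 1},
        ],
        [
            {"action": "hypothetical_action_c", "rid": "size_nominal", "score": 0.6, "apply": 1},
        ],
    ],
}

if sample_result["fixable"]:
    for action in adopted_actions(sample_result):
        print(action["rid"], action["action"], action["score"])
```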

Vega-Lite Properties

The relevant Vega-Lite properties are listed below.

Data

Vega-lite-linter detects some data-related errors by deriving data properties from the raw data, such as the type of each data field and the min/max values of numerical fields. Currently, it can derive these properties from inline data specified with the values property, or from the following built-in datasets of Vega and Vega-Lite:

  • airports
  • anscombe
  • barley
  • burtin
  • cars
  • crimea
  • driving
  • iowa-electricity
  • iris
  • la-riots
  • seattle-temps
  • seattle-weather
  • sf-temps
  • stocks
  • us-employment
  • wheat
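
To make the idea of derived data properties concrete, here is a minimal sketch that guesses a field's type and min/max from inline values. It is an illustration only, not the library's implementation: the helper summarize_field and the sample data are hypothetical, and the type labels "number" and "string" are chosen for this example.

```python
def summarize_field(values, field):
    """Guess a field's type and, for numeric fields, report its min and max."""
    column = [row[field] for row in values if row.get(field) is not None]
    numeric = bool(column) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )
    if numeric:
        return {"type": "number", "min": min(column), "max": max(column)}
    return {"type": "string", "min": None, "max": None}


# A specification with inline data given through the values property.
spec = {
    "data": {"values": [
        {"category": "A", "amount": 28},
        {"category": "B", "amount": -5},
        {"category": "C", "amount": 43},
    ]},
    "mark": "point",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

summary = summarize_field(spec["data"]["values"], "amount")
print(summary)
```

A negative minimum such as the one in this sample is the kind of property that rules like size_negative and log_non_positive depend on.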

Mark

Property | Value
mark | Required. The mark type of the visualization. Can be one of the following values: area, bar, line, point, or tick.

Encoding

Property | Value
channel | Required. The encoding channel type, which is specified as the key of each encoding. Can be one of the following values: x, y, color, or size.
field | The data field encoded by the channel.
type | The type of measurement. Can be one of the following values: quantitative, temporal, ordinal, or nominal.
bin | Binning discretizes numeric values into a set of bins. Can be one of the following values: true, false, or { maxbins: maximum_number_of_bins } (e.g., { maxbins: 10 }).
aggregate | Computes a summary statistic over the data field. Can be one of the following values: count, mean, median, min, max, stdev, sum, etc.
stack | The type of stacking offset if the field should be stacked. Can be one of the following values: true, zero, normalize, center, or false.
scale | Functions that transform a domain of data values.

The scale property includes:

Property | Value
type | The type of scale transformation. Currently, the algorithm detects errors related to the log type.
zero | If true, ensures that a zero baseline value is included in the scale domain.
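
The properties above combine inside each encoding channel. The sketch below shows an illustrative specification that uses bin, aggregate, and scale, together with a hypothetical helper, used_properties, that lists which of the documented properties each channel sets. The field Origin is assumed to exist in the cars dataset, and nothing here claims the specification passes every rule.

```python
# Properties documented above, in the order they are listed.
SUPPORTED = ("field", "type", "bin", "aggregate", "stack", "scale")

spec = {
    "data": {"url": "data/cars.json"},
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative", "bin": {"maxbins": 10}},
        "y": {"aggregate": "count", "type": "quantitative", "scale": {"zero": True}},
        "color": {"field": "Origin", "type": "nominal"},
    },
}


def used_properties(spec):
    """Map each encoding channel to the documented properties it sets."""
    return {
        channel: [prop for prop in SUPPORTED if prop in enc]
        for channel, enc in spec["encoding"].items()
    }


for channel, props in used_properties(spec).items():
    print(channel, props)
```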

More details about Vega-Lite properties can be found here.

Rules

Rules in vega-lite-linter are adapted and refined from Draco. The rules are grouped into four categories.

Issue Type 1. Incompatibility issues within each encoding channel.

Rule | Meaning
enc_type_valid_1 | Verify that the data field is consistent with the type 'quantitative'.
enc_type_valid_2 | Verify that the data field is consistent with the type 'temporal'.
bin_q_o | Only use bin on quantitative or ordinal data.
zero_q | Only use a zero baseline with quantitative data.
log_discrete | Only use log scale with non-discrete data.
log_zero | A log scale cannot have a zero baseline in the scale domain.
log_non_positive | Only use log scale on data that are all positive.
bin_and_aggregate | Do not use both bin and aggregate on the same data at the same time.
aggregate_o_valid | Ordinal data only supports min, max, and median aggregation.
aggregate_t_valid | Temporal data only supports min and max aggregation.
aggregate_nominal | Nominal data cannot be aggregated.
count_q_without_field_1 | Either use count aggregation or declare a data field in an encoding, but not both.
count_q_without_field_2 | An encoding with count aggregation must have the 'quantitative' type.
size_nominal | The size channel implies order in the data, so it is not suitable for nominal data.
size_negative | The size channel is not suitable for data with negative values.
encoding_no_field_and_not_count | Declare the data field or use count aggregation in each encoding.
color_with_cardinality_gt_twenty | Use at most 20 categorical colors in the visualization.
stack_without_x_y | Only use stack on the x or y channel.
stack_discrete | Only use stack on continuous data.
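
To show what a few of these single-channel rules look for, here is a plain-Python paraphrase. It is a sketch for illustration only: vega-lite-linter evaluates its rules with the Clingo solver, the function check_channel is hypothetical, and edge cases may be handled differently by the real rules.

```python
def check_channel(channel, enc):
    """Return the ids of a few single-channel rules that this encoding violates."""
    violated = []
    scale = enc.get("scale", {})
    if enc.get("bin") and enc.get("aggregate"):
        violated.append("bin_and_aggregate")
    if enc.get("type") == "nominal" and enc.get("aggregate"):
        violated.append("aggregate_nominal")
    if scale.get("type") == "log" and scale.get("zero"):
        violated.append("log_zero")
    if channel == "size" and enc.get("type") == "nominal":
        violated.append("size_nominal")
    return violated


# A log scale that also forces a zero baseline.
print(check_channel("y", {"field": "Horsepower", "type": "quantitative",
                          "scale": {"type": "log", "zero": True}}))
```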

Issue Type 2. Incompatibility issues across multiple encoding channels.

Rule | Meaning
repeat_channel | Use each channel only once.
no_encodings | Use at least one encoding. Otherwise, the visualization doesn't show anything.
same_field_x_and_y | Use different fields for the x axis and the y axis.
count_twice | Use count aggregation only once in the visualization.
stack_without_summative_agg | Only use summative aggregation (count, sum, distinct, valid, missing) with stack in the encoding.
stack_without_discrete_color_1 | Only use stack with a color channel encoding discrete data in the visualization.
stack_without_discrete_color_2 | Only use stack with a color channel encoding discrete data in the visualization.
stack_without_discrete_color_3 | Only use stack with a color channel encoding discrete data in the visualization.
stack_with_non_positional_non_agg | When using stack in the visualization, apply aggregation in non-positional continuous channels (color, size).
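
Cross-channel rules compare encodings with one another. The sketch below paraphrases three of them in plain Python; it is an illustration under the same caveat that the real rules are evaluated by the solver, and the function check_across_channels is hypothetical.

```python
def check_across_channels(encoding):
    """Return the ids of a few cross-channel rules that this encoding block violates."""
    violated = []
    if not encoding:
        violated.append("no_encodings")
    x_field = encoding.get("x", {}).get("field")
    y_field = encoding.get("y", {}).get("field")
    if x_field is not None and x_field == y_field:
        violated.append("same_field_x_and_y")
    count_uses = sum(1 for enc in encoding.values() if enc.get("aggregate") == "count")
    if count_uses > 1:
        violated.append("count_twice")
    return violated


# The same field on both axes.
print(check_across_channels({
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Horsepower", "type": "quantitative"},
}))
```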

Issue Type 3. Incompatibility issues between encoding channels and marks.

Rule | Meaning
point_tick_bar_without_x_or_y | Use the x or y channel for marks 'point', 'tick', and 'bar'.
line_area_without_x_y | Use both the x and y channels for marks 'line' and 'area'.
bar_tick_continuous_x_y | Use at most one continuous field across the x and y channels for marks 'bar' and 'tick'.
bar_tick_area_line_without_continuous_x_y | Marks 'bar', 'tick', 'line', and 'area' require a continuous variable on x or y.
bar_area_without_zero_1 | Marks 'bar' and 'area' require the scale of the x-axis to start at zero when the x-axis encodes quantitative data.
bar_area_without_zero_2 | Marks 'bar' and 'area' require the scale of the y-axis to start at zero when the y-axis encodes quantitative data.
size_without_point | The size channel works best with the mark 'point'.
stack_without_bar_area | Only use stacking for marks 'bar' and 'area'.
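
The following sketch paraphrases four of the mark-related rules and applies them to a bar chart with quantitative x and y plus a size channel, like the one in the sample code. It is an illustration only: check_mark and is_continuous are hypothetical, the mark is assumed to be a plain string, and "continuous" is assumed to mean a quantitative or temporal field that is not binned.

```python
def is_continuous(enc):
    """Assumption: continuous means quantitative or temporal, and not binned."""
    return enc.get("type") in ("quantitative", "temporal") and not enc.get("bin")


def check_mark(spec):
    """Return the ids of a few mark-related rules that this specification violates."""
    mark = spec.get("mark")  # assumed to be a string such as "bar"
    encoding = spec.get("encoding", {})
    has_x, has_y = "x" in encoding, "y" in encoding
    violated = []
    if mark in ("line", "area") and not (has_x and has_y):
        violated.append("line_area_without_x_y")
    if mark in ("point", "tick", "bar") and not (has_x or has_y):
        violated.append("point_tick_bar_without_x_or_y")
    if (mark in ("bar", "tick") and has_x and has_y
            and is_continuous(encoding["x"]) and is_continuous(encoding["y"])):
        violated.append("bar_tick_continuous_x_y")
    if "size" in encoding and mark != "point":
        violated.append("size_without_point")
    return violated


bar_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "size": {"field": "Cylinders", "type": "ordinal"},
    },
}
print(check_mark(bar_spec))
```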

Issue Type 4. Typo issues.

Rule | Meaning
invalid_mark | Use a valid mark type: 'area', 'bar', 'line', 'point', or 'tick'.
invalid_channel | Use valid channels: x, y, color, or size.
invalid_type | Use valid types: quantitative, nominal, ordinal, or temporal.
invalid_agg | Use a valid aggregation, such as count, mean, median, min, max, stdev, or sum.
invalid_bin | Use a non-negative number for the bin amount (maxbins).
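
Typo checks amount to comparing values against the allowed sets listed above. The sketch below is a plain-Python illustration, not the library's implementation: check_typos is hypothetical, the aggregation check is omitted because the full list of valid aggregations is not given here, and the mark is assumed to be a plain string.

```python
VALID_MARKS = {"area", "bar", "line", "point", "tick"}
VALID_CHANNELS = {"x", "y", "color", "size"}
VALID_TYPES = {"quantitative", "nominal", "ordinal", "temporal"}


def check_typos(spec):
    """Return (rule id, offending value) pairs for a few typo-style problems."""
    violated = []
    if spec.get("mark") not in VALID_MARKS:
        violated.append(("invalid_mark", spec.get("mark")))
    for channel, enc in spec.get("encoding", {}).items():
        if channel not in VALID_CHANNELS:
            violated.append(("invalid_channel", channel))
        if "type" in enc and enc["type"] not in VALID_TYPES:
            violated.append(("invalid_type", enc["type"]))
        bin_value = enc.get("bin")
        if isinstance(bin_value, dict) and bin_value.get("maxbins", 0) < 0:
            violated.append(("invalid_bin", bin_value["maxbins"]))
    return violated


typo_spec = {
    "mark": "bars",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantative"},
        "colour": {"field": "Origin", "type": "nominal"},
    },
}
print(check_typos(typo_spec))
```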

Credits

Vega-lite-linter was invented by the iDVx Lab together with AntV. Based on our technology, AntV and iDVx Lab also developed ChartLinter in JavaScript to support visualization charts beyond Vega-Lite.

Contact Us

If you have any questions, please feel free to open an issue or contact idvx.lab [at] gmail.com.

License

The software is available under the MIT License.