# Data Transform

`Data transform` has been supported since Apache ECharts<sup>TM</sup> 5. In echarts, the term `data transform` refers to generating new data from user-provided source data using transform functions. This feature enables users to process data in a declarative way and provides several common "transform functions" to make such tasks work "out-of-the-box". (For consistency, we use the noun form "transform" rather than "transformation.")

The abstract formula of a data transform is: `outData = f(inputData)`, where the transform function `f` can be `filter`, `sort`, `regression`, `boxplot`, `cluster`, `aggregate` and so on.
With the help of these transform functions, users can implement features like:

- Partition data into multiple series.
- Compute statistics and visualize the results.
- Adapt visualization algorithms to the data and display the results.
- Sort data.
- Remove or choose certain kinds of empty or special datums.
- ...

## Get Started with Data Transform

In echarts, data transform is implemented based on the concept of [dataset](${optionPath}#dataset). You can specify a [dataset.transform](${optionPath}#dataset.transform) within a dataset instance to indicate that the dataset should be generated using the defined `transform`.
For example:

```js live
var option = {
  dataset: [
    {
      // This dataset is on `datasetIndex: 0`.
      source: [
        ['Product', 'Sales', 'Price', 'Year'],
        ['Cake', 123, 32, 2011],
        ['Cereal', 231, 14, 2011],
        ['Tofu', 235, 5, 2011],
        ['Dumpling', 341, 25, 2011],
        ['Biscuit', 122, 29, 2011],
        ['Cake', 143, 30, 2012],
        ['Cereal', 201, 19, 2012],
        ['Tofu', 255, 7, 2012],
        ['Dumpling', 241, 27, 2012],
        ['Biscuit', 102, 34, 2012],
        ['Cake', 153, 28, 2013],
        ['Cereal', 181, 21, 2013],
        ['Tofu', 395, 4, 2013],
        ['Dumpling', 281, 31, 2013],
        ['Biscuit', 92, 39, 2013],
        ['Cake', 223, 29, 2014],
        ['Cereal', 211, 17, 2014],
        ['Tofu', 345, 3, 2014],
        ['Dumpling', 211, 35, 2014],
        ['Biscuit', 72, 24, 2014]
      ]
      // id: 'a'
    },
    {
      // This dataset is on `datasetIndex: 1`.
      // A `transform` is configured to indicate that the
      // final data of this dataset is produced by this
      // transform function.
      transform: {
        type: 'filter',
        config: { dimension: 'Year', value: 2011 }
      }
      // The optional properties `fromDatasetIndex` or `fromDatasetId`
      // indicate where the input data of the transform comes from.
      // For example, `fromDatasetIndex: 0` specifies that the input data
      // comes from the dataset on `datasetIndex: 0`, and `fromDatasetId: 'a'`
      // specifies that the input data comes from the dataset with `id: 'a'`.
      // If both `fromDatasetIndex` and `fromDatasetId` are omitted,
      // `fromDatasetIndex: 0` is used by default.
    },
    {
      // This dataset is on `datasetIndex: 2`.
      // Similarly, since neither `fromDatasetIndex` nor `fromDatasetId` is
      // specified, `fromDatasetIndex: 0` is used by default.
      transform: {
        // The "filter" transform retrieves only the data items that
        // match the condition given in the property `config`.
        type: 'filter',
        // Transforms have a property `config`. In this "filter" transform,
        // `config` specifies the condition that each result data item
        // must satisfy.
        // In this case, this transform retrieves all of the data items
        // whose value on dimension "Year" equals 2012.
        config: { dimension: 'Year', value: 2012 }
      }
    },
    {
      // This dataset is on `datasetIndex: 3`.
      transform: {
        type: 'filter',
        config: { dimension: 'Year', value: 2013 }
      }
    }
  ],
  series: [
    {
      type: 'pie',
      radius: 50,
      center: ['25%', '50%'],
      // Each "pie" series references a dataset that holds
      // the result of its "filter" transform.
      datasetIndex: 1
    },
    {
      type: 'pie',
      radius: 50,
      center: ['50%', '50%'],
      datasetIndex: 2
    },
    {
      type: 'pie',
      radius: 50,
      center: ['75%', '50%'],
      datasetIndex: 3
    }
  ]
};
```

Let's summarize the key points of using data transform:

- Generate new data from existing declared data by declaring `transform` and `fromDatasetIndex`/`fromDatasetId` in an otherwise blank dataset.
- Series reference these datasets to show the result.

## Advanced Usage

#### Piped Transform

There is syntactic sugar to pipe transforms:

```js
option = {
  dataset: [
    {
      source: [] // The original data
    },
    {
      // Declare transforms in an array to pipe multiple transforms,
      // which makes them execute one by one, taking the output of
      // the previous transform as the input of the next.
      transform: [
        {
          type: 'filter',
          config: { dimension: 'Product', value: 'Tofu' }
        },
        {
          type: 'sort',
          config: { dimension: 'Year', order: 'desc' }
        }
      ]
    }
  ],
  series: {
    type: 'pie',
    // Display the result of the piped transform.
    datasetIndex: 1
  }
};
```

> Note: in theory, any type of transform can have multiple input data and multiple output data. But when a transform is piped, it can take only one input (unless it is the first transform of the pipe) and produce only one output (unless it is the last transform of the pipe).
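Conceptually, a pipe is just sequential function application: each transform's output feeds the next transform's input. A minimal plain-JavaScript sketch of this idea (the transform functions here are hypothetical stand-ins, not the echarts internals):

```js
// Each "transform" is a function from a data array to a data array.
var filterTofu = function (data) {
  return data.filter(function (row) {
    return row[0] === 'Tofu';
  });
};
var sortByYearDesc = function (data) {
  return data.slice().sort(function (a, b) {
    return b[3] - a[3];
  });
};

// Piping = applying the transforms one by one with `reduce`.
function pipe(transforms, data) {
  return transforms.reduce(function (acc, t) {
    return t(acc);
  }, data);
}

var source = [
  ['Cake', 123, 32, 2011],
  ['Tofu', 235, 5, 2011],
  ['Tofu', 255, 7, 2012]
];
var result = pipe([filterTofu, sortByYearDesc], source);
// result: [['Tofu', 255, 7, 2012], ['Tofu', 235, 5, 2011]]
```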
#### Output Multiple Data

In most cases, transform functions only need to produce a single result. But there are indeed scenarios where a transform function needs to produce multiple results, each of which might be used by different series.

For example, in the built-in boxplot transform, besides the boxplot data, outlier data is also produced, which can be used in a scatter series. See the [example](${exampleEditorPath}boxplot-light-velocity).

We use the prop [dataset.fromTransformResult](${optionPath}#dataset.fromTransformResult) to satisfy this requirement. For example:

```js
option = {
  dataset: [
    {
      // Original source data.
      source: []
    },
    {
      transform: {
        type: 'boxplot'
      }
      // This "boxplot" transform generates two results:
      // result[0]: the boxplot data
      // result[1]: the outlier data
      // By default, when series or other datasets reference this dataset,
      // only result[0] is visible.
      // If we need to access result[1], we have to use another dataset
      // as follows:
    },
    {
      // This extra dataset references the dataset above and retrieves
      // result[1] as its own data. Thus series or other datasets can
      // reference this dataset to get the data from result[1].
      fromDatasetIndex: 1,
      fromTransformResult: 1
    }
  ],
  xAxis: {
    type: 'category'
  },
  yAxis: {},
  series: [
    {
      name: 'boxplot',
      type: 'boxplot',
      // Reference the data from result[0].
      datasetIndex: 1
    },
    {
      name: 'outlier',
      type: 'scatter',
      // Reference the data from result[1].
      datasetIndex: 2
    }
  ]
};
```

What's more, [dataset.fromTransformResult](${optionPath}#dataset.fromTransformResult) and [dataset.transform](${optionPath}#dataset.transform) can both appear in one dataset, which means the input of the transform is retrieved from the upstream result specified by `fromTransformResult`. For example:

```js
{
  fromDatasetIndex: 1,
  fromTransformResult: 1,
  transform: {
    type: 'sort',
    config: { dimension: 2, order: 'desc' }
  }
}
```
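The mechanics can be mimicked in plain JavaScript: a transform returns an array of results, and `fromTransformResult` simply picks an index from that array. A toy model (not the actual echarts boxplot transform):

```js
// A toy transform producing two results, like the boxplot transform:
// result[0] is the main data, result[1] the outliers.
function splitByThreshold(data, threshold) {
  var main = [];
  var outliers = [];
  data.forEach(function (v) {
    (v > threshold ? outliers : main).push(v);
  });
  return [main, outliers]; // multiple output data
}

var results = splitByThreshold([1, 2, 50, 3, 99], 10);

// `fromTransformResult: 0` → the main data (the default).
var mainData = results[0];
// `fromTransformResult: 1` → the outlier data.
var outlierData = results[1];
// mainData: [1, 2, 3], outlierData: [50, 99]
```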
#### Debug in Develop Environment

When using data transform, we might run into trouble: the final chart does not display correctly, but we do not know where the config is wrong. The property `transform.print` might help in such cases (`transform.print` is only available in the dev environment).

```js
option = {
  dataset: [
    {
      source: []
    },
    {
      transform: {
        type: 'filter',
        config: {},
        // The result of this transform will be printed
        // in the dev tool via `console.log`.
        print: true
      }
    }
  ]
};
```

## Filter Transform

Transform type "filter" is a built-in transform that filters data according to specified conditions. The basic option looks like:

```js live
option = {
  dataset: [
    {
      source: [
        ['Product', 'Sales', 'Price', 'Year'],
        ['Cake', 123, 32, 2011],
        ['Latte', 231, 14, 2011],
        ['Tofu', 235, 5, 2011],
        ['Milk Tee', 341, 25, 2011],
        ['Porridge', 122, 29, 2011],
        ['Cake', 143, 30, 2012],
        ['Latte', 201, 19, 2012],
        ['Tofu', 255, 7, 2012],
        ['Milk Tee', 241, 27, 2012],
        ['Porridge', 102, 34, 2012],
        ['Cake', 153, 28, 2013],
        ['Latte', 181, 21, 2013],
        ['Tofu', 395, 4, 2013],
        ['Milk Tee', 281, 31, 2013],
        ['Porridge', 92, 39, 2013],
        ['Cake', 223, 29, 2014],
        ['Latte', 211, 17, 2014],
        ['Tofu', 345, 3, 2014],
        ['Milk Tee', 211, 35, 2014],
        ['Porridge', 72, 24, 2014]
      ]
    },
    {
      transform: {
        type: 'filter',
        config: { dimension: 'Year', '=': 2011 }
        // The config is the "condition" of this filter.
        // This transform traverses the source data and
        // retrieves all the items whose "Year" is `2011`.
      }
    }
  ],
  series: {
    type: 'pie',
    datasetIndex: 1
  }
};
```

This is another example of the filter transform:

<md-example src="data-transform-filter"></md-example>
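A condition like `{ dimension: 'Year', '=': 2011 }` simply tests one column of each row against a value. A minimal plain-JavaScript sketch of the idea (the `applyFilter` helper is hypothetical, not the echarts source):

```js
// Evaluate a simple equality condition against tabular data
// whose first row is the header.
function applyFilter(source, config) {
  var header = source[0];
  // The dimension can be a name (looked up in the header) or an index.
  var dimIndex = typeof config.dimension === 'number'
    ? config.dimension
    : header.indexOf(config.dimension);
  var rows = source.slice(1).filter(function (row) {
    return row[dimIndex] === config['='];
  });
  return [header].concat(rows);
}

var source = [
  ['Product', 'Sales', 'Price', 'Year'],
  ['Cake', 123, 32, 2011],
  ['Latte', 231, 14, 2011],
  ['Cake', 143, 30, 2012]
];
var filtered = applyFilter(source, { dimension: 'Year', '=': 2011 });
// filtered: the header plus the two 2011 rows.
```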
**About dimension:**

`config.dimension` can be:

- A dimension name declared in the dataset, like `config: { dimension: 'Year', '=': 2011 }`. Dimension name declaration is not mandatory.
- A dimension index (starting from 0), like `config: { dimension: 3, '=': 2011 }`.

**About relational operators:**

The relational operators can be: `>` (`gt`), `>=` (`gte`), `<` (`lt`), `<=` (`lte`), `=` (`eq`), `!=` (`ne`, `<>`), `reg`. (The names in parentheses are aliases.) They follow the common semantics.
Besides common number comparison, there are some extra features:

- Multiple operators can appear in one `{}` item, like `{ dimension: 'Price', '>=': 20, '<': 30 }`, which means logical "and" (Price >= 20 and Price < 30).
- The data value can be a "numeric string" — a string that can be converted to a number, like `' 123 '`. Whitespace and line breaks are automatically trimmed in the conversion.
- If we need to compare a JS `Date` instance or a date string (like `'2012-05-12'`), we need to specify `parser: 'time'` manually, like `config: { dimension: 3, lt: '2012-05-12', parser: 'time' }`.
- Pure string comparison is supported, but only with `=` and `!=`. `>`, `>=`, `<`, `<=` do not support pure string comparison (the "right value" of these four operators cannot be a string).
- The operator `reg` performs a regular expression test, like using `{ dimension: 'Name', reg: /\s+Müller\s*$/ }` to select all data items whose "Name" dimension contains the family name Müller.
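The first two features above can be sketched together: several operators in one item combine as a logical "and", and numeric strings are trimmed and converted before comparing. A simplified stand-in (the `matches` helper is hypothetical, not the echarts source):

```js
// Combine several relational operators on one dimension as a logical "and",
// coercing numeric strings like ' 25 ' before comparing.
var ops = {
  '>=': function (a, b) { return a >= b; },
  '<': function (a, b) { return a < b; }
};

function matches(value, cond) {
  // Numeric strings are trimmed and converted to numbers.
  var v = typeof value === 'string' ? Number(value.trim()) : value;
  return Object.keys(ops).every(function (op) {
    return !(op in cond) || ops[op](v, cond[op]);
  });
}

var prices = [15, ' 25 ', 30, 22];
var inRange = prices.filter(function (p) {
  return matches(p, { '>=': 20, '<': 30 });
});
// inRange: [' 25 ', 22]  (Price >= 20 AND Price < 30)
```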
**About logical relationships:**

Sometimes we also need to express logical relationships (`and` / `or` / `not`):

```js
option = {
  dataset: [
    {
      source: [
        // ...
      ]
    },
    {
      transform: {
        type: 'filter',
        config: {
          // Use operator "and".
          // Similarly, we can also use "or" or "not" in the same place.
          // But "not" should be followed by a `{...}` rather than a `[...]`.
          and: [
            { dimension: 'Year', '=': 2011 },
            { dimension: 'Price', '>=': 20, '<': 30 }
          ]
        }
        // The condition is: "Year" is 2011 and "Price" is greater than
        // or equal to 20 but less than 30.
      }
    }
  ],
  series: {
    type: 'pie',
    datasetIndex: 1
  }
};
```

`and`/`or`/`not` can be nested, like:

```js
transform: {
  type: 'filter',
  config: {
    or: [{
      and: [{
        dimension: 'Price', '>=': 10, '<': 20
      }, {
        dimension: 'Sales', '<': 100
      }, {
        not: { dimension: 'Product', '=': 'Tofu' }
      }]
    }, {
      and: [{
        dimension: 'Price', '>=': 10, '<': 20
      }, {
        dimension: 'Sales', '<': 100
      }, {
        not: { dimension: 'Product', '=': 'Cake' }
      }]
    }]
  }
}
```
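Such a nested condition tree can be evaluated recursively: `and` maps to `every`, `or` to `some`, and `not` negates its child. A compact sketch, assuming for brevity that leaf conditions only use `=` and that each data item is an object keyed by dimension name (hypothetical code, not the echarts implementation):

```js
// Recursively evaluate a nested and/or/not condition tree against
// one data item. Leaves use only '=' to keep the sketch short.
function evalCond(item, cond) {
  if (cond.and) {
    return cond.and.every(function (c) { return evalCond(item, c); });
  }
  if (cond.or) {
    return cond.or.some(function (c) { return evalCond(item, c); });
  }
  if (cond.not) {
    return !evalCond(item, cond.not);
  }
  // Leaf: a relational condition.
  return item[cond.dimension] === cond['='];
}

var item = { Product: 'Tofu', Year: 2011 };
var keep = evalCond(item, {
  and: [
    { dimension: 'Year', '=': 2011 },
    { not: { dimension: 'Product', '=': 'Cake' } }
  ]
});
// keep === true
```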
**About parsers:**

A "parser" can be specified when making value comparisons. At present the supported parsers are:

- `parser: 'time'`: Parse the value to a timestamp before comparing. The parser rule is the same as `echarts.time.parse`: JS `Date` instances, timestamp numbers (in milliseconds) and time strings (like `'2012-05-12 03:11:22'`) can be parsed to a timestamp number, while other values are parsed to `NaN`.
- `parser: 'trim'`: Trim strings before making the comparison. For non-strings, return the original value.
- `parser: 'number'`: Forcibly convert the value to a number before making the comparison. If it cannot be converted to a meaningful number, it is converted to `NaN`. In most cases this is not necessary, because by default the value is auto-converted to a number (if possible) before the comparison. But the default conversion is strict, while this parser provides a loose strategy. If we meet number strings with unit suffixes (like `'33%'`, `'12px'`), we should use `parser: 'number'` to convert them to numbers before making the comparison.

This is an example showing `parser: 'time'`:

```js
option = {
  dataset: [
    {
      source: [
        ['Product', 'Sales', 'Price', 'Date'],
        ['Milk Tee', 311, 21, '2012-05-12'],
        ['Cake', 135, 28, '2012-05-22'],
        ['Latte', 262, 36, '2012-06-02'],
        ['Milk Tee', 359, 21, '2012-06-22'],
        ['Cake', 121, 28, '2012-07-02'],
        ['Latte', 271, 36, '2012-06-22']
        // ...
      ]
    },
    {
      transform: {
        type: 'filter',
        config: {
          dimension: 'Date',
          '>=': '2012-05',
          '<': '2012-06',
          parser: 'time'
        }
      }
    }
  ]
};
```
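The point of `parser: 'time'` is that date strings must be parsed to timestamps before a numeric range comparison makes sense. A plain-JavaScript sketch, using the built-in `Date.parse` as a stand-in for `echarts.time.parse` (the `inDateRange` helper is hypothetical):

```js
// A date string is only comparable after parsing it to a timestamp.
function inDateRange(dateStr, fromStr, toStr) {
  var t = Date.parse(dateStr);
  return t >= Date.parse(fromStr) && t < Date.parse(toStr);
}

var dates = ['2012-05-12', '2012-05-22', '2012-06-02', '2012-06-22'];
var may = dates.filter(function (d) {
  return inDateRange(d, '2012-05', '2012-06');
});
// may: ['2012-05-12', '2012-05-22']
```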
**Formal definition:**

Finally, here is the formal definition of the filter transform config:

```ts
type FilterTransform = {
  type: 'filter';
  config: ConditionalExpressionOption;
};
type ConditionalExpressionOption =
  | true
  | false
  | RelationalExpressionOption
  | LogicalExpressionOption;
type RelationalExpressionOption = {
  dimension: DimensionName | DimensionIndex;
  parser?: 'time' | 'trim' | 'number';
  lt?: DataValue; // less than
  lte?: DataValue; // less than or equal
  gt?: DataValue; // greater than
  gte?: DataValue; // greater than or equal
  eq?: DataValue; // equal
  ne?: DataValue; // not equal
  '<'?: DataValue; // lt
  '<='?: DataValue; // lte
  '>'?: DataValue; // gt
  '>='?: DataValue; // gte
  '='?: DataValue; // eq
  '!='?: DataValue; // ne
  '<>'?: DataValue; // ne (SQL style)
  reg?: RegExp | string; // RegExp
};
type LogicalExpressionOption = {
  and?: ConditionalExpressionOption[];
  or?: ConditionalExpressionOption[];
  not?: ConditionalExpressionOption;
};
type DataValue = string | number | Date;
type DimensionName = string;
type DimensionIndex = number;
```

> Note that when using the [Minimal Bundle](${lang}/basics/import#shrinking-bundle-size), if you need to use this built-in transform, besides the `Dataset` component, you are also required to import the `Transform` component.

```ts
import {
  DatasetComponent,
  TransformComponent
} from 'echarts/components';

echarts.use([
  DatasetComponent,
  TransformComponent
]);
```

## Sort Transform

Another built-in transform is "sort".

```js
option = {
  dataset: [
    {
      dimensions: ['name', 'age', 'profession', 'score', 'date'],
      source: [
        [' Hannah Krause ', 41, 'Engineer', 314, '2011-02-12'],
        ['Zhao Qian ', 20, 'Teacher', 351, '2011-03-01'],
        [' Jasmin Krause ', 52, 'Musician', 287, '2011-02-14'],
        ['Li Lei', 37, 'Teacher', 219, '2011-02-18'],
        [' Karle Neumann ', 25, 'Engineer', 253, '2011-04-02'],
        [' Adrian Groß', 19, 'Teacher', null, '2011-01-16'],
        ['Mia Neumann', 71, 'Engineer', 165, '2011-03-19'],
        [' Böhm Fuchs', 36, 'Musician', 318, '2011-02-24'],
        ['Han Meimei ', 67, 'Engineer', 366, '2011-03-12']
      ]
    },
    {
      transform: {
        type: 'sort',
        // Sort by score.
        config: { dimension: 'score', order: 'asc' }
      }
    }
  ],
  series: {
    type: 'bar',
    datasetIndex: 1
  }
  // ...
};
```

<md-example src="data-transform-sort-bar"></md-example>
Some extra features of the "sort" transform:

- Ordering by multiple dimensions is supported. See the example below.
- The sort rules:
  - By default, "numeric" values (that is, numbers and numeric strings like `' 123 '`) are sorted in numeric order.
  - Otherwise, "non-numeric strings" can also be ordered among themselves. This helps in cases like grouping data items with the same tag, especially when multiple dimensions participate in the sort (see the example below).
  - When a "numeric" value is compared with a "non-numeric string", or when either of them is compared with another type of value, they are not comparable. We call such values "incomparable" and treat them as the "min value" or "max value" according to the prop `incomparable: 'min' | 'max'`. This feature usually helps to decide whether to put empty values (like `null`, `undefined`, `NaN`, `''`, `'-'`) or other illegal values at the head or the tail.
- `parser: 'time' | 'trim' | 'number'` can be used, the same as in the "filter" transform.
  - If intending to sort time values (JS `Date` instances or time strings like `'2012-03-12 11:13:54'`), `parser: 'time'` should be specified, like `config: { dimension: 'date', order: 'desc', parser: 'time' }`.
  - If intending to sort values with unit suffixes (like `'33%'`, `'16px'`), `parser: 'number'` is needed.

See an example of ordering by multiple dimensions:

```js
option = {
  dataset: [
    {
      dimensions: ['name', 'age', 'profession', 'score', 'date'],
      source: [
        [' Hannah Krause ', 41, 'Engineer', 314, '2011-02-12'],
        ['Zhao Qian ', 20, 'Teacher', 351, '2011-03-01'],
        [' Jasmin Krause ', 52, 'Musician', 287, '2011-02-14'],
        ['Li Lei', 37, 'Teacher', 219, '2011-02-18'],
        [' Karle Neumann ', 25, 'Engineer', 253, '2011-04-02'],
        [' Adrian Groß', 19, 'Teacher', null, '2011-01-16'],
        ['Mia Neumann', 71, 'Engineer', 165, '2011-03-19'],
        [' Böhm Fuchs', 36, 'Musician', 318, '2011-02-24'],
        ['Han Meimei ', 67, 'Engineer', 366, '2011-03-12']
      ]
    },
    {
      transform: {
        type: 'sort',
        config: [
          // Sort by the two dimensions.
          { dimension: 'profession', order: 'desc' },
          { dimension: 'score', order: 'desc' }
        ]
      }
    }
  ],
  series: {
    type: 'bar',
    datasetIndex: 1
  }
  // ...
};
```

<md-example src="doc-example/data-transform-multiple-sort-bar"></md-example>
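Ordering by multiple dimensions amounts to a chained comparator: compare on the first dimension and fall through to the next on ties, pushing incomparable values (here `null`) toward one end. A simplified plain-JavaScript sketch (the `byDims` helper is hypothetical, not the echarts sort implementation):

```js
// Chained comparator: sort by profession desc, then score desc, treating
// null scores as the "min value" so they land last in a descending sort.
function byDims(dims) {
  return function (a, b) {
    for (var i = 0; i < dims.length; i++) {
      var d = dims[i];
      var av = a[d.key] == null ? -Infinity : a[d.key];
      var bv = b[d.key] == null ? -Infinity : b[d.key];
      if (av < bv) return d.order === 'desc' ? 1 : -1;
      if (av > bv) return d.order === 'desc' ? -1 : 1;
      // Tie on this dimension: fall through to the next one.
    }
    return 0;
  };
}

var rows = [
  { profession: 'Teacher', score: 351 },
  { profession: 'Engineer', score: 314 },
  { profession: 'Teacher', score: null },
  { profession: 'Teacher', score: 219 }
];
rows.sort(byDims([
  { key: 'profession', order: 'desc' },
  { key: 'score', order: 'desc' }
]));
// Teachers first (desc string order), scores 351, 219, then null.
```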
Finally, here is the formal definition of the sort transform config:

```ts
type SortTransform = {
  type: 'sort';
  config: OrderExpression | OrderExpression[];
};
type OrderExpression = {
  dimension: DimensionName | DimensionIndex;
  order: 'asc' | 'desc';
  incomparable?: 'min' | 'max';
  parser?: 'time' | 'trim' | 'number';
};
type DimensionName = string;
type DimensionIndex = number;
```

> Note that when using the [Minimal Bundle](${lang}/basics/import#shrinking-bundle-size), if you need to use this built-in transform, besides the `Dataset` component, you are also required to import the `Transform` component.

```ts
import {
  DatasetComponent,
  TransformComponent
} from 'echarts/components';

echarts.use([
  DatasetComponent,
  TransformComponent
]);
```

## Use External Transforms

Besides built-in transforms (like 'filter' and 'sort'), we can also use external transforms to provide more powerful functionality. Here we use the third-party library [ecStat](https://github.com/ecomfe/echarts-stat) as an example.

This example shows how to make a regression line via ecStat:

```js
// Register the external transform first.
echarts.registerTransform(ecStatTransform(ecStat).regression);
```

```js
option = {
  dataset: [
    {
      source: rawData
    },
    {
      transform: {
        // Reference the registered external transform.
        // Note that an external transform has a namespace (e.g. 'ecStat:xxx'
        // has the namespace 'ecStat').
        // Built-in transforms (like 'filter', 'sort') do not have a namespace.
        type: 'ecStat:regression',
        config: {
          // Parameters needed by the external transform.
          method: 'exponential'
        }
      }
    }
  ],
  xAxis: { type: 'category' },
  yAxis: {},
  series: [
    {
      name: 'scatter',
      type: 'scatter',
      datasetIndex: 0
    },
    {
      name: 'regression',
      type: 'line',
      symbol: 'none',
      datasetIndex: 1
    }
  ]
};
```
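An external transform is just an object with a namespaced `type` and a `transform` function that reads its input from `params.upstream` and returns result data, following the custom-transform interface in the echarts docs. The sketch below (with a hypothetical `myExt:scale` transform and a mocked `upstream`) shows the data flow without requiring echarts itself:

```js
// A minimal external transform object. echarts would receive it via
// `echarts.registerTransform(scaleTransform)`; here we invoke it manually
// with a mocked `params.upstream` to show the data flow.
var scaleTransform = {
  type: 'myExt:scale', // external transforms need a namespace
  transform: function (params) {
    var rawData = params.upstream.cloneRawData();
    var factor = (params.config && params.config.factor) || 1;
    var data = rawData.map(function (row) {
      return row.map(function (v) { return v * factor; });
    });
    return { data: data }; // a transform returns its result data
  }
};

// Mocked invocation (in real usage echarts calls this internally):
var result = scaleTransform.transform({
  upstream: { cloneRawData: function () { return [[1, 2], [3, 4]]; } },
  config: { factor: 10 }
});
// result.data: [[10, 20], [30, 40]]
```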
Examples with echarts-stat:

- [Aggregate](${exampleEditorPath}data-transform-aggregate&edit=1&reset=1)
- [Bar histogram](${exampleEditorPath}bar-histogram&edit=1&reset=1)
- [Scatter clustering](${exampleEditorPath}scatter-clustering&edit=1&reset=1)
- [Scatter linear regression](${exampleEditorPath}scatter-linear-regression&edit=1&reset=1)
- [Scatter exponential regression](${exampleEditorPath}scatter-exponential-regression&edit=1&reset=1)
- [Scatter logarithmic regression](${exampleEditorPath}scatter-logarithmic-regression&edit=1&reset=1)
- [Scatter polynomial regression](${exampleEditorPath}scatter-polynomial-regression&edit=1&reset=1)