Propositionalization: Robot sensor data

In this notebook, we compare getML's FastProp against the well-known feature engineering libraries featuretools and tsfresh.

Summary:

  • Prediction type: Regression
  • Domain: Robotics
  • Prediction target: The force vector on the robot's arm
  • Population size: 15,001

Background

A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and then performing a feature selection on the (possibly large) set of generated features. In academia, this approach is called propositionalization.
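
To make the idea concrete, here is a minimal, hand-rolled sketch of propositionalization in pandas. The toy tables, column names, and the set of aggregations are chosen for illustration only; none of the libraries benchmarked below is implemented this way:

import pandas as pd

# Toy relational data: one population row per entity, many
# peripheral rows (e.g. sensor readings) per entity.
population = pd.DataFrame({"id": [1, 2], "target": [0.5, 1.5]})
peripheral = pd.DataFrame(
    {"id": [1, 1, 1, 2, 2], "reading": [1.0, 2.0, 3.0, 4.0, 5.0]}
)

# Propositionalization: apply a fixed set of aggregations to the
# column of interest, yielding one flat attribute-value row per entity.
AGGREGATIONS = ["mean", "min", "max", "sum", "count"]
features = peripheral.groupby("id")["reading"].agg(AGGREGATIONS)
features.columns = [f"reading__{agg}" for agg in AGGREGATIONS]

# Merge the generated features back onto the population table; a
# feature selection over this (possibly large) set would follow.
flat = population.merge(features, left_on="id", right_index=True)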

getML's FastProp is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries featuretools and tsfresh. Both of these libraries use propositionalization approaches for feature engineering.

The data set has been generously provided by Erik Berger, who originally collected it for his dissertation:

Berger, E. (2018). Behavior-Specific Proprioception Models for Robotic Force Estimation: A Machine Learning Approach. Freiberg, Germany: Technische Universitaet Bergakademie Freiberg.

A web frontend for getML

The getML monitor is a frontend built to support your work with getML. It displays information such as the imported data frames and trained pipelines, and allows for easy data and feature exploration. You can launch the getML monitor here.

Analysis

We begin by importing the libraries and setting the project.

In [1]:
import datetime
import os
import sys
import time
from urllib import request

import getml
import getml.data as data
import getml.data.roles as roles
import getml.database as database
import getml.engine as engine
import getml.feature_learning.aggregations as agg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import Image

plt.style.use("seaborn")
%matplotlib inline

getml.engine.launch()
getml.engine.set_project("robot")
getML engine is already running.



Connected to project 'robot'
http://localhost:1709/#/listprojects/robot/
In [2]:
sys.path.append(os.path.join(sys.path[0], ".."))

from utils import Benchmark, FTTimeSeriesBuilder, TSFreshBuilder

1. Loading data

1.1 Download from source

In [3]:
data_all = getml.data.DataFrame.from_csv(
    "https://static.getml.com/datasets/robotarm/robot-demo.csv", "data_all"
)
Downloading robot-demo.csv...
[========================================] 100%
In [4]:
data_all
Out[4]:
name 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 98 99 100 101 102 103 104 105 106 f_x f_y f_z
role unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float
0 3.4098 -0.3274 0.9604 -3.7436 -1.0191 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 -1.2042 0.02167 0  3.4098 -0.3274 0.9605 -3.7437 -1.0191 -6.0205 0  0  0  0  0  0  0.1233 -6.5483 -2.8045 -0.8296 0.07625 -0.1906 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1983 0.7699 0.41 0.08279 -1.4094 0.786 -0.3682 0  0  0  0  0  0  -22.654 -11.503 -18.673 -3.5155 5.8354 -2.05 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  48.069 48.009 0.9668 47.834 47.925 47.818 47.834 47.955 47.971 -11.03 6.9 -7.33
1 3.4098 -0.3274 0.9604 -3.7436 -1.0191 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 -1.2042 0.02167 0  3.4098 -0.3274 0.9604 -3.7437 -1.0191 -6.0205 0  0  0  0  0  0  0.1188 -6.5506 -2.8404 -0.8281 0.06405 -0.1998 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1983 0.7699 0.41 0.0828 -1.4094 0.7859 -0.3682 0  0  0  0  0  0  -21.627 -11.046 -18.66 -3.5395 5.7577 -1.9805 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  48.009 48.009 0.8594 47.834 47.925 47.818 47.834 47.955 47.971 -10.848 6.7218 -7.4427
2 3.4098 -0.3274 0.9604 -3.7436 -1.0191 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 -1.2042 0.02167 0  3.4098 -0.3274 0.9605 -3.7437 -1.0191 -6.0205 0  0  0  0  0  0  0.1099 -6.5438 -2.8 -0.8205 0.07473 -0.183 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1922 0.7699 0.41 0.08279 -1.4094 0.7859 -0.3682 0  0  0  0  0  0  -23.843 -12.127 -18.393 -3.6453 5.978 -1.9978 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  48.009 48.069 0.931 47.879 47.925 47.818 47.834 47.955 47.971 -10.666 6.5436 -7.5555
3 3.4098 -0.3274 0.9604 -3.7436 -1.0191 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 -1.2042 0.02167 0  3.4098 -0.3273 0.9604 -3.7437 -1.0191 -6.0205 0  0  0  0  0  0  0.1233 -6.5483 -2.8224 -0.8266 0.07168 -0.1998 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1967 0.7699 0.41 0.08275 -1.4094 0.786 -0.3681 0  0  0  0  0  0  -21.772 -10.872 -18.691 -3.5512 5.6648 -1.9976 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  48.069 48.069 0.931 47.879 47.925 47.818 47.834 47.955 47.971 -10.507 6.4533 -7.65
4 3.4098 -0.3274 0.9604 -3.7436 -1.0191 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 -1.2042 0.02167 0  3.4098 -0.3274 0.9604 -3.7437 -1.0191 -6.0205 0  0  0  0  0  0  0.1255 -6.5394 -2.8 -0.8327 0.07473 -0.1952 0.1211 -6.5483 -2.8157 -0.8327 0.07015 -0.1922 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  -22.823 -11.645 -18.524 -3.5305 5.8712 -2.0096 0.7699 0.41 0.08278 -1.4094 0.786 -0.3681 0  0  0  0  0  0  48.069 48.069 0.8952 47.879 47.925 47.818 47.834 47.955 47.971 -10.413 6.6267 -7.69
...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ... 
14996 3.0837 -0.8836 1.4501 -2.2102 -1.559 -5.3265 -0.03151 -0.05375 0.04732 0.1482 -0.05218 0.06706 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3694 -4.1879 -1.1847 -0.09441 -0.1568 0.1898 1.1605 -42.951 -19.023 -2.6343 0.1551 -0.1338 3.0836 -0.8836 1.4503 -2.2101 -1.5591 -5.3263 -0.03347 -0.05585 0.04805 0.151 -0.05513 0.07114 -0.3564 -6.0394 -2.3001 -0.2181 -0.1159 0.09608 -0.3632 -6.0394 -2.3023 -0.212 -0.125 0.1113 0.7116 0.06957 0.06036 -0.8506 2.9515 -0.03352 -0.03558 -0.03029 0.002444 -0.04208 0.1458 -0.1098 -0.8784 -0.07291 -37.584 0.0001132 -2.1031 0.03318 0.7117 0.0697 0.06044 -0.8511 2.951 -0.03356 -0.03508 -0.02849 0.001571 -0.03951 0.1442 -0.1036 48.069 48.009 0.8952 47.818 47.834 47.818 47.803 47.94 47.94 10.84 -1.41 16.14
14997 3.0835 -0.884 1.4505 -2.2091 -1.5594 -5.326 -0.02913 -0.0497 0.04376 0.137 -0.04825 0.062 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3677 -4.1837 -1.1874 -0.09682 -0.1562 0.189 1.1592 -42.937 -19.023 -2.6331 0.1545 -0.1338 3.0833 -0.8841 1.4507 -2.209 -1.5596 -5.3258 -0.02909 -0.04989 0.04198 0.1481 -0.05465 0.06249 -0.3161 -6.1179 -2.253 -0.3752 -0.03965 0.08693 -0.3273 -6.1022 -2.2597 -0.366 -0.05033 0.0915 0.7114 0.06932 0.06039 -0.8497 2.953 -0.03359 -0.0335 -0.02723 0.001208 -0.04242 0.1428 -0.0967 -2.7137 0.8552 -38.514 -0.6088 -3.2383 -0.9666 0.7114 0.06948 0.06045 -0.8503 2.9525 -0.03359 -0.03246 -0.02633 0.001469 -0.03657 0.1333 -0.09571 48.009 48.009 0.8594 47.818 47.834 47.818 47.803 47.94 47.94 10.857 -1.52 15.943
14998 3.0833 -0.8844 1.4508 -2.208 -1.5598 -5.3256 -0.02676 -0.04565 0.04019 0.1258 -0.04431 0.05695 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3659 -4.1797 -1.1901 -0.09922 -0.1555 0.1881 1.1579 -42.924 -19.023 -2.6321 0.154 -0.1338 3.0831 -0.8844 1.451 -2.2078 -1.56 -5.3253 -0.02776 -0.04382 0.03652 0.1295 -0.05064 0.04818 -0.343 -6.2569 -2.1566 -0.3035 0.00305 0.1434 -0.3385 -6.2322 -2.1589 -0.302 -0.00915 0.1571 0.7111 0.06912 0.06039 -0.849 2.9544 -0.0337 -0.02911 -0.02589 0.001292 -0.04046 0.1246 -0.08058 4.2749 1.0128 -36.412 -1.2811 -0.4296 -1.1013 0.7112 0.06928 0.06046 -0.8495 2.9538 -0.03362 -0.02984 -0.02417 0.001364 -0.03362 0.1224 -0.08786 48.009 48.009 0.931 47.818 47.834 47.818 47.803 47.94 47.94 10.89 -1.74 15.55
14999 3.0831 -0.8847 1.4511 -2.2071 -1.5602 -5.3251 -0.02438 -0.0416 0.03662 0.1147 -0.04038 0.0519 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3642 -4.1758 -1.1928 -0.1016 -0.1548 0.1873 1.1568 -42.912 -19.023 -2.6311 0.1535 -0.1338 3.0829 -0.8848 1.4513 -2.2068 -1.5604 -5.3249 -0.02149 -0.04059 0.03417 0.1202 -0.0395 0.04178 -0.4237 -6.2703 -2.0939 -0.302 -0.01372 0.1739 -0.4125 -6.2569 -2.0916 -0.2943 -0.02898 0.1891 0.7109 0.06894 0.06039 -0.8484 2.9557 -0.03384 -0.02738 -0.01982 0.001031 -0.03028 0.1157 -0.06702 11.518 1.5002 -39.314 -1.8671 -0.3734 -0.5733 0.7109 0.06909 0.06047 -0.8488 2.955 -0.03364 -0.02721 -0.02201 0.001255 -0.03067 0.1115 -0.08003 48.009 48.009 0.931 47.818 47.834 47.818 47.803 47.94 47.94 11.29 -1.4601 15.743
15000 3.0829 -0.885 1.4514 -2.2062 -1.5605 -5.3247 -0.02201 -0.03755 0.03305 0.1035 -0.03645 0.04684 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3624 -4.172 -1.1955 -0.1041 -0.1542 0.1864 1.1558 -42.901 -19.023 -2.6302 0.1531 -0.1338 3.0827 -0.8851 1.4516 -2.2059 -1.5607 -5.3246 -0.02096 -0.03808 0.02958 0.1171 -0.03289 0.03883 -0.417 -6.2434 -2.058 -0.4102 -0.04728 0.1967 -0.4237 -6.2367 -2.0714 -0.4163 -0.0671 0.2059 0.7107 0.06878 0.06041 -0.8478 2.9567 -0.03382 -0.02535 -0.01854 0.001614 -0.02421 0.11 -0.06304 15.099 2.936 -39.068 -1.9402 0.139 -0.2674 0.7107 0.06893 0.06048 -0.8482 2.9561 -0.03367 -0.02458 -0.01986 0.001142 -0.0277 0.1007 -0.07221 48.009 48.069 0.8952 47.818 47.834 47.818 47.803 47.94 47.955 11.69 -1.1801 15.937

15001 rows x 96 columns
memory usage: 11.52 MB
name: data_all
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/robot/data_all/

1.2 Prepare data

The force vector consists of three components (f_x, f_y, and f_z), meaning that we have three targets. For this comparison, we only predict the first component (f_x).

Also, we want to speed things up a little, so we only use 10 columns. A previous analysis has revealed that these 10 columns carry most of the predictive power:

In [5]:
only_use = ["30", "34", "37", "38", "4", "59", "61", "7", "77", "78"]

data_all.set_role(["f_x"], getml.data.roles.target)
data_all.set_role(only_use, getml.data.roles.numerical)

This is what the data set looks like:

In [6]:
data_all
Out[6]:
name f_x 30 34 37 38 4 59 61 7 77 78 3 5 6 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 31 32 33 35 36 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 60 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 79 80 81 82 83 84 85 86 98 99 100 101 102 103 104 105 106 f_y f_z
role target numerical numerical numerical numerical numerical numerical numerical numerical numerical numerical unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float unused_float
0 -11.03 -1.2042 -0.3274 -1.0191 -6.0205 -0.3274 0.08279 0.786 -1.0191 0.08278 -1.4094 3.4098 0.9604 -3.7436 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 0.02167 0  3.4098 0.9605 -3.7437 0  0  0  0  0  0  0.1233 -6.5483 -2.8045 -0.8296 0.07625 -0.1906 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1983 0.7699 0.41 -1.4094 -0.3682 0  0  0  0  0  0  -22.654 -11.503 -18.673 -3.5155 5.8354 -2.05 0.7699 0.41 0.786 -0.3681 0  0  0  0  0  0  48.069 48.009 0.9668 47.834 47.925 47.818 47.834 47.955 47.971 6.9 -7.33
1 -10.848 -1.2042 -0.3274 -1.0191 -6.0205 -0.3274 0.0828 0.7859 -1.0191 0.08278 -1.4094 3.4098 0.9604 -3.7436 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 0.02167 0  3.4098 0.9604 -3.7437 0  0  0  0  0  0  0.1188 -6.5506 -2.8404 -0.8281 0.06405 -0.1998 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1983 0.7699 0.41 -1.4094 -0.3682 0  0  0  0  0  0  -21.627 -11.046 -18.66 -3.5395 5.7577 -1.9805 0.7699 0.41 0.786 -0.3681 0  0  0  0  0  0  48.009 48.009 0.8594 47.834 47.925 47.818 47.834 47.955 47.971 6.7218 -7.4427
2 -10.666 -1.2042 -0.3274 -1.0191 -6.0205 -0.3274 0.08279 0.7859 -1.0191 0.08278 -1.4094 3.4098 0.9604 -3.7436 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 0.02167 0  3.4098 0.9605 -3.7437 0  0  0  0  0  0  0.1099 -6.5438 -2.8 -0.8205 0.07473 -0.183 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1922 0.7699 0.41 -1.4094 -0.3682 0  0  0  0  0  0  -23.843 -12.127 -18.393 -3.6453 5.978 -1.9978 0.7699 0.41 0.786 -0.3681 0  0  0  0  0  0  48.009 48.069 0.931 47.879 47.925 47.818 47.834 47.955 47.971 6.5436 -7.5555
3 -10.507 -1.2042 -0.3273 -1.0191 -6.0205 -0.3274 0.08275 0.786 -1.0191 0.08278 -1.4094 3.4098 0.9604 -3.7436 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 0.02167 0  3.4098 0.9604 -3.7437 0  0  0  0  0  0  0.1233 -6.5483 -2.8224 -0.8266 0.07168 -0.1998 0.1211 -6.5483 -2.8157 -0.8281 0.07015 -0.1967 0.7699 0.41 -1.4094 -0.3681 0  0  0  0  0  0  -21.772 -10.872 -18.691 -3.5512 5.6648 -1.9976 0.7699 0.41 0.786 -0.3681 0  0  0  0  0  0  48.069 48.069 0.931 47.879 47.925 47.818 47.834 47.955 47.971 6.4533 -7.65
4 -10.413 -1.2042 -0.3274 -1.0191 -6.0205 -0.3274 0.08278 0.786 -1.0191 0.08278 -1.4094 3.4098 0.9604 -3.7436 -6.0205 0  0  0  0  0  0  0  0  0  0  0  0  8.38e-17 -4.8116 -1.4033 -0.1369 0.002472 0  9.803e-16 -55.642 -16.312 0.02167 0  3.4098 0.9604 -3.7437 0  0  0  0  0  0  0.1255 -6.5394 -2.8 -0.8327 0.07473 -0.1952 0.1211 -6.5483 -2.8157 -0.8327 0.07015 -0.1922 0.7699 0.41 -1.4094 -0.3681 0  0  0  0  0  0  -22.823 -11.645 -18.524 -3.5305 5.8712 -2.0096 0.7699 0.41 0.786 -0.3681 0  0  0  0  0  0  48.069 48.069 0.8952 47.879 47.925 47.818 47.834 47.955 47.971 6.6267 -7.69
...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ... 
14996 10.84 -2.6343 -0.8836 -1.5591 -5.3263 -0.8836 0.06036 2.9515 -1.559 0.06044 -0.8511 3.0837 1.4501 -2.2102 -5.3265 -0.03151 -0.05375 0.04732 0.1482 -0.05218 0.06706 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3694 -4.1879 -1.1847 -0.09441 -0.1568 0.1898 1.1605 -42.951 -19.023 0.1551 -0.1338 3.0836 1.4503 -2.2101 -0.03347 -0.05585 0.04805 0.151 -0.05513 0.07114 -0.3564 -6.0394 -2.3001 -0.2181 -0.1159 0.09608 -0.3632 -6.0394 -2.3023 -0.212 -0.125 0.1113 0.7116 0.06957 -0.8506 -0.03352 -0.03558 -0.03029 0.002444 -0.04208 0.1458 -0.1098 -0.8784 -0.07291 -37.584 0.0001132 -2.1031 0.03318 0.7117 0.0697 2.951 -0.03356 -0.03508 -0.02849 0.001571 -0.03951 0.1442 -0.1036 48.069 48.009 0.8952 47.818 47.834 47.818 47.803 47.94 47.94 -1.41 16.14
14997 10.857 -2.6331 -0.8841 -1.5596 -5.3258 -0.884 0.06039 2.953 -1.5594 0.06045 -0.8503 3.0835 1.4505 -2.2091 -5.326 -0.02913 -0.0497 0.04376 0.137 -0.04825 0.062 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3677 -4.1837 -1.1874 -0.09682 -0.1562 0.189 1.1592 -42.937 -19.023 0.1545 -0.1338 3.0833 1.4507 -2.209 -0.02909 -0.04989 0.04198 0.1481 -0.05465 0.06249 -0.3161 -6.1179 -2.253 -0.3752 -0.03965 0.08693 -0.3273 -6.1022 -2.2597 -0.366 -0.05033 0.0915 0.7114 0.06932 -0.8497 -0.03359 -0.0335 -0.02723 0.001208 -0.04242 0.1428 -0.0967 -2.7137 0.8552 -38.514 -0.6088 -3.2383 -0.9666 0.7114 0.06948 2.9525 -0.03359 -0.03246 -0.02633 0.001469 -0.03657 0.1333 -0.09571 48.009 48.009 0.8594 47.818 47.834 47.818 47.803 47.94 47.94 -1.52 15.943
14998 10.89 -2.6321 -0.8844 -1.56 -5.3253 -0.8844 0.06039 2.9544 -1.5598 0.06046 -0.8495 3.0833 1.4508 -2.208 -5.3256 -0.02676 -0.04565 0.04019 0.1258 -0.04431 0.05695 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3659 -4.1797 -1.1901 -0.09922 -0.1555 0.1881 1.1579 -42.924 -19.023 0.154 -0.1338 3.0831 1.451 -2.2078 -0.02776 -0.04382 0.03652 0.1295 -0.05064 0.04818 -0.343 -6.2569 -2.1566 -0.3035 0.00305 0.1434 -0.3385 -6.2322 -2.1589 -0.302 -0.00915 0.1571 0.7111 0.06912 -0.849 -0.0337 -0.02911 -0.02589 0.001292 -0.04046 0.1246 -0.08058 4.2749 1.0128 -36.412 -1.2811 -0.4296 -1.1013 0.7112 0.06928 2.9538 -0.03362 -0.02984 -0.02417 0.001364 -0.03362 0.1224 -0.08786 48.009 48.009 0.931 47.818 47.834 47.818 47.803 47.94 47.94 -1.74 15.55
14999 11.29 -2.6311 -0.8848 -1.5604 -5.3249 -0.8847 0.06039 2.9557 -1.5602 0.06047 -0.8488 3.0831 1.4511 -2.2071 -5.3251 -0.02438 -0.0416 0.03662 0.1147 -0.04038 0.0519 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3642 -4.1758 -1.1928 -0.1016 -0.1548 0.1873 1.1568 -42.912 -19.023 0.1535 -0.1338 3.0829 1.4513 -2.2068 -0.02149 -0.04059 0.03417 0.1202 -0.0395 0.04178 -0.4237 -6.2703 -2.0939 -0.302 -0.01372 0.1739 -0.4125 -6.2569 -2.0916 -0.2943 -0.02898 0.1891 0.7109 0.06894 -0.8484 -0.03384 -0.02738 -0.01982 0.001031 -0.03028 0.1157 -0.06702 11.518 1.5002 -39.314 -1.8671 -0.3734 -0.5733 0.7109 0.06909 2.955 -0.03364 -0.02721 -0.02201 0.001255 -0.03067 0.1115 -0.08003 48.009 48.009 0.931 47.818 47.834 47.818 47.803 47.94 47.94 -1.4601 15.743
15000 11.69 -2.6302 -0.8851 -1.5607 -5.3246 -0.885 0.06041 2.9567 -1.5605 0.06048 -0.8482 3.0829 1.4514 -2.2062 -5.3247 -0.02201 -0.03755 0.03305 0.1035 -0.03645 0.04684 0.2969 0.5065 -0.4459 -1.3963 0.4916 -0.6319 -0.3624 -4.172 -1.1955 -0.1041 -0.1542 0.1864 1.1558 -42.901 -19.023 0.1531 -0.1338 3.0827 1.4516 -2.2059 -0.02096 -0.03808 0.02958 0.1171 -0.03289 0.03883 -0.417 -6.2434 -2.058 -0.4102 -0.04728 0.1967 -0.4237 -6.2367 -2.0714 -0.4163 -0.0671 0.2059 0.7107 0.06878 -0.8478 -0.03382 -0.02535 -0.01854 0.001614 -0.02421 0.11 -0.06304 15.099 2.936 -39.068 -1.9402 0.139 -0.2674 0.7107 0.06893 2.9561 -0.03367 -0.02458 -0.01986 0.001142 -0.0277 0.1007 -0.07221 48.009 48.069 0.8952 47.818 47.834 47.818 47.803 47.94 47.955 -1.1801 15.937

15001 rows x 96 columns
memory usage: 11.52 MB
name: data_all
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/robot/data_all/

1.3 Separate data into a training and testing set

We separate the data set into a training and a testing set: the first 10,500 measurements are used for training, the remaining 4,501 for testing.

In [7]:
split = getml.data.split.time(data_all, "rowid", test=10500)
split
Out[7]:
0 train
1 train
2 train
3 train
4 train
...

15001 rows
type: StringColumnView

In [8]:
time_series = getml.data.TimeSeries(
    population=data_all,
    split=split,
    time_stamps="rowid",
    lagged_targets=False,
    memory=30,
)

time_series
Out[8]:

data model

diagram


data_all → population (join condition: rowid <= rowid; memory: 30 time steps)

staging

data frames staging table
0 population POPULATION__STAGING_TABLE_1
1 data_all DATA_ALL__STAGING_TABLE_2

container

population

subset name rows type
0 test data_all 4501 View
1 train data_all 10500 View

peripheral

name rows type
0 data_all 15001 View
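
As the data model above shows, the TimeSeries abstraction sets up a self-join: each row of the population table is related to the peripheral rows whose rowid lies at most 30 steps in the past (memory=30). For a single column and a single aggregation, this is roughly equivalent to a pandas rolling window; the following sketch illustrates the concept and is not what the getML engine does internally:

import pandas as pd

# memory=30 means: relate each observation to the (up to) 30
# preceding observations of the same series and aggregate over them.
sensor = pd.Series(range(100), dtype=float, name="sensor")
rolling_mean = sensor.rolling(window=30, min_periods=1).mean()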

2. Predictive modeling

2.1 Propositionalization with getML's FastProp

In [9]:
fast_prop = getml.feature_learning.FastProp(
    loss_function=getml.feature_learning.loss_functions.SquareLoss,
)
In [10]:
pipe_fp_fl = getml.pipeline.Pipeline(
    data_model=time_series.data_model,
    feature_learners=[fast_prop],
    tags=["feature learning", "fastprop"],
)
In [11]:
pipe_fp_fl.check(time_series.train)
Checking data model...


Staging...
[========================================] 100%

Checking...
[========================================] 100%


OK.
In [12]:
benchmark = Benchmark()
In [13]:
with benchmark("fastprop"):
    pipe_fp_fl.fit(time_series.train)
    fastprop_train = pipe_fp_fl.transform(time_series.train, df_name="fastprop_train")
Checking data model...


Staging...
[========================================] 100%


OK.


Staging...
[========================================] 100%

FastProp: Trying 134 features...
[========================================] 100%


Trained pipeline.
Time taken: 0h:0m:0.043236



Staging...
[========================================] 100%

FastProp: Building features...
[========================================] 100%


In [14]:
fastprop_test = pipe_fp_fl.transform(time_series.test, df_name="fastprop_test")

Staging...
[========================================] 100%

FastProp: Building features...
[========================================] 100%


In [15]:
predictor = getml.predictors.XGBoostRegressor()

pipe_fp_pr = getml.pipeline.Pipeline(
    tags=["prediction", "fastprop"], predictors=[predictor]
)
In [16]:
pipe_fp_pr.check(fastprop_train)
Checking data model...


Staging...
[========================================] 100%

Checking...
[========================================] 100%


WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_123' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_125' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_128' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_129' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_132' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
In [17]:
pipe_fp_pr.fit(fastprop_train)
Checking data model...


Staging...
[========================================] 100%


WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_123' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_125' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_128' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_129' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').
WARNING [COLUMN SHOULD BE UNUSED]: All non-NULL entries in column 'feature_1_132' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only').


Staging...
[========================================] 100%

XGBoost: Training as predictor...
[========================================] 100%


Trained pipeline.
Time taken: 0h:0m:9.341887

Out[17]:
Pipeline(data_model='population',
         feature_learners=[],
         feature_selectors=[],
         include_categorical=False,
         loss_function=None,
         peripheral=[],
         predictors=['XGBoostRegressor'],
         preprocessors=[],
         share_selected_features=0.5,
         tags=['prediction', 'fastprop'])

url: http://localhost:1709/#/getpipeline/robot/1MiGNS/0/
In [18]:
pipe_fp_pr.score(fastprop_test)

Staging...
[========================================] 100%


Out[18]:
date time set used target mae rmse rsquared
0 2022-03-30 01:14:22 fastprop_train f_x 0.4383 0.5764 0.9963
1 2022-03-30 01:14:22 fastprop_test f_x 0.5515 0.7236 0.9951

2.2 Propositionalization with featuretools

In [19]:
data_train = time_series.train.population.to_df("data_train")
data_test = time_series.test.population.to_df("data_test")
In [20]:
dfs_pandas = {}

for df in [data_train, data_test, data_all]:
    dfs_pandas[df.name] = df.to_pandas()
    delete_columns = [
        col for col in dfs_pandas[df.name].columns if col not in only_use + ["f_x"]
    ]
    for col in delete_columns:
        del dfs_pandas[df.name][col]
    dfs_pandas[df.name]["id"] = 1
    dfs_pandas[df.name]["ds"] = pd.to_datetime(
        np.arange(0, dfs_pandas[df.name].shape[0]), unit="s"
    )
In [21]:
dfs_pandas["data_train"]
Out[21]:
30 34 37 38 4 59 61 7 77 78 f_x id ds
0 -1.2042 -0.32739 -1.0191 -6.0205 -0.32737 0.082791 0.78597 -1.0191 0.082782 -1.4094 -11.0300 1 1970-01-01 00:00:00
1 -1.2042 -0.32739 -1.0191 -6.0205 -0.32737 0.082800 0.78592 -1.0191 0.082782 -1.4094 -10.8480 1 1970-01-01 00:00:01
2 -1.2042 -0.32737 -1.0191 -6.0205 -0.32737 0.082786 0.78594 -1.0191 0.082782 -1.4094 -10.6660 1 1970-01-01 00:00:02
3 -1.2042 -0.32734 -1.0191 -6.0205 -0.32737 0.082755 0.78599 -1.0191 0.082782 -1.4094 -10.5070 1 1970-01-01 00:00:03
4 -1.2042 -0.32736 -1.0191 -6.0205 -0.32737 0.082782 0.78597 -1.0191 0.082782 -1.4094 -10.4130 1 1970-01-01 00:00:04
... ... ... ... ... ... ... ... ... ... ... ... ... ...
10495 -1.1446 -0.37311 -1.0486 -5.9532 -0.37326 0.087343 0.90793 -1.0488 0.087468 -1.4162 -9.7673 1 1970-01-01 02:54:55
10496 -1.1349 -0.37103 -1.0472 -5.9564 -0.37108 0.087241 0.90199 -1.0474 0.087274 -1.4160 -9.9200 1 1970-01-01 02:54:56
10497 -1.1255 -0.36889 -1.0458 -5.9596 -0.36896 0.087055 0.89618 -1.0460 0.087082 -1.4158 -9.7743 1 1970-01-01 02:54:57
10498 -1.1163 -0.36680 -1.0444 -5.9627 -0.36689 0.086907 0.89034 -1.0447 0.086893 -1.4155 -8.6109 1 1970-01-01 02:54:58
10499 -1.1072 -0.36477 -1.0430 -5.9657 -0.36487 0.086720 0.88476 -1.0434 0.086706 -1.4153 -8.4345 1 1970-01-01 02:54:59

10500 rows × 13 columns

In [22]:
ft_builder = FTTimeSeriesBuilder(
    num_features=200,
    horizon=pd.Timedelta(seconds=0),
    memory=pd.Timedelta(seconds=15),
    column_id="id",
    time_stamp="ds",
    target="f_x",
)
In [23]:
with benchmark("featuretools"):
    featuretools_train = ft_builder.fit(dfs_pandas["data_train"])

featuretools_test = ft_builder.transform(dfs_pandas["data_test"])
featuretools: Trying features...
/usr/local/lib/python3.9/dist-packages/featuretools/synthesis/dfs.py:309: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  agg_primitives: ['all', 'any', 'count', 'num_true', 'percent_true']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible columns for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)
Selecting the best out of 158 features...
Time taken: 0h:5m:36.013785

/usr/local/lib/python3.9/dist-packages/featuretools/synthesis/dfs.py:309: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  agg_primitives: ['all', 'any', 'count', 'num_true', 'percent_true']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible columns for the primitive were found in the data.
  warnings.warn(warning_msg, UnusedPrimitiveWarning)
In [24]:
featuretools_train
Out[24]:
MIN(peripheral.37) MIN(peripheral.7) FIRST(peripheral.37) FIRST(peripheral.7) MEDIAN(peripheral.37) MEDIAN(peripheral.7) MEAN(peripheral.37) MEAN(peripheral.7) SUM(peripheral.37) SUM(peripheral.7) ... TREND(peripheral.38, ds) TREND(peripheral.59, ds) TREND(peripheral.30, ds) TREND(peripheral.4, ds) TREND(peripheral.78, ds) TREND(peripheral.34, ds) TREND(peripheral.77, ds) f_x id ds
_featuretools_index
0 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.019100 -1.019100 -1.0191 -1.0191 ... 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -11.0300 1 1970-01-01 00:00:00
1 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.019100 -1.019100 -2.0382 -2.0382 ... 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -10.8480 1 1970-01-01 00:00:01
2 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.019100 -1.019100 -3.0573 -3.0573 ... -1.204867e-26 -0.216000 0.000000 0.000000 0.000000 0.864000 0.000000 -10.6660 1 1970-01-01 00:00:02
3 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.019100 -1.019100 -4.0764 -4.0764 ... 0.000000e+00 -1.054080 0.000000 0.000000 0.000000 1.468800 0.000000 -10.5070 1 1970-01-01 00:00:03
4 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.0191 -1.019100 -1.019100 -5.0955 -5.0955 ... 0.000000e+00 -0.544320 0.000000 0.000000 0.000000 0.950400 0.000000 -10.4130 1 1970-01-01 00:00:04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10495 -1.0716 -1.0721 -1.0716 -1.0721 -1.0596 -1.0596 -1.059793 -1.059927 -15.8969 -15.8989 ... -3.789643e-03 -0.000183 0.011060 0.002575 0.000191 0.002561 -0.000201 -9.7673 1 1970-01-01 02:54:55
10496 -1.0698 -1.0702 -1.0698 -1.0702 -1.0580 -1.0580 -1.058167 -1.058280 -15.8725 -15.8742 ... -3.708571e-03 -0.000179 0.010880 0.002522 0.000198 0.002506 -0.000201 -9.9200 1 1970-01-01 02:54:56
10497 -1.0681 -1.0684 -1.0681 -1.0684 -1.0565 -1.0563 -1.056567 -1.056667 -15.8485 -15.8500 ... -3.629286e-03 -0.000177 0.010697 0.002469 0.000206 0.002450 -0.000200 -9.7743 1 1970-01-01 02:54:57
10498 -1.0662 -1.0665 -1.0662 -1.0665 -1.0549 -1.0548 -1.054987 -1.055087 -15.8248 -15.8263 ... -3.550357e-03 -0.000176 0.010510 0.002416 0.000214 0.002398 -0.000199 -8.6109 1 1970-01-01 02:54:58
10499 -1.0644 -1.0648 -1.0644 -1.0648 -1.0534 -1.0532 -1.053440 -1.053547 -15.8016 -15.8032 ... -3.473214e-03 -0.000175 0.010321 0.002363 0.000219 0.002347 -0.000198 -8.4345 1 1970-01-01 02:54:59

10500 rows × 116 columns

In [25]:
roles = {
    getml.data.roles.target: ["f_x"],
    getml.data.roles.join_key: ["id"],
    getml.data.roles.time_stamp: ["ds"],
}

df_featuretools_train = getml.data.DataFrame.from_pandas(
    featuretools_train, name="featuretools_train", roles=roles
)

df_featuretools_test = getml.data.DataFrame.from_pandas(
    featuretools_test, name="featuretools_test", roles=roles
)
In [26]:
df_featuretools_train.set_role(
    df_featuretools_train.roles.unused, getml.data.roles.numerical
)

df_featuretools_test.set_role(
    df_featuretools_test.roles.unused, getml.data.roles.numerical
)
In [27]:
predictor = getml.predictors.XGBoostRegressor()

pipe_ft_pr = getml.pipeline.Pipeline(
    tags=["prediction", "featuretools"], predictors=[predictor]
)

pipe_ft_pr
Out[27]:
Pipeline(data_model='population',
         feature_learners=[],
         feature_selectors=[],
         include_categorical=False,
         loss_function=None,
         peripheral=[],
         predictors=['XGBoostRegressor'],
         preprocessors=[],
         share_selected_features=0.5,
         tags=['prediction', 'featuretools'])
In [28]:
pipe_ft_pr.fit(df_featuretools_train)
Checking data model...


Staging...
[========================================] 100%

Checking...
[========================================] 100%


OK.


Staging...
[========================================] 100%

XGBoost: Training as predictor...
[========================================] 100%


Trained pipeline.
Time taken: 0h:0m:8.368015

Out[28]:
Pipeline(data_model='population',
         feature_learners=[],
         feature_selectors=[],
         include_categorical=False,
         loss_function=None,
         peripheral=[],
         predictors=['XGBoostRegressor'],
         preprocessors=[],
         share_selected_features=0.5,
         tags=['prediction', 'featuretools'])

url: http://localhost:1709/#/getpipeline/robot/aJco9c/0/
In [29]:
pipe_ft_pr.score(df_featuretools_test)

Staging...
[========================================] 100%


Out[29]:
date time set used target mae rmse rsquared
0 2022-03-30 01:22:36 featuretools_train f_x 0.4312 0.5714 0.9963
1 2022-03-30 01:22:36 featuretools_test f_x 0.5658 0.7453 0.9948

2.3 Propositionalization with tsfresh

In [30]:
tsfresh_builder = TSFreshBuilder(
    num_features=200,
    memory=15,
    column_id="id",
    time_stamp="ds",
    target="f_x",
)
In [31]:
with benchmark("tsfresh"):
    tsfresh_train = tsfresh_builder.fit(dfs_pandas["data_train"])

tsfresh_test = tsfresh_builder.transform(dfs_pandas["data_test"])
Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:06<00:00,  9.44it/s]
Feature Extraction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:19<00:00,  3.03it/s]
Feature Extraction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:18<00:00,  3.24it/s]
Selecting the best out of 130 features...
Time taken: 0h:0m:51.376822

Rolling: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:02<00:00, 22.10it/s]
Feature Extraction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:08<00:00,  7.04it/s]
Feature Extraction: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 60/60 [00:07<00:00,  7.68it/s]
In [32]:
roles = {
    getml.data.roles.target: ["f_x"],
    getml.data.roles.join_key: ["id"],
    getml.data.roles.time_stamp: ["ds"],
}

df_tsfresh_train = getml.data.DataFrame.from_pandas(
    tsfresh_train, name="tsfresh_train", roles=roles
)

df_tsfresh_test = getml.data.DataFrame.from_pandas(
    tsfresh_test, name="tsfresh_test", roles=roles
)
In [33]:
df_tsfresh_train.set_role(df_tsfresh_train.roles.unused, getml.data.roles.numerical)

df_tsfresh_test.set_role(df_tsfresh_test.roles.unused, getml.data.roles.numerical)
In [34]:
pipe_tsf_pr = getml.pipeline.Pipeline(
    tags=["predicition", "tsfresh"], predictors=[predictor]
)

pipe_tsf_pr
Out[34]:
Pipeline(data_model='population',
         feature_learners=[],
         feature_selectors=[],
         include_categorical=False,
         loss_function=None,
         peripheral=[],
         predictors=['XGBoostRegressor'],
         preprocessors=[],
         share_selected_features=0.5,
         tags=['prediction', 'tsfresh'])
In [35]:
pipe_tsf_pr.check(df_tsfresh_train)
Checking data model...


Staging...
[========================================] 100%

Checking...
[========================================] 100%


OK.
In [36]:
pipe_tsf_pr.fit(df_tsfresh_train)
Checking data model...


Staging...
[========================================] 100%


OK.


Staging...
[========================================] 100%

XGBoost: Training as predictor...
[========================================] 100%


Trained pipeline.
Time taken: 0h:0m:9.285027

Out[36]:
Pipeline(data_model='population',
         feature_learners=[],
         feature_selectors=[],
         include_categorical=False,
         loss_function=None,
         peripheral=[],
         predictors=['XGBoostRegressor'],
         preprocessors=[],
         share_selected_features=0.5,
         tags=['prediction', 'tsfresh'])

url: http://localhost:1709/#/getpipeline/robot/EHdkUY/0/
In [37]:
pipe_tsf_pr.score(df_tsfresh_test)

Staging...
[========================================] 100%


Out[37]:
date time set used target mae rmse rsquared
0 2022-03-30 01:24:04 tsfresh_train f_x 0.4916 0.6636 0.9951
1 2022-03-30 01:24:04 tsfresh_test f_x 0.5986 0.7906 0.9938

3. Comparison

In [38]:
num_features = dict(
    fastprop=134,
    featuretools=158,
    tsfresh=120,
)

runtime_per_feature = [
    benchmark.runtimes["fastprop"] / num_features["fastprop"],
    benchmark.runtimes["featuretools"] / num_features["featuretools"],
    benchmark.runtimes["tsfresh"] / num_features["tsfresh"],
]

features_per_second = [1.0 / r.total_seconds() for r in runtime_per_feature]

normalized_runtime_per_feature = [
    r / runtime_per_feature[0] for r in runtime_per_feature
]

comparison = pd.DataFrame(
    dict(
        runtime=[
            benchmark.runtimes["fastprop"],
            benchmark.runtimes["featuretools"],
            benchmark.runtimes["tsfresh"],
        ],
        num_features=num_features.values(),
        features_per_second=features_per_second,
        normalized_runtime=[
            1,
            benchmark.runtimes["featuretools"] / benchmark.runtimes["fastprop"],
            benchmark.runtimes["tsfresh"] / benchmark.runtimes["fastprop"],
        ],
        normalized_runtime_per_feature=normalized_runtime_per_feature,
        rsquared=[pipe_fp_pr.rsquared, pipe_ft_pr.rsquared, pipe_tsf_pr.rsquared],
        rmse=[pipe_fp_pr.rmse, pipe_ft_pr.rmse, pipe_tsf_pr.rmse],
        mae=[pipe_fp_pr.mae, pipe_ft_pr.mae, pipe_tsf_pr.mae],
    )
)

comparison.index = ["getML: FastProp", "featuretools", "tsfresh"]
In [39]:
comparison
Out[39]:
runtime num_features features_per_second normalized_runtime normalized_runtime_per_feature rsquared rmse mae
getML: FastProp 0 days 00:00:00.667496 134 200.762899 1.000000 1.000000 0.995059 0.723622 0.551521
featuretools 0 days 00:05:36.014673 158 0.470218 503.395785 426.957438 0.994800 0.745286 0.565795
tsfresh 0 days 00:00:51.376962 120 2.335679 76.969693 85.954828 0.993836 0.790602 0.598600
In [40]:
comparison.to_csv("comparisons/robot.csv")

Why is FastProp so fast?

First, FastProp hugely benefits from getML's custom-built, C++-native in-memory database engine. The engine is highly optimized for relational data structures and uses information about the relational structure of the data to store it efficiently and to carry out computations on it. This matters in particular for time series, where the current observation is related to a certain number of observations from the past: other libraries have to deal with this inherent structure of (multivariate) time series explicitly, and such explicit transformations are costly in terms of both memory and computational resources.

Second, all operations on data stored in getML's engine benefit from implementations in modern C++. Further, FastProp takes advantage of functional design patterns in which all column-based operations are evaluated lazily. Aggregations, for example, are carried out only on the rows that matter, taking into account even complex conditions that might span multiple tables in the relational model. Duplicate operations are reduced to a bare minimum by keeping track of the relational data model.

Beyond the mere advantage in runtime, FastProp also has an edge in memory consumption: it builds on an abstract data model, relies on efficient storage patterns (utilizing pointers and indices) for the concrete data, and profits from the same lazy, functional design. This allows working with data sets of substantial size without falling back on distributed computing models.
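
The effect of lazy, composable column operations can be illustrated with a toy sketch in plain Python. This is a conceptual illustration only; getML's engine implements these ideas in optimized C++:

import numpy as np


class LazyColumn:
    """Toy lazy column: operations only compose thunks; nothing is
    evaluated until .collect() is called (illustration only)."""

    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument callable returning a numpy array

    @classmethod
    def from_array(cls, values):
        return cls(lambda: np.asarray(values))

    def __add__(self, other):
        # No computation happens here -- we merely build up the expression.
        return LazyColumn(lambda: self.collect() + other.collect())

    def where(self, condition):
        # Conditions are deferred too, so a later aggregation only
        # ever touches the rows that actually satisfy them.
        return LazyColumn(lambda: self.collect()[condition.collect()])

    def collect(self):
        return self._thunk()


a = LazyColumn.from_array([1.0, 2.0, 3.0, 4.0])
b = LazyColumn.from_array([10.0, 20.0, 30.0, 40.0])
mask = LazyColumn.from_array([True, False, True, False])

expr = (a + b).where(mask)    # still nothing has been computed
print(expr.collect().mean())  # evaluation happens only here -> 22.0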

Next Steps

If you are interested in further real-world applications of getML, visit the notebook section on getml.com. If you want to gain a deeper understanding of our notebooks' contents or download the code behind them, have a look at the getml-demo repository. There, you can also find further benchmarks of getML.

Want to try getML without much hassle? Just head to try.getml.com to launch an instance of getML directly in your browser.

Further, here is some additional material from our documentation if you want to learn more about getML:

Get in contact

If you have any questions, just write us an email. Would you prefer a private demo of getML for your team? Contact us to arrange an introduction to getML.