NOTE: Due to featuretools' memory requirement, this notebook will not run on MyBinder.
In this notebook, we will demonstrate how getML can be used for a customer churn project using a synthetic dataset of a fictional company. We will also benchmark getML against featuretools.
Summary:
Author: Dr. Patrick Urbanke
AdventureWorks is a fictional company that sells bicycles. It is used by Microsoft to showcase how its MS SQL Server can be used to manage business data. Since this dataset resembles a real-world customer database and it is open-source, we will use it to showcase how getML can be used for a classic customer churn project (real customer databases are not easily available for the purposes of showcasing and benchmarking, for reasons of data privacy).
The dataset has been downloaded from the CTU Prague relational learning repository (Motl and Schulte, 2015).
We will benchmark getML 's feature learning algorithms against featuretools, an open-source implementation of the propositionalization algorithm, similar to getML's FastProp.
Your getML live session is running inside a docker container on mybinder.org, a service built by the Jupyter community and funded by Google Cloud, OVH, GESIS Notebooks and the Turing Institute. As it is a free service, this session will shut down after 10 minutes of inactivity.
Let's get started with the analysis and set up your session:
import copy
import os
from urllib import request
import numpy as np
import pandas as pd
from IPython.display import Image
import matplotlib.pyplot as plt
plt.style.use('seaborn')
%matplotlib inline
import featuretools
import getml
getml.set_project('adventure_works')
Connected to project 'adventure_works'
We begin by downloading the data:
conn = getml.database.connect_mariadb(
host="relational.fit.cvut.cz",
dbname="AdventureWorks2014",
port=3306,
user="guest",
password="relational"
)
conn
Connection(conn_id='default', dbname='AdventureWorks2014', dialect='mysql', host='relational.fit.cvut.cz', port=3306)
def load_if_needed(name):
"""
Loads the data from the relational learning
repository, if the data frame has not already
been loaded.
"""
if not getml.data.exists(name):
data_frame = getml.DataFrame.from_db(
name=name,
table_name=name,
conn=conn
)
data_frame.save()
else:
data_frame = getml.data.load_data_frame(name)
return data_frame
product = load_if_needed("Product")
sales_order_detail = load_if_needed("SalesOrderDetail")
sales_order_header = load_if_needed("SalesOrderHeader")
sales_order_reason = load_if_needed("SalesOrderHeaderSalesReason")
special_offer = load_if_needed("SpecialOffer")
store = load_if_needed("Store")
product
name | ProductID | MakeFlag | FinishedGoodsFlag | SafetyStockLevel | ReorderPoint | DaysToManufacture | ProductSubcategoryID | ProductModelID | Name | ProductNumber | Color | StandardCost | ListPrice | Size | SizeUnitMeasureCode | WeightUnitMeasureCode | Weight | ProductLine | Class | Style | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | 0 | 1000 | 750 | 0 | nan | nan | Adjustable Race | AR-5381 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 694215B7-08F7-4C0D-ACB1-D734BA44... | 2014-02-08 10:01:36 |
1 | 2 | 0 | 0 | 1000 | 750 | 0 | nan | nan | Bearing Ball | BA-8327 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 58AE3C20-4F3A-4749-A7D4-D568806C... | 2014-02-08 10:01:36 |
2 | 3 | 1 | 0 | 800 | 600 | 1 | nan | nan | BB Ball Bearing | BE-2349 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 9C21AED2-5BFA-4F18-BCB8-F11638DC... | 2014-02-08 10:01:36 |
3 | 4 | 0 | 0 | 800 | 600 | 0 | nan | nan | Headset Ball Bearings | BE-2908 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | ECFED6CB-51FF-49B5-B06C-7D8AC834... | 2014-02-08 10:01:36 |
4 | 316 | 1 | 0 | 800 | 600 | 1 | nan | nan | Blade | BL-2036 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | E73E9750-603B-4131-89F5-3DD15ED5... | 2014-02-08 10:01:36 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
499 | 995 | 1 | 1 | 500 | 375 | 1 | 5 | 96 | ML Bottom Bracket | BB-8107 | NULL | 44.9506 | 101.2400 | NULL | NULL | G | 168.00 | NULL | M | NULL | 2013-05-30 00:00:00 | NULL | NULL | 71AB847F-D091-42D6-B735-7B0C2D82... | 2014-02-08 10:01:36 |
500 | 996 | 1 | 1 | 500 | 375 | 1 | 5 | 97 | HL Bottom Bracket | BB-9108 | NULL | 53.9416 | 121.4900 | NULL | NULL | G | 170.00 | NULL | H | NULL | 2013-05-30 00:00:00 | NULL | NULL | 230C47C5-08B2-4CE3-B706-69C0BDD6... | 2014-02-08 10:01:36 |
501 | 997 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 44 | BK-R19B-44 | Black | 343.6496 | 539.9900 | 44 | CM | LB | 19.77 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 44CE4802-409F-43AB-9B27-CA534218... | 2014-02-08 10:01:36 |
502 | 998 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 48 | BK-R19B-48 | Black | 343.6496 | 539.9900 | 48 | CM | LB | 20.13 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 3DE9A212-1D49-40B6-B10A-F564D981... | 2014-02-08 10:01:36 |
503 | 999 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 52 | BK-R19B-52 | Black | 343.6496 | 539.9900 | 52 | CM | LB | 20.42 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | AE638923-2B67-4679-B90E-ABBAB17D... | 2014-02-08 10:01:36 |
504 rows x 25 columns
memory usage: 0.19 MB
name: Product
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/Product/
sales_order_detail
name | SalesOrderID | SalesOrderDetailID | OrderQty | ProductID | SpecialOfferID | CarrierTrackingNumber | UnitPrice | UnitPriceDiscount | LineTotal | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 43659 | 1 | 1 | 776 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 2024.994000 | B207C96D-D9E6-402B-8470-2CC176C4... | 2011-05-31 00:00:00 |
1 | 43659 | 2 | 3 | 777 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 6074.982000 | 7ABB600D-1E77-41BE-9FE5-B9142CFC... | 2011-05-31 00:00:00 |
2 | 43659 | 3 | 1 | 778 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 2024.994000 | 475CF8C6-49F6-486E-B0AD-AFC6A50C... | 2011-05-31 00:00:00 |
3 | 43659 | 4 | 1 | 771 | 1 | 4911-403C-98 | 2039.9940 | 0.0000 | 2039.994000 | 04C4DE91-5815-45D6-8670-F462719F... | 2011-05-31 00:00:00 |
4 | 43659 | 5 | 1 | 772 | 1 | 4911-403C-98 | 2039.9940 | 0.0000 | 2039.994000 | 5A74C7D2-E641-438E-A7AC-37BF2328... | 2011-05-31 00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
121312 | 75122 | 121313 | 1 | 878 | 1 | NULL | 21.9800 | 0.0000 | 21.980000 | 8CAD6675-18CC-4F47-8287-97B41A8E... | 2014-06-30 00:00:00 |
121313 | 75122 | 121314 | 1 | 712 | 1 | NULL | 8.9900 | 0.0000 | 8.990000 | 84F1C363-1C50-4442-BE16-541C59B6... | 2014-06-30 00:00:00 |
121314 | 75123 | 121315 | 1 | 878 | 1 | NULL | 21.9800 | 0.0000 | 21.980000 | C18B6476-429F-4BB1-828E-2BE5F82A... | 2014-06-30 00:00:00 |
121315 | 75123 | 121316 | 1 | 879 | 1 | NULL | 159.0000 | 0.0000 | 159.000000 | 75A89C6A-C60A-47EA-8A52-B52A9C43... | 2014-06-30 00:00:00 |
121316 | 75123 | 121317 | 1 | 712 | 1 | NULL | 8.9900 | 0.0000 | 8.990000 | 73646D26-0461-450D-8019-2C6C8586... | 2014-06-30 00:00:00 |
121317 rows x 11 columns
memory usage: 21.84 MB
name: SalesOrderDetail
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderDetail/
sales_order_header
name | SalesOrderID | RevisionNumber | Status | OnlineOrderFlag | CustomerID | SalesPersonID | TerritoryID | BillToAddressID | ShipToAddressID | ShipMethodID | CreditCardID | CurrencyRateID | OrderDate | DueDate | ShipDate | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | SubTotal | TaxAmt | Freight | TotalDue | Comment | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 43659 | 8 | 5 | 0 | 29825 | 279 | 5 | 985 | 985 | 5 | 16281 | nan | 2011-05-31 00:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... | 2011-06-07 00:00:00 |
1 | 43660 | 8 | 5 | 0 | 29672 | 279 | 5 | 921 | 921 | 5 | 5618 | nan | 2011-05-31 00:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... | 2011-06-07 00:00:00 |
2 | 43661 | 8 | 5 | 0 | 29734 | 282 | 6 | 517 | 517 | 5 | 1346 | 4 | 2011-05-31 00:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | 32726.4786 | 3153.7696 | 985.5530 | 36865.8012 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... | 2011-06-07 00:00:00 |
3 | 43662 | 8 | 5 | 0 | 29994 | 282 | 6 | 482 | 482 | 5 | 10456 | 4 | 2011-05-31 00:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... | 2011-06-07 00:00:00 |
4 | 43663 | 8 | 5 | 0 | 29565 | 276 | 4 | 1073 | 1073 | 5 | 4322 | nan | 2011-05-31 00:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... | 2011-06-07 00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
31460 | 75119 | 8 | 5 | 1 | 11981 | nan | 1 | 17649 | 17649 | 1 | 6761 | nan | 2014-06-30 00:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75119 | NULL | 10-4030-011981 | 429826Vi35166 | 42.2800 | 3.3824 | 1.0570 | 46.7194 | NULL | 9382F1C9-383A-435F-9449-0EECEA21... | 2014-07-07 00:00:00 |
31461 | 75120 | 8 | 5 | 1 | 18749 | nan | 6 | 28374 | 28374 | 1 | 8925 | nan | 2014-06-30 00:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75120 | NULL | 10-4030-018749 | 929849Vi46003 | 84.9600 | 6.7968 | 2.1240 | 93.8808 | NULL | AE6A4FCF-FF73-4CD4-AF2C-5993D00D... | 2014-07-07 00:00:00 |
31462 | 75121 | 8 | 5 | 1 | 15251 | nan | 6 | 26553 | 26553 | 1 | 14220 | nan | 2014-06-30 00:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75121 | NULL | 10-4030-015251 | 529864Vi73738 | 74.9800 | 5.9984 | 1.8745 | 82.8529 | NULL | D7395C0E-00CB-4BFA-A238-0D6A9F49... | 2014-07-07 00:00:00 |
31463 | 75122 | 8 | 5 | 1 | 15868 | nan | 6 | 14616 | 14616 | 1 | 18719 | nan | 2014-06-30 00:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75122 | NULL | 10-4030-015868 | 330022Vi97312 | 30.9700 | 2.4776 | 0.7743 | 34.2219 | NULL | 4221035A-4159-492F-AF40-4363A64F... | 2014-07-07 00:00:00 |
31464 | 75123 | 8 | 5 | 1 | 18759 | nan | 6 | 14024 | 14024 | 1 | 10084 | nan | 2014-06-30 00:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75123 | NULL | 10-4030-018759 | 230370Vi51970 | 189.9700 | 15.1976 | 4.7493 | 209.9169 | NULL | D54752FF-2B54-4BE5-95EA-3B72289C... | 2014-07-07 00:00:00 |
31465 rows x 26 columns
memory usage: 12.80 MB
name: SalesOrderHeader
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderHeader/
sales_order_reason
name | SalesOrderID | SalesReasonID | ModifiedDate |
---|---|---|---|
role | unused_float | unused_float | unused_string |
0 | 43697 | 5 | 2011-05-31 00:00:00 |
1 | 43697 | 9 | 2011-05-31 00:00:00 |
2 | 43702 | 5 | 2011-06-01 00:00:00 |
3 | 43702 | 9 | 2011-06-01 00:00:00 |
4 | 43703 | 5 | 2011-06-01 00:00:00 |
... | ... | ... | |
27642 | 75119 | 1 | 2014-06-30 00:00:00 |
27643 | 75120 | 1 | 2014-06-30 00:00:00 |
27644 | 75121 | 1 | 2014-06-30 00:00:00 |
27645 | 75122 | 1 | 2014-06-30 00:00:00 |
27646 | 75123 | 1 | 2014-06-30 00:00:00 |
27647 rows x 3 columns
memory usage: 1.22 MB
name: SalesOrderHeaderSalesReason
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderHeaderSalesReason/
special_offer
name | SpecialOfferID | MinQty | MaxQty | Description | DiscountPct | Type | Category | StartDate | EndDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | nan | No Discount | 0.0000 | No Discount | No Discount | 2011-05-01 00:00:00 | 2014-11-30 00:00:00 | 0290C4F5-191F-4337-AB6B-0A2DDE03... | 2011-04-01 00:00:00 |
1 | 2 | 11 | 14 | Volume Discount 11 to 14 | 0.0200 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | D7542EE7-15DB-4541-985C-5CC27AEF... | 2011-05-01 00:00:00 |
2 | 3 | 15 | 24 | Volume Discount 15 to 24 | 0.0500 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 4BDBCC01-8CF7-40A9-B643-40EC5B71... | 2011-05-01 00:00:00 |
3 | 4 | 25 | 40 | Volume Discount 25 to 40 | 0.1000 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 504B5E85-8F3F-4EBC-9E1D-C1BC5DEA... | 2011-05-01 00:00:00 |
4 | 5 | 41 | 60 | Volume Discount 41 to 60 | 0.1500 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 677E1D9D-944F-4E81-90E8-47EB0A82... | 2011-05-01 00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
11 | 12 | 0 | nan | LL Road Frame Sale | 0.3500 | Excess Inventory | Reseller | 2013-05-30 00:00:00 | 2013-07-14 00:00:00 | C0AF1C89-9722-4235-9248-3FBA4D9E... | 2013-04-30 00:00:00 |
12 | 13 | 0 | nan | Touring-3000 Promotion | 0.1500 | New Product | Reseller | 2013-05-30 00:00:00 | 2013-08-29 00:00:00 | 5061CCE4-E021-45A8-9A75-DFB36CBB... | 2013-04-30 00:00:00 |
13 | 14 | 0 | nan | Touring-1000 Promotion | 0.2000 | New Product | Reseller | 2013-05-30 00:00:00 | 2013-08-29 00:00:00 | 1AF84A9E-A98C-4BD9-B48F-DC2B8B6B... | 2013-04-30 00:00:00 |
14 | 15 | 0 | nan | Half-Price Pedal Sale | 0.5000 | Seasonal Discount | Customer | 2013-07-14 00:00:00 | 2013-08-14 00:00:00 | 03E3594D-6EBB-46A6-B8EE-A9289C0C... | 2013-06-14 00:00:00 |
15 | 16 | 0 | nan | Mountain-500 Silver Clearance Sa... | 0.4000 | Discontinued Product | Reseller | 2014-03-31 00:00:00 | 2014-05-30 00:00:00 | EB7CB484-BCCF-4D2D-BF73-521B2001... | 2014-03-01 00:00:00 |
16 rows x 11 columns
memory usage: 0.00 MB
name: SpecialOffer
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SpecialOffer/
store
name | BusinessEntityID | SalesPersonID | Name | Demographics | rowguid | ModifiedDate |
---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string |
0 | 292 | 279 | Next-Door Bike Store | <StoreSurvey xmlns="http://schem... | A22517E3-848D-4EBE-B9D9-7437F343... | 2014-09-12 11:15:07 |
1 | 294 | 276 | Professional Sales and Service | <StoreSurvey xmlns="http://schem... | B50CA50B-C601-4A13-B07E-2C63862D... | 2014-09-12 11:15:07 |
2 | 296 | 277 | Riders Company | <StoreSurvey xmlns="http://schem... | 337C3688-1339-4E1A-A08A-B54B2356... | 2014-09-12 11:15:07 |
3 | 298 | 275 | The Bike Mechanics | <StoreSurvey xmlns="http://schem... | 7894F278-F0C8-4D16-BD75-213FDBF1... | 2014-09-12 11:15:07 |
4 | 300 | 286 | Nationwide Supply | <StoreSurvey xmlns="http://schem... | C3FC9705-A8C4-4F3A-9550-EB2FA4B7... | 2014-09-12 11:15:07 |
... | ... | ... | ... | ... | ... | |
696 | 1988 | 282 | Retreat Inn | <StoreSurvey xmlns="http://schem... | EA21EC81-1BFA-4A07-9B4D-73D9852A... | 2014-09-12 11:15:07 |
697 | 1990 | 281 | Technical Parts Manufacturing | <StoreSurvey xmlns="http://schem... | C8E3C4ED-8F58-4DB2-B600-E0CD11D9... | 2014-09-12 11:15:07 |
698 | 1992 | 277 | Totes & Baskets Company | <StoreSurvey xmlns="http://schem... | CE860B58-643C-4567-BFD8-06E97969... | 2014-09-12 11:15:07 |
699 | 1994 | 277 | World of Bikes | <StoreSurvey xmlns="http://schem... | 0C10F2B6-A13A-440C-9C25-5B28D482... | 2014-09-12 11:15:07 |
700 | 2051 | 275 | A Bicycle Association | <StoreSurvey xmlns="http://schem... | 82237172-D3FE-4A95-82EF-636F6552... | 2014-09-12 11:15:07 |
701 rows x 6 columns
memory usage: 0.38 MB
name: Store
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/Store/
getML requires that we define roles for each of the columns.
product.set_role("ProductID", getml.data.roles.join_key)
product.set_role(["MakeFlag", "ProductSubcategoryID", "ProductModelID"], getml.data.roles.categorical)
product.set_role(["SafetyStockLevel", "ReorderPoint", "StandardCost", "ListPrice"], getml.data.roles.numerical)
product
name | ProductID | MakeFlag | ProductSubcategoryID | ProductModelID | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | FinishedGoodsFlag | DaysToManufacture | Name | ProductNumber | Color | Size | SizeUnitMeasureCode | WeightUnitMeasureCode | Weight | ProductLine | Class | Style | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | join_key | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | NULL | NULL | 1000 | 750 | 0 | 0 | 0 | 0 | Adjustable Race | AR-5381 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 694215B7-08F7-4C0D-ACB1-D734BA44... | 2014-02-08 10:01:36 |
1 | 2 | 0 | NULL | NULL | 1000 | 750 | 0 | 0 | 0 | 0 | Bearing Ball | BA-8327 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 58AE3C20-4F3A-4749-A7D4-D568806C... | 2014-02-08 10:01:36 |
2 | 3 | 1 | NULL | NULL | 800 | 600 | 0 | 0 | 0 | 1 | BB Ball Bearing | BE-2349 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 9C21AED2-5BFA-4F18-BCB8-F11638DC... | 2014-02-08 10:01:36 |
3 | 4 | 0 | NULL | NULL | 800 | 600 | 0 | 0 | 0 | 0 | Headset Ball Bearings | BE-2908 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | ECFED6CB-51FF-49B5-B06C-7D8AC834... | 2014-02-08 10:01:36 |
4 | 316 | 1 | NULL | NULL | 800 | 600 | 0 | 0 | 0 | 1 | Blade | BL-2036 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | E73E9750-603B-4131-89F5-3DD15ED5... | 2014-02-08 10:01:36 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
499 | 995 | 1 | 5 | 96 | 500 | 375 | 44.9506 | 101.24 | 1 | 1 | ML Bottom Bracket | BB-8107 | NULL | NULL | NULL | G | 168.00 | NULL | M | NULL | 2013-05-30 00:00:00 | NULL | NULL | 71AB847F-D091-42D6-B735-7B0C2D82... | 2014-02-08 10:01:36 |
500 | 996 | 1 | 5 | 97 | 500 | 375 | 53.9416 | 121.49 | 1 | 1 | HL Bottom Bracket | BB-9108 | NULL | NULL | NULL | G | 170.00 | NULL | H | NULL | 2013-05-30 00:00:00 | NULL | NULL | 230C47C5-08B2-4CE3-B706-69C0BDD6... | 2014-02-08 10:01:36 |
501 | 997 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 44 | BK-R19B-44 | Black | 44 | CM | LB | 19.77 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 44CE4802-409F-43AB-9B27-CA534218... | 2014-02-08 10:01:36 |
502 | 998 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 48 | BK-R19B-48 | Black | 48 | CM | LB | 20.13 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 3DE9A212-1D49-40B6-B10A-F564D981... | 2014-02-08 10:01:36 |
503 | 999 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 52 | BK-R19B-52 | Black | 52 | CM | LB | 20.42 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | AE638923-2B67-4679-B90E-ABBAB17D... | 2014-02-08 10:01:36 |
504 rows x 25 columns
memory usage: 0.17 MB
name: Product
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/Product/
sales_order_detail.set_role(["SalesOrderID", "SalesOrderDetailID", "ProductID", "SpecialOfferID"], getml.data.roles.join_key)
sales_order_detail.set_role(["OrderQty", "UnitPrice", "UnitPriceDiscount", "LineTotal"], getml.data.roles.numerical)
sales_order_detail.set_role("ModifiedDate", getml.data.roles.time_stamp)
sales_order_detail
name | ModifiedDate | SalesOrderID | SalesOrderDetailID | ProductID | SpecialOfferID | OrderQty | UnitPrice | UnitPriceDiscount | LineTotal | CarrierTrackingNumber | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | join_key | join_key | join_key | join_key | numerical | numerical | numerical | numerical | unused_string | unused_string |
unit | time stamp, comparison only | ||||||||||
0 | 2011-05-31 | 43659 | 1 | 776 | 1 | 1 | 2024.994 | 0 | 2024.994 | 4911-403C-98 | B207C96D-D9E6-402B-8470-2CC176C4... |
1 | 2011-05-31 | 43659 | 2 | 777 | 1 | 3 | 2024.994 | 0 | 6074.982 | 4911-403C-98 | 7ABB600D-1E77-41BE-9FE5-B9142CFC... |
2 | 2011-05-31 | 43659 | 3 | 778 | 1 | 1 | 2024.994 | 0 | 2024.994 | 4911-403C-98 | 475CF8C6-49F6-486E-B0AD-AFC6A50C... |
3 | 2011-05-31 | 43659 | 4 | 771 | 1 | 1 | 2039.994 | 0 | 2039.994 | 4911-403C-98 | 04C4DE91-5815-45D6-8670-F462719F... |
4 | 2011-05-31 | 43659 | 5 | 772 | 1 | 1 | 2039.994 | 0 | 2039.994 | 4911-403C-98 | 5A74C7D2-E641-438E-A7AC-37BF2328... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
121312 | 2014-06-30 | 75122 | 121313 | 878 | 1 | 1 | 21.98 | 0 | 21.98 | NULL | 8CAD6675-18CC-4F47-8287-97B41A8E... |
121313 | 2014-06-30 | 75122 | 121314 | 712 | 1 | 1 | 8.99 | 0 | 8.99 | NULL | 84F1C363-1C50-4442-BE16-541C59B6... |
121314 | 2014-06-30 | 75123 | 121315 | 878 | 1 | 1 | 21.98 | 0 | 21.98 | NULL | C18B6476-429F-4BB1-828E-2BE5F82A... |
121315 | 2014-06-30 | 75123 | 121316 | 879 | 1 | 1 | 159 | 0 | 159 | NULL | 75A89C6A-C60A-47EA-8A52-B52A9C43... |
121316 | 2014-06-30 | 75123 | 121317 | 712 | 1 | 1 | 8.99 | 0 | 8.99 | NULL | 73646D26-0461-450D-8019-2C6C8586... |
121317 rows x 11 columns
memory usage: 14.32 MB
name: SalesOrderDetail
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderDetail/
sales_order_reason.set_role("SalesOrderID", getml.data.roles.join_key)
sales_order_reason.set_role("SalesReasonID", getml.data.roles.categorical)
sales_order_reason
name | SalesOrderID | SalesReasonID | ModifiedDate |
---|---|---|---|
role | join_key | categorical | unused_string |
0 | 43697 | 5 | 2011-05-31 00:00:00 |
1 | 43697 | 9 | 2011-05-31 00:00:00 |
2 | 43702 | 5 | 2011-06-01 00:00:00 |
3 | 43702 | 9 | 2011-06-01 00:00:00 |
4 | 43703 | 5 | 2011-06-01 00:00:00 |
... | ... | ... | |
27642 | 75119 | 1 | 2014-06-30 00:00:00 |
27643 | 75120 | 1 | 2014-06-30 00:00:00 |
27644 | 75121 | 1 | 2014-06-30 00:00:00 |
27645 | 75122 | 1 | 2014-06-30 00:00:00 |
27646 | 75123 | 1 | 2014-06-30 00:00:00 |
27647 rows x 3 columns
memory usage: 1.00 MB
name: SalesOrderHeaderSalesReason
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderHeaderSalesReason/
special_offer.set_role(["SpecialOfferID"], getml.data.roles.join_key)
special_offer.set_role(["MinQty", "DiscountPct"], getml.data.roles.numerical)
special_offer.set_role(["Category", "Description", "Type"], getml.data.roles.categorical)
special_offer.set_role(["StartDate", "EndDate"], getml.data.roles.time_stamp)
special_offer
name | StartDate | EndDate | SpecialOfferID | Category | Description | Type | MinQty | DiscountPct | MaxQty | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | join_key | categorical | categorical | categorical | numerical | numerical | unused_float | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | |||||||||
0 | 2011-05-01 | 2014-11-30 | 1 | No Discount | No Discount | No Discount | 0 | 0 | nan | 0290C4F5-191F-4337-AB6B-0A2DDE03... | 2011-04-01 00:00:00 |
1 | 2011-05-31 | 2014-05-30 | 2 | Reseller | Volume Discount 11 to 14 | Volume Discount | 11 | 0.02 | 14 | D7542EE7-15DB-4541-985C-5CC27AEF... | 2011-05-01 00:00:00 |
2 | 2011-05-31 | 2014-05-30 | 3 | Reseller | Volume Discount 15 to 24 | Volume Discount | 15 | 0.05 | 24 | 4BDBCC01-8CF7-40A9-B643-40EC5B71... | 2011-05-01 00:00:00 |
3 | 2011-05-31 | 2014-05-30 | 4 | Reseller | Volume Discount 25 to 40 | Volume Discount | 25 | 0.1 | 40 | 504B5E85-8F3F-4EBC-9E1D-C1BC5DEA... | 2011-05-01 00:00:00 |
4 | 2011-05-31 | 2014-05-30 | 5 | Reseller | Volume Discount 41 to 60 | Volume Discount | 41 | 0.15 | 60 | 677E1D9D-944F-4E81-90E8-47EB0A82... | 2011-05-01 00:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
11 | 2013-05-30 | 2013-07-14 | 12 | Reseller | LL Road Frame Sale | Excess Inventory | 0 | 0.35 | nan | C0AF1C89-9722-4235-9248-3FBA4D9E... | 2013-04-30 00:00:00 |
12 | 2013-05-30 | 2013-08-29 | 13 | Reseller | Touring-3000 Promotion | New Product | 0 | 0.15 | nan | 5061CCE4-E021-45A8-9A75-DFB36CBB... | 2013-04-30 00:00:00 |
13 | 2013-05-30 | 2013-08-29 | 14 | Reseller | Touring-1000 Promotion | New Product | 0 | 0.2 | nan | 1AF84A9E-A98C-4BD9-B48F-DC2B8B6B... | 2013-04-30 00:00:00 |
14 | 2013-07-14 | 2013-08-14 | 15 | Customer | Half-Price Pedal Sale | Seasonal Discount | 0 | 0.5 | nan | 03E3594D-6EBB-46A6-B8EE-A9289C0C... | 2013-06-14 00:00:00 |
15 | 2014-03-31 | 2014-05-30 | 16 | Reseller | Mountain-500 Silver Clearance Sa... | Discontinued Product | 0 | 0.4 | nan | EB7CB484-BCCF-4D2D-BF73-521B2001... | 2014-03-01 00:00:00 |
16 rows x 11 columns
memory usage: 0.00 MB
name: SpecialOffer
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SpecialOffer/
store.set_role(["SalesPersonID"], getml.data.roles.join_key)
store["test"] = store["ModifiedDate"].update(getml.data.random() > 0.5, "NULL")
store.set_role(["SalesPersonID"], getml.data.roles.join_key)
store.set_role(["test"], getml.data.roles.time_stamp)
store
name | test | SalesPersonID | BusinessEntityID | Name | Demographics | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|
role | time_stamp | join_key | unused_float | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | ||||||
0 | 2014-09-12 11:15:07 | 279 | 292 | Next-Door Bike Store | <StoreSurvey xmlns="http://schem... | A22517E3-848D-4EBE-B9D9-7437F343... | 2014-09-12 11:15:07 |
1 | NULL | 276 | 294 | Professional Sales and Service | <StoreSurvey xmlns="http://schem... | B50CA50B-C601-4A13-B07E-2C63862D... | 2014-09-12 11:15:07 |
2 | 2014-09-12 11:15:07 | 277 | 296 | Riders Company | <StoreSurvey xmlns="http://schem... | 337C3688-1339-4E1A-A08A-B54B2356... | 2014-09-12 11:15:07 |
3 | NULL | 275 | 298 | The Bike Mechanics | <StoreSurvey xmlns="http://schem... | 7894F278-F0C8-4D16-BD75-213FDBF1... | 2014-09-12 11:15:07 |
4 | NULL | 286 | 300 | Nationwide Supply | <StoreSurvey xmlns="http://schem... | C3FC9705-A8C4-4F3A-9550-EB2FA4B7... | 2014-09-12 11:15:07 |
... | ... | ... | ... | ... | ... | ... | |
696 | NULL | 282 | 1988 | Retreat Inn | <StoreSurvey xmlns="http://schem... | EA21EC81-1BFA-4A07-9B4D-73D9852A... | 2014-09-12 11:15:07 |
697 | 2014-09-12 11:15:07 | 281 | 1990 | Technical Parts Manufacturing | <StoreSurvey xmlns="http://schem... | C8E3C4ED-8F58-4DB2-B600-E0CD11D9... | 2014-09-12 11:15:07 |
698 | NULL | 277 | 1992 | Totes & Baskets Company | <StoreSurvey xmlns="http://schem... | CE860B58-643C-4567-BFD8-06E97969... | 2014-09-12 11:15:07 |
699 | NULL | 277 | 1994 | World of Bikes | <StoreSurvey xmlns="http://schem... | 0C10F2B6-A13A-440C-9C25-5B28D482... | 2014-09-12 11:15:07 |
700 | 2014-09-12 11:15:07 | 275 | 2051 | A Bicycle Association | <StoreSurvey xmlns="http://schem... | 82237172-D3FE-4A95-82EF-636F6552... | 2014-09-12 11:15:07 |
701 rows x 7 columns
memory usage: 0.38 MB
name: Store
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/Store/
store.to_pandas()
test | SalesPersonID | BusinessEntityID | Name | Demographics | rowguid | ModifiedDate | |
---|---|---|---|---|---|---|---|
0 | 2014-09-12 11:15:07 | 279 | 292.0 | Next-Door Bike Store | <StoreSurvey xmlns="http://schemas.microsoft.c... | A22517E3-848D-4EBE-B9D9-7437F3432304 | 2014-09-12 11:15:07 |
1 | NaT | 276 | 294.0 | Professional Sales and Service | <StoreSurvey xmlns="http://schemas.microsoft.c... | B50CA50B-C601-4A13-B07E-2C63862D71B4 | 2014-09-12 11:15:07 |
2 | 2014-09-12 11:15:07 | 277 | 296.0 | Riders Company | <StoreSurvey xmlns="http://schemas.microsoft.c... | 337C3688-1339-4E1A-A08A-B54B23566E49 | 2014-09-12 11:15:07 |
3 | NaT | 275 | 298.0 | The Bike Mechanics | <StoreSurvey xmlns="http://schemas.microsoft.c... | 7894F278-F0C8-4D16-BD75-213FDBF13023 | 2014-09-12 11:15:07 |
4 | NaT | 286 | 300.0 | Nationwide Supply | <StoreSurvey xmlns="http://schemas.microsoft.c... | C3FC9705-A8C4-4F3A-9550-EB2FA4B7B64D | 2014-09-12 11:15:07 |
... | ... | ... | ... | ... | ... | ... | ... |
696 | NaT | 282 | 1988.0 | Retreat Inn | <StoreSurvey xmlns="http://schemas.microsoft.c... | EA21EC81-1BFA-4A07-9B4D-73D9852AFCBF | 2014-09-12 11:15:07 |
697 | 2014-09-12 11:15:07 | 281 | 1990.0 | Technical Parts Manufacturing | <StoreSurvey xmlns="http://schemas.microsoft.c... | C8E3C4ED-8F58-4DB2-B600-E0CD11D9CFAD | 2014-09-12 11:15:07 |
698 | NaT | 277 | 1992.0 | Totes & Baskets Company | <StoreSurvey xmlns="http://schemas.microsoft.c... | CE860B58-643C-4567-BFD8-06E97969CC67 | 2014-09-12 11:15:07 |
699 | NaT | 277 | 1994.0 | World of Bikes | <StoreSurvey xmlns="http://schemas.microsoft.c... | 0C10F2B6-A13A-440C-9C25-5B28D482872A | 2014-09-12 11:15:07 |
700 | 2014-09-12 11:15:07 | 275 | 2051.0 | A Bicycle Association | <StoreSurvey xmlns="http://schemas.microsoft.c... | 82237172-D3FE-4A95-82EF-636F655240AD | 2014-09-12 11:15:07 |
701 rows × 7 columns
sales_order_header["SalesPersonIDCat"] = sales_order_header["SalesPersonID"]
sales_order_header["TerritoryIDCat"] = sales_order_header["TerritoryID"]
sales_order_header.set_role(["CustomerID", "SalesOrderID", "SalesPersonID", "TerritoryID"], getml.data.roles.join_key)
sales_order_header.set_role(
["RevisionNumber", "OnlineOrderFlag", "SalesPersonIDCat", "TerritoryIDCat", "ShipMethodID"],
getml.data.roles.categorical)
sales_order_header.set_role(["SubTotal", "TaxAmt", "Freight", "TotalDue"], getml.data.roles.numerical)
sales_order_header.set_role(["OrderDate", "DueDate", "ShipDate", "ModifiedDate"], getml.data.roles.time_stamp)
sales_order_header
name | OrderDate | DueDate | ShipDate | ModifiedDate | CustomerID | SalesOrderID | SalesPersonID | TerritoryID | RevisionNumber | OnlineOrderFlag | SalesPersonIDCat | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | Status | BillToAddressID | ShipToAddressID | CreditCardID | CurrencyRateID | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | Comment | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | time_stamp | time_stamp | join_key | join_key | join_key | join_key | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | ||||||||||||||||||||||||
0 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29825 | 43659 | 279 | 5 | 8 | 0 | 279 | 5 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 5 | 985 | 985 | 16281 | nan | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... |
1 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29672 | 43660 | 279 | 5 | 8 | 0 | 279 | 5 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | 5 | 921 | 921 | 5618 | nan | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... |
2 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29734 | 43661 | 282 | 6 | 8 | 0 | 282 | 6 | 5 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | 5 | 517 | 517 | 1346 | 4 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... |
3 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29994 | 43662 | 282 | 6 | 8 | 0 | 282 | 6 | 5 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | 5 | 482 | 482 | 10456 | 4 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... |
4 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29565 | 43663 | 276 | 4 | 8 | 0 | 276 | 4 | 5 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | 5 | 1073 | 1073 | 4322 | nan | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
31460 | 2014-06-30 | 2014-07-12 | 2014-07-07 | 2014-07-07 | 11981 | 75119 | NULL | 1 | 8 | 1 | NULL | 1 | 1 | 42.28 | 3.3824 | 1.057 | 46.7194 | 5 | 17649 | 17649 | 6761 | nan | SO75119 | NULL | 10-4030-011981 | 429826Vi35166 | NULL | 9382F1C9-383A-435F-9449-0EECEA21... |
31461 | 2014-06-30 | 2014-07-12 | 2014-07-07 | 2014-07-07 | 18749 | 75120 | NULL | 6 | 8 | 1 | NULL | 6 | 1 | 84.96 | 6.7968 | 2.124 | 93.8808 | 5 | 28374 | 28374 | 8925 | nan | SO75120 | NULL | 10-4030-018749 | 929849Vi46003 | NULL | AE6A4FCF-FF73-4CD4-AF2C-5993D00D... |
31462 | 2014-06-30 | 2014-07-12 | 2014-07-07 | 2014-07-07 | 15251 | 75121 | NULL | 6 | 8 | 1 | NULL | 6 | 1 | 74.98 | 5.9984 | 1.8745 | 82.8529 | 5 | 26553 | 26553 | 14220 | nan | SO75121 | NULL | 10-4030-015251 | 529864Vi73738 | NULL | D7395C0E-00CB-4BFA-A238-0D6A9F49... |
31463 | 2014-06-30 | 2014-07-12 | 2014-07-07 | 2014-07-07 | 15868 | 75122 | NULL | 6 | 8 | 1 | NULL | 6 | 1 | 30.97 | 2.4776 | 0.7743 | 34.2219 | 5 | 14616 | 14616 | 18719 | nan | SO75122 | NULL | 10-4030-015868 | 330022Vi97312 | NULL | 4221035A-4159-492F-AF40-4363A64F... |
31464 | 2014-06-30 | 2014-07-12 | 2014-07-07 | 2014-07-07 | 18759 | 75123 | NULL | 6 | 8 | 1 | NULL | 6 | 1 | 189.97 | 15.1976 | 4.7493 | 209.9169 | 5 | 14024 | 14024 | 10084 | nan | SO75123 | NULL | 10-4030-018759 | 230370Vi51970 | NULL | D54752FF-2B54-4BE5-95EA-3B72289C... |
31465 rows x 28 columns
memory usage: 8.58 MB
name: SalesOrderHeader
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderHeader/
We must also define customer churn. In this case, we define customer churn as a customer not making another purchase within 180 days of his or her last purchase.
Thus, the churn variable is defined as follows:
OrderDate
OrderDate
max(OrderDate) - OrderDate <= 180 days
NULL targets can not be used in our analysis.
sales_order_header_pd = sales_order_header[["OrderDate", "CustomerID", "SalesOrderID"]].to_pandas()
repeat_purchases = sales_order_header_pd.merge(
sales_order_header_pd[["OrderDate", "CustomerID"]],
on="CustomerID",
how="left",
)
repeat_purchases = repeat_purchases[
repeat_purchases["OrderDate_y"] > repeat_purchases["OrderDate_x"]
]
repeat_purchases = repeat_purchases[
repeat_purchases["OrderDate_y"] - repeat_purchases["OrderDate_x"] > pd.Timedelta('180 days')
]
repeat_purchases.groupby("SalesOrderID", as_index=False).aggregate({"CustomerID": "max"})
repeat_purchase_ids = {sid: True for sid in repeat_purchases["SalesOrderID"]}
cut_off_date = max(sales_order_header_pd["OrderDate"]) - pd.Timedelta('180 days')
churn = np.asarray([
np.nan if order_date >= cut_off_date else 0 if order_id in repeat_purchase_ids else 1
for (order_date, order_id) in zip(sales_order_header_pd["OrderDate"], sales_order_header_pd["SalesOrderID"])
])
sales_order_header["churn"] = churn
sales_order_header = sales_order_header[~sales_order_header.churn.is_nan()].to_df("SalesOrderHeaderRefined")
sales_order_header.set_role("churn", getml.data.roles.target)
sales_order_header
name | OrderDate | DueDate | ShipDate | ModifiedDate | CustomerID | SalesOrderID | SalesPersonID | TerritoryID | churn | RevisionNumber | OnlineOrderFlag | SalesPersonIDCat | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | Status | BillToAddressID | ShipToAddressID | CreditCardID | CurrencyRateID | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | Comment | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | time_stamp | time_stamp | join_key | join_key | join_key | join_key | target | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | |||||||||||||||||||||||||
0 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29825 | 43659 | 279 | 5 | 0 | 8 | 0 | 279 | 5 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 5 | 985 | 985 | 16281 | nan | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... |
1 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29672 | 43660 | 279 | 5 | 0 | 8 | 0 | 279 | 5 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | 5 | 921 | 921 | 5618 | nan | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... |
2 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29734 | 43661 | 282 | 6 | 0 | 8 | 0 | 282 | 6 | 5 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | 5 | 517 | 517 | 1346 | 4 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... |
3 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29994 | 43662 | 282 | 6 | 0 | 8 | 0 | 282 | 6 | 5 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | 5 | 482 | 482 | 10456 | 4 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... |
4 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 29565 | 43663 | 276 | 4 | 0 | 8 | 0 | 276 | 4 | 5 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | 5 | 1073 | 1073 | 4322 | nan | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
19699 | 2013-12-31 | 2014-01-12 | 2014-01-07 | 2014-01-07 | 20826 | 63358 | NULL | 7 | 1 | 8 | 1 | NULL | 7 | 1 | 1173.96 | 93.9168 | 29.349 | 1297.2258 | 5 | 24387 | 24387 | 3239 | nan | SO63358 | NULL | 10-4030-020826 | 1142084Vi17039 | NULL | 41278FBB-3DD8-488B-AEA5-8BF4A6F1... |
19700 | 2013-12-31 | 2014-01-12 | 2014-01-07 | 2014-01-07 | 24114 | 63359 | NULL | 10 | 1 | 8 | 1 | NULL | 10 | 1 | 1179.47 | 94.3576 | 29.4868 | 1303.3144 | 5 | 29682 | 29682 | nan | 10770 | SO63359 | NULL | 10-4030-024114 | NULL | NULL | 749532D4-BFF7-4FEC-9F77-396C4A96... |
19701 | 2013-12-31 | 2014-01-12 | 2014-01-07 | 2014-01-07 | 11417 | 63360 | NULL | 7 | 1 | 8 | 1 | NULL | 7 | 1 | 548.98 | 43.9184 | 13.7245 | 606.6229 | 5 | 21465 | 21465 | 6582 | nan | SO63360 | NULL | 10-4030-011417 | 242387Vi34223 | NULL | F92FA2A3-73E1-4DD0-987C-99D1C87E... |
19702 | 2013-12-31 | 2014-01-12 | 2014-01-07 | 2014-01-07 | 18125 | 63361 | NULL | 8 | 1 | 8 | 1 | NULL | 8 | 1 | 2384.07 | 190.7256 | 59.6018 | 2634.3974 | 5 | 26562 | 26562 | 11994 | nan | SO63361 | NULL | 10-4030-018125 | 1242859Vi61993 | NULL | 3A3758BF-CDFA-4740-9104-87BD5A08... |
19703 | 2013-12-31 | 2014-01-12 | 2014-01-07 | 2014-01-07 | 15692 | 63362 | NULL | 10 | 1 | 8 | 1 | NULL | 10 | 1 | 2419.06 | 193.5248 | 60.4765 | 2673.0613 | 5 | 27090 | 27090 | 16083 | 10770 | SO63362 | NULL | 10-4030-015692 | 242864Vi83167 | NULL | 3151D31F-4020-41B2-87F3-B6732869... |
19704 rows x 29 columns
memory usage: 5.54 MB
name: SalesOrderHeaderRefined
type: getml.DataFrame
url: http://localhost:1709/#/getdataframe/adventure_works/SalesOrderHeaderRefined/
split = getml.data.split.random(train=0.8, test=0.2)
split
0 | train |
---|---|
1 | train |
2 | train |
3 | test |
4 | train |
... |
infinite number of rows
type: StringColumnView
container = getml.data.Container(population=sales_order_header, split=split)
container.add(
product=product,
sales_order_detail=sales_order_detail,
sales_order_header=sales_order_header,
sales_order_reason=sales_order_reason,
special_offer=special_offer,
store=store,
)
container
subset | name | rows | type | |
---|---|---|---|---|
0 | test | SalesOrderHeaderRefined | 3879 | View |
1 | train | SalesOrderHeaderRefined | 15825 | View |
alias | name | rows | type | |
---|---|---|---|---|
0 | product | Product | 504 | DataFrame |
1 | sales_order_detail | SalesOrderDetail | 121317 | DataFrame |
2 | sales_order_header | SalesOrderHeaderRefined | 19704 | DataFrame |
3 | sales_order_reason | SalesOrderHeaderSalesReason | 27647 | DataFrame |
4 | special_offer | SpecialOffer | 16 | DataFrame |
5 | store | Store | 701 | DataFrame |
We loaded the data and defined the roles and units. Next, we create a getML pipeline for relational learning.
dm = getml.data.DataModel(sales_order_header.to_placeholder("population"))
dm.add(getml.data.to_placeholder(
product=product,
sales_order_detail=sales_order_detail,
sales_order_header=sales_order_header,
sales_order_reason=sales_order_reason,
special_offer=special_offer,
store=store,
))
dm.population.join(
dm.sales_order_header,
on="CustomerID",
time_stamps="OrderDate",
lagged_targets=True,
horizon=getml.data.time.days(1),
)
dm.population.join(
dm.sales_order_detail,
on="SalesOrderID",
)
dm.population.join(
dm.sales_order_reason,
on="SalesOrderID",
)
dm.population.join(
dm.store,
on="SalesPersonID",
)
dm.sales_order_detail.join(
dm.product,
on="ProductID",
relationship=getml.data.relationship.many_to_one,
)
dm.sales_order_detail.join(
dm.special_offer,
on="SpecialOfferID",
relationship=getml.data.relationship.many_to_one,
)
dm
data frames | staging table | |
---|---|---|
0 | population | POPULATION__STAGING_TABLE_1 |
1 | sales_order_detail, product, special_offer | SALES_ORDER_DETAIL__STAGING_TABLE_2 |
2 | sales_order_header | SALES_ORDER_HEADER__STAGING_TABLE_3 |
3 | sales_order_reason | SALES_ORDER_REASON__STAGING_TABLE_4 |
4 | store | STORE__STAGING_TABLE_5 |
Set-up the feature learner & predictor
seasonal = getml.preprocessors.Seasonal()
mapping = getml.preprocessors.Mapping()
fast_prop = getml.feature_learning.FastProp(
loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
num_threads=1,
num_features=400,
)
relboost = getml.feature_learning.Relboost(
loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
num_threads=1,
)
predictor = getml.predictors.XGBoostClassifier(n_jobs=1)
Build the pipeline
pipe1 = getml.Pipeline(
tags=['fast_prop'],
data_model=dm,
preprocessors=[mapping],
feature_learners=[fast_prop],
predictors=[predictor],
include_categorical=True,
)
pipe1
Pipeline(data_model='population', feature_learners=['FastProp'], feature_selectors=[], include_categorical=True, loss_function=None, peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Mapping'], share_selected_features=0.5, tags=['fast_prop'])
pipe2 = getml.Pipeline(
tags=['relboost'],
data_model=dm,
preprocessors=[seasonal, mapping],
feature_learners=[relboost],
predictors=[predictor],
include_categorical=True,
)
pipe2
Pipeline(data_model='population', feature_learners=['Relboost'], feature_selectors=[], include_categorical=True, loss_function=None, peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Seasonal', 'Mapping'], share_selected_features=0.5, tags=['relboost'])
pipe1.check(container.train)
Checking data model... Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% Checking... [========================================] 100% INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys.
pipe1.fit(container.train)
Checking data model... Staging... [========================================] 100% INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% FastProp: Trying 710 features... [========================================] 100% FastProp: Building features... [========================================] 100% XGBoost: Training as predictor... [========================================] 100% Trained pipeline. Time taken: 0h:1m:7.794911
Pipeline(data_model='population', feature_learners=['FastProp'], feature_selectors=[], include_categorical=True, loss_function=None, peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Mapping'], share_selected_features=0.5, tags=['fast_prop', 'container-qlp0vY'])
url: http://localhost:1709/#/getpipeline/adventure_works/M49AEo/0/
pipe2.check(container.train)
Checking data model... Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% Checking... [========================================] 100% INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys.
pipe2.fit(container.train)
Checking data model... Staging... [========================================] 100% INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. INFO [FOREIGN KEYS NOT FOUND]: When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% Relboost: Training features... [========================================] 100% Relboost: Building features... [========================================] 100% XGBoost: Training as predictor... [========================================] 100% Trained pipeline. Time taken: 0h:1m:53.175772
Pipeline(data_model='population', feature_learners=['Relboost'], feature_selectors=[], include_categorical=True, loss_function=None, peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Seasonal', 'Mapping'], share_selected_features=0.5, tags=['relboost', 'container-qlp0vY'])
url: http://localhost:1709/#/getpipeline/adventure_works/xkgbq9/0/
pipe1.score(container.test)
Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% FastProp: Building features... [========================================] 100%
date time | set used | target | accuracy | auc | cross entropy | |
---|---|---|---|---|---|---|
0 | 2021-08-23 16:07:18 | train | churn | 0.9151 | 0.9742 | 0.2148 |
1 | 2021-08-23 16:09:19 | test | churn | 0.9129 | 0.9712 | 0.2236 |
pipe2.score(container.test)
Staging... [========================================] 100% Preprocessing... [========================================] 100% [========================================] 100% Relboost: Building features... [========================================] 100%
date time | set used | target | accuracy | auc | cross entropy | |
---|---|---|---|---|---|---|
0 | 2021-08-23 16:09:16 | train | churn | 0.9312 | 0.9832 | 0.1689 |
1 | 2021-08-23 16:09:23 | test | churn | 0.9273 | 0.9776 | 0.1929 |
population_train_pd = container.train.population.drop(container.train.population.roles.unused).to_pandas()
population_test_pd = container.test.population.drop(container.train.population.roles.unused).to_pandas()
product_pd = product.drop(product.roles.unused).to_pandas()
sales_order_detail_pd = sales_order_detail.drop(sales_order_detail.roles.unused).to_pandas()
sales_order_header_pd = sales_order_header.drop(sales_order_header.roles.unused).to_pandas()
sales_order_reason_pd = sales_order_reason.drop(sales_order_reason.roles.unused).to_pandas()
special_offer_pd = special_offer.drop(special_offer.roles.unused).to_pandas()
store_pd = store.drop(store.roles.unused).to_pandas()
featuretools does not support many-to-one joins. Therefore, we must manually merge sales_order_detail_pd
, product_pd
and special_offer_pd
.
sales_order_detail_pd = sales_order_detail_pd.merge(
product_pd,
on="ProductID",
how="left",
)
sales_order_detail_pd = sales_order_detail_pd.merge(
special_offer_pd,
on="SpecialOfferID",
how="left",
)
del sales_order_detail_pd["SalesOrderDetailID"]
del sales_order_detail_pd["ProductID"]
del sales_order_detail_pd["SpecialOfferID"]
sales_order_detail_pd
ModifiedDate | SalesOrderID | OrderQty | UnitPrice | UnitPriceDiscount | LineTotal | MakeFlag | ProductSubcategoryID | ProductModelID | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | StartDate | EndDate | Category | Description | Type | MinQty | DiscountPct | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2011-05-31 | 43659 | 1.0 | 2024.994 | 0.0 | 2024.994 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
1 | 2011-05-31 | 43659 | 3.0 | 2024.994 | 0.0 | 6074.982 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
2 | 2011-05-31 | 43659 | 1.0 | 2024.994 | 0.0 | 2024.994 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
3 | 2011-05-31 | 43659 | 1.0 | 2039.994 | 0.0 | 2039.994 | 1 | 1 | 19 | 100.0 | 75.0 | 1912.1544 | 3399.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
4 | 2011-05-31 | 43659 | 1.0 | 2039.994 | 0.0 | 2039.994 | 1 | 1 | 19 | 100.0 | 75.0 | 1912.1544 | 3399.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121312 | 2014-06-30 | 75122 | 1.0 | 21.980 | 0.0 | 21.980 | 0 | 30 | 121 | 4.0 | 3.0 | 8.2205 | 21.98 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
121313 | 2014-06-30 | 75122 | 1.0 | 8.990 | 0.0 | 8.990 | 0 | 19 | 2 | 4.0 | 3.0 | 6.9223 | 8.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
121314 | 2014-06-30 | 75123 | 1.0 | 21.980 | 0.0 | 21.980 | 0 | 30 | 121 | 4.0 | 3.0 | 8.2205 | 21.98 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
121315 | 2014-06-30 | 75123 | 1.0 | 159.000 | 0.0 | 159.000 | 0 | 27 | 122 | 4.0 | 3.0 | 59.4660 | 159.00 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
121316 | 2014-06-30 | 75123 | 1.0 | 8.990 | 0.0 | 8.990 | 0 | 19 | 2 | 4.0 | 3.0 | 6.9223 | 8.99 | 2011-05-01 | 2014-11-30 | No Discount | No Discount | No Discount | 0.0 | 0.0 |
121317 rows × 20 columns
def prepare_sales_order_header(peripheral_pd, train_or_test):
"""
Helper function that imitates the behavior of
the data model defined above.
"""
peripheral_new = peripheral_pd.merge(
train_or_test[["CustomerID", "OrderDate", "SalesOrderID"]],
on="CustomerID"
)
peripheral_new = peripheral_new[
peripheral_new["OrderDate_x"] < peripheral_new["OrderDate_y"]
]
del peripheral_new["SalesOrderID_x"]
del peripheral_new["OrderDate_y"]
del peripheral_new["CustomerID"]
del peripheral_new["SalesPersonIDCat"]
del peripheral_new["TerritoryIDCat"]
return peripheral_new.rename(columns={"OrderDate_x": "OrderDate", "SalesOrderID_y": "SalesOrderID"})
def prepare_store(peripheral_pd, train_or_test):
"""
Helper function that imitates the behavior of
the data model defined above.
"""
peripheral_new = peripheral_pd.merge(
train_or_test[["SalesPersonID", "SalesOrderID"]],
on="SalesPersonID"
)
return peripheral_new
store_train_pd = prepare_store(store_pd, population_train_pd)
store_test_pd = prepare_store(store_pd, population_test_pd)
store_train_pd
test | SalesPersonID | SalesOrderID | |
---|---|---|---|
0 | 2014-09-12 11:15:07 | 279 | 43659 |
1 | 2014-09-12 11:15:07 | 279 | 43660 |
2 | 2014-09-12 11:15:07 | 279 | 43681 |
3 | 2014-09-12 11:15:07 | 279 | 43685 |
4 | 2014-09-12 11:15:07 | 279 | 43695 |
... | ... | ... | ... |
142427 | NaT | 290 | 63216 |
142428 | NaT | 290 | 63217 |
142429 | NaT | 290 | 63223 |
142430 | NaT | 290 | 63282 |
142431 | NaT | 290 | 63284 |
142432 rows × 3 columns
sales_order_header_train_pd = prepare_sales_order_header(sales_order_header_pd, population_train_pd)
sales_order_header_test_pd = prepare_sales_order_header(sales_order_header_pd, population_test_pd)
sales_order_header_train_pd
OrderDate | DueDate | ShipDate | ModifiedDate | SalesPersonID | TerritoryID | churn | RevisionNumber | OnlineOrderFlag | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | SalesOrderID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 279 | 5 | 0.0 | 8 | 0 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 44305 |
2 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 279 | 5 | 0.0 | 8 | 0 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 45061 |
3 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 279 | 5 | 0.0 | 8 | 0 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 45779 |
4 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 279 | 5 | 0.0 | 8 | 0 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 46604 |
5 | 2011-05-31 | 2011-06-12 | 2011-06-07 | 2011-06-07 | 279 | 5 | 0.0 | 8 | 0 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 47693 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39894 | 2013-12-15 | 2013-12-27 | 2013-12-22 | 2013-12-22 | NULL | 6 | 1.0 | 8 | 1 | 1 | 32.6000 | 2.6080 | 0.8150 | 36.0230 | 62799 |
39908 | 2013-12-13 | 2013-12-25 | 2013-12-20 | 2013-12-20 | NULL | 4 | 1.0 | 8 | 1 | 1 | 42.2800 | 3.3824 | 1.0570 | 46.7194 | 62770 |
39952 | 2013-12-15 | 2013-12-27 | 2013-12-22 | 2013-12-22 | NULL | 1 | 1.0 | 8 | 1 | 1 | 53.9900 | 4.3192 | 1.3498 | 59.6590 | 62796 |
40080 | 2013-12-19 | 2013-12-31 | 2013-12-26 | 2013-12-26 | NULL | 4 | 1.0 | 8 | 1 | 1 | 35.0000 | 2.8000 | 0.8750 | 38.6750 | 62643 |
40101 | 2013-12-20 | 2014-01-01 | 2013-12-27 | 2013-12-27 | NULL | 10 | 1.0 | 8 | 1 | 1 | 12.9400 | 1.0352 | 0.3235 | 14.2987 | 62914 |
12271 rows × 15 columns
del population_train_pd["CustomerID"]
del population_train_pd["SalesPersonIDCat"]
del population_test_pd["CustomerID"]
del population_test_pd["SalesPersonIDCat"]
featuretools's entities are similar for getML's container class.
entities_train = {
"population" : (population_train_pd, "SalesOrderID"),
"sales_order_header": (sales_order_header_train_pd, "index"),
"sales_order_detail": (sales_order_detail_pd, "index"),
"sales_order_reason": (sales_order_reason_pd, "index"),
"store": (store_train_pd, "index"),
}
entities_test = {
"population" : (population_test_pd, "SalesOrderID"),
"sales_order_header": (sales_order_header_test_pd, "index"),
"sales_order_detail": (sales_order_detail_pd, "index"),
"sales_order_reason": (sales_order_reason_pd, "index"),
"store": (store_test_pd, "index"),
}
featuretools's relationships are similar for getML's data model.
relationships = [
("population", "SalesOrderID", "sales_order_header", "SalesOrderID"),
("population", "SalesOrderID", "sales_order_detail", "SalesOrderID"),
("population", "SalesOrderID", "sales_order_reason", "SalesOrderID"),
("population", "SalesOrderID", "store", "SalesOrderID"),
]
featuretools_train_pd = featuretools.dfs(
entities=entities_train,
relationships=relationships,
target_entity="population")[0]
/home/patrick/.local/lib/python3.9/site-packages/featuretools/entityset/entity.py:462: UserWarning: index index not found in dataframe, creating new integer column warnings.warn("index {} not found in dataframe, creating new "
featuretools_test_pd = featuretools.dfs(
entities=entities_test,
relationships=relationships,
target_entity="population")[0]
featuretools_train = getml.data.DataFrame.from_pandas(featuretools_train_pd, "featuretools_train")
featuretools_test = getml.data.DataFrame.from_pandas(featuretools_test_pd, "featuretools_test")
featuretools_train.set_role("churn", getml.data.roles.target)
featuretools_train.set_role(featuretools_train.roles.unused_float, getml.data.roles.numerical)
featuretools_train.set_role(featuretools_train.roles.unused_string, getml.data.roles.categorical)
featuretools_train
91.014218% of all entries of column 'SKEW(sales_order_header.Freight)' are NULL values. 91.014218% of all entries of column 'SKEW(sales_order_header.SubTotal)' are NULL values. 91.014218% of all entries of column 'SKEW(sales_order_header.TaxAmt)' are NULL values. 91.014218% of all entries of column 'SKEW(sales_order_header.TotalDue)' are NULL values. 91.014218% of all entries of column 'SKEW(sales_order_header.churn)' are NULL values.
name | churn | SalesPersonID | TerritoryID | RevisionNumber | OnlineOrderFlag | TerritoryIDCat | ShipMethodID | MODE(sales_order_header.OnlineOrderFlag) | MODE(sales_order_header.RevisionNumber) | MODE(sales_order_header.SalesPersonID) | MODE(sales_order_header.ShipMethodID) | MODE(sales_order_header.TerritoryID) | MODE(sales_order_detail.Category) | MODE(sales_order_detail.Description) | MODE(sales_order_detail.MakeFlag) | MODE(sales_order_detail.ProductModelID) | MODE(sales_order_detail.ProductSubcategoryID) | MODE(sales_order_detail.Type) | MODE(sales_order_reason.SalesReasonID) | MODE(store.SalesPersonID) | SubTotal | TaxAmt | Freight | TotalDue | COUNT(sales_order_header) | MAX(sales_order_header.Freight) | MAX(sales_order_header.SubTotal) | MAX(sales_order_header.TaxAmt) | MAX(sales_order_header.TotalDue) | MAX(sales_order_header.churn) | MEAN(sales_order_header.Freight) | MEAN(sales_order_header.SubTotal) | MEAN(sales_order_header.TaxAmt) | MEAN(sales_order_header.TotalDue) | MEAN(sales_order_header.churn) | MIN(sales_order_header.Freight) | MIN(sales_order_header.SubTotal) | MIN(sales_order_header.TaxAmt) | MIN(sales_order_header.TotalDue) | MIN(sales_order_header.churn) | NUM_UNIQUE(sales_order_header.OnlineOrderFlag) | NUM_UNIQUE(sales_order_header.RevisionNumber) | NUM_UNIQUE(sales_order_header.SalesPersonID) | NUM_UNIQUE(sales_order_header.ShipMethodID) | NUM_UNIQUE(sales_order_header.TerritoryID) | SKEW(sales_order_header.Freight) | SKEW(sales_order_header.SubTotal) | SKEW(sales_order_header.TaxAmt) | SKEW(sales_order_header.TotalDue) | SKEW(sales_order_header.churn) | STD(sales_order_header.Freight) | STD(sales_order_header.SubTotal) | STD(sales_order_header.TaxAmt) | STD(sales_order_header.TotalDue) | STD(sales_order_header.churn) | SUM(sales_order_header.Freight) | SUM(sales_order_header.SubTotal) | SUM(sales_order_header.TaxAmt) | SUM(sales_order_header.TotalDue) | SUM(sales_order_header.churn) | COUNT(sales_order_detail) | MAX(sales_order_detail.DiscountPct) | MAX(sales_order_detail.LineTotal) | MAX(sales_order_detail.ListPrice) | MAX(sales_order_detail.MinQty) | MAX(sales_order_detail.OrderQty) | MAX(sales_order_detail.ReorderPoint) | MAX(sales_order_detail.SafetyStockLevel) | MAX(sales_order_detail.StandardCost) | MAX(sales_order_detail.UnitPrice) | MAX(sales_order_detail.UnitPriceDiscount) | MEAN(sales_order_detail.DiscountPct) | MEAN(sales_order_detail.LineTotal) | MEAN(sales_order_detail.ListPrice) | MEAN(sales_order_detail.MinQty) | MEAN(sales_order_detail.OrderQty) | MEAN(sales_order_detail.ReorderPoint) | MEAN(sales_order_detail.SafetyStockLevel) | MEAN(sales_order_detail.StandardCost) | MEAN(sales_order_detail.UnitPrice) | MEAN(sales_order_detail.UnitPriceDiscount) | MIN(sales_order_detail.DiscountPct) | MIN(sales_order_detail.LineTotal) | MIN(sales_order_detail.ListPrice) | MIN(sales_order_detail.MinQty) | MIN(sales_order_detail.OrderQty) | MIN(sales_order_detail.ReorderPoint) | MIN(sales_order_detail.SafetyStockLevel) | MIN(sales_order_detail.StandardCost) | MIN(sales_order_detail.UnitPrice) | MIN(sales_order_detail.UnitPriceDiscount) | NUM_UNIQUE(sales_order_detail.Category) | NUM_UNIQUE(sales_order_detail.Description) | NUM_UNIQUE(sales_order_detail.MakeFlag) | NUM_UNIQUE(sales_order_detail.ProductModelID) | NUM_UNIQUE(sales_order_detail.ProductSubcategoryID) | NUM_UNIQUE(sales_order_detail.Type) | SKEW(sales_order_detail.DiscountPct) | SKEW(sales_order_detail.LineTotal) | SKEW(sales_order_detail.ListPrice) | SKEW(sales_order_detail.MinQty) | SKEW(sales_order_detail.OrderQty) | SKEW(sales_order_detail.ReorderPoint) | SKEW(sales_order_detail.SafetyStockLevel) | SKEW(sales_order_detail.StandardCost) | SKEW(sales_order_detail.UnitPrice) | SKEW(sales_order_detail.UnitPriceDiscount) | STD(sales_order_detail.DiscountPct) | STD(sales_order_detail.LineTotal) | STD(sales_order_detail.ListPrice) | STD(sales_order_detail.MinQty) | STD(sales_order_detail.OrderQty) | STD(sales_order_detail.ReorderPoint) | STD(sales_order_detail.SafetyStockLevel) | STD(sales_order_detail.StandardCost) | STD(sales_order_detail.UnitPrice) | STD(sales_order_detail.UnitPriceDiscount) | SUM(sales_order_detail.DiscountPct) | SUM(sales_order_detail.LineTotal) | SUM(sales_order_detail.ListPrice) | SUM(sales_order_detail.MinQty) | SUM(sales_order_detail.OrderQty) | SUM(sales_order_detail.ReorderPoint) | SUM(sales_order_detail.SafetyStockLevel) | SUM(sales_order_detail.StandardCost) | SUM(sales_order_detail.UnitPrice) | SUM(sales_order_detail.UnitPriceDiscount) | COUNT(sales_order_reason) | NUM_UNIQUE(sales_order_reason.SalesReasonID) | COUNT(store) | NUM_UNIQUE(store.SalesPersonID) | DAY(DueDate) | DAY(ModifiedDate) | DAY(OrderDate) | DAY(ShipDate) | MONTH(DueDate) | MONTH(ModifiedDate) | MONTH(OrderDate) | MONTH(ShipDate) | WEEKDAY(DueDate) | WEEKDAY(ModifiedDate) | WEEKDAY(OrderDate) | WEEKDAY(ShipDate) | YEAR(DueDate) | YEAR(ModifiedDate) | YEAR(OrderDate) | YEAR(ShipDate) | MODE(sales_order_header.DAY(DueDate)) | MODE(sales_order_header.DAY(ModifiedDate)) | MODE(sales_order_header.DAY(OrderDate)) | MODE(sales_order_header.DAY(ShipDate)) | MODE(sales_order_header.MONTH(DueDate)) | MODE(sales_order_header.MONTH(ModifiedDate)) | MODE(sales_order_header.MONTH(OrderDate)) | MODE(sales_order_header.MONTH(ShipDate)) | MODE(sales_order_header.WEEKDAY(DueDate)) | MODE(sales_order_header.WEEKDAY(ModifiedDate)) | MODE(sales_order_header.WEEKDAY(OrderDate)) | MODE(sales_order_header.WEEKDAY(ShipDate)) | MODE(sales_order_header.YEAR(DueDate)) | MODE(sales_order_header.YEAR(ModifiedDate)) | MODE(sales_order_header.YEAR(OrderDate)) | MODE(sales_order_header.YEAR(ShipDate)) | NUM_UNIQUE(sales_order_header.DAY(DueDate)) | NUM_UNIQUE(sales_order_header.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.DAY(OrderDate)) | NUM_UNIQUE(sales_order_header.DAY(ShipDate)) | NUM_UNIQUE(sales_order_header.MONTH(DueDate)) | NUM_UNIQUE(sales_order_header.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_header.MONTH(OrderDate)) | NUM_UNIQUE(sales_order_header.MONTH(ShipDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(DueDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(OrderDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ShipDate)) | NUM_UNIQUE(sales_order_header.YEAR(DueDate)) | NUM_UNIQUE(sales_order_header.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_header.YEAR(OrderDate)) | NUM_UNIQUE(sales_order_header.YEAR(ShipDate)) | MODE(sales_order_detail.DAY(EndDate)) | MODE(sales_order_detail.DAY(ModifiedDate)) | MODE(sales_order_detail.DAY(StartDate)) | MODE(sales_order_detail.MONTH(EndDate)) | MODE(sales_order_detail.MONTH(ModifiedDate)) | MODE(sales_order_detail.MONTH(StartDate)) | MODE(sales_order_detail.WEEKDAY(EndDate)) | MODE(sales_order_detail.WEEKDAY(ModifiedDate)) | MODE(sales_order_detail.WEEKDAY(StartDate)) | MODE(sales_order_detail.YEAR(EndDate)) | MODE(sales_order_detail.YEAR(ModifiedDate)) | MODE(sales_order_detail.YEAR(StartDate)) | NUM_UNIQUE(sales_order_detail.DAY(EndDate)) | NUM_UNIQUE(sales_order_detail.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.DAY(StartDate)) | NUM_UNIQUE(sales_order_detail.MONTH(EndDate)) | NUM_UNIQUE(sales_order_detail.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.MONTH(StartDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(EndDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(StartDate)) | NUM_UNIQUE(sales_order_detail.YEAR(EndDate)) | NUM_UNIQUE(sales_order_detail.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.YEAR(StartDate)) | MODE(store.DAY(test)) | MODE(store.MONTH(test)) | MODE(store.WEEKDAY(test)) | MODE(store.YEAR(test)) | NUM_UNIQUE(store.DAY(test)) | NUM_UNIQUE(store.MONTH(test)) | NUM_UNIQUE(store.WEEKDAY(test)) | NUM_UNIQUE(store.YEAR(test)) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | target | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical |
0 | 0 | 279 | 5 | 8 | 0 | 5 | 5 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 1 | 19 | 1 | No Discount | NULL | 279 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 12 | 0 | 6074.982 | 3399.99 | 0 | 6 | 75 | 100 | 1912.1544 | 2039.994 | 0 | 0 | 1713.8017 | 1989.8658 | 0 | 2.1667 | 45 | 60 | 1120.2742 | 1193.6427 | 0 | 0 | 10.373 | 8.99 | 0 | 1 | 3 | 4 | 3.3963 | 5.1865 | 0 | 1 | 1 | 2 | 5 | 5 | 1 | 0 | 1.1963 | -0.3885 | 0 | 1.4799 | -0.3884 | -0.3884 | -0.3887 | -0.3885 | 0 | 0 | 1883.1394 | 1729.5034 | 0 | 1.5859 | 37.0749 | 49.4332 | 971.2526 | 1038.0419 | 0 | 0 | 20565.6206 | 23878.39 | 0 | 26 | 540 | 720 | 13443.2903 | 14323.7118 | 0 | 0 | nan | 80 | 1 | 12 | 7 | 31 | 7 | 6 | 6 | 5 | 6 | 6 | 1 | 1 | 1 | 2011 | 2011 | 2011 | 2011 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 5 | 5 | 6 | 1 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
1 | 0 | 279 | 5 | 8 | 0 | 5 | 5 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 1 | 28 | 2 | No Discount | NULL | 279 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 874.794 | 1457.99 | 0 | 1 | 75 | 100 | 884.7083 | 874.794 | 0 | 0 | 647.1264 | 1120.49 | 0 | 1 | 75 | 100 | 685.7074 | 647.1264 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 75 | 100 | 486.7066 | 419.4589 | 0 | 1 | 1 | 1 | 2 | 1 | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 321.9705 | 477.2971 | 0 | 0 | 0 | 0 | 281.4297 | 321.9705 | 0 | 0 | 1294.2529 | 2240.98 | 0 | 2 | 150 | 200 | 1371.4149 | 1294.2529 | 0 | 0 | nan | 80 | 1 | 12 | 7 | 31 | 7 | 6 | 6 | 5 | 6 | 6 | 1 | 1 | 1 | 2011 | 2011 | 2011 | 2011 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 5 | 5 | 6 | 1 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
2 | 0 | 282 | 6 | 8 | 0 | 6 | 5 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 1 | 19 | 1 | No Discount | NULL | 282 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 8099.976 | 3399.99 | 0 | 5 | 375 | 500 | 1912.1544 | 2039.994 | 0 | 0 | 2181.7652 | 1590.4467 | 0 | 2.5333 | 151 | 201.3333 | 883.0532 | 934.9116 | 0 | 0 | 20.746 | 8.99 | 0 | 1 | 3 | 4 | 6.9223 | 5.1865 | 0 | 1 | 1 | 2 | 5 | 5 | 1 | 0 | 1.2022 | 0.2799 | 0 | 0.7955 | 0.6718 | 0.6718 | 0.3285 | 0.348 | 0 | 0 | 2515.2803 | 1423.6488 | 0 | 1.1872 | 166.7505 | 222.3339 | 802.3616 | 858.7914 | 0 | 0 | 32726.4786 | 23856.7 | 0 | 38 | 2265 | 3020 | 13245.7975 | 14023.6738 | 0 | 0 | nan | 74 | 1 | 12 | 7 | 31 | 7 | 6 | 6 | 5 | 6 | 6 | 1 | 1 | 1 | 2011 | 2011 | 2011 | 2011 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 5 | 5 | 6 | 1 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
3 | 0 | 276 | 4 | 8 | 0 | 4 | 5 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 1 | 30 | 2 | No Discount | NULL | 276 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 419.4589 | 782.99 | 0 | 1 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 75 | 100 | 486.7066 | 419.4589 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 419.4589 | 782.99 | 0 | 1 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | nan | 39 | 1 | 12 | 7 | 31 | 7 | 6 | 6 | 5 | 6 | 6 | 1 | 1 | 1 | 2011 | 2011 | 2011 | 2011 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 5 | 5 | 6 | 1 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
4 | 0 | 280 | 1 | 8 | 0 | 1 | 5 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 1 | 19 | 1 | No Discount | NULL | 280 | 24432.6088 | 2344.9921 | 732.81 | 27510.4109 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 8099.976 | 3399.99 | 0 | 4 | 75 | 100 | 1912.1544 | 2039.994 | 0 | 0 | 3054.0761 | 2553.115 | 0 | 1.75 | 57 | 76 | 1438.4664 | 1531.5806 | 0 | 0 | 28.8404 | 49.99 | 0 | 1 | 3 | 4 | 38.4923 | 28.8404 | 0 | 1 | 1 | 2 | 2 | 2 | 1 | 0 | 0.8117 | -1.4399 | 0 | 1.3554 | -1.4402 | -1.4402 | -1.4399 | -1.4399 | 0 | 0 | 2860.3836 | 1545.0056 | 0 | 1.165 | 33.3295 | 44.4394 | 864.1073 | 927.5374 | 0 | 0 | 24432.6088 | 20424.92 | 0 | 14 | 456 | 608 | 11507.731 | 12252.6448 | 0 | 0 | nan | 38 | 1 | 12 | 7 | 31 | 7 | 6 | 6 | 5 | 6 | 6 | 1 | 1 | 1 | 2011 | 2011 | 2011 | 2011 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 5 | 5 | 6 | 1 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
15820 | 1 | NULL | 7 | 8 | 1 | 7 | 1 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 0 | 29 | 37 | No Discount | 1 | NULL | 1173.96 | 93.9168 | 29.349 | 1297.2258 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 1120.49 | 1120.49 | 0 | 1 | 375 | 500 | 713.0798 | 1120.49 | 0 | 0 | 293.49 | 293.49 | 0 | 1 | 207 | 276 | 183.2694 | 293.49 | 0 | 0 | 3.99 | 3.99 | 0 | 1 | 3 | 4 | 1.4923 | 3.99 | 0 | 1 | 1 | 2 | 4 | 3 | 1 | 0 | 1.9981 | 1.9981 | 0 | 0 | -0.1153 | -0.1153 | 1.9994 | 1.9981 | 0 | 0 | 551.4201 | 551.4201 | 0 | 0 | 196.204 | 261.6053 | 353.2259 | 551.4201 | 0 | 0 | 1173.96 | 1173.96 | 0 | 4 | 828 | 1104 | 733.0777 | 1173.96 | 0 | 1 | 1 | 0 | nan | 12 | 7 | 31 | 7 | 1 | 1 | 12 | 1 | 6 | 1 | 1 | 1 | 2014 | 2014 | 2013 | 2014 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 12 | 5 | 6 | 1 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | nan | nan | nan | nan | nan | nan | nan | nan |
15821 | 1 | NULL | 10 | 8 | 1 | 10 | 1 | NULL | NULL | NULL | NULL | NULL | No Discount | No Discount | 0 | 11 | 19 | No Discount | 1 | NULL | 1179.47 | 94.3576 | 29.4868 | 1303.3144 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 1120.49 | 1120.49 | 0 | 1 | 75 | 100 | 713.0798 | 1120.49 | 0 | 0 | 393.1567 | 393.1567 | 0 | 1 | 27 | 36 | 252.8315 | 393.1567 | 0 | 0 | 8.99 | 8.99 | 0 | 1 | 3 | 4 | 6.9223 | 8.99 | 0 | 1 | 1 | 2 | 3 | 3 | 1 | 0 | 1.7238 | 1.7238 | 0 | 0 | 1.7321 | 1.7321 | 1.7199 | 1.7238 | 0 | 0 | 630.2226 | 630.2226 | 0 | 0 | 41.5692 | 55.4256 | 398.8992 | 630.2226 | 0 | 0 | 1179.47 | 1179.47 | 0 | 3 | 81 | 108 | 758.4944 | 1179.47 | 0 | 1 | 1 | 0 | nan | 12 | 7 | 31 | 7 | 1 | 1 | 12 | 1 | 6 | 1 | 1 | 1 | 2014 | 2014 | 2013 | 2014 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 30 | 31 | 1 | 11 | 12 | 5 | 6 | 1 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | nan | nan | nan | nan | nan | nan | nan |