The first ML framework
for relational learning.

Think of getML as TensorFlow – just for relational data.


Why getML?

Machine learning models need features as input. But building features by hand is an expensive process. Data scientists and experts spend up to 90% of their time on tasks related to feature engineering. We at getML build general-purpose algorithms for data scientists that automate feature engineering on any kind of relational data.

Billions of features with ~20 lines of Python

Benefits of getML for feature learning

Feature learning boosts your productivity

Feature learning automates manual feature engineering through supervised learning. This is preferable to writing and maintaining hundreds of SQL, pandas or R/data.table scripts for feature engineering. getML's algorithms allow data scientists to build end-to-end prediction pipelines in days instead of months.

Algorithms that discover domain-specific patterns

Manual feature engineering is an error-prone, repetitive process that requires countless hours of meetings to obtain domain knowledge from experts. Using feature learning, data scientists let algorithms automatically learn the relevant feature logic straight from relational data.

Great features lead to high ML model accuracy

Improving your model performance starts with finding better features. Feature learning helps you avoid the negative impact of unknown unknowns or common time constraints in the model building phase. getML helps data scientists to deliver the most accurate prediction models, faster.

What is getML?

All you need to build
end-to-end ML pipelines.

Load Data

Python

From Pandas, pyspark, pyarrow, Dict or JSON

Database connectors

Unified import interface for PostgreSQL, MySQL, MariaDB, SQLite3, SAP HANA, Greenplum or from any other ODBC compatible database

File storage

Import your data from CSV, parquet or AWS S3 buckets

Machine Learning

Feature Learning

FastProp, Multirel & Relboost for feature learning from relational data and time series

Prediction

Predict with XGBoost regressors and classifiers, logistic & linear regression, or bring your own algorithm

Hyperparameter optimization

Tune hyperparameters on a Latin hypercube or using a Gaussian search
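To give a feel for what sampling "on a Latin hypercube" means, here is a minimal standard-library sketch. It is not getML's implementation, and the parameter names and ranges are hypothetical. Each parameter's range is cut into as many equal strata as there are samples, and every stratum is used exactly once, so a handful of candidates still covers the whole range of every parameter.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, for each parameter, every one of the
    n_samples equal-width strata of its range is hit exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        width = (high - low) / n_samples
        # One uniformly drawn value inside each stratum ...
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        # ... then shuffle so strata of different parameters pair up at random.
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


# Hypothetical search space; names and ranges are illustrative only.
bounds = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}
candidates = latin_hypercube(5, bounds)
for candidate in candidates:
    print(candidate)
```

Each printed candidate would then be used to train and score one pipeline; plain random search offers no such coverage guarantee for small sample counts.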

Evaluate & Deploy

Train pipelines

Wrap feature learner ensembles and predictors in end-to-end ML pipelines

Evaluate

Benchmark models & gain insights through learned features

Deploy

Use Python, deploy models behind an HTTP model server to serve predictions or feature transforms, or transpile pipelines to SQLite or Spark SQL.
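To illustrate why SQL is a natural deployment target, the sketch below runs one hand-written feature as a SQLite query using Python's built-in sqlite3 module. The tables, column names, and the query itself are hypothetical and are not output generated by getML; they only show that a learned feature of the form "aggregation plus condition" can be expressed as plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE population (account_id INTEGER);
    CREATE TABLE trans (account_id INTEGER, amount REAL);
    INSERT INTO population VALUES (1), (2);
    INSERT INTO trans VALUES
        (1, 500.0),
        (1, 900.0),
        (1, 50.0),
        (2, 300.0),
        (2, 100.0);
    """
)

# One feature: the average amount of an account's transactions above 200.
feature_sql = """
    SELECT p.account_id, AVG(t.amount) AS feature_1
    FROM population AS p
    LEFT JOIN trans AS t
        ON t.account_id = p.account_id AND t.amount > 200
    GROUP BY p.account_id
    ORDER BY p.account_id
"""
rows = conn.execute(feature_sql).fetchall()
print(rows)
conn.close()
```

A query like this can run wherever the data lives, without a Python process in the serving path.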

getML is a high-performance machine learning framework for building regression and prediction models on any kind of relational data. It comes with an easy-to-use Python API that lets you build end-to-end ML pipelines on terabytes of input data.

For maximum performance and speed

Blazing Fast C++ Engine

Core of the getML framework

Standalone application that handles I/O, feature learning & AutoML

Implements a high-performance data management layer for ML models at terabyte scale

Zero external dependencies

Explore pipelines, data frames, and engine processes

getML Interface

Comes with the getML engine

Web frontend for data exploration, easy inspection of trained models and learned features

Easy to use inside your existing Python codebase

Python API

Open-source license, available on pip

Wrapper around the getML engine for easy integration of relational learning into existing data science workflows

Sends all the instructions & data to the getML engine

How feature learning works

To find the best set of aggregation functions and conditions, getML’s supervised learning algorithms perform an iterative, tree-based search inside relational data. This allows for the automatic generation of complex features for a given target variable at a scale and with an accuracy that no manual or brute-force approach can match.
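The idea of searching over aggregation functions and conditions can be made concrete with a toy example. The sketch below is deliberately a brute-force enumeration over a tiny, hypothetical data set, not getML's tree-based algorithm: it tries every combination of an aggregation and a threshold condition on a peripheral table and keeps the feature that correlates best with the target.

```python
from itertools import product

# Hypothetical toy data: one row per customer in the population table,
# several rows per customer in the peripheral transactions table.
population = [
    {"customer_id": 1, "target": 1.0},
    {"customer_id": 2, "target": 0.0},
    {"customer_id": 3, "target": 1.0},
    {"customer_id": 4, "target": 0.0},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 20.0},
    {"customer_id": 2, "amount": 30.0},
    {"customer_id": 3, "amount": 200.0},
    {"customer_id": 3, "amount": 10.0},
    {"customer_id": 4, "amount": 25.0},
]

AGGREGATIONS = {
    "COUNT": len,
    "SUM": sum,
    "MAX": lambda xs: max(xs, default=0.0),
}
THRESHOLDS = [0.0, 50.0, 100.0]


def build_feature(agg_name, threshold):
    """Aggregate, per customer, the amounts that exceed `threshold`."""
    agg = AGGREGATIONS[agg_name]
    feature = []
    for row in population:
        amounts = [
            t["amount"]
            for t in transactions
            if t["customer_id"] == row["customer_id"] and t["amount"] > threshold
        ]
        feature.append(float(agg(amounts)))
    return feature


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


targets = [row["target"] for row in population]
# Score every (aggregation, condition) pair and keep the best one.
best_agg, best_threshold = max(
    product(AGGREGATIONS, THRESHOLDS),
    key=lambda c: abs_correlation(build_feature(*c), targets),
)
print(f"{best_agg}(amount) WHERE amount > {best_threshold}")
```

With three aggregations and three thresholds there are only nine candidates here. On real schemas with many tables, columns, and possible conditions the candidate space grows far too quickly for enumeration, which is the motivation for a guided search.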

How do I use it?

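import getml

# Create (or switch to) the project that holds all data frames and pipelines.
getml.set_project("loans")

# Load the example loans data set: two population tables (train/test)
# and three peripheral tables.
population_train, population_test, order, trans, meta = getml.datasets.load_loans()

# The population table sits at the center of the star schema.
schema = getml.data.StarSchema(
    train=population_train,
    test=population_test,
    alias="population",
)

# Join the transactions; time_stamps pairs "date_loan" (population)
# with "date" (trans).
schema.join(
    trans,
    on="account_id",
    time_stamps=("date_loan", "date"),
)

schema.join(
    order,
    on="account_id",
)

schema.join(
    meta,
    on="account_id",
)

# Feature learner with a cross-entropy loss for a classification target.
relmt = getml.feature_learning.RelMT(
    loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
)

# Predictor trained on the learned features.
xgboost = getml.predictors.XGBoostClassifier()

# Wrap data model, feature learner and predictor in one pipeline.
pipe = getml.pipeline.Pipeline(
    data_model=schema.data_model,
    feature_learners=relmt,
    predictors=xgboost,
)

# Learn the features and train the predictor on the training subset.
pipe.fit(schema.train)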
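The `time_stamps=("date_loan", "date")` argument in the first join deserves a short illustration. Assuming it restricts the join to transactions dated no later than the loan date, it keeps information from the future out of the features. The standard-library sketch below mimics that restriction on hypothetical rows; it does not call getML, and the values are made up.

```python
from datetime import date

# Hypothetical rows that reuse the column names from the join above.
loans = [{"account_id": 1, "date_loan": date(1996, 4, 29)}]
transactions = [
    {"account_id": 1, "date": date(1996, 1, 10), "amount": 500.0},
    {"account_id": 1, "date": date(1996, 4, 1), "amount": 250.0},
    {"account_id": 1, "date": date(1996, 6, 15), "amount": 900.0},  # after the loan
]


def sum_before_loan(loan, rows):
    """Sum the amounts of an account's transactions up to the loan date."""
    return sum(
        row["amount"]
        for row in rows
        if row["account_id"] == loan["account_id"]
        and row["date"] <= loan["date_loan"]
    )


feature = [sum_before_loan(loan, transactions) for loan in loans]
print(feature)
```

Without the date condition the June transaction would leak into the feature, and a model trained on it would look better in evaluation than it could ever be in production.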

Try getML

It takes less than 30 seconds to get started.

Install getML locally

Getting started with getML is as easy as downloading the getML suite and pip-installing the getml Python API.

Download getML

Benchmarks

Beating the state-of-the-art in Relational Learning

getML outperforms modern libraries and academic literature in terms of speed and accuracy.

5%

Beating state-of-the-art approaches when classifying a citation network by delivering 5% better results than those reported in the academic literature.

Check out notebook "Cora"

11%

Outperforming Facebook’s Prophet by 11 percentage points in one-step-ahead predictions.

Check out notebook "Interstate 94"

179x

Up to 179x faster than popular feature engineering libraries featuretools and tsfresh.

Blog Post: Introducing FastProp
