dstack 0.6.4: simpler API, grid layout, tqdm, scikit-learn pipelines, and more!

Based on feedback from our users, we've reworked a few things to improve the product experience. Today we're very happy to share these updates with you.

A simpler, more intuitive API

First and foremost, the application API is now much easier to write and read. Here's an example that shows how simple it is to add controls, create a sidebar, and load data, among other things.

import dstack as ds
import plotly.express as px

app = ds.app()  # Create an instance of the application


# A utility function that loads the data
def get_data():
    return px.data.stocks()


sidebar = app.sidebar()  # Create a sidebar

# A drop-down control that shows stock symbols
stock = sidebar.select(items=get_data().columns[1:].tolist())


# A handler that updates the plot based on the selected stock
def output_handler(self, stock):
    symbol = stock.value()  # The selected stock
    # A plotly line chart where the X axis is date and Y is the stock's price
    self.data = px.line(get_data(), x='date', y=symbol)


# A plotly chart output
app.output(handler=output_handler, depends=[stock])

# Deploy the application with the name "stocks_sidebar" and print its URL
url = app.deploy("stocks_sidebar")
print(url)

If you run the code, here’s what you’ll see:

ds_stocks_sidebar.png

Here’s another example to show how you can have multiple tabs within the same application:

import dstack as ds
import plotly.express as px

# Create an instance of the application
app = ds.app()

# Create a tab
scatter_tab = app.tab("Scatter Chart")

# Create an output with a chart
scatter_tab.output(data=px.scatter(px.data.iris(), x="sepal_width", y="sepal_length", color="species"))

# Create a tab
bar_tab = app.tab("Bar Chart")

# Create an output with a chart
bar_tab.output(data=px.bar(px.data.tips(), x="sex", y="total_bill", color="smoker", barmode="group"))

# Deploy the application with the name "tabs" and print its URL
url = app.deploy("tabs")
print(url)

If you run this code and open the application, here’s what you’ll see:

ds_tabs.png

You can find lots of examples of the new API in our reworked documentation.

Please note that the update is not backward compatible because of the changes made to the API. If you update to v0.6.4, you'll have to redeploy your applications using the new API.

Grid layout for controls

It is now possible to define how many columns and rows each control can take using the new ‘grid’ layout mechanism.

Here’s how it works:

import dstack as ds

# Create an instance of the application that has three columns
app = ds.app(columns=3)

# An input that takes one column and one row
input_1 = app.input(label="Input 1", colspan=1)
# An input that takes one column and one row
input_2 = app.input(label="Input 2", colspan=1)
# An input that takes one column and two rows
input_3 = app.input(label="Input 3", colspan=1, rowspan=2)

url = app.deploy("layout_1")
print(url)

If you run this code and open the application, here’s what you’ll see:

ds_layout_1.png

Together with a sidebar, this allows you to have a flexible layout for your application:

ds_outputs.png
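Putting these pieces together, a minimal sketch of a sidebar combined with a two-column grid might look like the following (the control labels, select items, and the app name "layout_sidebar" are our own illustrative choices):

```python
import dstack as ds

# An application whose main area is a two-column grid
app = ds.app(columns=2)

# A sidebar with a drop-down control
sidebar = app.sidebar()
region = sidebar.select(items=["North", "South", "East", "West"])

# Two inputs side by side, each taking one column
input_1 = app.input(label="Start date", colspan=1)
input_2 = app.input(label="End date", colspan=1)

# Deploy the application and print its URL
url = app.deploy("layout_sidebar")
print(url)
```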

To learn more about how the grid layout mechanism works, please read the documentation.

tqdm integration to view application progress

The new update allows applications to report progress via the tqdm package. To do that, use dstack.tqdm or dstack.trange (instead of tqdm.tqdm and tqdm.trange).

Here's a quick example:

from time import sleep

import dstack as ds
from dstack import trange

# Create an instance of the application
app = ds.app()


# A handler that sets the text to the markdown control
def markdown_handler(self):
    for _ in trange(100):
        sleep(0.5)
    self.text = "Finished"


# A markdown control
app.markdown(handler=markdown_handler)

# Deploy the application with the name "tqdm" and print its URL
url = app.deploy("tqdm")
print(url)

If you run it and open the application, you'll see the following while the application is executing:

ds_tqdm.png

The dstack.tqdm and dstack.trange functions support everything that the standard tqdm.tqdm and tqdm.trange support.
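For example, the familiar tqdm keyword arguments such as desc and total should work as usual. Here's a hedged sketch (the app name "tqdm_steps" and the step names are our own, for illustration):

```python
from time import sleep

import dstack as ds
from dstack import tqdm

app = ds.app()


# A handler that reports progress over a list of steps, with a label and an explicit total
def markdown_handler(self):
    for step in tqdm(["load", "train", "evaluate"], desc="Steps", total=3):
        sleep(0.5)
    self.text = "Finished"


app.markdown(handler=markdown_handler)

url = app.deploy("tqdm_steps")
print(url)
```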

Store and retrieve scikit-learn Pipelines using dstack ML registry

Last but not least, the update adds support for scikit-learn pipelines. It's now possible to push and pull (store and retrieve) entire ML pipelines, which may include your custom transformers.

Here’s an example:

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
import dstack as ds


# The first step transforms the data into a form suitable for the model.
class PrepareData(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def transform(self, X, **transform_params):
        df = X.copy()

        # Add a new feature: "Years" (the number of years with a positive number of purchased licences)
        def years(row):
            values = [row["y2019"], row["y2018"], row["y2017"], row["y2016"], row["y2015"]]
            return len([x for x in values if x != 0])

        df["Years"] = df.apply(years, axis=1)

        # Drop features that aren't needed
        df = df.drop(["Company", "Region", "Manager", "RenewalMonth", "RenewalDate"], axis=1)

        # Normalize the values of the columns that need it
        for col in ["y2015", "y2016", "y2017", "y2018", "y2019"]:
            df[col] = df[col] / df[col].max()

        # Encode categorical columns into columns of 0s and 1s (required by logistic regression)
        for c in X["Country"].unique():
            df[c] = df["Country"].apply(lambda x: 1 if x == c else 0)
        for s in X["Sector"].unique():
            if s:
                df[s] = df["Sector"].apply(lambda x: 1 if x == s else 0)
        df = df.drop(["Country", "Sector"], axis=1)

        return df

    def fit(self, X, y=None, **fit_params):
        return self


# The second step makes sure the data has exactly the same format as the data used to train the model.
# This step is necessary because the model trained as part of this pipeline will be re-used
# later to make predictions on other data.
class ReindexColumns(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns  # the columns of the original data that was used to train the model

    def transform(self, X, **transform_params):
        # Drop the columns that were not present in the original data
        # Add missing columns that were present in the original data with 0 as value
        return X.reindex(columns=self.columns, fill_value=0)

    def fit(self, X, y=None, **fit_params):
        return self


# Read the data. Drop the rows with incomplete data (including the rows without historical churn data).
df = pd.read_csv("https://www.dropbox.com/s/cat8vm6lchlu5tp/data.csv?dl=1", index_col=0).dropna()
X = df.drop(["Churn"], axis=1)  # features
y = df["Churn"]  # target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

# Transform the training data once, so its columns can be reused in the pipeline.
X_train_prepared = PrepareData().transform(X_train)

# Define the pipeline as a chain of steps
pipeline = make_pipeline(
    PrepareData(),  # 1. Transform the data into a form suitable for the model
    ReindexColumns(X_train_prepared.columns),  # 2. Make sure the transformed data has the same columns as the training data
    LogisticRegression()  # 3. Pass the transformed and re-indexed data to the logistic regression
)
# Train the pipeline
pipeline.fit(X_train, y_train)

# Push the pipeline under the name "tutorials/sklearn_model" and print its URL
url = ds.push("tutorials/sklearn_model", pipeline)
print(url)
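The ReindexColumns step above relies on pandas' reindex. Here is a tiny standalone illustration (with made-up column names): columns missing from the target list are added and filled with 0, and extra columns are dropped.

```python
import pandas as pd

# A frame with one extra column ("Spain") and one missing column ("Germany")
df = pd.DataFrame({"y2019": [0.5], "Years": [3], "Spain": [1]})

# The columns the model was trained on
train_columns = ["y2019", "Years", "Germany"]

# Drop "Spain", add "Germany" filled with 0
aligned = df.reindex(columns=train_columns, fill_value=0)
print(aligned.columns.tolist())  # ['y2019', 'Years', 'Germany']
print(aligned["Germany"].iloc[0])  # 0
```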

Once you run this code, the pushed pipeline can be re-used on live data; the data will be transformed as part of the pipeline.

For more details on how sklearn pipelines can be deployed and re-used from dstack applications, please read the corresponding tutorial.
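For instance, retrieving and applying the stored pipeline might look like the sketch below, assuming new raw data in the same format as the training data (the CSV path is hypothetical):

```python
import pandas as pd
import dstack as ds

# Retrieve the pipeline pushed under "tutorials/sklearn_model"
pipeline = ds.pull("tutorials/sklearn_model")

# New raw data in the same format as the training data (hypothetical file)
new_data = pd.read_csv("new_companies.csv", index_col=0).dropna()

# The pipeline transforms the raw data and predicts churn in one call
predictions = pipeline.predict(new_data)
print(predictions)
```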

What’s next?

Our core vision remains to bring ML into business operations. We will continue simplifying this process by providing low-code tools to build ML models and put them into production within enterprises.

In the short term, we will improve the experience of building interactive applications to make it even simpler. After that, we'll also help you build and re-use ML models.

Would you like to learn more about our vision, or share your thoughts with us? Let us know by dropping an email to team@dstack.ai.

Meanwhile, please do try the product, and share your feedback with us in our tracker.

Call to action

  • Star us on GitHub
  • Install the update (pip install dstack==0.6.4 and dstack server start)
  • Follow Quickstart and Tutorials
  • Share your application with others using dstack.cloud.
  • Share your feedback (for bugs, create issues; for anything else, email us at team@dstack.ai or ask in Discord)
