dstack 0.6.4: simpler API, grid layout, tqdm, scikit-learn pipelines, and more!
Based on feedback from our users, we have reworked a few things to improve the product experience, and today we're happy to share the updates with you.
A simpler, more intuitive API
First and foremost, the application API is now much easier to write and read. Here's an example that demonstrates common tasks such as adding controls, creating a sidebar, and loading data.
```python
import dstack as ds
import plotly.express as px

app = ds.app()  # Create an instance of the application


# A utility function that loads the data
def get_data():
    return px.data.stocks()


sidebar = app.sidebar()  # Create a sidebar

# A drop-down control that shows stock symbols
stock = sidebar.select(items=get_data().columns[1:].tolist())


# A handler that updates the plot based on the selected stock
def output_handler(self, stock):
    symbol = stock.value()  # The selected stock
    # A plotly line chart where the X axis is date and Y is the stock's price
    self.data = px.line(get_data(), x='date', y=symbol)


# A plotly chart output
app.output(handler=output_handler, depends=[stock])

# Deploy the application with the name "stocks_sidebar" and print its URL
url = app.deploy("stocks_sidebar")
print(url)
```
If you run the code, here’s what you’ll see:
Here’s another example to show how you can have multiple tabs within the same application:
```python
import dstack as ds
import plotly.express as px

# Create an instance of the application
app = ds.app()

# Create a tab
scatter_tab = app.tab("Scatter Chart")

# Create an output with a chart
scatter_tab.output(data=px.scatter(px.data.iris(), x="sepal_width", y="sepal_length", color="species"))

# Create another tab
bar_tab = app.tab("Bar Chart")

# Create an output with a chart
bar_tab.output(data=px.bar(px.data.tips(), x="sex", y="total_bill", color="smoker", barmode="group"))

# Deploy the application with the name "tabs" and print its URL
url = app.deploy("tabs")
print(url)
```
If you run this code and open the application, here’s what you’ll see:
You can find lots of examples of the new API in our reworked documentation.
Please note that this update is not backward compatible because of the changes made to the API. After updating to v0.6.4, you'll have to redeploy your applications using the new API.
Grid layout for controls
It is now possible to define how many columns and rows each control occupies using the new grid layout mechanism.
Here’s how it works:
```python
import dstack as ds

# Create an instance of the application that has three columns
app = ds.app(columns=3)

# An input that takes one column and one row
input_1 = app.input(label="Input 1", colspan=1)

# An input that takes one column and one row
input_2 = app.input(label="Input 2", colspan=1)

# An input that takes one column and two rows
input_3 = app.input(label="Input 3", colspan=1, rowspan=2)

url = app.deploy("layout_1")
print(url)
```
If you run this code and open the application, here’s what you’ll see:
Together with a sidebar, this allows you to have a flexible layout for your application:
To learn more about how the grid layout mechanism works, please read the documentation.
tqdm integration to track application progress
The new update allows an application to report its progress via the tqdm package. To do that, use dstack.trange instead of tqdm.trange.

Here's a quick example:
```python
from time import sleep

import dstack as ds
from dstack import trange

# Create an instance of the application
app = ds.app()


# A handler that sets the text of the markdown control
def markdown_handler(self):
    for _ in trange(100):
        sleep(0.5)
    self.text = "Finished"


# A markdown control
app.markdown(handler=markdown_handler)

# Deploy the application with the name "tqdm" and print its URL
result = app.deploy("tqdm")
print(result.url)
```
If you run it and open the application, you'll see the following while the application is executing:
dstack.trange supports everything the standard tqdm.trange supports.
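Since the interfaces match, keyword arguments you would normally pass to tqdm.trange, such as `desc` or `disable`, should work the same way with dstack.trange. As a point of comparison, here is a sketch using the standard tqdm.trange (the dstack version is assumed to accept the same arguments):

```python
from tqdm import trange

# `desc` adds a label to the progress bar; `disable` silences the output.
# dstack.trange is documented to support the same options as tqdm.trange.
total = 0
for i in trange(5, desc="Processing", disable=True):
    total += i

print(total)  # the loop saw the values 0..4
```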
Store and retrieve scikit-learn pipelines using the dstack ML registry
Last but not least, the update adds support for sklearn pipelines. It's now possible to push and pull (store and retrieve) entire ML pipelines, including your custom transformers.
Here’s an example:
```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

import dstack as ds


# The first step transforms the data into a form suitable for the model.
class PrepareData(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def transform(self, X, **transform_params):
        df = X.copy()

        # Add a new feature: "Years" (the number of years with a positive number of purchased licences)
        def years(row):
            l = [row["y2019"], row["y2018"], row["y2017"], row["y2016"], row["y2015"]]
            return len([x for x in l if x != 0])

        df["Years"] = df.apply(years, axis=1)

        # Drop features that aren't needed
        df = df.drop(["Company", "Region", "Manager", "RenewalMonth", "RenewalDate"], axis=1)

        # Normalize the values of the columns that need it
        for col in ["y2015", "y2016", "y2017", "y2018", "y2019"]:
            df[col] = df[col] / df[col].max()

        # Encode categorical columns into columns of 0s and 1s (required by logistic regression)
        for c in X["Country"].unique():
            df[c] = df["Country"].apply(lambda x: 1 if x == c else 0)
        for s in X["Sector"].unique():
            if s:
                df[s] = df["Sector"].apply(lambda x: 1 if x == s else 0)
        df = df.drop(["Country", "Sector"], axis=1)
        return df

    def fit(self, X, y=None, **fit_params):
        return self


# The second step makes sure the data has exactly the same format as the data used to train the model.
# This step is necessary because the model trained as part of this pipeline will be re-used later
# to make predictions based on other data.
class ReindexColumns(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns  # the columns of the original data that was used to train the model

    def transform(self, X, **transform_params):
        # Drop the columns that were not present in the original data;
        # add missing columns that were present in the original data, filled with 0
        return X.reindex(columns=self.columns, fill_value=0)

    def fit(self, X, y=None, **fit_params):
        return self


# Read the data. Drop the rows with incomplete data (including the rows without historical churn data).
df = pd.read_csv("https://www.dropbox.com/s/cat8vm6lchlu5tp/data.csv?dl=1", index_col=0).dropna()

X = df.drop(["Churn"], axis=1)  # features
y = df["Churn"]  # target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=99)

# Save the columns of the original processed data to be used in the pipeline
X_train_columns = PrepareData().transform(X_train)

# Define the pipeline as a chain of steps
pipeline = make_pipeline(
    PrepareData(),  # 1. Transform the data into a form suitable for the model
    ReindexColumns(X_train_columns.columns),  # 2. Make sure the transformed data matches the training format
    LogisticRegression()  # 3. Pass the transformed and re-indexed data to the logistic regression
)

# Train the pipeline
pipeline.fit(X_train, y_train)

# Push the pipeline with the name "tutorials/sklearn_model" and print its URL
url = ds.push("tutorials/sklearn_model", pipeline)
print(url)
```
If you run this code, the pushed pipeline can be re-used on live data: the model transforms the incoming data as part of the pipeline.
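To see why bundling custom transformers into the pipeline matters, here is a minimal, self-contained sketch using sklearn only (the column names and transformer here are toy examples, not part of the tutorial's dataset): once the transformer is a pipeline step, raw, untransformed rows can be passed straight to predict.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


# A toy transformer that one-hot encodes a categorical column,
# in the spirit of the PrepareData step above
class OneHotCountry(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.countries_ = sorted(X["country"].unique())
        return self

    def transform(self, X):
        df = X.copy()
        for c in self.countries_:
            df[c] = (df["country"] == c).astype(int)
        return df.drop(columns=["country"])


train = pd.DataFrame({"country": ["US", "DE", "US", "DE"], "spend": [1.0, 0.2, 0.9, 0.1]})
target = [1, 0, 1, 0]

# The transformer and the model travel together as one object
pipeline = make_pipeline(OneHotCountry(), LogisticRegression())
pipeline.fit(train, target)

# Raw rows go straight in; the pipeline applies the transformer first
new_rows = pd.DataFrame({"country": ["US"], "spend": [0.8]})
print(pipeline.predict(new_rows))
```

On the application side, a pipeline stored this way is retrieved with ds.pull and used in exactly the same way, with no separate preprocessing code to ship around.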
For more details on how sklearn pipelines can be deployed and re-used from dstack applications, please read the corresponding tutorial.
Our core vision remains to bring ML into business operations, and we will continue simplifying this process by providing low-code tools to build ML models and put them into production within enterprises.
In the short term, we will make the experience of building interactive applications even simpler. Once we do that, we'll also focus on helping you build and re-use ML models.
Would you like to learn more about our vision, or to share your thoughts with us? Let us know by dropping us an email.
Meanwhile, please do try the product, and share your feedback with us in our tracker.
Call to action
- Star us on GitHub
- Install the update (`pip install dstack==0.6.4` and `dstack server start`)
- Follow Quickstart and Tutorials
- Share your application with others using dstack.cloud.
- Give us your feedback (for bugs, create issues; for other feedback, email the team at dstack.ai or ask in Discord)