diff --git "a/data/dai_docs.train.json" "b/data/dai_docs.train.json"
new file mode 100644
--- /dev/null
+++ "b/data/dai_docs.train.json"
@@ -0,0 +1,7253 @@
+[
+ {
+ "output": " BlueData DataTap Setup\n\n\nThis section provides instructions for configuring Driverless AI to work with BlueData DataTap."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Description of Configuration Attributes\n~\n\n- ``dtap_auth_type``: Selects DTAP authentication."
+ },
+ {
+ "output": " If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user."
+ },
+ {
+ "output": " This folder can contain multiple config files. Note: The DTAP config file core-site.xml needs to contain DTap FS configuration, for example:\n\n ::\n\n <configuration>\n <property>\n <name>fs.dtap.impl</name>\n <value>com.bluedata.hadoop.bdfs.Bdfs</value>\n <description>The FileSystem for BlueData dtap: URIs.</description>\n </property>\n </configuration>\n\n- ``dtap_key_tab_path``: The path of the principal key tab file."
+ },
+ {
+ "output": " - ``dtap_app_principal_user``: The Kerberos app principal user (recommended). - ``dtap_app_login_user``: The user ID of the current user (for example, user@realm)."
+ },
+ {
+ "output": " Separate each argument with spaces. - ``dtap_app_classpath``: The DTap classpath. - ``dtap_init_path``: Specifies the starting DTAP path displayed in the UI of the DTAP browser."
+ },
+ {
+ "output": " This must be configured in order for data connectors to function properly. Example 1: Enable DataTap with No Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the DataTap data connector and disables authentication."
+ },
+ {
+ "output": " This lets users reference data stored in DTap directly using the name node address, for example: ``dtap://name.node/datasets/iris.csv`` or ``dtap://name.node/datasets/``."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,dtap\" \\\n -e DRIVERLESS_AI_DTAP_AUTH_TYPE='noauth' \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure DataTap options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the DataTap data connector and disables authentication in the config.toml file."
+ },
+ {
+ "output": " (Note: The trailing slash is currently required for directories.) 1. Export the Driverless AI config.toml file or add it to ~/.bashrc."
+ },
+ {
+ "output": " Specify the following configuration options in the config.toml file. ::\n\n # File System Support\n # upload : standard upload feature\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n enabled_file_systems = \"file, dtap\"\n\n 3."
+ },
+ {
+ "output": " Example 2: Enable DataTap with Keytab-Based Authentication\n\n\nNotes: \n\n- If using Kerberos Authentication, the time on the Driverless AI server must be in sync with the Kerberos server."
+ },
+ {
+ "output": " - If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user; otherwise, Driverless AI cannot read the keytab, falls back to simple authentication, and fails."
+ },
+ {
+ "output": " - Configures the environment variable ``DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER`` to reference a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Configures the option ``dtap_app_principal_user`` to reference a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n - ``dtap_auth_type = \"keytab\"``\n - ``dtap_key_tab_path = \"/tmp/\"``\n - ``dtap_app_principal_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # file : local file system/server file system\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n enabled_file_systems = \"file, dtap\"\n\n # Blue Data DTap connector settings are similar to HDFS connector settings."
+ },
+ {
+ "output": " If running\n # DAI as a service, then the Kerberos keytab needs to\n # be owned by the DAI user."
+ },
+ {
+ "output": " Save the changes when you are done, then stop/restart Driverless AI. Example 3: Enable DataTap with Keytab-Based Impersonation\n~\n\nNotes: \n\n- If using Kerberos, be sure that the Driverless AI time is synched with the Kerberos server."
+ },
+ {
+ "output": " .. tabs::\n .. group-tab:: Docker Image Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " - Configures the ``DRIVERLESS_AI_DTAP_APP_LOGIN_USER`` variable, which references a user who is being impersonated (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Configures the ``dtap_app_principal_user`` variable, which references a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n - ``dtap_auth_type = \"keytabimpersonation\"``\n - ``dtap_key_tab_path = \"/tmp/\"``\n - ``dtap_app_principal_user = \"\"``\n - ``dtap_app_login_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " - Configures the ``dtap_app_login_user`` variable, which references a user who is being impersonated (usually in the form of user@realm)."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n \n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, dtap\"\n\n # Blue Data DTap connector settings are similar to HDFS connector settings."
+ },
+ {
+ "output": " If running\n # DAI as a service, then the Kerberos keytab needs to\n # be owned by the DAI user."
+ },
+ {
+ "output": " Data Recipe URL Setup\n-\n\nDriverless AI lets you explore data recipe URL data sources from within the Driverless AI application."
+ },
+ {
+ "output": " When enabled (default), you will be able to modify datasets that have been added to Driverless AI. (Refer to :ref:`modify_by_recipe` for more information.)"
+ },
+ {
+ "output": " These steps are provided in case this connector was previously disabled and you want to re-enable it."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Enable Data Recipe URL\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the data recipe URL data connector."
+ },
+ {
+ "output": " Note that ``recipe_url`` is enabled in the config.toml file by default. 1. Configure the Driverless AI config.toml file."
+ },
+ {
+ "output": " - ``enabled_file_systems = \"file, upload, recipe_url\"``\n\n 2. Mount the config.toml file into the Docker container."
+ },
+ {
+ "output": " Note that ``recipe_url`` is enabled by default. 1. Export the Driverless AI config.toml file or add it to ~/.bashrc."
+ },
+ {
+ "output": " Specify the following configuration options in the config.toml file. ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, recipe_url\"\n\n 3."
+ },
+ {
+ "output": " AutoDoc Settings\n\n\nThis section includes settings that can be used to configure AutoDoc. ``make_autoreport``\n~\n\n.. dropdown:: Make AutoDoc\n\t:open:\n\n\tSpecify whether to create an AutoDoc for the experiment after it has finished running."
+ },
+ {
+ "output": " ``autodoc_report_name``\n~\n\n.. dropdown:: AutoDoc Name\n\t:open:\n\n\tSpecify a name for the AutoDoc report."
+ },
+ {
+ "output": " ``autodoc_template``\n\n\n.. dropdown:: AutoDoc Template Location\n\t:open:\n\n\tSpecify a path for the AutoDoc template:\n\n\t- To generate a custom AutoDoc template, specify the full path to your custom template."
+ },
+ {
+ "output": " ``autodoc_output_type``\n~\n\n.. dropdown:: AutoDoc File Output Type\n\t:open:\n\n\tSpecify the AutoDoc output type."
+ },
+ {
+ "output": " Choose from the following:\n\n\t- auto (Default)\n\t- md\n\t- docx\n\n``autodoc_max_cm_size``\n~\n\n.. dropdown:: Confusion Matrix Max Number of Classes\n\t:open:\n\n\tSpecify the maximum number of classes in the confusion matrix."
+ },
+ {
+ "output": " ``autodoc_num_features``\n\n\n.. dropdown:: Number of Top Features to Document\n\t:open:\n\n\tSpecify the number of top features to display in the document."
+ },
+ {
+ "output": " This is set to 50 by default. ``autodoc_min_relative_importance``\n~\n\n.. dropdown:: Minimum Relative Feature Importance Threshold\n\t:open:\n\n\tSpecify the minimum relative feature importance in order for a feature to be displayed."
+ },
+ {
+ "output": " This is set to 0.003 by default. ``autodoc_include_permutation_feature_importance``\n\n\n.. dropdown:: Permutation Feature Importance\n\t:open:\n\n\tSpecify whether to compute permutation-based feature importance."
+ },
+ {
+ "output": " ``autodoc_feature_importance_num_perm``\n~\n\n.. dropdown:: Number of Permutations for Feature Importance\n\t:open:\n\n\tSpecify the number of permutations to make per feature when computing feature importance."
+ },
+ {
+ "output": " ``autodoc_feature_importance_scorer``\n~\n\n.. dropdown:: Feature Importance Scorer\n\t:open:\n\n\tSpecify the name of the scorer to be used when calculating feature importance."
+ },
+ {
+ "output": " ``autodoc_pd_max_rows``\n~\n\n.. dropdown:: PDP Max Number of Rows\n\t:open:\n\n\tSpecify the number of rows for Partial Dependence Plots."
+ },
+ {
+ "output": " Set this value to -1 to disable the time limit. This is set to 20 seconds by default. ``autodoc_out_of_range``\n\n\n.. dropdown:: PDP Out of Range\n\t:open:\n\n\tSpecify the number of standard deviations outside of the range of a column to include in partial dependence plots."
+ },
+ {
+ "output": " This is set to 3 by default. ``autodoc_num_rows``\n\n\n.. dropdown:: ICE Number of Rows\n\t:open:\n\n\tSpecify the number of rows to include in PDP and ICE plots if individual rows are not specified."
+ },
+ {
+ "output": " ``autodoc_population_stability_index``\n\n\n.. dropdown:: Population Stability Index\n\t:open:\n\n\tSpecify whether to include a population stability index if the experiment is a binary classification or regression problem."
+ },
+ {
+ "output": " ``autodoc_population_stability_index_n_quantiles``\n\n\n.. dropdown:: Population Stability Index Number of Quantiles\n\t:open:\n\n\tSpecify the number of quantiles to use for the population stability index."
+ },
+ {
+ "output": " ``autodoc_prediction_stats``\n\n\n.. dropdown:: Prediction Statistics\n\t:open:\n\n\tSpecify whether to include prediction statistics information if the experiment is a binary classification or regression problem."
+ },
+ {
+ "output": " ``autodoc_prediction_stats_n_quantiles``\n\n\n.. dropdown:: Prediction Statistics Number of Quantiles\n\t:open:\n\n\tSpecify the number of quantiles to use for prediction statistics."
+ },
+ {
+ "output": " ``autodoc_response_rate``\n~\n\n.. dropdown:: Response Rates Plot\n\t:open:\n\n\tSpecify whether to include response rates information if the experiment is a binary classification problem."
+ },
+ {
+ "output": " ``autodoc_response_rate_n_quantiles``\n~\n\n.. dropdown:: Response Rates Plot Number of Quantiles\n\t:open:\n\n\tSpecify the number of quantiles to use for response rates information."
+ },
+ {
+ "output": " ``autodoc_gini_plot``\n~\n\n.. dropdown:: Show GINI Plot\n\t:open:\n\n\tSpecify whether to show the GINI plot."
+ },
+ {
+ "output": " ``autodoc_enable_shapley_values``\n~\n\n.. dropdown:: Enable Shapley Values\n\t:open:\n\n\tSpecify whether to show Shapley values results in the AutoDoc."
+ },
+ {
+ "output": " ``autodoc_data_summary_col_num``\n\n\n.. dropdown:: Number of Features in Data Summary Table\n\t:open:\n\n\tSpecify the number of features to be shown in the data summary table."
+ },
+ {
+ "output": " To show all columns, specify any value lower than 1. This is set to -1 by default. ``autodoc_list_all_config_settings``\n\n\n.. dropdown:: List All Config Settings\n\t:open:\n\n\tSpecify whether to show all config settings."
+ },
+ {
+ "output": " All settings are listed when enabled. This is disabled by default. ``autodoc_keras_summary_line_length``\n~\n\n.. dropdown:: Keras Model Architecture Summary Line Length\n\t:open:\n\n\tSpecify the line length of the Keras model architecture summary."
+ },
+ {
+ "output": " To use the default line length, set this value to -1 (default). ``autodoc_transformer_architecture_max_lines``\n\n\n.. dropdown:: NLP/Image Transformer Architecture Max Lines\n\t:open:\n\n\tSpecify the maximum number of lines shown for advanced transformer architecture in the Feature section."
+ },
+ {
+ "output": " ``autodoc_full_architecture_in_appendix``\n~\n\n.. dropdown:: Appendix NLP/Image Transformer Architecture\n\t:open:\n\n\tSpecify whether to show the full NLP/Image transformer architecture in the appendix."
+ },
+ {
+ "output": " ``autodoc_coef_table_appendix_results_table``\n~\n\n.. dropdown:: Full GLM Coefficients Table in the Appendix\n\t:open:\n\n\tSpecify whether to show the full GLM coefficient table(s) in the appendix."
+ },
+ {
+ "output": " ``autodoc_coef_table_num_models``\n~\n\n.. dropdown:: GLM Coefficient Tables Number of Models\n\t:open:\n\n\tSpecify the number of models for which a GLM coefficients table is shown in the AutoDoc."
+ },
+ {
+ "output": " Set this value to -1 to show tables for all models. This is set to 1 by default. ``autodoc_coef_table_num_folds``\n\n\n.. dropdown:: GLM Coefficient Tables Number of Folds Per Model\n\t:open:\n\n\tSpecify the number of folds per model for which a GLM coefficients table is shown in the AutoDoc."
+ },
+ {
+ "output": " ``autodoc_coef_table_num_coef``\n~\n\n.. dropdown:: GLM Coefficient Tables Number of Coefficients\n\t:open:\n\n\tSpecify the number of coefficients to show within a GLM coefficients table in the AutoDoc."
+ },
+ {
+ "output": " Set this value to -1 to show all coefficients. ``autodoc_coef_table_num_classes``\n\n\n.. dropdown:: GLM Coefficient Tables Number of Classes\n\t:open:\n\n\tSpecify the number of classes to show within a GLM coefficients table in the AutoDoc."
+ },
+ {
+ "output": " This is set to 9 by default. ``autodoc_num_histogram_plots``\n~\n\n.. dropdown:: Number of Histograms to Show\n\t:open:\n\n\tSpecify the number of top features for which to show histograms."
+ },
+ {
+ "output": " Snowflake Setup\n- \n\nDriverless AI allows you to explore Snowflake data sources from within the Driverless AI application."
+ },
+ {
+ "output": " This setup requires you to enable authentication. If you enable Snowflake connectors, those file systems will be available in the UI, but you will not be able to use those connectors without authentication."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Description of Configuration Attributes\n~\n\n- ``snowflake_account``: The Snowflake account ID\n- ``snowflake_user``: The username for accessing the Snowflake account\n- ``snowflake_password``: The password for accessing the Snowflake account\n- ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " Enable Snowflake with Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the Snowflake data connector with authentication by passing the ``account``, ``user``, and ``password`` variables."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, snow\"``\n - ``snowflake_account = \"\"``\n - ``snowflake_user = \"\"``\n - ``snowflake_password = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n \n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the Snowflake data connector with authentication by passing the ``account``, ``user``, and ``password`` variables."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, snow\"\n\n # Snowflake Connector credentials\n snowflake_account = \"\"\n snowflake_user = \"\"\n snowflake_password = \"\"\n\n 3."
+ },
+ {
+ "output": " Adding Datasets Using Snowflake\n \n\nAfter the Snowflake connector is enabled, you can add datasets by selecting Snowflake from the Add Dataset (or Drag and Drop) drop-down menu."
+ },
+ {
+ "output": " 1. Enter Database: Specify the name of the Snowflake database that you are querying. 2. Enter Warehouse: Specify the name of the Snowflake warehouse that you are querying."
+ },
+ {
+ "output": " Enter Schema: Specify the schema of the dataset that you are querying. 4. Enter Name for Dataset to Be Saved As: Specify a name for the dataset to be saved as."
+ },
+ {
+ "output": " 5. Enter Username: (Optional) Specify the username associated with this Snowflake account. This can be left blank if ``snowflake_user`` was specified in the config.toml when starting Driverless AI; otherwise, this field is required."
+ },
+ {
+ "output": " Enter Password: (Optional) Specify the password associated with this Snowflake account. This can be left blank if ``snowflake_password`` was specified in the config.toml when starting Driverless AI; otherwise, this field is required."
+ },
+ {
+ "output": " Enter Role: (Optional) Specify your role as designated within Snowflake. See https://docs.snowflake.net/manuals/user-guide/security-access-control-overview.html for more information."
+ },
+ {
+ "output": " Enter Region: (Optional) Specify the region of the warehouse that you are querying. This can be found in the Snowflake-provided URL to access your database (as in ...snowflakecomputing.com)."
+ },
+ {
+ "output": " 9. Enter File Formatting Parameters: (Optional) Specify any additional parameters for formatting your datasets."
+ },
+ {
+ "output": " (Note: Use only parameters for ``TYPE = CSV``.) For example, if your dataset includes a text column that contains commas, you can specify a different delimiter using ``FIELD_DELIMITER='character'``."
+ },
+ {
+ "output": " For example, you might specify the following to load the \"AMAZON_REVIEWS\" dataset:\n\n * Database: UTIL_DB\n * Warehouse: DAI_SNOWFLAKE_TEST\n * Schema: AMAZON_REVIEWS_SCHEMA\n * Query: SELECT * FROM AMAZON_REVIEWS\n * Enter File Formatting Parameters (Optional): FIELD_OPTIONALLY_ENCLOSED_BY = '\"' \n\n In the above example, if the ``FIELD_OPTIONALLY_ENCLOSED_BY`` option is not set, the following row will result in a failure to import the dataset (as the dataset's delimiter is ``,`` by default):\n\n ::\n \n positive, 2012-05-03,Wonderful\\, tasty taffy,0,0,3,5,2012,Thu,0\n\n Note: Numeric columns from Snowflake that have NULL values are sometimes converted to strings (for example, `\\\\ \\\\N`)."
+ },
+ {
+ "output": " 10. Enter Snowflake Query: Specify the Snowflake query that you want to execute. 11. When you are finished, select the Click to Make Query button to add the dataset."
+ },
+ {
+ "output": " .. _install-on-windows:\n\nWindows 10\n\n\nThis section describes how to install, start, stop, and upgrade Driverless AI on a Windows 10 machine."
+ },
+ {
+ "output": " For information on how to obtain a license key for Driverless AI, visit https://h2o.ai/o/try-driverless-ai/."
+ },
+ {
+ "output": " Overview of Installation on Windows\n~\n\nTo install Driverless AI on Windows, use a Driverless AI Docker image."
+ },
+ {
+ "output": " - Scoring is not available on Windows. Caution: Installing Driverless AI on Windows 10 is not recommended for serious use."
+ },
+ {
+ "output": " | Min Mem | Suitable for |\n+=+=+=+=+\n| Windows 10 Pro | No | 16 GB | Experimentation |\n+-+-+-+-+\n| Windows 10 Enterprise | No | 16 GB | Experimentation |\n+-+-+-+-+\n| Windows 10 Education | No | 16 GB | Experimentation |\n+-+-+-+-+\n\nNote: Driverless AI cannot be installed on versions of Windows 10 that do not support Hyper-V."
+ },
+ {
+ "output": " Docker Image Installation\n~\n\nNotes: \n\n- Be aware that there are known issues with Docker for Windows."
+ },
+ {
+ "output": " - Consult with your Windows System Admin if \n\n - Your corporate environment does not allow third-party software installs\n - You are running Windows Defender\n - Your machine is not running with ``Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux``."
+ },
+ {
+ "output": " Note that some of the images in this video may change between releases, but the installation steps remain the same."
+ },
+ {
+ "output": " Installation Procedure\n\n\n1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/."
+ },
+ {
+ "output": " Download, install, and run Docker for Windows from https://docs.docker.com/docker-for-windows/install/."
+ },
+ {
+ "output": " Note that you may have to reboot after installation. 3. Before running Driverless AI, you must:\n\n - Enable shared access to the C drive."
+ },
+ {
+ "output": " - Adjust the amount of memory given to Docker to be at least 10 GB. Driverless AI won\u2019t run at all with less than 10 GB of memory."
+ },
+ {
+ "output": " You can adjust these settings by clicking on the Docker whale in your taskbar (look for hidden tasks, if necessary), then selecting Settings > Shared Drive and Settings > Advanced as shown in the following screenshots."
+ },
+ {
+ "output": " (Docker will restart.) Note that if you cannot make changes, stop Docker and then start Docker again by right-clicking on the Docker icon on your desktop and selecting Run as Administrator."
+ },
+ {
+ "output": " Open a PowerShell terminal and set up a directory for the version of Driverless AI on the host machine: \n\n .. code-block:: bash\n :substitutions:\n\n md |VERSION-dir|\n\n5."
+ },
+ {
+ "output": " Move the downloaded Driverless AI image to your new directory. 6. Change directories to the new directory, then load the image using the following command:\n\n .. code-block:: bash\n :substitutions:\n \n cd |VERSION-dir|\n docker load -i .\\dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n7."
+ },
+ {
+ "output": " .. code-block:: bash\n\n md data\n md log\n md license\n md tmp\n\n8. Copy data into the /data directory."
+ },
+ {
+ "output": " 9. Run ``docker images`` to find the image tag. 10. Start the Driverless AI Docker image. Be sure to replace ``path_to_`` below with the entire path to the location of the folders that you created (for example, \"c:/Users/user-name/driverlessai_folder/data\")."
+ },
+ {
+ "output": " GPU support will not be available. Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command."
+ },
+ {
+ "output": " Add Custom Recipes\n\n\nCustom recipes are Python code snippets that can be uploaded into Driverless AI at runtime like plugins."
+ },
+ {
+ "output": " If you do not have a custom recipe, you can select from a number of recipes available in the `Recipes for H2O Driverless AI repository `_."
+ },
+ {
+ "output": " To add a custom recipe to Driverless AI, click Add Custom Recipe and select one of the following options:\n\n- From computer: Add a custom recipe as a Python or ZIP file from your local file system."
+ },
+ {
+ "output": " - From Bitbucket: Add a custom recipe from a Bitbucket repository. To use this option, your Bitbucket username and password must be provided along with the custom recipe Bitbucket URL."
+ },
+ {
+ "output": " .. _edit-toml:\n\nEditing the TOML Configuration\n\n\nTo open the built-in TOML configuration editor, click TOML in the :ref:`expert-settings` window."
+ },
+ {
+ "output": " For example, if you set the Make MOJO scoring pipeline setting in the Experiment tab to Off, then the line ``make_mojo_scoring_pipeline = \"off\"`` is displayed in the TOML editor."
+ },
+ {
+ "output": " To confirm your changes, click Save. The experiment preview updates to reflect your specified configuration changes."
+ },
+ {
+ "output": " .. note::\n\tDo not edit the section below the ``[recipe_activation]`` line. This section provides Driverless AI with information about which custom recipes can be used by the experiment."
+ },
+ {
+ "output": " .. _h2o_drive:\n\n###############\nH2O Drive setup\n###############\n\nH2O Drive is an object-store for `H2O AI Cloud `_."
+ },
+ {
+ "output": " Note: For more information on the H2O Drive, refer to the `official documentation `_."
+ },
+ {
+ "output": " To enable the Feature Store data connector, ``h2o_drive`` must be added to this list of data sources."
+ },
+ {
+ "output": " - ``h2o_drive_access_token_scopes``: A space-separated list of OpenID scopes for the access token that are used by the H2O Drive connector."
+ },
+ {
+ "output": " - ``authentication_method``: The authentication method used by DAI. When enabling the Feature Store data connector, this must be set to OpenID Connect (``authentication_method=\"oidc\"``)."
+ },
+ {
+ "output": " .. _install-on-macosx:\n\nMac OS X\n\n\nThis section describes how to install, start, stop, and upgrade the Driverless AI Docker image on Mac OS X."
+ },
+ {
+ "output": " Note: Support for GPUs and MOJOs is not available on Mac OS X. The installation steps assume that you have a license key for Driverless AI."
+ },
+ {
+ "output": " Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " Stick to small datasets! For serious use, please use Linux. - Be aware that there are known performance issues with Docker for Mac."
+ },
+ {
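+ "output": " Environment\n~~~~~~~~~~~\n\n+------------------+--------------+---------+-----------------+\n| Operating System | GPU Support? | Min Mem | Suitable for    |\n+==================+==============+=========+=================+\n| Mac OS X         | No           | 16 GB   | Experimentation |\n+------------------+--------------+---------+-----------------+\n\nInstalling Driverless AI\n\n\n1."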
+ },
+ {
+ "output": " 2. Download and run Docker for Mac from https://docs.docker.com/docker-for-mac/install. 3. Adjust the amount of memory given to Docker to be at least 10 GB."
+ },
+ {
+ "output": " You can optionally adjust the number of CPUs given to Docker. You will find the controls by clicking on (Docker Whale)->Preferences->Advanced as shown in the following screenshots."
+ },
+ {
+ "output": " .. image:: ../images/macosx_docker_menu_bar.png\n :align: center\n\n.. image:: ../images/macosx_docker_advanced_preferences.png\n :align: center\n :height: 507\n :width: 382\n\n4."
+ },
+ {
+ "output": " More information is available here: https://docs.docker.com/docker-for-mac/osxfs/#namespaces. .. image:: ../images/macosx_docker_filesharing.png\n :align: center\n :scale: 40%\n\n5."
+ },
+ {
+ "output": " With Docker running, open a Terminal and move the downloaded Driverless AI image to your new directory."
+ },
+ {
+ "output": " Change directories to the new directory, then load the image using the following command:\n\n .. code-block:: bash\n :substitutions:\n\n cd |VERSION-dir|\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n8."
+ },
+ {
+ "output": " Optionally copy data into the data directory on the host. The data will be visible inside the Docker container at /data."
+ },
+ {
+ "output": " 10. Run ``docker images`` to find the image tag. 11. Start the Driverless AI Docker image (still within the new Driverless AI directory)."
+ },
+ {
+ "output": " Note that GPU support will not be available. Also note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI docker command."
+ },
+ {
+ "output": " Connect to Driverless AI with your browser at http://localhost:12345. Stopping the Docker Image\n~\n\n.. include:: stop-docker.rst\n\nUpgrading the Docker Image\n\n\nThis section provides instructions for upgrading Driverless AI versions that were installed in a Docker container."
+ },
+ {
+ "output": " WARNING: Experiments, MLIs, and MOJOs reside in the Driverless AI tmp directory and are not automatically upgraded when Driverless AI is upgraded."
+ },
+ {
+ "output": " - Build MOJO pipelines before upgrading. - Stop Driverless AI and make a backup of your Driverless AI tmp directory before upgrading."
+ },
+ {
+ "output": " Before upgrading, be sure to run MLI jobs on models that you want to continue to interpret in future releases."
+ },
+ {
+ "output": " If you did not build a MOJO pipeline on a model before upgrading Driverless AI, then you will not be able to build a MOJO pipeline on that model after upgrading."
+ },
+ {
+ "output": " Note: Stop Driverless AI if it is still running. Upgrade Steps\n'\n\n1. SSH into the IP address of the machine that is running Driverless AI."
+ },
+ {
+ "output": " Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n # cd into the new directory\n cd |VERSION-dir|\n\n3."
+ },
+ {
+ "output": " 4. Load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " .. _features-settings:\n\nFeatures Settings\n=\n\n``feature_engineering_effort``\n\n\n.. dropdown:: Feature Engineering Effort\n\t:open:\n\n\tSpecify a value from 0 to 10 for the Driverless AI feature engineering effort."
+ },
+ {
+ "output": " This value defaults to 5. - 0: Keep only numeric features. Only model tuning during evolution. - 1: Keep only numeric features and frequency-encoded categoricals."
+ },
+ {
+ "output": " - 2: Similar to 1, but instead only excludes Text features. Some feature tuning before evolution. - 3: Similar to 5 but only tuning during evolution."
+ },
+ {
+ "output": " - 4: Similar to 5 but slightly more focused on model tuning. - 5: Balanced feature-model tuning. (Default)\n\t- 6-7: Similar to 5 but slightly more focused on feature engineering."
+ },
+ {
+ "output": " - 9-10: Similar to 8 but no model tuning during feature evolution. .. _check_distribution_shift:\n\n``check_distribution_shift``\n\n\n.. dropdown:: Data Distribution Shift Detection\n\t:open:\n\n\tSpecify whether Driverless AI should detect data distribution shifts between train/valid/test datasets (if provided)."
+ },
+ {
+ "output": " Currently, this information is only presented to the user and not acted upon. Shifted features should either be dropped."
+ },
+ {
+ "output": " Also see :ref:`drop_features_distribution_shift_threshold_auc ` and :ref:`check_distribution_shift_drop `."
+ },
+ {
+ "output": " This defaults to Auto. Note that Auto for time series experiments turns this feature off. Also see :ref:`drop_features_distribution_shift_threshold_auc ` and :ref:`check_distribution_shift `."
+ },
+ {
+ "output": " When train and test dataset differ (or train/valid or valid/test) in terms of distribution of data, then a model can be built that tells for each row, whether the row is in train or test."
+ },
+ {
+ "output": " If this AUC, GINI, or Spearman correlation of the model is above the specified threshold, then Driverless AI will consider it a strong enough shift to drop those features."
+ },
+ {
+ "output": " .. _check_leakage:\n\n``check_leakage``\n~\n\n.. dropdown:: Data Leakage Detection\n\t:open:\n\n\tSpecify whether to check for data leakage for each feature."
+ },
+ {
+ "output": " This may affect model generalization. Driverless AI runs a model to determine the predictive power of each feature on the target variable."
+ },
+ {
+ "output": " Models with a high AUC (for classification) or R2 score (for regression) are reported to the user as potential leaks."
+ },
+ {
+ "output": " This is set to Auto by default. The equivalent config.toml parameter is ``check_leakage``. Also see :ref:`drop_features_leakage_threshold_auc `\n\n.. _drop_features_leakage_threshold_auc:\n\n``drop_features_leakage_threshold_auc``\n~\n\n.. dropdown:: Data Leakage Detection Dropping AUC/R2 Threshold\n\t:open:\n\n\tIf :ref:`Leakage Detection ` is enabled, specify the threshold for dropping features."
+ },
+ {
+ "output": " This value defaults to 0.999. The equivalent config.toml parameter is ``drop_features_leakage_threshold_auc``."
+ },
+ {
+ "output": " This value defaults to 10,000,000. ``max_features_importance``\n~\n\n.. dropdown:: Max. num. features for variable importance\n\t:open:\n\n\tSpecify the maximum number of features to use and show in importance tables."
+ },
+ {
+ "output": " Higher values can lead to lower performance and larger disk space used for datasets with more than 100k columns."
+ },
+ {
+ "output": " of columns > no. of rows). The default value is \"auto\", which automatically enables the wide rules when the number of columns is detected to be greater than the number of rows."
+ },
+ {
+ "output": " Enabling wide data rules sets all ``max_cols``, ``max_orig_*col``, and ``fs_orig*`` tomls to large values, and enforces monotonicity to be disabled unless ``monotonicity_constraints_dict`` is set or default value of ``monotonicity_constraints_interpretability_switch`` is changed."
+ },
+ {
+ "output": " And enables :ref:`XGBoost Random Forest model ` for modeling. To disable wide rules, set ``enable_wide_rules`` to \"off\"."
+ },
+ {
+ "output": " Also see :ref:`wide_datasets_dai` for a quick model run. ``orig_features_fs_report``\n~\n\n.. dropdown:: Report Permutation Importance on Original Features\n\t:open:\n\n\tSpecify whether Driverless AI reports permutation importance on original features (represented as normalized change in the chosen metric) in logs and the report file."
+ },
+ {
+ "output": " ``max_rows_fs``\n~\n\n.. dropdown:: Maximum Number of Rows to Perform Permutation-Based Feature Selection\n\t:open:\n\n\tSpecify the maximum number of rows when performing permutation feature importance, reduced by (stratified) random sampling."
+ },
+ {
+ "output": " ``max_orig_cols_selected``\n\n\n.. dropdown:: Max Number of Original Features Used\n\t:open:\n\n\tSpecify the maximum number of columns to be selected from an existing set of columns using feature selection."
+ },
+ {
+ "output": " For categorical columns, the selection is based upon how well target encoding (or frequency encoding if not available) on categoricals and numerics treated as categoricals helps."
+ },
+ {
+ "output": " First the best [max_orig_cols_selected] are found through feature selection methods and then these features are used in feature evolution (to derive other features) and in modelling."
+ },
+ {
+ "output": " Feature selection is performed on all features when this value is exceeded. This value defaults to 300."
+ },
+ {
+ "output": " This value defaults to 10,000,000. Additional columns above the specified value add a special individual with original columns reduced."
+ },
+ {
+ "output": " Note that this is applicable only to special individuals with original columns reduced. A separate individual in the :ref:`genetic algorithm ` is created by doing feature selection by permutation importance on original features."
+ },
+ {
+ "output": " ``fs_orig_nonnumeric_cols_selected``\n\n\n.. dropdown:: Number of Original Non-Numeric Features to Trigger Feature Selection Model Type\n\t:open:\n\n\tThe maximum number of original non-numeric columns, above which Driverless AI will do feature selection on all features."
+ },
+ {
+ "output": " A separate individual in the :ref:`genetic algorithm ` is created by doing feature selection by permutation importance on original features."
+ },
+ {
+ "output": " ``max_relative_cardinality``\n\n\n.. dropdown:: Max Allowed Fraction of Uniques for Integer and Categorical Columns\n\t:open:\n\n\tSpecify the maximum fraction of unique values for integer and categorical columns."
+ },
+ {
+ "output": " This value defaults to 0.95. .. _num_as_cat:\n\n``num_as_cat``\n\n\n.. dropdown:: Allow Treating Numerical as Categorical\n\t:open:\n\n\tSpecify whether to allow some numerical features to be treated as categorical features."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``num_as_cat``. ``max_int_as_cat_uniques``\n\n\n.. dropdown:: Max Number of Unique Values for Int/Float to be Categoricals\n\t:open:\n\n\tSpecify the number of unique values for integer or real columns to be treated as categoricals."
+ },
+ {
+ "output": " ``max_fraction_invalid_numeric``\n\n\n.. dropdown:: Max. fraction of numeric values to be non-numeric (and not missing) for a column to still be considered numeric\n\t:open:\n\n\tWhen the fraction of non-numeric (and non-missing) values is less or equal than this value, consider the column numeric."
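+ "output": " ``max_fraction_invalid_numeric``\n\n\n.. dropdown:: Max. fraction of numeric values to be non-numeric (and not missing) for a column to still be considered numeric\n\t:open:\n\n\tWhen the fraction of non-numeric (and non-missing) values is less than or equal to this value, consider the column numeric."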
+ {
+ "output": " Note: Replaces non-numeric values with missing values at start of experiment, so some information is lost, but column is now treated as numeric, which can help."
+ },
+ {
+ "output": " .. _nfeatures_max:\n\n``nfeatures_max``\n~\n\n.. dropdown:: Max Number of Engineered Features\n\t:open:\n\n\tSpecify the maximum number of features to be included per model (and in each model within the final model if an ensemble)."
+ },
+ {
+ "output": " Final ensemble will exclude any pruned-away features and only train on kept features, but may contain a few new features due to fitting on different data view (e.g."
+ },
+ {
+ "output": " Final scoring pipeline will exclude any pruned-away features, but may contain a few new features due to fitting on different data view (e.g."
+ },
+ {
+ "output": " The default value of -1 means no restrictions are applied for this parameter except internally-determined memory and interpretability restrictions."
+ },
+ {
+ "output": " Otherwise, only mutations of scored individuals will be pruned (until the final model where limits are strictly applied)."
+ },
+ {
+ "output": " * E.g. to generally limit every iteration to exactly 1 feature, one must set ``nfeatures_max`` = ``ngenes_max`` = 1 and ``remove_scored_0gain_genes_in_postprocessing_above_interpretability`` = 0, but the genetic algorithm will have a harder time finding good features."
+ },
+ {
+ "output": " .. _ngenes_max:\n\n``ngenes_max``\n\n\n.. dropdown:: Max Number of Genes\n\t:open:\n\n\tSpecify the maximum number of genes (transformer instances) kept per model (and per each model within the final model for ensembles)."
+ },
+ {
+ "output": " If restriction occurs after scoring features, then aggregated gene importances are used for pruning genes."
+ },
+ {
+ "output": " A value of -1 means no restrictions except internally-determined memory and interpretability restriction."
+ },
+ {
+ "output": " ``features_allowed_by_interpretability``\n\n\n.. dropdown:: Limit Features by Interpretability\n\t:open:\n\n\tSpecify whether to limit feature counts with the Interpretability training setting as specified by the ``features_allowed_by_interpretability`` :ref:`config.toml ` setting."
+ },
+ {
+ "output": " This value defaults to 7. Also see :ref:`monotonic gbm recipe ` and :ref:`Monotonicity Constraints in Driverless AI ` for reference."
+ },
+ {
+ "output": " This value defaults to 0.1. Note: This setting is only enabled when Interpretability is greater than or equal to the value specified by the :ref:`enable-constraints` setting and when the :ref:`constraints-override` setting is not specified."
+ },
+ {
+ "output": " ``monotonicity_constraints_log_level``\n\n\n.. dropdown:: Control amount of logging when calculating automatic monotonicity constraints (if enabled)\n\t:open:\n\n\tFor models that support monotonicity constraints, and if enabled, show automatically determined monotonicity constraints for each feature going into the model based on its correlation with the target."
+ },
+ {
+ "output": " 'medium' shows correlation of positively and negatively constrained features. 'high' shows all correlation values."
+ },
+ {
+ "output": " .. _monotonicity-constraints-drop-low-correlation-features:\n\n``monotonicity_constraints_drop_low_correlation_features``\n\n\n.. dropdown:: Whether to drop features that have no monotonicity constraint applied (e.g., due to low correlation with target)\n\t:open:\n\n\tIf enabled, only monotonic features with +1/-1 constraints will be passed to the model(s), and features without monotonicity constraints (0) will be dropped."
+ },
+ {
+ "output": " Only active when interpretability >= monotonicity_constraints_interpretability_switch or monotonicity_constraints_dict is provided."
+ },
+ {
+ "output": " .. _constraints-override:\n\n``monotonicity_constraints_dict``\n\n\n.. dropdown:: Manual Override for Monotonicity Constraints\n\t:open:\n\n\tSpecify a list of features for which monotonicity constraints are applied."
+ },
+ {
+ "output": " The following is an example of how this list can be specified:\n\n\t::\n\n\t \"{'PAY_0': -1, 'PAY_2': -1, 'AGE': -1, 'BILL_AMT1': 1, 'PAY_AMT1': -1}\"\n\n\tNote: If a list is not provided, then the automatic correlation-based method is used when monotonicity constraints are enabled at high enough interpretability settings."
+ },
+ {
+ "output": " .. _max-feature-interaction-depth:\n\n``max_feature_interaction_depth``\n~\n\n.. dropdown:: Max Feature Interaction Depth\n\t:open:\n\n\tSpecify the maximum number of features to use for interaction features like grouping for target encoding, weight of evidence, and other likelihood estimates."
+ },
+ {
+ "output": " The interaction can take multiple forms (i.e. feature1 + feature2 or feature1 * feature2 + \u2026 featureN)."
+ },
+ {
+ "output": " The depth of the interaction level (as in \"up to\" how many features may be combined at once to create one single feature) can be specified to control the complexity of the feature engineering process."
+ },
+ {
+ "output": " This value defaults to 8. Set Max Feature Interaction Depth to 1 to disable all feature interactions (``max_feature_interaction_depth=1``)."
+ },
+ {
+ "output": " To use all features for each transformer, set this to be equal to the number of columns. To do a 50/50 sample and a fixed feature interaction depth of :math:`n` features, set this to -:math:`n`."
+ },
+ {
+ "output": " Target encoding refers to several different feature transformations (primarily focused on categorical data) that aim to represent the feature using information of the actual target variable."
+ },
+ {
+ "output": " These types of features can be very predictive but are prone to overfitting and require more memory as they need to store mappings of the unique categories and the target values."
+ },
+ {
+ "output": " The degree to which GINI is inaccurate is also used to perform fold-averaging of look-up tables instead of using global look-up tables."
+ },
+ {
+ "output": " ``enable_lexilabel_encoding``\n~\n\n.. dropdown:: Enable Lexicographical Label Encoding\n\t:open:\n\n\tSpecify whether to enable lexicographical label encoding."
+ },
+ {
+ "output": " ``enable_isolation_forest``\n~\n\n.. dropdown:: Enable Isolation Forest Anomaly Score Encoding\n\t:open:\n\n\t`Isolation Forest `__ is useful for identifying anomalies or outliers in data."
+ },
+ {
+ "output": " This split depends on how long it takes to separate the points. Random partitioning produces noticeably shorter paths for anomalies."
+ },
+ {
+ "output": " This option lets you specify whether to return the anomaly score of each sample. This is disabled by default."
+ },
+ {
+ "output": " The default Auto setting is only applicable for small datasets and GLMs. ``isolation_forest_nestimators``\n\n\n.. dropdown:: Number of Estimators for Isolation Forest Encoding\n\t:open:\n\n\tSpecify the number of estimators for `Isolation Forest `__ encoding."
+ },
+ {
+ "output": " ``drop_constant_columns``\n~\n\n.. dropdown:: Drop Constant Columns\n\t:open:\n\n\tSpecify whether to drop columns with constant values."
+ },
+ {
+ "output": " ``drop_id_columns``\n~\n\n.. dropdown:: Drop ID Columns\n\t:open:\n\n\tSpecify whether to drop columns that appear to be an ID."
+ },
+ {
+ "output": " ``no_drop_features``\n\n\n.. dropdown:: Don't Drop Any Columns\n\t:open:\n\n\tSpecify whether to avoid dropping any columns (original or derived)."
+ },
+ {
+ "output": " .. _features_to_drop:\n\n``cols_to_drop``\n\n\n.. dropdown:: Features to Drop\n\t:open:\n\n\tSpecify which features to drop."
+ },
+ {
+ "output": " .. _cols_to_force_in:\n\n``cols_to_force_in``\n~\n\n.. dropdown:: Features to always keep or force in, e.g."
+ },
+ {
+ "output": " Forced-in features are handled by the most interpretable transformers allowed by the experiment options, and they are never removed (even if the model assigns 0 importance to them)."
+ },
+ {
+ "output": " When this field is left empty (default), Driverless AI automatically searches all columns (either at random or based on which columns have high variable importance)."
+ },
+ {
+ "output": " This is disabled by default. ``agg_funcs_for_group_by``\n\n\n.. dropdown:: Aggregation Functions (Non-Time-Series) for Group By Operations\n\t:open:\n\n\tSpecify whether to enable aggregation functions to use for group by operations."
+ },
+ {
+ "output": " Out-of-fold aggregations will result in less overfitting, but they analyze less data in each fold. The default value is 5."
+ },
+ {
+ "output": " Select from the following:\n\n\t- sample: Sample transformer parameters (Default)\n\t- batched: Perform multiple types of the same transformation together\n\t- full: Perform more types of the same transformation together than the above strategy\n\n``dump_varimp_every_scored_indiv``\n\n\n.. dropdown:: Enable Detailed Scored Features Info\n\t:open:\n\n\tSpecify whether to dump every scored individual's variable importance (both derived and original) to a csv/tabulated/json file."
+ },
+ {
+ "output": " This is disabled by default. ``dump_trans_timings``\n\n\n.. dropdown:: Enable Detailed Logs for Timing and Types of Features Produced\n\t:open:\n\n\tSpecify whether to dump every scored fold's timing and feature info to a timings.txt file."
+ },
+ {
+ "output": " ``compute_correlation``\n~\n\n.. dropdown:: Compute Correlation Matrix\n\t:open:\n\n\tSpecify whether to compute training, validation, and test correlation matrixes."
+ },
+ {
+ "output": " Note that this setting is currently a single threaded process that may be slow for experiments with many columns."
+ },
+ {
+ "output": " ``interaction_finder_gini_rel_improvement_threshold``\n~\n\n.. dropdown:: Required GINI Relative Improvement for Interactions\n\t:open:\n\n\tSpecify the required GINI relative improvement value for the InteractionTransformer."
+ },
+ {
+ "output": " If the data is noisy and there is no clear signal in interactions, this value can be decreased to return interactions."
+ },
+ {
+ "output": " ``interaction_finder_return_limit``\n~\n\n.. dropdown:: Number of Transformed Interactions to Make\n\t:open:\n\n\tSpecify the number of transformed interactions to make from generated trial interactions."
+ },
+ {
+ "output": " This value defaults to 5. .. _enable_rapids_transformers:\n\n``enable_rapids_transformers``\n\n\n.. dropdown:: Whether to enable RAPIDS cuML GPU transformers (no mojo)\n\t:open:\n\n\tSpecify whether to enable GPU-based `RAPIDS cuML `__ transformers."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_rapids_transformers`` and the default value is False."
+ },
+ {
+ "output": " This setting also sets the overall scale for lower interpretability settings. Set this to a lower value if you're content with having many weak features despite choosing high interpretability, or if you see a drop in performance due to the need for weak features."
+ },
+ {
+ "output": " Delta improvement of score corresponds to original metric minus metric of shuffled feature frame if maximizing metric, and corresponds to negative of such a score difference if minimizing."
+ },
+ {
+ "output": " Note, if using tree methods, multiple depths may be fitted, in which case regardless of this toml setting, only features that are kept for all depths are kept by feature selection."
+ },
+ {
+ "output": " .. _linux:\n\nLinux x86_64 Installs\n-\n\nThis section provides installation steps for RPM, deb, and tar installs in Linux x86_64 environments."
+ },
+ {
+ "output": " Hive Setup\n\n\nDriverless AI lets you explore Hive data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " - ``hive_app_configs``: Configuration for Hive Connector. Inputs are similar to configuring the HDFS connector."
+ },
+ {
+ "output": " This can have multiple files (e.g. hive-site.xml, hdfs-site.xml, etc.) - ``auth_type``: Specify one of ``noauth``, ``keytab``, or ``keytabimpersonation`` for Kerberos authentication\n - ``keytab_path``: Specify the path to Kerberos keytab to use for authentication (this can be ``\"\"`` if using ``auth_type=\"noauth\"``)\n - ``principal_user``: Specify the Kerberos app principal user (required when using ``auth_type=\"keytab\"`` or ``auth_type=\"keytabimpersonation\"``)\n\nNotes:\n\n- With Hive connectors, it is assumed that DAI is running on the edge node."
+ },
+ {
+ "output": " missing classes, dependencies, authorization errors). - Ensure the core-site.xml file (for example, from the Hadoop conf) is also present in the Hive conf with the rest of the files (hive-site.xml, hdfs-site.xml, etc.)."
+ },
+ {
+ "output": " ``hadoop.proxyuser.hive.hosts`` & ``hadoop.proxyuser.hive.groups``). - If you have tez as the Hive execution engine, make sure that the required tez dependencies (classpaths, jars, etc.)"
+ },
+ {
+ "output": " Alternatively, you can use internal engines that come with DAI by changing your ``hive.execution.engine`` value in the hive-site.xml file to ``mr`` or ``spark``."
+ },
+ {
+ "output": " For example:\n \n ::\n\n \"\"\"{\n \"hive_connection_1\": {\n \"hive_conf_path\": \"/path/to/hive/conf\",\n \"auth_type\": \"one of ['noauth', 'keytab',\n 'keytabimpersonation']\",\n \"keytab_path\": \"/path/to/.keytab\",\n \"principal_user\": \"hive/node1.example.com@EXAMPLE.COM\",\n },\n \"hive_connection_2\": {\n \"hive_conf_path\": \"/path/to/hive/conf_2\",\n \"auth_type\": \"one of ['noauth', 'keytab', \n 'keytabimpersonation']\",\n \"keytab_path\": \"/path/to/.keytab\",\n \"principal_user\": \"hive/node2.example.com@EXAMPLE.COM\",\n }\n }\"\"\"\n\n \\ Note: The expected input of ``hive_app_configs`` is a `JSON string `__."
+ },
+ {
+ "output": " Depending on how the configuration value is applied, different forms of outer quotations may be required."
+ },
+ {
+ "output": " - Configuration value applied with the config.toml file:\n\n ::\n\n hive_app_configs = \"\"\"{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}\"\"\"\n\n - Configuration value applied with an environment variable:\n\n ::\n\n DRIVERLESS_AI_HIVE_APP_CONFIGS='{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}'\n\n- ``hive_app_jvm_args``: Optionally specify additional Java Virtual Machine (JVM) args for the Hive connector."
+ },
+ {
+ "output": " Notes:\n\n - If a custom `JAAS configuration file `__ is needed for your Kerberos setup, use ``hive_app_jvm_args`` to specify the appropriate file:\n\n ::\n\n hive_app_jvm_args = \"-Xmx20g -Djava.security.auth.login.config=/etc/dai/jaas.conf\"\n\n Sample ``jaas.conf`` file:\n ::\n\n com.sun.security.jgss.initiate {\n com.sun.security.auth.module.Krb5LoginModule required\n useKeyTab=true\n useTicketCache=false\n principal=\"hive/localhost@EXAMPLE.COM\" [Replace this line]\n doNotPrompt=true\n keyTab=\"/path/to/hive.keytab\" [Replace this line]\n debug=true;\n };\n\n- ``hive_app_classpath``: Optionally specify an alternative classpath for the Hive connector."
+ },
+ {
+ "output": " This can be done by specifying each environment variable in the ``nvidia-docker run`` command or by editing the configuration options in the config.toml file and then specifying that file in the ``nvidia-docker run`` command."
+ },
+ {
+ "output": " Start the Driverless AI Docker Image. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs,hive\" \\\n -e DRIVERLESS_AI_HIVE_APP_CONFIGS='{\"hive_connection_2\": {\"hive_conf_path\":\"/etc/hadoop/conf\",\n \"auth_type\":\"keytabimpersonation\",\n \"keytab_path\":\"/etc/dai/steam.keytab\",\n \"principal_user\":\"steam/mr-0xg9.0xdata.loc@H2OAI.LOC\"}}' \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -v /path/to/hive/conf:/path/to/hive/conf/in/docker \\\n -v /path/to/hive.keytab:/path/in/docker/hive.keytab \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure Hive options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " Enable and configure the Hive connector in the Driverless AI config.toml file. The Hive connector configuration must be a JSON/Dictionary string with multiple keys."
+ },
+ {
+ "output": " Mount the config.toml file into the Docker container. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -v /path/to/hive/conf:/path/to/hive/conf/in/docker \\\n -v /path/to/hive.keytab:/path/in/docker/hive.keytab \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n\n .. group-tab:: Native Installs\n\n This enables the Hive connector."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\"\n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs, s3, hive\"\n\n \n # Configuration for Hive Connector\n # Note that inputs are similar to configuring HDFS connectivity\n # Important keys:\n # * hive_conf_path - path to hive configuration, may have multiple files."
+ },
+ {
+ "output": " Required when using auth_type `keytab` or `keytabimpersonation`\n # JSON/Dictionary String with multiple keys."
+ },
+ {
+ "output": " Save the changes when you are done, then stop/restart Driverless AI. Adding Datasets Using Hive\n~\n\nAfter the Hive connector is enabled, you can add datasets by selecting Hive from the Add Dataset (or Drag and Drop) drop-down menu."
+ },
+ {
+ "output": " Select the Hive configuration that you want to use. .. figure:: ../images/hive_select_configuration.png\n :alt: Select Hive configuration\n\n2."
+ },
+ {
+ "output": " - Hive Database: Specify the name of the Hive database that you are querying. - Hadoop Configuration Path: Specify the path to your Hive configuration file."
+ },
+ {
+ "output": " - Hive Kerberos Principal: Specify the Hive Kerberos principal. This is required if the Hive Authentication Type is keytabimpersonation."
+ },
+ {
+ "output": " This can be noauth, keytab, or keytabimpersonation. - Enter Name for Dataset to be saved as: Optionally specify a new name for the dataset that you are uploading."
+ },
+ {
+ "output": " Install on Ubuntu\n-\n\nThis section describes how to install the Driverless AI Docker image on Ubuntu."
+ },
+ {
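+ "output": " Environment\n~~~~~~~~~~~\n\n+------------------+-------+---------+\n| Operating System | GPUs? | Min Mem |\n+==================+=======+=========+\n| Ubuntu with GPUs | Yes   | 64 GB   |\n+------------------+-------+---------+\n| Ubuntu with CPUs | No    | 64 GB   |\n+------------------+-------+---------+\n\n.. _install-on-ubuntu-with-gpus:\n\nInstall on Ubuntu with GPUs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNote: Driverless AI is supported on Ubuntu 16.04 or later."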
+ },
+ {
+ "output": " Once you are logged in, perform the following steps. 1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/."
+ },
+ {
+ "output": " 2. Install and run Docker on Ubuntu (if not already installed):\n\n .. code-block:: bash\n\n # Install and run Docker on Ubuntu\n curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -\n sudo apt-key fingerprint 0EBFCD88\n sudo add-apt-repository \\\n \"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\"\n sudo apt-get update\n sudo apt-get install docker-ce\n sudo systemctl start docker\n\n3."
+ },
+ {
+ "output": " More information is available at https://github.com/NVIDIA/nvidia-docker/blob/master/README.md. .. code-block:: bash\n\n curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \\\n sudo apt-key add -\n distribution=$(."
+ },
+ {
+ "output": " Verify that the NVIDIA driver is up and running. If the driver is not up and running, log on to http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest NVIDIA Tesla V/P/K series driver: \n\n .. code-block:: bash\n\n nvidia-smi\n\n5."
+ },
+ {
+ "output": " Change directories to the new folder, then load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the new directory\n cd |VERSION-dir|\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n7."
+ },
+ {
+ "output": " Note that this needs to be run once every reboot. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Set up the data, log, and license directories on the host machine:\n\n .. code-block:: bash\n\n # Set up the data, log, license, and tmp directories on the host machine (within the new directory)\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n9."
+ },
+ {
+ "output": " The data will be visible inside the Docker container. 10. Run ``docker images`` to find the image tag."
+ },
+ {
+ "output": " Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your install version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " We recommend ``--shm-size=256m`` in the Docker launch command. If you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended instead."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n12."
+ },
+ {
+ "output": " This section describes how to install and start the Driverless AI Docker image on Ubuntu. Note that this uses ``docker`` and not ``nvidia-docker``."
+ },
+ {
+ "output": " Watch the installation video `here `__."
+ },
+ {
+ "output": " Open a Terminal and ssh to the machine that will run Driverless AI. Once you are logged in, perform the following steps."
+ },
+ {
+ "output": " Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. 2. Install and run Docker on Ubuntu (if not already installed):\n\n .. code-block:: bash\n\n # Install and run Docker on Ubuntu\n curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -\n sudo apt-key fingerprint 0EBFCD88\n sudo add-apt-repository \\\n \"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\"\n sudo apt-get update\n sudo apt-get install docker-ce\n sudo systemctl start docker\n\n3."
+ },
+ {
+ "output": " Change directories to the new folder, then load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the new directory\n cd |VERSION-dir|\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " At this point, you can copy data into the data directory on the host machine. The data will be visible inside the Docker container."
+ },
+ {
+ "output": " Run ``docker images`` to find the new image tag. 8. Start the Driverless AI Docker image. Note that GPU support will not be available."
+ },
+ {
+ "output": " We recommend ``--shm-size=256m`` in the Docker launch command. If you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended instead."
+ },
+ {
+ "output": " .. _linux-tarsh:\n\nLinux TAR SH\n\n\nThe Driverless AI software is available for use in pure user-mode environments as a self-extracting TAR SH archive."
+ },
+ {
+ "output": " This artifact has the same compatibility matrix as the RPM and DEB packages (combined); it is just packaged slightly differently."
+ },
+ {
+ "output": " The installation steps assume that you have a valid license key for Driverless AI. For information on how to obtain a license key for Driverless AI, visit https://www.h2o.ai/products/h2o-driverless-ai/."
+ },
+ {
+ "output": " .. note::\n\tTo ensure that :ref:`AutoDoc ` pipeline visualizations are generated correctly on native installations, installing `fontconfig `_ is recommended."
+ },
+ {
+ "output": " Note that if you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02\n- OpenCL (Required for full LightGBM support on GPU-powered systems)\n- Driverless AI TAR SH, available from https://www.h2o.ai/download/\n\nNote: CUDA 11.2.2 (for GPUs) and cuDNN (required for TensorFlow support on GPUs) are included in the Driverless AI package."
+ },
+ {
+ "output": " To install OpenCL, run the following as root:\n\n.. code-block:: bash\n\n mkdir -p /etc/OpenCL/vendors && echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd && chmod a+r /etc/OpenCL/vendors/nvidia.icd && chmod a+x /etc/OpenCL/vendors/ && chmod a+x /etc/OpenCL\n\n.. note::\n\tIf OpenCL is not installed, then CUDA LightGBM is automatically used."
+ },
+ {
+ "output": " Installing Driverless AI\n\n\nRun the following commands to install the Driverless AI TAR SH. .. code-block:: bash\n :substitutions:\n\n # Install Driverless AI."
+ },
+ {
+ "output": " Starting Driverless AI\n\n\n.. code-block:: bash\n \n # Start Driverless AI. ./run-dai.sh\n\nStarting NVIDIA Persistence Mode\n\n\nIf you have NVIDIA GPUs, you must run the following NVIDIA command."
+ },
+ {
+ "output": " For more information: http://docs.nvidia.com/deploy/driver-persistence/index.html. .. include:: enable-persistence.rst\n\nInstall OpenCL\n\n\nOpenCL is required in order to run LightGBM on GPUs."
+ },
+ {
+ "output": " .. code-block:: bash\n\n yum -y clean all\n yum -y makecache\n yum -y update\n wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/c/clinfo-2.1.17.02.09-1.el7.x86_64.rpm\n wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/o/ocl-icd-2.2.12-1.el7.x86_64.rpm\n rpm -if clinfo-2.1.17.02.09-1.el7.x86_64.rpm\n rpm -if ocl-icd-2.2.12-1.el7.x86_64.rpm\n clinfo\n\n mkdir -p /etc/OpenCL/vendors && \\\n echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd\n\nLooking at Driverless AI log files\n\n\n.. code-block:: bash\n\n less log/dai.log\n less log/h2o.log\n less log/procsy.log\n less log/vis-server.log\n\nStopping Driverless AI\n\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " By default, all files for Driverless AI are contained within this directory. Upgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers."
+ },
+ {
+ "output": " .. note::\n\tIf you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02. Upgrade Steps\n'\n\n1."
+ },
+ {
+ "output": " 2. Run the self-extracting archive for the new version of Driverless AI. 3. Port any previous changes you made to your config.toml file to the newly unpacked directory."
+ },
+ {
+ "output": " Copy the tmp directory (which contains all the Driverless AI working state) from your previous Driverless AI installation into the newly unpacked directory."
+ },
+ {
+ "output": " Experiment Settings\n=\n\nThis section includes settings that can be used to customize the experiment like total runtime, reproducibility level, pipeline building, feature brain control, adding config.toml settings and more."
+ },
+ {
+ "output": " This is equivalent to pushing the Finish button once half of the specified time value has elapsed. Note that the overall enforced runtime is only an approximation."
+ },
+ {
+ "output": " The Finish button will be automatically selected once 12 hours have elapsed, and Driverless AI will subsequently attempt to complete the overall experiment in the remaining 12 hours."
+ },
+ {
+ "output": " Note that this setting applies per experiment, so if you are building n leaderboard models, it applies to each experiment separately (i.e., the total allowed runtime will be n*24hrs)."
+ },
+ {
+ "output": " This option preserves experiment artifacts that have been generated for the summary and log zip files while continuing to generate additional artifacts."
+ },
+ {
+ "output": " Note that this setting applies per experiment, so if you are building n leaderboard models, it applies to each experiment separately (i.e., the total allowed runtime will be n*7days)."
+ },
+ {
+ "output": " Also see :ref:`time_abort `. .. _time_abort:\n\n``time_abort``\n\n\n.. dropdown:: Time to Trigger the 'Abort' Button\n\t:open:\n\n\tIf the experiment is not done by this time, push the abort button."
+ },
+ {
+ "output": " Also see :ref:`max_runtime_minutes_until_abort ` for control over per experiment abort times."
+ },
+ {
+ "output": " User can also specify integer seconds since 1970-01-01 00:00:00 UTC. This will apply to the time on a DAI worker that runs the experiments."
+ },
+ {
+ "output": " If user clones this experiment to rerun/refit/restart, this absolute time will apply to such experiments or set of leaderboard experiments."
+ },
+ {
+ "output": " Select from the following:\n\n\t- Auto: Specifies that all models and features are automatically determined by experiment settings, config.toml settings, and the feature engineering effort."
+ },
+ {
+ "output": " - Only uses GLM or booster as 'gblinear'. - :ref:`Fixed ensemble level ` is set to 0."
+ },
+ {
+ "output": " - Max feature interaction depth is set to 1 i.e no interactions. - Target transformers is set to 'identity' for regression."
+ },
+ {
+ "output": " - :ref:`monotonicity_constraints_correlation_threshold ` is set to 0."
+ },
+ {
+ "output": " - Drops features that are not correlated with target by at least 0.01. See :ref:`monotonicity-constraints-drop-low-correlation-features ` and :ref:`monotonicity-constraints-correlation-threshold `."
+ },
+ {
+ "output": " - :ref:`Interaction depth ` is set to 1 i.e no multi-feature interactions done to avoid complexity."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``recipe=['monotonic_gbm']``. - :ref:`num_as_cat ` feature transformation is disabled."
+ },
+ {
+ "output": " - Kaggle: Similar to Auto except for the following:\n\n\t\t- Any external validation set is concatenated with the train set, with the target marked as missing."
+ },
+ {
+ "output": " - Opens up limits for several config.toml expert options. - nlp_model: Only enable NLP BERT models based on PyTorch to process pure text."
+ },
+ {
+ "output": " For more information, see :ref:`nlp-in-dai`. - included_models = ['TextBERTModel', 'TextMultilingualBERTModel', 'TextXLNETModel', 'TextXLMModel','TextRoBERTaModel', 'TextDistilBERTModel', 'TextALBERTModel', 'TextCamemBERTModel', 'TextXLMRobertaModel']\n\t\t- enable_pytorch_nlp_transformer = 'off'\n\t\t- enable_pytorch_nlp_model = 'on'\n\n\t- nlp_transformer: Only enable PyTorch based BERT transformers that process pure text."
+ },
+ {
+ "output": " For more information, see :ref:`nlp-in-dai`. - included_transformers = ['BERTTransformer']\n\t\t- excluded_models = ['TextBERTModel', 'TextMultilingualBERTModel', 'TextXLNETModel', 'TextXLMModel','TextRoBERTaModel', 'TextDistilBERTModel', 'TextALBERTModel', 'TextCamemBERTModel', 'TextXLMRobertaModel']\n\t\t- enable_pytorch_nlp_transformer = 'on'\n\t\t- enable_pytorch_nlp_model = 'off'\n\n\t- image_model: Only enable image models that process pure images (ImageAutoModel)."
+ },
+ {
+ "output": " For more information, see :ref:`image-model`. Notes:\n\n \t\t- This option disables the :ref:`Genetic Algorithm ` (GA)."
+ },
+ {
+ "output": " - image_transformer: Only enable the ImageVectorizer transformer, which processes pure images. For more information, see :ref:`image-embeddings`."
+ },
+ {
+ "output": " :ref:`See ` for reference. - gpus_max: Maximize use of GPUs (e.g. use XGBoost, RAPIDS, Optuna hyperparameter search, etc."
+ },
+ {
+ "output": " Each pipeline building recipe mode can be chosen and then fine-tuned using the expert settings. Changing the pipeline building recipe will reset all pipeline building recipe options back to default and then re-apply the specific rules for the new mode, which will undo any fine-tuning of expert options that are part of pipeline building recipe rules."
+ },
+ {
+ "output": " To reset recipe behavior, one can switch between 'auto' and the desired mode. This way the new child experiment will use the default settings for the chosen recipe."
+ },
+ {
+ "output": " This is the same as 'on' unless it is a pure NLP or Image experiment. - on: Driverless AI genetic algorithm is used for feature engineering and model tuning and selection."
+ },
+ {
+ "output": " In the Optuna case, the scores shown in the iteration panel are the best score and trial scores. Optuna mode currently only uses Optuna for XGBoost, LightGBM, and CatBoost (custom recipe)."
+ },
+ {
+ "output": " - off: When set to 'off', the final pipeline is trained using the default feature engineering and feature selection."
+ },
+ {
+ "output": " .. _tournament_style:\n\n``tournament_style``\n\n\n.. dropdown:: Tournament Model for Genetic Algorithm\n\t:open:\n\n\tSelect a method to decide which models are best at each iteration."
+ },
+ {
+ "output": " Choose from the following:\n\n\t- auto: Choose based upon accuracy and interpretability\n\t- uniform: all individuals in population compete to win as best (can lead to all, e.g."
+ },
+ {
+ "output": " If enable_genetic_algorithm == 'Optuna', then every individual is self-mutated without any tournament during the :ref:`genetic algorithm `."
+ },
+ {
+ "output": " ``make_python_scoring_pipeline``\n\n\n.. dropdown:: Make Python Scoring Pipeline\n\t:open:\n\n\tSpecify whether to automatically build a Python Scoring Pipeline for the experiment."
+ },
+ {
+ "output": " Select Off to disable the automatic creation of the Python Scoring Pipeline. ``make_mojo_scoring_pipeline``\n\n\n.. dropdown:: Make MOJO Scoring Pipeline\n\t:open:\n\n\tSpecify whether to automatically build a MOJO (Java) Scoring Pipeline for the experiment."
+ },
+ {
+ "output": " With this option, any capabilities that prevent the creation of the pipeline are dropped. Select Off to disable the automatic creation of the MOJO Scoring Pipeline."
+ },
+ {
+ "output": " ``mojo_for_predictions``\n\n\n.. dropdown:: Allow Use of MOJO for Making Predictions\n\t:open:\n\n\tSpecify whether to use MOJO for making fast, low-latency predictions after the experiment has finished."
+ },
+ {
+ "output": " .. _reduce_mojo_size:\n\n``reduce_mojo_size``\n~\n.. dropdown:: Attempt to Reduce the Size of the MOJO (Small MOJO)\n\t:open:\n\n\tSpecify whether to attempt to create a small MOJO scoring pipeline when the experiment is being built."
+ },
+ {
+ "output": " This setting attempts to reduce the mojo size by limiting experiment's maximum :ref:`interaction depth ` to 3, setting :ref:`ensemble level ` to 0 i.e no ensemble model for final pipeline and limiting the :ref:`maximum number of features ` in the model to 200."
+ },
+ {
+ "output": " This is disabled by default. The equivalent config.toml setting is ``reduce_mojo_size``\n\n``make_pipeline_visualization``\n\n\n.. dropdown:: Make Pipeline Visualization\n\t:open:\n\n\tSpecify whether to create a visualization of the scoring pipeline at the end of an experiment."
+ },
+ {
+ "output": " Note that the Visualize Scoring Pipeline feature is experimental and is not available for deprecated models."
+ },
+ {
+ "output": " ``benchmark_mojo_latency``\n\n\n.. dropdown:: Measure MOJO Scoring Latency\n\t:open:\n\n\tSpecify whether to measure the MOJO scoring latency at the time of MOJO creation."
+ },
+ {
+ "output": " In this case, MOJO scoring latency will be measured if the pipeline.mojo file size is less than 100 MB."
+ },
+ {
+ "output": " If the MOJO creation process times out, a MOJO can still be made from the GUI or the R and Python clients (the timeout constraint is not applied to these)."
+ },
+ {
+ "output": " ``mojo_building_parallelism``\n~\n\n.. dropdown:: Number of Parallel Workers to Use During MOJO Creation\n\t:open:\n\n\tSpecify the number of parallel workers to use during MOJO creation."
+ },
+ {
+ "output": " Set this value to -1 (default) to use all physical cores. ``kaggle_username``\n~\n\n.. dropdown:: Kaggle Username\n\t:open:\n\n\tOptionally specify your Kaggle username to enable automatic submission and scoring of test set predictions."
+ },
+ {
+ "output": " If you don't have a Kaggle account, you can sign up at https://www.kaggle.com. ``kaggle_key``\n\n\n.. dropdown:: Kaggle Key\n\t:open:\n\n\tSpecify your Kaggle API key to enable automatic submission and scoring of test set predictions."
+ },
+ {
+ "output": " For more information on obtaining Kaggle API credentials, see https://github.com/Kaggle/kaggle-api#api-credentials."
+ },
+ {
+ "output": " This value defaults to 120 sec. ``min_num_rows``\n\n\n.. dropdown:: Min Number of Rows Needed to Run an Experiment\n\t:open:\n\n\tSpecify the minimum number of rows that a dataset must contain in order to run an experiment."
+ },
+ {
+ "output": " .. _reproducibility_level:\n\n``reproducibility_level``\n~\n\n.. dropdown:: Reproducibility Level\n\t:open:\n\n\tSpecify one of the following levels of reproducibility."
+ },
+ {
+ "output": " ``seed``\n\n\n.. dropdown:: Random Seed\n\t:open:\n\n\tSpecify a random seed for the experiment. When a seed is defined and the reproducible button is enabled (not by default), the algorithm will behave deterministically."
+ },
+ {
+ "output": " Specify whether to enable full cross-validation (multiple folds) during feature evolution as opposed to a single holdout split."
+ },
+ {
+ "output": " ``save_validation_splits``\n\n\n.. dropdown:: Store Internal Validation Split Row Indices\n\t:open:\n\n\tSpecify whether to store internal validation split row indices."
+ },
+ {
+ "output": " Enable this setting for debugging purposes. This setting is disabled by default. ``max_num_classes``\n~\n\n.. dropdown:: Max Number of Classes for Classification Problems\n\t:open:\n\n\tSpecify the maximum number of classes to allow for a classification problem."
+ },
+ {
+ "output": " Memory requirements also increase with a higher number of classes. This value defaults to 200. ``max_num_classes_compute_roc``\n~\n\n.. dropdown:: Max Number of Classes to Compute ROC and Confusion Matrix for Classification Problems\n\n\tSpecify the maximum number of classes to use when computing the ROC and CM."
+ },
+ {
+ "output": " This value defaults to 200 and cannot be lower than 2. ``max_num_classes_client_and_gui``\n\n\n.. dropdown:: Max Number of Classes to Show in GUI for Confusion Matrix\n\t:open:\n\n\tSpecify the maximum number of classes to show in the GUI for CM, showing first ``max_num_classes_client_and_gui`` labels."
+ },
+ {
+ "output": " Note that if this value is changed in the config.toml and the server is restarted, then this setting will only modify client-GUI launched diagnostics."
+ },
+ {
+ "output": " ``roc_reduce_type``\n~\n\n.. dropdown:: ROC/CM Reduction Technique for Large Class Counts\n\t:open:\n\n\tSpecify the ROC confusion matrix reduction technique used for large class counts:\n\n\t- Rows (Default): Reduce by randomly sampling rows\n\t- Classes: Reduce by truncating classes to no more than the value specified by ``max_num_classes_compute_roc``\n\n``max_rows_cm_ga``\n\n\n.. dropdown:: Maximum Number of Rows to Obtain Confusion Matrix Related Plots During Feature Evolution\n\t:open:\n\n\tSpecify the maximum number of rows to obtain confusion matrix related plots during feature evolution."
+ },
+ {
+ "output": " ``use_feature_brain_new_experiments``\n~\n\n.. dropdown:: Whether to Use Feature Brain for New Experiments\n\t:open:\n\n\tSpecify whether to use feature_brain results even if running new experiments."
+ },
+ {
+ "output": " Even rescoring may be insufficient, so by default this is False. For example, one experiment may have training=external validation by accident, and get high score, and while feature_brain_reset_score='on' means we will rescore, it will have already seen during training the external validation and leak that data as part of what it learned from."
+ },
+ {
+ "output": " .. _feature_brain1:\n\n``feature_brain_level``\n~\n\n.. dropdown:: Model/Feature Brain Level\n\t:open:\n\n\tSpecify whether to use H2O.ai brain, which enables local caching and smart re-use (checkpointing) of prior experiments to generate useful features and models for new experiments."
+ },
+ {
+ "output": " When enabled, this will use the H2O.ai brain cache if the cache file:\n\n\t - has any matching column names and types for a similar experiment type\n\t - has classes that match exactly\n\t - has class labels that match exactly\n\t - has basic time series choices that match\n\t - the interpretability of the cache is equal or lower\n\t - the main model (booster) is allowed by the new experiment\n\n\t- -1: Don't use any brain cache (default)\n\t- 0: Don't use any brain cache but still write to cache."
+ },
+ {
+ "output": " - 1: Smart checkpoint from the latest best individual model. Use case: Want to use the latest matching model."
+ },
+ {
+ "output": " - 2: Smart checkpoint if the experiment matches all column names, column types, classes, class labels, and time series options identically."
+ },
+ {
+ "output": " - 3: Smart checkpoint like level #1 but for the entire population. Tune only if the brain population is of insufficient size."
+ },
+ {
+ "output": " - 4: Smart checkpoint like level #2 but for the entire population. Tune only if the brain population is of insufficient size."
+ },
+ {
+ "output": " - 5: Smart checkpoint like level #4 but will scan over the entire brain cache of populations to get the best scored individuals."
+ },
+ {
+ "output": " When enabled, the directory where the H2O.ai Brain meta model files are stored is H2O.ai_brain. In addition, the default maximum brain size is 20GB."
+ },
+ {
+ "output": " This value defaults to 2. .. _feature_brain2:\n\n``feature_brain2``\n\n\n.. dropdown:: Feature Brain Save Every Which Iteration\n\t:open:\n\n\tSave feature brain iterations every iter_num % feature_brain_iterations_save_every_iteration == 0, to be able to restart/refit with which_iteration_brain >= 0."
+ },
+ {
+ "output": " - -1: Don't use any brain cache. - 0: Don't use any brain cache but still write to cache. - 1: Smart checkpoint if an old experiment_id is passed in (for example, via running \"resume one like this\" in the GUI)."
+ },
+ {
+ "output": " (default)\n\t- 3: Smart checkpoint like level #1 but for the entire population. Tune only if the brain population is of insufficient size."
+ },
+ {
+ "output": " Tune only if the brain population is of insufficient size. - 5: Smart checkpoint like level #4 but will scan over the entire brain cache of populations (starting from resumed experiment if chosen) in order to get the best scored individuals."
+ },
+ {
+ "output": " In addition, the default maximum brain size is 20GB. Both the directory and the maximum size can be changed in the config.toml file."
+ },
+ {
+ "output": " Available options include:\n\n\t- -1: Use the last best\n\t- 1: Run one experiment with feature_brain_iterations_save_every_iteration=1 or some other number\n\t- 2: Identify which iteration brain dump you want to restart/refit from\n\t- 3: Restart/Refit from the original experiment, setting which_iteration_brain to that number here in expert settings."
+ },
+ {
+ "output": " This value defaults to -1. .. _feature_brain4:\n\n``feature_brain4``\n\n\n.. dropdown:: Feature Brain Refit Uses Same Best Individual\n\t:open:\n\n\tSpecify whether to use the same best individual when performing a refit."
+ },
+ {
+ "output": " Enabling this setting lets you view the exact same model or feature with only one new feature added."
+ },
+ {
+ "output": " .. _feature_brain5:\n\n``feature_brain5``\n\n\n.. dropdown:: Feature Brain Adds Features with New Columns Even During Retraining of Final Model\n\t:open:\n\n\tSpecify whether to add additional features from new columns to the pipeline, even when performing a retrain of the final model."
+ },
+ {
+ "output": " New data may lead to new dropped features due to shift or leak detection. Disable this to avoid adding any columns as new features so that the pipeline is perfectly preserved when changing data."
+ },
+ {
+ "output": " ``force_model_restart_to_defaults``\n~\n\n.. dropdown:: Restart-Refit Use Default Model Settings If Model Switches\n\t:open:\n\n\tWhen restarting or refitting, specify whether to use the model class's default settings if the original model class is no longer available."
+ },
+ {
+ "output": " (Note that this may result in errors.) This is enabled by default. ``min_dai_iterations``\n\n\n.. dropdown:: Min DAI Iterations\n\t:open:\n\n\tSpecify the minimum number of Driverless AI iterations for an experiment."
+ },
+ {
+ "output": " This value defaults to 0. .. _target_transformer:\n\n``target_transformer``\n\n\n.. dropdown:: Select Target Transformation of the Target for Regression Problems\n\t:open:\n\n\tSpecify whether to automatically select target transformation for regression problems."
+ },
+ {
+ "output": " Selecting identity_noclip automatically turns off any target transformations. All transformers except for center, standardize, identity_noclip and log_noclip perform clipping to constrain the predictions to the domain of the target in the training data, so avoid them if you want to enable extrapolations."
+ },
+ {
+ "output": " ``fixed_num_folds_evolution``\n~\n\n.. dropdown:: Number of Cross-Validation Folds for Feature Evolution\n\t:open:\n\n\tSpecify the fixed number of cross-validation folds (if >= 2) for feature evolution."
+ },
+ {
+ "output": " This value defaults to -1 (auto). ``fixed_num_folds``\n~\n\n.. dropdown:: Number of Cross-Validation Folds for Final Model\n\t:open:\n\n\tSpecify the fixed number of cross-validation folds (if >= 2) for the final model."
+ },
+ {
+ "output": " This value defaults to -1 (auto). ``fixed_only_first_fold_model``\n~\n\n.. dropdown:: Force Only First Fold for Models\n\t:open:\n\n\tSpecify whether to force only the first fold for models."
+ },
+ {
+ "output": " Set to \"on\" to force only the first fold for models. This is useful for quick runs regardless of data.\n\n``feature_evolution_data_size``\n~\n\n.. dropdown:: Max Number of Rows Times Number of Columns for Feature Evolution Data Splits\n\t:open:\n\n\tSpecify the maximum number of rows allowed for feature evolution data splits (not for the final pipeline)."
+ },
+ {
+ "output": " ``final_pipeline_data_size``\n\n\n.. dropdown:: Max Number of Rows Times Number of Columns for Reducing Training Dataset\n\t:open:\n\n\tSpecify the upper limit on the number of rows times the number of columns for training the final pipeline."
+ },
+ {
+ "output": " ``max_validation_to_training_size_ratio_for_final_ensemble``\n\n\n.. dropdown:: Maximum Size of Validation Data Relative to Training Data\n\t:open:\n\n\tSpecify the maximum size of the validation data relative to the training data."
+ },
+ {
+ "output": " Note that final model predictions and scores will always be provided on the full dataset provided. This value defaults to 2.0."
+ },
+ {
+ "output": " If the threshold is not exceeded, random sampling is performed. This value defaults to 0.01. You can choose to always perform random sampling by setting this value to 0, or to always perform stratified sampling by setting this value to 1."
+ },
+ {
+ "output": " (Refer to the :ref:`sample-configtoml` section to view options that can be overridden during an experiment.)"
+ },
+ {
+ "output": " Separate multiple config overrides with ``\\n``. For example, the following enables Poisson distribution for LightGBM and disables Target Transformer Tuning."
+ },
+ {
+ "output": " ::\n\n\t params_lightgbm=\\\"{'objective':'poisson'}\\\" \\n target_transformer=identity\n\n\tOr you can specify config overrides similar to the following without having to escape double quotes:\n\n\t::\n\n\t \"\"enable_glm=\"off\" \\n enable_xgboost_gbm=\"off\" \\n enable_lightgbm=\"off\" \\n enable_tensorflow=\"on\"\"\"\n\t \"\"max_cores=10 \\n data_precision=\"float32\" \\n max_rows_feature_evolution=50000000000 \\n ensemble_accuracy_switch=11 \\n feature_engineering_effort=1 \\n target_transformer=\"identity\" \\n tournament_feature_style_accuracy_switch=5 \\n params_tensorflow=\"{'layers': [100, 100, 100, 100, 100, 100]}\"\"\"\n\n\tWhen running the Python client, config overrides would be set as follows:\n\n\t::\n\n\t\tmodel = h2o.start_experiment_sync(\n\t\t dataset_key=train.key,\n\t\t target_col='target',\n\t\t is_classification=True,\n\t\t accuracy=7,\n\t\t time=5,\n\t\t interpretability=1,\n\t\t config_overrides=\"\"\"\n\t\t feature_brain_level=0\n\t\t enable_lightgbm=\"off\"\n\t\t enable_xgboost_gbm=\"off\"\n\t\t enable_ftrl=\"off\"\n\t\t \"\"\"\n\t\t)\n\n``last_recipe``\n~\n\n.. dropdown:: last_recipe\n\t:open:\n\n\tInternal helper to allow memory of if changed recipe\n\n``feature_brain_reset_score``\n~\n\n.. dropdown:: Whether to re-score models from brain cache\n\t:open:\n\n\tSpecify whether to smartly keep score to avoid re-munging/re-training/re-scoring steps brain models ('auto'), always force all steps for all brain imports ('on'), or never rescore ('off')."
+ },
+ {
+ "output": " 'on' is useful when smart similarity checking is not reliable enough. 'off' is useful when you know you want to keep the exact same features and model for the final model refit, despite changes in seed or other behaviors in features that might change the outcome if re-scored before reaching the final model."
+ },
+ {
+ "output": " You can also set refit_same_best_individual to True if you want the exact same best individual (highest scored model+features) to be used regardless of any scoring changes."
+ },
+ {
+ "output": " Set to 0 to disable this setting. ``which_iteration_brain``\n~\n\n.. dropdown:: Feature Brain Restart from which iteration\n\t:open:\n\n\tWhen performing restart or re-fit type feature_brain_level with resumed_experiment_id, choose which iteration to start from, instead of only the last best. -1 means just use the last best."
+ },
+ {
+ "output": " ``refit_same_best_individual``\n\n\n.. dropdown:: Feature Brain refit uses same best individual\n\t:open:\n\n\tWhen doing a re-fit from the feature brain, if columns or features change, the population of individuals used to refit from may change the order of which was best, leading to a better result being chosen (False case)."
+ },
+ {
+ "output": " That is, if you refit with just 1 extra column and have interpretability=1, then the final model will have the same features, with one more engineered feature applied to that new original feature."
+ },
+ {
+ "output": " However, in other cases, if data and all options are nearly (or exactly) identical, then these steps might change the features slightly (e.g."
+ },
+ {
+ "output": " By default, restart and refit avoid these steps assuming data and experiment setup have not changed significantly."
+ },
+ {
+ "output": " In order to ensure exact same final pipeline is fitted, one should also set:\n\n\t- 1) brain_add_features_for_new_columns false\n\t- 2) refit_same_best_individual true\n\t- 3) feature_brain_reset_score 'off'\n\t- 4) force_model_restart_to_defaults false\n\n\tThe score will still be reset if the experiment metric chosen changes, but changes to the scored model and features will be more frozen in place."
+ },
+ {
+ "output": " In some cases, one might have a new dataset but only want to keep the same pipeline regardless of new columns, in which case one sets this to False."
+ },
+ {
+ "output": " To avoid change of feature set, one can disable all dropping of columns, but set this to False to avoid adding any columns as new features, so pipeline is perfectly preserved when changing data."
+ },
+ {
+ "output": " If False, then try to keep original hyperparameters, which can fail to work in general. ``dump_modelparams_every_scored_indiv``\n~\n\n.. dropdown:: Enable detailed scored model info\n\t:open:\n\n\tWhether to dump every scored individual's model parameters to csv/tabulated/json file produces files."
+ },
+ {
+ "output": " [txt, csv, json]\n\n.. _fast-approx-trees:\n\n``fast_approx_num_trees``\n~\n\n.. dropdown:: Max number of trees to use for fast approximation\n\t:open:\n\n\tWhen ``fast_approx=True``, specify the maximum number of trees to use."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions."
+ },
+ {
+ "output": " By default, this setting is enabled. .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions."
+ },
+ {
+ "output": " By default, this setting is disabled. .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions."
+ },
+ {
+ "output": " By default, this value is 50. .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc."
+ },
+ {
+ "output": " By default, this setting is enabled. .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc."
+ },
+ {
+ "output": " By default, this setting is enabled. .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc."
+ },
+ {
+ "output": " .. _linux-rpms:\n\nLinux RPMs\n\n\nFor Linux machines that will not use the Docker image or DEB, an RPM installation is available for the following environments:\n\n- x86_64 RHEL 7 / RHEL 8\n- CentOS 7 / CentOS 8\n\nThe installation steps assume that you have a license key for Driverless AI."
+ },
+ {
+ "output": " Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " - When using systemd, remove the ``dai-minio``, ``dai-h2o``, ``dai-redis``, ``dai-procsy``, and ``dai-vis-server`` services."
+ },
+ {
+ "output": " Note that if you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02\n- OpenCL (Required for full LightGBM support on GPU-powered systems)\n- Driverless AI RPM, available from https://www.h2o.ai/download/\n\nNote: CUDA 11.2.2 (for GPUs) and cuDNN (required for TensorFlow support on GPUs) are included in the Driverless AI package."
+ },
+ {
+ "output": " To install OpenCL, run the following as root:\n\n.. code-block:: bash\n\n mkdir -p /etc/OpenCL/vendors && echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd && chmod a+r /etc/OpenCL/vendors/nvidia.icd && chmod a+x /etc/OpenCL/vendors/ && chmod a+x /etc/OpenCL\n\n.. note::\n\tIf OpenCL is not installed, then CUDA LightGBM is automatically used."
+ },
+ {
+ "output": " Installing Driverless AI\n\n\nRun the following commands to install the Driverless AI RPM. .. code-block:: bash\n :substitutions:\n\n # Install Driverless AI."
+ },
+ {
+ "output": " You can optionally specify a different service user and group as shown below. Replace and as appropriate."
+ },
+ {
+ "output": " # rpm saves these for systemd in the /etc/dai/User.conf and /etc/dai/Group.conf files. sudo DAI_USER=myuser DAI_GROUP=mygroup rpm -i |VERSION-rpm-lin|\n\nYou may now optionally make changes to /etc/dai/config.toml."
+ },
+ {
+ "output": " sudo systemctl start dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Start Driverless AI."
+ },
+ {
+ "output": " This command needs to be run every reboot. For more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " sudo systemctl stop dai\n\n # The processes should now be stopped. Verify. sudo ps -u dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\nUpgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers."
+ },
+ {
+ "output": " .. note::\n\tIf you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02. Upgrade Steps\n'\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time. # Upgrade and restart."
+ },
+ {
+ "output": " sudo pkill -U dai\n\n # The processes should now be stopped. Verify. sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time."
+ },
+ {
+ "output": " sudo rpm -U |VERSION-rpm-lin|\n sudo -H -u dai /opt/h2oai/dai/run-dai.sh\n\nUninstalling Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n # Uninstall. sudo rpm -e dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n # Uninstall. sudo rpm -e dai\n\nCAUTION! At this point you can optionally completely remove all remaining files, including the database."
+ },
+ {
+ "output": " .. code-block:: bash\n\n sudo rm -rf /opt/h2oai/dai\n sudo rm -rf /etc/dai\n\nNote: The UID and GID are not removed during the uninstall process."
+ },
+ {
+ "output": " .. _linux-deb:\n\nLinux DEBs\n\n\nFor Linux machines that will not use the Docker image or RPM, a deb installation is available for x86_64 Ubuntu 16.04/18.04/20.04/22.04."
+ },
+ {
+ "output": " For information on how to obtain a license key for Driverless AI, visit https://www.h2o.ai/products/h2o-driverless-ai/."
+ },
+ {
+ "output": " .. note::\n\t- To ensure that :ref:`AutoDoc ` pipeline visualizations are generated correctly on native installations, installing `fontconfig `_ is recommended."
+ },
+ {
+ "output": " When upgrading, you can use the following commands to deactivate these services:\n\n ::\n\n systemctl stop dai-minio\n systemctl disable dai-minio\n systemctl stop dai-h2o\n systemctl disable dai-h2o\n systemctl stop dai-redis\n systemctl disable dai-redis\n systemctl stop dai-procsy\n systemctl disable dai-procsy\n systemctl stop dai-vis-server\n systemctl disable dai-vis-server\n\nEnvironment\n~\n\n+-+-+\n| Operating System | Min Mem |\n+=+=+\n| Ubuntu with GPUs | 64 GB |\n+-+-+\n| Ubuntu with CPUs | 64 GB |\n+-+-+\n\nRequirements\n\n\n- Ubuntu 16.04/Ubuntu 18.04/Ubuntu 20.04/Ubuntu 22.04\n- NVIDIA drivers >= |NVIDIA-driver-ver| is recommended (GPU only)."
+ },
+ {
+ "output": " About the Install\n~\n\n.. include:: linux-rpmdeb-about.frag\n\nStarting NVIDIA Persistence Mode (GPU only)\n~\n\nIf you have NVIDIA GPUs, you must run the following NVIDIA command."
+ },
+ {
+ "output": " For more information: http://docs.nvidia.com/deploy/driver-persistence/index.html. .. include:: enable-persistence.rst\n\nInstalling OpenCL\n~\n\nOpenCL is required for full LightGBM support on GPU-powered systems."
+ },
+ {
+ "output": " CUDA LightGBM is only supported on Pascal-powered (and later) systems, and can be enabled manually with the ``enable_lightgbm_cuda_support`` config.toml setting."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n # Install Driverless AI. sudo dpkg -i |VERSION-deb-lin|\n\nBy default, the Driverless AI processes are owned by the 'dai' user and 'dai' group."
+ },
+ {
+ "output": " Replace and as appropriate. .. code-block:: bash\n :substitutions:\n\n # Temporarily specify service user and group when installing Driverless AI."
+ },
+ {
+ "output": " sudo DAI_USER=myuser DAI_GROUP=mygroup dpkg -i |VERSION-deb-lin|\n\nYou may now optionally make changes to /etc/dai/config.toml."
+ },
+ {
+ "output": " sudo systemctl start dai\n\nNote: If you don't have systemd, refer to :ref:`linux-tarsh` for install instructions."
+ },
+ {
+ "output": " sudo systemctl stop dai\n\n # The processes should now be stopped. Verify. sudo ps -u dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n\nUpgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers."
+ },
+ {
+ "output": " .. note::\n\tIf you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02. Upgrade Steps\n'\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " # Upgrade Driverless AI. sudo dpkg -i |VERSION-deb-lin|\n sudo systemctl daemon-reload\n sudo systemctl start dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time. If you do not, all previous data will be lost."
+ },
+ {
+ "output": " sudo dpkg -i |VERSION-deb-lin|\n sudo -H -u dai /opt/h2oai/dai/run-dai.sh\n\nUninstalling Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify. sudo ps -u dai\n\n # Uninstall Driverless AI. sudo dpkg -r dai\n\n # Purge Driverless AI."
+ },
+ {
+ "output": " sudo pkill -U dai\n\n # The processes should now be stopped. Verify. sudo ps -u dai\n\n # Uninstall Driverless AI."
+ },
+ {
+ "output": " sudo dpkg -P dai\n\nCAUTION! At this point you can optionally completely remove all remaining files, including the database (this cannot be undone):\n\n.. code-block:: bash\n\n sudo rm -rf /opt/h2oai/dai\n sudo rm -rf /etc/dai\n\nNote: The UID and GID are not removed during the uninstall process."
+ },
+ {
+ "output": " However, we DO NOT recommend removing the UID and GID if you plan to re-install Driverless AI. If you remove the UID and GID and then reinstall Driverless AI, the UID and GID will likely be re-assigned to a different (unrelated) user/group in the future; this may cause confusion if there are any remaining files on the filesystem referring to the deleted user or group."
+ },
+ {
+ "output": " This problem is caused by the font ``NotoColorEmoji.ttf``, which cannot be processed by the Python matplotlib library."
+ },
+ {
+ "output": " (Do not use fontconfig because it is ignored by matplotlib.) The following will print out the command that should be executed."
+ },
+ {
+ "output": " .. _install-on-nvidia-dgx:\n\nInstall on NVIDIA GPU Cloud/NGC Registry\n\n\nDriverless AI is supported on the following NVIDIA DGX products, and the installation steps for each platform are the same."
+ },
+ {
+ "output": " Driverless AI is only available in the NGC registry for DGX machines. 1. Log in to your NVIDIA GPU Cloud account at https://ngc.nvidia.com/registry."
+ },
+ {
+ "output": " 2. In the Registry > Partners menu, select h2oai-driverless. .. image:: ../images/ngc_select_dai.png\n :align: center\n\n3."
+ },
+ {
+ "output": " .. image:: ../images/ngc_select_tag.png\n :align: center\n\n4. On your NVIDIA DGX machine, open a command prompt and use the specified pull command to retrieve the Driverless AI image."
+ },
+ {
+ "output": " Set up a directory for the version of Driverless AI on the host machine: \n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n6."
+ },
+ {
+ "output": " At this point, you can copy data into the data directory on the host machine. The data will be visible inside the Docker container."
+ },
+ {
+ "output": " Enable persistence of the GPU. Note that this only needs to be run once. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Run ``docker images`` to find the new image tag. 10. Start the Driverless AI Docker image and replace TAG below with the image tag."
+ },
+ {
+ "output": " Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``shm-size=2g`` is recommended for the Driverless AI docker command."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run runtime=nvidia \\\n pid=host \\\n rm \\\n shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n pid=host \\\n rm \\\n shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n11."
+ },
+ {
+ "output": " Upgrading Driverless AI\n~\n\nThe steps for upgrading Driverless AI on an NVIDIA DGX system are similar to the installation steps."
+ },
+ {
+ "output": " Requirements\n\n\nAs of 1.7.0, CUDA 9 is no longer supported. Your host environment must have CUDA 10.0 or later with NVIDIA drivers >= 440.82 installed (GPU only)."
+ },
+ {
+ "output": " Go to https://www.nvidia.com/Download/index.aspx to get the latest NVIDIA Tesla V/P/K series driver."
+ },
+ {
+ "output": " On your NVIDIA DGX machine, create a directory for the new Driverless AI version. 2. Copy the data, log, license, and tmp directories from the previous Driverless AI directory into the new Driverless AI directory."
+ },
+ {
+ "output": " Run ``docker pull nvcr.io/h2oai/h2oai-driverless-ai:latest`` to retrieve the latest Driverless AI version."
+ },
+ {
+ "output": " AWS Role-Based Authentication\n~\n\nIn Driverless AI, it is possible to enable role-based authentication via the `IAM role `__."
+ },
+ {
+ "output": " AWS IAM Setup\n'\n\n1. Create an IAM role. This IAM role should have a Trust Relationship with Principal Trust Entity set to your Account ID."
+ },
+ {
+ "output": " Create a new policy that lets users assume the role:\n\n .. image:: ../images/aws_iam_policy_create.png\n\n3."
+ },
+ {
+ "output": " .. image:: ../images/aws_iam_policy_assign.png\n\n4. Test role switching here: https://signin.aws.amazon.com/switchrole."
+ },
+ {
+ "output": " Driverless AI Setup\n'\n\nUpdate the ``aws_use_ec2_role_credentials`` config variable in the config.toml file or start Driverless AI using the ``AWS_USE_EC2_ROLE_CREDENTIALS`` environment variable."
+ },
+ {
+ "output": " Granting a User Permissions to Switch Roles: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_permissions-to-switch.html\n2."
+ },
+ {
+ "output": " .. _system-settings:\n\nSystem Settings\n=\n\n.. _exclusive_mode:\n\n``exclusive_mode``\n\n\n.. dropdown:: Exclusive level of access to node resources\n\t:open:\n\n\tThere are three levels of access:\n\n\t\t- safe: this level assumes that there might be another experiment also running on same node."
+ },
+ {
+ "output": " - max: this level assumes that there is absolutely nothing else running on the node except the experiment\n\n\tThe default level is \"safe\" and the equivalent config.toml parameter is ``exclusive_mode``."
+ },
+ {
+ "output": " Each exclusive mode can be chosen, and then fine-tuned using each expert setting. Changing the exclusive mode will reset all exclusive mode related options back to default and then re-apply the specific rules for the new mode, which will undo any fine-tuning of expert options that are part of exclusive mode rules."
+ },
+ {
+ "output": " To reset mode behavior, one can switch between 'safe' and the desired mode. This way the new child experiment will use the default system resources for the chosen mode."
+ },
+ {
+ "output": " Note that if you specify 0, all available cores will be used. Lower values can reduce memory usage but might slow down the experiment."
+ },
+ {
+ "output": " One can also set it using the environment variable OMP_NUM_THREADS or OPENBLAS_NUM_THREADS (e.g., in bash: 'export OMP_NUM_THREADS=32' or 'export OPENBLAS_NUM_THREADS=32')\n\n``max_fit_cores``\n~\n\n.. dropdown:: Maximum Number of Cores to Use for Model Fit\n\t:open:\n\n\tSpecify the maximum number of cores to use for a model's fit call."
+ },
+ {
+ "output": " This value defaults to 10. .. _use_dask_cluster:\n\n``use_dask_cluster``\n\n\n.. dropdown:: If full dask cluster is enabled, use full cluster\n\t:open:\n\n\tSpecify whether to use full multinode distributed cluster (True) or single-node dask (False)."
+ },
+ {
+ "output": " For example, several DGX nodes can be more efficient if used one DGX at a time for medium-sized data. The equivalent config.toml parameter is ``use_dask_cluster``."
+ },
+ {
+ "output": " Note that if you specify 0, all available cores will be used. This value defaults to 0(all). ``max_predict_cores_in_dai``\n\n\n.. dropdown:: Maximum Number of Cores to Use for Model Transform and Predict When Doing MLI, AutoDoc\n\t:open:\n\n\tSpecify the maximum number of cores to use for a model's transform and predict call when doing operations in the Driverless AI MLI GUI and the Driverless AI R and Python clients."
+ },
+ {
+ "output": " This value defaults to 4. ``batch_cpu_tuning_max_workers``\n\n\n.. dropdown:: Tuning Workers per Batch for CPU\n\t:open:\n\n\tSpecify the number of workers used in CPU mode for tuning."
+ },
+ {
+ "output": " This value defaults to 0(socket count). ``cpu_max_workers``\n~\n.. dropdown:: Number of Workers for CPU Training\n\t:open:\n\n\tSpecify the number of workers used in CPU mode for training:\n\n\t- 0: Use socket count (Default)\n\t- -1: Use all physical cores >= 1 that count\n\n.. _num_gpus_per_experiment:\n\n``num_gpus_per_experiment``\n~\n\n.. dropdown:: #GPUs/Experiment\n\t:open:\n\n\tSpecify the number of GPUs to use per experiment."
+ },
+ {
+ "output": " Must be at least as large as the number of GPUs to use per model (or -1). In multinode context when using dask, this refers to the per-node value."
+ },
+ {
+ "output": " In order to have a sufficient number of cores per GPU, this setting limits the number of GPUs used."
+ },
+ {
+ "output": " .. _num-gpus-per-model:\n\n``num_gpus_per_model``\n\n.. dropdown:: #GPUs/Model\n\t:open:\n\n\tSpecify the number of GPUs to use per model."
+ },
+ {
+ "output": " Currently, setting num_gpus_per_model to a value other than 1 disables GPU locking, so it is only recommended for single experiments and single users."
+ },
+ {
+ "output": " In all cases, XGBoost tree and linear models use the number of GPUs specified per model, while LightGBM and Tensorflow revert to using 1 GPU/model and run multiple models on multiple GPUs."
+ },
+ {
+ "output": " Rulefit uses GPUs for parts involving obtaining the tree using LightGBM. In multinode context when using dask, this parameter refers to the per-node value."
+ },
+ {
+ "output": " of GPUs for Isolated Prediction/Transform\n\t:open:\n\n\tSpecify the number of GPUs to use for ``predict`` for models and ``transform`` for transformers when running outside of ``fit``/``fit_transform``."
+ },
+ {
+ "output": " New processes will use this count for applicable models and transformers. Note that enabling ``tensorflow_nlp_have_gpus_in_production`` will override this setting for relevant TensorFlow NLP transformers."
+ },
+ {
+ "output": " Note: When GPUs are used, TensorFlow, PyTorch models and transformers, and RAPIDS always predict on GPU."
+ },
+ {
+ "output": " In multinode context when using dask, this refers to the per-node value. ``gpu_id_start``\n\n\n.. dropdown:: GPU Starting ID\n\t:open:\n\n\tSpecify which gpu_id to start with."
+ },
+ {
+ "output": " For example, if ``CUDA_VISIBLE_DEVICES='4,5'`` then ``gpu_id_start=0`` will refer to device #4. From expert mode, to run 2 experiments, each on a distinct GPU out of 2 GPUs, then:\n\n\t- Experiment#1: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=0\n\t- Experiment#2: num_gpus_per_model=1, num_gpus_per_experiment=1, gpu_id_start=1\n\n\tFrom expert mode, to run 2 experiments, each on a distinct GPU out of 8 GPUs, then:\n\n\t- Experiment#1: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=0\n\t- Experiment#2: num_gpus_per_model=1, num_gpus_per_experiment=4, gpu_id_start=4\n\n\tTo run on all 4 GPUs/model, then\n\n\t- Experiment#1: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=0\n\t- Experiment#2: num_gpus_per_model=4, num_gpus_per_experiment=4, gpu_id_start=4\n\n\tIf num_gpus_per_model!=1, global GPU locking is disabled."
+ },
+ {
+ "output": " More information is available at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation\n\tNote that GPU selection does not wrap, so gpu_id_start + num_gpus_per_model must be less than the number of visible GPUs."
+ },
+ {
+ "output": " For actual use beyond this value, the system will start to have slow-down issues. The default value is 3."
+ },
+ {
+ "output": " ``max_dt_threads_munging``\n\n\n.. dropdown:: Max Number of Threads to Use for datatable and OpenBLAS for Munging and Model Training\n\t:open:\n\n\tSpecify the maximum number of threads to use for datatable and OpenBLAS during data munging (applied on a per process basis):\n\n\t- 0 = Use all threads\n\t- -1 = Automatically select number of threads (Default)\n\n``max_dt_threads_readwrite``\n\n\n.. dropdown:: Max Number of Threads to Use for datatable Read and Write of Files\n\t:open:\n\n\tSpecify the maximum number of threads to use for datatable during data reading and writing (applied on a per process basis):\n\n\t- 0 = Use all threads\n\t- -1 = Automatically select number of threads (Default)\n\n``max_dt_threads_stats_openblas``\n~\n\n.. dropdown:: Max Number of Threads to Use for datatable Stats and OpenBLAS\n\t:open:\n\n\tSpecify the maximum number of threads to use for datatable stats and OpenBLAS (applied on a per process basis):\n\n\t- 0 = Use all threads\n\t- -1 = Automatically select number of threads (Default)\n\n.. _allow_reduce_features_when_failure:\n\n``allow_reduce_features_when_failure``\n\n\n.. dropdown:: Whether to reduce features when model fails (GPU OOM Protection)\n\t:open:\n\n\tBig models (on big data or with a lot of features) can run out of memory on GPUs."
+ },
+ {
+ "output": " This is currently applicable to all non-dask XGBoost models (i.e. GLMModel, XGBoostGBMModel, XGBoostDartModel, XGBoostRFModel), during normal fit or when using Optuna."
+ },
+ {
+ "output": " For example, if XGBoost runs out of GPU memory, this is detected, and (regardless of the setting of skip_model_failures), we perform feature selection using XGBoost on subsets of features."
+ },
+ {
+ "output": " This splitting continues until no failure occurs. Then all sub-models are used to estimate variable importance by absolute information gain, in order to decide which features to include."
+ },
+ {
+ "output": " Note:\n\n\t- This option is set to 'auto' -> 'on' by default, i.e., whenever the conditions are favorable, it is set to 'on'."
+ },
+ {
+ "output": " Hence, if the user enables reproducibility for the experiment, 'auto' automatically sets this option to 'off'."
+ },
+ {
+ "output": " - Reduction is only done on features and not on rows for the feature selection step. Also see :ref:`reduce_repeats_when_failure ` and :ref:`fraction_anchor_reduce_features_when_failure `\n\n.. _reduce_repeats_when_failure:\n\n``reduce_repeats_when_failure``\n~\n\n.. dropdown:: Number of repeats for models used for feature selection during failure recovery\n\t:open:\n\n\tWith :ref:`allow_reduce_features_when_failure `, this controls how many repeats of sub-models are used for feature selection."
+ },
+ {
+ "output": " More repeats can lead to higher accuracy. The cost of this option is proportional to the repeat count."
+ },
+ {
+ "output": " .. _fraction_anchor_reduce_features_when_failure:\n\n``fraction_anchor_reduce_features_when_failure``\n\n\n.. dropdown:: Fraction of features treated as anchor for feature selection during failure recovery\n\t:open:\n\n\tWith :ref:`allow_reduce_features_when_failure `, this controls the fraction of features treated as an anchor that are fixed for all sub-models."
+ },
+ {
+ "output": " For tuning and evolution, the probability depends upon any prior importance (if present) from other individuals, while the final model uses uniform probability for anchor features."
+ },
+ {
+ "output": " ``xgboost_reduce_on_errors_list``\n~\n\n.. dropdown:: Errors From XGBoost That Trigger Reduction of Features\n\t:open:\n\n\tError strings from XGBoost that are used to trigger re-fit on reduced sub-models."
+ },
+ {
+ "output": " ``lightgbm_reduce_on_errors_list``\n\n\n.. dropdown:: Errors From LightGBM That Trigger Reduction of Features\n\t:open:\n\n\tError strings from LightGBM that are used to trigger re-fit on reduced sub-models."
+ },
+ {
+ "output": " ``num_gpus_per_hyperopt_dask``\n\n\n.. dropdown:: GPUs / HyperOptDask\n\t:open:\n\n\tSpecify the number of GPUs to use per model hyperopt training task."
+ },
+ {
+ "output": " For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model across a Dask cluster."
+ },
+ {
+ "output": " In multinode context, this refers to the per-node value. ``detailed_traces``\n~\n\n.. dropdown:: Enable Detailed Traces\n\t:open:\n\n\tSpecify whether to enable detailed tracing in Driverless AI trace when running an experiment."
+ },
+ {
+ "output": " ``debug_log``\n~\n\n.. dropdown:: Enable Debug Log Level\n\t:open:\n\n\tIf enabled, the log files will also include debug logs."
+ },
+ {
+ "output": " ``log_system_info_per_experiment``\n\n\n.. dropdown:: Enable Logging of System Information for Each Experiment\n\t:open:\n\n\tSpecify whether to include system information such as CPU, GPU, and disk space at the start of each experiment log."
+ },
+ {
+ "output": " The F0.5 score is the weighted harmonic mean of the precision and recall (given a threshold value)."
+ },
+ {
+ "output": " More weight should be given to precision for cases where False Positives are considered worse than False Negatives."
+ },
+ {
+ "output": " In this case, you want your predictions to be very precise and only capture the products that will definitely run out."
+ },
+ {
+ "output": " F05 equation:\n\n.. math::\n\n F0.5 = 1.25 \\;\\Big(\\; \\frac{(precision) \\; (recall)}{((0.25) \\; (precision)) + recall}\\; \\Big)\n\nWhere:\n\n- *precision* is the positive observations (true positives) the model correctly identified from all the observations it labeled as positive (the true positives + the false positives)."
+ },
+ {
+ "output": " S3 Setup\n\n\nDriverless AI lets you explore S3 data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``aws_access_key_id``: The S3 access key ID\n- ``aws_secret_access_key``: The S3 access key\n- ``aws_role_arn``: The Amazon Resource Name\n- ``aws_default_region``: The region to use when the aws_s3_endpoint_url option is not set."
+ },
+ {
+ "output": " - ``aws_s3_endpoint_url``: The endpoint URL that will be used to access S3. - ``aws_use_ec2_role_credentials``: If set to true, the S3 Connector will try to obtain credentials associated with the role attached to the EC2 instance."
+ },
+ {
+ "output": " - ``enabled_file_systems``: The file systems you want to enable. This must be configured in order for data connectors to function properly."
+ },
+ {
+ "output": " It does not pass any S3 access key or secret; however it configures Docker DNS by passing the name and IP of the S3 name node."
+ },
+ {
+ "output": " .. code-block:: bash\n\t :substitutions:\n\n\t nvidia-docker run \\\n\t\t\tshm-size=256m \\\n\t\t\tadd-host name.node:172.16.2.186 \\\n\t\t\t-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,s3\" \\\n\t\t\t-p 12345:12345 \\\n\t\t\tinit -it rm \\\n\t\t\t-v /tmp/dtmp/:/tmp \\\n\t\t\t-v /tmp/dlog/:/log \\\n\t\t\t-v /tmp/dlicense/:/license \\\n\t\t\t-v /tmp/ddata/:/data \\\n\t\t\t-u $(id -u):$(id -g) \\\n\t\t\th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n\tThis example shows how to configure S3 options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, s3\"``\n\n\t2."
+ },
+ {
+ "output": " .. code-block:: bash\n\t \t :substitutions:\n\n\t\t nvidia-docker run \\\n\t\t \t--pid=host \\\n\t\t \t--init \\\n\t\t \t--rm \\\n\t\t \t--shm-size=256m \\\n\t\t \t--add-host name.node:172.16.2.186 \\\n\t\t \t-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n\t\t \t-p 12345:12345 \\\n\t\t \t-v /local/path/to/config.toml:/path/in/docker/config.toml \\\n\t\t \t-v /etc/passwd:/etc/passwd:ro \\\n\t\t \t-v /etc/group:/etc/group:ro \\\n\t\t \t-v /tmp/dtmp/:/tmp \\\n\t\t \t-v /tmp/dlog/:/log \\\n\t\t \t-v /tmp/dlicense/:/license \\\n\t\t \t-v /tmp/ddata/:/data \\\n\t\t \t-u $(id -u):$(id -g) \\\n\t\t \th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n\tThis example enables the S3 data connector and disables authentication."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n\t ::\n\n\t # DEB and RPM\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n\t # TAR SH\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n\t2."
+ },
+ {
+ "output": " ::\n\n\t\t# File System Support\n\t\t# upload : standard upload feature\n\t\t# file : local file system/server file system\n\t\t# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n\t\t# dtap : Blue Data Tap file system, remember to configure the DTap section below\n\t\t# s3 : Amazon S3, optionally configure secret and access key below\n\t\t# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n\t\t# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n\t\t# minio : Minio Cloud Storage, remember to configure secret and access key below\n\t\t# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n\t\t# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n\t\t# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n\t\t# jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n\t\t# recipe_url: load custom recipe from URL\n\t\t# recipe_file: load custom recipe from local file system\n\t\tenabled_file_systems = \"file, s3\"\n\n\t3."
+ },
+ {
+ "output": " Example 2: Enable S3 with Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n\tThis example enables the S3 data connector with authentication by passing an S3 access key ID and an access key."
+ },
+ {
+ "output": " This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, s3\"``\n\t - ``aws_access_key_id = \"\"``\n\t - ``aws_secret_access_key = \"\"``\n\n\t2."
+ },
+ {
+ "output": " .. code-block:: bash\n\t \t:substitutions:\n\n\t\t nvidia-docker run \\\n\t\t \t--pid=host \\\n\t\t \t--init \\\n\t\t \t--rm \\\n\t\t \t--shm-size=256m \\\n\t\t \t--add-host name.node:172.16.2.186 \\\n\t\t \t-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n\t\t \t-p 12345:12345 \\\n\t\t \t-v /local/path/to/config.toml:/path/in/docker/config.toml \\\n\t\t \t-v /etc/passwd:/etc/passwd:ro \\\n\t\t \t-v /etc/group:/etc/group:ro \\\n\t\t \t-v /tmp/dtmp/:/tmp \\\n\t\t \t-v /tmp/dlog/:/log \\\n\t\t \t-v /tmp/dlicense/:/license \\\n\t\t \t-v /tmp/ddata/:/data \\\n\t\t \t-u $(id -u):$(id -g) \\\n\t\t \th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n\tThis example enables the S3 data connector with authentication by passing an S3 access key ID and an access key."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n\t ::\n\n\t # DEB and RPM\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n\t # TAR SH\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n\t2."
+ },
+ {
+ "output": " ::\n\n\t\t# File System Support\n\t\t# upload : standard upload feature\n\t\t# file : local file system/server file system\n\t\t# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n\t\t# dtap : Blue Data Tap file system, remember to configure the DTap section below\n\t\t# s3 : Amazon S3, optionally configure secret and access key below\n\t\t# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n\t\t# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n\t\t# minio : Minio Cloud Storage, remember to configure secret and access key below\n\t\t# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n\t\t# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n\t\t# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n\t\t# jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n\t\t# recipe_url: load custom recipe from URL\n\t\t# recipe_file: load custom recipe from local file system\n\t\tenabled_file_systems = \"file, s3\"\n\n\t\t# S3 Connector credentials\n\t\taws_access_key_id = \"\"\n\t\taws_secret_access_key = \"\"\n\n\t3."
+ },
+ {
+ "output": " .. _image-settings:\n\nImage Settings\n\n\n``enable_tensorflow_image``\n~\n.. dropdown:: Enable Image Transformer for Processing of Image Data\n\t:open:\n\n\tSpecify whether to use pretrained deep learning models for processing of image data as part of the feature engineering pipeline."
+ },
+ {
+ "output": " This is enabled by default. .. _tensorflow_image_pretrained_models:\n\n``tensorflow_image_pretrained_models``\n\n\n.. dropdown:: Supported ImageNet Pretrained Architectures for Image Transformer\n\t:open:\n\n\tSpecify the supported `ImageNet `__ pretrained architectures for image transformer."
+ },
+ {
+ "output": " If an internet connection is not available, non-default models must be downloaded from http://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/pretrained/dai_image_models_1_10.zip and extracted into ``tensorflow_image_pretrained_models_dir``."
+ },
+ {
+ "output": " In this case, embeddings from the different architectures are concatenated together (in a single embedding)."
+ },
+ {
+ "output": " Select from the following:\n\n\t- 10\n\t- 25\n\t- 50\n\t- 100 (Default)\n\t- 200\n\t- 300\n\n\tNote: Multiple transformers can be activated at the same time to allow the selection of multiple options."
+ },
+ {
+ "output": " This is disabled by default. ``tensorflow_image_fine_tuning_num_epochs``\n~\n.. dropdown:: Number of Epochs for Fine-Tuning Used for the Image Transformer\n\t:open:\n\n\tSpecify the number of epochs for fine-tuning ImageNet pretrained models used for the Image Transformer."
+ },
+ {
+ "output": " ``tensorflow_image_augmentations``\n\n.. dropdown:: List of Augmentations for Fine-Tuning Used for the Image Transformer\n\t:open:\n\n\tSpecify the list of possible image augmentations to apply while fine-tuning the ImageNet pretrained models used for the Image Transformer."
+ },
+ {
+ "output": " ``tensorflow_image_batch_size``\n~\n.. dropdown:: Batch Size for the Image Transformer\n\t:open:\n\n\tSpecify the batch size for the Image Transformer."
+ },
+ {
+ "output": " Note: Larger architectures and batch sizes use more memory. ``image_download_timeout``\n\n.. dropdown:: Image Download Timeout in Seconds\n\t:open:\n\n\tWhen providing images through URLs, specify the maximum number of seconds to wait for an image to download."
+ },
+ {
+ "output": " ``string_col_as_image_max_missing_fraction``\n\n.. dropdown:: Maximum Allowed Fraction of Missing Values for Image Column\n\t:open:\n\n\tSpecify the maximum allowed fraction of missing elements in a string column for it to be considered as a potential image path."
+ },
+ {
+ "output": " ``string_col_as_image_min_valid_types_fraction``\n\n.. dropdown:: Minimum Fraction of Images That Need to Be of Valid Types for Image Column to Be Used\n\t:open:\n\n\tSpecify the fraction of unique image URIs that need to have valid endings (as defined by ``string_col_as_image_valid_types``) for a string column to be considered as image data."
+ },
+ {
+ "output": " ``tensorflow_image_use_gpu``\n\n.. dropdown:: Enable GPU(s) for Faster Transformations With the Image Transformer\n\t:open:\n\n\tSpecify whether to use any available GPUs to transform images into embeddings with the Image Transformer."
+ },
+ {
+ "output": " Install on RHEL\n-\n\nThis section describes how to install the Driverless AI Docker image on RHEL. The installation steps vary depending on whether your system has GPUs or if it is CPU only."
+ },
+ {
+ "output": " | Min Mem |\n+=+=+=+\n| RHEL with GPUs | Yes | 64 GB |\n+-+-+-+\n| RHEL with CPUs | No | 64 GB |\n+-+-+-+\n\n.. _install-on-rhel-with-gpus:\n\nInstall on RHEL with GPUs\n~\n\nNote: Refer to the following links for more information about using RHEL with GPUs."
+ },
+ {
+ "output": " This is necessary in order to prevent a mismatch between the NVIDIA driver and the kernel, which can lead to GPU failures."
+ },
+ {
+ "output": " Note that some of the images in this video may change between releases, but the installation steps remain the same."
+ },
+ {
+ "output": " Open a Terminal and ssh to the machine that will run Driverless AI. Once you are logged in, perform the following steps."
+ },
+ {
+ "output": " Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. 2. Install and start Docker EE on RHEL (if not already installed)."
+ },
+ {
+ "output": " Alternatively, you can run on Docker CE. .. code-block:: bash\n\n sudo yum install -y yum-utils\n sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n sudo yum makecache fast\n sudo yum -y install docker-ce\n sudo systemctl start docker\n\n3."
+ },
+ {
+ "output": " More information is available at https://github.com/NVIDIA/nvidia-docker/blob/master/README.md. .. code-block:: bash\n\n curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \\\n sudo apt-key add -\n distribution=$(."
+ },
+ {
+ "output": " If you do not run this command, you will have to remember to start the nvidia-docker service manually; otherwise the GPUs will not appear as available."
+ },
+ {
+ "output": " Verify that the NVIDIA driver is up and running. If the driver is not up and running, log on to http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest NVIDIA Tesla V/P/K series driver."
+ },
+ {
+ "output": " Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n \n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n6."
+ },
+ {
+ "output": " Enable persistence of the GPU. Note that this needs to be run once after every reboot. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Set up the data, log, and license directories on the host machine (within the new directory):\n\n .. code-block:: bash\n\n # Set up the data, log, license, and tmp directories on the host machine\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n9."
+ },
+ {
+ "output": " The data will be visible inside the Docker container. 10. Run ``docker images`` to find the image tag."
+ },
+ {
+ "output": " Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your install version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " For GPU users, the GPU needs ``--pid=host`` for nvml, which prevents tini from running as pid=1, so tini shows a warning message (which is still harmless)."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n12."
+ },
+ {
+ "output": " .. _install-on-rhel-cpus-only:\n\nInstall on RHEL with CPUs\n~\n\nThis section describes how to install and start the Driverless AI Docker image on RHEL."
+ },
+ {
+ "output": " Watch the installation video `here `__."
+ },
+ {
+ "output": " .. note::\n\tAs of this writing, Driverless AI has been tested on RHEL versions 7.4, 8.3, and 8.4. Open a Terminal and ssh to the machine that will run Driverless AI."
+ },
+ {
+ "output": " 1. Install and start Docker EE on RHEL (if not already installed). Follow the instructions on https://docs.docker.com/engine/installation/linux/docker-ee/rhel/."
+ },
+ {
+ "output": " .. code-block:: bash\n\n sudo yum install -y yum-utils\n sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n sudo yum makecache fast\n sudo yum -y install docker-ce\n sudo systemctl start docker\n\n2."
+ },
+ {
+ "output": " 3. Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n4."
+ },
+ {
+ "output": " Set up the data, log, license, and tmp directories (within the new directory):\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the directory associated with your version of Driverless AI\n cd |VERSION-dir|\n\n # Set up the data, log, license, and tmp directories on the host machine\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n6."
+ },
+ {
+ "output": " The data will be visible inside the Docker container at //data. 7. Run ``docker images`` to find the image tag."
+ },
+ {
+ "output": " Start the Driverless AI Docker image. Note that GPU support will not be available. Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command."
+ },
+ {
+ "output": " HDFS Setup\n\n\nDriverless AI lets you explore HDFS data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``hdfs_config_path`` (Required): The location of the HDFS config folder."
+ },
+ {
+ "output": " - ``hdfs_auth_type`` (Required): Specifies the HDFS authentication. Available values are:\n\n - ``principal``: Authenticate with HDFS with a principal user."
+ },
+ {
+ "output": " If running DAI as a service, then the Kerberos keytab needs to be owned by the DAI user. - ``keytabimpersonation``: Login with impersonation using a keytab."
+ },
+ {
+ "output": " - ``key_tab_path``: The path of the principal key tab file. This is required when ``hdfs_auth_type='principal'``."
+ },
+ {
+ "output": " This is required when ``hdfs_auth_type='keytab'``. - ``hdfs_app_jvm_args``: JVM args for HDFS distributions."
+ },
+ {
+ "output": " - ``-Djava.security.krb5.conf``\n - ``-Dsun.security.krb5.debug``\n - ``-Dlog4j.configuration``\n\n- ``hdfs_app_classpath``: The HDFS classpath."
+ },
+ {
+ "output": " For example:\n\n ::\n\n hdfs_app_supported_schemes = ['hdfs://', 'maprfs://', 'custom://']\n\n The following are the default values for this option."
+ },
+ {
+ "output": " - ``hdfs://``\n - ``maprfs://``\n - ``swift://``\n\n- ``hdfs_max_files_listed``: Specifies the maximum number of files that are viewable in the connector UI."
+ },
+ {
+ "output": " To view more files, increase the default value. - ``hdfs_init_path``: Specifies the starting HDFS path displayed in the UI of the HDFS browser."
+ },
+ {
+ "output": " This must be configured in order for data connectors to function properly. Example 1: Enable HDFS with No Authentication\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the HDFS data connector and disables HDFS authentication."
+ },
+ {
+ "output": " This lets you reference data stored in HDFS directly using the name node address, for example: ``hdfs://name.node/datasets/iris.csv``."
+ },
+ {
+ "output": " Note that this example enables HDFS with no authentication. 1. Configure the Driverless AI config.toml file."
+ },
+ {
+ "output": " Note that the procsy port, which defaults to 12347, also has to be changed. - ``enabled_file_systems = \"file, upload, hdfs\"``\n - ``procsy_ip = \"127.0.0.1\"``\n - ``procsy_port = 8080``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the HDFS data connector and disables HDFS authentication in the config.toml file."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " Note that the procsy port, which defaults to 12347, also has to be changed. ::\n\n # IP address and port of procsy process."
+ },
+ {
+ "output": " (jdbc_app_configs)\n # hive: Hive Connector, remember to configure Hive below. (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n 3."
+ },
+ {
+ "output": " Example 2: Enable HDFS with Keytab-Based Authentication\n~\n\nNotes: \n\n- If using Kerberos Authentication, then the time on the Driverless AI server must be in sync with Kerberos server."
+ },
+ {
+ "output": " - If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user; otherwise, Driverless AI cannot read the keytab, falls back to simple authentication, and therefore fails."
+ },
+ {
+ "output": " - Configures the environment variable ``DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER`` to reference a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Configures the option ``hdfs_app_principal_user`` to reference a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options. Note that the procsy port, which defaults to 12347, also has to be changed."
+ },
+ {
+ "output": " Mount the config.toml file into the Docker container. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n \n # IP address and port of procsy process. procsy_ip = \"127.0.0.1\"\n procsy_port = 8080\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n # HDFS connector\n # Auth type can be Principal/keytab/keytabPrincipal\n # Specify HDFS Auth Type, allowed options are:\n # noauth : No authentication needed\n # principal : Authenticate with HDFS with a principal user\n # keytab : Authenticate with a Key tab (recommended)\n # keytabimpersonation : Login with impersonation using a keytab\n hdfs_auth_type = \"keytab\"\n\n # Path of the principal key tab file\n key_tab_path = \"/tmp/\"\n\n # Kerberos app principal user (recommended)\n hdfs_app_principal_user = \"\"\n\n 3."
+ },
+ {
+ "output": " Example 3: Enable HDFS with Keytab-Based Impersonation\n\n\nNotes: \n\n- If using Kerberos, be sure that the Driverless AI time is synched with the Kerberos server."
+ },
+ {
+ "output": " - Logins are case sensitive when keytab-based impersonation is configured. .. tabs::\n .. group-tab:: Docker Image Installs\n\n This example:\n\n - Sets the authentication type to ``keytabimpersonation``."
+ },
+ {
+ "output": " - Configures the ``DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER`` variable, which references a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. Note that the procsy port, which defaults to 12347, also has to be changed."
+ },
+ {
+ "output": " Mount the config.toml file into the Docker container. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Sets the authentication type to ``keytabimpersonation``."
+ },
+ {
+ "output": " - Configures the ``hdfs_app_principal_user`` variable, which references a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # IP address and port of procsy process. procsy_ip = \"127.0.0.1\"\n procsy_port = 8080\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n # HDFS connector\n # Auth type can be Principal/keytab/keytabPrincipal\n # Specify HDFS Auth Type, allowed options are:\n # noauth : No authentication needed\n # principal : Authenticate with HDFS with a principal user\n # keytab : Authenticate with a Key tab (recommended)\n # keytabimpersonation : Login with impersonation using a keytab\n hdfs_auth_type = \"keytabimpersonation\"\n\n # Path of the principal key tab file\n key_tab_path = \"/tmp/\"\n\n # Kerberos app principal user (recommended)\n hdfs_app_principal_user = \"\"\n\n 3."
+ },
+ {
+ "output": " Specifying a Hadoop Platform\n\n\nThe following example shows how to build an H2O-3 Hadoop image and run Driverless AI."
+ },
+ {
+ "output": " Change the ``H2O_TARGET`` to specify a different platform. 1. Clone and then build H2O-3 for CDH 6.0."
+ },
+ {
+ "output": " Start H2O. .. code-block:: bash\n\n docker run -it --rm \\\n -v `pwd`:`pwd` \\\n -w `pwd` \\\n --entrypoint bash \\\n --network=host \\\n -p 8020:8020 \\\n docker.h2o.ai/cdh-6-w-hive \\\n -c 'sudo -E startup.sh && \\\n source /envs/h2o_env_python3.8/bin/activate && \\\n hadoop jar h2o-hadoop-3/h2o-cdh6.0-assembly/build/libs/h2odriver.jar -libjars \"$(cat /opt/hive-jars/hive-libjars)\" -n 1 -mapperXmx 2g -baseport 54445 -notify h2o_one_node -ea -disown && \\\n export CLOUD_IP=localhost && \\\n export CLOUD_PORT=54445 && \\\n make -f scripts/jenkins/Makefile.jenkins test-hadoop-smoke; \\\n bash'\n\n3."
+ },
+ {
+ "output": " .. code-block:: bash\n\n java -cp connectors/hdfs.jar ai.h2o.dai.connectors.HdfsConnector\n\n\n4. Verify the commands for ``ls`` and ``cp``, for example."
+ },
+ {
+ "output": " .. _running-docker-on-gce:\n\nInstall and Run in a Docker Container on Google Compute Engine\n\n\nThis section describes how to install and start Driverless AI from scratch using a Docker container in a Google Compute environment."
+ },
+ {
+ "output": " If you don't have an account, go to https://console.cloud.google.com/getting-started to create one."
+ },
+ {
+ "output": " Watch the installation video `here `__."
+ },
+ {
+ "output": " Before You Begin\n\n\nIf you are trying GCP for the first time and have just created an account, check your Google Compute Engine (GCE) resource quota limits."
+ },
+ {
+ "output": " You can change these settings to match your quota limit, or you can request more resources from GCP."
+ },
+ {
+ "output": " Installation Procedure\n\n\n1. In your browser, log in to the Google Compute Engine Console at https://console.cloud.google.com/."
+ },
+ {
+ "output": " In the left navigation panel, select Compute Engine > VM Instances. .. image:: ../images/gce_newvm_instance.png\n :align: center\n :height: 390\n :width: 400\n\n3."
+ },
+ {
+ "output": " .. image:: ../images/gce_create_instance.png\n :align: center\n\n4. Specify the following at a minimum:\n\n - A unique name for this instance."
+ },
+ {
+ "output": " Note that not all zones and user accounts can select zones with GPU instances. Refer to the following for information on how to add GPUs: https://cloud.google.com/compute/docs/gpus/."
+ },
+ {
+ "output": " Be sure to also increase the disk size of the OS image to be 64 GB. Click Create at the bottom of the form when you are done."
+ },
+ {
+ "output": " .. image:: ../images/gce_instance_settings.png\n :align: center\n :height: 446\n :width: 380\n\n5."
+ },
+ {
+ "output": " On the Google Cloud Platform left navigation panel, select VPC network > Firewall rules. Specify the following settings:\n\n - Specify a unique name and Description for this instance."
+ },
+ {
+ "output": " - Specify the Source IP ranges to be ``0.0.0.0/0``. - Under Protocols and Ports, select Specified protocols and ports and enter the following: ``tcp:12345``."
+ },
+ {
+ "output": " .. image:: ../images/gce_create_firewall_rule.png\n :align: center\n :height: 452\n :width: 477\n\n6."
+ },
+ {
+ "output": " .. image:: ../images/gce_ssh_in_browser.png\n :align: center\n\n7. H2O provides a script for you to run in your VM instance."
+ },
+ {
+ "output": " Copy one of the scripts below (depending on whether you are running GPUs or CPUs). Save the script as install.sh."
+ },
+ {
+ "output": " /etc/os-release;echo $ID$VERSION_ID)\n curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \\\n sudo tee /etc/apt/sources.list.d/nvidia-docker.list\n sudo apt-get update\n\n # Install nvidia-docker2 and reload the Docker daemon configuration\n sudo apt-get install -y nvidia-docker2\n\n .. code-block:: bash\n\n # SCRIPT FOR CPUs ONLY\n apt-get -y update \n apt-get -y --no-install-recommends install \\\n curl \\\n apt-utils \\\n python-software-properties \\\n software-properties-common\n\n add-apt-repository -y \"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\"\n curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - \n\n apt-get update \n apt-get install -y docker-ce\n\n\n8."
+ },
+ {
+ "output": " .. code-block:: bash\n\n chmod +x install.sh\n sudo ./install.sh\n\n9. In your user folder, create the following directories as your user."
+ },
+ {
+ "output": " Add your Google Compute user name to the Docker container. .. code-block:: bash\n\n sudo usermod -aG docker \n\n\n11."
+ },
+ {
+ "output": " .. code-block:: bash\n\n sudo reboot\n\n12. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/."
+ },
+ {
+ "output": " Load the Driverless AI Docker image. The following example shows how to load Driverless AI. Replace VERSION with your image."
+ },
+ {
+ "output": " If you are running CPUs, you can skip this step. Otherwise, you must enable persistence of the GPU."
+ },
+ {
+ "output": " Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your installed Docker version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " Note: Use ``docker version`` to check which version of Docker you are using. .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n16."
+ },
+ {
+ "output": " You can stop the instance using one of the following methods: \n\nStopping in the browser\n\n1. On the VM Instances page, click on the VM instance that you want to stop."
+ },
+ {
+ "output": " Click Stop at the top of the page. 3. A confirmation page will display. Click Stop to stop the instance."
+ },
+ {
+ "output": " Azure Blob Store Setup\n \n\nDriverless AI lets you explore Azure Blob Store data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Supported Data Sources Using the Azure Blob Store Connector\n~\n\nThe following data sources can be used with the Azure Blob Store connector."
+ },
+ {
+ "output": " - :ref:`Azure Data Lake Gen 1 (HDFS connector required)`\n- :ref:`Azure Data Lake Gen 2 (HDFS connector optional)`\n\n\nDescription of Configuration Attributes\n~\n\nThe following configuration attributes are specific to enabling Azure Blob Storage."
+ },
+ {
+ "output": " This should be the DNS prefix created when the account was created (for example, \"mystorage\"). - ``azure_blob_account_key``: Specify the account key that maps to your account name."
+ },
+ {
+ "output": " With this option, you can include an override for a host, port, and/or account name. For example, \n\n .. code:: bash\n\n azure_connection_string = \"DefaultEndpointsProtocol=http;AccountName=;AccountKey=;BlobEndpoint=http://:/;\"\n\n- ``azure_blob_init_path``: Specifies the starting Azure Blob store path displayed in the UI of the Azure Blob store browser."
+ },
+ {
+ "output": " This must be configured in order for data connectors to function properly. The following additional configuration attributes can be used for enabling an HDFS Connector to connect to Azure Data Lake Gen 1 (and optionally with Azure Data Lake Gen 2)."
+ },
+ {
+ "output": " This folder can contain multiple config files. - ``hdfs_app_classpath``: The HDFS classpath. - ``hdfs_app_supported_schemes``: The list of supported schemes, used as an initial check to ensure valid input to the connector."
+ },
+ {
+ "output": " This lets users reference data stored on your Azure storage account using the account name, for example: ``https://mystorage.blob.core.windows.net``."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, azrbs\"``\n - ``azure_blob_account_name = \"mystorage\"``\n - ``azure_blob_account_key = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example shows how to enable the Azure Blob Store data connector in the config.toml file when starting Driverless AI in native installs."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, azrbs\"\n\n # Azure Blob Store Connector credentials\n azure_blob_account_name = \"mystorage\"\n azure_blob_account_key = \"\"\n\n 3."
+ },
+ {
+ "output": " .. _example2:\n\nExample 2: Mount Azure File Shares to the Local File System\n~\n\nSupported Data Sources Using the Local File System\n\n\n- Azure Files (File Storage) \n\nMounting Azure File Shares\n\n\nAzure file shares can be mounted into the Local File system of Driverless AI."
+ },
+ {
+ "output": " .. _example3:\n\nExample 3: Enable HDFS Connector to Connect to Azure Data Lake Gen 1\n~\n\nThis example enables the HDFS Connector to connect to Azure Data Lake Gen1."
+ },
+ {
+ "output": " .. tabs::\n .. group-tab:: Docker Image with the config.toml\n\n 1. Create an Azure AD web application for service-to-service authentication: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory\n\n 2."
+ },
+ {
+ "output": " Take note of the Hadoop Classpath and add the ``azure-datalake-store.jar`` file. This file can be found in any Hadoop version at: ``$HADOOP_HOME/share/hadoop/tools/lib/*``."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options: \n\n .. code:: bash\n\n enabled_file_systems = \"upload, file, hdfs, azrbs, recipe_file, recipe_url\"\n hdfs_config_path = \"/path/to/hadoop/conf\"\n hdfs_app_classpath = \"/hadoop/classpath/\"\n hdfs_app_supported_schemes = \"['adl://']\"\n \n 5."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n 1."
+ },
+ {
+ "output": " https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory\n\n 2."
+ },
+ {
+ "output": " Take note of the Hadoop Classpath and add the ``azure-datalake-store.jar`` file. This file can be found in any Hadoop version at: ``$HADOOP_HOME/share/hadoop/tools/lib/*``\n\n .. code:: bash \n \n echo \"$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*\"\n\n 4."
+ },
+ {
+ "output": " Set the following configuration options: \n\n .. code:: bash\n\n enabled_file_systems = \"upload, file, hdfs, azrbs, recipe_file, recipe_url\"\n hdfs_config_path = \"/path/to/hadoop/conf\"\n hdfs_app_classpath = \"/hadoop/classpath/\"\n hdfs_app_supported_schemes = \"['adl://']\"\n \n 5."
+ },
+ {
+ "output": " .. _example4:\n\nExample 4: Enable HDFS Connector to Connect to Azure Data Lake Gen 2\n\n\nThis example enables the HDFS Connector to connect to Azure Data Lake Gen2."
+ },
+ {
+ "output": " .. tabs::\n .. group-tab:: Docker Image with the config.toml\n\n 1. Create an Azure Service Principal: https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal\n\n 2."
+ },
+ {
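+ "output": " Add the information from your web application to the Hadoop ``core-site.xml`` configuration file:\n\n .. code:: bash\n\n <configuration>\n <property>\n <name>fs.azure.account.auth.type</name>\n <value>OAuth</value>\n </property>\n <property>\n <name>fs.azure.account.oauth.provider.type</name>\n <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.endpoint</name>\n <value>Token endpoint created in step 1.</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.id</name>\n <value>Client ID created in step 1</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.secret</name>\n <value>Client Secret created in step 1</value>\n </property>\n </configuration>\n\n 4."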
+ },
+ {
+ "output": " These files can be found in any Hadoop version 3.2 or higher at: ``$HADOOP_HOME/share/hadoop/tools/lib/*``\n\n .. code:: bash \n\n echo \"$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*\"\n \n Note: ABFS is only supported for Hadoop version 3.2 or higher."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options: \n\n .. code:: bash\n\n enabled_file_systems = \"upload, file, hdfs, azrbs, recipe_file, recipe_url\"\n hdfs_config_path = \"/path/to/hadoop/conf\"\n hdfs_app_classpath = \"/hadoop/classpath/\"\n hdfs_app_supported_schemes = \"['abfs://']\"\n \n 6."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n \n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n 1."
+ },
+ {
+ "output": " https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal\n\n 2."
+ },
+ {
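+ "output": " Add the information from your web application to the Hadoop ``core-site.xml`` configuration file:\n\n .. code:: bash\n\n <configuration>\n <property>\n <name>fs.azure.account.auth.type</name>\n <value>OAuth</value>\n </property>\n <property>\n <name>fs.azure.account.oauth.provider.type</name>\n <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.endpoint</name>\n <value>Token endpoint created in step 1.</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.id</name>\n <value>Client ID created in step 1</value>\n </property>\n <property>\n <name>fs.azure.account.oauth2.client.secret</name>\n <value>Client Secret created in step 1</value>\n </property>\n </configuration>\n\n 4."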
+ },
+ {
+ "output": " These files can be found in any Hadoop version 3.2 or higher at: ``$HADOOP_HOME/share/hadoop/tools/lib/*``\n\n .. code:: bash \n \n echo \"$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*\"\n \n Note: ABFS is only supported for Hadoop version 3.2 or higher.\n\n 5."
+ },
+ {
+ "output": " Set the following configuration options: \n\n .. code:: bash\n \n enabled_file_systems = \"upload, file, hdfs, azrbs, recipe_file, recipe_url\"\n hdfs_config_path = \"/path/to/hadoop/conf\"\n hdfs_app_classpath = \"/hadoop/classpath/\"\n hdfs_app_supported_schemes = \"['abfs://']\"\n \n 6."
+ },
+ {
+ "output": " Export MOJO artifact to Azure Blob Storage\n\n\nIn order to export the MOJO artifact to Azure Blob Storage, you must enable support for the shared access signatures (SAS) token."
+ },
+ {
+ "output": " ``enable_artifacts_upload=true``\n2. ``artifacts_store=\"azure\"``\n3. ``artifacts_azure_sas_token=\"token\"``\n\nFor instructions on exporting artifacts, see :ref:`export_artifacts`."
+ },
+ {
+ "output": " Yes. Driverless AI can use private endpoints if Driverless AI is located in the allowed VNET. Does Driverless AI support secure transfer?"
+ },
+ {
+ "output": " The Azure Blob Store Connector makes all connections over HTTPS. Does Driverless AI support hierarchical namespaces?"
+ },
+ {
+ "output": " Can I use Azure Managed Identities (MSI) to access the DataLake? Yes, if Driverless AI is running on an Azure VM with managed identities."
+ },
+ {
+ "output": " .. _recipes-settings:\n\nRecipes Settings\n\n\n.. _included_transformers:\n\n``included_transformers``\n\n\n.. dropdown:: Include Specific Transformers\n\t:open:\n\n\tSelect the :ref:`transformer(s) ` that you want to use in the experiment."
+ },
+ {
+ "output": " Note: If you uncheck all transformers so that none is selected, Driverless AI will ignore this and will use the default list of transformers for that experiment."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``included_transformers``. .. _included_models:\n\n``included_models``\n~\n\n.. dropdown:: Include Specific Models\n\t:open:\n\n\tSpecify the types of models that you want Driverless AI to build in the experiment."
+ },
+ {
+ "output": " Note: The ImbalancedLightGBM and ImbalancedXGBoostGBM models are closely tied with the :ref:`sampling_method_for_imbalanced` option."
+ },
+ {
+ "output": " If the target fraction proves to be above the allowed imbalance threshold, then sampling will be triggered."
+ },
+ {
+ "output": " - If the ImbalancedLightGBM and/or ImbalancedXGBoostGBM models are ENABLED and the :ref:`sampling_method_for_imbalanced` is DISABLED, sampling will not be used, and these imbalanced models will be disabled."
+ },
+ {
+ "output": " .. _included_pretransformers:\n\n``included_pretransformers``\n\n\n.. dropdown:: Include Specific Preprocessing Transformers\n\t:open:\n\n\tSpecify which :ref:`transformers ` to use for preprocessing before other transformers are activated."
+ },
+ {
+ "output": " Notes:\n\n\t- Preprocessing transformers and all other layers of transformers are part of the Python and (if applicable) MOJO scoring packages."
+ },
+ {
+ "output": " For example, a preprocessing transformer can perform interactions, string concatenations, or date extractions as a preprocessing step before the next layer of Date and DateTime transformations are performed."
+ },
+ {
+ "output": " However, one can use a run-time data recipe to (for example) convert a float date-time into a string date-time, and this will be used by Driverless AI's Date and DateTime transformers as well as by auto-detection of time series."
+ },
+ {
+ "output": " the dataset\n\t must have time column and groups prepared ahead of experiment by user or via a one-time :ref:`data recipe `."
+ },
+ {
+ "output": " .. _num_pipeline_layers:\n\n``num_pipeline_layers``\n~\n\n.. dropdown:: Number of Pipeline Layers\n\t:open:\n\n\tSpecify the number of pipeline layers."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``num_pipeline_layers``. Note: This does not include the preprocessing layer specified by the :ref:`included_pretransformers` expert setting."
+ },
+ {
+ "output": " This avoids the need for a separate data preparation step by building data preparation into the experiment and the Python scoring package."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``included_datas``. .. _included_individuals:\n\n``included_individuals``\n\n\n.. dropdown:: Include Specific Individuals\n\t:open:\n\n\tIn Driverless AI, every completed experiment automatically generates Python code for the experiment that corresponds to the individual(s) used to build the final model."
+ },
+ {
+ "output": " This feature gives you code-first access to a significant portion of DAI's internal transformer and model generation process."
+ },
+ {
+ "output": " - Select recipe display names of custom individuals through the UI. If the number of included custom individuals is less than DAI needs, then the remaining individuals are freshly generated."
+ },
+ {
+ "output": " For more information, see :ref:`individual_recipe`. ``threshold_scorer``\n\n\n.. dropdown:: Scorer to Optimize Threshold to Be Used in Other Confusion-Matrix Based Scorers (For Binary Classification)\n\t:open:\n\n\tSpecify the scorer used to optimize the binary probability threshold that is being used in related Confusion Matrix based scorers such as Precision, Recall, FalsePositiveRate, FalseDiscoveryRate, FalseOmissionRate, TrueNegativeRate, FalseNegativeRate, and NegativePredictiveValue."
+ },
+ {
+ "output": " If this is not possible, F1 is used. - F05: More weight on precision, less weight on recall. - F1: Equal weight on precision and recall."
+ },
+ {
+ "output": " - MCC: Use this option when all classes are equally important. ``prob_add_genes``\n\n\n.. dropdown:: Probability to Add Transformers\n\t:open:\n\n\tSpecify the unnormalized probability to add genes or instances of transformers with specific attributes."
+ },
+ {
+ "output": " This value defaults to 0.5. ``prob_addbest_genes``\n\n\n.. dropdown:: Probability to Add Best Shared Transformers\n\t:open:\n\n\tSpecify the unnormalized probability to add genes or instances of transformers with specific attributes that have been shown to be beneficial to other individuals within the population."
+ },
+ {
+ "output": " ``prob_prune_genes``\n\n\n.. dropdown:: Probability to Prune Transformers\n\t:open:\n\n\tSpecify the unnormalized probability to prune genes or instances of transformers with specific attributes."
+ },
+ {
+ "output": " ``prob_perturb_xgb``\n\n\n.. dropdown:: Probability to Mutate Model Parameters\n\t:open:\n\n\tSpecify the unnormalized probability to change model hyperparameters."
+ },
+ {
+ "output": " ``prob_prune_by_features``\n\n\n.. dropdown:: Probability to Prune Weak Features\n\t:open:\n\n\tSpecify the unnormalized probability to prune features that have low variable importance instead of pruning entire instances of genes/transformers."
+ },
+ {
+ "output": " ``skip_transformer_failures``\n~\n\n.. dropdown:: Whether to Skip Failures of Transformers\n\t:open:\n\n\tSpecify whether to avoid failed transformers."
+ },
+ {
+ "output": " ``skip_model_failures``\n~\n\n.. dropdown:: Whether to Skip Failures of Models\n\t:open:\n\n\tSpecify whether to avoid failed models."
+ },
+ {
+ "output": " This is enabled by default. ``detailed_skip_failure_messages_level``\n\n\n.. dropdown:: Level to Log for Skipped Failures\n\t:open:\n\n\tSpecify one of the following levels for the verbosity of log failure messages for skipped transformers or models:\n\n\t- 0 = Log simple message\n\t- 1 = Log code line plus message (Default)\n\t- 2 = Log detailed stack traces\n\n``notify_failures``\n~\n\n.. dropdown:: Whether to Notify About Failures of Transformers or Models or Other Recipe Failures\n\t:open:\n\n\tSpecify whether to display notifications in the GUI about recipe failures."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``notify_failures``. ``acceptance_test_timeout``\n~\n\n.. dropdown:: Timeout in Minutes for Testing Acceptance of Each Recipe\n\t:open:\n\n\tSpecify the number of minutes to wait until a recipe's acceptance testing is aborted."
+ },
+ {
+ "output": " .. _install-gcp-offering:\n\nInstall the Google Cloud Platform Offering\n\n\nThis section describes how to install and start Driverless AI in a Google Compute environment using the GCP Marketplace."
+ },
+ {
+ "output": " If you don't have an account, go to https://console.cloud.google.com/getting-started to create one."
+ },
+ {
+ "output": " By default, GCP allocates a maximum of 8 CPUs and no GPUs. Our default recommendation for launching Driverless AI is 32 CPUs, 120 GB RAM, and 2 P100 NVIDIA GPUs."
+ },
+ {
+ "output": " Refer to https://cloud.google.com/compute/quotas for more information, including information on how to check your quota and request additional quota."
+ },
+ {
+ "output": " In your browser, log in to the Google Compute Engine Console at https://console.cloud.google.com/. 2."
+ },
+ {
+ "output": " .. image:: ../images/google_cloud_launcher.png\n :align: center\n :height: 266\n :width: 355\n\n3."
+ },
+ {
+ "output": " The following page will display. .. image:: ../images/google_driverlessai_offering.png\n :align: center\n\n4."
+ },
+ {
+ "output": " (If necessary, refer to `Google Compute Instance Types `__ for information about machine and GPU types.)"
+ },
+ {
+ "output": " (This defaults to 32 CPUs and 120 GB RAM.) - Specify a GPU type. (This defaults to a p100 GPU.) - Optionally change the number of GPUs."
+ },
+ {
+ "output": " - Specify the boot disk type and size. - Optionally change the network name and subnetwork names. Be sure that whichever network you specify has port 12345 exposed."
+ },
+ {
+ "output": " Driverless AI will begin deploying. Note that this can take several minutes. .. image:: ../images/google_deploy_compute_engine.png\n :align: center\n\n5."
+ },
+ {
+ "output": " This page includes the instance ID and the username (always h2oai) and password that will be required when starting Driverless AI."
+ },
+ {
+ "output": " .. image:: ../images/google_deploy_summary.png\n :align: center\n\n6. In your browser, go to https://[External_IP]:12345 to start Driverless AI."
+ },
+ {
+ "output": " Agree to the Terms and Conditions. 8. Log in to Driverless AI using your user name and password. 9."
+ },
+ {
+ "output": " a. In order to enable GCS and Google BigQuery access, you must pass the running instance a service account JSON file configured with GCS and GBQ access."
+ },
+ {
+ "output": " Obtain a functioning service account JSON file from `GCP `__, rename it to \"service_account.json\", and copy it to the Ubuntu user on the running instance."
+ },
+ {
+ "output": " c. Restart the machine for the changes to take effect. .. code-block:: bash\n\n sudo systemctl stop dai\n\n # Wait for the system to stop\n\n # Verify that the system is no longer running\n sudo systemctl status dai\n\n # Restart the system\n sudo systemctl start dai\n\nUpgrading the Google Cloud Platform Offering\n\n\nPerform the following steps to upgrade the Driverless AI Google Platform offering."
+ },
+ {
+ "output": " Note that this upgrade process inherits the service user and group from /etc/dai/User.conf and /etc/dai/Group.conf."
+ },
+ {
+ "output": " .. code-block:: bash\n\n # Stop Driverless AI.\n sudo systemctl stop dai\n\n # Make a backup of the /opt/h2oai/dai/tmp directory at this time."
+ },
+ {
+ "output": " .. _time-series-settings:\n\nTime Series Settings\n\n\n.. _time-series-lag-based-recipe:\n\n``time_series_recipe``\n\n.. dropdown:: Time-Series Lag-Based Recipe\n\t:open:\n\n\tThis recipe specifies whether to include Time Series lag features when training a model with a provided (or autodetected) time column."
+ },
+ {
+ "output": " Lag features are the primary automatically generated time series features and represent a variable's past values."
+ },
+ {
+ "output": " For example, if today's sales are 300 and yesterday's sales were 250, then the one-day lag for sales is 250."
+ },
+ {
+ "output": " Lagging variables are important in time series because knowing what happened in different time periods in the past can greatly facilitate predictions for the future."
+ },
+ {
+ "output": " Ensembling is also disabled if a time column is selected or if the time column is set to [Auto] on the experiment setup screen."
+ },
+ {
+ "output": " .. figure:: ../images/time_series_lag.png\n\t :alt: Lag\n\n``time_series_leaderboard_mode``\n\n.. dropdown:: Control the automatic time-series leaderboard mode\n\t:open:\n\n\tSelect from the following options:\n\n - 'diverse': explore a diverse set of models built using various expert settings."
+ },
+ {
+ "output": " - 'sliding_window': If the forecast horizon is N periods, create a separate model for each of the (gap, horizon) pairs of (0,n), (n,n), (2*n,n), ..., (2*N-1, n) in units of time periods."
+ },
+ {
+ "output": " This can help to improve short-term forecasting quality. ``time_series_leaderboard_periods_per_model``\n~\n.. dropdown:: Number of periods per model if time_series_leaderboard_mode is 'sliding_window'\n\t:open:\n\n\tSpecify the number of periods per model if ``time_series_leaderboard_mode`` is set to ``sliding_window``."
+ },
+ {
+ "output": " .. _time_series_merge_splits:\n\n``time_series_merge_splits``\n\n.. dropdown:: Larger Validation Splits for Lag-Based Recipe\n\t:open:\n\n\tSpecify whether to create larger validation splits that are not bound to the length of the forecast horizon."
+ },
+ {
+ "output": " This is enabled by default. ``merge_splits_max_valid_ratio``\n\n.. dropdown:: Maximum Ratio of Training Data Samples Used for Validation\n\t:open:\n\n\tSpecify the maximum ratio of training data samples used for validation across splits when larger validation splits are created (see :ref:`time_series_merge_splits` setting)."
+ },
+ {
+ "output": " .. _fixed_size_splits:\n\n``fixed_size_splits``\n~\n.. dropdown:: Fixed-Size Train Timespan Across Splits\n\t:open:\n\n\tSpecify whether to keep a fixed-size train timespan across time-based splits during internal validation."
+ },
+ {
+ "output": " This is disabled by default. ``time_series_validation_fold_split_datetime_boundaries``\n~\n.. dropdown:: Custom Validation Splits for Time-Series Experiments\n\t:open:\n\n\tSpecify date or datetime timestamps (in the same format as the time column) to use for custom training and validation splits."
+ },
+ {
+ "output": " This value defaults to 30. .. _holiday-calendar:\n\n``holiday_features``\n\n.. dropdown:: Generate Holiday Features\n\t:open:\n\n\tFor time-series experiments, specify whether to generate holiday features for the experiment."
+ },
+ {
+ "output": " ``holiday_countries``\n~\n.. dropdown:: Country code(s) for holiday features\n\t:open:\n\n\tSpecify country codes in the form of a list that is used to look up holidays."
+ },
+ {
+ "output": " ``override_lag_sizes``\n\n.. dropdown:: Time-Series Lags Override\n\t:open:\n\n\tSpecify the override lags to be used."
+ },
+ {
+ "output": " The following examples show the variety of different methods that can be used to specify override lags:\n\n\t- \"[0]\" disable lags\n\t- \"[7, 14, 21]\" specifies this exact list\n\t- \"21\" specifies every value from 1 to 21\n\t- \"21:3\" specifies every value from 1 to 21 in steps of 3\n\t- \"5-21\" specifies every value from 5 to 21\n\t- \"5-21:3\" specifies every value from 5 to 21 in steps of 3\n\n``override_ufapt_lag_sizes``\n\n.. dropdown:: Lags Override for Features That are not Known Ahead of Time\n\t:open:\n\n\tSpecify lags override for non-target features that are not known ahead of time."
+ },
+ {
+ "output": " - \"[0]\" disable lags\n\t- \"[7, 14, 21]\" specifies this exact list\n\t- \"21\" specifies every value from 1 to 21\n\t- \"21:3\" specifies every value from 1 to 21 in steps of 3\n\t- \"5-21\" specifies every value from 5 to 21\n\t- \"5-21:3\" specifies every value from 5 to 21 in steps of 3\n\n``min_lag_size``\n\n.. dropdown:: Smallest Considered Lag Size\n\t:open:\n\n\tSpecify a minimum considered lag size."
+ },
+ {
+ "output": " ``allow_time_column_as_feature``\n\n.. dropdown:: Enable Feature Engineering from Time Column\n\t:open:\n\n\tSpecify whether to enable feature engineering based on the selected time column, e.g."
+ },
+ {
+ "output": " This is enabled by default. ``allow_time_column_as_numeric_feature``\n\n.. dropdown:: Allow Integer Time Column as Numeric Feature\n\t:open:\n\n\tSpecify whether to enable feature engineering from an integer time column."
+ },
+ {
+ "output": " This is disabled by default. ``datetime_funcs``\n\n.. dropdown:: Allowed Date and Date-Time Transformations\n\t:open:\n\n\tSpecify the date or date-time transformations to allow Driverless AI to use."
+ },
+ {
+ "output": " Note that ``get_num`` can lead to overfitting if used on IID problems and is disabled by default. .. _filter_datetime_funcs:\n\n``filter_datetime_funcs``\n~\n.. dropdown:: Auto Filtering of Date and Date-Time Transformations\n\t:open:\n\n\tWhether to automatically filter out date and date-time transformations that would lead to unseen values in the future."
+ },
+ {
+ "output": " ``allow_tgc_as_features``\n~\n.. dropdown:: Consider Time Groups Columns as Standalone Features\n\t:open:\n\n\tSpecify whether to consider time groups columns as standalone features."
+ },
+ {
+ "output": " ``allowed_coltypes_for_tgc_as_features``\n\n.. dropdown:: Which TGC Feature Types to Consider as Standalone Features\n\t:open:\n\n\tSpecify whether to consider time groups columns (TGC) as standalone features."
+ },
+ {
+ "output": " Available types are numeric, categorical, ohe_categorical, datetime, date, and text. All types are selected by default."
+ },
+ {
+ "output": " Also note that if \"Time Series Lag-Based Recipe\" is disabled, then all time group columns are allowed features."
+ },
+ {
+ "output": " This is set to Auto by default. ``tgc_only_use_all_groups``\n~\n.. dropdown:: Always Group by All Time Groups Columns for Creating Lag Features\n\t:open:\n\n\tSpecify whether to group by all time groups columns for creating lag features, instead of sampling from them."
+ },
+ {
+ "output": " ``tgc_allow_target_encoding``\n~\n.. dropdown:: Allow Target Encoding of Time Groups Columns\n\t:open:\n\n\tSpecify whether it is allowed to target encode the time groups columns."
+ },
+ {
+ "output": " Notes:\n\n\t- This setting is not affected by ``allow_tgc_as_features``. - Subgroups can be encoded by disabling ``tgc_only_use_all_groups``."
+ },
+ {
+ "output": " This is enabled by default. This can be useful for MLI, but it will slow down the experiment considerably when enabled."
+ },
+ {
+ "output": " ``time_series_validation_splits``\n~\n.. dropdown:: Number of Time-Based Splits for Internal Model Validation\n\t:open:\n\n\tSpecify a fixed number of time-based splits for internal model validation."
+ },
+ {
+ "output": " This value defaults to -1 (auto). ``time_series_splits_max_overlap``\n\n.. dropdown:: Maximum Overlap Between Two Time-Based Splits\n\t:open:\n\n\tSpecify the maximum overlap between two time-based splits."
+ },
+ {
+ "output": " This value defaults to 0.5. ``time_series_max_holdout_splits``\n\n.. dropdown:: Maximum Number of Splits Used for Creating Final Time-Series Model's Holdout Predictions\n\t:open:\n\n\tSpecify the maximum number of splits used for creating the final time-series Model's holdout predictions."
+ },
+ {
+ "output": " Use ``time_series_validation_splits`` to control the number of time-based splits used for model validation."
+ },
+ {
+ "output": " This setting is used for MLI and calculating metrics. Note that predictions can be slightly less accurate when this setting is enabled."
+ },
+ {
+ "output": " ``mli_ts_fast_approx_contribs``\n~\n.. dropdown:: Whether to Speed up Calculation of Shapley Values for Time-Series Holdout Predictions\n\t:open:\n\n\tSpecify whether to speed up Shapley values for time-series holdout predictions for back-testing on training data."
+ },
+ {
+ "output": " Note that predictions can be slightly less accurate when this setting is enabled. This is enabled by default."
+ },
+ {
+ "output": " This can be useful for MLI, but it can slow down the experiment when enabled. If this setting is disabled, MLI will generate Shapley values on demand."
+ },
+ {
+ "output": " ``time_series_min_interpretability``\n\n.. dropdown:: Lower Limit on Interpretability Setting for Time-Series Experiments (Implicitly Enforced)\n\t:open:\n\n\tSpecify the lower limit on interpretability setting for time-series experiments."
+ },
+ {
+ "output": " To disable this setting, set this value to 1. ``lags_dropout``\n\n.. dropdown:: Dropout Mode for Lag Features\n\t:open:\n\n\tSpecify the dropout mode for lag features in order to achieve an equal n.a."
+ },
+ {
+ "output": " Independent mode performs a simple feature-wise dropout. Dependent mode takes the lag-size dependencies per sample/row into account."
+ },
+ {
+ "output": " ``prob_lag_non_targets``\n\n.. dropdown:: Probability to Create Non-Target Lag Features\n\t:open:\n\n\tLags can be created on any feature as well as on the target."
+ },
+ {
+ "output": " This value defaults to 0.1. .. _rolling-test-set-method:\n\n``rolling_test_method``\n~\n.. dropdown:: Method to Create Rolling Test Set Predictions\n\t:open:\n\n\tSpecify the method used to create rolling test set predictions."
+ },
+ {
+ "output": " TTA is enabled by default. Notes: \n\t\n\t- This setting only applies to the test set that is provided by the user during an experiment."
+ },
+ {
+ "output": " ``fast_tta_internal``\n~\n.. dropdown:: Fast TTA for Internal Validation\n\t:open:\n\n\tSpecify whether the genetic algorithm applies Test Time Augmentation (TTA) in one pass instead of using rolling windows for validation splits longer than the forecast horizon."
+ },
+ {
+ "output": " ``prob_default_lags``\n~\n.. dropdown:: Probability for New Time-Series Transformers to Use Default Lags\n\t:open:\n\n\tSpecify the probability for new lags or the EWMA gene to use default lags."
+ },
+ {
+ "output": " This value defaults to 0.2. ``prob_lagsinteraction``\n\n.. dropdown:: Probability of Exploring Interaction-Based Lag Transformers\n\t:open:\n\n\tSpecify the unnormalized probability of choosing other lag time-series transformers based on interactions."
+ },
+ {
+ "output": " ``prob_lagsaggregates``\n~\n.. dropdown:: Probability of Exploring Aggregation-Based Lag Transformers\n\t:open:\n\n\tSpecify the unnormalized probability of choosing other lag time-series transformers based on aggregations."
+ },
+ {
+ "output": " .. _centering-detrending:\n\n``ts_target_trafo``\n~\n.. dropdown:: Time Series Centering or Detrending Transformation\n\t:open:\n\n\tSpecify whether to use centering or detrending transformation for time series experiments."
+ },
+ {
+ "output": " Linear or Logistic will remove the fitted linear or logistic trend, Centering will only remove the mean of the target signal, and Epidemic will remove the signal specified by a `Susceptible-Exposed-Infected-Recovered-Dead `_ (SEIRD) epidemic model."
+ },
+ {
+ "output": " Notes:\n\n\t- MOJO support is currently disabled when this setting is enabled. - The Fast centering and linear detrending options use least squares fitting."
+ },
+ {
+ "output": " outliers. - Please see (:ref:`Custom Bounds for SEIRD Epidemic Model Parameters `) for further details on how to customize the bounds of the free SEIRD parameters."
+ },
+ {
+ "output": " The target column must correspond to *I(t)*, which represents infection cases as a function of time."
+ },
+ {
+ "output": " The model's value is then subtracted from the training response, and the residuals are passed to the feature engineering and modeling pipeline."
+ },
+ {
+ "output": " The following is a list of free parameters:\n\n\t- N: Total population, *N = S+E+I+R+D*\n\t- beta: Rate of exposure (*S* -> *E*)\n\t- gamma: Rate of recovering (*I* -> *R*)\n\t- delta: Incubation period\n\t- alpha: Fatality rate\n\t- rho: Rate at which individuals expire\n\t- lockdown: Day of lockdown (-1 => no lockdown)\n\t- beta_decay: Beta decay due to lockdown\n\t- beta_decay_rate: Speed of beta decay\n\n\tProvide upper or lower bounds for each parameter you want to control."
+ },
+ {
+ "output": " For example:\n\n\t::\n\n\t ts_target_trafo_epidemic_params_dict=\"{'N_min': 1000, 'beta_max': 0.2}\"\n\n\tRefer to https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology and https://arxiv.org/abs/1411.3435 for more information on the SEIRD model."
+ },
+ {
+ "output": " To get the SEIR model, set ``alpha_min=alpha_max=rho_min=rho_max=beta_decay_rate_min=beta_decay_rate_max=0`` and ``lockdown_min=lockdown_max=-1``."
+ },
+ {
+ "output": " Select from the following:\n\n\t- I (Default): Infected\n\t- R: Recovered\n\t- D: Deceased\n\n.. _ts-target-transformation:\n\n``ts_lag_target_trafo``\n~\n.. dropdown:: Time Series Lag-Based Target Transformation\n\t:open:\n\n\tSpecify whether to use either the difference between or ratio of the current target and a lagged target."
+ },
+ {
+ "output": " Notes:\n\n\t- MOJO support is currently disabled when this setting is enabled. - The corresponding lag size is specified with the ``ts_target_trafo_lag_size`` expert setting."
+ },
+ {
+ "output": " .. _install-on-aws:\n\nInstall on AWS\n\n\nDriverless AI can be installed on Amazon AWS using the AWS Marketplace AMI or the AWS Community AMI."
+ },
+ {
+ "output": " Google Cloud Storage Setup\n\n\nDriverless AI lets you explore Google Cloud Storage data sources from within the Driverless AI application."
+ },
+ {
+ "output": " This setup requires you to enable authentication. If you enable GCS or GBP connectors, those file systems will be available in the UI, but you will not be able to use those connectors without authentication."
+ },
+ {
+ "output": " Obtain a JSON authentication file from `GCP `__."
+ },
+ {
+ "output": " Mount the JSON file to the Docker instance. 3. Specify the path to the /json_auth_file.json in the gcs_path_to_service_account_json config option."
+ },
+ {
+ "output": " You can be provided a JSON file that contains both Google Cloud Storage and Google BigQuery authentications, just one or the other, or none at all."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Description of Configuration Attributes\n'\n\n- ``gcs_path_to_service_account_json``: Specifies the path to the /json_auth_file.json file."
+ },
+ {
+ "output": " Start GCS with Authentication\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the GCS data connector with authentication by passing the JSON authentication file."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,gcs\" \\\n -e DRIVERLESS_AI_GCS_PATH_TO_SERVICE_ACCOUNT_JSON=\"/service_account_json.json\" \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n -v `pwd`/service_account_json.json:/service_account_json.json \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure the GCS data connector options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, gcs\"``\n - ``gcs_path_to_service_account_json = \"/service_account_json.json\"`` \n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the GCS data connector with authentication by passing the JSON authentication file."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, gcs\"\n\n # GCS Connector credentials\n # example (suggested) \"/licenses/my_service_account_json.json\"\n gcs_path_to_service_account_json = \"/service_account_json.json\"\n\n 3."
+ },
+ {
+ "output": " .. _model-settings:\n\nModel Settings\n\n\n``enable_constant_model``\n~\n.. dropdown:: Constant Models\n\t:open:\n\n\tSpecify whether to enable :ref:`constant models `."
+ },
+ {
+ "output": " ``enable_decision_tree``\n\n.. dropdown:: Decision Tree Models\n\t:open:\n\n\tSpecify whether to build Decision Tree models as part of the experiment."
+ },
+ {
+ "output": " In this case, Driverless AI will build Decision Tree models if interpretability is greater than or equal to the value of ``decision_tree_interpretability_switch`` (which defaults to 7) and accuracy is less than or equal to ``decision_tree_accuracy_switch`` (which defaults to 7)."
+ },
+ {
+ "output": " GLMs are very interpretable models with one coefficient per feature, an intercept term and a link function."
+ },
+ {
+ "output": " ``enable_xgboost_gbm``\n\n.. dropdown:: XGBoost GBM Models\n\t:open:\n\n\tSpecify whether to build XGBoost models as part of the experiment (for both the feature engineering part and the final model)."
+ },
+ {
+ "output": " This is set to Auto by default. In this case, Driverless AI will use XGBoost unless the number of rows * columns is greater than a threshold."
+ },
+ {
+ "output": " ``enable_lightgbm``\n~\n.. dropdown:: LightGBM Models\n\t:open:\n\n\tSpecify whether to build LightGBM models as part of the experiment."
+ },
+ {
+ "output": " This is set to Auto (enabled) by default. ``enable_xgboost_dart``\n~\n.. dropdown:: XGBoost Dart Models\n\t:open:\n\n\tSpecify whether to use XGBoost's Dart method when building models for experiment (for both the feature engineering part and the final model)."
+ },
+ {
+ "output": " .. _enable_xgboost_rapids:\n\n``enable_xgboost_rapids``\n~\n.. dropdown:: Enable RAPIDS-cuDF extensions to XGBoost GBM/Dart\n\t:open:\n\n\tSpecify whether to enable RAPIDS extensions to XGBoost GBM/Dart."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_xgboost_rapids`` and the default value is False. This is disabled for Dask multinode models due to a bug in dask_cudf and xgboost."
+ },
+ {
+ "output": " This setting is disabled unless switched on. .. _enable_xgboost_gbm_dask:\n\n``enable_xgboost_gbm_dask``\n~\n.. dropdown:: Enable Dask_cuDF (multi-GPU) XGBoost GBM\n\t:open:\n\n\tSpecify whether to enable Dask_cudf (multi-GPU) version of XGBoost GBM."
+ },
+ {
+ "output": " Only applicable for single final model without early stopping. No Shapley possible. The equivalent config.toml parameter is ``enable_xgboost_gbm_dask`` and the default value is \"auto\"."
+ },
+ {
+ "output": " This option is disabled unless switched on. Only applicable for single final model without early stopping."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_xgboost_dart_dask`` and the default value is \"auto\"."
+ },
+ {
+ "output": " .. _enable_lightgbm_dask:\n\n``enable_lightgbm_dask``\n\n.. dropdown:: Enable Dask (multi-node) LightGBM\n\t:open:\n\n\tSpecify whether to enable multi-node LightGBM."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_lightgbm_dask`` and the default value is \"auto\". To enable multinode Dask, see :ref:`Dask Multinode Training `."
+ },
+ {
+ "output": " \"auto\" and \"on\" are currently the same. Dask mode for hyperparameter search is enabled if:\n\n\t\t1) You have a :ref:`Dask multinode cluster ` or multi-GPU node, and the model uses 1 GPU for each model (see :ref:`num-gpus-per-model`)."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_hyperopt_dask`` and the default value is \"auto\". .. _num_inner_hyperopt_trials_prefinal:\n\n``num_inner_hyperopt_trials_prefinal``\n\n.. dropdown:: Number of trials for hyperparameter optimization during model tuning only\n\t:open:\n\n\tSpecify the number of trials for Optuna hyperparameter optimization for tuning and evolution of models."
+ },
+ {
+ "output": " 0 means no trials. For small data, 100 is fine, while for larger data, smaller values are reasonable if you need results quickly."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``num_inner_hyperopt_trials_prefinal`` and the default value is 0."
+ },
+ {
+ "output": " However, this can overfit on a single fold when doing tuning or evolution, and if using cross-validation, then averaging the fold hyperparameters can lead to unexpected results."
+ },
+ {
+ "output": " If using RAPIDS or Dask, this is the number of trials for rapids-cudf hyperparameter optimization within XGBoost GBM/Dart and LightGBM, and hyperparameter optimization keeps data on the GPU the entire time."
+ },
+ {
+ "output": " This setting applies to final model only, even if num_inner_hyperopt_trials=0. The equivalent config.toml parameter is ``num_inner_hyperopt_trials_final`` and the default value is 0."
+ },
+ {
+ "output": " The default value is -1, which means all. 0 is the same as choosing no Optuna trials. It might only be beneficial to optimize hyperparameters of the best individual (i.e."
+ },
+ {
+ "output": " The default value is -1, which means all. The equivalent config.toml parameter is ``num_hyperopt_individuals_final``.\n\n``optuna_pruner``\n~\n.. dropdown:: Optuna Pruners\n\t:open:\n\n\t`Optuna Pruner `__ algorithm to use for early stopping of unpromising trials (applicable to XGBoost and LightGBM that support Optuna callbacks)."
+ },
+ {
+ "output": " To disable choose None. The equivalent config.toml parameter is ``optuna_pruner``\n\n``optuna_sampler``\n\n.. dropdown:: Optuna Samplers\n\t:open:\n\n\t`Optuna Sampler `__ algorithm to use for narrowing down and optimizing the search space (applicable to XGBoost and LightGBM that support Optuna callbacks)."
+ },
+ {
+ "output": " To disable choose None. The equivalent config.toml parameter is ``optuna_sampler``\n\n``enable_xgboost_hyperopt_callback``\n\n\n.. dropdown:: Enable Optuna XGBoost Pruning callback\n\t:open:\n\n\tSpecify whether to enable Optuna's XGBoost Pruning callback to abort unpromising runs."
+ },
+ {
+ "output": " This is not enabled when tuning the learning rate. The equivalent config.toml parameter is ``enable_xgboost_hyperopt_callback``.\n\n``enable_lightgbm_hyperopt_callback``\n~\n.. dropdown:: Enable Optuna LightGBM Pruning callback\n\t:open:\n\n\tSpecify whether to enable Optuna's LightGBM Pruning callback to abort unpromising runs."
+ },
+ {
+ "output": " This is not enabled when tuning the learning rate. The equivalent config.toml parameter is ``enable_lightgbm_hyperopt_callback``.\n\n``enable_tensorflow``\n~\n.. dropdown:: TensorFlow Models\n\t:open:\n\n\tSpecify whether to build `TensorFlow `__ models as part of the experiment (usually only for text feature engineering and for the final model unless it's used exclusively)."
+ },
+ {
+ "output": " This is set to Auto by default (not used unless the number of classes is greater than 10). TensorFlow models are not yet supported by Java MOJOs (only Python scoring pipelines and C++ MOJOs are supported)."
+ },
+ {
+ "output": " By default, this parameter is set to auto, i.e., Driverless AI decides internally whether to use the algorithm for the experiment."
+ },
+ {
+ "output": " ``enable_ftrl``\n~\n.. dropdown:: FTRL Models\n\t:open:\n\n\tSpecify whether to build Follow the Regularized Leader (FTRL) models as part of the experiment."
+ },
+ {
+ "output": " FTRL supports binomial and multinomial classification for categorical targets, as well as regression for continuous targets."
+ },
+ {
+ "output": " ``enable_rulefit``\n\n.. dropdown:: RuleFit Models\n\t:open:\n\n\tSpecify whether to build `RuleFit `__ models as part of the experiment."
+ },
+ {
+ "output": " Note that multiclass classification is not yet supported for RuleFit models. Rules are stored to text files in the experiment directory for now."
+ },
+ {
+ "output": " .. _zero-inflated:\n\n``enable_zero_inflated_models``\n~\n.. dropdown:: Zero-Inflated Models\n\t:open:\n\n\tSpecify whether to enable the automatic addition of :ref:`zero-inflated models ` for regression problems with zero-inflated target values that meet certain conditions:\n\n\t::\n\n\t y >= 0, y.std() > y.mean()\n\n\tThis is set to Auto by default."
+ },
+ {
+ "output": " Select one or more of the following:\n\n\t- gbdt: Boosted trees\n\t- rf_early_stopping: Random Forest with early stopping\n\t- rf: Random Forest\n\t- dart: Dropout boosted trees with no early stopping\n\n\tgbdt and rf are both enabled by default."
+ },
+ {
+ "output": " This is disabled by default. Notes:\n\n\t- Only supported for CPU. - A MOJO is not built when this is enabled."
+ },
+ {
+ "output": " LightGBM CUDA is supported on Linux x86-64 environments. ``show_constant_model``\n~\n.. dropdown:: Whether to Show Constant Models in Iteration Panel\n\t:open:\n\n\tSpecify whether to show constant models in the iteration panel."
+ },
+ {
+ "output": " ``params_tensorflow``\n~\n.. dropdown:: Parameters for TensorFlow\n\t:open:\n\n\tSpecify specific parameters for TensorFlow to override Driverless AI parameters."
+ },
+ {
+ "output": " Different strategies for using TensorFlow parameters can be viewed `here `__."
+ },
+ {
+ "output": " This defaults to 3000. Depending on accuracy settings, a fraction of this limit will be used. ``n_estimators_list_no_early_stopping``\n~\n.. dropdown:: n_estimators List to Sample From for Model Mutations for Models That Do Not Use Early Stopping\n\t:open:\n\n\tFor LightGBM, the dart and normal random forest modes do not use early stopping."
+ },
+ {
+ "output": " ``min_learning_rate_final``\n~\n.. dropdown:: Minimum Learning Rate for Final Ensemble GBM Models\n\t:open:\n\n\tThis value defaults to 0.01."
+ },
+ {
+ "output": " Then, one can try increasing the learning rate by raising this minimum, or one can try increasing the maximum number of trees/iterations."
+ },
+ {
+ "output": " This value defaults to 0.05. ``max_nestimators_feature_evolution_factor``\n\n.. dropdown:: Reduction Factor for Max Number of Trees/Iterations During Feature Evolution\n\t:open:\n\n\tSpecify the factor by which the value specified by the :ref:`max-trees-iterations` setting is reduced for tuning and feature evolution."
+ },
+ {
+ "output": " So by default, Driverless AI will produce no more than 0.2 * 3000 trees/iterations during feature evolution."
+ },
+ {
+ "output": " absolute delta between training and validation scores for tree models\n\t:open:\n\n\tModify early stopping behavior for tree-based models (LightGBM, XGBoostGBM, CatBoost) such that training score (on training data, not holdout) and validation score differ no more than this absolute value (i.e., stop adding trees once abs(train_score - valid_score) > max_abs_score_delta_train_valid)."
+ },
+ {
+ "output": " This option is Experimental, and only for expert use to keep model complexity low. To disable, set to 0.0."
+ },
+ {
+ "output": " .. _max_rel_score_delta_train_valid:\n\n``max_rel_score_delta_train_valid``\n~\n.. dropdown:: Max. relative delta between training and validation scores for tree models\n\t:open:\n\n\tModify early stopping behavior for tree-based models (LightGBM, XGBoostGBM, CatBoost) such that training score (on training data, not holdout) and validation score differ no more than this relative value (i.e., stop adding trees once abs(train_score - valid_score) > max_rel_score_delta_train_valid * abs(train_score))."
+ },
+ {
+ "output": " This option is Experimental, and only for expert use to keep model complexity low. To disable, set to 0.0."
+ },
+ {
+ "output": " ``min_learning_rate``\n~\n.. dropdown:: Minimum Learning Rate for Feature Engineering GBM Models\n\t:open:\n\n\tSpecify the minimum learning rate for feature engineering GBM models."
+ },
+ {
+ "output": " ``max_learning_rate``\n~\n.. dropdown:: Max Learning Rate for Tree Models\n\t:open:\n\n\tSpecify the maximum learning rate for tree models during feature engineering."
+ },
+ {
+ "output": " This value defaults to 0.5. ``max_epochs``\n\n.. dropdown:: Max Number of Epochs for TensorFlow/FTRL\n\t:open:\n\n\tWhen building TensorFlow or FTRL models, specify the maximum number of epochs to train models with (it might stop earlier)."
+ },
+ {
+ "output": " This option is ignored if TensorFlow models and/or FTRL models are disabled. ``max_max_depth``\n~\n.. dropdown:: Max Tree Depth\n\t:open:\n\n\tSpecify the maximum tree depth."
+ },
+ {
+ "output": " This value defaults to 12. ``max_max_bin``\n~\n.. dropdown:: Max max_bin for Tree Features\n\t:open:\n\n\tSpecify the maximum ``max_bin`` for tree features."
+ },
+ {
+ "output": " ``rulefit_max_num_rules``\n~\n.. dropdown:: Max Number of Rules for RuleFit\n\t:open:\n\n\tSpecify the maximum number of rules to be used for RuleFit models."
+ },
+ {
+ "output": " .. _ensemble_meta_learner:\n\n``ensemble_meta_learner``\n~\n.. dropdown:: Ensemble Level for Final Modeling Pipeline\n\t:open:\n\n\tModel to combine base model predictions, for experiments that create a final pipeline\n\tconsisting of multiple base models:\n\n\t- blender: Creates a linear blend with non-negative weights that add to 1 (blending) - recommended\n\t- extra_trees: Creates a tree model to non-linearly combine the base models (stacking) - experimental, and recommended to also enable :ref:`cross_validate_meta_learner`."
+ },
+ {
+ "output": " (Default)\n\t- 0 = No ensemble, only final single model on validated iteration/tree count. Note that holdout predicted probabilities will not be available."
+ },
+ {
+ "output": " - 1 = 1 model, multiple ensemble folds (cross-validation)\n\t- 2 = 2 models, multiple ensemble folds (cross-validation)\n\t- 3 = 3 models, multiple ensemble folds (cross-validation)\n\t- 4 = 4 models, multiple ensemble folds (cross-validation)\n\n\tThe equivalent config.toml parameter is ``fixed_ensemble_level``."
+ },
+ {
+ "output": " Especially recommended for ensemble_meta_learner='extra_trees', to make unbiased training holdout predictions."
+ },
+ {
+ "output": " Not needed for ensemble_meta_learner='blender'. ``cross_validate_single_final_model``\n~\n.. dropdown:: Cross-Validate Single Final Model\n\t:open:\n\n\tDriverless AI normally produces a single final model for low accuracy settings (typically, less than 5)."
+ },
+ {
+ "output": " The final pipeline will build :math:`N+1` models, with N-fold cross validation for the single final model."
+ },
+ {
+ "output": " Note that the setting for this option is ignored for time-series experiments or when a validation dataset is provided."
+ },
+ {
+ "output": " Specify a lower value to avoid excessive tuning, or specify a higher value to perform enhanced tuning. This option defaults to -1 (auto)."
+ },
+ {
+ "output": " This is set to off by default. Choose from the following options:\n\n\t- auto: sample both classes as needed, depending on data\n\t- over_under_sampling: over-sample the minority class and under-sample the majority class, depending on data\n\t- under_sampling: under-sample the majority class to reach class balance\n\t- off: do not perform any sampling\n\n\tThis option is closely tied with the Imbalanced Light GBM and Imbalanced XGBoost GBM models, which can be enabled/disabled on the Recipes tab under :ref:`included_models`."
+ },
+ {
+ "output": " If the target fraction proves to be above the allowed imbalance threshold, then sampling will be triggered."
+ },
+ {
+ "output": " The setting here will be ignored. ``imbalance_sampling_threshold_min_rows_original``\n\n.. dropdown:: Threshold for Minimum Number of Rows in Original Training Data to Allow Imbalanced Sampling\n\t:open:\n\n\tSpecify a threshold for the minimum number of rows in the original training data that allow imbalanced sampling."
+ },
+ {
+ "output": " ``imbalance_ratio_sampling_threshold``\n\n.. dropdown:: Ratio of Majority to Minority Class for Imbalanced Binary Classification to Trigger Special Sampling Techniques (if Enabled)\n\t:open:\n\n\tFor imbalanced binary classification problems, specify the ratio of majority to minority class."
+ },
+ {
+ "output": " This value defaults to 5. ``heavy_imbalance_ratio_sampling_threshold``\n\n.. dropdown:: Ratio of Majority to Minority Class for Heavily Imbalanced Binary Classification to Only Enable Special Sampling Techniques (if Enabled)\n\t:open:\n\n\tFor heavily imbalanced binary classification, specify the ratio of the majority to minority class equal and above which to enable only special imbalanced models on the full original data without upfront sampling."
+ },
+ {
+ "output": " ``imbalance_sampling_number_of_bags``\n~\n.. dropdown:: Number of Bags for Sampling Methods for Imbalanced Binary Classification (if Enabled)\n\t:open:\n\n\tSpecify the number of bags for sampling methods for imbalanced binary classification."
+ },
+ {
+ "output": " ``imbalance_sampling_max_number_of_bags``\n~\n.. dropdown:: Hard Limit on Number of Bags for Sampling Methods for Imbalanced Binary Classification\n\t:open:\n\n\tSpecify the limit on the number of bags for sampling methods for imbalanced binary classification."
+ },
+ {
+ "output": " ``imbalance_sampling_max_number_of_bags_feature_evolution``\n~\n.. dropdown:: Hard Limit on Number of Bags for Sampling Methods for Imbalanced Binary Classification During Feature Evolution Phase\n\t:open:\n\n\tSpecify the limit on the number of bags for sampling methods for imbalanced binary classification."
+ },
+ {
+ "output": " Note that this setting only applies to shift, leakage, tuning, and feature evolution models. To limit final models, use the Hard Limit on Number of Bags for Sampling Methods for Imbalanced Binary Classification setting."
+ },
+ {
+ "output": " This setting controls the approximate number of bags and is only active when the \"Hard limit on number of bags for sampling methods for imbalanced binary classification during feature evolution phase\" option is set to -1."
+ },
+ {
+ "output": " ``imbalance_sampling_target_minority_fraction``\n~\n.. dropdown:: Target Fraction of Minority Class After Applying Under/Over-Sampling Techniques\n\t:open:\n\n\tSpecify the target fraction of a minority class after applying under/over-sampling techniques."
+ },
+ {
+ "output": " When starting from an extremely imbalanced original target, it can be advantageous to specify a smaller value such as 0.1 or 0.01."
+ },
+ {
+ "output": " ``ftrl_max_interaction_terms_per_degree``\n~\n.. dropdown:: Max Number of Automatic FTRL Interactions Terms for 2nd, 3rd, 4th order interactions terms (Each)\n\t:open:\n\n\tSamples the number of automatic FTRL interactions terms to no more than this value (for each of 2nd, 3rd, 4th order terms)."
+ },
+ {
+ "output": " When enabled, this setting provides error bars to validation and test scores based on the standard error of the bootstrap mean."
+ },
+ {
+ "output": " ``tensorflow_num_classes_switch``\n~\n.. dropdown:: For Classification Problems with This Many Classes, Default to TensorFlow\n\t:open:\n\n\tSpecify the number of classes above which to use TensorFlow when it is enabled."
+ },
+ {
+ "output": " (Models set to On, however, are still used.) This value defaults to 10. .. _compute-intervals:\n\n``prediction_intervals``\n\n.. dropdown:: Compute Prediction Intervals\n\t:open:\n\n\tSpecify whether to compute empirical prediction intervals based on holdout predictions."
+ },
+ {
+ "output": " .. _confidence-level:\n\n``prediction_intervals_alpha``\n\n.. dropdown:: Confidence Level for Prediction Intervals\n\t:open:\n\n\tSpecify a confidence level for prediction intervals."
+ },
+ {
+ "output": " ``dump_modelparams_every_scored_indiv``\n~\n\n.. dropdown:: Enable detailed scored model info\n\t:open:\n\n\tSpecify whether to dump every scored individual's model parameters to csv, tabulated, and json files."
+ },
+ {
+ "output": " Install the Driverless AI AWS Community AMI\n-\n\nWatch the installation video `here `__."
+ },
+ {
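+ "output": " Environment\n~\n\n+----------+---------------+----------+-----------------+\n| Provider | Instance Type | Num GPUs | Suitable for    |\n+==========+===============+==========+=================+\n| AWS      | p2.xlarge     | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | p2.8xlarge    | 8        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p2.16xlarge   | 16       | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p3.2xlarge    | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | p3.8xlarge    | 4        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p3.16xlarge   | 8        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | g3.4xlarge    | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | g3.8xlarge    | 2        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | g3.16xlarge   | 4        | Serious use     |\n+----------+---------------+----------+-----------------+\n\n\nInstalling the EC2 Instance\n~\n\n1."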
+ },
+ {
+ "output": " 2. In the upper right corner of the Amazon Web Services page, set the location drop-down. (Note: We recommend selecting the US East region because H2O's resources are stored there."
+ },
+ {
+ "output": " .. image:: ../images/ami_location_dropdown.png\n :align: center\n\n\n3. Select the EC2 option under the Compute section to open the EC2 Dashboard."
+ },
+ {
+ "output": " Click the Launch Instance button under the Create Instance section. .. image:: ../images/ami_launch_instance_button.png\n :align: center\n\n5."
+ },
+ {
+ "output": " .. image:: ../images/ami_select_h2oai_ami.png\n :align: center\n\n6. On the Choose an Instance Type page, select GPU compute in the Filter by dropdown."
+ },
+ {
+ "output": " Select a GPU compute instance from the available options. (We recommend at least 32 vCPUs.) Click the Next: Configure Instance Details button."
+ },
+ {
+ "output": " Specify the Instance Details that you want to configure. Create a VPC or use an existing one, and ensure that \"Auto-Assign Public IP\" is enabled and associated with your subnet."
+ },
+ {
+ "output": " .. image:: ../images/ami_configure_instance_details.png\n :align: center\n\n8. Specify the Storage Device settings."
+ },
+ {
+ "output": " The machine should have a minimum of 30 GB of disk space. Click Next: Add Tags. .. image:: ../images/ami_add_storage.png\n :align: center\n\n9."
+ },
+ {
+ "output": " Click Next: Configure Security Group. 10. Add the following security rules to enable SSH access to Driverless AI, then click Review and Launch."
+ },
+ {
+ "output": " 12. A popup will appear prompting you to select a key pair. This is required in order to SSH into the instance."
+ },
+ {
+ "output": " Be sure to accept the acknowledgement, then click Launch Instances to start the new instance. .. image:: ../images/ami_select_key_pair.png\n :align: center\n\n13."
+ },
+ {
+ "output": " Click the View Instances button to see information about the instance including the IP address. The Connect button on this page provides information on how to SSH into your instance."
+ },
+ {
+ "output": " Open a Terminal window and SSH into the IP address of the AWS instance. Replace the DNS name below with your instance DNS."
+ },
+ {
+ "output": " .. code-block:: bash\n\n chmod 400 mykeypair.pem\n\n15. If you selected a GPU-compute instance, then you must enable persistence and optimizations of the GPU."
+ },
+ {
+ "output": " Note also that these commands must be rerun after every reboot. Refer to the following for more information: \n\n - http://docs.nvidia.com/deploy/driver-persistence/index.html\n - https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/optimize_gpu.html\n - https://www.migenius.com/articles/realityserver-on-aws\n\n .. code-block:: bash\n\n # g3:\n sudo nvidia-persistenced --persistence-mode\n sudo nvidia-smi -acp 0\n sudo nvidia-smi --auto-boost-permission=0\n sudo nvidia-smi --auto-boost-default=0\n sudo nvidia-smi -ac \"2505,1177\"\n\n # p2:\n sudo nvidia-persistenced --persistence-mode\n sudo nvidia-smi -acp 0\n sudo nvidia-smi --auto-boost-permission=0\n sudo nvidia-smi --auto-boost-default=0\n sudo nvidia-smi -ac \"2505,875\"\n\n # p3:\n sudo nvidia-persistenced --persistence-mode\n sudo nvidia-smi -acp 0\n sudo nvidia-smi -ac \"877,1530\"\n\n\n16."
+ },
+ {
+ "output": " For example:\n\n .. code-block:: bash\n\n scp -i /path/mykeypair.pem ubuntu@ec2-34-230-6-230.compute-1.amazonaws.com:/path/to/file/to/be/copied/example.csv /path/of/destination/on/local/machine\n\n where:\n \n * ``-i`` is the identity file option\n * ``mykeypair`` is the name of the private keypair file\n * ``ubuntu`` is the username used to log in to the instance\n * ``ec2-34-230-6-230.compute-1.amazonaws.com`` is the public DNS name of the instance\n * ``example.csv`` is the file to transfer\n\n17."
+ },
+ {
+ "output": " Sign in to Driverless AI with the username h2oai and use the AWS InstanceID as the password. You will be prompted to enter your Driverless AI license key when you log in for the first time."
+ },
+ {
+ "output": " To stop the instance: \n\n1. On the EC2 Dashboard, click the Running Instances link under the Resources section."
+ },
+ {
+ "output": " Select the instance that you want to stop. 3. In the Actions drop down menu, select Instance State > Stop."
+ },
+ {
+ "output": " .. _nlp-settings:\n\nNLP Settings\n\n\n``enable_tensorflow_textcnn``\n~\n.. dropdown:: Enable Word-Based CNN TensorFlow Models for NLP\n\t:open:\n\n\tSpecify whether to use out-of-fold predictions from Word-based CNN TensorFlow models as transformers for NLP."
+ },
+ {
+ "output": " We recommend that you disable this option on systems that do not use GPUs. ``enable_tensorflow_textbigru``\n~\n.. dropdown:: Enable Word-Based BiGRU TensorFlow Models for NLP\n\t:open:\n\n\tSpecify whether to use out-of-fold predictions from Word-based BiGRU TensorFlow models as transformers for NLP."
+ },
+ {
+ "output": " We recommend that you disable this option on systems that do not use GPUs. ``enable_tensorflow_charcnn``\n~\n.. dropdown:: Enable Character-Based CNN TensorFlow Models for NLP\n\t:open:\n\n\tSpecify whether to use out-of-fold predictions from Character-level CNN TensorFlow models as transformers for NLP."
+ },
+ {
+ "output": " We recommend that you disable this option on systems that do not use GPUs. ``enable_pytorch_nlp_model``\n\n.. dropdown:: Enable PyTorch Models for NLP\n\t:open:\n\n\tSpecify whether to enable pretrained PyTorch models and fine-tune them for NLP tasks."
+ },
+ {
+ "output": " You need to set this to On if you want to use the PyTorch models like BERT for modeling. Only the first text column will be used for modeling with these models."
+ },
+ {
+ "output": " ``enable_pytorch_nlp_transformer``\n\n.. dropdown:: Enable pre-trained PyTorch Transformers for NLP\n\t:open:\n\n\tSpecify whether to enable pretrained PyTorch models for NLP tasks."
+ },
+ {
+ "output": " You need to set this to On if you want to use the PyTorch models like BERT for feature engineering (via fitting a linear model on top of pretrained embeddings)."
+ },
+ {
+ "output": " Notes:\n\n\t- This setting requires an Internet connection. ``pytorch_nlp_pretrained_models``\n~\n.. dropdown:: Select Which Pretrained PyTorch NLP Models to Use\n\t:open:\n\n\tSpecify one or more pretrained PyTorch NLP models to use."
+ },
+ {
+ "output": " - Models that are not selected by default may not have MOJO support. - Using BERT-like models may result in a longer experiment completion time."
+ },
+ {
+ "output": " The higher the number of epochs, the higher the run time. This value defaults to 2 and is ignored if TensorFlow models are disabled."
+ },
+ {
+ "output": " Values equal to or above this value add all enabled TensorFlow NLP models at the start of the experiment for text-dominated problems when the following NLP expert settings are set to Auto:\n\n\t- Enable word-based CNN TensorFlow models for NLP\n\t- Enable word-based BiGRU TensorFlow models for NLP\n\t- Enable character-based CNN TensorFlow models for NLP\n\n\tIf the above transformations are set to On, this parameter is ignored."
+ },
+ {
+ "output": " This value defaults to 5. ``pytorch_nlp_fine_tuning_num_epochs``\n\n.. dropdown:: Number of Epochs for Fine-Tuning of PyTorch NLP Models\n\t:open:\n\n\tSpecify the number of epochs used when fine-tuning PyTorch NLP models."
+ },
+ {
+ "output": " ``pytorch_nlp_fine_tuning_batch_size``\n\n.. dropdown:: Batch Size for PyTorch NLP Models\n\t:open:\n\n\tSpecify the batch size for PyTorch NLP models."
+ },
+ {
+ "output": " Note: Large models and batch sizes require more memory. ``pytorch_nlp_fine_tuning_padding_length``\n\n.. dropdown:: Maximum Sequence Length for PyTorch NLP Models\n\t:open:\n\n\tSpecify the maximum sequence length (padding length) for PyTorch NLP models."
+ },
+ {
+ "output": " Note: Large models and padding lengths require more memory. ``pytorch_nlp_pretrained_models_dir``\n~\n.. dropdown:: Path to Pretrained PyTorch NLP Models\n\t:open:\n\n\tSpecify a path to pretrained PyTorch NLP models."
+ },
+ {
+ "output": " Note that this can be either a path in the local file system (``/path/on/server/to/file.txt``) or an S3 location (``s3://``)."
+ },
+ {
+ "output": " - You can download the Glove embeddings from `here `__ and specify the local path in this box."
+ },
+ {
+ "output": " - You can also train your own custom embeddings. Please refer to `this code sample `__ for creating custom embeddings that can be passed on to this option."
+ },
+ {
+ "output": " .. _tensorflow_nlp_pretrained_s3_access_key_id:\n\n``tensorflow_nlp_pretrained_s3_access_key_id``\n\n.. dropdown:: S3 access key ID to use when ``tensorflow_nlp_pretrained_embeddings_file_path`` is set to an S3 location\n\t:open:\n\n\tSpecify an S3 access key ID to use when ``tensorflow_nlp_pretrained_embeddings_file_path`` is set to an S3 location."
+ },
+ {
+ "output": " .. _tensorflow_nlp_pretrained_s3_secret_access_key:\n\n``tensorflow_nlp_pretrained_s3_secret_access_key``\n\n.. dropdown:: S3 secret access key to use when ``tensorflow_nlp_pretrained_embeddings_file_path`` is set to an S3 location\n\t:open:\n\n\tSpecify an S3 secret access key to use when ``tensorflow_nlp_pretrained_embeddings_file_path`` is set to an S3 location."
+ },
+ {
+ "output": " ``tensorflow_nlp_pretrained_embeddings_trainable``\n\n.. dropdown:: For TensorFlow NLP, Allow Training of Unfrozen Pretrained Embeddings\n\t:open:\n\n\tSpecify whether to allow training of all weights of the neural network graph, including the pretrained embedding layer weights."
+ },
+ {
+ "output": " All other weights, however, will still be fine-tuned. This is disabled by default. ``text_fraction_for_text_dominated_problem``\n\n.. dropdown:: Fraction of Text Columns Out of All Features to be Considered a Text-Dominated Problem\n\t:open:\n\n\tSpecify the fraction of text columns out of all features to be considered as a text-dominated problem."
+ },
+ {
+ "output": " Specify when a string column will be treated as text (for an NLP problem) or just as a standard categorical variable."
+ },
+ {
+ "output": " This value defaults to 0.3. ``text_transformer_fraction_for_text_dominated_problem``\n\n.. dropdown:: Fraction of Text per All Transformers to Trigger That Text Dominated\n\t:open:\n\n\tSpecify the fraction of text transformers out of all transformers at which the problem is considered text-dominated."
+ },
+ {
+ "output": " ``string_col_as_text_threshold``\n\n.. dropdown:: Threshold for String Columns to be Treated as Text\n\t:open:\n\n\tSpecify the threshold value (from 0 to 1) for string columns to be treated as text (0.0 - text; 1.0 - string)."
+ },
+ {
+ "output": " ``text_transformers_max_vocabulary_size``\n~\n.. dropdown:: Max Size of the Vocabulary for Text Transformers\n\t:open:\n\n\tMax number of tokens created during fitting of Tfidf/Count based text transformers."
+ },
+ {
+ "output": " .. _quick-start-tables:\n\nQuick-Start Tables by Environment\n-\n\nUse the following tables for Cloud, Server, and Desktop to find the right setup instructions for your environment."
+ },
+ {
+ "output": " | Min Mem | Refer to Section |\n+=+=+=++\n| NVIDIA DGX-1 | Yes | 128 GB | :ref:`install-on-nvidia-dgx` |\n+-+-+-++\n| Ubuntu with GPUs | Yes | 64 GB | :ref:`install-on-ubuntu-with-gpus` |\n+-+-+-++\n| Ubuntu with CPUs | No | 64 GB | :ref:`install-on-ubuntu-cpus-only` |\n+-+-+-++\n| RHEL with GPUs | Yes | 64 GB | :ref:`install-on-rhel-with-gpus` |\n+-+-+-++\n| RHEL with CPUs | No | 64 GB | :ref:`install-on-rhel-cpus-only` |\n+-+-+-++\n| IBM Power (Minsky) | Yes | 64 GB | Contact sales@h2o.ai |\n+-+-+-++\n\n\nDesktop\n~\n\n+-+-+-+-++\n| Operating System | GPU Support?"
+ },
+ {
+ "output": " JDBC Setup\n\n\nDriverless AI lets you explore Java Database Connectivity (JDBC) data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Tested Databases\n\n\nThe following databases have been tested for minimal functionality. Note that JDBC drivers that are not included in this list should work with Driverless AI."
+ },
+ {
+ "output": " See the :ref:`untested-jdbc-driver` section at the end of this chapter for information on how to try out an untested JDBC driver."
+ },
+ {
+ "output": " This is a JSON/Dictionary String with multiple keys. Note: This requires a JSON key (typically the name of the database being configured) to be associated with a nested JSON that contains the ``url``, ``jarpath``, and ``classpath`` fields."
+ },
+ {
+ "output": " Double quotation marks (``\"...\"``) must be used to denote keys and values *within* the JSON dictionary, and *outer* quotations must be formatted as either ``\"\"\"`` or ``'``."
+ },
+ {
+ "output": " The following examples show two methods for applying outer quotations. - Configuration value applied with the config.toml file:\n\n ::\n\n jdbc_app_configs = \"\"\"{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}\"\"\"\n\n - Configuration value applied with an environment variable:\n \n ::\n \n DRIVERLESS_AI_JDBC_APP_CONFIGS='{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}'\n \n For example:\n \n ::\n \n DRIVERLESS_AI_JDBC_APP_CONFIGS='{\n \"postgres\": {\"url\": \"jdbc:postgresql://192.xxx.x.xxx:aaaa/name_of_database;user=name_of_user;password=your_password\",\"jarpath\": \"/config/postgresql-xx.x.x.jar\",\"classpath\": \"org.postgresql.Driver\"}, \n \"postgres-local\": {\"url\": \"jdbc:postgresql://123.xxx.xxx.xxx:aaaa/name_of_database\",\"jarpath\": \"/config/postgresql-xx.x.x.jar\",\"classpath\": \"org.postgresql.Driver\"},\n \"ms-sql\": {\"url\": \"jdbc:sqlserver://192.xxx.x.xxx:aaaa;databaseName=name_of_database;user=name_of_user;password=your_password\",\"Username\":\"your_username\",\"password\":\"your_password\",\"jarpath\": \"/config/sqljdbc42.jar\",\"classpath\": \"com.microsoft.sqlserver.jdbc.SQLServerDriver\"},\n \"oracle\": {\"url\": \"jdbc:oracle:thin:@192.xxx.x.xxx:aaaa/orclpdb1\",\"jarpath\": \"ojdbc7.jar\",\"classpath\": \"oracle.jdbc.OracleDriver\"},\n \"db2\": {\"url\": \"jdbc:db2://127.x.x.x:aaaaa/name_of_database\",\"jarpath\": \"db2jcc4.jar\",\"classpath\": \"com.ibm.db2.jcc.DB2Driver\"},\n \"mysql\": {\"url\": \"jdbc:mysql://192.xxx.x.xxx:aaaa;\",\"jarpath\": \"mysql-connector.jar\",\"classpath\": \"com.mysql.jdbc.Driver\"},\n \"Snowflake\": {\"url\": \"jdbc:snowflake://.snowflakecomputing.com/?\",\"jarpath\": \"/config/snowflake-jdbc-x.x.x.jar\",\"classpath\": \"net.snowflake.client.jdbc.SnowflakeDriver\"},\n \"Derby\": {\"url\": \"jdbc:derby://127.x.x.x:aaaa/name_of_database\",\"jarpath\": \"/config/derbyclient.jar\",\"classpath\": \"org.apache.derby.jdbc.ClientDriver\"}\n }'\\\n\n- ``jdbc_app_jvm_args``: Extra jvm args for JDBC connector."
+ },
+ {
+ "output": " - ``jdbc_app_classpath``: Optionally specify an alternative classpath for the JDBC connector. - ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " Retrieve the JDBC Driver\n\n\n1. Download JDBC Driver JAR files:\n\n - `Oracle DB `_\n\n - `PostgreSQL `_\n\n - `Amazon Redshift `_\n\n - `Teradata `_\n\n Note: Remember to take note of the driver classpath, as it is needed for the configuration steps (for example, org.postgresql.Driver)."
+ },
+ {
+ "output": " Copy the driver JAR to a location that can be mounted into the Docker container. Note: The folder storing the JDBC jar file must be visible/readable by the dai process user."
+ },
+ {
+ "output": " Note that the JDBC connection strings will vary depending on the database that is used. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs,jdbc\" \\\n -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{\"postgres\":\n {\"url\": \"jdbc:postgresql://localhost:5432/my_database\",\n \"jarpath\": \"/path/to/postgresql/jdbc/driver.jar\",\n \"classpath\": \"org.postgresql.Driver\"}}' \\\n -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS=\"-Xmx2g\" \\\n -p 12345:12345 \\\n -v /path/to/local/postgresql/jdbc/driver.jar:/path/to/postgresql/jdbc/driver.jar \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure JDBC options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options:\n\n .. code-block:: bash\n\n enabled_file_systems = \"file, upload, jdbc\"\n jdbc_app_configs = \"\"\"{\"postgres\": {\"url\": \"jdbc:postgresql://localhost:5432/my_database\",\n \"jarpath\": \"/path/to/postgresql/jdbc/driver.jar\",\n \"classpath\": \"org.postgresql.Driver\"}}\"\"\"\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/jdbc/driver.jar:/path/in/docker/jdbc/driver.jar \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the JDBC connector for PostgreSQL."
+ },
+ {
+ "output": " - The configuration requires a JSON key (typically the name of the database being configured) to be associated with a nested JSON that contains the ``url``, ``jarpath``, and ``classpath`` fields."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"upload, file, hdfs, jdbc\"\n\n # Configuration for JDBC Connector."
+ },
+ {
+ "output": " # Format as a single line without using carriage returns (the following example is formatted for readability)."
+ },
+ {
+ "output": " # Example:\n # \"\"\"{\n # \"postgres\": {\n # \"url\": \"jdbc:postgresql://ip address:port/postgres\",\n # \"jarpath\": \"/path/to/postgres_driver.jar\",\n # \"classpath\": \"org.postgresql.Driver\"\n # },\n # \"mysql\": {\n # \"url\":\"mysql connection string\",\n # \"jarpath\": \"/path/to/mysql_driver.jar\",\n # \"classpath\": \"my.sql.classpath.Driver\"\n # }\n # }\"\"\"\n jdbc_app_configs = \"\"\"{\"postgres\": {\"url\": \"jdbc:postgresql://localhost:5432/my_database\",\n \"jarpath\": \"/path/to/postgresql/jdbc/driver.jar\",\n \"classpath\": \"org.postgresql.Driver\"}}\"\"\"\n\n # extra jvm args for jdbc connector\n jdbc_app_jvm_args = \"\"\n\n # alternative classpath for jdbc connector\n jdbc_app_classpath = \"\"\n\n 3."
+ },
+ {
+ "output": " Adding Datasets Using JDBC\n\n\nAfter the JDBC connector is enabled, you can add datasets by selecting JDBC from the Add Dataset (or Drag and Drop) drop-down menu."
+ },
+ {
+ "output": " Click on the Add Dataset button on the Datasets page. 2. Select JDBC from the list that appears. 3."
+ },
+ {
+ "output": " 4. The form will populate with the JDBC Database, URL, Driver, and Jar information. Complete the following remaining fields:\n\n - JDBC Username: Enter your JDBC username."
+ },
+ {
+ "output": " (See the *Notes* section)\n\n - Destination Name: Enter a name for the new dataset. - (Optional) ID Column Name: Enter a name for the ID column."
+ },
+ {
+ "output": " Notes:\n\n - Do not include the password as part of the JDBC URL. Instead, enter the password in the JDBC Password field."
+ },
+ {
+ "output": " - Due to resource sharing within Driverless AI, the JDBC Connector is only allocated a relatively small amount of memory."
+ },
+ {
+ "output": " This ensures that the maximum memory allocation is not exceeded. - If a query that is larger than the maximum memory allocation is made without specifying an ID column, the query will not complete successfully."
+ },
+ {
+ "output": " Write a SQL Query in the format of the database that you want to query. (See the `Query Examples <#queryexamples>`__ section below.)"
+ },
+ {
+ "output": " 6. Click the Click to Make Query button to execute the query. The time it takes to complete depends on the size of the data being queried and the network speeds to the database."
+ },
+ {
+ "output": " .. _queryexamples:\n\nQuery Examples\n\n\nThe following are sample configurations and queries for Oracle DB and PostgreSQL:\n\n.. tabs:: \n .. group-tab:: Oracle DB\n\n 1."
+ },
+ {
+ "output": " Sample Query:\n\n - Select oracledb from the Select JDBC Connection dropdown menu. - JDBC Username: ``oracleuser``\n - JDBC Password: ``oracleuserpassword``\n - ID Column Name:\n - Query:\n\n ::\n\n SELECT MIN(ID) AS NEW_ID, EDUCATION, COUNT(EDUCATION) FROM my_oracle_schema.creditcardtrain GROUP BY EDUCATION\n\n Note: Because this query does not specify an ID Column Name, it will only work for small data."
+ },
+ {
+ "output": " 3. Click the Click to Make Query button to execute the query. .. group-tab:: PostgreSQL \n\n 1. Configuration:\n\n ::\n\n jdbc_app_configs = \"\"\"{\"postgres\": {\"url\": \"jdbc:postgresql://localhost:5432/postgresdatabase\", \"jarpath\": \"/home/ubuntu/postgres-artifacts/postgres/Driver.jar\", \"classpath\": \"org.postgresql.Driver\"}}\"\"\"\n\n 2."
+ },
+ {
+ "output": " - JDBC Username: ``postgres_user``\n - JDBC Password: ``pguserpassword``\n - ID Column Name: ``id``\n - Query (selects all columns from table loan_level where column LOAN_TYPE contains the value 5):\n\n ::\n\n SELECT * FROM loan_level WHERE LOAN_TYPE = 5\n\n 3."
+ },
+ {
+ "output": " .. _untested-jdbc-driver:\n\nAdding an Untested JDBC Driver\n\n\nWe encourage you to try out JDBC drivers that are not tested in house."
+ },
+ {
+ "output": " Download the JDBC jar for your database. 2. Move your JDBC jar file to a location that DAI can access."
+ },
+ {
+ "output": " Start the Driverless AI Docker image using the JDBC-specific environment variables. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"upload,file,hdfs,s3,recipe_file,jdbc\" \\\n -e DRIVERLESS_AI_JDBC_APP_CONFIGS='{\"my_jdbc_database\": {\"url\": \"jdbc:my_jdbc_database://hostname:port/database\",\n \"jarpath\": \"/path/to/my/jdbc/database.jar\",\n \"classpath\": \"com.my.jdbc.Driver\"}}' \\\n -e DRIVERLESS_AI_JDBC_APP_JVM_ARGS=\"-Xmx2g\" \\\n -p 12345:12345 \\\n -v /path/to/local/my/jdbc/database.jar:/path/to/my/jdbc/database.jar \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n 1."
+ },
+ {
+ "output": " 2. Move your JDBC jar file to a location that DAI can access. 3. Configure the Driverless AI config.toml file."
+ },
+ {
+ "output": " Mount the config.toml file and requisite JAR files into the Docker container. .. code-block:: bash\n :substitutions:\n \n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/jdbc/driver.jar:/path/in/docker/jdbc/driver.jar \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n 1."
+ },
+ {
+ "output": " 2. Move your JDBC jar file to a location that DAI can access. 3. Modify the following config.toml settings."
+ },
+ {
+ "output": " # JSON/Dictionary String with multiple keys. # Format as a single line without using carriage returns (the following example is formatted for readability)."
+ },
+ {
+ "output": " # Example:\n jdbc_app_configs = \"\"\"{\"my_jdbc_database\": {\"url\": \"jdbc:my_jdbc_database://hostname:port/database\",\n \"jarpath\": \"/path/to/my/jdbc/database.jar\", \n \"classpath\": \"com.my.jdbc.Driver\"}}\"\"\"\n\n # optional extra jvm args for jdbc connector\n jdbc_app_jvm_args = \"\"\n\n # optional alternative classpath for jdbc connector\n jdbc_app_classpath = \"\"\n\n 4."
+ },
+ {
+ "output": " MinIO Setup\n-\n\nThis section provides instructions for configuring Driverless AI to work with `MinIO `__."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``minio_endpoint_url``: The endpoint URL that will be used to access MinIO."
+ },
+ {
+ "output": " - ``minio_secret_access_key``: The MinIO secret access key. - ``minio_skip_cert_verification``: If this is set to true, then MinIO connector will skip certificate verification."
+ },
+ {
+ "output": " - ``enabled_file_systems``: The file systems you want to enable. This must be configured in order for data connectors to function properly."
+ },
+ {
+ "output": " It also configures Docker DNS by passing the name and IP of the name node. This lets you reference data stored in MinIO directly using the endpoint URL, for example: http:////datasets/iris.csv."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, minio\"``\n - ``minio_endpoint_url = \"\"``\n - ``minio_access_key_id = \"\"``\n - ``minio_secret_access_key = \"\"``\n - ``minio_skip_cert_verification = \"false\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n \n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n\n .. group-tab:: Native Installs\n\n This example enables the MinIO data connector with authentication by passing an endpoint URL, an access key ID, and a secret access key."
+ },
+ {
+ "output": " This allows users to reference data stored in MinIO directly using the endpoint URL, for example: http:////datasets/iris.csv."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : MinIO Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, minio\"\n\n # MinIO Connector credentials\n minio_endpoint_url = \"\"\n minio_access_key_id = \"\"\n minio_secret_access_key = \"\"\n minio_skip_cert_verification = \"false\"\n\n 3."
+ },
+ {
+ "output": " .. _install-on-azure:\n\nInstall on Azure\n\n\nThis section describes how to install the Driverless AI image from Azure."
+ },
+ {
+ "output": " This is no longer the case as of version 1.5.2. Watch the installation video `here `__."
+ },
+ {
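+ "output": " Environment\n~~~~~~~~~~~\n\n+----------+---------------+----------+-----------------+\n| Provider | Instance Type | Num GPUs | Suitable for    |\n+==========+===============+==========+=================+\n| Azure    | Standard_NV6  | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | Standard_NV12 | 2        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | Standard_NV24 | 4        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | Standard_NC6  | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | Standard_NC12 | 2        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | Standard_NC24 | 4        | Serious use     |\n+----------+---------------+----------+-----------------+\n\nAbout the Install\n~~~~~~~~~~~~~~~~~\n\n.. include:: linux-rpmdeb-about.frag\n\nInstalling the Azure Instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n1."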
+ },
+ {
+ "output": " 2. Search for and select H2O DriverlessAI in the Marketplace. .. image:: ../images/azure_select_driverless_ai.png\n :align: center\n\n3."
+ },
+ {
+ "output": " This launches the H2O DriverlessAI Virtual Machine creation process. .. image:: ../images/azure_search_for_dai.png\n :align: center\n\n4."
+ },
+ {
+ "output": " Enter a name for the VM. b. Select the Disk Type for the VM. Use HDD for GPU instances. c. Enter the name that you will use when connecting to the machine through SSH."
+ },
+ {
+ "output": " e. Specify the Subscription option. (This should be Pay-As-You-Go.) f. Enter a unique name for the resource group."
+ },
+ {
+ "output": " Click OK when you are done. .. image:: ../images/azure_basics_tab.png\n :align: center\n\n5. On the Size tab, select your virtual machine size."
+ },
+ {
+ "output": " We recommend using an N-Series type, which comes with a GPU. Also note that Driverless AI requires 10 GB of free space in order to run and will stop working if less than 10 GB is available."
+ },
+ {
+ "output": " Click OK when you are done. .. image:: ../images/azure_vm_size.png\n :align: center\n\n6. On the Settings tab, select or create the Virtual Network and Subnet where the VM is going to be located and then click OK.\n\n .. image:: ../images/azure_settings_tab.png\n :align: center\n\n7."
+ },
+ {
+ "output": " When the validation passes successfully, click Create to create the VM. .. image:: ../images/azure_summary_tab.png\n :align: center\n\n8."
+ },
+ {
+ "output": " Select this Driverless AI VM to view the IP address of your newly created machine. 9. Connect to Driverless AI with your browser using the IP address retrieved in the previous step."
+ },
+ {
+ "output": " To stop the instance: \n\n1. Click the Virtual Machines left menu item. 2. Select the checkbox beside your DriverlessAI virtual machine."
+ },
+ {
+ "output": " On the right side of the row, click the ... button, then select Stop. (Note that you can then restart this by selecting Start.)"
+ },
+ {
+ "output": " \nUpgrading the Driverless AI Community Image\n~\n\n.. include:: upgrade-warning.frag\n\nUpgrading from Version 1.2.2 or Earlier\n'\n\nThe following example shows how to upgrade from 1.2.2 or earlier to the current version."
+ },
+ {
+ "output": " 1. SSH into the IP address of the image instance and copy the existing experiments to a backup location:\n\n .. code-block:: bash\n\n # Set up a directory of the previous version name\n mkdir dai_rel_1.2.2\n\n # Copy the data, log, license, and tmp directories as backup\n cp -a ./data dai_rel_1.2.2/data\n cp -a ./log dai_rel_1.2.2/log\n cp -a ./license dai_rel_1.2.2/license\n cp -a ./tmp dai_rel_1.2.2/tmp\n\n2."
+ },
+ {
+ "output": " The command below retrieves version 1.2.2:\n\n .. code-block:: bash\n\n wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/rel-1.2.2-6/x86_64-centos7/dai-docker-centos7-x86_64-1.2.2-9.0.tar.gz\n\n3."
+ },
+ {
+ "output": " 4. Use the ``docker load`` command to load the image:\n\n .. code-block:: bash\n\n docker load < ami-0c50db5e1999408a7\n\n5."
+ },
+ {
+ "output": " 6. Connect to Driverless AI with your browser at http://Your-Driverless-AI-Host-Machine:12345. Upgrading from Version 1.3.0 or Later\n\n\nThe following example shows how to upgrade from version 1.3.0."
+ },
+ {
+ "output": " SSH into the IP address of the image instance and copy the existing experiments to a backup location:\n\n .. code-block:: bash\n\n # Set up a directory of the previous version name\n mkdir dai_rel_1.3.0\n\n # Copy the data, log, license, and tmp directories as backup\n cp -a ./data dai_rel_1.3.0/data\n cp -a ./log dai_rel_1.3.0/log\n cp -a ./license dai_rel_1.3.0/license\n cp -a ./tmp dai_rel_1.3.0/tmp\n\n2."
+ },
+ {
+ "output": " Replace VERSION and BUILD below with the Driverless AI version. .. code-block:: bash\n\n wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/VERSION-BUILD/x86_64/dai-ubi8-centos7-x86_64-VERSION.tar.gz\n\n3."
+ },
+ {
+ "output": " In the new AMI, locate the DAI_RELEASE file, and edit that file to match the new image tag. 5. Stop and then start Driverless AI."
+ },
+ {
+ "output": " .. _gbq:\n\nGoogle BigQuery Setup\n#####################\n\nDriverless AI lets you explore Google BigQuery (GBQ) data sources from within the Driverless AI application."
+ },
+ {
+ "output": " .. note::\n\tThe setup described on this page requires you to enable authentication. Enabling the GCS and/or GBQ connectors causes those file systems to be displayed in the UI, but the GCS and GBQ connectors cannot be used without first enabling authentication."
+ },
+ {
+ "output": " In the Google Cloud Platform (GCP), create a private key for your service account. To create a private key, click Service Accounts > Keys, and then click the Add Key button."
+ },
+ {
+ "output": " To finish creating the JSON private key and download it to your local file system, click Create. 2."
+ },
+ {
+ "output": " 3. Specify the path to the downloaded and mounted ``auth-key.json`` file with the ``gcs_path_to_service_account_json`` config option."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. The following sections describe how to enable the GBQ data connector:\n\n- :ref:`gbq-config-toml`\n- :ref:`gbq-environment-variable`\n- :ref:`gbq-workload-identity`\n\n.. _gbq-config-toml:\n\nEnabling GBQ with the config.toml file\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the GBQ data connector with authentication by passing the JSON authentication file."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,gbq\" \\\n -e DRIVERLESS_AI_GCS_PATH_TO_SERVICE_ACCOUNT_JSON=\"/service_account_json.json\" \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n -v `pwd`/service_account_json.json:/service_account_json.json \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure the GBQ data connector options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, gbq\"``\n - ``gcs_path_to_service_account_json = \"/service_account_json.json\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the GBQ data connector with authentication by passing the JSON authentication file."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # file : local file system/server file system\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n enabled_file_systems = \"file, gbq\"\n\n # GCS Connector credentials\n # example (suggested) \"/licenses/my_service_account_json.json\"\n gcs_path_to_service_account_json = \"/service_account_json.json\"\n\n 3."
+ },
+ {
+ "output": " .. _gbq-environment-variable:\n\nEnabling GBQ by setting an environment variable\n*\n\nThe GBQ data connector can be configured by setting the ``GOOGLE_APPLICATION_CREDENTIALS`` environment variable as follows:\n\n::\n\n export GOOGLE_APPLICATION_CREDENTIALS=\"SERVICE_ACCOUNT_KEY_PATH\"\n\nIn the preceding example, replace ``SERVICE_ACCOUNT_KEY_PATH`` with the path of the JSON file that contains your service account key."
+ },
+ {
+ "output": " .. _gbq-workload-identity:\n\nEnabling GBQ by enabling Workload Identity for your GKE cluster\n*\n\nThe GBQ data connector can be configured by enabling Workload Identity for your Google Kubernetes Engine (GKE) cluster."
+ },
+ {
+ "output": " .. note::\n\tIf Workload Identity is enabled, then the ``GOOGLE_APPLICATION_CREDENTIALS`` environment variable does not need to be set."
+ },
+ {
+ "output": " .. note::\n\tTo run a BigQuery query with Driverless AI, the associated service account must have the following Identity and Access Management (IAM) permissions:\n\n ::\n\n bigquery.jobs.create\n bigquery.tables.create\n bigquery.tables.delete\n bigquery.tables.export\n bigquery.tables.get\n bigquery.tables.getData\n bigquery.tables.list\n bigquery.tables.update\n bigquery.tables.updateData\n storage.buckets.get\n storage.objects.create\n storage.objects.delete\n storage.objects.list\n storage.objects.update\n\n For a list of all Identity and Access Management permissions, refer to the `IAM permissions reference `_ from the official Google Cloud documentation."
+ },
+ {
+ "output": " Enter BQ Dataset ID with write access to create temporary table: Enter a dataset ID in Google BigQuery that this user has read/write access to."
+ },
+ {
+ "output": " Note: Driverless AI's connection to GBQ will inherit the top-level directory from the service JSON file."
+ },
+ {
+ "output": " 2. Enter Google Storage destination bucket: Specify the name of the Google Cloud Storage destination bucket."
+ },
+ {
+ "output": " 3. Enter Name for Dataset to be saved as: Specify a name for the dataset, for example, ``my_file``."
+ },
+ {
+ "output": " Enter BigQuery Query (Use StandardSQL): Enter a StandardSQL query that you want BigQuery to execute."
+ },
+ {
+ "output": " 5. (Optional) Specify a project to use with the GBQ connector. This is equivalent to providing ``project`` when using a command-line interface."
+ },
+ {
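+ "output": " Linux Docker Images\n-------------------\n\nTo simplify local installation, Driverless AI is provided as a Docker image for the following system combinations:\n\n+-----------------------------+----------------+-------------------+---------+\n| Host OS                     | Docker Version | Host Architecture | Min Mem |\n+=============================+================+===================+=========+\n| Ubuntu 16.04 or later       | Docker CE      | x86_64            | 64 GB   |\n+-----------------------------+----------------+-------------------+---------+\n| RHEL or CentOS 7.4 or later | Docker CE      | x86_64            | 64 GB   |\n+-----------------------------+----------------+-------------------+---------+\n| NVIDIA DGX Registry         |                | x86_64            |         |\n+-----------------------------+----------------+-------------------+---------+\n\nNote: CUDA 11.2.2 or later with NVIDIA drivers >= |NVIDIA-driver-ver| is recommended (GPU only)."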
+ },
+ {
+ "output": " For the best performance, including GPU support, use nvidia-docker. For a lower-performance experience without GPUs, use regular docker (with the same docker image)."
+ },
+ {
+ "output": " For information on how to obtain a license key for Driverless AI, visit https://h2o.ai/o/try-driverless-ai/."
+ },
+ {
+ "output": " Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, ``tini`` prints a (harmless) warning message."
+ },
+ {
+ "output": " We recommend ``--shm-size=256m`` in the Docker launch command. However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command."
+ },
+ {
+ "output": " \nThis section provides instructions for upgrading Driverless AI versions that were installed in a Docker container."
+ },
+ {
+ "output": " WARNING: Experiments, MLIs, and MOJOs reside in the Driverless AI tmp directory and are not automatically upgraded when Driverless AI is upgraded."
+ },
+ {
+ "output": " - Build MOJO pipelines before upgrading. - Stop Driverless AI and make a backup of your Driverless AI tmp directory before upgrading."
+ },
+ {
+ "output": " Before upgrading, be sure to run MLI jobs on models that you want to continue to interpret in future releases."
+ },
+ {
+ "output": " If you did not build a MOJO pipeline on a model before upgrading Driverless AI, then you will not be able to build a MOJO pipeline on that model after upgrading."
+ },
+ {
+ "output": " Note: Stop Driverless AI if it is still running. Requirements\n\n\nWe recommend installing NVIDIA driver >= |NVIDIA-driver-ver| (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers."
+ },
+ {
+ "output": " .. note::\n\tIf you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02. Upgrade Steps\n'\n\n1."
+ },
+ {
+ "output": " 2. Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n # cd into the new directory\n cd |VERSION-dir|\n\n3."
+ },
+ {
+ "output": " 4. Load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " Install the Driverless AI AWS Marketplace AMI\n-\n\nA Driverless AI AMI is available in the AWS Marketplace beginning with Driverless AI version 1.5.2."
+ },
+ {
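+ "output": " Environment\n~~~~~~~~~~~\n\n+----------+---------------+----------+-----------------+\n| Provider | Instance Type | Num GPUs | Suitable for    |\n+==========+===============+==========+=================+\n| AWS      | p2.xlarge     | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | p2.8xlarge    | 8        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p2.16xlarge   | 16       | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p3.2xlarge    | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | p3.8xlarge    | 4        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | p3.16xlarge   | 8        | Serious use     |\n|          +---------------+----------+-----------------+\n|          | g3.4xlarge    | 1        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | g3.8xlarge    | 2        | Experimentation |\n|          +---------------+----------+-----------------+\n|          | g3.16xlarge   | 4        | Serious use     |\n+----------+---------------+----------+-----------------+\n\nInstallation Procedure\n\n\n1."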
+ },
+ {
+ "output": " 2. Search for Driverless AI. .. figure:: ../images/aws-marketplace-search.png\n :alt: Search for Driverless AI\n\n3."
+ },
+ {
+ "output": " .. figure:: ../images/aws-marketplace-versions.png\n :alt: Select version\n\n4. Scroll down to review/edit your region and the selected infrastructure and pricing."
+ },
+ {
+ "output": " Return to the top and select Continue to Subscribe. .. figure:: ../images/aws-marketplace-continue-to-subscribe.png\n :alt: Continue to subscribe\n\n6. Review the subscription, then click Continue to Configure."
+ },
+ {
+ "output": " If desired, change the Fulfillment Option, Software Version, and Region. Note that this page also includes the AMI ID for the selected software version."
+ },
+ {
+ "output": " .. figure:: ../images/aws-marketplace-configure-software.png\n :alt: Configure the software\n\n8. Review the configuration and choose a method for launching Driverless AI."
+ },
+ {
+ "output": " Scroll down to the bottom of the page and click Launch when you are done. .. figure:: ../images/aws-marketplace-launch.png\n :alt: Launch options\n\nYou will receive a \"Success\" message when the image launches successfully."
+ },
+ {
+ "output": " 1. Navigate to the `EC2 Console `__. 2. Select your instance. 3. Open another browser and launch Driverless AI by navigating to https://:12345."
+ },
+ {
+ "output": " Sign in to Driverless AI with the username h2oai and use the AWS InstanceID as the password. You will be prompted to enter your Driverless AI license key when you log in for the first time."
+ },
+ {
+ "output": " To stop the instance: \n\n1. On the EC2 Dashboard, click the Running Instances link under the Resources section."
+ },
+ {
+ "output": " Select the instance that you want to stop. 3. In the Actions drop-down menu, select Instance State > Stop."
+ },
+ {
+ "output": " A confirmation page will display. Click Yes, Stop to stop the instance. Upgrading the Driverless AI Marketplace Image\n\n\nNote that the first offering of the Driverless AI Marketplace image was 1.5.2."
+ },
+ {
+ "output": " Perform the following steps if you are upgrading to a Driverless AI Marketplace image version greater than 1.5.2."
+ },
+ {
+ "output": " Note that this upgrade process inherits the service user and group from /etc/dai/User.conf and /etc/dai/Group.conf."
+ },
+ {
+ "output": " .. code-block:: bash\n\n # Stop Driverless AI.\n sudo systemctl stop dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time."
+ },
+ {
+ "output": " .. _install-on-google-compute:\n\nInstall on Google Compute\n-\n\nDriverless AI can be installed on Google Compute using one of two methods:\n\n- Install the Google Cloud Platform offering."
+ },
+ {
+ "output": " - Install and Run in a Docker Container on Google Compute Engine. This installs and runs Driverless AI from scratch in a Docker container on Google Compute Engine."
+ },
+ {
+ "output": " kdb+ Setup\n\n\nDriverless AI lets you explore `kdb+ `__ data sources from within the Driverless AI application."
+ },
+ {
+ "output": " Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``kdb_user``: (Optional) User name \n- ``kdb_password``: (Optional) User's password\n- ``kdb_hostname``: IP address or host of the KDB server\n- ``kdb_port``: Port on which the kdb+ server is listening\n- ``kdb_app_jvm_args``: (Optional) JVM args for kdb+ distributions (for example, ``-Dlog4j.configuration``)."
+ },
+ {
+ "output": " - ``kdb_app_classpath``: (Optional) The kdb+ classpath (or other if the jar file is stored elsewhere)."
+ },
+ {
+ "output": " This must be configured in order for data connectors to function properly. Example 1: Enable kdb+ with No Authentication\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the kdb+ connector without authentication."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,kdb\" \\\n -e DRIVERLESS_AI_KDB_HOSTNAME=\"\" \\\n -e DRIVERLESS_AI_KDB_PORT=\"\" \\\n -p 12345:12345 \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure kdb+ options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, kdb\"``\n - ``kdb_hostname = \"\"``\n - ``kdb_port = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the kdb+ connector without authentication."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, kdb\"\n\n # KDB Connector credentials\n kdb_hostname = \"\"\n kdb_port = \"\"\n\n 3."
+ },
+ {
+ "output": " Example 2: Enable kdb+ with Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example provides user credentials for accessing a kdb+ server from Driverless AI."
+ },
+ {
+ "output": " Note that this example enables kdb+ with authentication. 1. Configure the Driverless AI config.toml file."
+ },
+ {
+ "output": " - ``enabled_file_systems = \"file, upload, kdb\"``\n - ``kdb_user = \"\"``\n - ``kdb_password = \"\"``\n - ``kdb_hostname = \"\"``\n - ``kdb_port = \"\"``\n - ``kdb_app_classpath = \"\"``\n - ``kdb_app_jvm_args = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example provides user credentials for accessing a kdb+ server from Driverless AI."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, kdb\"\n\n # kdb+ Connector credentials\n kdb_user = \"\"\n kdb_password = \"\"\n kdb_hostname = \"\"\n kdb_port = \"\"\n kdb_app_classpath = \"\"\n kdb_app_jvm_args = \"\"\n\n 3."
+ },
+ {
+ "output": " Adding Datasets Using kdb+\n\n\nAfter the kdb+ connector is enabled, you can add datasets by selecting kdb+ from the Add Dataset (or Drag and Drop) drop-down menu."
+ },
+ {
+ "output": " 1. Enter filepath to save query. Enter the local file path for storing your dataset. For example, /home//myfile.csv."
+ },
+ {
+ "output": " 2. Enter KDB Query: Enter a kdb+ query that you want to execute. Note that the connector will accept any `q queries `__."
+ },
+ {
+ "output": " Data Recipe File Setup\n\n\nDriverless AI lets you explore data recipe file data sources from within the Driverless AI application."
+ },
+ {
+ "output": " When enabled (default), you will be able to modify datasets that have been added to Driverless AI. (Refer to :ref:`modify_by_recipe` for more information.)"
+ },
+ {
+ "output": " These steps are provided in case this connector was previously disabled and you want to re-enable it."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Enable Data Recipe File\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the data recipe file data connector."
+ },
+ {
+ "output": " Note that ``recipe_file`` is enabled in the config.toml file by default. 1. Configure the Driverless AI config.toml file."
+ },
+ {
+ "output": " - ``enabled_file_systems = \"file, upload, recipe_file\"``\n\n 2. Mount the config.toml file into the Docker container."
+ },
+ {
+ "output": " Note that ``recipe_file`` is enabled by default. 1. Export the Driverless AI config.toml file or add it to ~/.bashrc."
+ },
+ {
+ "output": " Specify the following configuration options in the config.toml file. ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, recipe_file\"\n\n 3."
+ },
+ {
+ "output": " BlueData DataTap Setup\n\n\nThis section provides instructions for configuring Driverless AI to work with BlueData DataTap. Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``dtap_auth_type``: Selects DTAP authentication. Available values are:\n\n - ``noauth``: No authentication needed\n - ``principal``: Authenticate with DataTap with a principal user\n - ``keytab``: Authenticate with a keytab (recommended)."
+ },
+ {
+ "output": " - ``keytabimpersonation``: Login with impersonation using a keytab\n\n- ``dtap_config_path``: The location of the DTAP (HDFS) config folder path. This folder can contain multiple config files. Note: The DTAP config file core-site.xml needs to contain DTap FS configuration, for example:\n\n ::\n\n \n \n fs.dtap.impl\n com.bluedata.hadoop.bdfs.Bdfs\n The FileSystem for BlueData dtap: URIs.\n \n \n\n- ``dtap_key_tab_path``: The path of the principal key tab file."
+ },
+ {
+ "output": " - ``dtap_app_principal_user``: The Kerberos app principal user (recommended). - ``dtap_app_login_user``: The user ID of the current user (for example, user@realm). - ``dtap_app_jvm_args``: JVM args for DTap distributions."
+ },
+ {
+ "output": " - ``dtap_app_classpath``: The DTap classpath. - ``dtap_init_path``: Specifies the starting DTAP path displayed in the UI of the DTAP browser. - ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " Example 1: Enable DataTap with No Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the DataTap data connector and disables authentication. It does not pass any configuration file; however it configures Docker DNS by passing the name and IP of the DTap name node."
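+ "output": " Example 1: Enable DataTap with No Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the DataTap data connector and disables authentication. It does not pass any configuration file; however, it configures Docker DNS by passing the name and IP of the DTap name node."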
+ {
+ "output": " (Note: The trailing slash is currently required for directories.) .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,dtap\" \\\n -e DRIVERLESS_AI_DTAP_AUTH_TYPE='noauth' \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure DataTap options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n\n 2. Mount the config.toml file into the Docker container."
+ },
+ {
+ "output": " This allows users to reference data stored in DataTap directly using the name node address, for example: ``dtap://name.node/datasets/iris.csv`` or ``dtap://name.node/datasets/``. (Note: The trailing slash is currently required for directories.)"
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n enabled_file_systems = \"file, dtap\"\n\n 3."
+ },
+ {
+ "output": " Example 2: Enable DataTap with Keytab-Based Authentication\n\n\nNotes: \n\n- If using Kerberos Authentication, the time on the Driverless AI server must be in sync with the Kerberos server. If the time difference between clients and DCs is 5 minutes or more, there will be Kerberos failures."
+ },
+ {
+ "output": " .. tabs::\n .. group-tab:: Docker Image Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below. - Configures the environment variable ``DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER`` to reference a user for whom the keytab was created (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Configures the option ``dtap_app_principal_user`` to reference a user for whom the keytab was created (usually in the form of user@realm). 1. Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n - ``dtap_auth_type = \"keytab\"``\n - ``dtap_key_tab_path = \"/tmp/\"``\n - ``dtap_app_principal_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # file : local file system/server file system\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n enabled_file_systems = \"file, dtap\"\n\n # Blue Data DTap connector settings are similar to HDFS connector settings."
+ },
+ {
+ "output": " If running\n # DAI as a service, then the Kerberos keytab needs to\n # be owned by the DAI user. # keytabimpersonation : Login with impersonation using a keytab\n dtap_auth_type = \"keytab\"\n\n # Path of the principal key tab file\n dtap_key_tab_path = \"/tmp/\"\n\n # Kerberos app principal user (recommended)\n dtap_app_principal_user = \"\"\n\n 3."
+ },
+ {
+ "output": " Example 3: Enable DataTap with Keytab-Based Impersonation\n~\n\nNotes: \n\n- If using Kerberos, be sure that the Driverless AI time is synchronized with the Kerberos server. - If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user."
+ },
+ {
+ "output": " - Configures the ``DRIVERLESS_AI_DTAP_APP_PRINCIPAL_USER`` variable, which references a user for whom the keytab was created (usually in the form of user@realm). - Configures the ``DRIVERLESS_AI_DTAP_APP_LOGIN_USER`` variable, which references a user who is being impersonated (usually in the form of user@realm)."
+ },
+ {
+ "output": " - Configures the ``dtap_app_principal_user`` variable, which references a user for whom the keytab was created (usually in the form of user@realm). - Configures the ``dtap_app_login_user`` variable, which references a user who is being impersonated (usually in the form of user@realm)."
+ },
+ {
+ "output": " Configure the Driverless AI config.toml file. Set the following configuration options:\n\n - ``enabled_file_systems = \"file, upload, dtap\"``\n - ``dtap_auth_type = \"keytabimpersonation\"``\n - ``dtap_key_tab_path = \"/tmp/\"``\n - ``dtap_app_principal_user = \"\"``\n - ``dtap_app_login_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " - Configures the ``dtap_app_login_user`` variable, which references a user who is being impersonated (usually in the form of user@realm). 1. Export the Driverless AI config.toml file or add it to ~/.bashrc."
+ },
+ {
+ "output": " Specify the following configuration options in the config.toml file. ::\n \n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, dtap\"\n\n # Blue Data DTap connector settings are similar to HDFS connector settings."
+ },
+ {
+ "output": " If running\n # DAI as a service, then the Kerberos keytab needs to\n # be owned by the DAI user. # keytabimpersonation : Login with impersonation using a keytab\n dtap_auth_type = \"keytabimpersonation\"\n\n # Path of the principal key tab file\n dtap_key_tab_path = \"/tmp/\"\n\n # Kerberos app principal user (recommended)\n dtap_app_principal_user = \"\"\n \n # Specify the user id of the current user here as user@realm\n dtap_app_login_user = \"\"\n\n 3."
+ },
+ {
+ "output": " Data Recipe URL Setup\n-\n\nDriverless AI lets you explore data recipe URL data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with data recipe URLs."
+ },
+ {
+ "output": " (Refer to :ref:`modify_by_recipe` for more information.) Notes:\n\n- This connector is enabled by default. These steps are provided in case this connector was previously disabled and you want to re-enable it."
+ },
+ {
+ "output": " Use ``docker version`` to check which version of Docker you are using. Enable Data Recipe URL\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the data recipe URL data connector."
+ },
+ {
+ "output": " Note that ``recipe_url`` is enabled in the config.toml file by default. 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, recipe_url\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the Data Recipe URL data connector."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " AutoDoc Settings\n\n\nThis section includes settings that can be used to configure AutoDoc. ``make_autoreport``\n~\n\n.. dropdown:: Make AutoDoc\n\t:open:\n\n\tSpecify whether to create an AutoDoc for the experiment after it has finished running."
+ },
+ {
+ "output": " ``autodoc_report_name``\n~\n\n.. dropdown:: AutoDoc Name\n\t:open:\n\n\tSpecify a name for the AutoDoc report. This is set to \"report\" by default. ``autodoc_template``\n\n\n.. dropdown:: AutoDoc Template Location\n\t:open:\n\n\tSpecify a path for the AutoDoc template:\n\n\t- To generate a custom AutoDoc template, specify the full path to your custom template."
+ },
+ {
+ "output": " ``autodoc_output_type``\n~\n\n.. dropdown:: AutoDoc File Output Type\n\t:open:\n\n\tSpecify the AutoDoc output type. Choose from the following file types:\n\n\t- docx (Default)\n\t- md\n\n``autodoc_subtemplate_type``\n\n\n.. dropdown:: AutoDoc SubTemplate Type\n\t:open:\n\n\tSpecify the type of sub-templates to use."
+ },
+ {
+ "output": " This value defaults to 10. ``autodoc_num_features``\n\n\n.. dropdown:: Number of Top Features to Document\n\t:open:\n\n\tSpecify the number of top features to display in the document. To disable this setting, specify -1."
+ },
+ {
+ "output": " ``autodoc_min_relative_importance``\n~\n\n.. dropdown:: Minimum Relative Feature Importance Threshold\n\t:open:\n\n\tSpecify the minimum relative feature importance in order for a feature to be displayed. This value must be a float >= 0 and <= 1."
+ },
+ {
+ "output": " ``autodoc_include_permutation_feature_importance``\n\n\n.. dropdown:: Permutation Feature Importance\n\t:open:\n\n\tSpecify whether to compute permutation-based feature importance. This is disabled by default."
+ },
+ {
+ "output": " This is set to 1 by default. ``autodoc_feature_importance_scorer``\n~\n\n.. dropdown:: Feature Importance Scorer\n\t:open:\n\n\tSpecify the name of the scorer to be used when calculating feature importance. Leave this setting unspecified to use the default scorer for the experiment."
+ },
+ {
+ "output": " ``autodoc_pd_max_runtime``\n\n\n.. dropdown:: PDP Max Runtime in Seconds\n\t:open:\n\n\tSpecify the maximum number of seconds Partial Dependency computation can take when generating a report. Set this value to -1 to disable the time limit."
+ },
+ {
+ "output": " ``autodoc_out_of_range``\n\n\n.. dropdown:: PDP Out of Range\n\t:open:\n\n\tSpecify the number of standard deviations outside of the range of a column to include in partial dependence plots. This shows how the model reacts to data it has not seen before."
+ },
+ {
+ "output": " ``autodoc_num_rows``\n\n\n.. dropdown:: ICE Number of Rows\n\t:open:\n\n\tSpecify the number of rows to include in PDP and ICE plots if individual rows are not specified. This is set to 0 by default. ``autodoc_population_stability_index``\n\n\n.. dropdown:: Population Stability Index\n\t:open:\n\n\tSpecify whether to include a population stability index if the experiment is a binary classification or regression problem."
+ },
+ {
+ "output": " ``autodoc_population_stability_index_n_quantiles``\n\n\n.. dropdown:: Population Stability Index Number of Quantiles\n\t:open:\n\n\tSpecify the number of quantiles to use for the population stability index. This is set to 10 by default."
+ },
+ {
+ "output": " This value is disabled by default. ``autodoc_prediction_stats_n_quantiles``\n\n\n.. dropdown:: Prediction Statistics Number of Quantiles\n\t:open:\n\n\tSpecify the number of quantiles to use for prediction statistics."
+ },
+ {
+ "output": " ``autodoc_response_rate``\n~\n\n.. dropdown:: Response Rates Plot\n\t:open:\n\n\tSpecify whether to include response rates information if the experiment is a binary classification problem. This is disabled by default."
+ },
+ {
+ "output": " This is set to 10 by default. ``autodoc_gini_plot``\n~\n\n.. dropdown:: Show GINI Plot\n\t:open:\n\n\tSpecify whether to show the GINI plot. This is disabled by default. ``autodoc_enable_shapley_values``\n~\n\n.. dropdown:: Enable Shapley Values\n\t:open:\n\n\tSpecify whether to show Shapley values results in the AutoDoc."
+ },
+ {
+ "output": " ``autodoc_data_summary_col_num``\n\n\n.. dropdown:: Number of Features in Data Summary Table\n\t:open:\n\n\tSpecify the number of features to be shown in the data summary table. This value must be an integer."
+ },
+ {
+ "output": " This is set to -1 by default. ``autodoc_list_all_config_settings``\n\n\n.. dropdown:: List All Config Settings\n\t:open:\n\n\tSpecify whether to show all config settings. If this is disabled, only settings that have been changed are listed."
+ },
+ {
+ "output": " This is disabled by default. ``autodoc_keras_summary_line_length``\n~\n\n.. dropdown:: Keras Model Architecture Summary Line Length\n\t:open:\n\n\tSpecify the line length of the Keras model architecture summary."
+ },
+ {
+ "output": " To use the default line length, set this value to -1 (default). ``autodoc_transformer_architecture_max_lines``\n\n\n.. dropdown:: NLP/Image Transformer Architecture Max Lines\n\t:open:\n\n\tSpecify the maximum number of lines shown for advanced transformer architecture in the Feature section."
+ },
+ {
+ "output": " ``autodoc_full_architecture_in_appendix``\n~\n\n.. dropdown:: Appendix NLP/Image Transformer Architecture\n\t:open:\n\n\tSpecify whether to show the full NLP/Image transformer architecture in the appendix. This is disabled by default."
+ },
+ {
+ "output": " This is disabled by default. ``autodoc_coef_table_num_models``\n~\n\n.. dropdown:: GLM Coefficient Tables Number of Models\n\t:open:\n\n\tSpecify the number of models for which a GLM coefficients table is shown in the AutoDoc."
+ },
+ {
+ "output": " Set this value to -1 to show tables for all models. This is set to 1 by default. ``autodoc_coef_table_num_folds``\n\n\n.. dropdown:: GLM Coefficient Tables Number of Folds Per Model\n\t:open:\n\n\tSpecify the number of folds per model for which a GLM coefficients table is shown in the AutoDoc."
+ },
+ {
+ "output": " ``autodoc_coef_table_num_coef``\n~\n\n.. dropdown:: GLM Coefficient Tables Number of Coefficients\n\t:open:\n\n\tSpecify the number of coefficients to show within a GLM coefficients table in the AutoDoc. This is set to 50 by default."
+ },
+ {
+ "output": " ``autodoc_coef_table_num_classes``\n\n\n.. dropdown:: GLM Coefficient Tables Number of Classes\n\t:open:\n\n\tSpecify the number of classes to show within a GLM coefficients table in the AutoDoc. Set this value to -1 to show all classes."
+ },
+ {
+ "output": " Snowflake Setup\n- \n\nDriverless AI allows you to explore Snowflake data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with Snowflake."
+ },
+ {
+ "output": " If you enable Snowflake connectors, those file systems will be available in the UI, but you will not be able to use those connectors without authentication. Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``snowflake_account``: The Snowflake account ID\n- ``snowflake_user``: The username for accessing the Snowflake account\n- ``snowflake_password``: The password for accessing the Snowflake account\n- ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " Enable Snowflake with Authentication\n\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the Snowflake data connector with authentication by passing the ``account``, ``user``, and ``password`` variables."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, snow\"``\n - ``snowflake_account = \"\"``\n - ``snowflake_user = \"\"``\n - ``snowflake_password = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n \n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the Snowflake data connector with authentication by passing the ``account``, ``user``, and ``password`` variables."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, snow\"\n\n # Snowflake Connector credentials\n snowflake_account = \"\"\n snowflake_user = \"\"\n snowflake_password = \"\"\n\n 3."
+ },
+ {
+ "output": " Adding Datasets Using Snowflake\n \n\nAfter the Snowflake connector is enabled, you can add datasets by selecting Snowflake from the Add Dataset (or Drag and Drop) drop-down menu. .. figure:: ../images/add_dataset_dropdown.png\n :alt: Add Dataset\n :height: 338\n :width: 237\n\nSpecify the following information to add your dataset."
+ },
+ {
+ "output": " Enter Database: Specify the name of the Snowflake database that you are querying. 2. Enter Warehouse: Specify the name of the Snowflake warehouse that you are querying. 3. Enter Schema: Specify the schema of the dataset that you are querying."
+ },
+ {
+ "output": " Enter Name for Dataset to Be Saved As: Specify a name for the dataset to be saved as. Note that this can only be a CSV file (for example, myfile.csv). 5. Enter Username: (Optional) Specify the username associated with this Snowflake account."
+ },
+ {
+ "output": " 6. Enter Password: (Optional) Specify the password associated with this Snowflake account. This can be left blank if ``snowflake_password`` was specified in the config.toml when starting Driverless AI; otherwise, this field is required."
+ },
+ {
+ "output": " Enter Role: (Optional) Specify your role as designated within Snowflake. See https://docs.snowflake.net/manuals/user-guide/security-access-control-overview.html for more information. 8. Enter Region: (Optional) Specify the region of the warehouse that you are querying."
+ },
+ {
+ "output": " This is optional and can also be left blank if ``snowflake_url`` was specified with a ```` in the config.toml when starting Driverless AI. 9. Enter File Formatting Parameters: (Optional) Specify any additional parameters for formatting your datasets."
+ },
+ {
+ "output": " (Note: Use only parameters for ``TYPE = CSV``.) For example, if your dataset includes a text column that contains commas, you can specify a different delimiter using ``FIELD_DELIMITER='character'``. Multiple parameters must be separated with spaces:\n\n ::\n\n FIELD_DELIMITER=',' FIELD_OPTIONALLY_ENCLOSED_BY=\"\" SKIP_BLANK_LINES=TRUE\n\n Note: Be sure that the specified delimiter is not also used as a character within a cell; otherwise an error will occur."
+ },
+ {
+ "output": " To prevent this from occurring, add ``NULL_IF=()`` to the input of FILE FORMATTING PARAMETERS. 10. Enter Snowflake Query: Specify the Snowflake query that you want to execute. 11. When you are finished, select the Click to Make Query button to add the dataset."
+ },
+ {
+ "output": " .. _install-on-windows:\n\nWindows 10\n\n\nThis section describes how to install, start, stop, and upgrade Driverless AI on a Windows 10 machine. The installation steps assume that you have a license key for Driverless AI."
+ },
+ {
+ "output": " Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " Notes:\n\n- GPU support is not available on Windows. - Scoring is not available on Windows. Caution: Installing Driverless AI on Windows 10 is not recommended for serious use. Environment\n~\n\n+-+-+-+-+\n| Operating System | GPU Support?"
+ },
+ {
+ "output": " Refer to https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/hyper-v-requirements for more information. Docker Image Installation\n~\n\nNotes: \n\n- Be aware that there are known issues with Docker for Windows."
+ },
+ {
+ "output": " - Consult with your Windows System Admin if:\n\n - Your corporate environment does not allow third-party software installs\n - You are running Windows Defender\n - Your machine is not running with ``Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux``."
+ },
+ {
+ "output": " Note that some of the images in this video may change between releases, but the installation steps remain the same. Requirements\n'\n\n- Windows 10 Pro / Enterprise / Education\n- Docker Desktop for Windows 2.2.0.3 (42716)\n\nNote: As of this writing, Driverless AI has only been tested on Docker Desktop for Windows version 2.2.0.3 (42716)."
+ },
+ {
+ "output": " Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. 2. Download, install, and run Docker for Windows from https://docs.docker.com/docker-for-windows/install/. You can verify that Docker is running by typing ``docker version`` in a terminal (such as Windows PowerShell)."
+ },
+ {
+ "output": " 3. Before running Driverless AI, you must:\n\n - Enable shared access to the C drive. Driverless AI will not be able to see your local data if this is not set. - Adjust the amount of memory given to Docker to be at least 10 GB."
+ },
+ {
+ "output": " - Optionally adjust the number of CPUs given to Docker. You can adjust these settings by clicking on the Docker whale in your taskbar (look for hidden tasks, if necessary), then selecting Settings > Shared Drive and Settings > Advanced as shown in the following screenshots."
+ },
+ {
+ "output": " (Docker will restart.) Note that if you cannot make changes, stop Docker and then start Docker again by right-clicking on the Docker icon on your desktop and selecting Run as Administrator. .. image:: ../images/windows_docker_menu_bar.png\n :align: center\n :width: 252\n :height: 262\n\n\\\n\n .. image:: ../images/windows_shared_drive_access.png\n :align: center\n :scale: 40%\n\n\\\n\n .. image:: ../images/windows_docker_advanced_preferences.png\n :align: center\n :width: 502\n :height: 326\n\n4."
+ },
+ {
+ "output": " With Docker running, navigate to the location of your downloaded Driverless AI image. Move the downloaded Driverless AI image to your new directory. 6. Change directories to the new directory, then load the image using the following command:\n\n .. code-block:: bash\n :substitutions:\n \n cd |VERSION-dir|\n docker load -i .\\dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n7."
+ },
+ {
+ "output": " .. code-block:: bash\n\n md data\n md log\n md license\n md tmp\n\n8. Copy data into the /data directory. The data will be visible inside the Docker container at /data. 9. Run ``docker images`` to find the image tag."
+ },
+ {
+ "output": " Start the Driverless AI Docker image. Be sure to replace ``path_to_`` below with the entire path to the location of the folders that you created (for example, \"c:/Users/user-name/driverlessai_folder/data\")."
+ },
+ {
+ "output": " GPU support will not be available. Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command. .. code-block:: bash\n :substitutions:\n\n docker run --pid=host --rm --shm-size=256m -p 12345:12345 -v c:/path_to_data:/data -v c:/path_to_log:/log -v c:/path_to_license:/license -v c:/path_to_tmp:/tmp h2oai/dai-ubi8-x86_64:|tag|\n\n11."
+ },
+ {
+ "output": " Add Custom Recipes\n\n\nCustom recipes are Python code snippets that can be uploaded into Driverless AI at runtime like plugins. Restarting Driverless AI is not required. If you do not have a custom recipe, you can select from a number of recipes available in the `Recipes for H2O Driverless AI repository `_."
+ },
+ {
+ "output": " To add a custom recipe to Driverless AI, click Add Custom Recipe and select one of the following options:\n\n- From computer: Add a custom recipe as a Python or ZIP file from your local file system. - From URL: Add a custom recipe from a URL."
+ },
+ {
+ "output": " To use this option, your Bitbucket username and password must be provided along with the custom recipe Bitbucket URL. Official Recipes (Open Source)\n\n\nTo access `H2O's official recipes repository `_, click Official Recipes (Open Source)."
+ },
+ {
+ "output": " If you change the default value of an expert setting from the Expert Settings window, that change is displayed in the TOML configuration editor. For example, if you set the Make MOJO scoring pipeline setting in the Experiment tab to Off, then the line ``make_mojo_scoring_pipeline = \"off\"`` is displayed in the TOML editor."
+ },
+ {
+ "output": " To confirm your changes, click Save. The experiment preview updates to reflect your specified configuration changes. For a full list of available settings, see :ref:`expert-settings`. .. note::\n\tDo not edit the section below the ``[recipe_activation]`` line."
+ },
+ {
+ "output": " .. _h2o_drive:\n\n###############\nH2O Drive setup\n###############\n\nH2O Drive is an object-store for `H2O AI Cloud `_. This page describes how to configure Driverless AI to work with H2O Drive."
+ },
+ {
+ "output": " Description of relevant configuration attributes\n\n\nThe following are descriptions of the relevant configuration attributes when enabling the H2O AI Feature Store data connector:\n\n- ``enabled_file_systems``: A list of file systems you want to enable."
+ },
+ {
+ "output": " - ``h2o_drive_endpoint_url``: The H2O Drive server endpoint URL. - ``h2o_drive_access_token_scopes``: A space-separated list of OpenID scopes for the access token that are used by the H2O Drive connector."
+ },
+ {
+ "output": " - ``authentication_method``: The authentication method used by DAI. When enabling the Feature Store data connector, this must be set to OpenID Connect (``authentication_method=\"oidc\"``). For information on setting up OIDC Authentication in Driverless AI, see :ref:`oidc_auth`."
+ },
+ {
+ "output": " .. _install-on-macosx:\n\nMac OS X\n\n\nThis section describes how to install, start, stop, and upgrade the Driverless AI Docker image on Mac OS X. Note that this uses regular Docker and not NVIDIA Docker."
+ },
+ {
+ "output": " The installation steps assume that you have a license key for Driverless AI. For information on how to obtain a license key for Driverless AI, visit https://h2o.ai/o/try-driverless-ai/. Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " Stick to small datasets! For serious use, please use Linux. - Be aware that there are known performance issues with Docker for Mac. More information is available here: https://docs.docker.com/docker-for-mac/osxfs/#technology."
+ },
+ {
+ "output": " | Min Mem | Suitable for |\n+=+=+=+=+\n| Mac OS X | No | 16 GB | Experimentation |\n+-+-+-+-+\n\nInstalling Driverless AI\n\n\n1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/."
+ },
+ {
+ "output": " Download and run Docker for Mac from https://docs.docker.com/docker-for-mac/install. 3. Adjust the amount of memory given to Docker to be at least 10 GB. Driverless AI won't run at all with less than 10 GB of memory."
+ },
+ {
+ "output": " You will find the controls by clicking on (Docker Whale)->Preferences->Advanced as shown in the following screenshots. (Don't forget to Apply the changes after setting the desired memory value.) .. image:: ../images/macosx_docker_menu_bar.png\n :align: center\n\n.. image:: ../images/macosx_docker_advanced_preferences.png\n :align: center\n :height: 507\n :width: 382\n\n4."
+ },
+ {
+ "output": " More information is available here: https://docs.docker.com/docker-for-mac/osxfs/#namespaces. .. image:: ../images/macosx_docker_filesharing.png\n :align: center\n :scale: 40%\n\n5. Set up a directory for the version of Driverless AI within the Terminal: \n\n .. code-block:: bash\n :substitutions:\n\n mkdir |VERSION-dir|\n\n6."
+ },
+ {
+ "output": " 7. Change directories to the new directory, then load the image using the following command:\n\n .. code-block:: bash\n :substitutions:\n\n cd |VERSION-dir|\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n8."
+ },
+ {
+ "output": " Optionally copy data into the data directory on the host. The data will be visible inside the Docker container at /data. You can also upload data after starting Driverless AI. 10. Run ``docker images`` to find the image tag."
+ },
+ {
+ "output": " Start the Driverless AI Docker image (still within the new Driverless AI directory). Replace TAG below with the image tag. Note that GPU support will not be available. Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI Docker command. .. code-block:: bash\n :substitutions:\n\n docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n12."
+ },
+ {
+ "output": " Stopping the Docker Image\n~\n\n.. include:: stop-docker.rst\n\nUpgrading the Docker Image\n\n\nThis section provides instructions for upgrading Driverless AI versions that were installed in a Docker container."
+ },
+ {
+ "output": " WARNING: Experiments, MLIs, and MOJOs reside in the Driverless AI tmp directory and are not automatically upgraded when Driverless AI is upgraded. - Build MLI models before upgrading. - Build MOJO pipelines before upgrading."
+ },
+ {
+ "output": " If you did not build MLI on a model before upgrading Driverless AI, then you will not be able to view MLI on that model after upgrading. Before upgrading, be sure to run MLI jobs on models that you want to continue to interpret in future releases."
+ },
+ {
+ "output": " If you did not build a MOJO pipeline on a model before upgrading Driverless AI, then you will not be able to build a MOJO pipeline on that model after upgrading. Before upgrading, be sure to build MOJO pipelines on all desired models and then back up your Driverless AI tmp directory."
+ },
+ {
+ "output": " Upgrade Steps\n'\n\n1. SSH into the IP address of the machine that is running Driverless AI. 2. Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n # cd into the new directory\n cd |VERSION-dir|\n\n3."
+ },
+ {
+ "output": " 4. Load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " .. _features-settings:\n\nFeatures Settings\n=\n\n``feature_engineering_effort``\n\n\n.. dropdown:: Feature Engineering Effort\n\t:open:\n\n\tSpecify a value from 0 to 10 for the Driverless AI feature engineering effort."
+ },
+ {
+ "output": " This value defaults to 5. - 0: Keep only numeric features. Only model tuning during evolution. - 1: Keep only numeric features and frequency-encoded categoricals. Only model tuning during evolution. - 2: Similar to 1, but excludes only Text features."
+ },
+ {
+ "output": " - 3: Similar to 5 but only tuning during evolution. Mixed tuning of features and model parameters. - 4: Similar to 5 but slightly more focused on model tuning. - 5: Balanced feature-model tuning. (Default)\n\t- 6-7: Similar to 5 but slightly more focused on feature engineering."
+ },
+ {
+ "output": " - 9-10: Similar to 8 but no model tuning during feature evolution. .. _check_distribution_shift:\n\n``check_distribution_shift``\n\n\n.. dropdown:: Data Distribution Shift Detection\n\t:open:\n\n\tSpecify whether Driverless AI should detect data distribution shifts between train/valid/test datasets (if provided)."
+ },
+ {
+ "output": " Currently, this information is only presented to the user and not acted upon. Shifted features should either be dropped, or more meaningful aggregate features should be created by using them as labels or bins."
+ },
+ {
+ "output": " .. _check_distribution_shift_drop:\n\n``check_distribution_shift_drop``\n~\n\n.. dropdown:: Data Distribution Shift Detection Drop of Features\n\t:open:\n\n\tSpecify whether to drop high-shift features. This defaults to Auto."
+ },
+ {
+ "output": " Also see :ref:`drop_features_distribution_shift_threshold_auc ` and :ref:`check_distribution_shift `. .. _drop_features_distribution_shift_threshold_auc:\n\n``drop_features_distribution_shift_threshold_auc``\n\n\n.. dropdown:: Max Allowed Feature Shift (AUC) Before Dropping Feature\n\t:open:\n\n\tSpecify the maximum allowed AUC value for a feature before dropping the feature."
+ },
+ {
+ "output": " This model includes an AUC value. If this AUC, GINI, or Spearman correlation of the model is above the specified threshold, then Driverless AI will consider it a strong enough shift to drop those features."
+ },
+ {
+ "output": " .. _check_leakage:\n\n``check_leakage``\n~\n\n.. dropdown:: Data Leakage Detection\n\t:open:\n\n\tSpecify whether to check for data leakage for each feature. Some of the features may have suspiciously high predictive power on the target column."
+ },
+ {
+ "output": " Driverless AI runs a model to determine the predictive power of each feature on the target variable. Then, a simple model is built on each feature with significant variable importance. The models with high AUC (for classification) or R2 score (regression) are reported to the user as potential leakage."
+ },
+ {
+ "output": " This is set to Auto by default. The equivalent config.toml parameter is ``check_leakage``. Also see :ref:`drop_features_leakage_threshold_auc `\n\n.. _drop_features_leakage_threshold_auc:\n\n``drop_features_leakage_threshold_auc``\n~\n\n.. dropdown:: Data Leakage Detection Dropping AUC/R2 Threshold\n\t:open:\n\n\tIf :ref:`Leakage Detection ` is enabled, specify the threshold for dropping features."
+ },
+ {
+ "output": " This value defaults to 0.999. The equivalent config.toml parameter is ``drop_features_leakage_threshold_auc``. ``leakage_max_data_size``\n~\n\n.. dropdown:: Max Rows X Columns for Leakage\n\t:open:\n\n\tSpecify the maximum number of (rows x columns) to trigger sampling for leakage checks."
+ },
+ {
+ "output": " ``max_features_importance``\n~\n\n.. dropdown:: Max. num. features for variable importance\n\t:open:\n\n\tSpecify the maximum number of features to use and show in importance tables. For any interpretability higher than 1, transformed or original features with lower importance than the top ``max_features_importance`` features are always removed. Feature importances of transformed or original features are pruned correspondingly."
+ },
+ {
+ "output": " .. _enable_wide_rules:\n\n``enable_wide_rules``\n~\n\n.. dropdown:: Enable Wide Rules\n\t:open:\n\n\tEnable various rules to handle wide datasets (i.e., number of columns > number of rows). The default value is \"auto\", which automatically enables the wide rules when the number of columns is detected to be greater than the number of rows."
+ },
+ {
+ "output": " Enabling wide data rules sets all ``max_cols``, ``max_orig_*col``, and ``fs_orig*`` tomls to large values, and enforces monotonicity to be disabled unless ``monotonicity_constraints_dict`` is set or default value of ``monotonicity_constraints_interpretability_switch`` is changed."
+ },
+ {
+ "output": " This setting also enables the :ref:`Xgboost Random Forest model ` for modeling. To disable wide rules, set ``enable_wide_rules`` to \"off\". For mostly or entirely numeric datasets, selecting only 'OriginalTransformer' for faster speed is recommended (see :ref:`included_transformers `)."
+ },
+ {
+ "output": " ``orig_features_fs_report``\n~\n\n.. dropdown:: Report Permutation Importance on Original Features\n\t:open:\n\n\tSpecify whether Driverless AI reports permutation importance on original features (represented as normalized change in the chosen metric) in logs and the report file."
+ },
+ {
+ "output": " ``max_rows_fs``\n~\n\n.. dropdown:: Maximum Number of Rows to Perform Permutation-Based Feature Selection\n\t:open:\n\n\tSpecify the maximum number of rows when performing permutation feature importance, reduced by (stratified) random sampling."
+ },
+ {
+ "output": " ``max_orig_cols_selected``\n\n\n.. dropdown:: Max Number of Original Features Used\n\t:open:\n\n\tSpecify the maximum number of columns to be selected from an existing set of columns using feature selection."
+ },
+ {
+ "output": " For categorical columns, the selection is based upon how well target encoding (or frequency encoding if not available) on categoricals and numerics treated as categoricals helps. This is useful to reduce the final model complexity."
+ },
+ {
+ "output": " ``max_orig_nonnumeric_cols_selected``\n~\n\n.. dropdown:: Max Number of Original Non-Numeric Features\n\t:open:\n\n\tSpecify the maximum number of non-numeric columns to be selected. Above this number, Driverless AI performs feature selection on all features and avoids treating numerical columns as categorical. This is the same as ``max_orig_numeric_cols_selected``, but for categorical columns."
+ },
+ {
+ "output": " This value defaults to 300. ``fs_orig_cols_selected``\n~\n\n.. dropdown:: Max Number of Original Features Used for FS Individual\n\t:open:\n\n\tSpecify the maximum number of features you want to be selected in an experiment."
+ },
+ {
+ "output": " If the number of columns exceeds the specified value, a special individual with a reduced set of original columns is added. ``fs_orig_numeric_cols_selected``\n~\n\n.. dropdown:: Number of Original Numeric Features to Trigger Feature Selection Model Type\n\t:open:\n\n\tThe maximum number of original numeric columns, above which Driverless AI will do feature selection."
+ },
+ {
+ "output": " A separate individual in the :ref:`genetic algorithm ` is created by doing feature selection by permutation importance on original features. This value defaults to 10,000,000. ``fs_orig_nonnumeric_cols_selected``\n\n\n.. dropdown:: Number of Original Non-Numeric Features to Trigger Feature Selection Model Type\n\t:open:\n\n\tThe maximum number of original non-numeric columns, above which Driverless AI will do feature selection on all features."
+ },
+ {
+ "output": " A separate individual in the :ref:`genetic algorithm ` is created by doing feature selection by permutation importance on original features. This value defaults to 200. ``max_relative_cardinality``\n\n\n.. dropdown:: Max Allowed Fraction of Uniques for Integer and Categorical Columns\n\t:open:\n\n\tSpecify the maximum fraction of unique values for integer and categorical columns."
+ },
+ {
+ "output": " This value defaults to 0.95. .. _num_as_cat:\n\n``num_as_cat``\n\n\n.. dropdown:: Allow Treating Numerical as Categorical\n\t:open:\n\n\tSpecify whether to allow some numerical features to be treated as categorical features."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``num_as_cat``. ``max_int_as_cat_uniques``\n\n\n.. dropdown:: Max Number of Unique Values for Int/Float to be Categoricals\n\t:open:\n\n\tSpecify the number of unique values for integer or real columns to be treated as categoricals."
+ },
+ {
+ "output": " ``max_fraction_invalid_numeric``\n\n\n.. dropdown:: Max. fraction of numeric values to be non-numeric (and not missing) for a column to still be considered numeric\n\t:open:\n\n\tWhen the fraction of non-numeric (and non-missing) values is less than or equal to this value, consider the column numeric."
+ },
+ {
+ "output": " Note: Replaces non-numeric values with missing values at start of experiment, so some information is lost, but column is now treated as numeric, which can help. Disabled if < 0. .. _nfeatures_max:\n\n``nfeatures_max``\n~\n\n.. dropdown:: Max Number of Engineered Features\n\t:open:\n\n\tSpecify the maximum number of features to be included per model (and in each model within the final model if an ensemble)."
+ },
+ {
+ "output": " Final ensemble will exclude any pruned-away features and only train on kept features, but may contain a few new features due to fitting on different data view (e.g. new clusters). Final scoring pipeline will exclude any pruned-away features, but may contain a few new features due to fitting on different data view (e.g."
+ },
+ {
+ "output": " The default value of -1 means no restrictions are applied for this parameter except internally-determined memory and interpretability restrictions. Notes:\n\n\t * If ``interpretability`` > ``remove_scored_0gain_genes_in_postprocessing_above_interpretability`` (see :ref:`config.toml ` for reference), then every GA (:ref:`genetic algorithm `) iteration post-processes features down to this value just after scoring them."
+ },
+ {
+ "output": " * If ``ngenes_max`` is also not limited, then some individuals will have more genes and features until pruned by mutation or by preparation for final model. * E.g. to generally limit every iteration to exactly 1 feature, one must set ``nfeatures_max`` = ``ngenes_max`` = 1 and ``remove_scored_0gain_genes_in_postprocessing_above_interpretability`` = 0, but the genetic algorithm will have a harder time finding good features."
+ },
+ {
+ "output": " .. _ngenes_max:\n\n``ngenes_max``\n\n\n.. dropdown:: Max Number of Genes\n\t:open:\n\n\tSpecify the maximum number of genes (transformer instances) kept per model (and per each model within the final model for ensembles)."
+ },
+ {
+ "output": " If restriction occurs after scoring features, then aggregated gene importances are used for pruning genes. Instances include all possible transformers, including the original transformer for numeric features."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``ngenes_max``. ``features_allowed_by_interpretability``\n\n\n.. dropdown:: Limit Features by Interpretability\n\t:open:\n\n\tSpecify whether to limit feature counts with the Interpretability training setting as specified by the ``features_allowed_by_interpretability`` :ref:`config.toml ` setting."
+ },
+ {
+ "output": " This value defaults to 7. Also see :ref:`monotonic gbm recipe ` and :ref:`Monotonicity Constraints in Driverless AI ` for reference. .. _monotonicity-constraints-correlation-threshold:\n\n``monotonicity_constraints_correlation_threshold``\n\n\n.. dropdown:: Correlation Beyond Which to Trigger Monotonicity Constraints (if enabled)\n\t:open:\n\n\tSpecify the threshold of Pearson product-moment correlation coefficient between numerical or encoded transformed feature and target above (below negative for) which to use positive (negative) monotonicity for XGBoostGBM, LightGBM and Decision Tree models."
+ },
+ {
+ "output": " Note: This setting is only enabled when Interpretability is greater than or equal to the value specified by the :ref:`enable-constraints` setting and when the :ref:`constraints-override` setting is not specified."
+ },
+ {
+ "output": " ``monotonicity_constraints_log_level``\n\n\n.. dropdown:: Control amount of logging when calculating automatic monotonicity constraints (if enabled)\n\t:open:\n\n\tFor models that support monotonicity constraints, and if enabled, show automatically determined monotonicity constraints for each feature going into the model based on its correlation with the target."
+ },
+ {
+ "output": " 'medium' shows correlation of positively and negatively constrained features. 'high' shows all correlation values. Also see :ref:`monotonic gbm recipe ` and :ref:`Monotonicity Constraints in Driverless AI ` for reference."
+ },
+ {
+ "output": " Otherwise all features will be in the model. Only active when interpretability >= monotonicity_constraints_interpretability_switch or monotonicity_constraints_dict is provided. Also see :ref:`monotonic gbm recipe ` and :ref:`Monotonicity Constraints in Driverless AI ` for reference."
+ },
+ {
+ "output": " Original numeric features are mapped to the desired constraint:\n\n\t- 1: Positive constraint\n\t- -1: Negative constraint\n\t- 0: Constraint disabled\n\n\tConstraint is automatically disabled (set to 0) for features that are not in this list."
+ },
+ {
+ "output": " See :ref:`Monotonicity Constraints in Driverless AI ` for reference. .. _max-feature-interaction-depth:\n\n``max_feature_interaction_depth``\n~\n\n.. dropdown:: Max Feature Interaction Depth\n\t:open:\n\n\tSpecify the maximum number of features to use for interaction features like grouping for target encoding, weight of evidence, and other likelihood estimates."
+ },
+ {
+ "output": " The interaction can take multiple forms (e.g. feature1 + feature2 or feature1 * feature2 + \u2026 featureN). Although certain machine learning algorithms (like tree-based methods) can do well in capturing these interactions as part of their training process, generating them may still help them (or other algorithms) yield better performance."
+ },
+ {
+ "output": " Higher values might be able to make more predictive models at the expense of time. This value defaults to 8. Set Max Feature Interaction Depth to 1 to disable any feature interactions (``max_feature_interaction_depth=1``)."
+ },
+ {
+ "output": " To use all features for each transformer, set this to be equal to the number of columns. To do a 50/50 sample and a fixed feature interaction depth of :math:`n` features, set this to -:math:`n`. ``enable_target_encoding``\n\n\n.. dropdown:: Enable Target Encoding\n\t:open:\n\n\tSpecify whether to use Target Encoding when building the model."
+ },
+ {
+ "output": " A simple example can be to use the mean of the target to replace each unique category of a categorical feature. These types of features can be very predictive but are prone to overfitting and require more memory as they need to store mappings of the unique categories and the target values."
+ },
+ {
+ "output": " The degree to which GINI is inaccurate is also used to perform fold-averaging of look-up tables instead of using global look-up tables. This is enabled by default. ``enable_lexilabel_encoding``\n~\n\n.. dropdown:: Enable Lexicographical Label Encoding\n\t:open:\n\n\tSpecify whether to enable lexicographical label encoding."
+ },
+ {
+ "output": " ``enable_isolation_forest``\n~\n\n.. dropdown:: Enable Isolation Forest Anomaly Score Encoding\n\t:open:\n\n\t`Isolation Forest `__ is useful for identifying anomalies or outliers in data."
+ },
+ {
+ "output": " This split depends on how long it takes to separate the points. Random partitioning produces noticeably shorter paths for anomalies. When a forest of random trees collectively produces shorter path lengths for particular samples, they are highly likely to be anomalies."
+ },
+ {
+ "output": " This is disabled by default. ``enable_one_hot_encoding``\n~\n\n.. dropdown:: Enable One HotEncoding\n\t:open:\n\n\tSpecify whether one-hot encoding is enabled. The default Auto setting is only applicable for small datasets and GLMs."
+ },
+ {
+ "output": " This value defaults to 200. ``drop_constant_columns``\n~\n\n.. dropdown:: Drop Constant Columns\n\t:open:\n\n\tSpecify whether to drop columns with constant values. This is enabled by default. ``drop_id_columns``\n~\n\n.. dropdown:: Drop ID Columns\n\t:open:\n\n\tSpecify whether to drop columns that appear to be an ID."
+ },
+ {
+ "output": " ``no_drop_features``\n\n\n.. dropdown:: Don't Drop Any Columns\n\t:open:\n\n\tSpecify whether to avoid dropping any columns (original or derived). This is disabled by default. .. _features_to_drop:\n\n``cols_to_drop``\n\n\n.. dropdown:: Features to Drop\n\t:open:\n\n\tSpecify which features to drop."
+ },
+ {
+ "output": " .. _cols_to_force_in:\n\n``cols_to_force_in``\n~\n\n.. dropdown:: Features to always keep or force in, e.g. \"G1\", \"G2\", \"G3\"\n\t:open:\n\n\tControl over columns to force-in. Forced-in features are handled by the most interpretable transformers allowed by the experiment options, and they are never removed (even if the model assigns 0 importance to them)."
+ },
+ {
+ "output": " When this field is left empty (default), Driverless AI automatically searches all columns (either at random or based on which columns have high variable importance). ``sample_cols_to_group_by``\n~\n\n.. dropdown:: Sample from Features to Group By\n\t:open:\n\n\tSpecify whether to sample from given features to group by or to always group all features."
+ },
+ {
+ "output": " ``agg_funcs_for_group_by``\n\n\n.. dropdown:: Aggregation Functions (Non-Time-Series) for Group By Operations\n\t:open:\n\n\tSpecify which aggregation functions to use for group by operations. Choose from the following (all are selected by default):\n\n\t- mean\n\t- sd\n\t- min\n\t- max\n\t- count\n\n``folds_for_group_by``\n\n\n.. dropdown:: Number of Folds to Obtain Aggregation When Grouping\n\t:open:\n\n\tSpecify the number of folds to obtain aggregation when grouping."
+ },
+ {
+ "output": " The default value is 5. .. _mutation_mode:\n\n``mutation_mode``\n~\n\n.. dropdown:: Type of Mutation Strategy\n\t:open:\n\n\tSpecify which strategy to apply when performing mutations on transformers. Select from the following:\n\n\t- sample: Sample transformer parameters (Default)\n\t- batched: Perform multiple types of the same transformation together\n\t- full: Perform more types of the same transformation together than the above strategy\n\n``dump_varimp_every_scored_indiv``\n\n\n.. dropdown:: Enable Detailed Scored Features Info\n\t:open:\n\n\tSpecify whether to dump every scored individual's variable importance (both derived and original) to a csv/tabulated/json file."
+ },
+ {
+ "output": " This is disabled by default. ``dump_trans_timings``\n\n\n.. dropdown:: Enable Detailed Logs for Timing and Types of Features Produced\n\t:open:\n\n\tSpecify whether to dump every scored fold's timing and feature info to a timings.txt file."
+ },
+ {
+ "output": " ``compute_correlation``\n~\n\n.. dropdown:: Compute Correlation Matrix\n\t:open:\n\n\tSpecify whether to compute training, validation, and test correlation matrices. When enabled, this setting creates table and heatmap PDF files that are saved to disk."
+ },
+ {
+ "output": " This is disabled by default. ``interaction_finder_gini_rel_improvement_threshold``\n~\n\n.. dropdown:: Required GINI Relative Improvement for Interactions\n\t:open:\n\n\tSpecify the required GINI relative improvement value for the InteractionTransformer."
+ },
+ {
+ "output": " If the data is noisy and there is no clear signal in interactions, this value can be decreased to return interactions. This value defaults to 0.5. ``interaction_finder_return_limit``\n~\n\n.. dropdown:: Number of Transformed Interactions to Make\n\t:open:\n\n\tSpecify the number of transformed interactions to make from generated trial interactions."
+ },
+ {
+ "output": " This value defaults to 5. .. _enable_rapids_transformers:\n\n``enable_rapids_transformers``\n\n\n.. dropdown:: Whether to enable RAPIDS cuML GPU transformers (no mojo)\n\t:open:\n\n\tSpecify whether to enable GPU-based `RAPIDS cuML `__ transformers."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_rapids_transformers`` and the default value is False. .. _lowest_allowed_variable_importance:\n\n``varimp_threshold_at_interpretability_10``\n~\n\n.. dropdown:: Lowest allowed variable importance at interpretability 10\n\t:open:\n\n\tSpecify the variable importance below which features are dropped (with the possibility of a replacement being found that's better)."
+ },
+ {
+ "output": " Set this to a lower value if you're content with having many weak features despite choosing high interpretability, or if you see a drop in performance due to the need for weak features. ``stabilize_fs``\n\n\n.. dropdown:: Whether to take minimum (True) or mean (False) of delta improvement in score when aggregating feature selection scores across multiple folds/depths\n\t:open:\n\n\tWhether to take minimum (True) or mean (False) of delta improvement in score when aggregating feature selection scores across multiple folds/depths."
+ },
+ {
+ "output": " Feature selection by permutation importance considers the change in score after shuffling a feature, and using minimum operation ignores optimistic scores in favor of pessimistic scores when aggregating over folds."
+ },
+ {
+ "output": " If interpretability >= config toml value of fs_data_vary_for_interpretability, then half data (or setting of fs_data_frac) is used as another fit, in which case regardless of this toml setting, only features that are kept for all data sizes are kept by feature selection."
+ },
+ {
+ "output": " Hive Setup\n\n\nDriverless AI lets you explore Hive data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with Hive. Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``enabled_file_systems``: The file systems you want to enable. This must be configured in order for data connectors to function properly. - ``hive_app_configs``: Configuration for Hive Connector."
+ },
+ {
+ "output": " Important keys include:\n \n - ``hive_conf_path``: The path to Hive configuration. This can have multiple files (e.g. hive-site.xml, hdfs-site.xml, etc.) - ``auth_type``: Specify one of ``noauth``, ``keytab``, or ``keytabimpersonation`` for Kerberos authentication\n - ``keytab_path``: Specify the path to Kerberos keytab to use for authentication (this can be ``\"\"`` if using ``auth_type=\"noauth\"``)\n - ``principal_user``: Specify the Kerberos app principal user (required when using ``auth_type=\"keytab\"`` or ``auth_type=\"keytabimpersonation\"``)\n\nNotes:\n\n- With Hive connectors, it is assumed that DAI is running on the edge node."
+ },
+ {
+ "output": " missing classes, dependencies, authorization errors). - Ensure the core-site.xml file (for example, from the Hadoop conf) is also present in the Hive conf with the rest of the files (hive-site.xml, hdfs-site.xml, etc.)."
+ },
+ {
+ "output": " ``hadoop.proxyuser.hive.hosts`` & ``hadoop.proxyuser.hive.groups``). - If you have tez as the Hive execution engine, make sure that the required tez dependencies (classpaths, jars, etc.) are available on the DAI node."
+ },
+ {
+ "output": " The configuration should be JSON/Dictionary String with multiple keys. For example:\n \n ::\n\n \"\"\"{\n \"hive_connection_1\": {\n \"hive_conf_path\": \"/path/to/hive/conf\",\n \"auth_type\": \"one of ['noauth', 'keytab',\n 'keytabimpersonation']\",\n \"keytab_path\": \"/path/to/.keytab\",\n \"principal_user\": \"hive/node1.example.com@EXAMPLE.COM\",\n },\n \"hive_connection_2\": {\n \"hive_conf_path\": \"/path/to/hive/conf_2\",\n \"auth_type\": \"one of ['noauth', 'keytab', \n 'keytabimpersonation']\",\n \"keytab_path\": \"/path/to/.keytab\",\n \"principal_user\": \"hive/node2.example.com@EXAMPLE.COM\",\n }\n }\"\"\"\n\n \\ Note: The expected input of ``hive_app_configs`` is a `JSON string `__."
+ },
+ {
+ "output": " Depending on how the configuration value is applied, different forms of outer quotations may be required. The following examples show two unique methods for applying outer quotations. - Configuration value applied with the config.toml file:\n\n ::\n\n hive_app_configs = \"\"\"{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}\"\"\"\n\n - Configuration value applied with an environment variable:\n\n ::\n\n DRIVERLESS_AI_HIVE_APP_CONFIGS='{\"my_json_string\": \"value\", \"json_key_2\": \"value2\"}'\n\n- ``hive_app_jvm_args``: Optionally specify additional Java Virtual Machine (JVM) args for the Hive connector."
+ },
+ {
+ "output": " Notes:\n\n - If a custom `JAAS configuration file `__ is needed for your Kerberos setup, use ``hive_app_jvm_args`` to specify the appropriate file:\n\n ::\n\n hive_app_jvm_args = \"-Xmx20g -Djava.security.auth.login.config=/etc/dai/jaas.conf\"\n\n Sample ``jaas.conf`` file:\n ::\n\n com.sun.security.jgss.initiate {\n com.sun.security.auth.module.Krb5LoginModule required\n useKeyTab=true\n useTicketCache=false\n principal=\"hive/localhost@EXAMPLE.COM\" [Replace this line]\n doNotPrompt=true\n keyTab=\"/path/to/hive.keytab\" [Replace this line]\n debug=true;\n };\n\n- ``hive_app_classpath``: Optionally specify an alternative classpath for the Hive connector."
+ },
+ {
+ "output": " This can be done by specifying each environment variable in the ``nvidia-docker run`` command or by editing the configuration options in the config.toml file and then specifying that file in the ``nvidia-docker run`` command."
+ },
+ {
+ "output": " Start the Driverless AI Docker Image. .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs,hive\" \\\n -e DRIVERLESS_AI_HIVE_APP_CONFIGS='{\"hive_connection_2\": {\"hive_conf_path\":\"/etc/hadoop/conf\",\n \"auth_type\":\"keytabimpersonation\",\n \"keytab_path\":\"/etc/dai/steam.keytab\",\n \"principal_user\":\"steam/mr-0xg9.0xdata.loc@H2OAI.LOC\"}}' \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -v /path/to/hive/conf:/path/to/hive/conf/in/docker \\\n -v /path/to/hive.keytab:/path/in/docker/hive.keytab \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure Hive options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " Enable and configure the Hive connector in the Driverless AI config.toml file. The Hive connector configuration must be a JSON/Dictionary string with multiple keys. .. code-block:: bash \n\n enabled_file_systems = \"file, hdfs, s3, hive\"\n hive_app_configs = \"\"\"{\"hive_1\": {\"auth_type\": \"keytab\",\n \"keytab_path\": \"/path/to/hive.keytab\",\n \"hive_conf_path\": \"/path/to/hive-resources\",\n \"principal_user\": \"hive/localhost@EXAMPLE.COM\"}}\"\"\"\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash \n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -v /path/to/hive/conf:/path/to/hive/conf/in/docker \\\n -v /path/to/hive.keytab:/path/in/docker/hive.keytab \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n\n .. group-tab:: Native Installs\n\n This enables the Hive connector."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\"\n\n 2."
+ },
+ {
+ "output": " ::\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs, s3, hive\"\n\n \n # Configuration for Hive Connector\n # Note that inputs are similar to configuring HDFS connectivity\n # Important keys:\n # * hive_conf_path - path to hive configuration, may have multiple files."
+ },
+ {
+ "output": " Required when using auth_type `keytab` or `keytabimpersonation`\n # JSON/Dictionary String with multiple keys. Example:\n # \"\"\"{\n # \"hive_connection_1\": {\n # \"hive_conf_path\": \"/path/to/hive/conf\",\n # \"auth_type\": \"one of ['noauth', 'keytab', 'keytabimpersonation']\",\n # \"keytab_path\": \"/path/to/.keytab\",\n # \"principal_user\": \"hive/localhost@EXAMPLE.COM\",\n # }\n # }\"\"\"\n #\n hive_app_configs = \"\"\"{\"hive_1\": {\"auth_type\": \"keytab\",\n \"keytab_path\": \"/path/to/hive.keytab\",\n \"hive_conf_path\": \"/path/to/hive-resources\",\n \"principal_user\": \"hive/localhost@EXAMPLE.COM\"}}\"\"\"\n\n 3."
+ },
+ {
+ "output": " Adding Datasets Using Hive\n~\n\nAfter the Hive connector is enabled, you can add datasets by selecting Hive from the Add Dataset (or Drag and Drop) drop-down menu. 1. Select the Hive configuration that you want to use."
+ },
+ {
+ "output": " Specify the following information to add your dataset. - Hive Database: Specify the name of the Hive database that you are querying. - Hadoop Configuration Path: Specify the path to your Hive configuration file."
+ },
+ {
+ "output": " - Hive Kerberos Principal: Specify the Hive Kerberos principal. This is required if the Hive Authentication Type is keytabimpersonation. - Hive Authentication Type: Specify the authentication type. This can be noauth, keytab, or keytabimpersonation."
+ },
+ {
+ "output": " Install on Ubuntu\n-\n\nThis section describes how to install the Driverless AI Docker image on Ubuntu. The installation steps vary depending on whether your system has GPUs or if it is CPU only. Environment\n~\n\n+-+-+-+\n| Operating System | GPUs?"
+ },
+ {
+ "output": " Open a Terminal and ssh to the machine that will run Driverless AI. Once you are logged in, perform the following steps. 1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. (Note that the contents of this Docker image include a CentOS kernel and CentOS packages.)"
+ },
+ {
+ "output": " Install and run Docker on Ubuntu (if not already installed):\n\n .. code-block:: bash\n\n # Install and run Docker on Ubuntu\n curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -\n sudo apt-key fingerprint 0EBFCD88\n sudo add-apt-repository \\\n \"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable\"\n sudo apt-get update\n sudo apt-get install docker-ce\n sudo systemctl start docker\n\n3."
+ },
+ {
+ "output": " More information is available at https://github.com/NVIDIA/nvidia-docker/blob/master/README.md. .. code-block:: bash\n\n curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \\\n sudo apt-key add -\n distribution=$(."
+ },
+ {
+ "output": " Verify that the NVIDIA driver is up and running. If the driver is not up and running, log on to http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest NVIDIA Tesla V/P/K series driver: \n\n .. code-block:: bash\n\n nvidia-smi\n\n5."
+ },
+ {
+ "output": " Change directories to the new folder, then load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the new directory\n cd |VERSION-dir|\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n7."
+ },
+ {
+ "output": " Note that this needs to be run once every reboot. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html. .. include:: enable-persistence.rst\n\n8. Set up the data, log, and license directories on the host machine:\n\n .. code-block:: bash\n\n # Set up the data, log, license, and tmp directories on the host machine (within the new directory)\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n9."
+ },
+ {
+ "output": " The data will be visible inside the Docker container. 10. Run ``docker images`` to find the image tag. 11. Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your install version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " We recommend ``--shm-size=256m`` in the docker launch command. If you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI docker command."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n12."
+ },
+ {
+ "output": " This section describes how to install and start the Driverless AI Docker image on Ubuntu. Note that this uses ``docker`` and not ``nvidia-docker``. GPU support will not be available. Watch the installation video `here `__."
+ },
+ {
+ "output": " Open a Terminal and ssh to the machine that will run Driverless AI. Once you are logged in, perform the following steps. 1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. 2."
+ },
+ {
+ "output": " Set up a directory for the version of Driverless AI on the host machine: \n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n4. Change directories to the new folder, then load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the new directory\n cd |VERSION-dir|\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " At this point, you can copy data into the data directory on the host machine. The data will be visible inside the Docker container. 7. Run ``docker images`` to find the new image tag. 8. Start the Driverless AI Docker image."
+ },
+ {
+ "output": " Note that from version 1.10, the DAI docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " If you plan to build :ref:`image auto model ` extensively, then ``--shm-size=2g`` is recommended for the Driverless AI docker command. .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n9."
+ },
+ {
+ "output": " .. _linux-tarsh:\n\nLinux TAR SH\n\n\nThe Driverless AI software is available for use in pure user-mode environments as a self-extracting TAR SH archive. This form of installation does not require a privileged user to install or to run."
+ },
+ {
+ "output": " See those sections for a full list of supported environments. The installation steps assume that you have a valid license key for Driverless AI. For information on how to obtain a license key for Driverless AI, visit https://www.h2o.ai/products/h2o-driverless-ai/."
+ },
+ {
+ "output": " .. note::\n\tTo ensure that :ref:`AutoDoc ` pipeline visualizations are generated correctly on native installations, installing `fontconfig `_ is recommended."
+ },
+ {
+ "output": " Note that if you are using K80 GPUs, the minimum required NVIDIA driver version is 450.80.02\n- OpenCL (Required for full LightGBM support on GPU-powered systems)\n- Driverless AI TAR SH, available from https://www.h2o.ai/download/\n\nNote: CUDA 11.2.2 (for GPUs) and cuDNN (required for TensorFlow support on GPUs) are included in the Driverless AI package."
+ },
+ {
+ "output": " To install OpenCL, run the following as root:\n\n.. code-block:: bash\n\n mkdir -p /etc/OpenCL/vendors && echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd && chmod a+r /etc/OpenCL/vendors/nvidia.icd && chmod a+x /etc/OpenCL/vendors/ && chmod a+x /etc/OpenCL\n\n.. note::\n\tIf OpenCL is not installed, then CUDA LightGBM is automatically used."
+ },
+ {
+ "output": " Installing Driverless AI\n\n\nRun the following commands to install the Driverless AI TAR SH. .. code-block:: bash\n :substitutions:\n\n # Install Driverless AI.\n chmod 755 |VERSION-tar-lin|\n ./|VERSION-tar-lin|\n\nYou may now cd to the unpacked directory and optionally make changes to config.toml."
+ },
+ {
+ "output": " ./run-dai.sh\n\nStarting NVIDIA Persistence Mode\n\n\nIf you have NVIDIA GPUs, you must run the following NVIDIA command. This command needs to be run every reboot. For more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Run the following for Centos7/RH7 based systems using yum and x86. .. code-block:: bash\n\n yum -y clean all\n yum -y makecache\n yum -y update\n wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/c/clinfo-2.1.17.02.09-1.el7.x86_64.rpm\n wget http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/o/ocl-icd-2.2.12-1.el7.x86_64.rpm\n rpm -if clinfo-2.1.17.02.09-1.el7.x86_64.rpm\n rpm -if ocl-icd-2.2.12-1.el7.x86_64.rpm\n clinfo\n\n mkdir -p /etc/OpenCL/vendors && \\\n echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd\n\nLooking at Driverless AI log files\n\n\n.. code-block:: bash\n\n less log/dai.log\n less log/h2o.log\n less log/procsy.log\n less log/vis-server.log\n\nStopping Driverless AI\n\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " By default, all files for Driverless AI are contained within this directory. Upgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers. For reference on CUDA Toolkit and Minimum Required Driver Versions and CUDA Toolkit and Corresponding Driver Versions, see `here `__ ."
+ },
+ {
+ "output": " Upgrade Steps\n'\n\n1. Stop your previous version of Driverless AI. 2. Run the self-extracting archive for the new version of Driverless AI. 3. Port any previous changes you made to your config.toml file to the newly unpacked directory."
+ },
+ {
+ "output": " Experiment Settings\n=\n\nThis section includes settings that can be used to customize the experiment like total runtime, reproducibility level, pipeline building, feature brain control, adding config.toml settings and more."
+ },
+ {
+ "output": " This is equivalent to pushing the Finish button once half of the specified time value has elapsed. Note that the overall enforced runtime is only an approximation. This value defaults to 1440, which is the equivalent of a 24 hour approximate overall runtime."
+ },
+ {
+ "output": " Set this value to 0 to disable this setting. Note that this setting applies per experiment, so if building n leaderboard models, it applies to each experiment separately (i.e., the total allowed runtime will be n*24 hours)."
+ },
+ {
+ "output": " This option preserves experiment artifacts that have been generated for the summary and log zip files while continuing to generate additional artifacts. This value defaults to 10080 mins (7 days). Note that this setting applies per experiment, so if building n leaderboard models, it applies to each experiment separately (i.e., the total allowed runtime will be n*7 days)."
+ },
+ {
+ "output": " Also see :ref:`time_abort `. .. _time_abort:\n\n``time_abort``\n\n\n.. dropdown:: Time to Trigger the 'Abort' Button\n\t:open:\n\n\tIf the experiment is not done by this time, push the abort button."
+ },
+ {
+ "output": " Also see :ref:`max_runtime_minutes_until_abort ` for control over per experiment abort times. This accepts time in the format given by time_abort_format (defaults to %Y-%m-%d %H:%M:%S). This assumes a timezone set by time_abort_timezone in config.toml (defaults to UTC)."
+ },
+ {
+ "output": " This will apply to the time on a DAI worker that runs the experiments. Similar to :ref:`max_runtime_minutes_until_abort `, time abort will preserve experiment artifacts made so far for summary and log zip files."
+ },
+ {
+ "output": " .. _pipeline-building-recipe:\n\n``pipeline-building-recipe``\n\n\n.. dropdown:: Pipeline Building Recipe\n\t:open:\n\n\tSpecify the Pipeline Building recipe type (overrides GUI settings). Select from the following:\n\n\t- Auto: Specifies that all models and features are automatically determined by experiment settings, config.toml settings, and the feature engineering effort."
+ },
+ {
+ "output": " - Only uses GLM or booster as 'gblinear'. - :ref:`Fixed ensemble level ` is set to 0. - :ref:`Feature brain level ` is set to 0. - Max feature interaction depth is set to 1, i.e., no interactions."
+ },
+ {
+ "output": " - Does not use :ref:`distribution shift ` detection. - :ref:`monotonicity_constraints_correlation_threshold ` is set to 0."
+ },
+ {
+ "output": " - Drops features that are not correlated with target by at least 0.01. See :ref:`monotonicity-constraints-drop-low-correlation-features ` and :ref:`monotonicity-constraints-correlation-threshold `."
+ },
+ {
+ "output": " - :ref:`Interaction depth ` is set to 1 i.e no multi-feature interactions done to avoid complexity. - No target transformations applied for regression problems i.e sets :ref:`target_transformer ` to 'identity'."
+ },
+ {
+ "output": " - :ref:`num_as_cat ` feature transformation is disabled. - List of included_transformers\n\t\t\n \t| 'OriginalTransformer', #numeric (no clustering, no interactions, no num->cat)\n \t| 'CatOriginalTransformer', 'RawTransformer','CVTargetEncodeTransformer', 'FrequentTransformer','WeightOfEvidenceTransformer','OneHotEncodingTransformer', #categorical (but no num-cat)\n \t| 'CatTransformer','StringConcatTransformer', # big data only\n \t| 'DateOriginalTransformer', 'DateTimeOriginalTransformer', 'DatesTransformer', 'DateTimeDiffTransformer', 'IsHolidayTransformer', 'LagsTransformer', 'EwmaLagsTransformer', 'LagsInteractionTransformer', 'LagsAggregatesTransformer',#dates/time\n \t| 'TextOriginalTransformer', 'TextTransformer', 'StrFeatureTransformer', 'TextCNNTransformer', 'TextBiGRUTransformer', 'TextCharCNNTransformer', 'BERTTransformer',#text\n \t| 'ImageOriginalTransformer', 'ImageVectorizerTransformer'] #image\n\n \tFor reference also see :ref:`Monotonicity Constraints in Driverless AI `."
+ },
+ {
+ "output": " - The test set is concatenated with the train set, with the target marked as missing\n\t\t- Transformers that do not use the target are allowed to ``fit_transform`` across the entirety of the train, validation, and test sets."
+ },
+ {
+ "output": " - nlp_model: Only enable NLP BERT models based on PyTorch to process pure text. To avoid slowdown when using this recipe, enabling one or more GPUs is strongly recommended. For more information, see :ref:`nlp-in-dai`."
+ },
+ {
+ "output": " To avoid slowdown when using this recipe, enabling one or more GPUs is strongly recommended. For more information, see :ref:`nlp-in-dai`. - included_transformers = ['BERTTransformer']\n\t\t- excluded_models = ['TextBERTModel', 'TextMultilingualBERTModel', 'TextXLNETModel', 'TextXLMModel','TextRoBERTaModel', 'TextDistilBERTModel', 'TextALBERTModel', 'TextCamemBERTModel', 'TextXLMRobertaModel']\n\t\t- enable_pytorch_nlp_transformer = 'on'\n\t\t- enable_pytorch_nlp_model = 'off'\n\n\t- image_model: Only enable image models that process pure images (ImageAutoModel)."
+ },
+ {
+ "output": " For more information, see :ref:`image-model`. Notes:\n\n \t\t- This option disables the :ref:`Genetic Algorithm ` (GA). - Image insights are only available when this option is selected. - image_transformer: Only enable the ImageVectorizer transformer, which processes pure images."
+ },
+ {
+ "output": " - unsupervised: Only enable unsupervised transformers, models and scorers. :ref:`See ` for reference. - gpus_max: Maximize use of GPUs (e.g. use XGBoost, RAPIDS, Optuna hyperparameter search, etc."
+ },
+ {
+ "output": " Each pipeline building recipe mode can be chosen and then fine-tuned using the individual expert settings. Changing the pipeline building recipe will reset all pipeline building recipe options back to default and then re-apply the specific rules for the new mode, which will undo any fine-tuning of expert options that are part of pipeline building recipe rules."
+ },
+ {
+ "output": " To reset recipe behavior, one can switch between 'auto' and the desired mode. This way the new child experiment will use the default settings for the chosen recipe. .. _enable_genetic_algorithm:\n\n``enable_genetic_algorithm``\n\n\n.. dropdown:: Enable Genetic Algorithm for Selection and Tuning of Features and Models\n\t:open:\n\n\tSpecify whether to enable :ref:`genetic algorithm ` for selection and hyper-parameter tuning of features and models:\n\n\t- auto: Default value is 'auto'."
+ },
+ {
+ "output": " - on: Driverless AI genetic algorithm is used for feature engineering and model tuning and selection. - Optuna: When 'Optuna' is selected, model hyperparameters are tuned with :ref:`Optuna ` and Driverless AI genetic algorithm is used for feature engineering."
+ },
+ {
+ "output": " Optuna mode currently only uses Optuna for XGBoost, LightGBM, and CatBoost (custom recipe). If Pruner is enabled, as is default, Optuna mode disables mutations of evaluation metric (eval_metric) so pruning uses same metric across trials to compare."
+ },
+ {
+ "output": " The equivalent config.toml parameter is ``enable_genetic_algorithm``. .. _tournament_style:\n\n``tournament_style``\n\n\n.. dropdown:: Tournament Model for Genetic Algorithm\n\t:open:\n\n\tSelect a method to decide which models are best at each iteration."
+ },
+ {
+ "output": " Choose from the following:\n\n\t- auto: Choose based upon accuracy and interpretability\n\t- uniform: all individuals in population compete to win as best (can lead to all, e.g. LightGBM models in final ensemble, which may not improve ensemble performance due to lack of diversity)\n\t- fullstack: Choose from optimal model and feature types\n\t- feature: individuals with similar feature types compete (good if target encoding, frequency encoding, and other feature sets lead to good results)\n\t- model: individuals with same model type compete (good if multiple models do well but some models that do not do as well still contribute to improving ensemble)\n\n\tFor each case, a round robin approach is used to choose best scores among type of models to choose from."
+ },
+ {
+ "output": " The tournament is only used to prune-down individuals for, e.g., tuning -> evolution and evolution -> final model. ``make_python_scoring_pipeline``\n\n\n.. dropdown:: Make Python Scoring Pipeline\n\t:open:\n\n\tSpecify whether to automatically build a Python Scoring Pipeline for the experiment."
+ },
+ {
+ "output": " Select Off to disable the automatic creation of the Python Scoring Pipeline. ``make_mojo_scoring_pipeline``\n\n\n.. dropdown:: Make MOJO Scoring Pipeline\n\t:open:\n\n\tSpecify whether to automatically build a MOJO (Java) Scoring Pipeline for the experiment."
+ },
+ {
+ "output": " With this option, any capabilities that prevent the creation of the pipeline are dropped. Select Off to disable the automatic creation of the MOJO Scoring Pipeline. Select Auto (default) to attempt to create the MOJO Scoring Pipeline without dropping any capabilities."
+ },
+ {
+ "output": " When this is set to Auto (default), the MOJO is only used if the number of rows is equal to or below the value specified by ``mojo_for_predictions_max_rows``. .. _reduce_mojo_size:\n\n``reduce_mojo_size``\n~\n.. dropdown:: Attempt to Reduce the Size of the MOJO (Small MOJO)\n\t:open:\n\n\tSpecify whether to attempt to create a small MOJO scoring pipeline when the experiment is being built."
+ },
+ {
+ "output": " This setting attempts to reduce the mojo size by limiting experiment's maximum :ref:`interaction depth ` to 3, setting :ref:`ensemble level ` to 0 i.e no ensemble model for final pipeline and limiting the :ref:`maximum number of features ` in the model to 200."
+ },
+ {
+ "output": " This is disabled by default. The equivalent config.toml setting is ``reduce_mojo_size``\n\n``make_pipeline_visualization``\n\n\n.. dropdown:: Make Pipeline Visualization\n\t:open:\n\n\tSpecify whether to create a visualization of the scoring pipeline at the end of an experiment."
+ },
+ {
+ "output": " Note that the Visualize Scoring Pipeline feature is experimental and is not available for deprecated models. Visualizations are available for all newly created experiments. ``benchmark_mojo_latency``\n\n\n.. dropdown:: Measure MOJO Scoring Latency\n\t:open:\n\n\tSpecify whether to measure the MOJO scoring latency at the time of MOJO creation."
+ },
+ {
+ "output": " In this case, MOJO scoring latency will be measured if the pipeline.mojo file size is less than 100 MB. ``mojo_building_timeout``\n~\n\n.. dropdown:: Timeout in Seconds to Wait for MOJO Creation at End of Experiment\n\t:open:\n\n\tSpecify the amount of time in seconds to wait for MOJO creation at the end of an experiment."
+ },
+ {
+ "output": " This value defaults to 1800 sec (30 minutes). ``mojo_building_parallelism``\n~\n\n.. dropdown:: Number of Parallel Workers to Use During MOJO Creation\n\t:open:\n\n\tSpecify the number of parallel workers to use during MOJO creation."
+ },
+ {
+ "output": " Set this value to -1 (default) to use all physical cores. ``kaggle_username``\n~\n\n.. dropdown:: Kaggle Username\n\t:open:\n\n\tOptionally specify your Kaggle username to enable automatic submission and scoring of test set predictions."
+ },
+ {
+ "output": " If you don't have a Kaggle account, you can sign up at https://www.kaggle.com. ``kaggle_key``\n\n\n.. dropdown:: Kaggle Key\n\t:open:\n\n\tSpecify your Kaggle API key to enable automatic submission and scoring of test set predictions."
+ },
+ {
+ "output": " For more information on obtaining Kaggle API credentials, see https://github.com/Kaggle/kaggle-api#api-credentials. ``kaggle_timeout``\n\n\n.. dropdown:: Kaggle Submission Timeout in Seconds\n\t:open:\n\n\tSpecify the Kaggle submission timeout in seconds."
+ },
+ {
+ "output": " ``min_num_rows``\n\n\n.. dropdown:: Min Number of Rows Needed to Run an Experiment\n\t:open:\n\n\tSpecify the minimum number of rows that a dataset must contain in order to run an experiment. This value defaults to 100."
+ },
+ {
+ "output": " Note that this setting is only used when the :ref:`reproducible` option is enabled in the experiment:\n\n\t- 1 = Same experiment results for same O/S, same CPU(s), and same GPU(s) (Default)\n\t- 2 = Same experiment results for same O/S, same CPU architecture, and same GPU architecture\n\t- 3 = Same experiment results for same O/S, same CPU architecture (excludes GPUs)\n\t- 4 = Same experiment results for same O/S (best approximation)\n\n\tThis value defaults to 1."
+ },
+ {
+ "output": " When a seed is defined and the reproducible button is enabled (not by default), the algorithm will behave deterministically. ``allow_different_classes_across_fold_splits``\n\n\n.. dropdown:: Allow Different Sets of Classes Across All Train/Validation Fold Splits\n\t:open:\n\n\t(Note: Applicable for multiclass problems only.)"
+ },
+ {
+ "output": " This is enabled by default. ``save_validation_splits``\n\n\n.. dropdown:: Store Internal Validation Split Row Indices\n\t:open:\n\n\tSpecify whether to store internal validation split row indices. This includes pickles of (train_idx, valid_idx) tuples (numpy row indices for original training data) for all internal validation folds in the experiment summary ZIP file."
+ },
+ {
+ "output": " This setting is disabled by default. ``max_num_classes``\n~\n\n.. dropdown:: Max Number of Classes for Classification Problems\n\t:open:\n\n\tSpecify the maximum number of classes to allow for a classification problem."
+ },
+ {
+ "output": " Memory requirements also increase with a higher number of classes. This value defaults to 200. ``max_num_classes_compute_roc``\n~\n\n.. dropdown:: Max Number of Classes to Compute ROC and Confusion Matrix for Classification Problems\n\n\tSpecify the maximum number of classes to use when computing the ROC and CM."
+ },
+ {
+ "output": " This value defaults to 200 and cannot be lower than 2. ``max_num_classes_client_and_gui``\n\n\n.. dropdown:: Max Number of Classes to Show in GUI for Confusion Matrix\n\t:open:\n\n\tSpecify the maximum number of classes to show in the GUI for CM, showing first ``max_num_classes_client_and_gui`` labels."
+ },
+ {
+ "output": " Note that if this value is changed in the config.toml and the server is restarted, then this setting will only modify client-GUI launched diagnostics. To control experiment plots, this value must be changed in the expert settings panel."
+ },
+ {
+ "output": " Note that this doesn't limit final model calculation. ``use_feature_brain_new_experiments``\n~\n\n.. dropdown:: Whether to Use Feature Brain for New Experiments\n\t:open:\n\n\tSpecify whether to use feature_brain results even if running new experiments."
+ },
+ {
+ "output": " Even rescoring may be insufficient, so this is False by default. For example, one experiment may accidentally use the external validation data for training and get a high score. Although feature_brain_reset_score='on' forces a rescore, the model has already seen the external validation data during training and leaks that data as part of what it learned."
+ },
+ {
+ "output": " .. _feature_brain1:\n\n``feature_brain_level``\n~\n\n.. dropdown:: Model/Feature Brain Level\n\t:open:\n\n\tSpecify whether to use H2O.ai brain, which enables local caching and smart re-use (checkpointing) of prior experiments to generate useful features and models for new experiments."
+ },
+ {
+ "output": " When enabled, this will use the H2O.ai brain cache if the cache file:\n\n\t - has any matching column names and types for a similar experiment type\n\t - has classes that match exactly\n\t - has class labels that match exactly\n\t - has basic time series choices that match\n\t - the interpretability of the cache is equal or lower\n\t - the main model (booster) is allowed by the new experiment\n\n\t- -1: Don't use any brain cache (default)\n\t- 0: Don't use any brain cache but still write to cache."
+ },
+ {
+ "output": " - 1: Smart checkpoint from the latest best individual model. Use case: Want to use the latest matching model. The match may not be precise, so use with caution. - 2: Smart checkpoint if the experiment matches all column names, column types, classes, class labels, and time series options identically."
+ },
+ {
+ "output": " - 3: Smart checkpoint like level #1 but for the entire population. Tune only if the brain population is of insufficient size. Note that this will re-score the entire population in a single iteration, so it appears to take longer to complete first iteration."
+ },
+ {
+ "output": " Tune only if the brain population is of insufficient size. Note that this will re-score the entire population in a single iteration, so it appears to take longer to complete first iteration. - 5: Smart checkpoint like level #4 but will scan over the entire brain cache of populations to get the best scored individuals."
+ },
+ {
+ "output": " When enabled, the directory where the H2O.ai Brain meta model files are stored is H2O.ai_brain. In addition, the default maximum brain size is 20GB. Both the directory and the maximum size can be changed in the config.toml file."
+ },
+ {
+ "output": " .. _feature_brain2:\n\n``feature_brain2``\n\n\n.. dropdown:: Feature Brain Save Every Which Iteration\n\t:open:\n\n\tSave feature brain iterations every iter_num % feature_brain_iterations_save_every_iteration == 0, to be able to restart/refit with which_iteration_brain >= 0."
+ },
+ {
+ "output": " - -1: Don't use any brain cache. - 0: Don't use any brain cache but still write to cache. - 1: Smart checkpoint if an old experiment_id is passed in (for example, via running \"resume one like this\" in the GUI)."
+ },
+ {
+ "output": " (default)\n\t- 3: Smart checkpoint like level #1 but for the entire population. Tune only if the brain population is of insufficient size. - 4: Smart checkpoint like level #2 but for the entire population."
+ },
+ {
+ "output": " - 5: Smart checkpoint like level #4 but will scan over the entire brain cache of populations (starting from resumed experiment if chosen) in order to get the best scored individuals. When enabled, the directory where the H2O.ai Brain meta model files are stored is H2O.ai_brain."
+ },
+ {
+ "output": " Both the directory and the maximum size can be changed in the config.toml file. .. _feature_brain3:\n\n``feature_brain3``\n\n.. dropdown:: Feature Brain Restart from Which Iteration\n\t:open:\n\n\tWhen performing restart or re-fit of type feature_brain_level with a resumed ID, specify which iteration to start from instead of only last best."
+ },
+ {
+ "output": " Note: If restarting from a tuning iteration, this will pull in the entire scored tuning population and use that for feature evolution. This value defaults to -1. .. _feature_brain4:\n\n``feature_brain4``\n\n\n.. dropdown:: Feature Brain Refit Uses Same Best Individual\n\t:open:\n\n\tSpecify whether to use the same best individual when performing a refit."
+ },
+ {
+ "output": " Enabling this setting lets you view the exact same model or feature with only one new feature added. This is disabled by default. .. _feature_brain5:\n\n``feature_brain5``\n\n\n.. dropdown:: Feature Brain Adds Features with New Columns Even During Retraining of Final Model\n\t:open:\n\n\tSpecify whether to add additional features from new columns to the pipeline, even when performing a retrain of the final model."
+ },
+ {
+ "output": " New data may lead to new dropped features due to shift or leak detection. Disable this to avoid adding any columns as new features so that the pipeline is perfectly preserved when changing data. This is enabled by default."
+ },
+ {
+ "output": " If this is disabled, the original hyperparameters will be used instead. (Note that this may result in errors.) This is enabled by default. ``min_dai_iterations``\n\n\n.. dropdown:: Min DAI Iterations\n\t:open:\n\n\tSpecify the minimum number of Driverless AI iterations for an experiment."
+ },
+ {
+ "output": " This value defaults to 0. .. _target_transformer:\n\n``target_transformer``\n\n\n.. dropdown:: Select Target Transformation of the Target for Regression Problems\n\t:open:\n\n\tSpecify whether to automatically select target transformation for regression problems."
+ },
+ {
+ "output": " Selecting identity_noclip automatically turns off any target transformations. All transformers except for center, standardize, identity_noclip and log_noclip perform clipping to constrain the predictions to the domain of the target in the training data, so avoid them if you want to enable extrapolations."
+ },
+ {
+ "output": " ``fixed_num_folds_evolution``\n~\n\n.. dropdown:: Number of Cross-Validation Folds for Feature Evolution\n\t:open:\n\n\tSpecify the fixed number of cross-validation folds (if >= 2) for feature evolution. Note that the actual number of allowed folds can be less than the specified value, and that the number of allowed folds is determined at the time an experiment is run."
+ },
+ {
+ "output": " ``fixed_num_folds``\n~\n\n.. dropdown:: Number of Cross-Validation Folds for Final Model\n\t:open:\n\n\tSpecify the fixed number of cross-validation folds (if >= 2) for the final model. Note that the actual number of allowed folds can be less than the specified value, and that the number of allowed folds is determined at the time an experiment is run."
+ },
+ {
+ "output": " ``fixed_only_first_fold_model``\n~\n\n.. dropdown:: Force Only First Fold for Models\n\t:open:\n\n\tSpecify whether to force only the first fold for models. Select from Auto (Default), On, or Off. Set \"on\" to force only the first fold for models. This is useful for quick runs regardless of data.\n\n``feature_evolution_data_size``\n~\n\n.. dropdown:: Max Number of Rows Times Number of Columns for Feature Evolution Data Splits\n\t:open:\n\n\tSpecify the maximum number of rows allowed for feature evolution data splits (not for the final pipeline)."
+ },
+ {
+ "output": " ``final_pipeline_data_size``\n\n\n.. dropdown:: Max Number of Rows Times Number of Columns for Reducing Training Dataset\n\t:open:\n\n\tSpecify the upper limit on the number of rows times the number of columns for training the final pipeline."
+ },
+ {
+ "output": " ``max_validation_to_training_size_ratio_for_final_ensemble``\n\n\n.. dropdown:: Maximum Size of Validation Data Relative to Training Data\n\t:open:\n\n\tSpecify the maximum size of the validation data relative to the training data."
+ },
+ {
+ "output": " Note that final model predictions and scores will always be provided on the full dataset provided. This value defaults to 2.0. ``force_stratified_splits_for_imbalanced_threshold_binary``\n~\n\n.. dropdown:: Perform Stratified Sampling for Binary Classification If the Target Is More Imbalanced Than This\n\t:open:\n\n\tFor binary classification experiments, specify a threshold ratio of minority to majority class for the target column beyond which stratified sampling is performed."
+ },
+ {
+ "output": " This value defaults to 0.01. You can choose to always perform random sampling by setting this value to 0, or to always perform stratified sampling by setting this value to 1. .. _config_overrides:\n\n``config_overrides``\n\n\n.. dropdown:: Add to config.toml via TOML String\n\t:open:\n\n\tSpecify any additional configuration overrides from the config.toml file that you want to include in the experiment."
+ },
+ {
+ "output": " Setting this will override all other settings. Separate multiple config overrides with ``\\n``. For example, the following enables Poisson distribution for LightGBM and disables Target Transformer Tuning."
+ },
+ {
+ "output": " ::\n\n\t params_lightgbm=\\\"{'objective':'poisson'}\\\" \\n target_transformer=identity\n\n\tOr you can specify config overrides similar to the following without having to escape double quotes:\n\n\t::\n\n\t \"\"enable_glm=\"off\" \\n enable_xgboost_gbm=\"off\" \\n enable_lightgbm=\"off\" \\n enable_tensorflow=\"on\"\"\"\n\t \"\"max_cores=10 \\n data_precision=\"float32\" \\n max_rows_feature_evolution=50000000000 \\n ensemble_accuracy_switch=11 \\n feature_engineering_effort=1 \\n target_transformer=\"identity\" \\n tournament_feature_style_accuracy_switch=5 \\n params_tensorflow=\"{'layers': [100, 100, 100, 100, 100, 100]}\"\"\"\n\n\tWhen running the Python client, config overrides would be set as follows:\n\n\t::\n\n\t\tmodel = h2o.start_experiment_sync(\n\t\t dataset_key=train.key,\n\t\t target_col='target',\n\t\t is_classification=True,\n\t\t accuracy=7,\n\t\t time=5,\n\t\t interpretability=1,\n\t\t config_overrides=\"\"\"\n\t\t feature_brain_level=0\n\t\t enable_lightgbm=\"off\"\n\t\t enable_xgboost_gbm=\"off\"\n\t\t enable_ftrl=\"off\"\n\t\t \"\"\"\n\t\t)\n\n``last_recipe``\n~\n\n.. dropdown:: last_recipe\n\t:open:\n\n\tInternal helper that remembers whether the recipe was changed.\n\n``feature_brain_reset_score``\n~\n\n.. dropdown:: Whether to re-score models from brain cache\n\t:open:\n\n\tSpecify whether to smartly keep the score to avoid re-munging, re-training, and re-scoring steps for brain models ('auto'), always force all steps for all brain imports ('on'), or never rescore ('off')."
+ },
+ {
+ "output": " 'on' is useful when smart similarity checking is not reliable enough. 'off' is useful when you want to keep the exact same features and model for the final model refit, despite changes in seed or other behaviors in features that might change the outcome if re-scored before reaching the final model."
+ },
+ {
+ "output": " You can also set refit_same_best_individual to True if you want the exact same best individual (highest scored model+features) to be used regardless of any scoring changes. ``feature_brain_save_every_iteration``\n\n\n.. dropdown:: Feature Brain Save every which iteration\n\t:open:\n\n\tSpecify whether to save feature brain iterations whenever iter_num % feature_brain_iterations_save_every_iteration == 0, to be able to restart/refit with which_iteration_brain >= 0."
+ },
+ {
+ "output": " ``which_iteration_brain``\n~\n\n.. dropdown:: Feature Brain Restart from which iteration\n\t:open:\n\n\tWhen performing restart or re-fit type feature_brain_level with resumed_experiment_id, choose which iteration to start from, instead of only the last best. A value of -1 means just use the last best."
+ },
+ {
+ "output": " ``refit_same_best_individual``\n\n\n.. dropdown:: Feature Brain refit uses same best individual\n\t:open:\n\n\tWhen doing a re-fit from the feature brain, if columns or features change, the population of individuals used for the refit may change the order of which was best, leading to a better result being chosen (False case)."
+ },
+ {
+ "output": " That is, if you refit with just 1 extra column and have interpretability=1, then the final model will have the same features, with one more engineered feature applied to that new original feature. ``restart_refit_redo_origfs_shift_leak``\n\n\n.. dropdown:: For restart-refit, select which steps to do\n\t:open:\n\n\tWhen doing a restart or re-fit of an experiment from the feature brain, the user might sometimes change the data significantly, which warrants redoing the reduction of original features by feature selection, shift detection, and leakage detection."
+ },
+ {
+ "output": " due to random seed if not setting reproducible mode), leading to changes in the features and model that are refitted. By default, restart and refit avoid these steps, assuming the data and experiment setup have not changed significantly."
+ },
+ {
+ "output": " In order to ensure exact same final pipeline is fitted, one should also set:\n\n\t- 1) brain_add_features_for_new_columns false\n\t- 2) refit_same_best_individual true\n\t- 3) feature_brain_reset_score 'off'\n\t- 4) force_model_restart_to_defaults false\n\n\tThe score will still be reset if the experiment metric chosen changes, but changes to the scored model and features will be more frozen in place."
+ },
+ {
+ "output": " In some cases, one might have a new dataset but only want to keep same pipeline regardless of new columns, in which case one sets this to False. For example, new data might lead to new dropped features, due to shift or leak detection."
+ },
+ {
+ "output": " ``force_model_restart_to_defaults``\n\n\n.. dropdown:: Restart-refit use default model settings if model switches\n\t:open:\n\n\tWhen restarting or refitting, if the original model class is no longer available, be conservative and go back to the defaults for that model class."
+ },
+ {
+ "output": " ``dump_modelparams_every_scored_indiv``\n~\n\n.. dropdown:: Enable detailed scored model info\n\t:open:\n\n\tSpecify whether to dump every scored individual's model parameters to csv/tabulated/json files."
+ },
+ {
+ "output": " [txt, csv, json]\n\n.. _fast-approx-trees:\n\n``fast_approx_num_trees``\n~\n\n.. dropdown:: Max number of trees to use for fast approximation\n\t:open:\n\n\tWhen ``fast_approx=True``, specify the maximum number of trees to use."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions. .. _fast-approx-one-fold:\n\n``fast_approx_do_one_fold``\n~\n\n.. dropdown:: Whether to use only one fold for fast approximation\n\t:open:\n\n\tWhen ``fast_approx=True``, specify whether to speed up fast approximation further by using only one fold out of all cross-validation folds."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions. .. _fast-approx-one-model:\n\n``fast_approx_do_one_model``\n\n\n.. dropdown:: Whether to use only one model for fast approximation\n\t:open:\n\n\tWhen ``fast_approx=True``, specify whether to speed up fast approximation further by using only one model out of all ensemble models."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx`` is enabled for MLI and AutoDoc and disabled for Experiment predictions. .. _fast-approx-trees-shap:\n\n``fast_approx_contribs_num_trees``\n\n\n.. dropdown:: Maximum number of trees to use for fast approximation when making Shapley predictions\n\t:open:\n\n\tWhen ``fast_approx_contribs=True``, specify the maximum number of trees to use for 'Fast Approximation' in GUI when making Shapley predictions and for AutoDoc/MLI."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc. .. _fast-approx-one-fold-shap:\n\n``fast_approx_contribs_do_one_fold``\n\n\n.. dropdown:: Whether to use only one fold for fast approximation when making Shapley predictions\n\t:open:\n\n\tWhen ``fast_approx_contribs=True``, specify whether to speed up ``fast_approx_contribs`` further by using only one fold out of all cross-validation folds for 'Fast Approximation' in GUI when making Shapley predictions and for AutoDoc/MLI."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc. .. _fast-approx-one-model-shap:\n\n``fast_approx_contribs_do_one_model``\n~\n\n.. dropdown:: Whether to use only one model for fast approximation when making Shapley predictions\n\t:open:\n\n\tWhen ``fast_approx_contribs=True``, specify whether to speed up ``fast_approx_contribs`` further by using only one model out of all ensemble models for 'Fast Approximation' in GUI when making Shapley predictions and for AutoDoc/MLI."
+ },
+ {
+ "output": " .. note::\n By default, ``fast_approx_contribs`` is enabled for MLI and AutoDoc. .. _autoviz_recommended_transformation:\n\n``autoviz_recommended_transformation``\n\n\n.. dropdown:: Autoviz Recommended Transformations\n\t:open:\n\n\tKey-value pairs of column names and transformations that :ref:`Autoviz ` recommended."
+ },
+ {
+ "output": " .. _linux-rpms:\n\nLinux RPMs\n\n\nFor Linux machines that will not use the Docker image or DEB, an RPM installation is available for the following environments:\n\n- x86_64 RHEL 7 / RHEL 8\n- CentOS 7 / CentOS 8\n\nThe installation steps assume that you have a license key for Driverless AI."
+ },
+ {
+ "output": " Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " - When using systemd, remove the ``dai-minio``, ``dai-h2o``, ``dai-redis``, ``dai-procsy``, and ``dai-vis-server`` services. When upgrading, you can use the following commands to deactivate these services:\n\n ::\n\n systemctl stop dai-minio\n systemctl disable dai-minio\n systemctl stop dai-h2o\n systemctl disable dai-h2o\n systemctl stop dai-redis\n systemctl disable dai-redis\n systemctl stop dai-procsy\n systemctl disable dai-procsy\n systemctl stop dai-vis-server\n systemctl disable dai-vis-server\n\nEnvironment\n~\n\n+-+-+\n| Operating System | Min Mem |\n+=+=+\n| RHEL with GPUs | 64 GB |\n+-+-+\n| RHEL with CPUs | 64 GB |\n+-+-+\n| CentOS with GPUs | 64 GB |\n+-+-+\n| CentOS with CPUs | 64 GB |\n+-+-+\n\nRequirements\n\n\n- RedHat 7/RedHat 8/CentOS 7/CentOS 8\n- NVIDIA drivers >= |NVIDIA-driver-ver| recommended (GPU only)."
+ },
+ {
+ "output": " About the Install\n~\n\n.. include:: linux-rpmdeb-about.frag\n\nInstalling OpenCL\n~\n\nOpenCL is required for full LightGBM support on GPU-powered systems. To install OpenCL, run the following as root:\n\n.. code-block:: bash\n\n mkdir -p /etc/OpenCL/vendors && echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd && chmod a+r /etc/OpenCL/vendors/nvidia.icd && chmod a+x /etc/OpenCL/vendors/ && chmod a+x /etc/OpenCL\n\n.. note::\n\tIf OpenCL is not installed, then CUDA LightGBM is automatically used."
+ },
+ {
+ "output": " Installing Driverless AI\n\n\nRun the following command to install the Driverless AI RPM.\n\n.. code-block:: bash\n :substitutions:\n\n # Install Driverless AI.\n sudo rpm -i |VERSION-rpm-lin|\n\n\nNote: For RHEL 7.5, it is necessary to upgrade the glib2 library:\n\n.. code-block:: bash\n\n sudo yum upgrade glib2\n\nBy default, the Driverless AI processes are owned by the 'dai' user and 'dai' group."
+ },
+ {
+ "output": " Replace and as appropriate. .. code-block:: bash\n :substitutions:\n\n # Temporarily specify service user and group when installing Driverless AI. # rpm saves these for systemd in the /etc/dai/User.conf and /etc/dai/Group.conf files."
+ },
+ {
+ "output": " Starting Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Start Driverless AI.\n sudo systemctl start dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Start Driverless AI."
+ },
+ {
+ "output": " This command needs to be run after every reboot. For more information: http://docs.nvidia.com/deploy/driver-persistence/index.html.\n\n.. include:: enable-persistence.rst\n\nLooking at Driverless AI log files\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n sudo systemctl status dai-dai\n sudo journalctl -u dai-dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n sudo less /opt/h2oai/dai/log/dai.log\n sudo less /opt/h2oai/dai/log/h2o.log\n sudo less /opt/h2oai/dai/log/procsy.log\n sudo less /opt/h2oai/dai/log/vis-server.log\n\nStopping Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify.\n sudo ps -u dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped. Verify.\n sudo ps -u dai\n\nUpgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers. For reference on CUDA Toolkit and Minimum Required Driver Versions and CUDA Toolkit and Corresponding Driver Versions, see `here `__ ."
+ },
+ {
+ "output": " Upgrade Steps\n'\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI.\n sudo systemctl stop dai\n\n # The processes should now be stopped. Verify.\n sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time."
+ },
+ {
+ "output": " sudo rpm -U |VERSION-rpm-lin|\n sudo systemctl daemon-reload\n sudo systemctl start dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped."
+ },
+ {
+ "output": " sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time.\n\n # Upgrade and restart.\n sudo rpm -U |VERSION-rpm-lin|\n sudo -H -u dai /opt/h2oai/dai/run-dai.sh\n\nUninstalling Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify.\n sudo ps -u dai\n\n # Uninstall.\n sudo rpm -e dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped. Verify."
+ },
+ {
+ "output": " sudo rpm -e dai\n\nCAUTION! At this point you can optionally completely remove all remaining files, including the database. (This cannot be undone.)\n\n.. code-block:: bash\n\n sudo rm -rf /opt/h2oai/dai\n sudo rm -rf /etc/dai\n\nNote: The UID and GID are not removed during the uninstall process."
+ },
+ {
+ "output": " .. _linux-deb:\n\nLinux DEBs\n\n\nFor Linux machines that will not use the Docker image or RPM, a deb installation is available for x86_64 Ubuntu 16.04/18.04/20.04/22.04. The following installation steps assume that you have a valid license key for Driverless AI."
+ },
+ {
+ "output": " Once obtained, you will be prompted to paste the license key into the Driverless AI UI when you first log in, or you can save it as a .sig file and place it in the \\license folder that you will create during the installation process."
+ },
+ {
+ "output": " - When using systemd, remove the ``dai-minio``, ``dai-h2o``, ``dai-redis``, ``dai-procsy``, and ``dai-vis-server`` services. When upgrading, you can use the following commands to deactivate these services:\n\n ::\n\n systemctl stop dai-minio\n systemctl disable dai-minio\n systemctl stop dai-h2o\n systemctl disable dai-h2o\n systemctl stop dai-redis\n systemctl disable dai-redis\n systemctl stop dai-procsy\n systemctl disable dai-procsy\n systemctl stop dai-vis-server\n systemctl disable dai-vis-server\n\nEnvironment\n~\n\n+-+-+\n| Operating System | Min Mem |\n+=+=+\n| Ubuntu with GPUs | 64 GB |\n+-+-+\n| Ubuntu with CPUs | 64 GB |\n+-+-+\n\nRequirements\n\n\n- Ubuntu 16.04/Ubuntu 18.04/Ubuntu 20.04/Ubuntu 22.04\n- NVIDIA drivers >= |NVIDIA-driver-ver| is recommended (GPU only)."
+ },
+ {
+ "output": " About the Install\n~\n\n.. include:: linux-rpmdeb-about.frag\n\nStarting NVIDIA Persistence Mode (GPU only)\n~\n\nIf you have NVIDIA GPUs, you must run the following NVIDIA command. This command needs to be run after every reboot."
+ },
+ {
+ "output": " .. include:: enable-persistence.rst\n\nInstalling OpenCL\n~\n\nOpenCL is required for full LightGBM support on GPU-powered systems. To install OpenCL, run the following as root:\n\n.. code-block:: bash\n\n mkdir -p /etc/OpenCL/vendors && echo \"libnvidia-opencl.so.1\" > /etc/OpenCL/vendors/nvidia.icd && chmod a+r /etc/OpenCL/vendors/nvidia.icd && chmod a+x /etc/OpenCL/vendors/ && chmod a+x /etc/OpenCL\n\n.. note::\n\tIf OpenCL is not installed, then CUDA LightGBM is automatically used."
+ },
+ {
+ "output": " Installing the Driverless AI Linux DEB\n\n\nRun the following command to install the Driverless AI DEB.\n\n.. code-block:: bash\n :substitutions:\n\n # Install Driverless AI.\n sudo dpkg -i |VERSION-deb-lin|\n\nBy default, the Driverless AI processes are owned by the 'dai' user and 'dai' group."
+ },
+ {
+ "output": " Replace and as appropriate. .. code-block:: bash\n :substitutions:\n\n # Temporarily specify service user and group when installing Driverless AI. # dpkg saves these for systemd in the /etc/dai/User.conf and /etc/dai/Group.conf files."
+ },
+ {
+ "output": " Starting Driverless AI\n\n\nTo start Driverless AI, use the following command:\n\n.. code-block:: bash\n\n # Start Driverless AI.\n sudo systemctl start dai\n\nNote: If you don't have systemd, refer to :ref:`linux-tarsh` for install instructions."
+ },
+ {
+ "output": " sudo systemctl stop dai\n\n # The processes should now be stopped. Verify.\n sudo ps -u dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped."
+ },
+ {
+ "output": " sudo ps -u dai\n\n\nUpgrading Driverless AI\n~\n\n.. include:: upgrade-warning.frag\n\nRequirements\n\n\nWe recommend having NVIDIA driver >= |NVIDIA-driver-ver| installed (GPU only) in your host environment for a seamless experience on all architectures, including Ampere."
+ },
+ {
+ "output": " Go to `NVIDIA download driver `__ to get the latest NVIDIA Tesla A/T/V/P/K series drivers. For reference on CUDA Toolkit and Minimum Required Driver Versions and CUDA Toolkit and Corresponding Driver Versions, see `here `__ ."
+ },
+ {
+ "output": " Upgrade Steps\n'\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI.\n sudo systemctl stop dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time."
+ },
+ {
+ "output": " sudo dpkg -i |VERSION-deb-lin|\n sudo systemctl daemon-reload\n sudo systemctl start dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n :substitutions:\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped."
+ },
+ {
+ "output": " sudo ps -u dai\n\n # Make a backup of /opt/h2oai/dai/tmp directory at this time. If you do not, all previous data will be lost.\n\n # Upgrade and restart.\n sudo dpkg -i |VERSION-deb-lin|\n sudo -H -u dai /opt/h2oai/dai/run-dai.sh\n\nUninstalling Driverless AI\n\n\nIf you have systemd (preferred):\n\n.. code-block:: bash\n\n # Stop Driverless AI."
+ },
+ {
+ "output": " Verify.\n sudo ps -u dai\n\n # Uninstall Driverless AI.\n sudo dpkg -r dai\n\n # Purge Driverless AI.\n sudo dpkg -P dai\n\nIf you do not have systemd:\n\n.. code-block:: bash\n\n # Stop Driverless AI.\n sudo pkill -U dai\n\n # The processes should now be stopped."
+ },
+ {
+ "output": " sudo ps -u dai\n\n # Uninstall Driverless AI.\n sudo dpkg -r dai\n\n # Purge Driverless AI.\n sudo dpkg -P dai\n\nCAUTION! At this point you can optionally completely remove all remaining files, including the database (this cannot be undone):\n\n.. code-block:: bash\n\n sudo rm -rf /opt/h2oai/dai\n sudo rm -rf /etc/dai\n\nNote: The UID and GID are not removed during the uninstall process."
+ },
+ {
+ "output": " However, we DO NOT recommend removing the UID and GID if you plan to re-install Driverless AI. If you remove the UID and GID and then reinstall Driverless AI, the UID and GID will likely be re-assigned to a different (unrelated) user/group in the future; this may cause confusion if there are any remaining files on the filesystem referring to the deleted user or group."
+ },
+ {
+ "output": " This problem is caused by the font ``NotoColorEmoji.ttf``, which cannot be processed by the Python matplotlib library. A workaround is to disable the font by renaming it. (Do not use fontconfig because it is ignored by matplotlib.)"
+ },
+ {
+ "output": " .. _install-on-nvidia-dgx:\n\nInstall on NVIDIA GPU Cloud/NGC Registry\n\n\nDriverless AI is supported on the following NVIDIA DGX products, and the installation steps for each platform are the same. - `NVIDIA GPU Cloud `__\n- `NVIDIA DGX-1 `__\n- `NVIDIA DGX-2 `__\n- `NVIDIA DGX Station `__\n\nEnvironment\n~\n\n+++++\n| Provider | GPUs | Min Memory | Suitable for |\n+++++\n| NVIDIA GPU Cloud | Yes | | Serious use |\n+++++\n| NVIDIA DGX-1/DGX-2 | Yes | 128 GB | Serious use |\n+++++\n| NVIDIA DGX Station | Yes | 64 GB | Serious Use | \n+++++\n\nInstalling the NVIDIA NGC Registry\n\n\nNote: These installation instructions assume that you are running on an NVIDIA DGX machine."
+ },
+ {
+ "output": " 1. Log in to your NVIDIA GPU Cloud account at https://ngc.nvidia.com/registry. (Note that NVIDIA Compute is no longer supported by NVIDIA.) 2. In the Registry > Partners menu, select h2oai-driverless."
+ },
+ {
+ "output": " At the bottom of the screen, select one of the H2O Driverless AI tags to retrieve the pull command. .. image:: ../images/ngc_select_tag.png\n :align: center\n\n4. On your NVIDIA DGX machine, open a command prompt and use the specified pull command to retrieve the Driverless AI image."
+ },
+ {
+ "output": " Set up a directory for the version of Driverless AI on the host machine: \n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n6. Set up the data, log, license, and tmp directories on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the directory associated with the selected version of Driverless AI\n cd |VERSION-dir|\n\n # Set up the data, log, license, and tmp directories on the host machine\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n7."
+ },
+ {
+ "output": " The data will be visible inside the Docker container. 8. Enable persistence of the GPU. Note that this only needs to be run once. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html."
+ },
+ {
+ "output": " Run ``docker images`` to find the new image tag. 10. Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your install version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " We recommend ``--shm-size=256m`` in the Docker launch command. If you plan to build :ref:`image auto model ` extensively, ``--shm-size=2g`` is recommended for the Driverless AI Docker command instead."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n11."
+ },
+ {
+ "output": " Upgrading Driverless AI\n~\n\nThe steps for upgrading Driverless AI on an NVIDIA DGX system are similar to the installation steps. .. include:: upgrade-warning.frag\n \nNote: Use Ctrl+C to stop Driverless AI if it is still running."
+ },
+ {
+ "output": " Your host environment must have CUDA 10.0 or later with NVIDIA drivers >= 440.82 installed (GPU only). Driverless AI ships with its own CUDA libraries, but the driver must exist in the host environment."
+ },
+ {
+ "output": " Upgrade Steps\n'\n\n1. On your NVIDIA DGX machine, create a directory for the new Driverless AI version. 2. Copy the data, log, license, and tmp directories from the previous Driverless AI directory into the new Driverless AI directory."
+ },
+ {
+ "output": " AWS Role-Based Authentication\n~\n\nIn Driverless AI, it is possible to enable role-based authentication via the `IAM role `__."
+ },
+ {
+ "output": " AWS IAM Setup\n'\n\n1. Create an IAM role. This IAM role should have a Trust Relationship with Principal Trust Entity set to your Account ID. For example: trust relationship for Account ID `524466471676` would look like:\n\n .. code-block:: bash\n\n\t{\n\t \"Version\": \"2012-10-17\",\n\t \"Statement\": [\n\t {\n\t \"Effect\": \"Allow\",\n\t \"Principal\": {\n\t \"AWS\": \"arn:aws:iam::524466471676:root\"\n\t },\n\t \"Action\": \"sts:AssumeRole\"\n\t }\n\t ]\n\t}\n\n .. image:: ../images/aws_iam_role_create.png\n :align: center\n\n2."
+ },
+ {
+ "output": " Assign the policy to the user. .. image:: ../images/aws_iam_policy_assign.png\n\n4. Test role switching here: https://signin.aws.amazon.com/switchrole. (Refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_roles.html#troubleshoot_roles_cant-assume-role.)"
+ },
+ {
+ "output": " Resources\n'\n\n1. Granting a User Permissions to Switch Roles: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_permissions-to-switch.html\n2. Creating a Role to Delegate Permissions to an IAM User: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html\n3."
+ },
+ {
+ "output": " .. _system-settings:\n\nSystem Settings\n=\n\n.. _exclusive_mode:\n\n``exclusive_mode``\n\n\n.. dropdown:: Exclusive level of access to node resources\n\t:open:\n\n\tThere are three levels of access:\n\n\t\t- safe: this level assumes that there might be another experiment also running on same node."
+ },
+ {
+ "output": " - max: this level assumes that there is absolutely nothing else running on the node except the experiment\n\n\tThe default level is \"safe\" and the equivalent config.toml parameter is ``exclusive_mode``. If :ref:`multinode ` is enabled, this option has no effect, unless worker_remote_processors=1, in which case it is still applied."
+ },
+ {
+ "output": " Changing the exclusive mode will reset all exclusive mode related options back to default and then re-apply the specific rules for the new mode, which will undo any fine-tuning of expert options that are part of exclusive mode rules."
+ },
+ {
+ "output": " To reset mode behavior, one can switch between 'safe' and the desired mode. This way the new child experiment will use the default system resources for the chosen mode. ``max_cores``\n~\n\n.. dropdown:: Number of Cores to Use\n\t:open:\n\n\tSpecify the number of cores to use per experiment."
+ },
+ {
+ "output": " Lower values can reduce memory usage but might slow down the experiment. This value defaults to 0 (all). You can also set it using the environment variable OMP_NUM_THREADS or OPENBLAS_NUM_THREADS (e.g., in bash: 'export OMP_NUM_THREADS=32' or 'export OPENBLAS_NUM_THREADS=32').\n\n``max_fit_cores``\n~\n\n.. dropdown:: Maximum Number of Cores to Use for Model Fit\n\t:open:\n\n\tSpecify the maximum number of cores to use for a model's fit call."
+ },
+ {
+ "output": " This value defaults to 10. .. _use_dask_cluster:\n\n``use_dask_cluster``\n\n\n.. dropdown:: If full dask cluster is enabled, use full cluster\n\t:open:\n\n\tSpecify whether to use full multinode distributed cluster (True) or single-node dask (False)."
+ },
+ {
+ "output": " For example, several DGX nodes can be more efficient if used one DGX at a time for medium-sized data. The equivalent config.toml parameter is ``use_dask_cluster``. ``max_predict_cores``\n~\n\n.. dropdown:: Maximum Number of Cores to Use for Model Predict\n\t:open:\n\n\tSpecify the maximum number of cores to use for a model's predict call."
+ },
+ {
+ "output": " This value defaults to 0(all). ``max_predict_cores_in_dai``\n\n\n.. dropdown:: Maximum Number of Cores to Use for Model Transform and Predict When Doing MLI, AutoDoc\n\t:open:\n\n\tSpecify the maximum number of cores to use for a model's transform and predict call when doing operations in the Driverless AI MLI GUI and the Driverless AI R and Python clients."
+ },
+ {
+ "output": " This value defaults to 4. ``batch_cpu_tuning_max_workers``\n\n\n.. dropdown:: Tuning Workers per Batch for CPU\n\t:open:\n\n\tSpecify the number of workers used in CPU mode for tuning. A value of 0 uses the socket count, while a value of -1 uses all physical cores greater than or equal to 1."
+ },
+ {
+ "output": " ``cpu_max_workers``\n~\n.. dropdown:: Number of Workers for CPU Training\n\t:open:\n\n\tSpecify the number of workers used in CPU mode for training:\n\n\t- 0: Use socket count (Default)\n\t- -1: Use all physical cores >= 1 that count\n\n.. _num_gpus_per_experiment:\n\n``num_gpus_per_experiment``\n~\n\n.. dropdown:: #GPUs/Experiment\n\t:open:\n\n\tSpecify the number of GPUs to use per experiment."
+ },
+ {
+ "output": " Must be at least as large as the number of GPUs to use per model (or -1). In multinode context when using dask, this refers to the per-node value. ``min_num_cores_per_gpu``\n~\n.. dropdown:: Num Cores/GPU\n\t:open:\n\n\tSpecify the number of CPU cores per GPU."
+ },
+ {
+ "output": " This value defaults to 2. .. _num-gpus-per-model:\n\n``num_gpus_per_model``\n\n.. dropdown:: #GPUs/Model\n\t:open:\n\n\tSpecify the number of GPUs to use per model. The equivalent config.toml parameter is ``num_gpus_per_model`` and the default value is 1."
+ },
+ {
+ "output": " Setting this parameter to -1 means use all GPUs per model. In all cases, XGBoost tree and linear models use the number of GPUs specified per model, while LightGBM and Tensorflow revert to using 1 GPU/model and run multiple models on multiple GPUs."
+ },
+ {
+ "output": " Rulefit uses GPUs for parts involving obtaining the tree using LightGBM. In multinode context when using dask, this parameter refers to the per-node value. .. _num-gpus-for-prediction:\n\n``num_gpus_for_prediction``\n~\n\n.. dropdown:: Num."
+ },
+ {
+ "output": " If ``predict`` or ``transform`` are called in the same process as ``fit``/``fit_transform``, the number of GPUs will match. New processes will use this count for applicable models and transformers. Note that enabling ``tensorflow_nlp_have_gpus_in_production`` will override this setting for relevant TensorFlow NLP transformers."
+ },
+ {
+ "output": " Note: When GPUs are used, TensorFlow and PyTorch models and transformers, as well as RAPIDS, always predict on GPU. RAPIDS also requires the Driverless AI Python scoring package to be used on GPUs. In multinode context when using dask, this refers to the per-node value."
+ },
+ {
+ "output": " If using CUDA_VISIBLE_DEVICES=... to control GPUs (preferred method), gpu_id=0 is the\n\tfirst in that restricted list of devices. For example, if ``CUDA_VISIBLE_DEVICES='4,5'`` then ``gpu_id_start=0`` will refer to device #4."
+ },
+ {
+ "output": " This is because the underlying algorithms do not support arbitrary GPU IDs, only sequential IDs, so be sure to set this value correctly to avoid overlap across all experiments by all users. More information is available at: https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker#gpu-isolation\n\tNote that GPU selection does not wrap, so gpu_id_start + num_gpus_per_model must be less than the number of visible GPUs."
+ },
+ {
+ "output": " For actual use beyond this value, the system will start to have slow-down issues. The default value is 3. ``max_max_dt_threads_munging``\n\n.. dropdown:: Maximum of threads for datatable for munging\n\t:open:\n\n\tMaximum number of threads for datatable for munging."
+ },
+ {
+ "output": " This option is primarily useful for avoiding model building failure due to GPU Out Of Memory (OOM). It currently applies to all non-dask XGBoost models (i.e. GLMModel, XGBoostGBMModel, XGBoostDartModel, XGBoostRFModel), during normal fit or when using Optuna."
+ },
+ {
+ "output": " For example, if XGBoost runs out of GPU memory, this is detected, and (regardless of the setting of skip_model_failures) feature selection is performed using XGBoost on subsets of features. The dataset is progressively reduced by a factor of 2 with more models to cover all features."
+ },
+ {
+ "output": " Then all sub-models are used to estimate variable importance by absolute information gain, in order to decide which features to include. Finally, a single model with the most important features is built using the feature count that did not lead to OOM."
+ },
+ {
+ "output": " - Reproducibility is not guaranteed when this option is turned on. Hence, if the user enables reproducibility for the experiment, 'auto' automatically sets this option to 'off'. This is because the conditions that lead to OOM can change for the same experiment seed."
+ },
+ {
+ "output": " Also see :ref:`reduce_repeats_when_failure ` and :ref:`fraction_anchor_reduce_features_when_failure `\n\n.. _reduce_repeats_when_failure:\n\n``reduce_repeats_when_failure``\n~\n\n.. dropdown:: Number of repeats for models used for feature selection during failure recovery\n\t:open:\n\n\tWith :ref:`allow_reduce_features_when_failure `, this controls how many repeats of sub-models are used for feature selection."
+ },
+ {
+ "output": " More repeats can lead to higher accuracy. The cost of this option is proportional to the repeat count. The default value is 1. .. _fraction_anchor_reduce_features_when_failure:\n\n``fraction_anchor_reduce_features_when_failure``\n\n\n.. dropdown:: Fraction of features treated as anchor for feature selection during failure recovery\n\t:open:\n\n\tWith :ref:`allow_reduce_features_when_failure `, this controls the fraction of features treated as an anchor that are fixed for all sub-models."
+ },
+ {
+ "output": " For tuning and evolution, the probability depends upon any prior importance (if present) from other individuals, while final model uses uniform probability for anchor features. The default fraction is 0.1."
+ },
+ {
+ "output": " See allow_reduce_features_when_failure. ``lightgbm_reduce_on_errors_list``\n\n\n.. dropdown:: Errors From LightGBM That Trigger Reduction of Features\n\t:open:\n\n\tError strings from LightGBM that are used to trigger re-fit on reduced sub-models."
+ },
+ {
+ "output": " ``num_gpus_per_hyperopt_dask``\n\n\n.. dropdown:: GPUs / HyperOptDask\n\t:open:\n\n\tSpecify the number of GPUs to use per model hyperopt training task. To use all GPUs, set this to -1. For example, when this is set to -1 and there are 4 GPUs available, all of them can be used for the training of a single model across a Dask cluster."
+ },
+ {
+ "output": " In multinode context, this refers to the per-node value. ``detailed_traces``\n~\n\n.. dropdown:: Enable Detailed Traces\n\t:open:\n\n\tSpecify whether to enable detailed tracing in Driverless AI trace when running an experiment."
+ },
+ {
+ "output": " ``debug_log``\n~\n\n.. dropdown:: Enable Debug Log Level\n\t:open:\n\n\tIf enabled, the log files will also include debug logs. This is disabled by default. ``log_system_info_per_experiment``\n\n\n.. dropdown:: Enable Logging of System Information for Each Experiment\n\t:open:\n\n\tSpecify whether to include system information such as CPU, GPU, and disk space at the start of each experiment log."
+ },
+ {
+ "output": " The F0.5 score is the weighted harmonic mean of the precision and recall (given a threshold value). Unlike the F1 score, which gives equal weight to precision and recall, the F0.5 score gives more weight to precision than to recall."
+ },
+ {
+ "output": " For example, if your use case is to predict which products you will run out of, you may consider False Positives worse than False Negatives. In this case, you want your predictions to be very precise and only capture the products that will definitely run out."
+ },
+ {
+ "output": " F05 equation:\n\n.. math::\n\n F0.5 = 1.25 \\;\\Big(\\; \\frac{(precision) \\; (recall)}{((0.25) \\; (precision)) + recall}\\; \\Big)\n\nWhere:\n\n- *precision* is the positive observations (true positives) the model correctly identified from all the observations it labeled as positive (the true positives + the false positives)."
+ },
+ {
+ "output": " S3 Setup\n\n\nDriverless AI lets you explore S3 data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with S3. Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``aws_access_key_id``: The S3 access key ID\n- ``aws_secret_access_key``: The S3 access key\n- ``aws_role_arn``: The Amazon Resource Name\n- ``aws_default_region``: The region to use when the aws_s3_endpoint_url option is not set."
+ },
+ {
+ "output": " - ``aws_s3_endpoint_url``: The endpoint URL that will be used to access S3. - ``aws_use_ec2_role_credentials``: If set to true, the S3 Connector will try to obtain credentials associated with the role attached to the EC2 instance."
+ },
+ {
+ "output": " - ``enabled_file_systems``: The file systems you want to enable. This must be configured in order for data connectors to function properly. Example 1: Enable S3 with No Authentication\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n\tThis example enables the S3 data connector and disables authentication."
+ },
+ {
+ "output": " This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv. .. code-block:: bash\n\t :substitutions:\n\n\t nvidia-docker run \\\n\t\t\t--shm-size=256m \\\n\t\t\t--add-host name.node:172.16.2.186 \\\n\t\t\t-e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,s3\" \\\n\t\t\t-p 12345:12345 \\\n\t\t\t--init -it --rm \\\n\t\t\t-v /tmp/dtmp/:/tmp \\\n\t\t\t-v /tmp/dlog/:/log \\\n\t\t\t-v /tmp/dlicense/:/license \\\n\t\t\t-v /tmp/ddata/:/data \\\n\t\t\t-u $(id -u):$(id -g) \\\n\t\t\th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n\tThis example shows how to configure S3 options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, s3\"``\n\n\t2. Mount the config.toml file into the Docker container. .. code-block:: bash\n\t \t :substitutions:\n\n\t\t nvidia-docker run \\\n\t\t \t--pid=host \\\n\t\t \t--init \\\n\t\t \t--rm \\\n\t\t \t--shm-size=256m \\\n\t\t \t--add-host name.node:172.16.2.186 \\\n\t\t \t-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n\t\t \t-p 12345:12345 \\\n\t\t \t-v /local/path/to/config.toml:/path/in/docker/config.toml \\\n\t\t \t-v /etc/passwd:/etc/passwd:ro \\\n\t\t \t-v /etc/group:/etc/group:ro \\\n\t\t \t-v /tmp/dtmp/:/tmp \\\n\t\t \t-v /tmp/dlog/:/log \\\n\t\t \t-v /tmp/dlicense/:/license \\\n\t\t \t-v /tmp/ddata/:/data \\\n\t\t \t-u $(id -u):$(id -g) \\\n\t\t \th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n\tThis example enables the S3 data connector and disables authentication."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n\t ::\n\n\t # DEB and RPM\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n\t # TAR SH\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n\t2."
+ },
+ {
+ "output": " ::\n\n\t\t# File System Support\n\t\t# upload : standard upload feature\n\t\t# file : local file system/server file system\n\t\t# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n\t\t# dtap : Blue Data Tap file system, remember to configure the DTap section below\n\t\t# s3 : Amazon S3, optionally configure secret and access key below\n\t\t# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n\t\t# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n\t\t# minio : Minio Cloud Storage, remember to configure secret and access key below\n\t\t# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n\t\t# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n\t\t# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n\t\t# jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n\t\t# recipe_url: load custom recipe from URL\n\t\t# recipe_file: load custom recipe from local file system\n\t\tenabled_file_systems = \"file, s3\"\n\n\t3. Save the changes when you are done, then stop/restart Driverless AI."
+ },
+ {
+ "output": " It also configures Docker DNS by passing the name and IP of the S3 name node. This allows users to reference data stored in S3 directly using the name node address, for example: s3://name.node/datasets/iris.csv."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. - ``enabled_file_systems = \"file, upload, s3\"``\n\t - ``aws_access_key_id = \"\"``\n\t - ``aws_secret_access_key = \"\"``\n\n\t2."
+ },
+ {
+ "output": " .. code-block:: bash\n\t \t:substitutions:\n\n\t\t nvidia-docker run \\\n\t\t \t--pid=host \\\n\t\t \t--init \\\n\t\t \t--rm \\\n\t\t \t--shm-size=256m \\\n\t\t \t--add-host name.node:172.16.2.186 \\\n\t\t \t-e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n\t\t \t-p 12345:12345 \\\n\t\t \t-v /local/path/to/config.toml:/path/in/docker/config.toml \\\n\t\t \t-v /etc/passwd:/etc/passwd:ro \\\n\t\t \t-v /etc/group:/etc/group:ro \\\n\t\t \t-v /tmp/dtmp/:/tmp \\\n\t\t \t-v /tmp/dlog/:/log \\\n\t\t \t-v /tmp/dlicense/:/license \\\n\t\t \t-v /tmp/ddata/:/data \\\n\t\t \t-u $(id -u):$(id -g) \\\n\t\t \th2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n\tThis example enables the S3 data connector with authentication by passing an S3 access key ID and an access key."
+ },
+ {
+ "output": " Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n\t ::\n\n\t # DEB and RPM\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n\t # TAR SH\n\t export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n\t2."
+ },
+ {
+ "output": " ::\n\n\t\t# File System Support\n\t\t# upload : standard upload feature\n\t\t# file : local file system/server file system\n\t\t# hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n\t\t# dtap : Blue Data Tap file system, remember to configure the DTap section below\n\t\t# s3 : Amazon S3, optionally configure secret and access key below\n\t\t# gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n\t\t# gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n\t\t# minio : Minio Cloud Storage, remember to configure secret and access key below\n\t\t# snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n\t\t# kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n\t\t# azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n\t\t# jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n\t\t# recipe_url: load custom recipe from URL\n\t\t# recipe_file: load custom recipe from local file system\n\t\tenabled_file_systems = \"file, s3\"\n\n\t\t# S3 Connector credentials\n\t\taws_access_key_id = \"\"\n\t\taws_secret_access_key = \"\"\n\n\t3."
+ },
+ {
+ "output": " .. _image-settings:\n\nImage Settings\n\n\n``enable_tensorflow_image``\n~\n.. dropdown:: Enable Image Transformer for Processing of Image Data\n\t:open:\n\n\tSpecify whether to use pretrained deep learning models for processing of image data as part of the feature engineering pipeline."
+ },
+ {
+ "output": " This is enabled by default. .. _tensorflow_image_pretrained_models:\n\n``tensorflow_image_pretrained_models``\n\n\n.. dropdown:: Supported ImageNet Pretrained Architectures for Image Transformer\n\t:open:\n\n\tSpecify the supported `ImageNet `__ pretrained architectures for image transformer."
+ },
+ {
+ "output": " If an internet connection is not available, non-default models must be downloaded from http://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/pretrained/dai_image_models_1_10.zip and extracted into ``tensorflow_image_pretrained_models_dir``."
+ },
+ {
+ "output": " In this case, embeddings from the different architectures are concatenated together (in a single embedding). ``tensorflow_image_vectorization_output_dimension``\n~\n.. dropdown:: Dimensionality of Feature Space Created by Image Transformer\n\t:open:\n\n\tSpecify the dimensionality of the feature (embedding) space created by Image Transformer."
+ },
+ {
+ "output": " .. _image-model-fine-tune:\n\n``tensorflow_image_fine_tune``\n\n.. dropdown:: Enable Fine-Tuning of the Pretrained Models Used for the Image Transformer\n\t:open:\n\n\tSpecify whether to enable fine-tuning of the ImageNet pretrained models used for the Image Transformer."
+ },
+ {
+ "output": " ``tensorflow_image_fine_tuning_num_epochs``\n~\n.. dropdown:: Number of Epochs for Fine-Tuning Used for the Image Transformer\n\t:open:\n\n\tSpecify the number of epochs for fine-tuning ImageNet pretrained models used for the Image Transformer."
+ },
+ {
+ "output": " ``tensorflow_image_augmentations``\n\n.. dropdown:: List of Augmentations for Fine-Tuning Used for the Image Transformer\n\t:open:\n\n\tSpecify the list of possible image augmentations to apply while fine-tuning the ImageNet pretrained models used for the Image Transformer."
+ },
+ {
+ "output": " ``tensorflow_image_batch_size``\n~\n.. dropdown:: Batch Size for the Image Transformer\n\t:open:\n\n\tSpecify the batch size for the Image Transformer. By default, the batch size is set to -1 (selected automatically)."
+ },
+ {
+ "output": " ``image_download_timeout``\n\n.. dropdown:: Image Download Timeout in Seconds\n\t:open:\n\n\tWhen providing images through URLs, specify the maximum number of seconds to wait for an image to download. This value defaults to 60 sec."
+ },
+ {
+ "output": " This value defaults to 0.1. ``string_col_as_image_min_valid_types_fraction``\n\n.. dropdown:: Minimum Fraction of Images That Need to Be of Valid Types for Image Column to Be Used\n\t:open:\n\n\tSpecify the fraction of unique image URIs that need to have valid endings (as defined by ``string_col_as_image_valid_types``) for a string column to be considered as image data."
+ },
+ {
+ "output": " ``tensorflow_image_use_gpu``\n\n.. dropdown:: Enable GPU(s) for Faster Transformations With the Image Transformer\n\t:open:\n\n\tSpecify whether to use any available GPUs to transform images into embeddings with the Image Transformer."
+ },
+ {
+ "output": " Install on RHEL\n-\n\nThis section describes how to install the Driverless AI Docker image on RHEL. The installation steps vary depending on whether your system has GPUs or if it is CPU only. Environment\n~\n\n+-+-+-+\n| Operating System | GPUs?"
+ },
+ {
+ "output": " These links describe how to disable automatic updates and specific package updates. This is necessary in order to prevent a mismatch between the NVIDIA driver and the kernel, which can lead to GPU failures."
+ },
+ {
+ "output": " Note that some of the images in this video may change between releases, but the installation steps remain the same. .. note::\n\tAs of this writing, Driverless AI has been tested on RHEL versions 7.4, 8.3, and 8.4."
+ },
+ {
+ "output": " Once you are logged in, perform the following steps. 1. Retrieve the Driverless AI Docker image from https://www.h2o.ai/download/. 2. Install and start Docker EE on RHEL (if not already installed). Follow the instructions on https://docs.docker.com/engine/installation/linux/docker-ee/rhel/."
+ },
+ {
+ "output": " .. code-block:: bash\n\n sudo yum install -y yum-utils\n sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n sudo yum makecache fast\n sudo yum -y install docker-ce\n sudo systemctl start docker\n\n3."
+ },
+ {
+ "output": " More information is available at https://github.com/NVIDIA/nvidia-docker/blob/master/README.md. .. code-block:: bash\n\n curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \\\n sudo apt-key add -\n distribution=$(."
+ },
+ {
+ "output": " If you do not run this command, you will have to remember to start the nvidia-docker service manually; otherwise the GPUs will not appear as available. .. code-block:: bash\n\n sudo systemctl enable nvidia-docker\n\n Alternatively, if you have installed Docker CE above you can install nvidia-docker with:\n\n .. code-block:: bash\n\n curl -s -L https://nvidia.github.io/nvidia-docker/centos7/x86_64/nvidia-docker.repo | \\\n sudo tee /etc/yum.repos.d/nvidia-docker.repo\n sudo yum install nvidia-docker2\n\n4."
+ },
+ {
+ "output": " If the driver is not up and running, log on to http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest NVIDIA Tesla V/P/K series driver. .. code-block:: bash\n\n nvidia-docker run --rm nvidia/cuda nvidia-smi\n\n5."
+ },
+ {
+ "output": " Change directories to the new folder, then load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # cd into the new directory\n cd |VERSION-dir|\n\n # Load the Driverless AI docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n7."
+ },
+ {
+ "output": " Note that this needs to be run once every reboot. Refer to the following for more information: http://docs.nvidia.com/deploy/driver-persistence/index.html. .. include:: enable-persistence.rst\n\n8. Set up the data, log, and license directories on the host machine (within the new directory):\n\n .. code-block:: bash\n\n # Set up the data, log, license, and tmp directories on the host machine\n mkdir data\n mkdir log\n mkdir license\n mkdir tmp\n\n9."
+ },
+ {
+ "output": " The data will be visible inside the Docker container. 10. Run ``docker images`` to find the image tag. 11. Start the Driverless AI Docker image and replace TAG below with the image tag. Depending on your install version, use the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command."
+ },
+ {
+ "output": " For GPU users, ``--pid=host`` is needed for NVML, which prevents tini from running as PID 1, so the warning message is shown (it is still harmless). We recommend ``--shm-size=256m`` in the Docker launch command. However, if you plan to build :ref:`image auto model ` extensively, ``--shm-size=2g`` is recommended for the Driverless AI Docker command."
+ },
+ {
+ "output": " .. tabs::\n\n .. tab:: >= Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n docker run --runtime=nvidia \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. tab:: < Docker 19.03\n\n .. code-block:: bash\n :substitutions:\n\n # Start the Driverless AI Docker image\n nvidia-docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n12."
+ },
+ {
+ "output": " .. _install-on-rhel-cpus-only:\n\nInstall on RHEL with CPUs\n~\n\nThis section describes how to install and start the Driverless AI Docker image on RHEL. Note that this uses ``docker`` and not ``nvidia-docker``."
+ },
+ {
+ "output": " Note that some of the images in this video may change between releases, but the installation steps remain the same. .. note::\n\tAs of this writing, Driverless AI has been tested on RHEL versions 7.4, 8.3, and 8.4."
+ },
+ {
+ "output": " Once you are logged in, perform the following steps. 1. Install and start Docker EE on RHEL (if not already installed). Follow the instructions on https://docs.docker.com/engine/installation/linux/docker-ee/rhel/."
+ },
+ {
+ "output": " .. code-block:: bash\n\n sudo yum install -y yum-utils\n sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n sudo yum makecache fast\n sudo yum -y install docker-ce\n sudo systemctl start docker\n\n2."
+ },
+ {
+ "output": " 3. Set up a directory for the version of Driverless AI on the host machine:\n\n .. code-block:: bash\n :substitutions:\n\n # Set up directory with the version name\n mkdir |VERSION-dir|\n\n4. Load the Driverless AI Docker image inside the new directory:\n\n .. code-block:: bash\n :substitutions:\n\n # Load the Driverless AI Docker image\n docker load < dai-docker-ubi8-x86_64-|VERSION-long|.tar.gz\n\n5."
+ },
+ {
+ "output": " Copy data into the data directory on the host. The data will be visible inside the Docker container at //data. 7. Run ``docker images`` to find the image tag. 8. Start the Driverless AI Docker image."
+ },
+ {
+ "output": " Note that from version 1.10, the DAI Docker image runs with an internal ``tini`` that is equivalent to using ``--init`` from Docker. If both are enabled in the launch command, tini prints a (harmless) warning message."
+ },
+ {
+ "output": " However, if you plan to build :ref:`image auto model ` extensively, ``--shm-size=2g`` is recommended for the Driverless AI Docker command. .. code-block:: bash\n :substitutions:\n\n $ docker run \\\n --pid=host \\\n --rm \\\n --shm-size=256m \\\n -u `id -u`:`id -g` \\\n -p 12345:12345 \\\n -v `pwd`/data:/data \\\n -v `pwd`/log:/log \\\n -v `pwd`/license:/license \\\n -v `pwd`/tmp:/tmp \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n Driverless AI will begin running::\n\n \n Welcome to H2O.ai's Driverless AI\n -\n\n - Put data in the volume mounted at /data\n - Logs are written to the volume mounted at /log/20180606-044258\n - Connect to Driverless AI on port 12345 inside the container\n - Connect to Jupyter notebook on port 8888 inside the container\n\n9."
+ },
+ {
+ "output": " HDFS Setup\n\n\nDriverless AI lets you explore HDFS data sources from within the Driverless AI application. This section provides instructions for configuring Driverless AI to work with HDFS. Note: Depending on your Docker install version, use either the ``docker run --runtime=nvidia`` (>= Docker 19.03) or ``nvidia-docker`` (< Docker 19.03) command when starting the Driverless AI Docker image."
+ },
+ {
+ "output": " Description of Configuration Attributes\n~\n\n- ``hdfs_config_path`` (Required): The location of the HDFS config folder. This folder can contain multiple config files. - ``hdfs_auth_type`` (Required): Specifies the HDFS authentication."
+ },
+ {
+ "output": " - ``keytab``: Authenticate with a keytab (recommended). If running DAI as a service, then the Kerberos keytab needs to be owned by the DAI user. - ``keytabimpersonation``: Login with impersonation using a keytab."
+ },
+ {
+ "output": " - ``key_tab_path``: The path of the principal key tab file. This is required when ``hdfs_auth_type='principal'``. - ``hdfs_app_principal_user``: The Kerberos application principal user. This is required when ``hdfs_auth_type='keytab'``."
+ },
+ {
+ "output": " Separate each argument with spaces. - ``-Djava.security.krb5.conf``\n - ``-Dsun.security.krb5.debug``\n - ``-Dlog4j.configuration``\n\n- ``hdfs_app_classpath``: The HDFS classpath. - ``hdfs_app_supported_schemes``: The list of DFS schemes that is used to check whether a valid input to the connector has been established."
+ },
+ {
+ "output": " Additional schemes can be supported by adding values that are not selected by default to the list. - ``hdfs://``\n - ``maprfs://``\n - ``swift://``\n\n- ``hdfs_max_files_listed``: Specifies the maximum number of files that are viewable in the connector UI."
+ },
+ {
+ "output": " To view more files, increase the default value. - ``hdfs_init_path``: Specifies the starting HDFS path displayed in the UI of the HDFS browser. - ``enabled_file_systems``: The file systems you want to enable."
+ },
+ {
+ "output": " Example 1: Enable HDFS with No Authentication\n~\n\n.. tabs::\n .. group-tab:: Docker Image Installs\n\n This example enables the HDFS data connector and disables HDFS authentication. It does not pass any HDFS configuration file; however it configures Docker DNS by passing the name and IP of the HDFS name node."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs\" \\\n -e DRIVERLESS_AI_HDFS_AUTH_TYPE='noauth' \\\n -e DRIVERLESS_AI_PROCSY_PORT=8080 \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example shows how to configure HDFS options in the config.toml file, and then specify that file when starting Driverless AI in Docker."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. Note that the procsy port, which defaults to 12347, also has to be changed. - ``enabled_file_systems = \"file, upload, hdfs\"``\n - ``procsy_ip = \"127.0.0.1\"``\n - ``procsy_port = 8080``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example enables the HDFS data connector and disables HDFS authentication in the config.toml file."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " Note that the procsy port, which defaults to 12347, also has to be changed. ::\n\n # IP address and port of procsy process. procsy_ip = \"127.0.0.1\"\n procsy_port = 8080\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n 3. Save the changes when you are done, then stop/restart Driverless AI."
+ },
+ {
+ "output": " If the time difference between clients and DCs is 5 minutes or more, Kerberos failures will occur. - If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user; otherwise Driverless AI will not be able to read the keytab, which results in a fallback to simple authentication and, hence, failure."
+ },
+ {
+ "output": " - Configures the environment variable ``DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER`` to reference a user for whom the keytab was created (usually in the form of user@realm). .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs\" \\\n -e DRIVERLESS_AI_HDFS_AUTH_TYPE='keytab' \\\n -e DRIVERLESS_AI_KEY_TAB_PATH='tmp/<>' \\\n -e DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER='<>' \\\n -e DRIVERLESS_AI_PROCSY_PORT=8080 \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Configure the Driverless AI config.toml file. Set the following configuration options. Note that the procsy port, which defaults to 12347, also has to be changed. - ``enabled_file_systems = \"file, upload, hdfs\"``\n - ``procsy_ip = \"127.0.0.1\"``\n - ``procsy_port = 8080``\n - ``hdfs_auth_type = \"keytab\"``\n - ``key_tab_path = \"/tmp/\"``\n - ``hdfs_app_principal_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " 1. Export the Driverless AI config.toml file or add it to ~/.bashrc. For example:\n\n ::\n\n # DEB and RPM\n export DRIVERLESS_AI_CONFIG_FILE=\"/etc/dai/config.toml\"\n\n # TAR SH\n export DRIVERLESS_AI_CONFIG_FILE=\"/path/to/your/unpacked/dai/directory/config.toml\" \n\n 2."
+ },
+ {
+ "output": " ::\n \n # IP address and port of procsy process.\n procsy_ip = \"127.0.0.1\"\n procsy_port = 8080\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n # HDFS connector\n # Auth type can be Principal/keytab/keytabPrincipal\n # Specify HDFS Auth Type, allowed options are:\n # noauth : No authentication needed\n # principal : Authenticate with HDFS with a principal user\n # keytab : Authenticate with a Key tab (recommended)\n # keytabimpersonation : Login with impersonation using a keytab\n hdfs_auth_type = \"keytab\"\n\n # Path of the principal key tab file\n key_tab_path = \"/tmp/\"\n\n # Kerberos app principal user (recommended)\n hdfs_app_principal_user = \"\"\n\n 3."
+ },
+ {
+ "output": " Example 3: Enable HDFS with Keytab-Based Impersonation\n\n\nNotes: \n\n- If using Kerberos, be sure that the Driverless AI time is synchronized with the Kerberos server.\n- If running Driverless AI as a service, then the Kerberos keytab needs to be owned by the Driverless AI user."
+ },
+ {
+ "output": " .. tabs::\n .. group-tab:: Docker Image Installs\n\n This example:\n\n - Sets the authentication type to ``keytabimpersonation``.\n - Places keytabs in the ``/tmp/dtmp`` folder on your machine and provides the file path as described below."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n -e DRIVERLESS_AI_ENABLED_FILE_SYSTEMS=\"file,hdfs\" \\\n -e DRIVERLESS_AI_HDFS_AUTH_TYPE='keytabimpersonation' \\\n -e DRIVERLESS_AI_KEY_TAB_PATH='/tmp/<>' \\\n -e DRIVERLESS_AI_HDFS_APP_PRINCIPAL_USER='<>' \\\n -e DRIVERLESS_AI_PROCSY_PORT=8080 \\\n -p 12345:12345 \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Docker Image with the config.toml\n\n This example:\n\n - Sets the authentication type to ``keytabimpersonation``."
+ },
+ {
+ "output": " - Configures the ``hdfs_app_principal_user`` variable, which references a user for whom the keytab was created (usually in the form of user@realm).\n\n 1. Configure the Driverless AI config.toml file. Set the following configuration options."
+ },
+ {
+ "output": " - ``enabled_file_systems = \"file, upload, hdfs\"``\n - ``procsy_ip = \"127.0.0.1\"``\n - ``procsy_port = 8080``\n - ``hdfs_auth_type = \"keytabimpersonation\"``\n - ``key_tab_path = \"/tmp/\"``\n - ``hdfs_app_principal_user = \"\"``\n\n 2."
+ },
+ {
+ "output": " .. code-block:: bash\n :substitutions:\n\n nvidia-docker run \\\n --pid=host \\\n --init \\\n --rm \\\n --shm-size=256m \\\n --add-host name.node:172.16.2.186 \\\n -e DRIVERLESS_AI_CONFIG_FILE=/path/in/docker/config.toml \\\n -p 12345:12345 \\\n -v /local/path/to/config.toml:/path/in/docker/config.toml \\\n -v /etc/passwd:/etc/passwd:ro \\\n -v /etc/group:/etc/group:ro \\\n -v /tmp/dtmp/:/tmp \\\n -v /tmp/dlog/:/log \\\n -v /tmp/dlicense/:/license \\\n -v /tmp/ddata/:/data \\\n -u $(id -u):$(id -g) \\\n h2oai/dai-ubi8-x86_64:|tag|\n\n .. group-tab:: Native Installs\n\n This example:\n\n - Sets the authentication type to ``keytabimpersonation``."
+ },
+ {
+ "output": " - Configures the ``hdfs_app_principal_user`` variable, which references a user for whom the keytab was created (usually in the form of user@realm).\n\n 1. Export the Driverless AI config.toml file or add it to ~/.bashrc."
+ },
+ {
+ "output": " Specify the following configuration options in the config.toml file.\n\n ::\n\n # IP address and port of procsy process.\n procsy_ip = \"127.0.0.1\"\n procsy_port = 8080\n\n # File System Support\n # upload : standard upload feature\n # file : local file system/server file system\n # hdfs : Hadoop file system, remember to configure the HDFS config folder path and keytab below\n # dtap : Blue Data Tap file system, remember to configure the DTap section below\n # s3 : Amazon S3, optionally configure secret and access key below\n # gcs : Google Cloud Storage, remember to configure gcs_path_to_service_account_json below\n # gbq : Google Big Query, remember to configure gcs_path_to_service_account_json below\n # minio : Minio Cloud Storage, remember to configure secret and access key below\n # snow : Snowflake Data Warehouse, remember to configure Snowflake credentials below (account name, username, password)\n # kdb : KDB+ Time Series Database, remember to configure KDB credentials below (hostname and port, optionally: username, password, classpath, and jvm_args)\n # azrbs : Azure Blob Storage, remember to configure Azure credentials below (account name, account key)\n # jdbc: JDBC Connector, remember to configure JDBC below."
+ },
+ {
+ "output": " (hive_app_configs)\n # recipe_url: load custom recipe from URL\n # recipe_file: load custom recipe from local file system\n enabled_file_systems = \"file, hdfs\"\n\n # HDFS connector\n # Auth type can be Principal/keytab/keytabPrincipal\n # Specify HDFS Auth Type, allowed options are:\n # noauth : No authentication needed\n # principal : Authenticate with HDFS with a principal user\n # keytab : Authenticate with a Key tab (recommended)\n # keytabimpersonation : Login with impersonation using a keytab\n hdfs_auth_type = \"keytabimpersonation\"\n\n # Path of the principal key tab file\n key_tab_path = \"/tmp/