JupyterLab is the next-generation Jupyter Notebook interface provided by GPULab’s cloud computing environment, combining an updated user interface with an integrated file browser and tabs for running multiple notebooks and terminals. Jupyter Notebooks are a staple of live coding and rapid prototyping for data science. They combine live runnable code with Markdown-based text segments, ideal for describing and demonstrating computational procedures and their corresponding results.
NVIDIA CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on GPUs. Deep Learning is computationally intensive, specifically requiring floating-point arithmetic over large data sets, a type of operation well suited to GPUs and their original purpose of rendering 3D graphics. CUDA enables GPU acceleration for nearly any data science and analytics project.
O’Reilly’s 2013 poll showed 40 percent of responding data scientists use Python in their day-to-day work. As a result, Python has amassed an enormous suite of data science frameworks and libraries over the last decade. The prevalent frameworks TensorFlow and PyTorch testify to Python’s effectiveness in Machine Learning/Deep Learning model design and training. The Python Package Index (PyPI) is teeming with data science libraries such as Pandas, NumPy, SciPy, Statsmodels, Scikit-Learn, and many more.
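As a minimal sketch of how these libraries compose, the snippet below aggregates a small Pandas DataFrame backed by NumPy arrays (the column names and values here are purely illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data: measurements for two hypothetical groups.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": np.array([1.0, 3.0, 2.0, 4.0]),
})

# Pandas group-by aggregation on top of NumPy arithmetic.
means = df.groupby("group")["value"].mean()
print(means["a"], means["b"])  # 2.0 3.0
```

The same pattern scales from toy frames like this one to the millions-of-rows datasets these libraries are designed for.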
Julia first appeared in 2012 as a high-performance, dynamic programming language for numerical analysis and computational science. With multiple dispatch as its core programming paradigm and near-C speed, Julia is well suited to these domains. A growing set of deep learning libraries shows Julia gaining ground within the data science and artificial intelligence community.
The R language is popular with statisticians for developing statistical software and data analysis. R ranks 9th in the popularity of programming languages according to the TIOBE index. The R packages gputools, gmatrix, and Rth are libraries that allow developers to harness GPU acceleration for matrix manipulation and sorting. GPU acceleration can reduce complex computations from hours to minutes.
Since 1993 GNU Octave has provided a high-level programming language with a powerful mathematics-oriented syntax, built-in 2D/3D plotting, and visualization tools. Popular in academia and industry, GNU Octave is drop-in compatible with many MATLAB scripts. Octave also supports significant GPU acceleration through CUDA.
Xeus Python is a Jupyter kernel for Python built on Xeus, a native C++ implementation of the Jupyter kernel protocol. Xeus Python is compatible with the JupyterLab visual debugger, allowing developers to set breakpoints in notebook cells and source files, inspect variables, and navigate the call stack.
CuPy is a NumPy-compatible array library accelerated by NVIDIA CUDA. CuPy uses CUDA libraries including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, cuFFT, and NCCL to make full use of the GPU architecture.
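Because CuPy mirrors the NumPy API, GPU code can be written once and still run on CPU-only hosts. A minimal sketch (falling back to NumPy when CuPy is not installed):

```python
# CuPy mirrors the NumPy API, so the same code runs on GPU or CPU.
try:
    import cupy as xp   # GPU-accelerated (assumes a CUDA-capable device)
except ImportError:
    import numpy as xp  # CPU fallback with the same array API

a = xp.arange(6, dtype=xp.float32).reshape(2, 3)
b = xp.ones((3, 2), dtype=xp.float32)
c = a @ b                # matrix multiply: GPU under CuPy, CPU under NumPy
total = float(c.sum())   # 3 + 3 + 12 + 12 = 30.0
```

This try/except import is a common idiom in libraries that support both backends.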
PyTorch is an open-source deep learning library based on the Torch library, used for applications such as computer vision and natural language processing. PyTorch is specifically built to leverage the GPU compute provided by GPULab. Facebook’s AI Research lab is the primary developer of PyTorch. PyTorch has proven useful in developing popular Deep Learning software, including Tesla Autopilot.
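PyTorch's core mechanics fit in a few lines: tensors, automatic differentiation, and explicit device placement. A minimal sketch (the function being differentiated is purely illustrative):

```python
import torch

# Autograd sketch: compute d(x^2 + 3x)/dx at x = 2, which is 2x + 3 = 7.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)  # tensor(7.)

# Tensors move to the GPU when one is available, as on GPULab.
device = "cuda" if torch.cuda.is_available() else "cpu"
z = torch.ones(2, 3, device=device)
```

The same `requires_grad`/`backward()` machinery drives gradient descent over full neural networks.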
TensorFlow is a symbolic math library based on dataflow and differentiable programming for machine learning, widely used for training models on GPU compute and for deep neural network inference. The Google Brain team developed TensorFlow for internal Google use and later released it under the Apache License 2.0 in 2015.
Keras acts as an interface for the TensorFlow library; it offers consistent and straightforward APIs for training models, minimizing the number of user actions required for common deep learning use cases. Keras simplifies implementations of commonly used neural-network building blocks such as layers, objectives, activation functions, and optimizers. Keras is the most used deep learning framework among top-5 winning teams on Kaggle.
Scikit-learn interoperates with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn provides high-performance linear algebra and array operations, along with various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, DBSCAN, and many more.
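Scikit-learn's uniform estimator API (construct, `fit`, then predict or inspect) is the same across all of these algorithms. A minimal k-means sketch on obviously separable toy points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious clusters, around (0, 0) and (10, 10); points are illustrative.
X = np.array([[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [10.2, 9.8]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Points in the same cluster receive the same label.
labels = km.labels_
```

Swapping `KMeans` for `DBSCAN` or `RandomForestClassifier` changes only the constructor; the fit/predict workflow stays the same.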
fastai is a deep learning library sitting atop PyTorch to provide data science practitioners with high-level components that can quickly and easily achieve state-of-the-art results in standard deep learning domains. fastai implements a semantic type hierarchy for tensors, a GPU-optimized computer vision library, a simplified optimizer, a data block API, and more.
Chainer is a deep learning framework written purely in Python on top of NumPy and CuPy Python libraries. Chainer supports CUDA computation and only requires a few lines of code to leverage the GPU available with GPULab. The development of Chainer is led by Preferred Networks in partnership with IBM, Intel, Microsoft, and Nvidia.
spaCy is an advanced Natural Language Processing library written in Python and Cython. spaCy offers one of the fastest syntactic parsers available and excels at large-scale information extraction tasks.
Apache MXNet is a modern open-source deep learning framework used to train and deploy deep neural networks, with deep integration into Python and support for Julia and R. Whether you are looking for a flexible library for quickly developing cutting-edge deep learning research or a robust framework for pushing production workloads, MXNet caters to both needs.
CatBoost is an algorithm for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is used for search, recommendation systems, personal assistants, self-driving cars, weather prediction, and many other tasks at Yandex and at other organizations, including CERN, Cloudflare, and more.
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately.
MLflow is an open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow’s four components are Tracking, Projects, Models, and the Model Registry. The GPULab environment comes with the MLflow application pre-installed.
GPULab’s JupyterLab environments provide a Bash terminal for command-line operations and a Bash Notebook kernel for interactive programming. First released in 1989, Bash has been the default login shell for most Linux distributions. Bash and standard Linux utilities such as head, tail, wc, tr, grep, sort, cut, sed, awk, diff, and uniq create a robust suite of command-line applications and scripting capabilities for file and data manipulation.
Git is the de facto standard for source code versioning and management, popularized and proven by major open-source and commercial software communities such as GitHub and GitLab. GPULab environments provide git as a preferred method for maintaining and sharing all source code, including Jupyter Notebooks.
GPULab provides the GitHub command-line tool gh, allowing users to work seamlessly with GitHub from the Bash command line. Manage gists, issues, pull requests, and releases; create, clone, fork, and view repositories, all without leaving GPULab’s convenient command-line terminal.
GPULab provides the Kaggle CLI tool, the easiest way to interact with Kaggle’s public API. The Kaggle CLI provides easy ways to interact with Notebooks and Datasets on Kaggle; its commands enable both searching for and downloading published Notebooks, Datasets, and metadata.
GPULab provides the gcloud command-line interface, the primary CLI tool for creating and managing Google Cloud resources. While GPULab operates in a secure private cloud, use the gcloud tool to create and manage Google Compute Engine virtual machine instances and other resources, Google Cloud SQL instances, Google Kubernetes Engine clusters, Google Cloud Dataproc clusters and jobs, Google Cloud DNS managed zones and record sets, Google Cloud Deployment Manager deployments, and more.
GPULab offers the AWS Command Line Interface (CLI) to manage your AWS services. While GPULab operates in a secure private cloud, you can use the provided AWS CLI to control multiple AWS services from the command line or automate them through scripts.
GPULab environments provide the Microsoft Azure command-line interface (Azure CLI). While GPULab operates in a secure private cloud, the Azure CLI features a set of commands used to create and manage Azure resources, emphasizing automation.
GPULab provides MinIO libraries and command-line applications for easy access to the popular object storage systems MinIO and Amazon’s AWS S3 (Simple Storage Service). This allows command-line and programmatic access to query, download, and manipulate any number of structured and unstructured data objects, from bytes to terabytes.
Kubernetes is a leading distributed container orchestration and distributed application platform. GPULab provides kubectl for enabling remote access, administration, configuration, and monitoring of Kubernetes clusters. kubectl is a powerful tool for application deployment and accessing cluster services directly from a GPULab environment.
FFmpeg is an extensive suite of libraries and programs for handling video processing, audio, and other multimedia files and streams. GPULab provides FFmpeg to satisfy the dependencies of many high-level multimedia data science libraries, along with a powerful command-line application for near-limitless manipulation of audio and video data.
The Inkscape command-line application supports vector image creation and manipulation, primarily in Scalable Vector Graphics format. Inkscape supports major image format conversions. GPULab environments contain both Inkscape and ImageMagick applications, supporting nearly any image manipulation activity from the command-line or higher-level software library.
GPULab user environments consist of isolated and secure Ubuntu Server 20.04 (LTS) Linux operating systems built on Debian Linux architecture and infrastructure. Ubuntu aims to be secure by default. Canonical and a community of developers manage Ubuntu and provide timely security updates.
GPULab network policies allow for the secure transmission of data to and from external networks and services. OpenSSH and utilities such as SFTP and SCP allow industry-standard command-line and programmatic encrypted file transfer.
GPULab provides the Secure Shell client, allowing network services to operate securely over an unsecured network. Typical applications include remote command-line login and remote command execution, yet any network service can be secured with SSH.
GPULab offers a fully integrated NVIDIA GPU/CUDA API Linux environment, with pre-installed and configured data science languages and runtimes such as Python, Julia, R, and Octave, along with the latest popular data science libraries and frameworks, including PyTorch, TensorFlow, Keras, Scikit-Learn, and dozens more.
GPULab provides fully configured JupyterLab environments built to work seamlessly with their attached NVIDIA GPUs, CUDA APIs and deep learning frameworks. GPULab’s GPU cloud environments come with the most popular Data Science, Machine Learning, and Deep Learning libraries, including TensorFlow, PyTorch, Keras, Scikit-learn, and fastai, along with all of their dependencies.
The GPULab approach to collaboration in data science is simple and straightforward. GPULab offers Jupyter Notebooks (managed by JupyterLab) and the Git application atop a standard Linux filesystem for managing source code. The Git application communicates source code changes with numerous cloud and private hosted Git repository services, such as GitHub, GitLab, Bitbucket, Microsoft Azure Repos, Google Cloud Source Repositories, AWS CodeCommit, Gitea, and many more.
Security starts at GPULab’s private cloud facilities. GPULab owns and operates its hardware infrastructure in secure Data Center facilities located in Los Angeles and Pasadena, California; Las Vegas, Nevada; Denver, Colorado; and Herndon, Virginia. GPULab limits physical hardware access to screened technicians.