1.0. System
What system is recommended for this course?
This course is designed to work on a wide range of operating systems, including Linux, macOS, Windows, and ChromeOS (Chromebooks). There are no strict hardware prerequisites, but you will need enough CPU and RAM to process the course datasets comfortably. Any machine that meets this bar lets you complete the course activities, whichever operating system or device you prefer.
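As a rough way to see what your machine offers, the sketch below prints the operating system, CPU count, and installed RAM using only the Python standard library. The function names `total_ram_bytes` and `system_summary` are illustrative and not part of the course material, and the RAM lookup relies on `os.sysconf`, which is typically unavailable on Windows, so the value may be reported as `None` there.

```python
import os
import platform


def total_ram_bytes():
    """Return physical RAM in bytes, or None if this platform cannot report it."""
    try:
        # os.sysconf exists on Linux and macOS; on Windows it is missing,
        # which raises AttributeError and is handled below.
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def system_summary():
    """Collect basic facts about the current machine."""
    ram = total_ram_bytes()
    return {
        "os": platform.system(),
        "cpu_count": os.cpu_count(),
        "ram_gib": None if ram is None else round(ram / 2**30, 1),
    }


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key}: {value}")
```

The course does not state minimum values, so treat the output as information for your own judgment rather than a pass/fail check.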
Can you use JupyterLab or Google Colab for this course?
JupyterLab is acceptable for this course, but the content is optimized for Visual Studio Code (VS Code), which combines an editor, an integrated terminal, and a file explorer in one tool. Jupyter notebooks and Google Colab are suitable for the early stages, such as the Prototyping chapter. Later sections rely on terminal access and file system navigation, which are easier to handle in VS Code.
Are additional software installations required?
This course requires installing several software packages: Python, Poetry, Git, and VS Code. These tools form the backbone of your development workflow:
- Python is indispensable for all course-related coding activities.
- Poetry offers an efficient way to manage Python package dependencies.
- Git is crucial for version control and collaboration.
- VS Code is recommended for its integrated development environment (IDE) capabilities, although alternatives may be used based on personal preference or specific needs.
Detailed instructions for installing these software packages are provided in their respective course chapters.
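Once the tools are installed, a quick check can confirm that they are reachable from your terminal. The following is a minimal sketch using `shutil.which` from the standard library. The command names are assumptions: `code` is the usual VS Code command-line launcher, and Python may be exposed as `python3` or `python` depending on your system. The helper name `locate_tools` is hypothetical.

```python
import shutil

# Assumed command names for each tool; adjust them if your installation differs.
TOOLS = {
    "Python": ("python3", "python"),
    "Poetry": ("poetry",),
    "Git": ("git",),
    "VS Code": ("code",),
}


def locate_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to the first matching executable path, or None."""
    found = {}
    for name, commands in tools.items():
        # Try each candidate command in order and keep the first one found.
        paths = (which(command) for command in commands)
        found[name] = next((path for path in paths if path), None)
    return found


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A tool reported as not found may still be installed but missing from your `PATH`, which is common for the VS Code launcher on macOS.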
What are the specific hardware requirements for MLOps projects?
MLOps projects vary significantly in their complexity and demands on hardware, from simple tabular data analyses to complex machine learning models like transformers:
- Tabular Data Projects: Projects utilizing libraries like scikit-learn or XGBoost typically don't require specialized hardware, though an optional GPU could enhance performance for certain tasks.
- Multimedia Data Projects: Projects that use TensorFlow or PyTorch to process images or video benefit from at least one GPU, which speeds up training and inference.
- Large Dataset Projects: Advanced projects that employ transformers or require extensive parallel processing may need multiple GPUs, possibly distributed across several machines for optimal performance.
It's often best to start with a straightforward setup, such as developing models on a local machine with sample data, before scaling to more complex arrangements like cloud-based resources for deployment and broader testing. Cloud platforms also enable running multiple experiments simultaneously, which can expedite the development process.
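To see which of the hardware tiers above your machine can support, it helps to know how many GPUs are visible. The sketch below is one possible approach under a stated assumption: it only detects NVIDIA GPUs, by calling the `nvidia-smi -L` command when that tool is installed. GPUs from other vendors are not detected, and the function names are illustrative.

```python
import shutil
import subprocess


def parse_gpu_list(output):
    """Return the GPU description lines from `nvidia-smi -L` style output."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line.startswith("GPU ")]


def list_nvidia_gpus():
    """Return NVIDIA GPU descriptions, or an empty list if none are detected."""
    if shutil.which("nvidia-smi") is None:
        return []  # Driver tools not installed, so assume no NVIDIA GPU.
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    print(f"NVIDIA GPUs detected: {len(gpus)}")
    for gpu in gpus:
        print(f"  {gpu}")
```

A result of zero does not rule out the tabular data projects described above, which typically run well on CPU alone.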
Is it possible to use cloud-based systems?
This course supports both local and cloud-based development environments, including GitHub Codespaces and Cloud Workstation. Cloud platforms offer considerable benefits, such as standardized development environments that simplify team collaboration and centrally managed security for your data. Make sure you understand each service's setup requirements and monitor your resource usage, especially when working within free tiers or usage quotas.
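Scripts sometimes need to know whether they are running in a cloud environment, for example to avoid starting a long job against a limited quota. The sketch below checks for GitHub Codespaces, assuming the `CODESPACES` environment variable is set to `true` there, as GitHub documents for codespaces. No check is included for Cloud Workstation because its markers are not covered here, and the function name `detect_environment` is hypothetical.

```python
import os


def detect_environment(env=None):
    """Return a short label for the current development environment."""
    env = os.environ if env is None else env
    # Assumption: GitHub Codespaces sets CODESPACES=true in every codespace.
    if env.get("CODESPACES") == "true":
        return "GitHub Codespaces"
    return "local or unrecognized"


if __name__ == "__main__":
    print(f"Environment: {detect_environment()}")
```

Passing a dictionary as `env` makes the function easy to try without changing your real environment, for instance `detect_environment({"CODESPACES": "true"})`.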