1.4. Git

What is Git?

Git is a distributed version control system that is integral for managing both small and large projects effectively. It excels in tracking source code changes during software development, enabling multiple developers to collaborate on the same project without conflicts. Git is highly regarded for its robust data integrity, versatility, and support for complex, nonlinear development workflows.

Why do you need Git?

Git serves several critical purposes in software development:

Version Control: It meticulously tracks and manages changes to your project, offering the ability to revert to previous states, compare changes across timelines, and more.
Collaboration: Git facilitates simultaneous collaboration among multiple contributors on the same project. It supports branching and merging strategies, allowing seamless teamwork without risking data overwrite.
Backup and Restore: With changes stored in a repository, Git acts as a backup mechanism. You can revert your project to a prior state or retrieve lost data as needed.
Branching and Merging: Git enables you to create branches for experimenting or developing new features independently of the main project, which can later be merged back into the mainline without disrupting the ongoing development.

How can you install Git?

To install Git, consult the Git Installation Guide, which offers comprehensive instructions for a variety of operating systems. This ensures you can efficiently set up Git on any development environment.

# installation on mac with brew
brew install git

# install on linux with apt
sudo apt install git

How should you use Git for your project?

For Git beginners, starting with a foundational tutorial, such as the one provided by GitHub's Git Tutorial, is recommended. Here's a simplified guide to using Git in your project:

Initialize Git: In your project directory, execute git init to create a new local Git repository.
Stage Files: Stage files for your next commit with git add <file>, for instance, git add README.md LICENSE.txt.
Check Status: Use git status to view staged changes, unstaged changes, and untracked files.
Commit Changes: With git commit -m "Initial Commit", commit your staged changes to the repository, including a descriptive message about the changes.

Should you commit every file in your project?

When using Git, it's important to selectively track files. Consider the following guidelines:

Exclude Secrets: Sensitive data, such as API keys and passwords, should never be committed to your repository.
Manage Large Files: For files exceeding 100MB (e.g., dataset files), use Git Large File Storage (git-lfs) instead of directly committing them to your Git repository.
Omit Cache Files: Do not track temporary or environment-specific files (e.g., .venv, mlruns, log files) that don't contribute to the project's primary function.

To exclude certain files and directories from being tracked, create a .gitignore file in your project's root directory. This file should list patterns to match filenames you wish to exclude, for example:

# https://git-scm.com/docs/gitignore

# Build
/dist/
/build/

# Cache
.cache/
.coverage*
.mypy_cache/
.ruff_cache/
.pytest_cache/

# Editor
/.idea/
/.vscode/
.ipynb_checkpoints/

# Environs
.env
/.venv/

# Project
/docs/*
/mlruns/*
/outputs/*
!**/.gitkeep

# Python
*.py[cod]
__pycache__/

Adhering to these practices ensures your repository remains streamlined, containing only pertinent project files and thus enhancing the clarity and efficiency of your development process.