Remote Repositories, Part 1: Remote Collaboration

Amir Ebrahimi Fard
Data Management for Researchers
7 min read · Jul 26, 2021

Photo by NASA on Unsplash

If you want to collaborate on a project using Git, you need to know how to work with remote repositories. A remote repository is hosted in a different location from where Git is locally operating (e.g., on the Internet or network somewhere). It works as a hub through which collaborators can remain synced with each other. Each collaborator periodically downloads the most recent changes from the remote repository to their local repository and uploads their contributions to the remote repository. This dynamic makes remote and scalable collaboration possible and easy. Figure 1 displays how remote repositories fit into the Git ecosystem.

Figure 1: Remote repositories and the Git ecosystem.

Two kinds of collaboration in Git

To start collaborating on a project using Git, you should first know whether or not you are a member of the project in the repository manager (e.g., GitHub, GitLab, Bitbucket, or another remote repository hosting platform). This is a key question, since project membership (or lack thereof) determines the options available and the best practices for collaborating and contributing. As a project member, you can make changes directly to files and folders within the project. If you are not a member, you are still able to contribute! You can do so indirectly by creating a fork (copy) of the project¹, changing it in the ways you want, and sending a request to the project owners asking them to approve the change(s).

Suppose you would like to improve one of the algorithms implemented in the scikit-learn library [1]. Let’s assume you are not one of the project members. First, go to the project page on the remote repository platform (in this case, GitHub). In the top-right corner of the page, there is a button that says “Fork”. Clicking it copies the most recent version of the project to your account, where you are free to make changes that are not reflected in the original (yet).

Figure 2: Forking a remote repository.

See below that after forking, a copy of the scikit-learn repository appears in my own repository list. From now on, I can make any changes to this repository without affecting the original.

Figure 3: Forked repository among the user’s repositories.

Next, you can clone (download) the project from the remote repository to your local computer. To do this, go to the newly created project page in your repository list, click the green button that says Code at the top right of the page, and copy the URL (Figure 4). Running git clone <remote_repository_address> <repository_name>² in your CLI copies the most recent version of the repository, including all files and branches, to your local machine.

Figure 4: Copying the project URL in our remote repository.
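To see the optional folder name in action without touching GitHub, here is a small sketch that uses a bare repository on the local disk as a stand-in for the remote. The names demo-project.git and my-copy are made up for illustration; with a real project you would pass the URL you copied instead of the local path.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"

# Stand-in for the remote repository (assumption: a local bare repo
# replaces the URL you would copy from the Code button).
git init --quiet --bare demo-project.git

# 1) No folder name given: Git derives the folder from the repository name.
#    Git may warn that the repository is empty; that is expected here.
git clone --quiet demo-project.git

# 2) Folder name given: the clone lands in the folder you chose.
git clone --quiet demo-project.git my-copy

ls -d demo-project my-copy   # both working copies now exist
```

Both folders are independent clones of the same remote; the second argument only changes where the clone is placed on your machine.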

You can make your changes in your local repository and commit them as explained previously. When all changes are complete and committed locally, submit them to your remote repository (the fork) using the git push command. Your remote repository is then up to date. The last step is to propose your updated version of the scikit-learn repository to the original repository using a pull request. To send a pull request, go to the project page in your repository and click the Pull Request link (Figure 5). A new page opens that lets you review the changes you are suggesting to the authors of the original repository. After making sure that everything looks good, click the green button that says “Create pull request” to send your suggestions to the owners of the scikit-learn repository. They will have the chance to review the proposed changes. If approved, the changes you made will be incorporated into the scikit-learn library. Figure 6 summarises all the steps for collaborating on a project without being a member.

Figure 5: Sending a pull request from a forked repository to the original one.
Figure 6: Remote collaboration in a project without being a member of it.
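The command-line part of the steps above can be sketched as follows. This is only an illustration under a few assumptions: local bare repositories (original.git and fork.git) stand in for the original project and your fork on GitHub, and the file name, commit messages, and user identities are placeholders. Opening the pull request itself happens on the hosting platform’s web page, so it is not part of the sketch.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"

# Setup: a stand-in "original" project with one commit on main.
git init --quiet original-src
git -C original-src config user.name "Project Owner"        # placeholder identity
git -C original-src config user.email "owner@example.com"
echo "original algorithm" > original-src/algorithm.txt
git -C original-src add algorithm.txt
git -C original-src commit --quiet -m "Initial version"
git -C original-src branch -M main
git clone --quiet --bare original-src original.git

# Step 1: fork. On GitHub this is the Fork button; here we copy the bare repo.
git clone --quiet --bare original.git fork.git

# Step 2: clone your fork to your local machine.
git clone --quiet fork.git my-work
cd my-work
git config user.name "Example Contributor"                   # placeholder identity
git config user.email "contributor@example.com"

# Step 3: make a change and commit it locally.
echo "improved algorithm" > algorithm.txt
git commit --quiet -am "Improve the algorithm"

# Step 4: push the commit to your fork (the clone's "origin").
git push --quiet origin main

# The fork now has the new commit; the original stays unchanged until a
# pull request is accepted.
fork_tip="$(git --git-dir="$workdir/fork.git" log -1 --format=%s main)"
original_tip="$(git --git-dir="$workdir/original.git" log -1 --format=%s main)"
echo "fork:     $fork_tip"
echo "original: $original_tip"
```

The last two lines make the key point of forking visible: pushing only updates your copy, and the original project changes only when its owners accept your request.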

A different workflow applies when you are a project member. In this case, most of the time³ you can apply your changes directly to the remote repository. First, clone the remote repository (the original, no forking needed) to your local machine. You can do this by going to the repository page and copying the remote URL by clicking on the green button that says Code. Then use git clone <remote_repository_URL> <repository_name> in your CLI to clone (download) the current version of the repository to your local machine. After making modifications and committing the changes locally, push them to the remote repository using the git push command. Figure 7 summarises all the steps for collaboration when you are a project member.

Figure 7: Remote collaboration in a project when a user is a project member.
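For comparison, here is a minimal sketch of the member workflow. As an assumption for the demo, a local bare repository (project.git) plays the role of the project on the hosting platform, and the file name, commit messages, and identities are placeholders. After the push, the local branch and the remote branch should point at the same commit.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"

# Setup: a stand-in for the hosted project, with one commit on main.
git init --quiet seed
git -C seed config user.name "Project Owner"       # placeholder identity
git -C seed config user.email "owner@example.com"
git -C seed commit --quiet --allow-empty -m "Initial commit"
git -C seed branch -M main
git clone --quiet --bare seed project.git

# Member workflow: clone the original directly, no fork needed.
git clone --quiet project.git project
cd project
git config user.name "Example Member"              # placeholder identity
git config user.email "member@example.com"

# Modify, commit locally, then push straight to the shared repository.
echo "meeting notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet

# Compare the commit IDs on both sides.
local_head="$(git rev-parse HEAD)"
remote_head="$(git --git-dir="$workdir/project.git" rev-parse main)"
echo "local:  $local_head"
echo "remote: $remote_head"
```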

Initiating a remote repository in Git

Creating a repository works in more or less the same way no matter which remote repository service you use. Here, we show the steps to create a new remote repository on GitHub. After logging in to your account, click the plus sign at the top-right corner of the page next to your avatar. Choose “New repository” from the dropdown menu (Figure 8).

Figure 8: Creating a repository on GitHub.

This will bring you to a new page where you will be asked to add essential information about the repository. The primary field is “Repository name”, which is required to create a new repository. You can also decide whether you want to make this repository public or keep it private. There are three checkboxes at the bottom of the page. The first one automatically adds a README.md file to the repository. Similarly, the second checkbox adds a .gitignore file to the repository⁴. The third checkbox adds a license to the repository. There are some reasons you may not want to add these files just yet, mainly if you already have a local repository and want to push its contents to GitHub as a remote copy. In this example, we fill in only the repository name and then click “Create repository” (Figure 9).

Figure 9: Adding the essential information to create a new repository.

The next page contains a set of quick guidelines on how to set up and connect to your new repository⁵. If you already have a local Git repository on your machine, you can push the existing directory to your remote repository using these three commands in your CLI:

git remote add origin https://github.com/<your_github_username>/<your_repository_name>.git
git branch -M main
git push -u origin main

Remember, you’ll need to initialise a Git repository in your local directory and create at least one commit before pushing to any remote, so if you haven’t done that yet, follow the steps below:

echo "# SOME COMMENTS" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/<your_github_username>/<your_repository_name>.git
git push -u origin main
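If you would like to rehearse this sequence before using a real GitHub repository, the sketch below runs the same commands against a local bare repository. The path remote.git, the folder my-project, and the user identity are assumptions made for the demo. It also shows what the -u switch does: it records origin/main as the upstream of your local main branch, so later git push and git pull calls know where to go without extra arguments.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"

# Stand-in for the empty repository you just created on GitHub.
git init --quiet --bare remote.git

mkdir my-project
cd my-project
echo "# SOME COMMENTS" >> README.md
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git add README.md
git commit --quiet -m "first commit"
git branch -M main                          # make sure the branch is called main

# Same as in the article, with a local path instead of the GitHub URL.
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Ask Git which remote branch the local main branch now tracks.
upstream="$(git rev-parse --abbrev-ref 'main@{upstream}')"
echo "main tracks: $upstream"   # prints: main tracks: origin/main
```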

Remotes in Git

Every remote repository has a unique URL. This makes remote collaboration possible by allowing multiple contributors to establish a connection between their local machines and the same remote repository. When a remote repository is cloned, Git automatically stores its URL under a short name (a “remote”) called origin. You can also use git remote add <remote_name> <remote_repository_URL> to make this assignment manually or to register additional remotes, in case you want to keep your project synced with multiple remote repositories. You can view the list of remotes at any time using the git remote command. Adding the -v switch (git remote -v) also shows the full remote address(es).
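As a quick illustration, the sketch below registers two remotes in a fresh repository and lists them. The URLs and the name backup are hypothetical placeholders; git remote add only records the address and does not contact the server, so the example runs without network access.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"
git init --quiet demo
cd demo

# Register two remotes under different names (placeholder URLs).
git remote add origin https://github.com/example-user/example-repo.git
git remote add backup https://gitlab.com/example-user/example-repo.git

# Names only.
git remote

# Names with their fetch and push addresses.
git remote -v

# Look up the address stored under a single name.
origin_url="$(git remote get-url origin)"
echo "origin points to: $origin_url"
```

With more than one remote configured, you choose the destination by name, for example git push backup main.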

File transmission between local and remote repositories

So far we have discussed how to clone (download) a remote repository to a local machine (using the clone command) and how to submit changes to a remote repository (using the push command). But what if one of our collaborators makes some changes while we’ve been working and we want to incorporate those changes into our local version? We don’t have to clone the entire repository every time someone makes a change in order to keep the local repository in sync with the remote one. That would be cumbersome and inefficient, as we’d have to delete the existing repository and clone it again. To keep everyone in sync, Git provides the fetch and pull commands, which download updates from the remote repository. They are similar but not identical; the next article explains their differences.

Figure 10: File transmission between local and remote repositories.
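The scenario above can be tried out locally with two clones of the same repository. This is a rough sketch under stated assumptions: shared.git is a local bare repository standing in for the remote, the folders me and collaborator represent two people’s machines, and the file name and identities are placeholders. Only git pull is shown here; how it relates to git fetch is left to the next article.

```shell
set -eu
workdir="$(mktemp -d)"   # scratch directory, so nothing real is touched
cd "$workdir"

# Setup: a shared stand-in remote with one commit on main.
git init --quiet seed
git -C seed config user.name "Project Owner"       # placeholder identity
git -C seed config user.email "owner@example.com"
git -C seed commit --quiet --allow-empty -m "Initial commit"
git -C seed branch -M main
git clone --quiet --bare seed shared.git

# Two people clone the same remote.
git clone --quiet shared.git me
git clone --quiet shared.git collaborator

# The collaborator commits a new file and pushes it.
cd collaborator
git config user.name "Example Collaborator"        # placeholder identity
git config user.email "collaborator@example.com"
echo "new analysis" > analysis.txt
git add analysis.txt
git commit --quiet -m "Add analysis"
git push --quiet

# Back on "my" machine: download and apply the update, no re-cloning needed.
cd "$workdir/me"
git pull --quiet --ff-only
ls analysis.txt   # the collaborator's file is now in my working copy
```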

Footnotes

  1. This process is called forking.
  2. The last argument is optional: if we do not specify <repository_name>, the project is cloned into a folder with the same name as the remote repository.
  3. Sometimes a change to the project, even by a member, is subject to review by at least one other member.
  4. It is important to note that, by checking either of the first two checkboxes (or both of them), we are making the first commit to the repository.
  5. If you check the “Add a README file” item on the previous page, then clicking the “Create repository” button takes you directly to your repository page instead of the setup page.

Amir Ebrahimi Fard
Postdoc Researcher on AI Explainability - Interested in the intersection of data, algorithm, and society.