Guide for Data and Computation Team Members¶
This is a short write-up to facilitate the spin-up of new team members for the Data and Computation Team and to describe regular maintenance tasks.
Onboarding¶
Checklist for new members¶
[ ] Ask to be added to the Data and Computation Github team
[ ] Ask to be added to the @data-and-compute Slack user group
[ ] Subscribe to Relevant Slack channels
[ ] Consider enabling notifications for Relevant github repos
[ ] Make a PR to the _config.yaml file here to add a picture and your personal data to the webpage.
[ ] Get access to the Grafana Dashboard
[ ] Request access to a service account to monitor Google Dataflow and Storage from the Google Cloud Console by raising an issue here
Instructions for admin:
Go to the Google Cloud Console > IAM > Grant Access
Add the following permissions:
Dataflow Admin
Storage Admin
Logs Viewer
Monitoring Viewer
Compute Viewer
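For admins who prefer the command line, the same grant could be scripted. The sketch below only builds `gcloud projects add-iam-policy-binding` command strings and runs nothing. The project ID and service account email are placeholders, and the role IDs are the standard identifiers that we assume correspond to the display names listed above. Verify them in the IAM console before running any of the printed commands.

```python
# Role display names from the list above mapped to IAM role IDs.
# Assumption: these IDs match the roles the console shows under those names.
ROLES = {
    "Dataflow Admin": "roles/dataflow.admin",
    "Storage Admin": "roles/storage.admin",
    "Logs Viewer": "roles/logging.viewer",
    "Monitoring Viewer": "roles/monitoring.viewer",
    "Compute Viewer": "roles/compute.viewer",
}


def grant_commands(project_id: str, service_account_email: str) -> list[str]:
    """Return one gcloud command per role; nothing is executed here."""
    member = f"serviceAccount:{service_account_email}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role={role}"
        for role in ROLES.values()
    ]


if __name__ == "__main__":
    # Hypothetical project ID and service account, for illustration only.
    for cmd in grant_commands(
        "my-project-id",
        "dct-team-jane@my-project-id.iam.gserviceaccount.com",
    ):
        print(cmd)
```

Printing the commands first, rather than executing them, leaves room to review each binding before it is applied.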
Relevant Slack channels¶
data-and-computation-team: Private channel for internal discussions. Please contact the Manager for Data and Computing to be added.
leap-pangeo: Community-wide channel where LEAP-Pangeo users can ask questions. Members of the team should join the channel and regularly engage with issues raised.
Relevant github repos¶
leap-stc.github.io: The source for the LEAP-Pangeo technical documentation. This also contains the LEAP-Pangeo Discussion Forum.
data-management: Contains all the pangeo-forge based code/automation to ingest new datasets. Point users to the issue tracker to request new datasets.
Regular Maintenance¶
This section documents common tasks performed as part of the Data and Computation Team’s duties.
Monitor User Directory Usage¶
The user directories of all members are on a shared volume of fixed size. We pay for the total size of the volume whether or not it is used, so we should strive to keep usage to a minimum. Our current policy is that less than 50GB per user is acceptable. Unfortunately, we do not have a technical way to enforce per-user quotas, so we need to regularly check two things:
How much space is left on the entire shared volume? If the free space falls below 20% we need to discuss further actions.
Which users are taking up more than the allowed 50GB? For users who take up more than 50GB, the suggested action is to reach out via Slack or email.
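Both checks can be approximated with the Python standard library when run from a session that has the shared volume mounted. This is a minimal sketch, not the team's established tooling: it assumes each user has one directory directly under a common root (the path is whatever you pass in), that "50GB" means decimal gigabytes, and that summed file sizes are a good enough proxy for space actually used on disk.

```python
import os
import shutil

GB = 10**9  # assumption: the 50GB policy uses decimal gigabytes
USER_LIMIT_BYTES = 50 * GB
MIN_FREE_FRACTION = 0.20  # below this, further actions should be discussed


def free_fraction(path: str) -> float:
    """Fraction of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def directory_size(path: str) -> int:
    """Sum of file sizes in bytes below `path`; symlinks are skipped."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable
    return total


def users_over_limit(home_root: str, limit: int = USER_LIMIT_BYTES) -> dict[str, int]:
    """Map each directory under `home_root` larger than `limit` to its size."""
    over = {}
    for entry in os.scandir(home_root):
        if entry.is_dir(follow_symlinks=False):
            size = directory_size(entry.path)
            if size > limit:
                over[entry.name] = size
    return over


def report(home_root: str, limit: int = USER_LIMIT_BYTES) -> list[str]:
    """Build human-readable lines for both checks described above."""
    lines = []
    if free_fraction(home_root) < MIN_FREE_FRACTION:
        lines.append("Free space is below 20%: discuss further actions")
    for user, size in sorted(users_over_limit(home_root, limit).items()):
        lines.append(f"{user}: {size / GB:.1f} GB")
    return lines
```

Calling `report` with the root of the user directories returns one line per user above the limit, plus a warning line if free space is low. Walking every file can be slow on a large volume, so it is best run occasionally rather than continuously.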
Regularly Updating the Software Environment¶
We aim to provide users with up-to-date default software environments. This currently requires changing the pangeo-docker-images tag manually.
Make sure you are subscribed to release notifications on pangeo-docker-images to receive GitHub notifications about new releases.
To bump the version, submit a PR to this config file in the 2i2c infrastructure repo. In that PR you need to change the image tag for all image choices, see an example here.
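Changing the tag for every image choice by hand is easy to get wrong, so a small helper can do the substitution on a local copy of the config before you open the PR. The layout of the config file is not shown in this guide, so the sketch below assumes image references of the form `pangeo/<image-name>:<tag>`. The function name and the tags used in the usage note are hypothetical.

```python
import re

# Assumption: image references in the config look like "pangeo/<image-name>:<tag>".
IMAGE_TAG = re.compile(r"(pangeo/[\w.-]+):([\w.-]+)")


def bump_image_tags(config_text: str, new_tag: str) -> str:
    """Return `config_text` with the tag of every pangeo image set to `new_tag`."""
    return IMAGE_TAG.sub(lambda m: f"{m.group(1)}:{new_tag}", config_text)
```

For example, applying `bump_image_tags(text, "2024.02.02")` to a file containing `pangeo/pangeo-notebook:2023.01.01` and `pangeo/pytorch-notebook:2023.01.01` would rewrite both references to the new tag and leave images from other sources untouched. Review the resulting diff against the example PR before submitting.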
To send emails, a token file is set up as the OAUTH_GMAIL_CREDENTIALS Github Secret in the member management repo. Every so often (around biweekly) the action will require re-authentication, generating a new token that should replace the existing secret. This makes use of OAUTH_GMAIL_CLIENT_SECRET, which never needs to change. To update the OAUTH_GMAIL_CREDENTIALS secret, run the generate_emails_token(github_action_mode=False) function from utils.py locally, which will direct you to a confirmation screen. Log in to the leap.pangeo@gmail.com account and authorize access. You will need a copy of the CLIENT_SECRETS file on your personal machine, which can be retrieved from the Google Cloud Console using the above pangeo support email account.
Offboarding members¶
[ ] Delete the personal dct-team-<first_name> service account in IAM (needs admin privileges).