Introduction
The Online team has a weekly firefighter rotation. Every Monday, a new firefighter takes over responsibility as the first line of defense for any incidents or issues impacting Banno Online. View the weekly schedule on the Online Firefighter Calendar.
During the rotation, the firefighter is expected to:
- Process customer issues.
- Monitor #org-online.
- Handle PagerDuty alerts 24/7. Acknowledge each alert as soon as it is received, and resolve it once the underlying issue is fixed.
- Create/manage fixes for the aforementioned issues, or ask others to help fix an issue.
The firefighter should pull in other team members and co-workers as needed to resolve an issue. All team members should generally be reachable through Slack DM, SMS, and phone call, except when on PTO.
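As a small illustration of the Monday handoff described above, the sketch below computes the rotation week that contains a given date. The function name `rotation_window` is hypothetical, and the sketch assumes a rotation runs from Monday through the following Sunday; the Online Firefighter Calendar remains the source of truth for who is on duty.

```python
from datetime import date, timedelta


def rotation_window(day: date) -> tuple[date, date]:
    """Return (monday, sunday) for the rotation week containing `day`.

    Assumption: a rotation starts on Monday and ends the following Sunday.
    """
    # date.weekday() is 0 for Monday, so subtracting it lands on that week's Monday.
    monday = day - timedelta(days=day.weekday())
    return monday, monday + timedelta(days=6)


if __name__ == "__main__":
    start, end = rotation_window(date.today())
    print(f"Current rotation: {start.isoformat()} to {end.isoformat()}")
```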
Get Firefighter Access
- Firefighter Access will allow the Firefighter to view customer data in the production environment.
- Getting FF access is automated in the #org-firefighter-requests channel. Just post "FF please".
- If you get an automated error like "John Doe is not a banno user", you need to update your Slack account email to match your People email (most likely your JHA email).
PagerDuty Account Setup
Each team member needs a PagerDuty account. Join #org-tech-ops and ask for one to be created for you. Once your account is created, make sure it is configured so that you reliably receive PagerDuty notifications. Downloading the PagerDuty mobile app is recommended.
Process Customer Issues
- Use the Online Triage Dashboard filter in JIRA to see issues that need to be triaged.
- Issues should be claimed or rejected (not necessarily fixed) within 8 business hours.
- Go through each issue one by one, analyzing the description at a high level.
- For each issue that is determined to be outside our domain, follow these steps:
  - Click Automation - Tier-2 Triage V2 - Reject Issue - Web
  - Click Run
  - Click
- For each issue that needs to be fixed by The Online Team, follow these steps:
  - Click Automation -> Tier-2 Triage V2 - Claim for Web
  - Click Run
  - Note: When we do this, a WEB-XXXX ticket is created for us and linked to the related OPS ticket.
- For each ticket created via automation, we need to manually make these changes:
  - Assign to yourself
  - Click
- During investigation, if you find a service in the logs that you are not familiar with, you can identify the owner here. If you need help or clarification, you can reach out in that team’s Slack channel.
- Provide regular updates on the ticket until it’s complete.
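To make the "8 business hours" triage target above concrete, here is a minimal sketch that computes the latest time by which an issue should be claimed or rejected. It assumes a Monday to Friday, 09:00 to 17:00 working day, ignores holidays, and uses naive local timestamps; none of these are defined in this document, and `triage_deadline` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta

# Assumed working hours (not specified in this document).
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def next_business_open(ts: datetime) -> datetime:
    """Move `ts` forward to the nearest moment inside business hours."""
    while True:
        # Weekend, or at/after closing time: jump to the next day's opening.
        if ts.weekday() >= 5 or ts.time() >= WORK_END:
            ts = datetime.combine(ts.date() + timedelta(days=1), WORK_START)
            continue
        # Before opening on a business day: wait for opening.
        if ts.time() < WORK_START:
            ts = datetime.combine(ts.date(), WORK_START)
        return ts


def triage_deadline(created: datetime, budget_hours: int = 8) -> datetime:
    """Return the time at which `budget_hours` business hours have elapsed."""
    remaining = timedelta(hours=budget_hours)
    cursor = next_business_open(created)
    while True:
        day_end = datetime.combine(cursor.date(), WORK_END)
        available = day_end - cursor
        if remaining <= available:
            return cursor + remaining
        # Use up the rest of today and continue on the next business day.
        remaining -= available
        cursor = next_business_open(day_end)


if __name__ == "__main__":
    created = datetime(2024, 3, 1, 15, 0)  # a Friday afternoon
    print(triage_deadline(created))
```

Under these assumptions, an issue created on a Friday at 15:00 uses two business hours that day and the remaining six on Monday.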
Tools
- Reporting tool: This Google Chrome extension provides several features that make it easy to look up information about users, institutions, and more.
- Debug tool - Production & Debug tool - UAT: Used to retrieve data for users and institutions in both the Production and UAT environments. Note: See instructions here.
- Postman: Used to retrieve data for users and institutions in both the Production and UAT environments. Offers more flexibility than the Node Debug Customer Issue tool. Note: You must have FF Access and must add the eauth cookie to the request.
- DataDog: Logs. Please be aware that rehydrating logs is very expensive for JH.
- People: Uses Banno Online components and can view Production data if FF Access has been requested.
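The Postman entry above requires an eauth cookie on each request. For anyone scripting the same kind of lookup, the sketch below shows one way to attach such a cookie using Python's standard library. The URL is a placeholder, the helper name `build_debug_request` is hypothetical, and the sketch assumes the cookie is literally named `eauth`; it only builds the request and does not send it.

```python
from urllib.request import Request


def build_debug_request(url: str, eauth_cookie: str) -> Request:
    """Build (but do not send) a GET request carrying the eauth cookie.

    Assumption: the cookie name is `eauth`. The URL is supplied by the caller;
    no real endpoint is implied here.
    """
    if not eauth_cookie:
        raise ValueError("An eauth cookie value is required (FF Access must be active).")
    return Request(
        url,
        headers={
            "Cookie": f"eauth={eauth_cookie}",
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder URL and cookie value, for illustration only.
    req = build_debug_request("https://example.invalid/users/123", "PASTE_COOKIE_VALUE")
    print(req.get_method(), req.full_url)
    print(req.get_header("Cookie"))
```

To actually send the request, pass it to `urllib.request.urlopen`; treat the cookie value as a credential and keep it out of shared scripts and logs.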
Incidents/Outside of Work Hours
PagerDuty is shared amongst the Online team and will be used to notify the firefighter on duty if/when there is an incident.
- In Slack, typing /pd-online YOUR MESSAGE anywhere will create an incident and notify the active firefighter.
General Questions
For general questions on issues, it is best to contact the firefighter in the #org-online room. For a faster response, use the @online-firefighter handle, which pings the firefighter directly without creating an incident in PagerDuty.