-
Check for an outage using these tools:
- DNSSEC Analyzer: https://dnssec-analyzer.verisignlabs.com/
- DNSViz: https://dnsviz.net/
-
Run dig @8.8.8.8 domain.com
- If the flags include ad and the status is NOERROR, DNSSEC is working. Treat this state as a possible outage.
- If the status is SERVFAIL, DNSSEC may be failing. Treat this state as an outage.
- Run dig @8.8.8.8 domain.com +cd to see whether the SERVFAIL goes away. If it does, this indicates a DNSSEC-related SERVFAIL. Treat this state as an outage.
- Check whether the problem is specific to DNSSEC by running dig for the host in question with +cd.
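The header checks above can be sketched as a small shell helper. This is only a sketch under stated assumptions, not part of the official procedure: the function names classify_dig and dnssec_related and the labels they print are hypothetical, and the parsing assumes the usual dig header layout (a ->>HEADER<<- line carrying the status, followed by a ;; flags: line).

```shell
# classify_dig: read full dig output on stdin and summarize the header.
# Hypothetical helper; the printed labels are illustrative only.
classify_dig() {
    awk '
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242"
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # e.g. ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, ..."
            flags = $0
            sub(/^;; flags: */, "", flags)
            sub(/;.*$/, "", flags)
            n = split(flags, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == "ad") ad = 1
        }
        END {
            if (status == "NOERROR" && ad)  print "validated"
            else if (status == "SERVFAIL")  print "servfail"
            else                            print "inconclusive"
        }
    '
}

# dnssec_related: succeed when the plain query failed but the +cd query did not.
# $1 = classify_dig result for the plain query, $2 = result for the +cd query.
dnssec_related() {
    [ "$1" = "servfail" ] && [ "$2" != "servfail" ]
}

# Example use (needs network access, so it is left commented out):
#   plain=$(dig @8.8.8.8 domain.com | classify_dig)
#   with_cd=$(dig @8.8.8.8 domain.com +cd | classify_dig)
#   dnssec_related "$plain" "$with_cd" && echo "DNSSEC-related SERVFAIL"
```

A +cd response normally carries the cd flag rather than ad, so it is reported as "inconclusive" here; the only thing dnssec_related looks at is whether the SERVFAIL went away.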
-
Check whether the DS record for the domain matches what is at the registrar. To see what is at the registrar, either check the WHOIS record for the domain or run this dig:
-
dig DS domain.com @TLD-Nameserver
-
For example, if the domain is a .com domain named domain.com, the dig would be
- dig DS domain.com @a.gtld-servers.net
-
Compare these answers to the DS records listed in the DNSSEC section of the zone in the user interface
- If one or more DS records match, proceed to Step 3
- If no DS records match, that is the problem. Unsign the zone at the registrar by having the DS records removed. (The zone does not need to be unsigned on our side.) This ends the outage. Wait two days to allow the wrong DS to expire, then supply the correct DS to the registrar. This will enable the zone to be signed again at the registrar.
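The DS comparison above can be sketched in shell. This is a minimal sketch with hypothetical names (normalize_ds, ds_match); it assumes each input file holds one DS record per line in the four-field layout that dig +short DS prints (key tag, algorithm, digest type, digest), and that the DS values from the user interface have been pasted into a file in that same layout.

```shell
# normalize_ds: read DS rdata lines ("keytag algorithm digest-type digest")
# on stdin and print them in one canonical form. dig may split a long digest
# into space-separated chunks, so everything from field 4 on is rejoined.
normalize_ds() {
    awk 'NF >= 4 {
        digest = ""
        for (i = 4; i <= NF; i++) digest = digest $i
        print $1, $2, $3, toupper(digest)
    }' | sort -u
}

# ds_match: $1 = file of DS records seen at the TLD name server,
# $2 = file of DS records copied from the user interface.
# Prints the records present in both; returns 0 if at least one matches
# and 1 if none do.
ds_match() {
    tld_tmp=$(mktemp) || return 2
    portal_tmp=$(mktemp) || return 2
    normalize_ds < "$1" > "$tld_tmp"
    normalize_ds < "$2" > "$portal_tmp"
    common=$(comm -12 "$tld_tmp" "$portal_tmp")
    rm -f "$tld_tmp" "$portal_tmp"
    if [ -n "$common" ]; then
        printf '%s\n' "$common"
        return 0
    fi
    return 1
}

# Example use (needs network access, so it is left commented out):
#   dig +short DS domain.com @a.gtld-servers.net > tld_ds.txt
#   ds_match tld_ds.txt portal_ds.txt || echo "no DS records match"
```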
-
- Check whether any RRset has multiple members and, if so, ensure they all have the same TTL. If they do not, have the customer set the same TTL on every record in the RRset.
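The TTL check can be sketched as a filter over dig answer lines. The function name find_ttl_mismatch is hypothetical, and the sketch assumes the standard "name ttl class type rdata" answer layout. It is best fed from an authoritative name server, since a recursive resolver reports the remaining cache lifetime rather than the configured TTL.

```shell
# find_ttl_mismatch: read dig answer lines ("name ttl class type rdata") on
# stdin and print each RRset (name, class, type) whose members disagree on TTL.
# RRSIG lines are skipped because signatures covering different types share
# the RRSIG type but may legitimately carry different TTLs.
find_ttl_mismatch() {
    awk '!/^;/ && NF >= 5 && $4 != "RRSIG" {
        key = tolower($1) " " $3 " " $4
        if (!(key in ttl)) ttl[key] = $2
        else if (ttl[key] != $2) bad[key] = 1
    }
    END { for (k in bad) print k }' | sort
}

# Example use (needs network access; the name server is a placeholder):
#   dig @AUTHORITATIVE-NAMESERVER domain.com A +noall +answer | find_ttl_mismatch
```

No output means every RRset in the input had a consistent TTL.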
-
Check whether the zone is on the old signer or the new signer
-
Browse to the DNSSEC tab of the domain inside the customer portal. If the algorithm type is 13, it is the new signer. If the algorithm type is 8, it is the old signer.
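The algorithm-to-signer mapping above is small enough to write down as a lookup. The function name signer_for_algorithm is hypothetical; the mapping itself comes from this runbook, and the algorithm mnemonics in the comments are the standard DNSSEC names for those numbers.

```shell
# signer_for_algorithm: map the DNSSEC algorithm number shown in the portal
# to the signer generation described above. Hypothetical helper.
signer_for_algorithm() {
    case "$1" in
        13) echo "new signer" ;;          # 13 = ECDSAP256SHA256
        8)  echo "old signer" ;;          # 8  = RSASHA256
        *)  echo "unknown"; return 1 ;;   # anything else is not covered here
    esac
}
```

The portal remains the reference for this step. As a cross-check, the algorithm number is also the third field of each line printed by dig +short DNSKEY domain.com, assuming the published keys reflect what the portal shows.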
-
If new signer and possible outage:
- Have the customer run digs or nslookups against whichever name server they are seeing problems with, from a location where they see the problem. (In other words, replace 8.8.8.8 in the commands above with that name server.) This will help isolate a local problem.
- Escalate (PagerDuty) to traffic operations for further troubleshooting. Include the tests done in previous steps. You do not need to wait for the customer's digs before opening a ticket with engineering; they can do their own testing while we wait for the customer's response.
-
If new signer and outage:
- Escalate (PagerDuty) to traffic operations for further troubleshooting. Include the tests done in previous steps. You do not need to wait for the customer's digs before opening a ticket with engineering; they can do their own testing while we wait for the customer's response.
-
If old signer, and possible outage or outage:
- Browse to the zone in UltraAdmin and click Full Re-Sign. Re-run the tests from Step 1.
- If re-signing cleared the error, let the customer know the issue is resolved and ask whether they still see any problems. Open a JIRA ticket (no PagerDuty escalation) that lists the tests done so far and asks for an investigation into why a full re-sign was required.
- If re-signing did not clear the error, escalate (PagerDuty) to traffic operations for further troubleshooting. Include the tests done in previous steps. You do not need to wait for the customer's digs before opening a ticket with engineering; they can do their own testing while we wait for the customer's response.
- At this stage, the problem is either fixed or escalated.
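The escalation branches above can be summarized as a lookup table. This is only a sketch of the decision logic as written in this runbook: the function name next_action, its argument values, and the wording of the printed actions are all hypothetical.

```shell
# next_action: print the next step for a given signer and state.
# $1 = signer ("new" or "old")
# $2 = state ("possible-outage" or "outage")
# $3 = old signer only: whether a Full Re-Sign cleared the error
#      ("yes" or "no"; leave empty before the re-sign has been tried)
next_action() {
    case "$1:$2:$3" in
        new:possible-outage:*)
            echo "request customer digs; escalate via PagerDuty" ;;
        new:outage:*)
            echo "escalate via PagerDuty" ;;
        old:possible-outage:|old:outage:)
            echo "full re-sign; re-run Step 1 tests" ;;
        old:possible-outage:yes|old:outage:yes)
            echo "confirm with customer; open JIRA" ;;
        old:possible-outage:no|old:outage:no)
            echo "escalate via PagerDuty" ;;
        *)
            echo "unknown input"; return 1 ;;
    esac
}
```

For example, next_action old outage no prints the PagerDuty escalation, matching the case where a full re-sign did not clear the error.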
-
Additional Information
Add an RRSIG expiration check.
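One possible shape for such a check is sketched below. The runbook does not define this step yet, so everything here is an assumption: the function name rrsig_status is hypothetical, and the parsing assumes the dig +short RRSIG layout, in which field 1 is the covered type and field 5 is the signature expiration as YYYYMMDDHHMMSS in UTC.

```shell
# rrsig_status: read dig +short RRSIG lines on stdin and report, for each
# signature, whether its expiration time has already passed.
# $1 optionally overrides "now" (YYYYMMDDHHMMSS, UTC); default is the clock.
# Output: "<covered-type> <expiration> expired|valid", one line per signature.
rrsig_status() {
    now=${1:-$(date -u +%Y%m%d%H%M%S)}
    awk -v now="$now" 'NF >= 9 && length($5) == 14 && $5 ~ /^[0-9]+$/ {
        print $1, $5, (($5 + 0 < now + 0) ? "expired" : "valid")
    }'
}

# Example use (needs network access, so it is left commented out):
#   dig +short RRSIG domain.com @8.8.8.8 +cd | rrsig_status
```

Using +cd in the example lets the signatures be retrieved even when validation is failing, which is the case where an expired RRSIG is most likely to be the cause.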