|
It is CENIC's goal not only to maintain high standards of network reliability and availability but also to assure clear, consistent, and established
communications with universities, colleges, and K12 node sites in the event of service incidents.
- Purpose:
The following describes the procedures and timeframes associated with problem management.
- Scope:
The CENIC network consists of the CENIC backbone and the connections from the backbone to universities, colleges, K12 node sites, peer and transit
networks, and such connections to legacy or other networks as may from time to time be established. All CENIC-operated network segments are supported
according to the procedures described herein.
CENIC NOC services and support should be requested by each site's predetermined primary and secondary contacts, or by a central point of contact, rather than by a
variety of local area network end users.
NOC services and support can be accessed by two methods:
- Telephone: 714-220-3494, or
- E-mail: noc@cenic.org
Telephone contact with the NOC is the most immediate means of receiving any type of service or support, e.g. inquiring about network
outages, performance degradation, abuse reporting, or deployment items. The NOC staff immediately logs all calls requiring new service or support
in the CENIC trouble ticket system and provides the caller with a ticket number for future reference. Additionally, the NOC includes the caller's
e-mail address in the Requestor field of the ticket so that the caller/Requestor can receive and post updates via the ticket. During network events
impacting a single site, the NOC will provide callers an update via phone for Urgent and High priority tickets at standard intervals.
E-mail to noc@cenic.org automatically creates a ticket in the trouble ticket system, which is then automatically
e-mailed to all NOC staff.
Additionally, an automated response e-mail is returned to the sender, which includes the CENIC ticket number in the subject line. For example,
[CENIC#1722] Berkeley power maintenance. Any future e-mails using this same subject line will automatically post ticket entries with the
e-mailed message content.
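As an illustration of how a reply can be matched to its ticket, the following minimal Python sketch extracts the ticket number from a subject line. The pattern is an assumption based only on the example above, and the function name is hypothetical; it does not describe the trouble ticket system's actual implementation.

```python
import re
from typing import Optional

# Assumed tag format, taken from the example "[CENIC#1722]". Optional
# whitespace before "#" is tolerated because the closure notice in this
# document writes the tag as "[CENIC #1722]".
TICKET_TAG = re.compile(r"\[CENIC\s*#(\d+)\]")


def ticket_number(subject: str) -> Optional[int]:
    """Return the ticket number found in an e-mail subject line, or None."""
    match = TICKET_TAG.search(subject)
    return int(match.group(1)) if match else None


print(ticket_number("Re: [CENIC#1722] Berkeley power maintenance"))  # 1722
print(ticket_number("Berkeley power maintenance"))  # None
```

A subject line without the tag yields None, which corresponds to the case where the e-mail opens a new ticket rather than posting to an existing one.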
To assure that the NOC can provide up-to-date information on any ticket, the trouble ticket system contains detailed information on each problem and
is accessible by all NOC and CENIC engineering personnel. All progress toward resolution is fully detailed within the ticket, and notification of these
updates and actions is distributed to the NOC team via e-mail. Additionally, any change in the status of a ticket (new, open, stalled, resolved, deleted)
is e-mailed to the NOC team.
At the time an incident is reported and a trouble ticket is created, the incident is classified with an appropriate degree of severity. Incident severity
is classified as one of the following:
- Urgent: (path to resolution needed within 0-59 minutes)
- High: (path to resolution needed within 1-24 hours)
- Medium: (path to resolution needed within 24-72 hours)
- Abuse: (path to resolution needed within 5 business days)
- Low: (research and data collection action is needed)
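The time windows above can be expressed as a small lookup, sketched here in Python. The function and constant names are hypothetical, the upper bound of each window is used as the deadline, and business days are assumed to be Monday through Friday with holidays ignored.

```python
from datetime import datetime, timedelta
from typing import Optional

# Upper bound of each "path to resolution" window from the list above.
WINDOWS = {
    "Urgent": timedelta(minutes=59),
    "High": timedelta(hours=24),
    "Medium": timedelta(hours=72),
}
ABUSE_BUSINESS_DAYS = 5


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by whole business days, skipping Saturday and Sunday.

    Holidays are not considered; that is an assumption of this sketch.
    """
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            days -= 1
    return current


def resolution_path_due(severity: str, opened: datetime) -> Optional[datetime]:
    """Latest time by which a path to resolution is needed.

    Returns None for Low, which has no time window. An unknown
    severity raises KeyError.
    """
    if severity == "Low":
        return None
    if severity == "Abuse":
        return add_business_days(opened, ABUSE_BUSINESS_DAYS)
    return opened + WINDOWS[severity]
```

For example, an Abuse ticket opened on a Monday morning would be due the following Monday under these assumptions, because the intervening weekend is skipped.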
An Urgent designation assumes a network or a key network resource is down and unavailable. All failures where there are no redundant
facilities are classified as Urgent. This classification of incident receives immediate action, including a notification to CENIC management: the Network
Operations Manager and Director, the Director of Design and Engineering, the CTO, and the CEO.
A High designation assumes that the network or a resource within it is suffering from some sort of unacceptable degradation but is not
completely down. This designation is also used when a network resource is down but an operating redundant resource is available. It also receives
immediate action, including a notification to CENIC management. The primary difference in this classification is that the immediate action is evaluated against
Urgent incidents, which allows the NOC to prioritize multiple service-affecting incidents occurring at the same time and allocate resources accordingly.
A Medium classified ticket relates to a network problem or situation that does not have a major impact on the network as a whole. However,
it is a matter that requires action toward resolution within three days.
The Abuse classification is used to track complaints regarding spam, violations of an acceptable use policy (AUP) or other similar problem
reports not related to a network outage.
Tickets are classified as Low when no further action is required in the problem resolution cycle. Such a
ticket remains open to collect further information regarding the nature of a problem or resolution, or as a reminder to observe a newly repaired resource.
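The classification descriptions above can be summarized as a decision sequence. The Python sketch below is a simplification under stated assumptions: the function name and its boolean inputs are hypothetical, and in practice classification is a judgment made by NOC staff rather than a mechanical rule.

```python
def classify(down: bool, degraded: bool, redundant: bool,
             abuse: bool = False, action_required: bool = True) -> str:
    """Map a simplified incident description to a severity classification.

    down            -- the network or a key resource is down and unavailable
    degraded        -- unacceptable degradation, but not completely down
    redundant       -- an operating redundant resource exists
    abuse           -- spam, AUP violation, or similar non-outage complaint
    action_required -- further action is needed in the resolution cycle
    """
    if abuse:
        return "Abuse"
    if down and not redundant:
        return "Urgent"
    if down or degraded:
        # Down with an operating redundant resource, or degraded.
        return "High"
    if action_required:
        return "Medium"
    return "Low"


print(classify(down=True, degraded=False, redundant=False))  # Urgent
print(classify(down=True, degraded=False, redundant=True))   # High
```

The order of the checks matters: a failure without redundant facilities is tested before the general down-or-degraded case so that it is never downgraded to High.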
To maintain high standards of service, the NOC uses a combination of phone calls, text messaging to cell phones, and e-mails to
ensure that any member of the CENIC team may be reached regardless of their location.
These notifications occur for each incident classified as Urgent or High and are sent as soon as practical after the problem begins,
but in no case later than 30 minutes after the problem is observed.
For Urgent tickets, a notification is sent hourly to CENIC Management with problem status, until the problem is resolved.
For High tickets, a notification with problem status is sent twice daily to CENIC Management, at approximately 0800 and 1600, until the
problem is resolved.
When the incident is resolved, a final e-mail notification is sent to CENIC Management summarizing the problem and its final resolution.
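The notification timing described above can be sketched as follows. This is an illustrative Python example, not the NOC's tooling: the function names are hypothetical, and times are treated as naive local times with the 0800 and 1600 slots taken as exact rather than approximate.

```python
from datetime import datetime, time, timedelta

FIRST_NOTICE_LIMIT = timedelta(minutes=30)
HIGH_UPDATE_TIMES = (time(8, 0), time(16, 0))


def first_notice_deadline(observed: datetime) -> datetime:
    """Latest time for the first Urgent/High notification."""
    return observed + FIRST_NOTICE_LIMIT


def next_status_update(severity: str, last_sent: datetime) -> datetime:
    """Time of the next status notification to CENIC Management."""
    if severity == "Urgent":
        return last_sent + timedelta(hours=1)
    if severity == "High":
        # Next 0800 or 1600 slot strictly after the last notification.
        for slot in HIGH_UPDATE_TIMES:
            candidate = datetime.combine(last_sent.date(), slot)
            if candidate > last_sent:
                return candidate
        next_day = last_sent.date() + timedelta(days=1)
        return datetime.combine(next_day, HIGH_UPDATE_TIMES[0])
    raise ValueError("periodic notifications apply only to Urgent and High")
```

A High ticket last reported at 1000 would next be reported at 1600 the same day, and one last reported at 1600 would next be reported at 0800 the following day.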
CENIC strives to assure that escalation is seamless to those who report network problems. When a problem is beyond the capability of the
NOC Engineering team or when an incident is classified as Urgent or High and cannot be isolated and a potential solution identified within the
first hour, the NOC escalates problems to the Core Engineering team. This escalation may occur within a shorter timeframe, as the NOC team may
realize immediately that additional support is needed.
The Core Engineer is informed of the problem or failure and is provided with all supporting information. At this point a strategy is decided upon
and documented in the trouble ticket.
If a problem cannot be isolated and a potential solution identified within the first hour by a Core Engineer, the NOC facilitates the escalation
of the problem to other Core Engineers and/or vendors.
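The one-hour escalation rule can be sketched as a small state step. The Python below is a simplified illustration: the tier labels, function name, and parameters are hypothetical, and it models only the time-based rule, not earlier escalation when a problem is recognized as beyond the NOC Engineering team's capability.

```python
from datetime import timedelta

ESCALATION_WINDOW = timedelta(hours=1)
TIERS = [
    "NOC Engineering",
    "Core Engineering",
    "Other Core Engineers and/or vendors",
]


def next_tier(current: str, severity: str, time_at_tier: timedelta,
              solution_identified: bool) -> str:
    """Return the tier that should hold the problem after this check."""
    if solution_identified:
        return current
    index = TIERS.index(current)
    if index == len(TIERS) - 1:
        return current  # already at the last tier modelled here
    if index == 0 and severity not in ("Urgent", "High"):
        return current  # the time-based rule covers Urgent and High only
    if time_at_tier >= ESCALATION_WINDOW:
        return TIERS[index + 1]
    return current
```

For instance, an Urgent problem that has spent an hour with NOC Engineering without a potential solution moves to Core Engineering, and a further hour there without progress moves it to other Core Engineers and/or vendors.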
At any time during the process of requesting service and support from the CENIC NOC, a site contact may request an increase in the classification
of their incident and ticket. Upon such a request, the CENIC NOC will follow the corresponding procedures for notification and escalation of the incident.
Upon resolution of an incident, the NOC Engineer will change the status of the ticket to resolved, which triggers an email notification to the client stating the following:
Ticket [CENIC #1722] regarding:
" Berkeley power maintenance ",
is shown as completed in our records
and will be closed unless you respond
to this notice requesting additional
support and/or information.
Thank You,
CENIC Network Operations
714-220-3494 or 714-220-FIXIT
www.cenic.org
All trouble tickets that were categorized as Urgent or High are reviewed jointly with the Director of Network Operations and the
NOC Manager in a post-resolution analysis meeting. The purpose of the meeting is to review all of the data relevant to the problem
with a goal of improving the network and/or the operational procedures.
This analysis is also available to our clients upon request.
To ensure proper communication during network problems, the NOC utilizes several methods of information sharing. The individual(s)
and/or group(s) that initially reported the Urgent or High ticket are copied on all notifications.
Notifications are sent in 3 phases:
- 1. Initial Status Report
This is sent as soon as a problem has been reported and a problem ticket is opened. The notification may not initially identify the
cause or source of difficulty, but it contains the ticket number and reports which network components are affected, the status of their
functionality, and the scope of the outage in relation to the network as a whole. This notification is sent to ops-announce@lists.cenic.org.
- 1.5 Identification
This notification states the cause and source of the problem, if not already conveyed in the Initial Status Report, and what course of
corrective action is being followed. An estimated time of resolution is given, if at all possible. This notification is sent to
ops-announce@lists.cenic.org.
- 2. Updates
Periodic updates are given hourly for Urgent tickets and twice daily for High tickets until the problem has been resolved. These
updates are sent to ops-problem-detail@lists.cenic.org, a more specific list for clients who are most interested in the details of
progress toward resolution. Any new information, milestones, or setbacks are included. Updates are also sent when
new information of a significant nature becomes available, such as an extension of the estimated time to repair.
- 3. Closure
Upon closure, a resolution synopsis is prepared and distributed immediately. This notice includes details regarding the final
resolution. Any other important pieces of information are also disclosed. Review of the completed trouble ticket is available upon
request. This notification is again sent to the broader ops-announce@lists.cenic.org list in order to ensure closure across the board.
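The routing of each notification phase to its mailing list can be summarized in a lookup table. The following Python sketch uses the list addresses given above; the phase keys and the function name are hypothetical labels chosen for illustration.

```python
from typing import List

# Mailing list used for each notification phase, as described above.
PHASE_LIST = {
    "initial_status_report": "ops-announce@lists.cenic.org",
    "identification": "ops-announce@lists.cenic.org",
    "update": "ops-problem-detail@lists.cenic.org",
    "closure": "ops-announce@lists.cenic.org",
}


def recipients(phase: str, reporters: List[str]) -> List[str]:
    """Return the addresses for one notification.

    The individuals or groups that initially reported the Urgent or
    High ticket are copied on every notification.
    """
    return [PHASE_LIST[phase], *reporters]


print(recipients("update", ["contact@example.edu"]))
# ['ops-problem-detail@lists.cenic.org', 'contact@example.edu']
```

Only the periodic updates go to the narrower detail list; the first two phases and the closure go to the broader announcement list.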
It is the primary responsibility of the NOC to troubleshoot problems on the network. However, this is often a collaborative effort
with CENIC vendors and CENIC Members. Although each vendor maintains their own trouble ticket system, information is to be shared
between parties in a collaborative effort to resolve the problem. Once a trouble ticket is opened with a vendor, the NOC updates the
trouble ticket with the relevant information.
The NOC prepares two weekly status reports. The first report contains a list of all open trouble tickets sorted by severity (Urgent,
High, Medium, Low, Abuse), in ascending order by date opened (oldest to newest) within each severity. The second report contains a
list of all tickets closed since the date of the previous report. These reports are posted to the CENIC web site at
http://noc.cenic.org by 0900 on the first business day of each week.
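The ordering of the first weekly report can be sketched as a two-level sort. In the Python example below, the ticket fields and the function name are hypothetical; only the ordering rule comes from the description above.

```python
from datetime import date
from typing import Dict, List

# Group order used in the open-ticket report, as listed above.
REPORT_ORDER = ["Urgent", "High", "Medium", "Low", "Abuse"]


def open_ticket_report(tickets: List[Dict]) -> List[Dict]:
    """Sort open tickets by classification, then oldest to newest."""
    return sorted(
        tickets,
        key=lambda t: (REPORT_ORDER.index(t["severity"]), t["opened"]),
    )


tickets = [
    {"id": 3, "severity": "High", "opened": date(2024, 1, 5)},
    {"id": 1, "severity": "Urgent", "opened": date(2024, 1, 6)},
    {"id": 2, "severity": "High", "opened": date(2024, 1, 2)},
]
print([t["id"] for t in open_ticket_report(tickets)])  # [1, 2, 3]
```

Sorting on a tuple keeps the groups in the listed order while ordering tickets by date opened within each group.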

|