Manage

When the solution is designed, developed, and deployed, another job begins that may be a bit unfamiliar to many: Management. Regardless of how much development is happening, we still have a responsibility to manage what we roll out into production (or to other environments).

The solution must be monitored. We must also ensure that we have regular backups _that are also tested_, keep disaster recovery plans up to date, follow up on vulnerable dependencies, and much more.

1 - Verify the Design

When developing a solution, we should always validate that the solution adheres to the design. If it deviates, we must either correct the solution or update the design.

When we create a design for a new solution, there may be details we do not know, or unexpected complications may arise during implementation. This can result in the original design deviating from the final solution.

Documentation is crucial for understanding how a solution is set up and how it works, especially if an incident occurs that requires redeployment or disaster recovery. To ensure that the gap between documentation and the final product is not too large, we should always validate the design afterward.

What Should We Check?

One of the most important aspects is everything around the code that may not necessarily be in code form. This includes the resources we use, network setup, and firewall openings. We should also review IAM and the permissions granted to resources and applications.

When the design is verified during the operations phase, the team should at minimum check:

  • that implemented components actually match system diagrams
  • that the documented security requirements are covered in the solution and in operations
  • that network, identities, roles, and access rules align with the design
  • that dependencies (internal and external) are documented and still valid
  • that the team has a routine to maintain the design alongside the application

If there are elements in the design that are not implemented, these should be removed from the documentation. If we have implemented elements not in the design, the design should be updated, or the elements should be removed from the solution.
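The comparison between design and solution can be sketched as a simple set difference. This is only a minimal illustration, assuming the designed and deployed components are available as lists of names (for example exported from diagrams and from the cloud platform). The function `compare_design` and all component names are hypothetical.

```python
def compare_design(designed, deployed):
    """Return components missing from the solution and components missing from the design."""
    designed_set, deployed_set = set(designed), set(deployed)
    return {
        # In the design, but not found in the running solution
        "not_implemented": sorted(designed_set - deployed_set),
        # Running in the solution, but not described in the design
        "not_in_design": sorted(deployed_set - designed_set),
    }

# Hypothetical component names, for illustration only
designed = ["frontend", "backend", "database", "key-vault"]
deployed = ["frontend", "backend", "database", "storage-account"]

drift = compare_design(designed, deployed)
print(drift)
```

Each entry in either list is a decision for the team: correct the solution or update the design.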

Requirements → Design → Implementation

To make verification traceable, the team should use a simple traceability matrix:

  • requirement ID → design choice → implemented control → test/evidence

This makes it easier to document that requirements are actually realized and provides a clear basis for audit, handover, and incident response.
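A traceability matrix can live in a spreadsheet or wiki, but it can also be kept as data and checked automatically. The sketch below assumes one row per requirement and flags rows where a link in the chain is still empty. The class `TraceRow`, the requirement IDs, and the evidence file name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TraceRow:
    requirement_id: str
    design_choice: str = ""
    implemented_control: str = ""
    evidence: str = ""

    def gaps(self):
        """Names of the links in the chain that are still empty."""
        fields = ("design_choice", "implemented_control", "evidence")
        return [name for name in fields if not getattr(self, name)]

# Hypothetical rows, for illustration only
matrix = [
    TraceRow("REQ-001", "All traffic over TLS", "TLS enforced at gateway", "scan-report.pdf"),
    TraceRow("REQ-002", "Least privilege for service accounts", "Role assignments in IaC"),
]

# Requirements that cannot yet be traced all the way to evidence
incomplete = {row.requirement_id: row.gaps() for row in matrix if row.gaps()}
print(incomplete)
```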

Examples of how this can be demonstrated:

  • IaC configuration and policy definitions
  • screenshots/exports from cloud platform showing actual setup
  • results from security testing
  • logs confirming controls are active

Verification of AI Systems

For solutions with AI components, verification must extend beyond classic infrastructure and application design.

The team should verify:

  • that AI requirements are linked to concrete design choices and implemented controls
  • that training/evaluation data, models, and versions are documented
  • that access to model, data, and operational interfaces follows least privilege
  • that logging and monitoring cover AI-related events and anomalies

This supports control areas such as “AI system verification and validation” and “Documentation of AI system design and development”.

Verify Evaluation Results

For AI solutions, it is not enough to verify that the service responds. The team must also verify that the model’s results continue to fall within accepted parameters.

At minimum:

  • defined acceptance criteria for quality (e.g., precision, recall, or domain-specific metrics)
  • documented test sets and evaluation method
  • comparison against previous baseline when model or data changes
  • clear decision on approval or rejection of new version

If deviations occur, the team must document actions and any risk assessment before further production use.
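The approval decision can be supported by a small automated gate. The sketch below is one possible approach, not a prescribed method: it assumes the metrics are already computed on a documented test set, and the thresholds, metric values, and the function `evaluate_candidate` are hypothetical.

```python
def evaluate_candidate(candidate, baseline, minimums, max_regression=0.02):
    """Return (approved, reasons) for a candidate model's metrics."""
    reasons = []
    for metric, minimum in minimums.items():
        value = candidate[metric]
        # Check against the defined acceptance criterion
        if value < minimum:
            reasons.append(f"{metric} {value:.2f} is below the acceptance criterion {minimum:.2f}")
        # Check against the previous baseline
        drop = baseline[metric] - value
        if drop > max_regression:
            reasons.append(f"{metric} dropped {drop:.2f} against the baseline")
    return (not reasons, reasons)

# Hypothetical values, for illustration only
minimums = {"precision": 0.90, "recall": 0.85}
baseline = {"precision": 0.93, "recall": 0.88}
candidate = {"precision": 0.94, "recall": 0.84}

approved, reasons = evaluate_candidate(candidate, baseline, minimums)
print(approved, reasons)
```

The list of reasons is a natural starting point for the documented actions and risk assessment when a version is rejected.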

Documentation and Traceability

In practice, the documentation should make it possible to answer:

  • which requirement underlies a given design choice
  • which version of model/prompt/policy is in operation
  • who approved the change and on what basis
  • what evidence shows that controls are working

Traceable documentation reduces troubleshooting time during incidents and makes re-verification easier after major changes.

How Can We Check?

This depends greatly on the form and nature of the project, but in many cases, the IT organization at the customer (for projects hosted at the customer) can help. If the solution runs at Bouvet, Internal IT & Security can certainly assist with checking things like network configurations or point you in the right direction. There is also much you can do yourself, but check with Internal IT & Security before installing tools and running scans or similar.

A practical minimum is to conduct verification:

  • when making major changes in architecture or dependencies
  • before production rollout of high-risk changes
  • as part of regular operations routine (for example, quarterly)

2 - Audit or Review of Project or Delivery

The customer or recipient may require an audit of the delivery. The team must then be able to document requirements, design choices, security measures, and how these have actually been followed up in practice.

Not all deliveries are audited, but if a customer or recipient asks for an audit, the team must be able to show more than that the solution works. An audit is often about documenting that requirements are understood, that the right controls are chosen, and that these are actually implemented and followed up.

Such an audit will typically be anchored in a contract, legal requirement, or internal governance requirements at the customer. In practice, it is most relevant at delivery or after the solution has been in operation for a while.

What Must Be Documented?

Requirements will vary, but the team should at minimum be able to show:

  • which requirements the delivery must meet
  • which design choices were made and why
  • which security measures are implemented
  • who is responsible for operations, access, and follow-up
  • which deviations, risk assessments, and accepted exceptions exist

This does not necessarily mean large document packages. What is important is that the documentation is current, traceable, and accessible to those who actually need it.

Before an Audit

The easiest way to handle audits is to be prepared before the customer asks. The team should therefore have a conscious approach to where documentation is located, who owns it, and how it is kept current.

It is particularly useful to clarify:

  • who represents the team in the audit
  • where key documents and evidence are sourced from
  • which parts of the documentation can be shared directly, and what requires special consideration
  • how deviations, exceptions, and risk decisions are documented

An audit becomes cumbersome if information exists but is scattered across different systems, folders, and individuals.

Typical Evidence in an Audit

An audit will often ask for concrete proof, not just descriptions. This could include:

  • system diagrams and overview of dependencies
  • security requirements and how they are implemented in design and operations
  • documentation of architecture, processes, and decisions
  • results from testing, verification, and reviews
  • logs or reports showing that controls are actually active
  • overview of roles, access, and approvals

If the customer has declined recommended measures or additional services, this must also be clearly documented. It is important both for expectation management and to explain remaining risk.

Share Only What Is Necessary

An audit does not mean all documentation should be shared uncritically. Some material may contain sensitive information about vulnerabilities, internal networks, access, or weaknesses not yet closed.

The team should therefore consider:

  • whether the audit requires full insight or whether summaries/evidence suffice
  • whether parts of the material must be protected or shared in controlled form
  • how access to audit material is logged and restricted

The goal is to be open enough to document compliance without exposing more than necessary.

Audit of AI Systems

For solutions with AI components, the team must additionally be able to document:

  • which models, versions, and data sources are used
  • which requirements apply to quality, security, and use of the model
  • how evaluation results have been assessed and approved
  • how changes to model, prompt, policy, or dataset are tracked
  • how logging and monitoring support operations, incident handling, and traceability

It will also be natural to be able to explain:

  • what the system’s intended use and limitations are
  • which human control points exist
  • how the team detects degradation, misuse, or unexpected model behavior

After the Audit

An audit is not finished when the meeting is over. The team must ensure that findings, deviations, and recommendations are assessed and followed up like any other improvement work.

At minimum:

  • register findings with clear ownership
  • assess severity and timeline for follow-up
  • document any disagreements or clarifications with the customer
  • update design, routines, or documentation if the audit revealed gaps

Keep It Simple and Organized

The most important thing is rarely to produce more documentation. What matters is being able to clearly answer what was built, why it was built that way, and what evidence shows that the controls work.

3 - Logging and Monitoring

When a solution is in operation, logging is one of the most important tools we have. Collecting information is critical to gaining insight into what is happening with the solution and responding to events, but only if we monitor it.

Regardless of where a solution is deployed, we should ensure that it is monitored. Even if it is only available on the intranet with only internal users working from approved devices over VPN, logging information is important if one of these users or devices is compromised. A typical DevOps team will collect some information to help debug the application’s functionality, but we also need other information to assess the security context around it.

Remember

Regardless of the need, remember that privacy applies to logs as well! Do not collect more information than you need, and logs must be deletable after a given period.

Logging has three primary purposes:

  • Intrusion detection - We must be able to detect if someone is attacking the system
  • Investigation basis - We must have enough information to understand what happened, how it happened, and who did it
  • Satisfy customer or external requirements, such as from authorities

What Should We Log?

What we log will vary greatly depending on who the customer is, the risk and threat landscape they operate in, and their needs for log information. In some cases, the customer will have its own security organization, typically a Security Operations Center (SOC), responsible for monitoring networks and applications. They will then have requirements for what and how to log, but if this does not exist, we must define our own requirements to have a starting point.

Below are some points that should be an absolute minimum, but the team must understand what is logged, why it is logged, and how this information relates to other requirements such as privacy.

Authentications and Failed Authentication Attempts

If someone logs into the solution, this should be logged. This is especially important if it occurs from a place a user does not normally log in from, or if it happens with a different browser or client than usually seen. Failed logins should also be logged so that it is possible to act on them.

Errors during JWT validation or other session-related errors should also be logged so that they can be reviewed afterward.
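To act on failed logins, something has to count them. The sketch below shows one simple way to flag repeated failures for the same user within a time window. The class `FailedLoginTracker`, the threshold, and the window length are hypothetical choices, and in a real solution this logic would normally live in the monitoring platform rather than in the application.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

class FailedLoginTracker:
    def __init__(self, threshold=5, window=timedelta(minutes=10)):
        self.threshold = threshold
        self.window = window
        self._failures = defaultdict(deque)

    def record_failure(self, username, when):
        """Register a failed login; return True when the threshold is reached."""
        attempts = self._failures[username]
        attempts.append(when)
        # Drop attempts that fall outside the time window
        while attempts and when - attempts[0] > self.window:
            attempts.popleft()
        return len(attempts) >= self.threshold

# Hypothetical user and timestamps, for illustration only
tracker = FailedLoginTracker(threshold=3, window=timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
alerts = [tracker.record_failure("alice", start + timedelta(minutes=i)) for i in range(3)]
print(alerts)
```

Only the username and timestamp are kept here, in line with the principle of not collecting more information than needed.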

Unauthorized Access Attempts and Access Changes

Events where users try to access functionality they are not normally authorized for are important signals that must be captured. This could be as simple as a user getting or testing a URL from a colleague, but it could also be an attacker trying to map or test an application. Regardless of the cause, it is important information that must be preserved - if an incident occurs later, it is important to be able to say something about movement patterns and the like leading up to it.

If the application supports elevating or changing permissions, these are also typical events that need to be logged. Elevation is a mechanism where a user is given additional permissions, but these must be “turned on” before they are available - often with an extra level of authentication such as MFA or similar. Examples of such mechanisms are sudo in Linux or Privileged Identity Management (PIM) in Azure. When these are activated, it is important that the logs reflect this since errors or weaknesses in these solutions would be critical for the application’s security.

Application Errors, Network Errors, and Similar

If errors occur in the application, these should also be logged. We should never give the user more information than absolutely necessary, but the details should be included in the logs so that they can be monitored or reviewed later.

If the application relates to the network, for example, by monitoring network connections, connections to other resources, or similar, disruptions or outages here should also be logged as they may be important indicators.

Logging Unexpected Inputs

All applications have inputs that can be described, even free text inputs where the user can enter anything. Inputs that violate validation rules or instances where a user attempts to change information that should not normally be changeable are typical cases that need to be logged.

If the application supports file uploads or similar, deviations from expected files, such as discrepancies between file type and file signature or unusually large or small files, should be logged.
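A mismatch between file type and file signature can be detected by comparing the file extension with the first bytes of the content. The sketch below covers three well-known signatures only. The function `signature_matches` and the file names are hypothetical, and a real solution would typically rely on a maintained library or scanning service.

```python
import logging

logger = logging.getLogger("app.uploads")

# Well-known leading bytes ("magic numbers") for a few file types
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
    ".jpg": b"\xff\xd8\xff",
}

def signature_matches(filename, content):
    """Return True or False for known extensions, None when the extension is not in the table."""
    for extension, magic in SIGNATURES.items():
        if filename.lower().endswith(extension):
            return content.startswith(magic)
    return None

def check_upload(filename, content):
    """Log a warning when the content does not match the claimed file type."""
    result = signature_matches(filename, content)
    if result is False:
        logger.warning("Upload rejected: signature mismatch for %s (%d bytes)", filename, len(content))
    return result

# Hypothetical upload: a file claiming to be a PNG that starts with other bytes
print(check_upload("image.png", b"MZ\x90\x00"))
```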

Logging and Monitoring of AI Responses

Logging and monitoring for AI solutions should be designed as an ongoing operational and governance mechanism that makes it possible to detect failures, anomalies, and security incidents, as well as document that the solution continues to operate as expected in production.

System and performance logs (e.g., error rates, response time/latency, availability, and service quality) should be collected in one place so that alarms or other notifications can be configured for unexpected events. Model/function performance should also be monitored with clear metrics (e.g., task success rate, quality/confidence scores where relevant), including alerts when unexpected behavior changes or degradation occurs over time.
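An alert on degradation over time can be as simple as a success rate calculated over the most recent calls. The sketch below assumes each AI call can be classified as a success or a failure. The class `SuccessRateMonitor`, the window size, and the threshold are hypothetical.

```python
from collections import deque

class SuccessRateMonitor:
    def __init__(self, window_size=100, min_rate=0.9):
        # Only the most recent outcomes are kept
        self.outcomes = deque(maxlen=window_size)
        self.min_rate = min_rate

    def record(self, success):
        """Store one outcome; return True when an alert should be raised."""
        self.outcomes.append(bool(success))
        window_full = len(self.outcomes) == self.outcomes.maxlen
        rate = sum(self.outcomes) / len(self.outcomes)
        # Wait for a full window to avoid alerts based on very few calls
        return window_full and rate < self.min_rate

# Hypothetical sequence of outcomes, for illustration only
monitor = SuccessRateMonitor(window_size=5, min_rate=0.8)
alerts = [monitor.record(ok) for ok in [True, True, True, True, True, False, False]]
print(alerts)
```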

The logs should support the principles outlined at the top of this article and should be included in the process for managing updates and changes to the application. Monitoring should also cover compliance and customer/other requirements, and should have an established support channel so that users can report failures, unexpected results, or misuse, allowing the organization to assess whether the system is being used outside its intended purpose.

How Do We Log?

How we log will also vary from project to project, the platform we run on, and the resources we are allowed to use. An important point to keep in mind when designing the logging solution is that logs are a target for attacks! An attacker who can exploit vulnerabilities and then manipulate the logs can both hide activity and plant false evidence.

All logs should be stored in a place where data can be added but not changed afterward. Centralized log solutions of this kind have the additional advantage that you can collect logs from many different sources, such as cloud resources, network components, and applications, in one place. This lets you look at an incident from several angles, which helps in understanding the overall situation.
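Append-only storage is normally a feature of the log platform. As a complement, and not a replacement, tampering can be made detectable with a hash chain, where each entry includes the hash of the previous one. The sketch below illustrates that technique. The functions `append_entry` and `verify_chain` and the log messages are hypothetical, and the approach only helps if the latest hash is also stored somewhere the attacker cannot reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder hash used before the first entry

def _digest(message, prev_hash):
    body = {"message": message, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(chain, message):
    """Append a log entry that is linked to the previous entry by its hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"message": message, "prev_hash": prev_hash, "hash": _digest(message, prev_hash)})

def verify_chain(chain):
    """Return False if any entry was changed, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        expected = _digest(entry["message"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical log messages, for illustration only
log = []
append_entry(log, "user alice logged in")
append_entry(log, "role changed for bob")
print(verify_chain(log))
```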

Timestamps and Log Format

Being able to determine the sequence of events is incredibly important. We must therefore understand what the different log sources use as the basis for synchronizing clocks internally to be sure that an event on node A is related to another event two seconds later on node B.

It is also important to standardize log formats where possible. Much logging centers around the log message itself, which is typically text-based, but all metadata should be standardized where possible. Define what you need to see and ensure this is available from the various sources.
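One way to standardize metadata is to emit every log entry as JSON with a UTC timestamp in ISO 8601 format. The sketch below uses Python's standard `logging` module. The field names, the logger name `app.security`, and the `event` attribute are assumptions made for this example and should follow whatever the customer or SOC requires.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Format log records as JSON with a fixed set of metadata fields."""

    def format(self, record):
        entry = {
            # Always UTC, so events from different nodes can be ordered
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            # Optional event type, passed through the "extra" argument
            "event": getattr(record, "event", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app.security")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning("Login failed", extra={"event": "auth.failure"})
```

A shared format like this does not replace clock synchronization on the nodes themselves. It only ensures that timestamps are comparable once the clocks agree.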

4 - Dependency Management

The status of the dependencies we have will change over time, and it is inevitable that vulnerabilities will be discovered that we must mitigate. This job can be as simple as updating to a new version, but may also require more significant changes to the application.

When the team is in maintenance mode, most of the issues mentioned in the article on Software Supply Chain still apply. You will encounter situations where:

  • A critical vulnerability is discovered in a package you use
  • Packages are deprecated and replaced with something new that is not directly compatible with the old
  • Developers behind packages stop maintaining them
  • Malicious actors take over a package and use it to spread malware

There are certainly other scenarios that also require action from you. To ensure that packages affected by one or more of the points above are addressed, tools like Sonatype and others can monitor the various stages of the lifecycle and alert you when vulnerabilities or other events affecting quality occur.
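The core of what such tools do for known vulnerabilities can be sketched as a comparison between installed versions and the version where an issue was fixed. The example below is a simplified illustration and does not represent the API of any specific tool. The package names, advisory IDs, and functions are hypothetical, and it assumes plain dotted numeric version numbers.

```python
def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2); assumes plain dotted numeric versions."""
    return tuple(int(part) for part in text.split("."))

def find_vulnerable(installed, advisories):
    """Return installed packages that are older than the fixed version in an advisory."""
    findings = []
    for name, version in installed.items():
        for advisory in advisories.get(name, []):
            if parse_version(version) < parse_version(advisory["fixed_in"]):
                findings.append((name, version, advisory["id"], advisory["fixed_in"]))
    return findings

# Hypothetical packages and advisories, for illustration only
installed = {"examplelib": "1.4.2", "otherlib": "3.0.0"}
advisories = {
    "examplelib": [{"id": "EXAMPLE-2024-001", "fixed_in": "1.5.0"}],
    "otherlib": [{"id": "EXAMPLE-2023-007", "fixed_in": "2.9.1"}],
}

findings = find_vulnerable(installed, advisories)
print(findings)
```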

Internal Components Operated by the Team

Dependency management is not only about packages from external ecosystems. Many teams also operate internal components such as application servers, integration services, containers, or virtual machines. These should be included in the same maintenance routines as the rest of the solution.

Updates and Support

Internal components should be kept up to date through a planned patch routine. Follow vendor release notes and assess what must be upgraded now versus what can be scheduled later. Components that are no longer supported should be phased out.

Security Controls Around Components

Components must be part of the overall security design. The team should maintain an overview of network exposure, identities/roles, and internal and external access paths. Apply a “deny by default” approach and only allow what is explicitly needed.
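In practice, "deny by default" is configured in firewalls, NSGs, or policies, but the principle itself is small enough to show in a few lines. In the sketch below, everything is denied unless it is listed explicitly. The component names, ports, and the function `is_allowed` are hypothetical.

```python
# Explicit allow-list of (source, destination, port); everything else is denied.
# Hypothetical flows, for illustration only.
ALLOWED_FLOWS = {
    ("frontend", "backend", 443),
    ("backend", "database", 5432),
}

def is_allowed(source, destination, port):
    """Deny by default: only flows listed explicitly are permitted."""
    return (source, destination, port) in ALLOWED_FLOWS

print(is_allowed("frontend", "backend", 443))   # listed explicitly
print(is_allowed("frontend", "database", 5432)) # not listed, so denied
```

The same list also serves as documentation of the access paths the team is expected to maintain an overview of.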

Logging and Monitoring

Internal components must be monitored in the same way as other services. Logs should be trustworthy, protected against tampering, and included in incident handling routines. See also Logging and Monitoring.

5 - Preparedness

The team must be able to restore services and data after destructive events. This article is about practical recovery: plans, exercises, and verification that restoration actually works.

An untested backup is worthless. The same applies to a disaster recovery plan that has never been tested.

If the team has done the groundwork well, you have a plan for disaster recovery that describes how infrastructure, applications, and data are restored to normal operations.

The reasons for restoration vary: human error, delivery errors, vendor failure, unavailable infrastructure, or malicious events. The goal is always the same: to reduce downtime and data loss with predictable processes.

Minimum Requirements for Recovery

The team should at minimum have control over:

  • a documented recovery time objective (RTO) and recovery point objective (RPO, the acceptable data loss)
  • verified backups of data, configuration, and dependent artifacts
  • a concrete, step-by-step recovery recipe that can be followed by more than one person
  • necessary access, roles, and access packages for those who will perform the recovery
  • a test environment where the recovery plan can be practiced without affecting production
  • clear criteria for when the system can be reopened to users
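An RPO is only useful if someone checks it. Since the RPO expresses how much data loss is acceptable, the age of the most recent verified backup should never exceed it. The sketch below shows that check. The RPO value, the timestamps, and the function `rpo_breached` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def rpo_breached(last_backup, now, rpo):
    """Return True when the latest backup is older than the acceptable data loss."""
    return now - last_backup > rpo

# Hypothetical values, for illustration only
RPO = timedelta(hours=4)
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last_backup = datetime(2024, 6, 1, 6, 30, tzinfo=timezone.utc)

breached = rpo_breached(last_backup, now, RPO)
print(breached)
```

A check like this fits naturally as a scheduled job that raises an alert in the monitoring solution.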

Testing and Verification

Recovery must be tested regularly. The team should plan exercises that cover both simple errors and more complex scenarios.

Examples of exercises:

  • recovery of a single component (e.g., database or app service)
  • full recovery of a critical service in an alternative environment
  • validation of the backup chain from backup to verified restore
  • exercise where key personnel are not available

After each exercise, you should document what worked, what failed, and what measures need to go on the backlog.
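Part of validating the backup chain can be automated by comparing a checksum recorded at backup time with a checksum of the restored data. The sketch below uses temporary files to stand in for a backup and a restore. The file names and functions are hypothetical, and a matching checksum only shows that the data is intact, not that the application works with it.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Calculate the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def restore_matches(restored_file, expected_checksum):
    """Return True when the restored file is identical to what was backed up."""
    return sha256_of(restored_file) == expected_checksum

# Temporary files standing in for a real backup and restore
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "backup.bin"
    original.write_bytes(b"example data")
    recorded = sha256_of(original)  # Stored at backup time

    restored = Path(tmp) / "restored.bin"
    restored.write_bytes(b"example data")
    ok = restore_matches(restored, recorded)

    restored.write_bytes(b"corrupted")
    corrupted_ok = restore_matches(restored, recorded)

print(ok, corrupted_ok)
```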

An example recipe for the solution outlined in the article on system diagrams could be as follows. The premise of the plan below is that we have source code and pipelines available in, for example, Azure DevOps, and the application and resources have mysteriously disappeared from Azure:

  1. Check that new subscriptions are in place in Azure
    • Configure Azure Pipelines to deploy to these
    • Verify that all Entra groups are available
  2. Deploy infrastructure as code
  3. Configure NSGs and firewalls (if not done as code)
    • Turn off access outside the delivery team to avoid user interference with the restore process
  4. Verify that resources have access to the data platform
  5. Verify access to the database
  6. Restore application and data:
    1. Restore data to the database from the latest backup
    2. Deploy backend
    3. Deploy frontend
  7. Verify that the application works
  8. Publish Power BI report
    • Verify that it can read data from the backend
  9. Turn on access for end-users so they can use the application again

It is worth mentioning that each of the points may need additional information, with references to access packages or group memberships for the person restoring to gain the necessary access.
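A recipe like this can also be expressed as an ordered list of steps with a check for each, so that an exercise stops at the first failure and records how far it got. The sketch below is a hypothetical structure and not part of the recipe itself. The `lambda` functions stand in for real checks, and the step names are taken from the example above.

```python
def run_runbook(steps):
    """Run steps in order; stop at the first failure and report progress."""
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None

# Each action is a placeholder that would be replaced by a real check or deployment
steps = [
    ("Deploy infrastructure as code", lambda: True),
    ("Verify access to the database", lambda: True),
    ("Restore data to the database from the latest backup", lambda: False),
    ("Deploy backend", lambda: True),
]

completed, failed_at = run_runbook(steps)
print(completed, failed_at)
```

The result of each run is useful input when documenting what worked and what failed after an exercise.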

Recovery for AI System Components

If the solution has AI components, the recovery plan must also cover:

  • restoration of model artifacts and versions
  • restoration of configuration for model routing, prompts, and security boundaries
  • restoration of vector indexes/feature stores where used
  • verification of model quality after restore (not only that the service responds)
  • review of AI-related logs so that incident sequences remain traceable

This supports control areas such as “AI system operation and monitoring” and “AI system recording of event logs” in an operational recovery context.

6 - Contingency Plans and Incident Response

The team must know which requirements apply to security incidents, who is responsible, and how notifications and escalation should be handled. This article covers governance, compliance, and coordination.

Many people think of security incidents as targeted attacks where someone attacks a solution by hacking it. In some cases, this may be correct, but an incident can be much more.

NSM defines a security incident as “A deviation situation where there is a potential for loss of confidentiality, integrity, and/or availability of information or ICT services. A security incident can occur as a result of a data attack, technical failure, or unintentional errors.” In other words, an incident can be almost anything that affects confidentiality, integrity, and availability. Different customers have different requirements for when to notify, escalate, and report, and to whom.

What This Article Covers

This article addresses overall incident response and preparedness:

  • which requirements the team must comply with
  • who is responsible for what
  • how notifications and escalation should occur
  • which logs and monitoring must be in place

Operational recovery after destructive incidents is covered in the article about preparedness.

What the Team Must Control

Before an incident occurs, the following must be clarified and documented:

  • contact points with the customer, delivery manager, and possibly SOC/NOC at the customer site
  • clear criteria for when to notify immediately
  • who can authorize actions, downtime, and external communication
  • which requirements apply to reporting, audit, and deviations
  • which dependencies the solution has on other systems

This must be known throughout the team, not just by individual persons.

Monitoring and Logging

Incident response requires that you actually see what is happening in the solution. The team must ensure that necessary logs are collected, protected, and made available for analysis.

For solutions with AI components, this additionally means:

  • monitoring of model and service behavior over time
  • logging of events related to AI calls, anomalies, and access
  • alerts when unexpected changes in quality, response, or behavior occur
  • traceability for decisions affecting security or compliance

This supports control areas such as “AI system operation and monitoring” and “AI system recording of event logs” in ISO 42001.

See also Logging and Monitoring for more detailed information on this topic.

When an Incident Occurs

Incidents can take many forms: vulnerabilities in applications or dependencies, operational anomalies, or active attacks.

If you discover or have reason to believe that a solution is under attack, this must be reported to the customer immediately. The attacked solution is not always the actual target; in many cases, it is just a stepping stone to another one. It is therefore also important to know which access rights and network openings the solution has to other solutions, so the customer’s IT organization can check these for signs of attacks.

If you come across signs that a solution has been attacked or used for an attack, it is also important to notify the customer so they can secure information and evidence for further investigation.

Remember

Incident handling and investigation is a specialized field. If you come across signs that something may have happened, notify your contact point and wait for instructions from them before taking any action.
