In 1955, a group of researchers led by John McCarthy submitted a conference proposal titled “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.” Since that moment, the community of researchers hasn’t stopped developing ways to enable machines to perform tasks that would otherwise require human intelligence.

The launch of ChatGPT in late 2022 brought artificial intelligence (AI), specifically Generative AI (GenAI), to the mainstream. In the past few years, advancements in machine learning models, increased computational power, access to large datasets and other key factors all contributed to the rapid advancement of GenAI technologies, making it one of the most exciting areas in AI today.

AI use cases in security operations

GenAI has many use cases, such as content creation, process automation, chatbots, and virtual assistants. While many new GenAI services exist, vendors often provide GenAI capabilities as part of their existing SaaS offerings. To prepare for the initial adoption of GenAI technologies at Cisco, my colleagues in InfoSec and I conducted initial vendor security assessments of a few GenAI offerings from third-party vendors.

We learned that AI services are designed, developed, tested, deployed, and operated like any other cloud services. The existing security controls and best practices for cloud services also apply to AI services. However, there are additional security concerns when it comes to using Large Language Models (LLMs). Beyond security risks, there are also legal, privacy, and ethical concerns regarding LLMs and AI. Developing a comprehensive AI strategy is essential for enhancing security operations and ensuring responsible AI integration.

AI can be leveraged in various areas within an organization. However, not all organizations are ready to adopt AI in security operations. According to the Cisco Cybersecurity Readiness Index released in March 2024, more than half of organizations have yet to incorporate AI into their security operations to secure networks, identity, devices, and cloud.

My InfoSec colleagues and I set out to become early adopters of GenAI. As we learn more about GenAI, we are always thinking of ways to leverage these new technologies in enterprise security operations. Across Cisco, thousands of business applications are being used every day to run the business. More than half of these applications have sensitive data. The Cisco InfoSec team works hard to enforce measurable security controls for enterprise applications. This had us wondering how we could use AI to move faster and scale more effectively to meet the demands of our security operations. Conversations with others on the InfoSec teams led us to many potential use cases of using AI to deliver automated and agile business processes.

Specifically, we identified these four types of use cases in security operations:

Security Chatbots: AI-powered chatbots can handle routine security queries, troubleshoot issues, provide the most relevant solutions instantly, and give security guidance according to Cisco policies and industry best practices.
Process Automation: AI-powered automation carries out repetitive tasks, such as assessment reporting, stakeholder communication, security risk tracking, and performance monitoring. This not only saves time but also reduces the likelihood of human errors.
Threat Detection and Prevention: AI-powered tools can analyze patterns in security audit logs, identify anomalies, correlate data from multiple sources, and detect potential threats in real-time. Incident response teams can leverage these tools to perform investigations with speed and efficiency. They can also turn manual tasks into automated detection and response playbooks.
Predictive Analytics: By analyzing historical security data, AI tools can predict potential future threats and vulnerabilities, allowing teams to proactively enhance their security measures.

Challenges of using LLMs out-of-box

Among the four use cases, the security chatbot was selected by our team as the first GenAI project. Playing with the OpenAI GPT models, we quickly realized that there are challenges of using LLMs. These foundation models are powerful out of box, but they were not trained with specific knowledge needed to respond properly to our prompts. Foundation models are limited to everything they learned during model training, known as parametric knowledge. They can’t account for current events or private data. To overcome this limitation, we needed to supply our own data.

These are four common ways to do it, listed from highest to lowest cost:

We can develop a new model using our own data. This option involves the most significant investment in terms of resources, expertise, and financial allocation, making it an infrequent choice for AI adoption among many teams.
We can fine-tune a foundation model with our own data. This is common if we plan to develop a niche AI product for the market. It can be expensive and requires large amounts of data, time, and expertise.
We can add our own data in the system prompt as context to an LLM. This technique retrieves data outside of a foundation model and uses that data to augment input prompts to a model to generate output. It is known as Retrieval Augmented Generation (RAG) and is now commonly used by AI application developers. Using both parametric knowledge and context data, AI can generate responses with greater precision.
We can train our users to write better prompts and supply their own data in the user prompt. This doesn’t require any developer resources, but the effectiveness may vary from user to user. It’s ad hoc, not repeatable, and not scalable.

Both options three and four are part of a relatively new discipline, called prompt engineering, for developing and optimizing prompts to efficiently use LLMs. They are not without limitations. The most common one is token limit, which is a restriction on the number of tokens that an LLM can process in one interaction. LLMs can have different token limits; for example, GPT3.5 has a token limit of 4,096 while GPT4 has a token limit of 32,768. A common English word may consume one to several tokens. In general, LLMs with larger token limits can provide better answers, at a slower response speed and higher operating cost compared to LLMs with smaller token limits.

We are also concerned about the consequences of sending private data to LLMs. It’s imperative to conduct a comprehensive assessment of LLM providers with respect to security, privacy, and regulatory compliance. Inquiries should include, among others:

Will our private data be captured by the vendor, thereby posing a risk of exposure online or to an external entity?
Will our private data be used by the vendor to train models?
Was the training data for the model sourced responsibly?
Will there be copyright or legal constraints if we use the LLM output?

Lessons learned from building an AI chatbot

For the last few months, I worked with a few other security engineers to prototype a proof-of-concept chatbot application. Our project provides a web chat interface, a grand agent that orchestrates multiple tools carrying out different functions based on user inputs, with backend API connection to LLMs and data sources. The entire application runs in a Docker container that can be deployed to any container platform.

In May 2023, the Open Web Application Security Project (OWASP), a non-profit organization that works to improve the security of software, released the top 10 most critical vulnerabilities in LLM applications to inform people about the potential security risks when deploying and managing LLMs.

Among the OWASP top 10, prompt injection is listed as the topmost critical one. As we worked on developing our own AI chatbot applications, we became familiar with the risk of prompt injection. Prompt injection is like other injection attacks commonly seen in applications. The AI chatbot interface makes it easier to inject malicious prompts that could override system prompts. This type of attack on LLMs has been reported¹ by researchers in the field as early as 2022. While the exact prompt injection can no longer be re-produced today, there are many variances of prompt injections.

For example, in one of our AI tools, users can ask for security risk information, and the LLM in the backend is smart enough to formulate an SQL query to retrieve relevant data from a backend database. If we don’t put guardrails around the database access used by the LLM, users could potentially instruct the LLM to take more actions in the database than intended for the tool. Since the small group of developers working on the AI project are all security engineers, we consciously incorporated security best practices into our architectural designs and throughout the DevOps lifecycle. However, such practices might not be the standard among other teams.

Hallucination is another big issue in LLMs. In our AI project, although we set the temperature to 0 and instructed the LLM not to make up answers, it still occasionally does. To measure the accuracy of outputs from AI tools, we implemented Python parametrized tests and used AI to verify output. In our test scripts, we asked the AI tools seven different questions, then repeated this process 10 times. On average, three out of 70 questions came back with wrong answers. That’s a 95.7% accuracy rate. This experiment taught us that output verification is essential in AI applications.

Guidance on successful adoption of GenAI

Through security research, prototyping an AI chatbot, and collaborating with other teams of AI adopters, we gained knowledge and expertise. For any team considering AI, we put together this short list of guidance on successful AI adoption:

Digitize business data and workflows. Automation enables both efficiency and scalability in our processes. To do that, we need digitized data and well-defined workflows. As mentioned earlier, we can make better use of AI with our private data.
Make sure data is easy for AI to understand. If the data is hard for a human being to understand, AI will struggle as well. We noticed that LLM hallucinates the most when it doesn’t have a good understanding of the data. Transforming existing data and providing context in system prompts would help.
Break down complex tasks. The current AI models are better suited for one simple task in each interaction. These simple tasks can be chained together to complete more complex tasks in user prompts or application logic. Complex tasks often take multiple interactions with LLM.
Add authentication and authorization controls for AI tools. These access controls should surround the AI applications and data sources like guardrails. When integrating data sources with the AI tools, AI will not understand who should have access to the data. The access controls must be implemented comprehensively across all layers of an application and its associated data sources to prevent unauthorized access, whether it is from humans, automated processes, or AI systems. Asking AI to enforce access control is bound to fail at times.
Validate AI responses using automated tests. As our test demonstrated, outputs from LLMs can be unpredictable, and must be validated before the response is returned to the user or downstream workflows.

Start using GenAI daily to see its full potential

To conclude, there are many potential use cases of GenAI in security operations. During the development of a prototype AI project, I gained valuable insights into GenAI's capabilities and challenges. Leveraging the guidelines above, I can help with the adoption of GenAI technologies within our teams. Given the unprecedented rate at which GenAI is evolving, it is imperative to start our engagement with these technologies early to remain at the forefront of this rapidly advancing field.

I regularly use GenAI tools for work and personal use. I strongly recommend that others do the same.

It can be as simple as using a chatbot powered by GenAI such as ChatGPT. For people who know how to code, look for opportunities to write simple scripts or apps using LLMs. Together, we will all benefit from learning more about GenAI to make informed decisions regarding these new technologies.

Learn more about how AI and security overlap here.