The next-generation detection engine of LOGTITAN Next-Generation SIEM combines rule-based and ML-based techniques. LOGTITAN Next-Generation SIEM uses machine learning models and advanced correlation rules together and updates both of them dynamically. [1]
Anomaly detection via classification
Anomaly detection in LOGTITAN SIEM infers a probabilistic model of the network behavior of each IP address. Each network event is assigned an estimated probability (henceforth, the event’s “score”). Events with the lowest scores are flagged as “suspicious” for further analysis.
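The flagging step can be sketched as follows, assuming each event already carries its score. The `flag_suspicious` name, the `(event_id, score)` layout and the threshold value are hypothetical; the article does not say how LOGTITAN chooses its cut-off (a fixed threshold and a "lowest N events" rule are both plausible).

```python
from typing import List, Tuple

# Hypothetical event representation: (event_id, score), where score is the
# event's estimated probability under the learned model.
ScoredEvent = Tuple[str, float]


def flag_suspicious(events: List[ScoredEvent], threshold: float) -> List[ScoredEvent]:
    """Return events whose score is below `threshold`, least likely first."""
    flagged = [e for e in events if e[1] < threshold]
    # Sort ascending by score so the most anomalous events are reviewed first.
    return sorted(flagged, key=lambda e: e[1])


events = [("e1", 0.20), ("e2", 1e-6), ("e3", 0.05), ("e4", 3e-4)]
print(flag_suspicious(events, threshold=1e-3))
# [('e2', 1e-06), ('e4', 0.0003)]
```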
LOGTITAN uses Latent Dirichlet Allocation (LDA) [2], a probabilistic topic model, to learn these behavior profiles.
An advantage of LDA over many other clustering algorithms is that those algorithms allow a data point to belong to only one group. There could be hundreds of different groups, but in the end a data point belongs to just one. Real behavior is often not that clear-cut. LDA clusters data so that a point can belong to more than one group, i.e. “soft” clustering, as opposed to “hard” clustering.
Another advantage is that most clustering algorithms rely on a distance function and assign a data point to whichever group it is closest to. LDA instead allows a data point to be only partially similar to other data points. This comes back to the “softness” of the algorithm: partial similarity is enough for a data point to still belong to a group.
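A toy comparison of hard and soft assignment, using hypothetical topic mixes (the IP addresses and weights are made up for illustration). Hard clustering reduces each mix to its single largest component; the soft view is the whole mix. The `min_weight` cut-off below only decides which groups to list and is a choice made for this sketch, not part of LDA.

```python
from typing import Dict, List

# Hypothetical topic mixes: P(topic | entity) over three topics.
topic_mix: Dict[str, List[float]] = {
    "10.0.0.5": [0.70, 0.25, 0.05],
    "10.0.0.9": [0.40, 0.35, 0.25],
}


def hard_assignment(mix: List[float]) -> int:
    """Hard clustering: keep only the single most likely group."""
    return max(range(len(mix)), key=lambda k: mix[k])


def soft_memberships(mix: List[float], min_weight: float = 0.2) -> List[int]:
    """List every group whose weight reaches min_weight (the full mix is the soft assignment)."""
    return [k for k, w in enumerate(mix) if w >= min_weight]


for ip, mix in topic_mix.items():
    print(ip, hard_assignment(mix), soft_memberships(mix))
# 10.0.0.5 0 [0, 1]
# 10.0.0.9 0 [0, 1, 2]
```

Both addresses land in group 0 under hard clustering, even though the second one spreads its activity almost evenly across all three groups; the soft view preserves that difference.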
Feature selection is one of the core concepts in machine learning. The LOGTITAN SIEM anomaly detection model uses the following features:
- Source IP
- Destination IP
- Source Port
- Destination Port
- Protocol
- Time
- Sent Bytes
- Received Bytes
- Sent Packets
- Received Packets
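One way to hold these features in code is a small record type. The `FlowRecord` class and its field names are hypothetical and only mirror the list above; they are not LOGTITAN's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlowRecord:
    """One network event described by the model features (illustrative names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str       # e.g. "TCP" or "UDP"
    time: datetime      # event timestamp; the hour of day feeds the word encoding
    sent_bytes: int
    received_bytes: int
    sent_packets: int
    received_packets: int


# Hypothetical record loosely based on example (1) in the text.
rec = FlowRecord("192.168.1.10", "10.0.0.5", 1066, 301, "TCP",
                 datetime(2024, 1, 1, 3, 15), 1026, 0, 10, 0)
print(rec.time.hour)  # 3
```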
The LOGTITAN SIEM machine learning model uses a topic-modeling approach in which:
· Entity log records are simplified into words.
· A topic model infers a collection of “topics” that represent common profiles of network activity.
· These “topics” are probability distributions over words.
· Each entity has a mix of topics corresponding to its behavior.
· The probability of a word appearing in the network activity of an entity is estimated by simplifying its log record into a word, and then combining the word probabilities per topic using the topic mix of that entity.
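The last step amounts to a weighted sum: P(word | entity) = Σ_k P(word | topic k) · P(topic k | entity). A minimal sketch follows; the topics, words and topic mix are hypothetical values chosen only to show the arithmetic.

```python
from typing import Dict, List


def word_probability(word: str,
                     topic_word: List[Dict[str, float]],
                     entity_topic_mix: List[float]) -> float:
    """P(word | entity) = sum over topics k of P(word | k) * P(k | entity)."""
    return sum(phi.get(word, 0.0) * theta
               for phi, theta in zip(topic_word, entity_topic_mix))


# Two hypothetical topics, each a probability distribution over words.
topics = [
    {"80_TCP_9_10_4": 0.6, "53_UDP_9_3_1": 0.4},
    {"53_UDP_9_3_1": 0.3, "22_TCP_2_8_6": 0.7},
]
mix = [0.9, 0.1]  # hypothetical topic mix of one IP address

print(word_probability("80_TCP_9_10_4", topics, mix))  # about 0.54
print(word_probability("22_TCP_2_8_6", topics, mix))   # about 0.07
```

For an IP address whose mix is dominated by the first topic, the word that only the second topic produces gets a much lower probability, which is what pushes such an event toward the suspicious list.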
LOGTITAN converts logs to words.
Examples:
(1) A record with source port 1066, destination port 301, protocol TCP, hour of day 3, 1026 bytes transferred and 10 packets sent.
The word “301_TCP_3_12_5” is created for the source IP document.
The word “-1_301_TCP_3_12_5” is created for the destination IP document.
(2) A record with source port 1194, destination port 1109, protocol UDP, hour of day 7, 1026 bytes transferred and 1 packet sent.
The word “333333_UDP_7_12_1” is created for both the source and destination IP documents.
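The two examples can be reproduced with the sketch below. Several parts are assumptions inferred from the examples rather than stated rules: the 1024 boundary between service and ephemeral ports, the mirrored case where the source holds the low port, and the reading of the last two fields as byte and packet bins. The article does not say how 1026 bytes becomes 12 or how 10 packets becomes 5, so the hypothetical `flow_words` function takes those bin values as inputs instead of computing them.

```python
from typing import Tuple

PORT_BOUNDARY = 1024       # assumption: ports above this are treated as ephemeral
HIGH_PORT_TOKEN = "333333"  # token from example (2), both ports ephemeral


def flow_words(src_port: int, dst_port: int, protocol: str, hour: int,
               byte_bin: int, packet_bin: int) -> Tuple[str, str]:
    """Return (source_word, destination_word) for one record.

    byte_bin and packet_bin are assumed to come from a binning step
    that the article does not describe.
    """
    suffix = f"{protocol}_{hour}_{byte_bin}_{packet_bin}"
    if src_port > PORT_BOUNDARY and dst_port > PORT_BOUNDARY:
        # Example (2): the same word goes into both documents.
        word = f"{HIGH_PORT_TOKEN}_{suffix}"
        return word, word
    if src_port > PORT_BOUNDARY:
        # Example (1): keep the low (service) port; the destination word
        # carries the "-1_" prefix.
        word = f"{dst_port}_{suffix}"
        return word, f"-1_{word}"
    if dst_port > PORT_BOUNDARY:
        # Assumption: mirror image of example (1), prefix on the source side.
        word = f"{src_port}_{suffix}"
        return f"-1_{word}", word
    raise ValueError("both ports are low: case not covered by the examples")


print(flow_words(1066, 301, "TCP", 3, 12, 5))
# ('301_TCP_3_12_5', '-1_301_TCP_3_12_5')
print(flow_words(1194, 1109, "UDP", 7, 12, 1))
# ('333333_UDP_7_12_1', '333333_UDP_7_12_1')
```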
The created words are inserted into the documents associated with the source IP and the destination IP, and the LDA algorithm is then applied to these documents.
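A compact sketch of this stage: group words into one document per IP address, then fit LDA with collapsed Gibbs sampling. Gibbs sampling is only one standard way to fit LDA and is swapped in here for illustration; the article does not say which inference method LOGTITAN uses (Spark MLlib's LDA, for instance, offers EM and online variational optimizers instead). The function names, hyperparameters and sample IPs are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def build_documents(pairs: List[Tuple[str, str, str, str]]) -> Dict[str, List[str]]:
    """Group (src_ip, dst_ip, src_word, dst_word) tuples into one document per IP."""
    docs: Dict[str, List[str]] = defaultdict(list)
    for src_ip, dst_ip, src_word, dst_word in pairs:
        docs[src_ip].append(src_word)
        docs[dst_ip].append(dst_word)
    return dict(docs)


def lda_gibbs(docs: Dict[str, List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA.

    Returns (theta, phi): theta[ip][k] = P(topic k | ip), phi[k][word] = P(word | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for words in docs.values() for w in words})
    v = len(vocab)
    n_dk = {d: [0] * n_topics for d in docs}           # topic counts per document
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = {}                                              # current topic of each token
    for d, words in docs.items():                       # random initialisation
        for i, w in enumerate(words):
            k = rng.randrange(n_topics)
            z[d, i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, words in docs.items():
            for i, w in enumerate(words):
                k = z[d, i]
                # Remove this token from the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + v * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d, i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = {d: [(n_dk[d][t] + alpha) / (len(words) + n_topics * alpha)
                 for t in range(n_topics)] for d, words in docs.items()}
    phi = [{w: (n_kw[t][w] + beta) / (n_k[t] + v * beta) for w in vocab}
           for t in range(n_topics)]
    return theta, phi


pairs = [
    ("192.168.1.10", "10.0.0.5", "301_TCP_3_12_5", "-1_301_TCP_3_12_5"),
    ("192.168.1.11", "10.0.0.5", "333333_UDP_7_12_1", "333333_UDP_7_12_1"),
]
docs = build_documents(pairs)
theta, phi = lda_gibbs(docs, n_topics=2)
```

The resulting `theta` gives each IP address its topic mix and `phi` gives each topic's word distribution, which are exactly the two ingredients needed to score a new word for an entity.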
LOGTITAN detects anomalies using LDA, and it also supports many other ML models and correlation rules.
LOGTITAN also supports the following ML library: [1]
· Apache Spark MLlib: http://spark.apache.org/docs/latest/mllib-guide.html
LOGTITAN threat detection and anomaly detection module utilizes many ML models and datasets.
References
1- https://www.logtitan.com/rule-as-a-code/
2- https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation