Webpage fingerprinting attacks can extract information from Hypertext Transfer Protocol Secure (HTTPS) traffic and thereby leak user privacy. Studying webpage identification helps uncover security vulnerabilities in current encryption protocols, which is significant both for improving user privacy protection policies and for raising the level of network management at Internet service providers (ISPs). Existing webpage identification methods do not fully consider application layer characteristics and ignore realistic browsing scenarios such as browser caching. Drawing on the characteristics of the HTTPS protocol stack and the webpage loading procedure, a two-phase webpage identification method, Penetrator, is proposed that is built on the application data unit (ADU). ADU feature reconstruction strengthens the use of application layer information in HTTPS traffic, and the ADU length sequence serves as the feature for webpage identification. Theoretical analysis and experimental verification show that application layer characteristics can effectively identify encrypted webpages. The experiments indicate that Penetrator effectively reduces the errors introduced by the HTTPS protocol stack, extracting ADU length sequences with a protocol error rate below 0.98%. Compared with existing methods, Penetrator achieves superior webpage identification performance.
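To make the idea of an ADU length sequence concrete, the following is a minimal sketch and not the Penetrator algorithm itself. It assumes that consecutive encrypted records flowing in the same direction belong to one application data unit, which is a simplification introduced here for illustration. The function name `adu_length_sequence` and the `(direction, length)` input format are hypothetical.

```python
from typing import List, Tuple


def adu_length_sequence(records: List[Tuple[int, int]]) -> List[int]:
    """Merge consecutive same-direction record lengths into signed ADU lengths.

    records: (direction, payload_length) pairs, where direction is +1 for
    client-to-server and -1 for server-to-client (an assumed encoding).
    Returns one signed length per ADU: positive for requests, negative for
    responses.
    """
    adus: List[int] = []
    current_direction = 0  # 0 means no ADU has been opened yet
    for direction, length in records:
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        if length <= 0:
            continue  # skip empty records, which carry no application data
        if direction == current_direction:
            adus[-1] += direction * length  # same ADU: accumulate its length
        else:
            adus.append(direction * length)  # direction flipped: new ADU
            current_direction = direction
    return adus


records = [(1, 517), (-1, 1400), (-1, 1400), (-1, 300), (1, 120), (-1, 900)]
print(adu_length_sequence(records))
```

A real implementation would also have to deal with retransmissions, pipelined requests, and record padding, which this sketch ignores.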
The accurate analysis and identification of encrypted mobile application traffic provides important technical support for network management, information supervision, and security detection, and is of great significance to cyberspace security and governance. A classification method based on multi-dimensional feature learning is proposed to effectively identify encrypted mobile application traffic. First, the method extracts transport layer payload features and session features from mobile application traffic and then builds a multi-dimensional feature deep learning model. A convolutional neural network learns the spatial features of payloads, a long short-term memory network learns the time series features of encrypted flows, and a graph convolutional neural network learns the session features of the mobile application. The multi-dimensional features are then concatenated and fused to classify and identify encrypted mobile application traffic. Experimental results on an encrypted mobile application dataset show that, compared with other classification models, the proposed method achieves better performance in encrypted traffic classification for mobile applications.
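As an illustration of the graph convolution branch mentioned above, here is a minimal sketch of a single graph convolution layer in pure Python. It uses the commonly used symmetric normalization with self-loops; the paper's actual architecture, session graph construction, and weights are not given in the abstract, so the adjacency matrix, features, and weights below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical session graph: two flows of one session linked by an edge.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]  # one illustrative feature per flow
weights = [[1.0]]          # untrained placeholder weight
print(gcn_layer(adj, features, weights))
```

In the full method the output of such a branch would be concatenated with the payload and time series embeddings before classification; that fusion step is omitted here.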
With the growing popularity of tunneled anonymous network technologies represented by Tor over VPN, identifying the service carried by such traffic can further improve the identification of intent behind malicious network traffic, which is significant for cyberspace security governance. Existing research does not address identifying the type of service carried by Tor over VPN traffic. A Tor over VPN carried-service identification method based on a self-attention convolutional recurrent neural network is proposed. First, packet-level spatio-temporal features are extracted from Tor over VPN traffic samples to establish a multidimensional spatio-temporal feature set. Then, given the need to mine deep spatio-temporal features in the identification task, a convolutional recurrent neural network model incorporating the self-attention mechanism is adopted to identify the service carried by Tor over VPN traffic. Experimental results show that the average accuracy of carried-service identification reaches 96.2%.
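The self-attention mechanism referred to above can be sketched as scaled dot-product attention over a sequence of per-packet feature vectors. This is a minimal illustration only: it uses the inputs directly as queries, keys, and values (no learned projections), and the feature values are hypothetical. It does not reproduce the paper's model.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention with identity projections.

    x holds one feature vector per packet. Returns (outputs, weights), where
    weights[i][j] is how strongly packet i attends to packet j.
    """
    dim = len(x[0])
    weights = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in x]
        weights.append(softmax(scores))
    outputs = [[sum(row[j] * x[j][c] for j in range(len(x)))
                for c in range(dim)] for row in weights]
    return outputs, weights


# Hypothetical packet features, e.g. (normalized length, inter-arrival time).
packets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(packets)
print(weights)
```

Each row of `weights` sums to one, so every output vector is a weighted average of the input packet features.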
Network traffic detection and identification remains a perennial topic, yet studies on the correlation between network traffic behavior and the entities that produce it are limited, which leaves the source of traffic ambiguous. The widespread adoption of encrypted traffic makes operational visibility harder to secure, because detecting cyberattacks typically requires access to the plaintext of encrypted traffic. In this paper, we explore a method of collecting contextual information through runtime process entities, which provides information gain for encrypted traffic measurement. Our work achieves an observable view of encrypted traffic behavior from process entities and enhances the efficiency of parallel security missions and backtracking analysis. Such advancements could mitigate telemetry challenges in cybersecurity defense and improve early warning of APT threats in encrypted traffic.
Encrypted traffic classification is the process of identifying the services, applications, and protocols behind encrypted network traffic in order to improve network quality of service or provide network security assurance. Mainstream encrypted traffic classification schemes rely on large datasets for training to achieve reliable performance. However, with the development of Internet technology, network traffic, computing nodes, and network services, the demands on encrypted traffic classification diversify, and collecting and labeling enough encrypted traffic becomes increasingly impractical. Therefore, it is crucial to study a technique that can accurately classify encrypted traffic from few samples and generalize the model quickly. In this paper, a novel method for encrypted traffic classification based on few-shot learning is proposed. The method simulates and optimizes the traffic classification task based on the principles of meta-learning. Moreover, a pre-trained convolutional neural network (CNN) model is used to extract features, and a novel parameter decomposition method, built on the special computational architecture of the CNN, is introduced to adapt rapidly to the data distributions of various tasks. Finally, comparative experiments under N-way K-shot settings show that the accuracy of the proposed method reaches 98% when K is 10 and that the few-shot learning method is more accurate than the reference model.
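The N-way K-shot setting used in such experiments can be illustrated with a small episode sampler. This is a generic sketch of how meta-learning episodes are commonly built, not the paper's training code; the function name `sample_episode`, the `n_query` parameter, and the toy class and flow names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (flow identifier, class label)


def sample_episode(flows_by_class: Dict[str, List[str]], n_way: int,
                   k_shot: int, n_query: int,
                   rng: random.Random) -> Tuple[List[Example], List[Example]]:
    """Draw one episode: n_way classes, k_shot support and n_query query flows each."""
    needed = k_shot + n_query
    eligible = sorted(c for c, flows in flows_by_class.items()
                      if len(flows) >= needed)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with enough flows")
    support: List[Example] = []
    query: List[Example] = []
    for label in rng.sample(eligible, n_way):
        picked = rng.sample(flows_by_class[label], needed)  # no repeats
        support += [(flow, label) for flow in picked[:k_shot]]
        query += [(flow, label) for flow in picked[k_shot:]]
    return support, query


# Hypothetical dataset: 4 application classes with 15 flows each.
data = {app: [f"{app}-flow-{i}" for i in range(15)]
        for app in ("chat", "video", "mail", "voip")}
support, query = sample_episode(data, n_way=3, k_shot=10, n_query=5,
                                rng=random.Random(0))
print(len(support), len(query))
```

The model adapts on the support set and is scored on the query set, so the two sets are kept disjoint within an episode.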
Correlation analysis of encrypted flows in the Tor network is one of the core techniques for Tor traceability. To address the unreliable temporal features and weak initial feature representation in current deep learning-based flow correlation methods, a correlation analysis method based on time-frequency analysis and a graph convolutional network (GCN) is proposed. The method takes the packet length information of Tor network traffic as the raw feature sequence, uses a time-frequency distribution function to map the packet length sequence to the time-frequency domain, and embeds the result into graph-structured data, from which the GCN extracts high-order features. Finally, the high-order features are fed into a triplet network to correlate similar flows. Experimental results show that, at a false positive rate of 0.1%, the correlation accuracy of the proposed method reaches 83.4%, significantly outperforming the existing DeepCorr and Attcorr methods.
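The step of mapping a packet length sequence to the time-frequency domain can be sketched with a short-time Fourier transform. The abstract does not name the time-frequency distribution function used, so the STFT here is a stand-in chosen for illustration, and the window and hop sizes are arbitrary assumptions.

```python
import cmath
import math
from typing import List


def stft_magnitude(seq: List[float], window: int = 8,
                   hop: int = 4) -> List[List[float]]:
    """Return one magnitude spectrum per frame of a packet length sequence.

    Each frame is tapered with a Hann window and transformed with a direct
    DFT; only the non-negative frequency bins (0 .. window // 2) are kept.
    """
    if len(seq) < window:
        raise ValueError("sequence shorter than one window")
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(seq) - window + 1, hop):
        frame = [seq[start + n] * hann[n] for n in range(window)]
        spectrum = []
        for k in range(window // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                        for n in range(window))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Hypothetical packet lengths in bytes for one flow.
lengths = [512.0, 1460.0, 512.0, 1460.0, 512.0, 1460.0, 512.0, 1460.0,
           100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
spectrogram = stft_magnitude(lengths)
print(len(spectrogram), len(spectrogram[0]))
```

The resulting frames-by-bins matrix is the kind of representation that could then be embedded into graph-structured data; that embedding and the triplet network are outside this sketch.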
With the rapid development and wide adoption of the Internet, more and more enterprises and individuals use virtual private network (VPN) technology to avoid network censorship, which poses an enormous challenge to cyberspace management and governance. Identifying VPN encrypted traffic is therefore increasingly important for cyberspace governance. To this end, a multi-strategy hybrid VPN node identification method is proposed. The method combines three strategies: speed test unit discovery based on the random forest (RF) algorithm, VPN node recommendation based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, and VPN node verification based on active probing, forming a closed loop from VPN discovery to verification. The method is validated on a real, billion-scale network traffic metadata set. Experimental results show that the generalization accuracy of the RF-based classification model exceeds 90% for speed test behavior identification, and that the active probing verification mechanism achieves an accuracy of 90.6% on suspected VPNs. The multi-strategy hybrid method can effectively identify VPN nodes and offers a novel perspective for research on VPN traffic identification.
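The clustering stage can be illustrated with a compact DBSCAN implementation. This is a generic sketch of the algorithm under assumed inputs: the abstract does not say which features describe candidate nodes, so the two-dimensional points below are hypothetical, as are the `eps` and `min_pts` values.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        # Brute-force region query; includes the point itself.
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


# Hypothetical 2-D features of candidate nodes.
nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(nodes, eps=1.5, min_pts=3))
```

In a pipeline like the one described, dense clusters would be treated as recommended VPN node candidates and passed on to active probing, while noise points would be dropped.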
Private set intersection (PSI) is an important privacy-preserving computation protocol that securely computes the intersection of two or more sets without leaking the set data. With the rapid development of the Internet and big data, users pay increasing attention to data privacy protection, so research on PSI is significant in theory and highly valuable in practice. PSI technology is developing rapidly, and its variants are complex and diverse. PSI protocols can be constructed from different cryptographic primitives, each with its own applicable scenarios, so selecting an appropriate PSI scheme for specific requirements is of great practical importance. This paper provides a comprehensive overview of the research progress and application areas of PSI and its variants. The application of PSI techniques in practical products is examined. In addition, the performance and applicability of major open-source PSI libraries are tested and evaluated. Finally, the challenges and future development directions of PSI technology are discussed. This comprehensive introduction and in-depth study of PSI clarifies the importance and application value of the technology, supports more effective solutions for privacy protection, and promotes the widespread application and development of PSI in practical scenarios.
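As one concrete example of building PSI from a cryptographic primitive, the following toy sketch follows the classic Diffie-Hellman style construction, in which each party blinds hashed items with a secret exponent and the double-blinded values are compared. It is an illustration under strong simplifying assumptions and is not secure as written: the modulus is far too small, the hash-to-group step is naive, both parties run in one process, and only honest behavior is considered. All function names are hypothetical.

```python
import hashlib
import secrets
from typing import List

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus for illustration only


def hash_to_group(item: str) -> int:
    """Map an item to an integer in [1, P - 1] (naive, for illustration)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_key() -> int:
    """Random secret exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2


def blind(values: List[int], key: int) -> List[int]:
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def dh_psi(alice_items: List[str], bob_items: List[str]) -> List[str]:
    """Return Alice's items that also appear in Bob's set."""
    a, b = new_key(), new_key()
    alice_once = blind([hash_to_group(x) for x in alice_items], a)  # to Bob
    bob_once = blind([hash_to_group(y) for y in bob_items], b)      # to Alice
    alice_twice = blind(alice_once, b)   # Bob re-blinds, keeping the order
    bob_twice = set(blind(bob_once, a))  # Alice re-blinds Bob's values
    # Exponentiation commutes, so equal items yield equal double-blinded values.
    return [x for x, t in zip(alice_items, alice_twice) if t in bob_twice]


alice = ["a@example.com", "b@example.com", "c@example.com"]
bob = ["c@example.com", "d@example.com", "a@example.com"]
print(dh_psi(alice, bob))
```

Production libraries typically use elliptic curve groups or entirely different primitives such as oblivious transfer, which is why scheme selection depends on the set sizes and threat model at hand.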