7.10 Key distribution system
-
7.10.4 Advanced topic
-
Clock synchronization considerations
Because the service itself depends on the clock, servers and clients should use NTP to synchronize their clocks. This section discusses how much clock error can be tolerated.
1. The server clocks are not synchronized. In theory, the effective masterKey lifetime becomes the configured lifetime minus the clock difference between the machines, that is, the intersection of the masterKey lifetime intervals on those machines. A very long configured lifetime therefore helps to tolerate unsynchronized clocks.
2. The client clock is not synchronized. The nonce is obtained by dividing the system time by the configured precision, and the keyIdent, composed of timestamp, nonce and group, is the key of the client cache. An unsynchronized client clock therefore increases the number of timestamp and nonce entries in the cache. In other words, the client cache grows larger and more network requests are issued.
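The two cases above can be illustrated with a small Java sketch. The class and method names are hypothetical and not part of Limax, and the millisecond unit for the precision is an assumption. The source only states that the nonce is the system time divided by the configured precision, and that the effective masterKey lifetime shrinks by the inter-machine clock difference.

```java
import java.time.Duration;

// Hypothetical helper, not part of Limax: illustrates the clock arithmetic only.
final class ClockToleranceSketch {
    // nonce = system time / configured precision (millisecond units are assumed).
    static long nonce(long systemTimeMillis, long precisionMillis) {
        return systemTimeMillis / precisionMillis;
    }

    // Effective masterKey lifetime = configured lifetime minus the clock skew,
    // i.e. the overlap of the lifetime intervals on two machines (never negative).
    static Duration effectiveLifetime(Duration configured, Duration skew) {
        Duration overlap = configured.minus(skew.abs());
        return overlap.isNegative() ? Duration.ZERO : overlap;
    }

    public static void main(String[] args) {
        long precision = Duration.ofMinutes(1).toMillis();
        long now = System.currentTimeMillis();
        // A client clock that runs 90 seconds ahead falls into a different nonce slot,
        // which means one more cache entry and one more request.
        System.out.println(nonce(now, precision) != nonce(now + 90_000L, precision));
        System.out.println(effectiveLifetime(Duration.ofDays(30), Duration.ofHours(2)));
    }
}
```

A coarser precision makes neighboring clock readings share a nonce more often, which is why a small client clock error matters less when the precision is large.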
-
The distribution of masterKey
The masterKey is distributed over a P2P network. Because the application environment differs from that of a typical DHT, a modified Kademlia algorithm is used. Because no realistic evaluation of the Kademlia parameters is available, many system parameters are configurable.
1. HTTPS replaces UDP to ensure strict authentication between servers.
2. The protocol exchange during a DHT search also exchanges masterKeys between servers as a side effect. The server that originates the search sends the set of timestamps it knows along with its own node information. The receiving server performs a diff and returns, together with the neighbor information, the masterKeys that the originator lacks and the timestamps that it lacks itself. The originating server parses the returned neighbor information, merges the returned masterKey set, and uploads the masterKeys that the receiver lacks.
3. The length of a node address is determined by limax.p2p.DHTAddress.HASH; the default is SHA-256.
4. The size of a single k-Bucket is determined by limax.p2p.Neighbors.BUCKET_SIZE; following BT, the default is 8.
5. The k-Bucket update policy differs from that of Kademlia: distance takes priority, because servers do not need to worry much about nodes that are only occasionally online. A node newly added to a k-Bucket is given a PING timer. If the node already exists, the previous timer is cancelled. If the k-Bucket overflows, the last entry is deleted and its timer is cancelled. PING is implemented by connecting to the HTTPS port. The period of the PING timer is determined by limax.p2p.Neighbors.ENTRY_AGE_MAX, and the default is 20 minutes. The PING timeout is determined by limax.key.KeyServer.NETWORK_TIMEOUT, and the default is 3 seconds.
6. A Refresh performs two searches: one for the server's own address and one for a random address. The Refresh period is determined by limax.key.KeyServer.NEIGHBORS_REFRESH_PERIOD, and the default is 20 minutes.
7. The size of the local initial set used by a search is determined by limax.key.P2pHandler.BASE_LIMIT, and the default is 8. The size of the set returned by a searched server is determined by limax.key.P2pHandler.REPORT_SIZE, and the default is 16. The search parallelism is determined by limax.key.P2pHandler.CONCURRENCY_LEVEL, and the default is 64. The maximum size of the expected search result set is determined by limax.key.P2pHandler.ANTICIPANTION, and the default is 8.
8. If the number of search results falls short of the anticipation, one more round of searching is performed starting from the servers listed in the master attribute of the configuration file. These configured servers are similar to super seeds in the BT environment. As the neighbor table gradually grows, access to these super seeds is expected to become less frequent.
9. The publishing server distributes the masterKey to all addresses in the neighbor table and to all addresses specified by the master configuration attribute.
10. A refresh runs every 20 minutes, and each refresh locally selects the 8 nearest nodes, which covers fewer than 4 layers of the tree and is counted as 3 layers (Kademlia divides the address space with a binary tree based on address prefixes). The address algorithm is SHA-256, so there are 256 layers in total, and 256 / 3 * 20 / 60 / 24 = 1.19 days. As the most conservative estimate, it takes 2 days for a masterKey to be synchronized to the far end of the network, and the publishPeriod attribute in the configuration file is set to 3 days. For this reason a server prefers the second-newest masterKey when providing service. Strictly speaking, launching the entire network requires an initial silent period of at least 2 days. In actual operation, specifying a proper master configuration makes this problem unimportant.
11. Handle NAT correctly. In the implementation, a DHTAddress is associated with an InetSocketAddress and the pair is represented as a NetworkID, with the InetSocketAddress serving as the secondaryKey when searching. If a server behind NAT has no external IP allocated, the Refresh frequency should be increased appropriately and more master entries should be configured.
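The masterKey exchange that piggybacks on a DHT search (item 2) can be sketched as a set difference over timestamps. This is a minimal illustration, not the Limax wire format: the class, the Reply type and the representation of a masterKey as a byte array keyed by timestamp are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the exchange in item 2; not the Limax wire format.
final class MasterKeyDiffSketch {
    // What the searched server returns in addition to the neighbor information.
    static final class Reply {
        final Map<Long, byte[]> keysForOriginator = new HashMap<>(); // keys the originator lacks
        final Set<Long> timestampsWanted = new HashSet<>();          // timestamps the receiver lacks
    }

    // Receiver side: diff the originator's known timestamps against the local store.
    static Reply diff(Set<Long> originatorTimestamps, Map<Long, byte[]> localKeys) {
        Reply reply = new Reply();
        for (Map.Entry<Long, byte[]> e : localKeys.entrySet())
            if (!originatorTimestamps.contains(e.getKey()))
                reply.keysForOriginator.put(e.getKey(), e.getValue());
        for (Long ts : originatorTimestamps)
            if (!localKeys.containsKey(ts))
                reply.timestampsWanted.add(ts);
        return reply;
    }

    // Originator side: select the keys to upload, then merge the returned keys.
    static Map<Long, byte[]> mergeAndSelectUpload(Map<Long, byte[]> originatorKeys, Reply reply) {
        Map<Long, byte[]> upload = new HashMap<>();
        for (Long ts : reply.timestampsWanted) {
            byte[] key = originatorKeys.get(ts);
            if (key != null)
                upload.put(ts, key);
        }
        originatorKeys.putAll(reply.keysForOriginator);
        return upload;
    }
}
```

After one round trip plus the upload, both sides hold the union of the two masterKey sets, which is why ordinary Refresh searches are enough to spread a newly published masterKey.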
-
Multiple masterKey distribution servers
1. In theory, there should be only one distribution server.
2. In the implementation, when a distribution server launches, it synchronizes all masterKeys to its neighbor servers, records the timestamp of the latest masterKey, and then starts the distribution timer. When the timer expires, it checks whether the current latest timestamp equals the recorded one. If it does, the server executes the distribution; if not, it skips it. In either case it then starts the next round of timing according to the latest timestamp.
3. Given this implementation, two conditions must be met to run multiple distribution servers. First, the network communication between the distribution servers must be good. The distribution servers share one master configuration that includes the addresses of all of them, which ensures that a newly generated masterKey is synchronized immediately. Second, the distribution servers must be launched at staggered times, so that a distribution by one server reaches the others and updates their latest timestamp before their own distribution timers expire.
4. In practice, the precision of the timestamp is milliseconds. Even if two servers distribute at the same time, the probability of a timestamp collision is very small, and at worst every server records one extra masterKey.
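The timer check in item 2 can be sketched as follows. The class and method names are hypothetical. The delay computation is one possible reading of "according to the latest timestamp" and is an assumption, not something the source specifies.

```java
// Hypothetical model of the distribution timer check; names are illustrative.
final class DistributionTimerSketch {
    private long recordedTimestamp; // latest masterKey timestamp when the timer was armed

    DistributionTimerSketch(long latestTimestamp) {
        this.recordedTimestamp = latestTimestamp;
    }

    // On timer expiry: distribute only if no newer masterKey has arrived
    // since the timer was armed, i.e. no other server has already published.
    boolean shouldDistribute(long currentLatestTimestamp) {
        return currentLatestTimestamp == recordedTimestamp;
    }

    // Record the latest timestamp before starting the next round of timing.
    void rearm(long latestTimestamp) {
        this.recordedTimestamp = latestTimestamp;
    }

    // Assumed scheduling rule: the next check falls one publish period
    // after the latest masterKey timestamp, and never in the past.
    static long nextDelayMillis(long latestTimestamp, long publishPeriodMillis, long nowMillis) {
        return Math.max(0L, latestTimestamp + publishPeriodMillis - nowMillis);
    }
}
```

With staggered launches, the server whose timer fires first publishes, the others see a changed timestamp when their own timers expire, skip the round, and re-arm against the new timestamp.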
-
The P2P resource access model is not used
1. In theory, a masterKey could be looked up in the P2P network by timestamp, without waiting for a full masterKey distribution period. In practice, this is infeasible.
2. Searching is an expensive operation. A malicious client can forge timestamps in the requests it sends to a server. If such searching were offered, attacks on the P2P network would become possible.
3. Preventing such attacks would require signed timestamps, so that a server could filter requests before initiating a search. That requires a signing key, yet providing keys is the very purpose of the service network, so the problem becomes circular.
-
Isolated server, isolated network
1. A server separated from the P2P network is called an isolated server, and a network disconnected from the distribution server network is called an isolated network.
2. An isolated server, or a server in an isolated network, can still generate keys for clients, but a keyIdent generated after the client has accessed a server in the non-isolated network may not be restorable. The reason is simply that the isolated server has not received the new masterKey associated with the timestamp.
3. An isolated server can be identified from the log. If the log entry "Neighbors save NNN entries" appears with NNN = 0, the server should be suspected of being isolated.
4. In a real environment, a server can become isolated when its certificate is not renewed in time and expires. In practice, even for a short-term 30-day server certificate, renewal begins at 80% of the lifetime, which leaves 6 days to renew; with one retry per hour, that is 144 attempts. Expiration is almost impossible unless the CA has been offline for 6 days.
5. The server certificate is revoked by the CA. In this case, the server should be shut down.
6. An isolated network can be identified from the log. The "MasterKeyContainer CurrentDigester" entry records masterKey update events. The interval between two such entries is approximately equal to the publish period. If no such entry has appeared for more than twice the publish period, this server and its neighbors are all in an isolated network.
7. In a real environment, an isolated network does not occur except during a wide-area network failure. A larger publishPeriod in the configuration file increases the tolerance for network failures.
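The two log-based checks (items 3 and 6) can be automated along the following lines. This is a sketch under assumptions: only the phrases "Neighbors save NNN entries" and "MasterKeyContainer CurrentDigester" come from the text above, while the rest of the log line layout, the class name and the method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical monitoring helper; only the quoted log phrase comes from the documentation.
final class IsolationLogSketch {
    private static final Pattern NEIGHBORS = Pattern.compile("Neighbors save (\\d+) entries");

    // Item 3: a line reporting zero saved neighbors suggests an isolated server.
    static boolean suspectsIsolatedServer(String logLine) {
        Matcher m = NEIGHBORS.matcher(logLine);
        return m.find() && m.group(1).matches("0+");
    }

    // Item 6: no "MasterKeyContainer CurrentDigester" entry for more than twice
    // the publish period suggests an isolated network.
    static boolean suspectsIsolatedNetwork(long lastDigesterLogMillis, long nowMillis,
                                           long publishPeriodMillis) {
        return nowMillis - lastDigesterLogMillis > 2 * publishPeriodMillis;
    }
}
```

A monitoring job could feed each new log line to the first check and track the time of the last CurrentDigester entry for the second.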
-
High availability of client
1. A larger nonce setting uses the cache effectively and greatly reduces network access.
2. Although clients do not join the servers' P2P network, they are given the same level of high availability, which effectively reduces the impact of network failures.
3. Set the network access priority. The DNSName address provided by the certificate has low priority; the address set by KeyAllocator.setHost has medium priority; each time a Key request is sent to a server, the server returns a random list of server addresses, and this list has high priority. The client measures the RTT of these addresses (within the timeout set by limax.key.KeyAllocator.TIMEOUT; a server that exceeds this limit is discarded). Subsequent Key requests try the addresses one by one until one succeeds: first the high-priority list sorted by RTT, then the medium-priority list and the low-priority list sorted by RTT. A failed IP is deleted from the high-priority list.
4. The DNSName provided by the certificate, and the domain names and resolved IP addresses provided through KeyAllocator.setHost, may change. The client re-resolves these addresses periodically. limax.key.ServerEvaluate.DOMAIN_UPDATE sets the resolution period, and the default is 5 minutes.
5. The address list returned by the server may be updated on every request. The implementation uses the 8 IPs with the smallest RTT; this number can be changed through the system property limax.key.ServerEvaluate.DYNAMIC_SERVERS.
6. The maximum length of the random address list provided by the server is determined by limax.key.P2pHandler.ANTICIPANTION, and the default is 8.
7. limax.key.KeyAllocator.ISOLATED_SERVER_THRESHOLD defaults to 0. This property provides filtering of isolated servers: if the random list returned by a server contains fewer entries than this value, the server is considered unreliable and the client moves on to the next IP. This value should be configured carefully and should not be larger than limax.key.P2pHandler.ANTICIPANTION.
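The address selection described in items 3, 5 and 7 can be sketched as below. The class and method names are hypothetical, addresses are modelled as plain strings, and removing duplicates across the three lists is an assumption. This is not the actual KeyAllocator code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the client-side ordering; not the KeyAllocator implementation.
final class ServerOrderSketch {
    // Discards addresses slower than the timeout, sorts the rest by RTT,
    // and keeps at most `limit` of them (8 by default according to item 5).
    static List<String> byRtt(Map<String, Long> rttMillis, long timeoutMillis, int limit) {
        return rttMillis.entrySet().stream()
                .filter(e -> e.getValue() <= timeoutMillis)
                .sorted(Map.Entry.comparingByValue())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Try order: high priority (returned by servers), then medium (setHost),
    // then low (DNSName from the certificate). Duplicates are skipped (assumption).
    static List<String> tryOrder(List<String> high, List<String> medium, List<String> low) {
        List<String> order = new ArrayList<>(high);
        for (String a : medium)
            if (!order.contains(a))
                order.add(a);
        for (String a : low)
            if (!order.contains(a))
                order.add(a);
        return order;
    }

    // Item 7: a server that returns fewer addresses than the threshold is treated as unreliable.
    static boolean looksIsolated(int returnedListSize, int isolatedServerThreshold) {
        return returnedListSize < isolatedServerThreshold;
    }
}
```

With the default threshold of 0, looksIsolated never rejects a server, which matches the advice that the property be raised only with care.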
-
Security consideration
1. When a server receives an HTTPS request from another server, it verifies that the CN of the Subject in the peer's certificate is the same as in its own certificate, and that the Issuer name is the same as well. (The CA certificates themselves cannot be compared here, because the CA may renew its certificate. The CA renewal rule requires that the Subject does not change; see the earlier chapter "PKIX support".)
2. Every server connected to the network must keep its private key secure. The leak of any one server's private key may lead to the masterKey being stolen and harm the entire network. PKCS11 hardware should be considered, with enough redundancy that a hardware failure is acceptable.
3. The revocationCheckerOptions attribute in the configuration file should not be set to DISABLE. When the CA is operating normally, at least SOFT_FAIL should be used.
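The peer check in item 1 can be sketched with standard JDK classes. The class and method names are hypothetical and this is not the Limax implementation. In real use the principals would come from X509Certificate.getSubjectX500Principal() and getIssuerX500Principal() on the local and peer certificates.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import javax.security.auth.x500.X500Principal;

// Hypothetical sketch of the Subject CN and Issuer comparison between servers.
final class PeerIdentitySketch {
    // Extracts the CN value from a distinguished name, or null if absent or unparsable.
    static String commonName(X500Principal principal) {
        try {
            for (Rdn rdn : new LdapName(principal.getName()).getRdns())
                if ("CN".equalsIgnoreCase(rdn.getType()))
                    return String.valueOf(rdn.getValue());
        } catch (InvalidNameException e) {
            // Treat an unparsable name as having no CN.
        }
        return null;
    }

    // Accepts the peer only if its Subject CN and its Issuer name match our own.
    static boolean samePeerIdentity(X500Principal ownSubject, X500Principal ownIssuer,
                                    X500Principal peerSubject, X500Principal peerIssuer) {
        String ownCn = commonName(ownSubject);
        return ownCn != null
                && ownCn.equals(commonName(peerSubject))
                && ownIssuer.equals(peerIssuer);
    }
}
```

Comparing names rather than the CA certificate itself keeps the check valid across a CA renewal, as long as the CA Subject is unchanged.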
-
Operation and maintenance
1. The Key distribution network should be regarded as infrastructure at the same level as the CA, and the most reasonable choice is for the CA to maintain it directly.
2. In the configuration file, the master attribute may be configured as a hierarchical structure or a network structure (loops are not a problem). In essence, this shifts the P2P network from dynamic toward static to further improve robustness. (See the section "Multiple masterKey distribution servers".)
3. Watch for server isolation and network isolation issues.
-
The Key distribution system is essentially offline key negotiation
1. Key negotiation is the basis for secure communication between systems.
2. Common security protocols, such as SSL, must complete the key negotiation while both communicating parties are online at the same time.
3. Online negotiation cannot cover all application scenarios.
4. The Key distribution system provided by Limax implements offline negotiation between clients by providing a third party that is continuously online.
5. From the perspective of establishing trust, a credible third party is a prerequisite for security, exactly as with a CA.
-