1 相关公式及算法
参考:https://slidesplayer.com/slide/17887679/
源码:https://github.com/pp-tt/Vigenere
1.1 加密算法
cj+td=(mj+td+kj)mod26c_{j+td} = (m_{j+td} + k_j){quad} mod {quad}26cj+td=(mj+td+kj)mod26
1.2 解密算法
mj+td=(cj+td−kj)mod26m_{j+td} = (c_{j+td} – k_j) {quad} mod {quad}26mj+td=(cj+td−kj)mod26
1.3 重合指数及无偏估计值
1.3.1 重合指数
设某种语言由n个字母组成,每个 字母i发生的概率为 pi(1≤i≤n)p_i(1≤i≤n)pi(1≤i≤n), 则重合指数就是
指两个随机字母相同的概率,记为 ICICIC ,IC=Σi=1npiIC = Sigma^n_{i=1}p_iIC=Σi=1npi
1.3.2 无偏估计
一般用 ICICIC 的无偏估计值 IC‘IC^‘IC‘ 来近似计算 ICICIC。其中的 xix_ixi 表示字母i出现的频次,LLL 表示文本长度,nnn 表示某种语言中包含的字母数。IC‘=Σi=1nxi(xi−1)L(L−1)IC^` = Sigma^n_{i=1}frac{x_i(x_i-1)}{L(L-1)}IC‘=Σi=1nL(L−1)xi(xi−1)
1.4 拟重合指数
设某种语言由 n 个字母组成,字母 iii 的统计概率为 pi(i=1…n)p_i(i=1…n)pi(i=1...n) , 每个字母在密文字串 Cj(j=1…d)C_j(j=1…d)Cj(j=1...d) 中出现的频次为 fi,jf_{i,j}fi,j ,每个密文子串 CjC_jCj 的长度为 ni,jn_{i,j}ni,j ,则第 jjj 个子串的拟重合指数定义为:Mj=Σi=1npi.fi,jni,j,j=1…dM_j = Sigma^n_{i=1}{p_i}.frac{f_{i,j}}{n_{i,j}}, quad j=1…dMj=Σi=1npi.ni,jfi,j,j=1...d
2 代码相关
2.1 加密
# 文本处理函数,注意要将所有文本转为小写字符,还需要将非字母元素去除
def is_alpha(text):res = ''for i in range(len(text)):res += text[i] if text[i].isalpha() else ''return res.lower()# 加密
def encryption(in_path, out_path, key):with open(in_path, "r") as f: # 打开文件text = f.read() # 读取文件text = is_alpha(text)t_len = len(text)k_len = len(key)i = 0res = ''while i < t_len:j = i % k_lenk = charset.index(key[j])m = charset.index(text[i])res += charset[(m+k)%26]i += 1with open(out_path,"w") as f:f.write(res)return res
5.2 无 KEY 解密
5.2.1 计算 CI‘CI^`CI‘
def cal_CI(text):L = len(text)x = []CI = 0for e in charset:x.append(text.count(e))for i in range(26):CI += ((x[i]*(x[i]-1)) / (L*(L-1)))return CI
5.2.2 分组
def cal_d(text, k_len):d_str = ['' for i in range(k_len)]for i, c in enumerate(text):d_str[i % k_len] += creturn d_str
5.2.3 计算平均分组后的平均CI‘CI`CI‘
def cal_CIave(text, k_len): d_str = cal_d(text, k_len)CI_average = 0for i in range(k_len):d_str[i] = cal_CI(d_str[i])CI_average += d_str[i]CI_average = CI_average / len(d_str)return CI_average
5.2.4 计算 KEY 长度
def get_k_len(text):M = [(1,cal_CI(text))]+[(0,0.0) for i in range(49)]for i in range(2,50):M[i] = (i,abs(0.065 - cal_CIave(text,i)))M = sorted(M,key = lambda x:x[1])n = [0 for i in range(10)]for i in range(2, 10):for e in M[1:10]:if e[0] % i == 0:n[i-1] += 1return n.index(max(n)) + 1
5.2.5 破解密钥获取密码
# 字符频率
F = [
0.0651738, 0.0124248, 0.0217339,
0.0349835, 0.1041442, 0.0197881,
0.0158610, 0.0492888, 0.0558094,
0.0009033, 0.0050529, 0.0331490,
0.0202124, 0.0564513, 0.0596302,
0.0137645, 0.0008606, 0.0497563,
0.0515760, 0.0729357, 0.0225134,
0.0082903, 0.0171272, 0.0013692,
0.0145984, 0.0007836
] # 猜测单个秘钥得到的重合指数
def count_CI2(cipher,n): # n 代表我们猜测的秘钥,也即偏移量N = [0.0 for i in range(26)]cipher = c_alpha(cipher)L = len(cipher)for i in range(L): #计算所有字母的频数,存在数组N当中if (cipher[i].islower()):N[(ord(cipher[i]) - ord('a') - n)%26] += 1else:N[(ord(cipher[i]) - ord('A') - n)%26] += 1 CI_2 = 0for i in range(26):CI_2 += ((N[i] / L) * F[i])return CI_2def get_key(cipher,key_len):un_cip = ['' for i in range(key_len)] cipher_alpha = c_alpha(cipher)for i in range(len(cipher_alpha)): # 完成分组工作z = i % key_lenun_cip[z] += cipher_alpha[i]s = ''for i in range(key_len):s += pre_5_key(un_cip[i]) ####这里应该将5个分组的秘钥猜测全部打印出来print(s)## 找出前5个最可能的单个秘钥
def pre_5_key(cipher):M = [(0,0.0) for i in range(26)]for i in range(26):M[i] = (chr(ord('a')+i),abs(0.065 - count_CI2(cipher,i)))M = sorted(M,key = lambda x:x[1]) #按照数组第二个元素排序return M[0][0]# 根据密钥解密
def decrypt(in_path, out_path, key):with open(in_path, "r") as f: # 打开文件text = f.read() # 读取文件text = is_alpha(text)t_len = len(text)k_len = len(key)i = 0res = ''while i < t_len:j = i % k_lenk = charset.index(key[j])m = charset.index(text[i])m += 26 if m < k else 0res += charset[m-k]i += 1with open(out_path,"w") as f:f.write(res)return res
5.3 使用说明及代码汇总
5.3.1 程序说明
程序链接:https://github.com/pp-tt/Vigenere
下载后直接运行main.py
文件即可,运行后在当前目录生成 KEY 文件和 加密文件,以及解密文件,原始文件.txt
为有意义的英文字符,文件内容如下:
Differential Privacy is the state-of-the-art goal for the problem of privacy-preserving data release and privacy-preserving data mining.Existing techniques using differential privacy,however,cannot effectively handle the publication of high-dimensional data.In particular,when the input dataset contains a large number of attributes,existing methods incur higher computing complexity and lower information to noise ratio,which renders the published data next to useless.This proposa aims to reduce computing complexity and signal to noise ratio.The starting point is to approximate the full distribution of high-dimensional dataset with a set of low-dimensional marginaldistributions via optimizing score function and reducing sensitivity,in which generation of noisy conditional distributions with differential privacy is computed in a set of low-dimensional subspaces,and then,the sample tuples from the noisy approximation distribution are used to generate and release the synthetic dataset.Some crucial science problems would be investigated below:(i)constructing a low k-degree Bayesian network over the high-dimensional dataset via exponential mechanism in differential privacy,where the score function is optimized to reduce the sensitivity using mutual information,equivalence classes in maximum joint distribution and dynamic programming;(ii)studying the algorithm to compute a set of noisy conditional distributions from joint distributions in the subspace of Bayesian network,via the Laplace mechanism of differential privacy.(iii)exploring how to generate synthetic data from thedifferentially private Bayesian network and conditional distributions,without explicitly materializing the noisy global distribution.The proposed solution may have theoretical and technical significance for synthetic data generation with differential privacy on business prospects.
启动程序如下:
if __name__ == "__main__":cipher = encryption('./原始文件.txt', './加密文件.txt', 'hello')key = get_key('./加密文件.txt', 'KEY.txt')decrypt('./加密文件.txt', './解密结果.txt', key)
5.3.2 代码汇总
import mathcharset = "abcdefghijklmnopqrstuvwxyz"# 加密
def encryption(in_path, out_path, key):with open(in_path, "r") as f: # 打开文件text = f.read() # 读取文件text = is_alpha(text)t_len = len(text)k_len = len(key)i = 0res = ''while i < t_len:j = i % k_lenk = charset.index(key[j])m = charset.index(text[i])res += charset[(m+k)%26]i += 1with open(out_path,"w") as f:f.write(res)return resdef is_alpha(text):res = ''for i in range(len(text)):res += text[i] if text[i].isalpha() else ''return res.lower()def cal_CI(text):L = len(text)x = []CI = 0for e in charset:x.append(text.count(e))for i in range(26):CI += ((x[i]*(x[i]-1)) / (L*(L-1)))return CIdef cal_d(text, k_len):d_str = ['' for i in range(k_len)]for i, c in enumerate(text):d_str[i % k_len] += creturn d_strdef cal_CIave(text, k_len): d_str = cal_d(text, k_len)CI_average = 0for i in range(k_len):d_str[i] = cal_CI(d_str[i])CI_average += d_str[i]CI_average = CI_average / len(d_str)return CI_averagedef get_k_len(text):M = [(1,cal_CI(text))]+[(0,0.0) for i in range(49)]for i in range(2,50):M[i] = (i,abs(0.065 - cal_CIave(text,i)))M = sorted(M,key = lambda x:x[1])n = [0 for i in range(10)]for i in range(2, 10):for e in M[1:10]:if e[0] % i == 0:n[i-1] += 1return n.index(max(n)) + 1 F = [
0.0651738, 0.0124248, 0.0217339,
0.0349835, 0.1041442, 0.0197881,
0.0158610, 0.0492888, 0.0558094,
0.0009033, 0.0050529, 0.0331490,
0.0202124, 0.0564513, 0.0596302,
0.0137645, 0.0008606, 0.0497563,
0.0515760, 0.0729357, 0.0225134,
0.0082903, 0.0171272, 0.0013692,
0.0145984, 0.0007836
] # 英文字符频率.# 猜测单个秘钥得到的重合指数
def count_CI2(cipher,n): # n 代表我们猜测的秘钥,也即偏移量N = [0.0 for i in range(26)]cipher = is_alpha(cipher)L = len(cipher)for i in range(L): #计算所有字母的频数,存在数组N当中if (cipher[i].islower()):N[(ord(cipher[i]) - ord('a') - n)%26] += 1else:N[(ord(cipher[i]) - ord('A') - n)%26] += 1 CI_2 = 0for i in range(26):CI_2 += ((N[i] / L) * F[i])return CI_2def get_key(in_path, out_path):with open(in_path, "r") as f: # 打开文件cipher = f.read() # 读取文件key_len = get_k_len(cipher)un_cip = ['' for i in range(key_len)] cipher_alpha = is_alpha(cipher)for i in range(len(cipher_alpha)): # 完成分组工作z = i % key_lenun_cip[z] += cipher_alpha[i]s = ''for i in range(key_len):s += pre_5_key(un_cip[i]) ####这里应该将5个分组的秘钥猜测全部打印出来with open(out_path,"w") as f:f.write(s)return s## 找出前5个最可能的单个秘钥
def pre_5_key(cipher):M = [(0,0.0) for i in range(26)]for i in range(26):M[i] = (chr(ord('a')+i),abs(0.065 - count_CI2(cipher,i)))M = sorted(M,key = lambda x:x[1]) #按照数组第二个元素排序return M[0][0]# 根据密钥解密
def decrypt(in_path, out_path, key):with open(in_path, "r") as f: # 打开文件text = f.read() # 读取文件text = is_alpha(text)t_len = len(text)k_len = len(key)i = 0res = ''while i < t_len:j = i % k_lenk = charset.index(key[j])m = charset.index(text[i])m += 26 if m < k else 0res += charset[m-k]i += 1with open(out_path,"w") as f:f.write(res)return resif __name__ == "__main__":cipher = encryption('./原始文件.txt', './加密文件.txt', 'hello')key = get_key('./加密文件.txt', 'KEY.txt')decrypt('./加密文件.txt', './解密结果.txt', key)