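When reversing Python programs, the most important goal is to recover the Python source. Working in IDA makes the job much harder, so choosing the right tools and methods will save a great deal of effort.
The Python reversing workflow is: test.exe -> test.pyc -> test.py
Below I introduce the tools used at each step, along with the problems I have run into so far and how I solved them.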
# exe to pyc
The tool used here is pyinstxtractor.
The Python version on your system must match the Python version the exe was built with. Otherwise the <filename>_extracted/PYZ-00.pyz_extracted folder will be empty.
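After a first extraction, you can check whether your interpreter matches the bytecode before relying on the PYZ contents. Below is a minimal sketch. It assumes the pyc you test (for example struct.pyc from the extracted folder) has a standard header, and `pyc_matches_interpreter` is a hypothetical helper name. It compares the pyc's first four bytes (the magic number) with `importlib.util.MAGIC_NUMBER` of the running interpreter:

```python
import importlib.util


def pyc_matches_interpreter(pyc_path):
    """Return True if the pyc's 4-byte magic number equals the running interpreter's."""
    with open(pyc_path, "rb") as fp:
        magic = fp.read(4)
    # MAGIC_NUMBER is the 4-byte magic (2-byte version word + b'\r\n') of this interpreter
    return magic == importlib.util.MAGIC_NUMBER
```

If it returns False, switch to the matching Python version and run pyinstxtractor again.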
# PyInstaller without --key (not encrypted)
Usage: python pyinstxtractor.py <filename>
This creates a <filename>_extracted folder. Inside it, find <filename>.pyc and move on to the next step.
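If there is no <filename>.pyc but there is a file named just <filename>, open <filename> in 010 Editor. Then open struct.pyc from <filename>_extracted in 010 Editor as well, insert its first 16 bytes at the start of <filename>, and change the extension to .pyc. The header is 16 bytes for Python 3.7 and later, 12 bytes for Python 3.3 to 3.6, and 8 bytes for Python 2.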
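The same header repair can be scripted instead of done by hand in 010 Editor. Below is a minimal sketch. The function name and paths are hypothetical. `header_len` defaults to 16 (Python 3.7+), so pass 12 or 8 for older versions:

```python
def prepend_header(headerless_path, struct_pyc_path, out_path, header_len=16):
    """Copy the first header_len bytes of struct.pyc in front of a headerless pyc body."""
    # Header length: 16 for Python 3.7+, 12 for 3.3-3.6, 8 for Python 2
    with open(struct_pyc_path, "rb") as fp:
        header = fp.read(header_len)
    if len(header) != header_len:
        raise ValueError("struct.pyc is shorter than the requested header length")
    with open(headerless_path, "rb") as fp:
        body = fp.read()
    with open(out_path, "wb") as fp:
        fp.write(header + body)
    return out_path
```

For example: `prepend_header("main", r"main.exe_extracted\struct.pyc", "main.pyc")`.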
# PyInstaller with --key (encrypted)
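PyInstaller offers an encryption feature that can make reverse engineering somewhat harder. The --key option was removed in PyInstaller 6.0.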
PS C:\Users\春荠> pyinstaller.exe -h
...
--key KEY The key used to encrypt Python bytecode.
Here is a simple example to demonstrate it:
# main.py
import check
if __name__ == '__main__':
    check.check_this()
# check.py
def check_this():
    str = input("plz enter your input:")
    if str=="123123":
        print("right")
    else:
        print("wrong")
Then build it with:
PS D:\TRASHBIN\pyinstaller--key> pyinstaller.exe -F --key oacia --upx-dir "D:\TOOLS\Unpackers(程序脱壳机)\upx\upx394w\upx394w" .\main.py
This produces the main.exe executable.
Loaded into IDA, its main function looks roughly like this:
Next, unpack it with pyinstxtractor:
PS D:\TRASHBIN\pyinstaller--key\dist> python D:\TOOLS\python逆向\pyinstxtractor-master\pyinstxtractor-master\pyinstxtractor.py .\main.exe
The .\main.exe_extracted folder contains main.pyc, which is not encrypted:
PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted> uncompyle6.exe .\main.pyc
# uncompyle6 version 3.9.0
# Python bytecode version base 3.7.0 (3394)
# Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
# Embedded file name: main.py
import check
if __name__ == '__main__':
    check.check_this()
# okay decompiling .\main.pyc
However, inside .\main.exe_extracted\PYZ-00.pyz_extracted, every file is encrypted, which you can tell from the .encrypted suffix.
The source recovered from main.pyc shows that it imports the check module. The corresponding check.pyc.encrypted file is also in that folder.
The task now is to decrypt this file.
- Finding the key
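In the main.exe_extracted folder, find pyimod00_crypto_key.pyc. Decompiling it gives the key 00000000000oacia. Note that the value passed to --key, oacia, has been padded with leading zeros to 16 characters: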
PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted> uncompyle6.exe .\pyimod00_crypto_key.pyc
# uncompyle6 version 3.9.0
# Python bytecode version base 3.7.0 (3394)
# Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
# Embedded file name: build\main\pyimod00_crypto_key.py
key = '00000000000oacia'
# okay decompiling .\pyimod00_crypto_key.pyc
- Decrypting the pyc
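Decrypt it with the following code. It requires the tinyaes package, which PyInstaller uses for its AES encryption: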
import tinyaes  # pip install tinyaes
import zlib

CRYPT_BLOCK_SIZE = 16
# Key recovered from pyimod00_crypto_key.pyc
key = bytes('00000000000oacia', 'utf-8')

with open('check.pyc.encrypted', 'rb') as inf, open('check.pyc', 'wb') as outf:
    # The first 16 bytes of the encrypted file are the IV
    iv = inf.read(CRYPT_BLOCK_SIZE)
    cipher = tinyaes.AES(key, iv)
    # AES-CTR decrypt the rest, then zlib-decompress it to get the marshalled code object
    plaintext = zlib.decompress(cipher.CTR_xcrypt_buffer(inf.read()))
    # 16-byte pyc header for Python 3.7 (magic 3394); the first 16 bytes of struct.pyc also work
    outf.write(b'\x42\x0d\x0d\x0a' + b'\0' * 12)
    outf.write(plaintext)
This recovers the real source code:
PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted\PYZ-00.pyz_extracted> uncompyle6.exe .\check.pyc
# uncompyle6 version 3.9.0
# Python bytecode version base 3.7.0 (3394)
# Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
# Embedded file name: check.py
def check_this():
    str = input('plz enter your input:')
    if str == '123123':
        print('right')
    else:
        print('wrong')
# okay decompiling .\check.pyc
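For the key-finding step, the key can also be read from pyimod00_crypto_key.pyc directly, without going through uncompyle6. Below is a minimal sketch. It assumes the key module consists of a single `key = '...'` assignment, as in the decompiled output shown earlier, and that the script runs under the same Python version that built the pyc. `read_pyinstaller_key` is a hypothetical name. The code object is only inspected, never executed:

```python
import marshal


def read_pyinstaller_key(pyc_path, header_len=16):
    """Return the key string stored in pyimod00_crypto_key.pyc without executing it."""
    with open(pyc_path, "rb") as fp:
        data = fp.read()
    # marshal's format is version-specific, so the interpreter must match the pyc
    code = marshal.loads(data[header_len:])
    # Assumption: the module is just `key = '...'`, so the key is its only str constant
    strings = [c for c in code.co_consts if isinstance(c, str)]
    if "key" not in code.co_names or not strings:
        raise ValueError("unexpected layout for the key module")
    return strings[0]
```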
# pyc to py
This is the most critical step for getting at the source.
# uncompyle6
# Installation
pip install uncompyle6
# Usage
uncompyle6.exe -o <filename>.py <filename>.pyc
uncompyle6 decompiles bytecode from Python versions below 3.9 correctly. Its author has not added support for Python 3.9 and later, so for those versions use pycdc instead.
# pycdc
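# Installation
git clone https://github.com/zrax/pycdc.git
Building this tool requires CMake. Download the installer for your system from https://cmake.org/download/, install it, and build pycdc with it.
pycdc is used like this:
./pycdc.exe <filename>.pyc
The advantage of pycdc is that it can decompile pyc files from Python 3.9 and later. Support for the newest versions may be incomplete.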
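To decide which decompiler to try, read the pyc's magic number first. Below is a minimal sketch. It covers Python 3 only. The magic ranges are taken from the magic number table in the appendix (up to 3.11), and the cutoff follows the rule above: uncompyle6 up to 3.8, pycdc from 3.9. `pyc_version` and `suggest_decompiler` are hypothetical helper names:

```python
import struct

# Python 3 magic-word ranges, taken from the magic number table in the appendix
PY3_MAGIC_RANGES = [
    (3000, 3131, (3, 0)), (3141, 3151, (3, 1)), (3160, 3180, (3, 2)),
    (3190, 3230, (3, 3)), (3250, 3310, (3, 4)), (3320, 3351, (3, 5)),
    (3360, 3379, (3, 6)), (3390, 3394, (3, 7)), (3400, 3413, (3, 8)),
    (3420, 3425, (3, 9)), (3430, 3439, (3, 10)), (3450, 3495, (3, 11)),
]


def pyc_version(header):
    """Map the first 4 bytes of a Python 3 pyc to (major, minor), or None if unknown."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("not a pyc header")
    magic_word = struct.unpack("<H", header[:2])[0]
    for low, high, version in PY3_MAGIC_RANGES:
        if low <= magic_word <= high:
            return version
    return None  # Python 1/2 or newer than 3.11: not covered by this sketch


def suggest_decompiler(header):
    """Suggest uncompyle6 for Python 3.0-3.8 and pycdc for 3.9-3.11."""
    version = pyc_version(header)
    if version is None:
        return None
    return "uncompyle6" if version <= (3, 8) else "pycdc"
```

For example, `suggest_decompiler(open("main.pyc", "rb").read(4))` returns "uncompyle6" for the Python 3.7 files in the example above.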
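# Online tools
Of course, I think the most convenient option is the online site https://tool.lu/pyc: just drag the pyc file in and the py source appears. In offline competitions such as provincial rounds, where internet access is not allowed, stick with the offline tools from GitHub.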
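# pyc to bytecode
Use this when converting pyc to py fails.
The standard approach is shown below. Run it with the same Python version that produced the pyc, because the marshal format differs between versions: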
import marshal, dis

with open("simple.pyc", "rb") as fp:
    data = fp.read()
# Skip the header: 16 bytes for Python 3.7+, 12 for 3.3-3.6, 8 for Python 2 and 3.0-3.2
code = marshal.loads(data[16:])
dis.dis(code)
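Instead of hard-coding 16, the header size can be derived from the magic word. Below is a minimal sketch. The thresholds assume release builds: 3.7 release magics start at 3390 and 3.3 at 3190, though a few alpha builds differ. Python 1.x/2.x magics are all at least 20121 in the appendix table. `header_size` and `disassemble_pyc` are hypothetical names, and marshal still requires a matching interpreter:

```python
import dis
import marshal
import struct


def header_size(data):
    """Guess the pyc header length from the 2-byte magic word (release builds only)."""
    magic_word = struct.unpack("<H", data[:2])[0]
    if magic_word >= 20121:   # Python 1.x / 2.x: magic + mtime
        return 8
    if magic_word >= 3390:    # Python 3.7+: magic + flags + mtime/hash + size
        return 16
    if magic_word >= 3190:    # Python 3.3 - 3.6: magic + mtime + size
        return 12
    return 8                  # Python 3.0 - 3.2: magic + mtime


def disassemble_pyc(path):
    """Load a pyc's code object (same-version interpreter required) and disassemble it."""
    with open(path, "rb") as fp:
        data = fp.read()
    code = marshal.loads(data[header_size(data):])
    dis.dis(code)
    return code
```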
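Note that when reversing a pyc, uncompyle6.exe may fail to analyze it. Possible causes include:
- The Python version does not match the pyc version. Switch to the matching Python version.
- The pyc contains junk (obfuscation) instructions.
  In this case, delete the junk instructions and update the stored length of co_code to match. The new length can be computed in Python with len(code.co_code), minus the number of bytes you removed. In the example, the length field is the 10th and 11th bytes on the third row of the pyc hex view; see the position in the figure below.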
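The delete-and-fix-length step can also be done in a script rather than a hex editor. Below is a minimal sketch, with several assumptions. It handles the top-level code object only. marshal stores co_code as a type byte ('s' = 0x73, or 0xf3 with FLAG_REF), a 4-byte little-endian length, then the raw bytes. The interpreter must match the pyc version. Both helper names are hypothetical. Like the manual method, it does not fix jump targets or line tables, so it only suits junk whose removal does not shift offsets that other instructions rely on:

```python
import marshal
import struct


def locate_co_code(data, header_len=16):
    """Return (offset of co_code's 4-byte length field, co_code) for the top-level code object."""
    co_code = marshal.loads(data[header_len:]).co_code
    length = struct.pack("<I", len(co_code))
    # Top-level co_code is serialized before nested code objects, so the first match is it
    for type_byte in (b"\x73", b"\xf3"):
        pos = data.find(type_byte + length + co_code, header_len)
        if pos != -1:
            return pos + 1, co_code
    raise ValueError("co_code not found in file")


def remove_bytes_from_co_code(data, start, end, header_len=16):
    """Delete co_code[start:end] and rewrite the length field to the new size."""
    len_off, co_code = locate_co_code(data, header_len)
    if not 0 <= start <= end <= len(co_code):
        raise ValueError("range lies outside co_code")
    body_off = len_off + 4
    return (data[:len_off]
            + struct.pack("<I", len(co_code) - (end - start))
            + data[body_off:body_off + start]
            + data[body_off + end:])
```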
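Once the bytecode has been extracted, if following its logic directly is tedious, you can paste it into ChatGPT and have it translate the bytecode into readable Python code.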
# Appendix
# pyc structure
typedef struct {
    PyObject_HEAD
    int co_argcount;        /* number of positional arguments */
    int co_nlocals;         /* number of local variables */
    int co_stacksize;       /* stack size */
    int co_flags;
    PyObject *co_code;      /* bytecode instruction sequence */
    PyObject *co_consts;    /* all constants */
    PyObject *co_names;     /* all symbol names */
    PyObject *co_varnames;  /* local variable names */
    PyObject *co_freevars;  /* variable names used by closures */
    PyObject *co_cellvars;  /* variable names referenced by nested functions */
    /* The rest doesn't count for hash/cmp */
    PyObject *co_filename;  /* file the code lives in */
    PyObject *co_name;      /* module, function, or class name */
    int co_firstlineno;     /* first line of the code block in the file */
    PyObject *co_lnotab;    /* mapping between bytecode offsets and line numbers */
    void *co_zombieframe;   /* for optimization only (see frameobject.c) */
} PyCodeObject;
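Most fields of this struct are exposed as attributes on Python code objects, which is often quicker than reading them from a hex dump. Below is a minimal sketch. `describe_code` is a hypothetical helper. co_lnotab and co_zombieframe are left out: co_lnotab is deprecated in newer Python versions, and co_zombieframe is not exposed to Python:

```python
def describe_code(code):
    """Collect the PyCodeObject fields listed above from a Python code object."""
    return {
        "co_argcount": code.co_argcount,
        "co_nlocals": code.co_nlocals,
        "co_stacksize": code.co_stacksize,
        "co_flags": code.co_flags,
        "co_code": code.co_code,
        "co_consts": code.co_consts,
        "co_names": code.co_names,
        "co_varnames": code.co_varnames,
        "co_freevars": code.co_freevars,
        "co_cellvars": code.co_cellvars,
        "co_filename": code.co_filename,
        "co_name": code.co_name,
        "co_firstlineno": code.co_firstlineno,
    }
```

For a module code object loaded with marshal, nested function and class code objects appear inside co_consts and can be passed to the same helper.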
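- pyc file header (this example is a Python 2.7 pyc; magic 03f3 0d0a = 62211)
  - Bytes 1-4: 03f3 0d0a, the magic number identifying the Python version
  - Bytes 5-8: 0e6b 905d, the pyc modification time
- Marshalled PyCodeObject
  - Byte 9: 63, the TYPE_CODE marker, i.e. the character c (99, or 0x63), meaning a PyCodeObject follows
- PyCodeObject: global parameters
  - The next 4 bytes, 0x00 0000 00, are the code block's positional argument count co_argcount, here 0;
  - The next 4 bytes, 0x00 0000 00, are the code block's local variable count co_nlocals, here 0;
  - The next 4 bytes, 0x01 0000 00, are the stack space the code block needs, co_stacksize, here 1;
  - The next 4 bytes, 0x40 0000 00, are co_flags, here 64;
- PyCodeObject: code block
  - 1 byte, 0x73, is the TYPE_STRING marker (the character s), meaning this field is a string;
  - 4 bytes, 0x1a00 0000, give the length of the code block's data: 0x1a bytes, i.e. 26;
  - The next 26 bytes, 6400 … 6402 0053, are the contents of this string field, i.e. the bytecode instructions contained in the pyc file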
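The walkthrough above can be checked with a small parser. Below is a minimal sketch. It assumes a Python 2.7 pyc laid out exactly as described: an 8-byte header, then a code object whose co_code is a TYPE_STRING. `parse_py27_pyc_prefix` is a hypothetical name, and it decodes only the fields listed above:

```python
import struct


def parse_py27_pyc_prefix(data):
    """Decode the header and leading code-object fields of a Python 2.7 pyc."""
    magic_word, crlf, mtime = struct.unpack_from("<H2sI", data, 0)  # bytes 1-8
    if crlf != b"\r\n":
        raise ValueError("bad magic number")
    if data[8:9] != b"c":  # byte 9: TYPE_CODE (0x63)
        raise ValueError("expected TYPE_CODE at offset 8")
    # Four little-endian 32-bit ints: co_argcount, co_nlocals, co_stacksize, co_flags
    argcount, nlocals, stacksize, flags = struct.unpack_from("<4I", data, 9)
    if data[25:26] != b"s":  # TYPE_STRING (0x73) holding co_code
        raise ValueError("expected TYPE_STRING at offset 25")
    (code_len,) = struct.unpack_from("<I", data, 26)
    return {
        "magic": magic_word, "mtime": mtime,
        "argcount": argcount, "nlocals": nlocals,
        "stacksize": stacksize, "flags": flags,
        "code_len": code_len, "co_code": data[30:30 + code_len],
    }
```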
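# Opcode meanings
The meaning of each opcode can be found in <Python install path>/include/opcode.h. The listing below is from Python 3.7; opcode numbers change between Python versions.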
/* Auto-generated by Tools/scripts/generate_opcode_h.py */
#ifndef Py_OPCODE_H
#define Py_OPCODE_H
#ifdef __cplusplus
extern "C" {
#endif
/* Instruction opcodes for compiled code */
#define POP_TOP 1
#define ROT_TWO 2
#define ROT_THREE 3
#define DUP_TOP 4
#define DUP_TOP_TWO 5
#define NOP 9
#define UNARY_POSITIVE 10
#define UNARY_NEGATIVE 11
#define UNARY_NOT 12
#define UNARY_INVERT 15
#define BINARY_MATRIX_MULTIPLY 16
#define INPLACE_MATRIX_MULTIPLY 17
#define BINARY_POWER 19
#define BINARY_MULTIPLY 20
#define BINARY_MODULO 22
#define BINARY_ADD 23
#define BINARY_SUBTRACT 24
#define BINARY_SUBSCR 25
#define BINARY_FLOOR_DIVIDE 26
#define BINARY_TRUE_DIVIDE 27
#define INPLACE_FLOOR_DIVIDE 28
#define INPLACE_TRUE_DIVIDE 29
#define GET_AITER 50
#define GET_ANEXT 51
#define BEFORE_ASYNC_WITH 52
#define INPLACE_ADD 55
#define INPLACE_SUBTRACT 56
#define INPLACE_MULTIPLY 57
#define INPLACE_MODULO 59
#define STORE_SUBSCR 60
#define DELETE_SUBSCR 61
#define BINARY_LSHIFT 62
#define BINARY_RSHIFT 63
#define BINARY_AND 64
#define BINARY_XOR 65
#define BINARY_OR 66
#define INPLACE_POWER 67
#define GET_ITER 68
#define GET_YIELD_FROM_ITER 69
#define PRINT_EXPR 70
#define LOAD_BUILD_CLASS 71
#define YIELD_FROM 72
#define GET_AWAITABLE 73
#define INPLACE_LSHIFT 75
#define INPLACE_RSHIFT 76
#define INPLACE_AND 77
#define INPLACE_XOR 78
#define INPLACE_OR 79
#define BREAK_LOOP 80
#define WITH_CLEANUP_START 81
#define WITH_CLEANUP_FINISH 82
#define RETURN_VALUE 83
#define IMPORT_STAR 84
#define SETUP_ANNOTATIONS 85
#define YIELD_VALUE 86
#define POP_BLOCK 87
#define END_FINALLY 88
#define POP_EXCEPT 89
#define HAVE_ARGUMENT 90
#define STORE_NAME 90
#define DELETE_NAME 91
#define UNPACK_SEQUENCE 92
#define FOR_ITER 93
#define UNPACK_EX 94
#define STORE_ATTR 95
#define DELETE_ATTR 96
#define STORE_GLOBAL 97
#define DELETE_GLOBAL 98
#define LOAD_CONST 100
#define LOAD_NAME 101
#define BUILD_TUPLE 102
#define BUILD_LIST 103
#define BUILD_SET 104
#define BUILD_MAP 105
#define LOAD_ATTR 106
#define COMPARE_OP 107
#define IMPORT_NAME 108
#define IMPORT_FROM 109
#define JUMP_FORWARD 110
#define JUMP_IF_FALSE_OR_POP 111
#define JUMP_IF_TRUE_OR_POP 112
#define JUMP_ABSOLUTE 113
#define POP_JUMP_IF_FALSE 114
#define POP_JUMP_IF_TRUE 115
#define LOAD_GLOBAL 116
#define CONTINUE_LOOP 119
#define SETUP_LOOP 120
#define SETUP_EXCEPT 121
#define SETUP_FINALLY 122
#define LOAD_FAST 124
#define STORE_FAST 125
#define DELETE_FAST 126
#define RAISE_VARARGS 130
#define CALL_FUNCTION 131
#define MAKE_FUNCTION 132
#define BUILD_SLICE 133
#define LOAD_CLOSURE 135
#define LOAD_DEREF 136
#define STORE_DEREF 137
#define DELETE_DEREF 138
#define CALL_FUNCTION_KW 141
#define CALL_FUNCTION_EX 142
#define SETUP_WITH 143
#define EXTENDED_ARG 144
#define LIST_APPEND 145
#define SET_ADD 146
#define MAP_ADD 147
#define LOAD_CLASSDEREF 148
#define BUILD_LIST_UNPACK 149
#define BUILD_MAP_UNPACK 150
#define BUILD_MAP_UNPACK_WITH_CALL 151
#define BUILD_TUPLE_UNPACK 152
#define BUILD_SET_UNPACK 153
#define SETUP_ASYNC_WITH 154
#define FORMAT_VALUE 155
#define BUILD_CONST_KEY_MAP 156
#define BUILD_STRING 157
#define BUILD_TUPLE_UNPACK_WITH_CALL 158
#define LOAD_METHOD 160
#define CALL_METHOD 161
/* EXCEPT_HANDLER is a special, implicit block type which is created when
   entering an except handler. It is not an opcode but we define it here
   as we want it to be available to both frameobject.c and ceval.c, while
   remaining private.*/
#define EXCEPT_HANDLER 257
enum cmp_op {PyCmp_LT=Py_LT, PyCmp_LE=Py_LE, PyCmp_EQ=Py_EQ, PyCmp_NE=Py_NE,
             PyCmp_GT=Py_GT, PyCmp_GE=Py_GE, PyCmp_IN, PyCmp_NOT_IN,
             PyCmp_IS, PyCmp_IS_NOT, PyCmp_EXC_MATCH, PyCmp_BAD};
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
#ifdef __cplusplus
}
#endif
#endif /* !Py_OPCODE_H */
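Sometimes marshal cannot load a pyc, for example because it comes from a different Python version. In that case the raw co_code bytes can still be decoded against that version's opcode table. Below is a minimal sketch. It assumes Python 3.6 to 3.10 style wordcode, where every instruction is 2 bytes: opcode, then argument. The defaults `have_argument=90` and `extended_arg=144` come from the 3.7 table above, and `decode_wordcode` is a hypothetical name. It does not handle the inline CACHE entries that Python 3.11+ inserts, nor Python 2's variable-length instructions:

```python
def decode_wordcode(raw, opnames, have_argument=90, extended_arg=144):
    """Decode wordcode into (offset, opname, arg) tuples; arg is None for argument-less ops."""
    result = []
    ext = 0  # accumulated high bits from preceding EXTENDED_ARG instructions
    for offset in range(0, len(raw) - 1, 2):
        op, arg = raw[offset], raw[offset + 1]
        if op == extended_arg:
            ext = (ext | arg) << 8
            continue
        full_arg = ext | arg
        ext = 0
        name = opnames.get(op, "<%d>" % op)
        result.append((offset, name, full_arg if op >= have_argument else None))
    return result
```

The `opnames` dict can be built by hand from the `#define` lines above, e.g. `{100: "LOAD_CONST", 83: "RETURN_VALUE"}`. For pycs that match the running interpreter, `dis.opname` already provides the table.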
# pyc magic number table
# copied from https://github.com/google/pytype/blob/main/pytype/pyc/magic.py
"""Python version numbers and their encoding ("magic number")."""
import struct
# These constants are from Python-3.x.x/Lib/importlib/_bootstrap_external.py
PYTHON_MAGIC = {
    # Python 1
    20121: (1, 5),
    50428: (1, 6),
    # Python 2
    50823: (2, 0),
    60202: (2, 1),
    60717: (2, 2),
    62011: (2, 3),  # a0
    62021: (2, 3),  # a0
    62041: (2, 4),  # a0
    62051: (2, 4),  # a3
    62061: (2, 4),  # b1
    62071: (2, 5),  # a0
    62081: (2, 5),  # a0
    62091: (2, 5),  # a0
    62092: (2, 5),  # a0
    62101: (2, 5),  # b3
    62111: (2, 5),  # b3
    62121: (2, 5),  # c1
    62131: (2, 5),  # c2
    62151: (2, 6),  # a0
    62161: (2, 6),  # a1
    62171: (2, 7),  # a0
    62181: (2, 7),  # a0
    62191: (2, 7),  # a0
    62201: (2, 7),  # a0
    62211: (2, 7),  # a0
    # Python 3
    3000: (3, 0),
    3010: (3, 0),
    3020: (3, 0),
    3030: (3, 0),
    3040: (3, 0),
    3050: (3, 0),
    3060: (3, 0),
    3061: (3, 0),
    3071: (3, 0),
    3081: (3, 0),
    3091: (3, 0),
    3101: (3, 0),
    3103: (3, 0),
    3111: (3, 0),  # a4
    3131: (3, 0),  # a5
    # Python 3.1
    3141: (3, 1),  # a0
    3151: (3, 1),  # a0
    # Python 3.2
    3160: (3, 2),  # a0
    3170: (3, 2),  # a1
    3180: (3, 2),  # a2
    # Python 3.3
    3190: (3, 3),  # a0
    3200: (3, 3),  # a0
    3220: (3, 3),  # a1
    3230: (3, 3),  # a4
    # Python 3.4
    3250: (3, 4),  # a1
    3260: (3, 4),  # a1
    3270: (3, 4),  # a1
    3280: (3, 4),  # a1
    3290: (3, 4),  # a4
    3300: (3, 4),  # a4
    3310: (3, 4),  # rc2
    # Python 3.5
    3320: (3, 5),  # a0
    3330: (3, 5),  # b1
    3340: (3, 5),  # b2
    3350: (3, 5),  # b2
    3351: (3, 5),  # 3.5.2
    # Python 3.6
    3360: (3, 6),  # a0
    3361: (3, 6),  # a0
    3370: (3, 6),  # a1
    3371: (3, 6),  # a1
    3372: (3, 6),  # a1
    3373: (3, 6),  # b1
    3375: (3, 6),  # b1
    3376: (3, 6),  # b1
    3377: (3, 6),  # b1
    3378: (3, 6),  # b2
    3379: (3, 6),  # rc1
    # Python 3.7
    3390: (3, 7),  # a1
    3391: (3, 7),  # a2
    3392: (3, 7),  # a4
    3393: (3, 7),  # b1
    3394: (3, 7),  # b5
    # Python 3.8
    3400: (3, 8),  # a1
    3401: (3, 8),  # a1
    3410: (3, 8),  # a1
    3411: (3, 8),  # b2
    3412: (3, 8),  # b2
    3413: (3, 8),  # b4
    # Python 3.9
    3420: (3, 9),  # a0
    3421: (3, 9),  # a0
    3422: (3, 9),  # a0
    3423: (3, 9),  # a2
    3424: (3, 9),  # a2
    3425: (3, 9),  # a2
    # Python 3.10
    3430: (3, 10),  # a1
    3431: (3, 10),  # a1
    3432: (3, 10),  # a2
    3433: (3, 10),  # a2
    3434: (3, 10),  # a6
    3435: (3, 10),  # a7
    3436: (3, 10),  # b1
    3437: (3, 10),  # b1
    3438: (3, 10),  # b1
    3439: (3, 10),  # b1
    # Python 3.11
    3450: (3, 11),  # a1
    3451: (3, 11),  # a1
    3452: (3, 11),  # a1
    3453: (3, 11),  # a1
    3454: (3, 11),  # a1
    3455: (3, 11),  # a1
    3456: (3, 11),  # a1
    3457: (3, 11),  # a1
    3458: (3, 11),  # a1
    3459: (3, 11),  # a1
    3460: (3, 11),  # a1
    3461: (3, 11),  # a1
    3462: (3, 11),  # a2
    3463: (3, 11),  # a3
    3464: (3, 11),  # a3
    3465: (3, 11),  # a3
    3466: (3, 11),  # a4
    3467: (3, 11),  # a4
    3468: (3, 11),  # a4
    3469: (3, 11),  # a4
    3470: (3, 11),  # a4
    3471: (3, 11),  # a4
    3472: (3, 11),  # a4
    3473: (3, 11),  # a4
    3474: (3, 11),  # a4
    3475: (3, 11),  # a5
    3476: (3, 11),  # a5
    3477: (3, 11),  # a5
    3478: (3, 11),  # a5
    3479: (3, 11),  # a5
    3480: (3, 11),  # a5
    3481: (3, 11),  # a5
    3482: (3, 11),  # a5
    3483: (3, 11),  # a5
    3484: (3, 11),  # a5
    3485: (3, 11),  # a5
    3486: (3, 11),  # a6
    3487: (3, 11),  # a6
    3488: (3, 11),  # a6
    3489: (3, 11),  # a6
    3490: (3, 11),  # a6
    3491: (3, 11),  # a6
    3492: (3, 11),  # a7
    3493: (3, 11),  # a7
    3494: (3, 11),  # a7
    3495: (3, 11),  # b4
}
def magic_word_to_version(magic_word):
    """Return the Python version belonging to the magic number in the pyc head.
    The magic number is encoded in the first two bytes of a .pyc file.
    It translates to a (major, minor) version. It never has a "micro" version,
    because Python bytecode encoding doesn't change between micro version.
    Arguments:
      magic_word: A 16 bit number, either as an integer or little-endian encoded
        as a string.
    Returns:
      A tuple (major, minor), e.g. (3, 7).
    """
    if not isinstance(magic_word, int):
        magic_word = struct.unpack("<H", magic_word)[0]
    return PYTHON_MAGIC[magic_word]
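The table above maps magic words to versions. You can also go the other way and build a 16-byte Python 3.7+ header for a given magic word instead of copying one from struct.pyc. For example, 3394 (Python 3.7) yields the b'\x42\x0d\x0d\x0a' prefix used in the decryption script earlier. Below is a minimal sketch. `make_pyc_header` is a hypothetical helper, and the timestamp/size fields are left as zeros, as in that script:

```python
import importlib.util
import struct


def make_pyc_header(magic_word=None, flags=0):
    """Build a 16-byte Python 3.7+ pyc header: magic(4) + flags(4) + 8 zeroed bytes."""
    if magic_word is None:
        magic = importlib.util.MAGIC_NUMBER  # the running interpreter's 4-byte magic
    else:
        magic = struct.pack("<H", magic_word) + b"\r\n"  # 2-byte LE word + b'\r\n'
    # flags = 0 means a timestamp-based pyc; mtime and source size are zeroed (assumption)
    return magic + struct.pack("<I", flags) + bytes(8)
```

For example, `make_pyc_header(3394) + plaintext` gives the same bytes the decryption script writes for Python 3.7.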