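The key to reversing a Python program is recovering its Python source. Going at the executable with IDA makes the job far harder, so picking the right tools and methods saves a lot of effort.

The Python reversing workflow is: test.exe -> test.pyc -> test.py

Below I go through the tools used at each step, along with the problems I have run into so far and how I solved them.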

# exe to pyc

The tool used here is pyinstxtractor.

The Python version on your system must match the Python version used to build the exe; otherwise the files in <filename>_extracted/PYZ-00.pyz_extracted come out empty.

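# pyinstaller without the --key option (unencrypted)

Usage: python pyinstxtractor.py <filename>

A <filename>_extracted folder then appears in the current directory. Go into it, find <filename>.pyc, and move on to the next step.

If there is no <filename>.pyc but there is a file named <filename>, open <filename> in 010 Editor, then open struct.pyc from <filename>_extracted in 010 Editor as well, insert its first 16 bytes at the start of <filename>, and rename the result with a .pyc extension. (16 bytes is the header length for Python 3.7 and later; Python 3.3 to 3.6 use 12 bytes and Python 2 uses 8.)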
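The hex-editor step above can also be scripted. The following is a minimal sketch, assuming the header-less entry file and struct.pyc sit in the same extracted folder; the function name prepend_pyc_header and its parameters are my own, not part of pyinstxtractor.

```python
from pathlib import Path


def prepend_pyc_header(entry_path, struct_pyc_path, out_path, header_len=16):
    """Write out_path as: first header_len bytes of struct.pyc + the header-less entry file.

    header_len=16 assumes Python 3.7+; pass 12 for Python 3.3-3.6 or 8 for Python 2.
    """
    header = Path(struct_pyc_path).read_bytes()[:header_len]
    if len(header) != header_len:
        raise ValueError("struct.pyc is shorter than the requested header length")
    body = Path(entry_path).read_bytes()
    Path(out_path).write_bytes(header + body)
    return out_path
```

Called from inside the extracted folder, prepend_pyc_header("main", "struct.pyc", "main.pyc") would produce a file that a decompiler can open.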

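# pyinstaller with the --key option (encrypted)

pyinstaller offers an encryption feature that makes reversing a little harder. (The --key option was removed in PyInstaller 6.0, so this section applies to executables built with older versions.)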

PS C:\Users\春荠> pyinstaller.exe -h
 ...
 --key KEY             The key used to encrypt Python bytecode.

Here is a simple example to demonstrate.

# main.py
import check
if __name__ == '__main__':
    check.check_this()
# check.py
def check_this():
    str = input("plz enter your input:")
    if str=="123123":
        print("right")
    else:
        print("wrong")

Then run the command

PS D:\TRASHBIN\pyinstaller--key> pyinstaller.exe -F --key oacia --upx-dir "D:\TOOLS\Unpackers(程序脱壳机)\upx\upx394w\upx394w" .\main.py

which produces the executable main.exe.

Loaded into IDA, the main function looks something like this:

[Image: image-20230417085210697]

Next, unpack it with pyinstxtractor:

PS D:\TRASHBIN\pyinstaller--key\dist> python D:\TOOLS\python逆向\pyinstxtractor-master\pyinstxtractor-master\pyinstxtractor.py .\main.exe

Inside the .\main.exe_extracted folder there is a main.pyc file, which is not encrypted:

PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted> uncompyle6.exe .\main.pyc
# uncompyle6 version 3.9.0
# Python bytecode version base 3.7.0 (3394)
# Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
# Embedded file name: main.py
import check
if __name__ == '__main__':
    check.check_this()
# okay decompiling .\main.pyc

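However, inside .\main.exe_extracted\PYZ-00.pyz_extracted every file is encrypted, as the .encrypted extension shows.

The source recovered from main.pyc imports the check module, and the matching file check.pyc.encrypted is there too.

[Image: image-20230417085900019]

What remains is to decrypt this file.

  • Find the key
    In the main.exe_extracted folder, find pyimod00_crypto_key.pyc. Decompiling it gives the key 00000000000oacia, i.e. the oacia passed to --key, left-padded with zeros to 16 characters.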

    PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted> uncompyle6.exe .\pyimod00_crypto_key.pyc
    # uncompyle6 version 3.9.0
    # Python bytecode version base 3.7.0 (3394)
    # Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
    # Embedded file name: build\main\pyimod00_crypto_key.py
    key = '00000000000oacia'
    # okay decompiling .\pyimod00_crypto_key.pyc
  • Decrypt the pyc

    Decrypt it with the following code:

    import zlib

    import tinyaes  # pip install tinyaes

    CRYPT_BLOCK_SIZE = 16
    # Key recovered from pyimod00_crypto_key.pyc
    key = bytes('00000000000oacia', 'utf-8')

    with open('check.pyc.encrypted', 'rb') as inf:  # encrypted input file
        iv = inf.read(CRYPT_BLOCK_SIZE)  # the first block is the IV
        cipher = tinyaes.AES(key, iv)
        plaintext = zlib.decompress(cipher.CTR_xcrypt_buffer(inf.read()))

    with open('check.pyc', 'wb') as outf:  # decrypted output file
        # pyc header for Python 3.7 (magic 3394); the first 16 bytes of struct.pyc work too
        outf.write(b'\x42\x0d\x0d\x0a\0\0\0\0\0\0\0\0\0\0\0\0')
        outf.write(plaintext)
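If no decompiler is at hand, the key from the first step can probably also be read straight out of the constants of pyimod00_crypto_key.pyc. This is a small sketch rather than part of the original workflow: string_constants is a name I made up, and marshal only loads data produced by the same Python version, so it has to run under the interpreter version that built the exe.

```python
import marshal


def string_constants(pyc_bytes, header_len=16):
    """Return the string constants of the module-level code object in a pyc.

    header_len=16 assumes a Python 3.7+ pyc header.
    """
    code = marshal.loads(pyc_bytes[header_len:])
    return [const for const in code.co_consts if isinstance(const, str)]
```

For the example above, string_constants(open('pyimod00_crypto_key.pyc', 'rb').read()) should list the 16-character key among its results.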

After that, we get the real source:

PS D:\TRASHBIN\pyinstaller--key\dist\main.exe_extracted\PYZ-00.pyz_extracted> uncompyle6.exe .\check.pyc
# uncompyle6 version 3.9.0
# Python bytecode version base 3.7.0 (3394)
# Decompiled from: Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
# Embedded file name: check.py
def check_this():
    str = input('plz enter your input:')
    if str == '123123':
        print('right')
    else:
        print('wrong')
# okay decompiling .\check.pyc

# pyc to py

This is the most important step in getting to the source.

# uncompyle6

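# Install
pip install uncompyle6
# Usage
uncompyle6.exe -o <filename>.py <filename>.pyc

uncompyle6 decompiles bytecode from versions below Python 3.9 without trouble, but its author has not added support for Python 3.9 and later. For those versions, pycdc can do the job.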

# pycdc

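# Install
git clone https://github.com/zrax/pycdc.git

Building this tool requires cmake. Download the installer for your system from https://cmake.org/download/, install it, and then build pycdc from the cloned source.

pycdc is used like this: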

./pycdc.exe <filename>.pyc

The advantage of pycdc is that it can decompile pyc files from Python 3.9 and later.
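The choice between the two tools can be read off the pyc magic number. Here is a rough sketch of that rule of thumb; suggest_decompiler is a hypothetical helper, and the 3420 threshold is the first Python 3.9 entry in the magic number table in the appendix.

```python
import struct


def suggest_decompiler(pyc_bytes):
    """Suggest a tool from the 16-bit magic number at the start of a pyc."""
    magic = struct.unpack("<H", pyc_bytes[:2])[0]
    # Python 1.x/2.x magic numbers are 20121 and above; Python 3.9 starts at 3420.
    if magic >= 20000 or magic < 3420:
        return "uncompyle6"
    return "pycdc"
```

It only encodes the version split described above and does not check whether either tool actually handles a given file.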

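# Online tools

That said, I find the most convenient option is an online tool: https://tool.lu/pyc. Drag the pyc file in and the py source appears. For contests with an annoying no-internet format, such as provincial competitions, just stick with the offline tools from GitHub.

# pyc to bytecode

This is the fallback when converting pyc to py fails.

The standard pattern is as follows: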

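import dis
import marshal

with open("simple.pyc", "rb") as f:
    data = f.read()
# Skip the pyc header: 16 bytes for Python 3.7+, 12 for Python 3.3-3.6, 8 for Python 2.
# Run this under the same Python version that produced the pyc.
code = marshal.loads(data[16:])
dis.dis(code)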
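To avoid hard-coding the offset, the header length can be derived from the magic number. The sketch below is one way to do it; header_size and load_code are names of my own, and the 3210 and 3392 thresholds reflect my reading of CPython's magic-number history, so pre-release builds near those boundaries may behave differently.

```python
import importlib.util
import marshal
import struct


def header_size(magic):
    """Guess the pyc header length from the first bytes of the file."""
    number = struct.unpack("<H", magic[:2])[0]
    if number >= 20000:  # Python 1.5 - 2.7: magic + mtime
        return 8
    if number >= 3392:   # Python 3.7+: magic + flags + mtime/hash (PEP 552)
        return 16
    if number >= 3210:   # Python 3.3 - 3.6: magic + mtime + source size
        return 12
    return 8             # Python 3.0 - 3.2


def load_code(pyc_bytes):
    """Unmarshal the code object, refusing pycs from another Python version."""
    magic = pyc_bytes[:4]
    if magic != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was built by a different Python version; "
                         "run this under the matching interpreter")
    return marshal.loads(pyc_bytes[header_size(magic):])
```

The result of load_code(data) can be passed to dis.dis exactly as in the snippet above.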

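Note that uncompyle6.exe may fail to analyze a pyc while you are reversing it. Possible causes include:

  • The Python version does not match the pyc version. Switch to the matching Python version.
  • The pyc contains junk instructions.
    In this case, delete the junk instructions and update the stored length of co_code. The new length can be computed in Python with len(code.co_code). In the hex view the length field is on the third row, at the tenth and eleventh bytes; see the position marked in the figure below.
    [Image: image-20230315185854135]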
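An alternative to editing the length field by hand is to overwrite the junk bytes with NOP instructions, which keeps co_code the same length and leaves jump offsets untouched. Note that this is a different technique from the deletion described above, shown only as a sketch: nop_out is a hypothetical helper, it needs Python 3.8+ for code.replace, and it assumes the 2-byte instruction format used since Python 3.6.

```python
import dis

NOP = dis.opmap["NOP"]


def nop_out(code, start, end):
    """Return a copy of code with the bytes in [start, end) replaced by NOPs."""
    if start % 2 or end % 2 or not 0 <= start <= end <= len(code.co_code):
        raise ValueError("start and end must be even offsets inside co_code")
    raw = bytearray(code.co_code)
    for offset in range(start, end, 2):
        raw[offset] = NOP      # opcode byte
        raw[offset + 1] = 0    # argument byte
    return code.replace(co_code=bytes(raw))
```

Writing the original header followed by marshal.dumps(patched) then gives a pyc whose length fields are consistent without any manual hex editing.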

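Once the bytecode has been extracted, if following its logic by hand feels tedious, paste it into ChatGPT and ask it to turn the bytecode into readable Python code.

# Appendix

# pyc structure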

typedef struct {
    PyObject_HEAD
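    int co_argcount;        /* number of positional arguments */
    int co_nlocals;         /* number of local variables */
    int co_stacksize;       /* required stack size */
    int co_flags;
    PyObject *co_code;      /* bytecode instruction sequence */
    PyObject *co_consts;    /* all constants */
    PyObject *co_names;     /* all symbol names */
    PyObject *co_varnames;  /* local variable names */
    PyObject *co_freevars;  /* names of variables used by closures */
    PyObject *co_cellvars;  /* names of variables referenced by nested functions */
    /* The rest doesn't count for hash/cmp */
    PyObject *co_filename;  /* file the code came from */
    PyObject *co_name;      /* module name | function name | class name */
    int co_firstlineno;     /* first line number of the code block in the file */
    PyObject *co_lnotab;    /* mapping between bytecode instructions and line numbers */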
    void *co_zombieframe;   /* for optimization only (see frameobject.c) */
} PyCodeObject;

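[Image: image-20230315190405729]

The byte values below come from a Python 2.7 sample, whose pyc header is 8 bytes long.

  • pyc file header
    • First 4 bytes: 03 f3 0d 0a, the magic number identifying the Python version (0xf303 = 62211, i.e. Python 2.7)
    • Bytes 5-8: 0e 6b 90 5d, the modification time of the source file the pyc was compiled from
  • Marshalled PyCodeObject
    • Byte 9: 63, the type tag TYPE_CODE, i.e. the character c (value 99, 0x63), indicating that a PyCodeObject follows
  • PyCodeObject: global parameters
    • The next 4 bytes are 00 00 00 00: the number of positional arguments of the code block, co_argcount, here 0;
    • The next 4 bytes are 00 00 00 00: the number of local variables in the code block, co_nlocals, here 0;
    • The next 4 bytes are 01 00 00 00: the stack space the code block needs, co_stacksize, here 1;
    • The next 4 bytes are 40 00 00 00: co_flags, here 64;
  • PyCodeObject: code block
    • 1 byte, 0x73 (the character s), is the type tag TYPE_STRING, indicating that the field is a string;
    • 4 bytes, 1a 00 00 00, give the size of the code block's data, 0x1a bytes, i.e. a length of 26;
    • The next 26 bytes, 6400 … 6402 0053, are the string data itself, i.e. the bytecode instructions contained in the pyc file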
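The layout walked through above can be checked with a few struct calls. This sketch only handles the Python 2 style prefix described in the list (8-byte header, then the start of the marshalled code object); parse_py2_pyc_prefix and its dictionary keys are names I chose for illustration.

```python
import struct


def parse_py2_pyc_prefix(data):
    """Decode the header and the leading code-object fields of a Python 2 pyc."""
    magic, mtime = struct.unpack_from("<H2xI", data, 0)   # bytes 0-7
    type_code = data[8:9]                                 # b"c" for TYPE_CODE
    argcount, nlocals, stacksize, flags = struct.unpack_from("<4i", data, 9)
    string_tag = data[25:26]                              # b"s" for TYPE_STRING
    (code_len,) = struct.unpack_from("<i", data, 26)
    bytecode = data[30:30 + code_len]
    return {
        "magic": magic, "mtime": mtime, "type_code": type_code,
        "argcount": argcount, "nlocals": nlocals,
        "stacksize": stacksize, "flags": flags,
        "string_tag": string_tag, "bytecode": bytecode,
    }
```

All integers are little-endian, which is why 01 00 00 00 reads as 1 and 1a 00 00 00 as 26.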

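# Bytecode meanings

The opcode definitions for pyc bytecode are in <Python install path>/include/opcode.h. The listing below is from Python 3.7; opcode numbers change between Python versions.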

/* Auto-generated by Tools/scripts/generate_opcode_h.py */
#ifndef Py_OPCODE_H
#define Py_OPCODE_H
#ifdef __cplusplus
extern "C" {
#endif
    /* Instruction opcodes for compiled code */
#define POP_TOP                   1
#define ROT_TWO                   2
#define ROT_THREE                 3
#define DUP_TOP                   4
#define DUP_TOP_TWO               5
#define NOP                       9
#define UNARY_POSITIVE           10
#define UNARY_NEGATIVE           11
#define UNARY_NOT                12
#define UNARY_INVERT             15
#define BINARY_MATRIX_MULTIPLY   16
#define INPLACE_MATRIX_MULTIPLY  17
#define BINARY_POWER             19
#define BINARY_MULTIPLY          20
#define BINARY_MODULO            22
#define BINARY_ADD               23
#define BINARY_SUBTRACT          24
#define BINARY_SUBSCR            25
#define BINARY_FLOOR_DIVIDE      26
#define BINARY_TRUE_DIVIDE       27
#define INPLACE_FLOOR_DIVIDE     28
#define INPLACE_TRUE_DIVIDE      29
#define GET_AITER                50
#define GET_ANEXT                51
#define BEFORE_ASYNC_WITH        52
#define INPLACE_ADD              55
#define INPLACE_SUBTRACT         56
#define INPLACE_MULTIPLY         57
#define INPLACE_MODULO           59
#define STORE_SUBSCR             60
#define DELETE_SUBSCR            61
#define BINARY_LSHIFT            62
#define BINARY_RSHIFT            63
#define BINARY_AND               64
#define BINARY_XOR               65
#define BINARY_OR                66
#define INPLACE_POWER            67
#define GET_ITER                 68
#define GET_YIELD_FROM_ITER      69
#define PRINT_EXPR               70
#define LOAD_BUILD_CLASS         71
#define YIELD_FROM               72
#define GET_AWAITABLE            73
#define INPLACE_LSHIFT           75
#define INPLACE_RSHIFT           76
#define INPLACE_AND              77
#define INPLACE_XOR              78
#define INPLACE_OR               79
#define BREAK_LOOP               80
#define WITH_CLEANUP_START       81
#define WITH_CLEANUP_FINISH      82
#define RETURN_VALUE             83
#define IMPORT_STAR              84
#define SETUP_ANNOTATIONS        85
#define YIELD_VALUE              86
#define POP_BLOCK                87
#define END_FINALLY              88
#define POP_EXCEPT               89
#define HAVE_ARGUMENT            90
#define STORE_NAME               90
#define DELETE_NAME              91
#define UNPACK_SEQUENCE          92
#define FOR_ITER                 93
#define UNPACK_EX                94
#define STORE_ATTR               95
#define DELETE_ATTR              96
#define STORE_GLOBAL             97
#define DELETE_GLOBAL            98
#define LOAD_CONST              100
#define LOAD_NAME               101
#define BUILD_TUPLE             102
#define BUILD_LIST              103
#define BUILD_SET               104
#define BUILD_MAP               105
#define LOAD_ATTR               106
#define COMPARE_OP              107
#define IMPORT_NAME             108
#define IMPORT_FROM             109
#define JUMP_FORWARD            110
#define JUMP_IF_FALSE_OR_POP    111
#define JUMP_IF_TRUE_OR_POP     112
#define JUMP_ABSOLUTE           113
#define POP_JUMP_IF_FALSE       114
#define POP_JUMP_IF_TRUE        115
#define LOAD_GLOBAL             116
#define CONTINUE_LOOP           119
#define SETUP_LOOP              120
#define SETUP_EXCEPT            121
#define SETUP_FINALLY           122
#define LOAD_FAST               124
#define STORE_FAST              125
#define DELETE_FAST             126
#define RAISE_VARARGS           130
#define CALL_FUNCTION           131
#define MAKE_FUNCTION           132
#define BUILD_SLICE             133
#define LOAD_CLOSURE            135
#define LOAD_DEREF              136
#define STORE_DEREF             137
#define DELETE_DEREF            138
#define CALL_FUNCTION_KW        141
#define CALL_FUNCTION_EX        142
#define SETUP_WITH              143
#define EXTENDED_ARG            144
#define LIST_APPEND             145
#define SET_ADD                 146
#define MAP_ADD                 147
#define LOAD_CLASSDEREF         148
#define BUILD_LIST_UNPACK       149
#define BUILD_MAP_UNPACK        150
#define BUILD_MAP_UNPACK_WITH_CALL 151
#define BUILD_TUPLE_UNPACK      152
#define BUILD_SET_UNPACK        153
#define SETUP_ASYNC_WITH        154
#define FORMAT_VALUE            155
#define BUILD_CONST_KEY_MAP     156
#define BUILD_STRING            157
#define BUILD_TUPLE_UNPACK_WITH_CALL 158
#define LOAD_METHOD             160
#define CALL_METHOD             161
/* EXCEPT_HANDLER is a special, implicit block type which is created when
   entering an except handler. It is not an opcode but we define it here
   as we want it to be available to both frameobject.c and ceval.c, while
   remaining private.*/
#define EXCEPT_HANDLER 257
enum cmp_op {PyCmp_LT=Py_LT, PyCmp_LE=Py_LE, PyCmp_EQ=Py_EQ, PyCmp_NE=Py_NE,
                PyCmp_GT=Py_GT, PyCmp_GE=Py_GE, PyCmp_IN, PyCmp_NOT_IN,
                PyCmp_IS, PyCmp_IS_NOT, PyCmp_EXC_MATCH, PyCmp_BAD};
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
#ifdef __cplusplus
}
#endif
#endif /* !Py_OPCODE_H */
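The same table is available from Python itself through the standard dis module, which is handy because the numbers differ between versions. A minimal sketch follows; the three helper names are mine, and the values they return are those of whichever interpreter runs the code, not necessarily the 3.7 values listed above.

```python
import dis


def opcode_name(number):
    """Name of the opcode with this number in the running interpreter."""
    return dis.opname[number]


def opcode_number(name):
    """Number of the named opcode in the running interpreter."""
    return dis.opmap[name]


def takes_argument(number):
    """Mirror of the HAS_ARG macro: opcodes at or above HAVE_ARGUMENT take an argument."""
    return number >= dis.HAVE_ARGUMENT
```

For instance, opcode_name(opcode_number("RETURN_VALUE")) round-trips to "RETURN_VALUE" on any version.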

# pyc magic number table

# copied from https://github.com/google/pytype/blob/main/pytype/pyc/magic.py
"""Python version numbers and their encoding ("magic number")."""
import struct
# These constants are from Python-3.x.x/Lib/importlib/_bootstrap_external.py
PYTHON_MAGIC = {
    # Python 1
    20121: (1, 5),
    50428: (1, 6),
    # Python 2
    50823: (2, 0),
    60202: (2, 1),
    60717: (2, 2),
    62011: (2, 3),  # a0
    62021: (2, 3),  # a0
    62041: (2, 4),  # a0
    62051: (2, 4),  # a3
    62061: (2, 4),  # b1
    62071: (2, 5),  # a0
    62081: (2, 5),  # a0
    62091: (2, 5),  # a0
    62092: (2, 5),  # a0
    62101: (2, 5),  # b3
    62111: (2, 5),  # b3
    62121: (2, 5),  # c1
    62131: (2, 5),  # c2
    62151: (2, 6),  # a0
    62161: (2, 6),  # a1
    62171: (2, 7),  # a0
    62181: (2, 7),  # a0
    62191: (2, 7),  # a0
    62201: (2, 7),  # a0
    62211: (2, 7),  # a0
    # Python 3
    3000: (3, 0),
    3010: (3, 0),
    3020: (3, 0),
    3030: (3, 0),
    3040: (3, 0),
    3050: (3, 0),
    3060: (3, 0),
    3061: (3, 0),
    3071: (3, 0),
    3081: (3, 0),
    3091: (3, 0),
    3101: (3, 0),
    3103: (3, 0),
    3111: (3, 0),  # a4
    3131: (3, 0),  # a5
    # Python 3.1
    3141: (3, 1),  # a0
    3151: (3, 1),  # a0
    # Python 3.2
    3160: (3, 2),  # a0
    3170: (3, 2),  # a1
    3180: (3, 2),  # a2
    # Python 3.3
    3190: (3, 3),  # a0
    3200: (3, 3),  # a0
    3220: (3, 3),  # a1
    3230: (3, 3),  # a4
    # Python 3.4
    3250: (3, 4),  # a1
    3260: (3, 4),  # a1
    3270: (3, 4),  # a1
    3280: (3, 4),  # a1
    3290: (3, 4),  # a4
    3300: (3, 4),  # a4
    3310: (3, 4),  # rc2
    # Python 3.5
    3320: (3, 5),  # a0
    3330: (3, 5),  # b1
    3340: (3, 5),  # b2
    3350: (3, 5),  # b2
    3351: (3, 5),  # 3.5.2
    # Python 3.6
    3360: (3, 6),  # a0
    3361: (3, 6),  # a0
    3370: (3, 6),  # a1
    3371: (3, 6),  # a1
    3372: (3, 6),  # a1
    3373: (3, 6),  # b1
    3375: (3, 6),  # b1
    3376: (3, 6),  # b1
    3377: (3, 6),  # b1
    3378: (3, 6),  # b2
    3379: (3, 6),  # rc1
    # Python 3.7
    3390: (3, 7),  # a1
    3391: (3, 7),  # a2
    3392: (3, 7),  # a4
    3393: (3, 7),  # b1
    3394: (3, 7),  # b5
    # Python 3.8
    3400: (3, 8),  # a1
    3401: (3, 8),  # a1
    3410: (3, 8),  # a1
    3411: (3, 8),  # b2
    3412: (3, 8),  # b2
    3413: (3, 8),  # b4
    # Python 3.9
    3420: (3, 9),  # a0
    3421: (3, 9),  # a0
    3422: (3, 9),  # a0
    3423: (3, 9),  # a2
    3424: (3, 9),  # a2
    3425: (3, 9),  # a2
    # Python 3.10
    3430: (3, 10),  # a1
    3431: (3, 10),  # a1
    3432: (3, 10),  # a2
    3433: (3, 10),  # a2
    3434: (3, 10),  # a6
    3435: (3, 10),  # a7
    3436: (3, 10),  # b1
    3437: (3, 10),  # b1
    3438: (3, 10),  # b1
    3439: (3, 10),  # b1
    # Python 3.11
    3450: (3, 11),  # a1
    3451: (3, 11),  # a1
    3452: (3, 11),  # a1
    3453: (3, 11),  # a1
    3454: (3, 11),  # a1
    3455: (3, 11),  # a1
    3456: (3, 11),  # a1
    3457: (3, 11),  # a1
    3458: (3, 11),  # a1
    3459: (3, 11),  # a1
    3460: (3, 11),  # a1
    3461: (3, 11),  # a1
    3462: (3, 11),  # a2
    3463: (3, 11),  # a3
    3464: (3, 11),  # a3
    3465: (3, 11),  # a3
    3466: (3, 11),  # a4
    3467: (3, 11),  # a4
    3468: (3, 11),  # a4
    3469: (3, 11),  # a4
    3470: (3, 11),  # a4
    3471: (3, 11),  # a4
    3472: (3, 11),  # a4
    3473: (3, 11),  # a4
    3474: (3, 11),  # a4
    3475: (3, 11),  # a5
    3476: (3, 11),  # a5
    3477: (3, 11),  # a5
    3478: (3, 11),  # a5
    3479: (3, 11),  # a5
    3480: (3, 11),  # a5
    3481: (3, 11),  # a5
    3482: (3, 11),  # a5
    3483: (3, 11),  # a5
    3484: (3, 11),  # a5
    3485: (3, 11),  # a5
    3486: (3, 11),  # a6
    3487: (3, 11),  # a6
    3488: (3, 11),  # a6
    3489: (3, 11),  # a6
    3490: (3, 11),  # a6
    3491: (3, 11),  # a6
    3492: (3, 11),  # a7
    3493: (3, 11),  # a7
    3494: (3, 11),  # a7
    3495: (3, 11),  # b4
}
def magic_word_to_version(magic_word):
  """Return the Python version belonging to the magic number in the pyc head.
  The magic number is encoded in the first two bytes of a .pyc file.
  It translates to a (major, minor) version. It never has a "micro" version,
  because Python bytecode encoding doesn't change between micro version.
  Arguments:
    magic_word: A 16 bit number, either as an integer or little-endian encoded
      as a string.
  Returns:
    A tuple (major, minor), e.g. (3, 7).
  """
  if not isinstance(magic_word, int):
    magic_word = struct.unpack("<H", magic_word)[0]
  return PYTHON_MAGIC[magic_word]
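Going the other way, a magic number from the table can be turned back into the 4 bytes that open a pyc file: the 16-bit number in little-endian order followed by 0d 0a. The helpers below are an illustrative sketch with names of my own choosing.

```python
import struct


def magic_to_bytes(number):
    """Build the 4-byte pyc magic for a magic number from the table."""
    return struct.pack("<H", number) + b"\r\n"


def bytes_to_magic(header):
    """Extract the magic number from the start of a pyc file."""
    if header[2:4] != b"\r\n":
        raise ValueError("not a pyc header: bytes 3-4 should be 0d 0a")
    return struct.unpack("<H", header[:2])[0]
```

For 3394 (Python 3.7) this gives 42 0d 0d 0a, the same four bytes that start the header written in the decryption script earlier.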