Feature Engineering Library Workflow and Function Visualization

Feature Library Overview

The anti-money laundering (AML) feature library comprises over 120 carefully engineered features. It is part of the IPMN project "Data-driven Anti-money Laundering: A Graph-based Machine Learning Framework".

These features capture different dimensions of transactional behavior to effectively identify various money laundering and fraud patterns. The feature engineering approach combines traditional transaction analysis with advanced graph-based methods, temporal pattern recognition, and behavioral change detection.

Feature Engineering Library code and documentation: https://github.com/Yanuy/IPMN
Contact: Yan Chenyue, yan1@connect.hku.hk

Feature Engineering and Model Training Workflow

1. Data Loading and Selection

System automatically detects available datasets in the current and parent directories: IBM format, SAML format, cached files...
User selects dataset or inputs path: [2] SAML-D.csv
System confirms: SAML type, supports multi-class classification, cache available

→

2. Feature Engineering Mode

System provides modes: basic, full, custom, default
User selects mode: [1] Basic Mode, which loads all features except the advanced graph features to improve computational efficiency
Pipeline starts and attempts to load features from cache...

→

3. Feature Engineering Processing

Sliding window feature generation: configurable time window (in days) and update frequency; the default is a 30-day window with daily updates
Data leakage prevention: strict time-based partitioning, so no future information enters the features
Feature column management: automatic encoding, missing value handling, anomaly detection
Generated features: complete feature matrix of 9,504,852 rows × 65 columns, cached for reuse
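The windowing and leakage rules in step 3 can be illustrated with a minimal sketch. It assumes that window end dates advance by the update frequency and that training and test rows are split chronologically at a cutoff; `sliding_windows` and `time_split` are hypothetical names, not the library's API.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple


def sliding_windows(start: datetime, end: datetime,
                    window_days: int = 30, step_days: int = 1
                    ) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (window_start, window_end) pairs.

    Each window covers [window_end - window_days, window_end), so features
    computed for window_end only look backwards in time.
    """
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)
    window_end = start + window  # first window needs a full history
    while window_end <= end:
        yield window_end - window, window_end
        window_end += step


def time_split(timestamps: List[datetime],
               cutoff: datetime) -> Tuple[List[int], List[int]]:
    """Split row indices chronologically: training rows strictly before
    the cutoff, test rows at or after it (no future rows in training)."""
    train = [i for i, ts in enumerate(timestamps) if ts < cutoff]
    test = [i for i, ts in enumerate(timestamps) if ts >= cutoff]
    return train, test
```

With the default 30-day window and daily updates, a 40-day date range yields 11 windows, the first ending 30 days after the start.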

→

4. Feature Management and Selection

Initial loading of all features: 52 / 52 features selected
User selects method: keep/exclude specific columns, or keep the top N columns
User provides N value or feature names: 30
Operation completed. 30 features selected
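The selection options in step 4 reduce to simple list operations. This sketch assumes each feature already has a numeric importance score (for example, from a trained tree model); how the library actually ranks features for "top N" is not stated here, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List


def keep_columns(columns: List[str], keep: Iterable[str]) -> List[str]:
    """Keep only the named columns, preserving the original column order."""
    wanted = set(keep)
    return [c for c in columns if c in wanted]


def exclude_columns(columns: List[str], exclude: Iterable[str]) -> List[str]:
    """Drop the named columns, preserving the original column order."""
    dropped = set(exclude)
    return [c for c in columns if c not in dropped]


def top_n_features(importances: Dict[str, float], n: int,
                   exclude: Iterable[str] = ()) -> List[str]:
    """Return the n highest-scoring features.

    Ties are broken alphabetically so the selection is deterministic.
    """
    dropped = set(exclude)
    ranked = sorted((f for f in importances if f not in dropped),
                    key=lambda f: (-importances[f], f))
    return ranked[:n]
```

In the walkthrough above, calling `top_n_features(scores, 30)` on the 52 scored features would return the 30 highest-ranked names.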

→

5. Model Training and Evaluation

User confirms feature selection: [Y] Yes
Binary classification training runs automatically: XGBoost, LightGBM, Random Forest (RF)
Multi-class classification training runs automatically and supports any number of classes
Model evaluation: ROC curve, confusion matrix, classification report
Output model files: .joblib format, including encoders and scalers

Dataset Native Variables

Account ID

Unique identifier for sender and receiver accounts

Timestamp

Transaction execution time with full temporal granularity

Payment Currency & Method

Currency type and payment mechanism

Bank Location

Country codes for sender/receiver institutions

Basic Transaction Features

Amount_log, Amount_sqrt

Log and square root transformations that reduce the skewness of transaction amounts

Is_round_amount, Is_large_round

Binary indicators for round number amounts

Has_99_pattern

Detects amounts ending in .99/.98, which may indicate threshold avoidance

Is_self_transfer

Identifies transactions in which the sender and receiver are the same entity
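A minimal per-transaction sketch of the basic features above. The round-number rules (multiples of 100, and multiples of 1,000 at or above 10,000 for "large round") are illustrative assumptions, not the library's documented thresholds, and `basic_features` is a hypothetical name. Amounts are assumed to be non-negative.

```python
import math
from typing import Dict


def basic_features(amount: float, sender: str, receiver: str) -> Dict[str, float]:
    """Compute basic transaction features for one non-negative amount."""
    cents = round(amount * 100) % 100  # fractional part in whole cents
    return {
        # log1p keeps zero amounts finite while compressing large values
        "Amount_log": math.log1p(amount),
        "Amount_sqrt": math.sqrt(amount),
        # Assumed rule: "round" means a multiple of 100
        "Is_round_amount": int(amount % 100 == 0),
        # Assumed rule: a multiple of 1,000 that is at least 10,000
        "Is_large_round": int(amount >= 10_000 and amount % 1_000 == 0),
        # Amounts ending in .99 or .98
        "Has_99_pattern": int(cents in (98, 99)),
        "Is_self_transfer": int(sender == receiver),
    }
```

For example, `basic_features(9999.99, "A", "A")` sets both `Has_99_pattern` and `Is_self_transfer` to 1.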

Activity & Velocity Features

Sender_send_count/amount

Transaction count and total amount within time windows

Sender_send_frequency

Transaction frequency rate (transactions per day)

Sender_avg_amount/std/var

Statistical measures of transaction amounts

Pair_transaction_count

Number of transactions between a specific sender-receiver pair
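The activity and velocity features can be computed over a trailing window without leakage: each transaction's features use only the sender's earlier transactions inside the window, and the current transaction is added to the history afterwards. This is a minimal sketch assuming transactions arrive as `(timestamp, sender, receiver, amount)` tuples; the tuple layout, the population standard deviation, and the function name are assumptions.

```python
import statistics
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Tuple

Txn = Tuple[datetime, str, str, float]  # (timestamp, sender, receiver, amount)


def velocity_features(txns: List[Txn], window_days: int = 30) -> List[Dict[str, float]]:
    """Per-transaction sender velocity features, returned in time order."""
    window = timedelta(days=window_days)
    history: Dict[str, Deque[Txn]] = defaultdict(deque)
    rows: List[Dict[str, float]] = []
    for ts, sender, receiver, amount in sorted(txns, key=lambda t: t[0]):
        past = history[sender]
        # Drop the sender's transactions that fell out of the trailing window
        while past and past[0][0] < ts - window:
            past.popleft()
        amounts = [t[3] for t in past]
        rows.append({
            "Sender_send_count": len(amounts),
            "Sender_send_amount": sum(amounts),
            "Sender_send_frequency": len(amounts) / window_days,  # txns per day
            "Sender_avg_amount": statistics.fmean(amounts) if amounts else 0.0,
            "Sender_std_amount": statistics.pstdev(amounts) if len(amounts) > 1 else 0.0,
            "Pair_transaction_count": sum(1 for t in past if t[2] == receiver),
        })
        # Only now does the current transaction become "history"
        past.append((ts, sender, receiver, amount))
    return rows
```

Adding the current transaction only after its features are computed is what keeps the label-bearing row from seeing itself or anything later.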

Graph & Network Features

Sender_out_degree/in_degree

Out-degree and in-degree for sender accounts

Sender_pagerank

PageRank score indicating network importance

Sender_clustering_coef

Clustering coefficient of sender connections

Has_2_cycle/Has_3_cycle

Binary indicators for 2-account (A→B→A) and 3-account (A→B→C→A) cycle structures
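A pure-Python sketch of several graph features: out/in-degree counted over distinct counterparties, per-edge 2-cycle and 3-cycle checks, and PageRank by power iteration with damping 0.85 (dangling accounts spread their rank uniformly). It assumes the graph is built from `(sender, receiver)` pairs; the library may well use a graph package instead, and all names here are hypothetical. The clustering coefficient is not covered.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (sender, receiver)


def build_graph(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Adjacency sets: account -> set of distinct receivers."""
    out: Dict[str, Set[str]] = {}
    for s, r in edges:
        out.setdefault(s, set()).add(r)
        out.setdefault(r, set())  # make sure receivers are nodes too
    return out


def degrees(out: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {account: (out_degree, in_degree)}."""
    indeg = {n: 0 for n in out}
    for nbrs in out.values():
        for r in nbrs:
            indeg[r] += 1
    return {n: (len(out[n]), indeg[n]) for n in out}


def has_2_cycle(out: Dict[str, Set[str]], s: str, r: str) -> bool:
    """True if the edge s -> r is matched by a reverse edge r -> s."""
    return s != r and s in out.get(r, set())


def has_3_cycle(out: Dict[str, Set[str]], s: str, r: str) -> bool:
    """True if s -> r -> w -> s for some third account w."""
    return any(s in out.get(w, set())
               for w in out.get(r, set()) if w not in (s, r))


def pagerank(out: Dict[str, Set[str]], damping: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """PageRank by power iteration; scores sum to 1."""
    nodes = list(out)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by accounts with no outgoing edges is spread evenly
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        rank = new
    return rank
```

On the edges A→B, B→A, B→C, C→A, the edge A→B has both a 2-cycle (B→A) and a 3-cycle (B→C→A).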

Temporal & Time-Series Features

Is_business_hour/Is_night_hour

Binary indicators for transactions made during business hours or at night

DayOfWeek/Is_weekend

Day of week encoding and weekend indicators

Hour_sin, Day_cos

Cyclical (sine/cosine) encoding of hour and day, so that adjacent values such as 23:00 and 00:00 remain close

Sender_time_since_last

Time elapsed since the sender's previous transaction
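A minimal sketch of the temporal features for a single transaction. Business hours (09:00 to 17:00 on weekdays), night hours (22:00 to 06:00), encoding the day as day of week, measuring the gap in hours, and the sentinel -1 for a sender's first transaction are all assumptions for illustration; `temporal_features` is a hypothetical name.

```python
import math
from datetime import datetime
from typing import Dict, Optional


def temporal_features(ts: datetime, prev_ts: Optional[datetime]) -> Dict[str, float]:
    """Temporal features for one transaction; prev_ts is the sender's
    previous transaction time, or None if there is none."""
    hour = ts.hour
    dow = ts.weekday()  # Monday = 0 ... Sunday = 6
    gap_hours = (ts - prev_ts).total_seconds() / 3600 if prev_ts is not None else -1.0
    return {
        "Is_business_hour": int(dow < 5 and 9 <= hour < 17),  # assumed hours
        "Is_night_hour": int(hour >= 22 or hour < 6),          # assumed hours
        "DayOfWeek": dow,
        "Is_weekend": int(dow >= 5),
        # Sine/cosine map cyclic values onto a circle, so 23h is near 0h
        "Hour_sin": math.sin(2 * math.pi * hour / 24),
        "Day_cos": math.cos(2 * math.pi * dow / 7),
        "Sender_time_since_last": gap_hours,
    }
```

A Saturday 23:00 transaction three hours after the previous one is flagged as weekend and night, with `Sender_time_since_last` equal to 3.0.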

Transaction Pattern Features

Payment_type_encoded

Encoded payment methods (cash, wire, ACH, card)

Is_cash_transaction

Binary indicator for physical cash transactions

Is_high_risk_payment

Indicator for high-risk payment combinations
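A sketch of the payment features. The integer codes, the fallback code -1 for unseen types, and the rule that a "high-risk combination" is cash that is either cross-border or at least 10,000 are hypothetical placeholders; the library's actual encoding and combination rules are not documented here.

```python
from typing import Dict

# Hypothetical label codes for the payment methods listed above
PAYMENT_CODES: Dict[str, int] = {"cash": 0, "wire": 1, "ach": 2, "card": 3}
UNKNOWN_CODE = -1


def payment_features(payment_type: str, cross_border: bool, amount: float,
                     large_threshold: float = 10_000.0) -> Dict[str, int]:
    """Encode the payment method and derive cash-related flags."""
    key = payment_type.strip().lower()
    is_cash = key == "cash"
    return {
        "Payment_type_encoded": PAYMENT_CODES.get(key, UNKNOWN_CODE),
        "Is_cash_transaction": int(is_cash),
        # Assumed rule: cash combined with a cross-border leg or a large amount
        "Is_high_risk_payment": int(is_cash and (cross_border or amount >= large_threshold)),
    }
```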

Cross-Border & Currency Features

Is_cross_border

Binary indicator for international transactions

Currency_mismatch

Indicator that the payment currency differs from the received currency

Is_high_risk_sender_location

Indicator that the sender is located in a high-risk jurisdiction

Country_risk_score

Numerical risk score based on jurisdiction
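A sketch of the cross-border and currency features. The caller supplies a jurisdiction-to-score mapping in [0, 1]; no real country scores are assumed. The 0.7 high-risk cutoff, the default score for unlisted jurisdictions, and taking the riskier of the two jurisdictions as `Country_risk_score` are assumptions, and the function name is hypothetical.

```python
from typing import Dict, Mapping


def cross_border_features(sender_loc: str, receiver_loc: str,
                          payment_ccy: str, received_ccy: str,
                          risk_scores: Mapping[str, float],
                          high_risk_cutoff: float = 0.7,
                          default_score: float = 0.0) -> Dict[str, float]:
    """Location and currency features; risk_scores maps jurisdiction -> score."""
    sender_score = risk_scores.get(sender_loc, default_score)
    receiver_score = risk_scores.get(receiver_loc, default_score)
    return {
        "Is_cross_border": int(sender_loc != receiver_loc),
        "Currency_mismatch": int(payment_ccy != received_ccy),
        "Is_high_risk_sender_location": int(sender_score >= high_risk_cutoff),
        # Assumption: the transaction inherits the riskier of its two jurisdictions
        "Country_risk_score": max(sender_score, receiver_score),
    }
```

Usage with placeholder codes: `cross_border_features("XX", "YY", "USD", "EUR", {"XX": 0.9, "YY": 0.1})` flags a cross-border, currency-mismatched transfer from a high-risk sender location.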

Behavioral Change Features

Sender_amount_deviation

Deviation of the amount from the sender's historical average

Is_sender_unusual_amount

Binary indicator for amounts outside normal range

Sender_frequency_change

Change in transaction frequency vs baseline

Is_new_relationship

Indicator for first-time sender-receiver pairs
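One way to express the behavioral change features, assuming the deviation is a z-score against the sender's past amounts, "unusual" means |z| > 3, and frequency change is the relative change of a recent rate versus a baseline rate. These definitions and the function name are assumptions, not the library's documented formulas.

```python
import statistics
from typing import Dict, List, Set


def behavioral_features(amount: float, past_amounts: List[float],
                        recent_freq: float, baseline_freq: float,
                        receiver: str, past_receivers: Set[str],
                        z_threshold: float = 3.0) -> Dict[str, float]:
    """Compare the current transaction with the sender's own history."""
    deviation = 0.0
    if len(past_amounts) >= 2:
        mean = statistics.fmean(past_amounts)
        std = statistics.pstdev(past_amounts)
        # With zero spread the z-score is undefined; it is reported as 0 here
        if std > 0:
            deviation = (amount - mean) / std
    # Relative change in transactions per day versus the baseline period
    freq_change = (recent_freq - baseline_freq) / baseline_freq if baseline_freq > 0 else 0.0
    return {
        "Sender_amount_deviation": deviation,
        "Is_sender_unusual_amount": int(abs(deviation) > z_threshold),
        "Sender_frequency_change": freq_change,
        "Is_new_relationship": int(receiver not in past_receivers),
    }
```

For past amounts [100, 100, 100, 100, 120] (mean 104, standard deviation 8), a new amount of 200 gives a z-score of 12 and is flagged as unusual.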

Composite Risk Features

Risk_score

Cumulative score from multiple risk indicators

Risk_level_encoded

Categorical risk level (low, medium, high)

Ensemble_score

Weighted combination of scores used for the final risk assessment
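A sketch of the composite risk features, assuming `Risk_score` counts how many binary indicators fired, `Risk_level_encoded` maps that count to 0 (low), 1 (medium), or 2 (high) using hypothetical cut-offs of 2 and 4, and `Ensemble_score` is a weighted average of per-model probabilities. The cut-offs, weights, and function names are illustrative assumptions.

```python
from typing import Mapping


def risk_score(indicators: Mapping[str, int]) -> int:
    """Cumulative score: the number of binary risk indicators that fired."""
    return sum(1 for v in indicators.values() if v)


def risk_level_encoded(score: int, medium_at: int = 2, high_at: int = 4) -> int:
    """Map a score to 0 = low, 1 = medium, 2 = high (assumed cut-offs)."""
    if score >= high_at:
        return 2
    if score >= medium_at:
        return 1
    return 0


def ensemble_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-model scores, normalized by the weight total."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total
```

Normalizing by the weight total keeps the ensemble score on the same 0 to 1 scale as the individual model probabilities, even when the weights do not sum to 1.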

Money Laundering Pattern Visualizations

Interactive visual representations of common money laundering patterns detected by our feature engineering framework

Behavioural Change Patterns

#0-1

1. Sudden, significant changes in an account's transaction behavior, such as its frequency, amounts, or counterparties. This may indicate account compromise or illegal activity.
2. An account suddenly becomes active after a long period of dormancy. This is common when dormant accounts are reactivated for money laundering.

Single Large Transaction

#24

Single unusually large transaction inconsistent with account history. May indicate one-time laundering or account compromise.

Cash Withdrawal Pattern

Large amounts quickly deposited and immediately withdrawn as cash. This pattern attempts to break the money trail, making funds difficult to trace.

Cycle Pattern

Funds circulate through multiple accounts in a closed loop, returning to the starting account. Used to obscure fund sources and create false transaction records.

Deposit-Send Pattern

Account receives deposits and immediately transfers to other accounts, maintaining minimal balance. Typical transit account behavior for quick illegal fund transfers.

Over-Invoicing Pattern

#22

Creating invoices through fake transactions or inflated amounts to provide legal cover for illegal fund transfers. Common in cross-border laundering and tax evasion.

Gather-Scatter/Scatter-Gather Pattern

First gathers funds from multiple accounts to a central account, then disperses to other accounts. This two-stage pattern increases tracking complexity.

Layered Fan-In/Fan-Out Patterns

#9-10

Funds flow through multiple layers of intermediary accounts in tree structures. Each layer increases tracking difficulty, common in complex laundering networks.

Bipartite Structure

Funds flow between two groups of accounts, forming a bipartite structure: one group transfers to the other, but there are no transactions within either group. Used to hide the real flow of funds.

Fan-In/Fan-Out Pattern

#6-7

Fan-in: multiple accounts converge funds into a central account. Common in illegal fundraising and gambling collections, where dispersed funds are concentrated in one place.
Fan-out: one account disperses funds to multiple accounts. Used to split large amounts, reduce individual transaction sizes, and evade regulatory thresholds.

Smurfing Pattern

#25

Breaking large amounts into multiple small transactions below regulatory reporting thresholds. Most common method to evade AML monitoring.

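A simple smurfing heuristic follows directly from this description: flag a sender who makes several transactions just below a reporting threshold within a short period. The 10,000 threshold, the 10% "just below" band, the 7-day window, and the minimum of 3 transactions are hypothetical parameters, and this sketch groups by sender only; many depositors feeding one receiver would need the same logic keyed by receiver.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Set, Tuple


def flag_smurfing(txns: List[Tuple[datetime, str, float]],
                  threshold: float = 10_000.0, margin: float = 0.1,
                  window_days: int = 7, min_count: int = 3) -> Set[str]:
    """Return senders with >= min_count sub-threshold transactions
    inside any window of window_days. txns are (timestamp, sender, amount)."""
    window = timedelta(days=window_days)
    lower = threshold * (1 - margin)  # start of the "just below" band
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    flagged: Set[str] = set()
    for ts, sender, amount in sorted(txns):
        if not (lower <= amount < threshold):
            continue  # only amounts just under the threshold count
        q = recent[sender]
        q.append(ts)
        # Forget sub-threshold transactions older than the window
        while q[0] < ts - window:
            q.popleft()
        if len(q) >= min_count:
            flagged.add(sender)
    return flagged
```

Three payments of 9,500 to 9,900 over three days would flag the sender, while the same amounts spread ten days apart would not.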

Structuring Pattern

#27

Carefully designed transaction patterns including systematic arrangement of timing, amounts, and accounts. Uses complex transaction structures to conceal illegal fund flows.

Pattern Visualization Legend

Normal Account, Suspicious Account, Intermediary Account, Money Flow

Feature Mapping to Money Laundering Typologies

Placement

Structuring / Smurfing

High Risk

Breaking large cash amounts into smaller deposits to avoid reporting thresholds.

Key Features:

Is_cash_transaction: Identifies cash deposits
Has_99_pattern: Captures threshold avoidance
Receiver_in_degree: Multiple small deposits

Placement

Rapid Movement

High Risk

Quick withdrawal of deposited cash or immediate transfers to break audit trails.

Key Features:

Sender_time_since_last: Short intervals
Payment_type_encoded: Cash-to-wire sequences
Is_self_transfer: Movement between accounts

Layering

Cycling Patterns

High Risk

Moving funds through account series, forming circular patterns to obscure trails.

Key Features:

Has_2_cycle, Has_3_cycle: Detects cycles
Sender_betweenness_centrality: Intermediaries
Is_new_relationship: New connections

Layering

Fan-in / Collect-Disperse

Medium Risk

Collecting funds from multiple sources, then dispersing to other accounts.

Key Features:

Receiver_in_degree: High in-degree accounts
Sender_out_in_ratio: Collection vs dispersion
Sender_pagerank: Central account importance

Integration

Trade-Based Laundering

Medium Risk

Creating seemingly legitimate but unusually large transactions.

Key Features:

Is_sender_unusual_amount: Abnormal amounts
Is_large_round: Suspicious round numbers
Pair_transaction_count: low counts indicate new counterparties

Cross-Border

Jurisdiction Shopping

Medium Risk

Moving funds to jurisdictions with weaker AML controls.

Key Features:

Is_cross_border: International transactions
Is_high_risk_sender_location: Risk jurisdictions
Currency_mismatch: Unusual conversions

Feature Detection and Risk Assessment Flow

Raw Data → Feature Engineering → Behavioral Analysis → Network Analysis → Risk Scoring → Alert Generation

Risk Level Classification

High Risk, Medium Risk, Low Risk, Normal
