Anti-Money Laundering Feature Engineering Library

A Comprehensive Framework for Financial Crime Detection

Feature Library Overview

The anti-money laundering (AML) feature library comprises over 120 carefully engineered features. It is part of the IPMN project, "Data-driven Anti-money Laundering: A Graph-based Machine Learning Framework".

These features capture different dimensions of transactional behavior to effectively identify various money laundering and fraud patterns. The feature engineering approach combines traditional transaction analysis with advanced graph-based methods, temporal pattern recognition, and behavioral change detection.

Feature Engineering Library code and documentation: https://github.com/Yanuy/IPMN
Contact: Yan Chenyue, yan1@connect.hku.hk

Feature Engineering and Model Training Workflow

1. Data Loading and Selection

  • Automatically detecting available datasets in current and parent directories: IBM format, SAML format, cached files...
  • User selects dataset or inputs path: [2] SAML-D.csv
  • System confirms: SAML type, supports multi-classification, cache available

2. Feature Engineering Mode

  • System provides modes: basic, full, custom, default
  • User selects mode: [1] Basic Mode - loads all features except advanced graph features to improve computational efficiency
  • Pipeline starts, attempting to load from cache...

3. Feature Engineering Processing

  • Sliding window feature generation: Optional time window in days and update frequency, default 30-day time window with daily updates
  • Data leakage prevention: Strict time partitioning, future information isolation
  • Feature column management: Automatic encoding, missing value handling, anomaly detection
  • Generated output: a complete feature matrix of 9,504,852 rows × 65 columns, which is then cached
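
The sliding-window and leakage-prevention steps above can be illustrated with a minimal sketch. It is not the library's implementation: the function name, the dictionary-based transaction records, and the per-transaction update (rather than the daily update mentioned above) are assumptions made for illustration. The key point is that each row only sees the sender's strictly earlier transactions inside the window.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sliding_window_features(transactions, window_days=30):
    """Add past-only window features to each transaction (hypothetical helper).

    A row's features use only the sender's transactions with a strictly
    earlier timestamp inside the preceding window, so no future
    information leaks into the row.
    """
    window = timedelta(days=window_days)
    history = defaultdict(list)  # sender -> [(timestamp, amount), ...]
    rows = []
    for tx in sorted(transactions, key=lambda t: t["timestamp"]):
        now = tx["timestamp"]
        recent = [amt for ts, amt in history[tx["sender"]] if now - window <= ts < now]
        rows.append({**tx,
                     "Sender_send_count": len(recent),
                     "Sender_send_amount": sum(recent)})
        # The current transaction becomes visible only to later rows.
        history[tx["sender"]].append((now, tx["amount"]))
    return rows


example = [
    {"sender": "A", "timestamp": datetime(2024, 1, 1), "amount": 100.0},
    {"sender": "B", "timestamp": datetime(2024, 1, 5), "amount": 10.0},
    {"sender": "A", "timestamp": datetime(2024, 1, 10), "amount": 50.0},
    {"sender": "A", "timestamp": datetime(2024, 3, 1), "amount": 70.0},
]
rows = sliding_window_features(example)
```

The linear scan over a sender's history keeps the sketch short; on a dataset of millions of rows a sorted structure or a vectorized rolling aggregation would be the more realistic choice.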

4. Feature Management and Selection

  • Initial loading of all features: 52 / 52 features selected
  • User selects method: keep/exclude specific columns, or keep top N columns
  • User provides N value or feature names: 30
  • Operation completed. 30 features selected
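
The two selection methods can be sketched as follows. The ranking criterion behind "top N" is not stated here, so this sketch assumes a per-feature score (for example, a model's feature importance) is already available; both function names are hypothetical.

```python
def keep_top_n(scores, n):
    """Return the n feature names with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def filter_columns(columns, keep=None, exclude=None):
    """Keep only names in `keep` (if given), then drop names in `exclude`.

    The original column order is preserved.
    """
    selected = [c for c in columns if keep is None or c in keep]
    return [c for c in selected if not exclude or c not in exclude]


# Assumed scores, for illustration only.
example_scores = {"Amount_log": 0.5, "Is_weekend": 0.1, "Sender_pagerank": 0.3}
top_two = keep_top_n(example_scores, 2)
```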

5. Model Training and Evaluation

  • User confirms feature selection: [Y] Yes
  • Automatically performing binary classification training: XGBoost, LightGBM, RF
  • Automatically performing multi-classification training, supporting any number of classes
  • Model evaluation: ROC curve, confusion matrix, classification report
  • Output model files: .joblib format, including encoders and scalers

Dataset Native Variables

0
  • Account ID: Unique identifier for sender and receiver accounts
  • Timestamp: Transaction execution time with full temporal granularity
  • Payment Currency & Method: Currency type and payment mechanism
  • Bank Location: Country codes for sender/receiver institutions

Basic Transaction Features

1
  • Amount_log, Amount_sqrt: Log and square root transformations to handle data skewness
  • Is_round_amount, Is_large_round: Binary indicators for round number amounts
  • Has_99_pattern: Detects amounts ending in .99/.98 for threshold avoidance
  • Is_self_transfer: Identifies same-entity sender-receiver transactions
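
A minimal sketch of how these per-transaction features could be computed is shown below. The definitions of "round" (a multiple of 100) and "large" (at least 10,000), the use of log(1 + amount), and the function name are assumptions; the library may define them differently.

```python
import math


def basic_features(amount, sender, receiver, large_threshold=10_000):
    """Per-transaction features for a non-negative amount (hypothetical helper)."""
    cents_total = round(amount * 100)        # work in integer cents to avoid float noise
    is_round = amount > 0 and cents_total % 10_000 == 0  # assumed: multiple of 100
    return {
        "Amount_log": math.log1p(amount),    # log(1 + amount), defined at zero
        "Amount_sqrt": math.sqrt(amount),
        "Is_round_amount": int(is_round),
        "Is_large_round": int(is_round and amount >= large_threshold),
        "Has_99_pattern": int(cents_total % 100 in (98, 99)),
        "Is_self_transfer": int(sender == receiver),
    }


just_below = basic_features(9999.99, "A", "B")
large_round = basic_features(20000.0, "A", "A")
```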

Activity & Velocity Features

2
  • Sender_send_count/amount: Transaction count and total amount within time windows
  • Sender_send_frequency: Transaction frequency rate (transactions per day)
  • Sender_avg_amount/std/var: Statistical measures of transaction amounts
  • Pair_transaction_count: Number of transactions between specific pairs
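
As a rough illustration, the statistics above can be derived from the amounts a sender sent inside one time window. The exact column names for the standard deviation and variance, the use of population (rather than sample) statistics, and both function names are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev, pvariance


def sender_activity(amounts, window_days=30):
    """Aggregate one sender's amounts from a single time window."""
    count = len(amounts)
    return {
        "Sender_send_count": count,
        "Sender_send_amount": sum(amounts),
        "Sender_send_frequency": count / window_days,  # transactions per day
        "Sender_avg_amount": mean(amounts) if count else 0.0,
        "Sender_std_amount": pstdev(amounts) if count else 0.0,    # assumed name
        "Sender_var_amount": pvariance(amounts) if count else 0.0,  # assumed name
    }


def pair_transaction_counts(pairs):
    """Count transactions for each (sender, receiver) pair."""
    return Counter(pairs)


activity = sender_activity([100.0, 200.0, 300.0])
pair_counts = pair_transaction_counts([("A", "B"), ("A", "B"), ("B", "A")])
```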

Graph & Network Features

3
  • Sender_out_degree/in_degree: Out-degree and in-degree for sender accounts
  • Sender_pagerank: PageRank score indicating network importance
  • Sender_clustering_coef: Clustering coefficient of sender connections
  • Has_2_cycle/Has_3_cycle: Binary indicators for cycle structures
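
The degree and cycle indicators can be sketched on a directed transaction graph with plain Python. This is an illustrative sketch, not the library's code: it assumes degrees count distinct counterparties, ignores self-transfers, and uses a hypothetical function name. PageRank and clustering coefficients are omitted; a graph library such as NetworkX provides them (`networkx.pagerank`, `networkx.clustering`).

```python
from collections import defaultdict


def graph_features(edges):
    """Per-account degree and cycle flags from (sender, receiver) pairs."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for sender, receiver in edges:
        if sender != receiver:               # self-transfers are not graph cycles here
            out_nbrs[sender].add(receiver)
            in_nbrs[receiver].add(sender)

    features = {}
    for node in set(out_nbrs) | set(in_nbrs):
        successors = out_nbrs.get(node, set())
        # 2-cycle: node -> m -> node
        has_2 = any(node in out_nbrs.get(m, set()) for m in successors)
        # 3-cycle: node -> m -> k -> node with three distinct accounts
        has_3 = any(
            node in out_nbrs.get(k, set())
            for m in successors
            for k in out_nbrs.get(m, set())
            if k != node
        )
        features[node] = {
            "out_degree": len(successors),
            "in_degree": len(in_nbrs.get(node, set())),
            "Has_2_cycle": int(has_2),
            "Has_3_cycle": int(has_3),
        }
    return features


# A <-> B is a 2-cycle; B -> C -> D -> B is a 3-cycle.
feats = graph_features([("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B")])
```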

Temporal & Time-Series Features

4
  • Is_business_hour/Is_night_hour: Binary indicators for business/nighttime transactions
  • DayOfWeek/Is_weekend: Day of week encoding and weekend indicators
  • Hour_sin, Day_cos: Cyclical encoding of temporal features
  • Sender_time_since_last: Time interval since sender's previous transaction
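
A small sketch of these features follows. Cyclical encoding maps the hour and weekday onto a circle so that 23:00 and 00:00 end up close together. The business-hour range (weekdays 09:00 to 17:00), the night range (22:00 to 06:00), the companion columns Hour_cos and Day_sin, and the choice of seconds as the interval unit are assumptions.

```python
import math
from datetime import datetime


def temporal_features(ts, previous_ts=None):
    """Time-of-day and day-of-week features for one transaction timestamp."""
    hour, dow = ts.hour, ts.weekday()        # Monday = 0 ... Sunday = 6
    return {
        "Is_business_hour": int(dow < 5 and 9 <= hour < 17),  # assumed range
        "Is_night_hour": int(hour >= 22 or hour < 6),          # assumed range
        "DayOfWeek": dow,
        "Is_weekend": int(dow >= 5),
        "Hour_sin": math.sin(2 * math.pi * hour / 24),
        "Hour_cos": math.cos(2 * math.pi * hour / 24),
        "Day_sin": math.sin(2 * math.pi * dow / 7),
        "Day_cos": math.cos(2 * math.pi * dow / 7),
        # Seconds since the sender's previous transaction, if there was one.
        "Sender_time_since_last": (
            (ts - previous_ts).total_seconds() if previous_ts is not None else None
        ),
    }


saturday_night = temporal_features(datetime(2024, 1, 6, 23, 0),
                                   previous_ts=datetime(2024, 1, 6, 22, 0))
monday_morning = temporal_features(datetime(2024, 1, 1, 6, 0))
```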

Transaction Pattern Features

5
  • Payment_type_encoded: Encoded payment methods (cash, wire, ACH, card)
  • Is_cash_transaction: Binary indicator for physical cash transactions
  • Is_high_risk_payment: Indicator for high-risk payment combinations

Cross-Border & Currency Features

6
  • Is_cross_border: Binary indicator for international transactions
  • Currency_mismatch: Indicator for mismatched payment/receiving currencies
  • Is_high_risk_sender_location: Indicator for high-risk jurisdiction senders
  • Country_risk_score: Numerical risk score based on jurisdiction
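
These indicators reduce to simple comparisons once a jurisdiction risk table is available. In the sketch below the country codes and scores are placeholders, not real risk ratings, and the cutoff, the use of the maximum of sender and receiver risk, and the function name are all assumptions.

```python
# Placeholder scores for illustration only; a real table would come from a
# maintained jurisdiction risk list.
EXAMPLE_COUNTRY_RISK = {"AA": 0.1, "BB": 0.9}


def cross_border_features(sender_country, receiver_country,
                          payment_currency, received_currency,
                          country_risk=EXAMPLE_COUNTRY_RISK,
                          high_risk_cutoff=0.7):
    """Cross-border and currency indicators for one transaction."""
    sender_risk = country_risk.get(sender_country, 0.0)
    receiver_risk = country_risk.get(receiver_country, 0.0)
    return {
        "Is_cross_border": int(sender_country != receiver_country),
        "Currency_mismatch": int(payment_currency != received_currency),
        "Is_high_risk_sender_location": int(sender_risk >= high_risk_cutoff),
        "Country_risk_score": max(sender_risk, receiver_risk),  # assumed rule
    }


international = cross_border_features("BB", "AA", "USD", "EUR")
domestic = cross_border_features("AA", "AA", "USD", "USD")
```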

Behavioral Change Features

7
  • Sender_amount_deviation: Deviation from sender's historical average
  • Is_sender_unusual_amount: Binary indicator for amounts outside normal range
  • Sender_frequency_change: Change in transaction frequency vs baseline
  • Is_new_relationship: Indicator for first-time sender-receiver pairs
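
One way to express these behavioral change features is to compare the current transaction with the sender's own history. The sketch below assumes the deviation is a z-score, that "outside normal range" means more than three standard deviations from the mean, and that frequency change is a ratio of rates; the function names are hypothetical.

```python
from statistics import mean, pstdev


def behavioral_features(history, amount, receiver, z_threshold=3.0):
    """Compare one transaction with the sender's prior (receiver, amount) history."""
    past_amounts = [a for _, a in history]
    spread = pstdev(past_amounts) if len(past_amounts) >= 2 else 0.0
    # z-score of the amount against the sender's history (0.0 if undefined)
    deviation = (amount - mean(past_amounts)) / spread if spread > 0 else 0.0
    return {
        "Sender_amount_deviation": deviation,
        "Is_sender_unusual_amount": int(abs(deviation) > z_threshold),
        "Is_new_relationship": int(all(r != receiver for r, _ in history)),
    }


def frequency_change(recent_count, recent_days, baseline_count, baseline_days):
    """Ratio of the recent transaction rate to the baseline rate (assumed form)."""
    baseline_rate = baseline_count / baseline_days
    recent_rate = recent_count / recent_days
    return recent_rate / baseline_rate if baseline_rate > 0 else 0.0


past = [("B", 100.0), ("B", 110.0), ("C", 90.0)]
spike = behavioral_features(past, 1000.0, "D")
usual = behavioral_features(past, 100.0, "B")
```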

Composite Risk Features

8
  • Risk_score: Cumulative score from multiple risk indicators
  • Risk_level_encoded: Categorical risk level (low, medium, high)
  • Ensemble_score: Weighted combination for final assessment
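
A cumulative risk score of this kind can be sketched as a weighted sum of indicator values that is then bucketed into levels. The weights, cutoffs, integer codes, and function names below are illustrative assumptions, not the library's actual values, and Ensemble_score is not covered by the sketch.

```python
def risk_score(indicators, weights=None):
    """Weighted sum of indicator values; unlisted indicators get weight 1.0."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in indicators.items())


def risk_level(score, medium_cutoff=2.0, high_cutoff=4.0):
    """Bucket a score into low / medium / high using assumed cutoffs."""
    if score >= high_cutoff:
        return "high"
    if score >= medium_cutoff:
        return "medium"
    return "low"


# Assumed integer encoding for Risk_level_encoded.
RISK_LEVEL_CODES = {"low": 0, "medium": 1, "high": 2}

flags = {"Is_cross_border": 1, "Has_99_pattern": 1, "Is_new_relationship": 0}
score = risk_score(flags)
level = risk_level(score)
```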

Money Laundering Pattern Visualizations

Interactive visual representations of common money laundering patterns detected by our feature engineering framework

Behavioral Change Patterns

#0-1
1. Sudden, significant changes in an account's transaction behavior, such as frequency, amounts, or counterparties. This may indicate account compromise or illegal activity.
2. An account suddenly becomes active after a long dormancy. This is common when dormant accounts are activated for money laundering.

Single Large Transaction

#24
A single unusually large transaction that is inconsistent with the account's history. It may indicate one-time laundering or account compromise.

Cash Withdrawal Pattern

#3
Large amounts flow into an account and are immediately withdrawn as cash. This pattern attempts to break the money trail, making the funds difficult to trace.

Cycle Pattern

#4
Funds circulate through multiple accounts in a closed loop, eventually returning to the starting account. The pattern is used to obscure the source of funds and create false transaction records.

Deposit-Send Pattern

#5
An account receives deposits and immediately transfers them to other accounts, keeping almost no balance. This is typical transit-account behavior for moving illegal funds quickly.

Over-Invoicing Pattern

#22
Invoices are issued for fake transactions or inflated amounts to give illegal fund transfers a legitimate cover. Common in cross-border laundering and tax evasion.

Gather-Scatter/Scatter-Gather Pattern

#8
Funds are first gathered from multiple accounts into a central account and then dispersed to multiple other accounts. This two-stage pattern makes the funds harder to trace.

Layered Fan-In/Fan-Out Patterns

#9-10
Funds flow level by level through multiple layers of intermediary accounts, forming a tree structure. Each layer makes tracing harder. The pattern is common in complex laundering networks.

Bipartite Structure

#2
Funds flow between two groups of accounts, forming a bipartite structure: accounts in one group transfer to accounts in the other, but there are no transactions within a group. Often used to hide the real flow of funds.

Fan-In/Fan-Out Pattern

#6-7
Fan-in: Multiple accounts send funds to one central account. Common in illegal fundraising and gambling collections, where dispersed funds are concentrated in one place.
Fan-out: One account disperses funds to multiple accounts. Used to split large amounts, reduce individual transaction sizes, and evade regulatory thresholds.

Smurfing Pattern

#25
Large amounts are broken into multiple small transactions, each below the regulatory reporting threshold. This is the most common method of evading AML monitoring.
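
The smurfing pattern can be approximated with a simple rule: several sub-threshold transactions from one sender, close together in time, whose total reaches the threshold. The sketch below is a heuristic for illustration only. The 10,000 threshold, the one-day window, the minimum of three transactions, and the function name are assumptions, and reporting thresholds vary by jurisdiction.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_smurfing(transactions, threshold=10_000, window_days=1, min_count=3):
    """Return senders with >= min_count sub-threshold transactions inside one
    window whose combined amount reaches the threshold."""
    window = timedelta(days=window_days)
    by_sender = defaultdict(list)
    for tx in transactions:
        if tx["amount"] < threshold:         # only sub-threshold transactions count
            by_sender[tx["sender"]].append((tx["timestamp"], tx["amount"]))

    flagged = set()
    for sender, items in by_sender.items():
        items.sort()
        start, total = 0, 0.0
        for end, (ts, amount) in enumerate(items):
            total += amount
            # Shrink the window from the left until it spans at most `window`.
            while ts - items[start][0] > window:
                total -= items[start][1]
                start += 1
            if end - start + 1 >= min_count and total >= threshold:
                flagged.add(sender)
                break
    return flagged


example = (
    [{"sender": "A", "timestamp": datetime(2024, 1, 1, h), "amount": 4000.0} for h in (9, 12, 15)]
    + [{"sender": "B", "timestamp": datetime(2024, 1, d), "amount": 4000.0} for d in (1, 6, 11)]
    + [{"sender": "C", "timestamp": datetime(2024, 1, 1), "amount": 15000.0}]
)
suspects = flag_smurfing(example)
```

In the example, sender A splits 12,000 across three same-day transactions and is flagged, while B's transactions are five days apart and C's single transaction is above the threshold, so neither matches the rule.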

Structuring Pattern

#27
Carefully designed transaction patterns with a systematic arrangement of timing, amounts, and accounts. Complex transaction structures are used to conceal illegal fund flows.

Pattern Visualization Legend

Normal Account
Suspicious Account
Intermediary Account
Money Flow

Feature Mapping to Money Laundering Typologies

Placement: Structuring / Smurfing (High Risk)
Breaking large cash amounts into smaller deposits to avoid reporting thresholds.

Key Features:

  • Is_cash_transaction: Identifies cash deposits
  • Has_99_pattern: Captures threshold avoidance
  • Receiver_in_degree: Multiple small deposits

Placement: Rapid Movement (High Risk)
Quick withdrawal of deposited cash or immediate transfers to break audit trails.

Key Features:

  • Sender_time_since_last: Short intervals
  • Payment_type_encoded: Cash-to-wire sequences
  • Is_self_transfer: Movement between accounts

Layering: Cycling Patterns (High Risk)
Moving funds through account series, forming circular patterns to obscure trails.

Key Features:

  • Has_2_cycle, Has_3_cycle: Detects cycles
  • Sender_betweenness_centrality: Intermediaries
  • Is_new_relationship: New connections

Layering: Fan-in / Collect-Disperse (Medium Risk)
Collecting funds from multiple sources, then dispersing to other accounts.

Key Features:

  • Receiver_in_degree: High in-degree accounts
  • Sender_out_in_ratio: Collection vs dispersion
  • Sender_pagerank: Central account importance

Integration: Trade-Based Laundering (Medium Risk)
Creating seemingly legitimate but unusually large transactions.

Key Features:

  • Is_sender_unusual_amount: Abnormal amounts
  • Is_large_round: Suspicious round numbers
  • Pair_transaction_count: New counterparties

Cross-Border: Jurisdiction Shopping (Medium Risk)
Moving funds to jurisdictions with weaker AML controls.

Key Features:

  • Is_cross_border: International transactions
  • Is_high_risk_sender_location: Risk jurisdictions
  • Currency_mismatch: Unusual conversions

Feature Detection and Risk Assessment Flow

Raw Data → Feature Engineering → Behavioral Analysis → Network Analysis → Risk Scoring → Alert Generation

Risk Level Classification

High Risk
Medium Risk
Low Risk
Normal