
KMQ-Iterative: does the sampling-weight update for each new round match the paper? #10

Description

@Aurelius84

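In the paper, every cluster starts with the same sampling weight, 1/K. In each subsequent round, each cluster's sampling weight is obtained by reweighting its weight from the previous round by the softmax of reward_diff.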
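Written out, the update described above would read as follows. This is my reading of the paper's text, not a formula copied from it. I assume a renormalization step so the weights stay a probability distribution. Here $\Delta r_k^{(t)}$ denotes the mean reward_diff of cluster $k$ in round $t$:

```latex
w_k^{(0)} = \frac{1}{K}, \qquad
s_k^{(t)} = \frac{\exp\big(\Delta r_k^{(t)}\big)}{\sum_{j=1}^{K} \exp\big(\Delta r_j^{(t)}\big)}, \qquad
w_k^{(t+1)} = \frac{w_k^{(t)} \, s_k^{(t)}}{\sum_{j=1}^{K} w_j^{(t)} \, s_j^{(t)}}
```

Using $w_k^{(t+1)} = s_k^{(t)}$ instead would drop the dependence on $w^{(t)}$ entirely.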

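However, the implementation in iter.py appears to use the softmax of reward_diff directly as the next round's sampling weights. Which of the two approaches was used to produce the results reported in the paper?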

```python
import numpy as np
from collections import Counter

def select_new_iter(rewards_gathered, dataset, indices_path, calculate_method="exp_reward_diff"):
    # ... (omitted)
    merged_df = subset_df.merge(rewards_df, left_index=True, right_index=True)
    # Mean reward_diff per cluster
    merged_df = merged_df.groupby('cluster')['reward_diff'].mean().reset_index()
    if calculate_method == "ppl":
        merged_df['exp_reward_diff'] = merged_df['reward_diff']
    else:
        # Softmax over clusters: exp(reward_diff) normalized to sum to 1
        merged_df['exp_reward_diff'] = np.exp(merged_df['reward_diff'])
        merged_df['exp_reward_diff'] = merged_df['exp_reward_diff'] / merged_df['exp_reward_diff'].sum()
    size = (len(dataset) * portion) / K / round
    exp_reward_diff = merged_df['exp_reward_diff']

    # This line uses the softmax directly as the new round's sampling weights,
    # without multiplying by the previous round's weights. Doesn't that differ from the paper?
    select_new_iter = np.random.choice(K, size=int(size), p=exp_reward_diff, replace=True)
    selected_clusters_size = Counter(select_new_iter)
    # ... (omitted)
```
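
A minimal sketch contrasting the two update rules. It assumes the paper's version renormalizes after multiplying by the previous weights. The helper names (`softmax`, `next_weights_as_in_paper`, `next_weights_as_implemented`) and the reward_diff values are hypothetical and are not taken from iter.py:

```python
import numpy as np
from collections import Counter

def softmax(x):
    """Numerically stable softmax over per-cluster mean reward_diff values."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def next_weights_as_implemented(reward_diff):
    # Rule iter.py appears to use: softmax(reward_diff) replaces the weights outright.
    return softmax(reward_diff)

def next_weights_as_in_paper(prev_weights, reward_diff):
    # Rule as described in the paper (renormalization assumed): scale the previous
    # round's weights by softmax(reward_diff), then renormalize to sum to 1.
    w = prev_weights * softmax(reward_diff)
    return w / w.sum()

K = 3
# Hypothetical per-cluster mean reward_diff for two consecutive rounds.
rounds = [np.array([0.5, 0.0, -0.5]), np.array([0.5, 0.0, -0.5])]

w_paper = np.full(K, 1.0 / K)  # round 0: uniform weights 1/K
for reward_diff in rounds:
    w_paper = next_weights_as_in_paper(w_paper, reward_diff)
w_impl = next_weights_as_implemented(rounds[-1])

# Sample cluster ids for the next round, mirroring the np.random.choice call in iter.py.
rng = np.random.default_rng(0)
picks = rng.choice(K, size=100, p=w_paper, replace=True)
print(w_paper.round(3), w_impl.round(3), Counter(picks.tolist()))
```

Starting from uniform weights, both rules give identical weights after the first round. They diverge from the second round onward. Under the paper's rule, the weights compound: they equal the softmax of reward_diff summed over all rounds so far. Under the implemented rule, they depend only on the latest round.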
