Scaling
Step 4 of the pipeline.
Scaling addresses the "Parisian" and the "Marseillais" problems, i.e. users whose scores are too extreme, or whose negative scores correspond to entities that others rate positively. The latter effect is a particular issue in comparison-based preference learning, since each user typically has a very specific selection bias over the entities they rate.
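For orientation, here is a minimal usage sketch of this step. The import path and the surrounding variable names (user_models, users, entities, voting_rights, privacy, produced by earlier pipeline steps) are assumptions; only the __call__ signature is taken from the reference below.

```python
# Minimal sketch of the scaling step; variable names and the import path are
# assumptions, the call signature follows the reference below.
from solidago.scaling import Mehestan  # any Scaling subclass works here

scaling = Mehestan()  # or NoScaling() to leave user models unscaled
scaled_models = scaling(
    user_models=user_models,      # Mapping[int, ScoringModel], from preference learning
    users=users,                  # pandas DataFrame of users
    entities=entities,            # pandas DataFrame of entities
    voting_rights=voting_rights,  # VotingRights (optional for Mehestan)
    privacy=privacy,              # PrivacySettings
)
# scaled_models: dict[int, ScaledScoringModel]
```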
Scaling

__call__

__call__(
user_models: Mapping[int, ScoringModel],
users: DataFrame,
entities: DataFrame,
voting_rights: VotingRights,
privacy: PrivacySettings,
) -> dict[int, ScaledScoringModel]
Returns scaled user models
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | Mapping[int, ScoringModel] | user_models[user] is user's scoring model | required |
users | DataFrame | | required |
entities | DataFrame | | required |
voting_rights | VotingRights | voting_rights[user, entity]: float | required |
privacy | PrivacySettings | privacy[user, entity] in { True, False, None } | required |
Returns:

Type | Description |
---|---|
out[user]: ScoringModel | Will be scaled by the Scaling method |
NoScaling

Bases: Scaling

__call__

__call__(
user_models: Mapping[int, ScoringModel],
users: DataFrame = ...,
entities: DataFrame = ...,
voting_rights: VotingRights = ...,
privacy: PrivacySettings = ...,
) -> dict[int, ScaledScoringModel]
Returns scaled user models
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | Mapping[int, ScoringModel] | user_models[user] is user's scoring model | required |
users | DataFrame | | ... |
entities | DataFrame | | ... |
voting_rights | VotingRights | voting_rights[user, entity]: float | ... |
privacy | PrivacySettings | privacy[user, entity] in { True, False, None } | ... |
Returns:

Type | Description |
---|---|
out[user]: ScoringModel | Will be scaled by the Scaling method |
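As the name suggests, NoScaling leaves the user models' scale untouched, so only user_models needs to be provided; the remaining arguments keep their placeholder defaults. A minimal sketch:

```python
# Minimal sketch: NoScaling only requires the user models.
scaled_models = NoScaling()(user_models=user_models)
```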
Mehestan

Mehestan(
lipschitz: float = 0.1,
min_activity: float = 10.0,
n_scalers_max: int = 100,
privacy_penalty: float = 0.5,
user_comparison_lipschitz: float = 10.0,
p_norm_for_multiplicative_resilience: float = 4.0,
n_diffs_sample_max: int = 1000,
error: float = 1e-05,
)
Bases: Scaling
Mehestan performs Lipschitz-resilient collaborative scaling.
A simplified version of Mehestan was published in "Robust Sparse Voting" by Youssef Allouah, Rachid Guerraoui, Lê Nguyên Hoang and Oscar Villemaud, AISTATS 2024.
The inclusion of uncertainties is further detailed in "Solidago: A Modular Pipeline for Collaborative Scoring".
Parameters:

Name | Type | Description | Default |
---|---|---|---|
lipschitz | float | Resilience parameter. Larger values are more resilient, but less accurate. | 0.1 |
min_activity | float | Minimal activity (e.g. based on the number of comparisons) required to be a potential scaler. | 10.0 |
n_scalers_max | int | Maximal number of scalers. | 100 |
privacy_penalty | float | Penalty applied to private ratings when selecting scalers. | 0.5 |
p_norm_for_multiplicative_resilience | float | To provide stronger security, we enforce a larger resilience on multiplicator estimation when a user's model scores are large. The infinity norm may be too sensitive to extreme values, so an l_p norm is used instead. | 4.0 |
error | float | Error bound. | 1e-05 |
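As a configuration sketch, the documented parameters can be tuned to trade accuracy for resilience. The values below are illustrative, not recommendations:

```python
# Illustrative configuration, favoring resilience over accuracy.
mehestan = Mehestan(
    lipschitz=0.5,        # larger => more resilient, but less accurate
    min_activity=20.0,    # require more activity before a user can become a scaler
    n_scalers_max=50,     # cap the computationally demanding scaler set
    privacy_penalty=0.5,  # down-weight private ratings when selecting scalers
)
scaled_models = mehestan(user_models, users, entities)  # voting_rights and privacy are optional
```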
p_norm_for_multiplicative_resilience

p_norm_for_multiplicative_resilience = (
p_norm_for_multiplicative_resilience
)
__call__

__call__(
user_models: Mapping[int, ScoringModel],
users: DataFrame,
entities: DataFrame,
voting_rights: Optional[VotingRights] = None,
privacy: Optional[PrivacySettings] = None,
) -> dict[int, ScaledScoringModel]
Returns scaled user models
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | Mapping[int, ScoringModel] | user_models[user] is user's scoring model | required |
users | DataFrame | | required |
entities | DataFrame | | required |
voting_rights | Optional[VotingRights] | Not used in Mehestan | None |
privacy | Optional[PrivacySettings] | privacy[user, entity] in { True, False, None } | None |
Returns:

Type | Description |
---|---|
out[user]: ScoringModel | Will be scaled by the Scaling method |
compute_scalers

compute_scalers(
user_models: Mapping[int, ScoringModel],
entities: DataFrame,
users: DataFrame,
privacy: Optional[PrivacySettings],
) -> ndarray
Determines which users will be scalers. The set of scalers is restricted for two reasons. First, users with too little activity are excluded, because their lack of comparability with other users makes the scaling process ineffective. Second, scaling the scalers is the most computationally demanding step of Mehestan, so reducing the number of scalers accelerates the computation.
Parameters (TODO)
Returns:

Name | Type | Description |
---|---|---|
is_scaler | ndarray | is_scaler[user]: bool says whether user is a scaler |
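The selection rule is only hinted at by min_activity and n_scalers_max; the following is a rough sketch of that rule (an assumption, not the actual implementation), reusing compute_activities documented below and assuming users are indexed 0..n-1:

```python
import numpy as np

# Rough sketch of scaler selection (assumed logic, not the actual implementation).
activities = mehestan.compute_activities(user_models, entities, users, privacy)
activity = np.array([activities.get(u, 0.0) for u in range(len(users))])

is_scaler = np.zeros(len(users), dtype=bool)
eligible = np.flatnonzero(activity >= mehestan.min_activity)  # sufficiently active users
top = eligible[np.argsort(activity[eligible])[::-1][:mehestan.n_scalers_max]]  # most active first
is_scaler[top] = True
```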
scale_non_scalers

compute_activities

compute_activities(
user_models: Mapping[int, ScoringModel],
entities: DataFrame,
users: DataFrame,
privacy: Optional[PrivacySettings],
) -> dict[int, float]
Returns a dictionary that maps each user to their trustworthy activeness.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | Mapping[int, ScoringModel] | | required |
entities | DataFrame | | required |
users | DataFrame | | required |
privacy | Optional[PrivacySettings] | privacy[user, entity] in { True, False, None } | required |
Returns:

Name | Type | Description |
---|---|---|
activities | dict[int, float] | activities[user] is a measure of user's trustworthy activeness. |
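One plausible reading of "trustworthy activeness", combining a per-user trust weight with a privacy-penalized count of scored entities, is sketched below; the trust_score column and the counting rule are assumptions, not the documented behavior:

```python
# Assumed sketch: count a user's scored entities, down-weighting private ones,
# and weight the total by the user's trust (column name "trust_score" is hypothetical).
def sketch_activity(user_id, scored_entities, privacy, privacy_penalty, users):
    total = 0.0
    for entity_id in scored_entities:
        is_private = privacy[user_id, entity_id]  # True, False or None
        total += privacy_penalty if is_private else 1.0
    return users.loc[user_id, "trust_score"] * total
```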
compute_model_norms

compute_model_norms(
user_models: dict[int, ScoringModel],
users: DataFrame,
entities: DataFrame,
privacy: PrivacySettings,
) -> dict[int, float]
Estimator of the scale of a user's scores, with an emphasis on large scores. The estimator uses an L_power norm and weighs scores depending on their public/private status. For each user \(u\), it computes \(\left( \sum_e w_{ue} \, \mathrm{score}(u, e)^{\text{power}} \,/\, \sum_e w_{ue} \right)^{1/\text{power}}\).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | dict[int, ScoringModel] | user_models[user] is user's scoring model | required |
users | DataFrame | | required |
entities | DataFrame | | required |
privacy | PrivacySettings | privacy[user, entity] in { True, False, None } | required |
Returns:

Name | Type | Description |
---|---|---|
out | dict[int, float] | out[user] estimates the norm of user's score model |
compute_entity_ratios

compute_entity_ratios(
scalee_models: dict[int, ScoringModel],
scaler_models: dict[int, ScoringModel],
entities: DataFrame,
privacy: PrivacySettings,
) -> dict[
int,
dict[
int,
tuple[
list[float],
list[float],
list[float],
list[float],
],
],
]
Computes the ratios of score differences, with uncertainties, over the entities that both a scalee \(u\) and a scaler \(v\) have scored (\(s_{uvef}\) in the paper). Note that the output ratios[u][v] is given as 1-dimensional arrays, without any reference to the entities \(e\) and \(f\).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
entities | DataFrame | | required |
scalee_models | dict[int, ScoringModel] | scalee_models[user_id] is a scoring model | required |
scaler_models | dict[int, ScoringModel] | scaler_models[user_id] is a scoring model | required |
privacy | PrivacySettings | | required |
Returns:

Name | Type | Description |
---|---|---|
out | dict[int, dict[int, tuple[list[float], list[float], list[float], list[float]]]] | |
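For intuition, writing \(\theta_u(e)\) for user \(u\)'s score of entity \(e\), the ratio attached to a scalee \(u\), a scaler \(v\) and an entity pair \((e, f)\) is, up to the handling of uncertainties, of the following form. This is a reading of the paper's \(s_{uvef}\), not a verbatim transcription of the implementation:

\[
s_{uvef} \approx \frac{\bigl|\theta_v(e) - \theta_v(f)\bigr|}{\bigl|\theta_u(e) - \theta_u(f)\bigr|}.
\]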
load_all_ratios

load_all_ratios(
u: int,
v: int,
uv_entities: list[int],
entities: DataFrame,
u_model: ScoringModel,
v_model: ScoringModel,
privacy: Optional[PrivacySettings],
) -> Optional[
tuple[
list[float], list[float], list[float], list[float]
]
]
sample_ratios

sample_ratios(
u: int,
v: int,
uv_entities: list[int],
entities: DataFrame,
u_model: ScoringModel,
v_model: ScoringModel,
privacy: Optional[PrivacySettings],
) -> tuple[
list[float], list[float], list[float], list[float]
]
compute_multiplicators

compute_multiplicators(
voting_rights: dict[int, list[float]],
ratios: dict[int, list[float]],
uncertainties: dict[int, list[float]],
model_norms: dict[int, float],
) -> dict[int, tuple[float, float]]
Computes the multiplicators of the users from the given ratios, uncertainties and model norms.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
ratios | dict[int, list[float]] | | required |
model_norms | dict[int, float] | model_norms[u] estimates the norm of user u's score model | required |
Returns:

Name | Type | Description |
---|---|---|
multiplicators | dict[int, tuple[float, float]] | |
compute_entity_diffs

compute_entity_diffs(
scalee_models: dict[int, ScoringModel],
scaler_models: dict[int, ScoringModel],
entities: DataFrame,
privacy: PrivacySettings,
multiplicators: dict[int, tuple[float, float]],
) -> dict[
int,
dict[
int,
tuple[
list[float],
list[float],
list[float],
list[float],
],
],
]
Computes the differences of scores, with uncertainties, over the entities shared by a scalee \(u\) and a scaler \(v\) (analogous to \(s_{uvef}\) in the paper, but for score differences rather than ratios). Note that the output is given as 1-dimensional arrays, without any reference to the entities.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
scalee_models | dict[int, ScoringModel] | | required |
scaler_models | dict[int, ScoringModel] | | required |
entities | DataFrame | | required |
multiplicators | dict[int, tuple[float, float]] | | required |
Returns:

Name | Type | Description |
---|---|---|
out | dict[int, dict[int, tuple[list[float], list[float], list[float]]]] | |
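For intuition, once multiplicators \(m_u\) have been estimated, the translation step compares rescaled scores on shared entities. Writing \(\delta_{uve}\) (a hypothetical notation) for the entry attached to scalee \(u\), scaler \(v\) and shared entity \(e\), each difference is, up to the handling of uncertainties, roughly:

\[
\delta_{uve} \approx m_v \, \theta_v(e) - m_u \, \theta_u(e).
\]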
QuantileShift

QuantileShift(
quantile: float = 0.15,
*,
target_score: float = 0.0,
lipschitz: float = 0.1,
error: float = 1e-05
)
Bases: Scaling
Shifts the scores so that their quantile (computed with qr_quantile) equals target_score.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
quantile | float | | 0.15 |
target_score | float | | 0.0 |
lipschitz | float | | 0.1 |
error | float | | 1e-05 |
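To illustrate the effect with plain numpy (using np.quantile in place of the resilient qr_quantile, so this approximates the behavior rather than reproducing the implementation):

```python
import numpy as np

scores = np.array([-2.0, -0.5, 0.3, 1.2, 4.0])
quantile, target_score = 0.15, 0.0

# Shift every score by the same constant so the chosen quantile lands on target_score.
shift = target_score - np.quantile(scores, quantile)
shifted = scores + shift

print(np.quantile(shifted, quantile))  # ~= 0.0 == target_score
```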
__call__

__call__(
user_models: Mapping[int, ScoringModel],
users: DataFrame,
entities: DataFrame,
voting_rights: VotingRights,
privacy: PrivacySettings,
) -> dict[int, ScaledScoringModel]
Returns scaled user models
Parameters:

Name | Type | Description | Default |
---|---|---|---|
user_models | Mapping[int, ScoringModel] | user_models[user] is user's scoring model | required |
users | DataFrame | | required |
entities | DataFrame | | required |
voting_rights | VotingRights | voting_rights[user, entity]: float | required |
privacy | PrivacySettings | privacy[user, entity] in { True, False, None } | required |
Returns:

Type | Description |
---|---|
out[user]: ScoringModel | Will be scaled by the Scaling method |
Standardize

Bases: Scaling

__call__

__call__(
user_models: dict[int, ScoringModel],
users: DataFrame,
entities: DataFrame,
voting_rights: VotingRights,
privacy: PrivacySettings,
)