Scaling

Step 4 of the pipeline.

Scaling addresses the "Parisian" and the "Marseillais" problems, i.e. users whose scores are excessively spread out, or whose negative scores correspond to entities that other users rate positively. The latter effect is especially problematic in comparison-based preference learning, since each user typically has a very specific selection bias in the entities they rate.

Scaling

__call__

__call__(
    user_models: Mapping[int, ScoringModel],
    users: DataFrame,
    entities: DataFrame,
    voting_rights: VotingRights,
    privacy: PrivacySettings,
) -> dict[int, ScaledScoringModel]

Returns scaled user models

Parameters:

Name Type Description Default
user_models Mapping[int, ScoringModel]

user_models[user] is user's scoring model

required
users DataFrame
  • user_id (int, index)
  • trust_score (float)
required
entities DataFrame
  • entity_id (int, index)
required
voting_rights VotingRights

voting_rights[user, entity]: float

required
privacy PrivacySettings

privacy[user, entity] in { True, False, None }

required

Returns:

Type Description
out[user]: ScoringModel

Will be scaled by the Scaling method
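
A custom scaling step subclasses Scaling and implements the __call__ signature above. The sketch below returns every model with a neutral scale; it is illustrative only: the import paths and the ScaledScoringModel constructor arguments are assumptions, not taken from this page.

# Illustrative sketch only: module paths and the ScaledScoringModel
# constructor arguments are assumptions; only the __call__ signature
# documented above is taken from this page.
from typing import Mapping
from pandas import DataFrame

from solidago.scoring_model import ScoringModel, ScaledScoringModel  # assumed path
from solidago.voting_rights import VotingRights                      # assumed path
from solidago.privacy_settings import PrivacySettings                # assumed path
from solidago.scaling import Scaling                                 # assumed path


class IdentityScaling(Scaling):
    """Hypothetical scaling that leaves every user model untouched."""

    def __call__(
        self,
        user_models: Mapping[int, ScoringModel],
        users: DataFrame,
        entities: DataFrame,
        voting_rights: VotingRights,
        privacy: PrivacySettings,
    ) -> dict[int, ScaledScoringModel]:
        # Wrap each model with a neutral scale (multiplicator 1, translation 0).
        return {
            user: ScaledScoringModel(model, multiplicator=1.0, translation=0.0)
            for user, model in user_models.items()
        }

    def to_json(self) -> tuple:
        return (type(self).__name__,)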

to_json

to_json() -> tuple

ScalingCompose

ScalingCompose(*scalings: Scaling)

Bases: Scaling

Class used to compose any number of scaling solutions

Composes a list of scalings
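
For instance, the scaling step of a pipeline could chain the solutions documented on this page. The constructors and their defaults are documented below; the import path and the particular ordering of the steps are assumptions.

# One possible composition of the documented scaling solutions.
from solidago.scaling import ScalingCompose, Mehestan, QuantileShift, Standardize  # assumed path

scaling = ScalingCompose(
    Mehestan(lipschitz=0.1, min_activity=10.0, n_scalers_max=100),
    QuantileShift(quantile=0.15, target_score=0.0),
    Standardize(dev_quantile=0.9, lipschitz=0.1),
)

# scaled_models = scaling(user_models, users, entities, voting_rights, privacy)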

scalings

scalings = scalings

__call__

__call__(
    user_models, users, entities, voting_rights, privacy
) -> dict[int, ScaledScoringModel]

to_json

to_json()

NoScaling

Bases: Scaling

__call__

__call__(
    user_models: Mapping[int, ScoringModel],
    users: DataFrame = ...,
    entities: DataFrame = ...,
    voting_rights: VotingRights = ...,
    privacy: PrivacySettings = ...,
) -> dict[int, ScaledScoringModel]

Returns scaled user models

Parameters:

Name Type Description Default
user_models Mapping[int, ScoringModel]

user_models[user] is user's scoring model

required
users DataFrame
  • user_id (int, index)
  • trust_score (float)
...
entities DataFrame
  • entity_id (int, index)
...
voting_rights VotingRights

voting_rights[user, entity]: float

...
privacy PrivacySettings

privacy[user, entity] in { True, False, None }

...

Returns:

Type Description
out[user]: ScoringModel

Will be scaled by the Scaling method

Mehestan

Mehestan(
    lipschitz: float = 0.1,
    min_activity: float = 10.0,
    n_scalers_max: int = 100,
    privacy_penalty: float = 0.5,
    user_comparison_lipschitz: float = 10.0,
    p_norm_for_multiplicative_resilience: float = 4.0,
    n_diffs_sample_max: int = 1000,
    error: float = 1e-05,
)

Bases: Scaling

Mehestan performs Lipschitz-resilient collaborative scaling.

A simplified version of Mehestan was published as "Robust Sparse Voting" by Youssef Allouah, Rachid Guerraoui, Lê Nguyên Hoang and Oscar Villemaud, at AISTATS 2024.

The inclusion of uncertainties is further detailed in "Solidago: A Modular Pipeline for Collaborative Scoring".

Parameters:

Name Type Description Default
lipschitz float

Resilience parameter. Larger values are more resilient, but less accurate.

0.1
min_activity float

Minimal activity (e.g. based on the number of comparisons) required to be a potential scaler.

10.0
n_scalers_max int

Maximal number of scalers

100
privacy_penalty float

Penalty to private ratings when selecting scalers

0.5
p_norm_for_multiplicative_resilience float

To provide stronger security, we enforce greater resilience on multiplicator estimation when a user's model scores are large. The infinity norm may be too sensitive to extreme values, so we use an l_p norm instead.

4.0
error float

Error bound

1e-05
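
Concretely, the constructor can be called with its documented defaults as follows (the import path is an assumption):

# Instantiation of Mehestan with the documented defaults.
from solidago.scaling import Mehestan  # assumed path

mehestan = Mehestan(
    lipschitz=0.1,                           # larger is more resilient, but less accurate
    min_activity=10.0,                       # minimal activity to be a potential scaler
    n_scalers_max=100,                       # cap on the number of scalers
    privacy_penalty=0.5,                     # penalty on private ratings when selecting scalers
    user_comparison_lipschitz=10.0,
    p_norm_for_multiplicative_resilience=4.0,
    n_diffs_sample_max=1000,
    error=1e-05,                             # error bound
)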

lipschitz

lipschitz = lipschitz

min_activity

min_activity = min_activity

n_scalers_max

n_scalers_max = n_scalers_max

privacy_penalty

privacy_penalty = privacy_penalty

user_comparison_lipschitz

user_comparison_lipschitz = user_comparison_lipschitz

p_norm_for_multiplicative_resilience

p_norm_for_multiplicative_resilience = (
    p_norm_for_multiplicative_resilience
)

n_diffs_sample_max

n_diffs_sample_max = n_diffs_sample_max

error

error = error

__call__

__call__(
    user_models: Mapping[int, ScoringModel],
    users: DataFrame,
    entities: DataFrame,
    voting_rights: Optional[VotingRights] = None,
    privacy: Optional[PrivacySettings] = None,
) -> dict[int, ScaledScoringModel]

Returns scaled user models

Parameters:

Name Type Description Default
user_models Mapping[int, ScoringModel]

user_models[user] is user's scoring model

required
users DataFrame
  • user_id (int, index)
  • trust_score (float)
required
entities DataFrame
  • entity_id (int, index)
required
voting_rights Optional[VotingRights]

Not used in Mehestan

None
privacy Optional[PrivacySettings]

privacy[user, entity] in { True, False, None }

None

Returns:

Type Description
out[user]: ScoringModel

Will be scaled by the Scaling method

compute_scalers

compute_scalers(
    user_models: Mapping[int, ScoringModel],
    entities: DataFrame,
    users: DataFrame,
    privacy: Optional[PrivacySettings],
) -> ndarray

Determines which users will be scalers. The set of scalers is restricted for two reasons. First, users with too little activity are excluded, because their lack of comparability with other users makes the scaling process ineffective. Second, scaling the scalers is the most computationally demanding step of Mehestan; reducing the number of scalers accelerates the computation.

Parameters (TODO)

Returns:

Name Type Description
is_scaler ndarray

is_scaler[user]: bool says whether user is a scaler
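
A minimal sketch of this selection rule, built only on the activity scores returned by compute_activities. The threshold-then-top-k logic is an assumption consistent with the description above, and a dict is returned for readability where the actual method returns an ndarray.

def select_scalers(
    activities: dict[int, float],
    min_activity: float = 10.0,
    n_scalers_max: int = 100,
) -> dict[int, bool]:
    # Threshold on activity, then keep at most n_scalers_max of the most
    # active users as scalers. Illustrative assumption, not the actual code.
    eligible = [u for u, a in activities.items() if a >= min_activity]
    top = set(sorted(eligible, key=lambda u: activities[u], reverse=True)[:n_scalers_max])
    return {u: u in top for u in activities}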

scale_scalers

scale_scalers(user_models, scalers, entities, privacy)

scale_non_scalers

scale_non_scalers(
    user_models,
    nonscalers,
    entities,
    scalers,
    scaled_models,
    privacy,
)

compute_activities

compute_activities(
    user_models: Mapping[int, ScoringModel],
    entities: DataFrame,
    users: DataFrame,
    privacy: Optional[PrivacySettings],
) -> dict[int, float]

Returns a dictionary mapping each user to a measure of their trustworthy activeness.

Parameters:

Name Type Description Default
user_models Mapping[int, ScoringModel]
required
entities DataFrame
required
users DataFrame
required
privacy Optional[PrivacySettings]

privacy[user, entity] in { True, False, None }

required

Returns:

Name Type Description
activities dict[int, float]

activities[user] is a measure of user's trustworthy activeness.
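
A simplified reading of this quantity, assuming activity counts the entities a user has scored, down-weighted by privacy_penalty for private ratings and restricted to trusted users. The exact weighting is an assumption, and the per-entity privacy flags are hypothetical inputs.

def trustworthy_activity(
    scored_entities: list[int],
    is_public: dict[int, bool],      # hypothetical per-entity privacy flags
    trust_score: float,
    privacy_penalty: float = 0.5,
) -> float:
    # Each scored entity contributes 1 if public, privacy_penalty if private;
    # untrusted users contribute no activity. Illustrative assumption only.
    if trust_score <= 0:
        return 0.0
    return sum(1.0 if is_public.get(e, True) else privacy_penalty for e in scored_entities)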

compute_model_norms

compute_model_norms(
    user_models: dict[int, ScoringModel],
    users: DataFrame,
    entities: DataFrame,
    privacy: PrivacySettings,
) -> dict[int, float]

Estimates the scale of a user's scores, with an emphasis on large scores. The estimator uses an \(L_p\) norm and weighs scores depending on their public/private status. For each user \(u\), it computes \(\left( \sum_e w_{u,e} \, \mathrm{score}(u, e)^p \,/\, \sum_e w_{u,e} \right)^{1/p}\).

Parameters:

Name Type Description Default
user_models dict[int, ScoringModel]

user_models[user] is user's scoring model

required
users DataFrame
  • user_id (int, index)
required
entities DataFrame
  • entity_id (int, index)
required
privacy PrivacySettings

privacy[user, entity] in { True, False, None }

required

Returns:

Name Type Description
out dict[int, float]

out[user]
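
A direct transcription of the estimator above, where the weights (e.g. a privacy penalty for private scores) are left as an input, since their exact definition is an assumption.

import numpy as np

def weighted_p_norm(scores: np.ndarray, weights: np.ndarray, p: float = 4.0) -> float:
    # (sum_e w[u,e] * |score[u,e]|**p / sum_e w[u,e]) ** (1/p)
    return float((np.sum(weights * np.abs(scores) ** p) / np.sum(weights)) ** (1.0 / p))

# e.g. with p_norm_for_multiplicative_resilience = 4.0:
# weighted_p_norm(np.array([0.2, -1.5, 3.0]), np.array([1.0, 0.5, 1.0]), p=4.0)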

compute_entity_ratios

compute_entity_ratios(
    scalee_models: dict[int, ScoringModel],
    scaler_models: dict[int, ScoringModel],
    entities: DataFrame,
    privacy: PrivacySettings,
) -> dict[
    int,
    dict[
        int,
        tuple[
            list[float],
            list[float],
            list[float],
            list[float],
        ],
    ],
]

Computes the ratios of score differences, with uncertainties, over comparable entities for each scalee-scaler pair (\(s_{uvef}\) in the paper), where \(u\) ranges over scalees and \(v\) over scalers. Note that out[u][v] is given as flat 1-dimensional sequences, without any reference to \(e\) and \(f\).

Parameters:

Name Type Description Default
entities DataFrame
  • entity_id (int, index)
required
scalee_models dict[int, ScoringModel]

scalee_models[user_id] is a scoring model

required
scaler_models dict[int, ScoringModel]

scaler_models[user_id] is a scoring model

required
privacy PrivacySettings
required

Returns:

Name Type Description
out dict[int, dict[int, tuple[list[float], list[float], list[float], list[float]]]]

out[user][user_bis] is a tuple (ratios, voting_rights, lefts, rights), where ratios is a list of ratios of score differences, and lefts and rights are the left and right ratio uncertainties.
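
A minimal sketch of the ratio computation for one scalee u and one scaler v, ignoring voting rights and uncertainties; the direct dict-of-scores inputs are assumptions made for readability.

def score_difference_ratios(
    u_scores: dict[int, float],   # scalee u's scores, keyed by entity_id
    v_scores: dict[int, float],   # scaler v's scores, keyed by entity_id
) -> list[float]:
    # Ratios |score_v(e) - score_v(f)| / |score_u(e) - score_u(f)| over
    # pairs of entities that both users have scored.
    common = sorted(set(u_scores) & set(v_scores))
    ratios = []
    for i, e in enumerate(common):
        for f in common[i + 1:]:
            du = u_scores[e] - u_scores[f]
            dv = v_scores[e] - v_scores[f]
            if du != 0:
                ratios.append(abs(dv / du))
    return ratios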

load_all_ratios

load_all_ratios(
    u: int,
    v: int,
    uv_entities: list[int],
    entities: DataFrame,
    u_model: ScoringModel,
    v_model: ScoringModel,
    privacy: Optional[PrivacySettings],
) -> Optional[
    tuple[
        list[float], list[float], list[float], list[float]
    ]
]

sample_ratios

sample_ratios(
    u: int,
    v: int,
    uv_entities: list[int],
    entities: DataFrame,
    u_model: ScoringModel,
    v_model: ScoringModel,
    privacy: Optional[PrivacySettings],
) -> tuple[
    list[float], list[float], list[float], list[float]
]

compute_multiplicators

compute_multiplicators(
    voting_rights: dict[int, list[float]],
    ratios: dict[int, list[float]],
    uncertainties: dict[int, list[float]],
    model_norms: dict[int, float],
) -> dict[int, tuple[float, float]]

Computes the multiplicators of users from the given ratios.

Parameters:

Name Type Description Default
ratios dict[int, list[float]]

ratios[u][0] is a list of voting rights,
ratios[u][1] is a list of ratios,
ratios[u][2] is a list of (symmetric) uncertainties.

required
model_norms dict[int, float]

model_norms[u] estimates the norm of user u's score model

required

Returns:

Name Type Description
multiplicators dict[int, tuple[float, float]]

multiplicators[user][0] is the multiplicative scaling of user,
multiplicators[user][1] is the uncertainty on the multiplicator.
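
As a rough illustration of the aggregation step, and only that: the actual Mehestan aggregation is Lipschitz-resilient, which the plain weighted median below is not. It merely conveys the shape of the computation, i.e. a voting-right-weighted aggregate of each user's ratios.

import numpy as np

def weighted_median(values: list[float], weights: list[float]) -> float:
    # Simplified stand-in for the Lipschitz-resilient aggregation used by Mehestan.
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    return float(v[np.searchsorted(cum, 0.5 * cum[-1])])

# multiplicator[u] is roughly weighted_median(ratios[u], voting_rights[u])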

compute_entity_diffs

compute_entity_diffs(
    scalee_models: dict[int, ScoringModel],
    scaler_models: dict[int, ScoringModel],
    entities: DataFrame,
    privacy: PrivacySettings,
    multiplicators: dict[int, tuple[float, float]],
) -> dict[
    int,
    dict[
        int,
        tuple[
            list[float],
            list[float],
            list[float],
            list[float],
        ],
    ],
]

Computes the differences of scores, with uncertainties, for entities shared by each scalee-scaler pair (\(s_{uvef}\) in the paper). Note that the output is given as flat 1-dimensional sequences, without any reference to \(e\) and \(f\).

Parameters:

Name Type Description Default
scalee_models dict[int, ScoringModel]
required
scaler_models dict[int, ScoringModel]
required
entities DataFrame
  • entity_id (int, index)
required
multiplicators dict[int, tuple[float, float]]

multiplicators[user][0] is the multiplicative scaling of user,
multiplicators[user][1] is the uncertainty on the multiplicator.

required

Returns:

Name Type Description
out dict[int, dict[int, tuple[list[float], list[float], list[float]]]]

out[user][user_bis] is a tuple (differences, voting_rights, lefts, rights).

compute_translations

compute_translations(
    voting_rights: dict[int, list[float]],
    diffs: dict[int, list[float]],
    uncertainties: dict[int, list[float]],
) -> dict[int, tuple[float, float]]

Computes the translations of users from the given diffs.

Returns:

Name Type Description
translations dict[int, tuple[float, float]]

translations[user][0] is the translation applied to user's scores,
translations[user][1] is the uncertainty on the translation.

to_json

to_json()

QuantileShift

QuantileShift(
    quantile: float = 0.15,
    *,
    target_score: float = 0.0,
    lipschitz: float = 0.1,
    error: float = 1e-05
)

Bases: Scaling

Shifts the scores so that their quantile (computed with qr_quantile) equals target_score.

Parameters:

Name Type Description Default
quantile float
0.15
target_score float
0.0
lipschitz float
0.1
error float
1e-05
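
The effect can be sketched as follows, with np.quantile standing in for the Lipschitz-resilient qr_quantile mentioned above; that substitution and the direct pooling of scores into one array are assumptions.

import numpy as np

def quantile_shift(scores: np.ndarray, quantile: float = 0.15, target_score: float = 0.0) -> np.ndarray:
    # Translate all scores so that the chosen quantile lands on target_score.
    # np.quantile replaces the resilient qr_quantile for illustration only.
    shift = target_score - np.quantile(scores, quantile)
    return scores + shift

# With the defaults, the 15th percentile of the pooled scores is moved to 0.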

quantile

quantile = quantile

target_score

target_score = target_score

lipschitz

lipschitz = lipschitz

error

error = error

__call__

__call__(
    user_models: Mapping[int, ScoringModel],
    users: DataFrame,
    entities: DataFrame,
    voting_rights: VotingRights,
    privacy: PrivacySettings,
) -> dict[int, ScaledScoringModel]

Returns scaled user models

Parameters:

Name Type Description Default
user_models Mapping[int, ScoringModel]

user_models[user] is user's scoring model

required
users DataFrame
  • user_id (int, index)
  • trust_score (float)
required
entities DataFrame
  • entity_id (int, index)
required
voting_rights VotingRights

voting_rights[user, entity]: float

required
privacy PrivacySettings

privacy[user, entity] in { True, False, None }

required

Returns:

Type Description
out[user]: ScoringModel

Will be scaled by the Scaling method

to_json

to_json()

Standardize

Standardize(
    dev_quantile: float = 0.9,
    lipschitz: float = 0.1,
    error: float = 1e-05,
)

Bases: Scaling

dev_quantile

dev_quantile = dev_quantile

lipschitz

lipschitz = lipschitz

error

error = error

__call__

__call__(
    user_models: dict[int, ScoringModel],
    users: DataFrame,
    entities: DataFrame,
    voting_rights: VotingRights,
    privacy: PrivacySettings,
)
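
Standardize carries no docstring on this page. Reading its parameters, a plausible simplified sketch rescales scores so that the dev_quantile of their absolute deviations equals 1; this reading is an assumption, and np.quantile again stands in for the Lipschitz-resilient estimator.

import numpy as np

def standardize(scores: np.ndarray, dev_quantile: float = 0.9) -> np.ndarray:
    # Divide by the dev_quantile of absolute deviations from the median,
    # so that score spreads become comparable. Illustrative assumption only.
    dev = np.quantile(np.abs(scores - np.median(scores)), dev_quantile)
    return scores / dev if dev > 0 else scores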

to_json

to_json()