feat: visitor classification for robot/machine/datacenter filtering #174
Open
slint wants to merge 1 commit into
Depends on inveniosoftware/counter-robots#20
Route the event preprocessors through a single counter-robots classifier, exposed
as the cached current_stats.visitor_classifier property on the extension state
(built lazily, like the other cached properties). It is built from
STATS_VISITOR_CLASSIFIER (an import path, or a callable app -> Classifier); the default factory,
default_visitor_classifier in ext.py, composes the COUNTER baseline with the
extended preset and, when STATS_VISITOR_ASN_DB points at a GeoLite2-ASN mmdb, a
maxminddb-backed ASN resolver. flag_robots / flag_machines keep setting is_robot /
is_machine through it.
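
To make the configuration flow concrete, here is a minimal sketch of resolving a setting that is either an import path or a factory callable, then caching the result lazily. It is not the code in this PR: `StubClassifier`, `default_factory`, `load_factory`, `StatsState` and `FakeApp` are hypothetical stand-ins, the dotted-path format is an assumption, and the real counter-robots classifier API is not shown.

```python
from functools import cached_property
from importlib import import_module


class StubClassifier:
    """Hypothetical stand-in for a counter-robots classifier."""

    def __init__(self, asn_db=None):
        self.asn_db = asn_db


def default_factory(app):
    """Stand-in default factory: passes the ASN database path if configured."""
    return StubClassifier(asn_db=app.config.get("STATS_VISITOR_ASN_DB"))


def load_factory(value):
    """Accept a callable, or a dotted import path (assumed format)."""
    if callable(value):
        return value
    module_name, _, attr = value.rpartition(".")
    return getattr(import_module(module_name), attr)


class StatsState:
    """Hypothetical extension state holding the cached classifier."""

    def __init__(self, app):
        self.app = app

    @cached_property
    def visitor_classifier(self):
        # Built on first access only, then stored on the instance.
        value = self.app.config.get("STATS_VISITOR_CLASSIFIER", default_factory)
        return load_factory(value)(self.app)


class FakeApp:
    """Minimal app object exposing only a config mapping."""

    def __init__(self, config):
        self.config = config


state = StatsState(FakeApp({"STATS_VISITOR_ASN_DB": "/data/GeoLite2-ASN.mmdb"}))
classifier = state.visitor_classifier
```

Because the property is cached, repeated access returns the same object, so an ASN database would be opened at most once per extension state in a design like this.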
Add exclude_datacenter_browser, which drops events whose user agent looks like a
browser but whose IP resolves to a datacenter/hosting ASN (automation faking a
browser from cloud infrastructure). It only excludes: it returns None or the
document unchanged and writes nothing to the event. It must run before
anonymize_user, which removes ip_address.
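
The ordering constraint can be illustrated with a small self-contained sketch. The functions below only mimic the behaviour described above and are not the invenio-stats implementations: `DATACENTER_IPS`, `looks_like_browser` and `run_pipeline` are hypothetical, and a fixed IP set stands in for the ASN lookup.

```python
# Hypothetical stand-in for "IP resolves to a datacenter/hosting ASN".
DATACENTER_IPS = {"203.0.113.7"}


def looks_like_browser(user_agent):
    """Crude stand-in for the classifier's browser check."""
    return "Mozilla/" in (user_agent or "")


def exclude_datacenter_browser(doc):
    """Return None to drop the event, or the same document untouched."""
    ip = doc.get("ip_address")
    if looks_like_browser(doc.get("user_agent")) and ip in DATACENTER_IPS:
        return None
    return doc


def anonymize_user(doc):
    """Simplified stand-in: only the removal of ip_address is modelled."""
    doc.pop("ip_address", None)
    return doc


def run_pipeline(doc, preprocessors):
    """Apply preprocessors in order; a None result drops the event."""
    for preprocessor in preprocessors:
        doc = preprocessor(doc)
        if doc is None:
            return None
    return doc


event = {"user_agent": "Mozilla/5.0 (X11; Linux x86_64)", "ip_address": "203.0.113.7"}

# Correct order: the IP is still present, so the event is dropped.
dropped = run_pipeline(dict(event), [exclude_datacenter_browser, anonymize_user])

# Wrong order: the IP is already gone, so the event slips through.
leaked = run_pipeline(dict(event), [anonymize_user, exclude_datacenter_browser])
```

In this sketch `dropped` is `None` while `leaked` is a document without `ip_address`, which is why the exclusion step has to come first.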
invenio-stats holds no robot or ASN lists. Requires counter-robots>=2026.6.