wip: utcnow #163

Merged
utnapischtim merged 4 commits into inveniosoftware:master from utnapischtim:fix-utcnow-with-custom-type
Jan 29, 2026

Conversation

@utnapischtim

Contributor

No description provided.

@utnapischtim utnapischtim force-pushed the fix-utcnow-with-custom-type branch from e02895c to 96a4a9f Compare September 27, 2025 20:44

@fenekku fenekku left a comment

Contributor

Indeed needed. I had started doing some utc related testing in our modules for stats. If anything comes up, I'll share.

@utnapischtim

Contributor Author

@fenekku
could you please have another look? this package has to be reviewed carefully. i am not sure this really works as intended. i see problems with the migration from before UTC awareness to after it: we change from some sort of local timestamps to UTC timestamps, and the aggregation process could miss some entries because of that.

further, i have what is in my opinion a breaking change in the mappings.

@utnapischtim utnapischtim force-pushed the fix-utcnow-with-custom-type branch from 66db753 to 531820f Compare October 2, 2025 20:04
* datetime.datetime.utcnow() is deprecated and scheduled for removal in
  a future version. Use timezone-aware objects to represent datetimes in
  UTC: datetime.datetime.now(datetime.UTC).

* the decision was made to move to utc aware timestamps

BREAKING CHANGE: change of mapping
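
The difference described in the commit message can be sketched in a few lines. This is only an illustration: the helper name `utc_timestamp` is hypothetical and not part of invenio-stats.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current time as a timezone-aware ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


# A naive datetime (the kind utcnow() returned) serializes without an offset ...
naive = datetime(2026, 1, 14, 1, 13, 44)
# ... while an aware datetime carries an explicit "+00:00" suffix.
aware = naive.replace(tzinfo=timezone.utc)

naive_iso = naive.isoformat()  # no offset in the string
aware_iso = aware.isoformat()  # offset included
```

The extra `+00:00` suffix is what the stricter mapping format cannot parse, which is presumably why the commit flags the mapping change as breaking.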
@utnapischtim utnapischtim marked this pull request as ready for review January 13, 2026 13:07
@utnapischtim utnapischtim force-pushed the fix-utcnow-with-custom-type branch from 531820f to 834d74f Compare January 13, 2026 13:16
@fenekku

fenekku commented Jan 13, 2026

Contributor

I'll take another look tomorrow. If the problem existed beforehand then this PR should be off the hook...

@fenekku fenekku left a comment

Contributor

LGTM. I've just laid out what I understood so you can check if that's what you meant. This way we are on the same page. Some comments could help to explain some things because date/time is tricky. Now that we are storing UTC it should be simpler though.

Don't forget to document the transition in the v14 release notes.

    "mapping": {
      "type": "date",
-     "format": "strict_date_hour_minute_second"
+     "format": "strict_date_optional_time"

Contributor

So we are on the same page, this will allow "2026-01-14" or "2026-01-14T01:13:44.123456789-04:00", whereas the previous format only allowed "2026-01-14T01:13:44". A migration will be needed to convert those. From then on, because all documents are saved with the same timezone (UTC), we should be good when aggregating.

It is possible to set "strict_date_hour_minute_second||strict_date_optional_time" to keep both formats, but aggregation will be affected during the transition period and it becomes harder to reason about how dates are stored and what that will do. So "strict_date_optional_time" is fine by me.
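
To make the difference between the two formats concrete, here is a rough sketch that approximates them with regular expressions. These patterns are an assumption for illustration only: they cover just the example values discussed here and are not how OpenSearch parses dates (the real `strict_date_optional_time` accepts more shapes than this).

```python
import re

# Approximation of strict_date_hour_minute_second: yyyy-MM-ddTHH:mm:ss only.
STRICT_DATE_HOUR_MINUTE_SECOND = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
)

# Approximation of strict_date_optional_time: a full date, optionally followed
# by a time with optional fractional seconds and an optional UTC offset.
STRICT_DATE_OPTIONAL_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}"
    r"(T\d{2}(:\d{2}(:\d{2}(\.\d{1,9})?)?)?(Z|[+-]\d{2}:\d{2})?)?"
)


def accepted_by(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole string matches the approximated format."""
    return pattern.fullmatch(value) is not None


old_style = "2026-01-14T01:13:44"
new_style = "2026-01-14T01:13:44+00:00"
```

Under this approximation the old format rejects `new_style`, while the new format accepts both values, which matches the reasoning above.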

Contributor Author

i didn't respond to the "A migration will be needed to convert those" point. as i understand the whole event generation and aggregation process, we only have to update the templates used to create the events for file download and record view, i.e. file-download-v1.json and record-view-v1.json, because strict_date_hour_minute_second is too restrictive, whereas strict_date_optional_time allows the +00:00 timezone suffix which the rest of the system now provides.

the aggregation, by contrast, always allowed the +00:00 because it is already on strict_date_optional_time. as i understand it, this means that events created before the last aggregation step can be processed, and the new ones with +00:00 can be processed too.

  {
      # When:
-     "timestamp": datetime.datetime.utcnow().isoformat(),
+     "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),

Contributor

A migration of the OpenSearch mapping(s) will have to be done before this code runs like you've mentioned. Document the steps in the docs-invenio-rdm release notes for v14 (that were used for demo site if I recall correctly) and that transition should be good.

- last_date = datetime.fromisoformat(last_update_aggr.rstrip("Z"))
+ last_date = datetime.fromisoformat(
+     last_update_aggr.rstrip("Z")
+ ).replace(tzinfo=timezone.utc)

Contributor

👍 Concurs with my understanding that the previous stored date / date at this time, despite not including timezone information, was considered to be UTC already. So being explicit and using replace is fine.
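
As a minimal sketch of that reasoning: a stored string without timezone information is parsed and then explicitly labelled as UTC. The function name `parse_stored_utc` is hypothetical, and the sketch assumes the stored value never carries a non-UTC offset.

```python
from datetime import datetime, timezone


def parse_stored_utc(value: str) -> datetime:
    """Parse a stored timestamp that is assumed to already be in UTC."""
    # Drop a trailing "Z" so fromisoformat() also works on older Pythons.
    parsed = datetime.fromisoformat(value.rstrip("Z"))
    # replace() only attaches the timezone; it does not shift the clock time.
    return parsed.replace(tzinfo=timezone.utc)


last_date = parse_stored_utc("2026-01-14T01:13:44Z")
```

Because `replace()` does not convert, this is only safe while the stored value really is UTC, which is the understanding stated above.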

Comment thread invenio_stats/bookmark.py
  bookmark = next(iter(query_bookmark.execute()), None)
  if bookmark:
      try:
          my_date = datetime.fromisoformat(bookmark.date)

Contributor

Maybe for even more clarity a comment here about how the bookmark.date is stored/formatted as a datetime. This would corroborate the comment below about the except case for when the bookmark.date used to be stored as a date only.

Contributor Author

date is stored as date_optional_time here

Comment thread invenio_stats/bookmark.py
  if refresh_time:
      my_date -= timedelta(seconds=refresh_time)
- return my_date
+ return my_date.replace(tzinfo=timezone.utc)

Contributor

A priori this should be fine. But if, going forward, the bookmark.date is stored with the UTC timezone, then this should not be necessary. (Maybe the migration script could also update the bookmark date, to be sure.)
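
One way to handle both old (naive) and new (aware) bookmark values during a transition could look like the sketch below. The helper `ensure_utc` is hypothetical and not what the PR implements; it attaches UTC only when no timezone is present and converts otherwise.

```python
from datetime import datetime, timedelta, timezone


def ensure_utc(value: datetime) -> datetime:
    """Return an aware UTC datetime for either naive or aware input."""
    if value.tzinfo is None:
        # Legacy value: assumed to have been recorded in UTC already.
        return value.replace(tzinfo=timezone.utc)
    # Already aware: convert to UTC instead of overwriting the offset.
    return value.astimezone(timezone.utc)


legacy = ensure_utc(datetime(2026, 1, 14, 1, 13, 44))
shifted = ensure_utc(
    datetime(2026, 1, 14, 1, 13, 44, tzinfo=timezone(timedelta(hours=-4)))
)
```

Unlike an unconditional `replace(tzinfo=timezone.utc)`, this keeps the instant correct if a value with a non-UTC offset ever shows up.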

  if len(result) == 0:
      return None
- return parser.parse(result[0]["timestamp"])
+ return parser.parse(result[0]["timestamp"]).replace(tzinfo=timezone.utc)

Contributor

More as a note to self: I checked implications with respect to using pytz.UTC.localize vs replace(tzinfo=timezone.utc) and replace() is fine for us. UTC doesn't have daylight time transitions so localize() would not really differ. The transition to storing with UTC will also fix cases of time springing forward or back causing events that are 1h apart to seem to have occurred at same time for example.
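
The daylight-saving case mentioned above can be illustrated with the standard library alone. The sketch uses fixed offsets as stand-ins for US Eastern time around the 2025-11-02 fall-back transition; that zone and date are assumptions chosen for the example, not something taken from the PR.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Eastern Daylight and Eastern Standard Time.
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))

# Two events exactly one hour apart, straddling the fall-back transition.
first_utc = datetime(2025, 11, 2, 5, 30, tzinfo=timezone.utc)
second_utc = first_utc + timedelta(hours=1)

# What a naive local-time timestamp would have recorded for each event.
first_local = first_utc.astimezone(EDT).replace(tzinfo=None)
second_local = second_utc.astimezone(EST).replace(tzinfo=None)

same_wall_clock = first_local == second_local
real_gap = second_utc - first_utc
```

Both naive local values read 01:30 even though the events are an hour apart, whereas the UTC values stay distinct.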



- class NewDate(datetime.datetime):
+ class NewDate(datetime):

Contributor

I like to use https://time-machine.readthedocs.io/en/latest/ for these kinds of tests. (it's fine as-is, just mentioning if not aware)

Contributor Author

yeah, i try to avoid third-party dependencies as much as possible and do whatever is possible with the stdlib.
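
For reference, a stdlib-only way to freeze time in a test might look like the following sketch. The names `FrozenDatetime`, `FROZEN` and `build_event` are hypothetical and do not come from the PR's test suite. The class is injected as a parameter here to keep the example self-contained; in a real test it would more likely be monkeypatched into the module under test.

```python
from datetime import datetime, timezone

FROZEN = datetime(2026, 1, 14, 1, 13, 44, tzinfo=timezone.utc)


class FrozenDatetime(datetime):
    """datetime subclass whose now() always returns a fixed instant."""

    @classmethod
    def now(cls, tz=None):
        # Mirror datetime.now(): aware result if tz is given, naive otherwise.
        if tz is not None:
            return FROZEN.astimezone(tz)
        return FROZEN.replace(tzinfo=None)


def build_event(dt_cls=datetime) -> dict:
    """Build a minimal event with a timezone-aware UTC timestamp."""
    return {"timestamp": dt_cls.now(timezone.utc).isoformat()}


event = build_event(dt_cls=FrozenDatetime)
```

With the frozen class in place, the timestamp is deterministic and can be compared against a literal string in assertions.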

@utnapischtim utnapischtim force-pushed the fix-utcnow-with-custom-type branch from 77efc86 to 01a0fa6 Compare January 29, 2026 19:26
@utnapischtim utnapischtim merged commit cea19ce into inveniosoftware:master Jan 29, 2026
3 checks passed
@utnapischtim utnapischtim deleted the fix-utcnow-with-custom-type branch January 29, 2026 20:13