wip: utcnow#163
Conversation
e02895c to
96a4a9f
Compare
fenekku
left a comment
There was a problem hiding this comment.
Indeed needed. I had started doing some utc related testing in our modules for stats. If anything comes up, I'll share.
96a4a9f to
66db753
Compare
|
@fenekku further i have a in my opinion breaking change in the mappings. |
66db753 to
531820f
Compare
* datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC). * the decision was made to move to utc aware timestamps BREAKING CHANGE: change of mapping
531820f to
834d74f
Compare
|
I'll take another look tomorrow. If the problem existed beforehand then this PR should be off the hook... |
fenekku
left a comment
There was a problem hiding this comment.
LGTM. I've just laid out what I understood so you can check if that's what you meant. This way we are on the same page. Some comments could help to explain some things because date/time is tricky. Now that we are storing UTC it should be simpler though.
Don't forget to document transition in v14 release notes.
| "mapping": { | ||
| "type": "date", | ||
| "format": "strict_date_hour_minute_second" | ||
| "format": "strict_date_optional_time" |
There was a problem hiding this comment.
So we are on the same page, this will allow: "2026-01-14" or "2026-01-14T01:13:44.123456789-04:00", whereas previous only allowed "2026-01-14T01:13:44". A migration will be needed to convert those. Then and in the future, because all documents are saved with the same timezone (UTC) we should be good when aggregating.
It is possible to set "strict_date_hour_minute_second||strict_date_optional_time" to keep both formats but aggregation will be affected for transition period and it becomes harder to reason about how dates are stored and what that will do. So "strict_date_optional_time" is fine by me.
There was a problem hiding this comment.
i didn't respond on the A migration will be needed to convert those. as i understand the whole event generation and aggregation steps, we only have to update the template which is used to create the events for file download and record view so file-download-v1.json and record-view-v1.json, because strict_date_hour_minute_seconds is too restrictive, but strict_date_optional_time allowes the +00:00 timezone addition which is now provided from the rest of the system.
the aggregation instead always allowed the +00:00 because it is already on strict_date_optional_time see and see . as i understand the whole things, this means, that events created before of the last aggregation step can be processed and the new-ones with +00:00 can be processed too.
| { | ||
| # When: | ||
| "timestamp": datetime.datetime.utcnow().isoformat(), | ||
| "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(), |
There was a problem hiding this comment.
A migration of the OpenSearch mapping(s) will have to be done before this code runs like you've mentioned. Document the steps in the docs-invenio-rdm release notes for v14 (that were used for demo site if I recall correctly) and that transition should be good.
| last_date = datetime.fromisoformat(last_update_aggr.rstrip("Z")) | ||
| last_date = datetime.fromisoformat( | ||
| last_update_aggr.rstrip("Z") | ||
| ).replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
👍 Concurs with my understanding that the previous stored date / date at this time, despite not including timezone information, was considered to be UTC already. So being explicit and using replace is fine.
| bookmark = next(iter(query_bookmark.execute()), None) | ||
| if bookmark: | ||
| try: | ||
| my_date = datetime.fromisoformat(bookmark.date) |
There was a problem hiding this comment.
Maybe for even more clarity a comment here about how the bookmark.date is stored/formatted as a datetime. This would corroborate the comment below about the except case for when the bookmark.date used to be stored as a date only.
There was a problem hiding this comment.
date is stored as date_optional_time here
| if refresh_time: | ||
| my_date -= timedelta(seconds=refresh_time) | ||
| return my_date | ||
| return my_date.replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
Apriori this should be fine. But if going forward the bookmark.date is stored with UTC timezone, then this should not be necessary. (maybe part of the migration script updates the bookmark date too to be sure)
| if len(result) == 0: | ||
| return None | ||
| return parser.parse(result[0]["timestamp"]) | ||
| return parser.parse(result[0]["timestamp"]).replace(tzinfo=timezone.utc) |
There was a problem hiding this comment.
More as a note to self: I checked implications with respect to using pytz.UTC.localize vs replace(tzinfo=timezone.utc) and replace() is fine for us. UTC doesn't have daylight time transitions so localize() would not really differ. The transition to storing with UTC will also fix cases of time springing forward or back causing events that are 1h apart to seem to have occurred at same time for example.
|
|
||
|
|
||
| class NewDate(datetime.datetime): | ||
| class NewDate(datetime): |
There was a problem hiding this comment.
I like to use https://time-machine.readthedocs.io/en/latest/ for these kinds of tests. (it's fine as-is, just mentioning if not aware)
There was a problem hiding this comment.
yeah, i try to avoid third party as much as possible and do what ever is possible with the stdlib.
77efc86 to
01a0fa6
Compare
No description provided.