I have a dataframe that, among other things, contains a column holding the number of milliseconds elapsed since 1970-01-01. I need to convert this column of ints to timestamp data, so that I can ultimately convert it to a column of datetime data by adding the timestamp series to a series that consists entirely of datetime values for 1970-01-01.

I know how to convert a series of strings to datetime data (pandas.to_datetime), but I can't find or come up with any solution to convert the entire column of ints to datetime data OR to timestamp data.

Austin Capobianco

2 Answers

29

You can specify the unit of a pandas.to_datetime call.

Stolen from here:

# assuming `df` is your data frame and `date` is your column of timestamps

df['date'] = pandas.to_datetime(df['date'], unit='s')

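This works with integer dtypes, since the values are simply counts of the given unit since the epoch. The column in the question holds milliseconds rather than seconds, so pass unit='ms' in that case.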
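
For the millisecond case in the question, a minimal sketch might look like the following. The sample values and the variable names `converted` and `via_timedelta` are made up for illustration; the second conversion mirrors the "add to 1970-01-01" route the question describes and should give the same result.

```python
import pandas as pd

# Hypothetical sample data: integer milliseconds since 1970-01-01 (UTC).
df = pd.DataFrame({"date": [0, 1_000, 1_600_000_000_000]})

# Direct conversion: tell pandas the integers are milliseconds.
converted = pd.to_datetime(df["date"], unit="ms")

# Route described in the question: epoch timestamp plus a timedelta series.
via_timedelta = pd.Timestamp("1970-01-01") + pd.to_timedelta(df["date"], unit="ms")

print(converted)
```

Both routes avoid going through strings, so there is no need to build a separate series of 1970-01-01 values by hand.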

tdy
R Hill
0

I have these int columns:

import pandas as pd
import numpy as np
dplyr_1.dtypes
year             int64
 dplyr           int64
  data.table     int64
   pandas        int64
 apache-spark    int64
dtype: object

Convert the int column to string:

dplyr_1.year = dplyr_1.year.astype(str)
dplyr_1.dtypes
year             object
 dplyr            int64
  data.table      int64
   pandas         int64
 apache-spark     int64
dtype: object

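Make sure to convert the column to str first. Otherwise pd.to_datetime treats each integer as a count of nanoseconds since the epoch, and the output will be Timestamp('1970-01-01 00:00:00.000002010') instead of a date in 2010.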

dplyr_1.year = pd.to_datetime(dplyr_1.year)
dplyr_1.year[0]

Timestamp('2010-01-01 00:00:00')

So this is the Timestamp datatype. If you want to check all dtypes and the output column:

dplyr_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   year           12 non-null     datetime64[ns]
 1    dplyr         12 non-null     int64         
 2     data.table   12 non-null     int64         
 3      pandas      12 non-null     int64         
 4    apache-spark  12 non-null     int64         
dtypes: datetime64[ns](1), int64(4)
memory usage: 608.0 bytes

dplyr_1.year
0    2010-01-01
1    2011-01-01
2    2012-01-01
3    2013-01-01
4    2014-01-01
5    2015-01-01
6    2016-01-01
7    2017-01-01
8    2018-01-01
9    2019-01-01
10   2020-01-01
11   2021-01-01
Name: year, dtype: datetime64[ns]
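
To see the difference side by side, here is a small sketch with a made-up `years` series standing in for the `year` column above. Passing format="%Y" is my addition, not part of the steps above; it just makes the parsing of the year strings explicit.

```python
import pandas as pd

# Hypothetical stand-in for the integer `year` column.
years = pd.Series([2010, 2011, 2012])

# Without the str conversion, the ints are read as nanoseconds since the epoch.
naive = pd.to_datetime(years)

# With the str conversion (and an explicit format), they are read as years.
parsed = pd.to_datetime(years.astype(str), format="%Y")

print(naive.iloc[0])
print(parsed.iloc[0])
```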
rubengavidia0x