Stand-alone utility functions

chiptools.tools.drop_emptier_dups(df)

Remove rows with duplicate index values, keeping for each index value the first row with the most non-null columns. Takes a pandas DataFrame and returns the sorted DataFrame.

Parameters: df – the pandas DataFrame to remove duplicated rows from
Returns: df; the sorted DataFrame with duplicated rows removed
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(index=[pd.to_datetime('2019-04-02 11:00:00')]*3, columns=['A','B','C'], data=[[1, 2, np.nan], [3, np.nan, 4], [np.nan, 5, np.nan]])
>>> print(df)
                       A    B    C
2019-04-02 11:00:00  1.0  2.0  NaN
2019-04-02 11:00:00  3.0  NaN  4.0
2019-04-02 11:00:00  NaN  5.0  NaN
>>> drop_emptier_dups(df)
                       A    B   C
2019-04-02 11:00:00  1.0  2.0 NaN
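The rule described above (for each duplicated index value, keep the first row with the most non-null columns) can be sketched with plain pandas. This is not the chiptools source: the name drop_emptier_dups_sketch and the helper columns label, filled and pos are hypothetical, and sorting the result by index is an assumption based on the "sorted DataFrame" wording.

```python
import numpy as np
import pandas as pd


def drop_emptier_dups_sketch(df):
    """Hypothetical illustration only, not the chiptools implementation."""
    # One helper row per input row: its index label, how many cells
    # are non-null, and its original position in df.
    helper = pd.DataFrame({
        "label": df.index,
        "filled": df.notna().sum(axis=1).to_numpy(),
        "pos": np.arange(len(df)),
    })
    # Within each label, put the fullest row first; ties fall back to
    # the original order, so the earliest of the fullest rows wins.
    helper = helper.sort_values(
        ["label", "filled", "pos"], ascending=[True, False, True]
    )
    # Keep one position per label, then select those rows from df.
    keep = helper.drop_duplicates("label", keep="first")["pos"].to_numpy()
    return df.iloc[keep]
```

Applied to the df from the example above, the sketch keeps the row with A=1.0 and B=2.0: it ties with the second row at two non-null columns and comes first.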
chiptools.tools.single_val_cols_to_dict(df, single_value_dict=None, dict_name=None, count_na=True, inplace=True)

Remove columns that hold a single value across all rows, storing those values in a dictionary keyed by column name. Takes a pandas DataFrame and returns the DataFrame with those columns removed, together with the dictionary.

Intended to preserve the static values from observations in a dictionary and reduce complexity of a DataFrame

Parameters:
  • df – the pandas DataFrame to remove single value columns from
  • single_value_dict – the dictionary to add removed column values to
  • dict_name – if set, store dict_name in the index.name attribute of the df and save the old index.name to the dict
  • count_na – if True, NA counts as a value of its own, so a column holding one value plus NAs is treated as having two values
  • inplace – if True, apply the changes to the df DataFrame itself
Returns:

df, single_value_dict; the DataFrame with single value columns removed and the dict

>>> df = pd.DataFrame(index=pd.date_range('2019-04-02 11:00:00', periods=3, freq='1H'), columns=['col1','col2','col3'], data=[[1.0, 2.0, 3.0], [1.0, 4.0, 3.0], [1.0, 6.0, 3.0]])
>>> print(df)
                     col1  col2  col3
2019-04-02 11:00:00   1.0   2.0   3.0
2019-04-02 12:00:00   1.0   4.0   3.0
2019-04-02 13:00:00   1.0   6.0   3.0
>>> single_val_cols_to_dict(df)
                     col2
2019-04-02 11:00:00   2.0
2019-04-02 12:00:00   4.0
2019-04-02 13:00:00   6.0
{'col1': 1.0, 'col3': 3.0}
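The column-extraction step and the effect of count_na can be sketched as follows. This is a simplified illustration under stated assumptions, not the chiptools source: the name single_val_cols_sketch is hypothetical, the sketch always returns a new DataFrame, and it leaves out the single_value_dict, dict_name and inplace parameters.

```python
import pandas as pd


def single_val_cols_sketch(df, count_na=True):
    """Hypothetical illustration only, not the chiptools implementation."""
    # nunique(dropna=False) counts NA as a value of its own, which is
    # the assumed meaning of count_na=True; dropna=True ignores NAs.
    n_values = df.nunique(dropna=not count_na)
    static_cols = n_values[n_values == 1].index

    single_values = {}
    for col in static_cols:
        non_null = df[col].dropna()
        # An all-NA column has no non-null value to record; store None.
        single_values[col] = non_null.iloc[0] if len(non_null) else None

    # Return a reduced copy plus the dictionary of static values.
    return df.drop(columns=static_cols), single_values


# A column holding one value plus an NA: kept when NAs count as a
# value, moved to the dictionary when they do not.
gappy = pd.DataFrame({"x": [1.0, float("nan"), 1.0], "y": [1.0, 2.0, 3.0]})
kept, _ = single_val_cols_sketch(gappy, count_na=True)
reduced, static = single_val_cols_sketch(gappy, count_na=False)
```

In this sketch, kept still contains both x and y, while reduced contains only y and static records the value of x.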