The Problem:

Electric Vehicle Detection

The training set contains two months of smart meter power readings from 1590 houses, taken at half-hour intervals. Some of the homes have electric vehicles and some do not. The file "EV_train_labels.csv" indicates the intervals during which an electric vehicle was charging (1 means a vehicle was charging at some point during the interval; 0 means no vehicle was charging at any point during the interval). Can you determine:

A. Which residences have electric vehicles?
B. When the electric vehicles were charging?
C. Any other interesting aspects of the dataset?

A solution to part B might consist of a predicted probability that an electric car was charging, for each house and time interval in the test set.

Problem source and dataset: https://www.gridcure.com/contact/
Code: https://github.com/mattgorb/ElectricVehicleDetection

Machine Learning techniques used:

  • Ensembling
  • Binary classification using a simple dense neural network (problem A)
  • U-Net convolutional neural network, typically used for image segmentation (problem B)
  • Feature engineering: added mean, std, and median columns for binary classification
  • Custom metric: binary accuracy with an adjustable threshold, tested between 0.4 and 0.6
  • Downsampling: trained the binary model with equal (or near-equal) proportions of 1's and 0's
  • Stratified K-Fold cross-validation: with a k=10 split, counted how many of the k fold predictions exceeded the threshold to generate the final guess
  • Feature scaling: 0-1 scaling for each time-series row; standard scaling (0 mean, unit variance) for the mean, std, and median columns
  • Confusion matrices
  • Keras callbacks: ReduceLROnPlateau, ModelCheckpoint (saving only the best weights), EarlyStopping

A high-level overview of my final approach

Classification convention: 1 = house has an electric vehicle, 0 = house does not have an electric vehicle.

  • Used stratified K-Fold binary classification to train on a resampled dataset containing more 1's than 0's. The validation set kept the actual dataset's proportions (30.5% of houses with an EV). Result: the model caught a high percentage of the EV houses in the test set, but it also flagged many houses that in reality have no electric vehicle.
  • Trained a second K-Fold binary model on data skewed the other way, with more 0's than 1's, again validating at the actual proportions (30.5% of houses with an EV). Result: the model identified a high percentage of the non-EV houses correctly, but it missed many EV houses.
  • Meta-ensembling (sketched below): if both models guessed 1, output 1; if both guessed 0, output 0. If the step 2 model guessed 1 and the step 1 model guessed 0, output 0. If the step 1 model guessed 1 and the step 2 model guessed 0, take the average of all 20 stratified K-Fold predictions and output 1 if the average exceeds 0.2, else 0.
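
The stacking rule is short enough to state in code. Below is a minimal sketch, assuming pred1 and pred2 are the binary guesses from the step 1 and step 2 models and kfold_preds holds the 20 stratified K-Fold probabilities for a house (the names here are mine, not the repository's):

In [ ]:
import numpy as np

def stack_predictions(pred1, pred2, kfold_preds, avg_threshold=0.2):
    if pred1 == pred2:
        #both models agree: keep the shared guess
        return pred1
    if pred1 == 0 and pred2 == 1:
        #only the 0-heavy step 2 model says 1: output 0
        return 0
    #only the 1-heavy step 1 model says 1: fall back to the fold average
    return 1 if np.mean(kfold_preds) > avg_threshold else 0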

Test set output for each model and the stacked model:

Model 1 (trained with more 1's than 0's):
Percent correct = 66.1%; percent of 1's correct = 95.1%

             Predicted 0   Predicted 1
  Actual 0       125           109
  Actual 1         5            97

Model 2 (trained with more 0's than 1's):
Percent correct = 78.5%; percent of 0's correct = 81.6%

             Predicted 0   Predicted 1
  Actual 0       191            43
  Actual 1        29            73

Stacked models (average of K-Fold predictions > 0.2 → 1):
Percent correct = 70.8%; percent of 1's correct = 91.1%

             Predicted 0   Predicted 1
  Actual 0       145            89
  Actual 1         9            93

I could have used model 2 alone for a fairly decent overall accuracy, but I chose instead to strip out as many 0's as possible without sacrificing too many 1's, so that the remaining candidates could be sent to the next model.

  • After stacking the models, my idea was to strip out the rows where 1 was predicted and send them to a U-Net. I found that the U-Net had pretty good prediction accuracy when I trained it only on rows that contained 1's. However, when trained on data where 69.5% of rows contained no 1's at all, it did not fare as well. After my pains getting higher accuracy in binary classification, my method was to feed the U-Net roughly 50/50 data (a sketch of assembling such a set follows). It did not fare as well as training only on rows containing 1's, but it was better than training on the 69.5/30.5 split.
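
A minimal sketch of assembling that roughly 50/50 U-Net training set, assuming df_labels is the label frame loaded in the Data section below (the repository's exact sampling code may differ):

In [ ]:
import numpy as np
import pandas as pd

df_labels = pd.read_csv('EV_train_labels.csv', index_col=0)

#IDs of houses whose label rows contain at least one charging interval
rows_with_ev = df_labels.index[df_labels.max(axis=1) == 1]
rows_without = df_labels.index[df_labels.max(axis=1) == 0]

#keep an equal number of no-EV houses, then shuffle the combined IDs
keep_without = np.random.permutation(rows_without)[:len(rows_with_ev)]
unet_ids = np.random.permutation(np.concatenate([rows_with_ev, keep_without]))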

For this page I'm only going to copy relevant lines of code with comments inline.

Data

The code below is represented in the method "common_data_load" in loaddata.py.

In [16]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

#Load data
df_train = pd.read_csv('EV_train.csv', index_col=0)

#Scale each house's time series into the 0-1 range for the neural network
#(fit on the transposed frame so the scaling is per row, then transpose back)
scaler = MinMaxScaler(feature_range=(0, 1))
df_train_scaled = pd.DataFrame(scaler.fit_transform(df_train.values.T).T,
                               index=df_train.index, columns=df_train.columns)

#Labels data
df_labels = pd.read_csv('EV_train_labels.csv', index_col=0)
#Binary label: 1 if the house charged an EV during any interval, else 0
ev_label_all = df_labels.max(axis=1).rename('Has_EV')

#Split records between rows that contain a 1 and rows that don't contain a 1
ev_label_1 = ev_label_all.loc[ev_label_all.values == 1]
ev_label_0 = ev_label_all.loc[ev_label_all.values == 0]
ev_label_all = pd.DataFrame(ev_label_all)

#Merge the binary classification labels with the scaled energy data
df_binary_train = pd.merge(ev_label_all, df_train_scaled, left_index=True, right_index=True)

#Add mean, std, and median feature columns computed from the raw readings
#(the end of each column slice excludes the features appended so far)
df_binary_train_unscaled = pd.merge(df_binary_train, df_train, left_index=True, right_index=True)
df_binary_train_unscaled['mean'] = df_binary_train_unscaled[df_binary_train_unscaled.columns[2:]].values.mean(axis=1)
df_binary_train_unscaled['std'] = df_binary_train_unscaled[df_binary_train_unscaled.columns[2:-1]].values.std(axis=1)
df_binary_train_unscaled['median'] = np.median(df_binary_train_unscaled[df_binary_train_unscaled.columns[2:-2]].values, axis=1)

#Standard-scale the three new feature columns (0 mean, unit variance)
scaler = StandardScaler()
df_train_scaled_meanstd = scaler.fit_transform(df_binary_train_unscaled[df_binary_train_unscaled.columns[-3:]].values)

print('Percent of houses with EV:', ev_label_all['Has_EV'].mean() * 100)
Percent of houses with EV: 30.5031446541

After the common data load, I decided to load equal proportions of houses with EVs and houses without EVs into the training set with the method train_data_load_oversampled_1(proportion=1).

I set the proportion parameter for various tests in an attempt to deal with the skewed data, and realized this would only skew my guesses on the test set in proportion to the proportion variable used in training. Here are a few results in confusion-matrix format:

Proportion = 1:
Percent correct = 77.2%; percent of 1's correct = 76.4%

             Predicted 0   Predicted 1
  Actual 0       156            45
  Actual 1        21            68

Proportion = 1.4:
Percent correct = 77.8%; percent of 1's correct = 66.2%

             Predicted 0   Predicted 1
  Actual 0       169            35
  Actual 1        30            59

Proportion = 0.6:
Percent correct = 73.8%; percent of 1's correct = 85.3%

             Predicted 0   Predicted 1
  Actual 0       172            78
  Actual 1        16            93

Proportion=1.4 is the closest to the real dataset's class balance. As expected, it gets the highest percentage of guesses correct overall, but it does poorly at identifying the houses that have an electric vehicle: the model learned that guessing "no electric vehicle" is right more often.

In [ ]:
from sklearn.utils import shuffle

#grab all the 1-labeled IDs plus (proportion x that many) 0-labeled IDs,
#load the rest of the dataset for those IDs, and shuffle
undersampled_labels = pd.concat([ev_label_1, ev_label_0[:int(len(ev_label_1) * proportion)]])
und = df_binary_train_scaled_meanstd.loc[undersampled_labels.index]
und = shuffle(und)

For the test set, I'm always going to be testing with proportions equal to the actual training set. That is, 30.5% of the set has electric vehicles. Here is some code to achieve that:

In [ ]:
#get the number of 0's and 1's in Ytest (currently equal to 20% of the undersampled portion)
unique, counts = np.unique(Ytest, return_counts=True)
ev_0_prop = float(len(ev_label_0)) / float(len(ev_label_1) + len(ev_label_0))
ev_1_prop = float(len(ev_label_1)) / float(len(ev_label_1) + len(ev_label_0))
#solve for how many extra no-EV rows bring the test set to ~30.5% EV houses
No_EV_to_add = int((int(-ev_1_prop * counts[0]) + int(ev_0_prop * counts[1])) / 0.3)

#the leftover_labels variable holds all IDs with no EV that weren't used for training
self.leftover_labels = shuffle(self.leftover_labels)
newTestDataframe = pd.DataFrame(
    df_binary_train_scaled_meanstd.loc[self.leftover_labels.index[0:No_EV_to_add]].values,
    index=self.leftover_labels.index[0:No_EV_to_add])

Training

K-Fold stratified cross-validation, using sklearn's StratifiedKFold class and a dense Keras neural network with RMSprop optimization. The Keras callbacks are passed in the model.fit() call.

In [ ]:
from sklearn.model_selection import StratifiedKFold
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, TensorBoard, EarlyStopping

kfold = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
for fold, (train, test) in enumerate(kfold.split(X, y)):
    callbacks = [
        ReduceLROnPlateau(monitor='val_binary_accuracy_adjusted_threshold',
                          factor=0.5, patience=3, verbose=0, epsilon=1e-4, mode='max'),
        ModelCheckpoint(monitor='val_binary_accuracy_adjusted_threshold',
                        filepath=str(path) + '/weights_' + str(fold) + '.h5',
                        save_best_only=True, save_weights_only=True, mode='max', verbose=0),
        TensorBoard(log_dir='logs'),
        EarlyStopping(monitor='val_binary_accuracy_adjusted_threshold',
                      min_delta=0, patience=10, verbose=0, mode='auto'),
    ]
    model.fit(X[train], y[train], validation_data=(X[test], y[test]),
              epochs=50, shuffle=True, batch_size=batch_size, verbose=0,
              callbacks=callbacks)
    
Binary model:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
from keras import backend as K

def binary_accuracy_adjusted_threshold(y_true, y_pred):
    threshold = 0.5
    y_pred = K.cast(K.greater(y_pred, threshold), K.floatx())
    return K.mean(K.equal(y_true, y_pred))


def create_model():
    model = Sequential()
    model.add(Dense(128, input_dim=1026, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, kernel_initializer='normal', activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.005),
                  metrics=[binary_accuracy_adjusted_threshold])
    return model
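
The metric above hardcodes threshold=0.5. To test the 0.4-0.6 range mentioned in the techniques list, the threshold can be closed over instead; a minimal sketch (make_threshold_metric is my name, not the repository's):

In [ ]:
def make_threshold_metric(threshold):
    #returns a Keras metric that counts a prediction as positive above `threshold`
    def binary_accuracy_adjusted_threshold(y_true, y_pred):
        y_pred = K.cast(K.greater(y_pred, threshold), K.floatx())
        return K.mean(K.equal(y_true, y_pred))
    return binary_accuracy_adjusted_threshold

#e.g. compile with a 0.4 decision threshold; keeping the inner function name
#means the 'val_binary_accuracy_adjusted_threshold' monitor strings still work
model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.005),
              metrics=[make_threshold_metric(0.4)])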

For the predictions on the test set, I decided to iterate through various KFolds and thresholds. Typically in this scenario, you would just use KFolds=5 and threshold=0.5, but I decided to see if the models performed better at other values.

Since I ran this with n_splits=10, I had 10 weight files, one for each fold. After making ten predictions for each row of the test set, I set the final guess for each row based on how many of those predictions exceeded the threshold (the required count appears as number_right in the code below).

For example, suppose that for one row, 4 of the ten fold predictions were greater than 0.6, 6 were greater than 0.5, and 8 were greater than 0.4. With threshold=0.5 and a required count of 5, the final output would be 1 (6 ≥ 5). With threshold=0.4 and a required count of 9, the output would be 0 (8 < 9); one more prediction above 0.4 would flip the final output to 1.
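
The same example in code, with ten made-up fold probabilities chosen to match those counts:

In [ ]:
import numpy as np

#4 of these exceed 0.6, 6 exceed 0.5, and 8 exceed 0.4
fold_preds = np.array([0.65, 0.70, 0.62, 0.61, 0.55, 0.52, 0.45, 0.41, 0.30, 0.10])

for threshold, number_right in [(0.5, 5), (0.4, 9)]:
    votes = int((fold_preds > threshold).sum())
    print(threshold, number_right, votes, 1 if votes >= number_right else 0)
#prints: 0.5 5 6 1
#        0.4 9 8 0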

Below are a few lines of code that demonstrate this approach.

In [ ]:
import os

#for each weight file in the directory (one per K-Fold split)
for i in os.listdir(path):
    model.load_weights(path + i)
    preds_all = []

    for start in range(0, len(totals_x), batch_size):
        end = min(start + batch_size, len(totals_x))
        Xtestbatch = totals_x.iloc[start:end].values
        preds = model.predict_on_batch(Xtestbatch)[:, 0].tolist()
        #add this batch's predictions to the whole test set's predictions
        preds_all.extend(preds)
    #add a column to the dataframe for this weight file's predictions
    p = pd.DataFrame(preds_all, columns=['Predictions_' + str(i)], index=totals_x.index)
    #merge this model's predictions with those from the other weight files
    totals = pd.merge(totals, p, left_index=True, right_index=True)

#count, per row, how many fold predictions exceeded the threshold value
totals['sum_gt_threshold'] = (totals[totals.columns[1:]] > threshold).sum(axis=1)
#output 1 if at least number_right fold predictions were above the threshold
totals['final_bool'] = np.where(totals['sum_gt_threshold'] >= number_right, '1', '0')
#compare the guess to the actual label, used for the confusion matrix
totals['final_compare'] = pd.to_numeric(totals.final_bool) == pd.to_numeric(totals.Y)

confusion_matrix = pd.crosstab(totals['Y'].values, totals['final_bool'].values,
                               rownames=['Actual EV'], colnames=['Predicted EV'])

After finding the best threshold and fold-count values for each model, I concatenate the predictions into a final pandas table. Rows whose final prediction is "1" get sent to the U-Net for training. On average, about half of these predictions are wrong and the house really should be "0", but I let the U-Net learn that.
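
A sketch of that hand-off, assuming the series have already been cut down to the 1024 input steps the U-Net takes (the variable names follow the cells above; the repository's exact code differs):

In [ ]:
#houses the stacked binary stage flagged as having an EV
ev_candidates = totals.index[totals['final_bool'] == '1']

#Conv1D expects (samples, steps, channels)
X_unet = df_train_scaled.loc[ev_candidates].values[:, :1024, np.newaxis]
y_unet = df_labels.loc[ev_candidates].values[:, :1024, np.newaxis]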

UNet

Here is the code for my UNet, used to predict whether an electric vehicle is charging during each time interval. This architecture is typically used for image segmentation, but I found the problems to be similar. The loss combines binary cross-entropy with dice loss, optimized with RMSprop.

In [ ]:
from keras.losses import binary_crossentropy

def dice_coeff(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    #complement of the dice coefficient
    return 1. - dice_coeff(y_true, y_pred)

def bce_dice_loss(y_true, y_pred):
    loss = binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
    return loss

def get_unet(num_classes=1):
    inputs = Input((1024,1))
    # 1024
    down0b = Conv1D(1, 3, padding='same')(inputs)
    down0b = BatchNormalization()(down0b)
    down0b = Activation('relu')(down0b)
    down0b = Conv1D(1, 3, padding='same')(down0b)
    down0b = BatchNormalization()(down0b)
    down0b = Activation('relu')(down0b)
    down0b_pool = MaxPooling1D(1, strides=1)(down0b)
    # 1024 (pool_size=1 with stride 1 does not downsample)

    down0a = Conv1D(2, 3, padding='same')(down0b_pool)
    down0a = BatchNormalization()(down0a)
    down0a = Activation('relu')(down0a)
    down0a = Conv1D(2, 3, padding='same')(down0a)
    down0a = BatchNormalization()(down0a)
    down0a = Activation('relu')(down0a)
    down0a_pool = MaxPooling1D(1, strides=2)(down0a)
    # 512

    down0 = Conv1D(4, 3, padding='same')(down0a_pool)
    down0 = BatchNormalization()(down0)
    down0 = Activation('relu')(down0)
    down0 = Conv1D(4, 3, padding='same')(down0)
    down0 = BatchNormalization()(down0)
    down0 = Activation('relu')(down0)
    down0_pool = MaxPooling1D(1, strides=2)(down0)
    # 256

    down1 = Conv1D(8, 3, padding='same')(down0_pool)
    down1 = BatchNormalization()(down1)
    down1 = Activation('relu')(down1)
    down1 = Conv1D(8, 3, padding='same')(down1)
    down1 = BatchNormalization()(down1)
    down1 = Activation('relu')(down1)
    down1_pool = MaxPooling1D(1, strides=2)(down1)
    # 128

    down2 = Conv1D(16, 3, padding='same')(down1_pool)
    down2 = BatchNormalization()(down2)
    down2 = Activation('relu')(down2)
    down2 = Conv1D(16, 3, padding='same')(down2)
    down2 = BatchNormalization()(down2)
    down2 = Activation('relu')(down2)
    down2_pool = MaxPooling1D(1, strides=2)(down2)
    # 64

    down3 = Conv1D(32, 3, padding='same')(down2_pool)
    down3 = BatchNormalization()(down3)
    down3 = Activation('relu')(down3)
    down3 = Conv1D(32, 3, padding='same')(down3)
    down3 = BatchNormalization()(down3)
    down3 = Activation('relu')(down3)
    down3_pool = MaxPooling1D(1, strides=2)(down3)
    # 32

    down4 = Conv1D(64, 3, padding='same')(down3_pool)
    down4 = BatchNormalization()(down4)
    down4 = Activation('relu')(down4)
    down4 = Conv1D(64, 3, padding='same')(down4)
    down4 = BatchNormalization()(down4)
    down4 = Activation('relu')(down4)
    down4_pool = MaxPooling1D(1, strides=2)(down4)
    # 16

    center = Conv1D(128, 3, padding='same')(down4_pool)
    center = BatchNormalization()(center)
    center = Activation('relu')(center)
    center = Conv1D(128, 3, padding='same')(center)
    center = BatchNormalization()(center)
    center = Activation('relu')(center)
    # center

    up4 = UpSampling1D(2)(center)
    up4 = concatenate([down4, up4], axis=2)
    up4 = Conv1D(64, 3, padding='same')(up4)
    up4 = BatchNormalization()(up4)
    up4 = Activation('relu')(up4)
    up4 = Conv1D(64, 3, padding='same')(up4)
    up4 = BatchNormalization()(up4)
    up4 = Activation('relu')(up4)
    up4 = Conv1D(64, 3, padding='same')(up4)
    up4 = BatchNormalization()(up4)
    up4 = Activation('relu')(up4)
    # 32

    up3 = UpSampling1D(2)(up4)
    up3 = concatenate([down3, up3], axis=2)
    up3 = Conv1D(32, 3, padding='same')(up3)
    up3 = BatchNormalization()(up3)
    up3 = Activation('relu')(up3)
    up3 = Conv1D(32, 3, padding='same')(up3)
    up3 = BatchNormalization()(up3)
    up3 = Activation('relu')(up3)
    up3 = Conv1D(32, 3, padding='same')(up3)
    up3 = BatchNormalization()(up3)
    up3 = Activation('relu')(up3)
    # 64

    up2 = UpSampling1D(2)(up3)
    up2 =concatenate([down2, up2], axis=2)
    up2 = Conv1D(16, 3, padding='same')(up2)
    up2 = BatchNormalization()(up2)
    up2 = Activation('relu')(up2)
    up2 = Conv1D(16, 3, padding='same')(up2)
    up2 = BatchNormalization()(up2)
    up2 = Activation('relu')(up2)
    up2 = Conv1D(16, 3, padding='same')(up2)
    up2 = BatchNormalization()(up2)
    up2 = Activation('relu')(up2)
    # 128

    up1 = UpSampling1D(2)(up2)
    up1 = concatenate([down1, up1], axis=2)
    up1 = Conv1D(8, 3, padding='same')(up1)
    up1 = BatchNormalization()(up1)
    up1 = Activation('relu')(up1)
    up1 = Conv1D(8, 3, padding='same')(up1)
    up1 = BatchNormalization()(up1)
    up1 = Activation('relu')(up1)
    up1 = Conv1D(8, 3, padding='same')(up1)
    up1 = BatchNormalization()(up1)
    up1 = Activation('relu')(up1)
    # 256

    up0 = UpSampling1D(2)(up1)
    up0 = concatenate([down0, up0], axis=2)
    up0 = Conv1D(4, 3, padding='same')(up0)
    up0 = BatchNormalization()(up0)
    up0 = Activation('relu')(up0)
    up0 = Conv1D(4, 3, padding='same')(up0)
    up0 = BatchNormalization()(up0)
    up0 = Activation('relu')(up0)
    up0 = Conv1D(4, 3, padding='same')(up0)
    up0 = BatchNormalization()(up0)
    up0 = Activation('relu')(up0)
    # 512

    up0a = UpSampling1D(2)(up0)
    up0a = concatenate([down0a, up0a], axis=2)
    up0a = Conv1D(2, 3, padding='same')(up0a)
    up0a = BatchNormalization()(up0a)
    up0a = Activation('relu')(up0a)
    up0a = Conv1D(2, 3, padding='same')(up0a)
    up0a = BatchNormalization()(up0a)
    up0a = Activation('relu')(up0a)
    up0a = Conv1D(2, 3, padding='same')(up0a)
    up0a = BatchNormalization()(up0a)
    up0a = Activation('relu')(up0a)
    # 1024

    up0b = UpSampling1D(1)(up0a)  # factor 1: the sequence is already 1024 long
    up0b = concatenate([down0b, up0b], axis=2)
    up0b = Conv1D(1, 3, padding='same')(up0b)
    up0b = BatchNormalization()(up0b)
    up0b = Activation('relu')(up0b)
    up0b = Conv1D(1, 3, padding='same')(up0b)
    up0b = BatchNormalization()(up0b)
    up0b = Activation('relu')(up0b)
    up0b = Conv1D(1, 3, padding='same')(up0b)
    up0b = BatchNormalization()(up0b)
    up0b = Activation('relu')(up0b)
    # 1024

    classify = Conv1D(num_classes, 1, activation='sigmoid')(up0b)

    model = Model(inputs=inputs, outputs=classify)
    
    model.compile(optimizer=RMSprop(lr=0.01), loss=bce_dice_loss, metrics=[dice_coeff])
    return model

Training the UNet usually gives me around a 70% dice coefficient, which isn't horrible for trying to predict 2880 columns of data. After training the UNet, I send the test data through all three models, whose parameters have been tuned on the training set. If the binary stage predicts "0", I fill that house's 2880 columns of the final result with 0's. If it predicts "1", I send the row through the UNet and predict each of the 2880 columns (sketched below).
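
A minimal sketch of that final pass. Here df_test_scaled, binary_guess, and unet are assumed names, and for brevity only the model's n_steps-long input window is filled in:

In [ ]:
import numpy as np
import pandas as pd

#start with all-zero charging probabilities for every house and interval
n_steps = 1024
final = pd.DataFrame(0.0, index=df_test_scaled.index,
                     columns=df_test_scaled.columns[:n_steps])

#only the houses the binary stage flagged as 1 get U-Net probabilities
ev_rows = df_test_scaled.index[binary_guess == 1]
X = df_test_scaled.loc[ev_rows].values[:, :n_steps, np.newaxis]
final.loc[ev_rows] = unet.predict(X)[:, :, 0]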

In summary, the limited dataset doesn't give me a ton of confidence in the performance of this model. I have had success in the past with more data, but I did what I could here. Below are a few improvement ideas. On top of cleaning up the code, I could have done several things differently, including using a different modeling approach. Still, this was a great learning exercise on my first skewed dataset. And a lot of fun!

Improvement Ideas

  • Tuning parameters on validation set and then training on full data for test set with tuned parameters
  • Stratified KFold on UNet model
  • Recurrent neural network for time series analysis, or another deep learning or machine learning model
  • Obtaining more data
  • Study per-interval charging frequencies to better understand when charging is most likely to occur (a quick sketch follows)
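
For that last idea, a quick sketch, assuming the label columns are in chronological order so the two months can be folded onto 48 half-hour slots per day:

In [ ]:
import numpy as np
import pandas as pd

df_labels = pd.read_csv('EV_train_labels.csv', index_col=0)

#fraction of EV-owning houses charging during each half-hour interval
ev_rows = df_labels.loc[df_labels.max(axis=1) == 1]
charging_freq = ev_rows.mean(axis=0)

#fold the two months onto a single day: 48 half-hour slots
slot = np.arange(len(charging_freq)) % 48
per_slot = charging_freq.groupby(slot).mean()
print(per_slot.sort_values(ascending=False).head())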