Saturday, December 16, 2017

could not broadcast input array from shape (20,310,310) into shape (20)

Leave a Comment

I'm trying to detect lung cancer nodules using DICOM files. The main steps in cancer detection included following steps.

1) Preprocessing   * Converting the pixel values to Hounsfield Units (HU)   * Resampling to an isomorphic resolution to remove variance in scanner resolution   *Lung segmentation 2) Training the data set using preprocessed images in Tensorflow CNN 3) Testing and validation 

I followed few online tutorials to do this.

I need to combine the given solutions in

1) https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial
2) https://www.kaggle.com/sentdex/first-pass-through-data-w-3d-convnet.

I could implement the example in link two. But since it is lack ok lung segmentation and few other preprocessing steps I need to combine the steps in link one with link two. But I'm getting number of errors while doing it. Since I'm new to python can someone please help me in solving it.

There are 20 patient folders and each patient folder has number of slices, which are dicom files.

For the process_data method , slices_path of each patient and patient number was sent.

def process_data(slices,patient,labels_df,img_px_size,hm_slices):  try:     label=labels_df.get_value(patient,'cancer')      patient_pixels = get_pixels_hu(slices)      segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])      new_slices=[]      segmented_lung = segment_lung_mask(segmented_lungs2, False)     segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)     segmented_lungs=segmented_lungs_fill-segmented_lung       #This method returns smallest integer not less than x.     chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)       for slice_chunk in chunks(segmented_lungs,chunk_sizes):              slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []              #print (slice_chunk)             new_slices.append(slice_chunk)      print(len(segmented_lungs), len(new_slices))      if len(new_slices)==HM_SLICES-1:             new_slices.append(new_slices[-1])      if len(new_slices)==HM_SLICES-2:             new_slices.append(new_slices[-1])             new_slices.append(new_slices[-1])       if len(new_slices)==HM_SLICES+2:             new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))             del new_slices[HM_SLICES]             new_slices[HM_SLICES-1]=new_val      if len(new_slices)==HM_SLICES+1:             new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))             del new_slices[HM_SLICES]             new_slices[HM_SLICES-1]=new_val      print('LENGTH ',len(segmented_lungs), len(new_slices))     except Exception as e:     # again, some patients are not labeled, but JIC we still want the error if something     # else is wrong with our code     print(str(e))   #print(len(new_slices))     if label==1: label=np.array([0,1]) elif label==0: label=np.array([1,0]) return np.array(new_slices),label 

Main method

    # Some constants      #data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'     #patients = os.listdir(data_dir)     #labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)     #patients.sort()     #print (labels_df.head())      much_data=[]     much_data2=[]     for num,patient in enumerate(patients):         if num%100==0:             print (num)          try:              slices = load_scan(data_dir + patients[num])             img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)             much_data.append([img_data,label])             #much_data2.append([processed,label])         except:             print ('This is unlabeled data')      np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)     #np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2) 

The preprocessing part works fine but when I'm trying to enter the final out put to a Convolutional NN and train the data set , Following is the error I'm receiving including some of the comments that I had put

    0     shape hu     (113, 512, 512)     Resize factor     [ 2.49557522  0.6015625   0.6015625 ]     shape     (282, 308, 308)     chunk size      15     282 19     LENGTH  282 20     Tensor("Placeholder:0", dtype=float32)     ..........1.........     ..........2.........     ..........3.........     ..........4.........     WARNING:tensorflow:From C:\Research\Python_installation\lib\site-packages\tensorflow\python\util\tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.     Instructions for updating:     Use `tf.global_variables_initializer` instead.     ..........5.........     ..........6.........     Epoch 1 completed out of 20 loss: 0     ..........7.........     Traceback (most recent call last):     File "C:\Research\LungCancerDetaction\sendbox2.py", line 436, in <module>     train_neural_network(x)     File "C:\Research\LungCancerDetaction\sendbox2.py", line 424, in train_neural_network     print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))     File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 606, in eval     return _eval_using_default_session(self, feed_dict, self.graph, session)     File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 3928, in _eval_using_default_session     return session.run(tensors, feed_dict)     File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 789, in run     run_metadata_ptr)     File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 968, in _run     np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)     File "C:\Research\Python_installation\lib\site-packages\numpy\core\numeric.py", line 531, in asarray     return array(a, dtype, copy=False, order=order)     ValueError: could not broadcast input array from shape (20,310,310) into shape (20) 

I think it is the issue with the 'segmented_lungs=segmented_lungs_fill-segmented_lung'

In the working example,

segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in patient_pixels] 

Please help me in solving this. I'm unable to proceed since some time. If anything is not clear please let me know.

Following is the whole code that had tried.

    import numpy as np # linear algebra     import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)     import dicom     import os     import scipy.ndimage     import matplotlib.pyplot as plt     import cv2     import math     import tensorflow as tf       from skimage import measure, morphology     from mpl_toolkits.mplot3d.art3d import Poly3DCollection       # Some constants      data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'     patients = os.listdir(data_dir)     labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)     patients.sort()     print (labels_df.head())      #Image pixel array watching      for patient in patients[:10]:         #label is to get the label  of the patient. This is what done in the .get_value method.         label=labels_df.get_value(patient,'cancer')         path=data_dir+patient         slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]         #You have dicom files and they have attributes.          slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))         print (len(slices),slices[0].pixel_array.shape)      #If u need to see many slices and resize the large pixelated 2D images into 150*150 pixelated images      IMG_PX_SIZE=50     HM_SLICES=20      for patient in patients[:1]:         #label is to get the label of the patient. This is what done in the .get_value method.         label=labels_df.get_value(patient,'cancer')         path=data_dir+patient         slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]         #You have dicom files and they have attributes.          slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))         #This shows the pixel arrayed image related to the second slice of each patient          #subplot         fig=plt.figure()         for num,each_slice in enumerate(slices[:16]):             print (num)             y=fig.add_subplot(4,4,num+1)             #down sizing everything. Resize the imag size as their pixel values are 512*512             new_image=cv2.resize(np.array(each_slice.pixel_array),(IMG_PX_SIZE,IMG_PX_SIZE))             y.imshow(new_image)         plt.show()      print (len(patients))      ###################################################################################      def get_pixels_hu(slices):         image = np.array([s.pixel_array for s in slices])         # Convert to int16 (from sometimes int16),          # should be possible as values should always be low enough (<32k)         image = image.astype(np.int16)          # Set outside-of-scan pixels to 0         # The intercept is usually -1024, so air is approximately 0         image[image == -2000] = 0          # Convert to Hounsfield units (HU)         for slice_number in range(len(slices)):              intercept = slices[slice_number].RescaleIntercept             slope = slices[slice_number].RescaleSlope              if slope != 1:                 image[slice_number] = slope * image[slice_number].astype(np.float64)                 image[slice_number] = image[slice_number].astype(np.int16)              image[slice_number] += np.int16(intercept)          return np.array(image, dtype=np.int16)       #The next problem is each patient is got different number of slices . This is a performance issue.     # Take the slices and put that into a list of slices and chunk that list of slices into fixed numer of     #chunk of slices and averaging those chunks.      #yield is like 'return'. It returns a generator      def chunks(l,n):         for i in range(0,len(l),n):             #print ('Inside yield')             #print (i)             yield l[i:i+n]      def mean(l):         return sum(l)/len(l)       def largest_label_volume(im, bg=-1):         vals, counts = np.unique(im, return_counts=True)          counts = counts[vals != bg]         vals = vals[vals != bg]          if len(counts) > 0:             return vals[np.argmax(counts)]         else:             return None       def segment_lung_mask(image, fill_lung_structures=True):          # not actually binary, but 1 and 2.          # 0 is treated as background, which we do not want         binary_image = np.array(image > -320, dtype=np.int8)+1         labels = measure.label(binary_image)           # Pick the pixel in the very corner to determine which label is air.         #   Improvement: Pick multiple background labels from around the patient         #   More resistant to "trays" on which the patient lays cutting the air          #   around the person in half         background_label = labels[0,0,0]         #Fill the air around the person         binary_image[background_label == labels] = 2          # Method of filling the lung structures (that is superior to something like          # morphological closing)          if fill_lung_structures:             # For every slice we determine the largest solid structure              for i, axial_slice in enumerate(binary_image):                 axial_slice = axial_slice - 1                 labeling = measure.label(axial_slice)                 l_max = largest_label_volume(labeling, bg=0)                  if l_max is not None: #This slice contains some lung                     binary_image[i][labeling != l_max] = 1          binary_image -= 1 #Make the image actual binary         binary_image = 1-binary_image # Invert it, lungs are now 1         # Remove other air pockets insided body         labels = measure.label(binary_image, background=0)         l_max = largest_label_volume(labels, bg=0)          if l_max is not None: # There are air pockets             binary_image[labels != l_max] = 0          return binary_image       #Loading the files     #Load the scans in given folder path     def load_scan(path):         slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]         slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))         try:             slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])         except:             slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)          for s in slices:             s.SliceThickness = slice_thickness          return slices        def resample(image, scan, new_spacing=[1,1,1]):         # Determine current pixel spacing          spacing = np.array([scan[0].SliceThickness] + scan[0].PixelSpacing, dtype=np.float32)           resize_factor = spacing / new_spacing         new_real_shape = image.shape * resize_factor         new_shape = np.round(new_real_shape)         real_resize_factor = new_shape / image.shape         new_spacing = spacing / real_resize_factor          print ('Resize factor')         print (real_resize_factor)          image = scipy.ndimage.interpolation.zoom(image, real_resize_factor, mode='nearest')         print ('shape')         print (image.shape)          return image, new_spacing      '''def chunks(l,n):         for i in range(0,len(l),n):             #print ('Inside yield')             #print (i)             yield l[i:i+n]      def mean(l):         return sum(l)/len(l)'''      #processing data     def process_data(slices,patient,labels_df,img_px_size,hm_slices):          #for patient in patients[:10]:                 #label is to get the label of the patient. This is what done in the .get_value method.         try:             label=labels_df.get_value(patient,'cancer')             print ('label process data')             print (label)             #path=data_dir+patient             #slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]             #You have dicom files and they have attributes.              slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))             #This shows the pixel arrayed image related to the second slice of each patient             patient_pixels = get_pixels_hu(slices)             print ('shape hu')             print (patient_pixels.shape)             segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])             #print ('Pix shape')             #print (segmented_lungs2.shape)              #segmented_lungs=np.array(segmented_lungs2).tolist()              new_slices=[]              segmented_lung = segment_lung_mask(segmented_lungs2, False)             segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)             segmented_lungs=segmented_lungs_fill-segmented_lung               #print ('length of segmented lungs')             #print (len(segmented_lungs))             #print ('Shape of segmented lungs......................................')             #print (segmented_lungs.shape)             #print ('hiiii')             #segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in segmented_lungs3]             #print ('bye')             #print ('length of slices')             #print (len(slices))             #print ('shape of slices')             #print (slices.shape)               #print (each_slice.pixel_array)              #This method returns smallest integer not less than x.             chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)              print ('chunk size ')             print (chunk_sizes)              for slice_chunk in chunks(segmented_lungs,chunk_sizes):                      slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []                      #print (slice_chunk)                     new_slices.append(slice_chunk)              print(len(segmented_lungs), len(new_slices))              if len(new_slices)==HM_SLICES-1:                     new_slices.append(new_slices[-1])              if len(new_slices)==HM_SLICES-2:                     new_slices.append(new_slices[-1])                     new_slices.append(new_slices[-1])              if len(new_slices)==HM_SLICES-3:                     new_slices.append(new_slices[-1])                     new_slices.append(new_slices[-1])                     new_slices.append(new_slices[-1])              if len(new_slices)==HM_SLICES+2:                     new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))                     del new_slices[HM_SLICES]                     new_slices[HM_SLICES-1]=new_val              if len(new_slices)==HM_SLICES+1:                     new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))                     del new_slices[HM_SLICES]                     new_slices[HM_SLICES-1]=new_val              if len(new_slices)==HM_SLICES+3:                     new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))                     del new_slices[HM_SLICES]                     new_slices[HM_SLICES-1]=new_val              print('LENGTH ',len(segmented_lungs), len(new_slices))             except Exception as e:             # again, some patients are not labeled, but JIC we still want the error if something             # else is wrong with our code             print(str(e))           #print(len(new_slices))             if label==1: label=np.array([0,1])         elif label==0: label=np.array([1,0])         return np.array(new_slices),label        # Some constants      #data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'     #patients = os.listdir(data_dir)     #labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)     #patients.sort()     #print (labels_df.head())      much_data=[]     much_data2=[]     for num,patient in enumerate(patients):         if num%100==0:             print (num)          try:              slices = load_scan(data_dir + patients[num])             img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)             much_data.append([img_data,label])             #much_data2.append([processed,label])         except:             print ('This is unlabeled data')      np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)     #np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2)      IMG_SIZE_PX = 50     SLICE_COUNT = 20      n_classes=2     batch_size=10      x = tf.placeholder('float')     y = tf.placeholder('float')      keep_rate = 0.8      def conv3d(x, W):         return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')      def maxpool3d(x):         #                        size of window         movement of window as you slide about         return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME')      def convolutional_neural_network(x):         #                # 5 x 5 x 5 patches, 1 channel, 32 features to compute.         weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),                    #       5 x 5 x 5 patches, 32 channels, 64 features to compute.                    'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),                    #                                  64 features                    'W_fc':tf.Variable(tf.random_normal([54080,1024])),                    'out':tf.Variable(tf.random_normal([1024, n_classes]))}          biases = {'b_conv1':tf.Variable(tf.random_normal([32])),                    'b_conv2':tf.Variable(tf.random_normal([64])),                    'b_fc':tf.Variable(tf.random_normal([1024])),                    'out':tf.Variable(tf.random_normal([n_classes]))}          #                            image X      image Y        image Z         x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1])          conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])         conv1 = maxpool3d(conv1)           conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])         conv2 = maxpool3d(conv2)          fc = tf.reshape(conv2,[-1, 54080])         fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])         fc = tf.nn.dropout(fc, keep_rate)          output = tf.matmul(fc, weights['out'])+biases['out']          return output       much_data = np.load('muchdata-50-50-20.npy')     # If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this     train_data = much_data[:-4]     validation_data = much_data[-4:]       def train_neural_network(x):         print ('..........1.........')         prediction = convolutional_neural_network(x)         print ('..........2.........')         #cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )         cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))         print ('..........3.........')         optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)         print ('..........4.........')         hm_epochs = 20         with tf.Session() as sess:             sess.run(tf.initialize_all_variables())              successful_runs = 0             total_runs = 0             print ('..........5.........')             for epoch in range(hm_epochs):                 epoch_loss = 0                 for data in train_data:                     total_runs += 1                     try:                         X = data[0]                         Y = data[1]                         _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})                         epoch_loss += c                         successful_runs += 1                     except Exception as e:                         # I am passing for the sake of notebook space, but we are getting 1 shaping issue from one                          # input tensor. Not sure why, will have to look into it. Guessing it's                         # one of the depths that doesn't come to 20.                         pass                         #print(str(e))                 print ('..........6.........')                 print('Epoch', epoch+1, 'completed out of',hm_epochs,'loss:',epoch_loss)                 print ('..........7.........')                 correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))                 accuracy = tf.reduce_mean(tf.cast(correct, 'float'))                  print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))              print('Done. Finishing accuracy:')             print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))              print('fitment percent:',successful_runs/total_runs)       print (x)       # Run this locally:     train_neural_network(x) 

P.S : resample() , segment_lung_mask() methods can be found from link 1.

2 Answers

Answers 1

For training you have

for data in train_data:     total_runs += 1     try:         X = data[0]         Y = data[1]         _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y}) 

So x and y are, respectively, the first two elements of a single row of train_data.

However, when calculating the accuracy you have

print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]})) 

So x is the first element of all rows of validation_data, which gives it dimensions of (20,310,310), which can't be broadcast to a placeholder of dimension (20). Ditto for y. (Broadcasting means that if you gave it a tensor of dimensions (20, 310) it would know to take each of the 310 columns and feed it to the placeholder separately. It can't figure out what to do with a tensor of (20, 310, 310).)

Incidentally, when you declare your placeholders it's a good idea to specify their dimensions, using None for the dimension depending on the number of separate examples. This way the program can warn you when dimensions don't match up.

Answers 2

The error message seems to indicate that the placeholder tensors x and y have not been defined correctly. They should have the same shape as the input values X = data[0] and Y = data[1], such as

x = tf.placeholder(shape=[20,310,310], dtype=tf.float32) # if y is a scalar: y = tf.placeholder(shape=[], dtype=tf.float32) 
If You Enjoyed This, Take 5 Seconds To Share It

0 comments:

Post a Comment