Real-Time Identification of Animals found in Domestic Areas of Europe

I reported here about my experiences at the ICMV (International Conference on Machine Vision) 2019 conference in Amsterdam. Now I would like to present our paper "Real-Time Identification of Animals found in Domestic Areas of Europe". I presented the work in oral session B-1, "Neural Network and Image Processing Applications".

Abstract: This paper presents a method for identifying 34 animal classes, corresponding to the most common animals found around European households, using four kinds of Convolutional Neural Networks (CNNs), namely VGG-19, InceptionV3, ResNet-50 and MobileNetV2. We also developed a system that can classify all of these 34 animal classes both from images and in real time from videos or a webcam. Furthermore, our system is able to automatically generate two new datasets: one containing textual information (i.e. the name of the animal class, the date, and the time interval in which the animal was in the frame), and one containing images of the animal classes present and identified in videos or in front of a webcam. Our experimental results show a high overall test accuracy for all four proposed architectures (90.56% for the VGG-19 model, 93.41% for the InceptionV3 model, 93.49% for the ResNet-50 model and 94.54% for the MobileNetV2 model). This proves that such systems enable an unobtrusive way of collecting a rich body of information about the many identified animal classes, such as insights into which animal classes are present in a certain area at a certain time and what they look like, resulting in valuable datasets, especially for researchers in the field of ecology.

You can read the paper here.

For crawling the images needed to train the models, I used this code:

from icrawler.builtin import GoogleImageCrawler

for keyword in ['pigs', 'pig', 'schweine', 'schwein', 'porci', 'porc',
                'Cow', 'Cows', 'Vaca', 'vaci', 'Kuh', 'Kuehe',
                'Horses', 'Horse', 'Pferde', 'Pferd', 'Cal', 'Cai',
                'Donkeys', 'Donkey', 'Esel', 'Eseln', 'Magar', 'Magari',
                'Goats', 'Goat', 'Ziegen', 'Ziege', 'Capra', 'capre',
                'Sheeps', 'Sheep', 'Schaf', 'Schafe', 'oaie', 'oi',
                'Rabbits', 'Rabbit', 'Kaninchen', 'iepure', 'iepuri',
                'Mouses', 'Mouse', 'Maus', 'Soarec', 'Soareci',
                'Rats', 'Rat', 'Ratte', 'Sobolan', 'Sobolani',
                'Snakes', 'Snake', 'Schlange', 'Schlangen', 'Sarpe', 'Serpi',
                'Frogs', 'Frog', 'Frosch', 'Froesche', 'Broasca', 'Broaste',
                'Lizard', 'Lizards', 'Lacertids', 'Eidechsen', 'Soparla',
                'Soparle', 'Hamsters', 'Hamster', 'hamsteri',
                'Hedgehogs', 'Hedgehog', 'Igel', 'Arici', 'Cats', 'Cat',
                'Katze', 'Katzen', 'Pisica', 'Pisici', 'Dogs', 'Dog',
                'Hund', 'Hunde', 'caine', 'caini', 'Foxes', 'Fox',
                'Fuchs', 'Fuechse', 'Vulpi', 'Vulpe', 'Bears', 'Bear',
                'Baer', 'Baeren', 'Urs', 'Ursi', 'Deers', 'Deer', 'Hirsch',
                'Caprioara', 'Caprioare', 'Cerb', 'Cerbi', 'Bats', 'Bat',
                'Fledermaus', 'Fledermaeuse', 'Lilieci', 'Liliac',
                'Chicken', 'Chickens', 'Henne', 'Huehner', 'Gaina', 'Gaini',
                'Rooster', 'Hahn', 'Cocosi', 'Turkeys', 'Turkey',
                'Truthahn', 'Truthaehne', 'curcan', 'curcani',
                'Ducks', 'Duck', 'Enten', 'Ente', 'Rata', 'Rate',
                'Gooses', 'Goose', 'Gans', 'Gaense', 'Gaste', 'Gasca',
                'Pigeons', 'Pigeon', 'Taube', 'Tauben', 'Porumbei',
                'Porumbel', 'Crows', 'Crow', 'Kraehe', 'cioara', 'ciori',
                'Parrot', 'Parrots', 'Papagei', 'Papageien', 'Papagali',
                'Papagal', 'Sparrows', 'Sparrow', 'Spatzen', 'Spatz',
                'Vrabie', 'Vrabii', 'Owls', 'Owl', 'Eule', 'Eulen',
                'Bufnita', 'Bufnite', 'woodpecker', 'Specht', 'Ciocanitoare',
                'Magpie', 'Eurasian magpie', 'Elster Vogel', 'cotofana',
                'canary', 'Kanarienvogel', 'canar domestic']:
    google_crawler = GoogleImageCrawler(
        parser_threads=2,
        downloader_threads=4,
        storage={'root_dir': 'Animals_and_Birds/{}'.format(keyword)}
    )
    google_crawler.crawl(keyword=keyword, max_num=10000)

Below I attach all the files for all architectures, along with files that can help you reproduce the results in the research paper.

inference_worker.py

# Real-Time Identification of Animals found in Domestic Areas of Europe
# Author: Sorin Liviu Jurj

from multiprocessing import Process
import queue

from keras.applications.resnet50 import preprocess_input
import numpy as np

# Run inference on the model given a batch of images
def run_inference(model, images):
    # Create the batch out of the list of preprocessed images
    batch = preprocess_input(images.astype('float'))

    # Run inference
    predictions = model.predict_on_batch(batch)
    # Take the average prediction across all images
    predictions_mean = np.mean(predictions, axis=0)
    # Find the predicted class (argmax)
    return np.argmax(predictions_mean), predictions_mean, predictions

# This will perform inference on the model in a separate process,
# so that we can continue playing the video/webcam in the main process
class InferenceWorker(Process):
    def __init__(self, data_q, result_q, ready_q):
        Process.__init__(self, name='ModelProcessor')
        # Queues for sharing data with the main process
        self.data_q = data_q
        self.result_q = result_q
        self.ready_q = ready_q

    def run(self):
        # Load the model inside the worker process
        print('Loading model')
        import keras
        model = keras.models.load_model('checkpoints/run7-epoch_51.hdf5')
        print('Model loaded')
        # Alert the main process that the model is ready for images
        self.ready_q.put('ready')
        # Process images until the main process tells us to stop
        while True:
            try:
                (time, images, original_images) = self.data_q.get(True)
                # Signal from the main process to exit
                if time == 'exit':
                    print('Stopping inference process')
                    break
                prediction, predictions, original_predictions = \
                    run_inference(model, images)
                self.result_q.put((time, prediction, predictions,
                                   original_images, original_predictions))

            except queue.Empty:
                continue

        print('Inference process done')

preprocessing.py

# Real-Time Identification of Animals found in Domestic Areas of Europe
# Author: Sorin Liviu Jurj

import cv2
import numpy as np

# Resize an image by specifying the size of the smaller side
def resize_to(img, size=256):
    if img is None:
        return None

    (h, w) = img.shape[:2]
    # Find the smaller side
    if h < w:
        ratio = size / h
    else:
        ratio = size / w

    # Note: cv2.resize expects (width, height)
    outsize = (int(w * ratio), int(h * ratio))
    return cv2.resize(img, outsize)

# Crop the center region
def crop_center(img, crop_size=224):
    y, x = img.shape[:2]
    startx = x // 2 - (crop_size // 2)
    starty = y // 2 - (crop_size // 2)
    return img[starty:starty + crop_size, startx:startx + crop_size, ...]

# Crop a random part of the image. This is useful if we are processing a
# batch of images
def random_crop(img, random_crop_size=224):
    height, width = img.shape[0], img.shape[1]
    dx = random_crop_size
    dy = random_crop_size
    x = np.random.randint(0, width - dx + 1)
    y = np.random.randint(0, height - dy + 1)
    return img[y:(y + dy), x:(x + dx), :]

# Preprocess the full-size image
def preprocess_image(image, do_random_crop=False, resize_size=256,
                     crop_size=224):
    # Resize the image
    resized = resize_to(image, size=resize_size)
    # Crop part of the resized image
    if do_random_crop:
        cropped = random_crop(resized, crop_size)
    else:
        cropped = crop_center(resized, crop_size)
    return cropped

# Preprocess a batch of images
def preprocess_all(images):
    # With a single image we use the deterministic center crop; with more
    # images in the batch we use random crops for slightly different views
    do_random_crop = len(images) > 1
    resize_size = 256

    # Preprocess each image
    return np.asarray(
        [preprocess_image(image.copy(), do_random_crop, resize_size)
         for image in images])

speed_test.py

# Real-Time Identification of Animals found in Domestic Areas of Europe
# Author: Sorin Liviu Jurj

# USAGE
# python speed_test.py

# import the necessary packages
from __future__ import print_function

import argparse
import time

import cv2
import keras
import numpy as np

from inference_worker import run_inference
from preprocessing import preprocess_all
from webcamvideostream import WebcamVideoStream

# Measure how long this computer takes to run inference on num_frames frames
def test_inference_speed(num_frames):
    start_time = time.time()
    frames_with_times = vs.read_frames()
    frames = [v[0] for v in frames_with_times][:num_frames]
    _ = run_inference(model, preprocess_all(np.asarray(frames)))
    return time.time() - start_time

# Find how many frames per second this computer can process
def find_inference_parameters():
    prev_num_frames, prev_inference_time = None, None
    for num_frames in (1, 2, 4, 8, 16, 24):
        print('Measuring inference speed for {} frame(s)'.format(num_frames))
        # Warm the model up too
        test_inference_speed(num_frames)
        inference_time = test_inference_speed(num_frames=num_frames)
        print('Measured inference time for {} frames: {:.3f}s'.format(
            num_frames, inference_time))
        if inference_time > 1:
            # Too slow: fall back to the largest batch size that still fit
            # into one second (at 1 frame there is nothing smaller to return)
            if prev_num_frames is None:
                return num_frames, inference_time
            return prev_num_frames, prev_inference_time
        prev_num_frames, prev_inference_time = num_frames, inference_time
    return prev_num_frames, prev_inference_time

if __name__ == '__main__':

    parser = argparse.ArgumentParser(description='Webcam demo')
    parser.add_argument('--video', dest='video',
                        default=0,
                        help='Path to video file to use instead of the webcam')
    args = parser.parse_args()

    # setup the model
    print('Loading model')
    model = keras.models.load_model('checkpoints/run7-epoch_51.hdf5')
    print('Warming cam up')
    # create a *threaded* video stream
    vs = WebcamVideoStream(src=args.video, max_frames=24).start()
    # Allow the cam to warm up
    time.sleep(2)

    frames_per_second, inference_time = find_inference_parameters()
    print('Use fps:', frames_per_second)

    # do a bit of cleanup
    cv2.destroyAllWindows()
    vs.stop()

webcam_demo.py

# Real-Time Identification of Animals found in Domestic Areas of Europe
# Author: Sorin Liviu Jurj

# USAGE
# python webcam_demo.py

# import the necessary packages
from __future__ import print_function

import argparse
import csv
import os
import queue
import time
from datetime import datetime
from multiprocessing import Queue

import cv2
import numpy as np

from inference_worker import InferenceWorker
from preprocessing import preprocess_all
from webcamvideostream import WebcamVideoStream

# Display text on an image, used to show the video/webcam
def display_text(img, text, x=10, y=20):
    # prepare the text
    font = cv2.FONT_HERSHEY_SIMPLEX
    bottomLeftCornerOfText = (x, y)
    fontScale = 0.5
    fontColor = (0, 0, 255)
    lineType = 1

    # Draw the text on the image
    cv2.putText(img, text,
                bottomLeftCornerOfText,
                font,
                fontScale,
                fontColor,
                lineType)
    # Show the image with the text
    cv2.imshow("Frame", img)
    cv2.waitKey(1)

# Read frames from the video/webcam, hand them to the inference worker
# and record/save the resulting detections
def process(video_source, num_frames, max_read_fps, output_file):
    # Set up the worker and the queues for sharing data
    data_q = Queue()
    result_q = Queue()
    ready_q = Queue()
    inference_worker = InferenceWorker(data_q, result_q, ready_q)
    inference_worker.start()

    # For analysis of predictions
    prev_predictions = []
    # Flag for whether we are processing images at this time
    processing = False
    current_text = ""
    # Wait until the InferenceWorker is ready
    ready_q.get(True)

    # Start the video queue
    vs = WebcamVideoStream(src=video_source,
                           max_frames=max_read_fps).start()
    # Allow the cam to warm up
    # The default video_source of 0 means webcam
    if video_source == 0:
        time.sleep(1)

    while True:
        try:
            frame = vs.read()
            display_text(frame.copy(), current_text)
            if not processing:
                frames_with_times = vs.read_frames()[:num_frames]
                frames = [v[0] for v in frames_with_times]
                data_q.put((time.time(), preprocess_all(frames),
                            frames_with_times))
                processing = True

            try:
                (scheduled_time, prediction, predictions, frames_with_times,
                 original_predictions) = result_q.get(False)

                # Save images
                save_images(frames_with_times, original_predictions)

                if prediction != imagenet_class:
                    current_text = f"{prediction_to_class[prediction]} detected. Confidence: ({predictions[prediction]:.3f})"
                else:
                    current_text = f"Nothing detected. Confidence: ({predictions[prediction]:.3f})"
                print(current_text)
                processing = False

                # Record animal detections
                prev_predictions.append((scheduled_time, prediction))
                analyze_predictions(prev_predictions)
            except queue.Empty:
                pass

        except KeyboardInterrupt:
            print('DETECTED ANIMALS:', detections)
            # Save to CSV
            with open(output_file, 'w') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow(['Animal', 'From', 'To'])
                for detection in detections:
                    writer.writerow(detection)

            # Signal the inference process to quit (three elements,
            # matching the tuple the worker unpacks)
            data_q.put(('exit', None, None))
            # Finally, do a bit of cleanup
            cv2.destroyAllWindows()
            vs.stop()
            print('Stopping main thread')
            break

# Find the animal class detected most often over the last few seconds
def find_most_common_prediction(current_time, predictions):
    # Collect the predictions from the last N seconds
    lookback_period = float(args.lookback_seconds)  # seconds
    lookback_predictions = []
    for prediction_time, prediction in predictions[::-1]:
        if current_time - lookback_period < prediction_time:
            lookback_predictions.append(prediction)
        else:
            # We have reached older times that don't interest us
            break
    # Return the most common class index within the lookback window
    return np.bincount(np.asarray(lookback_predictions)).argmax()

# Convert a UNIX timestamp to a human-readable date
def timestamp_to_date(timestamp):
    return datetime.utcfromtimestamp(timestamp).strftime(
        '%Y-%m-%d %H:%M:%S UTC')

# Analyze previous detections to record them
def analyze_predictions(all_predictions):
    global detections
    global start_time
    global current_class

    prediction_time, _ = all_predictions[-1]
    detected = find_most_common_prediction(prediction_time, all_predictions)
    print(detected)

    if detected != current_class:
        # We have detected something new

        if current_class != imagenet_class:
            # We had been detecting an animal,
            # so we have now finished detecting this animal
            print('Finished detecting', prediction_to_class[current_class])

            end_time = prediction_time
            detections.append(
                (prediction_to_class[current_class],
                 timestamp_to_date(start_time),
                 timestamp_to_date(end_time)))

        start_time = prediction_time
        current_class = detected
        print('Started detecting', prediction_to_class[current_class])

# Save predictions as images
def save_images(frames_with_times, original_predictions):
    for image_with_time, prediction in zip(frames_with_times,
                                           original_predictions):
        # Figure out what the prediction was
        predicted_class = np.argmax(prediction)
        # Only save positive predictions
        if predicted_class == imagenet_class:
            continue

        class_name = prediction_to_class[predicted_class]
        confidence = prediction[predicted_class]
        print(str(class_name) + ' ' + str(confidence))

        image = image_with_time[0]
        detection_time = str(datetime.fromtimestamp(
            image_with_time[1]).isoformat()).replace(':', '-').replace('/', '-')

        image_hash = hash(str(image))
        # If this image has been saved already, don't save it again
        if image_hash in saved_images:
            continue
        # Mark this image as saved
        saved_images[image_hash] = True

        # Make sure the class dir exists
        os.makedirs(save_images_dir + class_name, exist_ok=True)

        # Save the image
        save_name = (save_images_dir + class_name + '/' + class_name + '_' +
                     detection_time + '.jpg')
        print(save_name)
        cv2.imwrite(save_name, image)

if __name__ == '__main__':
    # Keep a cache of saved images, so we don't save them multiple times
    saved_images = {}

    parser = argparse.ArgumentParser(description='Webcam demo')
    parser.add_argument('--video', dest='video',
                        # The default video param of 0 means webcam
                        default=0,
                        help='Path to video file to use instead of the webcam')

    parser.add_argument('--fps', dest='fps',
                        default=2,
                        help='How many frames to pass to the model for '
                             'inference. The more the better. Suggested '
                             'value is from 1 to 24. Default 2')

    parser.add_argument('--video_read_frames', dest='video_read_frames',
                        default=24,
                        help='How many frames of video to read/analyze/play '
                             'per second (default is 24)')

    parser.add_argument('--output', dest='output',
                        default='output.csv',
                        help='File to output animal detections to. '
                             'Default is output.csv')

    parser.add_argument('--lookback_seconds', dest='lookback_seconds',
                        default=3,
                        help='How many seconds back to analyze detections '
                             'for inclusion in the CSV file')

    args = parser.parse_args()
    print('Using FPS:', args.fps)
    print('Using video read frames per second:', int(args.video_read_frames))

    # For analysis of detections
    detections = []
    imagenet_class = 34
    current_class = imagenet_class
    start_time = False

    # Create directory to store detections
    current_time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    save_images_dir = './animals_and_birds/' + current_time + '/'
    os.makedirs(save_images_dir, exist_ok=True)

    # Mapping of inference indexes to classes
    prediction_to_class = {0: 'Bat', 1: 'Bear', 2: 'Canary',
                           3: 'Cat', 4: 'Cattle', 5: 'Chicken',
                           6: 'Deer', 7: 'Dog', 8: 'Donkey',
                           9: 'Duck', 10: 'Fox', 11: 'Frog',
                           12: 'Goat', 13: 'Goose', 14: 'Hamster',
                           15: 'Hedgehog', 16: 'Horse',
                           17: 'Lizard', 18: 'Magpie', 19: 'Mole',
                           20: 'Owl', 21: 'Parrot', 22: 'Pig',
                           23: 'Pigeon', 24: 'Rabbit', 25: 'Raven',
                           26: 'Sheep', 27: 'Snake', 28: 'Sparrow',
                           29: 'Squirrel', 30: 'Stork',
                           31: 'Tortoise', 32: 'Turkey',
                           33: 'Woodpecker',
                           34: 'imagenet_resized_256'}

    # Process the video/webcam
    process(args.video, int(args.fps), int(args.video_read_frames),
            args.output)

webcamvideostream.py

# Real-Time Identification of Animals found in Domestic Areas of Europe
# Author: Sorin Liviu Jurj

# import the necessary packages
from threading import Thread
import cv2
import time

class WebcamVideoStream:
    def __init__(self, src=0, name="WebcamVideoStream", max_frames=24):
        # initialize the video camera stream and read the first frame
        # from the stream
        self.stream = cv2.VideoCapture(src)
        (self.grabbed, self.frame) = self.stream.read()

        self.frames = []
        self.max_frames = max_frames
        self.last_saved_frame = time.time()

        # initialize the thread name
        self.name = name

        # initialize the variable used to indicate if the thread should
        # be stopped
        self.stopped = False

    def start(self):
        # start the thread to read frames from the video stream
        t = Thread(target=self.update, name=self.name, args=())
        t.daemon = True
        t.start()
        return self

    def update(self):
        # keep looping infinitely until the thread is stopped
        while True:
            started_reading = time.time()

            # if the thread indicator variable is set, stop the thread
            if self.stopped:
                print('Video/webcam thread stopping')
                return

            # otherwise, read the next frame from the stream
            (self.grabbed, self.frame) = self.stream.read()
            self._add_frame(self.frame)

            ended_reading = time.time()

            # Calculate how long to wait until getting the next frame
            delay_seconds = (1. / self.max_frames) - (
                ended_reading - started_reading)
            # no delay of less than 1 ms
            if delay_seconds < 0.001:
                delay_seconds = 0.001
            time.sleep(delay_seconds)

    def _add_frame(self, frame):
        self.last_saved_frame = time.time()
        self.frames.append((frame, time.time()))
        # Only keep so many frames in the stack
        if len(self.frames) > self.max_frames:
            self.frames.pop(0)

    def read_frames(self):
        return self.frames

    def read(self):
        # return the frame most recently read
        return self.frame

    def stop(self):
        # indicate that the thread should be stopped
        self.stopped = True

In addition, I will give an initial description of the ResNet-50 architecture I created. I later trained different architectures, but the text should give you an idea of what each of the files does.

Description of files

  • accuracy_report.txt Describes the accuracy, recall, precision and F1 score for all classes, with the overall accuracy at the top. Note: this accuracy is computed without the negative class ("nothing detected"), which was trained on random ImageNet images, because the random ImageNet sample itself contained images of animals. A sketch of how such a report can be generated follows this list.
  • inference_worker.py This file describes a multiprocess approach to inference. The main process reads images from the webcam/video, displays them and passes them to the InferenceWorker for processing (via a multiprocess shared queue). In turn, the InferenceWorker passes the results of the processing back to the main process for analysis.
  • preprocessing.py Because of issues with running OpenCV in a multiprocess environment, the preprocessing happens in the main process. OpenCV is used for resizing and cropping the images: each full-size frame from the webcam/video is resized so that its smaller side is 256 pixels, then a center or random crop of 224 by 224 pixels is taken (depending on the number of images in the batch). The model-specific preprocessing, i.e. the normalization of the input channels (colors), is done in the InferenceWorker.
  • webcamvideostream.py This class is taken from the MIT-licensed imutils and extended in two ways. First, to make the frames of the last second available; this helps prediction by running inference on several slightly different frames from the last second instead of a single frame. Second, to stream frames from videos in real time, as if the frames came from the webcam; this helps with testing.
  • speed_test.py This file tests how many frames per second this computer can perform inference on. Run it with python speed_test.py --video=videos/animals-short.mp4. It will try inference using 1, 2, 4, 8, 16 and 24 frames per second (FPS) and recommend the highest FPS value that can be predicted on in less than a second. The output will look like Use fps: 4. The number should be passed to the webcam_demo.py file as described below.
  • webcam_demo.py This is the main file: it reads images from the webcam/video, passes them to the model for inference and runs analysis on the results. Use it as follows: python webcam_demo.py --fps=4 --video=videos/animals-short.mp4 --lookback_seconds=3 --output=output.csv. The fps parameter should be what speed_test.py output (see above). The video parameter can be omitted to use the webcam, or set to the path of a video file. Analysis of the detected animals works as follows: if an animal is detected most often in the last lookback_seconds seconds (over other animals and the "nothing" class), then we consider that animal present. This prevents misdetections of animals in a single frame. Based on this algorithm, the animals and the start and end of their presence in the video/webcam are written to a CSV file.
  • main-run7.ipynb This file describes the training of the model with the negative ("imagenet") class. The output of this training is in ./checkpoints/run7-epoch_51.hdf5 and is used for inference.
  • main-run6.ipynb This file describes the training of the model without the negative ("imagenet") class. The results of this training are in the accuracy_report.txt file. The "Model notes" below are extracted from this model.
  • training-results/historyrun6 and training-results/historyrun7 These directories contain the train and validation accuracy and loss from training main-run6.ipynb and main-run7.ipynb.
  • checkpoints/run7-epoch_51.hdf5 The final checkpoint of the model trained in main-run7.ipynb. This checkpoint is used for inference.
  • checkpoints/run6-epoch_51.hdf5 The final checkpoint of the model trained in main-run6.ipynb. This checkpoint is NOT used for inference.
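As referenced in the accuracy_report.txt item above, a per-class report of this kind can be generated with scikit-learn. This is only a minimal sketch, not the paper's actual evaluation code; y_true, y_pred and the truncated class_names list are placeholders:

# Sketch: per-class accuracy/precision/recall/F1 report similar to
# accuracy_report.txt. y_true/y_pred are placeholder test-set labels and
# predictions; class_names would list all 34 classes in index order.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

class_names = ['Bat', 'Bear', 'Canary', 'Cat']  # truncated for brevity
y_true = np.array([0, 1, 2, 3, 3, 0])           # placeholder labels
y_pred = np.array([0, 1, 2, 3, 1, 0])           # placeholder predictions

# Overall accuracy at the top, as in accuracy_report.txt
print('Overall accuracy: {:.4f}'.format(accuracy_score(y_true, y_pred)))
# Per-class precision, recall and F1 score
print(classification_report(y_true, y_pred, target_names=class_names))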

Model notes

Data used

For training, 127,832 images belonging to 34 classes were used. The validation and test sets each contained 16,353 images. Class indexes follow: {0: 'Bat', 1: 'Bear', 2: 'Canary', 3: 'Cat', 4: 'Cattle', 5: 'Chicken', 6: 'Deer', 7: 'Dog', 8: 'Donkey', 9: 'Duck', 10: 'Fox', 11: 'Frog', 12: 'Goat', 13: 'Goose', 14: 'Hamster', 15: 'Hedgehog', 16: 'Horse', 17: 'Lizard', 18: 'Magpie', 19: 'Mole', 20: 'Owl', 21: 'Parrot', 22: 'Pig', 23: 'Pigeon', 24: 'Rabbit', 25: 'Raven', 26: 'Sheep', 27: 'Snake', 28: 'Sparrow', 29: 'Squirrel', 30: 'Stork', 31: 'Tortoise', 32: 'Turkey', 33: 'Woodpecker'}

During training, each class was weighted to give more importance to underrepresented classes. For example, the dog and cat classes were heavily underweighted (0.09199101 and 0.16943509) because of the large number of training samples in these classes. The weights by class follow; a sketch of how such weights can be derived is given after the list:

[2.10985674, 2.46865706, 3.74478556, 0.16943509, 0.89796148, 1.21400217, 1.08194668, 0.09199101, 2.08066669, 1.48197269, 1.89504269, 1.57246537, 1.26719404, 2.43035857, 2.37059565, 2.41319943, 0.66368309, 2.3720913, 3.80158211, 4.49195305, 1.77598711, 1.54088717, 1.64974318, 2.23928809, 1.63895584, 3.9534855, 1.87146078, 1.23392343, 1.69358771, 2.88104575, 3.67523432, 3.86807069, 3.70420168, 4.18216319]
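The weights above follow the usual inverse-frequency ("balanced") pattern: classes with many training images get weights below 1 and rare classes get weights above 1. This is a minimal sketch of that recipe, with placeholder counts rather than the real per-class totals:

# Sketch: inverse-frequency ("balanced") class weights.
# samples_per_class holds placeholder counts, not the actual totals.
import numpy as np

samples_per_class = np.array([1780, 1520, 1000, 22000, 4200])

n_samples = samples_per_class.sum()
n_classes = len(samples_per_class)
# weight_c = n_samples / (n_classes * n_c): rare classes get larger weights
weights = n_samples / (n_classes * samples_per_class)

# Keras expects a dict mapping class index -> weight
class_weight = dict(enumerate(weights))
# model.fit(..., class_weight=class_weight)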

Model architecture

A pretrained ResNet-50 model was used: the top fully-connected layer with outputs for 1000 target classes was removed and replaced with a fully-connected layer with outputs for 34 target classes. Replacing the last ResNet layer with a single layer worked best. More expressive replacements (for example, 3 fully-connected layers of 256, 128 and 64 units respectively) were tested but were found to be hard to train and inaccurate, probably because of the limited number of images available.
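This replacement can be sketched in Keras as follows. It is a minimal outline, not the exact notebook code (which is in main-run6.ipynb/main-run7.ipynb):

# Sketch: pretrained ResNet-50 with the 1000-class ImageNet top replaced
# by a single 34-class softmax layer.
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

NUM_CLASSES = 34

base = ResNet50(weights='imagenet', include_top=False,
                input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
# A single fully-connected replacement layer worked best
outputs = Dense(NUM_CLASSES, activation='softmax')(x)
model = Model(inputs=base.input, outputs=outputs)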

Training (I provide info only for the initial ResNet-50 architecture I trained; the full and correct details can be found in the research paper itself; a sketch of the staged finetuning follows the list):

  1. The ResNet-50 model weights were frozen, and only the new layer on top was trained for a single epoch. This brought overall accuracy to 71%, which shows that the knowledge in the ResNet embeddings was utilized.
  2. Next, approximately the top one-third of the ResNet-50 model was unfrozen and finetuned for 5 epochs with a learning rate (LR) of 0.01, 5 epochs with an LR of 1e-3 and 5 epochs with an LR of 1e-4. Accuracy reached over 90%.
  3. Finally, approximately the top two-thirds of the ResNet-50 model were finetuned for 35 more epochs, leading to convergence at 92.5% accuracy.
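The staged finetuning above can be sketched as follows. The SGD optimizer, the data generators and their paths are assumptions for illustration only (the actual hyperparameters are in the paper and notebooks); base, model and class_weight come from the earlier sketches:

# Sketch of the staged finetuning described above (Keras 2.x style).
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical data generators; paths and preprocessing are placeholders
train_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'data/train', target_size=(224, 224))
val_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'data/val', target_size=(224, 224))

def compile_and_fit(lr, epochs):
    model.compile(optimizer=SGD(lr=lr, momentum=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(train_gen, epochs=epochs, validation_data=val_gen,
                        class_weight=class_weight)

# Stage 1: freeze the whole base, train only the new 34-class top layer
for layer in base.layers:
    layer.trainable = False
compile_and_fit(lr=0.01, epochs=1)

# Stage 2: unfreeze roughly the top third, finetune with a decreasing LR
for layer in base.layers[2 * len(base.layers) // 3:]:
    layer.trainable = True
for lr in (0.01, 1e-3, 1e-4):
    compile_and_fit(lr=lr, epochs=5)

# Stage 3: unfreeze roughly the top two-thirds and train to convergence
for layer in base.layers[len(base.layers) // 3:]:
    layer.trainable = True
compile_and_fit(lr=1e-4, epochs=35)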

How to test the proposed animal and bird identification system:

Using Windows OS:

  1. Run speed_test.py using an IDE of your choice
  2. Open a Windows terminal, go to the main folder where all our files are saved and run: python webcam_demo.py --fps=4 --video=./videos/animals-short.mp4 --output=output.csv. After the identification is complete, or if you want to stop earlier, press CTRL + C to close the identification process. To see the textual information containing the class name, date and time interval of the identified classes, open the output.csv file. In parallel to the textual information, a new image dataset is generated in a folder called "animals_and_birds". Inside this folder you will find subfolders for all identified classes, containing the saved frames in .jpg format.

This concludes the description of how our real-time animal and bird identification system works.

 
