Introduction to Python
This Introduction to Python course is for beginners. We aim to introduce fundamental programming concepts using Google Colab. We will introduce variables, data types, casting, strings, Booleans, operators, lists, tuples, loops, conditions, functions, and a bit of NumPy. This course is designed for those who are coming from a non-technical background.
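As a taste of the material, here is a small, hypothetical example combining several of these building blocks (the variable names and data are invented purely for illustration):

```python
# A few of the building blocks covered in the course:
# variables, data types, lists, loops, and functions.

temperatures = [12.5, 15.0, 9.8, 21.3]  # a list of floats

def average(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0.0
    for v in values:          # a for loop over the list
        total += v
    return total / len(values)

mean_temp = average(temperatures)
print(round(mean_temp, 2))    # → 14.65
```

In the course, snippets like this are run interactively in Google Colab, so no local installation is needed.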
Data Protection, Security, Ethics and Liability in the Age of Big Data
This session aims to introduce the current EU and UK data protection regime and the changes brought in by the General Data Protection Regulation, applicable since May 2018 in spite of Brexit. Furthermore, the session will present and allow for discussion of the specific challenges that big data analytics bring, especially in light of the reports on big data published by data protection regulators at both UK and EU levels. Special attention will be given to security requirements in data protection law.
The last two hours (with both speakers) will introduce the ethical issues arising from Big Data and present the related legal issues that may arise under data protection legislation and criminal law. Torts and contracts will not be covered.
Introduction to R
R is an interactive computing environment and programming language designed for statistical analysis and graphics. Extensions to the basic capabilities of R are straightforward to produce and share with others. It is widely and increasingly used in many Big Data fields of research including bioinformatics.
Because of its power and flexibility, R is more demanding to learn than traditional statistical packages but rewards some initial effort. This course is based on tested material that we have been using for several years to help research students, postdocs and faculty get started in their own data analysis, and is refined each time based on feedback. It is aimed at people who may have little or no programming experience.
Introduction to Machine Learning
The aim of this course is to provide an introduction to Machine Learning and a discussion of the types of problems it is suitable for. The course will then introduce Kernel Machines and show how they can provide robust but flexible classifiers when the number of training points is limited.
Deep Learning for Images and Text 1
The Day 1 tutorial will focus on convolutional neural networks, also known as convnets, a type of deep-learning model used almost universally in computer vision applications. You’ll learn to apply convnets to image-classification problems, in particular those involving small training datasets, which are the most common use case if you aren’t a large tech company.
The Day 2 tutorial will focus on deep-learning models that can process text (understood as sequences of words or sequences of characters), time series, and sequence data in general. The two fundamental deep-learning algorithms for sequence processing are recurrent neural networks and 1D convnets. These algorithms are applied in document classification, time-series classification, sequence-to-sequence learning, and sentiment analysis.
Best Practice Analytics
This course will look at methods and tools that can help us create high-quality analytics and reproducible results. We will also look at how to move from a single analyst, spreadsheet driven approach to collaborative analytics that follows a best practice governance model. Adopting practices from test driven software development, we will look at how to establish an analytics process based on documentation, versioning, testing, peer review, collaboration and risk evaluation. We will use examples in R, Shiny, Python and Jupyter Notebooks to illustrate the ideas taught in the course.
The aim is to give you an understanding of the challenges you will face when running your own real-world data analytics project and introduce you to a number of principles you can follow to achieve high-quality reproducible results.
Tree-based Models for Machine Learning in Data Analytics
In this course, participants will learn how to work with tree-based models to solve data science problems in Python. Everything from using a single tree for regression or classification to more advanced ensemble methods will be covered. Participants start by learning about basic CARTs (classification and regression trees), followed by implementations of bagged trees, Random Forests, and boosted trees using the Gradient Boosting Machine (GBM). The course will include dedicated practical sessions for these techniques and allow participants to create high-performance tree-based models for a real-world dataset.
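To give a flavour of what happens inside a CART, the sketch below (not course material; the data and helper names are invented) finds the best split on a single feature by minimising the weighted Gini impurity of the two child nodes:

```python
# Minimal sketch of the split criterion behind a CART: choose the
# threshold that best separates two classes, measured by Gini impurity.
# The toy data below is made up for illustration.

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return the threshold on a single feature that minimises the
    weighted Gini impurity of the two resulting child nodes."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        n = len(ys)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → 3.0 (a perfect split between the classes)
```

Libraries such as scikit-learn apply this kind of search recursively over all features; ensembles like Random Forests and GBM then combine many such trees.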
Data Science Management
As you progress in your career, you will most likely be asked to manage other colleagues. This requires complementing technical Data Science skills with strategic, interpersonal and other skills. This course will present elements and challenges of managing a Data Science team. Understanding those will allow you to practice the required skills early on – preparing you for any upcoming opportunities.
AI systems are very good at making predictions in a variety of settings. In many cases, however, this comes at the expense of the interpretability of the models used. In this course, we will see how to interpret the decisions made by these systems in ways that are accessible to humans. We will provide an overview of different methods for interpreting classifiers, looking at machine learning models that are interpretable by nature as well as model-agnostic methods for interpreting classifiers. This course is both theoretical and practical.
Learning with Small Data Sets
During this course we will explore Bayesian learning and how knowledge-based priors can help obtain good results from datasets with only a few samples. In particular, we will start the day with a theoretical introduction to Bayesian learning, followed by a practical session during which we will see hands-on how to apply Bayesian inference methods.
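As a minimal illustration of how a knowledge-based prior helps when data is scarce, here is a hypothetical Beta-Bernoulli conjugate update in Python (the prior and observations are invented for illustration):

```python
# Sketch of how an informative prior helps with few samples:
# Beta-Bernoulli conjugate updating.

def beta_posterior(alpha, beta, data):
    """Update a Beta(alpha, beta) prior with a list of 0/1 observations."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures

# Only 3 observations, but a prior encoding the belief that
# success is common: Beta(8, 2), prior mean 0.8.
a, b = beta_posterior(8, 2, [1, 0, 1])
posterior_mean = a / (a + b)
print(round(posterior_mean, 3))  # → 0.769
```

With only three data points the maximum-likelihood estimate would be 2/3 ≈ 0.667; the prior pulls the estimate towards the prior belief, which is exactly the behaviour the course explores with proper inference methods.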
Introduction to Data Visualisation Using R
In the era of misinformation and fake news, producing data visualisations that are clear and interpretable to an audience is essential in engaging people with data. Whilst there are many software packages available to produce data graphics, many offer limited customisation of graphics, or are not easily reproducible. This course will explore tools to produce high-quality graphics using the R programming language, focussing on the “ggplot2” package. The “ggplot2” package allows almost endless customisation of data visualisations, has a number of excellent extension packages that add further flexibility, and, being in R, is entirely script-based and therefore highly reproducible. This course will equip attendants with the skills to produce high-quality data visualisations using the “ggplot2” package and extensions, and would be beneficial to people working in any field where data visualisation is important. The course will be suitable for those with little to intermediate prior programming experience.
Introduction to Natural Language Processing
In this tutorial we will introduce the basic concepts of NLP, starting with simple text pre-processing techniques such as tokenisation and part-of-speech tagging, and moving on to more complex tasks such as term extraction, entity recognition and information extraction. The techniques will be demonstrated using GATE, one of the most widely used toolkits for performing all kinds of NLP tasks, which is freely available and open source. GATE includes not only its own text-processing components but also a number of popular third-party NLP components, all of which participants will be able to experiment with during the tutorial in hands-on exercises.
Practical Text Analytics and Sentiment Analysis from Social Media
This tutorial will introduce the concepts of social media and sentiment analysis from unstructured text. It will first introduce the concept of social media analysis, showing how this form of noisy text requires different solutions from traditional text analysis, with practical examples and exercises showing how this can be achieved. This leads into the more specialised task of sentiment analysis: the problem of extracting opinions automatically from text. It will cover both rule-based and machine learning techniques, provide some information on the key underlying NLP and text analysis processes required, and look in detail at some of the major problems and solutions, such as detection of sarcasm, use of informal language, spam opinion detection, trustworthiness of opinion holders, and so on. The techniques will be demonstrated with real applications developed in GATE, an open-source language processing toolkit. Hands-on exercises and relevant materials will be provided for participants to try out the applications, and to experiment with building their own simple tools.
Bandits, Learning, and Search
We will provide a basic overview of multi-armed bandit problems and algorithms for solving them. We will illustrate the application of such algorithms on a real problem in the scope of the advertising industry. Then we will continue with the relation of multi-armed bandits to reinforcement learning, and further on with the relation of reinforcement learning to Monte Carlo tree search. We will describe the application of such algorithms for game playing in the scope of the General Video Game AI competition.
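A minimal sketch of one classic bandit algorithm, epsilon-greedy, is shown below. The reward probabilities and parameters are invented for illustration and are not taken from the course material:

```python
import random

# Sketch of an epsilon-greedy multi-armed bandit. In advertising, each
# "arm" might be an ad variant and a reward a click; the made-up
# probabilities below stand in for the unknown click-through rates.

def epsilon_greedy(true_probs, steps=5000, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(true_probs))  # explore at random
        else:
            arm = values.index(max(values))       # exploit the best so far
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
    return values.index(max(values))

best = epsilon_greedy([0.2, 0.5, 0.8])
print(best)  # with enough steps this is usually arm 2, the best arm
```

The same explore/exploit trade-off underpins reinforcement learning and Monte Carlo tree search, which is the thread the course follows.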
Introduction to TensorFlow and Deep Learning
The course introduces TensorFlow from scratch and shows how to use it to build simple neural networks and perform backpropagation. Students are encouraged to program along with the tutor. The basic underlying workings of TensorFlow and neural networks are taught without resorting to higher-level black-box packages, so that students can gain a fundamental understanding of how deep learning works. The course also gives an introductory overview of popular deep learning models, including convolutional neural networks and recurrent neural networks.
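The course builds this understanding in TensorFlow itself; as a rough, hypothetical illustration of the same mechanics, here is a plain-Python sketch of gradient descent on a single sigmoid neuron (the toy data and hyperparameters are invented):

```python
import math

# Plain-Python sketch of the mechanics behind backpropagation on a
# single sigmoid neuron, without any deep-learning library.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=2000):
    """Fit y = sigmoid(w*x + b) by gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            out = sigmoid(w * x + b)
            # chain rule: dL/dw = (out - y) * out * (1 - out) * x
            grad = (out - y) * out * (1 - out)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# Toy data: output should be high for positive x, low for negative x.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_neuron(data)
print(sigmoid(w * 2.0 + b), sigmoid(w * -2.0 + b))
```

TensorFlow automates exactly this chain-rule bookkeeping (automatic differentiation), which is why the course teaches the manual version first.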
Recurrent Neural Networks with Keras
This course teaches a deep understanding of how recurrent neural networks work, what they are used for, and how to implement them efficiently using Keras and TensorFlow. The day culminates with unique advanced recurrent neural network examples applied to control problems. Note that natural-language processing examples will not be covered.
GIS Systems in R
Geographic information system (GIS) software is a powerful tool for the analysis of many types of spatial data, from understanding political trends in different areas to mapping the spread of infectious diseases and assessing the impacts of climate change across the globe. This course will focus on the “sf” package, and will explore the merits and functionality of working with “simple features” based objects in geographical analyses. This one-day course will familiarise users with the array of GIS packages available in R, and enable users to carry out basic GIS operations on a variety of different geographical data formats.
Synergy of Optimisation and Machine Learning
In the first part of the course, we will discuss modern optimisation approaches that do not require significant investment of expertise and time in algorithm development, but still allow real-world problems to be tackled. We will cover the following topics: the meaning of optimisation and its relevance to decision support and decision-making, off-the-shelf solvers, algorithm complexity, simple exact algorithms, simple heuristics, metaheuristics, and algorithm configuration and tuning. The second part of the course will provide foundations for exploiting the strong connections between optimisation and data science. In a series of exercises, you will see how the techniques studied in the first part of the course are used in machine learning. The aim is to enhance understanding, and so the use, of optimisation within machine learning. Conversely, it is increasingly recognised that the control of optimisation algorithms would itself benefit from the application of data science techniques. We will present methods, with exercises, that are being developed for data science to improve the performance of existing optimisation methods in many real-world problems.
Overall, the course presents the close interactions between data science and optimisation. You will gain deeper understanding of the optimisation within machine learning and decision support systems, and so how to make more effective use of them.
Learning Under Different Training and Testing Distributions
Systems based on machine learning methods often face a major challenge when applied to real-world datasets: the conditions under which the system was developed differ from those in which it is used. Examples include email spam filtering, stock prediction, health diagnostics, and brain-computer interface (BCI) systems, which took years to develop.
Will such a system remain usable, or will it need to be adapted because the distribution has changed since it was first built? Virtually any form of real-world data analysis faces such problems, which arise for reasons ranging from sample selection bias to operation in non-stationary environments.
This tutorial will focus on the issues of dataset shift (e.g. covariate shift, prior-probability shift, and concept shift) and will cover transfer learning approaches for learning a satisfactory model despite such shifts.
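One standard remedy for covariate shift is importance weighting, in which training examples are re-weighted by the density ratio p_test(x)/p_train(x). The hypothetical sketch below assumes both densities are known one-dimensional Gaussians, purely for illustration (in practice the ratio must itself be estimated):

```python
import math

# Sketch of importance weighting for covariate shift. The training
# distribution is N(0, 1) and the test distribution N(1, 1); both are
# made up so the weight can be computed in closed form.

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def importance_weight(x, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Weight for a training point x when the test distribution differs."""
    return gaussian_pdf(x, *test) / gaussian_pdf(x, *train)

# Points near the test mean (1.0) are up-weighted; points far from it
# are down-weighted, so the re-weighted training set mimics test data.
print(round(importance_weight(1.0), 3), round(importance_weight(-2.0), 3))
# → 1.649 0.082
```

A learner trained on the re-weighted examples then minimises (approximately) the test-distribution risk rather than the training-distribution risk.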
Traditional and Deep Learning Methods
This course will introduce machine learning and deep learning techniques and allow participants to gain practical experience implementing them. The morning session will focus on regression and classification and on practical machine learning topics such as regularisation and bias/variance theory. The afternoon session will cover neural networks and deep learning, including convolutional neural networks (CNNs) and sequence models.
Bayesian Analysis in R
Bayesian statistics are increasingly popular in many scientific disciplines. In this course, you will learn the theoretical underpinnings of Bayesian approaches and the differences between Bayesian and frequentist statistics. You will also learn how to implement, plot, and interpret Bayesian models in R. Finally, you will learn more about some of the advanced options for statistical modelling in this framework, including multi-level modelling and generalised linear approaches.
Introduction to Network Science
This one-day introductory course on network science will give a broad overview of the different concepts and methods commonly applied in social network analysis. We will first consider different kinds of network data and their representation and discuss the basics of network visualisation, including a hands-on example using the free software visone. We will also discuss different kinds of applications and usage scenarios of network science in business and social contexts. The second part of the course will introduce exploratory and descriptive methods for the analysis of networks, at three levels of granularity: the node level, the subgroup level, and the network level. The third part of the course will introduce inferential or statistical network analysis, including the basic ideas behind a range of models like the exponential random graph model and its various extensions, latent space models, the quadratic assignment procedure, and related techniques. We will cover the implementation of these methods in a very cursory way using R, but the focus is on the methods, not their implementation. Overall, this course is an introductory-level teaser for interested academics, practitioners, and data scientists who would like to explore what they can possibly do with their relational data in the way of exploration and prediction.
Transfer Learning with Transformers in NLP
This course will go through the basics of transfer learning in Natural Language Processing (NLP). We will discuss cutting-edge architectures like BERT and GPT. Our aim will be to cover the basics of transformer-based models and how to fine-tune pretrained models on local datasets in Python.
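At the heart of architectures like BERT and GPT is scaled dot-product self-attention. The minimal, hypothetical sketch below (tiny hand-picked vectors, no learned projection matrices) illustrates just that core operation:

```python
import math

# Minimal sketch of scaled dot-product attention, the operation at the
# heart of transformer models. Real models add learned query/key/value
# projections, multiple heads, and many layers on top of this.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted mix of value vectors, with
    weights softmax(q . k / sqrt(d)) over all keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

q = [[1.0, 0.0]]                       # one query vector
k = [[1.0, 0.0], [0.0, 1.0]]           # two keys
v = [[10.0, 0.0], [0.0, 10.0]]         # their associated values
print(attention(q, k, v))  # mostly the first value: the query matches key 0
```

Fine-tuning a pretrained transformer reuses the weights that feed this mechanism rather than learning them from scratch, which is what makes transfer learning on small local datasets feasible.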
Machine Learning for Causal Inference from Observational Data
This course will introduce the basic principles of causal modelling (potential outcomes, graphs, causal effects) while emphasising the key role of design and assumptions in obtaining robust estimates. It will also cover the basic principles of machine learning and the use of machine learning methods to do causal inference (e.g. methods stemming from domain adaptation and propensity scores). Lastly, it will show how to implement these techniques for causal analysis and interpret the results in illustrative examples. By the end of this course participants should: understand the distinction between causal effects and associations, and appreciate the key role of design and possibly untestable assumptions in the estimation of causal effects; understand the role of training and testing models on data and the use of regularisation to avoid overfitting; and be able to position machine learning within the causal tool chain.
Understanding our Research and Innovation Infrastructure
All this technology can make thinking about infrastructure feel somewhat geeky and distant from the reality of doing research. This short introductory course is designed to strip away as much of the technobabble as possible and provide a framework of understanding that should be useful to you. It won't answer all your questions, but it might provide insight into potential research opportunities and, who knows, even answer "how do I get the computer to say yes, so I can get my research done?"
The learning outcome is to provide you with an introduction to research infrastructures so that you can better advocate for, and cost, the supporting digital infrastructure you need for your research.
The course is structured around:
- What makes "data" research data?
- Defining what we mean by research and innovation infrastructure.
- Understanding the scale needed to transform research systems, cultures, and decision-making.
- What analytical capacity is needed for large scale research?
- How do we safely share data?
- How to understand infrastructure costs, including public cloud?