SoftPython
Introductory guide to coding, data cleaning and analysis for Python 3, with many worked exercises.
DOWNLOAD: PDF EPUB HTML Github
Nowadays, more and more decisions are made on the basis of factual, objective data. All disciplines, from engineering to the social sciences, need to process data and extract actionable information by analysing heterogeneous sources. This book of practical exercises gives an introduction to coding and data processing using Python, a programming language popular both in industry and in research environments.
News
23 August 2023
restyling!
restructured analytics with pandas:
separated notebooks into 1. intro and 2. advanced (grouping, merging, geopandas)
moved exercise notebook 2 (eures) to worked projects section
renamed dataset to astropi.csv, removed the ROW_ID column and replaced it with time_stamp
added paragraphs to first notebook
added meteo pressure intervals exercise
matrices-lists1: added visiting with style paragraph
lists3: added copy/deepcopy paragraph
Python Tutor now always shows data structures as non-nested
strings1: added paragraph on f-strings
sets1: added paragraph ‘What can we search?’
formats2-csv: swapped ‘with’ when reading and writing
Old news: link
Intended audience
This book can be useful both for novices who have never really programmed before and for students with a more technical background who want to learn about data extraction, cleaning, analysis and visualization (the frameworks used include Pandas, Numpy and the Jupyter editor). Data will be processed in a practical way, without delving into more advanced considerations about algorithmic complexity and data structures. To help readers overcome difficulties and achieve concrete learning results, the book presents step-by-step tutorials.
Contents
Overview: Approach and goals
A - Foundations
Quick Python intro (if you already have programming skills)
Tools and scripts (if you are a beginner)
A.1 Data Types
Basics: 1. variables and integers 2. booleans 3. real numbers 4. challenges
Strings: 1. intro 2. operators 3. basic methods 4. search methods 5. challenges
Lists: 1. intro 2. operators 3. basic methods 4. search methods 5. challenges
Tuples: 1. intro 2. challenges
Sets: 1. intro 2. challenges
Dictionaries: 1. intro 2. operators 3. methods 4. special classes 5. challenges
A.2 Control Flow
If conditionals: 1. intro 2. challenges
For loops: 1. intro 2. strings 3. lists 4. tuples 5. sets 6. dictionaries
While loops: 1. intro 2. challenges
Sequences and comprehensions: 1. intro 2. challenges
A.3 Basic Algorithms
Functions: 1. intro 2. error handling and testing
Matrices - list of lists: 1. intro 2. other exercises 3. challenges
Mixed structures: 1. intro 2. challenges
Matrices - numpy: 1. intro 2. exercises
B - Data Analysis
Data formats: 1. line files 2. CSV files 3. JSON files 4. challenges
Visualization (matplotlib, images): 1. intro 2. challenges
Analytics with Pandas: 1. intro 2. exercises 3. challenge
Relational data: 1. intro 2. binary relations 3. simple statistics 4. challenge
C - Applications
Database integration: executing simple SQL queries to extract data from a database, loading into Pandas
D - Projects
Worked projects
Projects as exercises (with solutions), involving some raw data preprocessing, simple analysis and a final chart display. Some are about serious topics, some are light-hearted, others come from daily work scenarios: take your pick!
Note that since the purpose of the book is to introduce computational thinking, we chose to follow a no-magic approach, using basic Python data structures and modules instead of more advanced libraries like numpy or pandas, even when those could dramatically simplify the task and improve performance.
Text data worked projects
Tabular data worked projects
Relational data worked projects
E - Appendix
Author
David Leoni: Software engineer specialized in data integration and the semantic web, he has developed applications in the open data and medical fields in Italy and abroad. He frequently collaborates with the University of Trento on teaching activities in various departments. Since 2019 he has been president of the CoderDolomiti Association, where, along with Marco Caresia, he manages the volunteer movement CoderDojo Trento, which teaches creative coding to kids. Email: david.leoni@unitn.it Website: davidleoni.it
Contributors
Marco Caresia (2017 Autumn Edition assistant @DISI, University of Trento): He has been an informatics teacher at Scuola Professionale Einaudi of Bolzano. He is president of the Trentino Alto Adige Südtirol delegation of the Associazione Italiana Formatori and vice president of the CoderDolomiti Association.
Alessio Zamboni (2018 March Edition assistant @Sociology Department, University of Trento): Data scientist and software engineer with experience in NLP, GIS and knowledge management. He has collaborated on numerous research projects, gaining experience in Europe and Asia. He strongly believes that ‘Programming is a work of art’.
Luca Bosotti (2020 Data Science Summerschool assistant, 2021 seminars @Sociology Department, University of Trento): Developer, scientist and professor. He believes the world is becoming ever more complicated and interesting, so we must study it by making use of all the potential and reasoning available to us. He has taught youngsters of all ages, from elementary school up to university level, and was impressed by the diversity of people who approach programming.
Massimiliano Luca (2019 summer edition teacher @Sociology Department, University of Trento): Loves learning new technologies every day. Particularly interested in knowledge representation, data integration, data modeling and computational social science. He firmly believes it is vital to introduce youngsters to computer science, and has been a mentor at Coder Dojo DISI Master.
Others: We also wish to thank the students Ludovico Maria Valenti and Ioana Doleanu for the improvements to the numpy page, and Stefano Moro for the numerous reports.
License
The making of this website and related courses was funded by the Department of Information Engineering and Computer Science (DISI), University of Trento, and also by the Sociology and Mathematics departments.
Unless otherwise noted, the material in this website is original and distributed under the CC-BY 4.0 International Attribution license https://creativecommons.org/licenses/by/4.0/deed.en. Basically, you can freely redistribute and modify the written content; just remember to cite the University of Trento and the authors.
Datasets from the Data analysis and Worked projects sections might have some restrictions; sources are cited in the pages where they are used. Other third-party resources are listed in third-party-licences.txt
Technical notes: all website pages are easily modifiable Jupyter notebooks, which were converted to web pages using NBSphinx and the Jupman template. Text sources are on Github at https://github.com/DavidLeoni/softpython-en
Acknowledgments
We particularly thank Professor Alberto Montresor of the Department of Information Engineering and Computer Science, University of Trento, for making possible the first courses from which this material originated, and the Trentino Open Data project (dati.trentino.it) for the numerous datasets provided.