
GrainPy
GrainPy is a Python package for compiling, analyzing, visualizing, and interpreting grain size distribution data. The idea for GrainPy started with an abundance of grain size distribution data that wasn’t being fully utilized. Initially, GrainPy was developed to quickly compile analyses of multiple samples, analyze results with geologic statistics and figures, and produce publication-quality grain size distribution plots. GrainPy will continue to add functionality, develop new methodologies of grain size distribution analysis and interpretation, and provide a user-friendly experience for sediment analyses.
Installation
Create a Virtual Environment
It is common practice to set up and work within virtual environments, as it helps avoid potential dependency conflicts, and represents good practice. It is also strongly recommended to do that here with GrainPy. GrainPy was built and continues to be developed using Conda and Anaconda environment, although it is not required and we present several command line procedures below for creating and then activating your new virtual environment.
Conda
# create
conda create -n CoolNewGrainPyVirtualEnvironment
# activate
conda activate CoolNewGrainPyVirtualEnvironment
Mac OSX Terminal
# create
python3 -m venv CoolNewGrainPyVirtualEnvironment
# activate
source CoolNewGrainPyVirtualEnvironment/bin/activate
Windows Command Prompt
# create
py -m venv CoolNewGrainPyVirtualEnvironment
# activate
.\CoolNewGrainPyVirtualEnvironment\Scripts\activate
Install the GrainPy Package
Once your virtual environment is created and activated, installing GrainPy is similar across each of the three platforms discussed above. Each released version of GrainPy is also indexed on PyPi, and it is recommended to install this package using pip.
python3 -m pip install grainpy
# Note for Windows: "python3" should be substituted with "py"
Congratulations! the most current stable release of GrainPy is now installed and ready to use!
Developmental Version
In some cases, the user may want to install the current working version of GrainPy, which can be done directly from Github.
python3 -m pip install git+https://github.com/masseygeo/GrainPy@main
# Note for Windows: "python3" should be substituted with "py"
Congratulations! the most current, developmental version of GrainPy is now installed and ready to use! If you experience bugs, please create a new issue on Github with the appropriate label
GrainPy Fundamentals
Input
Grain size distribution data can be obtained from a variety of equipment, including sieves, hydrometers, and laser diffraction particle size analyzers, however, there are a few basic requirements for the input data structure in order to utilize GrainPy:
Grain size distribution data must be an Excel file (.xlsx or .xls).
Each file contains data for one sample only.
Bins used in the user’s methodology are located in a column immediately preceding the data collected.
Bin sizes are in microns.
Bins are arranged from smallest to largest.
Bins represent the lower limit of each grain size interval, and the last row contains the maximum UPPER limit (therefore there is one more row of bins than data.
Compilation
Given the correct input format, GrainPy can easily collect and organize single or multiple files, and calculate cumulative percentages and a variety of sample statistics.
# list or tuple of complete paths to files
files = ['path to file 1', ..., 'path to file n']
# compile data, cumulative proportions, and statistics from file(s)
var = GrainSizeDist(files)
var.data()
var.datacp()
var.datast()

The figures above show an example of three Excel files on the left, with bins in one column and data in the following column. After these files have been input into GrainPy, the user can see the compiled data, cumulative percentages, and data statistics as shown on the right.
Visualization
Data can then be visualized and interpreted with publication-quality grain size distribution plots.
# plots of individual files
var.gsd_single()
# plot of all files
var.gsd_multi()

The two figures above show a GrainPy plot for a single sample (left) and multiple samples (right). The single sample plot shows a histogram of relative proportions within each bin, a cumulative proportion curve, mean, median, and mode(s) of the sample (vertical lines), and selected statistics below the plot. The multiple sample plot shows cumulative proportion curves for all samples (black lines), the mean cumulative proportion curve (dark red lines), and a 95% confidence interval of the mean (translucent red polygon). GrainPy has multiple options for plots, which are discussed in detail in the Plotting section.
The ‘grainsize’ Module
The grainsize module contains the GrainSizeDist class, which is the fundamental GrainPy object. The GrainSizeDist object contains a variety of attributes and methods to aid the user with data compilation and visualization.
‘path’ Attribute
The path attribute is the only reqquirement for creating a GrainSizeDist object, and consists of either a list or tuple of path(s) for the file(s) containing the grain size distribution data. This parameter can be input manually, or interactively using the selectdata function from the util module.
# selectdata function opens a user dialog window to select files interactively
files = selectdata()
# create a new instance of the GrainSizeDist class with the selected files
var = GrainSizeDist(files)
‘lith’ & ‘area’ Attributes
The lith and area attributes are optional. Their intent is to provide a means to differentiate GrainSizeDist objects by lithology and/or location. Currently, these are only used for grain size distribution plot titles.
# using GrainSizeDist instance from above, assign attributes after instantiation
var.lith = 'alluvium'
var.area = 'Lebanon Junction'
# alternatively, assign attributes at time of instantiation
var = GrainSizeDist(files, lith='alluvium', area='Lebanon Junction')
‘bins’ Method
The bins method returns a dataframe of the bin intervals in phi units, microns, and millimaters.
# using the GrainSizeDist instance from above
var.bins()
‘data’ Method
The data method returns a dataframe of the grain size distribution data compiled for all file(s) selected and input for a GrainSizeDist object.
# using the GrainSizeDist instance from above
var.data()
‘datacp’ Method
The datacp method returns a dataframe of the cumulative proportions compiled for all file(s) selected and input for a GrainSizeDist object.
# using the GrainSizeDist instance from above
var.datacp()
‘datast’ Method
The datast method returns a dataframe of statistics calculated from the data and cumulative proportions for all file(s) selected and input for a GrainSizeDist object.
# using the GrainSizeDist instance from above
var.datast()
‘gsd_single’ Method
The gsd_single method saves two image files of grain size distribution plots (.pdf and .jpg) in the directory where the sample files are located. The default for this method is to plot all samples, however, the user has the option to manually select sample(s) by either:
manually inputting a list of sample names using the files parameter
defining indices of a ‘slice’ from the path attribute using the i and j parameters
# grain size distribution plots of all samples
var.gsd_single()
# grain size distribution plots of two samples, 'file 1' and 'file 7'
var.gsd_single(files=['file 1', 'file 7'])
# grain size distribution plots of first three samples included in a *GrainSizeDist* object
var.gsd_single(i=0, j=3)
‘gsd_multi’ Method
The gsd_multi method saves two image files of grain size distribution plots (.pdf and .jpg) of multiple samples in the directory where the sample files are located. There are three plot options (1-3) and two additional options (4-5) to customize each plot according to user specifications:
Cumulative proportion curves of all files, with mean and 95% confidence interval. This is the default with parameters bplt = False, cplt = True, stplt = True, and ci = True.
Mean cumulative proportion curve with 95% confidence interval, and histogram of mean grain size distributions using the bplt parameter.
Histogram of mean grain size distribution per bin with 95% confidence interval, and grain size distributions per bin for each sample (as curves) using the bplt and cplt parameters.
Plotting mean, median, and modes, and selected statistics of mean in plot legend using the stplt parameter.
Plotting the 95% confidence interval using the ci parameter.
# option 1, default
var.gsd_multi()
# options 2 and 4
var.gsd_multi(bplt=True, stplt=False)
# options 3 and 5
var.gsd_multi(bplt=True, cplt=False, ci=False)
‘samplenames’ Method
The samplenames method assumes that the input file names represent the sample names and are unique. Both of these assumptions are not strictly required, but are used in other methods of the GrainSizeDist class, as well as in plot titles. Calling the samplenames method returns a list of the assumed sample names.
# using the GrainSizeDist instance from above
var.samplenames()
The ‘classify’ Module
The classify module contains conversion functions, including conversion of numerical statistics into qualitative classifcation names. These functions are called within the GrainSizeDist class, but can also be used individually by the user.
The ‘folk_sed’ Function
The folk_sed function converts sand, silt, and clay proportions into the appropriate sediment classification scheme of Folk (1954).
folk_sed(20, 20, 60)
folk_sed(2.5, 90.2, 7.3)
Folk, R.L., 1954, The Distinction between Grain Size and Mineral Composition in Sedimentary-Rock Nomenclature: Journal of Geology, Volume 62, Number 4, DOI: https://doi.org/10.1086/626171.
The ‘folk_sort’, ‘folk_skew’, & ‘folk_kurt’ Functions
The folk_sort, folk_skew, and folk_kurt functions classify sediment sorting, distribution skewness and distribution kurtosis into the appropriate classification schemes of Folk and Ward (1957).
folk_sort(1.35)
folk_skew(0.12)
folk_kurt(0.96)
Folk, R.L., and Ward, W.C., 1957, Brazos River bar: a study in the significance of grain size parameters: Journal of Sedimentary Petrology, Volume 27, Number 1, DOI: https://doi.org/10.1306/74D70646-2B21-11D7-8648000102C1865D.
The ‘wentworth_gs’ Function
The wentworth_gs function converts grain sizes, in phi units, into the appropriate grain size classification scheme of Wentworth (1922).
wentworth_gs(-1)
wentworth_gs(3.6)
Wentworth, C.K., 1922, A Scale of Grade and Class Terms for Clastic Sediments: Journal of Geology, Volume 30, Number 5, DOI: https://doi.org/10.1086/622910.
The ‘util’ Module
The util module provides functionalities called within the GrainSizeDist class, but may also be called individually to the user.
The ‘datacheck’ Function
The most common problem with input data is bin sizes are not consistent in all files. For example, one bin may be 0.375198, but some files may have been rounded inadvertently to 0.37520). The datacheck function checks the smallest expected bin value input as a function parameter by the user, displays the results of the datacheck, and offers the option to automatically fix the problematic files by changing them, or let’s the user manually examing the file(s) themselves. We warn the user that the auto-fix solution is permanent and changes the original files!
# call function using "files" variable created above
# min_bin and bin_rows have default values, but may be changed accordingly by user
datacheck(files)
The ‘df_ex’ & ‘gems_ex’ Functions
The df_ex and gems_ex functions afford the user the option to export GrainSizeDist object data as tables (.csv or .xlsx).
The df_ex function requires a dataframe parameter, then saves that dataframe according to the chosen location/name from the interactive user-dialog window. Dataframes include returns of the bins, data, datacp, or datast methods, or any other type of non-GrainPy dataframe.
The gems_ex function requires a GrainSizeDist object parameter, then saves the data and selected statistics in a transposed format to a location/name from the interactive user-dialog window.
# df_ex function
# first export cumulative proportion dataframe
df_ex(gsd.datacp())
# then statistics dataframe from a GrainSizeDist object named 'gsd'
df_ex(gsd.datast())
# gems_ex function with a GrainSizeDist object named 'gsd'
gems_ex(gsd)
The ‘selectdata’ Function
The selectdata function provides an interactive user dialog window to manually select file(s). This provides a user-friendly option to select single or multiple Excel files (.xlsx or .xls), save a list of paths to the selected files, then can be used as a parameter for creating a GrainSizeDist object.
# selectdata function, call function then select file(s) using dialog window
files = selectdata()
files
Statistics
…under construction
Roadmap
GrainPy will continue to add modern, user-friendly tools for the interpretation and presentation of sediment analyses. Below is a list of benchmarks that have been achieved or in the queue for development.
Initial build
Package release v0.1.0
Compatibility with multiple data sources
Additional statistics, including engineering and USCS
Plotting/comparison of multiple instances
Sediment mixing models for multi-modal samples
Integration with GIS
Contributions
It is very much hoped that users of GrainPy will suggest improvements and/or new functionalities by one of two methods:
Simply create a new issue with the appropriate ‘label’ on Github.
…OR…
Personally contributing to the coding using a fork…
# create a branch
git checkout -b feature/AmazingFeature
# develop the feature code and commit your changes
git commit -m 'Added this new feature that does amazing things'
# push to the branch
git push origin feature/amazingfeature
# finally, `create a pull request on the GrainPy Github repo page