functions and classes ¶

!!THIS SECTION IS IN PROGRESS!!

This section is partially autodocumented. You can find a link to the completely autodocumented functions and classes page at Module Index.

Modules in rough order of use are defined below:


The __init__.py module ¶

The __init__ module doesn’t contain any code, just a description of predictatops.

Predictatops is a python package for stratigraphic top prediction¶

Predictatops modules are designed to be run in a sequence, one step after another.

Each step has two ways to run it.¶

For example, there is load.py, which has all the functions to load data, and then there is load_runner.py, which leverages load.py, configuration set in the configurationplusfiles.py module, and some sensible defaults to execute the entire loading step without any more work on the user’s part.

Depending on your needs, you might write your own code that leverages load.py, or you might just run load_runner.py as an executable script.
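The two styles can be pictured with a minimal sketch. Everything in it is hypothetical: load_wells, load_runner, and DemoConfig are stand-in names chosen for illustration, not the actual contents of load.py or load_runner.py. Only the attribute name max_numb_wells_to_load comes from the configuration documented further down.

```python
from dataclasses import dataclass


@dataclass
class DemoConfig:
    """Hypothetical stand-in for the shared configuration object."""

    max_numb_wells_to_load: int = 1000000


def load_wells(well_names, max_wells):
    """Library style: the caller supplies every argument explicitly."""
    return list(well_names)[:max_wells]


def load_runner(well_names, config=None):
    """Runner style: fall back to configuration defaults, then call the library function."""
    config = config or DemoConfig()
    return load_wells(well_names, config.max_numb_wells_to_load)
```

Calling load_wells directly gives full control over each argument, while load_runner trades that control for a single call with defaults.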

As mentioned above, Predictatops modules are run in a sequence. An example sequence is below.

fetch_demo_data.py = fetches demo dataset for predicting top McMurray.
configurationplusfiles_runner.py = establishes configuration, input data, & output variables.
checkdata_runner.py = finds what curves & tops are available for use in the input dataset.
load_runner.py = loads the well curves and tops.
split_runner.py = splits the wells into train & test portions.
wellsKNN_runner.py = finds the nearest neighbors of each well and creates some features.
features_runner.py = creates the rest of the features.
balance_runner.py = throws away instances of common classes & duplicates uncommon classes.
trainclasses_runner.py = trains the dataset using XGBoost algorithm.
predictionclasses_runner.py = uses the trained model to predict stratigraphic tops.
plot_runner.py = plots some of the results in map form.

Running this full sequence is also packaged into the predictatops.all_runner module.

The “main” module ¶

The main.py module of predictatops merely holds a few utility functions leveraged by other modules.

predictatops.main.printHello()[source]¶: This function simply prints a hello message for testing.

predictatops.main.load_prev_results_at_path(full_path_to_results_file, key='df')[source]¶

A function used to return a dataframe of wells stored in an h5 file at a given path with a given key.

Parameters

full_path_to_results_file (string) – A path to a .h5 file that contains a wells dataframe.
key (string) – A string representation of a key used to find the dataframe in the h5 file whose path is defined by the full_path_to_results_file argument.

Returns

wells_df_from_wellsKNN – Returns a dataframe of wells that existed at the path defined in the full_path_to_results_file argument.

Return type

dataframe

predictatops.main.getMainDFsavedInStep(path_to_results, path_to_directory, file_name, ending)[source]¶

A function used to return a dataframe of data stored in a file at a given path. Not specific to a dataframe of wells in h5 file like load_prev_results_at_path.

Parameters

path_to_results (string) – A path to a top-level results folder.
path_to_directory (string) – A path to a folder within the results folder that has the file in question.
file_name (string) – A path to a file within the path_to_results and path_to_directory arguments.
ending (string) – String representation of the file type like “.h5” or “.csv”. It should include the dot!

Returns

full_path_to_results_file – Returns a string representation of the full path to the file in question.

Return type

string
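As a rough illustration of how the four arguments could combine into one path, here is a minimal sketch. The function name build_results_path and the joining order are assumptions made for this example; the actual body of getMainDFsavedInStep may differ.

```python
import os


def build_results_path(path_to_results, path_to_directory, file_name, ending):
    """Join the results folder, step folder, and file name, then append the extension."""
    # Assumption: the ending already includes its leading dot, e.g. ".h5".
    return os.path.join(path_to_results, path_to_directory, file_name + ending)


full_path = build_results_path("../results", "load", "loaded_wells_wTopsCurves", ".h5")
print(full_path)
```

Because the ending is appended as-is, leaving out the dot would produce a file name with no extension separator.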

predictatops.main.get_df_results_from_step_X(output_data_inst, directory, filename, key='df')[source]¶

Another function used to return a dataframe stored in an h5 file at a given path with a given key.

Parameters

output_data_inst (string) – A path to a folder with previously output data.
directory (string) – A folder within the directory defined by the ‘output_data_inst’ that holds a file.
filename (string) – Name of the file in question.
key (string) – A string representation of a key used to find the dataframe in the h5 file. Default is “df”.

Returns

wells_df_of_results – Returns a dataframe of wells that existed at the path defined via the given input arguments.

Return type

dataframe

predictatops.main.getJobLibPickleResults(output_data_inst, subfolder, filename)[source]¶

Another function that generates the string representation of the path to a pickle file and then returns the pickled data file found there.

Parameters

output_data_inst (string) – A path to a folder with previously output data.
subfolder (string) – A folder within the directory defined by the ‘output_data_inst’ that holds a file.
filename (string) – Name of the file in question.

Returns

joblib.load(full_path_to_pickle) – Returns a dataframe that exists at the path defined via the given input arguments.

Return type

dataframe

The “fetch_demo_data” module ¶

The fetch_demo_data.py script is used to fetch the demo data using the pooch data fetching library. This is the only module that executes on being run but doesn’t have a “_runner” ending to its name.

Alternatively, you can use your own data in a top-level “data” directory and skip using this script entirely.

The script imports pooch, then fetches the demo dataset from: “https://github.com/JustinGOSSES/predictatops/raw/{version}/demo/mannville_demo_data/”. That link is defined in the registry.txt file. Pooch is used to create a GOODBOY object instance, which then executes the fetching using the fetch_mannville_data() function in this python file.

NOTE: This was changed since version 1 to pull in a single zip file and unzip it as that’s faster than pulling in each file unzipped individually!

predictatops.fetch_demo_data.fetch_mannville_data()[source]¶

Loads all required Mannville Group data and metadata for demo data. Fetches the path to a file in local storage. If it’s not there, we’ll download it.

Parameters: none (none) – It does not take any parameters, but it assumes there is a registry.txt in the same directory that holds the name, hash, and location of the file to load.
Returns: … – Returns nothing but three dots, which Python reads as an ellipsis. It does, however, write files to the data directory, or whatever directory is given to the GOODBOY instance, which is created in fetch_demo_data.py by the pooch.create() call.
Return type: ellipsis

The “all_runner” module ¶

The all_runner.py module of predictatops executes all the “<name>_runner.py” modules of predictatops in the following sequence:

configurationplusfiles_runner.py = establishes configuration, input data, & output variables.
checkdata_runner.py = finds what curves & tops are available for use in the input dataset.
load_runner.py = loads the well curves and tops.
split_runner.py = splits the wells into train & test portions.
wellsKNN_runner.py = finds the nearest neighbors of each well and creates some features.
features_runner.py = creates the rest of the features.
balance_runner.py = throws away instances of common classes & duplicates uncommon classes.
trainclasses_runner.py = trains the dataset using XGBoost algorithm.
predictionclasses_runner.py = uses the trained model to predict stratigraphic tops.

The fetch_demo_data.py script is executed due to default configuration in the configurationplusfiles_runner.py module.

The plot_runner.py module, which plots some of the results, is not run by all_runner.py because plot.py and plot_runner.py are meant for evaluating results and are not used in every run.
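The sequence above can be sketched with the standard library's runpy module. This is an illustration under assumptions, not the code in all_runner.py: it assumes each runner is importable as a submodule of an installed predictatops package, and run_sequence is a hypothetical helper name.

```python
import runpy

# Runner modules in the order listed above; plot_runner is deliberately absent.
RUNNER_SEQUENCE = (
    "configurationplusfiles_runner",
    "checkdata_runner",
    "load_runner",
    "split_runner",
    "wellsKNN_runner",
    "features_runner",
    "balance_runner",
    "trainclasses_runner",
    "predictionclasses_runner",
)


def run_sequence(module_names=RUNNER_SEQUENCE, package="predictatops", run=runpy.run_module):
    """Run each runner module in order, as if it were executed as a script."""
    completed = []
    for name in module_names:
        # Assumption: the runner lives at <package>.<name>.
        run(f"{package}.{name}", run_name="__main__")
        completed.append(name)
    return completed
```

Passing a different run callable makes it possible to check the order without executing the real steps, each of which depends on the outputs of the step before it.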

The “configurationplusfiles” module ¶

This module sets up three objects from classes.

input_data() establishes where data is loaded from.
configuration() establishes various configuration variables used in the rest of the code.
output_data() establishes where data is written to.

These are intended to be changed by the configurationplusfiles_runner.py module.

class predictatops.configurationplusfiles.input_data(picks_file_path, picks_delimiter_str, path_to_logs_str)[source]¶

A class object that holds paths and other information related to input data such as log files location, top files, well information files, etc.

Parameters

picks_file_path: str: A string for the file path to the file with all the pick names and depths.
picks_delimiter_str: str: The delimiter of the file that has all the picks.
path_to_logs_str: str: The path to the directory with all the well logs.

load_wells_file()[source]¶: load wells file into pandas dataframe

load_gis_file()[source]¶: load GIS file into pandas dataframe

set_wells_file_path(wells_file_path_str, wells_file_delimiter)[source]¶: set the wells file path as an attribute of the object and return the wells dataframe using load_wells_file. Can be txt, tsv, or csv

set_gis_file_path(gis_file_path_str, gis_file_path_delimiter)[source]¶: set the GIS file path as an attribute of the object and return the GIS dataframe using load_gis_file. Can be txt, tsv, or csv

class predictatops.configurationplusfiles.configuration[source]¶

A class to keep configuration variables you might change between runs. That is why it has a large number of attributes listed below.

Types of information stored in here would be mandatory curves or mandatory tops, column names, the name of the top you’re trying to predict, etc. The object created by this class is used throughout Predictatops, so many modules reimport it.

Be careful not to change something in one module, close your code, start up later with the next module, and expect your changes to persist unless you saved them or wrote them into the configurationplusfiles_runner.py file.

Parameters: none (none) – None.

csv_of_well_names_wTopsCuves__name¶

csv_of_well_names_wTopsCuves__name

Type: str

csv_of_well_names_wTopCurves__path¶

csv_of_well_names_wTopsCuves__name

Type: str

must_have_curves_list¶

An array of strings that are curve names like [‘ILD’, ‘NPHI’, ‘GR’, ‘DPHI’, ‘DEPT’]

Type: list

curve_windows_for_rolling_features¶

Array of integers like [5,7,11,21]

Type: list

must_have_tops__list¶

A list of tops, given as integers or strings, like [13000,14000]

Type: list

target_top¶

A string or integer like 13000

Type: str

top_under_target¶

A string or integer that is the name of a top under the top you want to predict, such as 14000

Type: str

top_name_col_in_picks_df¶

The top name as it appears in the picks dataframe

Type: str

siteID_col_in_picks_df¶

The string for the siteID column in the picks dataframe like ‘SitID’

Type: str

UWI¶

The string for the UWI column like “UWI”

Type: str

DEPTH_col_in_featureCreation¶

The string for the depth column like “DEPT”

Type: str

HorID_name_col_in_picks_df¶

The string for the horizon ID column like “HorID”

Type: str

quality_col_name_in_picks_df¶

The string for the quality of the pick column like “Quality”

Type: str

picks_depth_col_in_picks_df¶

The string for the pick column name like ‘Pick’

Type: str

col_topTarget_Depth_predBy_NN1thick¶

The string for the top target depth predicted by nearest neighbor thickness like ‘topTarget_Depth_predBy_NN1thick’

Type: str

quality_items_to_skip__list¶

The list of integers for pick quality values to optionally skip as not good quality picks. An example is [-1,0]

Type: list

test¶

Purpose not yet documented. The value should be “test0”

Type: str

pick_class_str¶

String for the top target pick prediction column like ‘TopTarget_Pick_pred’

Type: str

threshold_returnCurvesThatArePresentInThisManyWells¶

The integer for the number of wells a curve has to be present in to be kept. Example is 2000

Type: int
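To make the threshold concrete, here is a minimal sketch of the filtering idea. It is not the checkdata.py implementation: the function name, the dictionary input, and the choice of "at least" as the comparison are assumptions made for the example.

```python
from collections import Counter


def curves_present_in_enough_wells(curves_by_well, threshold):
    """Return curve names found in at least `threshold` wells, sorted by name."""
    counts = Counter()
    for curves in curves_by_well.values():
        counts.update(set(curves))  # count each curve at most once per well
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical wells and the curves each one contains.
wells = {
    "well_a": ["GR", "ILD", "DEPT"],
    "well_b": ["GR", "DEPT"],
    "well_c": ["GR", "NPHI", "DEPT"],
}
kept = curves_present_in_enough_wells(wells, threshold=2)
```

With a threshold of 2, only the curves shared by two or more of the three wells remain in kept.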

max_numb_wells_to_load¶

Max number of wells to load out of all the wells in the directory with wells. This is used for when you’re testing. Example is 1000000

Type: int

split_traintest_percent¶

The fraction, between 0 and 1, used for the train vs. test split. You give the fraction to keep. Example is 0.8

Type: float
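A minimal sketch of a well-level split follows. It assumes the fraction to keep is the training share and uses a seeded shuffle; split_wells is a hypothetical name, and the split module may assign wells differently.

```python
import random


def split_wells(well_ids, keep_fraction=0.8, seed=0):
    """Label each well 'train' or 'test', keeping `keep_fraction` of wells for training."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    shuffled = list(well_ids)
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * keep_fraction))
    return {
        well: ("train" if position < n_train else "test")
        for position, well in enumerate(shuffled)
    }
```

Splitting by well rather than by row keeps every depth sample from one well on the same side of the split.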

kdtree_leaf¶

Levels of the kd-tree (uncertain in the original docstring). Default is 2

Type: int

kdtree_k¶

Integer for number of neighbors or K in k nearest neighbor code for finding nearby wells for each well. Default is 8

Type: int
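The role of K can be shown with a brute-force neighbor search. This sketch deliberately swaps the kd-tree for a simple sort, and it treats latitude and longitude as flat coordinates, which is only an approximation. The function name and input layout are hypothetical.

```python
import math


def nearest_wells(wells, target_id, k=8):
    """Return ids of the k wells closest to `target_id` by Euclidean distance on (lat, lng)."""
    target = wells[target_id]
    distances = [
        (math.dist(target, position), well_id)
        for well_id, position in wells.items()
        if well_id != target_id  # a well is not its own neighbor
    ]
    return [well_id for _, well_id in sorted(distances)[:k]]


# Hypothetical well locations as (lat, lng) pairs.
locations = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 3.0), "D": (5.0, 5.0)}
neighbors = nearest_wells(locations, "A", k=2)
```

A kd-tree returns the same neighbors for this kind of query but avoids comparing every pair of wells, which matters once there are thousands of wells.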

rebalanceClassZeroMultiplier¶

Used when rebalancing class zero. The number of instances of class zero is multiplied by this value. Default is 100

Type: int

rebalanceClass95Multiplier¶

Used when rebalancing class 95. The number of instances of class 95 is multiplied by this value. Default is 40

Type: int

NN1_topTarget_DEPTH¶

The string used in the column that holds the depth of the top in the first nearest neighbor training well. For example ‘NN1_topTarget_DEPTH’

Type: str

NN1_TopHelper_DEPTH¶

Helper depth for calculations for NN1_topTarget_DEPTH. Example is “NN1_TopHelper_DEPTH”

Type: str

trainOrTest¶

String for column that holds string of either train or test. Example is ‘trainOrTest’

Type: str

colsToNotTurnToFloats¶

List of columns to not turn to floats during feature creation. Example is [‘UWI’, ‘SitID’, ‘trainOrTest’, ‘Neighbors_Obj’]

Type: list

zonesAroundTops¶

An object of class labels and depths around the top to create those classes in. Example is {"100":[0],"95":[-0.5,0.5],"60":[-5,0.5],"70":[0.5,5],"0":[]}. NOTE: The code in the createFeat_withinZoneOfKnownPick(df,config) function in features.py currently ASSUMES only 5 zone labels.

Type: object
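One way to read the example object is as a lookup from a row's depth offset relative to the pick to a class label. The sketch below is an interpretation under assumptions, not the createFeat_withinZoneOfKnownPick code: the priority order (tightest zone first), the inclusive bounds, and the function name are all chosen for illustration.

```python
ZONES = {"100": [0], "95": [-0.5, 0.5], "60": [-5, 0.5], "70": [0.5, 5], "0": []}


def label_for_offset(offset, zones=ZONES):
    """Return the class label for a row `offset` depth units away from the pick."""
    # Assumption: zones are checked from tightest to widest, so overlaps
    # such as "95" and "60" resolve in favor of the tighter zone.
    for label in ("100", "95", "60", "70"):
        bounds = zones[label]
        low, high = bounds[0], bounds[-1]
        if low <= offset <= high:
            return label
    return "0"  # everything outside the listed zones
```

Under this reading, the row exactly at the pick gets class 100, rows within half a depth unit get 95, and rows farther than 5 units away fall into class 0.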

columns_to_not_trainOn_andNotCurves¶

List of strings for names of columns to not train on and are not curves. Example is [‘FromBotWell’,’FromTopWel’‘rowsToEdge’,’lat’,’lng’, ‘SitID’,’TopHelper_HorID’,’TopTarget_HorID’,’TopHelper_DEPTH’,’diff_Top_Depth_Real_v_predBy_NN1thick’,’diff_TopTarget_DEPTH_v_rowDEPT’,’diff_TopHelper_DEPTH_v_rowDEPT’,’class_DistFrPick_TopHelper’,’NewWell’,’LastBitWell’,’TopWellDept’,’BotWellDept’,’WellThickness’,’rowsToEdge’,’closTopBotDist’,’closerToBotOrTop’,’Neighbors_Obj’]

Type: list

columns_to_not_trainOn_andAreCurves¶

List of strings for columns to not train on that are curves. Example is [‘RHOB’,’SP’,’CALI’,’COND’,’DELT’,’DENS’,’DPHI:1’,’DPHI:2’,’DT’,’GR:1’,’GR:2’,’IL’,’ILD:1’,’ILD:2’,’ILM’,’LITH’,’LLD’,’LLS’,’PHID’,’PHIN’,’RESD’,’RT’,’SFL’,’SFLU’,’SN’,’SNP’,’Sp’]

Type: list

columns_to_use_as_labels¶

List of strings for columns to use as labels. Example is [‘class_DistFrPick_TopTarget’,’UWI’,’trainOrTest’,’TopTarget_DEPTH’]

Type: list

set_must_have_curves(must_have_curves_in_list)[source]¶: doc string goes here

get_must_have_curves(must_have_curves_in_list)[source]¶: doc string goes here

class predictatops.configurationplusfiles.output_data[source]¶

A class to keep information related to where output files are saved and naming conventions.

This class can also make all the directories for intermediate result files via its make_all_directories() function.

Types of information stored in here would be all the intermediate output file paths as you run different functions and modules of Predictatops.

The object created by this class is used throughout Predictatops, so many modules reimport it.

Be careful not to change something in one module, close your code, start up later with the next module, and expect your changes to persist unless you saved them or wrote them into the configurationplusfiles_runner.py file.

Parameters: none (none) – None.

default_results_file_format¶

A base path for all results. Example is ‘../results/’

Type: str = “.h5”

path_checkData¶

A path string for the checkData directory. Example is ‘checkData’

Type: str

path_load¶

A path string for the load directory. Example is ‘load’

Type: str

path_split¶

A path string for the split directory. Example is ‘split’

Type: str

path_wellsKNN¶

A path string for the wellsKNN directory. Example is ‘wellsKNN’

Type: str

path_features¶

A path string for the features directory. Example is ‘features’

Type: str

path_balance¶

A path string for the balance directory. Example is ‘balance’

Type: str

path_trainclasses¶

A path string for the trainclasses directory. Example is ‘trainclasses’

Type: str

path_prediction¶

A path string for the prediction directory. Example is ‘prediction’

Type: str

path_evaluate¶

A path string for the evaluation directory. Example is ‘evaluate’

Type: str

path_map¶

A path string for the map directory. Example is ‘map’

Type: str

path_plot¶

A path string for the plot directory. Example is ‘plot’

Type: str

loaded_results_wells_df¶

A path string for the loaded wells with top curves dataframe. Example is “loaded_wells_wTopsCurves”

Type: str

split_results_wells_df¶

A path string for the dataframe of loaded wells with top curves, after splitting. Example is “wells_wTopsCurvesSplits”

Type: str

wellsKNN_results_wells_df¶

A path string for the dataframe of loaded wells with top curves, split and with KNN features. Example is “wells_wTopsCurvesSplitsKNN”

Type: str

features_results_wells_df¶

A path string for the dataframe of loaded wells with top curves, split, with KNN features and the main features from the features.py module. Example is “wells_wTopsCurvesSplitsKNNFeatures”

Type: str

balance_results_wells_df¶

A path string for the dataframe of loaded wells with top curves, split, with KNN features, features from features.py, and rebalanced classes. Example is “wells_wTopsCurvesSplitsKNNFeaturesBalance”

Type: str

trainclasses_results_model¶

A path string for the trained model. Example is “model_trainclasses_wTopsCurvesSplitsKNNFeaturesBalance”

Type: str

make_all_directories()[source]¶

A function that makes all the directories defined in the attributes of the output_data() class init function. Examples of directories made include: [self.path_checkData,self.path_load,self.path_split,self.path_wellsKNN,self.path_features, self.path_balance,self.path_trainclasses,self.path_prediction,self.path_evaluate,self.path_map]

Parameters: none (none) – None.
Returns: none – The function does not return anything though it does print all the directories it creates, whether they already exist, and the base results directory created by running this function.
Return type: none
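The directory-creation behavior described here can be approximated with os.makedirs. This is a sketch under assumptions rather than the method itself: make_directories is a hypothetical standalone function, it returns the created paths only so the demo can check them, and it runs in a temporary folder instead of ‘../results/’.

```python
import os
import tempfile


def make_directories(base_path, subfolders):
    """Create `base_path` and each subfolder, printing whether each already existed."""
    created = []
    for name in subfolders:
        full_path = os.path.join(base_path, name)
        already_there = os.path.isdir(full_path)
        os.makedirs(full_path, exist_ok=True)  # also creates base_path if missing
        print(full_path, "(already existed)" if already_there else "(created)")
        created.append(full_path)
    return created


# Demo in a throwaway directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    paths = make_directories(os.path.join(tmp, "results"), ["checkData", "load", "split"])
    all_exist = all(os.path.isdir(p) for p in paths)
```

Using exist_ok=True makes the call safe to repeat, so rerunning a step does not fail when its output folder is already present.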
