predictatops package

Submodules

predictatops.all_runner module

predictatops.balance module

predictatops.balance.get_features_df_results(output_data_inst)[source]

Takes in an output_data instance and returns the dataframe of results saved in the features step.

predictatops.balance.takeInDFandSplitIntoTrainTestDF(df, config)[source]

Splits a single input dataframe into two dataframes, one train and one test, based on the train/test values already assigned in a column. Assumes that the column named by config.trainOrTest contains only the values “train” or “test”, with nothing else and no capitalization. A test should be written for this!
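A minimal pandas sketch of this split. The column name is passed in directly (a hypothetical `col` argument standing in for config.trainOrTest), and the helper checks the stated assumption about allowed values, which is one way the test mentioned above could start. It is an illustration, not the package's implementation:

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, col: str = "trainOrTest"):
    """Return (train_df, test_df) based on the values in `col` (hypothetical helper)."""
    found = set(df[col].unique())
    # Enforce the documented assumption: only lowercase "train" / "test".
    unexpected = found - {"train", "test"}
    if unexpected:
        raise ValueError(f"Unexpected values in {col!r}: {sorted(map(str, unexpected))}")
    train_df = df[df[col] == "train"]
    test_df = df[df[col] == "test"]
    return train_df, test_df
```

Raising on anything other than the two expected strings makes a mislabeled or capitalized value fail loudly instead of silently dropping rows.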

predictatops.balance.countRowsByClassOfNearPickOrNot(df, arrayOfClass, divisionInt, classToShrink)[source]

Takes as input a dataframe, an array of classes, an integer to divide by, and a class to shrink. Returns the dataframe minus the rows that match classToShrink, and prints details about the number of rows of the various classes.

predictatops.balance.getListOfKeysForZonesObj(config)[source]

Takes in the configuration object and returns the keys of the zonesAroundTops object in the configuration class.

predictatops.balance.dropsRowsWithMatchClassAndDeptRemainderIsZero(df, Col, RemainderInt, classToShrink)[source]

Takes as input a dataframe, a column, a remainder integer, and a class within the column. Returns the dataframe minus the rows that match classToShrink in Col and have a depth in the DEPT column with a remainder of zero.
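The remainder rule thins out an over-represented class by dropping every row of that class that sits on a depth evenly divisible by the given integer. A minimal pandas sketch, assuming a depth column named DEPT; the helper name is hypothetical:

```python
import pandas as pd


def drop_class_rows_at_remainder_zero(df, col, remainder_int, class_to_shrink, depth_col="DEPT"):
    """Drop rows whose `col` equals `class_to_shrink` and whose depth % remainder_int == 0."""
    is_class = df[col] == class_to_shrink
    at_remainder_zero = (df[depth_col] % remainder_int) == 0
    # Keep every row that is not BOTH in the shrink class and on a remainder-zero depth.
    return df[~(is_class & at_remainder_zero)]
```

Rows of other classes are never dropped, even when their depth has a remainder of zero.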

predictatops.balance.addsRowsToBalanceClasses(df, rangeFor100, rangeFor95)[source]

Takes as input a dataframe, a range for class 100, and a range for class 95. Copies the rows with labels that don’t occur very often so they make up a larger part of the dataframe. Returns the new dataframe with the additional copies of rows added on.
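Oversampling by duplication can be sketched with pandas as below. For illustration it assumes that each class's "range" means a number of extra copies to append and that the label column is class_DistFrPick_TopTarget (the name used in the columns_to_use_as_labels example); the helper name is hypothetical:

```python
import pandas as pd


def add_copies_of_rare_classes(df, copies_by_class, class_col="class_DistFrPick_TopTarget"):
    """Append `n` extra copies of every row whose class is a key of `copies_by_class`."""
    extra = [
        df[df[class_col] == cls]
        for cls, n_copies in copies_by_class.items()
        for _ in range(n_copies)
    ]
    return pd.concat([df, *extra], ignore_index=True)
```

For example, `add_copies_of_rare_classes(df, {100: 100, 95: 40})` would mirror the default multipliers listed for the configuration class.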

predictatops.balance.findNumberOfEachClass(df, col)[source]
predictatops.balance.takeOutColNotNeededInTrainingDF(df, list_allCol, colToTakeOutCurves, colToTakeOutOther)[source]
predictatops.balance.combineRebalancedTrainDFWithUnrebalancedTestDF(df_train_featWithHighCount, df_test_featWithHighCount)[source]

Combines the rebalanced train dataframe with the unrebalanced test dataframe to make a single dataframe that is then split into 4 pieces: train data, train labels, test data, and test labels.

predictatops.balance.makeDFofJustLabels(df, config)[source]

Returns a dataframe of just the label columns.

predictatops.balance.make4separateDF(labels, df_testPlusRebalTrain_featWithHighCount, config)[source]

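Splits the combined dataframe into 4 separate dataframes: train data, train labels, test data, and test labels.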

predictatops.balance.saveRebalanceResultsAsHDF(df_testPlusRebalTrain_featWithHighCount, train_X, train_y, test_X, test_y, train_index, test_index, output_data_inst)[source]

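Saves the rebalanced dataframe along with train_X, train_y, test_X, test_y, train_index, and test_index as HDF files, using the output locations held by output_data_inst.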

predictatops.balance_runner module

predictatops.checkdata module

class predictatops.checkdata.TopsAvailable(input_data_obj, configuration_obj)[source]

Bases: object

Class that uses the configuration and input_data class objects, plus additional user input, to find the number of available wells that have the tops we want.

Parameters
  • input_data_obj (object) – An object instantiated from the input_data class in configurationplusfiles.py that contains attributes we’ll call in this class.

  • configuration_obj (object) – An object instantiated from the configuration class in configurationplusfiles.py that contains attributes we’ll call in this class.

input

Example is input_data_obj imported from configurationplusfiles.py

Type

object

config

Example is configuration_obj imported from configurationplusfiles.py

Type

object

picks_df_noNullPicks

Default is “nothing here yet, run take_out_wells_with_no_tops or set_picks_df_noNullPicks”

Type

dataframe

wells_wAny_tops__list

Default is “nothing here yet”

Type

dataframe

wells_with_all_given_tops

Default is None. It will be populated by a function within this class.

Type

dataframe

new_wells_with_all_given_tops

Default is None. It will be populated by convertSiteIDListToUWIList() function in this class.

Type

list

Returns

none – This class contains various functions that return things, but the class itself does not return anything.

Return type

none

find_unique_tops_list()[source]

Takes the input and config objects used as parameters to initiate this class and returns a list of available tops across all the wells.

get_must_have_tops()[source]

Returns the must-have tops defined in the config object provided on initiation of this class object.

take_out_wells_with_no_tops()[source]

Takes in a picks_df and excludes any wells that have no picks or are flagged as very bad quality. THIS FUNCTION ASSUMES SOME STRUCTURES THAT MIGHT NOT EXIST IN YOUR PROJECT. It populates this class object’s self.picks_df_noNullPicks attribute with noZeroPicks[noZeroPicks.Quality != -1], which should take out the rows with no picks.
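The filtering described above can be sketched with pandas. The sketch assumes pick and quality columns named Pick and Quality (the example names given for picks_depth_col_in_picks_df and quality_col_name_in_picks_df) and that a zero pick means no pick; it is a hypothetical helper, not the method itself:

```python
import pandas as pd


def drop_rows_without_usable_picks(picks_df, pick_col="Pick", quality_col="Quality", bad_quality=-1):
    """Remove rows with a null or zero pick, then rows flagged with the bad quality value."""
    has_pick = picks_df[pick_col].notna() & (picks_df[pick_col] != 0)
    no_zero_picks = picks_df[has_pick]
    return no_zero_picks[no_zero_picks[quality_col] != bad_quality]
```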

get_picks_df_noNullPicks()[source]

Returns self.picks_df_noNullPicks, the picks dataframe with null pick rows removed.

set_picks_df_noNullPicks(picks_df_noNullPicks)[source]

Sets self.picks_df_noNullPicks to the given picks_df_noNullPicks after making sure the input argument is a dataframe.

get_df_of_top_counts_in_picks_df()[source]

Uses class attributes already established to return a dataframe of how many non-zero and non-null picks exist for each top name.

get_df_wells_with_any_top()[source]

Returns a dataframe of wells with any sort of pick.

get_number_wells_with_any_top()[source]

Returns the total number of wells with any sort of pick.

findWellsThatHaveCertainTop(top, quality_items_to_skip__list)[source]

Takes in a top and a list of quality values to skip. Returns a list of wells with the given top.

findWellsWithAllTopsGive()[source]

Takes in a list of tops and returns a list of wells that include all of those tops. If only one of the tops occurs in a well, that well is not included.
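Finding the wells that have every requested top is a set intersection. A plain-Python sketch, assuming the picks are available as (well, top) pairs; the function name and input shape are hypothetical:

```python
def wells_with_all_tops(picks, tops):
    """Return the sorted wells that have a pick for every top in `tops`.

    `picks` is an iterable of (well, top) pairs, a hypothetical stand-in
    for the rows of a picks dataframe.
    """
    if not tops:
        return []
    wells_by_top = {top: set() for top in tops}
    for well, top in picks:
        if top in wells_by_top:
            wells_by_top[top].add(well)
    # A well qualifies only if it appears under every requested top.
    return sorted(set.intersection(*wells_by_top.values()))
```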

convertSiteIDListToUWIList()[source]

Converts the list of wells by SiteID to a list of wells by UWI. This may not work well if your data is structured differently, so look at the actual code.

run_all()[source]

Runs all the functions in this class at once with default values from the config object provided at class object initiation. Returns the list wells_with_all_given_tops_by_uwi.

class predictatops.checkdata.CurvesAvailable(input_data_obj, configuration_obj)[source]

Bases: object

Class that uses the configuration and input_data class objects, plus additional user input, to find the number of available wells that have the log curves we want.

findAllCurvesInGivenWells()[source]

Finds all the curves present in the given wells.

countsOfCurves()[source]

Counts the occurrences of each curve across the wells.

getCurvesInMinNumberOfWells()[source]

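Returns the curves that are present in at least the minimum number of wells set by config.threshold_returnCurvesThatArePresentInThisManyWells.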

findWellsWithCertainCurves()[source]

Takes in an object whose keys are well names and whose values are all the curves in that well and, as the second argument, an array of plentiful curves expected to be in every well. Returns an array of wells that have the curves specified in the second argument.

run_all()[source]

Runs all included functions and returns a ____ of wells with the requested log curves.

predictatops.checkdata.findWellsWithCertainCurves(objectOfCurves, plentifulCurves)[source]

Takes in an object whose keys are well names and whose values are all the curves in that well and, as the second argument, an array of plentiful curves expected to be in every well. Returns an array of wells that have the curves specified in the second argument.
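A minimal sketch of this behavior, assuming "have the specified curves" means having all of them. It is an illustration with a hypothetical name, not the package's code:

```python
def find_wells_with_all_curves(object_of_curves, plentiful_curves):
    """Return the wells whose curve list contains every curve in `plentiful_curves`."""
    required = set(plentiful_curves)
    return [well for well, curves in object_of_curves.items() if required.issubset(curves)]
```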

predictatops.checkdata.getCurvesListWithDifferentCurveName(originalCurveList, origCurve, newCurve)[source]

Takes in a list of curves, the curve name to be replaced, and the curve name to replace it with. Returns the given curve list with the original curve name replaced by the new curve name.
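A one-line sketch of the same idea. It returns a new list rather than editing the input in place, which is an assumption about the preferred behavior; the function name is hypothetical:

```python
def replace_curve_name(original_curve_list, orig_curve, new_curve):
    """Return a new list with every occurrence of `orig_curve` replaced by `new_curve`."""
    return [new_curve if curve == orig_curve else curve for curve in original_curve_list]
```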

predictatops.checkdata.findWellsWithGivenTopsCurves(wells, wells_with_all_given_tops, wellsWithNeededCurvesList_real)[source]

NOTE: THIS FUNCTION MAY NOT BE USED DUE TO OTHER CHANGES IN THE CODE. It was created to deal with wanting to find the intersection of a list of wells with SITEID only and a list of wells with UWI only.

predictatops.checkdata_runner module

predictatops.cli module

Console script for predictatops.

predictatops.configurationplusfiles module

This module sets up three objects from classes.

  • input_data() establishes where data is loaded from.

  • configuration() establishes various configuration variables used in the rest of the code.

  • output_data() establishes where data is written to.

These are intended to be changed by the configurationplusfiles_runner.py module.

class predictatops.configurationplusfiles.input_data(picks_file_path, picks_delimiter_str, path_to_logs_str)[source]

Bases: object

A class object that holds paths and other information related to input data such as log files location, top files, well information files, etc.

Parameters
picks_file_path: str

A string for the file path to the file with all the pick names and depths.

picks_delimiter_str: str

The delimiter of the file that has all the picks.

path_to_logs_str: str

The path to the directory with all the well logs.

load_wells_file()[source]

Loads the wells file into a pandas dataframe.

load_gis_file()[source]

Loads the GIS file into a pandas dataframe.

set_wells_file_path(wells_file_path_str, wells_file_delimiter)[source]

Sets the wells file path as an attribute of the object and returns the wells dataframe using load_wells_file(). The file can be txt, tsv, or csv.

set_gis_file_path(gis_file_path_str, gis_file_path_delimiter)[source]

Sets the GIS file path as an attribute of the object and returns the GIS dataframe using load_gis_file(). The file can be txt, tsv, or csv.

class predictatops.configurationplusfiles.configuration[source]

Bases: object

A class to keep configuration variables you might change between runs. That is why it has a large number of attributes listed below.

Types of information stored here include mandatory curves or mandatory tops, column names, the name of the top you’re trying to predict, etc. The object created by this class is used throughout Predictatops, so many modules reimport it.

Be careful not to change something in one module, close your code, start up later working with the next module, and expect your changes to persist, unless you saved them or wrote them into the configurationplusfiles_runner.py file.

Parameters

none (none) – None.

csv_of_well_names_wTopsCuves__name

csv_of_well_names_wTopsCuves__name

Type

str

csv_of_well_names_wTopCurves__path

csv_of_well_names_wTopsCuves__name

Type

str

must_have_curves_list

An array of strings that are curve names like [‘ILD’, ‘NPHI’, ‘GR’, ‘DPHI’, ‘DEPT’]

Type

list

curve_windows_for_rolling_features

Array of integers like [5,7,11,21]

Type

list

must_have_tops__list

A list of tops that could be integers or strings, like [13000,14000]

Type

list

target_top

A string or integer like 1300

Type

str

top_under_target

A string or integer that is the name of a top under the top you want to predict, such as 14000

Type

str

top_name_col_in_picks_df

The top name as it appears in the picks dataframe

Type

str

siteID_col_in_picks_df

The string for the siteID column in the picks dataframe like ‘SitID’

Type

str

UWI

The string for the UWI column like “UWI”

Type

str

DEPTH_col_in_featureCreation

The string for the depth column like “DEPT”

Type

str

HorID_name_col_in_picks_df

The string for the horizon ID column like “HorID”

Type

str

quality_col_name_in_picks_df

The string for the quality of the pick column like “Quality”

Type

str

picks_depth_col_in_picks_df

The string for the pick column name like ‘Pick’

Type

str

col_topTarget_Depth_predBy_NN1thick

The string for the column of the top target depth predicted by nearest neighbor thickness, like ‘topTarget_Depth_predBy_NN1thick’

Type

str

quality_items_to_skip__list

The list of integers for the pick quality values to optionally skip as not-good-quality picks. An example is [-1,0]

Type

list

test

Honestly, I forget what this is; come back and find out, but it should be “test0”

Type

str

pick_class_str

String for the top target pick prediction column, like ‘TopTarget_Pick_pred’

Type

str

threshold_returnCurvesThatArePresentInThisManyWells

The integer for the number of wells a curve has to be present in to be kept for example 2000

Type

int

max_numb_wells_to_load

Max number of wells to load out of all the wells in the directory with wells. This is used when you’re testing. Example is 1000000

Type

int

split_traintest_percent

The percentage, in 0 to 1 terms, for the train vs. test split. You give the percentage to keep. Example is 0.8

Type

float

kdtree_leaf

Levels of kdtree? default is 2

Type

int

kdtree_k

Integer for number of neighbors or K in k nearest neighbor code for finding nearby wells for each well. Default is 8

Type

int

rebalanceClassZeroMultiplier

When rebalancing, the instances of class zero are duplicated this many times. Default is 100

Type

int

rebalanceClass95Multiplier

When rebalancing, the instances of class 95 are duplicated this many times. Default is 40

Type

int

NN1_topTarget_DEPTH

The string used in the column that holds the depth of the top in the first nearest neighbor training well. For example ‘NN1_topTarget_DEPTH’

Type

str

NN1_TopHelper_DEPTH

Helper depth for calculations for NN1_topTarget_DEPTH. Example is “NN1_TopHelper_DEPTH”

Type

str

trainOrTest

String for column that holds string of either train or test. Example is ‘trainOrTest’

Type

str

colsToNotTurnToFloats

List of columns to not turn into floats during feature creation. Example is [‘UWI’, ‘SitID’, ‘trainOrTest’,’Neighbors_Obj’]

Type

list

zonesAroundTops

An object of class labels and the depth ranges around the top in which to create those classes. Example is {“100”:[0],”95”:[-0.5,0.5],”60”:[-5,0.5],”70”:[0.5,5],”0”:[]}. NOTE: The code in the createFeat_withinZoneOfKnownPick(df,config) function in features.py currently ASSUMES only 5 zone labels.

Type

object

columns_to_not_trainOn_andNotCurves

List of strings for names of columns to not train on and are not curves. Example is [‘FromBotWell’,’FromTopWel’‘rowsToEdge’,’lat’,’lng’, ‘SitID’,’TopHelper_HorID’,’TopTarget_HorID’,’TopHelper_DEPTH’,’diff_Top_Depth_Real_v_predBy_NN1thick’,’diff_TopTarget_DEPTH_v_rowDEPT’,’diff_TopHelper_DEPTH_v_rowDEPT’,’class_DistFrPick_TopHelper’,’NewWell’,’LastBitWell’,’TopWellDept’,’BotWellDept’,’WellThickness’,’rowsToEdge’,’closTopBotDist’,’closerToBotOrTop’,’Neighbors_Obj’]

Type

list

columns_to_not_trainOn_andAreCurves

list of strings for columns to not train on that are curves. Example is [‘RHOB’,’SP’,’CALI’,’COND’,’DELT’,’DENS’,’DPHI:1’,’DPHI:2’,’DT’,’GR:1’,’GR:2’,’IL’,’ILD:1’,’ILD:2’,’ILM’,’LITH’,’LLD’,’LLS’,’PHID’,’PHIN’,’RESD’,’RT’,’SFL’,’SFLU’,’SN’,’SNP’,’Sp’]

Type

list

columns_to_use_as_labels

List of strings for columns to use as labels. Example is [‘class_DistFrPick_TopTarget’,’UWI’,’trainOrTest’,’TopTarget_DEPTH’]

Type

list

set_must_have_curves(must_have_curves_in_list)[source]

Sets the must-have curves list to the given list of curve names.

get_must_have_curves(must_have_curves_in_list)[source]

Returns the must-have curves list.

set_must_have_tops__list(must_have_tops__list)[source]
get_must_have_tops__list()[source]
set_quality_items_to_skip__list(quality_items_to_skip__list)[source]
get_quality_items_to_skip__list()[source]
set_top_name_col_in_picks_df(top_name_col_in_picks_df__str)[source]
set_siteID_col_in_picks_df(sitID__str)[source]
get_siteID_col_in_picks_df()[source]
get_top_name_col_in_picks_df()[source]
set_quality_col_name_in_picks_df(Quality__str)[source]
get_quality_col_name_in_picks_df()[source]
set_picks_depth_col_in_picks_df(picks_depth_col_in_picks_df)[source]
get_picks_depth_col_in_picks_df()[source]
class predictatops.configurationplusfiles.output_data[source]

Bases: object

A class to keep information related to where output files are saved and naming conventions.

This class can also make all the directories for intermediate result files via its make_all_directories() function.

Information stored here includes all the intermediate output file paths as you run different functions and modules of Predictatops.

The object created by this class is used throughout Predictatops, so many modules reimport it.

Be careful not to change something in one module, close your code, start up later working with the next module, and expect your changes to persist, unless you saved them or wrote them into the configurationplusfiles_runner.py file.

Parameters

none (none) – None.

default_results_file_format

A base path for all results. Example is ‘../results/’

Type

str = “.h5”

path_checkData

A path string for the checkData directory. Example is ‘checkData’

Type

str

path_load

A path string for the load directory. Example is ‘load’

Type

str

path_split

A path string for the split directory. Example is ‘split’

Type

str

path_wellsKNN

A path string for the wellsKNN directory. Example is ‘wellsKNN’

Type

str

path_features

A path string for the features directory. Example is ‘features’

Type

str

path_balance

A path string for the balance directory. Example is ‘balance’

Type

str

path_trainclasses

A path string for the trainclasses directory. Example is ‘trainclasses’

Type

str

path_prediction

A path string for the prediction directory. Example is ‘prediction’

Type

str

path_evaluate

A path string for the evaluation directory. Example is ‘evaluate’

Type

str

path_map

A path string for the map directory. Example is ‘map’

Type

str

path_plot

A path string for the plot directory. Example is ‘plot’

Type

str

loaded_results_wells_df

A path string for the loaded wells with top curves dataframe. Example is “loaded_wells_wTopsCurves”

Type

str

split_results_wells_df

A path string for the dataframe of loaded wells with tops and curves, after splitting. Example is “wells_wTopsCurvesSplits”

Type

str

wellsKNN_results_wells_df

A path string for the dataframe of loaded wells with tops and curves, split, and with KNN features. Example is “wells_wTopsCurvesSplitsKNN”

Type

str

features_results_wells_df

A path string for the dataframe of loaded wells with tops and curves, split, with KNN features and the main features from the features.py module. Example is “wells_wTopsCurvesSplitsKNNFeatures”

Type

str

balance_results_wells_df

A path string for the dataframe of loaded wells with tops and curves, split, with KNN features, features from features.py, and rebalanced classes. Example is “wells_wTopsCurvesSplitsKNNFeaturesBalance”

Type

str

trainclasses_results_model

A path string for the trained model. Example is “model_trainclasses_wTopsCurvesSplitsKNNFeaturesBalance”

Type

str

make_all_directories()[source]

A function that makes all the directories defined in the attributes of the output_data() class init function. Examples of directories made include: [self.path_checkData,self.path_load,self.path_split,self.path_wellsKNN,self.path_features, self.path_balance,self.path_trainclasses,self.path_prediction,self.path_evaluate,self.path_map]

Parameters

none (none) – None.

Returns

none – The function does not return anything though it does print all the directories it creates, whether they already exist, and the base results directory created by running this function.

Return type

none
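A minimal standard-library sketch of the same idea. It assumes the directory names are passed in rather than read from the object's attributes, and the standalone function is hypothetical:

```python
import os


def make_all_directories(base_results_dir, subdirectories):
    """Create base_results_dir/<name> for each name, printing what happened."""
    for name in subdirectories:
        path = os.path.join(base_results_dir, name)
        already_existed = os.path.isdir(path)
        # exist_ok=True makes reruns safe when a directory is already there.
        os.makedirs(path, exist_ok=True)
        print(("already exists: " if already_existed else "created: ") + path)
```

Because of `exist_ok=True`, running it a second time only reports the existing directories.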

predictatops.configurationplusfiles_runner module

predictatops.features module

predictatops.features.getMainDFsavedInStep(path_to_results, path_to_directory, file_name, ending)[source]

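Takes in the path to the results folder, a directory within it, a file name, and a file ending. See main.getMainDFsavedInStep, which has the same signature, for details.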

predictatops.features.load_prev_results_at_path(full_path_to_results_file, key='df')[source]

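Takes in the full path to a results .h5 file and a key (default “df”) and returns the dataframe stored under that key. See main.load_prev_results_at_path, which has the same signature.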

predictatops.features.get_wellsKNN_results(output_data_inst)[source]

Takes in an output_data instance and returns the dataframe of results saved in the wellsKNN step.

predictatops.features.get_split_curve_results(output_data_inst)[source]

Takes in an output_data instance and returns the dataframe of curve results saved in the split step.

predictatops.features.mergeCurvesAndTopsDF(wells_df_from_split_curveData, wells_df_from_wellsKNN, config)[source]

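Takes in the curve data dataframe from the split step, the dataframe from the wellsKNN step, and the config object, and returns a single merged dataframe.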

predictatops.features.convertAllColButGivenToFloat(config, df_all_wells_wKNN)[source]

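Takes in the config object and a dataframe of wells, and returns the dataframe with all columns except those listed in config.colsToNotTurnToFloats converted to floats.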
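Converting every column except a skip list (the role config.colsToNotTurnToFloats plays, per its description) to floats can be sketched with pandas. Coercing unparseable values to NaN is an assumption, not documented behavior, and the helper name is hypothetical:

```python
import pandas as pd


def convert_all_but_given_to_float(df, cols_to_not_turn_to_floats):
    """Return a copy of df with every column not listed converted to float."""
    out = df.copy()
    for col in out.columns:
        if col not in cols_to_not_turn_to_floats:
            # errors="coerce" turns unparseable values into NaN instead of raising.
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)
    return out
```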

predictatops.features.takeLASOffUWI(df, config)[source]

Changes a UWI string ending in .LAS to just the UWI string.
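A minimal pandas sketch, assuming the column is named UWI and that only a trailing extension (in any letter case) should be removed; the helper name is hypothetical:

```python
import pandas as pd


def take_las_off_uwi(df, uwi_col="UWI"):
    """Return a copy of df with a trailing .LAS (any case) removed from the UWI column."""
    out = df.copy()
    out[uwi_col] = out[uwi_col].str.replace(r"\.las$", "", case=False, regex=True)
    return out
```

Anchoring the pattern with `$` leaves any ".LAS" that appears mid-string untouched.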

predictatops.features.convertSiteIDListToUWIList(input_data_inst, df_with_sitID)[source]

Converts a list of wells by SiteID to a list of wells by UWI.

predictatops.features.createDepthRelToKnownTopInSameWell(df)[source]

Creates columns for how close a row is (based on depth) to the official pick for that well. We’ll be doing this for Top and Base McMurray in the example. Returns the input dataframe with additional column(s).

IT SHOULD BE NOTED THAT THE ‘correct’ PICK DEPTHS IN MANY CASES DO NOT PERFECTLY MATCH THE DEPTHS AVAILABLE IN THE LOGS. In other words, the pick might be 105 but there is no row with 105.00 depth, only a 104.98 and a 105.02! This matters for what you count as a correct label!
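A pandas sketch of two ideas from this function's description: a signed distance from each row to the known pick, and a flag for the row nearest the pick in each well, since an exact depth match may not exist. Column names follow the config examples (DEPT, TopTarget_DEPTH, diff_TopTarget_DEPTH_v_rowDEPT, UWI). The flag column name and the row-minus-pick sign convention are hypothetical, and tied rows would both be flagged:

```python
import pandas as pd


def add_depth_relative_to_pick(df, depth_col="DEPT", pick_col="TopTarget_DEPTH", well_col="UWI"):
    """Add the signed row-minus-pick distance and flag each well's nearest row to its pick."""
    out = df.copy()
    out["diff_TopTarget_DEPTH_v_rowDEPT"] = out[depth_col] - out[pick_col]
    abs_diff = out["diff_TopTarget_DEPTH_v_rowDEPT"].abs()
    # Picks rarely land exactly on a log sample, so mark the closest sample instead.
    out["nearestRowToPick"] = abs_diff == abs_diff.groupby(out[well_col]).transform("min")
    return out
```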

predictatops.features.createFeat_withinZoneOfKnownPick(df, config)[source]

Takes as input, first, a dataframe with tops and curve data for feature creation after the wellsKNN step, and second, a dict consisting of keys that are the labels for each zone and values that are a list with two items, the min and max for that zone. For example: {100:[0],95:[-0.5,0.5],60:[-5,0.5],70:[0.5,5],0:[]}. NOTE: The code in the createFeat_withinZoneOfKnownPick(df,config) function in features.py currently ASSUMES only 5 zone labels.

Creates a column with a number that symbolizes whether or not a row is close to the ‘real’ pick. We’ll do this first for Top McMurray and then for Top Paleozoic, which is basically Base McMurray.
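Assigning a zone label from a row's distance to the pick can be sketched in plain Python using the documented zonesAroundTops example. Several things here are assumptions: the sign convention (row depth minus pick depth), inclusive bounds, checking labels in dictionary order so that narrower zones listed first win where zones overlap, and treating a single-value list as an exact match. The names are hypothetical:

```python
# Shape follows the documented zonesAroundTops example; string keys are kept.
ZONES_AROUND_TOPS = {"100": [0], "95": [-0.5, 0.5], "60": [-5, 0.5], "70": [0.5, 5], "0": []}


def zone_label(distance, zones=ZONES_AROUND_TOPS, default=0):
    """Return the first zone label whose bounds contain `distance`.

    distance: row depth minus pick depth (assumed sign convention).
    """
    for label, bounds in zones.items():
        if len(bounds) == 1 and distance == bounds[0]:
            return int(label)
        if len(bounds) == 2 and bounds[0] <= distance <= bounds[1]:
            return int(label)
    # Rows outside every defined zone get the default class (the "0" zone).
    return default
```

On a dataframe, the same function could be applied row-wise with `Series.map(zone_label)` on a distance column.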

predictatops.features.NN1_TopMcMDepth_Abs(df, config)[source]

Takes MM_Top_Depth_predBy_NN1thick, subtracts the depth at that point, and returns the absolute value.

predictatops.features.markingEdgeOfWells(df, config)[source]

The difficult thing about creating features based on windows within a well, when you have multiple wells stacked in a dataframe, is that sometimes the window from one well goes into the next well. To get around that, we’re going to create a column that says the distance from the top of the well and another column that says the distance from the bottom of the well. When a row’s distance from the top or bottom is greater than 1/2 the max window size, we’ll just proceed as normal. When the distance between that row’s depth and the top or bottom is less than 1/2 the max window size, we’ll
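The edge-distance columns described here can be sketched with pandas by counting rows from each end of every well. The sketch assumes each well's rows are contiguous and sorted by depth, measures distance in rows rather than depth units, and uses hypothetical column names apart from rowsToEdge, which appears in the config examples:

```python
import pandas as pd


def mark_edge_of_wells(df, max_window, well_col="UWI"):
    """Add rows-from-top, rows-from-bottom, and rowsToEdge for each well in a stacked df."""
    out = df.copy()
    grouped = out.groupby(well_col, sort=False)
    out["rowsFromTop"] = grouped.cumcount()
    out["rowsFromBot"] = grouped.cumcount(ascending=False)
    out["rowsToEdge"] = out[["rowsFromTop", "rowsFromBot"]].min(axis=1)
    # A centered window of max_window rows stays inside the well only this far from an edge.
    out["windowFitsInWell"] = out["rowsToEdge"] >= max_window // 2
    return out
```

Rows where `windowFitsInWell` is False are the ones that need the special handling this docstring describes.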

predictatops.features.nLargest(array, nValues)[source]

Returns the nValues largest values in array.

predictatops.features.thoughts_seperateRollingAndConditionalIntoTwoDaskProcesses(dd, curves, windows)[source]

Runs a for loop over each combination of parameters for the rolling functions, for example curves = [‘GR’,’ILD’], windows = [5,7,11,21], and directions = [“around”,”below”,”above”].

Not sure of the best way to do the ‘below’ centered rolling in Dask, as sort_index is expensive in Dask and so might be slow! Skipping this for now; will come back when not tired. Maybe use shift?

For each column created, check the window size against the allowable window size column; if it is too small, use the single row value from the original column.
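A pandas (not Dask) sketch of rolling features over several curves and windows in the three directions named above. It groups by well so windows never cross into the next well, which is a different approach from the edge-distance check, and it assumes rows are sorted shallow to deep within each well, so "above" means the preceding rows. All names are hypothetical:

```python
import pandas as pd


def rolling_mean_features(df, curves, windows, well_col="UWI"):
    """Add centered, above (trailing), and below (leading) rolling means per well."""
    out = df.copy()
    grouped = out.groupby(well_col, sort=False)
    for curve in curves:
        for w in windows:
            # "around": window centered on the row.
            out[f"{curve}_mean_around_{w}"] = grouped[curve].transform(
                lambda s: s.rolling(w, center=True, min_periods=1).mean())
            # "above": the row plus up to w-1 shallower rows.
            out[f"{curve}_mean_above_{w}"] = grouped[curve].transform(
                lambda s: s.rolling(w, min_periods=1).mean())
            # "below": the row plus up to w-1 deeper rows, via a reversed trailing window.
            out[f"{curve}_mean_below_{w}"] = grouped[curve].transform(
                lambda s: s.iloc[::-1].rolling(w, min_periods=1).mean().iloc[::-1])
    return out
```

`min_periods=1` lets rows near a well's edge use a shorter window instead of producing NaN.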

predictatops.features.createManyFeatFromCurvesOverWindows_withDask(df, config)[source]

Creates many features from the curves over rolling windows, using Dask.

predictatops.features.createManyFeatFromCurvesOverWindows_withOutDask(df, config)[source]

Creates many features from the curves over rolling windows, without using Dask.

predictatops.features_runner module

predictatops.fetch_demo_data module

The fetch_demo_data.py script is used to fetch the demo data using the pooch data fetching library. This is the only module that executes when run but doesn’t have a “_runner” ending to its name.

Alternatively, you can use your own data in a top-level “data” directory and skip using this script entirely.

The script imports pooch, then fetches the demo dataset from: “https://github.com/JustinGOSSES/predictatops/raw/{version}/demo/mannville_demo_data/”. That link is defined in the registry.txt file. Pooch is used to create a GOODBOY object instance, which then executes the fetching using the fetch_mannville_data() function in this Python file.

NOTE: This was changed since version 1 to pull in a single zip file and unzip it as that’s faster than pulling in each file unzipped individually!

predictatops.fetch_demo_data.fetch_mannville_data()[source]

Loads all required Mannville Group data and metadata for the demo data. Fetches the path to a file in local storage. If it’s not there, it is downloaded.

Parameters

none (none) – It does not take any parameters, but it assumes there is a registry.txt in the same directory that contains the name, hash, and location of the file to load.

Returns

– Returns nothing but three dots, which Python reads as an Ellipsis. It does, however, write files to the data directory, or to whatever directory is set in the goodboy instance created in fetch_demo_data.py by the pooch.create() call.

Return type

ellipsis

predictatops.load module

predictatops.load.makeDF(well_list)[source]

Changes format of well list into a pandas dataframe with one column called “UWI_file”.

predictatops.load.find_number_well_files_in_a_folder(path_to_wells, file_ending)[source]

Takes in a path to a directory and a file ending we’re searching for. Returns the number of files in that directory with that file ending.
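A standard-library sketch of the file counting, plus the one-column dataframe that makeDF describes. The function names are hypothetical, only files directly inside the directory are counted, and the ending match is case-sensitive, which is an assumption:

```python
from pathlib import Path

import pandas as pd


def count_well_files(path_to_wells, file_ending):
    """Count the files directly inside path_to_wells whose names end with file_ending."""
    return sum(
        1 for p in Path(path_to_wells).iterdir() if p.is_file() and p.name.endswith(file_ending)
    )


def make_well_df(well_list):
    """Wrap a list of well file names in a dataframe with one "UWI_file" column."""
    return pd.DataFrame({"UWI_file": well_list})
```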

predictatops.load.load_all_wells_in(wells_df, max_numb_wells, path_to_wells, file_ending)[source]

Takes in a dataframe of well names called wells_df, the max number of wells (if we’re testing and don’t want to bother with all the wells), the path to the directory with well files, and the file ending for well files, like .LAS. Returns a list with two dicts: one of successfully imported wells, the other of wells that failed to import for various reasons. This is then further processed by turn_dict_of_well_dfs_to_single_df() in the next step.

predictatops.load.turn_dict_of_well_dfs_to_single_df(dictOfWellDf)[source]

Takes in a dict of dataframes, where each dataframe is for a well and was created by lasio. The dict is likely created by the load_all_wells_in function and is the first item in its returned list. Returns a single dataframe of all wells.
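A sketch of that concatenation with pandas. It assumes each well's dataframe is indexed by depth (as lasio's LASFile.df() produces) and that the dict keys are the well identifiers to store in a UWI column; the function name is hypothetical:

```python
import pandas as pd


def dict_of_well_dfs_to_single_df(dict_of_well_dfs, well_col="UWI"):
    """Stack per-well dataframes into one, moving depth out of the index."""
    frames = []
    for uwi, well_df in dict_of_well_dfs.items():
        frame = well_df.reset_index()  # depth index becomes a regular column
        frame[well_col] = uwi
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

Wells with different curve sets still concatenate, with NaN filling the curves a well lacks.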

predictatops.load_runner module

predictatops.main module

The main.py module of predictatops merely holds a few utility functions leveraged by other modules.

predictatops.main.printHello()[source]

This function simply prints a hello message for testing.

predictatops.main.load_prev_results_at_path(full_path_to_results_file, key='df')[source]

A function used to return a dataframe of wells stored in an h5 file at a given path with a given key.

Parameters
  • full_path_to_results_file (string) – A path to a .h5 file that contains a wells dataframe.

  • key (string) – A string representation of a key used to find the dataframe in the h5 file whose path is defined by the full_path_to_results_file argument.

Returns

wells_df_from_wellsKNN – Returns a dataframe of wells that existed at the path defined in the full_path_to_results_file argument.

Return type

dataframe

predictatops.main.getMainDFsavedInStep(path_to_results, path_to_directory, file_name, ending)[source]

A function used to return a dataframe of data stored in a file at a given path. Not specific to a dataframe of wells in h5 file like load_prev_results_at_path.

Parameters
  • path_to_results (string) – A path to a top-level results folder.

  • path_to_directory (string) – A path to a folder within the results folder that has the file in question.

  • file_name (string) – A path to a file within the path_to_results and path_to_directory arguments.

  • ending (string) – String representation of the file type like “.h5” or “.csv”. It should include the dot!

Returns

full_path_to_results_file – Returns a string representation of the full path to the file in question.

Return type

string

predictatops.main.get_df_results_from_step_X(output_data_inst, directory, filename, key='df')[source]

Another function used to return a dataframe stored in an h5 file at a given path with a given key.

Parameters
  • output_data_inst (string) – A path to a folder with previously output data.

  • directory (string) – A folder within the directory defined by the ‘output_data_inst’ that holds a file.

  • filename (string) – Name of the file in question.

  • key (string) – A string representation of a key used to find the dataframe in the h5 file. Default is “df”.

Returns

wells_df_of_results – Returns a dataframe of wells that existed at the path defined via the given input arguments.

Return type

dataframe

predictatops.main.getJobLibPickleResults(output_data_inst, subfolder, filename)[source]

Another function, used to generate the string representation of the path to a pickle file and then return that pickled data file.

Parameters
  • output_data_inst (string) – A path to a folder with previously output data.

  • subfolder (string) – A folder within the directory defined by the ‘output_data_inst’ that holds a file.

  • filename (string) – Name of the file in question.

Returns

joblib.load(full_path_to_pickle) – Returns a dataframe that exists at the path defined via the given input arguments.

Return type

dataframe

predictatops.map module

predictatops.plot module

predictatops.plot.depth_color(depth)[source]
predictatops.plot.depth_color3(depth, colorMap)[source]
predictatops.plot.makeMap_1(no_zeros_df)[source]
predictatops.plot.saveFoliumMap(map_m5)[source]

predictatops.plot_runner module

predictatops.predictionclasses module

predictatops.predictionclasses.loadMLinstanceAndModel(output_data_inst)[source]
class predictatops.predictionclasses.class_accuracy(ML)[source]

Bases: object

This class holds several functions for calculating the accuracy of the class-identification model. It takes in, as the initiation argument, an instance of the ML_obj_class, which contains all the necessary data already processed, with features created and ready for the machine-learning task. On creation it initiates a variety of class instance attributes that mirror those created in the ML_obj_class class. There are 5 functions. The help function prints some explanatory text. The rest predict a dataframe from a trained model, reformat some of the input data so it can be combined, calculate accuracy, and, in a final function, run the last three if you don’t want to run them individually. The last two functions return an accuracy number as the percentage of class rows or instances the model predicted correctly.

help()[source]
predict_from_model(model, df_X_toPredict)[source]

The predict_from_model function takes as arguments a model that is already trained on training data (in the demo case, a scikit-learn XGBoost model) and the dataframe of the columns to predict. From this, it fills in the self.result_df_from_prediction attribute and returns nothing.

first_Reformat(train_y, TopTarget_Pick_pred)[source]
accuracy_calc(train_y, TopTarget_Pick_pred, class_DistFrPick_TopTarget)[source]
run_all(model, df_X_toPredict, train_y, TopTarget_Pick_pred, class_DistFrPick_TopTarget)[source]
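As described for class_accuracy, accuracy is the percentage of class rows the model predicted correctly. A plain-Python sketch of that calculation, with a hypothetical function name:

```python
def accuracy_percent(true_classes, predicted_classes):
    """Percentage of rows whose predicted class equals the true class."""
    if len(true_classes) != len(predicted_classes):
        raise ValueError("inputs must have the same length")
    if len(true_classes) == 0:
        raise ValueError("inputs must not be empty")
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return 100.0 * correct / len(true_classes)
```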
class predictatops.predictionclasses.InputDistClassPrediction_to_BestDepthForTop(output_data_inst)[source]

Bases: object

Turns per-row distance class predictions from a model into a single best depth for the top in each well.

help()[source]
load_MLobj(MLobj)[source]
predict_from_model(model, df_X_toPredict)[source]

The predict_from_model function takes as arguments a model that is already trained on training data (in the demo case, a scikit-learn XGBoost model) and the dataframe of the columns to predict. From this, it fills in the self.result_df_from_prediction attribute and returns nothing.

load_dist_class_pred_df(dist_class_pred_df)[source]

Loads a distance class prediction dataframe into this class instance.

concat_modelResultsNDArray_w_indexValues(distClassModel_resultsNDArry, train_or_test, col_name_prediction)[source]
concat_step2(MLobj, train_or_test, cols_to_keep_list)[source]
calc_pred_vs_real_top_dif(df, depth_str, pick_pred_class_str, UWI_str, rollingWindows, predClasses)[source]
Function takes in:

A dataframe with predictions and dataframe with UWIs and known pick depths. Dataframes may not be same length but df 2 must have all UWIs in df 1.

Function returns:

A column for the predicted dataframe with a calculated single prediction depth pick based on the median row technique.

A column for the predicted dataframe with a calculated single prediction depth pick based on rolling means of the classes predicted for each row.

THESE BELOW ARE NOTE YET IMPLIMENTED!

A new dataframe that is just one row per well and includes as col of UWIs, known picks, predicted picks, and difference A new col in the new df that has high and low error by some metric? A score of mean abosolute error across all wells in the given dataframe 1.
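The docstring does not define the median row technique. One plausible reading, used here purely as an assumption, is: within each well, take the median depth of the rows predicted as the class closest to the pick. The sketch uses class 100 for that label, borrowing the class names used elsewhere in the package, and represents rows as (uwi, depth, predicted_class) tuples instead of a dataframe. median_row_pick and pick_differences are hypothetical names.

```python
from collections import defaultdict
from statistics import median


def median_row_pick(rows, pick_class=100):
    """Hypothetical median-row pick: per well, the median depth of the rows
    predicted as pick_class.

    rows: iterable of (uwi, depth, predicted_class) tuples.
    Returns {uwi: depth}; wells with no row predicted as pick_class are omitted.
    """
    depths_by_well = defaultdict(list)
    for uwi, depth, predicted_class in rows:
        if predicted_class == pick_class:
            depths_by_well[uwi].append(depth)
    return {uwi: median(depths) for uwi, depths in depths_by_well.items()}


def pick_differences(predicted, known):
    """Predicted minus known pick depth for each well in `predicted`.

    `known` may hold extra wells but must contain every UWI in `predicted`.
    """
    return {uwi: predicted[uwi] - known[uwi] for uwi in predicted}


rows = [
    ("W1", 400.0, 0), ("W1", 400.5, 100), ("W1", 401.0, 100), ("W1", 401.5, 100),
    ("W2", 250.0, 95), ("W2", 250.5, 100), ("W2", 251.0, 100),
]
picks = median_row_pick(rows)
diffs = pick_differences(picks, {"W1": 401.2, "W2": 250.5, "W9": 999.0})
print(picks)  # {'W1': 401.0, 'W2': 250.75}
print(diffs)
```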

run_all(MLobj, model, trainOrTest_str, cols_to_keep_list, depth_str, pick_pred_class_str, UWI_str, rollingWindows, predClasses)[source]

Runs two functions. The first takes the dataframe that results from model.predict(df_X_toPredict); the second takes depth_str, pick_pred_class_str, UWI_str, rollingWindows, and predClasses. Together they create rolling means and median distance-class values across rolling windows of different sizes.
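The rolling-window step can be sketched under one stated assumption: for each window size, the pick is the depth where the trailing rolling mean of the predicted class peaks. rolling_mean and rolling_picks are hypothetical names, and the windows argument plays the role of rollingWindows. Rows are (uwi, depth, predicted_class) tuples rather than a dataframe.

```python
from collections import defaultdict


def rolling_mean(values, window):
    """Trailing rolling mean; positions before a full window are None
    (mirroring pandas' default of NaN until a full window is available)."""
    return [
        sum(values[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


def rolling_picks(rows, windows):
    """For each well and window size, the depth where the rolling mean of the
    predicted class is highest (the shallowest such depth on ties).

    rows: iterable of (uwi, depth, predicted_class). Returns {uwi: {window: depth}}.
    """
    by_well = defaultdict(list)
    for uwi, depth, predicted_class in rows:
        by_well[uwi].append((depth, predicted_class))
    picks = {}
    for uwi, pairs in by_well.items():
        pairs.sort()  # order each well's rows by depth
        depths = [d for d, _ in pairs]
        classes = [c for _, c in pairs]
        picks[uwi] = {}
        for window in windows:
            means = rolling_mean(classes, window)
            scored = [(m, i) for i, m in enumerate(means) if m is not None]
            if scored:  # skip windows longer than the well
                best_index = max(scored, key=lambda s: s[0])[1]
                picks[uwi][window] = depths[best_index]
    return picks


rows = [("W1", 100.0 + 0.5 * i, c) for i, c in enumerate([0, 0, 95, 100, 100, 0])]
print(rolling_picks(rows, [1, 3]))  # {'W1': {1: 101.5, 3: 102.0}}
```

Because the window trails, the assigned depth sits at the deep end of each window; a centered window is an equally reasonable choice.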

class predictatops.predictionclasses.accuracy_singleTopPerWellPrediction_fromRollingRules(ML, vs, distClassDF_wRollingCols_training)[source]

Bases: object

Calculates accuracy on a per-well basis after applying rolling-mean analysis to the per-depth-point scores from the machine-learning classification of distance class.
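A sketch of per-well scoring with plain dictionaries follows. per_well_scores is hypothetical and only mirrors, under assumptions, the roles of this class's methods: it optionally drops wells whose predicted classes all fall in a given set (read here as the meaning of dropIfOnlyClasses, default [0]), compares one predicted top per well with the real top, and reports mean absolute error and R², computed as 1 - SS_res/SS_tot.

```python
from statistics import mean


def per_well_scores(pred_tops, real_tops, pred_classes_by_well=None,
                    drop_if_only_classes=(0,)):
    """Score one predicted top per well against the real top (hypothetical helper).

    pred_tops, real_tops: {uwi: depth}; real_tops must hold every uwi in pred_tops.
    pred_classes_by_well: optional {uwi: predicted classes for that well's rows};
    a well whose classes all fall in drop_if_only_classes is excluded.
    Returns (mean absolute error, R^2, wells used).
    """
    wells = sorted(pred_tops)
    if pred_classes_by_well is not None:
        drop = set(drop_if_only_classes)
        wells = [w for w in wells if not set(pred_classes_by_well[w]) <= drop]
    if len(wells) < 2:
        raise ValueError("need at least two wells to compute R^2")
    y_true = [real_tops[w] for w in wells]
    y_pred = [pred_tops[w] for w in wells]
    mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all real tops are equal")
    return mae, 1.0 - ss_res / ss_tot, wells


pred = {"W1": 401.0, "W2": 250.75, "W3": 300.0}
real = {"W1": 401.2, "W2": 250.5, "W3": 320.0}
classes = {"W1": [0, 100, 100], "W2": [0, 95, 100], "W3": [0, 0, 0]}
mae, r2, wells = per_well_scores(pred, real, classes)
print(wells)  # ['W1', 'W2']  (W3 predicted only class 0)
print(round(mae, 3), round(r2, 5))
```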

help()[source]
load_variables_obj()[source]
optionallyExcludeWellsWithoutStrongPredictions(keepAllWells=None, dropIfOnlyClasses=[0])[source]
reduceDFtoOneBestTopPredictionPerWell(TopTarget_Pick_pred_DEPT_pred)[source]

Reduces the dataframe to a single best top prediction per well.

reduceDFtoOriginalTopPerWell(TopTarget_DEPTH)[source]

Reduces the dataframe to the original (known) top for each well.

r2_func()[source]

Calculates the R² score of the predicted tops against the known tops.

mean_absolute_error_func()[source]

Calculates the mean absolute error of the predicted tops against the known tops.

compare_RealTop_vsTopFromRollingMean()[source]

Compares the known top in each well with the top predicted from the rolling means.

run_all(TopTarget_Pick_pred_DEPT_pred, TopTarget_DEPTH, keepAllWells='no', dropIfOnlyClasses=[0])[source]
mean_absolute_error(y_pred, sample_weight=None, multioutput='uniform_average')

Mean absolute error regression loss

Read more in the User Guide.

Parameters
  • y_true (array-like of shape = (n_samples) or (n_samples, n_outputs)) – Ground truth (correct) target values.

  • y_pred (array-like of shape = (n_samples) or (n_samples, n_outputs)) – Estimated target values.

  • sample_weight (array-like of shape = (n_samples), optional) – Sample weights.

  • multioutput (string in ['raw_values', 'uniform_average'] or array-like of shape (n_outputs)) – Defines how multiple output values are aggregated. An array-like value defines the weights used to average errors.

    'raw_values' :

    Returns a full set of errors in case of multioutput input.

    'uniform_average' :

    Errors of all outputs are averaged with uniform weight.

Returns

loss – If multioutput is ‘raw_values’, then mean absolute error is returned for each output separately. If multioutput is ‘uniform_average’ or an ndarray of weights, then the weighted average of all output errors is returned.

MAE output is non-negative floating point. The best value is 0.0.

Return type

float or ndarray of floats
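Written out, the quantity this function computes is as follows. Here n is the number of samples, m the number of outputs, w_i the sample weights (all 1 when sample_weight is None), and v_j the array-like multioutput weights.

```latex
\mathrm{MAE}_j = \frac{\sum_{i=1}^{n} w_i \,\lvert y_{ij} - \hat{y}_{ij} \rvert}{\sum_{i=1}^{n} w_i},
\qquad j = 1, \dots, m

\text{'raw\_values'}: \ (\mathrm{MAE}_1, \dots, \mathrm{MAE}_m)
\qquad
\text{'uniform\_average'}: \ \frac{1}{m} \sum_{j=1}^{m} \mathrm{MAE}_j
\qquad
\text{weights } v: \ \frac{\sum_{j=1}^{m} v_j \,\mathrm{MAE}_j}{\sum_{j=1}^{m} v_j}
```

For the two-output examples, the raw values are 0.5 and 1.0. The uniform average is therefore 0.75, and the weights [0.3, 0.7] give 0.3 × 0.5 + 0.7 × 1.0 = 0.85, which agrees with the third example up to floating-point display.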

Examples

>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_error(y_true, y_pred)
0.5
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> mean_absolute_error(y_true, y_pred)
0.75
>>> mean_absolute_error(y_true, y_pred, multioutput='raw_values')
array([0.5, 1. ])
>>> mean_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7])
... # doctest: +ELLIPSIS
0.849...
predictatops.predictionclasses.saveRebalanceResultsAsHDFs(df_testPlusRebalTrain_featWithHighCount, train_X, train_y, test_X, test_y, train_index, test_index, output_data_inst)[source]

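Takes in the combined test and rebalanced train dataframe, the train and test feature (X) and label (y) sets with their indices, and the output data instance, and saves them as HDF files.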

predictatops.predictionclasses_runner module

predictatops.predictionclasses_runner_old module

predictatops.split module

predictatops.split.split_train_test(df_all_Col_preSplit, split_variable, uwi_column_str)[source]
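split_train_test takes a UWI column name, and the step is described as splitting the wells into train and test portions. That suggests the split is made per well, not per depth row. A minimal sketch of a per-well split follows. split_wells_train_test, test_fraction, and seed are hypothetical and are not claimed to match the meaning of split_variable.

```python
import random


def split_wells_train_test(uwis, test_fraction=0.2, seed=0):
    """Assign each well, not each row, to 'train' or 'test'.

    Hypothetical sketch: the split is made on unique UWIs, so every depth row
    of a well lands in the same portion and test wells are never seen during
    training. Labels are lowercase 'train' and 'test'.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    unique_uwis = sorted(set(uwis))  # sort first so the seed alone fixes the result
    random.Random(seed).shuffle(unique_uwis)
    n_test = round(len(unique_uwis) * test_fraction)
    test_uwis = set(unique_uwis[:n_test])
    return {uwi: ("test" if uwi in test_uwis else "train") for uwi in unique_uwis}


# One UWI per depth row, as in a long-format well dataframe.
row_uwis = ["W1", "W1", "W2", "W2", "W3", "W4", "W5"]
assignment = split_wells_train_test(row_uwis, test_fraction=0.4, seed=7)
row_labels = [assignment[uwi] for uwi in row_uwis]
print(assignment)
print(row_labels)
```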

predictatops.split_runner module

predictatops.trainclasses module

predictatops.trainclasses.findDirAndPathForBalancedresults()[source]
class predictatops.trainclasses.ML_obj_class(output_data_inst)[source]

Bases: object

Holds the data needed to train the class-identification model, already processed with features created.

check_test_df_same_size()[source]

Checks that the test dataframes are the same size.

check_train_df_same_size()[source]

Checks that the train dataframes are the same size.

dropCol(df, col_list)[source]

Drops the columns in col_list from df.

dropNeighbors_ObjCol(col_list)[source]

load_data_for_ml()[source]

Loads the data for the machine-learning task.

init_XGBoost_withSettings()[source]

predictatops.trainclasses.saveTrainClassesResultsAsPickle(model, MLinstance, output_data_inst)[source]

Saves the trained model and the ML_obj_class instance as a pickle. NOTE: this pickle may not load properly if you switch operating systems or Python versions.
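One way to reduce the loading problems mentioned in the note is to pin the pickle protocol and keep the save and load steps together. A minimal sketch follows; save_results, load_results, and the dictionary keys are hypothetical, and plain dictionaries stand in for the model and the ML object.

```python
import pickle
import tempfile
from pathlib import Path


def save_results(path, model, ml_instance, protocol=4):
    """Pickle a trained model and its ML object together (hypothetical layout).

    Pinning the protocol (4 is readable by Python 3.4 and later) removes one
    cause of load failures. It does not help if the loading environment lacks,
    or has a different version of, the libraries whose objects are pickled.
    """
    with open(path, "wb") as fh:
        pickle.dump({"model": model, "ml_instance": ml_instance}, fh, protocol=protocol)


def load_results(path):
    """Load the pair saved by save_results. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    return bundle["model"], bundle["ml_instance"]


# Round trip with plain stand-in objects in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "trainclasses_results.pkl"
    save_results(pkl_path, {"n_estimators": 100}, {"features": ["GR", "ILD"]})
    model, ml_instance = load_results(pkl_path)

print(model, ml_instance)  # {'n_estimators': 100} {'features': ['GR', 'ILD']}
```

For the XGBoost model itself, XGBoost's own save_model method is designed to be more portable across versions than a pickle.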

predictatops.trainclasses_runner module

predictatops.trainclasses_temp module

predictatops.wellKNN_runner module

predictatops.wellsKNN module

predictatops.wellsKNN.get_data_for_wellsKNN(input_data_inst)[source]

Takes in an instance of the input data class, which holds the paths needed to load the various dataframes. Returns dataframes of picks, pick dictionaries, well names, and GIS data.

predictatops.wellsKNN.getTopsForWantedPickNames(config)[source]

Takes in: the configuration instance. Returns: two strings or integers identifying the target top and the top below the target top.

predictatops.wellsKNN.findAllPicksForTops(picks, target, target_base, HorID)[source]

Takes in: the picks dataframe, the target top string, the top below the target top, and the string for the HorID column name. Returns: two dataframes, the first with pick depths for only the target top and the second with pick depths for only the top below the target top.

predictatops.wellsKNN.mergeDataframes(wells, picks_targetTop, SitID, picks_targetBase, gis)[source]

predictatops.wellsKNN.createListOfWellNamesLoadedSplit(wells_df_from_split, config)[source]
predictatops.wellsKNN.replacenthSubStr(string, sub, wanted, n)[source]

Replaces the nth occurrence of sub in string with wanted and returns the resulting string.
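A self-contained sketch of the behaviour the name and parameters describe follows. replace_nth is a hypothetical name, and counting n from 1 with non-overlapping matches is an assumption, not documented behaviour of replacenthSubStr.

```python
def replace_nth(string, sub, wanted, n):
    """Replace the n-th occurrence (counting from 1) of `sub` in `string` with `wanted`.

    Matches are counted left to right without overlapping. If `string` has
    fewer than n matches it is returned unchanged.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not sub:
        raise ValueError("sub must be a non-empty string")
    search_from = 0
    for _ in range(n):
        start = string.find(sub, search_from)
        if start == -1:
            return string
        search_from = start + len(sub)
    return string[:start] + wanted + string[search_from:]


print(replace_nth("a.b.c.d", ".", "_", 2))  # a.b_c.d
```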

predictatops.wellsKNN.changeLASfileToBeUWIstr(lasStr, well_format_str)[source]

predictatops.wellsKNN.findListOfConvertedUWInamesForWellsLoaded(wellsLoaded_df_fromh5, UWI, well_format_str)[source]

predictatops.wellsKNN.reduced_df_from_split_plus_more(df_new, new_wells_loaded_list_inUWIstyle, UWI)[source]

predictatops.wellsKNN.kdtree(df_reduced, lat_col, long_col, leaf_size, k)[source]

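Builds a KD-tree from the latitude and longitude columns (lat_col, long_col) of df_reduced, using leaf_size, and queries it for the k nearest wells to each well. Returns the distances and indices of those neighbors.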
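The leaf_size and k parameters match a KD-tree build and query, such as scikit-learn's sklearn.neighbors.KDTree(X, leaf_size=...) followed by .query(X, k=...), which returns distance and index arrays. It is an assumption that kdtree works this way, and its (dist, ind) output would then feed makeKNearNeighObj. The sketch below deliberately swaps the KD-tree for a brute-force search. That search returns the same (dist, ind) layout without extra dependencies, but it is not a KD-tree and scales as O(n²). k_nearest_wells is a hypothetical name.

```python
import math


def k_nearest_wells(coords, k):
    """Brute-force k-nearest-neighbour search over (lat, long) pairs.

    Returns (dist, ind): for each well, the distances to and indices of its k
    nearest wells, nearest first. Assuming no two wells share coordinates, each
    well is its own nearest neighbour at distance 0, as when a KD-tree is queried
    with its own training points. Distances are Euclidean in degrees, not km.
    """
    if not 1 <= k <= len(coords):
        raise ValueError("k must be between 1 and the number of wells")
    dist, ind = [], []
    for lat0, lon0 in coords:
        ranked = sorted(
            (math.hypot(lat - lat0, lon - lon0), j)
            for j, (lat, lon) in enumerate(coords)
        )[:k]
        dist.append([d for d, _ in ranked])
        ind.append([j for _, j in ranked])
    return dist, ind


coords = [(56.0, -111.0), (56.1, -111.0), (56.0, -111.3), (57.0, -112.0)]
dist, ind = k_nearest_wells(coords, k=2)
print(ind)  # [[0, 1], [1, 0], [2, 0], [3, 2]]
```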

predictatops.wellsKNN.makeKNearNeighObj(df_reduced, UWI, Lat, Long, dist, ind, numberNeighbors)[source]

predictatops.wellsKNN.getColNames_for_cleanRenameDF(config)[source]

predictatops.wellsKNN.cleanRenameDF(df, topTarget, thicknessHelperTop, config, input_data_inst)[source]

predictatops.wellsKNN.mergeCleanedAndUWIGeog_dfs(df_new_cleaned, UWIs_Geog)[source]
predictatops.wellsKNN.broadcastFuncForFindNearestNPickDepth(df_new_cleaned_plus_nn, pickColInt, newPickColName, UWI_col)[source]

predictatops.wellsKNN.convertStringToFloat(string)[source]

Converts a string to a float.

predictatops.wellsKNN.useThicknessOfNeighborsToEst(df_new2)[source]
predictatops.wellsKNN.create_diff_Top_Depth_Real_v_predBy_NN1thick(df_new3)[source]

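Calculates the difference between the real top depth and the top depth predicted from the thickness at the nearest neighbor (NN1).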
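useThicknessOfNeighborsToEst and create_diff_Top_Depth_Real_v_predBy_NN1thick suggest a simple baseline: estimate the target top in a well from the interval thickness at its nearest neighbor, then compare the estimate with the real top. The sketch below is an assumption about how such a baseline could work, not the package's code. Thickness is taken as the neighbor's depth to the top below the target minus its depth to the target top, depths increase downward, and the function and key names are hypothetical.

```python
def estimate_top_from_neighbor_thickness(base_depth, nn_top_depth, nn_base_depth):
    """Estimate the target top in a well from its nearest neighbour's thickness.

    Hypothetical sketch: thickness = neighbour base - neighbour top;
    estimated top = this well's base - thickness.
    """
    thickness = nn_base_depth - nn_top_depth
    return base_depth - thickness


def top_estimate_errors(wells):
    """Return {uwi: real top - estimated top} for each well record."""
    return {
        w["uwi"]: w["real_top"] - estimate_top_from_neighbor_thickness(
            w["base"], w["nn1_top"], w["nn1_base"])
        for w in wells
    }


wells = [
    {"uwi": "A", "real_top": 403.0, "base": 450.0, "nn1_top": 400.0, "nn1_base": 445.0},
    {"uwi": "B", "real_top": 517.0, "base": 560.0, "nn1_top": 505.0, "nn1_base": 550.0},
]
print(top_estimate_errors(wells))  # {'A': -2.0, 'B': 2.0}
```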

predictatops.wellsKNN.onlyWellsInTestPortion(df, string_train_or_test)[source]

predictatops.wellsKNN.fullWellsKNN(wells_df_from_split, input_data_inst, config)[source]

predictatops.wellsKNN.getTestRowsOnlyPicksDF(df, wellsLoaded_df_fromh5_newUWI, config, whichTrainOrTest_str, well_file_ending)[source]

Module contents

The __init__ module doesn’t contain any code, just a description of predictatops.

Predictatops is a Python package for stratigraphic top prediction.

Predictatops modules are designed to be run in a sequence, one step after another.

Each step can be run in two ways.

For example, there is load.py, which has all the functions to load data, and then there is load_runner.py, which leverages load.py, configuration set in the configurationplusfiles.py module, and some sensible defaults to execute the entire loading step without any more work on the user’s part.

Depending on your needs, you might write your own code that leverages load.py, or you might just run load_runner.py as an executable script.

As mentioned above, Predictatops modules are run in a sequence. An example sequence is below.

  • fetch_demo_data.py = fetches the demo dataset for predicting the McMurray top.

  • configurationplusfiles_runner.py = establishes configuration, input data, & output variables.

  • checkdata_runner.py = finds what curves & tops are available for use in the input dataset.

  • load_runner.py = loads the well curves and tops.

  • split_runner.py = splits the wells into train & test portions.

  • wellsKNN_runner.py = finds the nearest neighbors of each well and creates some features.

  • features_runner.py = creates the rest of the features.

  • balance_runner.py = throws away instances of common classes & duplicates uncommon classes.

  • trainclasses_runner.py = trains a model on the dataset using the XGBoost algorithm.

  • predictionclasses_runner.py = uses the trained model to predict stratigraphic tops.

  • plot_runner.py = plots some of the results in map form.

Running this full sequence is also packaged into the predictatops.all_runner module.
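If you would rather drive the steps from your own script than use all_runner, each runner module can be executed in order with the standard library's runpy.run_module, which runs a module the way python -m does. In the sketch below, the module paths are assumptions built from the step list: fetch_demo_data.py is left out because its module path is not given, and the API listing names the KNN runner module wellKNN_runner while the step list says wellsKNN_runner.py. run_sequence and STEPS are hypothetical names.

```python
import runpy

# Step order follows the sequence list; module paths are assumptions,
# so check them against the installed package before running.
STEPS = [
    "predictatops.configurationplusfiles_runner",
    "predictatops.checkdata_runner",
    "predictatops.load_runner",
    "predictatops.split_runner",
    "predictatops.wellKNN_runner",
    "predictatops.features_runner",
    "predictatops.balance_runner",
    "predictatops.trainclasses_runner",
    "predictatops.predictionclasses_runner",
    "predictatops.plot_runner",
]


def run_sequence(steps, run=runpy.run_module):
    """Run each runner module in order, as `python -m <module>` would.

    An exception in any step stops the sequence, since later steps depend on
    the outputs of earlier ones.
    """
    for module_name in steps:
        print(f"Running {module_name} ...")
        run(module_name, run_name="__main__")
```

Calling run_sequence(STEPS) runs the whole chain. Passing a shorter list lets you resume from a later step, assuming the outputs of the earlier steps have already been saved.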